HyperThought
HyperThought™ is a cutting-edge LPU IP built on the next-generation Language Instruction Set Architecture (LISA v3) and designed for real-time, multimodal, and agent-based AI applications.
- Unique Design & Technology:
HyperThought™ features a distinctive quantization approach that compresses model weights to an average of 4 bits with a mixed-precision technique, significantly reducing DRAM bandwidth and capacity requirements. The architecture achieves over 90% MAC and bandwidth utilization, delivering exceptional cost-effectiveness and compute-memory efficiency. To preserve inference accuracy, the compressed weights are dequantized back to 16 bits on the fly before they are loaded into the MAC units, as the sketch below illustrates.
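To make the compress-then-dequantize flow concrete, here is a minimal Python sketch of group-wise mixed-precision weight quantization with on-the-fly expansion back to 16-bit floats before the multiply-accumulate stage. The group size, the bit-allocation rule, and the symmetric scaling scheme are illustrative assumptions, not HyperThought™'s actual (unpublished) format.

```python
# Hypothetical sketch of mixed-precision weight compression with on-the-fly
# dequantization to 16 bits. Group size, bit allocation, and scaling are
# assumptions for illustration only; real hardware would also pack 4-bit
# values two per byte rather than storing them in int8.
import numpy as np

def quantize_group(w, bits):
    """Symmetric per-group integer quantization of a 1-D weight group."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit, 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax + 1e-12
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_group(q, scale):
    """Expand the integers back to 16-bit floats just before the MAC stage."""
    return q.astype(np.float16) * np.float16(scale)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)
groups = weights.reshape(-1, 64)                     # 64-weight groups (assumed)

# Mixed precision: give the most sensitive (highest-variance) groups 8 bits and
# the rest 4 bits, so the average storage cost stays close to 4 bits per weight.
sensitivity = groups.std(axis=1)
bits_per_group = np.where(sensitivity > np.quantile(sensitivity, 0.9), 8, 4)

packed = [quantize_group(g, b) for g, b in zip(groups, bits_per_group)]
restored = np.concatenate([dequantize_group(q, s) for q, s in packed])

avg_bits = bits_per_group.mean()
err = np.abs(restored.astype(np.float32) - weights).mean()
print(f"average bits/weight: {avg_bits:.2f}, mean abs error: {err:.4f}")
```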
- Special Feature:
HyperThought™ offers strong scalability. It supports both a multi-core architecture on a single chip to increase processing power and a multi-chip chaining mode that scales compute and memory linearly. For example, an accelerator card with six chained HyperThought™ LPUs can raise LLaMA2 7B prefill throughput to 1200 tps (tokens per second) while expanding memory bandwidth to 614.4 GB/s and capacity to 384 GB; the scaling arithmetic is worked through below.
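As a back-of-the-envelope check of the chaining figures, the short Python snippet below derives per-chip numbers by dividing the six-chip card's specifications by six and then re-scales them for other chain lengths. Treating the scaling as exactly linear (and the resulting single-chip figures of 102.4 GB/s, 64 GB, and ~200 tps) is an assumption for illustration, not a published single-chip specification.

```python
# Linear-scaling check for the six-chip accelerator card figures quoted above.
chips = 6
card_bandwidth_gbs = 614.4        # GB/s, 6-chip card
card_capacity_gb = 384            # GB, 6-chip card
card_prefill_tps = 1200           # LLaMA2 7B prefill tokens/s, 6-chip card

# Assumed per-chip figures: card figures divided by six.
per_chip_bandwidth = card_bandwidth_gbs / chips   # -> 102.4 GB/s
per_chip_capacity = card_capacity_gb / chips      # -> 64 GB
per_chip_prefill = card_prefill_tps / chips       # -> ~200 tps

for n in (1, 2, 4, 6):
    print(f"{n} chip(s): {per_chip_bandwidth * n:6.1f} GB/s, "
          f"{per_chip_capacity * n:5.0f} GB, "
          f"~{per_chip_prefill * n:4.0f} tps prefill")
```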
- Core Function:
The LISA v3 architecture natively supports multimodal data (text and vision) and Agent AI workflows, giving devices interactive and contextual reasoning. It also integrates a security-focused instruction set to keep every edge AI interaction safe.