
Untether Unveils 2-PFLOPS AI Chip, Edge Roadmap

www.eetimes.com, Aug. 23, 2022

At Hot Chips this week, Untether unveiled its second-generation architecture for AI inference, the first chip built on that architecture, and its plans to expand into edge and endpoint accelerators.

Untether's new architecture, internally codenamed Boqueria, addresses three trends: very large neural networks, including transformer networks in natural language processing and beyond; endpoint applications that demand power efficiency; and applications that require performance and power efficiency combined with prediction accuracy.

The first chip to use the Boqueria architecture, SpeedAI, is a data center inference accelerator capable of 2 PFLOPS of FP8 performance at peak power consumption (66 W), or 30 TFLOPS/W in a more typical 30-35 W power envelope. (Untether's first-generation chip, RunAI, delivered 500 TOPS of INT8.)
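The headline figures above are internally consistent, which a quick back-of-the-envelope check confirms. The sketch below uses only numbers stated in the article (2 PFLOPS, 66 W); the variable names are illustrative.

```python
# Sanity-check the quoted efficiency: 2 PFLOPS of FP8 at 66 W peak power.
peak_flops = 2e15      # 2 PFLOPS (FP8), from the article
peak_power_w = 66      # peak power consumption, from the article

# Efficiency in TFLOPS/W (1 TFLOPS = 1e12 FLOPS)
efficiency_tflops_per_w = peak_flops / peak_power_w / 1e12

print(round(efficiency_tflops_per_w, 1))  # ~30.3, in line with the quoted 30 TFLOPS/W
```

Note that the peak-power figure alone already works out to roughly 30 TFLOPS/W; the article's 30-35 W envelope implies the chip sustains about half the peak throughput at half the power.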

This level of performance translates to running BERT-base inference at 750 queries per second per Watt, which the company says is 15× the performance of a state-of-the-art GPU.
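To translate the per-watt BERT-base figure into absolute throughput, one can multiply by the power envelope. This is an illustrative calculation only: the article states the 750 queries/s/W figure and the 30-35 W envelope separately, and does not itself quote an absolute queries-per-second number.

```python
# Illustrative: absolute BERT-base throughput implied by the per-watt figure,
# assuming the article's 30-35 W power envelope applies to this benchmark.
qps_per_watt = 750  # BERT-base queries per second per watt, from the article

for power_w in (30, 35):
    print(f"{power_w} W -> {qps_per_watt * power_w} queries/s")
# 30 W -> 22500 queries/s
# 35 W -> 26250 queries/s
```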

The 35 × 35-mm chip is built on TSMC's 7-nm process and uses more than 1,400 optimized RISC-V cores, the most EE Times has seen in a commercial chip (beating the previous record holder, Esperanto).

"[The performance] is a convergence of different factors," Bob Beachler, VP of product at Untether, told EE Times. "It's a combination of a lot of things, including circuit design, data types, understanding how neural networks operate–how does a transformer operate compared to a convolutional network?–all of these things we've been able to embody in our second-generation chip."

Untether carefully considered the balance between flexibility, performance, and scalability when working on Boqueria.

"To make general-purpose AI compute architecture, you have to have the right level of granularity and flexibility to efficiently be able to run this plethora of neural networks and be able to scale from small to large," Beachler said. Accuracy is also important for inference workloads, he added, particularly for recommendation where any percentage points of accuracy loss can mean substantial financial losses, and for safety-oriented applications like autonomous driving.


