
Why Software Matters in the Age of AI

Inference chips typically have lots of MACs and memory, but actual throughput on real-world models is often lower than expected. Software is usually the culprit.

eetasia.com, Apr. 29, 2020

Inference accelerators represent an incredible market opportunity, not only for chip and IP companies but also for the customers who desperately need them. As inference accelerators come to market, a common comment we hear is: "Why is my inference chip not performing like it was designed to?"

Oftentimes, the simple answer is the software.

Software is key

All inference accelerators today are programmable because customers expect their models to evolve over time. Programmability lets them take advantage of future enhancements, something a hard-wired accelerator cannot do. However, customers want that programmability without giving up efficiency: they need the most throughput for a given cost and a given power budget, which means the hardware must be used very efficiently. The only way to achieve that is to design the software in parallel with the hardware, so the two work together to reach maximum throughput.
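
To make that tradeoff concrete, here is a minimal sketch, with entirely hypothetical chip numbers (peak TOPS, power, price, and per-inference MAC count), of the metrics customers actually optimize for. It shows that throughput per dollar and per watt are set by how busy the software keeps the MACs, not by the peak rating alone.

```python
# Hypothetical figures for illustration only -- not any particular chip.
PEAK_TOPS = 16.0              # peak throughput in trillions of ops/s
MACS_PER_INFERENCE = 2.0e11   # rough per-frame MAC count, YOLOv3-class model
POWER_W = 10.0                # assumed board power
PRICE_USD = 150.0             # assumed unit price

def inferences_per_second(mac_utilization: float) -> float:
    """Effective throughput given the fraction of MAC cycles doing useful work."""
    effective_ops = PEAK_TOPS * 1e12 * mac_utilization  # ops/s actually used
    return effective_ops / (2 * MACS_PER_INFERENCE)     # 1 MAC = 2 ops

# Compare software co-designed with the hardware vs. software done last.
for util in (0.9, 0.3):
    ips = inferences_per_second(util)
    print(f"utilization {util:.0%}: {ips:5.1f} inf/s, "
          f"{ips / POWER_W:4.2f} inf/s/W, {ips / PRICE_USD:5.3f} inf/s/$")
```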

One of the biggest problems today is that companies find themselves with an inference chip that has lots of MACs and plenty of memory, yet actual throughput on real-world models is lower than expected because much of the hardware sits idle. In almost every case, the root cause is that the software work was done after the hardware was built. During development, designers must make many architectural tradeoffs, and they cannot evaluate those tradeoffs without considering hardware and software together, early on. Chip designers need to study the target models closely, then build a performance-estimation model to determine how different amounts of on-chip memory, MACs, and DRAM bandwidth would change throughput and die size, and how the compute units must coordinate for different kinds of models.
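
As an illustration of such a performance-estimation model, here is a minimal roofline-style sketch. All hardware parameters (MAC count, clock, DRAM bandwidth) and the two layer shapes are hypothetical; the point is the structure: each layer is paced by whichever resource, compute or DRAM traffic, is slower, so sweeping the MAC count shows where extra MACs stop paying off.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    macs: float         # multiply-accumulates in this layer
    dram_bytes: float   # weight/activation traffic that misses on-chip SRAM

def layer_time_s(layer: Layer, num_macs: int, clock_hz: float,
                 dram_gbps: float) -> float:
    compute_s = layer.macs / (num_macs * clock_hz)
    memory_s = layer.dram_bytes / (dram_gbps * 1e9)
    return max(compute_s, memory_s)   # the slower resource sets the pace

def model_fps(layers: list[Layer], **hw) -> float:
    return 1.0 / sum(layer_time_s(l, **hw) for l in layers)

# Hypothetical two-layer model: one compute-heavy layer, one memory-heavy one.
layers = [Layer(macs=3.7e9, dram_bytes=8e6), Layer(macs=0.4e9, dram_bytes=30e6)]

# Sweep the MAC count: doubling MACs helps only the compute-bound layers,
# so the architect can see when DRAM bandwidth becomes the better spend.
for n in (4096, 8192, 16384):
    fps = model_fps(layers, num_macs=n, clock_hz=1e9, dram_gbps=25.6)
    print(f"{n:5d} MACs -> {fps:6.1f} frames/s")
```

Running the sweep shows sharply diminishing returns: once every layer is memory-bound, adding MACs buys nothing, which is exactly the kind of insight that must be available before the hardware is frozen.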

Today, one of the highest-volume applications for inference acceleration is object detection and recognition, so inference accelerators must be very good at megapixel processing with complex models like YOLOv3. To get there, software teams must work with hardware teams throughout the entire chip-design process, from performance estimation to building the full compiler and generating code. Once the chip RTL is done, the only way to verify it at the top level is to run entire layers of real models through the chip with megapixel images. That requires the ability to generate all the code (or bit streams) that control the device, and that is only possible when software and hardware teams work closely together.
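
To see why these top-level tests are so heavy, here is a back-of-the-envelope sketch of the MAC count of a single early convolution layer on a 2-megapixel frame. The layer shape is illustrative rather than YOLOv3's exact configuration.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a stride-1, same-padded convolution: H * W * Cout * (K*K*Cin)."""
    return h * w * c_out * k * k * c_in

# Illustrative early layer on a 1920x1080 frame: 3 -> 32 channels, 3x3 kernel.
macs = conv_macs(1080, 1920, c_in=3, c_out=32, k=3)
print(f"{macs / 1e9:.1f} GMACs in one layer")          # ~1.8 GMACs

# At 1 MAC per unit per clock, a hypothetical 4096-MAC array needs:
print(f"{macs // 4096:,} cycles just for this layer")  # ~437,000 cycles
```

Hundreds of thousands of cycles for a single layer means an RTL simulation of a full multi-layer test runs for a very long time, and someone has to produce the exact instruction streams that drive it. That is software work, and it cannot wait until tape-out.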
