Jotunn - Generative AI Platform


The "Memory Wall" was first described by Wulf and McKee in 1994. They observed that processing unit (CPU) performance was improving far faster than memory performance. As a result, the rate at which data can be transferred to and from memory forces the processor to wait until the data it needs is available.

In traditional architectures, the problem has been mitigated by a hierarchical memory structure: multiple levels of cache stage the data to minimize traffic to main memory or external memory.

Recently introduced Generative AI models (for example, ChatGPT, DALL-E, Diffusion, etc.) have dramatically expanded the number of parameters needed to perform the task at hand. For example, GPT-3.5 requires 175 billion parameters, and GPT-4, launched in March 2023, supposedly requires almost 2 trillion parameters. All of these parameters need to be accessed during inference or training, which poses a problem: traditional systems cannot handle such a vast amount of data without resorting to the hierarchical memory model. Unfortunately, the more levels that must be traversed to read or store the data, the longer each access takes. As a result, the processing elements are forced to wait longer and longer for data, increasing latency and lowering implementation efficiency.
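To give a sense of scale, the parameter counts quoted above can be turned into a rough storage figure. The sketch below is only a back-of-the-envelope estimate: it assumes 2 bytes per parameter (a 16-bit format, which is an assumption and not something stated here), counts the weights alone, and uses the unconfirmed figure of almost 2 trillion parameters for GPT-4. The function name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Storage needed for the model weights alone, in gigabytes (10**9 bytes).

    bytes_per_param=2 assumes a 16-bit number format; this is an
    illustrative assumption, not a figure taken from the text.
    """
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in the text (the GPT-4 number is unconfirmed).
gpt35_gb = weight_footprint_gb(175e9)
gpt4_gb = weight_footprint_gb(2e12)

print(f"GPT-3.5 weights: ~{gpt35_gb:,.0f} GB")
print(f"GPT-4 weights:   ~{gpt4_gb:,.0f} GB")
```

Under these assumptions the weights alone run to hundreds of gigabytes or more, far beyond what on-chip caches can hold, which is why every access ends up traversing the memory hierarchy.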

Recent findings show that the efficiency of running GPT-4, the most recent GPT algorithm, drops to around 3%. That is, the very expensive hardware designed to run these algorithms sits idle 97% of the time!

The flip side is that the amount of hardware required to reach reasonable compute numbers is staggering. In July 2023, EE Times reported that Inflection is planning to use 22,000 Nvidia H100 GPUs in its supercomputer, an investment of ~$800M. Assuming an average power consumption of 500 W per H100, the total power draw would be an astounding 11 MW!
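The arithmetic behind this estimate is easy to check. The short sketch below uses only the figures quoted above (22,000 GPUs, an assumed average of 500 W each, and roughly $800M in total). The energy-per-day and cost-per-GPU lines are derived illustrations, not numbers from the EE Times report. Note that power is measured in watts, while energy is power multiplied by time (watt-hours).

```python
NUM_GPUS = 22_000             # Nvidia H100 count quoted in the text
AVG_POWER_W = 500             # assumed average draw per GPU, in watts
TOTAL_COST_USD = 800_000_000  # approximate investment quoted in the text

# Power: instantaneous draw of all GPUs together, in megawatts.
total_power_mw = NUM_GPUS * AVG_POWER_W / 1e6

# Energy: power sustained over time; here one full day, in megawatt-hours.
energy_per_day_mwh = total_power_mw * 24

# Derived illustration: average spend per GPU implied by the two quoted totals.
cost_per_gpu_usd = TOTAL_COST_USD / NUM_GPUS

print(f"Total power draw: {total_power_mw:.1f} MW")
print(f"Energy per day:   {energy_per_day_mwh:.0f} MWh")
print(f"Cost per GPU:     ~${cost_per_gpu_usd:,.0f}")
```

This counts the GPUs only. Cooling, networking, and host systems would add to the total, and by how much is not something the figures here allow us to estimate.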

Based on a fundamentally new architecture, Jotunn allows data to be fed to the processing units 100% of the time, regardless of the number of compute elements. Algorithm efficiencies, even for large models like GPT-4, will exceed 50%. Jotunn4 will significantly outperform anything that is currently on the market!

© 2024 Design And Reuse

All Rights Reserved.

No portion of this site may be copied, retransmitted, reposted, duplicated or otherwise used without the express written permission of Design And Reuse.