





# Open Source Virtual Platforms for SW Prototyping on FPGA

Mark Burton

#### Deep Learning Accelerator





- NVIDIA has a Deep Learning Accelerator (called NVDLA)

The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome.

- NVIDIA also has a C model of the DLA architecture (could be used as a SystemC/TLM model)

### Turing Lecture 2017: Hennessy and Patterson



Green**Socs**®

https://www.youtube.com/watch?v=3LVeEjsn8Ts

Enabling System Level Design

### Goals





- Bring HW and SW together
- Minimize time to re-spin
  - (change in HW/change in SW)



- Enable simulation to be used by anybody
- Make it easy and quick to use



- Make the simulation FAST
- Enable S/W development



(Para-)Virtualization or full system virtualization

| Approach                          | Stack (top to bottom)                                      | Execution                                         |
|-----------------------------------|------------------------------------------------------------|---------------------------------------------------|
| Virtual Platform / Virtualization | Application ('real binary'), O/S, Virtual platform (model) | Full binary execution on virtual platform (model) |
| Emulation                         | Application, O/S, FPGA                                     | Full binary execution on REAL platform (FPGA)     |
| Hardware                          | Application, O/S, Hardware                                 | Full binary execution on final hardware           |



### Virtual Platform Standard is SystemC TLM-2.0 IEEE 1666

Open Source Simulator available for download from Accellera.org



Corporate members 2016

- **GreenSocs** technology at the heart of TLM-2.0 standard.
- All GreenSocs interfaces use TLM-2.0
- GreenSocs is helping Accellera forge a new model-to-tool standard.
  - Preview available in GreenConfig.
- Our solutions are tool independent, and work with all vendors.

### QEMU: Our preferred source of CPU models

- QEMU is the de facto standard virtualizer.
- Free and open source.
- It is over 10 years old.




- GreenSocs is a key contributor: Reverse execution and Multi-Core TCG Kernel.
- Regular committers from many organizations





### CPU Family coverage:

|                                  | X86 | ARM | MIPS | Alpha | PowerPC | SPARC | MicroBlaze | ColdFire | CRIS | SH4 | Xtensa |
|----------------------------------|-----|-----|------|-------|---------|-------|------------|----------|------|-----|--------|
| Fast SW dev model (LT)           | ✓   | ✓   | ✓    |       | ✓       | ✓     | ✓          | ✓        | ✓    | ✓   |        |
| Cycle-accurate HW dev model (AT) | ✓   |     |      | ✓     |         | ✓     |            |          |      |     |        |

Full list (of several hundred) available on GreenSocs.com







- Qbox wraps QEMU in a TLM-2.0 API so that it can be used in standard SystemC.
- QEMU is a generic and open source virtualizer – it covers almost all CPU architectures and achieves extremely high performance.

### Qbox Synchronisation options





- Real Time
  - Each simulator runs as close to real time as possible.
  - Can be simple "run as fast as you can", no sync.



- Windowed
  - Each simulator is allowed to run within a window; if it reaches the end, it must stop and wait.
  - The window automatically extends as simulators run.
  - (Windowed 'behind' keeps SystemC behind and the TLM delta time positive.)



- Deterministic/single threaded
  - Each simulator runs in turn.
  - Pseudo random ordering to 'catch' S/W bugs.
    - (The advantage of a model…)
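The windowed and deterministic options above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Qbox's implementation: `WindowedSync`, `run_deterministic`, the integer time units, and the simulator names are all hypothetical and chosen for illustration.

```python
import random


class WindowedSync:
    """Hypothetical helper: no simulator may run more than `window`
    time units ahead of the slowest simulator."""

    def __init__(self, names, window):
        self.window = window
        self.now = {name: 0 for name in names}

    def limit(self):
        # The window slides forward automatically as the slowest simulator advances.
        return min(self.now.values()) + self.window

    def advance(self, name, requested):
        """Advance `name` by up to `requested` units; return the time granted.

        A result of 0 means the simulator reached the end of the window
        and must stop and wait for the others.
        """
        granted = max(0, min(requested, self.limit() - self.now[name]))
        self.now[name] += granted
        return granted


def run_deterministic(sync, steps, quantum, seed):
    """Single-threaded mode: simulators run in turn, in a seeded
    pseudo-random order, so a given seed always reproduces the same trace."""
    rng = random.Random(seed)
    trace = []
    for _ in range(steps):
        order = sorted(sync.now)
        rng.shuffle(order)
        for name in order:
            trace.append((name, sync.advance(name, quantum)))
    return trace
```

With a window of 10, a simulator that asks to run 25 units is granted only 10 and then has to wait until the other simulator has moved forward. Re-running `run_deterministic` with the same seed replays the same ordering, which is what makes an ordering-dependent S/W bug reproducible.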

### GreenSocs is the XILINX partner upstreaming their device models



#### **Clock framework**

Enable the correct timing for events across the full Zynq device.



#### Large packet DMA framework

Significantly increase the speed of DMA activity in the simulated device.



#### **Fault Injection**

Model fault injection in a convenient and scriptable way, to enable safety and test features to be validated.
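As an illustration of what "scriptable" fault injection could look like, here is a minimal sketch. It does not show the QEMU or Xilinx interface: the `Register` and `FaultInjector` classes, the fault kinds (`flip`, `stuck0`, `stuck1`), and the time units are assumptions made up for this example.

```python
class Register:
    """Toy 32-bit device register standing in for a modelled device."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF
        self.stuck_mask = 0  # bits forced to a fixed level
        self.stuck_bits = 0  # the level those bits are forced to

    def write(self, value):
        self.value = value & 0xFFFFFFFF

    def read(self):
        # Stuck-at bits override whatever software last wrote.
        return (self.value & ~self.stuck_mask) | (self.stuck_bits & self.stuck_mask)


class FaultInjector:
    """Hypothetical scripted injector: faults fire at a simulated time."""

    def __init__(self, registers):
        self.registers = registers
        self.script = []

    def at(self, time, reg, kind, bit):
        """Schedule a fault; returns self so a script can be chained."""
        self.script.append((time, reg, kind, bit))
        return self

    def tick(self, now):
        """Apply every scheduled fault whose time has been reached."""
        due = [f for f in self.script if f[0] <= now]
        self.script = [f for f in self.script if f[0] > now]
        for _, reg, kind, bit in sorted(due, key=lambda f: f[0]):
            target = self.registers[reg]
            mask = 1 << bit
            if kind == "flip":
                target.value ^= mask
            elif kind == "stuck0":
                target.stuck_mask |= mask
                target.stuck_bits &= ~mask
            elif kind == "stuck1":
                target.stuck_mask |= mask
                target.stuck_bits |= mask
            else:
                raise ValueError(f"unknown fault kind: {kind}")
```

A test script can then read as `FaultInjector(regs).at(10, "status", "flip", 1).at(20, "status", "stuck0", 0)`, with the safety software under test expected to detect the corrupted register.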



#### Safety and Test Library extensions to devices

Model the suite of devices in the Zynq that can be self-tested.

### Extending QEMU Speed





#### **Multi-threaded QEMU**

- A massive speed improvement for QEMU, taking advantage of multi-core hosts



### Advanced features

- Non-deterministic reverse execution
  - Ability to debug from an error backwards, irrespective of input stimulus
- Supporting LAUTERBACH
  - No H/W required, no 'JTAG collector' limit.
- Cache modeling
  - Cache Coherency performance estimation
  - Cache flushing S/W checking







### What's OpenVP

- User Application and user level device code
- Kernel and kernel modules
- Virtual Platform model
  - Based on QEMU and SystemC
  - 'C' model for NVDLA device itself





- Simulation speed… the NVDLA Accelerator is modelled on the host, so it will not 'accelerate'.
- Changes to the core NVDLA architecture require changes to the model.

### Adding FPGA

- User Application and user level device code
- Kernel and kernel modules
- Virtual Platform model, with FPGA wrapper
  - AWS framework

- NVDLA FPGA hardware module
- Runs at full speed!



#### SPEED



- Anybody can download packaged Docker release
- Configurable; build time ½ hour.
- FAST TO SET UP.



- SW on FPGA with NVDLA RTL
  - Anybody can run the AWS env with pre-packaged AMI and AFI
  - With AWS setup, easy to alter both FPGA images and associated drivers. (e.g. less than a day).
  - FAST TO RUN.

### Both available from nvdla.org







- Why we need HW tests on FPGA
  - To guarantee the quality of FPGA release
  - To identify corner cases and issues in RTL





- Based on SW on Cmodel
- Replace all Cmodels (NVDLA, Mem model) with FPGA wrapper
- Full user code executable on combined QEMU + FPGA model
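The idea of swapping the C models for an FPGA wrapper while the user code stays unchanged can be sketched as two interchangeable back ends behind one interface. This is an illustrative assumption, not the OpenVP code: `AcceleratorBackend`, `CModelBackend`, `FpgaWrapperBackend`, the `transport` callable, and the register address are all hypothetical.

```python
from abc import ABC, abstractmethod


class AcceleratorBackend(ABC):
    """Hypothetical register-level interface the platform talks to."""

    @abstractmethod
    def write_reg(self, addr, value):
        ...

    @abstractmethod
    def read_reg(self, addr):
        ...


class CModelBackend(AcceleratorBackend):
    """Stands in for the C model: state lives in host memory."""

    def __init__(self):
        self.regs = {}

    def write_reg(self, addr, value):
        self.regs[addr] = value

    def read_reg(self, addr):
        return self.regs.get(addr, 0)


class FpgaWrapperBackend(AcceleratorBackend):
    """Stands in for the FPGA wrapper: every access is forwarded over a
    transport callable (a placeholder for the real link to the FPGA)."""

    def __init__(self, transport):
        self.transport = transport

    def write_reg(self, addr, value):
        self.transport("write", addr, value)

    def read_reg(self, addr):
        return self.transport("read", addr, None)


def driver_selftest(backend):
    """The same 'driver' code runs unchanged on either back end."""
    backend.write_reg(0x10, 0xCAFE)
    return backend.read_reg(0x10)
```

Because the driver only sees the common interface, a test that passes on the C model can be rerun against the FPGA back end without modification, which is the property the bullets above rely on.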



#### Generalisation

Making this 'generally' applicable requires more work ☺

- Enable any architecture to be modeled in a 'cloud' (public/private), off-loading onto FPGA when required/appropriate.
- Enable 'Virtualization' when host/guest match.



#### Future Possibilities

- NVDLA Performance Model integration for Performance evaluation
- More AWS FPGA image releases for different NVDLA configurations
- Enable RISC-V in Virtual Platform
- ARM Project Trillium
- SiFive





NVDLA page: http://nvdla.org/

OpenVP Doc: http://nvdla.org/contents.html

OpenVP GitHub pages: https://github.com/nvdla/vp and https://github.com/nvdla/vp_awsfpga

> www.greensocs.com mark@greensocs.com