DNN Inference on FPGAs for High-Throughput Applications
Introduction:
Deploying deep neural networks (DNNs) for applications that require very high throughput or extremely low latency is a severe computational challenge, further exacerbated by inefficiencies in mapping the computation to hardware. We present a novel method for designing neural network topologies that map directly to a highly efficient FPGA implementation.
DNNs underpin state-of-the-art computer vision and natural language processing systems. Deep convolutional neural networks (CNNs) have demonstrated high accuracy in a variety of applications, including image processing. These cutting-edge CNNs employ extremely deep models that require hundreds of ExaOps (on the order of 10^18 operations each) of compute and gigabytes of model and data storage during training. Many applications, such as data collection in particle physics experiments, line-rate packet filtering for network intrusion detection, and wireless communication networks, would require DNNs with inference rates of hundreds of millions of requests per second and sub-microsecond latency before existing components could be replaced by a machine-learned one.
In this approach the hardware architecture is customized to the specifics of the DNN topology and performs streaming inference, that is, the input is streamed through the architecture like a pipeline. This high degree of specialization improves efficiency because only the resources the network actually needs are instantiated. Latency is lowered because there is no buffering before and after layers. The approach also remains flexible: a change of topology means a different architecture, which an FPGA can accommodate through reconfiguration.
How can we create DNN implementations that can match the performance and latency requirements of high-throughput applications?
GPUs, FPGA overlays, and dedicated tensor processors are all popular hardware platforms for speeding up DNN inference. In most cases the hardware and the software are designed separately and are then linked by a compiler that maps the network onto the available hardware. Prior research has shown that specialized co-design approaches can generate FPGA DNN implementations with higher speed while still allowing reconfiguration to meet changing requirements.
Our approach is based on the observation that artificial neurons with quantized inputs and outputs can be converted to truth tables. However, an efficient FPGA implementation of a truth table is usually viable only when the number of inputs is small.
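The limit on the number of inputs follows from simple counting: a neuron with a given fan-in and input bit width has one truth-table row per input combination, so the table doubles in size with every extra input bit. The sketch below illustrates this growth. The function names and the example fan-in and bit-width values are illustrative assumptions, not figures from this project.

```python
def truth_table_entries(fan_in: int, bits_per_input: int) -> int:
    """Rows needed to enumerate every input combination of one neuron."""
    return 2 ** (fan_in * bits_per_input)


def truth_table_bits(fan_in: int, bits_per_input: int, bits_per_output: int) -> int:
    """Total storage, in bits, of the full truth table of one neuron."""
    return truth_table_entries(fan_in, bits_per_input) * bits_per_output


if __name__ == "__main__":
    # Assumed example: 2-bit inputs and outputs, growing fan-in.
    for fan_in in (2, 4, 8, 16):
        rows = truth_table_entries(fan_in, bits_per_input=2)
        bits = truth_table_bits(fan_in, bits_per_input=2, bits_per_output=2)
        print(f"fan-in {fan_in:2d}: {rows} rows, {bits} table bits")
```

The exponential growth is why sparsity (few inputs per neuron) and aggressive quantization (few bits per input) both matter for this style of implementation.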
The following tools and platforms are used to create high-throughput, ultra-low-latency DNN compute architectures.
LogicNets: designing, training and deploying sparse, quantized neural networks based on hardware building blocks.
FINN: a flow built on the idea of transformations that gradually transform the model into a synthesizable hardware description.
OBJECTIVES:
The main objectives of our project are as follows:
METHODOLOGY:
The question, then, is how to build DNN implementations that meet the performance and latency constraints of extreme-throughput applications.
Here we describe a method, based on LogicNets and FINN, for designing DNN topologies that map directly to an efficient FPGA implementation for extreme-throughput applications. The scheme is based on the observation that artificial neurons with quantized inputs and outputs can be converted to truth tables.
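The conversion of a quantized neuron to a truth table can be sketched as follows: enumerate every combination of input codes, evaluate the neuron, and record the quantized output. This is a minimal illustration, not the LogicNets implementation. The helper names, the unsigned input coding, and the clamp-style output quantizer are all assumptions made for the example.

```python
from itertools import product


def quantize(value: float, bits: int) -> int:
    """Round and clamp to an unsigned code in [0, 2**bits - 1] (assumed quantizer)."""
    max_code = 2 ** bits - 1
    return max(0, min(max_code, int(round(value))))


def neuron_truth_table(weights, bias, in_bits, out_bits):
    """Map every tuple of input codes to the neuron's quantized output code."""
    table = {}
    codes = range(2 ** in_bits)
    # One table row per combination of input codes across all inputs.
    for inputs in product(codes, repeat=len(weights)):
        acc = bias + sum(w * x for w, x in zip(weights, inputs))
        table[inputs] = quantize(acc, out_bits)
    return table


if __name__ == "__main__":
    # Hypothetical neuron with two 1-bit inputs and a 1-bit output.
    for inputs, output in neuron_truth_table([1, -1], 0, 1, 1).items():
        print(inputs, "->", output)
```

Once the table exists, the trained weights are no longer needed at inference time; the table alone defines the neuron's behaviour.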
LOGICNETS flow:

FINN flow:

DNNs designed and trained in this manner result in fast and efficient FPGA implementations that can fulfill the performance requirements of extreme-throughput applications.
There are many exciting directions for future work. On the machine learning side, we would like to improve our understanding of how to design highly sparse and quantized topologies that retain high accuracy. On the tool flow and EDA side, better synthesis techniques are needed to scale to neurons with larger fan-in. On the application side, the aim is to discover and demonstrate more extreme-throughput applications that can leverage LogicNets and FINN.
DNN models have become deeper and more accurate, at the cost of correspondingly higher complexity and computation, which calls for specialized accelerators for these models. Deployments fall into two kinds: cloud and edge. In cloud deployment, data is transported from the sensor to data centers, where both training and inference are executed. In edge computing, inference is performed close to where the data is produced, and the models can be pre-trained in a data center.
Our project deals with edge computing, for which we will design a specialized DNN topology for the task at hand. Once the topology is decided, we train it with standard training techniques, and when the desired accuracy is achieved, we convert the trained neural network directly into an FPGA circuit. If the right topology is chosen, the circuit will have low logic depth and will be able to achieve a high clock rate and the classification performance needed by extreme-throughput applications.
Our tool flow relies on an open-source machine learning framework that eases the path from research prototyping to production deployment, such as PyTorch, which will be used to train our specific DNN topology. When the network achieves the desired accuracy, we convert each neuron into a LUT (lookup table) by enumerating every possible input and observing the output to construct the truth table. Model conversion is done in Python, which generates Verilog code; we then synthesize the final design with Vivado and implement it on our Ultra96-V2 MPSoC FPGA board.
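The step from truth table to Verilog can be sketched as a small Python generator that writes one combinational module per table, with one `case` branch per row. This is only an illustration under stated assumptions: the function name, the module and port names (`x`, `y`), and the packing of all neuron inputs into a single input word are hypothetical and are not the output format of LogicNets or FINN.

```python
def truth_table_to_verilog(module_name, table, in_width, out_width):
    """Emit a combinational Verilog module implementing an integer truth table.

    `table` maps each packed input word (int) to an output code (int).
    """
    lines = [
        f"module {module_name} (",
        f"    input  wire [{in_width - 1}:0] x,",
        f"    output reg  [{out_width - 1}:0] y",
        ");",
        "    always @(*) begin",
        "        case (x)",
    ]
    # One case branch per truth-table row, in ascending input order.
    for word in sorted(table):
        lines.append(
            f"            {in_width}'d{word}: y = {out_width}'d{table[word]};"
        )
    lines += [
        # Default branch keeps the logic purely combinational (no latch).
        f"            default: y = {out_width}'d0;",
        "        endcase",
        "    end",
        "endmodule",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical 2-input, 1-bit table used only to show the output shape.
    example_table = {0b00: 0, 0b01: 1, 0b10: 1, 0b11: 0}
    print(truth_table_to_verilog("neuron_example", example_table, 2, 1))
```

The generated text can be written to a `.v` file and added to a synthesis project, where the synthesis tool is expected to map the case statement onto the device's lookup tables.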
| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| ULTRA96V2-MPSOC | Equipment | 1 | 60000 | 60000 |
| Adaptor | Equipment | 1 | 1500 | 1500 |
| Connector | Equipment | 1 | 6000 | 6000 |
| Printing | Miscellaneous | 1 | 4000 | 4000 |
| Total (in Rs) | | | | 71500 |