Image Upscaling using Deep Neural Networks


2025-06-28 16:33:02 - Adil Khan

Project Title

Image Upscaling using Deep Neural Networks

Project Area of Specialization

Artificial Intelligence

Project Summary

Premise

The list of recent technological advancements would be incomplete without mentioning neural networks, especially Convolutional Neural Networks (CNNs), which employ convolutional layers to extract features and surpass previous techniques. This breakthrough has improved various fields across different domains, such as Image Processing and Computer Vision, and tasks such as object detection and localization, semantic segmentation, and more.

This tendency of CNNs to outperform conventional methods has made them a first choice even for challenging tasks such as single image super-resolution (SISR). Generative Adversarial Networks (GANs) are a special class of generative models which have been found to outperform even plain CNNs at SISR. A GAN comprises two distinct networks, a generator and a discriminator, which are trained simultaneously and reinforce one another. Once the generator reaches a threshold accuracy, it is separated from the discriminator and deployed. This "symbiotic" training allows GANs to model images better and insert realistic details where appropriate, leading to better generated images.

In the race to develop GANs capable of upscaling an image by 4x, one model has achieved better results than the others on standardized metrics such as Peak Signal-to-Noise Ratio (PSNR) and Perceptual Index (PI). The Enhanced Super-Resolution GAN, or simply ESRGAN, won first place in the PIRM2018-SR Challenge by achieving the best PI, thanks to its Residual in Residual Dense Block (RRDB) architecture. RRDB uses nested residual connections to preserve the edges and fine details of an image as it traverses the network, allowing better conservation of detail while upscaling.
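The residual-in-residual idea can be illustrated with a toy numerical sketch (not the ESRGAN implementation; the function names and the matrix-multiply stand-in for convolution are ours). Each block carries its input through unchanged and only adds a scaled "detail" branch on top; ESRGAN scales residual branches by a factor of 0.2 before addition:

```python
import numpy as np

def residual_block(x, weight, beta=0.2):
    """Toy residual block: the input passes through untouched and a
    scaled branch output is added on top, so details in x survive."""
    branch = np.maximum(0.0, x @ weight)  # stand-in for conv + ReLU
    return x + beta * branch

def residual_in_residual(x, weights, beta=0.2):
    """Stack several residual blocks, then wrap the whole stack in an
    outer residual connection -- the 'residual in residual' pattern."""
    out = x
    for w in weights:
        out = residual_block(out, w, beta)
    return x + beta * (out - x)
```

Note that if every branch outputs zero, the input is reproduced exactly; the learned branches only ever add detail, which is why edges are conserved.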


Currently, these GANs take hardware for granted: they require a large amount of computational power, and with this complexity come latency issues. One way to alleviate these problems is to perform inference on cloud services or server farms. However, that brings issues of its own, such as sensitive data leakage and transmission latency.

The other option is to compress the model so that some of its parameters are removed. This removal has to be done carefully so as not to degrade the model's accuracy. Such a modification would allow the model to run on low-power platforms such as embedded systems, which solves the aforementioned issues while cutting costs.

Project Objectives

Enable Pakistani manufacturers to:

Project Implementation Method

Our approach to model compression is to apply two powerful techniques:

  1. Pruning is the application of a binary criterion to decide which weights to prune, i.e. eliminate: parameters that match the pruning criterion are assigned a value of zero. We can prune weights, biases, and activations. Biases are few and their contribution to a layer's output is relatively large, so there is little incentive to prune them; we therefore concentrate on pruning the weights of the network. The most common pruning criterion is the absolute value of each element: the element's absolute value is compared to a threshold, and if it falls below the threshold the element is set to zero (i.e. pruned). The intuition is that weights with small l1-norms (absolute values) contribute little to the final result (low saliency), so they are less important and can be removed.

    In "To prune, or not to prune: exploring the efficacy of pruning for model compression", authors Michael Zhu and Suyog Gupta provide an algorithm to schedule a Level Pruner which is defined as:

    s_t = s_f + (s_i − s_f) · (1 − (t − t_0)/(nΔt))³   for t ∈ {t_0, t_0 + Δt, …, t_0 + nΔt}   (1)

    "We introduce a new automated gradual pruning algorithm in which the sparsity is increased from an initial sparsity value si (usually 0) to a final sparsity value sf over a span of n pruning steps. The intuition behind this sparsity function in equation (1) is to prune the network rapidly in the initial phase when the redundant connections are abundant and gradually reduce the number of weights being pruned each time as there are fewer and fewer weights remaining in the network."

  2. Quantization refers to the process of reducing the number of bits that represent a number. In the context of deep learning, the predominant numerical format used for research and for deployment has so far been 32-bit floating point, or FP32. However, the desire for reduced bandwidth and compute requirements of deep learning models has driven research into lower-precision numerical formats. It has been extensively demonstrated that weights and activations can be represented using 8-bit integers (INT8) without incurring a significant loss in accuracy. The use of even lower bit-widths, such as 4/2/1 bits, is an active field of research that has also shown great progress. The most obvious benefit of quantization is significantly reduced bandwidth and storage: for instance, using INT8 for weights and activations consumes 4x less overall bandwidth compared to FP32. Additionally, integer compute is faster than floating-point compute and is much more area- and energy-efficient.
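The two pieces of the pruning step above, magnitude (level) pruning and the Zhu & Gupta sparsity schedule of equation (1), can be sketched in a few lines of numpy. This is an illustrative sketch under our own function names, not the authors' implementation:

```python
import numpy as np

def level_prune(weights, sparsity):
    """Magnitude pruning: zero out the fraction `sparsity` of weights
    with the smallest absolute values (lowest saliency)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(flat)[k - 1]
    # Keep only elements strictly above the threshold.
    return weights * (np.abs(weights) > threshold)

def agp_sparsity(t, s_i, s_f, t0, n, dt):
    """Automated gradual pruning schedule (equation (1)):
    s_t = s_f + (s_i - s_f) * (1 - (t - t0)/(n*dt))**3.
    Sparsity rises quickly at first, then levels off at s_f."""
    t = min(max(t, t0), t0 + n * dt)  # clamp to the pruning span
    return s_f + (s_i - s_f) * (1.0 - (t - t0) / (n * dt)) ** 3
```

At each pruning step t the schedule yields a target sparsity, and `level_prune` is applied to reach it; the cubic term is what makes pruning aggressive early and gentle near the end.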

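A minimal sketch of the quantization idea, assuming symmetric linear INT8 quantization with a single per-tensor FP32 scale (one common scheme among several; function names are ours):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric linear quantization of FP32 values to INT8:
    map the range [-max|x|, +max|x|] onto integers in [-127, 127]."""
    scale = np.max(np.abs(x)) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor; any scale works
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an FP32 approximation; per-element error <= scale/2."""
    return q.astype(np.float32) * scale
```

The INT8 tensor occupies a quarter of the FP32 tensor's memory, which is exactly the 4x bandwidth saving described above; the price is a bounded rounding error controlled by the scale.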

Quantization Aware Training

Benefits of the Project

PC vs Embedded Solutions

Traditionally, production lines use cameras connected to a computer to take images of products, which are then analyzed to detect faults. This has inherent shortcomings, such as transmission latency that bottlenecks critical operations. In embedded vision, however, images are processed on the camera side without relaying any data to offsite computers, which removes the need for communication and its associated latencies. This trade-off of the complexity and flexibility of a PC-based system for the simplicity and compactness of the embedded-vision approach is not without flaws, though: embedded systems lack computational power, which hampers their ability to perform demanding operations.


The choice of camera system also plays a role, mostly in terms of price. High-resolution cameras cost significantly more than their low-resolution counterparts, and when the system has to be deployed throughout an entire manufacturing line, the cost adds up quickly.

Outcome

Our proposed method would flourish in this environment: the model upscales images of all types (regardless of their content) into much more distinguishable detail, allowing us to overcome the limitations of cheaper low-resolution cameras, and it is modified so that embedded devices can run it despite their limited computational resources.

This would allow Pakistani manufacturers to reap the benefits of advances in AI without investing heavily in expensive equipment, while simultaneously enhancing the quality standards of the products being manufactured.


Technical Details of Final Deliverable

The Demo

In order to demonstrate our system, we plan on interfacing a Raspberry Pi camera with a single-board computer, which in our case would be a Jetson Nano.

The camera would capture images and transmit them to the Jetson for processing, where our new and improved image upscaling model would run; a human operator could then analyze the output, or it could be forwarded to another software module for automated fault detection.

Final Deliverable of the Project: Software System
Core Industry: Others
Other Industries: Manufacturing, Telecommunication
Core Technology: Artificial Intelligence (AI)
Other Technologies:
Sustainable Development Goals: Industry, Innovation and Infrastructure

Required Resources
Item Name                      Type           No. of Units   Per Unit Cost (Rs)   Total (Rs)
Nvidia Jetson Nano Dev Kit     Equipment      1              25000                25000
Raspberry Pi Camera Module V2  Equipment      1              10000                10000
Google Coral Dev Board         Equipment      1              30000                30000
Poster and Standee             Miscellaneous  2              3000                 6000
Total (Rs): 71000
