Adil Khan 10 months ago
AdiKhanOfficial #FYP Ideas

Deep Fashion


Project Title

Deep Fashion

Project Area of Specialization

Artificial Intelligence

Project Summary

Extracting and interpreting information about humans from visual data is an active area of research in computer vision. Traditional models map a person to a 2D skeleton (a stick-like figure), which gives an incomplete representation. We implement an optimized model that maps the pixels of a human in a 2D image to a generalized 3D human model. The model builds on the recent state-of-the-art method DensePose-RCNN [1] and replaces its backbone architecture with a combination of fire modules and depthwise convolutions to reduce the parameter count and inference latency. We apply this optimized model to texture extraction and transfer from pictures of people. Specifically, using the 3D representation (IUV maps) of a person, we extract texture information from a set of images of that person and use it for texture transfer to other subjects or for texture mapping. To demonstrate this capability, we have built a unified Android application for texture extraction, transfer, and retrieval.
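
The IUV-based texture extraction and transfer described in the summary can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes an IUV map stored as an (H, W, 3) array whose first channel is a body-part index (0 for background, 1 to 24 for parts) and whose other two channels are U and V coordinates in the range 0 to 255. The function names, the atlas layout, and the atlas resolution are all hypothetical.

```python
import numpy as np

NUM_PARTS = 24  # assumed number of body-part charts in the IUV map


def _texel_coords(iuv, size):
    """Return pixel positions and atlas coordinates for all person pixels."""
    part = iuv[..., 0].astype(np.int64)
    ys, xs = np.nonzero((part > 0) & (part <= NUM_PARTS))
    p = part[ys, xs] - 1  # zero-based part index
    # Scale U, V from the assumed [0, 255] range to atlas texel indices.
    u = (iuv[ys, xs, 1].astype(np.float64) / 255.0 * (size - 1)).round().astype(np.int64)
    v = (iuv[ys, xs, 2].astype(np.float64) / 255.0 * (size - 1)).round().astype(np.int64)
    return ys, xs, p, u, v


def extract_texture(image, iuv, size=32):
    """Scatter person pixels into a per-part texture atlas.

    Returns the atlas (NUM_PARTS, size, size, 3) and a boolean mask of the
    texels that received a colour. If several pixels map to one texel,
    one of them wins.
    """
    atlas = np.zeros((NUM_PARTS, size, size, 3), dtype=image.dtype)
    filled = np.zeros((NUM_PARTS, size, size), dtype=bool)
    ys, xs, p, u, v = _texel_coords(iuv, size)
    atlas[p, v, u] = image[ys, xs]
    filled[p, v, u] = True
    return atlas, filled


def transfer_texture(atlas, filled, target_image, target_iuv):
    """Repaint the target person with colours looked up in the atlas."""
    out = target_image.copy()
    size = atlas.shape[1]
    ys, xs, p, u, v = _texel_coords(target_iuv, size)
    known = filled[p, v, u]  # only overwrite pixels whose texel was observed
    out[ys[known], xs[known]] = atlas[p[known], v[known], u[known]]
    return out
```

Accumulating the atlas over several images of the same person (for example, by calling `extract_texture` per image and merging the results with the `filled` masks) would fill in texels that are hidden in any single view.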

Project Objectives

  • Efficiently retrieve and transfer textures in human appearance, replacing the expensive and laborious techniques currently in use

  • Reduce inference time so that the architecture can be used in real-world applications

  • Reduce model size so that the model is feasible to use on low-resource devices

  • Map a character or texture onto an image, reducing the cost of constructing animations for industries such as gaming and animation

Project Implementation Method

DensePose-RCNN, the network architecture devised by Facebook, provides high accuracy and is robust to scale variance, translation, overlapping instances, background clutter, and occlusions. However, it needs very high computation power even at test time, as shown by the fact that it currently runs on a GTX 1080 GPU for inference. This deters real-world use of the algorithm on devices such as cell phones, which have far less processing power and memory in comparison.

Taking these limitations into consideration, we propose changes that reduce the overall number of parameters and operations needed by the model. We replace standard convolutions with depthwise separable convolutions, which reduce the number of operations by roughly 8-9 times for 3x3 kernels. Furthermore, we pool early in the network, as suggested by the SqueezeNet paper, to reduce the spatial dimensions used later in the network. The proposed network (Figure 5) can be divided into the following 3 sub-networks:

  • Backbone or feature extractor

  • Region Proposal Network

  • Detector Network
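
The 8-9 times reduction from depthwise separable convolutions mentioned above can be checked with a small multiply-accumulate count. The sketch below assumes stride 1 and "same" padding, and the layer sizes in the usage lines are illustrative values, not figures taken from this project.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1 x 1 conv mixes the channels
    return depthwise + pointwise


def reduction_factor(h, w, c_in, c_out, k):
    """How many times fewer operations the separable version needs."""
    return standard_conv_macs(h, w, c_in, c_out, k) / depthwise_separable_macs(
        h, w, c_in, c_out, k
    )


# Illustrative layer: 56 x 56 feature map, 128 -> 256 channels, 3 x 3 kernel.
ratio = reduction_factor(56, 56, 128, 256, 3)
print(f"reduction: {ratio:.2f}x")
```

The ratio simplifies to 1 / (1/c_out + 1/k²), so for 3x3 kernels it approaches 9 as the number of output channels grows, which is where the 8-9 times figure comes from.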

Figure 4 - Inverted residual blocks


 

Benefits of the Project

We would ideally like human pose estimators to generate a 3D model of the entire human body instead of just a few joints. Moreover, we want the algorithm to do this in real time at a decent frame rate without requiring a high-end GPU, including on devices other than PCs, such as cameras or mobile phones.

Many previous works have pursued this objective, and a few partially succeeded, e.g. DenseReg, which performs 3D pose estimation for faces. However, the first to achieve a high level of accuracy in mapping 2D images to 3D models was Facebook's AI team, with DensePose, a full-fledged 3D human pose estimator. The problem is that it requires a high-end GPU at test time, and even a GTX 1080 achieves only 4-5 FPS on 800x1100 images. This means that people with low-end GPUs, or devices without GPUs, cannot use it.

We plan to experiment with different network architectures, loss functions, and other 3D pose estimator components that might help us optimize these pose estimators to generate a 3D model of the entire body with faster inference and without the need for heavy GPUs.

Technical Details of Final Deliverable

Results:

DensePose, released in February 2018 by Facebook AI Research in collaboration with INRIA, is the state-of-the-art algorithm for 3D pose estimation. Its authors used DensePose-RCNN to estimate 3D human poses; the hardware resources used to train it and our models are shown in Table 6.

Architecture        Framework   Estimated Training Time  No. of GPUs (NVIDIA K80)  Optimizer  Batch Size  GPU Sharing
------------------  ----------  -----------------------  ------------------------  ---------  ----------  ------------
DensePose-RCNN      Caffe2      2 Days                   8                         ADAM       8           Asynchronous
DenseMobile R-CNN   Caffe2      60 Days                  1                         SGD        1           -
DenseMobile R-CNN   Tensorflow  6 Days                   2                         SGD        2           Asynchronous
DenseSqueeze R-CNN  Caffe2      8 Days                   2                         SGD        2           Asynchronous

Table 6 - Comparison of training resources with DensePose-RCNN

We compare the current results of our models with DensePose-RCNN by evaluating average precision (AP) on images from the COCO minival subset; Table 7 summarizes the results.

Method                            AP50   AP75
--------------------------------  -----  -----
DensePose-RCNN (ResNet-101)       83.5   54.2
DensePose-RCNN (ResNet-50)        83.7   56.3
DenseMobile-RCNN (MobileNet v2)   55.2   31.1
DenseSqueeze-RCNN (SqueezeNet)    70.31  26.18

Table 7 - Per-instance evaluation of different architectures on COCO


 

We have successfully built a network with far fewer parameters and lower memory requirements than DensePose-RCNN. This was achieved by reducing the parameters of the backbone.

Conclusion:

Reducing model size leads to a drop in accuracy, which is not always significant, but it can bring a significant improvement in speed. We have validated this empirically and shown a direct practical application of model optimization to texture transfer. Coupled with this architecture, a generative model tailored to the needs of the intended application could further improve results.


Final Deliverable of the Project

Hardware System

Type of Industry

IT

Technologies

Artificial Intelligence (AI)

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources


If you need this project, please contact me on contact@adikhanofficial.com