Deep Fashion
Extracting and interpreting information about humans from visual data is an active area of research in Computer Vision. Traditional models map information about humans to 2D representations (stick-like figures), leading to incomplete representation. We implement an optimized model for mapping the pixels of a human in a 2D image to a generalized 3D human model. The model builds upon the recent state-of-the-art method DensePose-RCNN [1] and substitutes its backbone architecture with a combination of fire modules and depthwise convolutions to reduce the parameter count and inference latency. We have applied this optimized model to texture transfer and extraction from pictures of people. Specifically, using the 3D representation (IUV maps) of a person, we extract texture information from a set of images of that person and use it for texture transfer to other subjects or for texture mapping. To demonstrate this capability, we have built a unified Android application for texture extraction, transfer, and retrieval.
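The IUV-based texture extraction mentioned above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the project's actual implementation: it assumes a DensePose-style IUV output given as an (H, W, 3) array whose first channel is the body-part index (0 for background) and whose remaining channels are the U, V surface coordinates in [0, 1]; the function name `extract_texture` and the atlas layout are our own hypothetical choices.

```python
import numpy as np

def extract_texture(image, iuv, num_parts=24, tex_size=64):
    """Accumulate per-part texture maps from a single image.

    image: (H, W, 3) uint8 RGB image.
    iuv:   (H, W, 3) array; channel 0 is the body-part index
           (0 = background, 1..num_parts), channels 1-2 are the
           U, V surface coordinates in [0, 1].
    Returns a (num_parts, tex_size, tex_size, 3) texture atlas.
    """
    atlas = np.zeros((num_parts, tex_size, tex_size, 3), dtype=np.float64)
    counts = np.zeros((num_parts, tex_size, tex_size, 1), dtype=np.float64)

    parts = iuv[..., 0].astype(int)
    u = np.clip((iuv[..., 1] * (tex_size - 1)).astype(int), 0, tex_size - 1)
    v = np.clip((iuv[..., 2] * (tex_size - 1)).astype(int), 0, tex_size - 1)

    mask = parts > 0  # ignore background pixels
    np.add.at(atlas, (parts[mask] - 1, v[mask], u[mask]), image[mask])
    np.add.at(counts, (parts[mask] - 1, v[mask], u[mask]), 1.0)

    return atlas / np.maximum(counts, 1.0)  # average overlapping pixels
```

Atlases accumulated this way from several images of the same person can be averaged into one texture and then re-rendered onto another subject's IUV map to perform the transfer.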
Efficiently retrieve and transfer textures of human appearance, replacing the expensive and laborious techniques in use today
Reduce inference time so that the architecture can be used in real-world applications
Reduce model size so that the model can feasibly run on low-resource devices
Map a character/texture onto an image so that the cost of constructing animations can be reduced for industries such as gaming and animation
The aforementioned network architecture devised by Facebook provides high accuracy, and the model is robust to scale variance, translation, overlapping instances, background clutter, and occlusions. However, it requires substantial computation even at test time; the reference implementation currently runs on a GTX 1080. This deters real-world use of the algorithm on devices like cell phones, which have comparatively little processing power and memory.
Taking these limitations into consideration, we propose changes to reduce the overall number of parameters and operations needed by the model. We replace standard convolutions with depthwise separable convolutions, which reduce the number of operations by a factor of 8-9. Furthermore, we pool early in the network, as suggested in the SqueezeNet paper, to reduce the spatial dimensions used later in the network. The proposed network (Figure 5) can be divided into the following three sub-networks:
Backbone or feature extractor
Region Proposal Network
Detector Network
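The 8-9x reduction quoted above can be checked with a quick parameter count: a standard k x k convolution needs k*k*c_in*c_out weights, while a depthwise separable convolution needs only k*k*c_in (depthwise) plus c_in*c_out (1x1 pointwise). The channel sizes below are illustrative, not taken from the actual backbone.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Example layer: 256 -> 256 channels with 3x3 kernels (hypothetical sizes).
standard = conv_params(256, 256, 3)                   # 589,824 weights
separable = depthwise_separable_params(256, 256, 3)   # 67,840 weights
print(standard / separable)  # ~8.7x fewer weights, consistent with the 8-9x claim
```

The same ratio applies to multiply-accumulate operations, since each weight is applied once per output spatial location.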

Figure 4 - Inverted residual blocks
We would ideally like human pose estimators to generate a 3D model of the entire human body instead of just a few joints. Moreover, we want such an algorithm to do all of this in real time, at a decent FPS, without requiring a high-end GPU, and even on devices other than a PC such as a camera or mobile phone.
Many previous works have tried to achieve this objective. A few partially succeeded, e.g. DenseReg, which performs 3D pose estimation for faces. However, the first to achieve high accuracy while mapping 2D images to 3D models was Facebook's AI team with DensePose, a full-fledged 3D human pose estimator. The problem, however, is that it requires a high-end GPU at test time; even a GTX 1080 gives only 4-5 FPS on 800x1100 images. This means that people with low-end GPUs or devices without GPUs cannot use it.
We plan to experiment with different network architectures, loss functions and other 3D Pose Estimator components that might help us optimize these Pose Estimators to generate a 3D model of the entire body with a faster test time without the need of heavy GPUs.
Results:
DensePose is the state-of-the-art algorithm for 3D pose estimation, released in February 2018 by Facebook AI Research in collaboration with INRIA. They used DensePose-RCNN to estimate 3D human poses, trained on the hardware resources shown in Table 6.
| Architecture | Framework | Estimated Training Time | No. of GPUs (Nvidia K-80) | Optimizer | Batch Size | GPU Sharing |
| --- | --- | --- | --- | --- | --- | --- |
| DensePose-RCNN | Caffe2 | 2 Days | 8 | ADAM | 8 | Asynchronous |
| DenseMobile R-CNN | Caffe2 | 60 Days | 1 | SGD | 1 | - |
| DenseMobile R-CNN | Tensorflow | 6 Days | 2 | SGD | 2 | Asynchronous |
| DenseSqueeze R-CNN | Caffe2 | 8 Days | 2 | SGD | 2 | Asynchronous |
Table 6 - Comparison of training resources with DensePose-RCNN
We evaluate and compare the current results of our model with DensePose-RCNN and summarize them in Table 7, reporting the average precision (AP) on images from the COCO minival subset.
| Method | AP50 | AP75 |
| --- | --- | --- |
| DensePose-RCNN (ResNet-101) | 83.5 | 54.2 |
| DensePose-RCNN (ResNet-50) | 83.7 | 56.3 |
| DenseMobile-RCNN (MobileNet v2) | 55.2 | 31.1 |
| DenseSqueeze-RCNN (SqueezeNet) | 70.31 | 26.18 |
Table 7 - Per-instance evaluation of different architectures on COCO
We have successfully modeled a network with far fewer parameters and lower memory requirements than DensePose-RCNN. This has been achieved by reducing the parameters of the backbone.
Conclusion:
Reducing model size leads to a drop in accuracy, which is not always significant, but it can lead to a significant improvement in speed. We have validated this empirically and shown a direct practical application of model optimization in a texture-transfer application. Coupled with this architecture, a generative model tailored to the intended application's needs can be used to further improve results.