Visual Relationship Detection in Images using Deep Learning

2025-06-28 16:36:39 - Adil Khan

Project Title

Visual Relationship Detection in Images using Deep Learning

Project Area of Specialization

Computer Science

Project Summary

The objective of Visual Relationship Detection (VRD) is to identify relationships between objects in images: a model detects the objects in an image and constructs sensible sentences about them, as humans do. Objects in the real world interact in numerous ways (e.g. "person driving car" and "person pushing car"). Many relationships can be observed in an image, but training a model on every possible relationship is not feasible. VRD therefore trains visual models that detect multiple objects within an image and then predict the relationships among them. Our work improves on prior work in terms of testing time by building on advances in the R-CNN family. Prior work uses R-CNN to localize objects, while our model uses the more advanced R-FCN (testing time 0.17 sec/image). A separate network is trained to predict the relationship between the objects. As a result, the testing time of this model is lower than that of previous models. This idea can be used in robotics to give robots human-like vision and enable them to sense the environment by identifying relationships among various objects. Relationship detection can also improve content-based image retrieval.

Project Objectives

The aim of our visual relationship detection project is to train visual models that detect multiple objects within an image and then predict the relationships among them. Our model will be able to predict the types of relationships on which it has been trained. Moreover, the objects in each predicted relationship will be localized as bounding boxes in the image.
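To make the intended output concrete, the following is a minimal sketch of how one predicted relationship with its localized objects could be represented. The class names, the (x_min, y_min, x_max, y_max) box layout, and the sample coordinates and scores are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class DetectedObject:
    """One localized object (hypothetical container for a detector output)."""
    label: str
    box: Box
    score: float


@dataclass(frozen=True)
class Relationship:
    """A subject-predicate-object triplet with both objects localized."""
    subject: DetectedObject
    predicate: str
    obj: DetectedObject
    score: float

    def as_phrase(self) -> str:
        # Render the triplet as a short sentence-like phrase.
        return f"{self.subject.label} {self.predicate} {self.obj.label}"


# Made-up example values, for illustration only.
person = DetectedObject("person", (40.0, 30.0, 120.0, 200.0), 0.95)
car = DetectedObject("car", (20.0, 80.0, 300.0, 260.0), 0.90)
rel = Relationship(person, "driving", car, 0.80)
print(rel.as_phrase())
```

Keeping the two bounding boxes inside the relationship record means a desktop application could draw both objects and caption them with the phrase without a second lookup.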

Project Implementation Method

The field of computer vision aims to extract useful information from a scene in the way the human visual system would. Although there are numerous classical ways to detect objects in images (using edges, corners, etc.) and construct meaningful information from them, training a model calls for a different approach. With this in mind, we plan an algorithm that detects the objects within an image using the R-FCN algorithm.

A separate convolutional neural network is trained to classify and detect the different relationships.
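The hand-off between the two stages can be sketched as follows. The proposal does not say how detected objects are paired or what the relationship network receives as input; this sketch assumes every ordered (subject, object) pair is a candidate and that the network looks at the union of the two boxes, which is one common choice in VRD pipelines. The function names and sample detections are hypothetical, and the detector and relationship network themselves are not implemented here.

```python
from itertools import permutations
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Detection = Tuple[str, Box]  # (class label, bounding box)


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that encloses both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def candidate_pairs(detections: List[Detection]) -> List[Tuple[str, str, Box]]:
    """Enumerate ordered (subject, object) pairs with their union region.

    Order matters because "person driving car" differs from
    "car driving person", so permutations are used, not combinations.
    """
    return [
        (subj[0], obj[0], union_box(subj[1], obj[1]))
        for subj, obj in permutations(detections, 2)
    ]


# Made-up detector output, for illustration only.
detections = [
    ("person", (40.0, 30.0, 120.0, 200.0)),
    ("car", (20.0, 80.0, 300.0, 260.0)),
]
pairs = candidate_pairs(detections)
for subject_label, object_label, region in pairs:
    print(subject_label, object_label, region)
```

With n detected objects this produces n(n-1) candidates, so in practice the list would likely need pruning (for example by detection score) before each region is passed to the relationship network.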

Benefits of the Project

Technical Details of Final Deliverable

Our final deliverable will be a desktop application. We will use an Nvidia RTX 2070 GPU and an IR camera.

Final Deliverable of the Project

Software System

Core Industry

Security

Other Industries

IT

Core Technology

Artificial Intelligence (AI)

Other Technologies

Others

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name             Type            No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Nvidia GPU RTX 2070   Equipment       1              70000                   70000
IR Camera             Miscellaneous   1              8000                    8000
Total (in Rs)                                                                78000
