Visual Relationship Detection in Images using Deep Learning

2025-06-28 16:36:39 - Adil Khan

Project Title: Visual Relationship Detection in Images using Deep Learning

Project Area of Specialization: Computer Science

Project Summary

The objective of Visual Relationship Detection (VRD) is to identify relationships between sets of objects in images. The goal is a model that detects objects in an image and constructs sensible sentences about them, as humans do. Objects in the real world interact in numerous ways (e.g. "person driving car" and "person pushing car"). Many relationships can be observed in an image, but training a model on every possible relationship is not feasible. Visual relationship detection therefore trains visual models that detect multiple objects within an image and then predict the relationships among them. Our work improves on prior work in testing time by building on advances in the R-CNN family: prior work uses R-CNN to localize objects, while our model uses the more advanced R-FCN (testing time 0.17 sec/image). A separate network is trained to predict the relationships between the objects. As a result, our model runs faster at test time than previous models. This idea can be used in robotics to give robots human-like vision and enable them to sense their environment by identifying relationships among various objects. Relationship detection can also improve content-based image retrieval.

Project Objectives

The aim of our visual relationship detection project is to train visual models that detect multiple objects within an image and then predict the relationships among them. Our model will be able to predict the types of relationships on which it has been trained. Moreover, the objects in each predicted relationship will be localized with bounding boxes in the image.
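
As an illustration of what one predicted relationship could look like, the sketch below stores a subject, a predicate, and an object together with their bounding boxes. The class name `Relationship`, its field names, the `(x_min, y_min, x_max, y_max)` box layout, and all numeric values are assumptions made for this example; the proposal does not specify an output format.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Relationship:
    """One predicted relationship with both objects localized (hypothetical format)."""
    subject: str
    subject_box: Box
    predicate: str
    obj: str
    obj_box: Box
    score: float  # confidence of the prediction, in [0, 1]

    def phrase(self) -> str:
        # Build a short human-readable sentence from the triple.
        return f"{self.subject} {self.predicate} {self.obj}"


# Illustrative values only, not results from the project.
example = Relationship(
    subject="person", subject_box=(40.0, 30.0, 120.0, 200.0),
    predicate="driving",
    obj="car", obj_box=(10.0, 80.0, 300.0, 220.0),
    score=0.87,
)
print(example.phrase())  # person driving car
```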

Project Implementation Method

The field of computer vision aims to extract useful information from a scene much as the human visual system would. Although there are numerous ways to detect objects in images from hand-crafted features (edges, corners, etc.) and build meaningful information from them, training a model calls for a different approach. We therefore intend to design an algorithm that detects the objects within an image using R-FCN.

A separate convolutional neural network is trained to detect and classify the relationships between the detected objects.
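
The two-stage design can be sketched roughly as follows: the detector's outputs are paired up, and each ordered pair is handed to the relationship network. This is a minimal sketch under stated assumptions, not the project's implementation. The names `Detection`, `union_box`, `candidate_pairs`, and `predict_relationships` are hypothetical, the relationship network is represented by a plain callable, and passing the union of the two boxes to it is an assumption about how the pair region might be formed.

```python
from itertools import permutations
from typing import Callable, Iterator, List, NamedTuple, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


class Detection(NamedTuple):
    """One object returned by the detector stage (R-FCN in the proposal)."""
    label: str
    box: Box


def union_box(a: Box, b: Box) -> Box:
    # Smallest box that encloses both input boxes.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def candidate_pairs(
    detections: List[Detection],
) -> Iterator[Tuple[Detection, Detection, Box]]:
    # Ordered pairs: "person driving car" differs from "car driving person".
    for subj, obj in permutations(detections, 2):
        yield subj, obj, union_box(subj.box, obj.box)


# Stand-in for the separately trained relationship network (hypothetical signature).
Classifier = Callable[[Detection, Detection, Box], Optional[str]]


def predict_relationships(
    detections: List[Detection], classify: Classifier
) -> List[Tuple[str, str, str]]:
    results = []
    for subj, obj, region in candidate_pairs(detections):
        predicate = classify(subj, obj, region)
        if predicate is not None:  # None means "no relationship predicted"
            results.append((subj.label, predicate, obj.label))
    return results


def toy_classifier(subj: Detection, obj: Detection, region: Box) -> Optional[str]:
    # Placeholder rule used only to make the sketch runnable.
    if subj.label == "person" and obj.label == "car":
        return "driving"
    return None


detections = [
    Detection("person", (40.0, 30.0, 120.0, 200.0)),
    Detection("car", (10.0, 80.0, 300.0, 220.0)),
]
print(predict_relationships(detections, toy_classifier))
```

With n detected objects this loop evaluates n(n-1) ordered pairs, so the number of calls to the relationship network grows quadratically with the number of detections.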

Benefits of the Project

Robotics: This idea can be used to give robots human-like vision and enable them to sense their environment by identifying relationships among various objects.
Security: Security agencies can use it for fast content-based image retrieval from large databases, such as video surveillance archives.
Medical field: Our concept can be a foundation for automatic disease detection.

Technical Details of Final Deliverable

Our final deliverable will be a desktop application. We will use an Nvidia RTX 2070 GPU and an IR camera.

Final Deliverable of the Project: Software System
Core Industry: Security
Other Industries: IT
Core Technology: Artificial Intelligence (AI)
Other Technologies: Others
Sustainable Development Goals: Industry, Innovation and Infrastructure

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
Nvidia GPU RTX 2070	Equipment	1	70000	70000
IR Camera	Miscellaneous	1	8000	8000
			Total (in Rs)	78000
