Visual Relationship Detection in Images using Deep Learning
Objective of Visual Relationship Detection (VRD) is to identify relationships between sets of objects in images. A model that detects objects in an image and construct sensible sentences like humans do. Objects over the globe interacts in numerous ways (e.g. "person driving car" and
2025-06-28 16:36:39 - Adil Khan
Visual Relationship Detection in Images using Deep Learning
Project Area of Specialization Computer ScienceProject SummaryObjective of Visual Relationship Detection (VRD) is to identify relationships between sets of objects in images. A model that detects objects in an image and construct sensible sentences like humans do. Objects over the globe interacts in numerous ways (e.g. "person driving car" and "person pushing car"). Various relationships can be observed in an image but to train a model on every possible relation is not possible. Visual relationship detection is to train visual models that detects multiple objects within an image and later predict relationship among them. Our work is improved from prior work in terms of testing time due to advancement of R-CNN family. The prior work uses the R-CNN to localize the object while our model uses the advance R-FCN (testing time 0.17 sec/image). A separate network is trained to predict the relationship between the objects. Thus, time complexity of this model is less than the previous models. This idea can be used in robotics to provide human like vision to robots and enale them to sense the environment by identifing relationships among various objects. Relationship Detection can further improve content-based image retrieval
Project ObjectivesAim of our project visual relationship detection is to train visual models that detects multiple objects within an image and later predict relationship among them. Our model will be able to predict the types of relationships for which it has been trained. Moreover, in the predicted relationships objects will be localized as bounding boxes in the image.
Project Implementation MethodAs computer vision field aims at extracting useful information like human visual system would do according to scenario. Although there are numerous ways to relate images to detect objects (using edges, corners etc.) and construct meaningful information but to train a model our approach should be something else. Considering this we intend to plan an algorithm that detects the objects within an image using R-FCN Algorithm.
A separate convolution neural network is trained to classify and detect different relationships.
Benefits of the Project- This idea can be used in robotics to provide human like vision to robots and enale them to sense the environment by identifing relationships among various objects.
- Security Agencies can use it for fast image retrieval from large database like video surveillance. (content retrival)
- Medical Field: Our concept can be a foundation for automatic disease detection.
Our final deliverable will b a desktop application. We will use Nvidia GPU RXT 2070 and IR camera.
Final Deliverable of the Project Software SystemCore Industry SecurityOther Industries IT Core Technology Artificial Intelligence(AI)Other Technologies OthersSustainable Development Goals Industry, Innovation and InfrastructureRequired Resources| Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs) |
|---|---|---|---|---|
| Total in (Rs) | 78000 | |||
| Nvidia GPU RXT 2070 | Equipment | 1 | 70000 | 70000 |
| IR Camera | Miscellaneous | 1 | 8000 | 8000 |