Human Activity Recognition using deep learning

2025-06-28 16:32:57 - Adil Khan

Project Title

Project Area of Specialization: NeuroTech

Project Summary

Summary:

Human activity recognition (HAR) is a challenging classification task.

It involves predicting the movement of a person from video data. Traditionally, this has required deep domain expertise and computer vision methods to hand-engineer features from the video data before fitting a machine learning model.

Recently, deep learning methods such as convolutional neural networks and recurrent neural networks have proven capable, and have even achieved state-of-the-art results, by automatically learning features from the raw video data.

In this project, I am implementing a 3D convolutional neural network that predicts the movement of a person from video data. Because this is a very broad problem, I start with basic activities: walking, jogging, running, hand clapping, hand waving, and boxing. Once the model gives good results on these, I will extend it to complex actions such as human-human interaction and human-object interaction.

Project Objectives

Aims:

1. To convert videos into the spatial-temporal dimension.

2. To find an accurate CNN model for specific activities.

3. To compare different CNN architectures to achieve a high accuracy rate.

Project Implementation Method

About Dataset:

In this project we use the KTH dataset. It contains six types of human action (walking, jogging, running, boxing, hand waving, and hand clapping) performed several times by 25 subjects in four different scenarios: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3), and indoors (s4). The database currently contains 2391 sequences. All sequences were taken over homogeneous backgrounds with a static camera at a frame rate of 25 fps. The sequences were downsampled to a spatial resolution of 160x120 pixels and have an average length of four seconds.
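To make the conversion from video to a spatial-temporal input concrete, here is a minimal sketch of cutting one sequence into fixed-length clips of consecutive frames. The clip length of 16 frames, the stride of 8 frames, the single grayscale channel, and the helper names `clip_starts` and `make_clips` are assumptions for illustration only; the proposal does not specify them.

```python
def clip_starts(num_frames, clip_len, stride):
    """Return the start index of every full clip of clip_len frames."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    # An empty list is returned when the sequence is shorter than one clip.
    return list(range(0, num_frames - clip_len + 1, stride))


def make_clips(frames, clip_len=16, stride=8):
    """Group a list of frames into overlapping clips of consecutive frames."""
    starts = clip_starts(len(frames), clip_len, stride)
    return [frames[s:s + clip_len] for s in starts]


# An average sequence lasts about four seconds at 25 fps, i.e. about 100 frames.
FPS = 25
num_frames = 4 * FPS

# Stand-in frames: each "frame" is just its index here. In practice each
# frame would be a 120 x 160 image (height x width).
frames = list(range(num_frames))

# Assumed clip length and stride (not taken from the proposal).
clips = make_clips(frames, clip_len=16, stride=8)

# Each clip would then form one input volume of shape
# (frames, height, width, channels), assuming one grayscale channel.
clip_shape = (16, 120, 160, 1)

print(len(clips), "clips; last clip covers frames", clips[-1][0], "to", clips[-1][-1])
```

With these assumed settings, a 100-frame sequence yields 11 overlapping clips, and the last four frames are dropped because they do not fill a whole clip.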

Methodology:

To apply domain expertise and computer vision methods to correctly extract spatial and temporal features from the video data for action recognition.
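As a rough illustration of how spatial and temporal features can be extracted together, the sketch below applies a 3D convolution followed by 3D max pooling to a toy clip. It is not the project's network: it uses plain Python lists, a single channel, a fixed hand-written kernel instead of learned weights, and no bias or activation. The function names `conv3d_valid` and `max_pool3d` and the toy clip are hypothetical.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution (cross-correlation form, no padding).

    volume is a nested list indexed [time][row][col];
    kernel is a nested list indexed [dt][dy][dx].
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel spans kt adjacent frames as well as a kh x kw patch.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def max_pool3d(volume, size=2):
    """Non-overlapping 3D max pooling with a cubic window; leftover borders are dropped."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [
            [
                max(
                    volume[t + dt][y + dy][x + dx]
                    for dt in range(size)
                    for dy in range(size)
                    for dx in range(size)
                )
                for x in range(0, W - size + 1, size)
            ]
            for y in range(0, H - size + 1, size)
        ]
        for t in range(0, T - size + 1, size)
    ]


# Toy clip: 6 frames of 6 x 6 pixels; every pixel in frame t has value t,
# so the brightness increases by one from frame to frame.
clip = [[[t for _ in range(6)] for _ in range(6)] for t in range(6)]

# Hand-written 3 x 3 x 3 temporal-difference kernel:
# -1 on the first frame, 0 on the middle frame, +1 on the last frame.
kernel = [[[w] * 3 for _ in range(3)] for w in (-1, 0, 1)]

features = conv3d_valid(clip, kernel)   # 4 x 4 x 4 feature volume
pooled = max_pool3d(features, size=2)   # 2 x 2 x 2 after pooling
```

Here the kernel subtracts the frame at time t from the frame at time t+2 over a 3x3 patch, so the toy clip gives a constant response of 18 (nine pixels times a difference of two), while a static clip would give zero. A trained 3D CNN learns many such kernels instead of using a fixed one.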

To develop a 3D convolutional neural network architecture. This architecture generates multiple channels of information from adjacent video frames and performs convolution and max pooling on them. The extracted frames are then split into training and test sets, with 20% held out for testing and the rest used for training. The deep learning model is then fitted to learn the information contained in the video data, and its accuracy is computed on both the training and test data and compared across different 3D CNN architectures. If the accuracy is greater than 90%, the model is accepted; otherwise I increase the number of layers and iterations, fit the model again, and recompute the accuracy.
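The 20% test split and the 90% acceptance rule described above can be sketched as follows. This is a minimal illustration with no training code; the helper names `split_train_test`, `accuracy`, and `is_acceptable`, the fixed random seed, and the example labels are assumptions, not part of the project.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and hold out test_fraction of them for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def is_acceptable(test_accuracy, threshold=0.90):
    """A model is accepted only if its accuracy is greater than the threshold."""
    return test_accuracy > threshold


# Stand-in samples: 100 integer ids in place of real frames or clips.
train, test = split_train_test(range(100), test_fraction=0.2)

# Made-up predictions for four test samples, for illustration only.
predicted = ["walking", "boxing", "running", "jogging"]
actual = ["walking", "boxing", "running", "walking"]
score = accuracy(predicted, actual)

if not is_acceptable(score):
    # In the workflow described above, this is the point where layers and
    # iterations would be increased and the model fitted again.
    print(f"accuracy {score:.2f} is not above 0.90: adjust the model and refit")
```

In the full workflow, the same check would be repeated for each candidate 3D CNN architecture so their accuracies can be compared.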

Benefits of the Project

Benefits:

As technology progresses day by day, people depend on it in nearly all parts of life. One example is the installation of CCTV cameras in important and no-go areas. Nearly every organization, and even luxury housing societies, has one or more monitoring systems to watch for illegal and suspicious activity. Many security companies also use this technique in addition to security guards. As the number of CCTV cameras increases, the labour needed to monitor them also increases and the rate of successful monitoring decreases. Human activity recognition helps to overcome the problems mentioned above.

Technical Details of Final Deliverable

1. Human actions will be successfully recognized.

2. Each specific action will be clearly defined, and the information will be useful for future analysis.

3. Monitoring will become easier, and action predictions can also be made.

Final Deliverable of the Project: Software System

Type of Industry: IT

Technologies: NeuroTech

Sustainable Development Goals: Sustainable Cities and Communities; Peace, Justice and Strong Institutions

Required Resources

Item Name	Type	No. of Units	Per Unit Cost (in Rs)	Total (in Rs)
GPU - 6 GB - GeForce GTX 1660	Equipment	1	45000	45000
DDR3 8 GB RAM	Equipment	2	7000	14000
Thesis printing	Miscellaneous	3	500	1500
Thesis binding	Miscellaneous	3	60	180
Poster	Miscellaneous	1	500	500
			Total (in Rs)	61180
