Automated Cyberbullying Detection


2025-06-28 16:30:21 - Adil Khan

Project Title

Automated Cyberbullying Detection

Project Area of Specialization

Computer Science

Project Summary

The cyberbullying detection system is an automated system that detects bullying comments directed at users. Cyberbullying is a form of bullying or harassment carried out through electronic means, and it is important to detect it in multiple languages. Since current approaches to identifying cyberbullying focus mostly on English-language text, this project proposes a new approach for detecting cyberbullying in multiple languages (English, Hindi, and Roman Urdu). It uses machine learning techniques, applying several algorithms such as SVM (Support Vector Machine) and Naive Bayes, among others, to classify input data as bullying or non-bullying. The project would benefit the FIA's National Response Centre for Cyber Crime. Under the Prevention of Electronic Crimes Act 2016, passed by the National Assembly, cyber stalking is a crime, and this project would help the centre detect cyberbullying more easily.

Project Objectives

Project Implementation Method

The project will be implemented using machine learning techniques. The model will be trained using a knowledge-based approach, and decisions will be made on the basis of labeled data. The cyberbullying detection system will provide a web interface where a user can write a comment, which is then checked by various machine learning algorithms. We also aim to incorporate various distributed techniques into the proposed system and study their effect on the time and precision of cyberbullying detection.

Our approach has the following limitations:

 • Sarcasm detection is out of the scope of our proposed system.

 • The system can handle messages from only one language at a time.

 • Finding Roman Urdu datasets can be a challenging task.

Benefits of the Project

The project would benefit the FIA's National Response Centre for Cyber Crime. Under the Prevention of Electronic Crimes Act 2016, passed by the National Assembly, cyberbullying is considered a crime, and this project would help the centre detect it more easily.

Technical Details of Final Deliverable

Automated cyberbullying detection will be built with machine learning, using various algorithms such as SVM, Naive Bayes, and Random Forest, among others, to achieve high accuracy.
We will follow the process below to create the machine learning model:

[Figure: Automated Cyberbullying Detection process]

Input data:

We will collect the dataset from Twitter using Tweet Binder, an online service for data collection.

Data preprocessing:

Raw data cannot be used directly because it contains noise such as special characters and stop words. We will therefore remove stop words (e.g., a, an, and, are, the), unnecessary characters (such as #, @, $, %), and URLs.
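The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the regular expressions, and the five-word stop list (taken from the examples in the text) are assumptions, and a real system would need a fuller stop-word list for each language. The character filter is written with Latin-script text (English, Roman Urdu) in mind; Devanagari text may need a different filter.

```python
import re

# Illustrative stop-word list built from the examples in the text;
# a real system would use a fuller list per language.
STOP_WORDS = {"a", "an", "and", "are", "the"}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not a word character (letter, digit, underscore) or whitespace.
NON_WORD_PATTERN = re.compile(r"[^\w\s]")


def preprocess(comment: str) -> list[str]:
    """Lowercase a comment, strip URLs and special characters, drop stop words."""
    text = comment.lower()
    text = URL_PATTERN.sub(" ", text)       # remove URLs before punctuation is stripped
    text = NON_WORD_PATTERN.sub(" ", text)  # remove characters such as # @ $ %
    return [token for token in text.split() if token not in STOP_WORDS]
```

URLs are removed first because stripping punctuation beforehand would break them into fragments that look like ordinary words.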

Feature Extraction:

Feature extraction will be performed to obtain elements such as pronouns, adjectives, nouns, and shorthand text in the comments, along with statistics on the presence of words in the posts.
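As a rough sketch of this step, the function below turns a tokenized comment into word counts plus a few summary statistics. It is an assumption-laden illustration: identifying pronouns, adjectives, and nouns properly requires a part-of-speech tagger, which this sketch replaces with two small hypothetical word lists (`SECOND_PERSON_PRONOUNS` and `SHORTHAND_TERMS`), and the feature names are invented for the example.

```python
from collections import Counter

# Hypothetical word lists for illustration only; they stand in for a real
# part-of-speech tagger and a real shorthand dictionary.
SECOND_PERSON_PRONOUNS = {"you", "your", "yours", "u", "ur"}
SHORTHAND_TERMS = {"u", "ur", "idk", "lol", "omg"}


def extract_features(tokens: list[str]) -> dict[str, float]:
    """Map a tokenized comment to word counts plus simple summary statistics."""
    features: dict[str, float] = {}
    # Statistics on the presence of each word in the comment.
    for word, count in Counter(tokens).items():
        features[f"word={word}"] = float(count)
    features["num_tokens"] = float(len(tokens))
    features["num_second_person"] = float(
        sum(token in SECOND_PERSON_PRONOUNS for token in tokens)
    )
    features["num_shorthand"] = float(
        sum(token in SHORTHAND_TERMS for token in tokens)
    )
    return features
```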

Train the model:

We will then split the dataset, using 80% for training and 20% for testing, and perform 10-fold cross-validation for all our experiments. Cross-validation averages the results over the folds, so the reported performance does not depend on how the data happens to be divided.
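The split and cross-validation can be sketched with the standard library alone. The code below makes several assumptions: cross-validation is run on the 80% training portion, the classifier is a small multinomial Naive Bayes with add-one smoothing (one of the algorithms named in the proposal, written out here for illustration), accuracy is the score being averaged, and all class and function names are hypothetical. A production system would more likely rely on an established machine learning library.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_test_split(docs, labels, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out `test_fraction` of the data for testing."""
    indices = list(range(len(docs)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) - round(len(indices) * test_fraction)
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([docs[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([docs[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


def cross_validate(docs, labels, k=10):
    """Return the mean accuracy over k folds of the (already shuffled) data."""
    if len(docs) < k:
        raise ValueError("need at least k examples for k-fold cross-validation")
    accuracies = []
    for fold in range(k):
        held_out = set(range(fold, len(docs), k))  # every k-th example forms a fold
        train_idx = [i for i in range(len(docs)) if i not in held_out]
        model = NaiveBayes().fit(
            [docs[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(model.predict(docs[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k
```

Each example is held out exactly once across the folds, so the averaged score reflects the whole training portion rather than one particular division of it.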

Prediction and Evaluation:

Finally, we will predict the outcome (cyberbullying or non-cyberbullying) on the test portion of the dataset using the trained model and evaluate the model with metrics such as precision, recall, and F-score.
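For reference, the three metrics can be computed directly from the true and predicted labels. The sketch below assumes the bullying class is treated as the positive class and that the F-score meant here is the balanced F1; the function name and the label strings are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive="bullying"):
    """Compute precision, recall, and F1 for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

Precision measures how many flagged comments were truly bullying, while recall measures how many bullying comments were caught; F1 is their harmonic mean.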

Final Deliverable of the Project

Software System

Core Industry

IT

Other Industries

Legal, Security

Core Technology

Artificial Intelligence (AI)

Other Technologies

Others

Sustainable Development Goals

Peace, Justice and Strong Institutions

Required Resources
Item Name | Type | No. of Units | Per Unit Cost (in Rs) | Total (in Rs)
Data generation from Tweet Binder API licence | Equipment | 1 | 41000 | 41000
Deployment cost, Windows Azure | Equipment | 1 | 28000 | 28000
Introduction to Python (Udemy course) | Miscellaneous | 1 | 4000 | 4000
Natural Language Processing with Python (Udemy course) | Miscellaneous | 1 | 4000 | 4000
Overhead | Miscellaneous | 1 | 2000 | 2000

Total (in Rs): 79000
