Detecting Cyberbullying in Roman Urdu Language Using Natural Language Processing Techniques
Abstract
Social media platforms are now a primary source of public communication and information. They have become an integral part of daily life, and their user base is expanding rapidly as access reaches more remote areas. Pakistan has around 71.70 million social media users who use Roman Urdu to communicate. With this growth in access and users, digital bullying, commonly known as cyberbullying, has also increased. This research focuses on social media users who communicate in Roman Urdu (the Urdu language written in the English alphabet) and examines cyberbullying on the Twitter platform, where users employ Roman Urdu as a medium of communication. To our knowledge, this is one of very few studies that address cyberbullying behavior in Roman Urdu. The proposed study aims to identify a suitable model for classifying cyberbullying behavior in Roman Urdu. First, the dataset was built by extracting tweets through the Twitter API, using Roman Urdu keywords to target relevant data. The data was then annotated as bully or not-bully. Next, the dataset was pre-processed to reduce noise by removing punctuation, stop words, null entries, and duplicates. Features were then extracted with two methods, Count Vectorizer (CV) and TF-IDF Vectorizer, and ten supervised learning algorithms, including Support Vector Machine (SVM), Multilayer Perceptron (MLP), and K-Nearest Neighbors (KNN), were applied to both feature sets. SVM performed best of the implemented algorithms on both feature sets, achieving 97.8 percent with TF-IDF features and 93.4 percent with CV features. The proposed mechanism could help online social apps and chat rooms detect bullying more effectively and design bully-word filters, making cyberspace safer for end users.
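The pre-processing and feature-extraction steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names (`preprocess`, `build_vocabulary`, `count_vectors`, `tfidf_vectors`), the small stop-word list, and the sample tweets are all hypothetical. The abstract does not state which TF-IDF variant was used, so the sketch assumes the smoothed IDF, `idf = ln((1 + n) / (1 + df)) + 1`, with L2 row normalization, which matches the defaults of scikit-learn's `TfidfVectorizer`.

```python
import math
import re
from collections import Counter

# Hypothetical Roman Urdu stop-word list for illustration only (not the authors' list).
ROMAN_URDU_STOP_WORDS = {"hai", "ka", "ki", "ke", "ko", "se", "aur", "mein", "ho"}


def preprocess(tweets):
    """Drop null/empty entries, strip punctuation and stop words, remove duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        if tweet is None or not tweet.strip():
            continue  # null-entry removal
        text = re.sub(r"[^\w\s]", " ", tweet.lower())  # punctuation removal
        tokens = [t for t in text.split() if t not in ROMAN_URDU_STOP_WORDS]
        doc = " ".join(tokens)
        if doc and doc not in seen:  # duplicate removal, checked after normalization
            seen.add(doc)
            cleaned.append(doc)
    return cleaned


def build_vocabulary(docs):
    """Sorted list of every distinct token in the training documents."""
    return sorted({tok for doc in docs for tok in doc.split()})


def count_vectors(docs, vocab):
    """Raw term counts per document (the Count Vectorizer idea)."""
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(doc.split()).items():
            if tok in index:  # ignore tokens unseen during vocabulary building
                row[index[tok]] = n
        rows.append(row)
    return rows


def tfidf_vectors(docs, vocab):
    """Term counts weighted by smoothed IDF, then L2-normalized per document."""
    n_docs = len(docs)
    counts = count_vectors(docs, vocab)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0  # avoid dividing by zero
        rows.append([v / norm for v in weighted])
    return rows


raw = ["Tum bohat bewaqoof ho!", "Aap ka din acha ho.", None,
       "tum bohat bewaqoof ho", "", "Tum acha ho"]
docs = preprocess(raw)
vocab = build_vocabulary(docs)
print(docs)   # ['tum bohat bewaqoof', 'aap din acha', 'tum acha']
print(vocab)  # ['aap', 'acha', 'bewaqoof', 'bohat', 'din', 'tum']
print(count_vectors(docs, vocab))
print(tfidf_vectors(docs, vocab))
```

In the TF-IDF rows, a token that appears in several tweets (here `tum`) receives a lower weight than a token unique to one tweet (here `bohat`), which is what distinguishes TF-IDF features from raw counts. In practice, scikit-learn's `CountVectorizer` and `TfidfVectorizer` produce sparse versions of these matrices that can be passed directly to a classifier such as `sklearn.svm.SVC`.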
Copyright (c) 2022 Pakistan Journal of Engineering and Technology
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.