Phony News Detection in Reddit Using Natural Language Techniques and Machine Learning Pipelines

Srinivas Jagirdar, Venkata Subba K. Reddy
Copyright: © 2021 | Pages: 11
DOI: 10.4018/IJNCR.2021070101

Abstract

Phony news, or fake news, spreads like wildfire on social media and causes harm to society. Swift detection of fake news is therefore a priority, as it limits that harm. This paper develops a phony news detector for Reddit posts using popular machine learning techniques in conjunction with natural language processing techniques. Two popular feature extraction algorithms, CountVectorizer (CV) and Term Frequency-Inverse Document Frequency (TFIDF), were implemented. The resulting features were fed to Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Classifier (SVC), Logistic Regression (LR), AdaBoost, and XGBoost to classify news as either genuine or phony. Finally, coefficient analysis was performed to interpret the most influential coefficients. The study found that the pipeline combining MNB with TFIDF achieved the best accuracy, 79.05%, among the pipeline models compared.

Introduction

A recent investigation by Gartner (Titcomb & Carson, 2021) predicts that by 2022 social media will be dominated by phony news rather than genuine news. As the number of social media users surges, there is a strong chance of phony news leaving a large footprint on the internet. Phony news is usually defined as fabricated news containing misinformation, rumour and falsified facts, spread over traditional media as well as social media (Thota et al., 2018). The aim of a phony news circulator is to profit from the sensationalism it creates, to cheat readers, or to damage the reputation of a person or organisation (Liu & Yang, 2019). To counter this, early detection of phony news is required (Allcott & Gentzkow, 2017). Current techniques are not adequate for tackling the circulation of phony news on social media platforms such as Reddit, WhatsApp, blogs, Twitter and Facebook (Bourgonje et al., 2017). Cyber security authorities around the world have reported a new form of deception, clickbait, in which the perpetrator lures an unsuspecting user into clicking on phony news by offering gifts (Elisa & Jeffrey, 2017). A widely cited study reported that in 2017, 67% of U.S. citizens over the age of 18 consumed news mainly from social media (Vosoughi et al., 2018). According to some researchers, fake news propagates faster and deeper into society than genuine news (Conroy et al., 2015). It has therefore become imperative to detect phony news and to stem its creation and spread on social media. Phony news detection is a herculean task, as it involves cross-checking each news item against trusted entities such as newspapers, media houses and government agencies. Computer models built using NLP techniques and ML algorithms can be used to classify news as either phony or true.

This work developed 12 models by pipelining feature extraction algorithms with ML algorithms (two feature extractors combined with six classifiers) for detecting phony posts on the theOnion and nottheOnion subreddits of the Reddit social networking site. First, exploratory data analysis was performed. NLP techniques were then applied to vectorize the words present in the posts, and ML algorithms were trained on these vectors to create the pipeline models. Finally, coefficient analysis was performed to identify the words that have a positive or negative impact on the classification of a post as phony or true.
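To make the vectorize, classify and inspect-coefficients workflow concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes raw word counts (in the spirit of CountVectorizer) feeding a multinomial Naive Bayes classifier with Laplace smoothing. The six headlines, the labels `"phony"` and `"true"`, and the helper names `tokenize`, `fit_multinomial_nb`, `predict` and `top_words` are all invented for illustration; the difference in per-class log-likelihoods stands in for the coefficient analysis described above.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Fit multinomial Naive Bayes on word counts with Laplace smoothing."""
    classes = sorted(set(labels))
    word_counts = {c: Counter() for c in classes}
    for text, label in zip(docs, labels):
        word_counts[label].update(tokenize(text))
    vocab = sorted(set().union(*word_counts.values()))
    doc_counts = Counter(labels)
    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                      for w in vocab}
    return {"classes": classes, "vocab": set(vocab),
            "log_prior": log_prior, "log_lik": log_lik}

def predict(model, text):
    """Return the class with the highest posterior log-score."""
    tokens = [t for t in tokenize(text) if t in model["vocab"]]  # skip unseen words
    scores = {c: model["log_prior"][c]
                 + sum(model["log_lik"][c][t] for t in tokens)
              for c in model["classes"]}
    return max(scores, key=scores.get)

def top_words(model, positive, negative, k=3):
    """Rank words by how strongly they favour `positive` over `negative`."""
    diff = {w: model["log_lik"][positive][w] - model["log_lik"][negative][w]
            for w in model["vocab"]}
    return sorted(diff, key=diff.get, reverse=True)[:k]

# Invented toy headlines; real experiments would use the scraped subreddit posts.
docs = [
    "area man wins argument with toaster",
    "area man declares himself king of parking lot",
    "local man reads entire terms of service",
    "city council approves new budget for road repair",
    "police say bear stole pizza from parked car",
    "court rules city must repair flooded road",
]
labels = ["phony", "phony", "phony", "true", "true", "true"]

model = fit_multinomial_nb(docs, labels)
print(predict(model, "area man crowned king"))
print(predict(model, "city council votes on road repair"))
print(top_words(model, "phony", "true", k=1))
```

On this toy data the word most indicative of the phony class is simply the one that appears most often in the phony headlines and never in the true ones. Replacing the raw counts with TFIDF weights, or the classifier with one of the other five algorithms, would yield the remaining pipeline variants.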
