Sentiment Weighted Word Embedding for Big Text Data

Jenish Dhanani, Rupa Mehta, Dipti Rana
DOI: 10.4018/IJWLTT.20211101.oa2

Abstract

Sentiment analysis is the practice of eliciting the sentiment orientation of people's opinions (i.e., positive, negative, or neutral) toward a specific entity. Word embedding techniques such as Word2vec are effective approaches for encoding text data into real-valued semantic feature vectors. However, Word2vec fails to preserve sentiment information, which degrades performance in sentiment analysis. Additionally, big textual data with a large vocabulary and its associated feature vectors demands huge memory and computing power. To overcome these challenges, this research proposes a MapReduce based Sentiment weighted Word2Vec (MSW2V), which learns sentiment- and semantic-aware feature vectors from a sentiment dictionary and big textual data in a distributed MapReduce environment, where the memory and computing power of multiple computing nodes are combined to meet the huge resource demand. Experimental results demonstrate that MSW2V outperforms existing distributed and non-distributed approaches.

Introduction

With the advancement of Internet technologies, platforms such as social media, e-commerce, and movie streaming services have directly reached millions of individuals. On such platforms, people express and share their emotions, observations, and opinions in text about topics, products, services, etc. Sentiment analysis (El Alaoui et al., 2018; Fang & Zhan, 2015; Liu, 2012; Medhat, Hassan, & Korashy, 2014; Pang, Lee, & others, 2008; Pang, Lee, & Vaithyanathan, 2002; Pouransari & Ghili, 2014; Ravi & Ravi, 2015) is performed to elicit the sentiment orientation (i.e., positive or negative) of the shared textual information, which can enhance the decision making of governments, product designers, political organizations, marketing organizations, etc. Thus, there is a strong need for efficient sentiment analysis approaches. In sentiment analysis, Machine Learning (ML) algorithms have been extensively exploited due to their excellent ability to achieve admirable performance (Dhanani, Mehta, Rana, & Tidke, 2018; Pang et al., 2002; Parikh, Palusa, Kasthuri, Mehta, & Rana, 2018; Xia, Wang, Hu, Li, & Zong, 2013; Yin, Wang, & Zheng, 2012; Zhang, Xu, Su, & Xu, 2015), where efficient feature extraction is a vital requirement.

Word embedding techniques extract textual features by transforming raw words into real-valued vectors. Word2vec is a prominent word embedding technique that learns deep, implicit semantic information in word vectors (also called feature vectors, Word2vec embeddings, or word embeddings) (Mikolov, Chen, Corrado, & Dean, 2013a; Mikolov, Sutskever, Chen, Corrado, & Dean, 2013b). However, Word2vec fails to encode sufficient sentiment information into the feature vectors (Dhanani et al., 2018; Parikh et al., 2018; Tang et al., 2016, 2014; Yu, Wang, Lai, & Zhang, 2017). As a consequence, semantically similar words such as “good” and “bad” are placed close to each other even though their sentiment orientations are opposite: “good” is sentimentally positive and “bad” is sentimentally negative. Hence, purely semantic feature vectors can lead to a decline in sentiment analysis performance. In addition, real-life applications yield big textual data with a large vocabulary (i.e., the unique words in the text corpus) (Dhanani et al., 2018; Ordentlich et al., 2016). Word2vec suffers from scalability issues because the large vocabulary and its associated vectors must be accommodated and computed in memory (Dhanani et al., 2018; Ordentlich et al., 2016). For such big textual data, Word2vec demands huge memory and computing capability to achieve acceptable learning latency.
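A minimal sketch (not from the paper) using the gensim library illustrates this limitation: because antonyms such as “good” and “bad” occur in very similar contexts, a purely semantic Word2vec model tends to place them close together. The toy corpus and all parameters below are assumptions chosen only for demonstration.

```python
# Minimal sketch, assuming gensim 4.x; toy corpus and parameters are illustrative only.
from gensim.models import Word2Vec

# Tiny illustrative corpus; in practice this would be the full big text corpus.
corpus = [
    ["the", "movie", "was", "good", "and", "enjoyable"],
    ["the", "movie", "was", "bad", "and", "boring"],
    ["a", "good", "product", "with", "great", "quality"],
    ["a", "bad", "product", "with", "poor", "quality"],
]

# Train a small skip-gram Word2vec model (parameters are illustrative).
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# "good" and "bad" appear in near-identical contexts, so their vectors tend to be
# similar, even though their sentiment orientations are opposite.
print(model.wv.similarity("good", "bad"))
```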

Many recent works attempted to solve the scalability issues by implementing Word2vec in a distributed environment (Apache Spark based Word2vec, n.d.; Dhanani et al., 2018; Ji, Satish, Li, & Dubey, 2016; Ordentlich et al., 2016). However, they are limited to learning semantic-only Word2vec feature vectors. In contrast, several recent studies have focused on encoding sentiment information into Word2vec feature vectors using prior sentiment knowledge (Parikh et al., 2018; Rezaeinia, Ghodsi, & Rahmani, 2017; Yu et al., 2017; Zhang et al., 2015). However, learning sentiment specific Word2vec feature vectors (i.e., vectors that preserve both semantic and sentiment information) is computationally expensive for big textual data with a large vocabulary, and existing sentiment analysis approaches that learn such vectors for big textual data therefore face this scalability issue (Dhanani et al., 2018; Parikh et al., 2018). To overcome these challenges, this research proposes a novel sentiment weighted word embedding approach that uses a sentiment dictionary and a distributed MapReduce environment.
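The exact MSW2V weighting scheme is defined later in the paper. Purely as an illustrative sketch of the general idea of injecting prior sentiment knowledge from a dictionary into Word2vec vectors, the following assumes a hypothetical sentiment lexicon mapping words to polarity scores; the scaling rule shown is an assumption, not the authors' method.

```python
import numpy as np

# Hypothetical sentiment dictionary: word -> polarity score in [-1, 1] (assumed).
sentiment_dict = {"good": 0.8, "enjoyable": 0.6, "bad": -0.7, "boring": -0.5}

def sentiment_weighted_vector(word, wv, default_score=0.0):
    """Illustrative sentiment weighting (not the MSW2V formulation):
    scale the semantic vector by the magnitude of the word's sentiment score
    and append the signed score as an extra dimension, so words with opposite
    polarity are separated even when their contexts are similar."""
    score = sentiment_dict.get(word, default_score)
    base = wv[word]  # semantic Word2vec vector for the word
    return np.append(base * (1.0 + abs(score)), score)

# Usage, reusing the gensim model trained in the previous sketch:
# vec_good = sentiment_weighted_vector("good", model.wv)
# vec_bad  = sentiment_weighted_vector("bad", model.wv)
```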
