Computer Science Review

The evolution of sentiment analysis—a review of research topics, venues, and top cited papers.

Sentiment analysis is one of the fastest growing research areas in computer science, making it challenging to keep track of all the activities in the area. We present a computer-assisted literature review, where we utilize both text mining and qualitative coding, and analyze 6996 papers from Scopus. We find that the roots of sentiment analysis are in the studies on public opinion analysis at the beginning of the 20th century and in the text subjectivity analysis performed by the computational linguistics community in the 1990s. However, the outbreak of computer-based sentiment analysis only occurred with the availability of subjective texts on the Web. Consequently, 99% of the papers have been published after 2004. Sentiment analysis papers are scattered across multiple publication venues, and the combined number of papers in the top-15 venues represents only ca. 30% of the papers in total. We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics. In recent years, sentiment analysis has shifted from analyzing online product reviews to social media texts from Twitter and Facebook. Many topics beyond product reviews like stock markets, elections, disasters, medicine, software engineering and cyberbullying extend the utilization of sentiment analysis.

Sentiment Analysis

1062 papers with code • 41 benchmarks • 84 datasets

Sentiment analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral". Given the text and accompanying labels, a model can be trained to predict the correct sentiment.

Sentiment analysis techniques can be categorized into machine learning approaches, lexicon-based approaches, and hybrid methods. Some subcategories of research in sentiment analysis include multimodal sentiment analysis, aspect-based sentiment analysis, fine-grained opinion analysis, and language-specific sentiment analysis.
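To make the lexicon-based category concrete, the following is a minimal sketch in Python. The word list `LEXICON`, the negator set, and the function name `lexicon_polarity` are hypothetical and chosen only for illustration; real systems rely on much larger resources and more careful negation handling.

```python
# Hypothetical toy lexicon (an assumption for illustration only).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "slow": -1, "hate": -1}
NEGATORS = {"not", "no", "never"}


def lexicon_polarity(text):
    """Sum word scores, flipping the sign of the word that follows a negator."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:\"'")
        if word in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_polarity("I love this great phone"))   # positive
print(lexicon_polarity("The movie was not good."))   # negative
print(lexicon_polarity("It arrived on Tuesday"))     # neutral
```

A machine learning approach would instead learn such word weights from labelled examples, and a hybrid method could use the lexicon score as one input feature of a trained classifier.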

More recently, deep learning techniques, such as RoBERTa and T5, are used to train high-performing sentiment classifiers that are evaluated using metrics like F1, recall, and precision. To evaluate sentiment analysis systems, benchmark datasets like SST, GLUE, and IMDB movie reviews are used.
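As an illustration of the metrics named above, the sketch below computes per-class precision, recall, and F1 for a three-class sentiment task, plus their macro average. The function names and the toy labels are assumptions made for this example; benchmark leaderboards may use other averaging schemes.

```python
def per_class_scores(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_scores(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


# Toy gold labels and predictions (invented for illustration).
y_true = ["positive", "positive", "negative", "neutral", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral", "negative", "positive"]

print(per_class_scores(y_true, y_pred, "positive"))  # (0.5, 0.5, 0.5)
print(round(macro_f1(y_true, y_pred), 3))            # 0.656
```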

Most implemented papers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers.

Convolutional Neural Networks for Sentence Classification

We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks.

Universal Language Model Fine-tuning for Text Classification

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch.

Bag of Tricks for Efficient Text Classification

facebookresearch/fastText • EACL 2017

This paper explores a simple and efficient baseline for text classification.

A Structured Self-attentive Sentence Embedding

This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging.

Deep contextualized word representations

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e. g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i. e., to model polysemy).

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).

Domain-Adversarial Training of Neural Networks

Our approach is directly inspired by the theory on domain adaptation suggesting that, for effective domain transfer to be achieved, predictions must be made based on features that cannot discriminate between the training (source) and test (target) domains.

sentiment analysis Recently Published Documents

Aspect-based Sentiment Analysis using Dependency Parsing

In this paper, an aspect-based Sentiment Analysis (SA) system for Hindi is presented. The proposed system assigns a separate sentiment to each aspect of a sentence and also evaluates the overall sentiment expressed in the sentence. In this work, the Hindi Dependency Parser (HDP) is used to determine the association between an aspect word and a sentiment word (using Hindi SentiWordNet), based on the idea that closely connected words come together to express a sentiment about a certain aspect. By generating a dependency graph, the system assigns each sentiment to the aspect at the minimum distance from it and computes the overall polarity of the sentence. The system achieves an accuracy of 83.2% on a corpus of movie reviews and its results are compared with baselines as well as existing works on SA. The results indicate that the proposed system has the potential to be used in emerging applications like SA of product reviews, social media analysis, etc.
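The minimum-distance idea in this abstract can be sketched as a breadth-first search over an undirected dependency graph. This is not the authors' system: the English example, the hand-written edges, and the names `graph_distances` and `aspect_sentiments` are assumptions for illustration, and a real pipeline would take its edges from a dependency parser and its polarities from a lexicon such as Hindi SentiWordNet.

```python
from collections import deque


def graph_distances(edges, start):
    """Breadth-first search: hop distance from `start` to every reachable word."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def aspect_sentiments(edges, aspects, sentiment_words):
    """Attach each sentiment word's polarity to its nearest aspect in the graph."""
    scores = {aspect: 0 for aspect in aspects}
    for word, polarity in sentiment_words.items():
        dist = graph_distances(edges, word)
        reachable = [a for a in aspects if a in dist]
        if reachable:
            nearest = min(reachable, key=lambda a: dist[a])
            scores[nearest] += polarity
    return scores


# "The story was great but the music was dull" (simplified, hand-written edges).
edges = [("great", "story"), ("great", "dull"), ("dull", "music")]
scores = aspect_sentiments(edges, ["story", "music"], {"great": 1, "dull": -1})
print(scores)                # {'story': 1, 'music': -1}
print(sum(scores.values()))  # 0, i.e. a mixed overall polarity
```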

Sentiment Analysis Applied to News from the Brazilian Stock Market

TRG-DAtt: The Target Relational Graph and Double Attention Network Based Sentiment Analysis and Prediction for Supporting Decision Making

The management of public opinion and the use of big data monitoring to accurately judge and verify all kinds of information are valuable aspects in the enterprise management decision-making process. The sentiment analysis of reviews is a key decision-making tool for e-commerce development. Most existing review sentiment analysis methods involve sequential modeling but do not focus on the semantic relationships. However, Chinese semantics are different from English semantics in terms of the sentence structure. Irrelevant contextual words may be incorrectly identified as cues for sentiment prediction. The influence of the target words in reviews must be considered. Thus, this paper proposes the TRG-DAtt model for sentiment analysis based on target relational graph (TRG) and double attention network (DAtt) to analyze the emotional information to support decision making. First, dependency tree-based TRG is introduced to independently and fully mine the semantic relationships. We redefine and constrain the dependency and use it as the edges to connect the target and context words. Second, we design dependency graph attention network (DGAT) and interactive attention network (IAT) to form the DAtt and obtain the emotional features of the target words and reviews. DGAT models the dependency of the TRG by aggregating the semantic information. Next, the target emotional enhancement features obtained by the DGAT are input to the IAT. The influence of each target word on the review can be obtained through the interaction. Finally, the target emotional enhancement features are weighted by the impact factor to generate the review's emotional features. In this study, extensive experiments were conducted on the car and Meituan review data sets, which contain consumer reviews on cars and stores, respectively. The results demonstrate that the proposed model outperforms the existing models.

A Comprehensive Guideline for Bengali Sentiment Annotation

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to obtain the writer’s feelings, expressed as positive or negative, by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of the research work has been carried out in the context of the English language. However, Bengali sentiment expression has varying degrees of sentiment labels, which can be plausibly distinct from the English language. Therefore, it is important that sentiment assessment of the Bengali language be developed and executed properly. In sentiment analysis, the predictive power of an automatic model depends heavily on the quality of dataset annotation. Bengali sentiment annotation is a challenging task due to the diversified structures (syntax) of the language and its different degrees of innate sentiment (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline for researchers, linguistic experts, and referees to annotate Bengali sentences accurately, with a view to building effective datasets for automatic sentiment prediction.

Employee Sentiment Analysis Towards Remote Work during COVID-19 Using Twitter Data

Topic modelling and sentiment analysis of global warming tweets.

With the increasing extreme weather events and various disasters, people are paying more attention to environmental issues than ever, particularly global warming. Public debate on it has grown on various platforms, including newspapers and social media. This paper examines the topics and sentiments of the discussion of global warming on Twitter over a span of 18 months using two big data analytics techniques—topic modelling and sentiment analysis. There are seven main topics concerning global warming frequently debated on Twitter: factors causing global warming, consequences of global warming, actions necessary to stop global warming, relations between global warming and Covid-19; global warming’s relation with politics, global warming as a hoax, and global warming as a reality. The sentiment analysis shows that most people express positive emotions about global warming, though the most evoked emotion found across the data is fear, followed by trust. The study provides a general and critical view of the public’s principal concerns and their feelings about global warming on Twitter.

Transparent Aspect-Level Sentiment Analysis Based on Dependency Syntax Analysis and Its Application on COVID-19

Aspect-level sentiment analysis identifies fine-grained emotion for target words. There are three major issues in current models of aspect-level sentiment analysis. First, few models consider the natural language semantic characteristics of the texts. Second, many models consider the location characteristics of the target words, but ignore the relationships among the target words and among the overall sentences. Third, many models lack transparency in data collection, data processing, and results generating in sentiment analysis. In order to resolve these issues, we propose an aspect-level sentiment analysis model that combines a bidirectional Long Short-Term Memory (LSTM) network and a Graph Convolutional Network (GCN) based on Dependency syntax analysis (Bi-LSTM-DGCN). Our model integrates the dependency syntax analysis of the texts, and explicitly considers the natural language semantic characteristics of the texts. It further fuses the target words and overall sentences. Extensive experiments are conducted on four benchmark datasets, i.e., Restaurant14, Laptop, Restaurant16, and Twitter. The experimental results demonstrate that our model outperforms other models like Target-Dependent LSTM (TD-LSTM), Attention-based LSTM with Aspect Embedding (ATAE-LSTM), LSTM+SynATT+TarRep and Convolution over a Dependency Tree (CDT). Our model is further applied to aspect-level sentiment analysis on “government” and “lockdown” of 1,658,250 tweets about “#COVID-19” that we collected from March 1, 2020 to July 1, 2020. The experimental results show that Twitter users’ positive and negative sentiments fluctuated over time. Through the transparency analysis in data collection, data processing, and results generating, we discuss the reasons for the evolution of users’ emotions over time based on the tweets and on our models.

Aspect Based Sentiment Analysis of Unlabeled Reviews Using Linguistic Rule Based LDA

In this digital era, people are very keen to share their feedback about products, services, or current issues on social networks and other platforms. A fine-grained analysis of this feedback can give a clear picture of what people think about a particular topic. This work proposes an almost unsupervised Aspect Based Sentiment Analysis approach for textual reviews. Latent Dirichlet Allocation, along with linguistic rules, is used for aspect extraction. Aspects are ranked based on their probability distribution values and then clustered into predefined categories using frequent terms with domain knowledge. The SentiWordNet lexicon is used for sentiment scoring and classification. Experiments with two popular datasets show the superiority of our strategy as compared to existing methods. It achieves 85% average accuracy when tested on manually labeled data.

Measuring citizen satisfaction with e-government services by using sentiment analysis technology

Similar articles being viewed by others

Sentiment analysis on cross-domain textual data using classical and deep learning approaches

28 February 2023

K. Paramesha, H. L. Gururaj, … K. C. Ravishankar

Improving Sentiment Analysis for Social Media Applications Using an Ensemble Deep Learning Language Model

11 October 2021

Ahmed Alsayat

Classification of Textual Sentiment Using Ensemble Technique

05 November 2021

Md. Mashiur Rahaman Mamun, Omar Sharif & Mohammed Moshiul Hoque

Arabic Sentiment Analysis Using Deep Learning and Ensemble Methods

20 May 2021

Amal Alharbi, Manal Kalkatawi & Mounira Taileb

Sentiment analysis with deep neural networks: comparative study and performance assessment

22 May 2020

Ramesh Wadawadagi & Veerappa Pagi

Efficient feature selection techniques for sentiment analysis

14 December 2019

Avinash Madasu & Sivasankar Elango

Recent advances in deep learning based sentiment analysis

15 September 2020

JianHua Yuan, Yang Wu, … Ting Liu

EnSWF: effective features extraction and selection in conjunction with ensemble learning methods for document sentiment classification

09 March 2019

Jawad Khan, Aftab Alam, … Young-Koo Lee

An ensemble approach to stabilize the features for multi-domain sentiment analysis using supervised machine learning

14 November 2018

Monalisa Ghosh & Goutam Sanyal

An enhanced approach for sentiment analysis based on meta-ensemble deep learning

Social Network Analysis and Mining, volume 13, Article number: 38 (2023)

Sentiment analysis, commonly known as “opinion mining,” aims to identify sentiment polarities in opinion texts. Recent years have seen a significant increase in the acceptance of sentiment analysis by academics, businesses, governments, and several other organizations. Numerous deep-learning efforts have been developed to effectively handle more challenging sentiment analysis problems. However, the main difficulty with deep learning approaches is that they require a lot of experience and hard work to tune the optimal hyperparameters, making it a tedious and time-consuming task. Several recent research efforts have attempted to solve this difficulty by combining the power of ensemble learning and deep learning. Many of these efforts have concentrated on simple ensemble techniques, which have some drawbacks. Therefore, this paper makes the following contributions: First, we propose a meta-ensemble deep learning approach to improve the performance of sentiment analysis. In this approach, we train and fuse baseline deep learning models using three levels of meta-learners. Second, we propose the benchmark dataset “Arabic-Egyptian Corpus 2” as an extension of a previous corpus. The corpus size has been increased by 10,000 annotated tweets written in colloquial Arabic on various topics. Third, we conduct several experiments on six benchmark datasets of sentiment analysis in different languages and dialects to evaluate the performance of the proposed meta-ensemble deep learning approach. The experimental results reveal that the meta-ensemble approach effectively outperforms the baseline deep learning models. Also, the experiments reveal that meta-learning improves performance further when the probability class distributions are used to train the meta-learners.

1 Introduction

The power of social media for expressing opinions about events, topics, people, services, or products has expanded due to the growth of user-generated content on platforms (Naresh and Venkata Krishna 2021). Hence, analyzing this huge amount of social media data can help better understand public opinions and trends and effectively make important decisions by classifying the opinions and feelings expressed in the text and determining their polarity as positive, negative, or neutral (Mejova 2009).

In the literature, several research efforts have been introduced to approach sentiment analysis using machine learning (Pontiki et al. 2016; Ahmed et al. 2013; Duwairi et al. 2014; Shoukry and Rafea 2012; Alomari et al. 2017). Extended efforts have used deep learning to handle bigger data and improve classification performance over classical machine learning models (Mohammed and Kora 2019; Chen et al. 2018; Pontiki et al. 2016; Heikal et al. 2018; Baly et al. 2017; Rojas-Barahona 2016). Deep learning techniques aim to overcome the limitations of classical learning through their efficiency in dealing with complex problems and large amounts of data, and their capacity to automatically extract features from text (Habimana et al. 2020; Chan et al. 2020). Several deep learning architectures and models have been applied to sentiment analysis, such as recurrent neural networks (RNN) (Moitra and Mandal 2019), gated recurrent units (GRU) (Le et al. 2019), Long Short-Term Memory (LSTM) (Graves 2012), and Convolutional Neural Networks (CNN) (Collobert and Weston 2008). However, the main difficulty with deep learning techniques is identifying the most appropriate architectures and models. Usually, deep models require much effort because tuning the optimal hyperparameters in the search space of possible hyperparameters is a tedious task (Yadav and Vishwakarma 2020). These problems can be overcome by applying ensemble learning to deep learning. Traditional ensemble learning refers to merging several basic models to build one powerful model (Kumar et al. 2021). Ensemble learning has been successfully applied in many fields, such as image classification (Wang et al. 2013), medical imaging (Cho and Won 2003; Shipp and Kuncheva 2002), music recognition (Stamatatos and Widmer 2002), malware detection (Shahzad and Lavesson 2013) and text classification (Kulkarni et al. 2018). In the literature, there are several ensemble approaches, such as averaging, boosting, bagging, random forest, and stacking (Zhang and Ma 2012). In deep learning, most ensemble learning is a simple averaging of models (Tan et al. 2022; Mohammadi and Shaverizade 2021; Araque et al. 2017) due to its simplicity and good results. However, the voting-based ensemble method is not an effective way to combine the models because it is biased toward weak models, which can reduce performance on many problems (Tasci et al. 2021).
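The two simple combination rules mentioned in this paragraph, voting and averaging, can be contrasted with a short sketch. The function names and the three invented model outputs are assumptions for illustration; the example only shows that the rules can disagree on the same input, not which of them is right.

```python
from collections import Counter


def majority_vote(label_predictions):
    """Hard voting: every model casts one vote; the most frequent label wins."""
    return Counter(label_predictions).most_common(1)[0][0]


def average_probabilities(prob_predictions):
    """Soft averaging: average the class probabilities, then take the top class."""
    classes = list(prob_predictions[0])
    n = len(prob_predictions)
    mean = {c: sum(p[c] for p in prob_predictions) / n for c in classes}
    return max(mean, key=mean.get), mean


# Invented outputs of three models for a single text.
probs = [
    {"positive": 0.55, "negative": 0.45},
    {"positive": 0.60, "negative": 0.40},
    {"positive": 0.05, "negative": 0.95},
]
hard_labels = [max(p, key=p.get) for p in probs]

print(majority_vote(hard_labels))       # positive (two weakly confident votes)
print(average_probabilities(probs)[0])  # negative (one very confident model)
```

Neither rule learns how much to trust each model. Stacking replaces the fixed rule with a trained meta-learner that takes the model outputs as its input features.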

To this end, the primary objectives of this research are four-fold. First, we propose a meta-ensemble deep learning approach to boost the performance of sentiment analysis. The proposed approach combines the predictions of several groups of deep models using three levels of meta-learners. In the proposed approach, we achieve diversity in the ensemble by using differences in the training data, the diversity of trained baseline deep learners, and the variation within the fusion of baseline deep models. Second, we propose the benchmark dataset “Arabic-Egyptian Corpus 2”, which consists of 50,000 tweets written in colloquial Arabic on various topics. This corpus is an extended version of the “Arabic-Egyptian corpus” (Mohammed and Kora 2019). Third, we conduct a wide range of experiments on six public benchmark datasets to study the performance of the proposed meta-ensemble deep learning approach on sentiment classification in different languages and dialects. For each benchmark dataset, groups of different deep baseline models are trained on partitions of the training data. Their best performance is compared with the proposed meta-ensemble deep learning approach. Finally, we show the impact of the meta-predictions of the proposed meta-ensemble deep learning approach using two kinds of model predictions, namely the class label probability distribution and the class label predictions. The main contributions of the paper can be summarized as follows:

We propose a meta-ensemble deep learning approach to improve the sentiment classification performance that combines three levels of meta-learners.

We extended the Arabic-Egyptian corpus (Mohammed and Kora 2019) by increasing it to 50k annotated tweets.

We train several baseline deep models using six public benchmark sentiment analysis datasets in different languages and dialects.

We conduct a wide range of experiments to study the effect of the meta-ensemble deep learning approach against single deep learning models.

We compare the effect of the generated predictions of meta-learners involved in the proposed approach to improve the performance.

The paper is structured as follows: Sect. 2 provides a brief overview of the challenges of sentiment analysis and various ensemble learning methods, and highlights some of the literature on ensemble learning in sentiment analysis. Section 3 describes the meta-ensemble deep learning approach. Section 4 shows the experimental results, the evaluation of the baseline deep learning models, and the meta-ensemble deep learning approach on each of the benchmark datasets. Finally, Sect. 5 concludes the paper and suggests future research directions.

2 Related work

Through sentiment analysis, we can obtain important information that helps in making decisions, solving problems, managing crises, correcting misconceptions, providing desired products and services, interacting with consumers on their terms, improving product and service quality, discovering new marketing strategies and increasing sales (Tuysuzoglu et al. 2018). Despite its benefits, sentiment analysis is an extremely difficult task due to several challenges and problems (Cambria et al. 2017). First, the problem of identifying the subjective parts of the text: the same word can be subjective in one context and objective in another. This makes it challenging to distinguish between subjective and objective (sentiment-free) texts. For instance: “The writer’s language was very crude,” and “Crude oil is extracted from the sea-beds”. Second, the problem of domain dependence: in other contexts, the same sentence can indicate something quite different. The word unpredictable is negative in the domain of movies, but when used in another context, it has a positive connotation. For instance: “The movie was too slow and too long”, “I love long pasta”. Third, the problem of sarcasm detection: sarcastic sentences use positive words to convey a negative opinion about a target. For instance: “Nice perfume. You must be marinated in it”. Fourth, the problem of thwarted expressions: in some sentences, the polarity of the text is determined by a small portion of the text. For instance: “Although I’m tired, the day is great.” Fifth, the problem of indirect negation of sentiment: such negations are not easily identified because they do not contain “no,” “not,” etc. Sixth, the problem of order dependence: the words cannot be considered independently of their order. For instance, “A is better than B”. Seventh, the problem of entity recognition: a text may not always refer to the same entity. For instance, “I hate Samsung, but I like OPPO”. Eighth, the problem of identifying opinion holders: not everything written in a text is the author’s opinion, for instance when the author quotes someone. Ninth and finally, the problem of associating sentiment with specific keywords: many statements express very strong opinions, but it is impossible to identify the source of these sentiments. Generally, sentiment analysis can occur at three levels: sentence, document, and aspect/feature. At the sentence level, the analysis proceeds sentence by sentence and decides whether each sentence expresses a neutral, positive, or negative opinion. At the document level, the analysis identifies a document’s overall sentiment and categorizes it as negative or positive. At the aspect level (also known as the word or feature level), the analysis aims to discover sentiments on entities and/or their aspects (Wagh and Punde 2018).

In recent years, ensemble learning has been considered one of the most successful techniques in machine learning (Sagi and Rokach 2018). The main factors behind the ensemble system’s success are increasing diversity among baseline classifier types, using different ensemble methods, using different initial parameters, and creating multiple datasets from the original dataset (cross-validation or sub-samples) (Mohammed and Kora 2021). Ensemble methods aim to increase prediction accuracy by combining decisions from various sub-models into a new model. Besides, the ensemble methods help avoid overfitting and reduce variance and biases. Also, ensemble learning helps to generate multiple hypotheses using the same base learner. In addition, ensemble learning methods help reduce the drawbacks of the baseline models (Alojail and Bhatia 2020). The most popular ensemble techniques for enhancing machine learning performance are bagging, boosting, and stacking. Table 1 describes the advantages and disadvantages of each.

There are several domains using ensemble learning methods to generalize machine learning techniques, such as natural language processing (NLP), internet of things (IoT), recommender systems, face recognition, information security, information retrieval, image retrieval, and intrusion detection systems (Mohammed and Kora 2021; Forouzandeh et al. 2021; Yaman et al. 2018; Pashaei Barbin et al. 2020). Also, in sentiment analysis, many research studies have shown the superiority of the different ensemble learning methods over traditional machine learning classifiers. For example, the research efforts of Kanakaraj and Guddeti (2015); Prusa et al. (2015); Wang et al. (2014); Alrehili and Albalawi (2019); Sharma et al. (2018); Fersini et al. (2014); Perikos and Hatzilygeroudis (2016); Onan et al. (2016) applied a bagging method to several baseline classifiers (NB, SVM, KNN, LR, DT, ME) for English sentiment analysis. Also, the authors in Xia et al. (2011); Tsutsumi et al. (2007); Rodriguez-Penagos et al. (2013); Clark and Wicentwoski (2013); Li et al. (2010) applied two ensemble methods, voting and stacking, based on NB, SVM and LR for English sentiment analysis. In addition, the authors in Da Silva et al. (2014); Xia et al. (2016); Fersini et al. (2016); Araque et al. (2017); Saleena (2018) applied majority voting based on several traditional classifiers such as SVM, RF, LR, NB, DT, and ME for English sentiment analysis. At the same time, several studies applied stacking based on traditional classifiers for non-English sentiment analysis. For example, the authors in Lu and Tsou (2010); Li et al. (2012); Su et al. (2012) applied stacking based on KNN, NB, SVM, and ME for Chinese reviews, and the authors in Pasupulety et al. (2019) applied stacking based on SVM and RF for India’s reviews. In contrast, few studies applied ensemble learning techniques based on traditional classifiers to the Arabic language and its different dialects. For example, the authors in Saeed et al. (2022) applied stacking based on SVM, NB, LR, DT, and KNN for Arabic sentiment analysis, while the authors in Oussous et al. (2018) applied stacking based on SVM and ME for Moroccan tweets. On the other hand, ensemble-based deep learning models are a powerful alternative to traditional ensemble learning methods. Ensemble deep learning has shown excellent performance in sentiment analysis. For example, the researchers in Deriu et al. (2016); Akhtyamova et al. (2017) applied two ensemble methods, voting and stacking, based on CNN for English sentiment analysis. Similarly, the work in Xu et al. (2016); Araque et al. (2017); Mohammadi and Shaverizade (2021); Haralabopoulos et al. (2020) applied voting and stacking based on LSTM and CNN for English sentiment analysis. However, the researchers in Heikal et al. (2018) applied voting based on CNN, GRU, and LSTM for Arabic sentiment analysis.

3 Proposed meta-ensemble deep learning approach

The architecture of the meta-ensemble deep learning approach consists of three layers, level-1, level-2, and level-3, as shown in Fig. 1. Level 1 represents the input layer, where each board of M models is trained independently using a unique training dataset and different deep architectures. Level 2 represents the meta-learner’s hidden layer, in which the prediction outputs of each board’s models in the previous layer are combined using a meta-learner. Level 3 represents the output meta-learner layer. At this level, all predictions of the level-2 meta-learners are combined using the final-level meta-learner to produce the final results. In abstract form, the proposed approach can be seen as a general meta-neural network in which level 1 is the input layer, level 2 is the hidden layer that acts as an activation function, and level 3 is the output layer.

Fig. 1 The general architecture of the proposed meta-ensemble deep learning approach

3.1 Description of the proposed Algorithm

The training procedure of the proposed approach is formalized in Algorithm 1. The algorithm starts by randomly generating n equally sized samples from a training dataset \(Data^{(0)}\). Each data sample \(Data^{(0)}_i = (train^{(0)}_i, test^{(0)}_i)\) is split into two parts: training and testing data. In the baseline learning procedure, the level-1 models are generated by applying M baseline deep learners \(BL_j\), \(1 \le j \le M\), to each training set \(train^{(0)}_i\). As a result, we have n boards \(C_i, 1 \le i \le n\), each containing M diverse baseline models \(C_i = \{Model_{i1}, Model_{i2}, \dots , Model_{iM}\}\). The test parts \(test^{(0)}_i = (X^{(0)}, Y^{(0)})\) of the n data samples are then used to create the metadata \(Data^{(1)}_i\) of the next level by stacking the predicted outputs of the models in each board. Each \(Data^{(1)}_i\) in level-2 has \(M+1\) features: M features result from the predictions of the models in board \(C_i\) on \(test^{(0)}_i\), and one extra feature represents the class label \(Y^{(0)}\). In level-2, once the metadata has been generated, a set ShallowClf of n shallow meta-classifiers is used to generate the level-2 models. Following the creation of the level-2 models, \(test^{(1)}_i = (X^{(1)}, Y^{(1)})\) are utilized to construct the final metadata of level-3. As in the previous level, the top metadata are generated in two steps. The first step generates \(Data^{(2)}_i\) with \(n+1\) features, which result from the predictions of the level-2 models on \(X^{(1)}\) and the target class \(Y^{(1)}\). In the next step, the \(Data^{(2)}_i\) are combined to form the final metadata. A final meta-learner is then trained on this top metadata in the level-3 learning phase.
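The data flow through the three levels can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `NearestCentroid` class is a hypothetical stand-in for both the deep baseline learners and the shallow meta-classifiers, all function names are invented for this sketch, and the level-3 metadata is built from a separate hold-out set that is pushed through every board, which is one possible reading of the procedure.

```python
import random


class NearestCentroid:
    """Tiny stand-in learner; any object with fit/predict could replace it."""

    def __init__(self, features=None):
        self.features = features  # optional feature indices, used for diversity

    def _view(self, x):
        return list(x) if self.features is None else [x[j] for j in self.features]

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [self._view(x) for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids,
                    key=lambda c: dist(self._view(x), self.centroids[c]))
                for x in X]


def split(rows, ratio=0.8):
    cut = int(len(rows) * ratio)
    return rows[:cut], rows[cut:]


def unzip(rows):
    X, y = zip(*rows)
    return list(X), list(y)


def to_rows(columns):
    """Turn per-model prediction columns into per-example feature rows."""
    return [list(r) for r in zip(*columns)]


def board_features(board, X):
    # M meta-features per example: one prediction from each model in the board.
    return to_rows([model.predict(X) for model in board])


def level2_features(boards, metas, X):
    # n meta-features per example: one prediction from each level-2 learner.
    return to_rows([meta.predict(board_features(board, X))
                    for board, meta in zip(boards, metas)])


def fit_meta_ensemble(data, n, base_factories, make_meta, make_final, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)
    data, holdout = split(data)      # assumption: hold-out rows feed level 3
    size = len(data) // n
    boards, metas = [], []
    for i in range(n):
        train0, test0 = split(data[i * size:(i + 1) * size])
        X_tr, y_tr = unzip(train0)
        board = [make().fit(X_tr, y_tr) for make in base_factories]  # level 1
        X_te, y_te = unzip(test0)
        metas.append(make_meta().fit(board_features(board, X_te), y_te))  # level 2
        boards.append(board)
    X_h, y_h = unzip(holdout)
    final = make_final().fit(level2_features(boards, metas, X_h), y_h)  # level 3
    return boards, metas, final


def predict(ensemble, X):
    boards, metas, final = ensemble
    return final.predict(level2_features(boards, metas, X))


# Toy, well-separated two-class data (hypothetical, not from the paper).
rng = random.Random(1)
toy = [([rng.gauss(5 * c, 1), rng.gauss(5 * c, 1)], c)
       for c in (0, 1) for _ in range(200)]
bases = [lambda: NearestCentroid([0]), lambda: NearestCentroid([1]), NearestCentroid]
ensemble = fit_meta_ensemble(toy, n=4, base_factories=bases,
                             make_meta=NearestCentroid, make_final=NearestCentroid)
print(predict(ensemble, [[0.0, 0.0], [5.0, 5.0]]))
```

Because the learners are only required to expose `fit` and `predict`, the same skeleton would accept wrapped deep models at level 1 and shallow classifiers such as an SVM at levels 2 and 3.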

figure a

4 Experiment results

This section describes the benchmark datasets used for sentiment analysis and the selection of the baseline deep models and shallow meta-classifiers within the proposed meta-ensemble deep learning approach.

4.1 Description of benchmark datasets

To evaluate the extended meta-ensemble deep learning approach, we selected six sentiment benchmark datasets covering English, Arabic, and different dialects. The first dataset, which we propose, is called “Arabic-Egyptian Corpus 2”. It is made up of 40,000 annotated tweets from the corpus of Mohammed and Kora (2019) and an extension of 10K tweets, which is available in Kora and Mohammed (2022). The extension consists of 5K positive and 5K negative tweets in the Arabic language and the Egyptian dialect. The second dataset includes tweets in the Saudi dialect related to distance learning during the Covid-19 pandemic (Aljabri et al. 2021). It contains a total of 1675 tweets, with more positive tweets than negative tweets. The third dataset is ASTD (Nabil et al. 2015). It contains about 10K Arabic tweets from different dialects, annotated as positive, neutral, negative, and mixed, of which 797 are positive and 1682 are negative (Table 2). The fourth dataset is ArSenTD-LEV (Al-Laith and Shahbaz 2021). It contains 4000 tweets from countries in the Levant region, such as Jordan, Palestine, Lebanon, and Syria. The fifth dataset is Movie Reviews (Koh et al. 2010). It contains 10,662 reviews, divided into 5331 negative and 5331 positive reviews. The sixth dataset is the Twitter US Airline Sentiment dataset (Rane and Kumar 2018). It contains 14,600 customer tweets about six airlines in the US, with negative, positive, and neutral sentiments. Table 3 summarizes the characteristics of the benchmark datasets. In general, the textual data was preprocessed using one-hot encoding or word embedding (Lai et al. 2016) as an initial layer before training the network. Only the positive and negative polarity labels are used for each dataset; the other polarity labels are discarded.
In our experiments, we divided each benchmark dataset into training and validation test sets with a ratio of ( \(80\%\) , \(20\%\) ). In addition, we divided each benchmark dataset into eight partitions.
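The preparation steps described above (keeping only binary labels, forming eight partitions, and splitting 80%/20%) can be sketched as follows. This is only an illustration: the function name `prepare`, the label strings, and the choice to apply the 80/20 split inside each partition are assumptions, since the text does not state the order of the two operations.

```python
import random


def prepare(rows, n_parts=8, train_ratio=0.8, seed=0):
    """rows: (text, label) pairs. Returns n_parts (train, test) pairs."""
    # Keep only the binary polarity labels; neutral or mixed rows are dropped.
    binary = [(text, label) for text, label in rows
              if label in ("positive", "negative")]
    random.Random(seed).shuffle(binary)
    size = len(binary) // n_parts          # leftover rows are discarded
    cut = int(size * train_ratio)
    parts = [binary[i * size:(i + 1) * size] for i in range(n_parts)]
    return [(part[:cut], part[cut:]) for part in parts]


# Hypothetical toy rows: 100 positive, 100 negative, 50 neutral.
rows = ([("good", "positive")] * 100 + [("bad", "negative")] * 100
        + [("meh", "neutral")] * 50)
splits = prepare(rows)
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 8 20 5
```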

4.2 Baseline deep learning models

To improve sentiment prediction with the proposed meta-ensemble deep learning approach, we first need to build, for each benchmark dataset, a set of deep learning models that serve as its baseline classifiers. Three deep baseline models are used in this research. Long Short-Term Memory (LSTM) is the first baseline deep model utilized in our evaluation (Mohammed and Kora 2019). The LSTM model is a well-known architecture for representing sequential data. It was designed to capture long-term dependencies better than the standard recurrent neural network model. The LSTM architecture comprises three gates: the input gate, the forget gate, and the output gate. The Gated Recurrent Unit (GRU) is the second baseline deep model (Pan et al. 2020). The GRU model is comparable to the LSTM model, except that it contains fewer parameters. GRU comprises two gates: the reset gate and the update gate. The Convolutional Neural Network (CNN) is the third baseline deep model (Abdulnabi et al. 2015). The CNN model is a feedforward neural network consisting of one or more convolutional layers, a pooling layer for integration, and a fully connected layer. In general, each deep baseline model is trained with different hyperparameters. Table 4 shows the configurations of the baseline deep learning models.

Table 5 shows the accuracy on each data split within each dataset and the average accuracy of each baseline deep model on each dataset. On the first dataset, Arabic-Egyptian Corpus, the highest average accuracy is 89.38%, obtained by the LSTM model. On the second dataset, Saudi Arabia Tweets, it is 65.38%, obtained by the LSTM2 model. On the third dataset, ASTD, it is 71.6%, obtained by the LSTM model. On the fourth dataset, ArSenTD-LEV, it is 76.2%, obtained by the LSTM model. On the fifth dataset, Movie Reviews, it is 78.03%, obtained by the LSTM1 model. Finally, on the sixth dataset, Twitter US Airline Sentiment, it is 80.05%, obtained by the LSTM1 model. In total, 114 deep baseline models were trained in the conducted experiments. The number of baseline models varies across datasets: Saudi Arabia Tweets, Movie Reviews, and Twitter US Airline Sentiment use 4 deep baseline models, while ASTD and ArSenTD-LEV use 3.
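For reference, the LSTM and GRU gates named in this section are commonly written as follows. This is the usual textbook formulation and not necessarily the exact variant trained in the experiments; the symbols are introduced here for illustration only: \(x_t\) is the input at step t, \(h_t\) the hidden state, \(c_t\) the LSTM cell state, \(W\), \(U\), and \(b\) the learned parameters, \(\sigma\) the logistic sigmoid, and \(\odot\) the element-wise product.

\[
\begin{aligned}
\text{LSTM:}\quad i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)\\[6pt]
\text{GRU:}\quad z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) && \text{(update gate)}\\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\]

In this formulation the GRU learns three parameter sets \((W, U, b)\) where the LSTM learns four, which accounts for its smaller parameter count at the same hidden size.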

4.3 Meta-ensemble classifiers

To combine the trained baseline deep models within the boards of models, we use a set of shallow meta-classifiers as top meta-learners: Support Vector Machines (SVM), Gradient Boosting (GB), Naive Bayes (NB), Random Forest (RF), and Logistic Regression (LG). Table 6 reports the accuracy of the proposed meta-ensemble method on each dataset. On the first dataset, Arabic-Egyptian Corpus, the ensemble with the SVM classifier achieved the best accuracy in both hard and soft prediction, with scores of 92.6% and 93.2%, respectively. On the second dataset, Saudi Arabia Tweets, the ensemble with the SVM classifier achieved the best hard prediction accuracy of 69.9%, while the ensembles with the SVM and LG classifiers both achieved the best soft prediction accuracy of 72.3%. On the third dataset, ASTD, the ensembles with the SVM and LG classifiers both achieved the best hard prediction accuracy of 75.9%, while the ensemble with the LG classifier achieved the best soft prediction accuracy of 77.6%. On the fourth dataset, ArSenTD-LEV, the ensemble with the SVM classifier achieved the best hard prediction accuracy of 80.4%, while the ensemble with the LG classifier achieved the best soft prediction accuracy of 83.2%. On the fifth dataset, Movie Reviews, the ensemble with the SVM classifier achieved the best accuracy in both hard and soft prediction, with scores of 80.9% and 83.9%, respectively. On the sixth dataset, Twitter US Airline Sentiment, the ensemble with the SVM classifier achieved the best hard prediction accuracy of 82.9%, while the ensemble with the GB classifier achieved the best soft prediction accuracy of 85.3%.

Table 7 compares the highest average accuracy of the baseline deep models with the highest accuracy of the meta-ensemble classifiers on each dataset. On every dataset, the highest accuracy was obtained by the proposed meta-ensemble with soft prediction. Among the baseline deep models, the LSTM models obtained the highest average accuracy on every dataset. In general, the best-performing meta-classifier for the final prediction differs across datasets. Using 5-fold cross-validation on the predictions of the deep baseline models, SVM is the most frequent best combiner for fusing the level-1 boards of models, with 93.2%, 72.3%, and 83.9% on the Arabic-Egyptian Corpus, Saudi Arabia Tweets, and Movie Reviews datasets, respectively. LG is the best combiner on the ASTD and ArSenTD-LEV datasets, with 77.6% and 83.2%, respectively. Finally, GB is the best combiner on the Twitter US Airline Sentiment dataset, with 85.3%.
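The difference between hard and soft prediction can be illustrated with a short sketch. It assumes that "hard" means the base models pass 0/1 class labels to the meta-learner and "soft" means they pass class probabilities; the function names and the probability values are hypothetical.

```python
def soft_features(prob_rows):
    """Keep each base model's positive-class probability as a meta-feature."""
    return [list(row) for row in prob_rows]


def hard_features(prob_rows, threshold=0.5):
    """Collapse each probability to a 0/1 class label before stacking."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_rows]


# Hypothetical outputs of three base models for two tweets.
probs = [
    [0.51, 0.55, 0.10],   # two models barely positive, one clearly negative
    [0.99, 0.95, 0.45],   # two models clearly positive, one barely negative
]
print(hard_features(probs))  # both rows become [1, 1, 0]
print(soft_features(probs))  # the rows stay distinct
```

With hard labels the meta-learner cannot tell the two tweets apart, whereas the soft features preserve each model's confidence. This is one plausible reason for the higher soft prediction scores reported above.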

5 Conclusion

Deep learning models have shown great success in sentiment analysis in the literature. However, building an effective deep learning model requires great effort to find the best neural network architecture and the best hyperparameter configuration. One approach for tackling these limitations is to use ensemble methods. The key idea of an ensemble is to produce a powerful learner from a combination of weak learners. Thus, in this research paper, we proposed a meta-ensemble deep learning approach to improve the performance of sentiment analysis. The proposed approach combines the predictions of several groups of deep models using three levels of meta-learning. We also proposed the benchmark dataset “Arabic-Egyptian Corpus 2”. This corpus comprises 10,000 annotated tweets written in colloquial Arabic on various topics and is added to the original version in Mohammed and Kora (2019), which contains 40K annotated tweets. To test and evaluate the performance of the proposed meta-ensemble deep learning approach, we conducted several experiments on six public benchmark datasets for sentiment analysis involving several languages and dialects. We trained sets of baseline classifiers (GRU, LSTM, and CNN) on each benchmark dataset and compared their best model with the proposed meta-ensemble deep learning approach. In particular, we trained 114 deep models and compared five different shallow meta-classifiers for combining those models. The experimental results revealed that the meta-ensemble deep learning approach outperforms the baseline deep learning models on all six benchmark datasets. The experiments also suggested that the meta-learners work better when the predictions of the involved layers take the form of probability distributions.

In summary, the proposed ensemble approach uses parallel ensemble techniques in which the baseline learners are generated simultaneously, as there is no data dependency, and the fusion methods depend on the meta-learning method. However, our proposed approach has some challenges and limitations, such as determining the appropriate number of baseline models and selecting baseline models that can be relied upon to generate the best predictions from each dataset when designing the meta-ensemble deep learning approach from scratch. In addition, computation time becomes an added difficulty when the amount of available data grows exponentially. Multi-label classification also raises problems such as overfitting and the curse of dimensionality when the data are high-dimensional. Handling multi-class problems in a multi-level ensemble is worth investigating. Finally, transformer models have recently received more attention in NLP tasks, and the impact of ensemble learning with transformers is worth investigating through extensive experiments.

References

Abdulnabi AH, Wang G, Lu J, Jia K (2015) Multi-task cnn model for attribute prediction. IEEE Trans Multimedia 17(11):1949–1959

Ahmed S, Pasquier M, Qadah G (2013) Key issues in conducting sentiment analysis on arabic social media text. In: 2013 9th International conference on innovations in information technology (IIT), pp 72–77. IEEE

van Aken B, Risch J, Krestel R, Löser A (2018) Challenges for toxic comment classification: an in-depth error analysis. In: ALW

Akhtyamova L, Ignatov A, Cardiff J (2017) A large-scale cnn ensemble for medication safety analysis. In: International conference on applications of natural language to information systems, pp 247–253. Springer

Al-Laith A, Shahbaz M (2021) Tracking sentiment towards news entities from arabic news on social media. Future Gener Comput Syst 118:467–484

Aljabri M, Chrouf SMB, Alzahrani NA, Alghamdi L, Alfehaid R, Alqarawi R, Alhuthayfi J, Alduhailan N (2021) Sentiment analysis of arabic tweets regarding distance learning in saudi arabia during the covid-19 pandemic. Sensors 21(16):5431

Alojail M, Bhatia S (2020) A novel technique for behavioral analytics using ensemble learning algorithms in e-commerce. IEEE Access 8:150072–150080

Alomari KM, ElSherif HM, Shaalan K (2017) Arabic tweets sentimental analysis using machine learning. In: International conference on industrial, engineering and other applications of applied intelligent systems, pp 602–610. Springer

Alrehili A, Albalawi K (2019) Sentiment analysis of customer reviews using ensemble method, pp 1–6

Araque O, Corcuera-Platas I, Sánchez-Rada JF, Iglesias CA (2017) Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Expert Syst Appl 77:236–246

Baly R, El-Khoury G, Moukalled R, Aoun R, Hajj H, Shaban KB, El-Hajj W (2017) Comparative evaluation of sentiment analysis methods across arabic dialects. Procedia Comput Sci 117:266–273

Bethard S, Savova G, Chen WT, Derczynski L, Pustejovsky J, Verhagen M (2016) Semeval-2016 task 12: clinical tempeval. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 1052–1062

Cambria E, Das D, Bandyopadhyay S, Feraco A, et al (2017) A practical guide to sentiment analysis

Cambria E, Schuller B, Xia Y, Havasi C (2013) New avenues in opinion mining and sentiment analysis. IEEE Intell Syst 28(2):15–21

Chan S, Reddy V, Myers B, Thibodeaux Q, Brownstone N, Liao W (2020) Machine learning in dermatology: current applications, opportunities, and limitations. Dermatol Therapy 10(3):365–386

Chaovalit P, Zhou L (2005) Movie review mining: a comparison between supervised and unsupervised classification approaches. In: Proceedings of the 38th annual Hawaii international conference on system sciences, pp 112c–112c. IEEE

Chen L, Wang W, Nagarajan M, Wang S, Sheth A (2012) Extracting diverse sentiment expressions with target-dependent polarity from twitter. In: Proceedings of the international AAAI conference on web and social media, vol 6, pp 50–57

Chen Y, Yuan J, You Q, Luo J (2018) Twitter sentiment analysis via bi-sense emoji embedding and attention-based lstm. In: 2018 ACM Multimedia conference on multimedia conference, pp 117–125. ACM

Cho SB, Won HH (2003) Machine learning in dna microarray analysis for cancer classification. In: Proceedings of the First Asia-Pacific bioinformatics conference on bioinformatics 2003-volume 19, pp 189–198

Clark S, Wicentowski R (2013) Swatcs: combining simple classifiers with estimated accuracy. In: Second joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the seventh international workshop on semantic evaluation (SemEval 2013), pp 425–429

Collobert R, Weston J (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th international conference on machine learning, pp 160–167

Da Silva NF, Hruschka ER, Hruschka ER Jr (2014) Tweet sentiment analysis with classifier ensembles. Decis Support Syst 66:170–179

Deriu J, Gonzenbach M, Uzdilli F, Lucchi A, Luca VD, Jaggi M (2016) Swisscheese at semeval-2016 task 4: sentiment classification using an ensemble of convolutional neural networks with distant supervision. In: Proceedings of the 10th international workshop on semantic evaluation, pp 1124–1128

Duwairi RM, Marji R, Sha’ban N, Rushaidat S (2014) Sentiment analysis in arabic tweets. In: 2014 5th International conference on information and communication systems (ICICS), pp 1–6. IEEE

Dzikovska MO, Nielsen RD, Brew C, Leacock C, Giampiccolo D, Bentivogli L, Clark P, Dagan I, Dang HT (2013) Semeval-2013 task 7: the joint student response analysis and 8th recognizing textual entailment challenge. North Texas State Univ Denton, Tech. rep

Fersini E, Messina E, Pozzi FA (2014) Sentiment analysis: Bayesian ensemble learning. Decis Support Syst 68:26–38

Fersini E, Messina E, Pozzi FA (2016) Expressive signals in social media languages to improve polarity detection. Inf Process Manag 52(1):20–35

Forouzandeh S, Berahmand K, Rostami M (2021) Presentation of a recommender system with ensemble learning and graph embedding: a case on movielens. Multimedia Tools Appl 80(5):7805–7832

Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision. CS224N project report, Stanford 1(12), 2009

Graves A (2012) Long short-term memory. Supervised sequence labelling with recurrent neural networks, pp 37–45

Habimana O, Li Y, Li R, Gu X, Yu G (2020) Sentiment analysis using deep learning approaches: an overview. Sci China Inf Sci 63(1):1–36

Haralabopoulos G, Anagnostopoulos I, McAuley D (2020) Ensemble deep learning for multilabel binary classification of user-generated content. Algorithms 13(4):83

Heikal M, Torki M, El-Makky N (2018) Sentiment analysis of arabic tweets using deep learning. Procedia Comput Sci 142:114–122

Kanakaraj M, Guddeti RMR (2015) Performance analysis of ensemble methods on twitter sentiment analysis using nlp techniques. In: Proceedings of the 2015 IEEE 9th international conference on semantic computing (IEEE ICSC 2015), pp 169–170. IEEE

Karimi S, Metke-Jimenez A, Kemp M, Wang C (2015) Cadec: a corpus of adverse drug event annotations. J Biomed Inform 55:73–81

Koh NS, Hu N, Clemons EK (2010) Do online reviews reflect a product’s true perceived quality? An investigation of online movie reviews across cultures. Electron Commer Res Appl 9(5):374–385

Kora R, Mohammed A (2022) Arabic-Egyptian Corpus 2.

Kulkarni NH, Srinivasan G, Sagar B, Cauvery N (2018) Improving crop productivity through a crop recommendation system using ensembling technique. In: 2018 3rd International conference on computational systems and information technology for sustainable solutions (CSITSS), pp 114–119. IEEE

Kumar G, Misra AK (2018) Commonality in liquidity: evidence from India’s national stock exchange. J Asian Econ 59:1–15

Kumar V, Aydav PSS, Minz S (2021) Multi-view ensemble learning using multi-objective particle swarm optimization for high dimensional data classification. J King Saud Univ-Comput Inf Sci

Lai S, Liu K, He S, Zhao J (2016) How to generate a good word embedding. IEEE Intell Syst 31(6):5–14

Le NQK, Yapp EKY, Yeh HY (2019) Et-gru: using multi-layer gated recurrent units to identify electron transport proteins. BMC Bioinform 20(1):1–12

Li FH, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: Twenty-second international joint conference on artificial intelligence

Li S, Lee SY, Chen Y, Huang CR, Zhou G (2010) Sentiment classification and polarity shifting. In: Proceedings of the 23rd international conference on computational linguistics (Coling 2010), pp 635–643

Li W, Wang W, Chen Y (2012) Heterogeneous ensemble learning for Chinese sentiment classification. J Inf Comput Sci 9(15):4551–4558

Lu B, Tsou BK (2010) Combining a large sentiment lexicon and machine learning for subjectivity classification. In: 2010 international conference on machine learning and cybernetics, vol 6, pp 3311–3316. IEEE

Mejova Y (2009) Sentiment analysis: an overview. University of Iowa, Computer Science Department

Mohammadi A, Shaverizade A (2021) Ensemble deep learning for aspect-based sentiment analysis. Int J Nonlinear Anal Appl 12(Special Issue):29–38

Mohammed A, Kora R (2019) Deep learning approaches for arabic sentiment analysis. Soc Netw Anal Min 9(1):52

Mohammed A, Kora R (2021) An effective ensemble deep learning framework for text classification. J King Saud Univ-Comput Inf Sci

Moitra D, Mandal RK (2019) Automated ajcc staging of non-small cell lung cancer (nsclc) using deep convolutional neural network (cnn) and recurrent neural network (rnn). Health Inf Sci Syst 7(1):1–12

Nabil M, Aly M, Atiya A (2015) Astd: Arabic sentiment tweets dataset. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2515–2519

Nakov P, Rosenthal S, Kiritchenko S, Mohammad SM, Kozareva Z, Ritter A, Stoyanov V, Zhu X (2016) Developing a successful semeval task in sentiment analysis of twitter and other social media texts. Lang Resour Eval 50(1):35–65

Naresh A, Venkata Krishna P (2021) An efficient approach for sentiment analysis using machine learning algorithm. Evol Intel 14(2):725–731

Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Syst Appl 62:1–16

Oussous A, Lahcen AA, Belfkih S (2018) Improving sentiment analysis of moroccan tweets using ensemble learning. In: International conference on big data, cloud and applications, pp 91–104. Springer

Pan M, Zhou H, Cao J, Liu Y, Hao J, Li S, Chen CH (2020) Water level prediction model based on gru and cnn. IEEE Access 8:60090–60100

Pang B, Lee L (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In: ACL

Pashaei Barbin J, Yousefi S, Masoumi B (2020) Efficient service recommendation using ensemble learning in the internet of things (iot). J Ambient Intell Humaniz Comput 11(3):1339–1350

Pasupulety U, Anees AA, Anmol S, Mohan BR (2019) Predicting stock prices using ensemble learning and sentiment analysis. In: 2019 IEEE second international conference on artificial intelligence and knowledge engineering (AIKE), pp 215–222. IEEE

Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers. Eng Appl Artif Intell 51:191–201

Pontiki M, Galanis D, Papageorgiou H, Androutsopoulos I, Manandhar S, Mohammad AS, Al-Ayyoub M, Zhao Y, Qin B, De Clercq O, et al (2016) Semeval-2016 task 5: aspect based sentiment analysis. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 19–30

Prusa J, Khoshgoftaar TM, Dittman DJ (2015) Using ensemble learners to improve classifier performance on tweet sentiment data. In: 2015 IEEE international conference on information reuse and integration, pp 252–257. IEEE

Rane A, Kumar A (2018) Sentiment classification system of twitter data for us airline service analysis. In: 2018 IEEE 42nd annual computer software and applications conference (COMPSAC), vol 1, pp 769–773. IEEE

Rodriguez-Penagos C, Atserias J, Codina-Filba J, García-Narbona D, Grivolla J, Lambert P, Saurí R (2013) Fbm: combining lexicon-based ml and heuristics for social media polarities. In: Second joint conference on lexical and computational semantics (*SEM), volume 2: proceedings of the seventh international workshop on semantic evaluation (SemEval 2013), pp 483–489

Rojas-Barahona LM (2016) Deep learning for sentiment analysis. Lang Linguist Compass 10(12):701–719

Rushdi-Saleh M, Martín-Valdivia MT, Ureña-López LA, Perea-Ortega JM (2011) Oca: opinion corpus for arabic. J Am Soc Inform Sci Technol 62(10):2045–2054

Saeed RM, Rady S, Gharib TF (2022) An ensemble approach for spam detection in arabic opinion texts. J King Saud Univ-Comput Inf Sci 34(1):1407–1416

Sagi O, Rokach L (2018) Ensemble learning: a survey. Wiley Interdiscip Rev: Data Min Knowl Discovery 8(4):e1249

Saif H, Fernandez M, He Y, Alani H (2013) Evaluation datasets for twitter sentiment analysis: a survey and a new dataset, the sts-gold

Saleena N et al (2018) An ensemble classification system for twitter sentiment analysis. Procedia Comput Sci 132:937–946

Seki Y, Evans DK, Ku LW, Sun L, Chen HH, Kando N (2008) Overview of multilingual opinion analysis task at ntcir-7. In: NTCIR, pp 185–203. Citeseer

Shahzad RK, Lavesson N (2013) Comparative analysis of voting schemes for ensemble-based malware detection. J Wirel Mobile Netw Ubiquitous Comput Depend Appl 4(1):98–117

Sharma S, Srivastava S, Kumar A, Dangi A (2018) Multi-class sentiment analysis comparison using support vector machine (svm) and bagging technique-an ensemble method. In: 2018 International conference on smart computing and electronic enterprise (ICSCEE), pp 1–6. IEEE

Shipp CA, Kuncheva LI (2002) Relationships between combination methods and measures of diversity in combining classifiers. Inf Fusion 3(2):135–148

Shoukry A, Rafea A (2012) Sentence-level arabic sentiment analysis. In: 2012 International conference on collaboration technologies and systems (CTS), pp 546–550. IEEE

Speriosu M, Sudan N, Upadhyay S, Baldridge J (2011) Twitter polarity classification with label propagation over lexical links and the follower graph. In: Proceedings of the first workshop on unsupervised learning in NLP, pp 53–63

Stamatatos E, Widmer G (2002) Music performer recognition using an ensemble of simple classifiers. In: ECAI, pp 335–339

Su Y, Zhang Y, Ji D, Wang Y, Wu H (2012) Ensemble learning for sentiment classification. In: Workshop on Chinese lexical semantics, pp 84–93. Springer

Tan KL, Lee CP, Lim KM, Anbananthen KSM (2022) Sentiment analysis with ensemble hybrid deep learning model. IEEE Access 10:103694–103704

Tasci E, Uluturk C, Ugur A (2021) A voting-based ensemble deep learning method focusing on image augmentation and preprocessing variations for tuberculosis detection. Neural Comput Appl, pp 1–15

Tratz S, Briesch D, Laoudi J, Voss C (2013) Tweet conversation annotation tool with a focus on an arabic dialect, moroccan darija. In: Proceedings of the 7th linguistic annotation workshop and interoperability with discourse, pp 135–139

Tsutsumi K, Shimada K, Endo T (2007) Movie review classification based on a multiple classifier. In: Proceedings of the 21st pacific Asia conference on language, information and computation, pp 481–488

Tuysuzoglu G, Birant D, Pala A (2018) Ensemble methods in environmental data mining. Sch Environ Sci, pp 1–16

Wagh R, Punde P (2018) Survey on sentiment analysis using twitter dataset. In: 2018 Second international conference on electronics, communication and aerospace technology (ICECA), pp 208–211. IEEE

Wang G, Sun J, Ma J, Xu K, Gu J (2014) Sentiment classification: the contribution of ensemble learning. Decis Support Syst 57:77–93

Wang XY, Zhang BB, Yang HY (2013) Active svm-based relevance feedback using multiple classifiers ensemble and features reweighting. Eng Appl Artif Intell 26(1):368–381

Whitehead M, Yaeger L (2009) Building a general purpose cross-domain sentiment mining model. In: 2009 WRI world congress on computer science and information engineering, vol 4, pp 472–476. IEEE

Wiebe J, Wilson T, Cardie C (2005) Annotating expressions of opinions and emotions in language. Lang Resour Eval 39(2):165–210

Wilson T, Wiebe J, Hwa R (2006) Recognizing strong and weak opinion clauses. Comput Intell 22(2):73–99

Xia R, Xu F, Yu J, Qi Y, Cambria E (2016) Polarity shift detection, elimination and ensemble: a three-stage model for document-level sentiment analysis. Inf Process Manag 52(1):36–45

Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment classification. Inf Sci 181(6):1138–1152

Xu S, Liang H, Baldwin T (2016) Unimelb at semeval-2016 tasks 4a and 4b: An ensemble of neural networks and a word2vec based model for sentiment classification. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 183–189

Yadav A, Vishwakarma DK (2020) Sentiment analysis using deep learning architectures: a review. Artif Intell Rev 53(6):4335–4385

Yaman MA, Subasi A, Rattay F (2018) Comparison of random subspace and voting ensemble machine learning methods for face recognition. Symmetry 10(11):651

Zhang C, Ma Y (2012) Ensemble machine learning: methods and applications. Springer

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Authors and affiliations.

Department of Computer Science, Faculty of Graduate Studies for Statistical Researches, Cairo University, Cairo, Egypt

Rania Kora & Ammar Mohammed


The paper was written by AM and RK. The paper was reviewed by AM.

Corresponding author

Correspondence to Ammar Mohammed.

Ethics declarations

Conflict of interest.

The authors declare no conflict of interest.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit .

About this article

Cite this article.

Kora, R., Mohammed, A. An enhanced approach for sentiment analysis based on meta-ensemble deep learning. Soc. Netw. Anal. Min. 13, 38 (2023).

Received: 31 December 2022

Revised: 27 January 2023

Accepted: 17 February 2023

Published: 02 March 2023



What is sentiment analysis?

From survey results and customer reviews to social media mentions and chat conversations, today’s businesses have access to data from numerous sources. But how can teams turn all of that data into meaningful insights? Find out how sentiment analysis can help.

When it comes to branding, simply having a great product or service is not enough.  In order to determine the true impact of a brand, organizations must leverage data from across customer feedback channels to fully understand the market perception of their offerings.

Quantitative feedback available via metrics such as net promoter scores can provide some information about brand performance, but qualitative feedback in the form of unstructured data provides more nuanced insight into how people actually “feel” about your brand.

Sifting through textual data, however, can be extremely time-consuming. Whether analyzing solicited feedback via channels such as surveys or examining unsolicited feedback found on social media, online forums, and more, it’s impossible to comprehensively identify and integrate data on brand sentiment when relying solely on manual processes.

Leveraging an omnichannel analytics platform allows teams to collect all of this information and aggregate it into a complete view. Once obtained, there are many ways to analyze and enrich the data, one of which involves conducting sentiment analysis. Sentiment analysis can be used to improve customer experience through direct and indirect interactions with your brand. Let’s consider the definition of sentiment analysis, how it works and when to use it.


Sentiment refers to the positivity or negativity expressed in text. Sentiment analysis provides an effective way to evaluate written or spoken language to determine if the expression is favorable, unfavorable, or neutral, and to what degree. Because of this, it gives a useful indication of how the customer felt about their experience.

If you’ve ever left an online review, made a comment about a brand or product online, or answered a large-scale market research survey, there’s a chance your responses have been through sentiment analysis.

Sentiment analysis is part of the greater umbrella of text mining, also known as text analysis. This type of analysis extracts meaning from many sources of text, such as surveys, reviews, public social media, and even articles on the Web. A score is then assigned to each clause based on the sentiment expressed in the text, for example -1 for negative sentiment and +1 for positive sentiment. This is done using natural language processing (NLP).

Positive neutral and negative sentiment chart

Today’s algorithm-based sentiment analysis tools can handle huge volumes of customer feedback consistently and accurately. As a type of text analysis, sentiment analysis reveals how positive or negative customers feel about topics ranging from your products and services to your location, your advertisements, or even your competitors.

Accurate sentiment analysis can be difficult to conduct, so what’s the benefit? Why do we use an AI-powered tool to categorize natural language feedback rather than our human brains?

Mostly, it’s a question of scale. Sentiment analysis is helpful when you have a large volume of text-based information that you need to generalize from.

For example, let’s say you work on the marketing team at a major motion picture studio, and you just released a trailer for a movie that got a huge volume of comments on Twitter.

You can read some – or even a lot – of the comments, but you won’t be able to get an accurate picture of how many people liked or disliked it unless you look at every last one and make a note of whether it was positive, negative or neutral. That would be prohibitively expensive and time-consuming, and the results would be prone to a degree of human error.

On top of that, you’d have a risk of bias coming from the person or people going through the comments. They might have certain views or perceptions that color the way they interpret the data, and their judgment may change from time to time depending on their mood, energy levels, and other normal human variations.

On the other hand, sentiment analysis tools provide a comprehensive, consistent overall verdict with a simple button press.

From there, it’s up to the business to determine how they’ll put that sentiment into action.

Sentiment analysis is critical because it helps provide insight into how customers perceive your brand.

Customer feedback – whether that’s via social media, the website, conversations with service agents, or any other source – contains a treasure trove of useful business information, but it isn’t enough to know what customers are talking about. Knowing how they feel will give you the most insight into how their experience was. Sentiment analysis is one way to understand those experiences.

Sometimes known as “opinion mining,” sentiment analysis can let you know if there has been a change in public opinion toward any aspect of your business. Peaks or valleys in sentiment scores give you a place to start if you want to make product improvements, train sales reps or customer care agents, or create new marketing campaigns.

We live in a world where huge amounts of written information are produced and published every moment, thanks to the internet, news articles, social media, and digital communications. Sentiment analysis can help companies keep track of how their brands and products are perceived, both at key moments and over a period of time.

It can also be used in market research, PR, marketing analysis, reputation management, stock analysis and financial trading, customer experience, product design, and many more fields.

Here are a few scenarios where sentiment analysis can save time and add value:

Airline onboard experience sentiment by category

Not all sentiment analysis is done the same way. There are different ways to approach it and a range of different algorithms and processes that can be used to do the job depending on the context of use and the desired outcome.

Basic sub-types of sentiment analysis include:

In addition, you can choose whether to view the results of sentiment analysis at:

Sentiment analysis is a powerful tool that offers a number of advantages, but like any research method, it has some limitations.

Advantages of sentiment analysis:

Disadvantages of sentiment analysis:

Sentiment analysis uses machine learning, statistics, and natural language processing (NLP) to find out how people think and feel on a macro scale. Sentiment analysis tools take written content and process it to unearth the positivity or negativity of the expression.

This is done in a couple of ways:

In some cases, the best results come from combining the two methods.

Sentiment analysis of client feedback

Developing sentiment analysis tools is technically an impressive feat, since human language is grammatically intricate, heavily context-dependent, and varies a lot from person to person. If you say “I loved it,” another person might say “I’ve never seen better,” or “Leaves its rivals in the dust”. The challenge for an AI tool is to recognize that all these sentences mean the same thing.

Another challenge is to decide how language is interpreted since this is very subjective and varies between individuals. What sounds positive to one person might sound negative or even neutral to someone else. In designing algorithms for sentiment analysis, data scientists must think creatively in order to build useful and reliable tools.

Getting the correct sentiment classification

Sentiment classification requires your sentiment analysis tools to be sophisticated enough to understand not only when a data snippet is positive or negative, but how to extrapolate sentiment even when both positive and negative words are used. On top of that, it needs to be able to understand context and complications such as sarcasm or irony.

Human beings are complicated, and how we express ourselves can be similarly complex. Many types of sentiment analysis tools use a simple view of polarity (positive/neutral/negative), which means much of the meaning behind the data is lost.

Let’s see an example:

“I hated the setup process, but the product was easy to use so in the end, I think my purchase was worth it.”

A less sophisticated sentiment analysis tool might see the sentiment expressed here as “neutral” because the positive – “the product was easy to use so, in the end, I think my purchase was worth it” – and negative-tagged sentiments – “I hated the setup process” – cancel each other out.

However, polarity isn’t so cut-and-dry as being one or the other here. The final part – “in the end, I think my purchase was worth it” – means that as a human analyzing the text, we can see that generally, this customer felt mostly positive about the experience. That’s why a scale from positive to negative is needed, and why a sentiment analysis tool adds weighting along a scale of 1-11.
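The cancelling-out effect described above can be illustrated with a minimal Python sketch. The tiny word lists, the split on "but", and the position-based weighting are all assumptions made for illustration; they are not how any particular commercial tool scores text.

```python
import re

# Tiny illustrative word lists (assumptions, not a real product's lexicon).
POSITIVE = {"easy", "worth", "loved"}
NEGATIVE = {"hated", "difficult"}

REVIEW = ("I hated the setup process, but the product was easy to use "
          "so in the end, I think my purchase was worth it.")


def clause_scores(text):
    """Split on the contrastive word 'but' and score each clause as -1, 0, or +1."""
    scores = []
    for clause in re.split(r"\bbut\b", text.lower()):
        words = re.findall(r"[a-z]+", clause)
        balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
        scores.append((balance > 0) - (balance < 0))  # sign of the balance
    return scores


def unweighted(scores):
    """Simple polarity: every clause counts the same, so opposites cancel."""
    return sum(scores)


def weighted(scores):
    """Hypothetical weighting: clause k (1-based) counts k times, so later clauses dominate."""
    return sum(k * s for k, s in enumerate(scores, start=1))


scores = clause_scores(REVIEW)
print(scores, unweighted(scores), weighted(scores))
```

With these assumptions the unweighted total is zero (a "neutral" verdict), while the weighted total is positive, which is closer to how a human reader would judge the review.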

How satisfied are you with our service? Likert scale question

Scores are assigned with attention to grammar, context, industry, and source, and Qualtrics gives users the ability to adjust the sentiment scores to be even more business-specific.

Understanding context

Context is key for a sentiment analysis model to be correct. This means you need to make sure that your sentiment scoring tool not only knows that “happy” is positive and that “not happy” is not, but also interprets context-dependent words correctly.

As human beings, we know customers are pleased when they mention how “thin” their new laptop is, but that they’re complaining when they talk about the “thin” walls in your hotel. We understand that context.

Obviously, a tool that flags “thin” as negative sentiment in all circumstances is going to lose accuracy in its sentiment scores. The context is important.

This is where training natural language processing (NLP) algorithms comes in. Natural language processing is a way of mimicking the human understanding of language, meaning context becomes more readily understood by your sentiment analysis tool.

Sentiment analysis algorithms are trained using this system over time, using deep learning to understand instances with context and apply that learning to future data. This is why a sophisticated sentiment analysis tool can help you to not only analyze vast volumes of data more quickly but also discern what context is common or important to your customers.

In a world of endless opinions on the Web, how people “feel” about your brand can be important for measuring the customer experience.

Consumers desire likable brands that understand them; brands that provide memorable on-and-offline experiences. The more in-tune a consumer feels with your brand, the more likely they’ll share feedback, and the more likely they’ll buy from you too. According to our Consumer trends research, 62% of consumers said that businesses need to care more about them, and 60% would buy more as a result.

But the opposite is true as well. As a matter of fact, 71 percent of Twitter users will take to the social media platform to voice their frustrations with a brand.

These conversations, both positive and negative, should be captured and analyzed to improve the customer experience. Sentiment analysis can help.

1. Text analysis for surveys

Surveys are a great way to connect with customers directly, and they’re also ripe with constructive feedback. The feedback within survey responses can be quickly analyzed for sentiment scores.

For the survey itself, consider questions that will generate qualitative customer experience metrics. Some examples include:

Remember, the goal here is to acquire honest textual responses from your customers so the sentiment within them can be analyzed. Another tip is to avoid close-ended questions that only generate “yes” or “no” responses. These types of questions won’t serve your analysis well.

Next, use a text analysis tool to break down the nuances of the responses. TextiQ is a tool that will not only provide sentiment scores but extract key themes from the responses.

After the sentiment is scored from survey responses, you’ll be able to address some of the more immediate concerns your customers have during their experiences.

Another great place to find text feedback is through customer reviews.

2. Text analysis for customer reviews

Did you know that 72 percent of customers will not take action until they’ve read reviews on a product or service? An astonishing 95 percent of customers read reviews prior to making a purchase. In today’s feedback-driven world, the power of customer reviews and peer insight is undeniable.

Review sites like G2 are common first-stops for customers looking for honest feedback on products and services. This feedback, like that in surveys, can be analyzed.

The benefit of customer reviews compared to surveys is that they’re unsolicited, which often leads to more honest and in-depth feedback.

To improve the customer experience, you can take the sentiment scores from customer reviews – positive, negative, and neutral – and identify gaps and pain points that may have not been addressed in the surveys. Remember, negative feedback is just as (if not more) beneficial to your business than positive feedback.

3. Text analysis for social media

Another way to acquire textual data is through social media analysis.

Monitoring tools ingest publicly available social media data on platforms such as Twitter and Facebook for brand mentions and assign sentiment scores accordingly. This has its upsides as well considering users are highly likely to take their uninhibited feedback to social media.

Regardless, a staggering 70 percent of brands don’t bother with feedback on social media. Because social media is an ocean of big data just waiting to be analyzed, brands could be missing out on some important information.

When choosing sentiment analysis technologies, bear in mind how you will use them. There are a number of options out there, from open-source solutions to in-built features within social listening tools. Some of them are limited in scope, while others are more powerful but require a high level of user knowledge.

Text iQ is a natural language processing tool within the Experience Management Platform™ that allows you to carry out sentiment analysis online using just your browser. It’s fully integrated, meaning that you can view and analyze your sentiment analysis results in the context of other data and metrics, including those from third-party platforms.

Like all our tools, it’s designed to be straightforward, clear, and accessible to those without specialized skills or experience, so there’s no barrier between you and the results you want to achieve.

When it comes to understanding the customer experience, the key is to always be on the lookout for customer feedback. Sentiment analysis is not a one-and-done effort and requires continuous monitoring. By reviewing your customers’ feedback on your business regularly, you can proactively get ahead of emerging trends and fix problems before it’s too late.  Acquiring feedback and analyzing sentiment can provide businesses with a deep understanding of how customers truly “feel” about their brand. When you’re able to understand your customers, you’re able to provide a more robust customer experience.


A Literature Review on Application of Sentiment Analysis Using Machine Learning Techniques

A Literature Review on Application of Sentiment Analysis Using Machine Learning Techniques. International Journal of Applied Engineering and Management Letters (IJAEML), 4(2), 41-77, 2020

37 Pages Posted: 1 Oct 2020

Krishna Prasad K

Srinivas University - Institute of Computer Science & Information Science

Date Written: August 09, 2020

Many businesses are using social media networks to deliver different services, connect with clients, and collect information about the thoughts and views of individuals. Sentiment analysis is a technique of machine learning that senses polarities such as positive or negative thoughts within the text, full documents, paragraphs, lines, or subsections. Machine Learning (ML) is a multidisciplinary field, a mixture of statistics and computer science algorithms that are commonly used in predictive and classification analyses. This paper presents the common techniques of analyzing sentiment from a machine learning perspective. In light of this, this literature review explores and discusses the idea of Sentiment analysis by undertaking a systematic review and assessment of corporate and community white papers, scientific research articles, journals, and reports. The goal and primary objectives of this article are to analytically categorize and analyze the prevalent research techniques and implementations of Machine Learning techniques to Sentiment Analysis on various applications. The limitation of this study is that the major focus is on the application side, thereby excluding the hardware and theoretical aspects related to the subject. Finally, this paper includes a research proposal for the e-commerce environment on sentiment analysis applying machine learning algorithms.

Keywords: Machine Learning Techniques, Sentiment Classification, Sentiment Analysis Applications


Research on non-dependent aspect-level sentiment analysis


Aspect-based sentiment analysis with attention-assisted graph and variational sentence representation

The Biases of Pre-Trained Language Models: An Empirical Study on Prompt-Based Sentiment Analysis and Emotion Detection

Meta-based Self-training and Re-weighting for Aspect-based Sentiment Analysis

SSEGCN: Syntactic and Semantic Enhanced Graph Convolutional Network for Aspect-based Sentiment Analysis

SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis

Attention-based BiLSTM models for personality recognition from user-generated content

MetaPro: A computational metaphor processing model for text pre-processing

Gated recurrent unit with multilingual universal sentence encoder for Arabic aspect-based sentiment analysis

Dual Graph Convolutional Networks for Aspect-based Sentiment Analysis

Bridging Towers of Multi-task Learning with a Gating Mechanism for Aspect-based Sentiment Analysis and Sequential Metaphor Identification


Sentiment analysis using product review data

Journal of Big Data, volume 2, Article number: 5 (2015)


Sentiment analysis, or opinion mining, is one of the major tasks of NLP (Natural Language Processing) and has gained much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis. A general process for sentiment polarity categorization is proposed with detailed process descriptions. Data used in this study are online product reviews collected from Amazon. Experiments for both sentence-level categorization and review-level categorization are performed with promising outcomes. Finally, we give insight into our future work on sentiment analysis.


Sentiment is an attitude, thought, or judgment prompted by feeling. Sentiment analysis [ 1 - 8 ], which is also known as opinion mining, studies people’s sentiments towards certain entities. The Internet is a rich source of sentiment information. From a user’s perspective, people are able to post their own content through various social media, such as forums, micro-blogs, or online social networking sites. From a researcher’s perspective, many social media sites release their application programming interfaces (APIs), prompting data collection and analysis by researchers and developers. For instance, Twitter currently has three different versions of APIs available [ 9 ], namely the REST API, the Search API, and the Streaming API. With the REST API, developers are able to gather status data and user information; the Search API allows developers to query specific Twitter content, whereas the Streaming API is able to collect Twitter content in real time. Moreover, developers can mix those APIs to create their own applications. Hence, sentiment analysis appears to have a strong foundation in the support of massive online data.

However, those types of online data have several flaws that potentially hinder the process of sentiment analysis. The first flaw is that since people can freely post their own content, the quality of their opinions cannot be guaranteed. For example, instead of sharing topic-related opinions, online spammers post spam on forums. Some spam is entirely meaningless, while other spam contains irrelevant opinions, also known as fake opinions [ 10 - 12 ]. The second flaw is that ground truth for such online data is not always available. A ground truth is a tag attached to a certain opinion, indicating whether the opinion is positive, negative, or neutral. The Stanford Sentiment 140 Tweet Corpus [ 13 ] is one of the datasets that has ground truth and is also publicly available. The corpus contains 1.6 million machine-tagged Twitter messages. Each message is tagged based on the emoticons (☺ as positive, ☹ as negative) discovered inside the message.

Data used in this paper is a set of product reviews collected from Amazon [ 14 ] between February and April 2014. The aforementioned flaws have been somewhat overcome in the following two ways: First, each product review is inspected before it can be posted. Second, each review must have a rating that can be used as the ground truth. The rating is based on a star-scaled system, where the highest rating has 5 stars and the lowest rating has only 1 star (Figure 1).

Rating System for

This paper tackles a fundamental problem of sentiment analysis, namely sentiment polarity categorization [ 15 - 21 ]. Figure 2 is a flowchart that depicts our proposed process for categorization as well as the outline of this paper. Our contributions mainly fall into Phases 2 and 3. In Phase 2: 1) An algorithm is proposed and implemented for negation phrase identification; 2) A mathematical approach is proposed for sentiment score computation; 3) A feature vector generation method is presented for sentiment polarity categorization. In Phase 3: 1) Two sentiment polarity categorization experiments are performed, one at the sentence level and one at the review level; 2) The performance of three classification models is evaluated and compared based on their experimental results.

Sentiment Polarity Categorization Process.

The rest of this paper is organized as follows: In section ‘Background and literature review’, we provide a brief review of some related work on sentiment analysis. Software package and classification models used in this study are presented in section ‘Methods’. Our detailed approaches for sentiment analysis are proposed in section ‘Research design and methodology’. Experimental results are presented in section ‘Results and discussion’. Discussion and future work is presented in section ‘Review-level categorization’. Section ‘Conclusion’ concludes the paper.

Background and literature review

One fundamental problem in sentiment analysis is categorization of sentiment polarity [ 6 , 22 - 25 ]. Given a piece of written text, the problem is to categorize the text into one specific sentiment polarity, positive or negative (or neutral). Based on the scope of the text, there are three levels of sentiment polarity categorization, namely the document level, the sentence level, and the entity and aspect level [ 26 ]. The document level concerns whether a document, as a whole, expresses negative or positive sentiment, while the sentence level deals with each sentence’s sentiment categorization. The entity and aspect level then targets what exactly people like or dislike, as expressed in their opinions.

Since reviews of much work on sentiment analysis have already been included in [ 26 ], in this section, we will only review some previous work upon which our research is essentially based. Hu and Liu [ 27 ] summarized a list of positive words and a list of negative words, respectively, based on customer reviews. The positive list contains 2006 words and the negative list has 4783 words. Both lists also include some misspelled words that are frequently present in social media content. Sentiment categorization is essentially a classification problem, where features that contain opinions or sentiment information should be identified before the classification. For feature selection, Pang and Lee [ 5 ] suggested removing objective sentences by extracting subjective ones. They proposed a text-categorization technique that is able to identify subjective content using minimum cut. Gann et al. [ 28 ] selected 6,799 tokens based on Twitter data, where each token is assigned a sentiment score, namely TSI (Total Sentiment Index), characterizing it as a positive token or a negative token. Specifically, a TSI for a certain token is computed as:

\[ TSI = \frac{p - \frac{tp}{tn} \times n}{p + \frac{tp}{tn} \times n} \tag{1} \]

where p is the number of times a token appears in positive tweets and n is the number of times a token appears in negative tweets. \(\frac {tp}{tn}\) is the ratio of total number of positive tweets over total number of negative tweets.
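As a worked illustration of these definitions, the following minimal Python sketch computes a TSI value under the assumption that it takes the form (p - r·n) / (p + r·n) with r = tp/tn. The function name, the argument names, and the counts in the usage lines are hypothetical and are not taken from Gann et al.

```python
def total_sentiment_index(p, n, tp, tn):
    """Return a TSI value in [-1, 1] for one token.

    p, n   -- times the token appears in positive / negative tweets
    tp, tn -- total number of positive / negative tweets (tn must be > 0)
    Assumed form: (p - r*n) / (p + r*n) with r = tp / tn.
    """
    ratio = tp / tn                      # corrects for class imbalance
    denominator = p + ratio * n
    if denominator == 0:
        raise ValueError("token never occurs, TSI is undefined")
    return (p - ratio * n) / denominator


# Hypothetical counts: balanced corpus, token seen 30 times in positive
# tweets and 10 times in negative tweets.
print(total_sentiment_index(p=30, n=10, tp=1000, tn=1000))

# Same raw frequency in both classes, but positive tweets are twice as
# common, so the token is relatively more typical of negative tweets.
print(total_sentiment_index(p=10, n=10, tp=2000, tn=1000))
```

The ratio term matters when the corpus is unbalanced: a token that appears equally often in both classes still leans negative if positive tweets are far more numerous.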

Research design and methodology

Data collection.

Data used in this paper is a set of product reviews collected from Amazon. From February to April 2014, we collected, in total, over 5.1 million product reviews in which the products belong to 4 major categories: beauty, book, electronic, and home (Figure 3(a)). Those online reviews were posted by over 3.2 million reviewers (customers) towards 20,062 products. Each review includes the following information: 1) reviewer ID; 2) product ID; 3) rating; 4) time of the review; 5) helpfulness; 6) review text. Every rating is based on a 5-star scale (Figure 3(b)), so all ratings range from 1 star to 5 stars, with no half-star or quarter-star ratings.

Data collection (a) Data based on product categories (b) Data based on review categories.

Sentiment sentences extraction and POS tagging

Pang and Lee [ 5 ] suggest that all objective content should be removed for sentiment analysis. Instead of removing objective content, in our study, all subjective content was extracted for future analysis. The subjective content consists of all sentiment sentences. A sentiment sentence is one that contains at least one positive or negative word. All of the sentences were first tokenized into separate English words.

Every word of a sentence has a syntactic role that defines how the word is used. The syntactic roles are also known as the parts of speech. There are 8 parts of speech in English: the verb, the noun, the pronoun, the adjective, the adverb, the preposition, the conjunction, and the interjection. In natural language processing, part-of-speech (POS) taggers [ 29 - 31 ] have been developed to classify words based on their parts of speech. For sentiment analysis, a POS tagger is very useful for the following two reasons: 1) Words like nouns and pronouns usually do not contain any sentiment, and such words can be filtered out with the help of a POS tagger; 2) A POS tagger can also be used to distinguish words that can be used as different parts of speech. For instance, as a verb, “enhanced” may convey a different amount of sentiment than it does as an adjective. The POS tagger used for this research is a max-entropy POS tagger developed for the Penn Treebank Project [ 31 ]. The tagger is able to provide 46 different tags, indicating that it can identify more detailed syntactic roles than only 8. As an example, Table 1 is a list of all tags for verbs that have been included in the POS tagger.
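The filtering role of POS tags can be sketched in a few lines of Python. The example below does not run a tagger; it assumes the sentence has already been tagged with Penn Treebank-style tags (the hand-written tags in the usage lines are illustrative), and the function name is hypothetical.

```python
# Penn Treebank tag families for adjectives (JJ, JJR, JJS),
# adverbs (RB, RBR, RBS) and verbs (VB, VBD, VBG, VBN, VBP, VBZ).
SENTIMENT_TAG_PREFIXES = ("JJ", "RB", "VB")


def keep_sentiment_candidates(tagged_tokens):
    """Keep only (word, tag) pairs whose tag marks an adjective, adverb, or verb."""
    return [
        (word, tag)
        for word, tag in tagged_tokens
        if tag.startswith(SENTIMENT_TAG_PREFIXES)
    ]


# Hand-tagged example sentence (tags written by hand for illustration).
tagged = [
    ("The", "DT"), ("battery", "NN"), ("really", "RB"),
    ("enhanced", "VBD"), ("my", "PRP$"), ("daily", "JJ"), ("routine", "NN"),
]
print(keep_sentiment_candidates(tagged))
```

Keeping the tag next to each word also preserves the distinction made above: the same surface form can be kept as two different tokens, for example one tagged as a verb and one as an adjective.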

Each sentence was then tagged using the POS tagger. Given the enormous number of sentences, a Python program that is able to run in parallel was written in order to improve the speed of tagging. As a result, over 25 million adjectives, over 22 million adverbs, and over 56 million verbs were tagged out of all the sentiment sentences. We focus on these three categories because adjectives, adverbs, and verbs are the words that mainly convey sentiment.

Negation phrases identification

Words such as adjectives and verbs are able to convey opposite sentiment with the help of negative prefixes. For instance, consider the following sentence that was found in an electronic device’s review: “The built in speaker also has its uses but so far nothing revolutionary.” The word “revolutionary” is a positive word according to the list in [ 27 ]. However, the phrase “nothing revolutionary” gives more or less negative feelings. Therefore, it is crucial to identify such phrases. In this work, two types of phrases have been identified, namely negation-of-adjective (NOA) and negation-of-verb (NOV).

Most common negative prefixes such as not, no, or nothing are treated as adverbs by the POS tagger. Hence, we propose Algorithm 1 for phrase identification. The algorithm was able to identify 21,586 different phrases with a total occurrence of over 0.68 million, each of which has a negative prefix. Table 2 lists the top 5 NOA and NOV phrases based on occurrence, respectively.
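Algorithm 1 itself is not reproduced here. As a rough illustration of the idea only, the following Python sketch scans a tagged sentence for a negative prefix and pairs it with the first adjective or verb found within a short look-ahead window. The window size, the function name, and the hand-written tags are assumptions, and the sketch should not be read as the authors' algorithm.

```python
NEGATIVE_PREFIXES = {"not", "no", "nothing"}  # the prefixes named in the text


def find_negation_phrases(tagged_tokens, window=2):
    """Return (kind, prefix, target) triples, kind being 'NOA' or 'NOV'.

    tagged_tokens -- list of (word, Penn Treebank-style tag) pairs
    window        -- how many tokens after the prefix to inspect (an assumption)
    """
    phrases = []
    for i, (word, _tag) in enumerate(tagged_tokens):
        if word.lower() not in NEGATIVE_PREFIXES:
            continue
        stop = min(i + 1 + window, len(tagged_tokens))
        for j in range(i + 1, stop):
            target, tag = tagged_tokens[j]
            if tag.startswith("JJ"):      # negation-of-adjective
                phrases.append(("NOA", word.lower(), target.lower()))
                break
            if tag.startswith("VB"):      # negation-of-verb
                phrases.append(("NOV", word.lower(), target.lower()))
                break
    return phrases


# Hand-tagged fragments (tags written by hand for illustration).
fragment_a = [("so", "RB"), ("far", "RB"), ("nothing", "RB"), ("revolutionary", "JJ")]
fragment_b = [("It", "PRP"), ("does", "VBZ"), ("not", "RB"), ("work", "VB")]
print(find_negation_phrases(fragment_a))
print(find_negation_phrases(fragment_b))
```

A small window keeps the prefix close to the word it negates; a larger one would catch phrases with intervening adverbs at the cost of more false matches.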

Sentiment score computation for sentiment tokens

A sentiment token is a word or a phrase that conveys sentiment. Given the sentiment words proposed in [ 27 ], a word token consists of a positive (negative) word and its part-of-speech tag. In total, we selected 11,478 word tokens, each of which occurs at least 30 times throughout the dataset. For phrase tokens, 3,023 phrases were selected from the 21,586 identified sentiment phrases, each of which also occurs no fewer than 30 times. Given a token \(t\), the formula for \(t\)’s sentiment score (SS) computation is given as:

\[ SS(t) = \frac{\sum_{i=1}^{5} i \times \gamma_{5,i} \times Occurrence_{i}(t)}{\sum_{i=1}^{5} \gamma_{5,i} \times Occurrence_{i}(t)} \tag{2} \]

\(Occurrence_{i}(t)\) is \(t\)’s number of occurrences in \(i\)-star reviews, where \(i = 1, \ldots, 5\). According to Figure 3, our dataset is not balanced: a different number of reviews was collected for each star level. Since 5-star reviews make up the majority of the dataset, we introduce a ratio, \(\gamma_{5,i}\), which is defined as:

\[ \gamma_{5,i} = \frac{|5\text{-star}|}{|i\text{-star}|} \tag{3} \]

In equation 3, the numerator is the number of 5-star reviews and the denominator is the number of \(i\)-star reviews, where \(i = 1, \ldots, 5\). Therefore, if the dataset were balanced, \(\gamma_{5,i}\) would be set to 1 for every \(i\). Consequently, every sentiment score should fall into the interval [1,5]. For positive word tokens, we expect that the median of their sentiment scores should exceed 3, which is the point of being neutral according to Figure 1. For negative word tokens, we expect that the median should be less than 3.
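The effect of the balancing ratio can be checked with a small Python sketch. It assumes the sentiment score is the occurrence-weighted average of the star levels, with each level's occurrences multiplied by the ratio of 5-star reviews to i-star reviews, which is consistent with the scores falling in [1,5]. The function names and all counts below are hypothetical.

```python
def balancing_ratios(review_counts):
    """Return {i: (#5-star reviews) / (#i-star reviews)} for i = 1..5."""
    return {i: review_counts[5] / review_counts[i] for i in range(1, 6)}


def sentiment_score(occurrence, review_counts):
    """Ratio-weighted average star level of the reviews in which a token occurs.

    occurrence    -- {star level: times the token occurs in reviews of that level}
    review_counts -- {star level: total number of reviews of that level}
    """
    ratios = balancing_ratios(review_counts)
    numerator = sum(i * ratios[i] * occurrence.get(i, 0) for i in range(1, 6))
    denominator = sum(ratios[i] * occurrence.get(i, 0) for i in range(1, 6))
    if denominator == 0:
        raise ValueError("token never occurs, score is undefined")
    return numerator / denominator


# Hypothetical dataset with four times as many 5-star reviews as any other level.
counts = {1: 100, 2: 100, 3: 100, 4: 100, 5: 400}

# The token occurs 10 times in 1-star and 40 times in 5-star reviews, that is,
# at the same rate per review in both levels.
print(sentiment_score({1: 10, 5: 40}, counts))
```

Without the ratio, the raw average for this token would be pulled towards 5 simply because 5-star reviews are more numerous; with it, a token that is equally frequent per review at both extremes lands on the neutral point 3.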

The sentiment score information for positive word tokens is shown in Figure 4 (a). The histogram describes the distribution of scores, while the box plot shows that the median is above 3. Similarly, the box plot in Figure 4 (b) shows that the median of sentiment scores for negative word tokens is lower than 3. In fact, both the mean and the median of positive word tokens exceed 3, and both values are lower than 3 for negative word tokens (Table 3 ).

Sentiment score information for word tokens (a) Positive word tokens (b) Negative word tokens.

The ground truth labels

The process of sentiment polarity categorization is twofold: sentence-level categorization and review-level categorization. Given a sentence, the goal of sentence-level categorization is to classify it as positive or negative in terms of the sentiment that it conveys. Training data for this categorization process require ground truth tags that indicate whether a given sentence is positive or negative. However, ground truth tagging is challenging because of the amount of data that we have. Since manually tagging each sentence is infeasible, a machine tagging approach is adopted instead. The approach implements a bag-of-words model that simply counts the appearances of positive and negative (word) tokens in every sentence. If there are more positive tokens than negative ones, the sentence is tagged as positive, and vice versa. This approach is similar to the one used for tagging the Sentiment 140 Tweet Corpus. Training data for review-level categorization already have ground truth tags, which are the star-scaled ratings.
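A minimal sketch of this counting rule is shown below. The function name, the toy word lists, and the choice to return `None` on a tie are assumptions; the text does not say how ties are handled.

```python
def machine_label(tokens, positive_words, negative_words):
    """Tag a sentence by counting positive and negative tokens."""
    pos = sum(1 for t in tokens if t in positive_words)
    neg = sum(1 for t in tokens if t in negative_words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return None  # tie or no sentiment tokens: handling is an assumption


# Hypothetical word lists, standing in for the lexicon cited as [27].
positive_words = {"great", "good", "love"}
negative_words = {"poor", "bad", "broken"}

print(machine_label(["great", "sound", "poor", "great"], positive_words, negative_words))
```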

Feature vector formation

Sentiment tokens and sentiment scores are information extracted from the original dataset. They are also known as features, which will be used for sentiment categorization. In order to train the classifiers, each entry of training data needs to be transformed into a vector that contains those features, namely a feature vector. For the sentence-level (review-level) categorization, a feature vector is formed based on a sentence (review). One challenge is to control each vector’s dimensionality. The challenge is twofold: first, a vector should not contain an excessive number (hundreds or thousands) of features or feature values, because of the curse of dimensionality [ 32 ]; second, every vector should have the same number of dimensions in order to fit the classifiers. This challenge particularly applies to sentiment tokens. On the one hand, there are 11,478 word tokens as well as 3,023 phrase tokens. On the other hand, vectors cannot be formed by simply including the tokens that appear in a sentence (or a review), because different sentences (or reviews) tend to have different numbers of tokens, so the generated vectors would have different dimensions.

Since we are only concerned with each sentiment token’s appearance inside a sentence or a review, two binary strings are used to represent each token’s appearance. One string with 11,478 bits is used for word tokens, while the other, with a bit-length of 3,023, is used for phrase tokens. For instance, if the i th word (phrase) token appears, the i th bit of the word (phrase) string is flipped from “0” to “1”. Finally, instead of directly saving the flipped strings into a feature vector, a hash value of each string is computed using Python’s built-in hash function and saved. Hence, a sentence-level feature vector has four elements in total: two hash values computed from the flipped binary strings, an averaged sentiment score, and a ground truth label. Review-level vectors include one additional element. Given a review, if there are m positive sentences and n negative sentences, the value of the element is computed as: −1× m +1× n .
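The vector construction might look roughly like the sketch below. The helper names and the three-word vocabularies are hypothetical stand-ins for the 11,478 word tokens and 3,023 phrase tokens. One caveat worth keeping in mind: in Python 3 the built-in `hash` of a string is salted per process unless `PYTHONHASHSEED` is fixed, so hash values are only comparable within one run or with a fixed seed.

```python
def appearance_string(tokens_in_text, vocabulary):
    """Binary string with one bit per vocabulary token ('1' if it appears)."""
    present = set(tokens_in_text)
    return "".join("1" if tok in present else "0" for tok in vocabulary)


def sentence_feature_vector(words, phrases, word_vocab, phrase_vocab,
                            avg_score, label):
    """Four elements: two hash values, averaged sentiment score, label."""
    word_bits = appearance_string(words, word_vocab)
    phrase_bits = appearance_string(phrases, phrase_vocab)
    return [hash(word_bits), hash(phrase_bits), avg_score, label]


def review_extra_element(m, n):
    """Extra review-level element as stated in the text: -1*m + 1*n,
    for m positive sentences and n negative sentences."""
    return -1 * m + 1 * n


# Hypothetical toy vocabularies (fixed order matters for bit positions).
word_vocab = ["bad", "good", "great"]
phrase_vocab = ["not good", "nothing revolutionary"]

vec = sentence_feature_vector(["good"], [], word_vocab, phrase_vocab, 3.9, "positive")
print(len(vec), appearance_string(["good"], word_vocab))
```

Whatever the sentence length, the vector always has the same number of elements, which is the property the classifiers need.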

Results and discussion

Evaluation methods.

Performance of each classification model is estimated based on its averaged F1-score ( 4 ):

\[ F1_{avg} = \frac{\sum_{i=1}^{n} \frac{2 \times P_{i} \times R_{i}}{P_{i} + R_{i}}}{n} \]

where P i is the precision of the i th class, R i is the recall of the i th class, and n is the number of classes. P i and R i are evaluated using 10-fold cross validation, which is applied as follows: A dataset is partitioned into 10 equal-size subsets, each of which consists of 10 positive class vectors and 10 negative class vectors. Of the 10 subsets, a single subset is retained as the validation data for testing the classification model, and the remaining 9 subsets are used as training data. The cross-validation process is then repeated 10 times, with each of the 10 subsets used exactly once as the validation data. The 10 results from the folds are then averaged to produce a single estimate. Since training data are labeled under two classes (positive and negative) for the sentence-level categorization, ROC (Receiver Operating Characteristic) curves are also plotted for a better performance comparison.
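As a small worked example, the averaged F1-score can be computed from per-class precision and recall as below. This is a plain-Python sketch for one fold, not the authors' evaluation code; the function name and the toy labels are hypothetical, and classes with no predictions are given an F1 of 0 by assumption.

```python
def averaged_f1(y_true, y_pred):
    """Mean of the per-class F1 scores over all classes in y_true."""
    classes = sorted(set(y_true))
    f1_sum = 0.0
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_sum += 2 * precision * recall / (precision + recall)
    return f1_sum / len(classes)


y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos"]
print(averaged_f1(y_true, y_pred))
```

In a 10-fold setting this function would be called once per validation fold, and the ten values averaged.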

Sentence-level categorization

Result on manually-labeled sentences.

200 feature vectors are formed based on the 200 manually-labeled sentences. The classification models show the same level of performance based on their F1-scores: all three scores take the same value of 0.85. The ROC curves (Figure 5 ) show that all three models performed quite well for testing data that have high posterior probability. (The posterior probability of a testing data point, A , is estimated by the classification model as the probability that A will be classified as positive, denoted as P (+| A ).) As the probability gets lower, the Naïve Bayesian classifier outperforms the SVM classifier, with a larger area under the curve. In general, the Random Forest model performs the best.

ROC curves based on the manually labeled set.

Result on machine-labeled sentences

2 million feature vectors (1 million with positive labels and 1 million with negative labels) are generated from 2 million machine-labeled sentences, known as the complete set. Four subsets are obtained from the complete set: subset A contains 200 vectors, subset B contains 2,000 vectors, subset C contains 20,000 vectors, and subset D contains 200,000 vectors. In every subset, the number of vectors with positive labels equals the number of vectors with negative labels. Performance of the classification models is then evaluated on five different vector sets (four subsets and one complete set, Figure 6 ).

F1 scores of sentence-level categorization.

As the models get more training data, their F1 scores all increase. The SVM model shows the most significant improvement, from 0.61 to 0.94, as its training data increased from 180 to 1.8 million vectors. The model outperforms the Naïve Bayesian model and becomes the second-best classifier on subset C and the full set. The Random Forest model again performs the best on datasets of all sizes. Figure 7 shows the ROC curves plotted based on the result of the full set.

ROC curves based on the complete set.

Review-level categorization

3 million feature vectors are formed for the categorization. Vectors generated from reviews that have at least 4-star ratings are labeled as positive, while vectors generated from 1-star and 2-star reviews are labeled as negative. 3-star reviews are used to prepare neutral class vectors. As a result, this complete set of vectors is uniformly labeled into three classes: positive, neutral, and negative. In addition, four subsets are obtained from the complete set: subset A contains 300 vectors, subset B contains 3,000 vectors, subset C contains 30,000 vectors, and subset D contains 300,000 vectors.
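The star-to-class mapping described above is simple enough to state directly; the function name below is hypothetical.

```python
def review_label(stars):
    """Map a 1-5 star rating to the three review-level classes."""
    if stars >= 4:
        return "positive"
    if stars == 3:
        return "neutral"
    return "negative"


print([review_label(s) for s in range(1, 6)])
```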

Figure 8 shows the F1 scores obtained on different sizes of vector sets. It can be clearly observed that the SVM model and the Naïve Bayesian model are identical in terms of their performance. Both models are generally superior to the Random Forest model on all vector sets. However, neither model reaches the level of performance achieved in sentence-level categorization, due to their relatively low performance on the neutral class.

F1 scores of review-level categorization.

The experimental results are promising, both for the sentence-level categorization and the review-level categorization. We observed that the averaged sentiment score is a strong feature by itself, since it achieves an F1 score over 0.8 for the sentence-level categorization with the complete set. For the review-level categorization with the complete set, the feature is capable of producing an F1 score over 0.73. However, there are still a couple of limitations to this study. The first is that the review-level categorization becomes difficult if we want to classify reviews into their specific star-scaled ratings. In other words, F1 scores obtained from such experiments are fairly low, with values lower than 0.5. The second limitation is that, since the sentiment analysis scheme proposed in this study relies on the occurrence of sentiment tokens, the scheme may not work well for reviews that contain only implicit sentiments. An implicit sentiment is usually conveyed through neutral words, which makes its polarity difficult to judge. For example, a sentence like “Item as described.”, which frequently appears in positive reviews, consists of only neutral words.

With those limitations in mind, our future work is to focus on solving those issues. Specifically, more features will be extracted and grouped into feature vectors to improve review-level categorizations. For the issue of implicit sentiment analysis, our next step is to be able to detect the existence of such sentiment within the scope of a particular product. More future work includes testing our categorization scheme using other datasets.

Sentiment analysis or opinion mining is a field of study that analyzes people’s sentiments, attitudes, or emotions towards certain entities. This paper tackles a fundamental problem of sentiment analysis, sentiment polarity categorization. Online product reviews are used as the data for this study. A sentiment polarity categorization process (Figure 2 ) has been proposed along with detailed descriptions of each step. Experiments for both sentence-level categorization and review-level categorization have been performed.

Software used for this study is scikit-learn [ 33 ], an open source machine learning software package in Python. The classification models selected for categorization are: Naïve Bayesian, Random Forest, and Support Vector Machine [ 32 ].

Naïve Bayesian classifier

The Naïve Bayesian classifier works as follows: Suppose that there exists a set of training data, D , in which each tuple is represented by an n -dimensional feature vector, \(X = (x_{1}, x_{2}, \ldots, x_{n})\), indicating n measurements made on the tuple from n attributes or features. Assume that there are m classes, \(C_{1}, C_{2}, \ldots, C_{m}\). Given a tuple X , the classifier will predict that X belongs to \(C_{i}\) if and only if \(P(C_{i} \mid X) > P(C_{j} \mid X)\), where \(i, j \in [1, m]\) and \(i \neq j\). \(P(C_{i} \mid X)\) is computed as:
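For reference, the textbook naive Bayes derivation is sketched below. It is the standard form and may differ in presentation from the equation the authors give; the symbols follow the definitions in the paragraph above.

```latex
P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)},
\qquad
P(X \mid C_i) \approx \prod_{k=1}^{n} P(x_k \mid C_i)
```

Because \(P(X)\) is the same for every class, comparing classes only requires \(P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)\); the product form is where the "naive" assumption of conditionally independent features enters.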

Random forest

The random forest classifier was chosen due to its superior accuracy compared with a single decision tree. It is essentially an ensemble method based on bagging. The classifier works as follows: Given D , the classifier first creates k bootstrap samples of D , with each sample denoted as D i . Each D i has the same number of tuples as D , sampled with replacement from D . Sampling with replacement means that some of the original tuples of D may not be included in D i , whereas others may occur more than once. The classifier then constructs a decision tree based on each D i . As a result, a “forest” that consists of k decision trees is formed. To classify an unknown tuple, X , each tree returns its class prediction, which counts as one vote. X is assigned to the class that receives the most votes.
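The two bagging ingredients, bootstrap sampling and majority voting, can be illustrated with the standard library alone. This is a toy sketch with hypothetical names; it leaves out tree construction entirely and is not how scikit-learn implements its random forest.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) tuples from data with replacement."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Return the class predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
D = list(range(10))     # stand-in for the training tuples
sample = bootstrap_sample(D, rng)

print(len(sample), len(set(sample)))  # same size as D, usually fewer distinct tuples
print(majority_vote(["positive", "negative", "positive"]))
```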

The decision tree algorithm implemented in scikit-learn is CART (Classification and Regression Trees). CART uses the Gini index for its tree induction. For D , the Gini index is computed as:

\[ Gini(D) = 1 - \sum_{i=1}^{m} p_{i}^{2} \]

where p i is the probability that a tuple in D belongs to class C i . The Gini index measures the impurity of D . The lower the index value is, the better D was partitioned. For the detailed descriptions of CART, please see [ 32 ].
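A quick numeric check of the impurity measure, assuming the standard definition of the Gini index (one minus the sum of squared class probabilities) and estimating each p i as the class frequency in D . The function name is hypothetical.

```python
from collections import Counter


def gini_index(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_index(["pos", "pos", "pos", "pos"]))  # pure partition
print(gini_index(["pos", "pos", "neg", "neg"]))  # evenly mixed, two classes
```

A pure partition gives the minimum value of 0, and an even two-class split gives 0.5, the maximum for two classes.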

Support vector machine

Support vector machine (SVM) is a method for the classification of both linear and nonlinear data. If the data is linearly separable, the SVM searches for the optimal linear separating hyperplane (the linear kernel), which is a decision boundary that separates data of one class from another. Mathematically, a separating hyperplane can be written as \(W \cdot X + b = 0\), where \(W = (w_{1}, w_{2}, \ldots, w_{n})\) is a weight vector, X is a training tuple, and b is a scalar. Optimizing the hyperplane essentially reduces to minimizing \(\|W\|\), and the resulting weight vector is computed as \(W = \sum_{i} \alpha_{i} y_{i} X_{i}\), where \(\alpha_{i}\) are numeric parameters and \(y_{i}\) are the labels of the support vectors \(X_{i}\). The labels satisfy the margin constraints: if \(y_{i} = 1\) then \(W \cdot X_{i} + b \geq 1\); if \(y_{i} = -1\) then \(W \cdot X_{i} + b \leq -1\).

If the data is linearly inseparable, the SVM uses a nonlinear mapping to transform the data into a higher dimension. It then solves the problem by finding a linear hyperplane in that space. Functions that perform such transformations are called kernel functions. The kernel function selected for our experiment is the Gaussian Radial Basis Function (RBF):

\[ K(X_{i}, X_{j}) = e^{-\gamma \|X_{i} - X_{j}\|^{2}} \]

where X i are support vectors, X j are testing tuples, and γ is a free parameter that uses the default value from scikit-learn in our experiment. Figure 9 shows a classification example of SVM based on the linear kernel and the RBF kernel.
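The kernel value for a pair of vectors can be computed as in the sketch below, which assumes the RBF form used in the scikit-learn documentation, exp(−γ‖X i − X j ‖²). The function name and the γ value are hypothetical.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Gaussian RBF kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # identical vectors
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # squared distance 2
```

The value is 1 for identical vectors and decays toward 0 as the vectors move apart, with γ controlling how fast.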

A Classification Example of SVM.

a Even though there are papers talking about spam on, we still contend that it is a relatively spam-free website in terms of reviews because of the enforcement of its review inspection process.

b The product review data used for this work can be downloaded at: .

Kim S-M, Hovy E (2004) Determining the sentiment of opinions. In: Proceedings of the 20th International Conference on Computational Linguistics, page 1367. Association for Computational Linguistics, Stroudsburg, PA, USA.

Liu B (2010) Sentiment analysis and subjectivity. In: Handbook of Natural Language Processing, Second Edition. Taylor and Francis Group, Boca.

Liu B, Hu M, Cheng J (2005) Opinion observer: Analyzing and comparing opinions on the web. In: Proceedings of the 14th International Conference on World Wide Web, WWW ’05, 342–351. ACM, New York, NY, USA.

Pak A, Paroubek P (2010) Twitter as a corpus for sentiment analysis and opinion mining. In: Proceedings of the Seventh Conference on International Language Resources and Evaluation. European Languages Resources Association, Valletta, Malta.

Pang B, Lee L (2004) A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL ’04. Association for Computational Linguistics, Stroudsburg, PA, USA.

Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1-2): 1–135.

Turney PD (2002) Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, 417–424. Association for Computational Linguistics, Stroudsburg, PA, USA.

Whitelaw C, Garg N, Argamon S (2005) Using appraisal groups for sentiment analysis. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, CIKM ’05, 625–631. ACM, New York, NY, USA.

Twitter (2014) Twitter APIs.

Liu B (2014) The science of detecting fake reviews.

Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, WSDM ’08, 219–230. ACM, New York, NY, USA.

Mukherjee A, Liu B, Glance N (2012) Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, WWW ’12, 191–200. ACM, New York, NY, USA.

Stanford (2014) Sentiment 140.

Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision, 1–12. CS224N Project Report, Stanford.

Lin Y, Zhang J, Wang X, Zhou A (2012) An information theoretic approach to sentiment polarity classification. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality ’12, 35–40. ACM, New York, NY, USA.

Sarvabhotla K, Pingali P, Varma V (2011) Sentiment classification: a lexical similarity based approach for extracting subjectivity in documents. Inf Retrieval 14(3): 337–353.

Wilson T, Wiebe J, Hoffmann P (2005) Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, 347–354. Association for Computational Linguistics, Stroudsburg, PA, USA.

Yu H, Hatzivassiloglou V (2003) Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 129–136. Association for Computational Linguistics, Stroudsburg, PA, USA.

Zhang Y, Xiang X, Yin C, Shang L (2013) Parallel sentiment polarity classification method with substring feature reduction. In: Trends and Applications in Knowledge Discovery and Data Mining, volume 7867 of Lecture Notes in Computer Science, 121–132. Springer Berlin Heidelberg, Heidelberg, Germany.

Zhou S, Chen Q, Wang X (2013) Active deep learning method for semi-supervised sentiment classification. Neurocomputing 120(0): 536–546. Image Feature Detection and Description.

Chesley P, Vincent B, Xu L, Srihari RK (2006) Using verbs and adjectives to automatically classify blog sentiment. Training 580(263): 233.

Choi Y, Cardie C (2009) Adapting a polarity lexicon using integer linear programming for domain-specific sentiment classification. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2, EMNLP ’09, 590–598. Association for Computational Linguistics, Stroudsburg, PA, USA.

Jiang L, Yu M, Zhou M, Liu X, Zhao T (2011) Target-dependent twitter sentiment classification. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, 151–160. Association for Computational Linguistics, Stroudsburg, PA, USA.

Tan LK-W, Na J-C, Theng Y-L, Chang K (2011) Sentence-level sentiment polarity classification using a linguistic approach. In: Digital Libraries: For Cultural Heritage, Knowledge Dissemination, and Future Creation, 77–87. Springer, Heidelberg, Germany.

Liu B (2012) Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers.

Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 168–177. ACM, New York, NY, USA.

Gann W-JK, Day J, Zhou S (2014) Twitter analytics for insider trading fraud detection system. In: Proceedings of the Second ASE International Conference on Big Data. ASE.

Roth D, Zelenko D (1998) Part of speech tagging using a network of linear separators. In: Coling-Acl, The 17th International Conference on Computational Linguistics, 1136–1142.

Kristina T (2003) Stanford log-linear part-of-speech tagger.

Marcus M (1996) Upenn part of speech tagger.

Han J, Kamber M, Pei J (2006) Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems), 2nd ed. Morgan Kaufmann, San Francisco, CA, USA.

(2014) Scikit-learn.


This research was partially supported by the following grants: NSF No. 1137443, NSF No. 1247663, NSF No. 1238767, DoD No. W911NF-13-0130, DoD No. W911NF-14-1-0119, and the Data Science Fellowship Award by the National Consortium for Data Science.

Author information

Authors and affiliations.

Department of Computer Science, North Carolina A&T State University, Greensboro, NC, USA

Xing Fang & Justin Zhan

Corresponding author

Correspondence to Xing Fang .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

XF performed the primary literature review, data collection, and experiments, and also drafted the manuscript. JZ worked with XF to develop the article’s framework and focus. All authors read and approved the final manuscript.

Authors’ information

Xing Fang is a Ph.D. candidate at the Department of Computer Science, North Carolina A&T State University. His research interests include social computing, machine learning, and natural language processing. Mr. Fang holds one Master’s degree in computer science from North Carolina A&T State University, and one Baccalaureate degree in electronic engineering from Northwestern Polytechnical University, Xi’an, China.

Dr. Justin Zhan is an associate professor at the Department of Computer Science, North Carolina A&T State University. He has previously been a faculty member at Carnegie Mellon University and National Center for the Protection of Financial Infrastructure in Dakota State University. His research interests include Big Data, Information Assurance, Social Computing, and Health Science.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article.

Fang, X., Zhan, J. Sentiment analysis using product review data. Journal of Big Data 2 , 5 (2015).

Received : 12 January 2015

Accepted : 20 April 2015

Published : 16 June 2015


sentiment analysis research papers



  1. A Survey on Sentiment Analysis

    In this paper, from defining the sentiment analysis to algorithms for sentiment analysis and from the first step of sentiment analysis to evaluating the predictions of sentiment classifiers, additional feature extractions to boost performance are discussed with practical results.

  2. Sentiment analysis algorithms and applications: A survey

    Sentiment Analysis (SA) or Opinion Mining (OM) is the computational study of people's opinions, attitudes and emotions toward an entity. The entity can represent individuals, events or topics. These topics are most likely to be covered by reviews. The two expressions SA or OM are interchangeable. They express a mutual meaning.

  3. (PDF) Sentiment Analysis

    Sentiment or opinion analysis employs natural language processing to extract a significant pattern of knowledge from a large amount of textual data. It examines comments, opinions, emotions,...

  4. A review on sentiment analysis and emotion detection from text

    Review paper by Pansy Nandwani & Rupali Verma. Social Network Analysis and Mining 11, Article number: 81 (2021). Published: 28 August 2021.

  5. The evolution of sentiment analysis—A review of research topics, venues

    Sentiment analysis is a series of methods, techniques, and tools about detecting and extracting subjective information, such as opinion and attitudes, from language [2 ]. Traditionally, sentiment analysis has been about opinion polarity, i.e., whether someone has positive, neutral, or negative opinion towards something [ 3 ].

  6. Title: A Comparative Study of Sentiment Analysis Using NLP and

    In this paper, we have introduced two NLP techniques (Bag-of-Words and TF-IDF) and various ML classification algorithms (Support Vector Machine, Logistic Regression, Multinomial Naive Bayes, Random Forest) to find an effective approach for Sentiment Analysis on a large, imbalanced, and multi-classed dataset.

  7. Systematic reviews in sentiment analysis: a tertiary study

    With advanced digitalisation, we can observe a massive increase of user-generated content on the web that provides opinions of people on different subjects. Sentiment analysis is the computational study of analysing people's feelings and opinions for an entity. The field of sentiment analysis has been the topic of extensive research in the past decades. In this paper, we present the results of ...

  8. Sentiment Analysis

    1062 papers with code • 41 benchmarks • 84 datasets. Sentiment analysis is the task of classifying the polarity of a given text. For instance, a text-based tweet can be categorized into either "positive", "negative", or "neutral".

  9. sentiment analysis Latest Research Papers

    Sentiment Analysis (SA) is a Natural Language Processing (NLP) and an Information Extraction (IE) task that primarily aims to obtain the writer's feelings expressed in positive or negative by analyzing a large number of documents. SA is also widely studied in the fields of data mining, web mining, text mining, and information retrieval.

  10. Comprehensive Study on Sentiment Analysis: Types ...

    Sentiment analysis can be considered a major application of machine learning, more particularly natural language processing (NLP). As there are varieties of applications, Sentiment analysis has gained a lot of attention and is one among the fastest growing research area in computer science. It is a type of data analysis which is observed from news reports, user reviews, feedbacks, social media ...

  11. A survey of sentiment analysis techniques

    Sentiment analysis is an application of natural language processing. It is also known as emotion extraction or opinion mining. This is a very popular field of research in text mining. The basic idea is to find the polarity of the text and classify it into positive, negative or neutral. It helps in human decision making. To perform sentiment analysis, one has to perform various tasks like ...

  12. (PDF) Sentiment Analysis

    Sentiment analysis (also called opinion mining) refers to the application of natural language processing, computational linguistics, and text analytics to identify and classify subjective...

  13. An enhanced approach for sentiment analysis based on meta-ensemble deep

    Also, in sentiment analysis, many research studies have shown the superiority of the different ensemble learning methods over traditional machine learning classifiers. For example, the research efforts of Kanakaraj and Guddeti ; Prusa ... Thus, in this research paper, we proposed a meta-ensemble deep learning approach to improve the performance ...

  14. (PDF) Sentiment Analysis: Machine Learning Approach

    To mine emotions and polarity in tweets, text mining techniques are used. Approximately 5000 tweets are recoded and pre-processed to create a dataset of frequently appearing words. R is used for...

  15. The Evolution of Sentiment Analysis

    published after 2004. Sentiment analysis papers are scattered to multiple publication venues, and the combined number of papers in the top-15 venues only represent ca. 30% of the papers in total. We present the top-20 cited papers from Google Scholar and Scopus and a taxonomy of research topics. In recent years, sentiment analysis has shifted from

  16. Sentiment Analysis by using Recurrent Neural Network

    Sentiment analysis is the process of emotion extraction and opinion mining from given text. This research paper gives the detailed overview of different feature selection methods, sentiment classification techniques and deep learning approaches for sentiment analysis. The feature selection methods include n-grams, stop words and negation handling.

  17. Sentiment Analysis: The What & How in 2023


  18. A Literature Review on Application of Sentiment Analysis Using Machine

    This paper presents the common techniques of analyzing sentiment from a machine learning perspective. In light of this, this literature review explores and discusses the idea of Sentiment analysis by undertaking a systematic review and assessment of corporate and community white papers, scientific research articles, journals, and reports.

  19. Research on non-dependent aspect-level sentiment analysis

    Semantic Scholar extracted view of "Research on non-dependent aspect-level sentiment analysis" by Lei-Na Jiang et al. ... This paper proposes an aspect-aware attention mechanism combined with self-attention to obtain attention score matrices of a sentence, which can not only learn the aspect-related semantic correlations, but also learn the ...

  20. Sentiment analysis using product review data

    Sentiment analysis or opinion mining is one of the major tasks of NLP (Natural Language Processing). Sentiment analysis has gain much attention in recent years. In this paper, we aim to tackle the problem of sentiment polarity categorization, which is one of the fundamental problems of sentiment analysis.