
Dataset loaders:

  • huggingface/datasets
  • tensorflow/datasets
  • pytorch/text


IMDb Movie Reviews

sentiment analysis on movie reviews dataset

The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. No more than 30 reviews are included per movie. The dataset contains additional unlabeled data.
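The labeling rule described above can be sketched as a small Python filter. The `(movie_id, score, text)` row layout, the function names, and the first-come order in which reviews fill each movie's quota of 30 are assumptions for illustration; the dataset authors' actual selection procedure may differ.

```python
from collections import defaultdict
from typing import Optional


def polarity_label(score: int) -> Optional[str]:
    """Map a 1-10 IMDb rating to a polarity label, or None if the review is excluded."""
    if score <= 4:
        return "neg"
    if score >= 7:
        return "pos"
    return None  # neutral ratings (5-6) are not part of the labeled set


def select_reviews(reviews, max_per_movie=30):
    """Keep only polar reviews, capping how many reviews a single movie contributes.

    `reviews` is an iterable of (movie_id, score, text) tuples (hypothetical layout).
    """
    per_movie = defaultdict(int)
    selected = []
    for movie_id, score, text in reviews:
        label = polarity_label(score)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        selected.append((text, label))
    return selected
```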


Sentiment Analysis of IMDb Movie Reviews Using Traditional Machine Learning Techniques and Transformers

Rustam Talibzade, George Washington University



Sentiment Analysis of IMDb Movie Reviews: A Comparative Analysis of Feature Selection and Feature Extraction Techniques

  • Conference paper
  • First Online: 04 March 2022


  • Gahina Karak
  • Shubham Mishra
  • Arkadyuti Bandyopadhyay
  • Pavirala Ranga Sai Rohith
  • Hemant Rathore

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 420)

Included in the following conference series:

  • International Conference on Hybrid Intelligent Systems


Humans are social animals who depend on the opinions and experiences of others when choosing a product for themselves. Most people seek out reviews of products such as movies, web series, and video games before trying them. Because of the large number of reviews on the internet, it is difficult for an average person to find the relevant information. Sentiment analysis is often used to extract useful information from a review and classify it as having positive or negative sentiment. Our main goal in this paper is to construct sentiment analysis models on textual movie reviews using different feature extraction techniques (count vectorization, TF-IDF, and Word2Vec) and feature selection techniques (mutual information gain and chi-square). We also study the performance of various classification algorithms for constructing sentiment analysis models across several metrics. We obtained the highest accuracy, 90%, with TF-IDF vectorization, chi-square (Chi2) feature selection, and an SVM classifier. We also found that feature selection drastically reduces the training and testing time for almost all the classification models without severely impacting the other performance metrics.
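Two of the ingredients named in the abstract, TF-IDF weighting and chi-square feature selection, can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the smoothed idf formula ln((1 + n) / (1 + df)) + 1 mirrors scikit-learn's `TfidfVectorizer` default (without its L2 normalization), and the chi-square score here is the classic 2x2 statistic on term presence versus label, whereas scikit-learn's `chi2` works on summed feature values. The function names are hypothetical, and the SVM that would be trained on the selected terms is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document (raw count x smoothed idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def chi2_scores(docs, labels):
    """Chi-square statistic of each term's presence against a binary label (1 = positive)."""
    doc_sets = [set(doc) for doc in docs]
    n, n_pos = len(docs), sum(labels)
    scores = {}
    for t in set().union(*doc_sets):
        a = sum(1 for s, y in zip(doc_sets, labels) if t in s and y == 1)  # present, positive
        b = sum(1 for s, y in zip(doc_sets, labels) if t in s and y == 0)  # present, negative
        c = n_pos - a          # absent, positive
        d = (n - n_pos) - b    # absent, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_k_best(docs, labels, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    scores = chi2_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

On the toy corpus `[["great", "movie"], ["great", "fun"], ["awful", "movie"], ["awful", "boring"]]` with labels `[1, 1, 0, 0]`, `select_k_best(docs, labels, 2)` returns `['awful', 'great']`, the two terms that perfectly separate the classes, while "movie" (present in both classes) scores 0.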



Author information

Authors and affiliations.

Department of CS & IS, Goa Campus, BITS Pilani, Goa, India

Gahina Karak, Shubham Mishra, Arkadyuti Bandyopadhyay, Pavirala Ranga Sai Rohith & Hemant Rathore


Corresponding author

Correspondence to Gahina Karak.

Editor information

Editors and affiliations.

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA

Ajith Abraham

Campus Centre de Créteil, Université Paris-Est Créteil, Créteil, France

Patrick Siarry

Department of Computer Science, Università degli Studi di Milano, Milan, Milano, Italy

Vincenzo Piuri

Niketa Gandhi

University of Bari, Bari, Italy

Gabriella Casalino

Division of Graduate Studies and Research, Tijuana Institute of Technology, Tijuana, Mexico

Oscar Castillo

Ontario Tech University, Oshawa, ON, Canada

Patrick Hung


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper.

Karak, G., Mishra, S., Bandyopadhyay, A., Rohith, P.R.S., Rathore, H. (2022). Sentiment Analysis of IMDb Movie Reviews: A Comparative Analysis of Feature Selection and Feature Extraction Techniques. In: Abraham, A., et al. Hybrid Intelligent Systems. HIS 2021. Lecture Notes in Networks and Systems, vol 420. Springer, Cham. https://doi.org/10.1007/978-3-030-96305-7_27


DOI: https://doi.org/10.1007/978-3-030-96305-7_27

Published: 04 March 2022

Publisher Name: Springer, Cham

Print ISBN: 978-3-030-96304-0

Online ISBN: 978-3-030-96305-7

eBook Packages: Intelligent Technologies and Robotics (R0)


Sentiment Classification on the Large Movie Review Dataset

Data Mining Project: BERT Sentiment Classification

  • Monticone Pietro
  • Moroni Claudio
  • Orsenigo Davide

Problem: Sentiment Classification

A sentiment classification problem consists, roughly speaking, of taking a piece of text and predicting whether the author likes or dislikes what they are talking about: the input X is a piece of text and the output Y is the sentiment we want to predict, such as the rating of a movie review.

If we can train a model to map X to Y on a labelled dataset, the model can then be used to predict a reviewer's sentiment about a movie from the text of their review.

Data: Large Movie Review Dataset v1.0

The dataset contains movie reviews along with their associated binary sentiment polarity labels.

  • The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets.
  • The overall distribution of labels is balanced (25k pos and 25k neg).
  • 50,000 unlabeled documents for unsupervised learning are included, but they won’t be used.
  • The train and test sets contain disjoint sets of movies, so no significant performance gain can be obtained by memorizing movie-unique terms and their association with the observed labels.
  • In the labeled train/test sets, a negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10. Thus reviews with more neutral ratings are not included in the train/test sets.
  • In the unsupervised set, reviews of any rating are included, and there is an even number of reviews with scores > 5 and ≤ 5.
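A split with disjoint movies, as described above, can be sketched by assigning whole movies, rather than individual reviews, to one side. The `(movie_id, text, label)` layout, the function name, and the 50/50 split by movie count are assumptions; splitting movies in half does not by itself guarantee exactly 25k reviews per side or a balanced label distribution.

```python
import random


def split_by_movie(reviews, test_fraction=0.5, seed=0):
    """Split (movie_id, text, label) rows so that no movie appears in both train and test."""
    movies = sorted({movie_id for movie_id, _, _ in reviews})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(movies)
    test_movies = set(movies[: int(len(movies) * test_fraction)])
    train = [r for r in reviews if r[0] not in test_movies]
    test = [r for r in reviews if r[0] in test_movies]
    return train, test
```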

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).

Theoretical introduction

The encoder-decoder sequence.

Roughly speaking, an encoder-decoder sequence is an ordered collection of steps (coders) designed to automatically translate sentences from one language to another (e.g. the English “the pen is on the table” into the Italian “la penna è sul tavolo”). It can be visualized as follows: input sentence → (encoders) → (decoders) → output/translated sentence.

For our practical purpose, encoders and decoders are effectively indistinguishable (that’s why we call them coders): both are composed of two layers, an LSTM or GRU neural network and an attention module (AM). They differ only in how their output is processed.

LSTM or GRU neural network

Both the input and the output of an LSTM/GRU neural network consist of two vectors:

  • the hidden state : the representation of what the network has learnt about the sentence it’s reading;
  • the prediction : the representation of what the network predicts (e.g. translation).

Before being processed, each word in the English input sentence is converted into its word embedding vector (WEV), e.g. with word2vec. The WEV of the first word of the sentence and an initial hidden state (commonly all zeros) are processed by the first coder of the sequence. Of its output, the prediction is ignored, while the hidden state and the WEV of the second word are passed as input to the second coder, and so on up to the last word of the sentence. In this phase, therefore, the coders work as encoders.
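The encoding loop above can be sketched in plain Python. To keep it short, the sketch swaps the LSTM/GRU for a vanilla tanh RNN cell, h' = tanh(W h + U x + b), and expects tiny hand-picked matrices; the weights, dimensions, and function names are illustrative assumptions.

```python
import math


def rnn_step(h, x, W, U, b):
    """One vanilla-RNN step: h' = tanh(W h + U x + b), a stand-in for an LSTM/GRU cell."""
    return [
        math.tanh(
            sum(W[i][j] * h[j] for j in range(len(h)))
            + sum(U[i][k] * x[k] for k in range(len(x)))
            + b[i]
        )
        for i in range(len(b))
    ]


def encode(embeddings, W, U, b):
    """Feed the WEVs one at a time and return the final hidden state."""
    h = [0.0] * len(b)  # initial hidden state (all zeros in this sketch)
    for x in embeddings:  # one word embedding vector per step
        h = rnn_step(h, x, W, U, b)  # only the hidden state is carried forward
    return h
```

The final hidden state returned by `encode` is what the decoding phase starts from.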

At the end of the sequence of N encoders (N being the number of words in the input sentence), the decoding phase begins:

  • the last hidden state and the WEV of the “START” token are passed to the first decoder ;
  • the decoder outputs a hidden state and a prediction;
  • the hidden state and the prediction are passed to the second decoder;
  • the second decoder outputs a new hidden state and the second word of the translated/output sentence

and so on until the whole sentence has been translated, namely when a decoder of the sequence outputs the WEV of the “END” token. An external mechanism then converts the prediction vectors into real words, so it is important to notice that the only purpose of the decoders is to predict the next word.
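The decoding loop can be sketched as greedy decoding: each predicted word is fed back in until “END” appears. Here `step` is a hypothetical decoder function that returns the new hidden state together with the predicted word, i.e. it already includes the external mechanism that turns a prediction vector into a word; `max_len` is an added safety cap not mentioned above.

```python
from typing import Callable, List, Tuple

State = List[float]


def greedy_decode(
    step: Callable[[State, str], Tuple[State, str]],
    encoder_state: State,
    max_len: int = 50,
) -> List[str]:
    """Start from the encoder's last hidden state and "START", stop at "END"."""
    state, word, output = encoder_state, "START", []
    for _ in range(max_len):  # safety cap in case "END" is never predicted
        state, word = step(state, word)
        if word == "END":
            break
        output.append(word)
    return output
```

With a toy `step` that simply looks up the next word in a fixed script (`"START"` → `"la"` → `"penna"` → … → `"tavolo"` → `"END"`), `greedy_decode` returns `["la", "penna", "è", "sul", "tavolo"]`.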

Attention module (AM)

The attention module is a further layer, placed before the network, that gives the words of the sentence a relational structure. Consider the word “table” in the example sentence above. Thanks to the AM, the encoder will weight the preposition “on” (processed by the previous encoder) more than the article “the” that refers to the subject “pen”.
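The weighting described above can be illustrated with scaled dot-product attention, the variant used in Transformers (Vaswani et al.): a word's query vector is compared with every word's key vector, the scores are turned into weights with a softmax, and the weights average the value vectors. The sketch is not tied to any particular model, and any vectors fed to it here are arbitrary toy numbers rather than learned embeddings.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products: how strongly `query` attends to each key."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Weighted sum of the value vectors, using the attention weights."""
    w = attention_weights(query, keys)
    return [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))]
```

For example, with query `[1, 0]` and keys `[2, 0]` and `[0, 2]`, the first key receives a weight of about 0.80 and the second about 0.20.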

Bidirectional Encoder Representations from Transformers (BERT)

Transformer.

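The Transformer is an encoder-decoder architecture whose coders are built around the AM (self-attention) instead of an LSTM/GRU network. Transformers have been observed to work much better than basic recurrent encoder-decoder sequences, for example in machine translation.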

BERT is a stack of encoder-type Transformers pre-trained to predict masked words and whether one sentence follows another. Left-to-right Transformer language models, which predict the next word from the previous ones, pay for their improved performance with a loss of bidirectionality, the ability to use the context on both sides of a word. BERT solves this problem: its masked-word pre-training lets a Transformer preserve bidirectionality.

For BERT, the first token is not “START” but the special classification token [CLS]. To use BERT as a pre-trained language model for sentence classification, we feed the final BERT output for [CLS] into a linear classifier (e.g. logistic regression) because

  • the model has been pre-trained to predict the next sentence, not just the next word, and this prediction is made from the [CLS] output;
  • the semantic information of the sentence is encoded in the [CLS] output as a document vector of 768 elements (in BERT-base).
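The classification head described above can be sketched as logistic regression on the [CLS] vector. This sketch keeps BERT frozen and is meant to be tried with made-up low-dimensional vectors in place of real 768-dimensional [CLS] outputs; in full fine-tuning, the BERT weights would be updated together with the head. The learning rate, epoch count, and function names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_positive(cls_vector, weights, bias):
    """Linear layer + sigmoid on a [CLS] vector -> estimated P(positive)."""
    return sigmoid(sum(w * h for w, h in zip(weights, cls_vector)) + bias)


def train_head(vectors, labels, lr=0.1, epochs=200):
    """Fit weights and bias with per-example (stochastic) gradient descent on log-loss."""
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = predict_positive(x, weights, bias) - y  # d(log-loss)/d(logit)
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias
```

A review is then classified as positive when `predict_positive` returns more than 0.5.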


  • bert_final_data
  • https://www.kaggle.com/dataset/5f1193b4685a6e3aa8b72fa3fdc427d18c3568c66734d60cf8f79f2607551a38
  • https://www.kaggle.com/dataset/9850d2e4b7d095e2b723457263fbef547437b159e3eb7ed6dc2e88c7869fca0b
  • Bert-For-Tf2
  • Google github repository
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  • A Visual Guide to Using BERT for the First Time
  • Machine Translation(Encoder-Decoder Model)!
  • The Illustrated Transformer
  • The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
  • BERT Explained: State of the art language model for NLP
  • Learning Word Vectors for Sentiment Analysis


This project focuses on sentiment analysis of movie reviews using the IMDb dataset. The dataset consists of 50,000 movie reviews labeled as positive or negative. The main goal of this project is to develop models that can accurately classify the sentiment of movie reviews.

Taha533/Sentiment-Analysis-of-IMDB-Movie-Reviews


Sentiment Analysis of IMDb Movie Reviews

Label | Number of samples
Positive | 25,000
Negative | 25,000

Dataset Link: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews?datasetId=134715&sortBy=dateRun&tab=profile

Data Preprocessing

Before training the models, the dataset undergoes preprocessing steps to prepare it for analysis. The following preprocessing steps are performed:

Removal of HTML tags: The dataset is cleaned by removing any HTML tags present in the movie reviews.

Removal of stop words: Commonly occurring stop words that do not contribute much to sentiment analysis are removed from the reviews.

Removal of single characters: Single characters that are often noise in the text are eliminated.

Removal of multiple spaces: Extra spaces between words are reduced to a single space.
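The four cleaning steps can be sketched with Python's `re` module. The stop-word set below is a tiny hypothetical sample (a real project would typically use a full list such as NLTK's), and removing single characters at the token level is one possible reading of that step; the repository's exact regular expressions may differ.

```python
import re

# Tiny illustrative stop-word list (hypothetical); real projects use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}
TAG_RE = re.compile(r"<[^>]+>")


def preprocess(text: str) -> str:
    text = TAG_RE.sub(" ", text)  # 1. remove HTML tags such as <br />
    tokens = text.split()  # split on runs of whitespace
    tokens = [t for t in tokens if t.lower() not in STOP_WORDS]  # 2. drop stop words
    tokens = [t for t in tokens if not (len(t) == 1 and t.isalpha())]  # 3. drop single letters
    return " ".join(tokens)  # 4. re-join with exactly one space between words
```

For example, `preprocess("This movie was <br /><br />a   great  film, I loved it!")` returns `"movie great film, loved it!"`. Punctuation is left untouched in this sketch.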

Tokenization and Padding

To prepare the textual data for model training, the sentences are tokenized using a tokenizer. Tokenization involves splitting the text into individual words or tokens. This step helps in creating input sequences for the models.

Furthermore, a padding sequence function provided by Keras is utilized to ensure that all sentences have the same length. In this project, a length of 100 is chosen as the maximum sequence length. Padding sequences is crucial for handling variable-length input and enables efficient batch processing.
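Tokenization and padding can be sketched without Keras to show what happens to each review. The sketch mimics two documented Keras defaults (word indices start at 1 with 0 reserved for padding, and `pad_sequences` pads and truncates at the front, i.e. `padding='pre'`, `truncating='pre'`); breaking frequency ties alphabetically is an assumption of this sketch, and the function names are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Index words by descending frequency, starting at 1 (0 is reserved for padding)."""
    counts = Counter(w for t in texts for w in t.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {w: i for i, w in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace each known word with its index; unknown words are dropped."""
    return [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]


def pad(sequences, maxlen=100, value=0):
    """Pre-pad and pre-truncate every sequence to exactly `maxlen` entries (maxlen > 0)."""
    out = []
    for seq in sequences:
        seq = seq[-maxlen:]  # keep the last `maxlen` tokens
        out.append([value] * (maxlen - len(seq)) + seq)
    return out
```

For example, with the word index built from `["good movie", "bad movie"]` (`movie` → 1, `bad` → 2, `good` → 3), `pad(texts_to_sequences(["good movie"], index), maxlen=4)` gives `[[0, 0, 3, 1]]`.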

GloVe Word Embeddings

To capture semantic relationships between words, GloVe embeddings are employed. GloVe stands for Global Vectors for Word Representation and provides dense vector representations for words. These embeddings allow for measuring similarity between words based on their vector representations.

In this project, GloVe embeddings are utilized to enhance the models' understanding of word semantics and improve their performance in sentiment analysis.
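Using GloVe vectors boils down to parsing the text file (each line holds a word followed by its vector components, as in files such as `glove.6B.100d.txt`) and copying each known word's vector into the row of an embedding matrix given by the tokenizer's word index. The function names are hypothetical, and whether the repository used this exact file is not stated.

```python
import math


def load_glove(lines):
    """Parse GloVe's text format: a word followed by its vector components on each line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 (padding) and unknown words stay zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, i in word_index.items():
        if word in vectors:
            matrix[i] = list(vectors[word])
    return matrix
```

A real file would be read with something like `with open(path, encoding="utf8") as f: vectors = load_glove(f)`. With made-up toy lines such as `"good 0.9 0.1"`, `"great 0.8 0.2"`, and `"bad -0.9 0.1"`, `cosine` rates *good* closer to *great* than to *bad*.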

Sentiment Analysis Models

Three different models are developed for sentiment analysis of movie reviews:

Simple Neural Network: This model architecture consists of a simple feed-forward neural network with fully connected layers. It is trained on the preprocessed movie review data to learn sentiment classification.

Convolutional Neural Network (CNN): The CNN model incorporates 1D convolutional (Conv1D) layers, which are effective at capturing local patterns and features in text data. It is trained to perform sentiment analysis on the movie reviews.

Long Short-Term Memory (LSTM): The LSTM model is a type of recurrent neural network (RNN) that is particularly effective in capturing long-term dependencies in sequential data. It is trained on the movie reviews to learn sentiment classification.
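To make the CNN description concrete, the sketch below computes what a single Conv1D filter with ReLU does, followed by global max pooling: the filter slides over windows of consecutive word vectors, and the pooled maximum records how strongly the pattern appears anywhere in the review. A Keras Conv1D layer applies many such filters with learned weights; the hand-set kernel and the function names here are illustrative assumptions, not the repository's model.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Valid 1D convolution of one filter over a sequence of word vectors, then ReLU.

    `kernel` holds one weight vector per position in the window (width = len(kernel)).
    """
    width = len(kernel)
    out = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        s = bias + sum(w * x for wv, xv in zip(kernel, window) for w, x in zip(wv, xv))
        out.append(max(0.0, s))  # ReLU
    return out


def global_max_pool(feature_map):
    """Keep the strongest response of the filter anywhere in the sequence."""
    return max(feature_map)
```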

Model Training and Evaluation

Each of the models is trained for 10 epochs using the preprocessed movie review dataset. The models are optimized to learn the sentiment expressed in the reviews, and their performances are evaluated based on accuracy.

Based on the experimental results, it is observed that the LSTM model performs better than the other models for sentiment analysis of movie reviews.

Feel free to explore this project's code and experiment with different models and configurations to enhance sentiment analysis performance on the IMDb movie review dataset.

Requirements

To run this project, the following dependencies are required:

  • Scikit-Learn
  • GloVe word embeddings

Please make sure to install the necessary libraries and download the GloVe word embeddings before running the project.

This project is licensed under the MIT License.

Contributions

Contributions to this project are welcome. If you would like.

If you find this project useful, consider referencing the following resources:

  • Stanford NLP Group: GloVe

Note: This project is intended for educational purposes and to showcase the implementation of sentiment analysis using different models.



Journal: BDCC (MDPI)


A Comparative Study of Sentiment Classification Models for Greek Reviews


1. Introduction
2. Theoretical Background and Review
  2.1. Text Representation
  2.2. Computational Methods for Sentiment Classification
  2.3. Related Research for Greek Sentiment Classification
3. Methodology
  3.1. Dataset Selection
  3.2. Text Preprocessing
  3.3. Modeling Experiments
    3.3.1. Machine Learning Approaches
    3.3.2. Artificial Neural Network Models
    3.3.3. Transfer Learning Models
    3.3.4. Large Language Models
  3.4. Model Evaluation
  4.1. Machine Learning
  4.2. Artificial Neural Networks
  4.3. Transfer Learning Model
  4.4. Large Language Models
5. Discussion
6. Conclusions
Data Availability Statement
Acknowledgments
Conflicts of Interest

  • Nandwani, P.; Verma, R. A Review on Sentiment Analysis and Emotion Detection from Text. Soc. Netw. Anal. Min. 2021 , 11 , 81. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Al Maruf, A.; Khanam, F.; Haque, M.M.; Jiyad, Z.M.; Mridha, M.F.; Aung, Z. Challenges and Opportunities of Text-Based Emotion Detection: A Survey. IEEE Access 2024 , 12 , 18416–18450. [ Google Scholar ] [ CrossRef ]
  • Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowl. Data Eng. 2023 , 35 , 11019–11038. [ Google Scholar ] [ CrossRef ]
  • Li, R.; Chen, H.; Feng, F.; Ma, Z.; Wang, X.; Hovy, E. DualGCN: Exploring Syntactic and Semantic Information for Aspect-Based Sentiment Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2024 , 35 , 7642–7656. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Alaei, A.R.; Becken, S.; Stantic, B. Sentiment Analysis in Tourism: Capitalizing on Big Data. J. Travel. Res. 2019 , 58 , 175–191. [ Google Scholar ] [ CrossRef ]
  • Alamoodi, A.H.; Zaidan, B.B.; Zaidan, A.A.; Albahri, O.S.; Mohammed, K.I.; Malik, R.Q.; Almahdi, E.M.; Chyad, M.A.; Tareq, Z.; Albahri, A.S.; et al. Sentiment Analysis and Its Applications in Fighting COVID-19 and Infectious Diseases: A Systematic Review. Expert Syst. Appl. 2021 , 167 , 114155. [ Google Scholar ] [ CrossRef ]
  • Jain, P.K.; Pamula, R.; Srivastava, G. A Systematic Literature Review on Machine Learning Applications for Consumer Sentiment Analysis Using Online Reviews. Comput. Sci. Rev. 2021 , 41 , 100413. [ Google Scholar ] [ CrossRef ]
  • Rambocas, M.; Pacheco, B.G. Online Sentiment Analysis in Marketing Research: A Review. J. Res. Interact. Mark. 2018 , 12 , 146–163. [ Google Scholar ] [ CrossRef ]
  • Zhang, X.; Guo, F.; Chen, T.; Pan, L.; Beliakov, G.; Wu, J. A Brief Survey of Machine Learning and Deep Learning Techniques for E-Commerce Research. J. Theor. Appl. Electron. Commer. Res. 2023 , 18 , 2188–2216. [ Google Scholar ] [ CrossRef ]
  • Giachanou, A.; Crestani, F. Like It or Not: A Survey of Twitter Sentiment Analysis Methods. ACM Comput. Surv. 2016 , 49 , 1–41. [ Google Scholar ] [ CrossRef ]
  • Krugmann, J.O.; Hartmann, J. Sentiment Analysis in the Age of Generative AI. Cust. Needs Solut. 2024 , 11 , 3. [ Google Scholar ] [ CrossRef ]
  • Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [ Google Scholar ]
  • Hartmann, J.; Heitmann, M.; Siebert, C.; Schamp, C. More than a Feeling: Accuracy and Application of Sentiment Analysis. Int. J. Res. Mark. 2023 , 40 , 75–87. [ Google Scholar ] [ CrossRef ]
  • Wang, Z.; Xie, Q.; Feng, Y.; Ding, Z.; Yang, Z.; Xia, R. Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study. arXiv 2023 , arXiv:2304.04339. [ Google Scholar ]
  • Tsakalidis, A.; Papadopoulos, S.; Voskaki, R.; Ioannidou, K.; Boididou, C.; Cristea, A.I.; Liakata, M.; Kompatsiaris, Y. Building and Evaluating Resources for Sentiment Analysis in the Greek Language. Lang. Resour. Eval. 2018 , 52 , 1021–1044. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Bilianos, D. Experiments in Text Classification: Analyzing the Sentiment of Electronic Product Reviews in Greek. J. Quant. Linguist. 2022 , 29 , 374–386. [ Google Scholar ] [ CrossRef ]
  • Markopoulos, G.; Mikros, G.; Iliadi, A.; Liontos, M. Sentiment Analysis of Hotel Reviews in Greek: A Comparison of Unigram Features. In Cultural Tourism in a Digital Era, Springer Proceedings in Business and Economics ; Katsoni, V., Ed.; Springer Science and Business Media B.V.: Berlin/Heidelberg, Germany, 2015; pp. 373–383. [ Google Scholar ]
  • Dontaki, C.; Koukaras, P.; Tjortjis, C. Sentiment Analysis on English and Greek Twitter Data Regarding Vaccinations. In Proceedings of the 14th International Conference on Information, Intelligence, Systems and Applications, IISA, Volos, Greece, 10–12 July 2023; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2023. [ Google Scholar ]
  • Charalampakis, B.; Spathis, D.; Kouslis, E.; Kermanidis, K. Detecting Irony on Greek Political Tweets: A Text Mining Approach. In Proceedings of the 16th International Conference on Engineering Applications of Neural Networks (INNS), Island, Rhodes, Greece, 25–28 September 2015; Association for Computing Machinery: New York, NY, USA, 2015. [ Google Scholar ]
  • Athanasiou, V.; Maragoudakis, M. A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages Where NLP Resources Are Not Plentiful: A Case Study for Modern Greek. Algorithms 2017 , 10 , 34. [ Google Scholar ] [ CrossRef ]
  • Katika, A.; Zoulias, E.; Koufi, V.; Malamateniou, F. Mining Greek Tweets on Long COVID Using Sentiment Analysis and Topic Modeling. In Healthcare Transformation with Informatics and Artificial Intelligence ; IOS Press BV: Amsterdam, The Netherlands, 2023; Volume 305, pp. 545–548. [ Google Scholar ]
  • Patsiouras, E.; Koroni, I.; Mademlis, I.; Pitas, I. GreekPolitics: Sentiment Analysis on Greek Politically Charged Tweets. In Proceedings of the 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1320–1324. [ Google Scholar ]
  • Alexandridis, G.; Varlamis, I.; Korovesis, K.; Caridakis, G.; Tsantilas, P. A Survey on Sentiment Analysis and Opinion Mining in Greek Social Media. Information 2021 , 12 , 331. [ Google Scholar ] [ CrossRef ]
  • Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and Their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–8 December 2013; pp. 3111–3119. [ Google Scholar ]
  • Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [ Google Scholar ]
  • Joulin, A.; Grave, E.; Bojanowski, P.; Mikolov, T. Bag of Tricks for Efficient Text Classification. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 3–7 April 2017; pp. 427–431. [ Google Scholar ]
  • Janiesch, C.; Zschech, P.; Heinrich, K. Machine Learning and Deep Learning. Electron. Mark. 2021 , 31 , 685–695. [ Google Scholar ] [ CrossRef ]
  • Yadav, A.; Vishwakarma, D.K. Sentiment Analysis Using Deep Learning Architectures: A Review. Artif. Intell. Rev. 2020 , 53 , 4335–4385. [ Google Scholar ] [ CrossRef ]
  • Vaswani, A.; Brain, G.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [ Google Scholar ]
  • Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108.
  • Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
  • Koutsikakis, J.; Chalkidis, I.; Malakasiotis, P.; Androutsopoulos, I. GREEK-BERT: The Greeks Visiting Sesame Street. In Proceedings of the 11th Hellenic Conference on Artificial Intelligence, Athens, Greece, 2–4 September 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 110–117.
  • Kalamatianos, G.; Mallis, D.; Symeonidis, S.; Arampatzis, A. Sentiment Analysis of Greek Tweets and Hashtags Using a Sentiment Lexicon. In Proceedings of the 19th Panhellenic Conference on Informatics, Athens, Greece, 1–3 October 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 63–68.
  • Kydros, D.; Argyropoulou, M.; Vrana, V. A Content and Sentiment Analysis of Greek Tweets during the Pandemic. Sustainability 2021, 13, 6150.
  • Samaras, L.; García-Barriocanal, E.; Sicilia, M.A. Sentiment Analysis of COVID-19 Cases in Greece Using Twitter Data. Expert Syst. Appl. 2023, 230, 120577.
  • Giatsoglou, M.; Vozalis, M.G.; Diamantaras, K.; Vakali, A.; Sarigiannidis, G.; Chatzisavvas, K.C. Sentiment Analysis Leveraging Emotions and Word Embeddings. Expert Syst. Appl. 2017, 69, 214–224.
  • Aivatoglou, G.; Fytili, A.; Arampatzis, G.; Zaikis, D.; Stylianou, N.; Vlahavas, I. End-to-End Aspect Extraction and Aspect-Based Sentiment Analysis Framework for Low-Resource Languages. In Lecture Notes in Networks and Systems, Proceedings of the Intelligent Systems and Applications, Amsterdam, The Netherlands, 7–8 September 2023; Springer: Cham, Switzerland, 2023; Volume 824, pp. 841–858.
  • Fragkis, N. Skroutz Shops Greek Reviews. Available online: https://www.kaggle.com/datasets/nikosfragkis/skroutz-shop-reviews-sentiment-analysis (accessed on 13 April 2024).
  • Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  • Cauteruccio, F.; Kou, Y. Investigating the Emotional Experiences in ESports Spectatorship: The Case of League of Legends. Inf. Process. Manag. 2023, 60, 103516.
  • Tamer, M.; Khamis, M.A.; Yahia, A.; Khaled, S.A.; Ashraf, A.; Gomaa, W. Arab Reactions towards Russo-Ukrainian War. EPJ Data Sci. 2023, 12, 36.

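| ML Model | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| LR | 92.73 | 92.77 | 92.73 | 92.73 | 93.52 | 93.59 | 93.52 | 93.51 |
| KNN | 81.43 | 83.11 | 81.43 | 81.22 | 80.93 | 83.22 | 80.93 | 80.55 |
| DT | 83.53 | 83.56 | 83.38 | 83.32 | 83.68 | 83.70 | 83.68 | 83.67 |
| MNB | | | | | | | | |
| SVM | 90.63 | 90.67 | 90.63 | 90.63 | 91.61 | 91.69 | 91.61 | 91.60 |
| RF | 88.65 | 88.76 | 88.82 | 88.70 | 89.32 | 89.38 | 89.32 | 89.32 |
| AdaBoost | 89.35 | 89.43 | 89.35 | 89.35 | 90.77 | 90.83 | 90.77 | 90.76 |
| SGB | 89.43 | 89.48 | 89.47 | 89.45 | 89.47 | 89.48 | 89.47 | 89.47 |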
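
The tables in this section report accuracy, precision, recall and F1 score for each model. The excerpt does not say how precision, recall and F1 were averaged over the two classes, so the following is only a minimal stdlib sketch that assumes support-weighted averaging (the idea behind scikit-learn's `average="weighted"` option). The function name `classification_metrics` is hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1, as percentages.

    Assumption: per-class scores are weighted by each class's share of y_true.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    support = Counter(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = actual / n  # labels never present in y_true get weight 0
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f

    scores = [("accuracy", accuracy), ("precision", precision),
              ("recall", recall), ("f1", f1)]
    return {name: round(100 * value, 2) for name, value in scores}


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0]))
    # {'accuracy': 83.33, 'precision': 86.67, 'recall': 83.33, 'f1': 81.48}
```

Under this weighting, recall always equals accuracy: each class's recall TP_c / n_c is multiplied by n_c / N, so the weighted sum collapses to the total number of correct predictions divided by N.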
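
| ML Model | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| LR | 92.25 | 92.28 | 92.25 | 92.25 | 93.44 | 93.45 | 93.44 | 93.44 |
| KNN | 88.34 | 88.48 | 88.34 | 88.33 | 88.33 | 88.35 | 88.33 | 88.33 |
| DT | 82.79 | 82.80 | 82.29 | 82.77 | 82.07 | 82.08 | 82.07 | 82.07 |
| MNB | 92.37 | 92.72 | 92.37 | 92.35 | 93.06 | 93.35 | 93.06 | 93.05 |
| SVM | | | | | | | | |
| RF | 88.74 | 88.88 | 88.84 | 88.92 | 89.09 | 89.10 | 89.09 | 89.09 |
| AdaBoost | 89.30 | 89.32 | 89.30 | 89.30 | 89.63 | 89.64 | 89.63 | 89.62 |
| SGB | 88.02 | 88.11 | 88.15 | 88.13 | 89.70 | 89.71 | 89.70 | 89.70 |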
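
| ML Model | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| LR | 92.75 | 92.79 | 92.75 | 92.75 | 93.44 | 93.52 | 93.44 | 93.44 |
| MNB | | | | | | | | |
| SVM | 92.65 | 92.66 | 92.65 | 92.65 | 92.83 | 92.85 | 92.83 | 92.83 |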
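
| ML Model | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| LR | 93.49 | 93.51 | 93.49 | 93.49 | | | | |
| MNB | 92.65 | 92.96 | 92.65 | 92.64 | 93.36 | 93.63 | 93.36 | 93.36 |
| SVM | | | | | 94.20 | 94.20 | 94.20 | 94.20 |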
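
| Model/Neurons | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| MLP/60 | 93.68 | 93.72 | 93.68 | 93.68 | 93.75 | 93.77 | 93.75 | 93.74 |
| MLP/70 | 93.78 | 93.80 | 93.78 | 93.78 | 94.58 | 94.59 | 94.58 | 94.58 |
| MLP/80 | 93.76 | 93.77 | 93.76 | 93.76 | 94.20 | 94.22 | 94.20 | 94.20 |
| MLP/90 | | | | | 93.90 | 93.93 | 93.90 | 93.90 |
| MLP/100 | 93.82 | 93.83 | 93.82 | 93.82 | | | | |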
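
| Model/Neurons | Training Accuracy | Training Precision | Training Recall | Training F1 Score | Testing Accuracy | Testing Precision | Testing Recall | Testing F1 Score |
|---|---|---|---|---|---|---|---|---|
| MLP/60 | 93.30 | 93.33 | 93.30 | 93.30 | 94.51 | 94.51 | 94.51 | 94.51 |
| MLP/70 | | | | | | | | |
| MLP/80 | 93.74 | 93.79 | 93.74 | 93.74 | 94.05 | 94.05 | 94.05 | 94.05 |
| MLP/90 | 93.78 | 93.81 | 93.78 | 93.78 | 94.51 | 94.51 | 94.51 | 94.51 |
| MLP/100 | 93.74 | 93.77 | 93.74 | 93.74 | 93.90 | 93.91 | 93.90 | 93.90 |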
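
| Epoch | Training Loss | Training Accuracy (%) | Testing Loss | Testing Accuracy (%) |
|---|---|---|---|---|
| 1 | 0.26 | 89.66 | 0.15 | 94.74 |
| 2 | 0.11 | 96.01 | 0.12 | 95.42 |
| 3 | 0.08 | 97.25 | 0.12 | 95.88 |
| 4 | 0.05 | 98.24 | 0.13 | 96.03 |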
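
The per-epoch table can be read as a small worked example. The sketch below only re-uses the values transcribed from that table: it finds the epoch with the lowest testing loss and the training-minus-testing accuracy gap per epoch. Treating the lowest held-out loss as a stopping point is an assumption made for illustration (a common early-stopping heuristic), not necessarily the criterion used in the study; the function names are hypothetical.

```python
# Values transcribed from the per-epoch table (epochs 1-4).
train_loss = [0.26, 0.11, 0.08, 0.05]
train_acc = [89.66, 96.01, 97.25, 98.24]
test_loss = [0.15, 0.12, 0.12, 0.13]
test_acc = [94.74, 95.42, 95.88, 96.03]


def lowest_loss_epoch(losses):
    """Return the 1-based epoch with the lowest loss; ties go to the earliest epoch."""
    return min(range(len(losses)), key=losses.__getitem__) + 1


def accuracy_gaps(train, test):
    """Training minus testing accuracy, in percentage points, for each epoch."""
    return [round(tr - te, 2) for tr, te in zip(train, test)]


if __name__ == "__main__":
    print(lowest_loss_epoch(test_loss))        # 2
    print(accuracy_gaps(train_acc, test_acc))  # [-5.08, 0.59, 1.37, 2.21]
```

On these numbers the testing loss is lowest at epoch 2 (tied with epoch 3 at the reported precision) and rises at epoch 4, while testing accuracy still edges up and the training-testing gap keeps widening, a pattern that is often read as the onset of overfitting.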

| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| GPT-3.5-turbo | 93.13 | 93.98 | 93.13 | 93.30 |
| GPT-4 | | | | |

Michailidis, P.D. A Comparative Study of Sentiment Classification Models for Greek Reviews. Big Data Cogn. Comput. 2024, 8, 107. https://doi.org/10.3390/bdcc8090107

IMAGES

  1. Sentiment Analysis On IMDB Movie Review

  2. (PDF) SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING

  3. Sentiment Analysis of IMDB Dataset using RF, KNN, and MNB

  4. GitHub

  5. GitHub

  6. Lais/Sentiment-Analysis-on-Movie-Reviews · Datasets at Hugging Face

VIDEO

  1. Implement sentiment analysis model using LSTM

  2. How to Use Sentiment Analysis

  3. 26: Sentiment Analysis

  4. Introduction To Long Short Term Memory (LSTM)

  5. Sentiment Classification with Python and Sklearn

  6. Audience vs Critics Ratings Rotten Tomatoes

COMMENTS

  1. IMDb Movie Reviews Dataset

    The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb) labeled as positive or negative. The dataset contains an even number of positive and negative reviews. Only highly polarizing reviews are considered. A negative review has a score ≤ 4 out of 10, and a positive review has a score ≥ 7 out of 10.

  2. rishimule/Sentiment-Analysis-of-Movie-Reviews

    This project aims to perform sentiment analysis on the IMDB movie review dataset. It utilizes deep learning techniques, particularly LSTM and Conv1D layers, to classify movie reviews into positive and negative sentiments. The model is built using Keras and GloVe embeddings for word representations.

  3. qh21/Sentiment-Analysis-of-IMDB-Movie-Reviews

    Explore sentiment analysis on the IMDB movie reviews dataset using Python. This Jupyter Notebook showcases text preprocessing, TF-IDF feature extraction, and model training (Multinomial Naive Bayes, Random Forest) for sentiment classification. Ideal for understanding NLP basics and applying ML to textual data.

  4. Sentiment Analysis in Action: A Case Study with Movie Reviews ...

    The chosen battleground for our sentiment analysis adventure is the IMDB dataset, a curated collection of movie reviews labeled with sentiment scores. Each review is associated with a sentiment ...

  5. Large Movie Review Dataset

    Sentiment Analysis. Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed ...

  6. Sentiment Analysis on Movie Reviews

    Classify the sentiment of sentences from the Rotten Tomatoes dataset.

  7. (PDF) Sentiment Analysis of IMDb Movie Reviews Using Traditional

    IMDb movie reviews dataset is preprocessed, cleaned, and tokenized, followed by feature extraction using Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) methods.

  8. Sentiment Analysis of IMDB Movie Reviews

    Explore and run machine learning code with Kaggle Notebooks | Using data from IMDB Dataset of 50K Movie Reviews.

  9. PDF Sentiment analysis of IMDb reviews

    The report follows the methodology shown in Fig. 1 to carry out sentiment analysis of IMDb reviews. First, the report illustrates the data and feeds it into data cleaning and preprocessing. Next, the report removes stop words and some irrelevant words from the original data; then, vectorization techniques are applied ...

  10. Sentiment Analysis of IMDb Movie Reviews: A Comparative Analysis of

    In this section, we describe the dataset used for sentiment analysis of movie reviews, followed by the operating system and application specifications under which the work was carried out. Figure 3 provides a walk-through of the experimental process. To begin with, the data are cleaned and structured by removing unwanted symbols, punctuation ...

  11. How to Prepare Movie Review Data for Sentiment Analysis (Text

    The dataset is comprised of 1,000 positive and 1,000 negative movie reviews drawn from an archive of the rec.arts.movies.reviews newsgroup hosted at IMDB. The authors refer to this dataset as the "polarity dataset".

  12. Performing Sentiment Analysis on Movie Reviews

    The dataset contains 50,000 reviews — 25,000 positive and 25,000 negative reviews. An example of a review can be seen in Fig 1, where a user gave a 10/10 rating and a written review for the Oscar-winning movie Parasite (2020). The number of stars would be a good proxy for sentiment classification. For example, we could pre-assign the following:

  13. PDF Sentiment Analysis on Movie Reviews using Recursive and Recurrent

    We use the IMDB movie review dataset provided by Maas et al. [1]. We train the word vectors on this corpus using the skip-gram architecture. Note that [1] is specifically about learning word vectors for sentiment analysis. As mentioned earlier, we intend to use standard, off-the-shelf vectors along with a novel architecture.

  14. GitHub

    IMDB Movie Reviews Sentiment Analysis using NLP in Python - sestok/IMDB-Sentiment-Analysis-NLP. ... Ensure that the IMDb Movie Reviews dataset is downloaded and stored in the data directory. Preprocess the raw text data to clean and standardize it before feature extraction.

  15. Sentiment_Analysis_Movie_Reviews.ipynb

    Notebook to train an XLNet model to perform sentiment analysis. The dataset used is a balanced collection of 50,000 IMDB movie reviews (1:1 train-test ratio) with binary labels, positive or negative, from the paper by Maas et al. (2011). The current state-of-the-art model on this dataset is XLNet by Yang et al. (2019), which has an accuracy of 96.2%. We get an accuracy of 92.2% due to the ...

  16. PDF Deep learning for sentiment analysis of movie reviews

    The labeled data set consists of 50,000 IMDB movie reviews, specially selected for sentiment analysis. The sentiment of reviews is binary: an IMDB rating < 5 results in a sentiment score of 0, and a rating ≥ 7 results in a sentiment score of 1. No individual movie has more than 30 reviews. The 25,000-review labeled training set does not include ...

  17. Sentiment Analysis

    Maybe you're interested in knowing whether movie reviews are positive or negative; companies use sentiment analysis in a variety of settings, particularly for marketing purposes. ... We'll be using the IMDB movie dataset, which has 25,000 labelled reviews for training and 25,000 reviews for testing.

  18. Sentiment Analysis of Movie Reviews

    In this project, I performed sentiment analysis of movie reviews using the IMDb reviews from the UCI Machine Learning Repository's Sentiment Labelled Sentences Data Set. For pt. 1 (Basics), here are the Python notebook and the website.

  19. Sentiment Classification on the Large Movie Review Dataset

    The dataset contains movie reviews along with their associated binary sentiment polarity labels. The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets. The overall distribution of labels is balanced (25k pos and 25k neg). 50,000 unlabeled documents for unsupervised learning are included, but they won't be used.

  20. Sentiment Analysis of IMDB Movie Reviews using Convolutional Neural

    In this project, I will use IMDB movie reviews. This dataset contains 50,000 movie reviews from IMDB, labeled by sentiment (positive/negative). The dataset can be loaded and split into training and test sets as follows.

  21. Taha533/Sentiment-Analysis-of-IMDB-Movie-Reviews

    This project focuses on sentiment analysis of movie reviews using the IMDb dataset. The dataset consists of 50,000 movie reviews labeled as positive or negative. The main goal of this project is to develop models that can accurately classify the sentiment of movie reviews.

  22. A Comparative Study of Sentiment Classification Models for Greek Reviews

    Specifically, the dataset selected for this experimental sentiment analysis consists of the Greek shop reviews from the e-commerce website Skroutz, which is publicly available on Kaggle. This dataset contains 6552 reviews (3276 positive and 3276 negative) about a diverse variety of products provided by the e-shop Skroutz.

  23. Sentiment Analysis on IMDB Movie reviews

    Explore and run machine learning code with Kaggle Notebooks | Using data from IMDB Dataset of 50K Movie Reviews.
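
Items 1 and 16 in the list above describe how the labeled IMDb reviews were selected: a rating ≤ 4 out of 10 counts as negative, a rating ≥ 7 as positive, the middle ratings are left out, and no movie contributes more than 30 reviews. A minimal sketch of those two rules follows. The `(movie_id, rating, text)` record layout, the function names, and keeping the first 30 reviews seen for each movie are assumptions for illustration; the source does not say which reviews were kept.

```python
from collections import defaultdict

MAX_REVIEWS_PER_MOVIE = 30  # cap stated in the dataset description


def polarity_label(rating):
    """Map a 1-10 rating to 0 (negative), 1 (positive) or None (excluded)."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating out of range: {rating}")
    if rating <= 4:
        return 0
    if rating >= 7:
        return 1
    return None  # ratings 5 and 6 are not polar enough and are dropped


def build_polar_subset(reviews, max_per_movie=MAX_REVIEWS_PER_MOVIE):
    """Keep polar reviews, at most `max_per_movie` per movie, in input order.

    `reviews` is an iterable of (movie_id, rating, text) tuples, a hypothetical
    record layout; the first `max_per_movie` polar reviews of a movie are kept.
    """
    kept = []
    per_movie = defaultdict(int)
    for movie_id, rating, text in reviews:
        label = polarity_label(rating)
        if label is None or per_movie[movie_id] >= max_per_movie:
            continue
        per_movie[movie_id] += 1
        kept.append((text, label))
    return kept
```

Balancing the result to 25,000 positive and 25,000 negative reviews, as the Large Movie Review Dataset does, would be a separate step.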
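
Items 9, 10 and 14 in the list above describe a cleaning step that removes symbols, punctuation and stop words before vectorization. The raw reviews in the Large Movie Review Dataset also contain HTML line breaks such as `<br />`, so the minimal sketch below strips tags first. It assumes English text, and `STOP_WORDS` is a deliberately tiny, hypothetical list rather than any library's list.

```python
import re

# Tiny, hypothetical stop-word list for illustration only. Negations such as
# "not" are intentionally left out because removing them can erase sentiment.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "it", "of", "this", "to", "was"})

TAG_RE = re.compile(r"<[^>]+>")         # HTML remnants such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # punctuation, digits and other symbols


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, strip tags and symbols, split on whitespace, drop stop words."""
    text = TAG_RE.sub(" ", text.lower())
    text = NON_ALPHA_RE.sub(" ", text)
    return [token for token in text.split() if token not in stop_words]


if __name__ == "__main__":
    print(preprocess("The plot was <br />NOT good, 2/10!"))  # ['plot', 'not', 'good']
```

The `[^a-z\s]` class also deletes apostrophes (so "didn't" becomes "didn" and "t") and every non-Latin letter, so the Greek Skroutz reviews of item 22 would need a different character class.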
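
Items 3 and 7 in the list above mention Bag-of-Words and TF-IDF feature extraction. The sketch below computes TF-IDF vectors for already tokenized documents in plain Python; the bag-of-words counts are simply the `Counter` of each document. It is intended to follow the defaults documented for scikit-learn's `TfidfVectorizer` (raw counts as term frequency, smoothed IDF, L2-normalized rows), but it is an illustration, not that library's implementation, and the function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """TF-IDF vectors for tokenized documents, one {term: weight} dict per doc.

    Term frequency is the raw count, idf(t) = ln((1 + n) / (1 + df(t))) + 1,
    and each document vector is scaled to unit L2 norm.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)  # bag-of-words representation of the document
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors, idf


if __name__ == "__main__":
    vecs, idf = tfidf([["good", "movie"], ["bad", "movie"]])
    print(idf["movie"])  # 1.0
```

A term that occurs in every document (here `movie`) gets the minimum IDF of 1, so it still contributes but is down-weighted relative to rarer, more discriminative words such as `good`.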
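
Multinomial Naive Bayes (MNB) appears in item 3 of the list above and in the MNB rows of the model tables. As a minimal sketch, here it is written from scratch over token counts with additive (Laplace) smoothing; the class name and the `alpha=1.0` default are illustrative choices, not settings taken from the source. Each class is scored by its log prior plus the summed log probabilities of the review's tokens, and tokens never seen in training are ignored.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(token_counts[c].values())
            # P(t | c) = (count(t, c) + alpha) / (total tokens in c + alpha * |V|)
            self.log_likelihood[c] = {
                t: math.log((token_counts[c][t] + self.alpha) / (total + self.alpha * v))
                for t in self.vocab
            }
        return self

    def predict_one(self, doc):
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # unseen tokens carry no evidence
                    score += self.log_likelihood[c][t]
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(doc) for doc in docs]


if __name__ == "__main__":
    train = [["great", "fun"], ["great", "acting"], ["boring", "awful"], ["awful", "plot"]]
    model = MultinomialNaiveBayes().fit(train, [1, 1, 0, 0])
    print(model.predict([["great", "fun"], ["awful", "boring"]]))  # [1, 0]
```

Working in log space avoids numeric underflow on long reviews. scikit-learn's `MultinomialNB` implements the same model, and its documentation notes that fractional counts such as TF-IDF weights may also work as input.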