Development of a Sentiment Analysis Model for Evaluating Open Source Reviews on CCleaner

Thesis title: Development of a Sentiment Analysis Model for Evaluating Open Source Reviews on CCleaner
Author: Flores Trochez, Luis Diego
Thesis type: Diploma thesis
Supervisor: Ziaei Nafchi, Majid
Opponents: Sudzina, František
Thesis language: English
Abstract:
This study develops a machine learning-based sentiment analysis model to automate the evaluation of user reviews for CCleaner and two of its competitors. By collecting over 3,000 reviews from the Google Play Store and applying preprocessing, classification, and clustering techniques, the study compares the performance of Logistic Regression, Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) models for sentiment classification. Results indicate that LSTM consistently outperforms traditional models across accuracy, precision, recall, and F1-score metrics. In addition, K-Means clustering reveals five dominant feedback themes, aiding product teams in pinpointing areas of user concern and satisfaction. The findings show the practical value of automated sentiment analysis in enhancing user experience, reducing manual effort, and informing decision makers of current user concerns.
Keywords: CCleaner; LSTM; Logistic Regression; Natural Language Processing (NLP); App Reviews; K-Means Clustering; Topic Modeling; Sentiment Analysis; Machine Learning; Support Vector Machine; Deep Learning

Information about study

Study programme: Information Systems Management/Data and Business
Type of study programme: Master's study programme
Assigned degree: Ing.
Institutions assigning academic degree: Vysoká škola ekonomická v Praze
Faculty: Faculty of Informatics and Statistics
Department: Department of Systems Analysis

Information on submission and defense

Date of assignment: 2. 12. 2024
Date of submission: 22. 7. 2025
Date of defense: 2025

Files for download

The files will be available after the defense of the thesis.
