American Journal of Data Mining and Knowledge Discovery

Review Article

Machine Learning for Text Classification on Twitter: A Literature Review

This literature review examines the application of machine learning (ML) techniques to text classification on Twitter. Social media platforms such as Twitter generate an immense volume of data, which creates a need for automated methods that can extract useful information from it. ML, which learns patterns and relationships from large datasets, has attracted significant attention in this context, particularly for sentiment analysis, topic modeling, and spam detection, tasks that play a crucial role in social media analysis. The review first discusses this background and the aim of applying ML to text classification on Twitter. It then surveys the methods used in studies that classify Twitter text, covering the preprocessing steps that prepare tweets for classification and feature extraction methods such as bag-of-words, n-grams, and word embeddings. Subsequently, it presents the results reported by these studies and the conclusions drawn from them. It discusses the performance metrics used to evaluate ML models (accuracy, precision, recall, and F1-score) and examines how performance varies across classification tasks, providing insight into the strengths and limitations of the approaches used.
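
As a rough illustration of the preprocessing, feature extraction, and evaluation steps named in the abstract, the sketch below builds a bag of n-grams from a single tweet and computes accuracy, precision, recall, and F1-score for a binary classifier. It uses only the Python standard library. The function names, the regex-based cleaning rules, the example tweet, and the toy labels are all assumptions made for illustration; they are not taken from any of the reviewed studies.

```python
import re
from collections import Counter


def preprocess(tweet):
    """Lowercase a tweet, drop URLs and @mentions, keep hashtag words, tokenize."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    text = text.replace("#", " ")              # keep the word behind a hashtag
    return re.findall(r"[a-z0-9']+", text)     # simple word tokenizer


def ngram_counts(tokens, n_max=2):
    """Bag of n-grams: count every n-gram of length 1..n_max in the token list."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical tweet and toy labels, for demonstration only.
tokens = preprocess("Loving the new phone! https://t.co/xyz @shop #happy")
features = ngram_counts(tokens, n_max=2)
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])

print(tokens)
print(sorted(features))
print(scores)
```

In practice, the sparse n-gram counts would be mapped to a fixed vocabulary and passed to a classifier. Word embeddings replace these counts with dense vectors and are not shown here.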

Keywords: Machine Learning, Text Classification, Twitter Data, NLP

APA Style

Alsurori, M., Enan, A., Alwan, R., Algumaei, W., Alturki, S., et al. (2023). Machine Learning for Text Classification on Twitter: A Literature Review. American Journal of Data Mining and Knowledge Discovery, 8(1), 11-17. https://doi.org/10.11648/j.ajdmkd.20230801.12

Copyright © 2023 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
