Peer-Reviewed

Social Media Text Data Visualization Modeling: A Timely Topic Score Technique

Received: 7 April 2019     Accepted: 5 June 2019     Published: 26 July 2019
Abstract

With the rapid growth of large-scale text data from Internet sources such as Twitter, social media platforms have become increasingly popular sources for information extraction. The extracted text is converted to numerical form through a series of data transformations and then analyzed with text analytics models to support decision-making. Among these models, a particularly common one is based on Latent Dirichlet Allocation (LDA), a topic modeling method in which topics are clusters of words in the documents, associated with fitted multivariate statistical distributions. However, these models are often poor estimators of topic proportions. This paper therefore proposes a timely topic score technique for social media text data visualization, based on a point system derived from topic models, to support text signaling. The importance score system is intended to mitigate this weakness of topic models by taking the topic proportion outputs and assigning importance points that reveal topic trends in the text. The technique then generates visualizations that show topic trends over the studied time period and thereby facilitate decision-making. Finally, the paper studies two real-life cases drawn from Twitter text and illustrates the efficiency of the methodology.
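The abstract does not spell out the point scheme, so the following is only a minimal sketch of how a rank-based score over topic proportions might be aggregated per time period. The 3/2/1 point values, the function name `topic_scores_by_period`, and the example proportions are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical point scheme (an assumption, not taken from the paper):
# the top-ranked topic in a document earns 3 points, the second 2, the third 1.
RANK_POINTS = (3, 2, 1)


def topic_scores_by_period(docs):
    """Aggregate rank-based importance points per (period, topic).

    `docs` is an iterable of (period, proportions) pairs, where `proportions`
    is a list of per-topic proportions for one document, for example the
    output of a fitted LDA model.
    """
    scores = defaultdict(lambda: defaultdict(int))
    for period, proportions in docs:
        # Rank topic indices from largest to smallest proportion.
        ranked = sorted(
            range(len(proportions)),
            key=lambda k: proportions[k],
            reverse=True,
        )
        # Only as many topics as there are point values receive points.
        for points, topic in zip(RANK_POINTS, ranked):
            scores[period][topic] += points
    return {period: dict(by_topic) for period, by_topic in scores.items()}


# Made-up topic proportions for four tweets over two days (illustration only).
example_docs = [
    ("2019-01-01", [0.70, 0.20, 0.10]),
    ("2019-01-01", [0.50, 0.10, 0.40]),
    ("2019-01-02", [0.10, 0.60, 0.30]),
    ("2019-01-02", [0.20, 0.50, 0.30]),
]

trend = topic_scores_by_period(example_docs)
for period in sorted(trend):
    print(period, trend[period])
```

Scoring by rank rather than by the raw proportions is one way to reduce sensitivity to poorly estimated proportion values; the per-period totals could then be plotted as one line per topic to show trends over time.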

Published in American Journal of Management Science and Engineering (Volume 4, Issue 3)
DOI 10.11648/j.ajmse.20190403.12
Page(s) 49-55
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2019. Published by Science Publishing Group

Keywords

Text Analytics, Natural Language Processing, Cyber Security, Signaling, Pattern Detection, Social Media

Cite This Article
  • APA Style

    Zhenhuan Sui. (2019). Social Media Text Data Visualization Modeling: A Timely Topic Score Technique. American Journal of Management Science and Engineering, 4(3), 49-55. https://doi.org/10.11648/j.ajmse.20190403.12


    ACS Style

    Zhenhuan Sui. Social Media Text Data Visualization Modeling: A Timely Topic Score Technique. Am. J. Manag. Sci. Eng. 2019, 4(3), 49-55. doi: 10.11648/j.ajmse.20190403.12


    AMA Style

    Zhenhuan Sui. Social Media Text Data Visualization Modeling: A Timely Topic Score Technique. Am J Manag Sci Eng. 2019;4(3):49-55. doi: 10.11648/j.ajmse.20190403.12


  • BibTeX

    @article{10.11648/j.ajmse.20190403.12,
      author = {Zhenhuan Sui},
      title = {Social Media Text Data Visualization Modeling: A Timely Topic Score Technique},
      journal = {American Journal of Management Science and Engineering},
      volume = {4},
      number = {3},
      pages = {49-55},
      doi = {10.11648/j.ajmse.20190403.12},
      url = {https://doi.org/10.11648/j.ajmse.20190403.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmse.20190403.12},
      year = {2019}
    }
    


  • RIS

    TY  - JOUR
    T1  - Social Media Text Data Visualization Modeling: A Timely Topic Score Technique
    AU  - Zhenhuan Sui
    Y1  - 2019/07/26
    PY  - 2019
    N1  - https://doi.org/10.11648/j.ajmse.20190403.12
    DO  - 10.11648/j.ajmse.20190403.12
    T2  - American Journal of Management Science and Engineering
    JF  - American Journal of Management Science and Engineering
    JO  - American Journal of Management Science and Engineering
    SP  - 49
    EP  - 55
    PB  - Science Publishing Group
    SN  - 2575-1379
    UR  - https://doi.org/10.11648/j.ajmse.20190403.12
    VL  - 4
    IS  - 3
    ER  - 


Author Information
  • Zhenhuan Sui, Department of Integrated Systems Engineering, The Ohio State University, Columbus, Ohio, USA
