Peer-Reviewed

Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis

Received: 5 July 2015     Accepted: 17 July 2015     Published: 28 July 2015
Abstract

For the last two decades, clustering has been a well-recognized area of data mining research. Data clustering plays a major role in pattern recognition, signal processing, bioinformatics, and artificial intelligence. Clustering is an unsupervised learning technique that groups objects by similarity, so that objects in the same group are similar and objects in different groups are dissimilar. This paper analyzes three techniques used in data clustering, K-means, principal component analysis (PCA), and independent component analysis (ICA), on real and simulated data. Recent developments have found a rather unexpected application of ICA theory in data clustering, outlier detection, and multivariate data visualization. Accurate identification of clusters plays an important role in statistical analysis. In this paper we explore the connection among these three techniques for identifying clusters in multivariate data and develop a new method that applies K-means to the output of PCA or ICA; the results show that ICA-based clustering performs better than the others.
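The pipeline described in the abstract (reduce the data first, then run K-means on the reduced scores) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the PCA variant, keeps a single component found by power iteration, and runs on two synthetic Gaussian groups invented for the example. The function names `center`, `first_principal_axis`, and `kmeans_1d` are hypothetical.

```python
import random

def center(rows):
    """Subtract the column means so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def first_principal_axis(centered, iterations=200):
    """Approximate the leading eigenvector of the sample covariance matrix
    by power iteration. Assumes the start vector is not orthogonal to it."""
    n, d = len(centered), len(centered[0])
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar scores, with a deterministic start:
    initial centers are spread evenly over the sorted values."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center by squared distance.
        labels = [min(range(k), key=lambda c: (x - centers[c]) ** 2)
                  for x in values]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels, centers

# Synthetic data (an assumption for this sketch): two 3-variable Gaussian groups.
random.seed(0)
group_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(50)]
group_b = [[random.gauss(6, 1) for _ in range(3)] for _ in range(50)]

centered = center(group_a + group_b)
axis = first_principal_axis(centered)
# Project every observation onto the first principal axis.
scores = [sum(a * x for a, x in zip(axis, row)) for row in centered]
labels, centers = kmeans_1d(scores, k=2)
print(labels[:5], labels[-5:])
```

An ICA-based variant would replace the projection step with estimated independent components (for example from a FastICA implementation) and cluster those scores instead; that step is not shown here.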

Published in American Journal of Theoretical and Applied Statistics (Volume 4, Issue 5)
DOI 10.11648/j.ajtas.20150405.11
Page(s) 317-321
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2015. Published by Science Publishing Group

Keywords

Clustering, K-means, PCA, ICA

Cite This Article
  • APA Style

    Md. Shamim Reza, Sabba Ruhi. (2015). Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. American Journal of Theoretical and Applied Statistics, 4(5), 317-321. https://doi.org/10.11648/j.ajtas.20150405.11

    ACS Style

    Md. Shamim Reza; Sabba Ruhi. Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. Am. J. Theor. Appl. Stat. 2015, 4(5), 317-321. doi: 10.11648/j.ajtas.20150405.11

    AMA Style

    Md. Shamim Reza, Sabba Ruhi. Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis. Am J Theor Appl Stat. 2015;4(5):317-321. doi: 10.11648/j.ajtas.20150405.11

  • @article{10.11648/j.ajtas.20150405.11,
      author = {Md. Shamim Reza and Sabba Ruhi},
      title = {Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis},
      journal = {American Journal of Theoretical and Applied Statistics},
      volume = {4},
      number = {5},
      pages = {317-321},
      doi = {10.11648/j.ajtas.20150405.11},
      url = {https://doi.org/10.11648/j.ajtas.20150405.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20150405.11},
     year = {2015}
    }
    

  • TY  - JOUR
    T1  - Study of Multivariate Data Clustering Based on K-Means and Independent Component Analysis
    AU  - Md. Shamim Reza
    AU  - Sabba Ruhi
    Y1  - 2015/07/28
    PY  - 2015
    N1  - https://doi.org/10.11648/j.ajtas.20150405.11
    DO  - 10.11648/j.ajtas.20150405.11
    T2  - American Journal of Theoretical and Applied Statistics
    JF  - American Journal of Theoretical and Applied Statistics
    JO  - American Journal of Theoretical and Applied Statistics
    SP  - 317
    EP  - 321
    PB  - Science Publishing Group
    SN  - 2326-9006
    UR  - https://doi.org/10.11648/j.ajtas.20150405.11
    VL  - 4
    IS  - 5
    ER  - 

Author Information
  • Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh

  • Department of Mathematics, Pabna University of Science & Technology, Pabna, Bangladesh
