A Comparison of K-Means and Mean Shift Algorithms

Mehak Nigar Shumaila

doi:10.11648/j.ijtam.20210705.12

Peer-Reviewed

Published in International Journal of Theoretical and Applied Mathematics (Volume 7, Issue 5)

Received: 25 August 2021; Accepted: 30 September 2021; Published: 27 November 2021

Abstract

Clustering, also known as cluster analysis, is an unsupervised learning problem: it requires no human-provided labels. It is widely used in data analysis to identify interesting, useful, or desirable patterns in data. Clustering divides the data into groups of similar objects based on their properties, and each group is called a cluster. Objects within a cluster are similar to one another and differ from objects in other clusters. Clustering is important in many areas of data analysis because it reveals the intrinsic grouping of objects in a batch of unlabeled raw data based on their attributes. Cluster analysis has no textbook criterion for what counts as a good clustering, because the task is specific to each user, who may need it for a variety of reasons. Consequently, there is no single best clustering algorithm; the right choice depends on the user's scenario and needs. This paper compares two clustering algorithms: k-means and mean shift. The algorithms are compared on the following criteria: time complexity, training, prediction performance, and clustering accuracy.
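As a rough illustration of the kind of comparison described above, the following sketch runs a minimal k-means and a minimal flat-kernel mean shift on a handful of toy points and reports cluster count, accuracy, and wall-clock time. It is not the paper's implementation: the toy data, the bandwidth of 2.0, the first-k seeding rule, and the helper names kmeans, mean_shift, and accuracy are all assumptions made for illustration. Note the key difference it exposes: k-means must be told the number of clusters, while mean shift infers it from the bandwidth.

```python
import math
import time
from itertools import permutations


def kmeans(points, k, iters=100):
    """Lloyd's algorithm; seeding with the first k points is an assumption."""
    centroids = [points[i] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift: each point climbs to the mean of its neighbours."""
    modes = []
    for p in points:
        x = p
        for _ in range(iters):
            neighbours = [q for q in points if math.dist(x, q) <= bandwidth]
            if not neighbours:  # defensive guard against rounding at the boundary
                break
            shifted = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            if math.dist(x, shifted) < tol:
                break
            x = shifted
        modes.append(x)
    # Merge modes that ended up within one bandwidth of an existing center.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers


def accuracy(labels, truth):
    """Best agreement with the true classes over all relabelings of the clusters."""
    ids = sorted(set(labels))
    classes = sorted(set(truth))
    # Pad with None so that surplus clusters simply match no class.
    targets = classes + [None] * max(0, len(ids) - len(classes))
    best = 0.0
    for perm in permutations(targets, len(ids)):
        mapping = dict(zip(ids, perm))
        hits = sum(mapping[lab] == t for lab, t in zip(labels, truth))
        best = max(best, hits / len(truth))
    return best


# Toy data (assumed): two well-separated groups with known class labels.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9),
          (0.9, 1.1), (7.9, 8.1), (1.1, 1.2), (8.1, 8.2)]
truth = [0, 1, 0, 1, 0, 1, 0, 1]

for name, run in [("k-means", lambda: kmeans(points, k=2)),
                  ("mean shift", lambda: mean_shift(points, bandwidth=2.0))]:
    start = time.perf_counter()
    labels, centers = run()
    elapsed = time.perf_counter() - start
    print(f"{name}: clusters={len(centers)} "
          f"accuracy={accuracy(labels, truth):.2f} time={elapsed:.6f}s")
```

On data this small the timings are dominated by noise; a meaningful timing comparison would need a larger data set and repeated runs.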

Page(s): 76-84
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

K-Mean, Mean-Shift, Performance, Accuracy

Cite This Article

APA Style

Mehak Nigar Shumaila. (2021). A Comparison of K-Means and Mean Shift Algorithms. International Journal of Theoretical and Applied Mathematics, 7(5), 76-84. https://doi.org/10.11648/j.ijtam.20210705.12

ACS Style

Mehak Nigar Shumaila. A Comparison of K-Means and Mean Shift Algorithms. Int. J. Theor. Appl. Math. 2021, 7(5), 76-84. doi: 10.11648/j.ijtam.20210705.12

AMA Style

Mehak Nigar Shumaila. A Comparison of K-Means and Mean Shift Algorithms. Int J Theor Appl Math. 2021;7(5):76-84. doi: 10.11648/j.ijtam.20210705.12

@article{10.11648/j.ijtam.20210705.12,
  author = {Mehak Nigar Shumaila},
  title = {A Comparison of K-Means and Mean Shift Algorithms},
  journal = {International Journal of Theoretical and Applied Mathematics},
  volume = {7},
  number = {5},
  pages = {76-84},
  doi = {10.11648/j.ijtam.20210705.12},
  url = {https://doi.org/10.11648/j.ijtam.20210705.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijtam.20210705.12},
  year = {2021}
}

TY - JOUR
T1 - A Comparison of K-Means and Mean Shift Algorithms
AU - Mehak Nigar Shumaila
Y1 - 2021/11/27
PY - 2021
N1 - https://doi.org/10.11648/j.ijtam.20210705.12
DO - 10.11648/j.ijtam.20210705.12
T2 - International Journal of Theoretical and Applied Mathematics
JF - International Journal of Theoretical and Applied Mathematics
JO - International Journal of Theoretical and Applied Mathematics
SP - 76
EP - 84
PB - Science Publishing Group
SN - 2575-5080
UR - https://doi.org/10.11648/j.ijtam.20210705.12
VL - 7
IS - 5
ER -

Author Information

Mehak Nigar Shumaila

Department of Information Technology, Technische Hochschule Ostwestfalen-Lippe, North Rhine-Westphalia, Germany
