Multiple Means Based on Multiple Clustering (MMMC) Imputation

Raed Rasheed; Wesam Ashour

doi:10.11648/j.ijdst.20220803.11

Peer-Reviewed

Published in International Journal on Data Science and Technology (Volume 8, Issue 3)

Received: 18 August 2022 Accepted: 13 September 2022 Published: 11 October 2022

Abstract

In recent years, data science has become one of the most significant factors in both research and business. Real-world datasets typically contain missing values, which can present a challenge. A variety of methods can be used to deal with missing values; the most commonly used imputation methods include mean imputation, median imputation, and KNN imputation. The most significant drawback of the mean and mode methods is that, when many values are missing, all of them are imputed with the same value. This changes the shape of the distribution and reduces the variance relative to its value before imputation. The more values that are missing, the more the variance shrinks. To address this shortcoming of existing imputation methods, we have developed a new imputation method, Multiple Means Based on Multiple Clustering (MMMC). When a dataset variable has missing values, MMMC imputation substitutes those values with several separate means rather than a single mean. The means are obtained by applying multiple clustering to the other variables in the dataset. The findings demonstrate that MMMC is superior to the other imputation strategies in a number of respects.
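The contrast between a single mean and several cluster-wise means can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm, the number of clusters, or how multiple clusterings are combined, so the tiny 1-D k-means, the function names (`kmeans_1d`, `mean_impute`, `cluster_mean_impute`), the choice of `k=2`, and the toy data below are all assumptions made purely for illustration.

```python
from statistics import mean, pvariance


def kmeans_1d(values, k, iters=20):
    """Tiny deterministic 1-D k-means (illustrative stand-in for any clustering)."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly across the observed range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def mean_impute(target):
    """Replace every missing entry (None) with the single overall mean."""
    overall = mean(v for v in target if v is not None)
    return [overall if v is None else v for v in target]


def cluster_mean_impute(target, helper, k=2):
    """Replace each missing entry with the mean of the observed target values
    in its cluster; clusters are formed on a fully observed helper variable."""
    labels = kmeans_1d(helper, k)
    overall = mean(v for v in target if v is not None)
    filled = []
    for v, lab in zip(target, labels):
        if v is not None:
            filled.append(v)
            continue
        peers = [t for t, pl in zip(target, labels) if pl == lab and t is not None]
        # Fall back to the overall mean if the cluster has no observed values.
        filled.append(mean(peers) if peers else overall)
    return filled


# Hypothetical toy data: `helper` is complete, `target` has missing entries.
helper = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
target = [10.0, None, 11.0, None, 50.0, None, 49.0, None]

single = mean_impute(target)
multi = cluster_mean_impute(target, helper, k=2)
print("variance, single mean:  ", pvariance(single))
print("variance, cluster means:", pvariance(multi))
```

On this toy data the single-mean version places every imputed value at the center of the distribution, whereas the cluster-wise version keeps the imputed values near their own group, so the variance after imputation stays much closer to that of the observed values.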

Published in	International Journal on Data Science and Technology (Volume 8, Issue 3)
DOI	10.11648/j.ijdst.20220803.11
Page(s)	48-54
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Preprocessing, Missing Data, Data Imputation, Clustering

Cite This Article

APA Style

Raed Rasheed, Wesam Ashour. (2022). Multiple Means Based on Multiple Clustering (MMMC) Imputation. International Journal on Data Science and Technology, 8(3), 48-54. https://doi.org/10.11648/j.ijdst.20220803.11

ACS Style

Raed Rasheed; Wesam Ashour. Multiple Means Based on Multiple Clustering (MMMC) Imputation. Int. J. Data Sci. Technol. 2022, 8(3), 48-54. doi: 10.11648/j.ijdst.20220803.11

AMA Style

Raed Rasheed, Wesam Ashour. Multiple Means Based on Multiple Clustering (MMMC) Imputation. Int J Data Sci Technol. 2022;8(3):48-54. doi: 10.11648/j.ijdst.20220803.11

@article{10.11648/j.ijdst.20220803.11,
  author = {Raed Rasheed and Wesam Ashour},
  title = {Multiple Means Based on Multiple Clustering (MMMC) Imputation},
  journal = {International Journal on Data Science and Technology},
  volume = {8},
  number = {3},
  pages = {48-54},
  doi = {10.11648/j.ijdst.20220803.11},
  url = {https://doi.org/10.11648/j.ijdst.20220803.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20220803.11},
  abstract = {In recent years, data science has become one of the most significant factors in both research and business. Real-world datasets typically contain missing values, which can present a challenge. A variety of methods can be used to deal with missing values; the most commonly used imputation methods include mean imputation, median imputation, and KNN imputation. The most significant drawback of the mean and mode methods is that, when many values are missing, all of them are imputed with the same value. This changes the shape of the distribution and reduces the variance relative to its value before imputation. The more values that are missing, the more the variance shrinks. To address this shortcoming of existing imputation methods, we have developed a new imputation method, Multiple Means Based on Multiple Clustering (MMMC). When a dataset variable has missing values, MMMC imputation substitutes those values with several separate means rather than a single mean. The means are obtained by applying multiple clustering to the other variables in the dataset. The findings demonstrate that MMMC is superior to the other imputation strategies in a number of respects.},
  year = {2022}
}

TY - JOUR
T1 - Multiple Means Based on Multiple Clustering (MMMC) Imputation
AU - Raed Rasheed
AU - Wesam Ashour
Y1 - 2022/10/11
PY - 2022
N1 - https://doi.org/10.11648/j.ijdst.20220803.11
DO - 10.11648/j.ijdst.20220803.11
T2 - International Journal on Data Science and Technology
JF - International Journal on Data Science and Technology
JO - International Journal on Data Science and Technology
SP - 48
EP - 54
PB - Science Publishing Group
SN - 2472-2235
UR - https://doi.org/10.11648/j.ijdst.20220803.11
AB - In recent years, data science has become one of the most significant factors in both research and business. Real-world datasets typically contain missing values, which can present a challenge. A variety of methods can be used to deal with missing values; the most commonly used imputation methods include mean imputation, median imputation, and KNN imputation. The most significant drawback of the mean and mode methods is that, when many values are missing, all of them are imputed with the same value. This changes the shape of the distribution and reduces the variance relative to its value before imputation. The more values that are missing, the more the variance shrinks. To address this shortcoming of existing imputation methods, we have developed a new imputation method, Multiple Means Based on Multiple Clustering (MMMC). When a dataset variable has missing values, MMMC imputation substitutes those values with several separate means rather than a single mean. The means are obtained by applying multiple clustering to the other variables in the dataset. The findings demonstrate that MMMC is superior to the other imputation strategies in a number of respects.
VL - 8
IS - 3
ER -

Author Information

Raed Rasheed

Faculty of Engineering, Islamic University of Gaza, Gaza, Palestine

Wesam Ashour

Faculty of Engineering, Islamic University of Gaza, Gaza, Palestine
