Peer-Reviewed

Multiple Means Based on Multiple Clustering (MMMC) Imputation

Received: 18 August 2022    Accepted: 13 September 2022    Published: 11 October 2022
Abstract

In recent years, data science has become one of the most significant factors in both research and business. Real-world datasets typically contain missing values, which can be a challenge for analysis. A variety of methods exist for dealing with missing values; the most commonly used imputation methods are mean imputation, median imputation, and KNN imputation. The main drawback of single-value methods such as mean, median, or mode imputation is that every missing entry in a variable receives the same value. When many values are missing, this changes the shape of the distribution and makes the variance after imputation smaller than it was before. The more values are missing, the more the variance shrinks. To address this shortcoming of existing methods, we developed a new imputation method, Multiple Means based on Multiple Clustering (MMMC). When a dataset variable has missing values, MMMC imputation replaces them with several separate means rather than a single mean. These means are obtained by applying multiple clusterings to the other variables in the dataset. The results show that MMMC outperforms the other imputation strategies in several respects.
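The variance shrinkage described in the abstract, and the idea of filling gaps with cluster-wise means, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which clustering algorithm MMMC uses or how results from several clusterings are combined. The sketch assumes a tiny deterministic k-means on the other (fully observed) variables, run for k = 2 and k = 3, and assumes that the cluster means from the different runs are simply averaged. The function names `mean_impute`, `kmeans_labels`, and `cluster_means_impute`, and the toy data, are hypothetical.

```python
from statistics import mean, pvariance


def mean_impute(values):
    """Replace every None with the single mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def kmeans_labels(points, k, iters=20):
    """Tiny Lloyd-style k-means (k >= 2) with a deterministic start:
    initial centroids are evenly spaced picks from the sorted points."""
    ordered = sorted(points)
    centroids = [ordered[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


def cluster_means_impute(rows, target, ks=(2, 3)):
    """Fill a missing rows[i][target] with the average of its cluster means,
    one cluster mean per clustering in ks. Averaging is an assumption of this
    sketch, as is the requirement that all other columns are fully observed."""
    features = [tuple(v for j, v in enumerate(r) if j != target) for r in rows]
    overall = mean(r[target] for r in rows if r[target] is not None)
    estimates = [[] for _ in rows]
    for k in ks:
        labels = kmeans_labels(features, k)
        for c in range(k):
            seen = [r[target] for r, lab in zip(rows, labels)
                    if lab == c and r[target] is not None]
            # Fall back to the overall mean if the cluster has no observed value.
            cluster_mean = mean(seen) if seen else overall
            for i, lab in enumerate(labels):
                if lab == c and rows[i][target] is None:
                    estimates[i].append(cluster_mean)
    return [mean(estimates[i]) if r[target] is None else r[target]
            for i, r in enumerate(rows)]


# Toy data (hypothetical): column 0 is fully observed, column 1 has two gaps.
rows = [(1.0, 10.0), (2.0, 11.0), (1.5, None), (2.5, 9.0),
        (9.0, 50.0), (10.5, None), (10.0, 52.0), (11.0, 48.0)]
y = [r[1] for r in rows]
y_observed = [v for v in y if v is not None]
y_mean = mean_impute(y)
y_cluster = cluster_means_impute(rows, target=1)

print("variance, observed only:  ", pvariance(y_observed))
print("variance, mean imputation:", pvariance(y_mean))
print("variance, cluster means:  ", pvariance(y_cluster))
```

On this toy data, single-mean imputation places both gaps at the overall mean and lowers the variance, while the cluster-based fill gives each gap the mean of its own group and keeps the variance close to that of the observed values. Real data with overlapping groups would behave less cleanly.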

Published in: International Journal on Data Science and Technology (Volume 8, Issue 3)
DOI: 10.11648/j.ijdst.20220803.11
Pages: 48-54
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Preprocessing, Missing Data, Data Imputation, Clustering

Cite This Article
  • APA Style

    Raed Rasheed, Wesam Ashour. (2022). Multiple Means Based on Multiple Clustering (MMMC) Imputation. International Journal on Data Science and Technology, 8(3), 48-54. https://doi.org/10.11648/j.ijdst.20220803.11

    ACS Style

    Raed Rasheed; Wesam Ashour. Multiple Means Based on Multiple Clustering (MMMC) Imputation. Int. J. Data Sci. Technol. 2022, 8(3), 48-54. doi: 10.11648/j.ijdst.20220803.11

    AMA Style

    Raed Rasheed, Wesam Ashour. Multiple Means Based on Multiple Clustering (MMMC) Imputation. Int J Data Sci Technol. 2022;8(3):48-54. doi: 10.11648/j.ijdst.20220803.11

  • @article{10.11648/j.ijdst.20220803.11,
      author = {Raed Rasheed and Wesam Ashour},
      title = {Multiple Means Based on Multiple Clustering (MMMC) Imputation},
      journal = {International Journal on Data Science and Technology},
      volume = {8},
      number = {3},
      pages = {48-54},
      doi = {10.11648/j.ijdst.20220803.11},
      url = {https://doi.org/10.11648/j.ijdst.20220803.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20220803.11},
     year = {2022}
    }
    

  • TY  - JOUR
    T1  - Multiple Means Based on Multiple Clustering (MMMC) Imputation
    AU  - Raed Rasheed
    AU  - Wesam Ashour
    Y1  - 2022/10/11
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ijdst.20220803.11
    DO  - 10.11648/j.ijdst.20220803.11
    T2  - International Journal on Data Science and Technology
    JF  - International Journal on Data Science and Technology
    JO  - International Journal on Data Science and Technology
    SP  - 48
    EP  - 54
    PB  - Science Publishing Group
    SN  - 2472-2235
    UR  - https://doi.org/10.11648/j.ijdst.20220803.11
    VL  - 8
    IS  - 3
    ER  - 

Author Information
  • Raed Rasheed, Faculty of Engineering, Islamic University of Gaza, Gaza, Palestine

  • Wesam Ashour, Faculty of Engineering, Islamic University of Gaza, Gaza, Palestine
