Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method

Tongan Cai; Hongliang He; Wenyu Zhang

doi:10.11648/j.acm.20180703.20

Peer-Reviewed

Published in Applied and Computational Mathematics (Volume 7, Issue 3)

Received: 2 August 2018; Published: 3 August 2018

Abstract

Worldwide, breast cancer is one of the most serious threats to the lives of middle-aged women. Breast cancer diagnosis aims to classify a detected breast tumor as benign or malignant. Recent developments in data mining have produced new model structures and algorithms that help medical workers improve classification accuracy. This study proposes a model that combines an ensemble method with an imbalanced learning technique for classifying breast cancer data. First, the Synthetic Minority Over-Sampling Technique (SMOTE), an imbalanced learning algorithm, is applied to the selected datasets. Second, multiple baseline classifiers are tuned by Bayesian optimization. Finally, a stacking ensemble method combines the optimized classifiers to make the final decision. Comparative analysis on two widely used breast cancer datasets shows that the proposed model achieves better performance and adaptivity than conventional methods in terms of classification accuracy, specificity, and area under the ROC curve (AUROC), validating the clinical value of the model.
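
The over-sampling step named in the abstract can be illustrated with a short sketch. The code below is not the authors' implementation; it is a minimal, standard-library-only illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). The function name `smote`, its parameters, and the toy feature vectors are all hypothetical and chosen only for illustration. Bayesian optimization and stacking are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors from the minority class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours (Euclidean), excluding base.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data, made up for illustration: 3 minority samples against 6 majority samples.
minority_samples = [[5.0, 4.0], [6.0, 5.0], [7.0, 4.5]]
majority_count = 6
new_points = smote(minority_samples, n_new=majority_count - len(minority_samples), k=2)
balanced_minority = minority_samples + new_points
print(len(balanced_minority))  # 6, so both classes now have the same size
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates. In practice, over-sampling of this kind is usually applied to the training split only, so that synthetic samples do not leak into evaluation data.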

Published in	Applied and Computational Mathematics (Volume 7, Issue 3)
DOI	10.11648/j.acm.20180703.20
Page(s)	146-154
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Data Mining, Breast Cancer, Ensemble Method, Imbalanced Learning

Cite This Article

APA Style

Tongan Cai, Hongliang He, Wenyu Zhang. (2018). Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method. Applied and Computational Mathematics, 7(3), 146-154. https://doi.org/10.11648/j.acm.20180703.20

ACS Style

Tongan Cai; Hongliang He; Wenyu Zhang. Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method. Appl. Comput. Math. 2018, 7(3), 146-154. doi: 10.11648/j.acm.20180703.20

AMA Style

Tongan Cai, Hongliang He, Wenyu Zhang. Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method. Appl Comput Math. 2018;7(3):146-154. doi: 10.11648/j.acm.20180703.20

@article{10.11648/j.acm.20180703.20,
  author = {Tongan Cai and Hongliang He and Wenyu Zhang},
  title = {Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method},
  journal = {Applied and Computational Mathematics},
  volume = {7},
  number = {3},
  pages = {146-154},
  doi = {10.11648/j.acm.20180703.20},
  url = {https://doi.org/10.11648/j.acm.20180703.20},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20180703.20},
  abstract = {Worldwide, breast cancer is one of the most threatening killers to mid-aged women. The diagnosis of breast cancer aims to classify spotted breast tumor to be Benign or Malignant. With recent developments in data mining technique, new model structures and algorithms are helping medical workers greatly in improving classification accuracy. In this study, a model is proposed combining ensemble method and imbalanced learning technique for the classification of breast cancer data. First, Synthetic Minority Over-Sampling Technique (SMOTE), an imbalanced learning algorithm is applied to selected datasets and second, multiple baseline classifiers are tuned by Bayesian Optimization. Finally, a stacking ensemble method combines the optimized classifiers for final decision. Comparative analysis shows the proposed model can achieve better performance and adaptivity than conventional methods, in terms of classification accuracy, specificity and AuROC on two mostly-used breast cancer datasets, validating the clinical value of this model.},
  year = {2018}
}

TY  - JOUR
T1  - Breast Cancer Diagnosis Using Imbalanced Learning and Ensemble Method
AU  - Tongan Cai
AU  - Hongliang He
AU  - Wenyu Zhang
Y1  - 2018/08/03
PY  - 2018
N1  - https://doi.org/10.11648/j.acm.20180703.20
DO  - 10.11648/j.acm.20180703.20
T2  - Applied and Computational Mathematics
JF  - Applied and Computational Mathematics
JO  - Applied and Computational Mathematics
SP  - 146
EP  - 154
PB  - Science Publishing Group
SN  - 2328-5613
UR  - https://doi.org/10.11648/j.acm.20180703.20
AB  - Worldwide, breast cancer is one of the most threatening killers to mid-aged women. The diagnosis of breast cancer aims to classify spotted breast tumor to be Benign or Malignant. With recent developments in data mining technique, new model structures and algorithms are helping medical workers greatly in improving classification accuracy. In this study, a model is proposed combining ensemble method and imbalanced learning technique for the classification of breast cancer data. First, Synthetic Minority Over-Sampling Technique (SMOTE), an imbalanced learning algorithm is applied to selected datasets and second, multiple baseline classifiers are tuned by Bayesian Optimization. Finally, a stacking ensemble method combines the optimized classifiers for final decision. Comparative analysis shows the proposed model can achieve better performance and adaptivity than conventional methods, in terms of classification accuracy, specificity and AuROC on two mostly-used breast cancer datasets, validating the clinical value of this model.
VL  - 7
IS  - 3
ER  -

Author Information

Tongan Cai
Department of Electrical Engineering & Computer Science, University of Michigan, Ann Arbor, USA

Hongliang He
School of Information, Zhejiang University of Finance and Economics, Hangzhou, China

Wenyu Zhang
School of Information, Zhejiang University of Finance and Economics, Hangzhou, China
