Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis
Applied and Computational Mathematics
Volume 9, Issue 4, August 2020, Pages: 130-145
Received: Aug. 20, 2020;
Published: Aug. 22, 2020
Ruixuan Dong, Department of Statistic, East China Normal University, Shanghai, China
This paper analyzes breast cancer detection data through a series of analyses and tests, studying the statistical regularities of multiple subjects and multiple correlated indicators. First, univariate and multivariate diagnostics were performed on the data. In studying the correlation between variables, HOMA was found to have a clear positive linear correlation with blood insulin content. Notably, some patients with breast cancer show a high degree of insulin resistance together with high blood insulin content, a feature not found in samples without breast cancer. Next, based on one-way analysis of variance, we conclude that there are significant differences in blood test results, age, and BMI between samples with different health conditions. Principal component analysis was then used to reduce the dimension of the data. In this study, the differences in age, BMI, and blood component content between the two groups with different health conditions can be summarized by two independent principal components. The absolute value of the MCP-1 (monocyte chemoattractant protein 1) coefficient in principal component 1 is large, reflecting the characteristics of the blood composition of the sample; the loadings of glucose and leptin in principal component 2 are large, reflecting similar characteristics. Then, assuming a factor model with m = 3 factors, the original data and the factor-rotated data are re-analyzed with both the maximum likelihood method and the principal component method, so that the variables are reduced to three factors. The maximum likelihood method is used to estimate the rotated factors. The first factor reflects insulin resistance and is attributed to the insulin and HOMA indicators; the second factor reflects body fatness and is attributed to BMI and leptin; the third factor reflects the glucose content in the blood.
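The positive linear correlation between HOMA and blood insulin noted above is what the usual definition of the index would lead one to expect. As a sketch, assuming the data use the common HOMA-IR form (the abstract does not state the exact formula, so this is an assumption), with fasting insulin I in µU/mL and fasting glucose G in mmol/L:

```latex
\mathrm{HOMA\text{-}IR} = \frac{I \times G}{22.5}
\quad\Longrightarrow\quad
\frac{\partial\,\mathrm{HOMA\text{-}IR}}{\partial I} = \frac{G}{22.5} > 0
```

Under this assumed definition, if glucose varies relatively little across samples, the index is roughly proportional to insulin, which would be consistent with the reported relationship.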
Finally, discriminant analysis was carried out with different misclassification costs; the resulting apparent error rate (APER) is 0.1638 and the estimated expected actual error rate (EAER) is 0.1872. The probability of misclassifying a patient with breast cancer as not having breast cancer is 0.09375, a low misclassification rate, which indicates that the model established in this paper is effective.
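To make the error rates above concrete, the following minimal Python sketch computes an apparent error rate and a per-group misclassification rate from a two-group confusion matrix. The function names `aper` and `group_error_rate` and the counts in `confusion` are hypothetical: the counts are not taken from the paper and were chosen only so that the arithmetic reproduces the reported APER and patient misclassification rate. The EAER is not computed here, since it would require refitting the classifier on the raw data.

```python
def aper(confusion):
    """Apparent error rate: misclassified training cases / all training cases.

    confusion[i][j] is the number of cases from true group i
    that the classifier assigned to group j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def group_error_rate(confusion, group):
    """Share of cases from one true group that were assigned to another group."""
    row = confusion[group]
    return (sum(row) - row[group]) / sum(row)


# Hypothetical counts (assumption, not from the paper).
# Rows: true group; columns: assigned group. 0 = no breast cancer, 1 = breast cancer.
confusion = [
    [39, 13],
    [6, 58],
]

print(round(aper(confusion), 4))       # overall apparent error rate
print(group_error_rate(confusion, 1))  # patients classified as not having cancer
```

With unequal misclassification costs, the classification boundary is typically shifted so that the costlier error (here, presumably missing a patient) becomes rarer, at the price of more errors in the other group.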
Ruixuan Dong. Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis. Applied and Computational Mathematics. Vol. 9, No. 4, 2020, pp. 130-145.