International Journal of Statistical Distributions and Applications

Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data

With the continuing advances in modern science and data collection technology, researchers can gather large volumes of high-dimensional data from many fields. Variable selection for high-dimensional data has seen considerable development, but most existing studies consider only the selection of main effects. In many important practical problems, however, main effects alone may not adequately describe the relationship between the response variable and the predictor variables, which makes variable selection with interaction terms under high-dimensional data a more meaningful problem. Motivated by this, this article focuses on robust estimation for semi-parametric models with interactions in high-dimensional data within the framework of modal regression. A two-stage regularization method is applied to carry out variable selection. At Stage 1, B-spline basis functions are used to approximate the non-parametric function, and the parametric and non-parametric components are selected simultaneously using modal regression combined with adaptive least absolute shrinkage and selection operator (LASSO) estimation. At Stage 2, the model consists of the variables selected at Stage 1 together with interaction terms derived from the selected main effects. To maintain the heredity structure between the main effects of the linear part and the interaction effects, selection at this stage is applied only to the interaction terms, yielding the important interaction effects. Under suitable regularity conditions, the oracle properties of the variable selection procedure and the consistency of the hierarchical structure are established. Numerical results are also presented to demonstrate the performance of the proposed method.
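
The way Stage 2 builds its candidate interaction terms from the Stage 1 selection can be illustrated with a minimal sketch. It assumes a strong heredity rule (an interaction of two predictors is a candidate only if both parent main effects were selected) and pairwise products only; the abstract does not state which heredity variant is used, and the function names and the example index set below are hypothetical, not taken from the article.

```python
from itertools import combinations


def candidate_interactions(selected_main_effects):
    """Return index pairs (j, k) with j < k whose two parent main
    effects were both kept at Stage 1 (strong heredity, assumed here)."""
    kept = sorted(set(selected_main_effects))
    return list(combinations(kept, 2))


def interaction_value(x_row, pair):
    """Value of the interaction term x_j * x_k for one observation."""
    j, k = pair
    return x_row[j] * x_row[k]


# Hypothetical Stage 1 output: indices of the linear-part predictors
# whose coefficients were estimated as nonzero.
selected = [0, 3, 7]
pairs = candidate_interactions(selected)

# Number of pairwise interactions if no screening were done, for p predictors.
p = 1000
unscreened = p * (p - 1) // 2

print(pairs)
print(len(pairs), "candidates instead of", unscreened)
```

Under this assumption the Stage 2 penalty only has to search over the pairs formed from the selected main effects, rather than over all p(p-1)/2 pairwise products of the original predictors.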

Semi-Parametric Models with Interaction, Variable Selection, Modal Regression, Adaptive LASSO

APA Style

Yafeng Xia, Na Kui. (2023). Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data. International Journal of Statistical Distributions and Applications, 9(2), 49-61. https://doi.org/10.11648/j.ijsd.20230902.11

Copyright © 2023 Authors retain the copyright of this article.
This article is an open access article distributed under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
