Regression analysis is a core analytical tool for predicting continuous outcomes, serving as a cornerstone of statistical inference and machine learning in applications ranging from economic trend forecasting to healthcare risk assessment and real estate valuation. Choosing an effective regression technique is critical for accurate predictions, yet it remains a daunting challenge for non-experts given the wide variety of methods, each with distinct assumptions, tuning requirements, and applicability boundaries. To address this dilemma, this study conducts a rigorous empirical comparison of five popular regression techniques, namely Ordinary Least Squares (OLS), Ridge regression, Lasso regression, Elastic Net, and Polynomial regression, applied to house price prediction on two benchmark datasets: the classic Boston Housing dataset and the larger California Housing dataset. A multi-dimensional evaluation framework was adopted, combining quantitative metrics (Mean Squared Error (MSE) and the coefficient of determination (R²)) with qualitative diagnostics (residual analysis and Quantile-Quantile (QQ) plots) to assess prediction accuracy and error distribution. Results indicate that Polynomial regression consistently achieves superior performance across both datasets, highlighting its effectiveness in capturing the complex nonlinear relationships inherent in housing data. Ridge, Lasso, and Elastic Net provide comparable but lower performance, with strengths in mitigating multicollinearity rather than in capturing nonlinearity. OLS yields acceptable baseline results but is less robust when confronted with real-world nonlinearities. These findings offer clear practical guidance for non-experts seeking reliable “out-of-the-box” regression techniques and help practitioners select models for real-world predictive tasks without extensive tuning.
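For readers who want to reproduce a comparison of this kind, the sketch below sets up the five model families with scikit-learn and scores them by MSE and R² on a held-out split of the California Housing dataset (the Boston Housing dataset has been removed from recent scikit-learn releases, so only California Housing is used here). This is a minimal illustration rather than the authors' exact pipeline: the regularization strengths, the polynomial degree of 2, and the 80/20 split are assumptions chosen for demonstration.

```python
# Minimal sketch of the model comparison described in the abstract.
# Hyperparameters (alpha values, degree=2, 80/20 split) are illustrative
# assumptions, not the settings used in the paper.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# California Housing: 20,640 districts, 8 numeric features,
# median house value as the target.
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "OLS": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "Lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.01)),
    "Elastic Net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.01, l1_ratio=0.5)),
    # "Polynomial regression" here means a degree-2 feature expansion followed by OLS.
    "Polynomial (deg=2)": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"{name:>18}  MSE={mean_squared_error(y_test, y_pred):.4f}  "
          f"R2={r2_score(y_test, y_pred):.4f}")
```

Residual and QQ diagnostics of the kind described above can be produced from the same predictions, for example by plotting `y_test - y_pred` against `y_pred` and passing the residuals to `scipy.stats.probplot`.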
| Published in | Science Journal of Applied Mathematics and Statistics (Volume 14, Issue 1) |
| DOI | 10.11648/j.sjams.20261401.12 |
| Page(s) | 6-15 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2026. Published by Science Publishing Group |
Regression Analysis, House Price Prediction, Polynomial Regression, Ridge Regression, Model Comparison
APA Style
Wang, J., Yu, Q., Liu, X., Zhu, H., Ming, Q., et al. (2026). Unveiling the Impact of Nonlinear Modeling in Housing Price Prediction: An Empirical Comparative Study. Science Journal of Applied Mathematics and Statistics, 14(1), 6-15. https://doi.org/10.11648/j.sjams.20261401.12