Pile foundations are deep foundations commonly used for bridges, high-rise buildings, railway structures, and other situations requiring high bearing capacity and minimal settlement. Accurate prediction of pile settlement is essential for the safety and stability of deep foundations, yet traditional methods such as in-situ load tests are often costly and impractical. The cone penetration test (CPT) is one of the most frequently used in-situ tests for pile analysis because the cone behaves like a model pile: the measured cone resistance and sleeve friction can be used to estimate the pile's unit toe and shaft resistances, respectively. This paper presents a machine learning (ML) framework for pile settlement prediction that uses a genetic algorithm (GA) majority voting (MV) feature selection (FS) strategy to enhance model performance. Three tree-based algorithms, each with a distinct approach to tree construction and feature handling, are used for this purpose: categorical boosting (CB), light gradient boosting (LGB), and random forest (RF). The dataset was compiled from fifty-six pile case histories in different countries, comprising static load tests (maintained load tests and constant-rate-of-penetration tests) and CPT and CPTu (piezocone CPT, which also records pore-water pressure) soundings. The model inputs are the shaft and toe resistances from the soundings, the pile geometric and mechanical properties, and the loads applied in the load tests. The model output is the settlement recorded for each pile in the tests. The CB model combined with the GA-MV approach achieved the best predictive accuracy, with an R² of 0.926 and an RMSE of 5.92 mm on testing. Feature importance analysis identified applied load (P) and pile length (L) as the key predictors of settlement.
In addition, on the validation set the RMSE of the CB-GAMV model (5.92 mm) was 11.19% lower than that of the CB model using all features (6.68 mm). It was also 9.41% lower than that of the CB-GA model (6.54 mm).
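The abstract does not specify how the genetic algorithm and majority voting are combined. The sketch below assumes one plausible reading:

- A GA searches over binary feature masks.
- It is run several times; here the runs differ by random seed, though they could equally be driven by the CB, LGB, and RF models.
- A feature is kept when more than half of the runs select it.

All function names (`ga_select`, `majority_vote`, `relative_reduction`) and GA settings are hypothetical. The toy fitness function stands in for a model-based score such as the negative cross-validated RMSE. The sketch also spells out the RMSE and R² definitions and the relative-RMSE arithmetic. From the rounded values quoted (6.68, 6.54 and 5.92 mm), the reductions come to about 11.38% and 9.48%. These are slightly above the reported 11.19% and 9.41%, presumably because the published percentages were computed from unrounded RMSE values.

```python
import math
import random
from typing import Callable, List, Sequence

Mask = List[int]  # 1 = feature kept, 0 = feature dropped (hypothetical encoding)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target (e.g. mm of settlement)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage decrease of an error metric relative to a baseline value."""
    return 100.0 * (baseline - improved) / baseline


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 40,
              p_mut: float = 0.05, seed: int = 0) -> Mask:
    """Minimal generational GA over binary feature masks (higher fitness is better).

    Assumes n_features >= 2 and pop_size >= 3.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random masks, done twice for two parents
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_features)  # single-point crossover position
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


def majority_vote(masks: Sequence[Mask]) -> Mask:
    """Keep a feature only if it was selected in more than half of the masks."""
    n_runs = len(masks)
    return [1 if sum(col) * 2 > n_runs else 0 for col in zip(*masks)]


if __name__ == "__main__":
    # Hypothetical toy fitness: features 0 and 1 are informative, and every kept
    # feature costs a small penalty. A real fitness would train CB/LGB/RF on the
    # masked inputs and return, e.g., the negative validation RMSE.
    def toy_fitness(mask: Mask) -> float:
        gain = sum(1.0 for i, g in enumerate(mask) if g and i in (0, 1))
        return gain - 0.1 * sum(mask)

    runs = [ga_select(6, toy_fitness, seed=s) for s in range(5)]
    print("GA-MV mask:", majority_vote(runs))
    print(round(relative_reduction(6.68, 5.92), 2))  # 11.38
    print(round(relative_reduction(6.54, 5.92), 2))  # 9.48
```

Running several independent GA searches and voting on the result is a simple way to damp the run-to-run variability of a stochastic selector. Features that survive only one lucky run are discarded.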
Published in: Journal of Civil, Construction and Environmental Engineering (Volume 10, Issue 3)
DOI: 10.11648/j.jccee.20251003.11
Pages: 104-114
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2025. Published by Science Publishing Group
Keywords: Pile Foundation Settlement, Cone Penetration Test, Machine Learning, Majority Voting, Genetic Algorithm
APA Style
Bello, H. H., Wang, Y., & Lawal, S. (2025). A Machine Learning Model for Pile Settlement Prediction Using Majority Voting-Based Feature Selection. Journal of Civil, Construction and Environmental Engineering, 10(3), 104-114. https://doi.org/10.11648/j.jccee.20251003.11
ACS Style
Bello, H. H.; Wang, Y.; Lawal, S. A Machine Learning Model for Pile Settlement Prediction Using Majority Voting-Based Feature Selection. J. Civ. Constr. Environ. Eng. 2025, 10(3), 104-114. doi: 10.11648/j.jccee.20251003.11
@article{10.11648/j.jccee.20251003.11,
  author = {Hafeez Husain Bello and You Wang and Shamsudeen Lawal},
  title = {A Machine Learning Model for Pile Settlement Prediction Using Majority Voting-Based Feature Selection},
  journal = {Journal of Civil, Construction and Environmental Engineering},
  volume = {10},
  number = {3},
  pages = {104-114},
  doi = {10.11648/j.jccee.20251003.11},
  url = {https://doi.org/10.11648/j.jccee.20251003.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.jccee.20251003.11},
  year = {2025}
}
TY  - JOUR
T1  - A Machine Learning Model for Pile Settlement Prediction Using Majority Voting-Based Feature Selection
AU  - Hafeez Husain Bello
AU  - You Wang
AU  - Shamsudeen Lawal
Y1  - 2025/06/11
PY  - 2025
N1  - https://doi.org/10.11648/j.jccee.20251003.11
DO  - 10.11648/j.jccee.20251003.11
T2  - Journal of Civil, Construction and Environmental Engineering
JF  - Journal of Civil, Construction and Environmental Engineering
JO  - Journal of Civil, Construction and Environmental Engineering
SP  - 104
EP  - 114
PB  - Science Publishing Group
SN  - 2637-3890
UR  - https://doi.org/10.11648/j.jccee.20251003.11
VL  - 10
IS  - 3
ER  -