Research Article | Peer-Reviewed

Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost

Received: 31 May 2024     Accepted: 18 June 2024     Published: 23 July 2024
Abstract

Traffic flow prediction is of great significance for urban planning and for alleviating traffic congestion. Because short-term traffic flow on urban road networks is random and highly volatile, it is difficult for a single model to estimate traffic flow and travel time accurately. To improve prediction accuracy, this paper develops a combined prediction model based on wavelet decomposition and reconstruction (WDR) and the extreme gradient boosting (XGBoost) model. Firstly, the Mallat algorithm is applied to perform multi-scale wavelet decomposition on the average travel time series of the original traffic data, and single-branch reconstruction is performed on the components at each scale. Secondly, XGBoost is used to predict each reconstructed single-branch sequence, yielding multiple sub-models whose hyperparameters are optimized by the Bayesian algorithm. Finally, the predicted values of all sub-models are summed to obtain the overall traffic prediction result. To test the performance of the proposed model, actual traffic flow data were collected from a link in the Brooklyn area of New York, USA. The performance of the proposed WDR-XGBoost model was compared with that of other machine learning models, namely the support vector regression (SVR) model and the single XGBoost model. The experimental findings show that the proposed WDR-XGBoost model performs better on multiple evaluation indicators and significantly outperforms the other models in terms of accuracy and stability.

Published in International Journal of Transportation Engineering and Technology (Volume 10, Issue 1)
DOI 10.11648/j.ijtet.20241001.12
Page(s) 15-24
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Traffic Time Prediction, Wavelet Analysis, XGBoost, Bayesian Algorithm

1. Introduction
With traffic congestion and traffic accidents in urban areas becoming increasingly serious, intelligent transportation systems are being applied more and more widely, in areas such as information services, intelligent highways and operation management. Traffic flow prediction is one of the key technologies for realizing dynamic guidance in intelligent transportation systems: it uses historical and real-time traffic data to predict the traffic status of a specified road or area for a period of time in the future. The predicted quantities generally include traffic flow, speed, density (or occupancy), travel time and other variables that reflect traffic status. Such prediction plays an important role in signal control, in alleviating traffic congestion and in improving the operating efficiency of the transportation network. The main traffic flow parameters are traffic volume, speed and density. Short-term traffic flow prediction refers to predicting traffic volume over a short period (generally ≤ 15 minutes) from current and past traffic flow data, so as to meet the real-time requirements of traffic flow control and traffic guidance.
Many methods are used for prediction in the field of transportation, mainly including: (1) models based on statistical analysis, such as the autoregressive integrated moving average (ARIMA) model and the Kalman filtering model; (2) nonlinear theoretical models, such as chaos theory and catastrophe theory; (3) simulation forecasting models; (4) machine learning models, such as K-nearest neighbor, neural networks and support vector regression; (5) combined models.
Among the prediction methods based on statistical analysis, ARIMA is a commonly used time series model. Zhang designed a method for short-term passenger flow prediction of urban rail transit based on an improved ARIMA model. To address the limitation of ARIMA in capturing the nonlinear characteristics of time series, Wang proposed a model that combines the linear ARIMA algorithm with the nonlinear generalized autoregressive conditional heteroscedasticity in mean (GARCH-M) algorithm, taking into account the heteroscedasticity of the traffic flow time series. Li used chaos theory, a nonlinear theoretical model, to determine the optimal delay time and embedding dimension of the original traffic flow time series, and obtained through phase space reconstruction a more reasonable model data set with the same dynamic characteristics as the original data. Based on the long short-term memory (LSTM) neural network, Weng proposed a short-term traffic flow prediction method that considers the characteristics of bus proportion. Wang (Wang et al., 2021) proposed an LSTM model based on the encoder-decoder (ED) framework, which can achieve multi-step prediction of traffic flow sequences. Subsequently, Wang proposed a short-term passenger flow prediction model for road networks based on a multi-layer convolutional long short-term memory neural network (ConvLSTM). The spatial characteristics of passenger flow are captured by the convolution operations in the multi-layer ConvLSTM, and the temporal characteristics are captured by its long short-term memory part.
As the performance gains of single models gradually slow down, scholars are increasingly turning to combined models for traffic flow forecasting. Mallick introduced transfer learning into traffic flow prediction. Song used the empirical mode decomposition (EMD) algorithm to decompose the traffic flow.
When the time interval is shortened to 5-15 minutes, the nonlinearity, randomness and time-varying characteristics of short-term traffic flow become very pronounced, which reduces the accuracy and stability of forecasting models. Wavelet analysis can project the original traffic flow data onto different scales, so that a model can be built at each scale and the results integrated, thereby improving forecasting accuracy. However, traditional models based on wavelet analysis model the data directly on each individual scale after decomposition. Because the sequences at different scales have different lengths, this makes the later forecasting and reconstruction inconvenient and less accurate. In recent years, the XGBoost algorithm has been introduced into transportation research. It has shown significant advantages in addressing problems faced by models such as neural networks, including slow convergence, overfitting and susceptibility to local optima. Ye proposed a short-term traffic flow prediction model based on a convolutional neural network (CNN) and XGBoost, and the results showed that the prediction error was significantly lower than that of SVR, LSTM and other models. Antypas used a gradient boosting method to estimate the arrival time of autonomous vehicles. In a 2024 study, Chen et al. used a large dataset of taxi trajectories and advanced machine learning techniques, including XGBoost, to analyze the causes of imbalanced traffic networks.
Based on the above, a WDR-XGBoost short-term traffic flow prediction model is proposed. Firstly, discrete wavelet decomposition of the short-term traffic flow time series is carried out using multi-scale wavelet analysis, and the approximate and detail components are single-branch reconstructed to the original scale. Then the XGBoost model is used to forecast each reconstructed single-branch time series, and the Bayesian algorithm is used to obtain the optimal parameter combination of the model and to accelerate convergence. Finally, the traffic flow prediction is obtained by summing the predicted values of all single-branch reconstructed sequences. The experimental results show that the WDR-XGBoost model improves short-term traffic flow prediction performance.
2. Problem Statement
Short-term traffic prediction at a given time refers to the real-time prediction of traffic characteristics in the next time interval or even later, with a prediction horizon of no more than 15 minutes and sometimes less than 5 minutes. Prediction is generally based on the basic parameters of macroscopic traffic flow, such as average traffic volume, average speed, average occupancy and average travel time, which describe traffic from the perspective of the whole transportation system. By contrast, few studies take the average travel time of a link as the prediction target. Here, the average travel time of a link refers to the average travel time of all vehicles passing through the observed link during a certain time interval, as shown in formula (1):
$$\bar{T}_k = \frac{1}{N_k}\sum_{i=1}^{N_k} t_i \tag{1}$$
where $\bar{T}_k$ is the average link travel time in the $k$-th time interval, $N_k$ is the total number of vehicles passing through the link during the observation interval, and $t_i$ is the travel time of the $i$-th vehicle through this link. Considering that the current road condition is related to the road condition in past intervals, the average travel time in the past several intervals is used to forecast the average travel time in the future interval. Specifically, the average travel time of the $(k+1)$-th time interval on the target link can be described as a time series as follows:
$$\bar{T}_{k+1} = f\left(\bar{T}_k, \bar{T}_{k-1}, \ldots, \bar{T}_{k-p+1}\right) \tag{2}$$
where $p$ is the time lag.
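The two formulas above can be illustrated with a minimal Python sketch. The function names, the per-vehicle travel times and the lag of 2 are hypothetical choices made for illustration only; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def average_travel_time(travel_times: Sequence[float]) -> float:
    """Formula (1): mean travel time of the vehicles observed in one interval."""
    if not travel_times:
        raise ValueError("no vehicles observed in this interval")
    return sum(travel_times) / len(travel_times)


def make_lagged_samples(series: Sequence[float], lag: int) -> List[Tuple[List[float], float]]:
    """Formula (2): pair the previous `lag` interval averages with the next one."""
    samples = []
    for t in range(lag, len(series)):
        samples.append((list(series[t - lag:t]), series[t]))
    return samples


# Hypothetical per-vehicle travel times (seconds) in three consecutive intervals.
intervals = [[300.0, 320.0, 310.0], [400.0, 420.0], [500.0]]
series = [average_travel_time(v) for v in intervals]
samples = make_lagged_samples(series, lag=2)
```

Each sample pairs a window of past interval averages (the model input) with the average of the following interval (the prediction target).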
3. Research Methodology
3.1. Wavelet Transform and Multi-scale Analysis
The average travel time of a link can be regarded as time-varying information that is nonlinear, random and subject to noise interference. If modeling and forecasting are based directly on the original data, forecasting accuracy is difficult to ensure. Therefore, methods are needed to separate out the noise and minimize its impact on the forecasting results.
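Wavelet transform is a multi-scale time-frequency analysis tool that scales a time series in the frequency domain and translates it in the time domain. For any function $f(t)$ in the space $L^2(\mathbb{R})$, its expansion by a wavelet function is called the continuous wavelet transform (CWT), and the expression is
$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = |a|^{-1/2}\int_{-\infty}^{+\infty} f(t)\,\psi^{*}\left(\frac{t-b}{a}\right)dt \tag{3}$$
In equation (3), $a$ is the scaling factor and $b$ is the translation factor, with $a, b \in \mathbb{R}$ and $a \neq 0$; $\psi(t)$ is the basic wavelet; $\psi^{*}(t)$ is its conjugate function; and $\psi_{a,b}(t) = |a|^{-1/2}\,\psi\left(\frac{t-b}{a}\right)$ is the family of functions generated by scaling and translating the basic wavelet, which is called the wavelet function.
However, the continuous wavelet transform calculates the wavelet coefficients on all scales, which takes time and generates redundant data. Therefore, the discrete wavelet transform (DWT) is usually used in practical applications. The discrete wavelet transform discretizes the scaling factor $a$ and the translation factor $b$, namely
$$a = a_0^{j}, \quad b = k\,a_0^{j}\,b_0 \tag{4}$$
In equation (4), $a_0 > 1$, $b_0 > 0$ and $j, k \in \mathbb{Z}$. Then the wavelet function is
$$\psi_{j,k}(t) = a_0^{-j/2}\,\psi\left(a_0^{-j}t - k b_0\right) \tag{5}$$
The discrete wavelet transform is
$$W_f(j,k) = \int_{-\infty}^{+\infty} f(t)\,\psi_{j,k}^{*}(t)\,dt \tag{6}$$
Then any function $f(t)$ can be expressed as a wavelet series
$$f(t) = \sum_{j}\sum_{k} c_{j,k}\,\psi_{j,k}(t) \tag{7}$$
where $c_{j,k}$ are called the wavelet coefficients.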
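According to multi-scale analysis, the Mallat algorithm can be applied once an appropriate scale function $\varphi(t)$ and wavelet function $\psi(t)$ have been selected, together with the corresponding decomposition coefficient sequences (the low-pass filter $H$ and the high-pass filter $G$) and reconstruction coefficient sequences ($H^{*}$ and $G^{*}$). The original traffic flow data $X$ can be decomposed into a low-frequency approximate component and a high-frequency detail component at a certain scale by the low-pass filter and the high-pass filter, namely
$$cA_1 = H X, \quad cD_1 = G X \tag{8}$$
The noise is usually contained in the high-frequency detail component. The low-frequency approximate component can be further decomposed as
$$cA_2 = H\,cA_1, \quad cD_2 = G\,cA_1 \tag{9}$$
This decomposition can be iterated until it reaches the maximum number of levels set in advance. If a $J$-scale decomposition is performed, the formula is as follows:
$$cA_{j+1} = H\,cA_j, \quad cD_{j+1} = G\,cA_j, \quad j = 0, 1, \ldots, J-1, \quad cA_0 = X \tag{10}$$
The 3-scale wavelet decomposition is shown in Figure 1:
Figure 1. Three-scale Mallat wavelet decomposition.
Then, the approximate component and the detail components are single-branch reconstructed to the original scale separately. The formula is as follows:
$$A_J = (H^{*})^{J}\,cA_J, \quad D_j = (H^{*})^{j-1}\,G^{*}\,cD_j, \quad j = 1, 2, \ldots, J \tag{11}$$
where $cA_J$ and $cD_j$ are the approximation and detail coefficient sequences, and $H^{*}$ and $G^{*}$ are the reconstruction operators corresponding to the low-pass and high-pass filters. Thus, the approximate component $A_J$ and the detail components $D_1, D_2, \ldots, D_J$ are obtained, each with the same length as the original series. Finally, the approximate component and all detail components are algebraically added, and the reconstructed series is
$$\hat{X} = A_J + \sum_{j=1}^{J} D_j \tag{12}$$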
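The decomposition and single-branch reconstruction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the Haar wavelet rather than the db5 wavelet chosen later in the paper, because the Haar filters fit in a few lines without a wavelet library, and it requires the series length to be divisible by 2 raised to the number of levels. All function names and the sample series are hypothetical.

```python
import math
from typing import Dict, List, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One decomposition level: low-pass (approximation) and high-pass (detail) coefficients."""
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(len(x) // 2)]
    return approx, detail


def haar_inverse_step(approx: List[float], detail: List[float]) -> List[float]:
    """Invert one level, doubling the length of the sequence."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def decompose(x: List[float], levels: int) -> List[List[float]]:
    """Return the coefficient sets ordered as [cA_J, cD_J, ..., cD_1]."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels in this sketch")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def single_branch(coeffs: List[List[float]], keep: int) -> List[float]:
    """Reconstruct to the original length from one coefficient set, zeroing all others."""
    zeroed = [c if i == keep else [0.0] * len(c) for i, c in enumerate(coeffs)]
    approx = zeroed[0]
    for d in zeroed[1:]:
        approx = haar_inverse_step(approx, d)
    return approx


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical travel-time values
coeffs = decompose(x, levels=3)
names = ["A3", "D3", "D2", "D1"]
branches: Dict[str, List[float]] = {n: single_branch(coeffs, i) for i, n in enumerate(names)}
reconstructed = [sum(vals) for vals in zip(*branches.values())]
```

Every branch has the same length as `x`, and adding the four branches point by point recovers `x` up to floating-point rounding, which is the property that lets each branch be forecast separately and the forecasts be summed.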
3.2. Principle of XGBoost Model
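XGBoost is an ensemble learning model that improves learning performance and forecasting accuracy by combining multiple base learners. For a data set $D = \{(x_i, y_i)\}$ ($i = 1, \ldots, n$, $x_i \in \mathbb{R}^{m}$), $x_i$ is the feature vector of the $i$-th instance and $y_i$ is the target value of $x_i$. The XGBoost model is defined as the following additive model, whose base learners are classification and regression trees (CART):
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \tag{13}$$
$$Obj = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \tag{14}$$
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2} \tag{15}$$
$$\mathcal{F} = \left\{ f(x) = w_{q(x)} \right\}, \quad q: \mathbb{R}^{m} \to \{1, \ldots, T\}, \quad w \in \mathbb{R}^{T} \tag{16}$$
In equations (13)-(16), $Obj$ is the objective function of the XGBoost model and $l$ is the error function between the forecast value $\hat{y}_i$ and the actual value $y_i$ corresponding to $x_i$; $\sum_k \Omega(f_k)$ is the sum of the complexity of all trees, which is added to the objective function as a regularization term to prevent overfitting; $\gamma$ and $\lambda$ are regularization parameters; $\mathcal{F}$ is the set of CARTs; $q$ is the tree structure, which maps the feature vector of an instance to the corresponding leaf node; $T$ is the number of leaf nodes of the tree; and $w$ is the weight vector of all leaf nodes of the tree.
In this study, the forward stagewise algorithm is used. At step $t$, the second-order Taylor expansion of the objective function is carried out, and the expression is as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t) + \text{constant} \tag{17}$$
$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial\, \hat{y}_i^{(t-1)}}, \quad h_i = \frac{\partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^{2}} \tag{18}$$
After omitting the constant terms of the first $t-1$ steps in equation (17), let $I_j = \{ i \mid q(x_i) = j \}$ be the set of all samples belonging to the $j$-th leaf node, with $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$. By minimizing this expression, the optimal weight $w_j^{*}$ of the $j$-th leaf node of the sub-model and the corresponding optimal objective function value are obtained:
$$w_j^{*} = -\frac{G_j}{H_j + \lambda}, \quad Obj^{*} = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T \tag{19}$$
Furthermore, the structure of the tree is determined by the gain before and after a split:
$$Gain = \frac{1}{2}\left[ \frac{G_L^{2}}{H_L + \lambda} + \frac{G_R^{2}}{H_R + \lambda} - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda} \right] - \gamma \tag{20}$$
where the subscripts $L$ and $R$ denote the left and right child nodes produced by the split.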
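As a worked illustration of the leaf weight and split gain, the sketch below evaluates both quantities for a tiny hypothetical sample. It assumes the squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, for which the first-order gradient is $\hat{y} - y$ and the second-order gradient is 1; the paper does not state which loss was used, and all names and numbers here are made up.

```python
from typing import List, Sequence, Tuple


def grad_hess_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First- and second-order gradients of 0.5 * (y - y_hat)**2 w.r.t. y_hat."""
    g = [p - y for y, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Optimal leaf weight: -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def leaf_score(g: Sequence[float], h: Sequence[float], lam: float) -> float:
    """Structure score of one leaf: G**2 / (H + lambda)."""
    return sum(g) ** 2 / (sum(h) + lam)


def split_gain(g, h, left_idx, right_idx, lam: float, gamma: float) -> float:
    """Gain of splitting a node into the samples in left_idx and right_idx."""
    gl, hl = [g[i] for i in left_idx], [h[i] for i in left_idx]
    gr, hr = [g[i] for i in right_idx], [h[i] for i in right_idx]
    parent = leaf_score(gl + gr, hl + hr, lam)
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam) - parent) - gamma


# Hypothetical targets; the current ensemble prediction is 0 for every sample.
y = [1.0, 1.2, 3.0, 3.2]
g, h = grad_hess_squared_error(y, [0.0] * len(y))
w_left = leaf_weight(g[:2], h[:2], lam=1.0)
w_right = leaf_weight(g[2:], h[2:], lam=1.0)
gain_good = split_gain(g, h, [0, 1], [2, 3], lam=1.0, gamma=0.0)
gain_bad = split_gain(g, h, [0, 2], [1, 3], lam=1.0, gamma=0.0)
```

Separating the two small targets from the two large ones gives a positive gain, whereas mixing them gives a negative gain, so a tree grown with this criterion would prefer the first split.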
3.3. Bayesian Optimization Algorithm
The most widely adopted approaches to hyperparameter optimization are grid search and random search. However, grid search is slow, and random search can easily miss important regions of the search space. Because of these drawbacks, the Bayesian optimization algorithm, which needs fewer iterations, runs faster and has better generalization ability, is adopted. Bayesian optimization is an optimization algorithm based on probability distributions: it finds the minimum of the objective function using the past evaluation results of that function. Its core components are the surrogate function and the acquisition function. Suppose a set of parameter combinations is $X = \{x_1, x_2, \ldots, x_n\}$, and the objective function $f(x)$ is a black box with no analytical expression and a high evaluation cost. The optimal value $x^{*}$ satisfying equation (21) needs to be found:
$$x^{*} = \arg\min_{x \in X} f(x) \tag{21}$$
For an optimization problem, the objective function $f$, the parameter search space $X$, the surrogate function $M$, the acquisition function $S$ and the number of iterations $T$ are given. Then the algorithm steps can be expressed as:
Step 1. Initialize several groups of parameters randomly and obtain the data set $D = \{(x_i, y_i)\}$, where $y_i = f(x_i)$;
Step 2. Fit the data set $D$ with the surrogate function and select the optimum with the acquisition function:
$$p(y \mid x, D) \leftarrow \mathrm{FitModel}(M, D) \tag{22}$$
$$x_{new} = \arg\max_{x \in X} S\left(x, p(y \mid x, D)\right) \tag{23}$$
Step 3. Update the data set by
$$D \leftarrow D \cup \left\{ \left(x_{new}, f(x_{new})\right) \right\} \tag{24}$$
Go back to Step 2 until the maximum number of iterations is reached.
The Bayesian optimization process is shown in Figure 2. The surrogate function uses the tree-structured Parzen estimator (TPE), which is a Gaussian mixture model. Bayes' rule is applied to construct the surrogate function $p(y \mid x)$, where $p(x \mid y)$ is defined as:
$$p(x \mid y) = \begin{cases} l(x), & y < y^{*} \\ g(x), & y \geq y^{*} \end{cases} \tag{25}$$
$l(x)$ and $g(x)$ are the generative models of all domain variables, and $y^{*}$ is a specific quantile of the observed objective values. Eq. (25) means that two different distributions are built for the parameters: one from the observations whose objective value is below $y^{*}$ and one from the remaining observations.
Figure 2. Calculation diagram of Bayesian optimization.
The acquisition function uses the expected improvement (EI) method because EI is intuitive and has been shown to work well in various settings. EI is defined as:
$$EI_{y^{*}}(x) = \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y \mid x)\, dy \tag{26}$$
where $y^{*}$ is the threshold of the objective function; $y$ is the actual value of the objective function corresponding to the parameter combination $x$; and $p(y \mid x)$ is the surrogate function expressed as a probability. If Eq. (26) is positive, the parameter combination $x$ is expected to produce better results than the threshold.
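The three steps and the TPE surrogate can be sketched as follows. This is a simplified one-dimensional illustration rather than the implementation used in the paper: the kernel bandwidth is fixed, the candidate points are drawn around the good observations, and the toy objective stands in for the validation error of an XGBoost sub-model. It relies on the known property of TPE that maximizing EI is equivalent to maximizing the density ratio of the good-observation model to the bad-observation model. All names and default values are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def gaussian_kde(points: List[float], bandwidth: float) -> Callable[[float], float]:
    """Parzen (kernel density) estimate with equal-weight Gaussian kernels."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x: float) -> float:
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_minimize(objective: Callable[[float], float], low: float, high: float,
                 n_init: int = 10, n_iter: int = 30, gamma: float = 0.25,
                 n_candidates: int = 24, seed: int = 0) -> Tuple[Tuple[float, float], List[Tuple[float, float]]]:
    if n_init < 2:
        raise ValueError("n_init must be at least 2 so that both groups are non-empty")
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed kernel width, an assumption for simplicity
    # Step 1: random initial evaluations form the data set D.
    history = []
    for _ in range(n_init):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))
    for _ in range(n_iter):
        # Step 2: split D at the gamma-quantile and fit the two density models.
        history.sort(key=lambda pair: pair[1])
        n_good = max(1, math.ceil(gamma * len(history)))
        good = [x for x, _ in history[:n_good]]
        bad = [x for x, _ in history[n_good:]]
        good_density = gaussian_kde(good, bandwidth)
        bad_density = gaussian_kde(bad, bandwidth)
        # Draw candidates near good points and keep the one with the largest density ratio.
        candidates = [
            min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
            for _ in range(n_candidates)
        ]
        x_new = max(candidates, key=lambda x: good_density(x) / (bad_density(x) + 1e-12))
        # Step 3: evaluate the objective and update D.
        history.append((x_new, objective(x_new)))
    best = min(history, key=lambda pair: pair[1])
    return best, history


# Toy objective with its minimum at x = 2.
best, history = tpe_minimize(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
```

`best` holds the parameter value with the lowest objective found and `history` holds every evaluation, so the number of objective calls is exactly `n_init + n_iter`.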
4. Numerical Example
4.1. Data Collection and Preprocessing
The dataset in this paper comes from the real-time traffic data of New York City (https://data.beta.nyc/dataset/nyc-real-time-traffic-speed-data-feed-archived). The data source contains real-time traffic information collected by the Department of Transportation in the five boroughs, mainly on main roads and expressways. A total of 17480 records at 5-minute intervals were collected from 1 May to 30 June 2020 for link 4616271 in the borough of Brooklyn. Missing values are filled by time series mean interpolation:
(27)
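One simple way to fill gaps of this kind is sketched below. The exact interpolation formula of the paper is not recoverable here, so the rule used in this sketch (replace each missing value with the mean of the nearest observed value on each side) is an assumption, and the function name and sample values are hypothetical.

```python
from typing import List, Optional


def fill_missing(series: List[Optional[float]]) -> List[float]:
    """Replace each None with the mean of the nearest observed neighbours."""
    filled = list(series)
    n = len(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed value before and after position i (None if there is none).
        prev = next((series[j] for j in range(i - 1, -1, -1) if series[j] is not None), None)
        nxt = next((series[j] for j in range(i + 1, n) if series[j] is not None), None)
        neighbours = [x for x in (prev, nxt) if x is not None]
        if not neighbours:
            raise ValueError("series contains no observed values")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled


raw = [300.0, None, 340.0, None, None, 400.0]  # hypothetical travel times (s)
clean = fill_missing(raw)
```

A gap at the start or end of the series has only one observed neighbour, in which case that neighbour's value is used directly.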
The descriptive statistics of average travel time series of link are shown in Table 1.
Table 1. Descriptive statistics of the average travel time series of the link.

| Statistic | Average travel time (s) |
| --- | --- |
| Sample size | 17480 |
| Mean | 503.22 |
| Standard deviation | 358.06 |
| Maximum | 3782 |
| Minimum | 249 |

The original data set is split as shown in Figure 3: the data from 1 May to 20 June are used as the training set, and the data from 21 June to 30 June are used as the test set.
Figure 3. Schematic diagram of data set division.
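The chronological split can be expressed as a short sketch. It builds an idealised, gap-free grid of 5-minute timestamps for the study period, so its length need not match the 17480 records reported above; the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def five_minute_grid(start: datetime, end: datetime) -> List[datetime]:
    """Timestamps from start (inclusive) to end (exclusive) in 5-minute steps."""
    step = timedelta(minutes=5)
    stamps = []
    t = start
    while t < end:
        stamps.append(t)
        t += step
    return stamps


def chronological_split(stamps: List[datetime], test_start: datetime) -> Tuple[List[datetime], List[datetime]]:
    """Everything before test_start is training data; the rest is test data."""
    train = [t for t in stamps if t < test_start]
    test = [t for t in stamps if t >= test_start]
    return train, test


stamps = five_minute_grid(datetime(2020, 5, 1), datetime(2020, 7, 1))
train, test = chronological_split(stamps, datetime(2020, 6, 21))
```

Splitting by time rather than at random keeps every test observation later than every training observation, which matches how a forecasting model would be used in practice.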
When using the discrete wavelet transform, two choices deserve attention. The first is the type of basic wavelet. Commonly used basic wavelets include the Haar, db, sym, coif and Morlet wavelets, each of which denotes a wavelet family containing many specific wavelets. The second is the wavelet decomposition scale $J$, because more information is lost as the decomposition scale increases. There are no specific rules for making these two choices; they are mostly based on experience and repeated experiments. In this paper, db5, one of the commonly used wavelets in the db (Daubechies) family, is selected as the basic wavelet, and the maximum decomposition scale considered in the wavelet transform is 5. As shown in Figure 4, when the decomposition scale is 3, the approximate component already shows the trend of the original time series well, while the noise reflected in the detail components is separated out, so the decomposition scale is finally set to 3.
Figure 4. Average travel time by 3-level wavelet decomposition.
Since the series obtained by wavelet decomposition differ in length from the original series, the approximate component and the detail components are single-branch reconstructed after decomposition; the results are shown in Figure 5.
Figure 5. The original average travel time and the single reconstruction data by three-level wavelet decomposition.
4.2. Analysis and Extraction of Features
Short-term traffic flow forecasting mainly uses the average travel time of the previous moments to forecast the average travel time of the next moment, but the time of day and the day of the week also affect the forecast results.
Figure 6. Basic characteristics of average travel time.
As can be seen from Figure 6, the trends of average travel time on Saturday and Sunday are similar because both are weekend days. Similarly, the trends from Monday to Friday are similar because all are workdays. During the morning and evening peak periods, the average travel time rises markedly. Therefore, the average travel time series can be described as
(28)
where is the hourly feature and is the weekly feature, and the time lag is .
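A feature matrix of this kind can be assembled as in the sketch below. The lag of 3, the choice to take the hour and weekday from the interval being predicted, and the sample values are all assumptions made for illustration; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def build_samples(stamps: Sequence[datetime], values: Sequence[float], lag: int) -> List[Tuple[list, float]]:
    """Each row: [lagged travel times..., hour of day, day of week] and the target value."""
    rows = []
    for t in range(lag, len(values)):
        # Calendar features are taken from the interval being predicted.
        features = list(values[t - lag:t]) + [stamps[t].hour, stamps[t].weekday()]
        rows.append((features, values[t]))
    return rows


start = datetime(2020, 5, 1, 7, 50)  # a Friday morning
stamps = [start + timedelta(minutes=5 * i) for i in range(4)]
values = [400.0, 420.0, 450.0, 480.0]  # hypothetical average travel times (s)
rows = build_samples(stamps, values, lag=3)
```

`weekday()` returns 0 for Monday through 6 for Sunday, so workdays and weekend days can be distinguished by the last feature.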
4.3. Evaluation Index
Four indexes are used to evaluate forecasting performance: root mean squared error (RMSE), mean absolute percentage error (MAPE), mean absolute error (MAE) and the coefficient of determination $R^{2}$. Both MAE and RMSE measure the absolute error between the actual and forecast values, while MAPE evaluates the relative error. The closer $R^{2}$ is to 1, the better the forecasting performance. These indexes are given by Eqs. (29) to (32):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^{2}} \tag{29}$$
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \times 100\% \tag{30}$$
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{31}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \tag{32}$$
where $n$ is the total number of test samples used in prediction, $\hat{y}_i$ is the forecast value, $y_i$ is the actual value, and $\bar{y}$ is the mean of all actual values. The forecasting results of the four WDR-XGBoost sub-models are shown in Table 2.
Table 2. Performance of four WDR-XGBoost sub-models.

| Evaluation indexes | A3 | D3 | D2 | D1 |
| --- | --- | --- | --- | --- |
| RMSE | 0.0173 | 0.01 | 0.01 | 0.0141 |
| MAPE | 0.1738 | 78.8512 | 180.7309 | 199.0863 |
| MAE | 0.0113 | 0.0065 | 0.0077 | 0.0086 |
| R² | 0.9990 | 0.9526 | 0.8844 | 0.8140 |
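The four evaluation indexes (RMSE, MAPE, MAE and the coefficient of determination) can be computed with a few lines of Python. The sketch below uses the standard definitions, with MAPE expressed in percent; the sample values are hypothetical and are not taken from the experiments.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; every actual value must be non-zero."""
    return 100.0 * sum(abs((p - y) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0, 400.0]
forecast = [110.0, 190.0, 310.0, 390.0]
scores = {
    "RMSE": rmse(actual, forecast),
    "MAPE": mape(actual, forecast),
    "MAE": mae(actual, forecast),
    "R2": r2(actual, forecast),
}
```

Because MAPE divides by the actual value, it is undefined when an actual value is zero and becomes very large when actual values are close to zero, even if the absolute errors are small.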

In addition, the forecasting performance of SVR, XGBoost and WDR-XGBoost is compared; the evaluation indexes and forecasting results are shown in Table 3 and Figure 7.
Table 3. Comparison of forecasting performance of each model.

| Evaluation indexes | SVR | XGBoost | WDR-XGBoost |
| --- | --- | --- | --- |
| RMSE | 0.07791 | 0.06565 | 0.02627 |
| MAPE | 0.90762 | 0.78653 | 0.30937 |
| MAE | 0.05648 | 0.04837 | 0.01941 |
| R² | 0.98073 | 0.98633 | 0.99781 |

Figure 7. Comparison of forecasting results of three models.
According to Table 3 and Figure 7, the forecasts of the proposed WDR-XGBoost model are closer to the actual values than those of the SVR model and the single XGBoost model: its RMSE, MAPE and MAE are the lowest of the three models, and its coefficient of determination is the highest.
5. Conclusions
It is difficult for a single model to obtain ideal accuracy and stability when predicting short-term traffic characteristics with complex nonlinearity and randomness. To solve this problem, a short-term travel time prediction model based on multi-scale WDR-XGBoost is proposed, and the experimental results show that:
1) The db5 basic wavelet is used to decompose and reconstruct the original traffic data, and the prediction performance is improved. This shows that the discrete wavelet transform can completely reproduce the original data and supports accurate prediction of time series data. By reconstructing the decomposed sequences to the original scale, each scale sequence has the same length as the original sequence, which simplifies the prediction process and improves accuracy.
2) Compared with the SVR model and the single XGBoost model, the WDR-XGBoost algorithm has higher prediction accuracy, smaller error and faster training speed, which meets the timeliness requirements of short-term traffic prediction.
3) However, the traffic flow state is affected by many real-life factors, such as weather conditions, road conditions, unexpected accidents, holidays and the spatial-temporal correlation of upstream and downstream links. This paper has not considered these factors. In addition, the experimental data cover only a single link, without considering other regions. Future research needs more extensive data collection and model validation.
Abbreviations

| Abbreviation | Full name |
| --- | --- |
| WDR-XGBoost | Wavelet Decomposition and Reconstruction and the Extreme Gradient Boosting |
| SVR | Support Vector Regression Model |
| ARIMA | Autoregressive Integrated Moving Average Model |
| GARCH-M | Generalized Autoregressive Conditional Heteroscedasticity in Mean Algorithm |
| LSTM | Long Short-Term Memory Neural Network |
| ED | Encoder Decoder |
| ConvLSTM | Convolutional Long Short-Term Memory Neural Network |
| EMD | Empirical Mode Decomposition |
| CNN | Convolutional Neural Network |
| CWT | Continuous Wavelet Transform |
| DWT | Discrete Wavelet Transform |
| CART | Classification and Regression Tree |
| TPE | Tree-structured Parzen Estimator |
| EI | Expected Improvement Method |
| RMSE | Root Mean Squared Error |
| MAPE | Mean Absolute Percentage Error |
| MAE | Mean Absolute Error |

Author Contributions
Xin Wang: Supervision, Validation, Writing – original draft, Writing – review & editing
Fang Fang: Methodology, Software, Visualization
Data Availability Statement
The data that support the findings of this study are openly available at: https://data.beta.nyc/dataset/nyc-real-time-traffic-speed-data-feed-archived.
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] Zhao, H., Zhai, D. M., Shi, Z. H. Review of short-term traffic flow forecasting models. Urban Rapid Rail Transit. 2019, 32(4), 50-54.
[2] Li, W., L, J. Z., Wang, T. Improved ARIMA model traffic flow prediction method based on box-cox exponential transformation. Journal of Wuhan University of Technology (Transportation Science & Engineering). 2020, 44(6), 974-977.
[3] Zhou, T., Jiang, D., Lin, Z., et al. Hybrid dual Kalman filtering model for short-term traffic flow forecasting. IET Intelligent Transport Systems. 2019, 13(6), 1023-1032.
[4] Cai, L., Zhang, Z., Yang, J., et al. A noise-immune Kalman filter for short-term traffic flow forecasting. Physica A: Statistical Mechanics and its Applications. 2019, 536, 122601.
[5] Liao, R. H., Lan, S., Liu, Z. X. Short-term traffic flow forecasting based on local prediction method in chaotic time series. Computer Technology and Development. 2015, 25(1), 1-5.
[6] Hu, J. R., He, L. Freeway Traffic flow condition criterion method based on cusp catastrophe theory. China Journal of Highway and Transport. 2017, 30(10), 137-144.
[7] Ma, Q. BP neural network short-term traffic flow prediction based on improved particle swarm optimization. Computer Simulation. 2019, 36(4), 94-98+323.
[8] Liu, Z., Du, W., Yan, D. M., et al. Short-term traffic flow forecast based on combination of k-nearest neighbor algorithm and support vector regression. Journal of Highway and Transportation Research and Development. 2017, 34(5), 122-128+158.
[9] Yuan, H., Chen, Z. H. Short-term traffic flow prediction based on temporal convolutional networks. Journal of South China University of Technology (Natural Science Edition). 2020, 48(11), 107-113+122.
[10] Fu, C. H., Yang, S. M., Zhang, Y. Promoted short-term traffic flow prediction model based on deep learning and support vector regression. Journal of Transportation Systems Engineering and Information Technology. 2019, 19(4), 130-134+148.
[11] Zhang, G. Y., Jin, H. Research on the prediction of short-term passenger flow of urban rail transit based on improved ARIMA model. Computer Applications and Software. 2022, 39(1), 339-344.
[12] Wang, X. Q., Shao, C. F., Yin, C. Y., et al. Short-term traffic flow forecasting method based on ARIMA-GARCH-M model. Journal of Beijing Jiaotong University. 2018, 42(4), 83-88.
[13] Li, Q. R., Chi, W. Y., Chen, L., et al. Short-term traffic flow forecast based on phase space reconstruction and PSO-GPR. Journal of Transport Information and Safety. 2019, 37(2), 70-76.
[14] Weng, X. X., Hao, Y. Short-term traffic flow prediction based on LSTM algorithm with the characteristics of passenger car proportion. Journal of Chongqing Jiaotong University (Natural Science). 2020, 39(11), 20-25, 50.
[15] Wang, B. W., Wang, J. S., Wang, T. Y., Zhang, Z. Q., Liu, Y., Yu, H. An encoder-decoder multi-step traffic flow prediction model based on long short-time memory network. Journal of Chongqing University. 2021, 44(11), 71-80.
[16] Wang, Q., Li, Y., Zhang, S. T., Zhang, L. Y. Research on short-term passenger flow prediction of urban rail transit based on multilayer convolution long and short-term memory neural network. Modern Urban Transit. 2023, 9, 95-99.
[17] Bui, K. N., Cho, J., Yi, H. Spatial temporal graph neural network for traffic forecasting: an overview and open research issues. Applied Intelligence. 2022, 52(3), 3763-3774.
[18] Li, L., Bi, J., Yang, K., et al. MGC-GAN: multi-graph convolutional generative adversarial networks for accurate citywide traffic flow prediction. International Conference on Systems, Man, and Cybernetics. IEEE, Czech, 2022; pp. 2557-2562.
[19] Aljuaydi, F., Wiwatanapataphee, B., Wu, Y. H. Multivariate machine learning-based prediction models of freeway traffic flow under non-recurrent events. Alexandria Engineering Journal. 2023, 65, 151-162.
[20] Mallick, T., Balaprakash, P., Rask, E., et al. Transfer learning with graph neural networks for short-term highway traffic forecasting. The 25th International Conference on Pattern Recognition. IEEE, Italy, 2020; pp. 10367-10374.
[21] Song, X. D., Ren, M. X. The short-term traffic flow prediction based on combination model. Computer Simulation. 2022, 39(7), 156-160.
[22] Teresa, P. Impact of data loss for prediction of traffic flow on an urban road using neural networks. IEEE Transactions on Intelligent Transportation Systems. 2019, 20(3), 1000-1009.
[23] Zhan, H. Y., Gomes, G., Li, X. S., et al. Consensus ensemble system for traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems. 2018, 19(12), 3903-3914.
[24] Zheng, Z. B., Yang, Y. T., Liu, J. H., et al. Deep and embedded learning approach for traffic flow prediction in urban informatics. IEEE Transactions on Intelligent Transportation Systems. 2019, 20(10), 3927-3939.
[25] Mou, Z. H., Li, K. P., Shen, D. F. Short-term traffic flow prediction based on wavelet denoising and Bayesian neural network model. Science Technology and Engineering. 2020, 20(33), 13881-13886.
[26] Liu, B., Wu, Z. D., Yang, J. Y. Research on bio-intelligence algorithm optimized wavelet neural network and its application in traffic flow prediction. Journal of Beijing Jiaotong University. 2020, 44(5), 17-26.
[27] Zhong, Y., Shao, Y. M., Wu, W. W., et al. Short-term traffic flow prediction model based on XGBoost. Science Technology and Engineering. 2019, 19(30), 337-342.
[28] Ye, J., Li, L. J., Tang, Z. X. Short-term traffic flow forecasting based on CNN-XGBoost. Computer Engineering and Design. 2020, 41(4), 1080-1086.
[29] Antypas, E., Spanos, G., Lalas, A., Votis, K., Tzovaras, D. A time-series approach for estimated time of arrival prediction in autonomous vehicles. Transportation Research Procedia. 2024, 78, 166-173.
[30] Chen, B. Y., Chen, X. Y., Chen, H. P., Huang, Y. B., Jia, T., Lam, W. H. Understanding user equilibrium states of road networks: Evidence from two Chinese mega-cities using taxi trajectory mining. Transportation Research Part A: Policy and Practice. 2024, 180(1), 103976.
Cite This Article
  • APA Style

    Wang, X., Fang, F. (2024). Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost. International Journal of Transportation Engineering and Technology, 10(1), 15-24. https://doi.org/10.11648/j.ijtet.20241001.12

    Copy | Download

    ACS Style

    Wang, X.; Fang, F. Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost. Int. J. Transp. Eng. Technol. 2024, 10(1), 15-24. doi: 10.11648/j.ijtet.20241001.12

    Copy | Download

    AMA Style

    Wang X, Fang F. Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost. Int J Transp Eng Technol. 2024;10(1):15-24. doi: 10.11648/j.ijtet.20241001.12

    Copy | Download

  • @article{10.11648/j.ijtet.20241001.12,
      author = {Xin Wang and Fang Fang},
      title = {Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost
    },
      journal = {International Journal of Transportation Engineering and Technology},
      volume = {10},
      number = {1},
      pages = {15-24},
      doi = {10.11648/j.ijtet.20241001.12},
      url = {https://doi.org/10.11648/j.ijtet.20241001.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijtet.20241001.12},
     year = {2024}
    }
    

  • TY  - JOUR
    T1  - Short-Term Traffic Flow Prediction Based on Wavelet Analysis and XGBoost
    
    AU  - Xin Wang
    AU  - Fang Fang
    Y1  - 2024/07/23
    PY  - 2024
    N1  - https://doi.org/10.11648/j.ijtet.20241001.12
    DO  - 10.11648/j.ijtet.20241001.12
    T2  - International Journal of Transportation Engineering and Technology
    JF  - International Journal of Transportation Engineering and Technology
    JO  - International Journal of Transportation Engineering and Technology
    SP  - 15
    EP  - 24
    PB  - Science Publishing Group
    SN  - 2575-1751
    UR  - https://doi.org/10.11648/j.ijtet.20241001.12
    VL  - 10
    IS  - 1
    ER  - 

Author Information