Forecasting with VAR and Prophet

In my previous post, I tried to present the ARIMA model for forecasting. It was based on the use of autoregression and moving average concepts, combining the regression of variable based on its lagged values and calculation of error based on the linear combination of error terms occurred in the past, respectively. In this post, we’re going to talk about VAR and Prophet as alternative models.

The main problem was that developed ARIMA model could not be applied to predict the output  for all of different objects we ran the forecasting for, which was the principal idea. As we dug deeper into the problem domain, we discovered that there were several exogenous variables affecting the output we were trying to predict. So, using its past lagged values and errors only, couldn’t solve the problem. That’s when we decided to try another model, a model that accepts historical data of several variables and tries to forecast their future values, by using the information of their lagged values, calculated error and one additional concept – their linear interdependencies.

VAR represents vector autoregression model. What does that mean? It accepts the vector of several time series, representing historical data of some variables. For each one of them, it tries to predict future values by using lagged values of the variable analyzed at the moment, lagged values of other variables and an error term.

Some of you may wonder: why not use the ARIMAX model, which enables us to use ARIMA model with explanatory variable? The problem with ARIMAX model is that it requires future values for the endogenous variable, to be able to predict values for dependent variable. VAR is forecasting those values, which is pretty convenient for our kind of problem, since we do not have the information of future variable behavior.

In consultations with domain expert, we’ve chosen several input variables that could be intercorrelated in some way and also affect the output we’ve been trying to predict.

The first thing we should do is inspect the correlation between variables, so that we could be sure they depend on each other.

Here’s the correlation matrix.

It is noticeable that var_3 is in weak correlation with var_2, but has a strong negative one with var_1, for example. That’s why we should include it anyway. After the existence of correlation is inspected and confirmed, we are able to continue with the analysis.

Running a VAR model

PyFlux’s VAR model accepts three parameters: data, number of lags to use for the variable autoregression and the order of differencing, if needed (remember, we do this to remove the non-stationarity).  Just like the ARIMA model, it also uses the MLE, AIC i BIC criteria to estimate parameters.

Here’s the model summary.

The only tricky part here was determining the optimal value for lags parameter. We used function that runs a model with different values in range(1,30) for lags parameter, while the optimal value has been chosen for minimal RMSE obtained. There may be a more elegant way, and I’m certainly going to investigate it in the future.

You can see forecast results below.

And even more…

VAR produced more stable and accurate results, and we were pretty satisfied with it. But then some other models appeared and we couldn’t resist trying them.  Thus you can see Facebook Prophet applied to our data below.

Prophet is one of the newest buzzwords in the field of time series forecasting. It is concise and easy to use, flexible to seasonal data, trend changes and outliers. You can feed it with holidays or some other dates you think will have the unusual behaviour, or extract trend and detect seasonality. Finally – it has everything a good forecasting model should have. Well, almost everything.  That will be discussed in the conclusion. But enough of lauding, let’s see it in action.

Prophet has its implementation in Python and R. The docs can be found here, so I’m not going to bother you with the code. Model requires a time series dataframe having two columns –  ds (datestamp) and y (output to forecast).

Prophet has many parameters that could be set, and we tried using some of them, like growth, changepoints, holidays, interval_width and weekly_seasonality. Growth is used to specify a trend, linear or logistic. Other parameters are very intuitive.

You can see forecast results below.

Prophet gives us upper and lower bounds, so we can make plans based on these values as well. It also decomposes output to trend and seasonal components. You can see the filtered output below, upper and lower bounds are removed for better transparency.

Just like with the ARIMA model, the only flaw we noticed here is not supporting the intercorrelations between multiple variables to forecast some output. That’s why we stick to the VAR model. We can only imagine how powerful the Prophet model could be if it was upgraded with this functionality. For now, we can only wait and hope that Facebook will surprise us one more time.

Our client was pretty satisfied with these forecast results, we managed to reduce a 3.56 to a 1.44 RMSE, which was beneficial for decision and strategy making. What’s left now is model improvement based on results and data gathered from its utilization, so I will, for sure, continue to share with you our use cases and inform you of any important updates.

Hopefully, this article was interesting and useful enough, if you have any questions, please do not hesitate to contact me. In my next post,  we’ll be talking about anomalies and their detection and proper treatment.

Forecasting with ARIMA

One of the most challenging machine learning problems is predicting some output based on the history of its previous values. The complexity of the problem multiplies as new features and constraints are added to analysis. Thus, in time series analysis it is not always enough to use previous values only, there often are many features that could impact on the output which should be predicted. There are numerous models and algorithms developed for this kind of problem. In this article, and series of articles that follows, several models, such as ARIMA, VAR and  Facebook Prophet, will be introduced and finally compared to see which one is best suitable for different kinds of problems.

ARIMA stands for autoregressive integrated moving average model, which produces forecasts based upon prior values in the time series (AR terms) and the errors made by previous predictions (MA terms), with possibility to work with non-stationary data, since it allows us to initially differentiate data to eliminate the non-stationarity (I term).

PyFlux library offers a very good implementation of ARIMA model in Python. It has useful documentation, followed by examples and it is very easy to use. To run  this model, we need to determine the optimal value for each of the parameters this function takes – AR, MA and I terms.

Problem definition

Data used for analysis include 125 instances, historical values, gathered during time period January-May 2017. Data take values mostly between 20-40, with deviation of +-15, which depends on the object that is observed. The problem is to predict the output for a given interval, based on these historical data. This is how our data look like.

Three things that should be investigated before running a model are time series stationarity, trend and seasonality, since each of them could negatively affect results obtained. That will tell us if we should differentiate data before training.

A rough conclusion that data is not stationary could be made by simply looking at the plot, but there is a little more elegant solution, which allows us to exactly determine if data is stationary – a Dickey-Fuller test, which has its implementation in StatsModels library. Here’s one example of how it can be used, taken from here.

These are results of a DF test applied to our data.

The test statistics is higher than the critical value, which leads to a conclusion that we cannot accept the null hypothesis of data stationarity.

The most popular technique used to make data stationary and eliminate trend and seasonality is differentiating.

ARIMA function in PyFlux library has a parameter that is used to set number of differencing order. In most cases, first order of differencing will do the job, but sometimes it is needed to differentiate data several times.

There are enough guidelines on how to determine the order of differencing and there is an excellent post that summarizes rules for identifying a proper ARIMA model, and set its parameters (https://people.duke.edu/~rnau/arimrule.htm ).

So, our data now look like this.

After re-running a stationarity test on differentiated data, we can see the following results.

A test statistics is now less than the critical value of 5% confidence interval, and we can adopt the null hypothesis that data is stationary.

Looks decent, so we could continue with analysis and run a model.

Set up parameters

The trickiest part of running an ARIMA model is determining the optimal number for AR and MA terms. There are two additional functions that can be used to solve this problem: ACF (autocorrelation function) and PACF (partial autocorrelation function). Their implementation can also be found in StatsModels library. Basically, both are used to determine how past and future data points are related in a time series, where the ACF function measures the correlation between two successive points, and the PACF tries to remove the transitive dependency between points (if a depends on b, and b depends on c, than a depends on c.)

Autocorrelation function can be used to determine the optimal number for MA terms by the following rule: the lag value where the ACF chart crosses the upper confidence interval for the first time is the recommended value for MA parameter. In this case, it is 29.

Partial autocorrelation function is used to determine the optimal value for AR parameter in ARIMA function (the same rule as previous is used).

The optimal value for AR parameter is 2.

Since we have optimal parameter values, we are able to run a model. One additional parameter that could be set is a distribution family of the time series – by default it is Normal but other options include Student-t, Laplace, Cauchy, etc.

Running a model

model = ARIMA(data=working_data, ar=2, ma=29, integ=1, target=’output’, family=pf.Normal())

The model is fitted with MLE – maximum likelihood point mass estimate method, which is used to estimate model parameters, by finding particular parametric values that make the observed results the most probable given the model.

Here’s the summary.

AIC and BIC are used as model performance measures. AIC is acronym for Akaike Information Criteria and it is widely used as a measure of a statistical model. It basically quantifies the goodness of fit, and the simplicity of the model into a single statistics. BIC is a Bayesian information criteria. If a model is estimated on a training set, BIC score gives an estimate of the model performance on a testing set. The main rule is “the lower, the better”, and those two can be mixed to choose the best-fit model among several others.

And this is what results look like graphically.

Pretty powerful, right?

Well, at least we thought so, until we came up to something like this.

We were looking for a more stable, accurate model and weren’t satisfied with ARIMA and its 3.56 RMSE, so we dug deeper and found out that the output we’re trying to predict depends on many other features that should also be included in modeling. That’s when we applied a VAR model that allows us to introduce exogenous variables and train an autoregressive model which takes into account values of these external features beside the historical values of the output that should be predicted. We will be talking about it more in the next post.