A tool to simplify and reproduce univariate time series forecasting/prediction analysis

An introduction to ForecastTB, an R package

Neeraj Dhanraj
12 min read · Jun 8, 2020

Background

Making computational research reproducible and easy to analyze is very important. There are several benefits to doing so, and some of them are discussed here. In the field of Data Science, prediction and forecasting methods are crucial processes. The ultimate objectives of data science projects are strongly affected by the performance of forecasting/prediction methods, so selecting these methods accurately is a crucial task. Besides, a large number of such methods are now available, and choosing the most appropriate model from this pool is becoming a tedious task. A tool that automates this procedure with minimal effort would clearly be handy for data science researchers, data scientists, data analysts, and academicians. This post is an introduction to and demonstration of such a tool: an R package named ForecastTB.

Decision making is one of the most crucial tasks in many domains, and decisions are often based on the most accurate forecast available in the respective domain. A large number of areas, such as energy, economics, infrastructure, health, agriculture, defense, education, technology, geoscience, climate, and structural engineering, among several others, stand to benefit from time series forecasting.

Standardizing the process of comparing forecasting methods involves numerous challenges. The most apparent concern is the nature of the time series and the selection of appropriate methods, since each method must handle the variation arising from seasonality, trends, and random components. For instance, one forecasting model may perform better for a seasonal time series but show poor accuracy, compared to other models, for a trending or random one. Besides, interpretations can be affected by the choice of error metric, for example root mean square error (RMSE), since different metrics have different objectives.

Overview of ForecastTB

The ForecastTB package can be helpful for professionals and researchers working in the field of data science and forecasting analysis. The salient features of the ForecastTB package are as follows:

1. Reduction in effort and time consumption: The ForecastTB package is designed to reduce the effort and time needed for time series forecasting analysis. It avoids repetitive steps in the analysis and leads to the generation of a comparative results report.

2. Truthful comparison assurance: The ForecastTB package ensures a truthful and unbiased comparison of forecasting methods. Hence, it may be considered a reliable tool when forecasting models feed into industrial report generation or scientific publications.

3. Reproducible research: Along with unbiased comparisons, the ForecastTB package enables reproducible research with minimal effort. In other words, a forecasting comparison can easily be reproduced several times with the help of the ForecastTB package.

4. Stepping stone in machine learning automation: Forecasting methods play a very important role in machine learning applications. The ForecastTB package aims to identify the best-performing forecasting method for a given time series dataset, which can serve as a stepping stone in machine learning automation. For example, when the nature and patterns of the time series dataset change, a machine learning application could automatically replace the existing forecasting method based on the output of the ForecastTB package (a small sketch of this idea follows the list).

5. A handy tool: The ForecastTB package is a handy tool, especially for researchers who are not comfortable with computer coding, since it is a plug-and-play, module-based package. A very simple syntax leads to an impressive and accurate forecasting comparison analysis.
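As a rough sketch of the automation idea from point 4 (assuming, as shown in the demonstration later in this post, that the error table can be read from the object's output slot), the best-performing method could be picked programmatically. Here p is a placeholder for an object produced by prediction_errors() with several methods attached:

# Hypothetical sketch: pick the method with the lowest RMSE from the error table
err <- p@output$Error_Parameters                      # error values per method
best_method <- rownames(err)[which.min(err[, "RMSE"])]
best_method                                           # name of the best method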

The ForecastTB package has a plug-and-play modular structure, as shown below.

The plug-and-play module of the ForecastTB package.

The package is used to compare forecasting methods, beginning by forecasting the time series with distinct strategies. The prediction accuracies are then evaluated with the error metrics for all methods over several repetitions. The prediction_errors() function is employed for a comparative evaluation of forecasting methods, taking various input parameters into consideration. It returns an object, which is the basic module of the package. This module can then be updated with new methods and other parameters using the append_() and choose_() functions. The monte_carlo() function is a further extension of the prediction_errors() module that compares the distinct methods on randomly selected patches of the input time series. The remaining two functions, plotMethods() and plot_circle(), are used to visualize forecasted results and error summaries for the chosen methods. All functions, as a set of modules, are based on and connected with the object provided by the prediction_errors() function. Hence, the framework of the ForecastTB package is version-control-friendly: new features can easily be introduced in future versions of the package.
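Putting the modules together, a typical workflow might look like the following sketch (the exact calls are demonstrated in detail in the next section; myMethod is a placeholder for a user-defined wrapper function):

library(ForecastTB)

# Build the basic comparison object (ARIMA with default error metrics)
p <- prediction_errors(data = nottem)

# Optionally append further methods (or remove some with choose_())
# p <- append_(object = p, Method = "myMethod(data, nval)", MethodName = "MyMethod")

# Visualize forecasts and error summaries
plot(p)
plot_circle(p)

# Repeat the comparison on randomly selected patches of the series
monte_carlo(object = p, size = 180, iteration = 10)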

Demonstration of the ‘ForecastTB’ package:

The ForecastTB package is intended for comparing the performance of forecasting methods. The package assists in developing the background, strategies, policies, and environment needed for comparing forecasting methods. A comparison report for the defined framework is produced as output.

Load the package as follows:

library(ForecastTB)

The basic function of the package is prediction_errors(). The following parameters are considered by this function (an example call follows the list):

data: input time series for testing

nval: an integer to decide a number of values to predict (default:12)

ePara: type of error calculation (default: RMSE and MAE); add an error parameter of your choice in the following manner: ePara = c("errorparametername"), where errorparametername should be a source/function that returns the desired error set

ePara_name: list of names of error parameters passed in order (default:RMSE and MAE)

Method: list of locations of function for the proposed prediction method (should be recursive) (default:ARIMA)

MethodName: list of names for function for the proposed prediction method in order (default:ARIMA)

strats: list of forecasting strategies. Available : recursive and dirRec. (default:recursive)

append_: indicates whether the function is used to append to another instance (default: 1)

dval: last d values of the data to be used for forecasting (default: length of the data)
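For instance, a call that sets a few of these parameters explicitly could look as follows (a sketch using only the parameters described above; dval restricts the analysis to the most recent observations):

# Predict the last 12 values, using only the last 120 observations of `nottem`
p <- prediction_errors(data = nottem, nval = 12, dval = 120)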

The prediction_errors() function returns two slots as output. The first slot is output, which provides Error_Parameters, indicating the error values for the forecasting methods and error parameters defined in the framework, and Predicted_Values, the values forecasted with the same methods. The second slot is parameters, which returns the parameters used by or provided to the prediction_errors() function.
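Since the returned value is an S4 object, the two slots can be accessed with the @ operator; for example, given the object a created in the chunk below:

a@output$Error_Parameters    # error values for each method and metric
a@output$Predicted_Values    # test values and forecasted values
a@parameters$nval            # nval parameter supplied to the call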

a <- prediction_errors(data = nottem)  
# `nottem` is a sample dataset shipped with base R (in the `datasets` package)
a
#> An object of class "prediction_errors"
#> Slot "output":
#> $Error_Parameters
#> RMSE MAE MAPE exec_time
#> ARIMA 2.3400915 1.9329816 4.2156087 0.1356769
#>
#> $Predicted_Values
#> 1 2 3 4 5 6 7
#> Test values 39.4000 40.9000 42.4000 47.8000 52.4000 58.0000 60.70
#> ARIMA 37.4193 37.6971 41.1825 46.2992 52.2480 57.1069 59.71
#> 8 9 10 11 12
#> Test values 61.80000 58.20000 46.7000 46.60000 37.80000
#> ARIMA 59.41173 56.38197 51.4756 46.04203 41.52592
#>
#>
#> Slot "parameters":
#> $data
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
#> 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
#> 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
#> 1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
#> 1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
#> 1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
#> 1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
#> 1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
#> 1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
#> 1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
#> 1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
#> 1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
#> 1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
#> 1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
#> 1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
#> 1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
#> 1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
#> 1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
#> 1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
#> 1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8
#>
#> $nval
#> [1] 12
#>
#> $ePara
#> [1] "RMSE" "MAE" "MAPE"
#>
#> $ePara_name
#> [1] "RMSE" "MAE" "MAPE"
#>
#> $Method
#> [1] "ARIMA"
#>
#> $MethodName
#> [1] "ARIMA"
#>
#> $Strategy
#> [1] "Recursive"
#>
#> $dval
#> [1] 240

A quick visualization of the object returned by the prediction_errors() function can be produced with the plot() function, as below:

b <- plot(a)

Comparison of multiple methods:

As discussed above, the prediction_errors() function evaluates the performance of the ARIMA method by default. In addition, it allows users to compare the performance of distinct methods along with ARIMA. In the following example, two methods (LPSF and PSF) are compared along with ARIMA. These methods are wrapped in functions that take data and nval as input parameters and must return nval forecasted values as a vector. In the following code, the test1() and test2() functions wrap the LPSF and PSF methods, respectively.

library(decomposedPSF)
#> Warning: package 'decomposedPSF' was built under R version 3.6.2

# LPSF method wrapped in the required (data, nval) form
test1 <- function(data, nval){
  return(lpsf(data = data, n.ahead = nval))
}

library(PSF)
#> Warning: package 'PSF' was built under R version 3.6.2

# PSF method: fit with a 12-month cycle, then forecast `nval` values
test2 <- function(data, nval){
  a <- psf(data = data, cycle = 12)
  b <- predict(object = a, n.ahead = nval)
  return(b)
}
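Before plugging such wrappers into prediction_errors(), it can be worth checking that each one really returns nval forecasted values as a numeric vector, for example:

# Quick sanity check on the method wrappers (both calls should return 12)
length(test1(data = nottem, nval = 12))
length(test2(data = nottem, nval = 12))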

The following code chunk shows how a user can attach various methods in the prediction_errors() function. In this chunk, the append_ parameter is assigned 1 to append the new methods (LPSF and PSF) in addition to the default ARIMA method. On the contrary, if the append_ parameter were assigned 0, only the newly added LPSF and PSF methods would be compared.

a1 <- prediction_errors(data = nottem, nval = 48,
                        Method = c("test1(data, nval)", "test2(data, nval)"),
                        MethodName = c("LPSF", "PSF"), append_ = 1)
a1@output$Error_Parameters
#> RMSE MAE MAPE exec_time
#> ARIMA 2.5233156 2.1280641 4.5135378 0.1659489
#> LPSF 2.391580 1.936111 4.238650 0.441232
#> PSF 2.467598 1.854861 3.943937 0.098737
b1 <- plot(a1)

Appending new methods:

Consider another function, test3(), which is to be added to an already existing prediction_errors object, e.g. a1.

library(forecast)

# ETS method: fit an exponential smoothing model and forecast `nval` values
test3 <- function(data, nval){
  b <- as.numeric(forecast(ets(data), h = nval)$mean)
  return(b)
}

For this purpose, the append_() function can be used as follows:

The append_() function has object, Method, MethodName, ePara, and ePara_name parameters, with meanings similar to those used in the prediction_errors() function. The other hidden parameters of the append_() function are automatically synced with the prediction_errors() function.

c1 <- append_(object = a1,
              Method = c("test3(data, nval)"), MethodName = c("ETS"))
c1@output$Error_Parameters
#> RMSE MAE MAPE exec_time
#> ARIMA 2.5233156 2.1280641 4.5135378 0.1659489
#> LPSF 2.391580 1.936111 4.238650 0.441232
#> PSF 2.467598 1.854861 3.943937 0.098737
#> ETS 38.29743056 36.85216463 73.47667823 0.03786588
d1 <- plot(c1)

Removing methods:

When more than one method is established in the environment and the user wishes to remove one or more of them, the choose_() function can be used. This function takes a prediction_errors object as input, shows all the methods established in the environment, and asks for the indices of the methods that the user wants to remove.

In the following example, the user supplied 4 as input, which corresponds to Method 4: ETS, and in response the choose_() function provides a new object with an updated method list.

# > e1 <- choose_(object = c1)
# Following are the methods attached with the object:
# [,1] [,2] [,3] [,4]
# Indices "1" "2" "3" "4"
# Methods "ARIMA" "LPSF" "PSF" "ETS"
#
# Enter the indices of methods to remove:4
#
# > e1@output$Error_Parameters
# RMSE MAE exec_time
# ARIMA 2.5233156 2.1280641 0.1963789
# LPSF 2.3915796 1.9361111 0.2990961
# PSF 2.2748736 1.8301389 0.1226711

Adding new Error metrics:

In the default scenario, the prediction_errors() function compares forecasting methods in terms of RMSE, MAE, and MAPE. In addition, it allows us to append multiple new error metrics. The Percent change in variance (PCV) is another error metric, defined as:

PCV = |var(Observed) − var(Predicted)| / var(Observed) × 100

where var(Predicted) and var(Observed) are the variances of the predicted and observed values, respectively. The following code chunk defines the function for the PCV error metric:

# PCV: absolute percent change in variance between observed and predicted values
pcv <- function(obs, pred){
  d <- (var(obs) - var(pred)) * 100 / var(obs)
  d <- abs(as.numeric(d))
  return(d)
}
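As a quick illustration of the (obs, pred) signature expected for such error-metric functions, the metric can be evaluated on any pair of observed and predicted vectors; the pred values here are purely illustrative:

obs  <- as.numeric(tail(nottem, 12))   # last 12 observed values of the series
pred <- obs + rnorm(12)                # hypothetical forecasts, for illustration only
pcv(obs = obs, pred = pred)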

The following code chunk is used to append PCV as a new error metric in the prediction_errors() call.

a1 <- prediction_errors(data = nottem, nval = 48,
                        Method = c("test1(data, nval)", "test2(data, nval)"),
                        MethodName = c("LPSF", "PSF"),
                        ePara = "pcv(obs, pred)", ePara_name = "PCV",
                        append_ = 1)
a1@output$Error_Parameters
#> RMSE MAE MAPE PCV exec_time
#> ARIMA 2.523316 2.128064 4.513538 13.757073 0.152633
#> LPSF 2.5366583 1.9791667 4.2695528 13.6065832 0.3272281
#> PSF 2.1729411 1.7230385 3.7067049 0.9558439 0.1047201
b1 <- plot(a1)

A polar plot:

This polar plot shows how the forecasted observations behave over an increasing number of seasonal time horizons.

plot_circle(a1)

Monte-Carlo strategy:

Monte Carlo is a popular strategy for comparing the performance of forecasting methods: it selects multiple patches of the dataset at random, tests the performance of the forecasting methods on each, and returns the average error values.

The Monte Carlo strategy ensures an accurate comparison of forecasting methods and avoids biased results obtained by chance. This package provides the monte_carlo() function as follows:

The parameters used in this function are:

object: output of ‘prediction_errors()’ function

size: volume of time series used in Monte Carlo strategy

iteration: number of iterations for which the models are to be applied

fval: a flag to view forecasted values in each iteration (default: 0, don’t view values)

figs: a flag to view plots for each iteration (default: 0, don’t view plots)

This function returns:

Error values for the provided models in each iteration, along with their mean values

a1 <- prediction_errors(data = nottem, nval = 48,
                        Method = c("test1(data, nval)"),
                        MethodName = c("LPSF"), append_ = 1)
monte_carlo(object = a1, size = 180, iteration = 10)
#> ARIMA LPSF
#> 9 3.446114 5.009127
#> 11 3.807793 5.367784
#> 30 3.477685 5.195355
#> 32 2.590447 4.758633
#> 24 4.570650 4.871212
#> 2 4.476293 5.698093
#> 48 2.815019 5.143118
#> 35 2.692311 4.714380
#> 18 3.309056 5.182951
#> 3 4.974178 5.665410
#> Mean 3.615955 5.160606

The following call runs the monte_carlo() function with the fval and figs flags set to OFF and ON, respectively:

monte_carlo(object = a1, size = 144, iteration = 2, fval = 0, figs = 1)

Summary

This post demonstrated the ForecastTB package as a test bench for comparing time series forecasting methods, a crucial step toward more formal time series analysis. The demonstration included examples showing how the characteristics of the temporal correlation structure can influence forecast accuracy. The ForecastTB package greatly assists in comparing different forecasting methods while considering the characteristics of the time series dataset.

It also explains how new forecasting methods and error metrics can be introduced into the comparative test bench. Finally, the simple plug-and-play, module-based architecture of ForecastTB, which allows several forecasting methods and error metrics to be appended or removed, makes it a robust and handy tool for forecasting analysis.

The ForecastTB R package is available in the CRAN repository, here. It is also demonstrated in a journal publication, along with some case studies in renewable energy applications; the details are available here.

For any further details, feel free to comment on this post.

Author:

Dr. Neeraj Dhanraj Bokde,

Postdoctoral Researcher,

Aarhus University, Denmark

https://www.researchgate.net/profile/Neeraj_Bokde
