# A tool to ease and reproduce the univariate time series forecast/prediction analysis

## An introduction to ForecastTB, an R package

--

## Background

Making computational research reproducible and easy to analyze brings several benefits, some of which are discussed here. In data science, prediction and forecasting methods are central: the outcome of a project often depends heavily on how well these methods perform, so selecting them carefully is a crucial task. A large number of such methods are now available, and choosing the most appropriate model from this pool has become tedious. A tool that automates this procedure with minimal effort can therefore be handy for data science researchers, data scientists, data analysts, and academicians. This post introduces and demonstrates such a tool, an R package named *ForecastTB*.

Decision making is one of the most crucial tasks in many domains and often decisions are based on the most accurate forecast available in the respective domains. A large number of areas, such as energy, economics, infrastructure, health, agriculture, defense, education, technology, geoscience, climate, and structural engineering among several others, are looking forward to benefits that can be achieved with time series forecasting.

Standardizing the comparison of forecasting methods involves numerous challenges. The most apparent concern is the nature of the time series and the selection of appropriate methods, since each method has to handle the full variation in seasonality, trend, and random components. For instance, one forecasting model may perform well on a seasonal time series but show poor accuracy compared with other models on a trending or random one. Interpretations may also be affected by the choice of error metric, for example root mean square error (RMSE), because different metrics have different objectives.
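To make the last point concrete, here is a minimal base-R sketch (toy numbers chosen for illustration, not taken from the package) in which RMSE and MAE disagree about which of two forecasts is better. The names `rmse`, `mae`, `pred_a`, and `pred_b` are hypothetical helpers, not part of *ForecastTB*.

```r
# Toy example: two forecasts of the same four observations
obs    <- c(10, 10, 10, 10)
pred_a <- c(12, 12, 12, 12)   # small error everywhere
pred_b <- c(10, 10, 10, 15)   # perfect except for one large miss

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
mae  <- function(obs, pred) mean(abs(obs - pred))

# RMSE penalizes the single large miss more heavily
c(A = rmse(obs, pred_a), B = rmse(obs, pred_b))   # A = 2, B = 2.5
# MAE rewards the three perfect predictions
c(A = mae(obs, pred_a), B = mae(obs, pred_b))     # A = 2, B = 1.25
```

RMSE ranks forecast A first while MAE ranks forecast B first, which is why a comparison is usually reported with more than one metric.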

## Overview of ForecastTB

The *ForecastTB* package can be helpful for professionals and researchers working in the field of data science and forecasting analysis. The salient features of the *ForecastTB* package are as follows:

1. Reduced effort and time: The *ForecastTB* package is designed to reduce the effort and time needed for time series forecasting analysis. It avoids repetitive steps in the analysis and leads directly to a comparative results report.

2. Truthful comparison: The *ForecastTB* package aims to ensure a truthful and unbiased comparison of forecasting methods. Hence, it may be considered a reliable tool for industrial reports or scientific publications based on forecasting models.

3. Reproducible research: Along with unbiased comparisons, the *ForecastTB* package makes reproducible research possible with minimal effort. In other words, a forecasting comparison can easily be reproduced several times with the help of the package.

4. Stepping stone in machine learning automation: Forecasting methods play a very important role in machine learning applications. The *ForecastTB* package aims to identify the best-performing forecasting method for a given time series dataset, which can serve as a stepping stone in machine learning automation. For example, when the nature and patterns of the time series dataset change, a machine learning application could automatically replace the existing forecasting method based on the output of the *ForecastTB* package.

5. A handy tool: The *ForecastTB* package is a handy tool, especially for researchers who are not comfortable with computer coding, since it is a plug-and-play, module-based package. A very simple syntax produces a complete forecasting comparison analysis.

The *ForecastTB* package is a plug-and-play structured module as shown below.

The package compares forecasting methods by first forecasting the time series with distinct strategies. The prediction accuracies are then evaluated with the error metrics for all methods over several repetitions. The `prediction_errors()` function performs this comparative evaluation, taking various input parameters into account. It returns an object, which is the basic module in the package. This module can then be updated with new methods and other parameters using the `append_()` and `choose_()` functions. The `monte_carlo()` function extends the `prediction_errors()` module to compare distinct methods on randomly selected patches of the input time series. The remaining two functions, `plot()` and `plot_circle()`, are used to visualize forecasted results and error summaries for the chosen methods. All functions, as a set of modules, are based on and connected through the object returned by the `prediction_errors()` function. Hence, the framework of the *ForecastTB* package is version-control-friendly: new features can easily be introduced in future versions of the package.

## Demonstration of ‘ForecastTB’ package:

The *ForecastTB* package is intended for comparing the performance of forecasting methods. It helps set up the background, strategies, policies, and environment needed for such a comparison, and produces a comparison report for the defined framework as output.

Load the package as follows:

    library(ForecastTB)

The basic function of the package is `prediction_errors()`. This function takes the following parameters:

- `data`: input time series for testing
- `nval`: an integer to decide the number of values to predict (default: `12`)
- `ePara`: type of error calculation (RMSE and MAE are default). Add an error parameter of your choice in the following manner: `ePara = c("errorparametername")`, where errorparametername should be a source/function which returns the desired error set (default: `RMSE` and `MAE`)
- `ePara_name`: list of names of error parameters passed in order (default: `RMSE` and `MAE`)
- `Method`: list of locations of function for the proposed prediction method (should be recursive) (default: `ARIMA`)
- `MethodName`: list of names for function for the proposed prediction method in order (default: `ARIMA`)
- `strats`: list of forecasting strategies. Available: `recursive` and `dirRec` (default: `recursive`)
- `append_`: suggests if the function is used to append to another instance (default: `1`)
- `dval`: last d values of the data to be used for forecasting (default: length of the `data`)

The `prediction_errors()` function returns an object with two slots. The first slot is `output`, which provides `Error_Parameters`, the error values for the forecasting methods and error parameters defined in the framework, and `Predicted_Values`, the values forecasted with the same forecasting methods. The second slot is `parameters`, which returns the parameters used by or provided to the `prediction_errors()` function.

    a <- prediction_errors(data = nottem)  # `nottem` is a sample dataset shipped with R
    a
    #> An object of class "prediction_errors"
    #> Slot "output":
    #> $Error_Parameters
    #>            RMSE       MAE      MAPE exec_time
    #> ARIMA 2.3400915 1.9329816 4.2156087 0.1356769
    #>
    #> $Predicted_Values
    #>                   1       2       3       4       5       6     7
    #> Test values 39.4000 40.9000 42.4000 47.8000 52.4000 58.0000 60.70
    #> ARIMA       37.4193 37.6971 41.1825 46.2992 52.2480 57.1069 59.71
    #>                    8        9      10       11       12
    #> Test values 61.80000 58.20000 46.7000 46.60000 37.80000
    #> ARIMA       59.41173 56.38197 51.4756 46.04203 41.52592
    #>
    #>
    #> Slot "parameters":
    #> $data
    #>       Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
    #> 1920 40.6 40.8 44.4 46.7 54.1 58.5 57.7 56.4 54.3 50.5 42.9 39.8
    #> 1921 44.2 39.8 45.1 47.0 54.1 58.7 66.3 59.9 57.0 54.2 39.7 42.8
    #> 1922 37.5 38.7 39.5 42.1 55.7 57.8 56.8 54.3 54.3 47.1 41.8 41.7
    #> 1923 41.8 40.1 42.9 45.8 49.2 52.7 64.2 59.6 54.4 49.2 36.3 37.6
    #> 1924 39.3 37.5 38.3 45.5 53.2 57.7 60.8 58.2 56.4 49.8 44.4 43.6
    #> 1925 40.0 40.5 40.8 45.1 53.8 59.4 63.5 61.0 53.0 50.0 38.1 36.3
    #> 1926 39.2 43.4 43.4 48.9 50.6 56.8 62.5 62.0 57.5 46.7 41.6 39.8
    #> 1927 39.4 38.5 45.3 47.1 51.7 55.0 60.4 60.5 54.7 50.3 42.3 35.2
    #> 1928 40.8 41.1 42.8 47.3 50.9 56.4 62.2 60.5 55.4 50.2 43.0 37.3
    #> 1929 34.8 31.3 41.0 43.9 53.1 56.9 62.5 60.3 59.8 49.2 42.9 41.9
    #> 1930 41.6 37.1 41.2 46.9 51.2 60.4 60.1 61.6 57.0 50.9 43.0 38.8
    #> 1931 37.1 38.4 38.4 46.5 53.5 58.4 60.6 58.2 53.8 46.6 45.5 40.6
    #> 1932 42.4 38.4 40.3 44.6 50.9 57.0 62.1 63.5 56.3 47.3 43.6 41.8
    #> 1933 36.2 39.3 44.5 48.7 54.2 60.8 65.5 64.9 60.1 50.2 42.1 35.8
    #> 1934 39.4 38.2 40.4 46.9 53.4 59.6 66.5 60.4 59.2 51.2 42.8 45.8
    #> 1935 40.0 42.6 43.5 47.1 50.0 60.5 64.6 64.0 56.8 48.6 44.2 36.4
    #> 1936 37.3 35.0 44.0 43.9 52.7 58.6 60.0 61.1 58.1 49.6 41.6 41.3
    #> 1937 40.8 41.0 38.4 47.4 54.1 58.6 61.4 61.8 56.3 50.9 41.4 37.1
    #> 1938 42.1 41.2 47.3 46.6 52.4 59.0 59.6 60.4 57.0 50.7 47.8 39.2
    #> 1939 39.4 40.9 42.4 47.8 52.4 58.0 60.7 61.8 58.2 46.7 46.6 37.8
    #>
    #> $nval
    #> [1] 12
    #>
    #> $ePara
    #> [1] "RMSE" "MAE"  "MAPE"
    #>
    #> $ePara_name
    #> [1] "RMSE" "MAE"  "MAPE"
    #>
    #> $Method
    #> [1] "ARIMA"
    #>
    #> $MethodName
    #> [1] "ARIMA"
    #>
    #> $Strategy
    #> [1] "Recursive"
    #>
    #> $dval
    #> [1] 240

The object returned by the `prediction_errors()` function can be visualized quickly with the `plot()` function, as below:

    b <- plot(a)
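In the output above, the "Test values" row matches the last 12 observations of `nottem` (the 1939 row), which suggests the evaluation holds out the final `nval` values and scores forecasts against them. The base-R sketch below mimics that idea under stated assumptions: it swaps in a simple seasonal naive forecast (repeat the last observed year) instead of ARIMA, and it is an illustration of holdout scoring, not the package's internal code.

```r
# Hold out the last nval observations of nottem (ships with base R)
nval  <- 12
x     <- as.numeric(nottem)
n     <- length(x)
train <- x[1:(n - nval)]
test  <- x[(n - nval + 1):n]

# Assumed stand-in method: seasonal naive forecast,
# i.e. repeat the last full year of the training data
pred <- rep(tail(train, 12), length.out = nval)

# Score the forecast with the three default metrics named in the output
rmse <- sqrt(mean((test - pred)^2))
mae  <- mean(abs(test - pred))
mape <- mean(abs((test - pred) / test)) * 100
round(c(RMSE = rmse, MAE = mae, MAPE = mape), 3)
```

Comparing these numbers with the ARIMA row reported by the package gives a rough sense of how much a fitted model gains over a naive baseline on this dataset.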

## Comparison of multiple methods:

As discussed above, the `prediction_errors()` function evaluates the performance of the `ARIMA` method by default. It also allows the performance of other methods to be compared alongside `ARIMA`. In the following example, two methods (`LPSF` and `PSF`) are compared with `ARIMA`. Each method is wrapped in a function that takes `data` and `nval` as input parameters and must return `nval` forecasted values as a vector. In the following code, the `test1()` and `test2()` functions are used for the `LPSF` and `PSF` methods, respectively.

    library(decomposedPSF)
    #> Warning: package 'decomposedPSF' was built under R version 3.6.2

    test1 <- function(data, nval){
      return(lpsf(data = data, n.ahead = nval))
    }

    library(PSF)
    #> Warning: package 'PSF' was built under R version 3.6.2

    test2 <- function(data, nval){
      a <- psf(data = data, cycle = 12)
      b <- predict(object = a, n.ahead = nval)
      return(b)
    }
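Any function that follows the same `(data, nval)` contract could be wrapped in the same way. As a dependency-free illustration, the sketch below defines a hypothetical `seasonal_mean()` method (not part of *ForecastTB* or the packages above) that forecasts each month by its historical average. It assumes a 12-month cycle and that the series starts at the first position of a cycle.

```r
# Hypothetical method following the (data, nval) contract:
# forecast each position in the cycle by its historical mean
seasonal_mean <- function(data, nval, period = 12) {
  x <- as.numeric(data)
  # Position of every observation within its cycle (1..period)
  pos <- (seq_along(x) - 1) %% period + 1
  # Average value observed at each position
  profile <- tapply(x, pos, mean)
  # Cycle positions of the next nval time steps
  next_pos <- (length(x) + seq_len(nval) - 1) %% period + 1
  as.numeric(profile[next_pos])
}

fc <- seasonal_mean(nottem, nval = 12)
length(fc)   # 12 values, one per requested step
```

Following the pattern used for `test1()` and `test2()`, such a function would presumably be attached as `Method = c("seasonal_mean(data, nval)")`.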

The following code chunk shows how a user can attach various methods to the `prediction_errors()` function. Here the `append_` parameter is set to `1`, so the new methods (`LPSF` and `PSF`) are compared in addition to the default `ARIMA` method. If instead `append_` is set to `0`, only the newly added `LPSF` and `PSF` methods are compared.

    a1 <- prediction_errors(data = nottem, nval = 48,
                            Method = c("test1(data, nval)", "test2(data,nval)"),
                            MethodName = c("LPSF","PSF"), append_ = 1)
    a1@output$Error_Parameters
    #>            RMSE       MAE      MAPE exec_time
    #> ARIMA 2.5233156 2.1280641 4.5135378 0.1659489
    #> LPSF   2.391580  1.936111  4.238650  0.441232
    #> PSF    2.467598  1.854861  3.943937  0.098737
    b1 <- plot(a1)

## Appending new methods:

Consider another function, `test3()`, which is to be added to an already existing `prediction_errors` object, e.g. `a1`.

    library(forecast)

    test3 <- function(data, nval){
      b <- as.numeric(forecast(ets(data), h = nval)$mean)
      return(b)
    }

The `append_()` function serves this purpose. It has the `object`, `Method`, `MethodName`, `ePara` and `ePara_name` parameters, with the same meaning as in the `prediction_errors()` function. Other hidden parameters of the `append_()` function are synced automatically with the `prediction_errors()` function. It can be used as follows:

    c1 <- append_(object = a1,
                  Method = c("test3(data,nval)"), MethodName = c('ETS'))
    c1@output$Error_Parameters
    #>              RMSE         MAE        MAPE  exec_time
    #> ARIMA   2.5233156   2.1280641   4.5135378  0.1659489
    #> LPSF     2.391580    1.936111    4.238650   0.441232
    #> PSF      2.467598    1.854861    3.943937   0.098737
    #> ETS   38.29743056 36.85216463 73.47667823 0.03786588
    d1 <- plot(c1)

## Removing methods:

When more than one method is established in the environment and the user wishes to remove one or more of them, the `choose_()` function can be used. This function takes a `prediction_errors` object as input, shows all methods established in the environment, and asks for the indices of the methods the user wants to remove.

In the following example, the user supplies `4` as input, which corresponds to `Method 4: ETS`. In response, the `choose_()` function returns a new object with the updated method list.

    # > e1 <- choose_(object = c1)
    # Following are the methods attached with the object:
    #         [,1]    [,2]   [,3]  [,4]
    # Indices "1"     "2"    "3"   "4"
    # Methods "ARIMA" "LPSF" "PSF" "ETS"
    #
    # Enter the indices of methods to remove:4
    #
    # > e1@output$Error_Parameters
    #            RMSE       MAE exec_time
    # ARIMA 2.5233156 2.1280641 0.1963789
    # LPSF  2.3915796 1.9361111 0.2990961
    # PSF   2.2748736 1.8301389 0.1226711

## Adding new Error metrics:

By default, the `prediction_errors()` function compares forecasting methods in terms of `RMSE`, `MAE` and `MAPE`. It also allows multiple new error metrics to be appended. The percent change in variance (PCV) is one such metric, defined as:

$$\mathrm{PCV} = \frac{\left|\mathrm{var}(\text{Predicted}) - \mathrm{var}(\text{Observed})\right|}{\mathrm{var}(\text{Observed})} \times 100$$

where *var(Predicted)* and *var(Observed)* are the variances of the predicted and observed values. The following code chunk implements the PCV error metric as a function:

    pcv <- function(obs, pred){
      d <- (var(obs) - var(pred)) * 100/ var(obs)
      d <- abs(as.numeric(d))
      return(d)
    }
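As a quick sanity check of this function, the sketch below applies it to two small made-up vectors (illustrative numbers only, not package output). The predictions have a quarter of the variance of the observations, so the percent change in variance is 75.

```r
# PCV function as defined in the post
pcv <- function(obs, pred){
  d <- (var(obs) - var(pred)) * 100 / var(obs)
  d <- abs(as.numeric(d))
  return(d)
}

obs  <- c(2, 4, 6, 8)   # var(obs)  = 20/3
pred <- c(3, 4, 5, 6)   # var(pred) = 5/3

pcv(obs, pred)   # 75: predictions are much less variable than observations
pcv(obs, obs)    # 0: identical variance
```

Note that PCV only compares variances, so a forecast can score 0 here while still being far from the observed values. This is one reason to report it alongside metrics such as RMSE or MAE.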

The following code chunk adds PCV as a new error metric when building a `prediction_errors` object.

    a1 <- prediction_errors(data = nottem, nval = 48,
                            Method = c("test1(data, nval)", "test2(data,nval)"),
                            MethodName = c("LPSF","PSF"),
                            ePara = "pcv(obs, pred)", ePara_name = 'PCV',
                            append_ = 1)
    a1@output$Error_Parameters
    #>            RMSE       MAE      MAPE        PCV exec_time
    #> ARIMA  2.523316  2.128064  4.513538  13.757073  0.152633
    #> LPSF  2.5366583 1.9791667 4.2695528 13.6065832 0.3272281
    #> PSF   2.1729411 1.7230385 3.7067049  0.9558439 0.1047201
    b1 <- plot(a1)

## A polar plot:

The polar plot shows how the forecasted values behave over an increasing number of seasonal time horizons.

    plot_circle(a1)

## Monte-Carlo strategy:

Monte-Carlo is a popular strategy for comparing the performance of forecasting methods. It randomly selects multiple patches of the dataset, tests the forecasting methods on each patch, and returns the average error values.

The Monte-Carlo strategy helps ensure an accurate comparison of forecasting methods and avoids biased results obtained by chance. The package provides the `monte_carlo()` function for this purpose. Its parameters are:

- `object`: output of the `prediction_errors()` function
- `size`: volume of time series used in the Monte-Carlo strategy
- `iteration`: number of iterations for which the models are applied
- `fval`: a flag to view forecasted values in each iteration (default: `0`, don't view values)
- `figs`: a flag to view plots for each iteration (default: `0`, don't view plots)

The function returns the error values of the provided models in each iteration, along with their mean values.

    a1 <- prediction_errors(data = nottem, nval = 48,
                            Method = c("test1(data, nval)"),
                            MethodName = c("LPSF"), append_ = 1)
    monte_carlo(object = a1, size = 180, iteration = 10)
    #>         ARIMA     LPSF
    #> 9    3.446114 5.009127
    #> 11   3.807793 5.367784
    #> 30   3.477685 5.195355
    #> 32   2.590447 4.758633
    #> 24   4.570650 4.871212
    #> 2    4.476293 5.698093
    #> 48   2.815019 5.143118
    #> 35   2.692311 4.714380
    #> 18   3.309056 5.182951
    #> 3    4.974178 5.665410
    #> Mean 3.615955 5.160606
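The idea behind this strategy can be sketched in base R without the package. The code below is an assumption-laden illustration, not the `monte_carlo()` implementation: it draws random start points, cuts a patch of `size` observations from each, holds out the last `nval` values of the patch, and averages the RMSE of two simple stand-in methods. The helper names `snaive` and `mean_fc` are hypothetical.

```r
set.seed(1)   # make the random patches reproducible
x <- as.numeric(nottem)
size <- 180; nval <- 48; iteration <- 10

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Two illustrative methods sharing the (data, nval) interface
snaive  <- function(data, nval) rep(tail(data, 12), length.out = nval)
mean_fc <- function(data, nval) rep(mean(data), nval)

# Random start indices of the patches (each patch must fit in the series)
starts <- sample(seq_len(length(x) - size + 1), iteration)

errors <- t(sapply(starts, function(s) {
  patch <- x[s:(s + size - 1)]
  train <- head(patch, size - nval)   # fit on the first part of the patch
  test  <- tail(patch, nval)          # score on the held-out end
  c(SNAIVE = rmse(test, snaive(train, nval)),
    MEAN   = rmse(test, mean_fc(train, nval)))
}))
rownames(errors) <- starts

# Per-patch errors plus their mean, similar in layout to the output above
rbind(errors, Mean = colMeans(errors))
```

Averaging over several patches reduces the chance that one method looks best only because of where the single train/test split happened to fall.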

The `monte_carlo()` function can also be called with the `fval` flag OFF and the `figs` flag ON, so that a plot is shown for each iteration:

    monte_carlo(object = a1, size = 144, iteration = 2, fval = 0, figs = 1)

## Summary

This post demonstrated the *ForecastTB* package as a test-bench for comparing the time series forecasting methods as a crucial step towards more formal time series analysis. This demonstration is further described with some examples to show how characteristics of the temporal correlation structure can influence the forecast accuracy. The *ForecastTB* package greatly assists in comparing different forecasting methods and considering the characteristic of the time series dataset.

Also, it explains how new forecasting methods and error metrics can be introduced in the comparative test-bench. Finally, a simple plug-and-play module based architecture of the *ForecastTB* to append or remove several forecasting methods and error metrics makes it a robust and handy tool to evaluate forecasting analysis.

The *ForecastTB* R package is available on the CRAN repository. It is also demonstrated in a journal publication, along with some case studies in renewable energy applications.

For any further details, feel free to comment on this post.

Author: Dr. Neeraj Dhanraj Bokde, Postdoctoral Researcher, Aarhus University, Denmark