What are time series

In this article, we would like to give a brief introduction to classical time series analysis. The topic is very broad, and a detailed understanding of the algorithms discussed requires fairly in-depth knowledge; nevertheless, we do not wish to forgo a presentation that introduces the subject and provides some essential tools. In the second part (which we will publish in 15 days’ time) we will address time series from the perspective of machine learning and AI.

 

What is a Time Series

A time series is a series of data points indexed in time order. An example is a sequence of observations or measurements recorded as time passes. There are two macro-categories of time series:

  • univariate time series: observations are one-dimensional, i.e. only one numerical value is recorded as time passes;
  • multivariate time series: observations are multidimensional, i.e., several numerical values are recorded for a single instant of time.

Typically, a time series is represented with a data structure that records a timestamp, which may be a date-specific type or an integer or long containing a Unix timestamp. Alongside the timestamp it carries additional custom data, which in the simplest case is a single numeric value: the observation recorded at that instant.

To make this definition concrete, consider an example: the recording of temperature over time:
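As a minimal sketch (all values below are made up), such a univariate recording could be represented in Python as a list of (timestamp, value) pairs:

```python
from datetime import datetime, timedelta

# A univariate time series as (timestamp, value) pairs:
# one numeric observation (a temperature, in this sketch) per instant.
start = datetime(2023, 1, 1)
temperatures = [
    (start + timedelta(hours=i), 20.0 + 0.5 * i)
    for i in range(4)
]

for ts, value in temperatures:
    print(ts.isoformat(), value)
```

In practice the same structure is often held in a dataframe or a time series database, but the essence is the same: one index column of timestamps, one column of observations.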

The preceding is an example of a univariate time series; let us look at a multivariate case:

In the above example, there are two measurements recorded for each timestamp: the maximum and minimum temperature; hence we speak of a multivariate time series.
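A multivariate series like this one can be sketched by attaching several values to each timestamp, for instance as a list of records (the field names and values here are made up for illustration):

```python
from datetime import datetime, timedelta

# A multivariate time series: two values (minimum and maximum
# daily temperature, values made up) recorded for each timestamp.
start = datetime(2023, 1, 1)
series = [
    {"timestamp": start + timedelta(days=i),
     "t_min": 5.0 + i,
     "t_max": 12.0 + i}
    for i in range(3)
]

for row in series:
    print(row["timestamp"].date(), row["t_min"], row["t_max"])
```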

A more detailed example may include inhomogeneous measurements (i.e. quantities of different types) recorded with the same timestamp (data are taken from here):

multivariate inhomogeneous time series

Decomposition of a time series

A time series can be broken down into four components that encode specific aspects of the evolution of recorded values over time.

  • Trend: the long-term movement of the series, which may be upward, downward, stationary, or a composition of these over different periods.
  • Seasonal variation: a repetitive pattern of growth and decline with a fixed frequency.
  • Cyclical fluctuations: a repetitive pattern of growth and decline that does not have a fixed frequency.
  • Irregular variations: variations with no codifiable regularity.

An example from here of decomposition into the four components:

In the picture, the first graph is the time series itself, the second is the trend, the seasonal variation follows, and the irregular fluctuations, called ‘residual’ here, come last.

A representation of the ‘overlapping’ components is available here:

time series overlapping components

From the figure above, it can be seen that the trend of the time series is upward and linear; the seasonality is the darker line overlapping the trend, with regular fluctuations.

The decomposition therefore allows us to extract information on the behaviour of the phenomenon under study.
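As a rough sketch of how such a decomposition works, a classical additive decomposition (series = trend + seasonal + residual) can be computed by hand on synthetic data, estimating the trend with a centred moving average and the seasonal component as the per-phase mean of the detrended series. The series below is made up; real tools (e.g. statsmodels’ seasonal_decompose) are more careful at the boundaries:

```python
import numpy as np

# Synthetic series: linear trend + seasonal pattern of period 4 + noise.
rng = np.random.default_rng(0)
period = 4
t = np.arange(80)
y = 0.3 * t + 2.0 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.1, t.size)

# Trend: centred moving average over one full period.
kernel = np.ones(period) / period
trend = np.convolve(y, kernel, mode="same")

# Seasonal component: mean of the detrended values at each phase.
detrended = y - trend
seasonal = np.array([detrended[i::period].mean() for i in range(period)])
seasonal_full = seasonal[t % period]

# Residual: whatever trend and seasonality do not explain.
residual = y - trend - seasonal_full
```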

Classical algorithms

Let us look at some classic algorithms used for prediction.

Autoregressive (AR)

An Autoregressive (AR) model uses patterns extracted from historical data to extrapolate future time series behaviour.

According to Wikipedia

The autoregressive model is a linear model that specifies that the output variable depends linearly on its own previous values.

The model is based on regression: denoting by y(t) the value of the time series at time t and by y(t-i) its value at time t-i, for i=1,…,t-1, we can perform a linear regression:

y(t) = b(1)y(t-1) + … + b(t-1)y(1) + e(t)

where e(t) indicates the prediction error at time t and the (possibly null) b(i) are the coefficients of the linear regression.
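As a sketch, the coefficients b(i) of an AR(p) model can be estimated by ordinary least squares on a lagged design matrix. The process below is simulated with made-up coefficients, so we can check that the fit recovers them:

```python
import numpy as np

# Synthetic AR(2) process: y(t) = 0.6*y(t-1) - 0.2*y(t-2) + e(t).
rng = np.random.default_rng(1)
n, p = 500, 2
y = np.zeros(n)
for t in range(p, n):
    y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal(0, 0.1)

# Build the lagged design matrix (column i holds lag i+1) and
# estimate b(1), ..., b(p) by ordinary least squares.
X = np.column_stack([y[p - i - 1 : n - i - 1] for i in range(p)])
target = y[p:]
b, *_ = np.linalg.lstsq(X, target, rcond=None)

# One-step-ahead forecast from the last p observations.
forecast = b @ y[-1 : -p - 1 : -1]
```

The estimated b should be close to the true (0.6, -0.2) used in the simulation.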

Moving Average (MA)

The Moving Average (MA) model expresses the series as a regression on the forecast errors made at previous steps, rather than on the past values themselves.

As Wikipedia reports:

Moving averaging of a data set consists of creating a series of averages of different subsets of the full data set. It is a tool used for the analysis of time series […].

The model can be formalised as:

y(t) = θ(1)e(t-1) + … + θ(t-1)e(1) + e(t)

where the θ(i) are the regression coefficients and the e(t-i) are the errors at times t-i, for i=1,…,t-1.
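A quick way to see the MA model at work is to simulate one. An MA(1) process built from white noise (the coefficient 0.8 below is made up) has a characteristic autocorrelation signature: correlated at lag 1, essentially uncorrelated beyond that:

```python
import numpy as np

# Simulate an MA(1) process: y(t) = e(t) + 0.8 * e(t-1),
# where e(t) is white noise.
rng = np.random.default_rng(2)
n, theta = 1000, 0.8
e = rng.normal(0, 1.0, n)
y = e.copy()
y[1:] += theta * e[:-1]

# For an MA(1) process the lag-1 autocorrelation is
# theta / (1 + theta**2) ≈ 0.49, and higher lags vanish.
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]
acf2 = np.corrcoef(y[:-2], y[2:])[0, 1]
```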

Autoregressive Moving Average (ARMA)

The ARMA (Autoregressive Moving Average) model is a combination of the previous two. From a formal point of view, the prediction is made using a formula such as:

y(t) = b(1)y(t-1) + … + b(p)y(t-p) + θ(1)e(t-1) + … + θ(q)e(t-q) + e(t)

where the autoregressive part has p terms and the moving average part q terms, called the autoregressive order and the moving average order, respectively.

The ARMA algorithm models the series using its own past values (via the AR part) and combines them with white-noise error terms (via the MA part).
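As with the previous models, simulating an ARMA process makes the combination concrete. The ARMA(1,1) coefficients below are made up for illustration:

```python
import numpy as np

# Simulate an ARMA(1,1) process:
# y(t) = 0.5 * y(t-1) + e(t) + 0.4 * e(t-1)
rng = np.random.default_rng(3)
n, phi, theta = 1000, 0.5, 0.4
e = rng.normal(0, 1.0, n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + e[t] + theta * e[t - 1]

# Unlike an MA(1), the AR part makes the autocorrelation
# decay gradually over many lags instead of cutting off.
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]
```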

Autoregressive Integrated Moving Average (ARIMA)

The Autoregressive Integrated Moving Average (ARIMA) algorithm is one of the most widely used algorithms in time series forecasting. The main objective of the ARIMA model is to explain the autocorrelation between the data. ARIMA is therefore based on two fundamental concepts: stationarity and differencing.

A time series is stationary if the mean and variance of the observations do not change over time. Series with a non-constant trend, therefore, are not stationary. A stationary time series shows no predictable long-run patterns (its variance is constant).

Differencing, on the other hand, computes the difference between consecutive observations. It is used to transform a time series, stabilising the mean and eliminating (or reducing) trend and seasonality.
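A tiny example shows why differencing removes a trend: the first difference of a series with a linear trend is constant.

```python
import numpy as np

# Differencing removes a linear trend: the first difference of
# y(t) = 2*t + 5 is the constant 2.
t = np.arange(10)
y = 2 * t + 5
dy = np.diff(y)  # dy[i] = y[i+1] - y[i]
print(dy)        # → [2 2 2 2 2 2 2 2 2]
```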

A model produced by the ARIMA algorithm can thus be described as a forecasting model applied to a time series that has been differenced in order to make it stationary.
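To make the “difference, model, integrate back” idea concrete, here is a hand-rolled sketch of an ARIMA(1,1,0)-style forecast on a made-up trending series: difference once, fit an AR(1) on the differences by least squares, then add the forecast difference back onto the last observation. A real application would use a dedicated library (e.g. statsmodels’ ARIMA) rather than this toy fit:

```python
import numpy as np

# Made-up non-stationary series: linear drift plus a random walk.
rng = np.random.default_rng(4)
n = 300
y = 0.5 * np.arange(n) + rng.normal(0, 0.2, n).cumsum()

d = np.diff(y)                                              # I: difference once
phi, *_ = np.linalg.lstsq(d[:-1, None], d[1:], rcond=None)  # AR(1) on the diffs
next_diff = phi[0] * d[-1]                                  # forecast next diff
forecast = y[-1] + next_diff                                # integrate back
```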

Also worthy of note is the SARIMA model, which additionally integrates a seasonal component.

Conclusions

In this article, we presented the concept of the time series and some classic methods of analysis to study and predict its behaviour. The study of these structures is a complex and broad field; indeed, many temporal phenomena can be represented with them.

Examples in the industrial field are various sensor measurements or system logs, which are usually stored with an associated timestamp; examples from other fields include satellite measurements, historical financial market data, and environmental measurements (temperature, particulate pollution, etc.).

Time series analysis is therefore an essential requirement for the study of many phenomena.

We have many interesting articles in our blog, don’t miss a single one!