Introduction
A time series is a series of data points in time order, in other words a sequence of discrete-time data. Examples of time series are heights of ocean tides, power usage, heart rate, counts of sunspots, stock market values, click data and the demand on power plants. Typical applications are sales forecasting, financial analysis and future projections. Time series data is normally plotted in line graphs.
The real goal of time series analysis is forecasting.
Examples of time series
I've gathered a couple of examples of time series.
Here is another example of a time series, with a trend and a seasonal effect.
The Microsoft share price, which goes up and down during the year.
Definitions
First, let's start with some definitions that help in understanding this topic more thoroughly. Time series have certain characteristics. A time series needs some patterns, because without patterns it is impossible to predict future values. So the data should contain a certain repeatability in order to make predictions for the future.
Stationary
The mean should be constant and not dependent on time. So, if the overall time series goes upwards or downwards in a graph, we can say that it is not stationary. The variance shouldn't change over time either: if the variance of the data changes over time, for example with larger and larger peaks, the series is not called stationary. Finally, the covariance should be the same over time, that is, the covariance between two observations should depend only on the lag between them and not on the point in time. You can find more information here.
So stationarity has:
- a constant mean.
- a constant variance.
- a constant covariance over time.
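A quick, informal way to look at the first two properties is to cut the series into a few consecutive windows and compare the mean and variance per window. The sketch below is only an illustration: the helper `window_stats` and both example series are made up for this post, and it is not a formal stationarity test.

```python
import random
from statistics import mean, pvariance

def window_stats(series, n_windows=4):
    """Split the series into equal consecutive windows; return (mean, variance) per window."""
    size = len(series) // n_windows
    windows = [series[i * size:(i + 1) * size] for i in range(n_windows)]
    return [(mean(w), pvariance(w)) for w in windows]

random.seed(0)
# Hypothetical data: pure noise (roughly stationary) and the same noise on an upward trend.
noise = [random.gauss(0, 1) for _ in range(400)]
trending = [0.05 * t + e for t, e in enumerate(noise)]

noise_means = [m for m, _ in window_stats(noise)]
trend_means = [m for m, _ in window_stats(trending)]
print("window means, noise:   ", [round(m, 2) for m in noise_means])
print("window means, trending:", [round(m, 2) for m in trend_means])
```

For the noise the window means stay close together, while for the trending series they drift upwards from window to window, which is the sign of a mean that depends on time.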
Irregular reporting
In order to do time series analysis it is important that the data points are registered at regular intervals. For instance, when the data is registered monthly and then no data is registered for half a year, it's difficult to find regular patterns because of the missing data.
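Before analysing a series it can help to check for such holes. The following is a minimal sketch using only the Python standard library; the function name `find_gaps`, the 31-day threshold and the dates are assumptions chosen for the monthly example.

```python
from datetime import date

def find_gaps(dates, max_days=31):
    """Return (previous, next) pairs of dates that are more than max_days apart."""
    ordered = sorted(dates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if (b - a).days > max_days]

# Hypothetical monthly registrations with nothing between June and December.
observed = [date(2020, m, 1) for m in (1, 2, 3, 4, 5, 6, 12)]
gaps = find_gaps(observed)
for start, end in gaps:
    print(f"no data between {start} and {end} ({(end - start).days} days)")
```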
Auto correlation
Auto means self, and Wikipedia says: "Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals."
Auto Correlation Function (ACF)
One important function that is often used in time series analysis is the "Auto correlation function". The auto correlation function gives the correlation between the data points of a time series as a function of the lag between them. If the series is random, no correlation will be found between the different data points.
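To make this concrete, here is a small sketch of the usual sample autocorrelation in plain Python. The function name `acf` and the two example series (a sine wave with a period of 12 steps and random noise) are my own illustration.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

random.seed(1)
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(120)]  # repeats every 12 steps
noise = [random.gauss(0, 1) for _ in range(120)]                 # no structure

acf_seasonal = acf(seasonal, 12)
acf_noise = acf(noise, 12)
print("seasonal:", [round(r, 2) for r in acf_seasonal])
print("noise:   ", [round(r, 2) for r in acf_noise])
```

The seasonal series shows a strong positive correlation at lag 12 (one full cycle) and a strong negative one at lag 6 (half a cycle), while the values for the noise stay close to zero for every lag above 0.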
Partial Auto Correlation Function (PACF)
Wikipedia: "In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a time series with its own lagged values, controlling for the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags."
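The difference with the ACF is easiest to see on a series where each value depends only on the previous one. The sketch below computes the PACF from the sample autocorrelations with the Durbin-Levinson recursion; that choice of method, the function names and the simulated series (coefficient 0.7) are assumptions for illustration, not something prescribed above.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelation for lags 0..max_lag via the Durbin-Levinson recursion."""
    rho = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients of the best linear predictor of order k-1
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

random.seed(2)
# Hypothetical series: each value is 0.7 times the previous value plus noise.
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0, 1))

rho = acf(x, 5)
partial = pacf(x, 5)
print("ACF: ", [round(r, 2) for r in rho])
print("PACF:", [round(p, 2) for p in partial])
```

The ACF decays slowly because lag 2 is correlated with lag 0 through lag 1. The PACF controls for the shorter lags, so it is large at lag 1 and close to zero afterwards.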
Decomposition
The decomposition of time series is a statistical method that deconstructs a time series into several components. There are a couple of major components in the decomposition of a time series: the trend, the repeating (seasonal) pattern and a stationary signal.
LOWESS / LOESS regression
LOWESS (Locally Weighted Scatterplot Smoothing), or LOESS (locally weighted smoothing), is a popular method used in regression analysis. It creates a smooth line through a time plot or scatter plot to help you see the relationship between variables and foresee trends. LOESS uses local polynomial modeling to find trends, and you need to choose the bandwidth parameter, which can be a bit difficult to determine. More information on LOESS here.
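To give an idea of what "local" means here, the following is a stripped-down sketch: for every point it fits a straight line to the nearest neighbours, weighted with the usual tricube weights. It leaves out the robustness iterations of the full method, and the function name `lowess`, the `frac` bandwidth parameter and the data are assumptions for illustration.

```python
import math
import random

def lowess(xs, ys, frac=0.3):
    """Local linear regression with tricube weights (no robustness iterations).

    frac is the bandwidth: the fraction of points used in each local fit.
    Assumes the x values are distinct.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        # Bandwidth for this point: distance to its k-th nearest neighbour.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

random.seed(3)
xs = list(range(100))
true_trend = [0.5 * x for x in xs]                      # hypothetical underlying trend
ys = [t + random.gauss(0, 5) for t in true_trend]       # noisy observations
smooth = lowess(xs, ys, frac=0.3)
print([round(v, 1) for v in smooth[:5]])
```

A larger `frac` gives a smoother line that follows only the broad trend; a smaller one follows the data more closely.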
Moving Average smoothing
Moving Average smoothing is used to remove the smaller-period fluctuations, in contrast with LOESS, which is used for larger-period patterns. Moving Average smoothing works at a more fine-grained level.
With Moving Average smoothing you find trends by averaging over a cycle length. You need to know the cycle length. This can be a bit tedious and difficult to see, but with experimenting and trial and error it is possible to determine trends.
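As a sketch of averaging over a cycle length: the function below computes a centered moving average. For an even cycle length such as 12 it gives the two end points of the window half weight, so the window stays centered on the point. The function name and the example series (a straight line plus a 12-step cycle) are made up for illustration.

```python
import math

def centered_moving_average(series, period):
    """Centered moving average over one cycle; None where the window does not fit."""
    n = len(series)
    half = period // 2
    out = [None] * n
    for t in range(half, n - half):
        if period % 2:
            # Odd cycle length: plain mean of the window around t.
            out[t] = sum(series[t - half:t + half + 1]) / period
        else:
            # Even cycle length: half weight on both end points.
            inner = sum(series[t - half + 1:t + half])
            out[t] = (0.5 * series[t - half] + inner + 0.5 * series[t + half]) / period
    return out

# Hypothetical series: linear trend 0.5 * t plus a seasonal cycle of 12 steps.
series = [0.5 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
trend = centered_moving_average(series, 12)
print([None if v is None else round(v, 2) for v in trend[4:10]])
```

Because the window covers exactly one cycle, the seasonal ups and downs cancel out and only the trend is left. With the wrong cycle length part of the cycle would leak into the result, which is why you need to know it.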
STL
STL (Seasonal and Trend decomposition using Loess) is a combination of the LOESS and Moving Average smoothing techniques. First, LOESS is used to find a general trend. Then Moving Average smoothing is used to find further patterns; the result is the seasonal effect. What is left after the last pass is the remainder, the residual.
Use LOESS to find a general trend T and subtract it from the data X, giving X - T. Moving Average smoothing is then used on that result to find the seasonal effect C. Subtract that as well, and what is left is the stationary remainder S = X - T - C.
In the above example the original series is shown at the top ("data"), and the diagram below it presents the seasonal effect, without the trend and the remainder. Without the seasonal effect the trend is clearly recognizable, and what is left is the remainder. If the remainder is normally distributed, you're on the right track; if it does not show a normal distribution, there is perhaps still a correlation between certain variables.
If the remainder is normally distributed, we have removed all of the hidden relations: it is purely random.
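The subtraction steps can be sketched in plain Python. Note that this is a simplified classical additive decomposition and not the real STL algorithm: the trend is estimated with a centered moving average instead of LOESS, and the seasonal effect is the average detrended value at each position in the cycle. All names and the example series are assumptions for illustration; `trend`, `seasonal` and `remainder` correspond to T, the seasonal effect and what is left after subtracting both.

```python
import math
import random

def centered_moving_average(series, period):
    """Centered moving average over one cycle; None where the window does not fit."""
    n, half = len(series), period // 2
    out = [None] * n
    for t in range(half, n - half):
        if period % 2:
            out[t] = sum(series[t - half:t + half + 1]) / period
        else:
            inner = sum(series[t - half + 1:t + half])
            out[t] = (0.5 * series[t - half] + inner + 0.5 * series[t + half]) / period
    return out

def decompose(series, period):
    """Additive decomposition: series = trend + seasonal + remainder."""
    trend = centered_moving_average(series, period)
    # Step 1: subtract the trend (X - T).
    detrended = [None if t is None else x - t for x, t in zip(series, trend)]
    # Step 2: seasonal effect = mean detrended value per position in the cycle.
    by_position = [[] for _ in range(period)]
    for i, d in enumerate(detrended):
        if d is not None:
            by_position[i % period].append(d)
    pattern = [sum(v) / len(v) for v in by_position]
    centre = sum(pattern) / period
    pattern = [p - centre for p in pattern]  # make the seasonal effect sum to zero
    seasonal = [pattern[i % period] for i in range(len(series))]
    # Step 3: subtract the seasonal effect too; what is left is the remainder.
    remainder = [None if t is None else x - t - s
                 for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder

random.seed(4)
# Hypothetical series: trend + cycle of 12 steps + random noise.
series = [0.3 * t + 5 * math.sin(2 * math.pi * t / 12) + random.gauss(0, 0.5)
          for t in range(120)]
trend, seasonal, remainder = decompose(series, 12)
print("seasonal pattern:", [round(s, 2) for s in seasonal[:12]])
```

If the decomposition worked, the remainder should look like random noise around zero, for example in a histogram or in its autocorrelation.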
Forecasting
If we know the trend and the seasonal effect we can predict the future. In the diagram below, the blue line is the forecast: the predicted future values based on the trend and the seasonal effect.
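A very simple version of this idea is to extend the trend and add the seasonal effect for the matching position in the cycle. The sketch below assumes a straight-line trend fitted with least squares and a fixed cycle length; the function names and the data are hypothetical, and real forecasting methods are more refined than this.

```python
import math

def fit_trend_and_season(series, period):
    """Fit a straight-line trend and the mean deviation per position in the cycle."""
    n = len(series)
    t_mean = (n - 1) / 2
    x_mean = sum(series) / n
    slope = (sum((t - t_mean) * (x - x_mean) for t, x in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = x_mean - slope * t_mean
    resid = [x - (intercept + slope * t) for t, x in enumerate(series)]
    season = [sum(resid[p::period]) / len(resid[p::period]) for p in range(period)]
    return intercept, slope, season

def forecast(series, period, steps):
    """Extend the fitted trend and add the seasonal effect for each future step."""
    intercept, slope, season = fit_trend_and_season(series, period)
    n = len(series)
    return [intercept + slope * t + season[t % period] for t in range(n, n + steps)]

# Hypothetical history: 48 months with an upward trend and a yearly cycle.
history = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]
predicted = forecast(history, period=12, steps=12)
actual = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48, 60)]
print("largest error:", round(max(abs(p - a) for p, a in zip(predicted, actual)), 2))
```

Even on this noise-free example the forecast is not exact, because the straight-line fit absorbs a little of the seasonal pattern. It stays close, though, and it shows how trend and seasonal effect together produce the predicted values.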
Conclusion
This blog post is an introduction to forecasting based on time series.
Hennie