What is a Time Series and why analyze it?
Hi Folks! We are living in unprecedented times which can be rightly called the transition period for humans. With each new day, we see new advancements in technologies and it has become important for businesses to leverage those technologies for generating profits and achieving process efficiency.
With advancements in data storage technologies, organizations are generating a humongous amount of transactional as well as analytical data, which can be leveraged to get hindsight about the past and foresight about the future.
Past information when analyzed can reveal trends and insights that can be used with Machine Learning techniques to make predictions about the future.
But before being able to do that we need to understand a few basic concepts regarding time series. So let's get started!
Time Series Data
Time series data is a sequence of values, each with a time period or timestamp attached. The value can be anything measurable that depends on time in some way. For example, gold prices recorded hourly over a particular period form a time series. Another example is the temperature recorded daily in a particular village over one year. In both examples, a value is recorded at regular intervals, forming a sequence with a timestamp attached to each value.
The frequency of a time-series is how often or how frequently the values of the dataset are recorded. In the examples above, the frequencies of the time-series are hourly and daily respectively.
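To make the idea concrete, here is a minimal sketch that builds the two example series using only Python's standard library. The helper `make_series` and all the numbers are made up for illustration; they are not real gold prices or temperatures.

```python
from datetime import datetime, timedelta


def make_series(start, step, values):
    """Pair each value with a timestamp spaced `step` apart, starting at `start`."""
    return [(start + i * step, value) for i, value in enumerate(values)]


# Hypothetical hourly gold prices (made-up numbers, for illustration only).
gold_hourly = make_series(
    start=datetime(2023, 1, 1, 9, 0),
    step=timedelta(hours=1),
    values=[1825.4, 1826.1, 1824.9, 1827.3],
)

# Hypothetical daily village temperatures in degrees Celsius (also made up).
temperature_daily = make_series(
    start=datetime(2023, 1, 1),
    step=timedelta(days=1),
    values=[18.2, 19.0, 17.5],
)

# Each entry is a (timestamp, value) pair; the step between timestamps is the frequency.
for timestamp, value in gold_hourly:
    print(timestamp.isoformat(), value)
```

The only difference between the two series is the `step`: one hour for the gold prices and one day for the temperatures.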
Why analyze a Time Series?
Patterns observed in a time-series are expected to persist in the future and that is why we try to model a time-series to make predictions about the future.
For defining any business strategy and planning, it is important to have a sense of how things may turn out in the future. Thus, it is desirable to have as much information about the future as we can get to assist in defining the strategy.
Consider the case of electricity generation and consumption. Electricity must be generated and consumed in real time, since it is infeasible to store it in large amounts. Due to the peculiarities of the process, the overall cost of electricity generation is quite high, and thus it is important for electricity generation utilities to know the demand that must be met so that resources can be planned optimally. Otherwise, consumers may face outages when generation falls short of demand. So, it becomes important for the utilities to forecast electricity demand.
How can the utility forecast demand?
Forecasting refers to predicting future values using historical data. To forecast demand, a power generation utility needs to analyze historical data to find patterns. The historical data can be the hourly generation recorded by the utility over the last 5 years. This generation data should be representative of the actual electricity demand over those 5 years at an hourly level. Thus, we are speaking of 5 years of data at an hourly frequency. (Obviously, we can analyze data for a shorter or longer period of time too.)
Now, once the data is available, it can be analyzed to find patterns. To analyze any time series in a meaningful way, all the time periods must be equal and clearly defined, resulting in a constant frequency. Predictions are made after analyzing historical data, which only makes sense if the data is in chronological order.
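These two requirements, chronological order and a constant frequency, can be checked before any analysis. The sketch below uses only the standard library; the function names `is_chronological` and `constant_interval` are hypothetical helpers written for this example.

```python
from datetime import datetime, timedelta


def is_chronological(timestamps):
    """Return True if every timestamp is strictly later than the one before it."""
    return all(earlier < later for earlier, later in zip(timestamps, timestamps[1:]))


def constant_interval(timestamps):
    """Return the single interval between consecutive timestamps, or None if it varies."""
    gaps = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return gaps.pop() if len(gaps) == 1 else None


start = datetime(2023, 1, 1)
regular = [start + timedelta(hours=h) for h in range(5)]
# The same timestamps with the 02:00 reading missing.
with_gap = [t for t in regular if t.hour != 2]

print(is_chronological(regular))    # True
print(constant_interval(regular))   # 1:00:00
print(constant_interval(with_gap))  # None
```

A result of `None` signals that the series has gaps or irregular spacing that should be dealt with before modeling.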
Once the data requirements are fulfilled, exploratory data analysis can be performed to find insights and patterns. Then, using appropriate data modeling and machine learning techniques, the electricity generation utility can forecast hourly demand for future time periods. This would greatly help the utility in resource optimization and process efficiency.
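As a taste of what a forecast looks like, here is a minimal sketch of a seasonal naive baseline, which simply repeats the most recent daily cycle. This is a deliberately simple stand-in chosen for illustration, not the modeling approach a utility would necessarily use, and the demand figures are made up.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast `horizon` future values by repeating the last full season of `history`."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


# Hypothetical hourly demand in MW for two days (made-up numbers).
day_one = [310, 300, 295, 290, 300, 330, 380, 430, 460, 470, 475, 480,
           478, 470, 465, 470, 490, 520, 540, 530, 500, 450, 400, 350]
day_two = [value + 5 for value in day_one]
history = day_one + day_two

# Forecast the next 6 hours, assuming a 24-hour seasonal cycle.
forecast = seasonal_naive_forecast(history, season_length=24, horizon=6)
print(forecast)  # [315, 305, 300, 295, 305, 335]
```

A baseline like this relies entirely on the assumption that yesterday's pattern repeats today, which makes it a useful reference point when judging more sophisticated models.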
Peculiarities in Time Series Data
Time series data is different from regular data that is not taken at specific intervals. Some special features of time series data are as follows:
- Each value of a time series is time-dependent. Time dependency means that the values for every period are affected by the values of the past periods.
- Time series data often exhibits seasonality, which is rarely observed in regular data. It arises from the chronological order of the data and shows up as cyclical patterns. This cyclicity mirrors the cycles in time itself, i.e. the hour, day, month, or year.
- The intervals in a time series need to be identical. This means that each value must be recorded at a regular interval. The interval can be hourly, daily, monthly, etc., but should remain constant throughout the sequence.
- Dealing with missing values caused by an equipment or software issue can be complex. Here, the traditional approach of imputing missing values with the mean or median of the data is usually unsuitable. Because every value is time-dependent, the overall mean or median may not represent the missing value well.
- We can adjust the frequency of a time-series depending on the values we are interested in. This means that hourly data can be converted into daily data by appropriate measures. Similarly, daily data can be split into hourly data using appropriate rules.
- In contrast to regular data, time-series data cannot be shuffled as the data needs to be in chronological order and shuffling will destroy that order.
- Time series data often doesn't follow any of the standard distributions, and its values are usually not independent of one another.
- Time series modeling assumes that patterns observed in the past will persist in the future.
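Several of the points in the list above (missing values, changing the frequency, and keeping chronological order) can be sketched with the standard library alone. All helper names and numbers below are hypothetical, and the rules used (averaging neighbours, daily means, repeating a daily value across hours) are just one possible choice of "appropriate rules".

```python
from datetime import datetime, timedelta
from statistics import mean

start = datetime(2023, 1, 1)
# Hypothetical hourly readings for two days; None marks a missing value.
values = [float(h % 24) for h in range(48)]
values[5] = None


def fill_from_neighbours(values):
    """Fill isolated interior gaps with the average of the two neighbouring values."""
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None and 0 < i < len(filled) - 1:
            before, after = filled[i - 1], filled[i + 1]
            if before is not None and after is not None:
                filled[i] = (before + after) / 2
    return filled


def downsample_daily(series):
    """Aggregate hourly (timestamp, value) pairs into one mean value per calendar day."""
    by_day = {}
    for timestamp, value in series:
        by_day.setdefault(timestamp.date(), []).append(value)
    return [(day, mean(day_values)) for day, day_values in sorted(by_day.items())]


def upsample_hourly(daily):
    """Split each daily value into 24 hourly values by repeating it."""
    return [
        (datetime.combine(day, datetime.min.time()) + timedelta(hours=h), value)
        for day, value in daily
        for h in range(24)
    ]


def chronological_split(series, train_fraction=0.75):
    """Split without shuffling: the earliest part trains, the latest part tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Missing value: the neighbours give a far better estimate than the overall mean.
observed = [v for v in values if v is not None]
filled = fill_from_neighbours(values)
print(round(mean(observed), 2))  # 11.64
print(filled[5])                 # 5.0

series = [(start + timedelta(hours=h), v) for h, v in enumerate(filled)]
daily = downsample_daily(series)         # hourly -> daily
hourly_again = upsample_hourly(daily)    # daily -> hourly
train, test = chronological_split(series)
print(len(daily), len(hourly_again), len(train), len(test))  # 2 48 36 12
```

Note that converting back to hourly data does not recover the original readings: each hour simply carries the daily mean, because downsampling discards detail.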
In the following articles in this series, we will develop an understanding of time series notation and mathematics, and learn ways to model time series data using the Python programming language to make predictions for the future. Stay tuned!
Author: Anup Tiwari