In a previous article, you saw how Python’s Matplotlib and Seaborn libraries can be used to draw basic plots such as bar plots, pie plots, and histograms. In this article, you will see how to plot time series data using Python’s Pandas library. The Pandas library is primarily used for data preprocessing in Python; however, it also supports basic data visualization. In particular, time series data can be plotted easily with Pandas.
What is Time Series Data
Before we go on and actually plot time series data, we first need to know what time series data is. Time series data, as the name suggests, is data that changes over a specific time period. Examples include the daily sales of a shopping mall over a year, the hourly temperature throughout a day, and the stock prices of a particular company within a specific time period.
In time series data, one of the data attributes corresponds to a unit of time. This attribute can be a second, minute, hour, day, month, year, etc. The values of the remaining attributes are recorded against the values in this time column.
In this article, you will see how the Pandas library can be used to visualize the stock prices of Google over a period of five years. The stock data for Google can be downloaded for free from Yahoo Finance. Simply go to the Yahoo Finance website, search for Google in the search box, and download the data for the period you want. In this article, the Google stock data from January 01, 2014 to December 31, 2018 (five years) will be used for time series visualization.
Importing Required Libraries
The first step is to import the libraries necessary to visualize time series data. Execute the following script:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Importing the Dataset
The data downloaded from Yahoo finance is in CSV format. The pandas “read_csv” method can be used to read the CSV files as shown in the following script:
google_stock = pd.read_csv('E:/GOOG.csv')
To get a first look at the dataset you can use the “head()” method which displays the first five rows of the dataset.
The output looks like this:
If you look at the above image, you can see that the dataset contains the Date, along with the opening, closing, highest, and lowest price of the stock for a particular date. The dataset also contains the adjusted closing value and the total volume of the stock traded on a particular day.
However, if you look at the leftmost column, you will see it has no title. This is by default the index column of a Pandas dataframe. Another important observation is that the Date column is being treated as a string by default.
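The two observations above can be checked on a small stand-in dataset. The sketch below is only illustrative: the CSV text is made up and merely assumes the same column layout as the downloaded file, and the name `sample` is hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for GOOG.csv: same column layout, made-up prices.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2014-01-02,100.0,101.5,99.5,101.0,101.0,1000
2014-01-03,101.0,102.0,100.0,100.5,100.5,1200
2014-01-06,100.5,103.0,100.2,102.8,102.8,900
"""

sample = pd.read_csv(io.StringIO(csv_text))

# The untitled leftmost column in the printed output is the default integer index.
print(sample.head())

# read_csv does not parse dates unless asked to, so "Date" is not a datetime column yet.
is_datetime = pd.api.types.is_datetime64_any_dtype(sample["Date"])
print(is_datetime)
```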
In the first preprocessing step, the type of the “Date” column will be changed from string to DateTime. Execute the following script:
google_stock['Date'] = pd.to_datetime(google_stock['Date'])
As a next step, we will set the “Date” column as the index column because we know that the values of the rest of the attributes depend upon the “Date” column. The following script sets “Date” as the index column.
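One common way to do this is with `set_index`. The following is a minimal sketch on made-up data, not necessarily the exact script used for the Google dataset; the frame `sample` and its values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows standing in for the stock data.
sample = pd.DataFrame({
    "Date": ["2014-01-02", "2014-01-03", "2014-01-06"],
    "Close": [101.0, 100.5, 102.8],
})

# Convert the strings to datetimes, then promote the column to the index.
sample["Date"] = pd.to_datetime(sample["Date"])
sample = sample.set_index("Date")

print(sample.head())
```

After this step, "Date" is no longer a regular column; it becomes a `DatetimeIndex`, which is what the time-based operations later in the article rely on.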
Let’s now see our dataset again:
You can see that the untitled leftmost column has now been removed and the Date column is being treated as the index column.
Let’s plot a simple line plot to see the trend for the Closing price of Google stock data over a period of five years. Execute the following script:
plt.rcParams['figure.figsize'] = (10, 8)  # Change the plot size
The output looks like this:
You can clearly see how the closing stock price varies over a period of five years for Google.
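A line plot like the one described above can be sketched as follows. The prices here are randomly generated rather than real Google data, and the backend line is an assumption so that the example also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up closing prices on business days over the same five-year span.
dates = pd.date_range("2014-01-01", "2018-12-31", freq="B")
rng = np.random.default_rng(0)
close = 500 + rng.normal(0, 5, len(dates)).cumsum()
sample = pd.DataFrame({"Close": close}, index=dates)

plt.rcParams["figure.figsize"] = (10, 8)  # change the plot size
fig, ax = plt.subplots()

# Calling plot() on a column draws it against the datetime index.
sample["Close"].plot(ax=ax)
ax.set_ylabel("Close")
```

In an interactive session, `plt.show()` displays the figure.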
Time Shifting
Time shifting in time series analysis refers to shifting the data a certain number of time-steps forward or backward. Moving the data to later time-steps is called forward shifting, while moving it to earlier time-steps is called backward shifting.
It is very easy to shift data backward or forward in Python. All you have to do is call the “shift()” method on the Pandas dataframe and pass the number of time-steps to shift as a parameter. For instance, the following script shifts the data forward by 3 time-steps:
In the script above, we shift the data 3 time-steps forward and then use the “head()” method to print the first five records. The output looks like this:
The output shows that the attribute values have been shifted three time-steps forward, so the first three records now contain NaN (missing) values.
For backward shifting, you can again use the “shift” method; however, you have to pass a negative value. For instance, if you want to move three time-steps backward, you can use the shift method as follows:
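As a minimal sketch of forward shifting, the example below uses six made-up daily values instead of the stock data; the names `sample` and `forward` are illustrative.

```python
import pandas as pd

dates = pd.date_range("2014-01-01", periods=6, freq="D")
sample = pd.DataFrame({"Open": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}, index=dates)

# Shift the values three time-steps forward; the index itself stays in place.
forward = sample.shift(3)

print(forward.head())
```

The value that belonged to the first date now sits on the fourth date, and the first three rows are left empty.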
The script above shifts the data three time-steps back and then displays the last five records of the dataset using the “tail()” method. The output of the script above looks like this:
You can see that the records have been shifted three steps backward, leaving NaN values in the last three rows.
Time Resampling
Time resampling refers to finding aggregated values for a particular attribute, grouped over a certain subset of the index column. For example, time resampling can be used to find the average opening stock price per month, or the maximum closing stock price per year.
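Backward shifting can be sketched in the same way with a negative argument. Again, the six daily values are made up for illustration.

```python
import pandas as pd

dates = pd.date_range("2014-01-01", periods=6, freq="D")
sample = pd.DataFrame({"Open": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]}, index=dates)

# A negative argument moves the values three time-steps backward.
backward = sample.shift(-3)

print(backward.tail())
```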
Let’s find the mean value of all the attributes per month. Execute the following script:
To resample the data, the “resample” method is used. Its “rule” parameter specifies the time period over which you want to aggregate the data. In the above script, we specified “M”, which stands for month. After that, you can call whatever aggregate function you like. The detailed list of values that the “rule” parameter accepts can be found in the offset aliases section of the Pandas documentation.
Similarly, to find the mean value of all the attributes over a quarter (a three-month period), you can use the following script:
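Monthly resampling can be sketched on made-up data as shown below. One deliberate difference from the text: the sketch uses the "MS" (month start) alias instead of "M", because recent Pandas releases deprecate "M" in favor of "ME", whereas "MS" is accepted by both older and newer versions. The only visible effect is that each group is labelled with the first day of the month rather than the last.

```python
import numpy as np
import pandas as pd

# Made-up daily values for January through March 2014: 0.0, 1.0, 2.0, ...
dates = pd.date_range("2014-01-01", "2014-03-31", freq="D")
sample = pd.DataFrame({"Open": np.arange(len(dates), dtype=float)}, index=dates)

# Group the rows by calendar month and average each group.
monthly_mean = sample.resample("MS").mean()

print(monthly_mean)
```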
The output looks like this:
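A quarterly aggregation follows the same pattern. In this sketch the data is made up (each day's value is simply its month number), and the "QS" (quarter start) alias is an assumption chosen because it is accepted across Pandas versions.

```python
import pandas as pd

# Made-up daily data for 2014 where each value equals the month number.
dates = pd.date_range("2014-01-01", "2014-12-31", freq="D")
sample = pd.DataFrame({"Open": dates.month.to_numpy(dtype=float)}, index=dates)

# Group the rows into calendar quarters and average each group.
quarterly_mean = sample.resample("QS").mean()

print(quarterly_mean)
```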
You can also compute the aggregated values for a particular column. For instance, the following script prints the mean opening stock price per year:
The parameter value “A” corresponds to a year. The output looks like this:
Freq: A-DEC, Name: Open, dtype: float64
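Resampling a single column returns a Series rather than a dataframe, which is why the output above ends with a name and dtype line. A minimal sketch on made-up data is shown below; it assumes the "YS" (year start) alias in place of "A", since newer Pandas releases deprecate "A".

```python
import numpy as np
import pandas as pd

# Made-up data: a constant price of 100 during 2014 and 200 during 2015.
dates = pd.date_range("2014-01-01", "2015-12-31", freq="D")
open_prices = np.where(dates.year == 2014, 100.0, 200.0)
sample = pd.DataFrame({"Open": open_prices}, index=dates)

# Select one column, group it by year, and average each group.
yearly_open = sample["Open"].resample("YS").mean()

print(yearly_open)
```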
You can also plot graphs for the time-resampled data. All you have to do is call the “plot” function in a chain with the resample() function and an aggregate function of your choice. For instance, to plot the average opening stock price per year in the form of a bar plot, you can use the following script:
The output of the script looks like this:
You can clearly see an increase in the average opening stock price per year.
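The chained call described above can be sketched as follows. The prices are made up, the "YS" alias stands in for "A" so that the code runs on both older and newer Pandas versions, and the backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: a constant price of 100 during 2014 and 200 during 2015.
dates = pd.date_range("2014-01-01", "2015-12-31", freq="D")
open_prices = np.where(dates.year == 2014, 100.0, 200.0)
sample = pd.DataFrame({"Open": open_prices}, index=dates)

fig, ax = plt.subplots()

# resample -> aggregate -> plot, all in one chain; one bar per year.
sample["Open"].resample("YS").mean().plot(kind="bar", ax=ax)
```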
Similarly, you can use the following script to draw a line plot of the minimum closing price of the Google stock for each month:
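A sketch of such a plot on made-up data is given below. Each day's closing value is simply its day of the month, so the monthly minimum is easy to verify; the "MS" alias and the backend line are assumptions, as in the earlier sketches.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up closing prices: the value on each date is its day of the month.
dates = pd.date_range("2014-01-01", "2014-03-31", freq="D")
sample = pd.DataFrame({"Close": dates.day.to_numpy(dtype=float)}, index=dates)

# Minimum closing value per month, then a line plot of the result.
monthly_min = sample["Close"].resample("MS").min()

fig, ax = plt.subplots()
monthly_min.plot(kind="line", ax=ax)
```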
Time series analysis is one of the most crucial tasks in time-dependent datasets. In this article, we explained the process of time series analysis with the help of Python’s Pandas library. We also studied how time shifting and time resampling can be used for time series analysis in Python.