In the previous article (add a link to the second article), we explained how to visualize time series data in Python. In the article before that (add link), you saw how to draw some of the basic Python plots. However, the graphs in those articles were static and non-interactive. In this article, we will see how to plot interactive graphs in Python using the Plotly and Cufflinks libraries.
Downloading the Required Libraries
There are several Python libraries that can be used to draw interactive plots. However, Plotly is the de facto standard for plotting interactive graphs.
To download the Plotly library, execute the following script:
$ pip install plotly
Plotly commands can be cumbersome to write. However, the Cufflinks library, which is a Plotly wrapper for the pandas library, lets you create interactive plots using pandas syntax. In pandas, we use the “plot” function to plot different types of graphs, whereas with Cufflinks we use the “iplot” function. Execute the following command to install the Cufflinks library:
$ pip install cufflinks
Importing the Required Libraries
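Once the required libraries have been installed, we need to import them into our application. Execute the following script:
import cufflinks as cf
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

init_notebook_mode(connected=True)  # render Plotly figures inside the notebook
cf.go_offline()  # let the pandas iplot() calls run offline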
Importing the Dataset
We will use the famous Titanic dataset to plot different types of interactive visualizations. The Titanic dataset comes built in with the seaborn library. The following script displays all the built-in datasets in the seaborn library.
import seaborn as sns
print(sns.get_dataset_names())
['anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'iris', 'mpg', 'planets', 'tips', 'titanic']
The output shows the list of all the built-in datasets in the seaborn library. To import the Titanic dataset, execute the following script:
import seaborn as sns
titanic_Data = sns.load_dataset('titanic')
titanic_Data.head()
The script above downloads the titanic dataset and displays its first five records. The output looks like this.
Before plotting any graph, we will remove all the null values from the dataset using the following script:
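The script for this step is not reproduced here. A minimal sketch, assuming pandas' `dropna` is what is intended and using a small made-up frame (`sample`, a hypothetical name) in place of `titanic_Data`:

```python
import pandas as pd

# Hypothetical stand-in for titanic_Data with a few missing values
sample = pd.DataFrame({
    "age": [22.0, None, 26.0, 35.0],
    "fare": [7.25, 71.28, None, 53.10],
    "survived": [0, 1, 1, 1],
})

# Count missing values per column before cleaning
missing_before = sample.isnull().sum()

# Drop every row that contains at least one null value
cleaned = sample.dropna()

print(missing_before)
print(cleaned)
```

On the real dataset, dropping rows with a null in any column can discard a large share of the records, so passing `subset=[...]` with only the columns you plan to plot may be preferable.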
The first interactive plot that we will see is the histogram. A histogram groups the values of a column into ranges (bins) and shows how many values fall into each bin, using vertical or horizontal bars. For instance, to see the distribution of age in the form of a histogram, you can execute the following script.
titanic_Data['age'].iplot(kind='hist', xTitle='Age', yTitle='count', title='Age Distribution')
The output shows that most of the passengers are aged between 20 and 40. If you run the above script in a Python editor, you can hover over the plot and see the values for the different bins in the histogram.
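To make the idea of bins concrete, the sketch below counts values per age range with pandas alone. The ages and the 20-year bin edges are made up for illustration; Plotly picks its own bins by default, so the real plot will not match these numbers.

```python
import pandas as pd

# Hypothetical ages standing in for titanic_Data['age']
ages = pd.Series([4, 19, 22, 25, 28, 31, 35, 38, 45, 62])

# Group the ages into 20-year-wide bins: [0, 20), [20, 40), [40, 60), [60, 80)
bins = [0, 20, 40, 60, 80]
binned = pd.cut(ages, bins=bins, right=False)

# Count how many ages fall into each bin, keeping the bins in order
counts = binned.value_counts().sort_index()
print(counts)
```

Each count corresponds to the height of one bar in the histogram.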
A bar plot is used to plot the sum or mean for each unique value in a column. For instance, if you want to plot the total fare paid by Titanic passengers who survived and by those who didn’t, you can use the bar plot as follows:
You can see that passengers who survived paid a total fare of around 16 thousand, while those who didn’t survive paid a total of around 12 thousand.
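The script for this plot is not shown here. A minimal sketch of the aggregation it relies on, using a few made-up rows; with Cufflinks set up, the resulting series could presumably be drawn with `fare_totals.iplot(kind='bar')`, which is our assumption rather than the original call:

```python
import pandas as pd

# Hypothetical passengers; the real data lives in titanic_Data
sample = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "fare": [7.25, 71.28, 53.10, 8.05, 30.00],
})

# Sum the fare separately for each value of 'survived'
fare_totals = sample.groupby("survived")["fare"].sum()
print(fare_totals)
```

Swapping `.sum()` for `.mean()` gives the average fare per group instead of the total.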
A bar plot showing the mean value of every numeric column in the dataset can be displayed via the following script:
In the following output, you can see that the average age is around 30 years and the average fare paid by the passengers is around 33 dollars.
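The script is missing here as well. The values behind such a plot can be sketched with pandas as below; the rows are made up, and chaining `.iplot(kind='bar')` onto the result is an assumption about how the plot was drawn.

```python
import pandas as pd

# Hypothetical rows standing in for titanic_Data
sample = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})

# Average each numeric column; non-numeric columns such as 'sex' are skipped
column_means = sample.mean(numeric_only=True)
print(column_means)
```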
The scatter plot draws values for two variables along the x- and y-axes using specified markers. For instance, the following script plots circles for the values of age and fare along the x- and y-axes.
titanic_Data.iplot(kind='scatter', x='age', y='fare', mode='markers')
The box plot summarizes the distribution of a column by its quartiles, drawn as a box with whiskers above and below it. For instance, the following script plots a box plot for the age column.
In the following output, you can see that the first quartile (25%) of the passengers are aged between 0 and 20. The second quartile of the passengers are aged between 21 and 28. Similarly, the third quartile is aged between 29 and 38, while the last quartile contains passengers aged between 39 and 80.
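The quartile boundaries that a box plot draws can be computed directly with pandas. The sketch below uses made-up ages; with Cufflinks set up, the plot itself could presumably be produced with `titanic_Data['age'].iplot(kind='box')`, which is an assumption since the original script is not shown.

```python
import pandas as pd

# Hypothetical ages standing in for titanic_Data['age']
ages = pd.Series([2, 18, 21, 24, 28, 30, 36, 40, 80])

# The 25th, 50th (median) and 75th percentiles define the box and its middle line
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])

# The interquartile range is the height of the box
iqr = q3 - q1
print(q1, median, q3, iqr)
```

The box spans `q1` to `q3`, the line inside it marks the median, and the whiskers extend towards the smallest and largest values.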
The spread plot is used to plot the spread between two or more columns in the dataset. The following script plots the spread for the age and fare columns.
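The script is not reproduced here. As a rough sketch, assuming the spread is the row-wise difference between the two columns, the quantity being plotted can be computed with pandas as below. The rows are made up, and `titanic_Data[['age', 'fare']].iplot(kind='spread')` is our guess at the plotting call.

```python
import pandas as pd

# Hypothetical rows standing in for titanic_Data[['age', 'fare']]
sample = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "fare": [7.25, 71.25, 8.0],
})

# Assumed definition of the spread: first column minus second column, row by row
spread = sample["age"] - sample["fare"]
print(spread)
```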
So far, you have only seen 2-D plots. However, the Plotly library can also draw 3-D plots, which can be used to visualize relationships between three variables in a dataset. For instance, the following script plots a 3-D plot for the age, pclass and fare variables.
dataset2 = titanic_Data[["age", "pclass", "fare"]]
data = dataset2.iplot(kind='surface', colorscale='rdylbu')
Interactive graphs let users explore the information in a dataset by hovering over, zooming into, and selecting parts of a plot. In this article, we saw how to plot different types of interactive plots in Python using the Plotly and Cufflinks libraries. In the next article, we will see how to draw geographical maps using Python.