Chart Overview

A wide variety of packages exist for generating charts and rich (styled) tables directly from dataframe-type data objects in Python and R.

Creating charts directly from data tables means that the charts are guaranteed to fairly represent the original data, as long as the chart definition is sound.

A variety of statistical charting packages also exist that perform statistical operations on the provided data before rendering an appropriate statistical chart (for example, histogram charts that bin and count raw data values).

This overview introduces some of the charting approaches that are possible in a Python environment. Charting tools in R will be covered elsewhere.

seaborn statistical charting package

The seaborn statistical data visualization package is a Python package that provides a wide range of chart types that can be used to generate static charts from pandas Series or DataFrame objects.

Conveniently, the package includes sample datasets that can be used to demonstrate the various available chart types.

%%capture
try:
    import seaborn
except ImportError:
    %pip install seaborn
import seaborn as sns

View a fragment of a sample dataset:

df_penguins = sns.load_dataset("penguins")

df_penguins.head()
  species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
0  Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
1  Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
2  Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
3  Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
4  Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female

As an example of generating a statistical plot directly from the data, consider the construction of a histogram:

chart = sns.histplot(data=df_penguins, x="flipper_length_mm");
[Figure: histogram of flipper_length_mm for the penguins dataset]

The bin widths and the counts within each bin are computed automatically by histplot from the provided (raw) data.

In some teaching and learning examples, we may want learners to work through the data processing steps required to create the plot; but in other cases, where we just want to create the chart itself, using the statistical chart type (if we trust it!) is simpler.
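
To give a flavour of what working through those steps by hand might look like, here is a minimal sketch using only the Python standard library. The bin_counts function is a hypothetical helper written for this illustration, not part of seaborn, and it uses simple equal-width bins, which is not necessarily the rule seaborn applies when it chooses bins automatically. The values are the five flipper lengths from the table shown earlier.

```python
import math

# Flipper lengths (mm) from the five rows shown earlier; NaN marks the missing row.
flipper_lengths = [181.0, 186.0, 195.0, float("nan"), 193.0]


def bin_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning min..max (NaN dropped)."""
    clean = [v for v in values if not math.isnan(v)]
    lo, hi = min(clean), max(clean)
    if hi == lo:
        # All values identical: a single zero-width bin holds everything.
        return [lo, hi], [len(clean)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in clean:
        # The maximum value sits on the final edge, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


edges, counts = bin_counts(flipper_lengths, n_bins=2)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count}")
```

The edges and counts produced this way could then be drawn as bars with any plotting package, which is essentially the work the statistical chart type does for us.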

We can also save the chart to a file and then load it back in to display it:

fn_sns = "sns_output.png"

chart.figure.savefig(fn_sns)

from IPython.display import Image
Image(fn_sns)
[Figure: the saved histogram image, reloaded from sns_output.png]

Creating Interactive Charts

As well as creating static charts, a wide variety of packages exist that are capable of rendering interactive charts. One such example is the plotly Python graphing library.

%%capture
try:
    import plotly
except ImportError:
    %pip install plotly

As with seaborn, various sample datasets are included in the package, such as the famous iris dataset:

import plotly.express as px

df_iris = px.data.iris()
df_iris.head()
   sepal_length  sepal_width  petal_length  petal_width species  species_id
0           5.1          3.5           1.4          0.2  setosa           1
1           4.9          3.0           1.4          0.2  setosa           1
2           4.7          3.2           1.3          0.2  setosa           1
3           4.6          3.1           1.5          0.2  setosa           1
4           5.0          3.6           1.4          0.2  setosa           1

Interactive charts can be created by calling the appropriate chart type with the dataframe and specifying which data columns should be used for which chart dimension.

The resulting chart is fully interactive: it includes tools for zooming and panning around the chart, and for saving the current chart view to a file.

iris_plot = px.scatter(df_iris, x="sepal_width", y="sepal_length", color="petal_length")
iris_plot

# The HTML chart can also be saved to a file
#iris_plot.write_html("plotly_demo.html")