import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
%reload_ext autoreload
%autoreload 2
What is seaborn?
There is no universally best way to visualize data. Different questions are best answered by different plots. Seaborn makes it easy to switch between different visual representations by using a consistent dataset-oriented API. Seaborn helps you explore and understand your data.
Its plotting functions operate on dataframes and arrays containing whole datasets and internally perform the necessary semantic mapping and statistical aggregation to produce informative plots. Its dataset-oriented, declarative API lets you focus on what the different elements of your plots mean, rather than on the details of how to draw them.
Setup
Import Data
seaborn has a number of built in datases:
# Seaborn provides a playground of built in datasets
print(sns.get_dataset_names())
['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights', 'fmri', 'geyser', 'glue', 'healthexp', 'iris', 'mpg', 'penguins', 'planets', 'seaice', 'taxis', 'tips', 'titanic']
For the purposes of this blog we will be working with the car_crashes dataset.
# Load a built in dataset of US car crash
= sns.load_dataset('car_crashes')
crash_df crash_df.head()
total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | abbrev | |
---|---|---|---|---|---|---|---|---|
0 | 18.8 | 7.332 | 5.640 | 18.048 | 15.040 | 784.55 | 145.08 | AL |
1 | 18.1 | 7.421 | 4.525 | 16.290 | 17.014 | 1053.48 | 133.93 | AK |
2 | 18.6 | 6.510 | 5.208 | 15.624 | 17.856 | 899.47 | 110.35 | AZ |
3 | 22.4 | 4.032 | 5.824 | 21.056 | 21.280 | 827.34 | 142.39 | AR |
4 | 12.0 | 4.200 | 3.360 | 10.920 | 10.680 | 878.41 | 165.63 | CA |
Distribution plots
Distribution plot
A distribution plot provides a way to look at a univariate distribution. A univeriate distribution provides a distribution for one variable. For more detail check out the seaborn documentation.
# Kernal Density Estimation with a Histogram is provided
'alcohol']) sns.distplot(crash_df[
/tmp/ipykernel_206/3403426496.py:3: UserWarning:
`distplot` is a deprecated function and will be removed in seaborn v0.14.0.
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(crash_df['alcohol'])
<AxesSubplot: xlabel='alcohol', ylabel='Density'>
If we just want the KDE on its own:
# Get just the KDE plot
'alcohol']) sns.kdeplot(crash_df[
<AxesSubplot: xlabel='alcohol', ylabel='Density'>
The kernel density estimation is included by default but can be removed by setting kde=False.
# kde=False removes the KDE
# Bins define how many buckets to divide the data up into between intervals
'alcohol'],
sns.distplot(crash_df[=False,
kde=25) bins
/tmp/ipykernel_206/1004551440.py:5: UserWarning:
`distplot` is a deprecated function and will be removed in seaborn v0.14.0.
Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).
For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751
sns.distplot(crash_df['alcohol'], kde=False, bins=25)
<AxesSubplot: xlabel='alcohol'>
Joint plot
Let’s say we want to investigate the alcohol and speeding variables and also the relationship between the two. A jointplot compares two distributions and plots a scatter plot by default. We can also include a ‘best fit’ regression line by passing in the argument kind=‘reg’.
# joint plot of speeding vs alcohol
= 'speeding',
sns.jointplot(x = 'alcohol',
y = crash_df,
data = 'reg') kind
As we can see there looks to be a clear positive correlation between speeding and drink-driving.
We can also create a 2D KDE by passing in the argument kind=‘kde’
# joint plot of speeding vs alcohol
= 'speeding',
sns.jointplot(x = 'alcohol',
y = crash_df,
data = 'kde') kind
…and we can create a hexagon distribution with kind=‘hex’
# joint plot of speeding vs alcohol
= 'speeding',
sns.jointplot(x = 'alcohol',
y = crash_df,
data = 'hex') kind
Pair plots
Pair Plots plot relationships across the entire DataFrame’s numerical values:
# # pair plot for car_crashes DataFrame
sns.pairplot(crash_df)
Note that in addition to the relationship scatterplots we also have a histogram for each variable on the leading diagonal.
TWe can pass in hue=‘some_categorical_variable’ to effectively include a THIRD variable in our pair plot!
Let’s illustrate this using one of the other inbuilt seaborn datasets tips which includes some categorical variables:
# With hue you can pass in a categorical column and the charts will be colorized
# You can use color maps (palette) from Matplotlib to define what colors to use
= sns.load_dataset('tips')
tips_df
sns.pairplot(tips_df, = 'sex',
hue = 'Blues') palette
Rug Plot
A Rug Plot plots a single column of datapoints in an array as sticks on an axis. You’ll see a more dense number of lines where the amount is most common. This is like how a histogram is taller where values are more common.
# plot a rugplot of tips
'tip']) sns.rugplot(tips_df[
<AxesSubplot: xlabel='tip'>
We can see that the lines are denser between just below 2 dollars and up towards 3 dollars.
Styling
We can control the overall look of our plots using set_style
# set overall style
'ticks')
sns.set_style(
# set size of our plot
= (8,4))
plt.figure(figsize
# labels
'paper', font_scale = 1.5)
sns.set_context(
# jointplot of speeding v alcohol from car_crashes DF, include regression line
= 'speeding',
sns.jointplot(x = 'alcohol',
y = crash_df,
data = 'reg')
kind
# we can turn the axis spines on or off
= False,
sns.despine(left = True) bottom
<Figure size 800x400 with 0 Axes>
Categorical Plots
Bar Plots
Bar plot or bar chart can be used to aggregate categorical data based on a function. Mean is the default but we can change the reported summary statistic. For more information see the seaborn documentation.
# plot a barplot showing mean total bill by sex
= 'sex',
sns.barplot(x = 'total_bill',
y = tips_df) data
<AxesSubplot: xlabel='sex', ylabel='total_bill'>
We can estimate total bill amount based on sex. With estimator we can define functions to use other than the mean like those provided by NumPy : median, std, var, cov or make your own functions
# plot a barplot showing median total bill by sex
= 'sex',
sns.barplot(x = 'total_bill',
y = tips_df,
data = np.median) estimator
<AxesSubplot: xlabel='sex', ylabel='total_bill'>
Count Plot
A count plot is like a bar plot, but the estimator is counting the number of occurences. For more information see the seaborn documentation.
# plot a count of number of males and females
= 'sex',
sns.countplot(x = tips_df) data
<AxesSubplot: xlabel='sex', ylabel='count'>
Hit shift and tab after the function to pull up docstring and the list of possible arguments
Box Plot
A box plot allows you to compare different variables. The box shows the quartiles of the data. The bar in the middle is the median and the box extends 1 standard deviation from the median. The whiskers extend to all the other data aside from the points that are considered to be outliers. For more information see the seaborn documentation.
# create box plots of total bill per day
# use hue to allow comparison between male and female
= 'day',
sns.boxplot(x = 'total_bill',
y = tips_df,
data = 'sex')
hue
# Sometimes the labelling can get a bit cluttered. We can control this
# loc=0 moves legend to the best position
=0) plt.legend(loc
<matplotlib.legend.Legend at 0x7f31c52d92d0>
Interestingly, we can see that men spend more than women on a Friday, but less than women on a Saturday.
Violin Plot
A violin plot is a combination of the boxplot and KDE. Whereas a box plot corresponds to data points, the violin plot uses the KDE estimation of the data points. For more information see the seaborn documentation.
# create a violin plot showing distribution of total bill by day and sex
= 'day',
sns.violinplot(x = 'total_bill',
y = tips_df,
data = 'sex') hue
<AxesSubplot: xlabel='day', ylabel='total_bill'>
# Split allows you to compare how the categories compare to each other
= 'day',
sns.violinplot(x = 'total_bill',
y = tips_df,
data = 'sex',
hue = True) split
<AxesSubplot: xlabel='day', ylabel='total_bill'>
Strip Plot
The strip plot or dotplot draws a scatter plot representing all data points where one variable is categorical. It is often used to show all observations with a box plot that represents the average distribution. For more information see the seaborn documentation.
= (8,5))
plt.figure(figsize
# Jitter spreads data points out so that they aren't stacked on top of each other
# Hue breaks data into men and women
= 'day',
sns.stripplot(x = 'total_bill',
y = tips_df,
data = True,
jitter = 'sex',
hue = False) dodge
<AxesSubplot: xlabel='day', ylabel='total_bill'>
This looks great, but the data points for male and female are difficult to distinguish. We can set the argument dode=True to separate them amd show them side by side:
= (8,5))
plt.figure(figsize
# Jitter spreads data points out so that they aren't stacked on top of each other
# Hue breaks data into men and women
= 'day',
sns.stripplot(x = 'total_bill',
y = tips_df,
data = True,
jitter = 'sex',
hue = True) dodge
<AxesSubplot: xlabel='day', ylabel='total_bill'>
Swarm plot
A swarm plot is like a strip plot, but points are adjusted so they don’t overlap. It looks like a combination of the violin and strip plots. For more information see the seaborn documentation.
# create a raw swarmplot of total bill bu day
= 'day',
sns.swarmplot(x = 'total_bill',
y = tips_df) data
<AxesSubplot: xlabel='day', ylabel='total_bill'>
We can then stack a violin plot with a swarm:
# stack a violin plot on top of our swarmplot
= 'day',
sns.violinplot(x = 'total_bill',
y = tips_df)
data
= 'day',
sns.swarmplot(x = 'total_bill',
y = tips_df,
data = 'white') color
<AxesSubplot: xlabel='day', ylabel='total_bill'>
Palettes
You can use matplotlib’s color maps for styling
= (8,8))
plt.figure(figsize
# let's go for a presentation style using 'talk'
'ticks')
sns.set_style('paper')
sns.set_context(
# customize our colours using palette argument
= 'day',
sns.stripplot(x = 'total_bill',
y = tips_df,
data = 'sex',
hue = 'plasma')
palette
# control location of legend
=2) plt.legend(loc
<matplotlib.legend.Legend at 0x7f31c50bcc40>
We can control the position of the legend by using plt.legend(loc = )
1 - upper left
2 - lower left
3 - lower right
See the other options
Matrix plots
Heatmaps
To create a heatmap you must have data set up as a matrix where variables are on the columns and rows
For example let’s look at an extract from our car_crashes dataset :
# grab first 5 rows
crash_df.head()
total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | abbrev | |
---|---|---|---|---|---|---|---|---|
0 | 18.8 | 7.332 | 5.640 | 18.048 | 15.040 | 784.55 | 145.08 | AL |
1 | 18.1 | 7.421 | 4.525 | 16.290 | 17.014 | 1053.48 | 133.93 | AK |
2 | 18.6 | 6.510 | 5.208 | 15.624 | 17.856 | 899.47 | 110.35 | AZ |
3 | 22.4 | 4.032 | 5.824 | 21.056 | 21.280 | 827.34 | 142.39 | AR |
4 | 12.0 | 4.200 | 3.360 | 10.920 | 10.680 | 878.41 | 165.63 | CA |
In its current format (with the variables across the columns but not the rows) we won’t be able to produce a heat map for our car_crashes DataFrame. We can transform our DataFrame into the correct format using a correlation function:
=(8,6))
plt.figure(figsize'paper',
sns.set_context(= 1.4)
font_scale
# transform our dataset into a correlation matrix
= crash_df.corr()
crash_mx crash_mx
/tmp/ipykernel_206/2429762792.py:5: FutureWarning: The default value of numeric_only in DataFrame.corr is deprecated. In a future version, it will default to False. Select only valid columns or specify the value of numeric_only to silence this warning.
crash_mx = crash_df.corr()
total | speeding | alcohol | not_distracted | no_previous | ins_premium | ins_losses | |
---|---|---|---|---|---|---|---|
total | 1.000000 | 0.611548 | 0.852613 | 0.827560 | 0.956179 | -0.199702 | -0.036011 |
speeding | 0.611548 | 1.000000 | 0.669719 | 0.588010 | 0.571976 | -0.077675 | -0.065928 |
alcohol | 0.852613 | 0.669719 | 1.000000 | 0.732816 | 0.783520 | -0.170612 | -0.112547 |
not_distracted | 0.827560 | 0.588010 | 0.732816 | 1.000000 | 0.747307 | -0.174856 | -0.075970 |
no_previous | 0.956179 | 0.571976 | 0.783520 | 0.747307 | 1.000000 | -0.156895 | -0.006359 |
ins_premium | -0.199702 | -0.077675 | -0.170612 | -0.174856 | -0.156895 | 1.000000 | 0.623116 |
ins_losses | -0.036011 | -0.065928 | -0.112547 | -0.075970 | -0.006359 | 0.623116 | 1.000000 |
<Figure size 800x600 with 0 Axes>
Now that we have our data in the correct format we can go ahead and plot out heatmap:
# create a heatmap, annotate and customize from colormaps
sns.heatmap(crash_mx, = True,
annot = 'Blues') cmap
<AxesSubplot: >
Another way to prep our data for plotting is to create a pivot table. Let’s use another of seaborn’s built in datasets to illustrate:
= sns.load_dataset("flights")
flights flights
year | month | passengers | |
---|---|---|---|
0 | 1949 | Jan | 112 |
1 | 1949 | Feb | 118 |
2 | 1949 | Mar | 132 |
3 | 1949 | Apr | 129 |
4 | 1949 | May | 121 |
... | ... | ... | ... |
139 | 1960 | Aug | 606 |
140 | 1960 | Sep | 508 |
141 | 1960 | Oct | 461 |
142 | 1960 | Nov | 390 |
143 | 1960 | Dec | 432 |
144 rows × 3 columns
We can create a matrix with an index of month, columns representing years and the number of passengers for each:
= flights.pivot_table(index = 'month',
flights = 'year',
columns = 'passengers')
values flights
year | 1949 | 1950 | 1951 | 1952 | 1953 | 1954 | 1955 | 1956 | 1957 | 1958 | 1959 | 1960 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
month | ||||||||||||
Jan | 112 | 115 | 145 | 171 | 196 | 204 | 242 | 284 | 315 | 340 | 360 | 417 |
Feb | 118 | 126 | 150 | 180 | 196 | 188 | 233 | 277 | 301 | 318 | 342 | 391 |
Mar | 132 | 141 | 178 | 193 | 236 | 235 | 267 | 317 | 356 | 362 | 406 | 419 |
Apr | 129 | 135 | 163 | 181 | 235 | 227 | 269 | 313 | 348 | 348 | 396 | 461 |
May | 121 | 125 | 172 | 183 | 229 | 234 | 270 | 318 | 355 | 363 | 420 | 472 |
Jun | 135 | 149 | 178 | 218 | 243 | 264 | 315 | 374 | 422 | 435 | 472 | 535 |
Jul | 148 | 170 | 199 | 230 | 264 | 302 | 364 | 413 | 465 | 491 | 548 | 622 |
Aug | 148 | 170 | 199 | 242 | 272 | 293 | 347 | 405 | 467 | 505 | 559 | 606 |
Sep | 136 | 158 | 184 | 209 | 237 | 259 | 312 | 355 | 404 | 404 | 463 | 508 |
Oct | 119 | 133 | 162 | 191 | 211 | 229 | 274 | 306 | 347 | 359 | 407 | 461 |
Nov | 104 | 114 | 146 | 172 | 180 | 203 | 237 | 271 | 305 | 310 | 362 | 390 |
Dec | 118 | 140 | 166 | 194 | 201 | 229 | 278 | 306 | 336 | 337 | 405 | 432 |
Now that we have our data in the correct format we can go ahead and plot out heatmap:
# You can separate data with lines
sns.heatmap(flights,= 'Blues',
cmap = 'white',
linecolor = 1) linewidth
<AxesSubplot: xlabel='year', ylabel='month'>
We see that flights have increased over time and that most people travel in July and August.
Cluster Maps
A Cluster map is a hierarchically clustered heatmap. The distance between points is calculated, the closest are joined, and this continues for the next closest (It compares columns / rows of the heatmap). For more information see the seaborn documentation.
Let’s illustrate using another of seaborn’s built-in datasets:
= sns.load_dataset('iris')
iris iris
sepal_length | sepal_width | petal_length | petal_width | species | |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
... | ... | ... | ... | ... | ... |
145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
150 rows × 5 columns
It would be useful to carve up our numerical data based on the different species in order to identify any trends.
# remove the column species using .pop
= iris.pop('species')
species
sns.clustermap(iris)
Let’s try this out on our flight data we looked at earlier:
# standard_scale normalizes the data to focus on the clustering
sns.clustermap(flights,= "Blues",
cmap = 1) standard_scale
With our flights data we can see that years have been reoriented to place like data closer together. You can see clusters of data for July & August for the years 59 & 60.
Pair Grids
Pair Grids allow us to create a grid of different plots with complete control over what is displayed. For more information see the seaborn documentation.
# load in iris dataset
= sns.load_dataset('iris')
iris
# Create the empty grid system using the provided data
# Colorize based on species
= sns.PairGrid(iris,
iris_g = 'species')
hue
# Put a histogram on the diagonal
iris_g.map_diag(plt.hist)
# And a scatter plot every place else
iris_g.map_offdiag(plt.scatter)
We can further tailor by having different plots in the upper, lower and diagonal:
# load in iris dataset
= sns.load_dataset('iris')
iris
# Create the empty grid system using the provided data
# Colorize based on species
= sns.PairGrid(iris,
iris_g = 'species')
hue
# Put a histogram on the diagonal
iris_g.map_diag(plt.hist)
# Have different plots in upper, lower and diagonal
iris_g.map_upper(plt.scatter) iris_g.map_lower(sns.kdeplot)
We can also define the x and y variables for our custom grids:
# load in iris dataset
= sns.load_dataset('iris')
iris
# You can define define variables for x & y for a custom grid
= sns.PairGrid(iris,
iris_g = "species",
hue = ["sepal_length", "sepal_width"],
x_vars = ["petal_length", "petal_width"])
y_vars
map(plt.scatter)
iris_g.
# Add a legend last
iris_g.add_legend()
Facet Grid
A facet grid allows us to print multiple plots in a grid where we can define columns & rows. For further information see the seaborn documentation.
Let’s return to our tips dataset to illustrate what can be done usin facet grids.
# Get histogram for smokers and non with total bill for lunch & dinner
= sns.FacetGrid(tips_df,
tips_fg = 'time',
col = 'smoker')
row
# You can pass in attributes for the histogram
map(plt.hist,
tips_fg.'total_bill',
= 8) bins
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
# Get histogram for smokers and non with total bill for lunch & dinner
= sns.FacetGrid(tips_df,
tips_fg = 'time',
col = 'smoker')
row
# Create a scatter plot with data on total bill & tip (You need 2 parameters)
map(plt.scatter,
tips_fg.'total_bill',
'tip')
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
# We can assign variables to different colors and increase size of grid
# Aspect is 1.3 x the size of height
# You can change the order of the columns
# Define the palette used
= sns.FacetGrid(tips_df,
tips_fg = 'time',
col = 'smoker',
hue = 4,
height = 1.3)
aspect
map(plt.scatter,
tips_fg."total_bill",
"tip")
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
We can change the column order:
# We can assign variables to different colors and increase size of grid
# Aspect is 1.3 x the size of height
# Define the palette used
= sns.FacetGrid(tips_df,
tips_fg = 'time',
col = 'smoker',
hue = 4,
height = 1.3,
aspect
# We can change the order of the columns and paletter
= ['Dinner', 'Lunch'],
col_order = 'Set1')
palette
# We can change the edge colour of our dots to white
map(plt.scatter,
tips_fg."total_bill",
"tip",
= 'w') edgecolor
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
# create a dictionary to define the size, linewidth and edge color of our markers
= dict(s=50,
kws =.5,
linewidth='w')
edgecolor
# let's go to town on the customizations! note we can reference the dictionary we just created
= sns.FacetGrid(tips_df,
tips_fg ='sex',
col='smoker',
hue=4,
height=1.3,
aspect=['Yes', 'No'],
hue_order=dict(marker=['^', 'v'])
hue_kws
)
map(plt.scatter,
tips_fg.'total_bill',
'tip',
**kws
)
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
Here hue has been used to denote smoker vs non-smoker - blue upward pointing markers represent smokers, and the orange downward facing markers represent non-smokers.
Let’s look at a seaborn dataset that we haven’t seeen yet. This dataframe provides scores for different students based on the level of attention they could provide during testing:
# load in attention dataset and take a look at first 13 rows
= sns.load_dataset('attention')
att_df 13) att_df.head(
Unnamed: 0 | subject | attention | solutions | score | |
---|---|---|---|---|---|
0 | 0 | 1 | divided | 1 | 2.0 |
1 | 1 | 2 | divided | 1 | 3.0 |
2 | 2 | 3 | divided | 1 | 3.0 |
3 | 3 | 4 | divided | 1 | 5.0 |
4 | 4 | 5 | divided | 1 | 4.0 |
5 | 5 | 6 | divided | 1 | 5.0 |
6 | 6 | 7 | divided | 1 | 5.0 |
7 | 7 | 8 | divided | 1 | 5.0 |
8 | 8 | 9 | divided | 1 | 2.0 |
9 | 9 | 10 | divided | 1 | 6.0 |
10 | 10 | 11 | focused | 1 | 6.0 |
11 | 11 | 12 | focused | 1 | 8.0 |
12 | 12 | 13 | focused | 1 | 6.0 |
# Put each student (subject) in their own plot with 5 per line and plot their scores
= sns.FacetGrid(att_df,
att_fg = 'subject',
col = 5,
col_wrap = 1.5
height
)
map(plt.plot,
att_fg.'solutions',
'score',
='.') marker
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
/home/stephen137/mambaforge/lib/python3.10/site-packages/seaborn/axisgrid.py:745: FutureWarning: iteritems is deprecated and will be removed in a future version. Use .items instead.
plot_args = [v for k, v in plot_data.iteritems()]
Regression Plots
Let’s revisit the tips dataset used earlier. Let’s recall what our dataset looks like:
# grab first 5 rows
tips_df.head()
total_bill | tip | sex | smoker | day | time | size | |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
Say we wanted to explore the relationship between tip size and total bill size comparing men and women:
=(8,6))
plt.figure(figsize
'paper',
sns.set_context(= 1.4)
font_scale
= 'total_bill',
sns.lmplot(x = 'tip',
y = 'sex',
hue = tips_df,
data = [ 'o', '^'],
markers = { 's' : 100, 'linewidth': 0.5, 'edgecolor': 'w'})
scatter_kws
<Figure size 800x600 with 0 Axes>
It might be useful to see the equivalent broken down by day:
# sns.lmplot(x='total_bill', y='tip', col='sex', row='time', data=tips_df)
tips_df.head()
# Makes the fonts more readable
'poster',
sns.set_context(=1.4)
font_scale
='total_bill',
sns.lmplot(x='tip', data=tips_df,
y='day',
col='sex',
hue=8,
height=0.6) aspect
Acknowledgements
Thanks once again to Santiago for signposting this video posted by Derek Banas. This blog was written after interactively working through it.
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/6GUZXDef2U0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>