Basics of Matplotlib

Importing data to plot in Matplotlib

In [10]:
import pandas as pd
import numpy as np
df1 = pd.read_csv('surveys.csv')
df1
Out[10]:
       record_id  month  day  year  plot_id  species_id  sex  hindfoot_length  weight
0              1      7   16  1977        2          NL    M             32.0     NaN
1              2      7   16  1977        3          NL    M             33.0     NaN
2              3      7   16  1977        2          DM    F             37.0     NaN
3              4      7   16  1977        7          DM    M             36.0     NaN
4              5      7   16  1977        3          DM    M             35.0     NaN
...          ...    ...  ...   ...      ...         ...  ...              ...     ...
35544      35545     12   31  2002       15          AH  NaN              NaN     NaN
35545      35546     12   31  2002       15          AH  NaN              NaN     NaN
35546      35547     12   31  2002       10          RM    F             15.0    14.0
35547      35548     12   31  2002        7          DO    M             36.0    51.0
35548      35549     12   31  2002        5         NaN  NaN              NaN     NaN

35549 rows × 9 columns

Creating a basic scatterplot

In [11]:
import matplotlib.pyplot as plt
fig,ax1 = plt.subplots()
ax1.scatter(df1['hindfoot_length'], df1['weight'])
Out[11]:
<matplotlib.collections.PathCollection at 0x28abc9fa290>

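Creating a more complex scatterplot

In scatter(), s sets the marker size (the marker area in points squared), c sets the marker color, and label names the series for the legend. facecolors='none' draws hollow markers and edgecolors=... sets their outline color.

set_aspect() sets the aspect ratio of the axes: with set_aspect(.25), one unit on the y-axis is drawn a quarter as long as one unit on the x-axis.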

In [12]:
fig,ax2 = plt.subplots()

ds_df = df1[df1['species_id'] == 'DS']
so_df = df1[df1['species_id'] == 'SO']

ax2.scatter(ds_df['hindfoot_length'], ds_df['weight'], s=5, c='r', label='DS')
ax2.scatter(so_df['hindfoot_length'], so_df['weight'], s=7, facecolors='none', edgecolors='c', label='SO')

ax2.set_aspect(.25)
ax2.set_xlabel('hindfoot length')
ax2.set_ylabel('weight')
ax2.set_title('Hindfoot Length vs Weight in DS and SO individuals')
ax2.legend()
Out[12]:
<matplotlib.legend.Legend at 0x28abc9f8210>

Creating multiple plots

The first argument to plt.subplots() is the number of rows and the second is the number of columns, so plt.subplots(1, 2) creates two plots side by side and returns one Axes object for each.

ax.spines[...].set_visible(False) hides the specified spine (the border line on that side of the plot).

In [13]:
fig, (ax3a, ax3b) = plt.subplots(1, 2)
ax3a.scatter(ds_df['hindfoot_length'], ds_df['weight'], s=2, c='r')
ax3b.scatter(so_df['hindfoot_length'], so_df['weight'], s=2, c='c')

ax3a.spines['top'].set_visible(False)
ax3a.spines['right'].set_visible(False)
ax3b.spines['top'].set_visible(False)
ax3b.spines['right'].set_visible(False)
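
For grids with more than one row, plt.subplots() returns the Axes in a 2-D NumPy array rather than a flat tuple. The following is a minimal sketch of that case, assuming made-up random data (the names x, y, and axes are illustrative and not part of the notebook):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data, for illustration only
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

# 2 rows x 2 columns: axes is a 2-D NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(6, 6))

# axes.flat visits every panel, whatever the grid shape
for ax in axes.flat:
    ax.scatter(x, y, s=2)
    # hide the top and right border lines on each panel
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
```

A single panel can still be addressed by position, for example axes[0, 1] for the first row, second column.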

Creating a histogram

bins sets the bin edges: range(0, 80, 2) gives edges at 0, 2, ..., 78, so every bin is 2 units wide. histtype='step' draws the histogram as an unfilled outline.

In [14]:
fig,ax4 = plt.subplots()
ax4.hist(ds_df['hindfoot_length'], bins=range(0,80,2))
ax4.hist(so_df['hindfoot_length'], bins=range(0,80,2), histtype='step')
Out[14]:
(array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  1.,  2.,  2., 15.,
        11.,  5.,  2.,  0.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
         0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20., 22., 24.,
        26., 28., 30., 32., 34., 36., 38., 40., 42., 44., 46., 48., 50.,
        52., 54., 56., 58., 60., 62., 64., 66., 68., 70., 72., 74., 76.,
        78.]),
 [<matplotlib.patches.Polygon at 0x28abd5dc950>])
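
The two arrays in the output above are the bin counts and the bin edges. A small sketch with np.histogram, which computes the same kind of counts without drawing anything, may make the relationship clearer. The values array is made up for illustration:

```python
import numpy as np

# Made-up measurements, for illustration only
values = np.array([31.0, 33.5, 34.0, 35.9, 36.0, 49.5, np.nan])

# Drop missing values explicitly before binning
valid = values[~np.isnan(values)]

# 0, 2, 4, ..., 78 gives 40 edges, which define 39 bins
edges = np.arange(0, 80, 2)
counts, edges_out = np.histogram(valid, bins=edges)

print(len(edges_out), len(counts))
print(counts[17])  # number of values in the bin from 34 up to (not including) 36
```

There is always one more edge than there are bins, which matches the 40 edges and 39 counts shown in Out[14].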

Plotting lines

In [15]:
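fig, ax5 = plt.subplots()
# mean hindfoot length for each year
grouped_data = df1.groupby('year')['hindfoot_length'].mean()
# reset_index() turns the year index back into an ordinary column
year_data = grouped_data.reset_index()
# draw the line first, then a marker at each yearly mean
ax5.plot(year_data['year'], year_data['hindfoot_length'])
ax5.scatter(year_data['year'], year_data['hindfoot_length'])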
Out[15]:
<matplotlib.collections.PathCollection at 0x28abd663850>
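
As a possible alternative to calling plot() and scatter() separately, plot() accepts a marker argument that draws a point at each vertex of the line. The sketch below assumes a tiny made-up table (toy is a hypothetical stand-in for the survey data) and also shows that the grouped mean skips missing values:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up records, for illustration only
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001, 2002],
    "hindfoot_length": [30.0, 34.0, 31.0, None, 36.0],
})

# mean() skips missing values by default, so 2001 averages only its one valid measurement
yearly = toy.groupby("year")["hindfoot_length"].mean()

fig, ax = plt.subplots()
# marker="o" draws a point at each yearly mean on top of the line
(line,) = ax.plot(yearly.index, yearly.values, marker="o")
```

Because yearly is a Series indexed by year, its index and values can be passed to plot() directly without reset_index().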
In [16]:
df2 = pd.read_csv('alpaca.csv')
df2
Out[16]:
    treatment  control
0         7.2      4.6
1         8.3      4.6
2         8.3      5.1
3         7.1      2.8
4         4.4      5.4
5         4.1      4.4
6         4.8      4.2
7         6.2      5.6
8         7.7      5.8
9         7.4      4.1
10        4.3      3.9
11        5.8      3.8

Plotting data in groups

In [17]:
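fig, ax6 = plt.subplots(figsize=(3, 3))
# place every treatment point at x=0 and every control point at x=1
treat_xvls = np.zeros(len(df2['treatment']))
ctrl_xvls = np.zeros(len(df2['control'])) + 1

ax6.scatter(treat_xvls, df2['treatment'])
ax6.scatter(ctrl_xvls, df2['control'])
# mark each group mean with a left-pointing triangle, offset slightly to the right
ax6.plot(0.1, np.mean(df2['treatment']), '<')
ax6.plot(1.1, np.mean(df2['control']), '<')

# label the two x positions with the group names
ax6.set_xticks([0, 1])
ax6.set_xticklabels(['treatment', 'control'])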
Out[17]:
[Text(0, 0, 'treatment'), Text(1, 0, 'control')]

Plotting a confidence interval

In [18]:
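# bootstrap the mean of the control group to get a 95% confidence interval
control_data = np.array(df2['control'])
boot_means = []

for n in range(1000):
    # resample with replacement, keeping the original sample size
    boot_data = np.random.choice(control_data, len(control_data))
    boot_means.append(np.mean(boot_data))

# the 2.5th and 97.5th percentiles bound the middle 95% of the bootstrap means
conf_interval = np.percentile(boot_means, [2.5, 97.5])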
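
The loop above gives slightly different results on every run because the random generator is not seeded. One way to make the interval reproducible, sketched here under the assumption that a seeded generator is acceptable, is to draw all resamples at once with np.random.default_rng. The seed 42 and the names rng, n_boot, and resamples are arbitrary choices for this example:

```python
import numpy as np

# Control values copied from the alpaca table shown in Out[16]
control = np.array([4.6, 4.6, 5.1, 2.8, 5.4, 4.4, 4.2, 5.6, 5.8, 4.1, 3.9, 3.8])

rng = np.random.default_rng(42)  # the seed value is an arbitrary choice
n_boot = 1000

# Each row is one resample of the data, drawn with replacement
resamples = rng.choice(control, size=(n_boot, control.size), replace=True)

# One bootstrap mean per row
boot_means = resamples.mean(axis=1)

# Percentile interval covering the middle 95% of the bootstrap means
lower, upper = np.percentile(boot_means, [2.5, 97.5])
print(lower, upper)
```

Drawing a (1000, 12) array in one call replaces the Python loop, and rerunning with the same seed gives the same interval.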
In [19]:
fig, ax7 = plt.subplots()
ax7.hist(boot_means)
# draw the interval as a horizontal red line; y=225 is a hand-picked height
ax7.plot([conf_interval[0], conf_interval[1]], [225, 225], c='r')
Out[19]:
[<matplotlib.lines.Line2D at 0x28abc986290>]
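
The height of 225 used above suits this particular histogram but would need adjusting for a different sample size or bin count. A possible alternative, sketched here with made-up stand-in values (fake_means is hypothetical, not the notebook's boot_means), is to derive the height from the counts that hist() returns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the bootstrap means: made-up values, for illustration only
rng = np.random.default_rng(1)
fake_means = rng.normal(loc=4.5, scale=0.25, size=1000)
lower, upper = np.percentile(fake_means, [2.5, 97.5])

fig, ax = plt.subplots()
# hist() returns the bin counts, the bin edges, and the drawn patches
counts, edges, patches = ax.hist(fake_means)

# place the interval bar 5% above the tallest bin
bar_height = counts.max() * 1.05
(ci_line,) = ax.plot([lower, upper], [bar_height, bar_height], c="r")
```

The 5% offset is an arbitrary choice; any factor above 1 keeps the bar clear of the bars of the histogram.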
In [20]:
df3 = pd.read_csv('precip.csv')
df3.head()
Out[20]:
     precip     growth
0  2.176092  25.350882
1  2.280644  17.534213
2  1.703581  28.590446
3  1.061713  21.454899
4  1.718713  14.993775

Plotting a linear regression model

In [21]:
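# least-squares line from summary statistics:
# slope = r * (sd of y / sd of x), and the line passes through (mean of x, mean of y)
r = np.corrcoef(df3['precip'], df3['growth'])[0, 1]
slope = r * (np.std(df3['growth']) / np.std(df3['precip']))
intercept = np.mean(df3['growth']) - slope * np.mean(df3['precip'])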
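
The slope and intercept formulas above give the ordinary least-squares line, so they can be cross-checked against np.polyfit with degree 1. The sketch below assumes made-up data (x and y are generated here and are not the precip.csv values):

```python
import numpy as np

# Made-up data, for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=40)
y = 5.0 * x + 10.0 + rng.normal(scale=2.0, size=40)

# Same formulas as in the notebook cell
r = np.corrcoef(x, y)[0, 1]
slope = r * (np.std(y) / np.std(x))
intercept = np.mean(y) - slope * np.mean(x)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print(np.allclose([slope, intercept], [fit_slope, fit_intercept]))
```

Because only the ratio of the two standard deviations enters the slope, the result does not depend on whether the population or the sample standard deviation is used, as long as both use the same convention.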
In [22]:
fig,ax8 = plt.subplots()
ax8.scatter(df3['precip'], df3['growth'])
ax8.set_xlabel('Precipitation')
ax8.set_ylabel('Growth')
xdata = df3['precip']
model_ydata = slope*xdata + intercept
ax8.plot(xdata, model_ydata, c='r')
Out[22]:
[<matplotlib.lines.Line2D at 0x28abc8e0f90>]