Permutation Tests¶

Alpaca Tutorial: https://www.jwilber.me/permutationtest/¶

Steps of a Permutation Test¶

  • Identify a null hypothesis (e.g. there is no difference between the groups) and an alternative hypothesis (e.g. Group A is different from Group B; the mean of Group A is greater than that of Group B).
  • Choose a test statistic.
  • Permute the members of the original groups: combine the data, re-assign the values into fictive groups that match the sizes of the original samples, and recompute the test statistic.
  • Repeat this shuffling (permutation) many times, recording the test statistic each time.
  • Compare the distribution of recorded test statistics with the observed value; this comparison translates to a p-value (a compact code sketch of these steps follows this list).
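These steps can be collected into one small helper function. The sketch below is not one of the notebook's cells; the function name, its defaults, and the one-sided "greater than" alternative are choices made here for illustration, using the difference in group means as the test statistic (the same statistic used in the cells that follow).

import numpy as np

def permutation_test_mean_diff(group_a, group_b, n_shuffles=1000, rng=None):
    """One-sided permutation test of mean(group_a) > mean(group_b)."""
    rng = np.random.default_rng() if rng is None else rng
    group_a, group_b = np.asarray(group_a), np.asarray(group_b)
    observed = group_a.mean() - group_b.mean()      # observed test statistic
    pooled = np.concatenate([group_a, group_b])     # combine the data
    null_stats = []
    for _ in range(n_shuffles):
        shuffled = rng.permutation(pooled)          # shuffle the pooled values
        fake_a = shuffled[:len(group_a)]            # fictive groups, original sizes
        fake_b = shuffled[len(group_a):]
        null_stats.append(fake_a.mean() - fake_b.mean())
    # p-value: fraction of shuffles at least as extreme as the observed difference
    return np.mean(np.array(null_stats) >= observed)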
In [6]:
# Permutation test with alpaca data set
import numpy as np
import pandas as pd
df = pd.read_csv('alpaca.csv')
df
Out[6]:
treatment control
0 7.2 4.6
1 8.3 4.6
2 8.3 5.1
3 7.1 2.8
4 4.4 5.4
5 4.1 4.4
6 4.8 4.2
7 6.2 5.6
8 7.7 5.8
9 7.4 4.1
10 4.3 3.9
11 5.8 3.8
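As a quick check before plotting, the group means can be read directly from the DataFrame (treatment ≈ 6.30, control ≈ 4.53, so the observed difference is about 1.78):

df.mean()   # mean wool quality per column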
In [12]:
# Visualize the data
import matplotlib.pyplot as plt
treat_xvals = np.zeros(len(df['treatment']))
ctrl_xvals = np.zeros(len(df['control'])) + 1
fig, ax = plt.subplots(figsize=(3,3))
ax.scatter(treat_xvals, df['treatment'])
ax.scatter(ctrl_xvals, df['control'])
ax.plot(0.1, np.mean(df['treatment']), '<')
ax.plot(1.1, np.mean(df['control']), '<')
ax.set_xticks([0,1])
ax.set_xticklabels(['treatment', 'control'])
ax.set_ylabel('wool quality')
Out[12]:
Text(0, 0.5, 'wool quality')
[Figure: wool quality for each alpaca in the treatment and control groups, with the group means marked by arrowheads]

Setting up the permutation test with our test statistic as the difference in means¶

In [ ]:
# combining arrays
combined_array = np.append(df['treatment'], df['control'])
print(combined_array)
# shuffling the array
shuffled_array = np.random.permutation(combined_array)
print(shuffled_array)
[7.2 8.3 8.3 7.1 4.4 4.1 4.8 6.2 7.7 7.4 4.3 5.8 4.6 4.6 5.1 2.8 5.4 4.4
 4.2 5.6 5.8 4.1 3.9 3.8]
[5.8 4.8 7.4 4.2 5.6 7.1 4.6 3.8 3.9 4.1 6.2 8.3 4.4 4.4 4.1 5.8 4.3 7.7
 5.1 7.2 4.6 2.8 5.4 8.3]
In [15]:
# splitting into groups
shuffled_treat = shuffled_array[0:len(df['treatment'])]
shuffled_control = shuffled_array[len(df['treatment']):]

# visualize random groupings
fig, ax = plt.subplots(figsize=(3,3))
ax.scatter(treat_xvals, shuffled_treat)
ax.scatter(ctrl_xvals, shuffled_control)
ax.plot(0.1, np.mean(shuffled_treat), '<')
ax.plot(1.1, np.mean(shuffled_control), '<')
ax.set_xticks([0,1])
ax.set_xticklabels(['treatment', 'control'])
ax.set_ylabel('wool quality')
Out[15]:
Text(0, 0.5, 'wool quality')
[Figure: one random shuffle of the pooled data split into fictive treatment and control groups, with the fictive group means marked by arrowheads]

Repeating the shuffle many times¶

In [25]:
number_of_shuffles = 1000
mean_diff_list = []
for each_shuffle in np.arange(number_of_shuffles):
    shuffled_array = np.random.permutation(combined_array)
    shuffled_treat = shuffled_array[0:len(df['treatment'])]
    shuffled_control = shuffled_array[len(df['treatment']):]
    mean_diff = np.mean(shuffled_treat) - np.mean(shuffled_control)
    mean_diff_list.append(mean_diff)


# Visualizing the test statistic from many shuffles
fig, ax = plt.subplots(figsize = (3,3))
ax.hist(mean_diff_list)
ax.set_ylabel('counts')
ax.set_xlabel('mean treatment - mean control')
# Overlaying observed difference
obs_diff = np.mean(df['treatment']) - np.mean(df['control'])
ax.plot(obs_diff,50,'rv')
ax.text(obs_diff-0.75,60, 'obs diff', color = 'r')
Out[25]:
Text(1.0250000000000004, 60, 'obs diff')
[Figure: histogram of the mean differences from 1000 shuffles, with the observed difference marked in red]
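The shuffling loop above can also be vectorized so that all shuffles are generated at once. This is a sketch, not part of the original notebook; it assumes NumPy ≥ 1.20 for Generator.permuted, and the seed is arbitrary:

rng = np.random.default_rng(0)                 # seed chosen arbitrarily
n_treat = len(df['treatment'])
# one row per shuffle; permute each row of the pooled values independently
perms = rng.permuted(np.tile(combined_array, (number_of_shuffles, 1)), axis=1)
null_diffs = perms[:, :n_treat].mean(axis=1) - perms[:, n_treat:].mean(axis=1)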

Calculating a p-value (the probability of seeing a difference in means at least as extreme as the observed one if the null hypothesis of no difference between the groups were true)¶

In [28]:
# counting how many shuffled differences are equal to or more extreme than the observed difference
num_greater_than_observed = sum(np.array(mean_diff_list)>=obs_diff)
pvalue = num_greater_than_observed/number_of_shuffles
pvalue
Out[28]:
0.002
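With only 1000 shuffles, this p-value is itself an estimate: a value of 0.002 means just two shuffled differences were at least as large as the observed one. As a cross-check, the same one-sided test can be run with scipy.stats.permutation_test; the sketch below assumes SciPy ≥ 1.7 is available, and the resample count and seed are arbitrary choices:

from scipy import stats

def mean_diff(x, y, axis=0):
    # difference in group means, written to work on batches of resamples
    return np.mean(x, axis=axis) - np.mean(y, axis=axis)

res = stats.permutation_test(
    (df['treatment'], df['control']),
    mean_diff,
    permutation_type='independent',   # reassign values across groups, as above
    alternative='greater',            # one-sided: treatment mean > control mean
    n_resamples=10000,                # arbitrary
    random_state=0,                   # arbitrary seed
)
res.pvalue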