Prediction models¶

In [1]:
import numpy as np
import pandas as pd
df=pd.read_csv('precip.csv')
df.head()
Out[1]:
precip growth
0 2.176092 25.350882
1 2.280644 17.534213
2 1.703581 28.590446
3 1.061713 21.454899
4 1.718713 14.993775

Converting to standard units¶

In [5]:
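def convert_to_su(array):
    # Centre the data by subtracting the mean
    mn_subtracted=array-np.mean(array)
    # Divide by the population standard deviation (np.std uses ddof=0 by default)
    norm_data=mn_subtracted/np.std(array)
    return norm_data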
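
As a quick sanity check, data converted to standard units should have a mean of 0 and a standard deviation of 1. The sketch below illustrates this on a small made-up array (the values in `sample` are an assumption for illustration, not taken from precip.csv).

```python
import numpy as np

def convert_to_su(array):
    """Convert values to standard units: subtract the mean, divide by the SD."""
    return (array - np.mean(array)) / np.std(array)

# Made-up sample values, used only to illustrate the property.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
sample_su = convert_to_su(sample)

print(sample_su)
print(np.isclose(np.mean(sample_su), 0.0))  # mean of standardised data is 0
print(np.isclose(np.std(sample_su), 1.0))   # SD of standardised data is 1
```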

Manually calculating Pearson Correlation Coefficient¶

In [7]:
precip_su = convert_to_su(df['precip'])
growth_su = convert_to_su(df['growth'])
r = np.mean(precip_su*growth_su)
r
Out[7]:
0.19418647123114102
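
One detail worth keeping in mind: the mean-of-products formula gives the Pearson coefficient only when the standard units are built with the population standard deviation (`ddof=0`, the `np.std` default). pandas' own `Series.std()` defaults to `ddof=1`, which would shrink the result by a factor of (n-1)/n. The following is a minimal sketch on made-up data (the `x`, `y` values and the helper `to_su` are hypothetical, not part of the notebook).

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; not the precip.csv data.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([2.0, 1.0, 4.0, 3.0, 5.0])

def to_su(values, ddof):
    """Standard units using a standard deviation with the given ddof."""
    return (values - values.mean()) / values.std(ddof=ddof)

r_population = np.mean(to_su(x, ddof=0) * to_su(y, ddof=0))  # matches Pearson r
r_sample = np.mean(to_su(x, ddof=1) * to_su(y, ddof=1))      # too small by (n-1)/n
r_numpy = np.corrcoef(x, y)[0, 1]

n = len(x)
print(np.isclose(r_population, r_numpy))
print(np.isclose(r_sample, r_numpy * (n - 1) / n))
```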

Plotting the best fit line in standard units¶

In [10]:
import matplotlib.pyplot as plt
fig,ax=plt.subplots()
ax.scatter(precip_su,growth_su)
ax.set_xlabel('precip_su')
ax.set_ylabel('growth_su')
xvls=np.linspace(-5,5,20)
r=np.mean(precip_su*growth_su)
yvls=r*xvls
ax.plot(xvls,yvls,'r')
Out[10]:
[<matplotlib.lines.Line2D at 0x1cc450bac10>]
[Figure: scatter plot of growth_su against precip_su, with the line y = r*x drawn in red]

Quick way to determine Pearson Correlation Coefficient¶

In [13]:
corr_matrix=np.corrcoef(df['precip'], df['growth'])
print(corr_matrix)
corr_coeff=corr_matrix[0,1]
print(corr_coeff)
[[1.         0.19418647]
 [0.19418647 1.        ]]
0.19418647123114094

Plotting best fit line in original units¶

slope = r * (sd(y) / sd(x))
intercept = mean(y) - slope * mean(x)
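
The slope and intercept in original units follow from the standard-units line, where the predicted y in standard units is r times x in standard units. A short derivation, writing the means as \bar{x}, \bar{y} and the standard deviations as \sigma_x, \sigma_y (this notation is introduced here for the sketch):

```latex
\begin{align*}
\frac{\hat{y} - \bar{y}}{\sigma_y} &= r \, \frac{x - \bar{x}}{\sigma_x} \\
\hat{y} &= \underbrace{r \, \frac{\sigma_y}{\sigma_x}}_{\text{slope}} \, x
  + \underbrace{\bar{y} - r \, \frac{\sigma_y}{\sigma_x} \, \bar{x}}_{\text{intercept}}
\end{align*}
```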

In [18]:
slope = corr_coeff*(np.std(df['growth'])/np.std(df['precip']))
intercept = np.mean(df['growth']) - slope*np.mean(df['precip'])
In [20]:
fig,ax=plt.subplots()
ax.scatter(df['precip'],df['growth'])
ax.set_xlabel('precip')
ax.set_ylabel('growth')
xvls=np.linspace(0,3,20)
yvls=slope*xvls+intercept
ax.plot(xvls,yvls,'r')
Out[20]:
[<matplotlib.lines.Line2D at 0x1cc46e75550>]
[Figure: scatter plot of growth against precip, with the line y = slope*x + intercept drawn in red]
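
The slope and intercept obtained from the correlation coefficient should agree with an ordinary least-squares fit. The sketch below cross-checks this against `np.polyfit` on synthetic data (the generated `x` and `y` are assumptions for illustration, not the precip.csv values).

```python
import numpy as np

# Synthetic data standing in for precip/growth; values are made up.
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=50)
y = 15 + 2 * x + rng.normal(0, 4, size=50)

# Slope and intercept from the correlation coefficient, as in the notebook.
r = np.corrcoef(x, y)[0, 1]
slope = r * np.std(y) / np.std(x)
intercept = np.mean(y) - slope * np.mean(x)

# Degree-1 least-squares fit; polyfit returns [slope, intercept].
fit_slope, fit_intercept = np.polyfit(x, y, 1)

print(np.allclose([slope, intercept], [fit_slope, fit_intercept]))
```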

Predicting a value¶

In [21]:
growth_prediction = slope*3+intercept
growth_prediction
Out[21]:
24.481837780469004
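
The same prediction can also be reached through standard units: convert the new x to standard units, multiply by r, then convert the result back to the units of y. A minimal sketch on made-up data, showing that both routes give the same value (the arrays and the two helper functions are hypothetical names introduced for this example):

```python
import numpy as np

# Made-up data for illustration; not the precip.csv values.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([14.0, 22.0, 17.0, 25.0, 19.0, 27.0])

r = np.corrcoef(x, y)[0, 1]
slope = r * np.std(y) / np.std(x)
intercept = np.mean(y) - slope * np.mean(x)

def predict_original_units(x_new):
    """Predict directly with the regression line in original units."""
    return slope * x_new + intercept

def predict_via_standard_units(x_new):
    """Predict by going through standard units and converting back."""
    x_new_su = (x_new - np.mean(x)) / np.std(x)
    y_new_su = r * x_new_su
    return y_new_su * np.std(y) + np.mean(y)

print(np.isclose(predict_original_units(3.0), predict_via_standard_units(3.0)))
```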