Perfecting Models with Error Minimization¶

In [3]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("old_faithful.csv")
df
Out[3]:
eruptions waiting
0 3.600 79
1 1.800 54
2 3.333 74
3 2.283 62
4 4.533 85
... ... ...
267 4.117 81
268 2.150 46
269 4.417 90
270 1.817 46
271 4.467 74

272 rows × 2 columns

Plotting data¶

In [4]:
fig,ax1=plt.subplots()
ax1.scatter(df['eruptions'], df['waiting'])
ax1.set_xlabel('eruption time')
ax1.set_ylabel('waiting time')
Out[4]:
Text(0, 0.5, 'waiting time')
No description has been provided for this image

Visualize the raw error¶

In [5]:
model=np.empty(2)
#guess slope
model[0]=20
#guess intercept
model[1]=10

def plot_raw_error(xdata, ydata, model):
    fig,ax2=plt.subplots()
    
    #plot raw data points
    ax2.scatter(xdata, ydata)

    #get model y data
    model_ydata=xdata*model[0] + model[1]

    #plot best fit line
    ax2.plot(xdata, model_ydata)

    #calculate error values
    raw_error_values=ydata-model_ydata
    plt.plot([xdata,xdata], [ydata,ydata-raw_error_values], 'c')
In [12]:
plot_raw_error(df['eruptions'], df['waiting'], model)
No description has been provided for this image

Calculate root mean square error¶

In [6]:
xdata=df['eruptions']
ydata=df['waiting']

def calc_rms_error(model):
    model_ydata=xdata*model[0] + model[1]
    raw_error_values=ydata-model_ydata
    rms_error=np.sqrt(np.mean(raw_error_values**2))
    return rms_error
In [7]:
calc_rms_error(model)
Out[7]:
14.991246220221123

Optimizing RMS error¶

In [8]:
import scipy.optimize as opt
model_fit=opt.minimize(calc_rms_error, [15,20])
model_fit['x']
Out[8]:
array([10.72963549, 33.47440846])
In [10]:
plot_raw_error(xdata, ydata, model_fit['x'])
No description has been provided for this image

Prediction using optimized model¶

In [12]:
# Waiting time for an eruption when eruption time is 6
eruption_time=6
waiting=model_fit['x'][0]*eruption_time + model_fit['x'][1]
waiting
Out[12]:
97.85222136878929