This page explains how to estimate parameters in an ODE model using MATLAB's lsqcurvefit
function. The script and data files are available for download.
lsqcurvefit
uses [least squares](https://mathworld.wolfram.com/LeastSquaresFitting.html#:~:text=A mathematical procedure for finding,the points from the curve.), which minimizes the total sum of the squared errors (SSE) between the model and the data.
For this example, an SIR model will be fit to data from a flu outbreak in an English boarding school in 1978. A total of 763 boys were at risk and 512 became infected. An article on the outbreak can be found in the British Medical Journal, while the exact data used in this example is taken from Dr. Brian Reich's fitting tutorial.
**Note: This tutorial goes in the order I follow when writing my own scripts, although the order that MATLAB requires overall is different. In particular, MATLAB requires all function definitions go at the end of the script. This is clear when looking at the SIR_lsqcurvefit_example.mlx
file.
The Susceptible-Infected-Recovered (SIR) model is described by the following equations (where a dot denotes differentiation with respect to time t):
$$ \begin{align*}\dot{S} &= -\frac{\beta SI}{N},\\ \dot{I} &= \frac{\beta SI}{N} - \gamma I,\\ \dot{R} &= \gamma I.\end{align*} $$
The total population is denoted $N(t) = S(t) + I(t) + R(t)$. The contact rate per day is denoted by $\beta$, and the recovery rate per day is denoted by $\gamma$. Note that $1/\gamma$ gives the average time it takes (in days) to recover.
First, we type out the model following MATLAB's ode45
example:
function dydt = SIR(t,y,params)
% parameters to be fit are listed in the "params" vector.
beta = params(1);
gamma = params(2);
S = y(1); I = y(2); R = y(3);
N = S + I + R;
dydt = [-1*beta*S*I/N; % S'(t)
beta*S*I/N - gamma*I; % I'(t)
gamma*I]; % R'(t)
end
We want to estimate values for beta
and gamma
, so we put them in the vector params
. You can choose what order the estimated parameters will be listed in, but it must be consistent everywhere in the script.
The vector y
is a list of all the state variables in the model in our desired order, while dydt
represents the equations of the model in the desired order. For example, since we assign S=y(1)
, the first equation in dydt
is $\dot{S}=-\beta SI/N$. Typically, this follows the order of the equations of the model.