Introduction
Non-linear models are not as popular as they used to be; many books today skip them or cover them only briefly. There are several reasons for this. Firstly, one can fit a non-linear curve using linear regression, either by adding higher-order polynomial terms or by applying a basis function (in other words, applying a log or sqrt transformation to the explanatory variables). Secondly, one may prefer simplicity, as linear models are easy to interpret and explain. Furthermore, to apply a non-linear model it is important to visualize the data and get a sense of its trajectory, which is hard to do in a multivariate setting. Lastly, non-linear models are best suited to mechanistic cases where the phenomenon is purely physical or deterministic, which is not the case for big data, where user activity is logged by machines.
Despite all these good reasons, non-linear modeling is a beautiful science that deserves to be explored and compared with its linear counterpart.
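As a concrete illustration of the first reason given above, the sketch below fits a curved relationship with ordinary linear regression. The data are simulated purely for illustration, and the names `fit_log` and `fit_poly` are hypothetical.

```r
# Simulated data (hypothetical): a curved, logarithmic relationship plus noise.
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- 2 + 3 * log(x) + rnorm(50, sd = 0.2)

# Option 1: transform the explanatory variable.
# The model is still linear in its two coefficients.
fit_log <- lm(y ~ log(x))

# Option 2: add a higher-order polynomial term (here a quadratic).
fit_poly <- lm(y ~ poly(x, 2))

coef(fit_log)            # intercept and slope should land near 2 and 3
AIC(fit_log, fit_poly)   # compare the two linear fits
```

Both fits are obtained with `lm()` in a single step, with no starting values and no iteration, which is exactly why this route is often preferred.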
Non Linear Regression Model
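The estimation techniques for a non-linear model are explicitly iterative. There is also a fundamental difference in how a non-linear formula is specified compared with a linear one. A formula is considered linear when the relation between the variable y and the parameters is linear, even if the explanatory variables themselves are transformed (squared, logged, and so on).
What makes a formula non-linear is a parameter that enters the expression non-linearly, for instance as an exponent, inside an exponential, or in the denominator of a ratio.
The use of such a parameter means that, in general, the least-squares estimates no longer have a closed-form solution and must be found by an iterative search.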
Unfortunately, this makes the computations more difficult and a successful numerical solution to the estimation problem will not be guaranteed. So non-linear models come with a cautionary warning that getting answers out may not be intuitive or accurate.
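To illustrate the distinction between linear and non-linear formulas, here are a few generic textbook forms, chosen for illustration only and not tied to any dataset. The symbols $\beta_0, \beta_1, \beta_2$ are assumed parameter names. The first three equations are linear in the parameters even though two of them are curved in $x$; the last one is non-linear because $\beta_2$ appears as an exponent.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x                 && \text{(linear in the parameters)} \\
y &= \beta_0 + \beta_1 x + \beta_2 x^2   && \text{(linear in the parameters)} \\
y &= \beta_0 + \beta_1 \log(x)           && \text{(linear in the parameters)} \\
y &= \beta_0 + \beta_1 x^{\beta_2}       && \text{(non-linear: } \beta_2 \text{ is an exponent)}
\end{align*}
```

In the last equation the derivative with respect to $\beta_2$ is $\beta_1 x^{\beta_2} \log(x)$, which still depends on the parameters. That dependence is what forces the estimation to proceed iteratively.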
Specialized Model
Usually, when one applies a non-linear model, one uses a specialized type, meaning a predefined mathematical equation. This makes fitting the model more convenient and numerically easier, but the choice needs to be agreed on by a subject matter expert and be consistent with the data. Here is a list of the most frequently used non-linear equations (some are omitted for brevity).
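Non Linear Function Name | Equation |
---|---|
Michaelis-Menten | $y = \frac{V_m x}{K + x}$ |
Logistic S Function | $y = \frac{a}{1 + b e^{-cx}}$ |
Weibull | $y = a - b e^{-c x^{d}}$ |
Gompertz | $y = a e^{-b e^{-cx}}$ |
Bell-shaped | $y = a \exp(-\lvert bx \rvert^{2})$ |
Biexponential | $y = a e^{bx} + c e^{dx}$ |

Here $x$ is the explanatory variable and $V_m$, $K$, $a$, $b$, $c$, $d$ are the parameters to be estimated. These are common textbook parameterizations; other equivalent forms exist.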
Strategy involved for Non Linear Regression Model
- Visualize the data to see if a specialized mathematical function can best explain the data.
- Select the initial parameter values, either by eyeballing the plot or by using a self-starter function (explained below).
- Fit the model and examine it.
- Perform a statistical analysis of the fitted model to understand the parameter confidence intervals, the residual error, and the variation explained.
- Try adding more parameters to the model if doing so reduces the residual error.
- Always fit a linear model as well, and compare the results of the non-linear and linear models.
Explaining Non-linear model through an Example
We are going to fit a specialized non-linear model to the popular Puromycin dataset, which records the reaction rate and the substrate concentration for an enzymatic reaction in cells treated with the drug Puromycin. The data comes with two classes of cells: one treated with Puromycin and the other untreated.
names(Puromycin)
Puromycin
The three variables in this dataframe are the concentration of the substrate, the initial rate of the reaction, and an indicator of treated or untreated.
By plotting the rate against concentration and labeling the two levels of the treatment, we can see a general pattern. The curves are asymptotic and quite distinct for treated and untreated cell cohorts.
# Empty plot frame, then label each point "T" (treated) or "U" (untreated)
plot(Puromycin$conc, Puromycin$rate, type = "n", xlab = "conc", ylab = "rate")
text(Puromycin$conc, Puromycin$rate,
     ifelse(Puromycin$state == "treated", "T", "U"))
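Because this data exhibits a strong Michaelis-Menten relation between the reaction rate and the concentration, the experimenters expect to fit the following mathematical equation.
$$\text{rate} = \frac{V_m \cdot \text{conc}}{K + \text{conc}}$$
where $V_m$ is the maximum (asymptotic) reaction rate and $K$ is the Michaelis constant, that is, the concentration at which the rate reaches half of $V_m$.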
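A small sketch may help build intuition for the two parameters of the Michaelis-Menten equation. The helper `micmen()` below is a hypothetical function written for this illustration, and the values of `Vm` and `K` are arbitrary, not estimates.

```r
# Michaelis-Menten curve; Vm and K follow the parameter names used in this article.
micmen <- function(conc, Vm, K) Vm * conc / (K + conc)

# Illustrative parameter values only (not fitted estimates).
Vm <- 160
K  <- 0.05

half_rate  <- micmen(K, Vm, K)               # at conc == K the rate is Vm / 2
high_rates <- micmen(c(1, 10, 100), Vm, K)   # the rate approaches Vm as conc grows

half_rate
high_rates
```

Reading the curve this way suggests how to guess the parameters from a plot: the plateau gives a value for `Vm`, and the concentration at half that height gives a value for `K`.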
Fitting a Non Linear Model
Even though we know the form of the relationship, fitting a non-linear model is not that straightforward, because the model requires initial guesses for its parameters, here $V_m$ and $K$. There are two ways to obtain them:
- Either eyeball the plot to find the initial values.
- Or use a self-starter function (given below).
Self-Starting Function | Model |
---|---|
SSasymp | asymptotic regression model |
SSasympOff | asymptotic regression model with an offset |
SSasympOrig | asymptotic regression model through the origin |
SSbiexp | biexponential model |
SSfol | first-order compartment model |
SSfpl | four-parameter logistic model |
SSgompertz | Gompertz growth model |
SSlogis | logistic model |
SSmicmen | Michaelis–Menten model |
SSweibull | Weibull growth curve model |
For each model type, there is a corresponding self-starter function that can be used to compute the initial guess.
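The two routes to starting values can be sketched as follows. The eyeballing heuristic shown here (largest observed rate for the asymptote, concentration nearest the half-height for `K`) is an assumption made for illustration, and the names `Vm0`, `K0`, `init` and `fit_ss` are hypothetical. `getInitial()` and `SSmicmen()` are part of base R's stats package.

```r
# Option 1: rough guesses read off the data.
# Vm is the asymptote, so the largest observed rate is a reasonable start;
# K is the concentration at which the rate is about half of Vm.
Vm0 <- max(Puromycin$rate)
K0  <- Puromycin$conc[which.min(abs(Puromycin$rate - Vm0 / 2))]
c(Vm = Vm0, K = K0)

# Option 2: let the self-starter SSmicmen() compute the starting values.
init <- getInitial(rate ~ SSmicmen(conc, Vm, K), data = Puromycin)
init

# With a self-starter in the formula, nls() needs no `start` argument.
fit_ss <- nls(rate ~ SSmicmen(conc, Vm, K), data = Puromycin)
coef(fit_ss)
```

The self-starter is the more convenient choice when one exists for the model at hand; manual guesses remain necessary for custom formulas.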
Fit a Model using an Initial Guess
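Using the initial values Vm = 160 and K = 0.05, we call nls() to fit the data. Usually, for linear regression, we do not need to specify the parameters. For a non-linear model, the formula must name the parameters explicitly, and nls() must be supplied with initial values for them along with the Puromycin data.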
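# Fit one Michaelis-Menten curve to both groups, starting from Vm = 160, K = 0.05
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))
summary(Purboth_1)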
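One way to examine the fit further is to turn the coefficient table into approximate confidence intervals. The sketch below refits the same model so that it runs on its own; the names `ctab`, `tq` and `wald` are hypothetical, and the Wald intervals are only an approximation.

```r
# Refit the model from the text so that this block is self-contained.
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))

# Coefficient table: estimate, standard error, t value, p value.
ctab <- summary(Purboth_1)$coefficients

# Approximate (Wald) 95% intervals: estimate +/- t quantile * standard error.
tq <- qt(0.975, df.residual(Purboth_1))
wald <- cbind(lower = ctab[, "Estimate"] - tq * ctab[, "Std. Error"],
              upper = ctab[, "Estimate"] + tq * ctab[, "Std. Error"])
wald
```

For non-linear models, profile-based intervals from `confint()` are generally considered more reliable than Wald intervals; depending on the R version, that method for `nls` objects may be provided by the MASS package rather than by stats.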
Calculate Residual Error
We calculate the residual error, that is, the residual sum of squares (SSE), as
sse <- Purboth_1$m$deviance()
sse
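As a cross-check, the same quantity can be obtained in a few equivalent ways. This is a minimal sketch that refits the model so it runs independently; the variable names are hypothetical.

```r
# Refit the model from the text so that this block is self-contained.
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))

sse_internal <- Purboth_1$m$deviance()         # as in the text
sse_deviance <- deviance(Purboth_1)            # standard extractor function
sse_resid    <- sum(residuals(Purboth_1)^2)    # directly from the residuals

# Residual standard error: sqrt(SSE / residual degrees of freedom).
rse <- sqrt(sse_resid / df.residual(Purboth_1))

c(sse_internal, sse_deviance, sse_resid)
rse
```

The extractor `deviance()` is usually the safer choice in scripts, since it does not depend on the internal structure of the fitted object. The value of `rse` matches the residual standard error reported by `summary(Purboth_1)`.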