Little Project: Introduction to Non-Linear Models and Insights Using R (nls)

Introduction

Non-linear models are not as popular as they used to be; many books today skip them or cover them only briefly. There are several reasons for this. Firstly, one can fit a non-linear curve with linear regression, either by adding higher-order polynomial terms or by applying a basis function (for example, a log or square-root transformation of the explanatory variables). Secondly, one may prefer simplicity, as linear models are easy to interpret and explain. Thirdly, applying a non-linear model requires visualizing the data to get a sense of its trajectory, which is hard to do in a multivariate setting. Lastly, non-linear models are best suited to mechanistic problems, where the phenomenon is governed by physical or otherwise deterministic laws; this is rarely the case for big data, where user activity is logged by machines.

Despite these good reasons, non-linear modeling is a beautiful science that deserves to be explored and compared with its linear counterpart.

Non-Linear Regression Model

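The estimation techniques for a non-linear model are explicitly iterative: unlike ordinary least squares, there is in general no closed-form solution. There is also a fundamental difference between a linear and a non-linear formula. A model is called linear when it is linear in its parameters, even if it is not linear in the explanatory variable. For example, each of the following equations is linear, because y changes linearly with the parameters $\beta_0$ and $\beta_1$:

$y = \beta_0 + \beta_1 x$
$y = \beta_0 + \beta_1 x^2$
$y = \beta_0 + \beta_1 \log(x)$

What makes a formula non-linear is a parameter that enters the equation non-linearly, for example:

$y = \frac{\theta_1 x}{\theta_2 + x}$
$y = \beta_0 e^{\beta_1 x}$

The use of $\theta$ or $\beta$ for the parameters is arbitrary; what matters is that in these last two equations y is no longer a linear function of the parameters.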
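To make the distinction concrete, here is a minimal R sketch on the built-in Puromycin data set (described in the example later on). The quadratic and log models are fitted by lm() in a single step because they are linear in their coefficients, while the Michaelis-Menten form needs nls() and starting values. The object names fit_quad, fit_log and fit_mm are illustrative choices.

```r
# Built-in data: conc = substrate concentration, rate = reaction rate
data(Puromycin)

# Linear models: rate is linear in the coefficients, even though the
# fitted curve is not a straight line in conc. lm() solves these directly.
fit_quad <- lm(rate ~ conc + I(conc^2), data = Puromycin)  # polynomial term
fit_log  <- lm(rate ~ log(conc), data = Puromycin)         # log basis function
coef(fit_quad)
coef(fit_log)

# Non-linear model: K sits in the denominator, so rate is not linear in K.
# nls() needs starting values and iterates towards a solution.
fit_mm <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
              start = list(Vm = 160, K = 0.05))
coef(fit_mm)
```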

Unfortunately, this makes the computation harder, and a successful numerical solution to the estimation problem is not guaranteed. Non-linear models therefore come with a warning: getting answers out may not be straightforward, and the answers obtained may not be accurate.
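The iterative nature of the estimation can be seen in a bare-bones Gauss-Newton loop, sketched below for the Michaelis-Menten curve on the treated cells of Puromycin. Gauss-Newton is the default algorithm behind nls(), but nls() adds safeguards (such as step halving) that this hypothetical gauss_newton_mm() helper omits, so from a poor start the bare loop can diverge or hit a singular matrix, which is exactly the risk described above.

```r
# Plain Gauss-Newton for y = Vm * x / (K + x); a teaching sketch only,
# with no step-size control or other safeguards.
gauss_newton_mm <- function(x, y, start, max_iter = 50, tol = 1e-8) {
  theta <- start                                  # c(Vm, K)
  for (iter in seq_len(max_iter)) {
    Vm <- theta[1]
    K  <- theta[2]
    r  <- y - Vm * x / (K + x)                    # current residuals
    # Jacobian of the model with respect to (Vm, K)
    J  <- cbind(x / (K + x), -Vm * x / (K + x)^2)
    # Linearize the model and solve the least-squares step (J'J) delta = J'r
    delta <- as.vector(solve(crossprod(J), crossprod(J, r)))
    theta <- theta + delta
    # Stop once the step is negligible relative to the parameter size
    if (sqrt(sum(delta^2)) < tol * (1 + sqrt(sum(theta^2)))) break
  }
  names(theta) <- c("Vm", "K")
  list(estimate = theta, iterations = iter,
       rss = sum((y - theta[1] * x / (theta[2] + x))^2))
}

treated <- subset(Puromycin, state == "treated")
gn <- gauss_newton_mm(treated$conc, treated$rate, start = c(200, 0.05))
gn$estimate
gn$iterations
```

From this reasonable start, the estimates should agree closely with nls(rate ~ Vm * conc / (K + conc), data = treated, start = list(Vm = 200, K = 0.05)).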

Specialized Model

In practice, non-linear models are usually applied in specialized forms, meaning that a predefined mathematical equation is chosen for the curve. This makes fitting more convenient and numerically easier, but the choice of equation should be agreed on with a subject matter expert and must be consistent with the data. The most frequently used non-linear equations are listed below (some are omitted for brevity).

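Non-Linear Function Name	Equation
Michaelis-Menten	$y = \frac{ax}{b + x}$
Logistic S Function	$y = \frac{a}{1 + b e^{-cx}}$
Weibull	$y = a - b e^{-c x^d}$
Gompertz	$y = a e^{-b e^{-cx}}$
Bell-shaped	$y = a \exp(-|bx|^2)$
Biexponential	$y = a e^{bx} - c e^{-dx}$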
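The equations in the table translate directly into R functions. The sketch below defines one hypothetical helper per row (parameters a, b, c, d as in the table) and plots each shape; the parameter values are arbitrary, chosen only so that the characteristic shape (asymptote, S-curve, bell, hump) is visible.

```r
# One helper per table row; a, b, c, d are the table's generic parameters
michaelis_menten <- function(x, a, b) a * x / (b + x)
logistic3        <- function(x, a, b, c) a / (1 + b * exp(-c * x))
weibull          <- function(x, a, b, c, d) a - b * exp(-c * x^d)
gompertz         <- function(x, a, b, c) a * exp(-b * exp(-c * x))
bell             <- function(x, a, b) a * exp(-abs(b * x)^2)
biexp            <- function(x, a, b, c, d) a * exp(b * x) - c * exp(-d * x)

# Plot each curve with arbitrary illustrative parameter values
op <- par(mfrow = c(2, 3))
curve(michaelis_menten(x, a = 200, b = 0.1), 0, 1, main = "Michaelis-Menten")
curve(logistic3(x, a = 100, b = 50, c = 1), 0, 10, main = "Logistic")
curve(weibull(x, a = 100, b = 90, c = 0.1, d = 2), 0, 10, main = "Weibull")
curve(gompertz(x, a = 100, b = 5, c = 0.5), 0, 15, main = "Gompertz")
curve(bell(x, a = 1, b = 1), -3, 3, main = "Bell-shaped")
curve(biexp(x, a = 1, b = -0.1, c = 1, d = 1), 0, 10, main = "Biexponential")
par(op)
```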

Strategy for Fitting a Non-Linear Regression Model

Visualize the data to see whether a specialized mathematical function can explain it well.
Select the initial parameter values, either by eyeballing the plot or by using a self-starter function (explained below).
Fit the model and examine it.
Perform a statistical analysis of the fitted model to understand the parameter confidence intervals, the residual error, and the variation explained.
Try adding more parameters to the model if doing so reduces the residual error.
Always fit a linear model as well, and compare its results with those of the non-linear model.
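Steps 3 to 6 can be sketched in R on the built-in Puromycin data used in the example below. Assumptions: the extra parameter delV (a shift in the asymptote for treated cells) and the log-linear comparison model are illustrative choices, not the only sensible ones; confint.default() gives simple Wald intervals, whereas profile-based intervals from confint() are often preferred for nls fits.

```r
# Steps 3-4: fit the pooled Michaelis-Menten model and examine it
fit_mm <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
              start = list(Vm = 160, K = 0.05))
summary(fit_mm)            # estimates, standard errors, residual std. error
confint.default(fit_mm)    # Wald 95% intervals built from coef() and vcov()

# Step 5: add a parameter letting the asymptote differ for treated cells
fit_mm2 <- nls(rate ~ (Vm + delV * (state == "treated")) * conc / (K + conc),
               data = Puromycin, start = list(Vm = 160, delV = 40, K = 0.05))
anova(fit_mm, fit_mm2)     # F-test: does the extra parameter reduce RSS enough?

# Step 6: compare with a linear model on a log-transformed predictor
fit_lm <- lm(rate ~ log(conc) + state, data = Puromycin)
AIC(fit_mm2, fit_lm)       # lower AIC = better trade-off of fit and complexity
```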

Explaining a Non-Linear Model Through an Example

We will fit a specialized non-linear model to Puromycin, a popular dataset that ships with R. It records the reaction rate and the substrate concentration for an enzymatic reaction in two classes of cells: one treated with the drug Puromycin and one untreated.

names(Puromycin)   # column names: "conc" "rate" "state"
Puromycin          # print the whole data frame (23 rows)

The three variables in this data frame are conc, the substrate concentration (ppm); rate, the initial rate of the reaction (counts/min/min); and state, a factor indicating whether the cells were treated or untreated.

Plotting rate against concentration and labeling the two treatment levels reveals a clear pattern: the curves rise towards an asymptote, and they are quite distinct for the treated and untreated cell cohorts.

# Empty plot frame, then label each point T (treated) or U (untreated)
plot(Puromycin$conc, Puromycin$rate, type = "n", xlab = "conc", ylab = "rate")
text(Puromycin$conc, Puromycin$rate,
     ifelse(Puromycin$state == "treated", "T", "U"))

Because the data show a strong Michaelis-Menten relationship between the reaction rate and the concentration, the experimenters expect to fit the model

$y = \frac{V_m x}{K + x} + \epsilon$

where $y$ is the reaction rate, $x$ is the concentration, $V_m$ is the maximum (asymptotic) rate, $K$ is the concentration at which the rate reaches half of $V_m$, and $\epsilon$ is the experimental error. As with the linear model, we estimate the parameters $V_m$ and $K$ by minimizing the sum of squared residuals (other estimation methods exist):

$S(V_m, K) = \sum_{i=1}^{n} \left( y_i - \frac{V_m x_i}{K + x_i} \right)^2$
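The criterion $S(V_m, K)$ can be written as an ordinary R function and handed to any general-purpose optimizer. This sketch swaps in optim() (Nelder-Mead) for nls(); rss_mm() is a hypothetical helper, and the parscale setting is an assumption meant to cope with the very different magnitudes of $V_m$ and $K$. The result should land close to the nls() estimates for the pooled data.

```r
# Residual sum of squares S(Vm, K) for the Michaelis-Menten model
rss_mm <- function(theta, x, y) {
  Vm <- theta[1]
  K  <- theta[2]
  sum((y - Vm * x / (K + x))^2)
}

# Minimize S directly with Nelder-Mead, starting from rough guesses
opt <- optim(c(Vm = 160, K = 0.05), rss_mm,
             x = Puromycin$conc, y = Puromycin$rate,
             control = list(parscale = c(100, 0.01)))
opt$par     # estimates of Vm and K
opt$value   # minimized residual sum of squares
```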

Fitting a Non-Linear Model

Even though we know the form of the relationship, fitting a non-linear model is not straightforward, because the algorithm requires initial guesses for $V_m$ and $K$. With a poor initial guess, the model may fit the data poorly or fail to converge. There are two options:

Either eyeball the plot to find the initial values.
Or use a self-starter function (given below).
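The effect of the starting point can be checked by trying several initial guesses and catching failures. The starting values below are arbitrary illustrations and try_start() is a hypothetical helper. A start with $K = -0.02$ makes $K + conc$ zero at the lowest concentration, so nls() is expected to stop with an error there; whether the far-off start converges is left for the run to show.

```r
# Arbitrary starting points for illustration only
starts <- list(
  eyeballed = list(Vm = 160, K = 0.05),
  far_off   = list(Vm = 1,   K = 10),
  negative  = list(Vm = 160, K = -0.02)
)

# Fit from one start; return NA estimates instead of stopping on an error
try_start <- function(start) {
  fit <- tryCatch(
    nls(rate ~ Vm * conc / (K + conc), data = Puromycin, start = start),
    error = function(e) NULL        # convergence or evaluation failure
  )
  if (is.null(fit)) {
    return(c(converged = 0, Vm = NA_real_, K = NA_real_))
  }
  c(converged = 1, coef(fit))
}

# One row per starting point: converged flag and estimates
results <- t(sapply(starts, try_start))
results
```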

Self-Starting Function	Model
SSasymp	asymptotic regression model
SSasympOff	asymptotic regression model with an offset
SSasympOrig	asymptotic regression model through the origin
SSbiexp	biexponential model
SSfol	first-order compartment model
SSfpl	four-parameter logistic model
SSgompertz	Gompertz growth model
SSlogis	logistic model
SSmicmen	Michaelis–Menten model
SSweibull	Weibull growth curve model
For each of these model types, R's stats package provides a self-starting function that computes initial parameter values from the data, so nls() can be called without a start list.
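As an example, SSmicmen() lets nls() fit the Michaelis-Menten model without a start list, and getInitial() shows the starting values derived from the data. This sketch restricts the fit to the treated cells; the object names are illustrative.

```r
# Treated cells only
treated <- subset(Puromycin, state == "treated")

# Initial values that SSmicmen computes from the data
getInitial(rate ~ SSmicmen(conc, Vm, K), data = treated)

# Fit the model; no start argument is needed with a self-starter
fit_ss <- nls(rate ~ SSmicmen(conc, Vm, K), data = treated)
coef(fit_ss)
```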

Fit a Model using an Initial Guess

Using initial values of $V_m = 160$ and $K = 0.05$, found by eyeballing the plot, we can fit the data with the R function nls(). In linear regression we do not need to supply starting values for the coefficients, but a non-linear model is different: every iterative algorithm needs a good starting point, otherwise it may fail to converge. This is how we fit the non-linear model to Puromycin with initial values:

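# Pooled fit: one Michaelis-Menten curve for treated and untreated cells together
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))
summary(Purboth_1)   # estimates, standard errors, residual standard error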
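Once fitted, the model can be used for prediction. This sketch refits the same pooled model so it runs on its own, evaluates it with predict() on a grid of concentrations (the names grid and rate_hat are illustrative), and overlays the curve on the T/U plot from earlier. Because one curve is shared by both cell states, it should run roughly between the treated and untreated points.

```r
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))
coef(Purboth_1)   # point estimates of Vm and K

# Predicted rates on a fine grid of concentrations
grid <- data.frame(conc = seq(0, max(Puromycin$conc), length.out = 100))
grid$rate_hat <- predict(Purboth_1, newdata = grid)

# Overlay the fitted curve on the raw data, labelled T/U as before
plot(Puromycin$conc, Puromycin$rate, type = "n", xlab = "conc", ylab = "rate")
text(Puromycin$conc, Puromycin$rate,
     ifelse(Puromycin$state == "treated", "T", "U"))
lines(grid$conc, grid$rate_hat)
```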

Calculate Residual Error

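We calculate the residual sum of squares (SSE), the quantity the fit minimizes, as

sse <- deviance(Purboth_1)   # residual sum of squares; same value as Purboth_1$m$deviance()
sse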
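As a sanity check, the SSE can be recomputed from the residuals, and the residual standard error reported by summary() follows from it as $\sqrt{SSE/(n - p)}$, with $n$ observations and $p$ parameters. The sketch refits the pooled model so it is self-contained.

```r
Purboth_1 <- nls(rate ~ Vm * conc / (K + conc), data = Puromycin,
                 start = list(Vm = 160, K = 0.05))

sse <- deviance(Purboth_1)          # residual sum of squares
n   <- nobs(Purboth_1)              # number of observations (23)
p   <- length(coef(Purboth_1))      # number of parameters (2)

# Same quantity computed by hand from the residuals
sse_manual <- sum(residuals(Purboth_1)^2)

# Residual standard error, as reported by summary(): sqrt(SSE / (n - p))
rse <- sqrt(sse / (n - p))
c(sse = sse, sse_manual = sse_manual, rse = rse, sigma = sigma(Purboth_1))
```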
