[[rdoc:stats:glm]]

**Fitting Generalized Linear Models**

`glm` is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.

    glm(formula, family = gaussian, data, weights, subset,
        na.action, start = NULL, etastart, mustart, offset,
        control = glm.control(...), model = TRUE, method = "glm.fit",
        x = FALSE, y = TRUE, contrasts = NULL, ...)

    glm.fit(x, y, weights = rep(1, nobs),
            start = NULL, etastart = NULL, mustart = NULL,
            offset = rep(0, nobs), family = gaussian(),
            control = glm.control(), intercept = TRUE)

    ## S3 method for class 'glm':
    weights(object, type = c("prior", "working"), ...)

Argument | Description |
---|---|
`formula` | a symbolic description of the model to be fitted. The details of model specification are given below. |
`family` | a description of the error distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.) |
`data` | an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables in the model. If not found in `data`, the variables are taken from `environment(formula)`, typically the environment from which `glm` is called. |
`weights` | an optional vector of weights to be used in the fitting process. Should be `NULL` or a numeric vector. |
`subset` | an optional vector specifying a subset of observations to be used in the fitting process. |
`na.action` | a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The “factory-fresh” default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful. |
`start` | starting values for the parameters in the linear predictor. |
`etastart` | starting values for the linear predictor. |
`mustart` | starting values for the vector of means. |
`offset` | this can be used to specify an *a priori* known component to be included in the linear predictor during fitting. This should be `NULL` or a numeric vector of length either one or equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if both are specified their sum is used. See `model.offset`. |
`control` | a list of parameters for controlling the fitting process. See the documentation for `glm.control` for details. |
`model` | a logical value indicating whether the *model frame* should be included as a component of the returned value. |
`method` | the method to be used in fitting the model. The default method `"glm.fit"` uses iteratively reweighted least squares (IWLS). The only current alternative is `"model.frame"`, which returns the model frame and does no fitting. |
`x, y` | For `glm`: logical values indicating whether the response vector and model matrix used in the fitting process should be returned as components of the returned value. For `glm.fit`: `x` is a design matrix of dimension `n * p`, and `y` is a vector of observations of length `n`. |
`contrasts` | an optional list. See the `contrasts.arg` of `model.matrix.default`. |
`object` | an object inheriting from class `"glm"`. |
`type` | character, partial matching allowed. Type of weights to extract from the fitted model object. |
`intercept` | logical. Should an intercept be included in the *null* model? |
`...` | further arguments passed to or from other methods. |
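
As a small illustration of the `method`, `x` and `subset` arguments described above, the following sketch reuses the Dobson (1990) counts from the examples on this page. The object names `mf`, `fit_x` and `fit_sub` are illustrative choices, not part of the original documentation.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)

# method = "model.frame" returns the model frame and does no fitting
mf <- glm(counts ~ outcome + treatment, family = poisson(),
          method = "model.frame")
is.data.frame(mf)

# x = TRUE keeps the model matrix as component `x` of the fitted object
fit_x <- glm(counts ~ outcome + treatment, family = poisson(), x = TRUE)
dim(fit_x$x)

# subset restricts the observations used in the fitting process
fit_sub <- glm(counts ~ outcome, family = poisson(),
               subset = treatment != "3")
nobs(fit_sub)
```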

A typical predictor has the form `response ~ terms` where `response` is the (numeric) response vector and `terms` is a series of terms which specifies a linear predictor for `response`. For `binomial` and `quasibinomial` families the response can also be specified as a `factor` (when the first level denotes failure and all others success) or as a two-column matrix with the columns giving the numbers of successes and failures. A terms specification of the form `first + second` indicates all the terms in `first` together with all the terms in `second` with duplicates removed. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this, pass a `terms` object as the formula.
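
The two binomial response forms described above can be compared on a small made-up data set. This is only a sketch: the numbers and the names `dose`, `succ`, `fail`, `long`, `fit_mat` and `fit_fac` are invented for illustration. Both fits maximize the same likelihood, so their coefficients should agree up to the convergence tolerance.

```r
# Grouped data: successes and failures at each dose (illustrative values)
dose <- c(1, 2, 3, 4)
succ <- c(2, 5, 8, 9)
fail <- c(8, 5, 2, 1)

# Response as a two-column matrix: successes first, failures second
fit_mat <- glm(cbind(succ, fail) ~ dose, family = binomial)

# The same data with one row per case and a factor response;
# the first level ("failure") denotes failure
long <- data.frame(
  dose = rep(c(dose, dose), times = c(succ, fail)),
  outcome = factor(
    rep(c("success", "failure"), times = c(sum(succ), sum(fail))),
    levels = c("failure", "success")
  )
)
fit_fac <- glm(outcome ~ dose, family = binomial, data = long)

all.equal(coef(fit_mat), coef(fit_fac), tolerance = 1e-5)
```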

A specification of the form `first:second` indicates the set of terms obtained by taking the interactions of all terms in `first` with all terms in `second`. The specification `first*second` indicates the *cross* of `first` and `second`. This is the same as `first + second + first:second`.
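
The formula operators and the re-ordering of terms can be inspected without fitting anything, by looking at the term labels of a `terms` object. In this sketch `y`, `a` and `b` are placeholder variable names, and the use of `keep.order = TRUE` is one way (assumed here, not spelled out in the text above) to build a `terms` object that preserves the order as written.

```r
# `*` expands to the main effects plus their interaction
t_cross <- attr(terms(y ~ a * b), "term.labels")
t_cross    # "a" "b" "a:b"

# terms are re-ordered so that main effects precede interactions
t_reord <- attr(terms(y ~ a:b + a + b), "term.labels")
t_reord    # "a" "b" "a:b"

# duplicated terms are removed
t_dup <- attr(terms(y ~ a + b + a), "term.labels")
t_dup      # "a" "b"

# keep.order = TRUE gives a terms object that keeps the written order
t_keep <- attr(terms(y ~ a:b + a + b, keep.order = TRUE), "term.labels")
t_keep     # "a:b" "a" "b"
```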

`glm.fit` is the workhorse function.
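
To make the relationship concrete, the sketch below calls `glm.fit` directly with a design matrix built by `model.matrix` and compares the result with the formula interface. The data are the Dobson (1990) counts used in the examples on this page; the names `X`, `fit_formula` and `fit_direct` are illustrative.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Formula interface
fit_formula <- glm(counts ~ outcome + treatment, family = poisson())

# Direct call: x is the n * p design matrix, y the response vector
X <- model.matrix(~ outcome + treatment)
fit_direct <- glm.fit(x = X, y = counts, family = poisson())

all.equal(coef(fit_formula), fit_direct$coefficients)
```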

If more than one of `etastart`, `start` and `mustart` is specified, the first in the list will be used. It is often advisable to supply starting values for a `quasi` family, and also for families with unusual links such as `gaussian("log")`.
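
A minimal sketch of supplying `start` for a Gaussian model with a log link follows. The data are synthetic and the starting values `c(0, 0.2)` are simply chosen close to the values used to generate them; the names `x`, `y` and `fit_start` are illustrative.

```r
# Synthetic response that is roughly exp(0.1 + 0.2 * x)
x <- 1:10
y <- exp(0.1 + 0.2 * x) + rep(c(0.05, -0.05), 5)

# start gives initial values for the coefficients of the linear predictor,
# here in the order (intercept, slope)
fit_start <- glm(y ~ x, family = gaussian("log"), start = c(0, 0.2))

fit_start$converged
coef(fit_start)
```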

All of `weights`, `subset`, `offset`, `etastart` and `mustart` are evaluated in the same way as variables in `formula`, that is first in `data` and then in the environment of `formula`.
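
The sketch below shows this lookup for `offset`: the column `logexp` exists only in the data frame, yet it can be named directly in the `offset` argument. It also checks that an `offset` argument and an `offset()` term in the formula lead to the same fit. The data frame `d` and its values are made up for illustration.

```r
d <- data.frame(
  events   = c(2, 3, 6, 7, 8, 9, 10, 12),
  x        = 1:8,
  exposure = c(10, 12, 15, 14, 20, 18, 22, 25)
)
d$logexp <- log(d$exposure)

# `logexp` is found in `data`, just like the variables in the formula
fit_arg <- glm(events ~ x, family = poisson, data = d, offset = logexp)

# Equivalent specification with an offset term in the formula
fit_term <- glm(events ~ x + offset(logexp), family = poisson, data = d)

all.equal(coef(fit_arg), coef(fit_term))
```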

`glm` returns an object of class inheriting from `"glm"` which inherits from the class `"lm"`. See later in this section.

The function `summary` (i.e., `summary.glm`) can be used to obtain or print a summary of the results and the function `anova` (i.e., `anova.glm`) to produce an analysis of variance table.

The generic accessor functions `coefficients`, `effects`, `fitted.values` and `residuals` can be used to extract various useful features of the value returned by `glm`.

`weights` extracts a vector of weights, one for each case in the fit (after subsetting and `na.action`).
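
A short sketch of these functions applied to one fit, using the Dobson (1990) counts from the examples on this page (the name `fit` is illustrative):

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

summary(fit)          # coefficient table, deviances and AIC
anova(fit)            # table with terms added sequentially

coefficients(fit)     # named vector of coefficients
fitted.values(fit)    # fitted means, one per case
residuals(fit)        # residuals, one per case
weights(fit)          # prior weights by default; all equal to 1 here
```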

An object of class `"glm"` is a list containing at least the following components:

Component | Description |
---|---|
`coefficients` | a named vector of coefficients. |
`residuals` | the *working* residuals, that is the residuals in the final iteration of the IWLS fit. Since cases with zero weights are omitted, their working residuals are `NA`. |
`fitted.values` | the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. |
`rank` | the numeric rank of the fitted linear model. |
`family` | the `family` object used. |
`linear.predictors` | the linear fit on link scale. |
`deviance` | up to a constant, minus twice the maximized log-likelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero. |
`aic` | Akaike's *An Information Criterion*, minus twice the maximized log-likelihood plus twice the number of coefficients (so assuming that the dispersion is known). |
`null.deviance` | the deviance for the null model, comparable with `deviance`. The null model will include the offset, and an intercept if there is one in the model. |
`iter` | the number of iterations of IWLS used. |
`weights` | the *working* weights, that is the weights in the final iteration of the IWLS fit. |
`prior.weights` | the case weights initially supplied. |
`df.residual` | the residual degrees of freedom. |
`df.null` | the residual degrees of freedom for the null model. |
`y` | the `y` vector used. (It is a vector even for a binomial model.) |
`converged` | logical. Was the IWLS algorithm judged to have converged? |
`boundary` | logical. Is the fitted value on the boundary of the attainable values? |
`call` | the matched call. |
`formula` | the formula supplied. |
`terms` | the `terms` object used. |
`data` | the `data` argument. |
`offset` | the offset vector used. |
`control` | the value of the `control` argument used. |
`method` | the name of the fitter function used, currently always `"glm.fit"`. |
`contrasts` | (where relevant) the contrasts used. |
`xlevels` | (where relevant) a record of the levels of the factors used in fitting. |
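
Some of the relationships between these components can be checked directly on a fitted object. The sketch below again uses the Dobson (1990) counts; `fit` is an illustrative name.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# fitted.values are the linear predictors mapped through the inverse link
all.equal(fit$fitted.values, fit$family$linkinv(fit$linear.predictors))

# aic is minus twice the maximized log-likelihood
# plus twice the number of coefficients
all.equal(fit$aic, -2 * as.numeric(logLik(fit)) + 2 * length(coef(fit)))

# deviances and their degrees of freedom
c(deviance = fit$deviance, df = fit$df.residual)
c(null.deviance = fit$null.deviance, df = fit$df.null)

# convergence information
fit$iter
fit$converged
fit$boundary
```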

In addition, non-empty fits will have components `qr`, `R` and `effects` relating to the final weighted linear fit.

Objects of class `"glm"` are normally of class `c("glm", "lm")`, that is, they inherit from class `"lm"`, and well-designed methods for class `"lm"` will be applied to the weighted linear model at the final iteration of IWLS. However, care is needed, as extractor functions for class `"glm"` such as `residuals` and `weights` do **not** just pick out the component of the fit with the same name.
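
The difference between the extractor functions and the components of the same name can be seen on a Poisson fit. In this sketch (object names are illustrative), `residuals()` returns deviance residuals by default while the `residuals` component holds working residuals, and `weights()` returns prior weights by default while the `weights` component holds working weights.

```r
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

# Default extractor output differs from the stored component
same_resid <- isTRUE(all.equal(unname(residuals(fit)), unname(fit$residuals)))
same_resid

# Asking for the working type recovers the component
work_resid <- isTRUE(all.equal(residuals(fit, type = "working"), fit$residuals))
work_resid

# weights(): prior weights by default, working weights on request
prior_w <- isTRUE(all.equal(weights(fit), fit$prior.weights))
work_w  <- isTRUE(all.equal(weights(fit, type = "working"), fit$weights))
c(prior_w, work_w)
```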

If a `binomial` `glm` model is specified by giving a two-column response, the weights returned by `prior.weights` are the total numbers of cases (factored by the supplied case weights) and the component `y` of the result is the proportion of successes.
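
For instance, with a made-up grouped data set of 10 cases per row (the values and the names `dose`, `succ`, `fail` and `fit_bin` are illustrative):

```r
dose <- 1:4
succ <- c(2, 5, 8, 9)
fail <- c(8, 5, 2, 1)
fit_bin <- glm(cbind(succ, fail) ~ dose, family = binomial)

fit_bin$prior.weights   # total number of cases in each row
fit_bin$y               # proportion of successes in each row
```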

The original implementation of `glm` was written by Simon Davies working for Ross Ihaka at the University of Auckland, but has since been extensively re-written by members of the R Core team.

The design was inspired by the S function of the same name described in Hastie & Pregibon (1992).

Dobson, A. J. (1990) *An Introduction to Generalized Linear Models.* London: Chapman and Hall.

Hastie, T. J. and Pregibon, D. (1992) *Generalized linear models.* Chapter 6 of *Statistical Models in S* eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.

McCullagh, P. and Nelder, J. A. (1989) *Generalized Linear Models.* London: Chapman and Hall.

Venables, W. N. and Ripley, B. D. (2002) *Modern Applied Statistics with S.* New York: Springer.

See also `anova.glm`, `summary.glm`, etc. for `glm` methods, and the generic functions `anova`, `summary`, `effects`, `fitted.values`, and `residuals`. Further, `lm` for non-generalized *linear* models.

`esoph`, `infert` and `predict.glm` have examples of fitting binomial glms.

    ## Dobson (1990) Page 93: Randomized Controlled Trial :
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    print(d.AD <- data.frame(treatment, outcome, counts))
    glm.D93 <- glm(counts ~ outcome + treatment, family=poisson())
    anova(glm.D93)
    summary(glm.D93)

    ## an example with offsets from Venables & Ripley (2002, p.189)
    data(anorexia, package="MASS")
    anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                    family = gaussian, data = anorexia)
    summary(anorex.1)

    # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
    clotting <- data.frame(
        u = c(5,10,15,20,30,40,60,80,100),
        lot1 = c(118,58,42,35,27,25,21,19,18),
        lot2 = c(69,35,26,21,18,16,13,12,12))
    summary(glm(lot1 ~ log(u), data=clotting, family=Gamma))
    summary(glm(lot2 ~ log(u), data=clotting, family=Gamma))

    ## Not run: 
    ## for an example of the use of a terms object as a formula
    demo(glm.vr)
    ## End(Not run)