Thus far our focus has been on describing interactions or associations between two or three categorical variables mostly via single summary statistics and with significance testing.

Models can handle more complicated situations and analyze the simultaneous effects of multiple variables, including mixtures of categorical and continuous variables. The structural form of the model describes the patterns of interactions and associations.

The model parameters provide measures of the strength of associations. In models, the focus is on estimating the model parameters. The basic inference tools (e.g., significance tests and confidence intervals) still apply. When discussing models, we will keep in mind several aspects, such as parameter estimates and their interpretation, model fit, and model selection.

The first widely used software package for fitting these models was called GLIM.

The table below provides a good summary of GLMs following Agresti. For a more detailed discussion, refer to Agresti. Following are examples of GLM components for models that we are already familiar with, such as linear regression, and for some of the models that we will cover in this class, such as logistic regression and log-linear models.

Simple linear regression models how the mean (expected value) of a continuous response variable depends on a set of explanatory variables, where index i stands for each data point.

Binary logistic regression models are also known as logit models when the predictors are all categorical. The log-linear model describes the expected cell counts as a function of the levels of categorical variables. Log-linear models are more general than logit models, and some logit models are equivalent to certain log-linear models.
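As a reference point, the binary logistic (logit) model with a single explanatory variable is usually written as below. The symbols $\pi_i$, $\beta_0$, and $\beta_1$ are conventional notation assumed here for illustration; they are not taken from the text above.

```latex
\[
\mathrm{logit}(\pi_i) \;=\; \log\!\left(\frac{\pi_i}{1-\pi_i}\right) \;=\; \beta_0 + \beta_1 x_i,
\qquad \pi_i = P(Y_i = 1 \mid x_i).
\]
```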

The log-linear model is also equivalent to a Poisson regression model when all explanatory variables are discrete. For additional details see Agresti.
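To make the idea of modeling expected cell counts concrete, here is a minimal sketch for the independence log-linear model in a two-way table. The 2x2 table is hypothetical, and the baseline coding (first level of each variable set to zero) is one common convention assumed for this illustration. Under independence, the maximum likelihood fitted counts have the closed form (row total x column total) / n, so no iterative fitting is needed.

```python
import math

# Hypothetical 2x2 table of counts (rows = levels of X, columns = levels of Y).
table = [[20, 30],
         [10, 40]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(2)) for j in range(2)]

# MLE of the expected counts under independence: mu_ij = row_i * col_j / n.
fitted = [[row_totals[i] * col_totals[j] / n for j in range(2)]
          for i in range(2)]

# Log-linear parameters with the first level of each variable as baseline:
#   log(mu_ij) = lam + lam_x[i] + lam_y[j]
lam = math.log(fitted[0][0])
lam_x = [math.log(fitted[i][0]) - lam for i in range(2)]
lam_y = [math.log(fitted[0][j]) - lam for j in range(2)]


def log_mu(i, j):
    """Log expected count rebuilt from the additive log-linear parameters."""
    return lam + lam_x[i] + lam_y[j]
```

The additive structure on the log scale is what makes this a log-linear model: exponentiating `log_mu(i, j)` reproduces every fitted cell count, including the cell that was not used to define the parameters.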

There are ways around some of the restrictions of these models; for example, overdispersion (observed variance larger than the model assumes) can be accommodated.

Parameter estimates and interpretation: Do you recall the interpretation of the intercept and the slope?

Model fit: $R^2$, residuals, $F$-statistic.

Model selection: From a plethora of possible predictors, which variables should be included?

Random component: also called a noise model or error model. How is random error added to the prediction that comes out of the link function?

Systematic component: specifies the explanatory variables $X_1, X_2, \ldots, X_k$ in the model, more specifically their linear combination in creating the so-called linear predictor.

The link function says how the expected value of the response relates to the linear predictor of explanatory variables.

The dependent variable $Y_i$ does NOT need to be normally distributed, but it typically assumes a distribution from an exponential family, such as the binomial, Poisson, or normal.

A GLM does NOT assume a linear relationship between the dependent variable and the independent variables, but it does assume a linear relationship between the transformed response (in terms of the link function) and the explanatory variables.
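The relationship between the response, the link function, and the linear predictor can be summarized compactly as follows. The notation ($\mu_i$, $\eta_i$, $g$, $\beta_j$, $x_{ij}$) is the conventional one and is assumed here for illustration; the three link examples correspond to the model families named in this section.

```latex
\begin{align*}
\text{Random component:}     \quad & Y_i \sim \text{exponential-family distribution with } E(Y_i) = \mu_i \\
\text{Systematic component:} \quad & \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \\
\text{Link function:}        \quad & g(\mu_i) = \eta_i \\[4pt]
\text{Identity link (linear regression):}  \quad & \mu_i = \eta_i \\
\text{Logit link (logistic regression):}   \quad & \log\frac{\mu_i}{1-\mu_i} = \eta_i \\
\text{Log link (Poisson or log-linear):}   \quad & \log \mu_i = \eta_i
\end{align*}
```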

Explanatory variables can even be power terms or other nonlinear transformations of the original independent variables. Homogeneity of variance does NOT need to be satisfied. In fact, it is not even possible in many cases given the model structure, and overdispersion (when the observed variance is larger than what the model assumes) may be present.
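A quick, informal check for overdispersion in count data is to compare the sample variance with the sample mean, since a Poisson model assumes they are equal. The sketch below uses made-up counts. For an intercept-only Poisson model (fitted mean equal to the sample mean), this ratio coincides with the Pearson chi-square statistic divided by its n - 1 degrees of freedom.

```python
from statistics import mean, variance

# Hypothetical count responses (illustrative data only).
counts = [0, 1, 1, 2, 3, 5, 8, 0, 12, 4]

m = mean(counts)            # fitted mean of an intercept-only Poisson model
v = variance(counts)        # sample variance (n - 1 denominator)

# A ratio near 1 is consistent with the Poisson assumption;
# a ratio well above 1 suggests overdispersion.
dispersion_index = v / m
```

With these particular counts the ratio is well above 1, which would suggest that a plain Poisson model understates the variability in the data.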

Errors need to be independent but NOT normally distributed. A GLM uses maximum likelihood estimation (MLE) rather than ordinary least squares (OLS) to estimate the parameters, and thus relies on large-sample approximations.
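To illustrate what maximum likelihood estimation involves in practice, here is a minimal sketch of Newton-Raphson iterations for a logistic regression with one predictor. The function name `fit_logistic`, the fixed iteration count, and the data are all assumptions made for this illustration; statistical software uses more careful convergence checks.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for logit(pi_i) = b0 + b1 * x_i (one predictor)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Fitted probabilities at the current estimates.
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian) with weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: binary outcome y observed at predictor values x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
```

Unlike OLS, there is no closed-form solution here: the estimates are the values at which the score equations are (numerically) zero, which is why an iterative procedure is used.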

Simple linear regression models how the mean (expected value) of a continuous response variable depends on a set of explanatory variables, where index i stands for each data point. Notice that in multiple linear regression, where we have more than one explanatory variable, the systematic component is a linear combination of those variables. The X's are explanatory variables (continuous, discrete, or both) and enter the model linearly in the parameters.

Again, transformations of the X's themselves are allowed, as in linear regression; this holds for any GLM.

Random component: the distribution of counts, which are the responses, is Poisson.

Summary of advantages of GLMs over traditional (OLS) regression:

We do not need to transform the response Y to have a normal distribution.

The choice of link is separate from the choice of random component, so we have more flexibility in modeling.

If the link produces additive effects, then we do not need constant variance.

The models are fitted via maximum likelihood estimation, so the estimators have optimal properties.

All the inference tools and model checking that we will discuss for log-linear and logistic regression models apply to other GLMs too; e.g., deviance statistics, Pearson and deviance residuals, etc.

There is often one procedure in a software package that covers all the models listed above, e.g., PROC GENMOD in SAS or the function glm() in R.

But there are some limitations of GLMs too, such as: the linear function, e.g., $X\beta$, may not be flexible enough.
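As an illustration of the model-checking tools mentioned above, the sketch below computes Pearson and deviance residuals for a Poisson-type model and sums their squares to obtain the Pearson statistic and the deviance. The observed counts and fitted means are hypothetical, and the function names are chosen for this example only.

```python
import math


def pearson_residual(y, mu):
    """Pearson residual for a Poisson response: (y - mu) / sqrt(mu)."""
    return (y - mu) / math.sqrt(mu)


def deviance_residual(y, mu):
    """Signed square root of the Poisson unit deviance."""
    # The term y * log(y / mu) is taken as 0 when y == 0.
    term = y * math.log(y / mu) if y > 0 else 0.0
    d = 2.0 * (term - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)


# Hypothetical observed counts and fitted means from some Poisson model.
observed = [20, 30, 10, 40]
fitted_means = [15.0, 35.0, 15.0, 35.0]

# Pearson chi-square statistic and deviance as sums of squared residuals.
X2 = sum(pearson_residual(y, m) ** 2 for y, m in zip(observed, fitted_means))
G2 = sum(deviance_residual(y, m) ** 2 for y, m in zip(observed, fitted_means))
```

The two statistics are usually close when the fitted means are not too small; large individual residuals of either kind point to cells or observations that the model fits poorly.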