Distinguishing the "best" equation is somewhat subjective, but statisticians have developed criteria to evaluate whether one model is likely better than another. For my class we use SPSS as our statistical software, since that's the licensed software on our campus (IUPUI). It's expensive, and even with our campus license, you have to "rent" it every semester you want to use it. I don't use it for my own research: while it's a reasonable GUI option, there are many advanced functions it simply can't do, and its flexibility to alter parameters is limited. Instead I use the free, open-source software R, which has a steep learning curve because it is command-line driven, but is far more powerful and flexible. I don't have my students learn R because of that learning curve, and because most of them, in their future careers, will just want simple point-and-click software that gives reasonable results; they are unlikely to want to do any programming.
In my search for a way to let my students compare "nested regression models" using SPSS, I spent a great deal of time Googling ways to get SPSS to generate AIC, and it just won't--except for logistic regression, or through the advanced Generalized Linear Models procedure. Both are workable options, but the former is a specialized technique for probability outcomes, and the latter is not necessarily a good fit for an introductory sociological stats class.
I found five ways to get SPSS to give me AIC, and I will teach my students two of them: one formula, and manually forcing SPSS to produce the regression AIC using syntax. I reproduce all five methods below, since there is no simple "checkbox" for ordinary linear regression in SPSS. Note that the linear regression method and the GZM (generalized linear model) method produce different AIC numbers. The absolute AIC value is not meaningful; only the difference in AIC between models matters--choose the model with the smallest AIC.
In the formulas below, n = sample size, k = number of parameters, and SSE = sum of squares error (listed as "residual sum of squares" in SPSS output).
- AIC formula #1 (same result as SPSS linear regression syntax)
n * ln(SSE/n) + 2(k+1)
- AIC formula #2 (same result as SPSS GZM)
2k + n * [ln(2π * SSE/n) + 1]
- AIC formula #3 (same result as SPSS GZM)
Requires the log-likelihood, which in SPSS you can only get through GZM (generalized linear model; see SPSS method #2 below)
2k - 2 * log-likelihood
- SPSS method #1: Use linear regression syntax (the SELECTION keyword on /STATISTICS is what produces the AIC in the "Selection Criteria" output)
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /STATISTICS COEFF OUTS R ANOVA SELECTION
  /DEPENDENT DependentVariable
  /METHOD=ENTER IndependentVariables separated by a space.
- SPSS method #2: Use GZM
Click the following:
Analyze --> Generalized Linear Models --> Generalized Linear Models
Under the "Response" tab, put the outcome variable into the "Dependent Variable" box.
Under the "Predictors" tab, put all continuous independent variables into the "Covariates" box; any categorical predictors go into the "Factors" box.
Under the "Model" tab, move all the listed variables (the independents) into the "Model" box and make sure their "Type" is listed as "Main effects". The AIC appears in the "Goodness of Fit" table of the output.
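To see how the formulas behave, here is a short Python sketch (the sample size, SSE values, and parameter counts are made up for illustration). It computes AIC with formulas #1 and #2 for two hypothetical nested models and confirms that, while the absolute numbers differ, the difference between models is identical. It also checks that formula #3, plugging in the usual OLS log-likelihood, reproduces formula #2.

```python
import math

def aic_regression(n, k, sse):
    """AIC formula #1 (matches SPSS linear regression syntax output)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def aic_gzm(n, k, sse):
    """AIC formula #2 (matches SPSS GZM output)."""
    return 2 * k + n * (math.log(2 * math.pi * sse / n) + 1)

def aic_from_loglik(n, k, sse):
    """AIC formula #3: 2k - 2*log-likelihood, using the maximized
    OLS log-likelihood LL = -(n/2) * [ln(2*pi*SSE/n) + 1]."""
    ll = -(n / 2) * (math.log(2 * math.pi * sse / n) + 1)
    return 2 * k - 2 * ll

# Hypothetical values for two nested models fit to the same n = 100 cases:
n = 100
sse_small, k_small = 420.0, 2   # smaller model, 2 parameters
sse_full,  k_full  = 380.0, 4   # fuller model, 4 parameters

# The two formulas give different absolute AICs...
print(aic_regression(n, k_full, sse_full), aic_gzm(n, k_full, sse_full))

# ...but the *difference* between models is the same either way,
# so both lead to the same model choice (smaller AIC wins).
delta1 = aic_regression(n, k_full, sse_full) - aic_regression(n, k_small, sse_small)
delta2 = aic_gzm(n, k_full, sse_full) - aic_gzm(n, k_small, sse_small)
print(delta1, delta2)
```

This is why the absolute AIC number doesn't matter: the two formulas differ only by a constant that depends on n, not on the model, so it cancels when you subtract.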