How to Run Logistic Regression in SPSS (Step-by-Step Guide)

Logistic regression is used when your outcome (dependent) variable is categorical, typically binary (yes/no, pass/fail, purchased/did not purchase). Unlike linear regression, which predicts a continuous value, logistic regression predicts the probability that a case belongs to a category. It is one of the most widely used techniques in medical research, marketing analytics, and the social sciences.
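
To make "predicts the probability" concrete, here is a minimal Python sketch (run outside SPSS) of how a fitted model turns coefficients into a probability. The function name and the coefficient values are invented for illustration; they do not come from any real analysis.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """Return P(outcome = 1) for one case from log-odds coefficients."""
    # Linear predictor on the log-odds (logit) scale
    logit = intercept + sum(b * x for b, x in zip(coefficients, values))
    # The logistic function maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical model: intercept -5.0 and an age coefficient of 0.08
p = predicted_probability(-5.0, [0.08], [60])
print(round(p, 3))
```

Whatever the predictor values are, the result stays between 0 and 1, which is what a straight-line model cannot guarantee.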

When to Use Logistic Regression

Use binary logistic regression when:

Your dependent variable is binary (two categories, e.g., 0/1, yes/no)
You have one or more predictors (continuous or categorical)
You want to know which predictors significantly influence the outcome
You want to predict group membership or calculate probabilities

Example research questions:

Which factors predict whether a student will graduate (yes/no)?
Do smoking status, age, and BMI predict the likelihood of heart disease?
What variables predict whether a customer will churn?

Assumptions

Logistic regression has fewer assumptions than linear regression, but they still matter:

Binary dependent variable: The DV must have exactly two categories
Independence of observations: Each case is independent (no repeated measures or matched pairs)
No multicollinearity: Predictors should not be too highly correlated (VIF < 10)
Linearity of the logit: Continuous predictors must have a linear relationship with the log-odds of the outcome
Large sample size: A common rule of thumb is at least 10–20 cases in the least frequent outcome category for each predictor
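
The sample size rule of thumb can be checked with simple arithmetic. The Python sketch below assumes you already know how many cases fall in each outcome category; the function name and the counts are hypothetical. A categorical predictor with k categories is usually counted as k − 1 predictors here, because that is how many coefficients it adds.

```python
def events_per_predictor(n_events, n_nonevents, n_predictors):
    """Cases in the least frequent outcome category per predictor."""
    smaller_group = min(n_events, n_nonevents)
    return smaller_group / n_predictors

# Hypothetical study: 60 cases with the outcome, 240 without, 4 predictors
epv = events_per_predictor(60, 240, 4)
print(epv)          # 15.0
print(epv >= 10)    # True
```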

Note: Logistic regression does not assume normally distributed residuals, homoscedasticity, or a linear relationship between the predictors and the DV itself (only with its log-odds).

Step-by-Step in SPSS

Step 1: Prepare Your Data

Ensure your DV is coded as 0 and 1 (or has two clear categories)
Categorical predictors should be defined as nominal in Variable View
Check for missing data and extreme outliers
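
If your data also exist outside SPSS (for example as an exported column of values), the coding check can be sketched in a few lines of Python. This is an illustrative helper, not an SPSS feature; it assumes missing values are represented as None.

```python
def check_binary_dv(values):
    """Summarize the coding of an outcome column (None marks missing)."""
    observed = [v for v in values if v is not None]
    codes = sorted(set(observed))
    return {
        "n_missing": len(values) - len(observed),
        "codes": codes,
        "is_0_1": codes == [0, 1],
    }

# Hypothetical outcome column with one missing value
print(check_binary_dv([0, 1, 1, None, 0, 1]))
```

Inside SPSS, a frequency table of the DV (Analyze → Descriptive Statistics → Frequencies) gives the same information.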

Step 2: Run the Analysis

Go to Analyze → Regression → Binary Logistic
Move your binary outcome variable to the Dependent box
Move your predictor variables to the Covariates box
For categorical predictors, click Categorical and move them in, then select the reference category (First or Last; Last is the default) and click Change to apply it
Under Method, keep Enter (forces all predictors in) or choose Forward: LR for stepwise selection
Click Options and check:
- Classification plots
- Hosmer-Lemeshow goodness-of-fit
- CI for Exp(B) at 95%
- Iteration history
Click Continue to close the Options dialog, then click OK

Step 3: Check Model Fit

SPSS produces several fit statistics:

Omnibus Tests of Model Coefficients

This tests whether your model as a whole is significant:

Look at the Model row
If p < .05, the model with predictors is significantly better than the null model (intercept only)

Hosmer-Lemeshow Test

This tests goodness of fit:

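If p > .05, there is no evidence of poor fit (for this test, a non-significant result is the desired one)
If p < .05, the predicted probabilities differ from the observed outcomes more than chance would explain, so the model does not fit adequately; consider adding or removing predictors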
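
To see where the test statistic comes from, the sketch below recomputes it in Python from a contingency table of observed and expected counts per group, like the one SPSS prints alongside the test. The four rows are invented for illustration (SPSS normally forms up to 10 groups), and the p-value helper uses a closed form that is only valid for an even number of degrees of freedom, which covers the usual 10 groups (df = 8).

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail p-value of a chi-square statistic; valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))

def hosmer_lemeshow(rows):
    """rows: (observed_0, expected_0, observed_1, expected_1) per group."""
    chi_square = sum((o0 - e0) ** 2 / e0 + (o1 - e1) ** 2 / e1
                     for o0, e0, o1, e1 in rows)
    df = len(rows) - 2  # number of groups minus 2
    return chi_square, df, chi2_sf_even_df(chi_square, df)

# Hypothetical 4-group table
rows = [(22, 20.5, 3, 4.5), (18, 18.0, 7, 7.0),
        (12, 13.5, 13, 11.5), (5, 5.0, 20, 20.0)]
chi_square, df, p = hosmer_lemeshow(rows)
print(round(chi_square, 2), df, round(p, 3))
```

Small gaps between observed and expected counts give a small chi-square and a large p-value, which is why p > .05 is read as acceptable fit.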

Model Summary

-2 Log Likelihood: Lower is better (compared across models)
Cox & Snell R²: Pseudo R-squared (never reaches 1.0)
Nagelkerke R²: Cox & Snell R² rescaled so that its maximum is 1 (ranges 0 to 1, more interpretable)
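
The relationship between these three statistics can be sketched in Python. The example assumes you have the -2 Log Likelihood of the intercept-only model (shown in the Block 0 iteration history when Iteration history is requested) and of the fitted model; the numbers used are hypothetical.

```python
import math

def pseudo_r_squared(neg2ll_null, neg2ll_model, n):
    """Return (model chi-square, Cox & Snell R2, Nagelkerke R2)."""
    # Drop in -2LL from the intercept-only model to the fitted model
    chi_square = neg2ll_null - neg2ll_model
    cox_snell = 1.0 - math.exp(-chi_square / n)
    # Largest value Cox & Snell can reach for this sample
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return chi_square, cox_snell, nagelkerke

# Hypothetical values for a sample of 100 cases
chi_square, cox_snell, nagelkerke = pseudo_r_squared(138.63, 118.63, 100)
print(round(chi_square, 2), round(cox_snell, 3), round(nagelkerke, 3))
```

Because Nagelkerke divides Cox & Snell by its maximum possible value, it is always at least as large as Cox & Snell for the same model.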

Interpreting the Output

The Variables in the Equation Table

This is the most important table. For each predictor:

Column	Meaning
B	The logistic coefficient (log-odds)
S.E.	Standard error of B
Wald	Test statistic (B² / S.E.²)
df	Degrees of freedom
Sig.	p-value — if < .05, the predictor is significant
Exp(B)	The odds ratio — the key result
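
The columns in this table are tied together by simple formulas, which the Python sketch below reproduces by hand. It uses the rounded B and S.E. for age from the APA example later in this guide, so the results agree with that example only up to rounding; the function names are illustrative.

```python
import math

def wald_test(b, se):
    """Wald chi-square (df = 1) and its p-value for one coefficient."""
    wald = (b / se) ** 2
    # For df = 1, P(chi-square > wald) equals erfc(sqrt(wald / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

def odds_ratio_ci(b, se, z=1.96):
    """Exp(B) with its confidence interval (z = 1.96 gives 95%)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

wald, p_value = wald_test(0.08, 0.03)
odds_ratio, lower, upper = odds_ratio_ci(0.08, 0.03)
print(round(wald, 2), round(p_value, 3))
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Note that the confidence interval is built on the log-odds scale and then exponentiated, which is why it is not symmetric around Exp(B).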

Understanding Odds Ratios (Exp(B))

The odds ratio is the most meaningful output:

Exp(B) = 1: No effect — the predictor does not change the odds
Exp(B) > 1: The predictor increases the odds. For example, Exp(B) = 2.5 means the odds are multiplied by 2.5 (a 150% increase) for each one-unit increase in the predictor, holding the other predictors constant
Exp(B) < 1: The predictor decreases the odds. For example, Exp(B) = 0.6 means the odds decrease by 40% for each one-unit increase in the predictor

For a categorical predictor (e.g., gender), the odds ratio compares the target group to the reference group.
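
The conversion from a coefficient to a percent change in the odds can be sketched in Python as follows. The helper name is hypothetical; the `units` argument shows how the same coefficient translates to a larger increase, such as 10 years of age instead of 1.

```python
import math

def percent_change_in_odds(b, units=1.0):
    """Percent change in the odds for a `units` increase in the predictor."""
    odds_ratio = math.exp(b * units)
    return (odds_ratio - 1.0) * 100.0

# Exp(B) = 2.5 and Exp(B) = 0.6, expressed through their coefficients
print(round(percent_change_in_odds(math.log(2.5)), 1))
print(round(percent_change_in_odds(math.log(0.6)), 1))
# Effect of a 10-unit increase when B = 0.08
print(round(percent_change_in_odds(0.08, units=10), 1))
```

Odds ratios multiply rather than add, so the effect of 10 units is Exp(B) raised to the 10th power, not 10 times the one-unit percentage.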

Classification Table

This shows the model's prediction accuracy:

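Overall percentage: The proportion of all cases the model classifies correctly (at the default cut value of .5)
Sensitivity: The proportion of cases that actually have the outcome (positive class) that are correctly predicted
Specificity: The proportion of cases without the outcome (negative class) that are correctly predicted

A useful model should classify better than the Block 0 (intercept-only) table, which assigns every case to the more frequent category; for balanced data, that baseline is 50%.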
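
These percentages can be recomputed from the four cells of the classification table. The Python sketch below uses invented counts and an illustrative function name; the baseline is the accuracy you would get by assigning every case to the more frequent category.

```python
def classification_summary(tn, fp, fn, tp):
    """Accuracy measures from the four cells of a 2 x 2 classification table."""
    total = tn + fp + fn + tp
    return {
        "overall": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # actual positives predicted as positive
        "specificity": tn / (tn + fp),   # actual negatives predicted as negative
        "baseline": max(tp + fn, tn + fp) / total,
    }

# Hypothetical table: 140 cases without the outcome, 60 with it
summary = classification_summary(tn=120, fp=20, fn=30, tp=30)
for name, value in summary.items():
    print(name, round(value, 3))
```

With unbalanced outcomes, overall accuracy can look respectable while sensitivity is poor, so it is worth reading all three figures together.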

Checking Assumptions

Multicollinearity

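Run a separate linear regression with the same predictors to get VIF values (the collinearity statistics depend only on the predictors, so the binary outcome can serve as the dependent variable):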

Analyze → Regression → Linear
Check Collinearity diagnostics in Statistics
VIF > 10 indicates a problem
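
For reference, VIF is derived from how well one predictor is explained by the others. The Python sketch below shows the formula, assuming you have the R² from regressing a predictor on the remaining predictors; the function name is illustrative. Tolerance, which SPSS reports next to VIF, is simply 1 / VIF.

```python
def vif_from_r_squared(r_squared):
    """VIF for a predictor, given R2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)

print(vif_from_r_squared(0.5))              # 2.0
print(round(vif_from_r_squared(0.95), 1))   # well above the cutoff of 10
```

A VIF of 10 therefore corresponds to 90% of a predictor's variance being shared with the other predictors.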

Linearity of the Logit

For each continuous predictor, create an interaction term between the predictor and its natural log. If the interaction is significant, the linearity assumption is violated. This is the Box-Tidwell approach (the predictor must be strictly positive for LN to be defined):

Compute: ln_predictor = LN(predictor)
Add both the predictor and the interaction term (predictor × ln_predictor) to the model
If the interaction is significant (p < .05), the assumption is violated
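
The interaction term itself is just the predictor multiplied by its natural log. The Python sketch below builds it for a small, made-up list of ages and refuses values for which the log is undefined; the function name is hypothetical.

```python
import math

def box_tidwell_term(values):
    """Return x * ln(x) for each value; requires strictly positive x."""
    if any(x <= 0 for x in values):
        raise ValueError("LN is undefined for zero or negative values; "
                         "shift the predictor before transforming")
    return [x * math.log(x) for x in values]

# Hypothetical ages
terms = box_tidwell_term([25, 40, 60])
print([round(t, 2) for t in terms])
```

If a predictor contains zeros or negative values, a common workaround is to add a constant before taking the log, which is an analyst's choice rather than something SPSS does automatically.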

Outliers and Influential Cases

Save residuals when running logistic regression:

In the Logistic Regression dialog, click Save
Check Cook's influence and Standardized residuals
Cases with standardized residuals beyond ±2.5 (absolute value greater than 2.5) or Cook's D > 1 should be investigated
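
The screening rule above can be sketched in Python for saved values exported from SPSS. The tuple layout, function name, and case values are assumptions made for this illustration; the cutoffs are the ones stated in this guide.

```python
def flag_cases(cases, resid_cutoff=2.5, cooks_cutoff=1.0):
    """cases: list of (case_id, standardized_residual, cooks_d) tuples."""
    return [case_id for case_id, zresid, cooks in cases
            if abs(zresid) > resid_cutoff or cooks > cooks_cutoff]

# Hypothetical saved values for three cases
cases = [(1, 0.4, 0.02), (2, -2.9, 0.10), (3, 1.1, 1.30)]
print(flag_cases(cases))   # [2, 3]
```

Flagged cases are candidates for inspection (data entry errors, unusual profiles), not for automatic deletion.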

APA Reporting

A binary logistic regression was performed to assess the effects of age, smoking status, and BMI on the likelihood of developing heart disease. The logistic regression model was statistically significant, χ²(3) = 24.56, p < .001. The model explained 28.4% (Nagelkerke R²) of the variance in heart disease diagnosis and correctly classified 76.3% of cases. Age was a significant predictor (b = 0.08, SE = 0.03, Wald = 7.12, p = .008). Increasing age was associated with increased likelihood of heart disease, with an odds ratio of 1.08, 95% CI [1.02, 1.15], indicating that for each additional year of age, the odds of heart disease increased by 8%.

Common Mistakes

Using linear regression for a binary outcome: Linear regression can produce impossible predicted probabilities (below 0 or above 1). Use logistic regression for binary DVs
Ignoring the Hosmer-Lemeshow test: A significant result means your model does not fit. Do not ignore it
Misinterpreting B as a direct effect: B is in log-odds. Report Exp(B) (the odds ratio) for meaningful interpretation
Insufficient sample size: With rare outcomes, you need more total cases to reach enough cases per predictor in the rarer category. Models with too few cases per variable produce unreliable estimates
Not reporting odds ratios with confidence intervals: Always include the 95% CI for Exp(B)

Beyond Binary Logistic Regression

Multinomial logistic regression: When the DV has three or more unordered categories
Ordinal logistic regression: When the DV has ordered categories (e.g., low/medium/high)
Hierarchical logistic regression: Enter predictors in blocks to test incremental prediction
