How to Run Logistic Regression in SPSS (Step-by-Step Guide)
Logistic regression is used when your outcome (dependent) variable is categorical, typically binary (yes/no, pass/fail, purchased/did not purchase). Linear regression predicts a continuous value. Logistic regression instead predicts the probability that a case belongs to a particular category. It is one of the most widely used techniques in medical research, marketing analytics, and social sciences.
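The model stays linear on the log-odds scale. It then converts those log-odds into a probability with the logistic (sigmoid) function. The short Python sketch below shows that conversion. The coefficients `b0 = -3.0` and `b1 = 0.05` are hypothetical values chosen only for illustration.

```python
import math


def logistic(log_odds: float) -> float:
    """Map log-odds (the linear predictor B0 + B1*x) to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for illustration: intercept -3.0, slope 0.05 per year of age.
b0, b1 = -3.0, 0.05
for age in (20, 60, 100):
    log_odds = b0 + b1 * age          # linear on the log-odds scale
    print(age, round(logistic(log_odds), 3))  # always strictly between 0 and 1
```

Equal steps in age shift the log-odds by the same amount. The probability, however, changes fastest near 0.5 and flattens out near 0 and 1.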
When to Use Logistic Regression
Use binary logistic regression when:
- Your dependent variable is binary (two categories, e.g., 0/1, yes/no)
- You have one or more predictors (continuous or categorical)
- You want to know which predictors significantly influence the outcome
- You want to predict group membership or calculate probabilities
Example research questions:
- Which factors predict whether a student will graduate (yes/no)?
- Do smoking status, age, and BMI predict the likelihood of heart disease?
- What variables predict whether a customer will churn?
Assumptions
Logistic regression has fewer assumptions than linear regression, but they still matter:
- Binary dependent variable: The DV must have exactly two categories
- Independence of observations: Each case is independent (no repeated measures)
- No multicollinearity: Predictors should not be too highly correlated (VIF < 10)
- Linearity of the logit: Continuous predictors must have a linear relationship with the log-odds of the outcome
- Large sample size: A common rule of thumb is at least 10–20 cases in the less frequent outcome category for each predictor
Note: Logistic regression does not assume normality, homoscedasticity, or linearity between predictors and the DV.
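The sample-size rule of thumb can be turned into a minimum total N. Divide the required number of rare-outcome cases by the expected proportion of that outcome. The Python sketch below assumes you can estimate that proportion in advance. It also counts each dummy-coded category as a separate predictor. Both are assumptions about how you apply the rule, not SPSS features.

```python
import math


def min_sample_size(n_predictors: int, p_rare: float, cases_per_predictor: int = 10) -> int:
    """Smallest total N giving `cases_per_predictor` rare-outcome cases per predictor.

    n_predictors counts estimated coefficients, so a categorical predictor with
    k categories counts as k - 1. p_rare is the expected proportion of the LESS
    frequent outcome category (0 < p_rare <= 0.5).
    """
    if not 0 < p_rare <= 0.5:
        raise ValueError("p_rare must be the proportion of the less frequent category")
    return math.ceil(cases_per_predictor * n_predictors / p_rare)


# Hypothetical study: 3 predictors, outcome expected in about 25% of cases.
print(min_sample_size(3, 0.25))      # 10 cases per predictor
print(min_sample_size(3, 0.25, 20))  # stricter 20 cases per predictor
```

The rarer the outcome, the larger the total sample must be. Halving the outcome proportion doubles the required N.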
Step-by-Step in SPSS
Step 1: Prepare Your Data
- Ensure your DV is coded as 0 and 1 (or has two clear categories)
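- Define categorical predictors as nominal in Variable View. The Binary Logistic procedure still treats them as continuous unless you also declare them under Categorical in Step 2.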
- Check for missing data and extreme outliers
Step 2: Run the Analysis
- Go to Analyze → Regression → Binary Logistic
- Move your binary outcome variable to the Dependent box
- Move your predictor variables to the Covariates box
- For categorical predictors, click Categorical and move them into Categorical Covariates. Then choose the reference category. Last is the default; to use the first category, select First and click Change.
- Under Method, keep Enter (forces all predictors in) or choose Forward: LR for stepwise selection
- Click Options and check:
  - Classification plots
  - Hosmer-Lemeshow goodness-of-fit
  - CI for Exp(B) at 95%
  - Iteration history
- Click OK
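SPSS estimates the B coefficients by maximum likelihood, using an iterative procedure. That is why an Iteration history option exists. The Python sketch below uses Newton-Raphson for a single predictor plus an intercept. It illustrates the idea only and is not SPSS's internal code; its convergence rule and output will differ. The data are hypothetical, and the sketch assumes there is no perfect separation, which would make the estimates diverge.

```python
import math


def _prob(b0, b1, xi):
    """Predicted probability for one case under logit(p) = b0 + b1*xi."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def minus_2ll(b0, b1, x, y):
    """-2 log-likelihood for the given coefficients."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = _prob(b0, b1, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -2.0 * total


def fit_logistic_1d(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson maximum-likelihood fit for one predictor plus an intercept."""
    b0 = b1 = 0.0
    for iteration in range(1, max_iter + 1):
        # Gradient (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system [[h00, h01], [h01, h11]] @ step = g.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        print(f"Iteration {iteration}: -2LL = {minus_2ll(b0, b1, x, y):.4f}")
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


# Hypothetical data: binary predictor x; y = 1 for 2 of 8 cases when x = 0, 6 of 8 when x = 1.
x = [0] * 8 + [1] * 8
y = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0]
b0, b1 = fit_logistic_1d(x, y)
print(f"B0 = {b0:.4f}, B1 = {b1:.4f}, Exp(B1) = {math.exp(b1):.3f}")
```

With a single binary predictor, the fitted Exp(B1) equals the odds ratio from the 2×2 table: (6/2) / (2/6) = 9. This makes the sketch easy to check by hand.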
Step 3: Check Model Fit
SPSS produces several fit statistics:
Omnibus Tests of Model Coefficients
This tests whether your model as a whole is significant:
- Look at the Model row
- If p < .05, the model with predictors is significantly better than the null model (intercept only)
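The omnibus model chi-square is the drop in -2 Log Likelihood from the intercept-only model to the fitted model. Its degrees of freedom equal the number of coefficients added. The Python sketch below computes the p-value with a closed-form chi-square tail that works for integer df. The -2LL values (120.00 and 95.44) are hypothetical, chosen to reproduce the χ²(3) = 24.56 used in the reporting example later in this guide.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail p-value of a chi-square statistic with a positive integer df."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def omnibus_test(minus_2ll_null, minus_2ll_model, n_params_added):
    """Likelihood-ratio chi-square comparing the fitted model with the null model."""
    chi_sq = minus_2ll_null - minus_2ll_model
    return chi_sq, chi2_sf(chi_sq, n_params_added)


chi_sq, p = omnibus_test(120.00, 95.44, 3)   # hypothetical -2LL values
print(f"chi2(3) = {chi_sq:.2f}, p = {p:.6f}")
```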
Hosmer-Lemeshow Test
This tests goodness of fit:
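- If p > .05, there is no evidence of poor fit: the predicted probabilities are consistent with the observed outcomes
- If p < .05, the model fits poorly; consider adding or removing predictors (with very large samples, even small discrepancies can make this test significant)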
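The Hosmer-Lemeshow statistic groups cases by predicted probability, usually into 10 groups. It then compares observed and expected counts of the outcome in each group. The Python sketch below implements the standard formula. SPSS forms its groups with its own tie-handling rules, so its value may differ slightly. The sketch also assumes no group has an average predicted probability of exactly 0 or 1. Compare the result with a chi-square distribution on groups − 2 degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic and its degrees of freedom (groups - 2).

    Cases are sorted by predicted probability and cut into `groups` groups
    of nearly equal size (ties keep their input order).
    """
    ordered = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(ordered)
    stat = 0.0
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed number of 1s in the group
        expected = sum(pi for pi, _ in chunk)   # expected number of 1s in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, groups - 2


# Hypothetical predicted probabilities and outcomes for 8 cases, split into 4 groups.
p = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
y = [0, 0, 0, 1, 1, 0, 1, 1]
stat, df = hosmer_lemeshow(y, p, groups=4)
print(f"HL chi-square = {stat:.3f}, df = {df}")
```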
Model Summary
- -2 Log Likelihood: Lower is better; compare it only across nested models fitted to the same cases
- Cox & Snell R²: Pseudo R-squared whose maximum is below 1.0
- Nagelkerke R²: Cox & Snell R² rescaled so it can reach 1 (ranges 0 to 1, more interpretable)
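Both pseudo R² values can be computed from the null-model -2LL, the fitted-model -2LL, and the sample size n. The Python sketch below uses the standard Cox & Snell and Nagelkerke formulas. The inputs (n = 200, -2LL falling from 260.0 to 210.0) are hypothetical.

```python
import math


def pseudo_r2(minus_2ll_null, minus_2ll_model, n):
    """Cox & Snell and Nagelkerke R-squared from the null and fitted -2LL values."""
    cox_snell = 1 - math.exp(-(minus_2ll_null - minus_2ll_model) / n)
    # Largest value Cox & Snell can take for these data (reached only if -2LL = 0).
    max_cox_snell = 1 - math.exp(-minus_2ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return cox_snell, nagelkerke


cs, nk = pseudo_r2(260.0, 210.0, 200)   # hypothetical values
print(f"Cox & Snell R2 = {cs:.3f}, Nagelkerke R2 = {nk:.3f}")
```

Nagelkerke divides Cox & Snell by its own maximum. This is why it is always the larger of the two and can reach 1.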
Interpreting the Output
The Variables in the Equation Table
This is the most important table. For each predictor:
| Column | Meaning |
|---|---|
| B | The logistic coefficient (log-odds) |
| S.E. | Standard error of B |
| Wald | Test statistic (B² / S.E.²) |
| df | Degrees of freedom |
| Sig. | p-value — if < .05, the predictor is significant |
| Exp(B) | The odds ratio — the key result |
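For a single coefficient, the Wald statistic is (B / S.E.)² with 1 degree of freedom. The Sig. value is its chi-square upper tail, which equals the two-sided normal p-value for B / S.E. The Python sketch below uses hypothetical values B = 0.50 and S.E. = 0.20. The overall row for a categorical predictor with k categories tests k − 1 coefficients jointly, so this single-coefficient formula does not apply there.

```python
import math


def wald_test(b, se):
    """Wald chi-square (df = 1) and its p-value for a single coefficient."""
    wald = (b / se) ** 2
    # With df = 1, the chi-square upper tail equals erfc(sqrt(wald / 2)).
    p = math.erfc(math.sqrt(wald / 2))
    return wald, p


wald, p = wald_test(b=0.50, se=0.20)   # hypothetical B and S.E.
print(f"Wald = {wald:.2f}, p = {p:.4f}")
```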
Understanding Odds Ratios (Exp(B))
The odds ratio is the most meaningful output:
- Exp(B) = 1: No effect — the predictor does not change the odds
- Exp(B) > 1: The predictor increases the odds. For example, Exp(B) = 2.5 means the odds are multiplied by 2.5 (a 150% increase) for each one-unit increase in the predictor
- Exp(B) < 1: The predictor decreases the odds. For example, Exp(B) = 0.6 means the odds decrease by 40% for each one-unit increase in the predictor
For a categorical predictor (e.g., gender), the odds ratio compares the target group to the reference group.
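An odds ratio multiplies the odds, not the probability. The Python sketch below turns Exp(B) into a percent change in odds. It also shows what that change means for a case with a hypothetical baseline probability of 20%.

```python
def percent_change_in_odds(odds_ratio):
    """Percent change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1) * 100


def shifted_probability(baseline_p, odds_ratio):
    """New probability after multiplying the baseline odds by odds_ratio."""
    odds = baseline_p / (1 - baseline_p) * odds_ratio
    return odds / (1 + odds)


print(percent_change_in_odds(2.5))   # odds 150% higher
print(percent_change_in_odds(0.6))   # odds 40% lower
print(round(shifted_probability(0.20, 2.5), 3))   # hypothetical 20% baseline
```

With Exp(B) = 2.5, a 20% probability (odds 0.25) becomes odds of 0.625, a probability of about 38%. It does not become 50%. Describing Exp(B) as "2.5 times more likely" therefore overstates the effect on the probability scale.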
Classification Table
This shows the model's prediction accuracy:
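- Overall percentage: Percentage of all cases classified correctly, using the classification cutoff (0.5 by default, set under Options)
- Sensitivity: Percentage of actual positive cases (outcome = 1) that the model predicts correctly
- Specificity: Percentage of actual negative cases (outcome = 0) that the model predicts correctly
Compare the overall percentage with the Block 0 classification table. That table shows the null model, which predicts every case into the larger category. A useful model should beat this baseline, which is 50% only when the two categories are equally frequent.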
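The classification table can be recomputed from saved predicted probabilities. The Python sketch below reports overall accuracy, sensitivity, specificity, and the null-model baseline at a 0.5 cutoff. Treating a probability exactly equal to the cutoff as positive is an assumption of this sketch. The ten cases are hypothetical.

```python
def classification_summary(y, p, cut=0.5):
    """Overall %, sensitivity %, specificity % and null-model baseline %."""
    # Assumption: a probability exactly at the cutoff is classified as positive.
    predicted = [1 if pi >= cut else 0 for pi in p]
    tp = sum(1 for yi, ci in zip(y, predicted) if yi == 1 and ci == 1)
    tn = sum(1 for yi, ci in zip(y, predicted) if yi == 0 and ci == 0)
    pos = sum(y)
    neg = len(y) - pos
    overall = 100 * (tp + tn) / len(y)
    sensitivity = 100 * tp / pos
    specificity = 100 * tn / neg
    baseline = 100 * max(pos, neg) / len(y)   # predict everyone into the larger group
    return overall, sensitivity, specificity, baseline


# Hypothetical outcomes and predicted probabilities for 10 cases.
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.1, 0.2, 0.3, 0.4]
print(classification_summary(y, p))
```

Here the model classifies 80% of cases correctly. Predicting "0" for everyone would already score 70%, so the real gain is 10 percentage points.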
Checking Assumptions
Multicollinearity
Run a separate linear regression with the same predictors to get VIF values:
- Analyze → Regression → Linear
- Check Collinearity diagnostics in Statistics
- VIF > 10 indicates a problem
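VIF for a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the other predictors. With exactly two predictors, that R² is simply their squared correlation. The Python sketch below covers only this two-predictor case. It uses hypothetical data; with more predictors, rely on the SPSS Linear output.

```python
import math


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((a - m1) ** 2 for a in x1)
    syy = sum((b - m2) ** 2 for b in x2)
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation between the predictors
    return 1 / (1 - r ** 2)


x1 = [1, 2, 3, 4, 5]   # hypothetical predictor values
x2 = [2, 1, 4, 3, 5]
print(round(vif_two_predictors(x1, x2), 3))
```

Even a correlation of 0.8 gives a VIF of only about 2.8. Values above 10 require a correlation above roughly 0.95.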
Linearity of the Logit
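For continuous predictors, use the Box-Tidwell approach. Add an interaction between each predictor and its natural log, then test whether that interaction is significant:
- Compute the natural log with Transform → Compute Variable: ln_predictor = LN(predictor). LN requires values greater than zero, so first shift a predictor that contains zero or negative values by a constant.
- Add both the predictor and the interaction term (predictor × ln_predictor) to the model. In the Logistic Regression dialog, select both variables and click >a*b> to create the interaction.
- If the interaction is significant (p < .05), the linearity assumption is violated for that predictor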
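The Box-Tidwell term is x × ln(x), which is undefined unless every value is positive. The Python sketch below builds the term and refuses non-positive values. It uses an optional `shift` constant, a hypothetical parameter standing in for adding a constant before taking logs. It only prepares the variable; the significance test itself still comes from the logistic regression output.

```python
import math


def box_tidwell_term(values, shift=0.0):
    """Return x * ln(x) for each value after adding an optional constant `shift`."""
    shifted = [v + shift for v in values]
    if min(shifted) <= 0:
        # LN() is undefined for zero or negative values.
        raise ValueError("all values must be positive; increase `shift`")
    return [x * math.log(x) for x in shifted]


ages = [25.0, 40.0, 63.0]   # hypothetical predictor values
print([round(t, 2) for t in box_tidwell_term(ages)])
```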
Outliers and Influential Cases
Save residuals when running logistic regression:
- In the Logistic Regression dialog, click Save
- Under Influence, check Cook's; under Residuals, check Standardized
- Investigate cases with standardized residuals beyond ±2.5 (absolute value greater than 2.5) or a Cook's distance greater than 1
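The Python sketch below flags cases with large residuals from observed outcomes and predicted probabilities. It assumes the standardized residual takes the Pearson form, (y − p) / √(p(1 − p)). The three cases are hypothetical.

```python
import math


def pearson_residuals(y, p):
    """(y - p) / sqrt(p * (1 - p)) for each case."""
    return [(yi - pi) / math.sqrt(pi * (1 - pi)) for yi, pi in zip(y, p)]


def flag_cases(y, p, limit=2.5):
    """Zero-based indices of cases whose residual lies beyond +/- limit."""
    return [i for i, r in enumerate(pearson_residuals(y, p)) if abs(r) > limit]


y = [1, 0, 1]          # hypothetical observed outcomes
p = [0.1, 0.5, 0.9]    # hypothetical predicted probabilities
print(flag_cases(y, p))
```

The first case is flagged. It had the outcome even though the model gave it only a 10% probability, which produces a residual of about 3.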
APA Reporting
A binary logistic regression was performed to assess the effects of age, smoking status, and BMI on the likelihood of developing heart disease. The logistic regression model was statistically significant, χ²(3) = 24.56, p < .001. The model explained 28.4% (Nagelkerke R²) of the variance in heart disease diagnosis and correctly classified 76.3% of cases. Age was a significant predictor (b = 0.08, SE = 0.03, Wald = 7.12, p = .008). Increasing age was associated with increased likelihood of heart disease, with an odds ratio of 1.08, 95% CI [1.02, 1.15], indicating that for each additional year of age, the odds of heart disease increased by 8%.
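The odds ratio, its 95% CI, and the percent change in the example can be reproduced from B and S.E. The Python sketch below uses Exp(B ± 1.96 × S.E.). Recomputing from rounded B and S.E. values can differ in the last digit from SPSS output, which uses unrounded values.

```python
import math


def odds_ratio_summary(b, se, z=1.96):
    """Exp(B), its 95% CI, and the percent change in odds from B and its S.E."""
    ratio = math.exp(b)
    low = math.exp(b - z * se)
    high = math.exp(b + z * se)
    pct = (ratio - 1) * 100
    return ratio, (low, high), pct


ratio, (low, high), pct = odds_ratio_summary(0.08, 0.03)   # b and SE from the example
print(f"OR = {ratio:.2f}, 95% CI [{low:.2f}, {high:.2f}], odds change = {pct:.0f}%")
# prints "OR = 1.08, 95% CI [1.02, 1.15], odds change = 8%"
```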
Common Mistakes
- Using linear regression for a binary outcome: Linear regression can produce impossible predicted probabilities (below 0 or above 1). Always use logistic regression for binary DVs
- Ignoring the Hosmer-Lemeshow test: A significant result (p < .05) indicates poor fit. Do not ignore it
- Misinterpreting B as a direct effect: B is on the log-odds scale. Report Exp(B) (the odds ratio) for meaningful interpretation
- Insufficient sample size: With rare outcomes, you need more cases per predictor. Models with too few cases per predictor produce unreliable estimates
- Not reporting odds ratios with confidence intervals: Always include the 95% CI for Exp(B)
Beyond Binary Logistic Regression
- Multinomial logistic regression: When the DV has three or more unordered categories
- Ordinal logistic regression: When the DV has ordered categories (e.g., low/medium/high)
- Hierarchical logistic regression: Enter predictors in blocks to test incremental prediction