HR Analytics Case Study: Predicting Employee Attrition and Cutting Turnover

By Mohammad Abu Sufian | 2026-05-28 | 11 min read
Tags: HR analytics case study, employee attrition prediction, turnover analysis SPSS, people analytics, predict employee turnover

A 1,400-person technology company was bleeding talent — annual voluntary turnover had climbed to 16%, and each departure of a mid-level engineer was costing roughly 1.5× their salary to replace. HR had exit-interview transcripts and gut feelings, but no model. We were asked one question: which employees are most likely to leave, and what actually drives it? Here is the full analysis.

The Data

HR exported an anonymized snapshot of 1,400 employees with these fields:

  • Attrition (target): 1 = left voluntarily in the past year, 0 = stayed
  • MonthlyIncome, YearsAtCompany, Age
  • OverTime: yes / no
  • JobSatisfaction: 1 (low) to 4 (very high)
  • DistanceFromHome: kilometers
  • Department: Sales, R&D, HR

Overall attrition was 16.1%. The job was to move from that single number to who and why.

Phase 1: Exploratory Comparisons

Before modeling, we compared leavers and stayers one variable at a time. This builds intuition and tells you which variables are even worth putting in a model.

Income — Independent Samples T-Test

Do leavers earn less? In SPSS:

Analyze > Compare Means > Independent-Samples T Test

  • Test Variable: MonthlyIncome
  • Grouping Variable: Attrition (1, 0)

We first read Levene's Test for equal variances (p = .02, so we used the "equal variances not assumed" row). The result:

  • Stayers: M = $6,832, SD = $4,818
  • Leavers: M = $4,787, SD = $3,640
  • t(290.4) = 6.38, p < .001

Leavers earned significantly less. Effect size (Cohen's d) was 0.47 — a meaningful, moderate gap.
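For readers working outside SPSS, here is a minimal Python sketch of what the "equal variances not assumed" row computes: Welch's t statistic, its Welch-Satterthwaite degrees of freedom, and Cohen's d. The income lists are made up for illustration, and the pooled-SD version of Cohen's d is an assumption, since the write-up does not say which variant was used.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test of mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (one common variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Made-up monthly incomes, for illustration only (not the case-study data)
stayers = [7200, 5100, 9800, 6400, 4300, 8100, 6900, 5600]
leavers = [4100, 5200, 3600, 6100, 4400]

t, df = welch_t(stayers, leavers)
d = cohens_d(stayers, leavers)
print(f"t({df:.1f}) = {t:.2f}, d = {d:.2f}")
```

The fractional degrees of freedom in a result like t(290.4) come from the Welch-Satterthwaite step; a standard pooled t-test would report a whole number.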

Overtime — Chi-Square Test

Is overtime linked to leaving? Using Analyze > Descriptive Statistics > Crosstabs with Chi-square:

  • Among employees working overtime, 30.5% left.
  • Among those who did not, only 10.4% left.
  • χ²(1) = 89.0, p < .001

This was the most striking pattern in the whole dataset: the attrition rate among overtime workers was nearly triple that of everyone else.
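The same test can be sketched in a few lines of Python. This is the Pearson chi-square for a 2×2 table without a continuity correction; the p-value shortcut holds for 1 degree of freedom only. The cell counts are hypothetical, since the results above are given as percentages rather than counts.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). The p-value formula is valid for 1 degree of freedom
    only, which is what a 2x2 table has.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability of chi-square, df = 1
    return chi2, p


# Hypothetical counts, for illustration only
#               (left, stayed)
overtime_yes = (30, 70)
overtime_no = (20, 180)

chi2, p = chi_square_2x2(*overtime_yes, *overtime_no)
rate_yes = overtime_yes[0] / sum(overtime_yes)
rate_no = overtime_no[0] / sum(overtime_no)
print(f"attrition: {rate_yes:.1%} vs {rate_no:.1%}; chi2(1) = {chi2:.2f}, p = {p:.4g}")
```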

Phase 2: The Logistic Regression Model

The single-variable tests are suggestive but confounded — maybe overtime workers also earn less. A logistic regression lets every variable compete while holding the others constant.

Analyze > Regression > Binary Logistic

  • Dependent: Attrition
  • Covariates: MonthlyIncome, YearsAtCompany, Age, OverTime, JobSatisfaction, DistanceFromHome
  • Mark OverTime as Categorical (Department too, if you add it as a covariate)
  • Under Options: Hosmer-Lemeshow, CI for exp(B)
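To show roughly what the Binary Logistic dialog estimates, here is a bare-bones Newton-Raphson fit in Python with NumPy. It is a sketch under stated assumptions, not the procedure SPSS runs internally: the data are synthetic with known effects, only two predictors are used, and `fit_logistic` is a hypothetical helper name. In practice a statistics library would be the usual choice.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression.

    X: (n, p) array of predictors without an intercept column (one is added).
    y: (n,) array of 0/1 outcomes.
    Returns (coefficients, standard_errors), intercept first.
    """
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))          # predicted probabilities
        info = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    # Standard errors from the information matrix at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Synthetic data with known effects (illustration only, not the HR export)
rng = np.random.default_rng(0)
n = 4000
overtime = rng.integers(0, 2, n)        # 0 = no, 1 = yes
satisfaction = rng.integers(1, 5, n)    # 1 to 4
true_logit = -1.5 + 1.2 * overtime - 0.4 * satisfaction
left = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta, se = fit_logistic(np.column_stack([overtime, satisfaction]), left)
odds_ratios = np.exp(beta[1:])
ci_low = np.exp(beta[1:] - 1.96 * se[1:])
ci_high = np.exp(beta[1:] + 1.96 * se[1:])
for name, o, lo, hi in zip(["OverTime", "JobSatisfaction"], odds_ratios, ci_low, ci_high):
    print(f"{name}: Exp(B) = {o:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Exponentiating a coefficient gives the Exp(B) column, and exponentiating the coefficient plus or minus 1.96 standard errors gives the confidence interval requested with "CI for exp(B)".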

Model Fit

  • Omnibus test: χ²(7) = 198.4, p < .001 — the model is significant.
  • Nagelkerke R² = 0.27: a pseudo-R², so read it as a rough index of explanatory power (modest to moderate here) rather than a literal 27% of variance explained.
  • Hosmer-Lemeshow: χ²(8) = 6.1, p = .64 — good fit (non-significant is what you want).
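For context on where the Nagelkerke figure comes from, this short sketch derives it from the log-likelihoods of the intercept-only and fitted models. It rescales the Cox-Snell R² so that its maximum is 1. The log-likelihood values in the example are hypothetical, not the case-study numbers.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo-R² from the null and fitted log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox-Snell
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for a sample of 500 (illustration only)
r2 = nagelkerke_r2(ll_null=-250.0, ll_model=-200.0, n=500)
print(round(r2, 3))
```

The omnibus chi-square is built from the same ingredients: it equals twice the gain in log-likelihood over the intercept-only model.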

Variables in the Equation

Predictor                      Exp(B) (odds ratio)    p
OverTime (yes)                 3.41                   <.001
JobSatisfaction (per point)    0.66                   <.001
MonthlyIncome (per $1k)        0.92                   .002
YearsAtCompany (per year)      0.93                   .004
Age (per year)                 0.97                   .03
DistanceFromHome (per km)      1.02                   .18 (n.s.)

Reading the results:

  • OverTime, Exp(B) = 3.41: even after controlling for pay, satisfaction, and tenure, overtime workers had 3.4× the odds of leaving. The earlier chi-square result was not just a by-product of confounding with these variables; overtime remains the strongest independent predictor.
  • JobSatisfaction, Exp(B) = 0.66: each one-point rise in satisfaction cut the odds of leaving by 34%.
  • DistanceFromHome was not significant once other factors were accounted for — a good reminder that a variable everyone assumes matters sometimes doesn't.
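Odds ratios are easy to misread as "times as likely", so a small worked example may help. The sketch below turns the Exp(B) values reported above into percent changes in odds, then shows what a 3.41 odds ratio does to a probability. The 10.4% baseline is the unadjusted attrition rate for non-overtime employees, used here purely as an illustration, since the adjusted model's baseline depends on the other predictors.

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in the odds per one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability after multiplying the baseline odds by an odds ratio."""
    odds = baseline_prob / (1.0 - baseline_prob) * odds_ratio
    return odds / (1.0 + odds)


# Odds ratios taken from the Variables in the Equation results
for name, ratio in [("OverTime", 3.41), ("JobSatisfaction", 0.66),
                    ("MonthlyIncome per $1k", 0.92), ("YearsAtCompany", 0.93)]:
    print(f"{name}: {pct_change_in_odds(ratio):+.0f}% change in odds")

# Illustration: a 10.4% baseline probability with the odds multiplied by 3.41
p = apply_odds_ratio(0.104, 3.41)
print(f"{p:.1%}")  # about 28%, i.e. roughly 2.7 times the baseline probability
```

The gap between 3.41 times the odds and about 2.7 times the probability is why the results are worded in terms of odds.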

Phase 3: Scoring the Workforce

We saved the predicted probabilities (Save > Probabilities in the logistic dialog) so every current employee got a churn-risk score from 0 to 1. Sorting the workforce by that score produced a watchlist: the top 8% of employees by risk accounted for nearly a third of expected departures.
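A minimal sketch of the watchlist step, assuming "expected departures" means the sum of predicted probabilities (the write-up does not define it, so treat that as an assumption). The employee IDs and scores are hypothetical, and the example uses a 20% cut so that a ten-person toy list yields more than one name.

```python
import math


def watchlist(scores, top_fraction=0.08):
    """Return (ids, share) for the highest-risk employees.

    scores: dict mapping employee id -> predicted probability of leaving.
    share: fraction of expected departures (sum of probabilities) that the
    watchlist accounts for.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_fraction))
    top = ranked[:k]
    share = sum(scores[e] for e in top) / sum(scores.values())
    return top, share


# Hypothetical scores for ten employees (illustration only)
scores = {"E01": 0.62, "E02": 0.05, "E03": 0.11, "E04": 0.48, "E05": 0.03,
          "E06": 0.09, "E07": 0.22, "E08": 0.04, "E09": 0.07, "E10": 0.15}
top, share = watchlist(scores, top_fraction=0.20)
print(top, f"{share:.0%}")
```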

What HR Actually Changed

The analysis pointed at two concrete levers, not vague culture talk:

  1. Overtime management. R&D, where overtime was heaviest, got hiring backfill and a workload audit. Within two quarters, sustained overtime dropped 22%.
  2. Targeted retention for the watchlist. Managers had structured stay-conversations with high-risk, high-performing employees, focused on pay banding and satisfaction blockers.

Twelve months later, voluntary turnover fell from 16.1% to 11.4%. The estimated saving in replacement costs was over $2.1M.

Pitfalls in People Analytics

  • Survivorship bias. Your data only describes people still in the system plus recent leavers. Long-gone patterns are invisible — interpret cautiously.
  • Ethics and fairness. We deliberately excluded gender, ethnicity, and marital status from the model. Predicting attrition is fine; building a model that could enable discrimination is not. Decide this before you analyze.
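  • Acting on correlation as if it were cause. The model tells you overtime is associated with leaving. The overtime reduction was the real test of whether cutting it lowers attrition: turnover fell afterward, which is why we trust the finding, though a before-and-after comparison without a control group is supporting evidence rather than proof.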
  • Black-box scores with no story. A risk score nobody understands gets ignored. We always pair the number with the two or three reasons behind it.
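One simple way to attach reasons to a score, sketched here as an assumption rather than the method used in this project, is to rank each predictor by how much it pushes an employee's log-odds above a reference profile. The coefficients are the natural logs of the reported odds ratios; the reference profile and the example employee are hypothetical.

```python
import math

# Log-odds coefficients recovered from the reported odds ratios: b = ln(Exp(B))
COEFS = {
    "OverTime": math.log(3.41),
    "JobSatisfaction": math.log(0.66),
    "MonthlyIncomeK": math.log(0.92),
    "YearsAtCompany": math.log(0.93),
    "Age": math.log(0.97),
}

# Hypothetical reference profile (for example workforce averages); NOT from the case study
REFERENCE = {"OverTime": 0, "JobSatisfaction": 3, "MonthlyIncomeK": 6.5,
             "YearsAtCompany": 7, "Age": 37}


def top_reasons(employee, k=3):
    """Rank predictors by how much they raise this employee's log-odds
    of leaving relative to the reference profile."""
    contrib = {name: b * (employee[name] - REFERENCE[name]) for name, b in COEFS.items()}
    risky = [(name, c) for name, c in contrib.items() if c > 0]
    return sorted(risky, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical employee
emp = {"OverTime": 1, "JobSatisfaction": 1, "MonthlyIncomeK": 4.0,
       "YearsAtCompany": 2, "Age": 29}
for name, c in top_reasons(emp):
    print(f"{name}: +{c:.2f} log-odds")
```

For this made-up employee the list comes out as overtime, low satisfaction, and short tenure, which is the kind of plain-language story a manager can act on.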

Reporting It

A binary logistic regression predicted voluntary attrition from income, tenure, age, overtime, job satisfaction, and commute distance, χ²(7) = 198.4, p < .001, Nagelkerke R² = .27. Overtime was the strongest predictor; employees working overtime had 3.41 times the odds of leaving (OR = 3.41, 95% CI [2.4, 4.8], p < .001), while each additional point of job satisfaction reduced the odds of leaving by 34% (OR = 0.66, p < .001).


Sitting on HR data and unsure what it's telling you? Insighter Digital builds attrition and engagement models in SPSS and Python — ethically scoped, clearly explained, and tied to actions your managers can take. Reach out.
