Market Segmentation Case Study: Finding Customer Groups With Cluster Analysis in SPSS
A specialty coffee retailer with both an online store and a loyalty app wanted to stop sending the same email to 40,000 customers. The marketing lead's question was direct: "Are there natural groups in our customers, and what should we say to each?" This is the cluster analysis we ran to answer it — a complete, reproducible segmentation in SPSS.
What Cluster Analysis Does
Most analysis you do is supervised — you have an outcome (churn, GPA, sales) and you predict it. Segmentation is unsupervised: there is no target. You hand the algorithm customer attributes and it finds groups where members are similar to each other and different from other groups. K-means is the most common method and the one we used.
The Variables (an RFM-Style Setup)
We pulled three behavioral metrics per customer — the classic RFM framework, which works remarkably well for retail:
- Recency: days since last purchase (lower = more recent)
- Frequency: number of orders in the past 12 months
- Monetary: total spend in the past 12 months ($)
We deliberately used behavior, not demographics. What people do segments far better than how old they are.
Step 1: Standardize the Variables (Non-Negotiable)
K-means measures distance between customers. If one variable is in dollars (0–5,000) and another in order counts (0–30), the dollar variable will dominate the distance purely because its numbers are bigger. The fix is to convert everything to a common scale.
Analyze > Descriptive Statistics > Descriptives → tick Save standardized values as variables.
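This creates Z-scores (mean 0, SD 1) for Recency, Frequency, and Monetary, saved as new variables with a Z prefix. We clustered on those Z-scored versions. Skipping this step is the single most common cluster-analysis mistake: nothing errors out, but the segments end up driven almost entirely by the variable with the largest numeric range.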
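To see why scale matters, here is a minimal Python sketch of the same standardization. The five customers are made-up illustrative values, not the retailer's data, and the helper names (`zscores`, `distance`) are our own. It uses the sample standard deviation (n - 1 denominator).

```python
import math
import statistics

def zscores(values):
    """Standardize to mean 0, SD 1 using the sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def distance(i, j, columns):
    """Euclidean distance between customers i and j over the given columns."""
    return math.sqrt(sum((col[i] - col[j]) ** 2 for col in columns))

# Illustrative (invented) customers: orders and spend over 12 months.
frequency = [2, 4, 24, 10, 20]
monetary = [1500, 300, 1900, 500, 1700]

raw = [frequency, monetary]
scaled = [zscores(frequency), zscores(monetary)]

# Compare customer 0 with customer 1 (similar order count, very different spend)
# and with customer 2 (very different order count, closer spend).
for name, cols in (("raw", raw), ("z-scored", scaled)):
    print(name, round(distance(0, 1, cols), 2), round(distance(0, 2, cols), 2))
```

On the raw values, customer 0 looks closer to customer 2 because the $400 spend gap is all the distance "sees", and the 22-order gap barely registers. After z-scoring, both variables count and customer 0 ends up closer to customer 1.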
Step 2: Decide How Many Clusters (K)
K-means makes you choose K in advance. We used two checks:
- Business sense: marketing could realistically run 3–5 distinct campaigns, not 12.
- The elbow method: we ran K-means for K = 2 through 7 and recorded the within-cluster variation (the sum of squared distances from each customer to its cluster center) each time. Plotted against K, the drop flattened sharply after K = 4, the "elbow." More clusters barely improved fit while making the strategy unmanageable.
We settled on 4 clusters.
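The elbow check can be sketched outside SPSS as well. The Python below is a rough illustration, not the procedure SPSS runs. It uses a plain Lloyd's-algorithm K-means with a farthest-point starting rule (our assumption, chosen to keep the example deterministic) and synthetic data with four well-separated groups standing in for the standardized RFM scores.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Pick k spread-out starting centers: each new one is the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=25):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wcss

# Synthetic stand-in data: four tight groups in three dimensions.
rng = random.Random(42)
blob_centers = [(0, 0, 0), (6, 6, 0), (6, 0, 6), (0, 6, 6)]
points = [tuple(rng.gauss(m, 0.5) for m in center)
          for center in blob_centers for _ in range(50)]

# Record within-cluster variation for K = 2 through 7 and look for the elbow.
wcss_by_k = {k: kmeans(points, k)[1] for k in range(2, 8)}
for k in range(2, 8):
    print(k, round(wcss_by_k[k], 1))
```

With data built this way, the within-cluster sum of squares falls steeply up to K = 4 and only marginally afterwards, which is the flattening pattern described above. Real customer data is rarely this clean, so the elbow is usually a judgment call.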
Step 3: Run K-Means
Analyze > Classify > K-Means Cluster
- Variables: the three standardized scores (ZRecency, ZFrequency, ZMonetary)
- Number of Clusters: 4
- Click Save → tick Cluster Membership (writes each customer's cluster number to a new variable, named QCL_1 by default)
- Click Options → tick Initial cluster centers and ANOVA table
- Click Iterate → raise Maximum Iterations from the default of 10 to 25 so the algorithm has room to converge
Did the Variables Actually Discriminate?
The ANOVA table in the output shows whether each variable differs across clusters. All three (Recency, Frequency, Monetary) had p < .001, consistent with each variable contributing to the separation. (Note: treat this ANOVA as descriptive, not as a formal significance test. The clusters were built to maximize between-group differences, so small p-values are expected. It is still a useful sanity check for spotting a variable that does not separate the groups at all.)
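For readers who want to see what sits behind that table, the F value for one variable is the between-cluster mean square divided by the within-cluster mean square. A minimal Python sketch, with a hypothetical helper name and toy numbers rather than the case-study data:

```python
def f_statistic(values, labels):
    """One-way ANOVA F: between-cluster mean square over within-cluster mean square."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

clusters = [1, 1, 1, 2, 2, 2]
separating = [1, 2, 3, 7, 8, 9]   # differs clearly between the two clusters
overlapping = [1, 8, 3, 2, 9, 7]  # cluster means are close, spread is wide

print(f_statistic(separating, clusters))
print(f_statistic(overlapping, clusters))
```

A variable that separates the clusters produces a large F, while one that overlaps heavily produces an F near or below 1. Because the clusters were formed from these same variables, the size of F is best read as a ranking of which variables did the separating, not as a hypothesis test.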
Step 4: Profile and Name the Segments
A cluster number means nothing to a marketer. The real work is profiling each group on the original (un-standardized) variables so the numbers are interpretable, then giving each a name.
Analyze > Compare Means > Means, with the original Recency/Frequency/Monetary as dependents and Cluster Membership as the independent:
| Cluster | n | Mean Recency (days) | Mean Frequency (orders) | Mean Monetary ($) | Name |
|---|---|---|---|---|---|
| 1 | 6,200 | 18 | 22 | 1,840 | Champions |
| 2 | 11,500 | 47 | 9 | 520 | Loyal Regulars |
| 3 | 9,800 | 142 | 3 | 180 | Occasional Buyers |
| 4 | 12,500 | 240 | 1 | 60 | At-Risk / Dormant |
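The profiling step is a group-by on the original variables. As a rough Python equivalent of the Compare Means output, assuming each customer row carries the saved cluster number (the rows and the `profile_clusters` helper here are hypothetical):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: original (unstandardized) values plus the saved cluster number.
customers = [
    {"cluster": 1, "recency": 10, "frequency": 20, "monetary": 1500.0},
    {"cluster": 1, "recency": 20, "frequency": 24, "monetary": 2100.0},
    {"cluster": 2, "recency": 200, "frequency": 1, "monetary": 40.0},
    {"cluster": 2, "recency": 260, "frequency": 2, "monetary": 90.0},
    {"cluster": 2, "recency": 230, "frequency": 1, "monetary": 50.0},
]

def profile_clusters(rows, variables=("recency", "frequency", "monetary")):
    """Return {cluster: {"n": count, variable: mean, ...}} on the original scale."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["cluster"]].append(row)
    return {
        cluster: {"n": len(members),
                  **{var: mean(r[var] for r in members) for var in variables}}
        for cluster, members in sorted(grouped.items())
    }

profiles = profile_clusters(customers)
for cluster, stats in profiles.items():
    print(cluster, stats)
```

Profiling on the original scale rather than on Z-scores is what makes the table readable: "18 days" and "$1,840" mean something to a marketer, while "-1.2" does not.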
The segments told a clear story:
- Champions buy often, recently, and spend the most: about 16% of customers, yet roughly 57% of 12-month spend if you multiply each cluster's n by its average Monetary value in the table above.
- Loyal Regulars are steady mid-value customers.
- Occasional Buyers dip in now and then.
- At-Risk / Dormant last bought about 8 months ago on average (240 days), and many have probably already churned.
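How concentrated revenue is among Champions can be checked directly from the profile table. The short Python calculation below assumes the Monetary column is each cluster's mean 12-month spend, so n times that mean approximates the cluster's total spend.

```python
# (name, customers, mean 12-month spend in $) taken from the profile table.
segments = [
    ("Champions", 6200, 1840),
    ("Loyal Regulars", 11500, 520),
    ("Occasional Buyers", 9800, 180),
    ("At-Risk / Dormant", 12500, 60),
]

total_customers = sum(n for _, n, _ in segments)
total_spend = sum(n * spend for _, n, spend in segments)

# Share of customers and share of spend for each segment.
shares = {
    name: (n / total_customers, n * spend / total_spend)
    for name, n, spend in segments
}
for name, (cust_share, spend_share) in shares.items():
    print(f"{name:<18} {cust_share:6.1%} of customers, {spend_share:6.1%} of spend")
```

Under that assumption, Champions are 15.5% of the 40,000 customers but account for roughly 57% of the estimated $19.9 million in 12-month spend, while At-Risk / Dormant customers are about 31% of the base and under 4% of spend.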
Step 5: Turn Segments Into Strategy
This is where segmentation pays for itself. Each group got a different message:
- Champions → early access to new roasts, a referral program, VIP treatment. Do not discount — they already pay full price.
- Loyal Regulars → a subscription nudge to lift frequency, and cross-sells based on past orders.
- Occasional Buyers → a time-limited bundle to build a habit.
- At-Risk / Dormant → an aggressive win-back ("we miss you, 25% off") — the only group where deep discounting made sense.
Over the next quarter, the segment-specific campaigns lifted overall email revenue by 31%, and unsubscribe rates fell. We attribute both to people receiving offers that were actually relevant to them.
Pitfalls We Watched For
- Not standardizing. Covered above — always Z-score first.
- Chasing too many clusters. A statistically "better" 8-cluster solution that the business can't act on is worse than a clean 4.
- Outliers hijacking clusters. A handful of corporate bulk-buyers can form a tiny meaningless cluster. We capped extreme Monetary values before clustering.
- Treating clusters as permanent. Customers move between segments over time. We re-run the segmentation every quarter and track migration (a Champion slipping toward At-Risk is an early warning).
- Sensitivity to starting points. K-means can land on slightly different solutions depending on the initial centers, and in SPSS the default initial centers depend on the order of cases in the file. We ran it several times to confirm the four-segment structure was stable.
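Two of these safeguards are easy to sketch in Python. The cap level (99th percentile, nearest-rank) is our assumption, since the exact cutoff we used is not stated here, and the function names and customer IDs are hypothetical. Migration is tracked by segment name rather than cluster number, because cluster numbers are arbitrary and can change between quarterly runs.

```python
from collections import Counter

def cap_at_percentile(values, pct=99):
    """Cap values above the pct-th percentile (nearest-rank) at that percentile."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]

def migration_counts(previous, current):
    """Count customers by (last quarter's segment, this quarter's segment)."""
    return Counter((previous[cid], current[cid])
                   for cid in previous if cid in current)

# One bulk buyer among otherwise ordinary spend values (illustrative numbers).
spend = list(range(1, 100)) + [50000]
capped = cap_at_percentile(spend)
print(max(spend), max(capped))

# Segment names per customer ID for two consecutive quarters (illustrative).
last_quarter = {"c1": "Champions", "c2": "Champions", "c3": "Loyal Regulars"}
this_quarter = {"c1": "Champions", "c2": "At-Risk / Dormant", "c3": "Loyal Regulars"}
moves = migration_counts(last_quarter, this_quarter)
for (before, after), count in moves.items():
    print(f"{before} -> {after}: {count}")
```

Capping keeps the bulk buyer in the data while stopping one extreme value from pulling a cluster center toward it. The migration counts give a simple early-warning view, such as a Champion moving to At-Risk / Dormant.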
When to Use a Different Method
K-means assumes roughly spherical, similarly sized clusters and needs you to pick K. If you don't know K and your dataset is small, hierarchical clustering (Analyze > Classify > Hierarchical Cluster) with a dendrogram is a better starting point. For mixed categorical-and-numeric data, two-step cluster (Analyze > Classify > TwoStep Cluster) handles it and even suggests K automatically.