How to Handle Missing Values in SPSS (Listwise, Pairwise, Imputation)
Missing data is one of the most common problems in research. Participants skip survey questions, sensors fail to record measurements, and databases have gaps. How you handle these missing values can dramatically affect your results, and doing it wrong can bias your findings or get your thesis rejected.
This guide covers the main methods SPSS offers for dealing with missing data, from simple deletion to advanced imputation techniques.
Why Missing Data Matters
Missing data is not just an inconvenience. It creates three serious problems:
- Reduced sample size: Every missing value shrinks your effective sample, reducing statistical power
- Biased estimates: If the missingness is systematic rather than purely random, your results may over- or under-estimate the true values
- Inconsistent analyses: Different SPSS procedures handle missing data differently by default, which can produce inconsistent sample sizes across your results chapter
Before choosing a handling method, you need to understand why the data is missing.
Types of Missing Data
Statisticians classify missing data into three categories, and the type determines which handling method is appropriate:
Missing Completely at Random (MCAR)
The probability of a value being missing is unrelated to both observed and unobserved data. For example, a participant accidentally skips a question because they turned two pages at once.
Test for MCAR: Use Little's MCAR test in SPSS (Analyze → Missing Value Analysis). If p > .05, the test finds no evidence against MCAR. A non-significant result is consistent with MCAR but does not prove it.
Missing at Random (MAR)
The probability of a value being missing is related to other observed variables but not to the missing value itself. For example, younger participants are more likely to skip income questions, but within each age group, the missingness is unrelated to actual income.
MAR cannot be directly tested, but it is a reasonable assumption in most research settings when you can identify variables that predict missingness.
Missing Not at Random (MNAR)
The probability of a value being missing is related to the missing value itself. For example, people with very high incomes are more likely to refuse to report their income.
MNAR is the most problematic type and cannot be fully addressed with standard imputation methods.
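To make the three mechanisms concrete, here is a small Python simulation (not SPSS). The population, the income formula, and every missingness probability are invented for illustration. It compares the mean of the incomes that remain observed under each mechanism with the true mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: age in years, income loosely increasing with age.
n = 20_000
ages = [random.uniform(20, 60) for _ in range(n)]
incomes = [30_000 + 500 * (a - 20) + random.gauss(0, 8_000) for a in ages]


def observed_mean(values, p_missing):
    """Mean of the values left after dropping value i with probability p_missing[i]."""
    kept = [v for v, p in zip(values, p_missing) if random.random() >= p]
    return statistics.fmean(kept)


true_mean = statistics.fmean(incomes)

# MCAR: every income has the same 30% chance of being missing.
mcar_mean = observed_mean(incomes, [0.30] * n)

# MAR: missingness depends only on age (which is observed); the young skip more.
mar_mean = observed_mean(incomes, [0.50 if a < 40 else 0.10 for a in ages])

# MNAR: missingness depends on the income value itself; high earners refuse more.
mnar_mean = observed_mean(incomes, [0.60 if inc > 45_000 else 0.10 for inc in incomes])

print(f"true {true_mean:.0f}  MCAR {mcar_mean:.0f}  MAR {mar_mean:.0f}  MNAR {mnar_mean:.0f}")
```

In this setup the MCAR mean stays close to the truth, while the MAR and MNAR means drift. The difference is that the MAR drift can be corrected using the observed ages, whereas the MNAR drift depends on values you never see.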
Step 1: Diagnose the Missing Data
Before handling missing values, you need to understand the extent and pattern.
Check the Extent
- Go to Analyze → Descriptive Statistics → Frequencies
- Select all your variables
- Look at the Missing count and percentage for each variable
Rules of thumb:
- Less than 5% missing on a variable: Generally not a serious problem
- 5–20% missing: Needs careful handling
- More than 20% missing: Consider dropping the variable or using advanced methods
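If you export your data, the same check can be sketched outside SPSS. The Python below is illustrative only: the records, variable names, and the `classify` helper are hypothetical, and the thresholds simply encode the rules of thumb above.

```python
# Hypothetical survey records; None marks a missing answer.
records = [
    {"age": 34, "income": 52000, "stress": 3},
    {"age": 29, "income": None, "stress": 4},
    {"age": 41, "income": 61000, "stress": None},
    {"age": 25, "income": None, "stress": 2},
    {"age": 38, "income": 48000, "stress": 5},
    {"age": 52, "income": 75000, "stress": 3},
    {"age": 47, "income": None, "stress": 4},
    {"age": 31, "income": 39000, "stress": 2},
    {"age": 36, "income": 56000, "stress": 3},
    {"age": 44, "income": 67000, "stress": 1},
]


def classify(pct_missing):
    """Map a missing percentage to the rule-of-thumb label used in this guide."""
    if pct_missing < 5:
        return "generally not a serious problem"
    if pct_missing <= 20:
        return "needs careful handling"
    return "consider dropping the variable or using advanced methods"


def missing_report(rows):
    """Return {variable: (missing count, missing percent, label)}."""
    n = len(rows)
    report = {}
    for var in rows[0]:
        count = sum(1 for row in rows if row[var] is None)
        pct = 100 * count / n
        report[var] = (count, pct, classify(pct))
    return report


report = missing_report(records)
for var, (count, pct, label) in report.items():
    print(f"{var:<8} missing={count:>2} ({pct:.1f}%)  {label}")
```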
Run Missing Value Analysis
- Go to Analyze → Missing Value Analysis
- Move your scale variables into the Quantitative Variables box
- In the Estimation group, check EM
- Click OK
SPSS produces a summary table showing the extent of missing data for each variable. Little's MCAR test is reported as a footnote beneath the EM estimated statistics tables.
Check Missing Data Patterns
In the Missing Value Analysis dialog, click the Patterns button and request a tabulated patterns table before running the analysis. The resulting output shows which combinations of variables have missing values together. This helps you understand whether missing data is concentrated in certain cases or scattered randomly.
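The idea behind a pattern table can be sketched in a few lines of Python. This is not the SPSS output, just an illustration with hypothetical rows: each case is reduced to the set of variables it is missing, and identical sets are counted.

```python
from collections import Counter

# Hypothetical cases; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "stress": 3},
    {"age": 29, "income": None, "stress": 4},
    {"age": 41, "income": None, "stress": None},
    {"age": 25, "income": None, "stress": 2},
    {"age": 38, "income": 48000, "stress": 5},
    {"age": 52, "income": 75000, "stress": 1},
]


def missing_patterns(rows):
    """Count how many cases share each combination of missing variables."""
    counts = Counter(
        tuple(var for var, value in row.items() if value is None) for row in rows
    )
    return counts.most_common()


patterns = missing_patterns(rows)
for pattern, n_cases in patterns:
    label = ", ".join(pattern) if pattern else "(complete)"
    print(f"{n_cases} case(s) missing: {label}")
```

A few dominant patterns suggest missingness concentrated in particular cases or questionnaire sections; many small patterns suggest scattered gaps.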
Step 2: Choose a Handling Method
Method 1: Listwise Deletion (Complete Case Analysis)
Listwise deletion removes any case that has a missing value on any variable in the analysis. This is the default method in many SPSS procedures, such as Linear Regression.
When to use it:
- Data is MCAR
- The percentage of missing data is small (less than 5%)
- Your sample is large enough that losing cases does not reduce power below acceptable levels
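How it works in SPSS: Procedures such as Linear Regression use listwise deletion by default, so you do not need to change anything. Some procedures, including Bivariate Correlations, default to pairwise exclusion instead, so check the Options dialog of each procedure. SPSS will report the effective sample size used in each analysis.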
Pros: Simple, produces unbiased estimates when data is MCAR
Cons: Can dramatically reduce sample size, wastes observed data, can be biased when data is not MCAR
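A short Python sketch (hypothetical data and function names, not SPSS) shows both what listwise deletion does and why it can cost so many cases. The second function assumes each variable is missing independently with the same probability, which is a simplification.

```python
def listwise(rows, variables):
    """Keep only the cases with no missing value on any analysis variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


def expected_complete_fraction(p_missing, n_variables):
    """Expected share of complete cases if every variable is independently
    missing with probability p_missing (a simplifying assumption)."""
    return (1 - p_missing) ** n_variables


# Hypothetical cases; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "stress": 3},
    {"age": 29, "income": None, "stress": 4},
    {"age": 41, "income": 61000, "stress": None},
    {"age": 25, "income": 45000, "stress": 2},
]

complete = listwise(rows, ["age", "income", "stress"])
print(len(complete), "of", len(rows), "cases survive listwise deletion")
print(f"expected complete share: {expected_complete_fraction(0.05, 10):.3f}")
```

Under that independence assumption, ten variables with 5% missing each leave only about 60% of cases complete, even though no single variable looks problematic.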
Method 2: Pairwise Deletion
Pairwise deletion uses all available data for each pair of variables. Instead of removing a case entirely, it only excludes it from specific calculations where the value is missing.
When to use it:
- Computing correlation or covariance matrices
- You want to maximize the use of available data
- Data is MCAR
How to enable it:
- In Correlations (Analyze → Correlate → Bivariate): Click Options; under Missing Values, Exclude cases pairwise is already the default
- In Regression (Analyze → Regression → Linear): Click Options; under Missing Values, select Exclude cases pairwise
Pros: Preserves more data than listwise deletion
Cons: Different analyses use different subsets of cases, which can produce inconsistent results and non-positive-definite correlation matrices
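The sketch below illustrates pairwise deletion in Python (not SPSS; the three short variables are made up). Each correlation uses every case where both variables are present, so the sample size changes from one pair to the next.

```python
import math


def pairwise_corr(x, y):
    """Pearson r using only the cases where both values are present.
    Returns (r, number of pairs used)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    return s_ab / math.sqrt(s_aa * s_bb), n


# Hypothetical variables; None marks a missing value.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, None, 8.2, 9.8, 12.1]
z = [6.2, None, 4.1, 2.9, None, 1.0]

r_xy, n_xy = pairwise_corr(x, y)
r_xz, n_xz = pairwise_corr(x, z)
r_yz, n_yz = pairwise_corr(y, z)
print(f"x-y: r={r_xy:.3f} (n={n_xy})")
print(f"x-z: r={r_xz:.3f} (n={n_xz})")
print(f"y-z: r={r_yz:.3f} (n={n_yz})")
```

Because each coefficient rests on a different subset of cases, the resulting matrix is not guaranteed to be internally consistent, which is the source of the non-positive-definite problem noted above.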
Method 3: Mean Substitution
Mean substitution replaces each missing value with the mean of the observed values for that variable.
How to do it:
- Go to Transform → Replace Missing Values
- Select the variable(s)
- Choose Series mean as the method
- Click OK
SPSS creates new variables with the suffix _1 that have the missing values replaced.
Pros: Simple, preserves sample size
Cons: Artificially reduces variance, weakens correlations, underestimates standard errors, produces biased results in most situations. Most methodologists advise against this method.
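The variance shrinkage is easy to demonstrate. The Python sketch below (hypothetical values, not SPSS output) fills the gaps with the observed mean and compares the sample variance before and after.

```python
import statistics

# Hypothetical scores; None marks a missing value.
values = [4, 7, None, 5, 9, None, 6, 8, None, 5]

observed = [v for v in values if v is not None]
mean_observed = statistics.fmean(observed)

# Mean substitution: every gap receives the same value.
filled = [mean_observed if v is None else v for v in values]

var_observed = statistics.variance(observed)
var_filled = statistics.variance(filled)

print(f"variance of observed values: {var_observed:.3f}")
print(f"variance after substitution: {var_filled:.3f}")
```

The imputed values add nothing to the sum of squared deviations while still increasing the case count, so the variance shrinks by the factor (n_observed - 1) / (n_total - 1), here 6/9.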
Method 4: Regression Imputation
Regression imputation predicts the missing value using a regression equation based on other variables in the dataset.
How to do it:
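- Go to Analyze → Missing Value Analysis
- Move your variables into the analysis
- In the Estimation group, check Regression
- Click the Regression button and choose to save the completed data to a new dataset
Note that Linear trend at point under Transform → Replace Missing Values is a different technique: it regresses the series on its case order, so it suits time series data rather than imputation from other variables.
Pros: Produces better estimates than mean substitution because it uses information from related variables
Cons: Plain regression imputation still underestimates variability because every imputed value falls exactly on the regression line. Adding a random residual to each prediction helps, but the result is still a single imputation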
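The following Python sketch illustrates the idea with one predictor. It is not the SPSS implementation: `fit_line` and `regression_impute` are hypothetical helpers, the data are invented, and the `stochastic` switch simply adds a normally distributed residual to show how the "on the line" problem can be softened.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    resid_sd = math.sqrt(sse / (len(xs) - 2))
    return intercept, slope, resid_sd


def regression_impute(xs, ys, stochastic=False, seed=0):
    """Fill missing ys from a line fitted on the complete cases.
    With stochastic=True a random residual is added to each prediction."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope, resid_sd = fit_line(*zip(*complete))
    filled = []
    for x, y in zip(xs, ys):
        if y is None:
            y = intercept + slope * x
            if stochastic:
                y += rng.gauss(0, resid_sd)
        filled.append(y)
    return filled


# Hypothetical data: study hours (complete) and exam score (two gaps).
hours = [2, 4, 5, 7, 8, 10, 11, 13]
score = [51, 58, None, 66, 71, None, 80, 88]

deterministic = regression_impute(hours, score)
stochastic = regression_impute(hours, score, stochastic=True)
print([round(v, 1) for v in deterministic])
print([round(v, 1) for v in stochastic])
```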
Method 5: Multiple Imputation (Recommended)
Multiple imputation is the gold standard for handling missing data. It creates multiple complete datasets, each with slightly different imputed values that reflect the uncertainty of the missing data. SPSS then pools the results across all imputed datasets.
How to run it:
- Go to Analyze → Multiple Imputation → Impute Missing Data Values
- On the Variables tab, move your variables into the model
- Still on the Variables tab, set the number of imputations to at least 5 (20 is recommended for higher missing rates) and choose where to save the imputed data
- On the Method tab, choose Automatic (SPSS scans the data and picks a suitable method) or Custom if you want control
- On the Constraints tab, optionally set the role for each variable (impute, use as predictor, or both)
- Click OK
Running analyses on imputed data:
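After imputation, the new dataset contains the original data plus one complete copy of each case per imputation, identified by the Imputation_ variable. When you run analyses:
- Go to Analyze and run your test as usual
- For procedures that support pooling, SPSS produces results for each imputation plus pooled results that combine estimates across all imputations
- If no pooled rows appear, check that the file is split by Imputation_ (Data → Split File)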
Pros: Produces unbiased estimates under MAR, properly accounts for uncertainty, recognized as best practice by most journals and supervisors
Cons: More complex to run and report, requires understanding of the pooled output
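To help read the pooled output, here is a minimal Python sketch of Rubin's rules for a single parameter. The function name and the five estimates are hypothetical; SPSS performs this pooling for you, and the sketch leaves out the degrees-of-freedom adjustment used for significance tests.

```python
import math
import statistics


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed datasets.
    Returns (pooled estimate, pooled standard error)."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = statistics.fmean([se ** 2 for se in standard_errors])
    # Between-imputation variance: how much the estimates disagree.
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical regression coefficient estimated in five imputed datasets.
estimates = [0.42, 0.45, 0.39, 0.44, 0.40]
standard_errors = [0.10, 0.11, 0.10, 0.09, 0.10]

pooled_b, pooled_se = pool_rubin(estimates, standard_errors)
print(f"pooled b = {pooled_b:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than the average per-dataset standard error whenever the imputations disagree, which is how multiple imputation carries the uncertainty about the missing values into your results.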
Method 6: Expectation-Maximization (EM)
The EM algorithm estimates parameters (means, variances, covariances) directly from the incomplete data using maximum likelihood. Its main output is an estimate of what the statistics would be if the data were complete, rather than a set of imputed datasets, although SPSS can optionally save a single EM-completed dataset.
How to run it:
- Go to Analyze → Missing Value Analysis
- Move variables into the analysis
- In the Estimation group, check EM, then click the EM button and confirm that Normal is selected as the distribution
- Click OK
Pros: Efficient, produces consistent estimates under MAR
Cons: Yields parameter estimates rather than multiply imputed datasets, so it cannot be used as input for all procedures. Standard errors may be underestimated
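For intuition, the Python sketch below runs EM on the simplest useful case: two roughly normal variables where x is complete and some y values are missing. It is not the SPSS algorithm; the function name, the simulated data, and the missingness rule are all assumptions made for illustration.

```python
import random
import statistics


def em_mean_y(xs, ys, n_iter=500):
    """EM estimate of the mean of y when x is complete and some ys are None,
    assuming the two variables are bivariate normal."""
    n = len(xs)
    mu_x = statistics.fmean(xs)
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    observed_y = [y for y in ys if y is not None]
    # Starting values: complete-case mean and variance of y, zero covariance.
    mu_y = statistics.fmean(observed_y)
    s_yy = statistics.pvariance(observed_y)
    s_xy = 0.0
    for _ in range(n_iter):
        slope = s_xy / s_xx
        resid_var = s_yy - s_xy ** 2 / s_xx
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                # E-step: expected y and y squared given x and current parameters.
                ey = mu_y + slope * (x - mu_x)
                eyy = ey ** 2 + resid_var
            else:
                ey, eyy = y, y ** 2
            sum_y += ey
            sum_yy += eyy
            sum_xy += x * ey
        # M-step: update the parameters from the expected sums.
        mu_y = sum_y / n
        s_yy = sum_yy / n - mu_y ** 2
        s_xy = sum_xy / n - mu_x * mu_y
    return mu_y


random.seed(7)
xs = [random.gauss(50, 10) for _ in range(500)]
full_ys = [2 * x + random.gauss(0, 5) for x in xs]
# MAR by construction: y is far more likely to be missing when x is high.
ys = [None if random.random() < (0.8 if x > 55 else 0.1) else y
      for x, y in zip(xs, full_ys)]

full_mean = statistics.fmean(full_ys)
cc_mean = statistics.fmean([y for y in ys if y is not None])
em_mean = em_mean_y(xs, ys)
print(f"full data {full_mean:.2f}  complete cases {cc_mean:.2f}  EM {em_mean:.2f}")
```

In this simulated MAR setting the complete-case mean is pulled down because high-x cases are mostly missing, while the EM estimate uses the observed x values to recover most of the lost information.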
Decision Flowchart: Which Method to Use
- Is less than 5% of data missing and MCAR holds? → Listwise deletion is usually fine
- Is 5–20% missing and MCAR or MAR holds? → Use multiple imputation
- Is more than 20% missing? → Use multiple imputation with 20+ imputations, and consider whether the variable should be included at all
- Is data MNAR? → No standard method fully addresses MNAR. Use multiple imputation as a starting point and conduct sensitivity analyses
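The flowchart can be written as a small helper, sketched here in Python. The function name and return strings are hypothetical. One case is not covered by the flowchart (under 5% missing with MAR rather than MCAR), and the sketch assumes multiple imputation there.

```python
def recommend_method(pct_missing, mechanism):
    """Return the flowchart's suggestion for a missing percentage (0-100)
    and an assumed mechanism ('MCAR', 'MAR', or 'MNAR')."""
    mechanism = mechanism.upper()
    if mechanism not in {"MCAR", "MAR", "MNAR"}:
        raise ValueError("mechanism must be MCAR, MAR, or MNAR")
    if mechanism == "MNAR":
        return "multiple imputation as a starting point, plus sensitivity analyses"
    if pct_missing > 20:
        return "multiple imputation with 20+ imputations; reconsider including the variable"
    if pct_missing < 5 and mechanism == "MCAR":
        return "listwise deletion"
    # Covers 5-20% under MCAR or MAR, and (by assumption) under 5% with MAR.
    return "multiple imputation"


for pct, mech in [(3, "MCAR"), (12, "MAR"), (35, "MAR"), (8, "MNAR")]:
    print(f"{pct:>2}% missing, {mech}: {recommend_method(pct, mech)}")
```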
How to Report Missing Data Handling
Your methods section should include:
- The extent of missing data (percentage per variable and overall)
- The results of Little's MCAR test
- The method you used and why
- The software and settings used
Example Report
Missing data ranged from 0.5% to 8.3% across study variables. Little's MCAR test was not significant, χ²(42) = 48.16, p = .238, providing no evidence against the assumption that the data was missing completely at random. Multiple imputation was performed using SPSS Version 29, generating 20 imputed datasets. All variables in the analysis model were included in the imputation model to make the missing at random assumption more plausible. Pooled parameter estimates are reported following Rubin's rules.
Common Mistakes to Avoid
- Ignoring missing data entirely — Simply hoping SPSS handles it is not a strategy. Reviewers will ask what you did
- Using mean substitution — Despite being easy, it biases almost every statistic. Avoid it
- Not reporting how missing data was handled — This is a required element in the methods section of any thesis or journal article
- Imputing the dependent variable for cases not in the analysis — Only impute variables that are part of your analysis model
- Using too few imputations — Five imputations was the traditional recommendation, but current best practice suggests 20 or more when the missing rate exceeds 10%
- Forgetting to include auxiliary variables — Variables that predict missingness or are correlated with missing variables should be included in the imputation model even if they are not in the analysis model
Quick Reference Table
| Method | Best For | Bias Risk | Complexity |
|---|---|---|---|
| Listwise deletion | Small amounts of MCAR missingness | Low (if MCAR) | Low |
| Pairwise deletion | Correlation matrices | Moderate | Low |
| Mean substitution | Never recommended | High | Low |
| Regression imputation | Single imputation | Moderate | Medium |
| Multiple imputation | Most situations | Low | Medium-High |
| EM algorithm | Parameter estimation | Low | Medium |