How to Handle Missing Values in SPSS (Listwise, Pairwise, Imputation)

By Mohammad Abu Sufian | 2026-05-27 | 11 min read

Missing data is one of the most common problems in research. Participants skip survey questions, sensors fail to record measurements, and databases have gaps. How you handle these missing values can dramatically affect your results, and doing it wrong can bias your findings or get your thesis rejected.

This guide covers the main methods SPSS offers for dealing with missing data, from simple deletion to advanced imputation techniques.

Why Missing Data Matters

Missing data is not just an inconvenience. It creates three serious problems:

  1. Reduced sample size — Every missing value shrinks your effective sample, reducing statistical power
  2. Biased estimates — If data is not missing at random, your results may systematically over- or under-estimate the true values
  3. Incompatible analyses — Different SPSS procedures handle missing data differently by default, which can produce inconsistent sample sizes across your results chapter

Before choosing a handling method, you need to understand why the data is missing.

Types of Missing Data

Statisticians classify missing data into three categories, and the type determines which handling method is appropriate:

Missing Completely at Random (MCAR)

The probability of a value being missing is unrelated to both observed and unobserved data. For example, a participant accidentally skips a question because they turned two pages at once.

Test for MCAR: Use Little's MCAR test in SPSS (Analyze → Missing Value Analysis). If p > .05, the data is consistent with MCAR.

Missing at Random (MAR)

The probability of a value being missing is related to other observed variables but not to the missing value itself. For example, younger participants are more likely to skip income questions, but within each age group, the missingness is unrelated to actual income.

MAR cannot be directly tested, but it is a reasonable assumption in most research settings when you can identify variables that predict missingness.

Missing Not at Random (MNAR)

The probability of a value being missing is related to the missing value itself. For example, people with very high incomes are more likely to refuse to report their income.

MNAR is the most problematic type and cannot be fully addressed with standard imputation methods.
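The three mechanisms are easier to tell apart in a small simulation. The sketch below is plain Python rather than SPSS syntax, and everything in it (the population, the income distributions, the missingness probabilities, the variable names) is a hypothetical setup chosen only to mirror the examples above.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: younger respondents earn less on average.
people = []
for _ in range(10_000):
    young = random.random() < 0.5
    income = random.gauss(40 if young else 60, 10)  # in thousands
    people.append((young, income))

true_mean = mean(inc for _, inc in people)

# MCAR: every value has the same 30% chance of being missing.
mcar = [inc for _, inc in people if random.random() > 0.30]

# MAR: missingness depends on age (observed), not on income itself.
mar = [(young, inc) for young, inc in people
       if random.random() > (0.50 if young else 0.10)]

# MNAR: missingness depends on the unobserved value (high earners refuse).
mnar = [inc for _, inc in people
        if not (inc > 60 and random.random() < 0.80)]

mcar_mean = mean(mcar)
mar_mean = mean(inc for _, inc in mar)
mnar_mean = mean(mnar)

# Within an age group, the MAR sample is still representative.
young_true = mean(inc for young, inc in people if young)
young_mar = mean(inc for young, inc in mar if young)

print(f"True mean:            {true_mean:.2f}")
print(f"MCAR observed mean:   {mcar_mean:.2f}")
print(f"MAR observed mean:    {mar_mean:.2f}")
print(f"MNAR observed mean:   {mnar_mean:.2f}")
print(f"Young, true vs MAR:   {young_true:.2f} vs {young_mar:.2f}")
```

In this setup the MCAR mean stays close to the true mean, the overall MAR mean drifts upward because young respondents are under-represented (although the mean within the young group is still accurate), and the MNAR mean is pulled down because the highest incomes are the ones that go missing.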

Step 1: Diagnose the Missing Data

Before handling missing values, you need to understand the extent and pattern.

Check the Extent

  1. Go to Analyze → Descriptive Statistics → Frequencies
  2. Select all your variables
  3. Look at the Missing count and percentage for each variable

Rules of thumb:

  • Less than 5% missing on a variable: Generally not a serious problem
  • 5–20% missing: Needs careful handling
  • More than 20% missing: Consider dropping the variable or using advanced methods
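If you also keep your data outside SPSS, the same check is easy to reproduce. This is a minimal Python sketch, assuming a hypothetical list of survey records in which `None` marks a missing response; the `missing_summary` and `rule_of_thumb` helpers are illustrative names, and the bands simply restate the rules of thumb above.

```python
# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 23, "income": 41000, "stress": 3},
    {"age": 35, "income": None, "stress": 4},
    {"age": None, "income": 52000, "stress": None},
    {"age": 51, "income": None, "stress": 2},
    {"age": 29, "income": 38000, "stress": 5},
]


def missing_summary(rows):
    """Return {variable: (missing_count, missing_percent)}."""
    summary = {}
    for var in rows[0]:
        n_missing = sum(1 for row in rows if row[var] is None)
        summary[var] = (n_missing, 100 * n_missing / len(rows))
    return summary


def rule_of_thumb(pct):
    """Map a missing percentage to the rule-of-thumb bands listed above."""
    if pct < 5:
        return "generally not a serious problem"
    if pct <= 20:
        return "needs careful handling"
    return "consider dropping the variable or using advanced methods"


summary = missing_summary(records)
for var, (count, pct) in summary.items():
    print(f"{var}: {count} missing ({pct:.1f}%) -> {rule_of_thumb(pct)}")
```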

Run Missing Value Analysis

  1. Go to Analyze → Missing Value Analysis
  2. Move your variables into the Quantitative Variables box
  3. Check EM in the Estimation group; SPSS reports Little's MCAR test as a footnote beneath the EM estimated statistics tables
  4. Click OK

SPSS produces a summary table showing the pattern and extent of missing data, plus the result of Little's MCAR test.

Check Missing Data Patterns

In the Missing Value Analysis dialog, the Patterns button requests tables that show which combinations of variables have missing values together. These tables help you see whether missing data is concentrated in certain cases or scattered across the dataset.
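The idea behind a pattern table can be sketched in a few lines of Python. This is only an illustration of the concept, not a reproduction of the SPSS output; the records and the `missing_pattern` helper are hypothetical.

```python
from collections import Counter

# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 23, "income": 41000, "stress": 3},
    {"age": 35, "income": None, "stress": 4},
    {"age": 47, "income": None, "stress": None},
    {"age": 51, "income": None, "stress": 2},
    {"age": 29, "income": 38000, "stress": 5},
    {"age": None, "income": 60000, "stress": 1},
]
variables = ["age", "income", "stress"]


def missing_pattern(row):
    """Return the tuple of variables that are missing for one case."""
    return tuple(var for var in variables if row[var] is None)


# Count how many cases share each combination of missing variables.
patterns = Counter(missing_pattern(row) for row in records)

for pattern, count in patterns.items():
    label = ", ".join(pattern) if pattern else "(complete case)"
    print(f"{count} case(s) missing: {label}")
```

A pattern that recurs often, such as income missing on its own, suggests that the problem is tied to a specific question rather than to careless respondents.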

Step 2: Choose a Handling Method

Method 1: Listwise Deletion (Complete Case Analysis)

Listwise deletion removes any case that has a missing value on any variable in the analysis. This is the default method in most SPSS procedures.

When to use it:

  • Data is MCAR
  • The percentage of missing data is small (less than 5%)
  • Your sample is large enough that losing cases does not reduce power below acceptable levels

How it works in SPSS: Most SPSS procedures use listwise deletion by default. You do not need to change anything. SPSS will report the effective sample size used in each analysis.

Pros: Simple, produces unbiased estimates when data is MCAR

Cons: Can dramatically reduce sample size, wastes observed data, biased when data is not MCAR
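To see how quickly listwise deletion can shrink a sample, consider this small Python sketch. The records and the `listwise` helper are hypothetical; the point is that the effective sample size depends on which variables enter the analysis.

```python
# Hypothetical survey records; None marks a missing response.
records = [
    {"age": 23, "income": 41000, "stress": 3},
    {"age": 35, "income": None, "stress": 4},
    {"age": 47, "income": None, "stress": None},
    {"age": 51, "income": None, "stress": 2},
    {"age": 29, "income": 38000, "stress": 5},
    {"age": None, "income": 60000, "stress": 1},
]


def listwise(rows, variables):
    """Keep only cases with no missing value on any analysis variable."""
    return [row for row in rows
            if all(row[var] is not None for var in variables)]


all_three = listwise(records, ["age", "income", "stress"])
age_stress = listwise(records, ["age", "stress"])

print(f"Original cases:              {len(records)}")
print(f"Complete on all 3 variables: {len(all_three)}")
print(f"Complete on age and stress:  {len(age_stress)}")
```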

Method 2: Pairwise Deletion

Pairwise deletion uses all available data for each pair of variables. Instead of removing a case entirely, it only excludes it from specific calculations where the value is missing.

When to use it:

  • Computing correlation or covariance matrices
  • You want to maximize the use of available data
  • Data is MCAR

How to enable it:

  • In Correlations (Analyze → Correlate → Bivariate): Click Options, then under Missing Values select Exclude cases pairwise (this is already the default in this dialog)
  • In Regression (Analyze → Regression → Linear): Click Options, then under Missing Values select Exclude cases pairwise

Pros: Preserves more data than listwise deletion

Cons: Different analyses use different subsets of cases, which can produce inconsistent results and non-positive-definite correlation matrices
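The following Python sketch illustrates why pairwise results can be hard to compare: each correlation is based on a different set of cases. The three variables and the `pairwise_corr` helper are hypothetical, and the Pearson formula is written out by hand so the example has no dependencies.

```python
from math import sqrt

# Hypothetical variables measured on the same six cases; None is missing.
a = [1, 2, 3, 4, 5, 6]
b = [2, 4, None, 8, 10, None]
c = [None, 1, 3, None, 2, 5]


def pairwise_corr(xs, ys):
    """Pearson r using every case observed on BOTH variables; returns (n, r)."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return n, sxy / sqrt(sxx * syy)


n_ab, r_ab = pairwise_corr(a, b)
n_ac, r_ac = pairwise_corr(a, c)
n_bc, r_bc = pairwise_corr(b, c)

print(f"a-b: n = {n_ab}, r = {r_ab:.3f}")
print(f"a-c: n = {n_ac}, r = {r_ac:.3f}")
print(f"b-c: n = {n_bc}, r = {r_bc:.3f}")
```

Here the a-b and a-c correlations each use four cases, but not the same four, and the b-c correlation rests on only two. Listwise deletion would have used those same two cases for every coefficient.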

Method 3: Mean Substitution

Mean substitution replaces each missing value with the mean of the observed values for that variable.

How to do it:

  1. Go to Transform → Replace Missing Values
  2. Select the variable(s)
  3. Choose Series mean as the method
  4. Click OK

SPSS creates new variables with the suffix _1 that have the missing values replaced.

Pros: Simple, preserves sample size

Cons: Artificially reduces variance, weakens correlations, underestimates standard errors, produces biased results in most situations. Most methodologists advise against this method.
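The variance problem is easy to demonstrate. In this Python sketch (the scores are made-up numbers), filling two missing values with the mean leaves the mean unchanged but shrinks the sample variance, because the filled-in values contribute no spread.

```python
from statistics import mean, variance

# Hypothetical scores; None marks a missing response.
scores = [2, 4, 6, 8, None, None]

observed = [s for s in scores if s is not None]
fill_value = mean(observed)

# Mean substitution: every missing value becomes the observed mean.
filled = [fill_value if s is None else s for s in scores]

var_observed = variance(observed)  # sample variance of the 4 real values
var_filled = variance(filled)      # sample variance after substitution

print(f"Mean before and after: {mean(observed)} / {mean(filled)}")
print(f"Variance of observed values: {var_observed:.2f}")
print(f"Variance after substitution: {var_filled:.2f}")
```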

Method 4: Regression Imputation

Regression imputation predicts the missing value using a regression equation based on other variables in the dataset.

How to do it:

  1. Go to Transform → Replace Missing Values
  2. Select the variable(s)
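  3. For time-ordered data, choose Linear trend at point, which predicts each missing value from the case sequence rather than from other variables. To predict from other variables, use the Regression estimation option under Analyze → Missing Value Analysis instead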

Pros: Produces better estimates than mean substitution because it uses information from related variables

Cons: Still underestimates variability because every imputed value falls exactly on the regression line
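As a rough illustration of the idea (not of what SPSS computes internally), the Python sketch below fits a simple least-squares line to the complete cases and uses it to fill in the gaps. The `hours` and `score` variables and the `fit_line` helper are hypothetical, and only one predictor is used to keep the arithmetic visible.

```python
from statistics import mean

# Hypothetical data: study hours (complete) and exam score (two missing).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, None, 61, None, 70]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Fit the line on complete cases only.
complete = [(h, s) for h, s in zip(hours, score) if s is not None]
intercept, slope = fit_line([h for h, _ in complete],
                            [s for _, s in complete])

# Replace each missing score with its predicted value.
completed = [intercept + slope * h if s is None else s
             for h, s in zip(hours, score)]

for h, s, filled in zip(hours, score, completed):
    flag = " (imputed)" if s is None else ""
    print(f"hours = {h}: score = {filled:.2f}{flag}")
```

Both imputed scores sit exactly on the fitted line, with none of the scatter that real observations show around it. That missing scatter is the source of the underestimated variability noted above.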

Method 5: Multiple Imputation (Recommended)

Multiple imputation is the gold standard for handling missing data. It creates multiple complete datasets, each with slightly different imputed values that reflect the uncertainty of the missing data. SPSS then pools the results across all imputed datasets.

How to run it:

  1. Go to Analyze → Multiple Imputation → Impute Missing Data Values
  2. Under the Variables tab, move your variables into the analysis
  3. Set the role for each variable (imputed variables should be set to Impute)
  4. Under the Method tab:
    • Choose Automatic (SPSS selects the best method) or Custom if you want control
    • Set the Number of imputations to at least 5 (20 is recommended for higher missing rates)
  5. Under the Output tab, select where to save the imputed datasets
  6. Click OK

Running analyses on imputed data:

After imputation, your dataset will contain multiple copies of each case: the original data plus one copy per imputation, identified by the Imputation_ variable. When you run analyses:

  1. Go to Analyze and run your test as usual
  2. For procedures that support pooling, SPSS detects the imputed datasets and produces pooled results that combine estimates across all imputations

Pros: Produces unbiased estimates under MAR, properly accounts for uncertainty, recognized as best practice by most journals and supervisors

Cons: More complex to run and report, requires understanding of the pooled output
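Understanding the pooled output is easier once you have seen the arithmetic behind it. The Python sketch below applies Rubin's rules to one coefficient estimated in five imputed datasets. The estimates and squared standard errors are made-up numbers and `pool_rubin` is an illustrative helper, so treat this as a sketch of the pooling step rather than a replica of the SPSS output.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputations using Rubin's rules.

    estimates: the parameter estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    """
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # variance across imputations (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, within, between, total


# Hypothetical regression coefficient from 5 imputed datasets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.012, 0.011, 0.009, 0.010]

pooled, within, between, total = pool_rubin(estimates, variances)

print(f"Pooled estimate:  {pooled:.3f}")
print(f"Within variance:  {within:.4f}")
print(f"Between variance: {between:.4f}")
print(f"Pooled SE:        {sqrt(total):.4f}")
```

The between-imputation term is what single imputation methods leave out: it widens the pooled standard error to reflect how much the estimate changes from one plausible set of imputed values to the next.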

Method 6: Expectation-Maximization (EM)

The EM algorithm estimates parameters (means, variances, covariances) directly from the incomplete data using maximum likelihood. It does not create a new filled-in dataset but rather estimates what the statistics would be if the data were complete.

How to run it:

  1. Go to Analyze → Missing Value Analysis
  2. Move variables into the analysis
  3. Check EM in the Estimation group, then click the EM button and select Normal as the distribution
  4. Click OK

Pros: Efficient, produces consistent estimates under MAR

Cons: Does not create imputed datasets, so it cannot be used as input for all procedures. Standard errors may be underestimated
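For readers curious about what the algorithm does, here is a toy Python sketch of EM for two variables, assuming bivariate normal data where x is fully observed and y is missing for the cases with the largest x values. The data are invented and this is not the SPSS implementation; it only shows the alternation between an expectation step and a maximization step.

```python
from statistics import mean

# Hypothetical data: x is complete, y is missing for the largest x values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, None, None, None]
n = len(x)

# x is complete, so its mean and (maximum likelihood) variance never change.
mu_x = mean(x)
s_xx = sum((xi - mu_x) ** 2 for xi in x) / n

# Starting values taken from the observed y values; covariance starts at 0.
y_obs = [yi for yi in y if yi is not None]
mu_y = mean(y_obs)
s_yy = sum((yi - mu_y) ** 2 for yi in y_obs) / len(y_obs)
s_xy = 0.0

for _ in range(200):
    # E-step: expected y, y^2 and x*y for each case, given current parameters.
    slope = s_xy / s_xx
    resid_var = s_yy - s_xy ** 2 / s_xx
    sum_y = sum_yy = sum_xy = 0.0
    for xi, yi in zip(x, y):
        if yi is None:
            ey = mu_y + slope * (xi - mu_x)
            ey2 = ey ** 2 + resid_var
        else:
            ey, ey2 = yi, yi ** 2
        sum_y += ey
        sum_yy += ey2
        sum_xy += xi * ey
    # M-step: re-estimate the parameters from the expected sums.
    mu_y = sum_y / n
    s_yy = sum_yy / n - mu_y ** 2
    s_xy = sum_xy / n - mu_x * mu_y

complete_case_mean = mean(y_obs)

print(f"Mean of y, complete cases only: {complete_case_mean:.2f}")
print(f"Mean of y, EM estimate:         {mu_y:.2f}")
print(f"Covariance of x and y (EM):     {s_xy:.2f}")
```

Because y is missing only where x is large, the complete-case mean of y is too low. The EM estimate corrects for this by using the relationship between x and y, yet no filled-in dataset is ever produced: the output is a set of parameter estimates.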

Decision Flowchart: Which Method to Use

  1. Is less than 5% of data missing and MCAR holds? → Listwise deletion is usually fine
  2. Is 5–20% missing and MCAR or MAR holds? → Use multiple imputation
  3. Is more than 20% missing? → Use multiple imputation with 20+ imputations, and consider whether the variable should be included at all
  4. Is data MNAR? → No standard method fully addresses MNAR. Use multiple imputation as a starting point and conduct sensitivity analyses
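The flowchart can also be written as a small function, which is handy if you screen many variables at once. This Python sketch uses a hypothetical `choose_method` helper that restates the four rules above. The flowchart does not say what to do with less than 5% missing under MAR, so sending that case to multiple imputation is an assumption of this sketch.

```python
def choose_method(pct_missing, mechanism):
    """Suggest a handling method from the missing percentage and mechanism.

    mechanism is one of "MCAR", "MAR" or "MNAR" (case-insensitive).
    """
    mechanism = mechanism.upper()
    if mechanism not in {"MCAR", "MAR", "MNAR"}:
        raise ValueError(f"Unknown mechanism: {mechanism}")
    if mechanism == "MNAR":
        # No standard method fully addresses MNAR.
        return "multiple imputation plus sensitivity analyses"
    if pct_missing > 20:
        return "multiple imputation with 20+ imputations; reconsider the variable"
    if pct_missing < 5 and mechanism == "MCAR":
        return "listwise deletion"
    # Covers 5-20% under MCAR or MAR, plus the assumed case of <5% under MAR.
    return "multiple imputation"


for pct, mech in [(3, "MCAR"), (12, "MAR"), (35, "MAR"), (8, "MNAR")]:
    print(f"{pct}% missing, {mech}: {choose_method(pct, mech)}")
```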

How to Report Missing Data Handling

Your methods section should include:

  1. The extent of missing data (percentage per variable and overall)
  2. The results of Little's MCAR test
  3. The method you used and why
  4. The software and settings used

Example Report

Missing data ranged from 0.5% to 8.3% across study variables. Little's MCAR test was not significant, χ²(42) = 48.16, p = .238, indicating that the data were consistent with missing completely at random. Multiple imputation was performed using SPSS Version 29, generating 20 imputed datasets. All variables in the analysis model were included in the imputation model to make the missing at random assumption more plausible. Pooled parameter estimates are reported following Rubin's rules.

Common Mistakes to Avoid

  1. Ignoring missing data entirely — Simply hoping SPSS handles it is not a strategy. Reviewers will ask what you did
  2. Using mean substitution — Despite being easy, it biases almost every statistic. Avoid it
  3. Not reporting how missing data was handled — This is a required element in the methods section of any thesis or journal article
  4. Imputing the dependent variable for cases not in the analysis — Only impute variables that are part of your analysis model
  5. Using too few imputations — Five imputations was the traditional recommendation, but current best practice suggests 20 or more when the missing rate exceeds 10%
  6. Forgetting to include auxiliary variables — Variables that predict missingness or are correlated with missing variables should be included in the imputation model even if they are not in the analysis model

Quick Reference Table

Method                | Best For             | Bias Risk     | Complexity
----------------------|----------------------|---------------|------------
Listwise deletion     | Small MCAR data      | Low (if MCAR) | Low
Pairwise deletion     | Correlation matrices | Moderate      | Low
Mean substitution     | Never recommended    | High          | Low
Regression imputation | Single imputation    | Moderate      | Medium
Multiple imputation   | Most situations      | Low           | Medium-High
EM algorithm          | Parameter estimation | Low           | Medium
