
How to Write a Data Analysis Plan for Your Thesis or Dissertation

2026-05-24, 9 min read

If you are writing a quantitative thesis or dissertation, your supervisor will expect a clear data analysis plan before you touch the data. This plan lives in your methodology chapter and tells the reader exactly how you will analyze each research question or hypothesis. Without it, your analysis looks like guesswork.

This guide walks you through building a solid data analysis plan that satisfies your committee and keeps your analysis organized.

What Is a Data Analysis Plan?

A data analysis plan is a structured outline that maps each research question to a specific statistical test. It explains what variables are involved, what assumptions need to be met, and what software you will use.

Think of it as a contract between you and your reader: "For this question, I will use this test, on these variables, and here is why."

Why You Need One Before Collecting Data

Writing the plan before data collection forces you to think through critical issues early:

  • Do you have the right variables? If your plan calls for a two-way ANOVA but your survey only collects one grouping variable, you have a problem.
  • Is your sample size adequate? The minimum sample size depends on the test, the effect size you expect, and the statistical power you want.
  • Will your data meet the assumptions? If you expect non-normal data, you should plan for nonparametric alternatives.
  • Can you actually answer your research questions? Sometimes students collect data and then discover their design does not support the analysis they need.

Discovering these issues after data collection means wasted time or a compromised study.

Structure of a Data Analysis Plan

1. Restate Your Research Questions and Hypotheses

Start by listing each research question (RQ) and its corresponding hypothesis. Be specific:

Weak: "Is there a relationship between study habits and grades?"

Strong: "RQ1: Is there a statistically significant relationship between weekly study hours and final exam scores among undergraduate students?"

H1: "There is a statistically significant positive correlation between weekly study hours and final exam scores."

2. Identify Variables for Each Question

For each research question, clearly define:

  • Independent Variable (IV): The variable you expect to influence or predict the outcome (e.g., teaching method, gender, treatment group)
  • Dependent Variable (DV): The variable you are measuring as the outcome (e.g., test score, satisfaction rating, performance)
  • Measurement level: Nominal, ordinal, interval, or ratio

The measurement level determines which tests are appropriate. This is where many students make mistakes, for example by running a parametric test on a single ordinal Likert item.

3. Match Each Question to a Statistical Test

This is the core of your plan. Here is a reference for the most common scenarios:

| Research Question Type | IV | DV | Statistical Test |
|---|---|---|---|
| Difference between 2 independent groups | 1 categorical (2 levels) | 1 continuous | Independent samples t-test |
| Difference between 2 related measurements | 1 within-subjects (2 time points) | 1 continuous | Paired samples t-test |
| Difference between 3+ independent groups | 1 categorical (3+ levels) | 1 continuous | One-way ANOVA |
| Effect of 2 factors on an outcome | 2 categorical | 1 continuous | Two-way ANOVA |
| Relationship between 2 continuous variables | 1 continuous | 1 continuous | Pearson correlation |
| Predicting an outcome from one predictor | 1 continuous | 1 continuous | Simple linear regression |
| Predicting an outcome from multiple predictors | 2+ continuous/categorical | 1 continuous | Multiple regression |
| Association between 2 categorical variables | 1 categorical | 1 categorical | Chi-square test |
| Predicting a yes/no outcome | 1+ continuous/categorical | 1 binary | Logistic regression |

If your data violate the normality assumption, name the nonparametric alternative in your plan: Mann-Whitney U instead of the independent samples t-test, Wilcoxon signed-rank instead of the paired samples t-test, Kruskal-Wallis instead of one-way ANOVA, Spearman instead of Pearson.
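The pairing of each test with its fallback can be written down as a small lookup, which makes it easy to check that every research question has both a primary test and a backup. The following is only a sketch: the dictionary keys and the `choose_test` function are hypothetical names invented for illustration, and the yes/no normality flag stands in for whatever check you actually run in your statistics software.

```python
# Hypothetical lookup: question type -> (parametric test, nonparametric fallback).
# The keys are illustrative labels, not a standard taxonomy.
TEST_MAP = {
    "two_independent_groups": ("Independent samples t-test", "Mann-Whitney U"),
    "three_plus_independent_groups": ("One-way ANOVA", "Kruskal-Wallis"),
    "two_continuous_variables": ("Pearson correlation", "Spearman correlation"),
}


def choose_test(question_type: str, normality_ok: bool) -> str:
    """Return the planned test, falling back when normality is violated."""
    parametric, nonparametric = TEST_MAP[question_type]
    return parametric if normality_ok else nonparametric


if __name__ == "__main__":
    print(choose_test("three_plus_independent_groups", normality_ok=True))
    print(choose_test("three_plus_independent_groups", normality_ok=False))
```

Writing the fallback next to the primary test means the decision is made before you see the data, which is the point of the plan.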

4. State Your Assumptions and How You Will Check Them

For each test, list the assumptions and how you plan to verify them:

Example for One-Way ANOVA:

  • Independence of observations (ensured by study design)
  • Normality of the dependent variable within each group (Shapiro-Wilk test, Q-Q plots)
  • Homogeneity of variances across groups (Levene's test)
  • If Levene's test is significant, Welch's ANOVA and Games-Howell post-hoc will be used instead

This shows your supervisor you understand the tests, not just which buttons to click.
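The decision rule in the one-way ANOVA example can be sketched as a short function. This is an illustration under assumptions, not a replacement for your statistics package: `plan_anova` is a hypothetical name, the Levene's test p-value is assumed to come from your software output, and Tukey HSD is used as the default post-hoc test as in the example plan later in this guide.

```python
def plan_anova(levene_p: float, alpha: float = 0.05) -> dict:
    """Pick the omnibus and post-hoc tests from the Levene's test p-value.

    A significant Levene's test (p < alpha) suggests unequal variances,
    so the plan switches to Welch's ANOVA with Games-Howell comparisons.
    """
    if levene_p < alpha:
        return {"omnibus": "Welch's ANOVA", "post_hoc": "Games-Howell"}
    return {"omnibus": "One-way ANOVA", "post_hoc": "Tukey HSD"}


if __name__ == "__main__":
    print(plan_anova(levene_p=0.21))  # variances look homogeneous
    print(plan_anova(levene_p=0.01))  # homogeneity violated
```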

5. Specify Your Significance Level

State your alpha level. In most social science and business research, this is .05:

"All statistical tests will be conducted at the .05 significance level (two-tailed)."

If you are running multiple comparisons, mention whether you will apply a correction (e.g., Bonferroni).
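As a worked illustration of the Bonferroni correction: the overall alpha is divided by the number of comparisons, and each p-value is judged against that stricter threshold. The function name and the three p-values below are made up for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted alpha and a significance flag per test."""
    m = len(p_values)              # number of comparisons
    adjusted_alpha = alpha / m     # stricter per-test threshold
    flags = [p <= adjusted_alpha for p in p_values]
    return adjusted_alpha, flags


if __name__ == "__main__":
    # Three hypothetical pairwise comparisons
    adjusted_alpha, significant = bonferroni([0.004, 0.020, 0.300])
    print(round(adjusted_alpha, 4))  # 0.05 / 3
    print(significant)
```

With three comparisons the threshold drops from .05 to roughly .0167, so a p-value of .020 that would pass uncorrected no longer counts as significant.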

6. Describe Your Software

"All analyses will be conducted using IBM SPSS Statistics version 27. Mediation and moderation analyses will use the PROCESS macro version 4.0 (Hayes, 2022)."

7. Address Missing Data

Explain your plan for handling missing values. A common rule of thumb, based on the percentage of values that are missing:

  • Less than 5%: listwise deletion
  • 5-20%: multiple imputation or expectation-maximization
  • More than 20%: variable may be excluded with justification
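The thresholds above can be expressed as a small helper that reports the share of missing values in a variable and the matching strategy. This is a sketch under assumptions: missing values are assumed to be coded as `None`, the function name is hypothetical, and the cut-offs are simply the ones listed above, not a universal standard.

```python
def missing_data_strategy(values):
    """Return (percent missing, suggested strategy) for one variable.

    Missing values are assumed to be coded as None.
    """
    n_missing = sum(1 for v in values if v is None)
    pct = 100 * n_missing / len(values)
    if pct < 5:
        strategy = "listwise deletion"
    elif pct <= 20:
        strategy = "multiple imputation or expectation-maximization"
    else:
        strategy = "consider excluding the variable, with justification"
    return pct, strategy


if __name__ == "__main__":
    # Hypothetical variable: 10 responses, 1 missing
    responses = [4, 5, None, 3, 4, 2, 5, 4, 3, 4]
    print(missing_data_strategy(responses))
```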

Example Data Analysis Plan

Here is what a complete entry looks like for one research question:

RQ2: Is there a statistically significant difference in customer satisfaction scores among the three service delivery methods (in-person, online, hybrid)?

IV: Service delivery method (categorical, 3 levels)

DV: Customer satisfaction score (continuous, measured on a 30-item Likert scale)

Statistical Test: One-way ANOVA will be used to compare the mean satisfaction scores across the three groups. If the omnibus F-test is significant, Tukey HSD post-hoc tests will be conducted to determine which specific groups differ.

Assumptions: Normality will be assessed using the Shapiro-Wilk test and visual inspection of Q-Q plots. Homogeneity of variances will be tested using Levene's test. If the homogeneity assumption is violated, Welch's ANOVA with Games-Howell post-hoc comparisons will be used.

Significance level: α = .05 (two-tailed)

Effect size: Eta-squared (η²) will be reported to indicate practical significance.

Repeat this structure for every research question in your study.
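If you keep your plan in a script or notebook, one way to enforce that structure is a small record type with a completeness check. The sketch below is purely illustrative: the `PlanEntry` class, its field names, and `missing_parts` are hypothetical, and the values are taken from the RQ2 example above.

```python
from dataclasses import dataclass, asdict


@dataclass
class PlanEntry:
    """One research question and everything the plan must state for it."""
    research_question: str
    iv: str
    dv: str
    test: str
    assumption_checks: list
    fallback: str
    effect_size: str
    alpha: float = 0.05


def missing_parts(entry: PlanEntry) -> list:
    """List the fields that were left empty."""
    return [name for name, value in asdict(entry).items() if value in ("", [], None)]


rq2 = PlanEntry(
    research_question="Difference in satisfaction among three service delivery methods",
    iv="Service delivery method (categorical, 3 levels)",
    dv="Customer satisfaction score (continuous)",
    test="One-way ANOVA with Tukey HSD post-hoc",
    assumption_checks=["Shapiro-Wilk", "Q-Q plots", "Levene's test"],
    fallback="Welch's ANOVA with Games-Howell",
    effect_size="",  # deliberately left blank to show the check
)

if __name__ == "__main__":
    print(missing_parts(rq2))  # reports which parts still need to be written
```

Running the check on every entry before submitting the chapter catches the omissions a committee is most likely to notice.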

Common Mistakes in Data Analysis Plans

  1. Being too vague: "Data will be analyzed using SPSS" is not a plan. Name the specific tests for each question.

  2. Mismatching variables and tests: using Pearson correlation when one variable is categorical, or running a t-test with three groups instead of ANOVA.

  3. Ignoring assumptions: your committee will ask how you checked them. Plan for it.

  4. No backup plan: if normality is violated, what will you do? Always state the nonparametric alternative.

  5. Forgetting effect sizes: reporting standards such as APA style expect effect sizes alongside p-values. Include them in your plan.
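To make the effect size point concrete, eta-squared for a one-way ANOVA is the between-groups sum of squares divided by the total sum of squares. The sketch below computes it by hand on made-up satisfaction scores for three groups; the data and the function name are hypothetical, and in practice your statistics software reports this value for you.

```python
def eta_squared(groups):
    """Eta-squared = SS_between / SS_total for a one-way design."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    # Between-groups variation: group size times squared distance of group mean
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2 for group in groups
    )
    # Total variation: squared distance of every score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    return ss_between / ss_total


if __name__ == "__main__":
    # Made-up scores for in-person, online, and hybrid groups
    in_person, online, hybrid = [4, 5, 6], [2, 3, 4], [3, 4, 5]
    print(eta_squared([in_person, online, hybrid]))
```

For these invented scores the between-groups sum of squares is 6 and the total is 12, so half of the variance in the scores is associated with group membership.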

When to Get Help

A data analysis plan requires understanding both your research design and the statistical methods. If you are unsure which test fits your research questions, or if your design involves complex structures like mediation, moderation, or repeated measures, it is worth consulting an expert before finalizing your methodology chapter.

At Insighter Digital, we help thesis and dissertation students build their data analysis plans, run the analysis in SPSS, and deliver APA-formatted results chapters. If you have your research questions and data ready, we can take it from there — or review your existing plan to make sure it holds up under committee scrutiny.
