Soojeong Kim
- BSc (Kyungpook National University, 2015)
Topic
Permutation in Regression Revisited: The Residual Route Proven Optimal Theoretically
Department of Mathematics and Statistics
Date & location
- Thursday, August 14, 2025
- 1:00 P.M.
- David Turpin Building, Room A203
Examining Committee
Supervisory Committee
- Dr. Xuekui Zhang, Department of Mathematics and Statistics, ßÉßɱ¬ÁÏ (Supervisor)
- Dr. Mary Lesperance, Department of Mathematics and Statistics, UVic (Member)
External Examiner
- Dr. Alex Thomo, Department of Computer Science, UVic
Chair of Oral Examination
- Dr. Jody Klymak, School of Earth and Ocean Sciences, UVic
Abstract
The assumptions for classical linear-model are never met in practice. Recent evidence shows that such violations inflate Type I error as sample size grows, while simple permutation tests can restore control in single-predictor regressions. Yet in multiple regression, practitioners face a confusing menu of residual- and raw-data shuffling schemes, with little theory to guide the choice.
We develop the first closed-form, finite-sample comparison of six widely used permutation strategies for a coefficient of interest in the presence of nuisance covariates. By projecting any full-rank regression onto an equivalent two-predictor “working” model and treating the permutation matrix itself as random, we derive exact means and variances of the permuted estimator, and we establish its asymptotic distribution. The analysis reveals that (i) the three residual-based schemes—permuting response residuals, predictor residuals, or both—are identically distributed; they match the parametric null up to second moments in finite samples and match in distribution as 𝑛→∞, guaranteeing valid Type I error control. (ii) Raw-data permutations behave unpredictably: shuffling the response is overly conservative, shuffling the predictor is liberal when covariates are correlated, and shuffling both can be unstable. Closed-form results quantify how predictor–covariate correlation, error variance, and sample size drive these patterns and specify the Monte-Carlo sample size needed for accurate 𝑝-values.
Extensive simulations confirm the theory: residual permutations maintain nominal error and retain power comparable to the classical linear model when assumptions hold, whereas raw-data schemes either inflate or deflate Type I error and sacrifice power. The work reconciles decades of ad-hoc practice, provides actionable guidelines, and equips analysts with a principled, computationally feasible framework for exact inference in large-sample regression.