Soojeong Kim

BSc (Kyungpook National University, 2015)

Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Permutation in Regression Revisited: The Residual Route Proven Optimal Theoretically

Department of Mathematics and Statistics

Date & location

Thursday, August 14, 2025
1:00 P.M.
David Turpin Building, Room A203

Examining Committee

Supervisory Committee

Dr. Xuekui Zhang, Department of Mathematics and Statistics, University of Victoria (Supervisor)
Dr. Mary Lesperance, Department of Mathematics and Statistics, UVic (Member)

External Examiner

Dr. Alex Thomo, Department of Computer Science, UVic

Chair of Oral Examination

Dr. Jody Klymak, School of Earth and Ocean Sciences, UVic

Abstract

The assumptions of the classical linear model are never met in practice. Recent evidence shows that such violations inflate Type I error as sample size grows, while simple permutation tests can restore control in single-predictor regressions. Yet in multiple regression, practitioners face a confusing menu of residual- and raw-data shuffling schemes, with little theory to guide the choice.

We develop the first closed-form, finite-sample comparison of six widely used permutation strategies for a coefficient of interest in the presence of nuisance covariates. By projecting any full-rank regression onto an equivalent two-predictor “working” model and treating the permutation matrix itself as random, we derive exact means and variances of the permuted estimator, and we establish its asymptotic distribution. The analysis reveals that (i) the three residual-based schemes (permuting response residuals, predictor residuals, or both) are identically distributed: they match the parametric null up to second moments in finite samples and match it in distribution as 𝑛→∞, guaranteeing valid Type I error control; and (ii) raw-data permutations behave unpredictably: shuffling the response is overly conservative, shuffling the predictor is liberal when covariates are correlated, and shuffling both can be unstable. Closed-form results quantify how predictor–covariate correlation, error variance, and sample size drive these patterns and specify the Monte-Carlo sample size needed for accurate 𝑝-values.
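As a rough illustration of what a residual-based scheme can look like in code, the sketch below residualizes both the response and the predictor of interest on the nuisance covariates, then shuffles the response residuals. It is a minimal sketch under assumptions and is not taken from the thesis: the function name `residual_permutation_pvalue`, its arguments, the two-sided test on the slope, and the add-one Monte-Carlo p-value are all illustrative choices.

```python
import numpy as np


def residual_permutation_pvalue(y, x, Z, n_perm=2000, seed=0):
    """Permutation test for the coefficient of x in y ~ 1 + x + Z.

    Hypothetical helper for illustration only: permutes the response
    residuals after both y and x have been residualized on [1, Z].
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Nuisance design: intercept plus covariates (Z may be 1-D or 2-D).
    Z1 = np.column_stack([np.ones(n), Z])

    def residualize(v):
        # Least-squares residuals of v after regressing on Z1.
        coef, *_ = np.linalg.lstsq(Z1, v, rcond=None)
        return v - Z1 @ coef

    r_y = residualize(np.asarray(y, dtype=float))
    r_x = residualize(np.asarray(x, dtype=float))
    sxx = r_x @ r_x

    # By the Frisch-Waugh-Lovell theorem this equals the coefficient
    # of x in the full regression of y on [1, x, Z].
    b_obs = (r_x @ r_y) / sxx

    # Reference distribution: slope recomputed on shuffled response residuals.
    b_perm = np.array(
        [(r_x @ rng.permutation(r_y)) / sxx for _ in range(n_perm)]
    )

    # Two-sided Monte-Carlo p-value with the usual add-one correction.
    p_value = (1 + np.sum(np.abs(b_perm) >= abs(b_obs))) / (n_perm + 1)
    return b_obs, p_value
```

With `n_perm` random permutations, the smallest attainable p-value is 1/(`n_perm` + 1), which is one reason the number of Monte-Carlo draws matters for accurate p-values.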

Extensive simulations confirm the theory: residual permutations maintain nominal error and retain power comparable to the classical linear model when assumptions hold, whereas raw-data schemes either inflate or deflate Type I error and sacrifice power. The work reconciles decades of ad-hoc practice, provides actionable guidelines, and equips analysts with a principled, computationally feasible framework for exact inference in large-sample regression.
