Article Open Access

Some theoretical foundations for the design and analysis of randomized experiments

  • Lei Shi and Xinran Li
Published/Copyright: November 20, 2024

Abstract

Neyman’s seminal work in 1923 has been a milestone in statistics over the century, which has motivated many fundamental statistical concepts and methodology. In this review, we delve into Neyman’s groundbreaking contribution and offer technical insights into the design and analysis of randomized experiments. We shall review the basic setup of completely randomized experiments and the classical approaches for inferring the average treatment effects. We shall, in particular, review more efficient design and analysis of randomized experiments by utilizing pretreatment covariates, which move beyond Neyman’s original work without involving any covariate. We then summarize several technical ingredients regarding randomizations and permutations that have been developed over the century, such as permutational central limit theorems and Berry–Esseen bounds, and we elaborate on how these technical results facilitate the understanding of randomized experiments. The discussion is also extended to other randomized experiments including rerandomization, stratified randomized experiments, matched pair experiments, and cluster randomized experiments.

MSC 2010: 62K15; 62J05; 62G05

1 Review of the proposal in Neyman [1] and its influence

Neyman’s seminal work [1] has been a cornerstone in the field of statistics over the last century. It has laid foundational principles that have significantly shaped multiple research areas such as causal inference, experimental design, and survey sampling. Its influence has been profound across a diverse range of applications, encompassing sectors such as agriculture, economics, biomedical research, social science, and beyond.

The main purpose of Neyman [1] is the analysis of field experiments conducted in order to compare a number of crop varieties. Suppose there are $m$ plots and $\nu$ varieties. Neyman [1] introduced the notion of the potential yield of the $k$th variety being applied to the $i$th plot, which is denoted as $U_{ik}$, for $1 \le i \le m$ and $1 \le k \le \nu$; we use slightly different indices from Neyman to make them more intuitive. In Neyman’s framework, the quantities $\{U_{ik}\}$ are fixed but may be unknown. The number

$$a_k = \frac{1}{m} \sum_{i=1}^{m} U_{ik}$$

is called “the best estimate” of the yield from the $k$th variety on the field, which is, in fact, an estimand representing the average yield in modern terminology. Neyman [1] then used an urn model as a thought experiment to depict the framework of sampling from a finite population. The $\nu$ types of varieties are treated as $\nu$ urns. Each urn contains $m$ balls, and each ball is associated with two labels: a plot label indexing the plots and a yield label indicating the unknown potential yields on the plots for each of the varieties. Specifically, in the $k$th urn, there are $m$ balls with yield labels

$$U_{1k}, \ldots, U_{ik}, \ldots, U_{mk}.$$

Also, the urns have the property that “if one ball is taken from one of them, the balls having the same plot label disappear from all the other urns.” Then, from each urn, a number of balls are drawn without replacement. With this model, Neyman studied the properties (in particular, the means and variances) of the sample averages across all varieties as well as their difference under the randomization distribution. This marks the pioneering effort for studying the difference-in-means estimator in modern terminology. Notably, he was able to “determine empirically that the difference of partial averages of the plots sampled shows a fair agreement with the Gaussian law distribution.” This corrects the “common misunderstanding” at that time that inference can be performed only if the yields from different plots follow the Gaussian law. Combined with a conservative variance estimation strategy, he suggested a confidence interval for the true difference between two varieties based on normal approximation.

Neyman [1] offered a series of groundbreaking and foundational insights. In the following, we outline three key facets of Neyman’s [1] contributions.

The first contribution is the introduction of the potential outcome model. This model has since become a standard framework for illustrating possible experimental outcomes, as referenced in works such as [2–5]. The potential outcome paradigm serves as an impeccable model for discussing causation within randomized experiments. Within this framework, researchers pose and address causal questions by analyzing causal effects that are defined as comparisons between potential outcomes, which represent various hypothetical scenarios or states of the world. This framework also elegantly facilitates the representation of interference between units [6–8], the prolonged impacts of interventions [9–11], and the causal analyses involving post-treatment variables such as instrumental variables [12] and mediation [13,14]. Moreover, the importance of potential outcomes transcends experimental settings and is also profound in observational studies, as highlighted by Rubin [15].

The second contribution of Neyman [1] lies in that it further highlights the importance of physical randomization or random selection when conducting experiments or performing sampling. Randomization has been in the air since the 1920s, as commented by Rubin [16] citing Student [17] and Fisher and Mackenzie [18] as references. Neyman [1] contributed to the randomization world by introducing the potential outcome model and describing a finite population inference framework for randomization. Within this framework, potential outcomes are viewed as fixed, and physical randomization emerges as the “reasoned basis” [19] for facilitating statistical testing and estimation [4,20–22]. Moreover, the proposal of sampling without replacement also inspires the pursuit of the parallels and linkages between survey sampling and randomized experiments [23–26].

The third contribution of Neyman [1] centers on the repeated sampling properties of statistics over their non-null randomization distribution. This viewpoint offers a new perspective on randomization-based or design-based inference, distinguishing it from Fisher’s focus on the sharp null hypothesis of no causal effects for any units and finite-sample exact $p$-values [16]. Neyman [1] recognized from an empirical perspective that the asymptotic normality holds under the described sampling scheme, without requiring the outcomes to come from a Gaussian law. Moreover, he proposed to estimate the variance of an estimator conservatively in expectation, which can further lead to a conservative confidence interval. These efforts built up the foundation for large-sample randomization-based inference in finite populations.

Building upon the pioneering contribution of Neyman [1] in randomization-based inference, there have been many new developments in the design and analysis of randomized experiments. In the following sections, we shall first review the basic setup of completely randomized experiments (CREs) and the classical approaches for analysis. We then present several technical ingredients regarding randomizations and permutations, such as central limit theorems (CLTs) and Berry–Esseen bounds (BEBs), which were developed over the century, and elaborate on how these results enhance and expand our understanding of the design and analysis of CREs. We also extend the discussion to other randomized experiments and permutation-related technical tools.

Notations. We summarize a set of notations for the whole article. For an integer $N$, we use $[N]$ to denote the set of integers $\{1, \ldots, N\}$. For two positive semidefinite matrices $V_1$ and $V_2$, we use $V_1 \succeq V_2$ to indicate that $V_1$ dominates $V_2$, in the sense that $V_1 - V_2$ is positive semidefinite. For a random sequence $\{X_N\}_{N=1}^{\infty}$, we write $X_N \rightsquigarrow \mathcal{L}$ if $X_N$ converges weakly to the distribution $\mathcal{L}$ as $N \to \infty$, and $X_N \mathrel{\dot\sim} L_N$ if $X_N$ and $L_N$ converge weakly to the same distribution. When $X_N$ converges in probability, we use $\operatorname{plim}_{N \to \infty} X_N$ to denote its probabilistic limit.

2 Design and analysis of CREs

In this section, we introduce the basic setup for the design and analysis of CREs. Section 2.1 discusses the setup of a simple treatment-control CRE as well as strategies for estimation and inference. The results are extended to a more general multi-level CRE. We then consider more efficient design and analysis of randomized experiments by incorporating pretreatment covariates. In particular, Section 2.2 presents several covariate-adjusted estimators, and Section 2.3 discusses rerandomization.

2.1 Basic design and analysis of CREs

2.1.1 Treatment-control CRE

We start by considering a treatment-control CRE that enrolls $N$ units, with $N_1$ units in the treatment arm and $N_0$ in the control arm. Let $Z_i$ denote the treatment assignment indicator for the $i$th unit, for $1 \le i \le N$. The treatment assignment status for the entire experiment is vectorized as $Z = (Z_1, \ldots, Z_N)$. Under complete randomization,

$$P\{Z = (z_1, \ldots, z_N)\} = \binom{N}{N_1}^{-1}, \quad \text{for any } (z_1, \ldots, z_N) \in \{0, 1\}^N \text{ with } \sum_{i=1}^N z_i = N_1 \text{ and } \sum_{i=1}^N (1 - z_i) = N_0.$$

The potential outcomes for the $i$th unit are $Y_i(1)$ and $Y_i(0)$. This is essentially a special case of Neyman’s [1] setup with two interventional arms. The more general notions of experimental units, treatment/control arms, and potential outcomes presented here correspond to Neyman’s [1] notions of plots, varieties, and potential yields.

Rubin [27] called the $N \times 2$ matrix of potential outcomes in Table 1 the science table. The observed outcome for the $i$th unit is $Y_i = Z_i Y_i(1) + (1 - Z_i) Y_i(0)$. Importantly, the potential outcomes are fixed and the randomness comes merely from the random allocation of the treatment, reflected by the random vector $Z$. Scheffé [28, Chapter 9] called it the randomization model. Under this model, it is conventional to call the resulting inference randomization-based inference, design-based inference, or finite population inference. It has become increasingly popular in both theory and practice [e.g., 4,20,22,29–38]. Define further the following finite-population means and variances of potential outcomes for each arm, which are essentially summaries of the science table in Table 1:

$$\begin{aligned} \bar{Y}(0) &= \frac{1}{N} \sum_{i=1}^N Y_i(0), & \bar{Y}(1) &= \frac{1}{N} \sum_{i=1}^N Y_i(1); \\ S^2(0) &= \frac{1}{N-1} \sum_{i=1}^N \{Y_i(0) - \bar{Y}(0)\}^2, & S^2(1) &= \frac{1}{N-1} \sum_{i=1}^N \{Y_i(1) - \bar{Y}(1)\}^2. \end{aligned} \tag{1}$$

Table 1

Science table for treatment-control CRE

| $i$ | $Y_i(0)$ | $Y_i(1)$ |
|---|---|---|
| $1$ | $Y_1(0)$ | $Y_1(1)$ |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $N$ | $Y_N(0)$ | $Y_N(1)$ |

Under the potential outcome framework, the $i$th unit has individual treatment effect $\tau_i = Y_i(1) - Y_i(0)$, for $1 \le i \le N$. The average treatment effect (ATE) over all units is then defined as

$$\tau = \frac{1}{N} \sum_{i=1}^N \tau_i = \bar{Y}(1) - \bar{Y}(0).$$

Neyman [1] proposed to estimate the ATE $\tau$ by the difference-in-means estimator:

$$\hat{\tau} = \hat{Y}(1) - \hat{Y}(0), \quad \text{where } \hat{Y}(z) = \frac{1}{N_z} \sum_{i=1}^N Y_i 1\{Z_i = z\}, \text{ for } z = 0, 1. \tag{2}$$

He proved that $\hat{\tau}$ is an unbiased estimator for $\tau$, i.e., $E\{\hat{\tau}\} = \bar{Y}(1) - \bar{Y}(0) = \tau$, with true variance

$$\operatorname{Var}\{\hat{\tau}\} = \frac{1}{N_1} S^2(1) + \frac{1}{N_0} S^2(0) - \frac{1}{N} S^2(\tau),$$

where the variances $S^2(0)$ and $S^2(1)$ are defined in (1), and $S^2(\tau)$ is the variance of the individual treatment effects

$$S^2(\tau) = \frac{1}{N-1} \sum_{i=1}^N (\tau_i - \tau)^2. \tag{3}$$

Because we can never jointly observe the two potential outcomes for any unit, the variance of individual effects in (3) is generally not estimable from the observed data. Neyman [1] proposed the following variance estimator:

$$\hat{V} = \frac{1}{N_1} \hat{S}^2(1) + \frac{1}{N_0} \hat{S}^2(0), \quad \text{where } \hat{S}^2(z) = \frac{1}{N_z - 1} \sum_{i=1}^N \{Y_i - \hat{Y}(z)\}^2 1\{Z_i = z\}, \tag{4}$$

which essentially circumvents the problem by dropping the unestimable component involving $S^2(\tau)$. The variance estimator in (4) has expectation

$$E\{\hat{V}\} = \frac{1}{N_1} S^2(1) + \frac{1}{N_0} S^2(0) \ge \operatorname{Var}\{\hat{\tau}\},$$

which suggests that $\hat{V}$ is, in general, not unbiased but conservative. A level-$(1 - \alpha)$ confidence interval is then given by

$$\left[ \hat{\tau} - z_{\alpha/2} \sqrt{\hat{V}}, \ \hat{\tau} + z_{\alpha/2} \sqrt{\hat{V}} \right], \tag{5}$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of a standard normal distribution. In Sections 3 and 4, we will discuss more technical results for the asymptotic validity of the confidence interval in (5).
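As a minimal illustrative sketch (not part of Neyman’s [1] development), the following Python code evaluates the difference-in-means estimator (2), the variance estimator (4), and the interval (5) on one draw of a CRE. The function name `neyman_analysis` and the toy science table are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def neyman_analysis(y, z, alpha=0.05):
    """Difference-in-means estimate (2), variance estimate (4), and interval (5)."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    tau_hat = mean(y1) - mean(y0)
    # statistics.variance uses the (n - 1) denominator, matching S-hat^2(z).
    v_hat = variance(y1) / len(y1) + variance(y0) / len(y0)
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(v_hat)
    return tau_hat, v_hat, (tau_hat - half, tau_hat + half)


# Hypothetical science table: fixed potential outcomes for N = 8 units.
y0_pot = [1.0, 2.0, 0.5, 3.0, 2.5, 1.5, 0.0, 2.0]
y1_pot = [2.0, 2.5, 1.5, 5.0, 2.5, 3.5, 1.0, 2.0]

rng = random.Random(0)
z = [1] * 4 + [0] * 4
rng.shuffle(z)  # complete randomization with N1 = 4 and N0 = 4
# Only one potential outcome per unit is revealed by the assignment.
y_obs = [y1_pot[i] if z[i] == 1 else y0_pot[i] for i in range(8)]

tau_hat, v_hat, ci = neyman_analysis(y_obs, z)
print(tau_hat, v_hat, ci)
```

With so few units the normal approximation behind (5) is not trustworthy; the example only shows how the formulas map onto observed data.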

Remark 1

Neyman’s [1] approach can also be used to test the following null hypothesis:

$$H_{0N}: \tau = \bar{Y}(1) - \bar{Y}(0) = 0,$$

which is often called the weak null hypothesis [39]. In contrast, Fisher [19] proposed to test the following null hypothesis:

$$H_{0F}: Y_i(1) = Y_i(0), \quad \text{for all units } i = 1, \ldots, N, \tag{6}$$

which is called the sharp null hypothesis by Rubin [40] or the strong null hypothesis by Wu and Ding [39]. The Fisherian perspective is fundamentally different, as it focuses on testing the hypothesis of no causal effects for any units whatsoever, whereas the Neymanian perspective focuses on testing no average causal effect [41]. Obviously, Fisher’s null implies Neyman’s null, but either of them can be practically relevant depending on the application. Under (6), one can impute the unobserved potential outcomes and perform Fisher’s randomization test (FRT) to deliver finite-sample exact inference [19]. Fisher’s test has the advantage of being finite-sample valid, while Neyman’s requires large-sample approximation. Nevertheless, Neyman’s asymptotic results can also help ease the computation for Fisher’s null. We refer interested readers to references [42–44] for a unification of both perspectives, and to references [45–50] for extending FRT to nonsharp null hypotheses.
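The imputation argument under the sharp null (6) can be sketched in a few lines of Python. This is a Monte Carlo approximation rather than the exact test, and the choice of the absolute difference in means as the test statistic, the function names, and the toy data are assumptions made for illustration.

```python
import random
from statistics import mean


def diff_in_means(y, z):
    return mean(yi for yi, zi in zip(y, z) if zi == 1) - mean(
        yi for yi, zi in zip(y, z) if zi == 0
    )


def frt_p_value(y, z, draws=2000, seed=0):
    """Monte Carlo p-value of a randomization test for the sharp null (6)."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(y, z))
    z_new = list(z)
    hits = 0
    for _ in range(draws):
        rng.shuffle(z_new)  # redraw the assignment from the same CRE
        # Under the sharp null, Y_i(1) = Y_i(0), so the observed outcomes
        # would be unchanged under any other assignment.
        if abs(diff_in_means(y, z_new)) >= observed - 1e-12:
            hits += 1
    # Counting the realized assignment itself avoids a p-value of zero.
    return (hits + 1) / (draws + 1)


# Hypothetical data in which every treated outcome exceeds every control outcome.
y = [5.1, 6.0, 5.8, 6.4, 4.0, 4.2, 3.9, 4.5]
z = [1, 1, 1, 1, 0, 0, 0, 0]
print(frt_p_value(y, z))
```

For small $N$ one could enumerate all $\binom{N}{N_1}$ assignments instead of sampling them, which gives the exact randomization $p$-value.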

Remark 2

For analysis, practitioners usually prefer regression-based inference for the average causal effect. The standard approach is to run the ordinary least-squares (OLS) regression of the outcomes on the treatment indicators with an intercept:

$$(\hat{\gamma}, \hat{\tau}) = \arg\min_{\gamma, \tau \in \mathbb{R}} \sum_{i=1}^N (Y_i - \gamma - Z_i \tau)^2. \tag{7}$$

As implicitly written in (7), the point estimator from the OLS for the treatment effect is identical to the difference-in-means estimator in (2). However, the usual OLS-based variance estimator can fail (in the sense of either underestimating or overestimating the truth by possibly a quite large factor), due to heteroskedasticity in potential outcomes [32]. More concretely, the OLS-based variance estimator is

$$\hat{V}_{\mathrm{OLS}} = \frac{N(N_1 - 1)}{(N - 2) N_1 N_0} \hat{S}^2(1) + \frac{N(N_0 - 1)}{(N - 2) N_1 N_0} \hat{S}^2(0) \approx \frac{\hat{S}^2(1)}{N_0} + \frac{\hat{S}^2(0)}{N_1},$$

which can be very different from (4) if the number of units or the sample variances of observed outcomes differ a lot between the two arms. Instead, one can use the Eicker–Huber–White (EHW) variance estimator to obtain a robust estimation:

$$\hat{V}_{\mathrm{EHW}} = \frac{\hat{S}^2(1)}{N_1} \cdot \frac{N_1 - 1}{N_1} + \frac{\hat{S}^2(0)}{N_0} \cdot \frac{N_0 - 1}{N_0},$$

which is asymptotically equivalent to $\hat{V}$ in (4). Alternatively, the so-called HC2 variant of the EHW robust variance estimator is identical to $\hat{V}$ (see Chapter 4 of Ding [51] for a more detailed discussion on regression-based analyses for the ATE).
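To make the gap between these formulas concrete, the sketch below evaluates Neyman’s estimator (4) and the closed-form OLS and EHW expressions stated in this remark on a small, deliberately unbalanced data set. The data and the function name `variance_estimators` are hypothetical; no regression software is involved, only the displayed formulas.

```python
from statistics import variance


def variance_estimators(y, z):
    """Evaluate the Neyman, OLS, and EHW variance formulas of Remark 2."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    s1, s0 = variance(y1), variance(y0)  # sample variances, (n_z - 1) denominators
    v_neyman = s1 / n1 + s0 / n0  # equation (4)
    # Pooled residual variance times N / (N1 * N0), as in the OLS display.
    v_ols = n * ((n1 - 1) * s1 + (n0 - 1) * s0) / ((n - 2) * n1 * n0)
    v_ehw = s1 / n1 * (n1 - 1) / n1 + s0 / n0 * (n0 - 1) / n0
    return v_neyman, v_ols, v_ehw


# Hypothetical data: 2 highly variable treated units, 6 stable control units.
y = [0.0, 10.0, 1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
z = [1, 1, 0, 0, 0, 0, 0, 0]
print(variance_estimators(y, z))
```

In this toy example the small, noisy arm is weighted by $1/N_0$ instead of $1/N_1$ in the OLS formula, so the OLS value falls well below (4); with equal arm sizes and equal sample variances the OLS formula and (4) agree.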

2.1.2 Multi-level CREs

Much effort has been devoted to extending the treatment-control CRE to multi-level scenarios, which cater to many practical problems and designs such as (fractional) factorial experiments [35,52], conjoint analysis [53,54], partially nested experiments [55,56], and sampling-based randomized experiments [57,58].

In a multi-level randomized experiment, there are $N$ units and $Q$ treatment arms, where the number of units under treatment $q$ equals $N_q$, with $\sum_{q=1}^Q N_q = N$. Corresponding to treatment level $q$, unit $i$ has the potential outcome $Y_i(q)$, where $i = 1, \ldots, N$ and $q = 1, \ldots, Q$ (see the science table in Table 2). Despite its simplicity, the multi-level CRE has been widely used in practice and has generated rich theoretical results. Definition 1 characterizes the joint distribution of $Z = (Z_1, \ldots, Z_N)$ under complete randomization, where $Z_i \in \{1, \ldots, Q\}$ is the treatment indicator for unit $i$.

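Table 2

Science table for multi-level CRE

| $i$ | $Y_i(1)$ | $Y_i(2)$ | $\cdots$ | $Y_i(Q)$ |
|---|---|---|---|---|
| $1$ | $Y_1(1)$ | $Y_1(2)$ | $\cdots$ | $Y_1(Q)$ |
| $\vdots$ | $\vdots$ | $\vdots$ | | $\vdots$ |
| $N$ | $Y_N(1)$ | $Y_N(2)$ | $\cdots$ | $Y_N(Q)$ |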

Definition 1

(Complete randomization) Fix integers $N_1, \ldots, N_Q$ with $\sum_{q=1}^Q N_q = N$. The treatment vector $Z$ is uniformly distributed over $\mathcal{Z} = \{ z \in \{1, 2, \ldots, Q\}^N : \sum_{i=1}^N 1\{z_i = q\} = N_q, \text{ for } 1 \le q \le Q \}$.

Mathematically, Definition 1 implies that $P(Z = z) = N_1! \cdots N_Q! / N!$ for all possible values of $z$ in $\mathcal{Z}$. Computationally, Definition 1 implies that $Z$ is from a random permutation of $N_1$ 1’s, $N_2$ 2’s, …, $N_Q$ $Q$’s. The observed outcome is $Y_i = \sum_{q=1}^Q Y_i(q) 1\{Z_i = q\}$ for each unit $i$.
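The computational description of Definition 1 translates directly into code. The following is a minimal sketch; the function names and the arm sizes are hypothetical.

```python
import random
from collections import Counter


def complete_randomization(arm_sizes, rng):
    """Draw Z by randomly permuting N_1 ones, N_2 twos, ..., N_Q Q's."""
    z = [q for q, n_q in enumerate(arm_sizes, start=1) for _ in range(n_q)]
    rng.shuffle(z)
    return z


def observed_outcomes(science_table, z):
    """Y_i = Y_i(Z_i): each unit reveals one column of its science-table row."""
    return [row[q - 1] for row, q in zip(science_table, z)]


rng = random.Random(1)
z = complete_randomization([2, 3, 1], rng)  # N = 6 units, Q = 3 arms
print(z, Counter(z))
```

Every draw has exactly $N_q$ units in arm $q$, which is what distinguishes complete randomization from assigning each unit independently.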

Similar to the two-arm setting discussed in Section 2.1, in Neyman’s [1] framework, all potential outcomes are fixed and only the treatment indicators are random according to Definition 1.

We consider a general contrast matrix $F \in \mathbb{R}^{Q \times H}$ of full column rank, i.e., $F^\top 1_Q = 0_H$ and $\operatorname{rank}(F) = H$, and a set of individual treatment effects defined as the linear contrasts of the potential outcomes:

$$\tau_i = F^\top Y_i(\cdot), \tag{8}$$

where $Y_i(\cdot)$ is the vectorized potential outcome for unit $i$:

$$Y_i(\cdot) = (Y_i(1), \ldots, Y_i(Q))^\top.$$

The average effect is defined as

$$\tau = \frac{1}{N} \sum_{i=1}^N \tau_i = F^\top \bar{Y}(\cdot), \tag{9}$$

where $\bar{Y}(\cdot)$ is the vectorized average potential outcome:

$$\bar{Y}(\cdot) = \frac{1}{N} \sum_{i=1}^N Y_i(\cdot) = (\bar{Y}(1), \ldots, \bar{Y}(Q))^\top.$$

When $Q = 2$ and $F = (-1, 1)^\top$, $\tau$ in (9) reduces to the ATE in the treatment-control setting. Moreover, we can estimate $\tau$ by the following generalization of the difference-in-means estimator:

$$\hat{\tau} = F^\top \hat{Y}(\cdot), \tag{10}$$

where $\hat{Y}(\cdot) = (\hat{Y}(1), \ldots, \hat{Y}(Q))^\top$ is the vectorized sample average of observed outcomes for all treatment arms, with $\hat{Y}(q) = N_q^{-1} \sum_{i=1}^N Y_i 1\{Z_i = q\}$. The estimator in (10) has variance [59]

$$\operatorname{Var}\{\hat{\tau}\} = F^\top \operatorname{Diag}\left\{ \frac{1}{N_q} S(q, q) \right\}_{q=1}^Q F - \frac{1}{N} F^\top S F, \tag{11}$$

where $S \in \mathbb{R}^{Q \times Q}$ is a covariance matrix for the potential outcomes with the $(q, q')$th entry given by

$$S(q, q') = \frac{1}{N-1} \sum_{i=1}^N \{Y_i(q) - \bar{Y}(q)\} \{Y_i(q') - \bar{Y}(q')\}, \quad q, q' = 1, \ldots, Q, \tag{12}$$

and $F^\top S F$ is essentially the finite population covariance of the individual effects $\tau_i$’s in (8). A variance estimator for (11) is

$$\hat{V} = F^\top \operatorname{Diag}\left\{ \frac{1}{N_q} \hat{S}(q, q) \right\}_{q=1}^Q F, \tag{13}$$

where $\hat{S}(q, q)$ is the sample variance within the treatment level $q$:

$$\hat{S}(q, q) = \frac{1}{N_q - 1} \sum_{i=1}^N \{Y_i - \hat{Y}(q)\}^2 1\{Z_i = q\}. \tag{14}$$

Using (10) and (13), a Wald-type confidence region for τ is given by

$$\{ \tau : (\hat{\tau} - \tau)^\top \hat{V}^{-1} (\hat{\tau} - \tau) \le q_{H, \alpha} \}, \tag{15}$$

where $q_{H, \alpha}$ is the upper-$\alpha$ quantile of the $\chi_H^2$ distribution. The region (15) can be proved to be asymptotically valid under mild regularity conditions. More details are deferred to Sections 3 and 4.
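For a single contrast ($H = 1$), the quantities in (10), (13), and (14) are scalars, and the Wald region (15) is an interval because $q_{1, \alpha} = z_{\alpha/2}^2$. The sketch below covers only this special case; the function name `contrast_interval`, the contrast vector, and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def contrast_interval(y, z, f, alpha=0.05):
    """Estimate f' Ybar(.) and return the H = 1 case of the Wald region (15)."""
    if abs(sum(f)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    tau_hat, v_hat = 0.0, 0.0
    for q, f_q in enumerate(f, start=1):
        y_q = [yi for yi, zi in zip(y, z) if zi == q]
        tau_hat += f_q * mean(y_q)  # equation (10)
        v_hat += f_q ** 2 * variance(y_q) / len(y_q)  # equations (13) and (14)
    # For H = 1, the chi-square quantile q_{1,alpha} equals z_{alpha/2} squared.
    half = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(v_hat)
    return tau_hat, v_hat, (tau_hat - half, tau_hat + half)


# Hypothetical three-arm data; the contrast compares arm 1 with the
# average of arms 2 and 3.
y = [1.0, 3.0, 2.0, 6.0, 5.0, 9.0]
z = [1, 1, 2, 2, 3, 3]
print(contrast_interval(y, z, [-1.0, 0.5, 0.5]))
```

For $H > 1$ one would need a matrix inverse of $\hat{V}$ and a $\chi_H^2$ quantile, which this sketch deliberately leaves out.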

In the following, we give two remarks in parallel with Remarks 1 and 2. First, FRT has also been a popular tool for analyzing multi-level randomized experiments, which can be used to test sharp nulls and deliver finite-sample exact $p$-values [22] (see also [39,43,44] for the unification of Neyman’s and Fisher’s approaches). Second, similar to the treatment-control case, we can perform analysis with the regression-based approach. Zhao and Ding [60] studied general regression-based analyses in multi-level experiments.

2.2 Covariate adjustment

In many randomized experiments, there are pre-treatment covariates $X_1, \ldots, X_N$ for the $N$ units, where $X_i$’s are encoded as vectors in $\mathbb{R}^p$. Covariate adjustment has become a standard approach for analyzing randomized experiments and has been widely adopted in many fields. As one example, in 2023, the US Food and Drug Administration issued its final guidance for industry, Adjusting for Covariates in Randomized Clinical Trials for Drugs and Biological Products. This guidance describes the agency’s current recommendations regarding adjusting for covariates in the statistical analysis of randomized clinical trials in drug and biological product development programs. A natural question is how to optimally adjust for covariates in inference. The problem is nontrivial in several aspects: (i) the true relation between outcomes and covariates is usually unknown and (ii) the potential outcomes under different treatment levels are, in general, heterogeneous. Many research works have explored covariate adjustment from both practical and theoretical perspectives. It has become a standard practice to use a model-assisted method for covariate adjustment to gain efficiency for inference while being robust to model misspecification [34].

2.2.1 Fisher’s analysis of covariance (ANCOVA)

Historically, Fisher [61] proposed to use ANCOVA to improve estimation efficiency. This remains a standard strategy in many fields. He suggested running the OLS of $Y_i$ on $(1, Z_i, X_i)$ and using the coefficient of $Z_i$ as an estimator for $\tau$. Mathematically, let $\bar{X}$ be the mean of the covariates: $\bar{X} = N^{-1} \sum_{i=1}^N X_i$. Fisher’s ANCOVA estimator $\hat{\tau}_F$ is given by the following OLS output:

$$(\hat{\tau}_F, \hat{\alpha}_F, \hat{\beta}_F) = \arg\min_{\alpha, \tau \in \mathbb{R}, \, \beta \in \mathbb{R}^p} \sum_{i=1}^N \{Y_i - \alpha - Z_i \tau - (X_i - \bar{X})^\top \beta\}^2, \tag{16}$$

noting that the centering of covariates in (16) will not affect the OLS estimator $\hat{\tau}_F$.

Freedman [32,33] studied Fisher’s ANCOVA estimator under the CRE. He showed that $\hat{\tau}_F$ can be biased in the finite sample, but is consistent for the true average effect as the sample size goes to infinity. Moreover, he showed some negative results for Fisher’s ANCOVA estimator. First, the asymptotic variance of $\hat{\tau}_F$ can be even larger than that of the simple difference-in-means estimator $\hat{\tau}$ without adjusting for any covariates. Second, the standard error estimator from OLS can underestimate the true standard error of $\hat{\tau}_F$ under the CRE.

2.2.2 Lin’s estimator

In response to Freedman’s negative findings, Lin [34] proposed a remedy, which is called “Lin’s estimator” nowadays. Concretely speaking, he proposed to run OLS of $Y_i$ on $Z_i$ and $X_i$ as well as their interaction term $Z_i \times X_i$:

$$(\hat{\tau}_L, \hat{\alpha}_L, \hat{\beta}_L, \hat{\eta}_L) = \arg\min_{\alpha, \tau \in \mathbb{R}, \, \beta, \eta \in \mathbb{R}^p} \frac{1}{2N} \sum_{i=1}^N \{Y_i - \alpha - Z_i \tau - (X_i - \bar{X})^\top \beta - Z_i \times (X_i - \bar{X})^\top \eta\}^2. \tag{17}$$

Importantly, unlike (16), the centering of covariates here is critical since it will affect the OLS estimator $\hat{\tau}_L$.

Lin’s estimator is also consistent when the sample size $N$ goes to infinity. Moreover, it enjoys several benefits. First, the asymptotic variance of $\hat{\tau}_L$ is no larger than that of either $\hat{\tau}$ or $\hat{\tau}_F$. Second, the EHW variance estimator for (17) is asymptotically conservative for the true variance of $\hat{\tau}_L$. As a side note, the EHW variance estimator for (16) is also asymptotically conservative for the true variance of $\hat{\tau}_F$ (see Lin [34] for a more formal presentation of the theoretical results).

Besides the regression proposal, a second perspective for understanding Lin’s estimator is based on minimizing the true or estimated variance of linearly adjusted estimators [62]. Consider the following class of linearly covariate-adjusted estimators:

$$\begin{aligned} \hat{\tau}(\beta_1, \beta_0) &= N_1^{-1} \sum_{i=1}^N Z_i \{Y_i - (X_i - \bar{X})^\top \beta_1\} - N_0^{-1} \sum_{i=1}^N (1 - Z_i) \{Y_i - (X_i - \bar{X})^\top \beta_0\} \\ &= \{\hat{Y}(1) - (\hat{X}(1) - \bar{X})^\top \beta_1\} - \{\hat{Y}(0) - (\hat{X}(0) - \bar{X})^\top \beta_0\} \\ &= \hat{\tau} - \delta^\top \hat{\tau}_X, \end{aligned} \tag{18}$$

where $\hat{X}(1)$ and $\hat{X}(0)$ denote the averages of covariates in treatment and control groups, $\hat{\tau}_X = \hat{X}(1) - \hat{X}(0)$ denotes the difference-in-means of covariates, and $\delta = \frac{N_0}{N} \beta_1 + \frac{N_1}{N} \beta_0$ is a weighted average of the two linear adjustment coefficients. Obviously, the true variance of the covariate-adjusted estimator in (18) is minimized when $\delta$ is the least-squares coefficient from regressing the difference-in-means estimator $\hat{\tau}$ on the difference-in-means of covariates $\hat{\tau}_X$ under the CRE. Li and Ding [59] showed that this is further achieved when $\beta_1$ and $\beta_0$ are the least-squares coefficients from projecting the treatment and control potential outcomes on covariates, respectively. Moreover, since the potential outcomes cannot be fully observed, we can estimate the least-squares coefficients by their sample analogs $\hat{\beta}_1$ and $\hat{\beta}_0$, which are the least-squares coefficients from the linear projection of observed outcomes on covariates in treatment and control groups, respectively. The resulting covariate-adjusted estimator $\hat{\tau} - \hat{\delta}^\top \hat{\tau}_X$ with $\hat{\delta} = \frac{N_0}{N} \hat{\beta}_1 + \frac{N_1}{N} \hat{\beta}_0$ is actually identical to Lin’s estimator.

We then consider the estimated variance for the covariate-adjusted estimator in (18). We can essentially view the covariate-adjusted estimator as the difference-in-means estimator but with the adjusted potential outcomes $Y_i(1) - (X_i - \bar{X})^\top \beta_1$ and $Y_i(0) - (X_i - \bar{X})^\top \beta_0$. From the discussion in Section 2.1, a conservative variance estimator for (18) can be

$$\hat{V}(\beta_1, \beta_0) = \{N_1 (N_1 - 1)\}^{-1} \sum_{i=1}^N Z_i \{Y_i - \hat{\gamma}_1 - (X_i - \bar{X})^\top \beta_1\}^2 + \{N_0 (N_0 - 1)\}^{-1} \sum_{i=1}^N (1 - Z_i) \{Y_i - \hat{\gamma}_0 - (X_i - \bar{X})^\top \beta_0\}^2, \tag{19}$$

where $\hat{\gamma}_1$ and $\hat{\gamma}_0$ are the sample means of the adjusted outcomes for the treatment and control arms, respectively:

$$\hat{\gamma}_1 = \frac{1}{N_1} \sum_{i=1}^N Z_i \{Y_i - (X_i - \bar{X})^\top \beta_1\} \quad \text{and} \quad \hat{\gamma}_0 = \frac{1}{N_0} \sum_{i=1}^N (1 - Z_i) \{Y_i - (X_i - \bar{X})^\top \beta_0\}.$$

This formulation suggests choosing $\beta_1$ and $\beta_0$ to minimize the variance estimator $\hat{V}(\beta_1, \beta_0)$ to obtain a plug-in estimator for $\beta_1$ and $\beta_0$, which is equivalent to solving the following two regression problems for treated and control groups, respectively, with intercept terms $\gamma_1$ and $\gamma_0$ [51]:

$$\min_{\gamma_1, \beta_1} \sum_{i=1}^N Z_i \{Y_i - \gamma_1 - (X_i - \bar{X})^\top \beta_1\}^2 \quad \text{and} \quad \min_{\gamma_0, \beta_0} \sum_{i=1}^N (1 - Z_i) \{Y_i - \gamma_0 - (X_i - \bar{X})^\top \beta_0\}^2. \tag{20}$$

It is not difficult to see that the least-squares estimators for $\beta_1$ and $\beta_0$ from (20) are actually $\hat{\beta}_1$ and $\hat{\beta}_0$ defined before. Consequently, the resulting covariate-adjusted estimator $\hat{\tau}(\hat{\beta}_1, \hat{\beta}_0)$ is equivalent to Lin’s estimator $\hat{\tau}_L$. In addition, the corresponding variance estimator constructed as in (19) is asymptotically equivalent to the EHW variance estimator suggested by Lin [34].

From the above, Lin’s estimator not only achieves the minimum true variance among all linearly covariate-adjusted estimators in (18), but also achieves the minimum estimated variance when we use the conservative variance estimator of form (19). A subtle issue here is that Lin’s estimator uses estimated coefficients rather than fixed ones. With the technical tools discussed later, we can prove that Lin’s estimator and the one with the oracle adjustment coefficients are asymptotically equivalent (see, e.g., Li and Ding [59]).
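The variance-minimization route to Lin’s estimator can be sketched for a single covariate ($p = 1$), where each arm-wise regression in (20) reduces to a simple least-squares slope. The code computes $\hat{\tau} - \hat{\delta} \hat{\tau}_X$ and the plug-in version of (19). The function names and the noiseless toy data are hypothetical, and the scalar-covariate restriction is an assumption made to keep the example dependency-free.

```python
from statistics import mean


def arm_fit(x, y):
    """Arm-wise least squares of y on a scalar x: (mean x, mean y, slope, RSS)."""
    xb, yb = mean(x), mean(y)
    beta = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / sum(
        (a - xb) ** 2 for a in x
    )
    rss = sum((b - yb - beta * (a - xb)) ** 2 for a, b in zip(x, y))
    return xb, yb, beta, rss


def lin_estimator(y, z, x):
    """Lin's estimate via (18) with plug-in slopes, and the variance estimate (19)."""
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    n1, n0 = len(y1), len(y0)
    n = n1 + n0
    xb1, yb1, beta1, rss1 = arm_fit(x1, y1)
    xb0, yb0, beta0, rss0 = arm_fit(x0, y0)
    delta_hat = n0 / n * beta1 + n1 / n * beta0
    tau_lin = (yb1 - yb0) - delta_hat * (xb1 - xb0)
    # The arm-wise residual sums of squares are the two sums in (19).
    v_hat = rss1 / (n1 * (n1 - 1)) + rss0 / (n0 * (n0 - 1))
    return tau_lin, v_hat


# Hypothetical noiseless data with a constant effect of 1.5: Y = 2 + 3 X + 1.5 Z.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
z = [1, 0, 1, 0, 0, 1]
y = [2 + 3 * xi + 1.5 * zi for xi, zi in zip(x, z)]
tau_lin, v_hat = lin_estimator(y, z, x)
print(tau_lin, v_hat)
```

In this toy example the unadjusted difference in means is 0.5 because the covariate happens to be imbalanced across arms, whereas the adjusted estimate recovers the constant effect of 1.5 up to rounding.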

2.2.3 Further extensions

There are a variety of extensions of covariate adjustment beyond treatment-control CREs.

First, it is natural to consider generalization to multiple treatment levels. Lu [63] studied covariate adjustment in $2^K$ factorial designs by extending (20) to multi-level settings. Zhao and Ding [60] considered covariate adjustment in general multi-level experiments and made a comprehensive comparison among the unadjusted estimator, Fisher’s ANCOVA, and Lin’s estimator. The unadjusted estimator is given by the regression:

$$(\hat{\gamma}_{1,N}, \ldots, \hat{\gamma}_{Q,N}) = \arg\min_{\gamma_1, \ldots, \gamma_Q} \sum_{i=1}^N \left\{ Y_i - \sum_{q=1}^Q \gamma_q 1\{Z_i = q\} \right\}^2. \tag{21}$$

The generalization of Fisher’s ANCOVA is given by the following additive treatment regression:

$$(\hat{\gamma}_{1,F}, \ldots, \hat{\gamma}_{Q,F}, \hat{\eta}_F) = \arg\min_{\gamma_1, \ldots, \gamma_Q, \eta} \sum_{i=1}^N \left\{ Y_i - \sum_{q=1}^Q \gamma_q 1\{Z_i = q\} - (X_i - \bar{X})^\top \eta \right\}^2. \tag{22}$$

Meanwhile, Lin’s estimator can be generalized from either the regression with interaction perspective (17) or the (estimated) variance minimization perspective (18) or (20). Here, we present the former one, which applies the following fully interacted regressions:

$$(\hat{\gamma}_{1,L}, \ldots, \hat{\gamma}_{Q,L}, \hat{\eta}_{1,L}, \ldots, \hat{\eta}_{Q,L}) = \arg\min_{\gamma_1, \ldots, \gamma_Q, \eta_1, \ldots, \eta_Q} \sum_{i=1}^N \left\{ Y_i - \sum_{q=1}^Q \gamma_q 1\{Z_i = q\} - \sum_{q=1}^Q 1\{Z_i = q\} (X_i - \bar{X})^\top \eta_q \right\}^2. \tag{23}$$

With the vectorized slope estimates $\hat{\gamma}_* = (\hat{\gamma}_{1,*}, \ldots, \hat{\gamma}_{Q,*})^\top$, where $* = N, F, L$, an estimator for the target average effect (9) is given by the plug-in estimator

$$\hat{\tau}_* = F^\top \hat{\gamma}_*, \quad * = N, F, L.$$

Besides, we can obtain EHW variance estimators $\hat{V}_{\mathrm{EHW},*}$, which are conservative in large samples. In the multi-level CRE, Lin’s estimator is also guaranteed to be at least as efficient as Fisher’s ANCOVA and Neyman’s difference-in-means estimator.

Second, covariate adjustment has also been discussed in treatment-control trials when the dimension of the covariates is diverging or high-dimensional. For example, Lei and Ding [64] proposed the following debiased estimator in treatment-control experiments:

$$\hat{\tau}_{\mathrm{adj}}^{\mathrm{de}} = \hat{\tau}_L - \left( \frac{N_1}{N_0} \hat{\Delta}_0 - \frac{N_0}{N_1} \hat{\Delta}_1 \right),$$

where $\hat{\Delta}_z = N_z^{-1} \sum_{i: Z_i = z} \hat{e}_i H_{ii}$ for $z = 0, 1$, $\hat{e}_i$ is the $i$th residual from Lin’s estimator (17), and $H_{ii}$ is the $i$th diagonal element of the hat matrix $H = X (X^\top X)^{-1} X^\top$, where $X$ is an $N \times p$ matrix with rows consisting of the covariates for the $N$ units. Under some structural conditions, the estimator $\hat{\tau}_{\mathrm{adj}}^{\mathrm{de}}$ achieves asymptotic normality if the following condition holds:

$$\kappa^2 p \log p = o(1), \quad \text{where } \kappa = \max_{i = 1, \ldots, N} H_{ii}.$$

In the favorable case where $\kappa = O(p/N)$, the dimension $p$ is allowed to grow as fast as $o(N^{2/3} / (\log N)^{1/3})$, which is a strictly weaker restriction than that required for $\hat{\tau}_L$ (see also the study of Lu et al. [65] for some recent development that allows $p$ to be of the same order as $N$). As another example, Bloniarz et al. [66] considered the LASSO estimator for covariate adjustment in the high-dimensional regime. Under a sparse linear model and some regularity conditions, the LASSO-adjusted regression estimator is asymptotically normal and the asymptotic variance is not greater than that of the difference-in-means estimator.
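The stated growth rate follows from the displayed condition by a short calculation, sketched here under the assumption that $\kappa \le C p / N$ for a generic constant $C$ (the constant is our notation, not that of [64]) and that $p \le N$, so that $\log p \le \log N$.

```latex
\kappa^2 \, p \log p
  \;\le\; C^2 \, \frac{p^3 \log p}{N^2}
  \;\le\; C^2 \, \frac{p^3 \log N}{N^2},
\qquad\text{and}\qquad
\frac{p^3 \log N}{N^2} \to 0
  \;\Longleftrightarrow\;
  p = o\!\left( \frac{N^{2/3}}{(\log N)^{1/3}} \right).
```

Hence any $p$ growing more slowly than $N^{2/3} / (\log N)^{1/3}$ satisfies $\kappa^2 p \log p = o(1)$ in this favorable case.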

Third, some works explored other variants of Lin’s estimator. For example, Zhao and Ding [60] studied restricted least-squares and established for the first time its properties for inferring the ATE from the design-based perspective. Guo and Basse [38] considered generalized Oaxaca–Blinder estimators and extended the covariate adjustment framework from linear models to nonlinear ones (see also the study of Cohen and Fogarty [67]).

2.3 Rerandomization

Neyman [1] focused on CREs, which can balance all potential confounding factors, whether observed or unobserved, on average and justify the intuitive difference-in-means estimator for estimating the ATE. In practice, in the design stage of an experiment, we often have access to a (rich) set of pretreatment covariates, and it has become routine to check whether these covariates are balanced between different treatment groups. As commented by Morgan and Rubin [68], for a realized treatment allocation, the covariates are likely to be imbalanced; for example, with ten independent covariates, at least one of the $t$-statistics for checking the imbalance of these covariates will exceed 2 in absolute value with a probability of about 40%. It is then natural to incorporate the pretreatment covariate information into the design, aiming to get more balanced treatment groups as well as more efficient inference for treatment effects.
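The figure of about 40% can be checked with a back-of-the-envelope calculation. As a simplifying assumption of ours (not a derivation taken from [68]), treat the ten $t$-statistics $t_1, \ldots, t_{10}$ as independent and approximately standard normal, with $\Phi$ the standard normal distribution function:

```latex
P\Big( \max_{1 \le j \le 10} |t_j| > 2 \Big)
  = 1 - \big\{ 1 - 2\,(1 - \Phi(2)) \big\}^{10}
  \approx 1 - (0.9545)^{10}
  \approx 0.37 .
```

So roughly four in ten complete randomizations would show at least one covariate with a nominally notable imbalance, even though the design is valid.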

Blocking is a classical and popular approach that can balance a few discrete covariates, but its implementation is not obvious once we have many continuous covariates. Rerandomization, a design recently formally proposed by Morgan and Rubin [68], provides a general approach to improve covariate balance, although its idea has existed for a long time in the literature and dates back to many earlier works [69–74]. In a recent survey of researchers conducting randomized experiments in developing countries [75], the authors discovered that rerandomization has been commonly used in practice. For example, Lee et al. [76] conducted a rerandomized experiment to study the effect of mobile banking for rural households and their migrated family members.

Under a general rerandomization design, for a randomly drawn treatment allocation, we will check the covariate balance between different treatment groups and see whether it satisfies a prespecified covariate balance criterion; if the balance criterion is met, we proceed to the actual experiment with this treatment allocation; otherwise, we redraw the treatment allocation and will keep redrawing until the balance criterion is met. Although the balance criterion can be general, in the context of a treatment-control experiment, Morgan and Rubin [68] suggested a balance criterion based on the Mahalanobis distance:

$$M = \hat{\tau}_X^\top \{\mathrm{Cov}(\hat{\tau}_X)\}^{-1} \hat{\tau}_X = \frac{N_1 N_0}{N}\, \hat{\tau}_X^\top (S_X^2)^{-1} \hat{\tau}_X,$$

recalling that $N_1$ and $N_0$ are the treated and control group sizes, $\hat{\tau}_X$ is the difference-in-means of covariates defined as in Section 2.2.2, and $S_X^2$ is the finite population covariance matrix of covariates defined as follows:

$$S_X^2 = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^\top.$$

Under rerandomization using the Mahalanobis distance (ReM), we repeatedly draw random treatment assignments from the CRE until we obtain one whose Mahalanobis distance $M$ is bounded by a prespecified threshold $a$, i.e., $M \le a$.
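The ReM procedure described above can be sketched as an accept/reject loop. This is a minimal illustration, not the implementation used in any of the cited works: the function names `mahalanobis_imbalance` and `rerandomize`, the cap `max_draws`, and the simulated covariates are all assumptions made for the example.

```python
import numpy as np

def mahalanobis_imbalance(X, z):
    """M = (N1 * N0 / N) * tau_X' (S_X^2)^{-1} tau_X for a 0/1 assignment vector z."""
    X = np.asarray(X, dtype=float)
    z = np.asarray(z)
    n, n1 = len(z), int(z.sum())
    n0 = n - n1
    # Difference-in-means of the covariates between treated and control units.
    tau_x = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
    # Finite population covariance matrix with divisor N - 1.
    s2 = np.atleast_2d(np.cov(X, rowvar=False))
    return (n1 * n0 / n) * float(tau_x @ np.linalg.solve(s2, tau_x))

def rerandomize(X, n1, a, rng, max_draws=100_000):
    """Draw complete randomizations until the Mahalanobis distance is at most a."""
    n = len(X)
    for _ in range(max_draws):
        z = np.zeros(n, dtype=int)
        z[rng.choice(n, size=n1, replace=False)] = 1
        m = mahalanobis_imbalance(X, z)
        if m <= a:
            return z, m
    raise RuntimeError("no acceptable assignment found within max_draws")

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))           # hypothetical covariates: N = 50, K = 3
z, m = rerandomize(X, n1=25, a=1.0, rng=rng)
print(int(z.sum()), round(m, 3))
```

A smaller threshold `a` yields better balance but a lower acceptance probability, so more draws are needed on average.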

Importantly, the analysis for rerandomization needs to take into account the selection step in its design. This is often ignored in practice, and rerandomization is often analyzed as if it were a CRE. Morgan and Rubin [68] proposed randomization tests for sharp null hypotheses, employing assignments generated randomly in accordance with the rerandomization protocol. More recently, Li et al. [77] conducted Neyman-type large-sample inference for rerandomization, considering also the intuitive difference-in-means estimator. They demonstrated that, asymptotically, the difference-in-means estimator is more concentrated around the true ATE under ReM than under the CRE, with smaller asymptotic variance and shorter asymptotic quantile ranges, and proposed confidence intervals for the average effect that are never longer than Neyman’s intervals for the CRE while remaining asymptotically valid under ReM.

In recent years, rerandomization has been extended to more general experiments, such as factorial experiments [78,79], blocked experiments [80,81], and survey experiments [58], and it can also be combined with the covariate adjustment discussed in Section 2.2 [82]. Zhao and Ding [83] studied the procedure of conducting rerandomization directly based on $p$-values from covariate balance tests, which is a general strategy that works for many basic designs. An alternative rerandomization scheme that randomizes treatment assignments multiple times and chooses the one with the best covariate balance has also been used in practice [75], and its properties have recently been studied in the work of Wang and Li [84].

3 Permutational CLTs

With all the design and analysis strategies introduced earlier, one natural question is how to theoretically justify their statistical properties. In the following two sections, we focus on the technical aspects of CREs. The main question to answer is how to deliver valid inference with different estimators for different designs. Permutational/combinatorial CLTs and BEBs are core to the technical development of randomization-based inference. In Sections 3 and 4, we summarize the theoretical results regarding permutational CLTs and BEBs and discuss their application in analyzing randomized experiments.

3.1 Sample average under simple random sampling

We start with simple random sampling from a finite population [62]. Let $\{a_N(i)\}_{i=1}^{N}$ be a sequence of real numbers. Suppose we randomly sample $N_1$ elements without replacement from the population and use a binary variable $Z_i$ to indicate the sampling status of the $i$th element, i.e., $Z_i = 1$ indicates that $a_N(i)$ is sampled while $Z_i = 0$ indicates that it is not. Write $N_0 = N - N_1$. Consider the sample average obtained from the aforementioned procedure:

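$$\Gamma = \frac{1}{N_1} \sum_{i=1}^{N} a_N(i)\, \mathbb{1}\{Z_i = 1\}. \tag{24}$$

$\Gamma$ has mean and variance

$$E\{\Gamma\} = \bar{a}_N, \qquad V_N = \mathrm{Var}\{\Gamma\} = \left(\frac{1}{N_1} - \frac{1}{N}\right) S_N^2,$$

where

$$\bar{a}_N = \frac{1}{N} \sum_{i=1}^{N} a_N(i), \qquad S_N^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2.$$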
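The mean and variance formulas for the sample average under simple random sampling can be verified exactly on a small population by enumerating every possible sample. The population values below are made up for illustration.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

a = [2.0, 5.0, 1.0, 7.0, 4.0, 9.0]   # hypothetical finite population
N, N1 = len(a), 3

# Sample average for each of the C(N, N1) equally likely samples.
gammas = [mean(a[i] for i in s) for s in combinations(range(N), N1)]

exact_mean = mean(gammas)
exact_var = pvariance(gammas)                      # variance over the sampling distribution
formula_mean = mean(a)                             # a_bar_N
formula_var = (1 / N1 - 1 / N) * variance(a)       # (1/N1 - 1/N) * S_N^2, divisor N - 1

print(exact_mean, formula_mean)
print(exact_var, formula_var)
```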

One fundamental technical question is to establish CLTs for $\Gamma$ to characterize its asymptotic distribution. Erdős and Rényi [85] established the following CLT for (24):

Proposition 1

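If for any $\varepsilon > 0$,

$$\lim_{N \to \infty} \frac{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2 \mathbb{1}\left\{|a_N(i) - \bar{a}_N| > \varepsilon N_1 \sqrt{V_N}\right\}}{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2} = 0, \tag{25}$$

then as $N \to \infty$,

$$\frac{\Gamma - E\{\Gamma\}}{\sqrt{V_N}} \rightsquigarrow \mathcal{N}(0, 1).$$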
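To get a feel for the Lindeberg-type ratio in (25), the sketch below evaluates it for two hypothetical populations: an evenly spread one, for which the ratio is zero once $N$ is moderately large, and one dominated by a single outlier, for which the ratio stays close to one. The threshold $\varepsilon N_1 \sqrt{V_N}$ is read off from Condition (25) as stated in Proposition 1; the function name and the example sequences are assumptions for illustration.

```python
import math

def lindeberg_ratio(a, n1, eps):
    """Share of the total squared deviation carried by units with |a_i - a_bar| > eps * N1 * sqrt(V_N)."""
    n = len(a)
    a_bar = sum(a) / n
    sq_dev = [(x - a_bar) ** 2 for x in a]
    s2 = sum(sq_dev) / (n - 1)                 # S_N^2
    v_n = (1.0 / n1 - 1.0 / n) * s2            # Var of the sample average
    threshold = eps * n1 * math.sqrt(v_n)
    large = sum(d for d in sq_dev if math.sqrt(d) > threshold)
    return large / sum(sq_dev)

N = 400
spread = [i / N for i in range(1, N + 1)]      # bounded, evenly spread values
outlier = [0.0] * (N - 1) + [float(N)]         # one unit dominates the variance

ratio_spread = lindeberg_ratio(spread, n1=N // 2, eps=0.2)
ratio_outlier = lindeberg_ratio(outlier, n1=N // 2, eps=0.2)
print(ratio_spread, ratio_outlier)
```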

Hájek [86] further proved that Condition (25) is not only sufficient but also necessary provided that $N_1 \to \infty$ and $N_0 \to \infty$. Moreover, Proposition 1 covers some other works on finite population CLTs. For example, Madow [87] proved asymptotic normality under the conditions that $N_1 \to \infty$, that there exists $\delta \in (0, 1)$ such that $N_1 / N < 1 - \delta$ when $N$ is sufficiently large, and that the following moment condition holds:

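$$\frac{N^{-1} \sum_{i=1}^{N} |a_N(i) - \bar{a}_N|^r}{\left\{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2\right\}^{r/2}} = O(1), \quad \text{for all integers } r > 2. \tag{26}$$

The aforementioned moment condition (26) is stronger than (25), because for any $r > 2$,

$$\begin{aligned}
\frac{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2 \mathbb{1}\left\{|a_N(i) - \bar{a}_N| > \varepsilon N_1 \sqrt{V_N}\right\}}{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2}
&= \left(1 - \frac{N_1}{N}\right) N_1 \varepsilon^2 \, (N-1)^{-1} \sum_{i=1}^{N} \left|\frac{a_N(i) - \bar{a}_N}{\varepsilon N_1 \sqrt{V_N}}\right|^2 \mathbb{1}\left\{|a_N(i) - \bar{a}_N| > \varepsilon N_1 \sqrt{V_N}\right\} \\
&\le \left(1 - \frac{N_1}{N}\right) N_1 \varepsilon^{2-r} \, (N-1)^{-1} \sum_{i=1}^{N} \left|\frac{a_N(i) - \bar{a}_N}{N_1 \sqrt{V_N}}\right|^r \\
&\lesssim \frac{1}{N_1^{r/2 - 1}} \cdot \frac{N^{-1} \sum_{i=1}^{N} |a_N(i) - \bar{a}_N|^r}{\left\{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2\right\}^{r/2}},
\end{aligned}$$

which converges to zero under (26) because $N_1 \to \infty$.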

David [88] established a CLT for the hypergeometric distribution, which is a special case of Madow [87], so its condition is also stronger than (25). Li and Ding [59, Section 2.1] also provided a thorough exposition of the CLT under the simple random sampling scheme with a sufficient condition based on the maximum squared distance.

3.2 Simple linear rank statistics

The sample average in (24) from simple random sampling (SRS) is a special case of a more general type of permutational statistics, called simple linear rank statistics. Formally, let $\{a_N(i)\}_{i=1}^{N}$ and $\{b_N(i)\}_{i=1}^{N}$ be two sequences of real numbers. Let $\pi$ be a random permutation over the indices $1, \ldots, N$, with $\pi(i)$ denoting the permuted index of $i$. A simple linear rank statistic is defined as

$$\Gamma = \sum_{i=1}^{N} a_N(i)\, b_N(\pi(i)), \tag{27}$$

which has mean and variance

$$E\{\Gamma\} = N \bar{a}_N \bar{b}_N, \qquad V_N = \mathrm{Var}\{\Gamma\} = \frac{1}{N-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2 \sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^2.$$
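As a sanity check on these moment formulas, the sketch below enumerates all $N!$ permutations for a small made-up example. The scores $b_N(i)$ are chosen as $1/N_1$ for the first $N_1 = 2$ indices and $0$ otherwise, so the statistic coincides with a sample average under simple random sampling.

```python
from itertools import permutations
from statistics import mean, pvariance

a = [1.0, 4.0, 2.0, 8.0, 5.0]        # hypothetical values a_N(i)
b = [0.5, 0.5, 0.0, 0.0, 0.0]        # b_N(i) = 1/N1 for i <= N1 = 2, and 0 otherwise
N = len(a)

# Gamma = sum_i a(i) * b(pi(i)) for every permutation pi of the indices.
gammas = [sum(a[i] * b[p[i]] for i in range(N)) for p in permutations(range(N))]

a_bar, b_bar = mean(a), mean(b)
formula_mean = N * a_bar * b_bar
formula_var = (
    sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b) / (N - 1)
)

print(mean(gammas), formula_mean)
print(pvariance(gammas), formula_var)
```

In this example the mean equals $\bar{a}_N$ and the variance equals $(1/N_1 - 1/N) S_N^2$, matching the simple random sampling formulas of Section 3.1.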

In particular, if we take $b_N(i) = N_1^{-1}$ for $i = 1, \ldots, N_1$ and $b_N(i) = 0$ for $i = N_1 + 1, \ldots, N$, then (27) gives the sample average (24) in SRS. The statistic in (27) has been studied by many researchers. Wald and Wolfowitz [89] established a CLT under the following condition: for all integers $r > 2$,

$$\frac{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^r}{\left\{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2\right\}^{r/2}} = O(1) \quad \text{and} \quad \frac{N^{-1} \sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^r}{\left\{N^{-1} \sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^2\right\}^{r/2}} = O(1).$$

Noether [90] proved a CLT under the following condition, which is slightly weaker than that of Wald and Wolfowitz [89]: for all integers $r > 2$,

$$\frac{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^r}{\left\{N^{-1} \sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2\right\}^{r/2}} = O(1) \quad \text{and} \quad \frac{\sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^r}{\left\{\sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^2\right\}^{r/2}} = o(1),$$

which, however, is not symmetric in the $a_N(i)$’s and $b_N(i)$’s. Hoeffding [91] further proved a CLT under a weaker and symmetric condition: for all integers $r > 2$,

$$N^{r/2 - 1}\, \frac{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^r}{\left\{\sum_{i=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2\right\}^{r/2}} \cdot \frac{\sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^r}{\left\{\sum_{i=1}^{N} \left(b_N(i) - \bar{b}_N\right)^2\right\}^{r/2}} = o(1). \tag{28}$$

Motoo [92] proved that CLT holds under an even weaker Lindeberg-type condition:

Proposition 2

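Suppose that for any $\varepsilon > 0$,

$$\lim_{N \to \infty} \frac{\sum_{i,j=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2 \left(b_N(j) - \bar{b}_N\right)^2 \mathbb{1}\left\{\left|\left(a_N(i) - \bar{a}_N\right)\left(b_N(j) - \bar{b}_N\right)\right| > \varepsilon \sqrt{V_N}\right\}}{\sum_{i,j=1}^{N} \left(a_N(i) - \bar{a}_N\right)^2 \left(b_N(j) - \bar{b}_N\right)^2} = 0. \tag{29}$$

Then,

$$\frac{\Gamma - E\{\Gamma\}}{\sqrt{\mathrm{Var}\{\Gamma\}}} \rightsquigarrow \mathcal{N}(0, 1).$$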

Hájek [93] proved further that Condition (29) is not only sufficient but also necessary, and presented a comprehensive comparison of the CLT conditions introduced in the literature. There are also several multidimensional extensions based on the Cramér-Wold device (see, for example, refs [94], [93, Section 7], and [95, Lemma S.3.3]).

3.3 General univariate linear permutational statistics

Taking one step further from the simple linear rank statistics, the permutational CLTs are proposed for the following linear permutational statistic:

$$\Gamma = \sum_{i=1}^{N} M_N(i, \pi(i)), \tag{30}$$

where $\{M_N(i, j)\}_{i, j \in [N]}$ is a matrix in $\mathbb{R}^{N \times N}$. In particular, if we take $M_N(i, j) = a_N(i)\, b_N(j)$, (30) recovers (27). Hoeffding [91] computed the mean and variance of (30):

$$E\{\Gamma\} = \frac{1}{N} \sum_{i,j=1}^{N} M_N(i, j) \quad \text{and} \quad V_N = \mathrm{Var}\{\Gamma\} = \frac{1}{N-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2,$$

where $\tilde{M}_N(i, j)$ is the centered array based on the following rule:

$$\tilde{M}_N(i, j) = M_N(i, j) - \frac{1}{N} M_N(i, +) - \frac{1}{N} M_N(+, j) + \frac{1}{N^2} M_N(+, +), \tag{31}$$

where “+” means summation over the corresponding index.
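The centering rule (31) and Hoeffding's mean and variance formulas can be checked numerically by brute force. The following sketch uses an arbitrary randomly generated $5 \times 5$ matrix (an assumption for illustration) and enumerates all $5! = 120$ permutations.

```python
import random
from itertools import permutations
from statistics import mean, pvariance

rng = random.Random(0)
N = 5
M = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(N)]   # arbitrary matrix

row = [sum(M[i]) for i in range(N)]                          # M(i, +)
col = [sum(M[i][j] for i in range(N)) for j in range(N)]     # M(+, j)
tot = sum(row)                                               # M(+, +)

# Doubly centered matrix from rule (31).
M_c = [[M[i][j] - row[i] / N - col[j] / N + tot / N**2 for j in range(N)]
       for i in range(N)]

# Exact distribution of Gamma = sum_i M(i, pi(i)) over all permutations.
gammas = [sum(M[i][p[i]] for i in range(N)) for p in permutations(range(N))]

formula_mean = tot / N
formula_var = sum(x * x for r in M_c for x in r) / (N - 1)

print(mean(gammas), formula_mean)
print(pvariance(gammas), formula_var)
```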

Moreover, Hoeffding [91] showed that the asymptotic normality of Γ in (30) holds under the following condition:

$$\lim_{N \to \infty} \frac{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^r}{\left\{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2\right\}^{r/2}} = 0, \quad \text{for all integers } r > 2. \tag{32}$$

Condition (32) is equivalent to Condition (28) in the simple linear rank statistics setting. A more compact sufficient condition for (32) is also provided in Hoeffding [91]:

$$\lim_{N \to \infty} \frac{\max_{i, j \in [N]} \tilde{M}_N(i, j)^2}{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2} = 0. \tag{33}$$

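Motoo [92] weakened Hoeffding’s [91] condition (32) to the following Lindeberg-type condition:

Proposition 3

(Main theorem of Motoo [92]) Suppose for any $\varepsilon > 0$,

$$\lim_{N \to \infty} \frac{\sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2\, \mathbb{1}\left\{|\tilde{M}_N(i, j)| > \varepsilon \sqrt{V_N}\right\}}{\sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2} = 0.$$

Then,

$$\frac{\Gamma - E\{\Gamma\}}{\sqrt{\mathrm{Var}\{\Gamma\}}} \rightsquigarrow \mathcal{N}(0, 1).$$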

Remark 3

Although Proposition 3 gives the weakest condition for the permutational CLT in the literature, it is not very convenient to use in many concrete examples. In contrast, Condition (33) involves only the maximum of the centered matrix and is simpler to use and interpret. Condition (33) and its multivariate generalization (presented in (36) in Section 3.4) are frequently applied to investigate the properties of the various analysis and design strategies for randomized experiments presented in Section 2; for example, they can facilitate the proof of the convergence of variance estimators. We will have more discussion in Section 3.5.

3.4 General multivariate linear permutational statistics

We now discuss the generalization of (32) to a multivariate case. Concretely, define the multivariate linear permutational statistics:

$$\Gamma = (\Gamma_1, \ldots, \Gamma_H)^\top, \qquad \Gamma_h = \sum_{i=1}^{N} M_{N,h}(i, \pi(i)), \tag{34}$$

where $\{M_{N,h}(i, j)\}_{i, j \in [N]}$, $h = 1, \ldots, H$, are $H$ matrices in $\mathbb{R}^{N \times N}$. Shi and Ding [96, Appendix A.1] presented many basic facts about (34), including its mean and covariance calculation and its standardization. Fraser [94] extended Hoeffding [91] to the multi-dimensional setting by applying the Cramér–Wold device to establish a multivariate CLT. More concretely, define the centered version $\tilde{M}_{N,h}$ of $M_{N,h}$ in the same way as (31). Fraser [94] proposed the following condition for the CLT as an extension of (32):

$$\lim_{N \to \infty} \frac{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_{N,h}(i, j)^r}{\left\{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_{N,h}(i, j)^2\right\}^{r/2}} = 0, \quad \text{for all integers } r > 2 \text{ and } h \in [H]. \tag{35}$$

Similarly, Fraser [94] also provided a sufficient condition for (35):

$$\lim_{N \to \infty} \frac{\max_{i, j \in [N]} \tilde{M}_{N,h}(i, j)^2}{N^{-1} \sum_{i,j=1}^{N} \tilde{M}_{N,h}(i, j)^2} = 0, \quad \text{for } h \in [H]. \tag{36}$$

The condition in (36) is further utilized by Li and Ding [59] to build asymptotic normality results for analyzing treatment effects in multi-level CREs.

3.5 Application of permutational CLT in randomization-based inference

In this subsection, we collect several theoretical arguments in randomization-based inference that apply permutational CLTs to deliver technical justification for studying ATEs.

3.5.1 Wald-type inference in CREs

Consider analyzing a multi-level CRE discussed in Section 2.1.2 and adopt the notation introduced there. Li and Ding [59] proved the following result to justify the asymptotic validity of the confidence region (15) under several regularity conditions.

Proposition 4

(Theorem 5 and Proposition 3 of Li and Ding [59]) Let $Q$ be fixed and $N$ go to infinity. If the covariance matrix $S$ has a limiting value, each $N_q / N$ has a positive limiting value, and

$$\max_{1 \le q \le Q} \max_{1 \le i \le N} \frac{\{Y_i(q) - \bar{Y}(q)\}^2}{N} \to 0,$$

then the following conclusions hold:

  1. Asymptotic normality. $N \mathrm{Var}\{\hat{\tau}\}$ has a positive semidefinite limiting value $V$, and

    $\sqrt{N} (\hat{\tau} - \tau) \rightsquigarrow \mathcal{N}(0, V)$,

    where $\tau$ and $\hat{\tau}$ are the ATE and the corresponding estimator in (9) and (10).

  2. Variance estimation. The sample variance $\hat{S}(q, q)$ in (14) is consistent for $S(q, q)$ in (12).

  3. Wald-type inference. If the limit of $N F^\top \mathrm{Diag}\{N_q^{-1} S(q, q)\} F$ is nonsingular, then $\hat{V}$ in (13) is nonsingular with probability converging to one, and the Wald-type confidence region (15) has asymptotic coverage rate at least $1 - \alpha$. Moreover, the asymptotic coverage rate equals $1 - \alpha$ if and only if the causal effects are asymptotically additive, in the sense that $\lim_{N \to \infty} F^\top S F = 0$.

We briefly comment on the technical details behind Proposition 4. Proposition 4 (i) utilized the permutational CLT for general linear permutational statistics. In particular, the estimator $\hat{\tau}$ follows the same distribution as (34) with matrices $M_{N,h}$ defined as follows: for $i, j \in [N]$,

$$M_{N,h}(i, j) = N_q^{-1} F(q, h)\, Y_i(q), \qquad N_{q-1} + 1 \le j \le N_q,$$

where $N_0 = 0$ and $F(q, h)$ is the $(q, h)$th element of the contrast matrix $F$. Now, applying Condition (36), we can obtain the regularity conditions on potential outcomes in Proposition 4 and justify the multivariate asymptotic normality of the estimator $\hat{\tau}$. Proposition 4 (ii) applied the Chebyshev inequality to the sample variance estimators and showed their consistency. Proposition 4 (iii) combined (i) and (ii) and formally established the asymptotic validity of (15). As a side remark, the matrix $V$ is not required to be invertible, because the multivariate combinatorial CLT proved by Fraser [94] does not require an invertible limit for the covariance matrix. However, to justify the validity of the Wald-type confidence regions in part (iii), invertibility is required. Proposition 4 covers the treatment-control experiments as special cases. In other words, under certain regularity conditions, the difference-in-means estimator $\hat{\tau}$ in (2) is asymptotically normal, and the variance estimator $\hat{V}$ in (4) is consistent for a limit that is no less than the true asymptotic variance of $\hat{\tau}$. These then justify the asymptotic validity of the level-$(1 - \alpha)$ confidence interval in (5).
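For the treatment-control special case, the resulting Wald-type procedure can be sketched in a few lines. Equations (2), (4), and (5) are not reproduced in this section, so the sketch assumes their standard Neyman forms: the difference in means, the variance estimator $s_1^2 / N_1 + s_0^2 / N_0$ built from the within-arm sample variances, and a normal-quantile interval. The function name `neyman_ci` and the toy data are illustrative.

```python
from statistics import NormalDist, mean, variance

def neyman_ci(y, z, alpha=0.05):
    """Difference in means, conservative variance estimate, and Wald-type interval."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    tau_hat = mean(y1) - mean(y0)
    # Sum of within-arm sample variances (divisor N_z - 1) over arm sizes.
    v_hat = variance(y1) / len(y1) + variance(y0) / len(y0)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * v_hat ** 0.5
    return tau_hat, v_hat, (tau_hat - half_width, tau_hat + half_width)

y = [3.0, 5.0, 4.0, 6.0, 1.0, 2.0, 2.0, 3.0]   # hypothetical observed outcomes
z = [1, 1, 1, 1, 0, 0, 0, 0]                   # treatment indicators
tau_hat, v_hat, ci = neyman_ci(y, z)
print(tau_hat, v_hat, ci)
```

The interval is conservative in the sense described above: its variance estimator targets a limit that is no smaller than the true variance of the difference in means.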

3.5.2 Analyzing covariate adjustment

For covariate adjustment, we present a result by Zhao and Ding [60]. Let $\hat{Y}_* \in \mathbb{R}^Q$ ($* = \mathrm{N}, \mathrm{F}, \mathrm{L}$) be the estimators for the averaged potential outcomes across all $Q$ treatment levels from three estimation strategies: $\hat{Y}_{\mathrm{N}}$ from Neyman’s approach, $\hat{Y}_{\mathrm{F}}$ from Fisher’s ANCOVA, and $\hat{Y}_{\mathrm{L}}$ from Lin’s regression. These three estimators correspond to the coefficients $\hat{\gamma}_*$ in front of the treatment indicators from the regressions introduced in Section 2.2.3, i.e., (21), (22), and (23). We slightly modified the notation in order to better present the results. Also, define $\hat{V}_*$ to be the corresponding EHW robust covariance estimator from the three regressions. The following proposition from Zhao and Ding [60] established the asymptotic properties of these point and variance estimators.

Proposition 5

(Lemma 1 of Zhao and Ding [60]) Let $N \to \infty$. Assume that, for $q \in [Q]$, $e_q = N_q / N$ has a limit in $(0, 1)$. Assume that the first two finite population moments of $\{Y_i(q), X_i, X_i Y_i(q) : q \in [Q]\}$ have finite limits, and both $S_X^2 = (N - 1)^{-1} \sum_{i=1}^{N} X_i X_i^\top$ and its limit are nonsingular, where the covariates have been centered so that $N^{-1} \sum_{i=1}^{N} X_i = 0$. Also, assume that $N^{-1} \sum_{i=1}^{N} Y_i^4(q) = O(1)$, $N^{-1} \sum_{i=1}^{N} \|X_i\|_4^4 = O(1)$, and $N^{-1} \sum_{i=1}^{N} \|X_i Y_i(q)\|_4^4 = O(1)$. Then, the following results hold:

  1. Asymptotic normality. $\sqrt{N} (\hat{Y}_* - \bar{Y}) \rightsquigarrow \mathcal{N}(0, V_*)$ for some $V_* \succeq 0$, $* = \mathrm{N}, \mathrm{F}, \mathrm{L}$.

  2. Conservative variance estimation. $\mathrm{plim}_{N \to \infty}\, N \hat{V}_{*, \mathrm{EHW}} \succeq V_*$, $* = \mathrm{N}, \mathrm{F}, \mathrm{L}$.

  3. Efficiency comparison. $V_{\mathrm{L}} \preceq V_{\mathrm{N}}$ and $V_{\mathrm{L}} \preceq V_{\mathrm{F}}$.

Proposition 5 (i) established the asymptotic normality property of Neyman’s difference-in-means, Fisher’s ANCOVA, and Lin’s estimator. Together with the conservative variance estimation in Proposition 5 (ii), one can justify the asymptotic validity of Wald-type confidence regions constructed from these estimators. Proposition 5 (iii) indicates that Lin’s estimator guarantees at least as much asymptotic efficiency as the difference-in-means estimator and Fisher’s ANCOVA.

In terms of the technical derivation, Proposition 5 (i) utilized Proposition 4, which is a result motivated by Hoeffding and Fraser’s permutational CLT (more specifically, Conditions (33) and (36)) and can accommodate vector outcomes and multi-level randomized experiments. In particular, if we study the pseudo-potential outcome vector $(Y_i(q), X_i^\top)^\top$, for $q \in [Q]$, we can apply similar tricks as in Section 3.5.1 to establish a multivariate CLT for the arm-wise sample means $(\hat{Y}_q, \hat{X}_q^\top)^\top$ for $q \in [Q]$ based on Condition (36). The covariate-adjusted estimator $\hat{\gamma}_{\mathrm{L}}$ in (23) can be formulated as linear combinations of these sample means, where the combination coefficients are consistent for certain constant coefficients in the sense that their difference is of order $o_P(1)$. Then, a CLT can be derived after filling in the details [60]. Proposition 5 (ii) utilized the Chebyshev inequality under the bounded moment conditions. Proposition 5 (iii) involves some delicate analysis of the limiting variance structure $V_*$ for $* = \mathrm{N}, \mathrm{F}, \mathrm{L}$, which has closed-form expressions [60].
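To make the covariate-adjusted estimator concrete, here is a minimal sketch of Lin's regression in the two-arm case: regress the outcome on the treatment indicator, the centered covariates, and their interactions, and read off the coefficient of the treatment indicator. The multi-level regression (23) is not reproduced in this section, so the two-arm form, the function name `lin_estimator`, and the simulated data are assumptions for illustration.

```python
import numpy as np

def lin_estimator(y, z, X):
    """Coefficient of z in the OLS fit of y on (1, z, Xc, z * Xc), with Xc centered over all units."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)                      # center covariates at the full-sample mean
    design = np.column_stack([np.ones_like(z), z, Xc, z[:, None] * Xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

rng = np.random.default_rng(1)
N = 40
X = rng.normal(size=(N, 2))
z = np.zeros(N)
z[rng.choice(N, size=N // 2, replace=False)] = 1.0

# Noise-free outcomes that are linear in X within each arm (hypothetical model).
beta = np.array([1.0, -1.0])
delta = np.array([0.5, 0.3])
y = 1.0 + 2.0 * z + X @ beta + z * (X @ delta)

tau_lin = lin_estimator(y, z, X)
true_ate = 2.0 + float(X.mean(axis=0) @ delta)     # average of unit-level effects 2 + X_i' delta
print(tau_lin, true_ate)
```

Centering the covariates is what makes the coefficient of the treatment indicator interpretable as an average effect; in this noise-free example it recovers the finite-population ATE exactly, up to floating-point error.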

3.5.3 Analyzing rerandomization

For rerandomization using the Mahalanobis distance discussed in Section 2.3, we adopt the notation introduced there and present the following result by Li et al. [77]. Define $V$ as the variance of $\sqrt{N} \hat{\tau}$ under the CRE, and $R^2$ as the squared multiple correlation between the difference-in-means of outcome and covariates (see Proposition 1 of Li et al. [77] for its explicit expression). Let $\varepsilon_0 \sim \mathcal{N}(0, 1)$, $L_{K,a} \sim D_1 \mid D^\top D \le a$ with $D = (D_1, \ldots, D_K)^\top \sim \mathcal{N}(0, I_K)$, and $\varepsilon_0 \perp\!\!\!\perp L_{K,a}$.

Proposition 6

(Li et al. [77], Theorems 1 and 2 and Appendix A4.2) Consider ReM with a fixed positive threshold $a$, and assume that, as $N \to \infty$, (a) the proportions of units under treatment and control have positive limits; (b) the finite population variances and covariances for potential outcomes and covariates have limits, and the limit of $S_X^2$ is nonsingular; and (c) $N^{-1} \max_{1 \le i \le N} \{Y_i(z) - \bar{Y}(z)\}^2 \to 0$ for $z = 0, 1$ and $N^{-1} \max_{1 \le i \le N} \|X_i - \bar{X}\|_2^2 \to 0$.

  1. Asymptotic distribution. $\sqrt{N} (\hat{\tau} - \tau) \mid M \le a \;\mathrel{\dot\sim}\; V^{1/2} \left(\sqrt{1 - R^2}\, \varepsilon_0 + \sqrt{R^2}\, L_{K,a}\right)$.

  2. Conservative inference. We can construct estimators $\hat{V}$ and $\hat{R}^2$ such that $\mathrm{plim}_{N \to \infty} (\hat{V} - V) \ge 0$ and $\mathrm{plim}_{N \to \infty} (\hat{V} \hat{R}^2 - V R^2) = 0$.

  3. Efficiency comparison. The asymptotic distribution under ReM has a smaller (or equal) variance and narrower (or equal) symmetric quantile ranges than that under the CRE. Here, the symmetric quantile range means the interval formed by the lower and upper $(\alpha / 2)$th quantiles of the asymptotic distribution of $\sqrt{N} (\hat{\tau} - \tau)$, for $\alpha \in (0, 1)$.

Proposition 6 (i) means that, under rerandomization, $\sqrt{N} (\hat{\tau} - \tau)$ has the same asymptotic distribution as $V^{1/2} \left(\sqrt{1 - R^2}\, \varepsilon_0 + \sqrt{R^2}\, L_{K,a}\right)$, a convolution of a Gaussian and a constrained Gaussian random variable, where the coefficients depend crucially on $R^2$, which represents an $R^2$-type measure for the association between potential outcomes and covariates (see Li et al. [77] for more details). Interestingly, unlike that under the CRE, the asymptotic distribution of $\hat{\tau}$ under rerandomization is non-Gaussian in general, while it is still symmetric and unimodal around zero [77]. In addition, when $a = \infty$ or $R^2 = 0$, the asymptotic distribution reduces to that for the CRE. The former is not surprising, because ReM without rejecting any assignment is essentially the CRE. The latter is also intuitive, implying that ReM using covariates that are irrelevant to potential outcomes is asymptotically equivalent to the CRE without using any covariates. In Proposition 6 (ii), we omit the explicit expressions of the estimators for conciseness. As a side remark, Li et al. [77] used an estimator for $V$ that is less conservative than or asymptotically equivalent to the one in (4) by utilizing the covariate information; we refer interested readers to references [77,97] for details. Importantly, Proposition 6 (ii) shows that we can consistently estimate the coefficient of $L_{K,a}$ in the asymptotic distribution, while only conservatively estimating the coefficient of $\varepsilon_0$. Fortunately, due to the symmetric and unimodal property of the asymptotic distribution, these will lead to conservative variance estimation and confidence intervals.

Proposition 6 (iii) demonstrates the advantage of rerandomization over the CRE. In particular, the stronger the association between covariates and potential outcomes, as measured by $R^2$, the larger the gain from ReM [77]. Branson et al. [98] recently extended the comparison to non-symmetric quantile ranges.
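The efficiency gain can be illustrated by simulating the limiting distribution in Proposition 6 (i) directly. The sketch below draws $L_{K,a}$ by rejection sampling (keep $D_1$ whenever $D^\top D \le a$), sets $V = 1$ for simplicity, and compares the resulting variance with the CRE limit obtained by taking $a = \infty$. The function name, the parameter values $K = 2$, $a = 1$, $R^2 = 0.5$, and the Monte Carlo sizes are all assumptions for illustration.

```python
import numpy as np

def sample_rem_limit(K, a, r2, size, rng):
    """Draws of sqrt(1 - R^2) * eps0 + sqrt(R^2) * L_{K,a}, with L_{K,a} ~ D_1 | D'D <= a."""
    kept, n_kept = [], 0
    while n_kept < size:
        D = rng.standard_normal((4 * size, K))
        accepted = D[(D ** 2).sum(axis=1) <= a, 0]   # first coordinate of accepted draws
        kept.append(accepted)
        n_kept += len(accepted)
    L = np.concatenate(kept)[:size]
    eps0 = rng.standard_normal(size)                 # independent of L by construction
    return np.sqrt(1.0 - r2) * eps0 + np.sqrt(r2) * L

rng = np.random.default_rng(2024)
rem_draws = sample_rem_limit(K=2, a=1.0, r2=0.5, size=100_000, rng=rng)
cre_draws = sample_rem_limit(K=2, a=np.inf, r2=0.5, size=100_000, rng=rng)

var_rem, var_cre = rem_draws.var(), cre_draws.var()
range_rem = np.quantile(rem_draws, 0.975) - np.quantile(rem_draws, 0.025)
range_cre = np.quantile(cre_draws, 0.975) - np.quantile(cre_draws, 0.025)
print(var_rem, var_cre, range_rem, range_cre)
```

With these settings the simulated ReM variance is noticeably below one and the central 95% quantile range is shorter than under the CRE, in line with Proposition 6 (iii).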

We now discuss the technical aspects of Proposition 6. A key to its derivation is to note that the distribution of $\hat{\tau}$ under rerandomization is the same as its conditional distribution under the CRE given that the covariate balance criterion is satisfied. This is emphasized by the conditioning on $M \le a$ in Proposition 6 (i). Thus, to understand this conditional distribution, it suffices to study the joint distribution of the difference-in-means vector $(\hat{\tau}, \hat{\tau}_X^\top)^\top$ for both outcome and covariates, noting that $M$ is a deterministic function of $\hat{\tau}_X$. Such a joint distribution will be asymptotically normal, which can be derived using Proposition 4 by viewing $(Y(z), X^\top)^\top$ as a “potential outcome” vector. For the asymptotic conservative inference in Proposition 6 (ii), we can study the probability limits of the estimators $\hat{V}$ and $\hat{R}^2$, again utilizing their properties under the CRE through the conditioning argument (see details in Li et al. [77]). Proposition 6 (iii) involves careful analysis of the non-Gaussian distribution.

4 Permutational BEBs

4.1 Several univariate and multivariate permutational BEBs

Recently, permutational BEBs (also called combinatorial BEBs) have started to attract attention in randomization-based inference for experiments. BEBs quantify the distance between the sampling distribution of a statistic and a target distribution, often the normal distribution. Theoretically speaking, they measure the convergence rate of CLTs. In general, the distance between two probability distributions is based on a class of metrics of the following form:

$$d_{\mathcal{H}}(P_1, P_2) = \sup_{h \in \mathcal{H}} \left| \int h \, \mathrm{d}P_1 - \int h \, \mathrm{d}P_2 \right|.$$

In particular, BEBs take $\mathcal{H}$ to be the class of indicator functions over a family of sets. For univariate distributions, BEBs study upper bounds on the Kolmogorov metric, where $\mathcal{H}$ contains the half-line indicator functions:

$$\sup_{t \in \mathbb{R}} \left| P_1\{X \le t\} - P_2\{X \le t\} \right|.$$

In the multivariate case, there are many choices of sets for different purposes, such as Euclidean balls [99], rectangular sets [100], and measurable convex sets [101,102].

Below, we review some theoretical progress on permutational BEBs and their important applications in analyzing randomized experiments in finite populations.

4.1.1 Univariate case

We consider the univariate linear permutational statistics in (30) and adopt the notation from Section 3.3. We will summarize BEB results for Γ upon standardization. The standardized version of Γ can be expressed as

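$$\mathrm{Var}\{\Gamma\}^{-1/2} (\Gamma - E\{\Gamma\}) = \frac{\sum_{i=1}^{N} \tilde{M}_N(i, \pi(i))}{\left\{\frac{1}{N-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2\right\}^{1/2}} = \sum_{i=1}^{N} \check{M}_N(i, \pi(i)),$$

where

$$\check{M}_N(i, j) = \frac{\tilde{M}_N(i, j)}{\left\{(N-1)^{-1} \sum_{i,j=1}^{N} \tilde{M}_N(i, j)^2\right\}^{1/2}}.$$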

Therefore, without loss of generality, we assume the following condition.

Condition 1

(Normalizing $\Gamma$) $\Gamma$ in Definition (30) is defined with $M_N$ satisfying the following normalizing condition:

$$M_N(i, +) = M_N(+, j) = M_N(+, +) = 0, \ \text{for all } i, j \in [N]; \qquad \sum_{i, j \in [N]} M_N(i, j)^2 = N - 1.$$

Von Bahr [103] and Ho and Chen [104] established some early results. Bolthausen [105] applied one version of Stein’s method [106] to establish the following result requiring only conditions concerning the third moment of the matrix M N .

Proposition 7

(Main theorem of Bolthausen [105]) Assume Condition 1. There exists some universal constant C > 0 , such that

$$\sup_{t \in \mathbb{R}} \left| P\{\Gamma \le t\} - \Phi(t) \right| \le \frac{C}{N} \sum_{i, j \in [N]} |M_N(i, j)|^3. \tag{37}$$

The upper bound on the right-hand side of (37) achieves the rate of $O(N^{-1/2})$ if

$$N^{-1/2} \sum_{i, j \in [N]} |M_N(i, j)|^3 = O(1). \tag{38}$$

Von Bahr [103] imposed the following boundedness condition, which is sufficient for (38):

$$\sup_{i, j \in [N]} |M_N(i, j)| = O(N^{-1/2}).$$

As a side note, the aforementioned boundedness condition is also sufficient for the BEB in the study of Ho and Chen [104] to achieve the $O(N^{-1/2})$ convergence rate. Chen et al. [107, Chapter 6.1] presented a thorough discussion about the univariate permutational BEB. Proposition 7 is very helpful for analyzing the finite-sample quality of normal approximation for linear estimators. In Section 4.2, we provide an example of using Proposition 7 to analyze CREs with possibly varying group sizes and diverging treatment levels.
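The quantities entering Proposition 7 can be computed exactly for a small example. The sketch below takes the sample average under simple random sampling, builds the normalized matrix required by Condition 1, evaluates the third-moment term $N^{-1} \sum_{i,j} |M_N(i,j)|^3$, and computes the exact Kolmogorov distance between the standardized statistic and the standard normal by enumerating all samples. The population values are made up, and since the universal constant $C$ is not specified, the code reports both quantities without asserting any inequality between them.

```python
from itertools import combinations
from statistics import NormalDist, mean

a = [0.3, 1.1, 2.0, 2.4, 3.7, 4.1, 5.6, 7.2]   # hypothetical finite population
N, N1 = len(a), 4
b = [1.0 / N1] * N1 + [0.0] * (N - N1)          # scores giving the SRS sample average
a_bar, b_bar = mean(a), mean(b)

# Variance of the statistic and the normalized (centered and scaled) matrix.
V = sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b) / (N - 1)
M = [[(x - a_bar) * (y - b_bar) / V ** 0.5 for y in b] for x in a]
sum_sq = sum(m * m for r in M for m in r)                    # equals N - 1 under Condition 1
third_moment_term = sum(abs(m) ** 3 for r in M for m in r) / N

# Exact distribution of the standardized sample average over all C(N, N1) samples.
vals = sorted((mean(a[i] for i in s) - a_bar) / V ** 0.5
              for s in combinations(range(N), N1))
Phi = NormalDist().cdf
n = len(vals)
# For a discrete distribution, the supremum is attained at a jump (from the left or the right).
kolmogorov = max(max(abs((k + 1) / n - Phi(v)), abs(k / n - Phi(v)))
                 for k, v in enumerate(vals))
print(round(third_moment_term, 4), round(kolmogorov, 4))
```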

4.1.2 Multivariate case

We now consider the multivariate linear permutational statistics (34) in Section 3.4. For ease of presentation, we focus on results upon standardization. Specifically, Shi and Ding [96, Lemma S2] proved that the standardized version of (34) can still be written as a multivariate linear permutational statistic with a different set of matrices $\check{M}_{N,h}$:

$$\mathrm{Var}\{\Gamma\}^{-1/2} (\Gamma - E\{\Gamma\}) = \left( \sum_{i=1}^{N} \check{M}_{N,1}(i, \pi(i)), \ldots, \sum_{i=1}^{N} \check{M}_{N,H}(i, \pi(i)) \right)^\top,$$

where the $\check{M}_{N,h}$’s satisfy the following normalizing conditions:

$$\check{M}_{N,h}(i, +) = \check{M}_{N,h}(+, j) = \check{M}_{N,h}(+, +) = 0, \quad \text{for all } i, j \in [N] \text{ and } h \in [H]; \tag{39}$$

$$\sum_{i, j \in [N]} \check{M}_{N,h}(i, j)^2 = N - 1, \quad \text{for all } h \in [H]; \tag{40}$$

$$\sum_{i, j \in [N]} \check{M}_{N,h}(i, j)\, \check{M}_{N,h'}(i, j) = 0, \quad \text{for all } h \ne h' \in [H]. \tag{41}$$

The $\check{M}_{N,h}$’s can be constructed from the $M_{N,h}$’s by performing the centering step as in (31) and then applying a linear combination using the matrix $\mathrm{Var}\{\Gamma\}^{-1/2}$. Therefore, without loss of generality, we assume the following condition.

Condition 2

(Normalizing $\Gamma$ in the multivariate case) $\Gamma$ in (34) is defined with $M_{N,h}$’s satisfying the normalizing conditions in (39), (40), and (41), which guarantees that $E\{\Gamma\} = 0_H$ and $\mathrm{Var}\{\Gamma\} = I_H$.

Bolthausen and Götze [108] extended the univariate result in Proposition 7 to the multivariate, possibly nonlinear setting. In particular, for the multivariate linear case, Bolthausen and Götze [108, Theorem 1] established the following BEB.

Proposition 8

Let $\mathcal{A}$ be the family of all measurable convex sets. Let $\Gamma_Z$ be a random Gaussian vector that follows $\mathcal{N}(0_H, I_H)$. Assume Condition 2. Then, there exists a constant $C_H$ that depends only on the dimension $H$ such that

$$\sup_{A \in \mathcal{A}} \left| P\{\Gamma \in A\} - P\{\Gamma_Z \in A\} \right| \le \frac{C_H}{N} \sum_{i, j \in [N]} \left\{ \sum_{h=1}^{H} M_{N,h}(i, j)^2 \right\}^{3/2}. \tag{42}$$

The BEB in (42) covers the univariate case in Proposition 7 as a special case with $H = 1$. However, Bolthausen and Götze [108] did not give a closed-form expression for $C_H$, whose dependence on the dimension $H$ is unknown. Raič [109] conjectured the following result:

$$\sup_{A \in \mathcal{A}} \left| P\{\Gamma \in A\} - P\{\Gamma_Z \in A\} \right| \le \frac{C H^{1/4}}{N} \sum_{i, j \in [N]} \left\{ \sum_{h \in [H]} M_{N,h}(i, j)^2 \right\}^{3/2},$$

where $C$ can be an absolute constant that does not depend on the dimension $H$. However, no formal proof is provided by the author. Chatterjee and Meckes [110] made one step forward to reveal the dimensional dependence using Stein’s method with multivariate exchangeable pairs. In Chatterjee and Meckes [110, Section 3.2], the authors established a bound for the following distance:

$$\sup_{g \in C^2(\mathbb{R}^H)} \left| E\{g(\Gamma)\} - E\{g(\Gamma_Z)\} \right|,$$

where $C^2(\mathbb{R}^H)$ stands for the class of twice continuously differentiable functions on $\mathbb{R}^H$. We state in the following a special case of the result of Chatterjee and Meckes [110].

Proposition 9

Under Condition 2 and the condition of bounded entries:

$$\sup_{i, j \in [N],\, h \in [H]} |M_{N,h}(i, j)| = O(N^{-1/2}), \tag{43}$$

we have

$$\sup_{g \in C^2(\mathbb{R}^H)} \left| E\{g(\Gamma)\} - E\{g(\Gamma_Z)\} \right| = O\left( H^3 N^{-1/2} \right). \tag{44}$$

Nevertheless, (44) does not translate directly into a BEB under the Kolmogorov metric because the indicator functions are not members of $C^2(\mathbb{R}^H)$. Shi and Ding [96, Theorem S2] made use of one key result established by Fang and Röllin [111] regarding Stein’s coupling and established the following multivariate permutational BEB with explicit dependence on the dimension.

Proposition 10

Under Condition 2 and the condition of bounded entries (43), we have

$$\sup_{A \in \mathcal{A}} \left| P\{\Gamma \in A\} - P\{\Gamma_Z \in A\} \right| = O\left( H^{13/4} N^{-1/2} \right).$$

Proposition 10 is also useful for analyzing the finite-sample performance of many non-linear permutational statistics. Shi and Ding [96] used Proposition 10 to obtain a BEB for quadratic forms of a multi-dimensional estimator for causal effects in CRE, which builds up the ground for Wald-type inference (see Appendix E of Shi and Ding [96] for more discussion).

4.2 Application of permutational BEBs to randomization-based inference

In this section, we present several applications of permutational BEBs in randomization-based inference.

4.2.1 CREs with possibly varying group sizes and diverging treatment levels

Many classical experiments only involve a small number of treatment levels. For example, classical factorial experiments typically include a small number of factors (like $K \le 5$) [52]. However, many modern experiments involve a much larger number of treatment levels and units due to the need for analyzing more complex relations as well as the development of experimentation technologies. For example, in political science, powered by the development of computers and web-based technology, conjoint survey experiments [53,112,113] (as a special type of factorial experiments) are very popular for analyzing the effects of many factors together and answering complex causal questions. Zhirkov [113] investigated an experiment examining the impact of six different attributes of immigrants on public support for their admission to the United States. Caughey et al. [112] studied the impact of 12 ($K = 12$) personal traits on citizens’ preference for U.S. presidential candidates. A large number of treatment levels poses new challenges to the analysis of randomized experiments and calls for new methodological and theoretical developments. Shi and Ding [96] and Shi et al. [114] discussed general CREs where the number of treatment levels $Q$ and the treatment group sizes $N_q$’s follow a variety of asymptotic regimes beyond the classical setup. Table 3 presents several possible regimes that are of interest both technically and practically.

Table 3

Theoretical results for multi-level experiments under the randomization model, originally from Table 1 of the study of Shi and Ding [96]

| Regime | $Q$ | $N_q$ | CLT, variance estimation, and BEB |
|--------|-----|-------|-----------------------------------|
| (R1) | Small | Large | CLT and variance estimation established; no BEB |
| (R2) | Large | Large | Seems similar to (R1) but not studied |
| (R3) | Large | Small but $N_q \ge 2$ | Not studied |
| (R4) | Large | $N_q = 1$ | Not studied; variance estimation is nontrivial |
| (R5) | Mixture of the above | Mixture of the above | Not studied |

The columns “$Q$” and “$N_q$” stand for the number of treatment levels and the number of replications within the treatment levels, respectively. The last column summarizes how well each of the regimes is studied in the literature regarding CLT, variance estimation, and BEB.

Most of the regimes in Table 3 have received little attention in the literature and lack rigorous theoretical justification. Shi and Ding [96] utilized permutational BEBs to characterize the normal approximation for sampling distributions of statistics in general CREs, and managed to present a unified discussion of all the regimes listed in Table 3. We elaborate on the usage of permutational BEBs with a canonical example in factorial experiments from Shi and Ding [96].

In a $2^K$ factorial design with $K$ binary factors, there are $Q = 2^K$ possible treatment levels. Index the potential outcomes $Y_i(q)$’s also as $Y_i(z_1, \ldots, z_K)$’s, where $q = 1, \ldots, Q$ and $z_1, \ldots, z_K = 0, 1$. The parameter of interest $\tau = F^\top \bar{Y}(\cdot)$ may consist of a subset of factorial effects, where $F$ is a contrast matrix with orthogonal columns and entries of $\pm (Q/2)^{-1}$ (see Dasgupta et al. [35] for precise definitions of main effects and interactions). The factorial design is called nearly uniform if the sizes of the arms, $N_q$’s, are approximately of the same order. More rigorously, we assume that there exist a positive integer $N_0 > 0$ and absolute constants $\underline{c} \le \bar{c}$ such that $N_q = c_q N_0$ with $\underline{c} \le c_q \le \bar{c}$, for all $q = 1, \ldots, Q$. Such a setup can cover many cases in regimes (R1)–(R4) in Table 3. Shi and Ding [96] established the following result for the plug-in estimator $\hat{\tau} = F^\top \hat{Y}(\cdot)$:

Proposition 11

(Shi and Ding [96], Example 6, nearly uniform factorial design) Consider a nearly uniform $2^K$ factorial experiment. Let $\tilde{\tau} = \mathrm{Var}\{\hat{\tau}\}^{-1/2}(\hat{\tau} - \tau)$ be the standardized version of $\hat{\tau}$. Let $F \in \mathbb{R}^{Q \times H}$ with $H = K + K(K-1)/2 = K(K+1)/2$ be the contrast matrix for all main effects and two-way interactions. Recall the definition of $S(q, q)$ from (12). Under some mild regularity conditions, we have

$$\sup_{b \in \mathbb{R}^H,\ \|b\|_2 = 1}\ \sup_{t \in \mathbb{R}} \left| P\{b^\top \tilde{\tau} \le t\} - \Phi(t) \right| \le C \sigma_F \cdot \frac{\max_{q \in [Q],\, i \in [N]} |Y_i(q) - \bar{Y}(q)|}{\{\min_{q \in [Q]} S(q, q)\}^{1/2}} \cdot \frac{K^2}{\sqrt{N}}, \tag{45}$$

where $C > 0$ is an absolute constant, and $\sigma_F > 0$ is a certain constant related to the matrix $F$.

Proposition 11 is established based on the permutational BEB from Bolthausen [105] (presented in Proposition 7 in Section 4.1). In particular, one can formulate $b^\top \tilde{\tau}$ as a linear permutational statistic with a carefully defined matrix $M_N$ and apply Proposition 7 to obtain a raw BEB. After taking the supremum over all unit-norm vectors $b$, the BEB can be simplified to the presented form (45). More technical details are provided in Appendix A of Shi and Ding [96]. Also, the BEB in (45) is uniform in the linear coefficient vector $b \in \mathbb{R}^H$ with $\|b\|_2 = 1$. This uniformity results in the additional dependence on $K^2$ (or the dimension $H$). Intuitively, with a higher dimension $H$, the uniform bound becomes larger. From Proposition 11, we can obtain a sufficient condition for the upper bound in (45) to converge to 0, which implies a CLT for any one-dimensional linear transformation of $\tilde{\tau}$. In addition, Proposition 11 mainly requires the total sample size $N$ to be large enough, and therefore allows either a fixed number of treatment levels $Q$ with diverging replications $N_0$, or a diverging $Q$ with limited replications $N_0$. Shi and Ding [96] also established design-based properties of Wald-type inference under general CREs, which utilize multivariate permutational BEBs such as Proposition 10.
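To make the contrast matrix in the factorial example concrete, the following is a minimal sketch that builds a matrix of main effects and two-way interactions for a $2^K$ design and applies it to a vector of arm means. The function names `factorial_contrast_matrix` and `plug_in_estimate` are hypothetical, and the sign coding (−1 for level 0, +1 for level 1, with interactions as elementwise products) is an assumption in the spirit of Dasgupta et al. [35], not a transcription of the construction in Shi and Ding [96].

```python
from itertools import combinations, product


def factorial_contrast_matrix(K):
    """Return (levels, F) for a 2^K design (illustrative sketch).

    levels: the Q = 2^K treatment combinations (z_1, ..., z_K).
    F: Q x H nested list with H = K + K(K-1)/2 columns, one per main
       effect and per two-way interaction, with entries +/- (Q/2)^{-1}.
    """
    levels = list(product((0, 1), repeat=K))
    Q = len(levels)
    scale = 2.0 / Q  # equals (Q/2)^{-1}
    # Each column is indexed by a subset of one or two factors.
    subsets = [(k,) for k in range(K)] + list(combinations(range(K), 2))
    F = []
    for z in levels:
        row = []
        for subset in subsets:
            sign = 1
            for k in subset:
                # Assumed coding: level 1 -> +1, level 0 -> -1.
                sign *= 1 if z[k] == 1 else -1
            row.append(sign * scale)
        F.append(row)
    return levels, F


def plug_in_estimate(F, arm_means):
    """Compute F^T applied to a list of Q arm means."""
    H = len(F[0])
    return [sum(F[q][h] * arm_means[q] for q in range(len(F))) for h in range(H)]


if __name__ == "__main__":
    levels, F = factorial_contrast_matrix(2)
    # Arm means ordered as (0,0), (0,1), (1,0), (1,1).
    print(plug_in_estimate(F, [1, 2, 3, 6]))
```

With $K = 3$ the matrix has $Q = 8$ rows and $H = 6$ columns, and the columns are mutually orthogonal, matching the dimensions stated in Proposition 11.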

4.2.2 Rerandomization with diminishing covariate imbalance and diverging number of covariates

Li et al. [77] studied the asymptotic theory of rerandomization with a fixed covariate imbalance threshold that does not vary with the sample size, as discussed in Sections 2.3 and 3.5.3. The theory there suggests that the smaller the threshold, the more improvement we can gain from rerandomization over complete randomization. Although intuitive, such a conclusion is not precise. When the covariate balance criterion is too stringent, there may be no acceptable assignments, and, even if there are acceptable ones, the asymptotic approximation may work poorly due to the small and even diminishing acceptance probability, i.e., the probability that a completely randomized assignment is acceptable under rerandomization. Specifically and technically, the properties of rerandomization are derived by analyzing conditional distributions under the CRE, which involve the acceptance probability in the denominator. The resulting asymptotic analysis will then encounter a ratio between two quantities of order $o(1)$ when we allow the acceptance probability (or the imbalance threshold) to diminish with the sample size. In such cases, BEBs are crucial for conducting asymptotic analysis.
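The effect of a stringent threshold on the acceptance probability can be illustrated numerically. The sketch below is a simplified illustration with a single covariate, in which the Mahalanobis distance reduces to a squared standardized difference in covariate means; the helper names `mahalanobis_imbalance` and `acceptance_rate` are hypothetical. It estimates the acceptance probability by Monte Carlo over complete randomizations and compares it with the chi-square reference probability with one degree of freedom.

```python
import math
import random


def mahalanobis_imbalance(x, treated):
    """Mahalanobis distance for one covariate under a CRE (sketch).

    x: list of covariate values; treated: set of treated unit indices.
    """
    N, N1 = len(x), len(treated)
    N0 = N - N1
    xbar = sum(x) / N
    S2 = sum((xi - xbar) ** 2 for xi in x) / (N - 1)  # finite population variance
    sum1 = sum(x[i] for i in treated)
    mean1 = sum1 / N1
    mean0 = (sum(x) - sum1) / N0
    var_diff = N / (N1 * N0) * S2  # randomization variance of mean1 - mean0
    return (mean1 - mean0) ** 2 / var_diff


def acceptance_rate(x, N1, threshold, draws=2000, seed=0):
    """Fraction of complete randomizations with imbalance <= threshold."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(draws):
        treated = set(rng.sample(range(len(x)), N1))
        if mahalanobis_imbalance(x, treated) <= threshold:
            accepted += 1
    return accepted / draws


if __name__ == "__main__":
    data_rng = random.Random(1)
    x = [data_rng.gauss(0.0, 1.0) for _ in range(100)]
    for a in (1.0, 0.1, 0.01, 0.001):
        simulated = acceptance_rate(x, 50, a)
        reference = math.erf(math.sqrt(a / 2))  # Pr(chi-square with 1 df <= a)
        print(a, simulated, reference)
```

As the threshold shrinks, both the simulated and the reference acceptance probabilities shrink toward zero, which is the situation where the conditional distribution involves a ratio of two small quantities.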

In the context of simple random sampling, Wang and Li [115] derived a multivariate BEB for the sample average using Hájek’s [86] coupling and the BEB for sums of independent random vectors [116] with explicit dependence on the dimension. The bound, although weaker than that implied by the conjecture in Raic [116], is sufficient for studying rerandomization with diminishing covariate imbalance threshold (or equivalently acceptance probability) and diverging number of covariates. With the derived BEBs, Wang and Li [115] presented the following asymptotic theory for ReM, which is stronger than Proposition 6. We adopt the same notation from Section 3.5.3 and denote the covariate imbalance threshold by $a_N$ and the number of covariates by $K_N$, allowing them to vary with the sample size $N$. Let $r_1 = N_1/N$, $r_0 = N_0/N$, $u_i = (r_0 Y_i(1) + r_1 Y_i(0), X_i^\top)^\top$, $\bar{u}$ and $S_u^2$ be the finite population average and covariance of $u_i$’s, and

$$\gamma_N \equiv \frac{(K_N + 1)^{1/4}}{\sqrt{N r_1 r_0}} \cdot \frac{1}{N} \sum_{i=1}^{N} \left\| S_u^{-1}(u_i - \bar{u}) \right\|_2^3,$$

where $S_u^{-1}$ is the inverse of the positive semidefinite square root of $S_u^2$. We have the following BEB under ReM.

Proposition 12

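(Wang and Li [115], Theorems 1 and 3) As $N \to \infty$, if $\gamma_N \to 0$ and $p_N / \gamma_N^{1/3} \to \infty$ with $p_N \equiv \Pr(\chi^2_{K_N} \le a_N)$, then

$$\sup_{c \in \mathbb{R}} \left| \Pr\{\mathrm{Var}\{\hat{\tau}\}^{-1/2}(\hat{\tau} - \tau) \le c \mid M \le a_N\} - \Pr\left(\sqrt{1 - R^2}\, \varepsilon_0 + \sqrt{R^2}\, L_{K_N, a_N} \le c\right) \right| \to 0,$$

where $\tau$ is the true ATE.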

Wang and Li [115] further studied additional conditions such that the constrained Gaussian random variable $L_{K_N, a_N}$ becomes ignorable as $N \to \infty$, under which $\mathrm{Var}\{\hat{\tau}\}^{-1/2}(\hat{\tau} - \tau)$ asymptotically follows the Gaussian distribution $\mathcal{N}(0, 1 - R^2)$ under rerandomization. This is the ideally optimal precision that one can expect under rerandomization, since the remaining variation is due to the part of potential outcomes that cannot be linearly explained by the covariates. Moreover, the Gaussian asymptotic distribution is the same as that of Lin’s regression-adjusted estimator under the CRE. Intuitively, rerandomization and covariate adjustment are duals of each other, where the former is at the design stage, while the latter is at the analysis stage. Wang and Li [115] further proposed large-sample valid confidence intervals for the ATE under rerandomization.

5 Extensions

Neyman [1] has motivated many important extensions for the design and analysis of randomized experiments, and the technical tools regarding permutations have been evolving during the past century. In this section, we discuss some other extensions beyond Neyman [1].

5.1 Other randomized experiments

In this section, we discuss several other widely used and studied randomized experiments, beyond Neyman’s [1] focus on the CRE.

5.1.1 Stratified (block) randomized experiments

Stratified randomized experiments (SREs) have been used widely in many fields, including agricultural studies [117], biomedical studies [118], and social science [119]. An SRE combines several different CREs according to the levels of a stratum indicator. Concretely, consider an experiment with $K$ strata. Denote the number and proportion of units in stratum $k$ as $N_{[k]}$ and $\pi_{[k]} = N_{[k]}/N$, respectively, where $k = 1, \ldots, K$. Within stratum $k$, $N_{[k]1}$ units are randomized to receive treatment and the remaining $N_{[k]0} = N_{[k]} - N_{[k]1}$ units are assigned to control. Across strata, the randomization is conducted independently. The treatment assignment distribution is uniform over all possible randomizations.

Analogous to the CRE, in the SRE, for unit $i$ in stratum $k$, we have potential outcomes $Y_{ki}(1)$ and $Y_{ki}(0)$ and individual causal effect $\tau_{ki} = Y_{ki}(1) - Y_{ki}(0)$. For stratum $k$, we have the stratum-specific average causal effect

$$\tau_{[k]} = N_{[k]}^{-1} \sum_{i=1}^{N_{[k]}} \tau_{ki}.$$

The overall average causal effect is

$$\tau = N^{-1} \sum_{k=1}^{K} \sum_{i=1}^{N_{[k]}} \tau_{ki} = \sum_{k=1}^{K} \pi_{[k]} \tau_{[k]},$$

which is also a weighted average of the stratum-specific average causal effects. For Neyman-type analysis, a point estimator can be obtained by taking a weighted average of stratum-specific difference-in-means estimators:

$$\hat{\tau}_S = \sum_{k=1}^{K} \pi_{[k]} \hat{\tau}_{[k]}, \tag{46}$$

where $\hat{\tau}_{[k]}$ is the difference-in-means estimator for stratum $k$. It has variance

$$\mathrm{Var}\{\hat{\tau}_S\} = \sum_{k=1}^{K} \pi_{[k]}^2 \mathrm{Var}\{\hat{\tau}_{[k]}\},$$

which motivates the variance estimator

$$\hat{V}_S = \sum_{k=1}^{K} \pi_{[k]}^2 \left\{ \frac{\hat{S}_{[k]}^2(1)}{N_{[k]1}} + \frac{\hat{S}_{[k]}^2(0)}{N_{[k]0}} \right\},$$

with $\hat{S}_{[k]}^2(1)$ and $\hat{S}_{[k]}^2(0)$ being the stratum-specific sample variances for the treatment and control arms. A Wald-type confidence interval can then be constructed for $\tau$.
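The point estimator (46), its variance estimator, and the resulting Wald-type interval can be sketched in a few lines. This is a minimal illustration that uses only the Python standard library; the function names `stratified_estimate` and `wald_interval` and the input layout (one pair of outcome lists per stratum) are assumptions made for the example.

```python
from statistics import NormalDist, mean, variance


def stratified_estimate(strata):
    """Weighted difference-in-means estimator and its variance estimator.

    strata: list of (treated_outcomes, control_outcomes), one per stratum.
    Each arm needs at least two units so the sample variances exist.
    """
    N = sum(len(y1) + len(y0) for y1, y0 in strata)
    tau_hat, v_hat = 0.0, 0.0
    for y1, y0 in strata:
        pi_k = (len(y1) + len(y0)) / N          # stratum proportion
        tau_k = mean(y1) - mean(y0)             # stratum difference in means
        var_k = variance(y1) / len(y1) + variance(y0) / len(y0)
        tau_hat += pi_k * tau_k
        v_hat += pi_k ** 2 * var_k
    return tau_hat, v_hat


def wald_interval(estimate, var_estimate, alpha=0.05):
    """Two-sided Wald-type interval based on the normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * var_estimate ** 0.5
    return estimate - half, estimate + half


if __name__ == "__main__":
    strata = [
        ([3.0, 5.0], [1.0, 3.0]),
        ([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]),
    ]
    tau_hat, v_hat = stratified_estimate(strata)
    print(tau_hat, v_hat, wald_interval(tau_hat, v_hat))
```

`statistics.variance` uses the divisor $n - 1$, which matches the usual sample variance; the sketch fails with a `StatisticsError` when an arm has a single unit, which is exactly the case that requires the alternative variance estimators discussed for matched-pair designs.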

Under certain regularity conditions, the point estimator (46) is asymptotically normal and Wald-type inference is proved to be asymptotically valid (see, for example, previous studies [120,121]). The random assignment mechanism requires studying a convolution of independent permutational distributions, which motivates new theoretical tools. When the total number of strata K is small and the sizes of the strata are large, the permutational CLTs play a central role in the analysis. When K is large and the sizes of the strata are small, CLTs for independent summations play a crucial role instead. With a mixture of large and small strata, there are also theoretical results in the literature (see, for example, previous studies [120,121]). Moreover, Liu et al. [122] and Wang et al. [80] further investigated covariate adjustment and rerandomization in SREs.

5.1.2 Matched-pair experiments (MPEs)

The MPE is another popular experimental design in practice [19,123,124]. The MPE is the most extreme version of the SRE with only one treated unit and one control unit within each stratum, which is called a pair. We can adopt the notation for the SRE to define potential outcomes, causal effects, the stratum-specific difference-in-means estimator (denoted again as $\hat{\tau}_{[k]}$), and the aggregated difference-in-means estimator (denoted as $\hat{\tau}_M$) in the MPE. However, the variance estimation strategy discussed in Section 5.1.1 is no longer applicable for the MPE, since it implicitly requires at least two treated and two control units within each matched set so that we can calculate the stratum-specific sample variances. Imai [124] proposed the following variance estimator by instead considering the sample variance of the stratum-specific difference-in-means estimators:

$$\hat{V}_M = \frac{1}{n(n-1)} \sum_{k=1}^{n} (\hat{\tau}_{[k]} - \hat{\tau}_M)^2,$$

where $n$ denotes the number of pairs, and he showed that it is conservative in expectation for the true variance of $\hat{\tau}_M$. We can then construct the Wald-type confidence interval

$$\left[ \hat{\tau}_M - z_{\alpha/2} \hat{V}_M^{1/2},\ \hat{\tau}_M + z_{\alpha/2} \hat{V}_M^{1/2} \right],$$

which can be asymptotically valid under certain regularity conditions. Moreover, regression adjustment can be applied to improve efficiency when baseline covariates are available, as shown in the study by Fogarty [37].
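The matched-pair estimator, the variance estimator based on the spread of pair-level differences, and the Wald-type interval can be sketched as follows. This is an illustrative sketch only; the function name `matched_pair_inference` and the input layout (one treated and one control outcome per pair) are assumptions for the example.

```python
from statistics import NormalDist, mean


def matched_pair_inference(pairs, alpha=0.05):
    """Point estimate, variance estimate, and Wald-type interval in an MPE.

    pairs: list of (treated_outcome, control_outcome), one tuple per pair.
    Requires at least two pairs so that the variance estimator is defined.
    """
    n = len(pairs)
    diffs = [y1 - y0 for y1, y0 in pairs]   # pair-level difference in means
    tau_hat = mean(diffs)                   # equal weights 1/n across pairs
    v_hat = sum((d - tau_hat) ** 2 for d in diffs) / (n * (n - 1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * v_hat ** 0.5
    return tau_hat, v_hat, (tau_hat - half, tau_hat + half)


if __name__ == "__main__":
    pairs = [(3.0, 1.0), (5.0, 2.0), (4.0, 4.0), (6.0, 3.0)]
    print(matched_pair_inference(pairs))
```

Because every pair contains two units, the stratum weights are all equal to $1/n$, so the aggregated estimator is simply the average of the pair-level differences.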

In general stratified experiments with possibly one treated or one control unit in some strata, Fogarty [125] and Pashley and Miratrix [126] discussed general strategies to conservatively estimate the variance of the aggregated difference-in-means estimator.

5.1.3 Cluster randomized experiments

Cluster randomized experiments are widely used due to their logistical convenience and policy relevance. In a cluster randomized experiment, the treatment is assigned at the cluster level instead of the individual level. Consider a study with $N$ units and $M$ clusters. Cluster $i$ has $n_i$ units $(i = 1, \ldots, M)$. Let $(i, j)$ index the $j$th unit within cluster $i$ for $i = 1, \ldots, M$ and $j = 1, \ldots, n_i$. The experimenter randomly assigns $M_1$ clusters to receive the treatment and $M_0$ clusters to receive the control, where $M_1$ and $M_0$ are fixed positive integers satisfying $M_1 + M_0 = M$. Let $Z_i$ be the treatment indicator for cluster $i$ and $Z_{ij}$ be the treatment indicator for unit $(i, j)$. In a cluster randomized experiment, units within a cluster receive identical treatment levels. So if cluster $i$ receives treatment, then $Z_{ij} = Z_i = 1$ for all $j$. If cluster $i$ receives control, then $Z_{ij} = Z_i = 0$ for all $j$. Let $Y_{ij}(1)$ and $Y_{ij}(0)$ be the potential outcomes under treatment and control, respectively, for unit $(i, j)$. The observed outcome is then $Y_{ij} = Z_{ij} Y_{ij}(1) + (1 - Z_{ij}) Y_{ij}(0)$. The ATE over all units is

$$\tau = \frac{1}{N} \sum_{i=1}^{M} \sum_{j=1}^{n_i} \{ Y_{ij}(1) - Y_{ij}(0) \}.$$

There are different strategies for inferring $\tau$, including individual-level estimators and cluster-level estimators [127], both enjoying desirable asymptotic properties implied by permutational CLTs. We refer interested readers to a collection of works on analyzing cluster randomized experiments [127–132].
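The assignment mechanism and the definition of the estimand can be made concrete with a small simulation-style sketch. It only encodes the setup described above (cluster-level assignment, observed outcomes, and the ATE over all units) and does not implement any particular estimator from the cited works; the function names and the nested-list data layout are assumptions for the example.

```python
import random


def cluster_assign(M, M1, seed=None):
    """Assign exactly M1 of M clusters to treatment, uniformly at random."""
    rng = random.Random(seed)
    treated = set(rng.sample(range(M), M1))
    return [1 if i in treated else 0 for i in range(M)]


def observed_outcomes(Z, Y1, Y0):
    """Observed outcomes when every unit inherits its cluster's treatment.

    Y1[i][j] and Y0[i][j] are the potential outcomes of unit (i, j).
    """
    return [
        [Z[i] * y1 + (1 - Z[i]) * y0 for y1, y0 in zip(Y1[i], Y0[i])]
        for i in range(len(Z))
    ]


def average_treatment_effect(Y1, Y0):
    """ATE over all N units, where N is the sum of cluster sizes."""
    N = sum(len(cluster) for cluster in Y1)
    total = sum(y1 - y0 for c1, c0 in zip(Y1, Y0) for y1, y0 in zip(c1, c0))
    return total / N


if __name__ == "__main__":
    Y1 = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
    Y0 = [[0.0, 0.0], [1.0], [2.0, 2.0, 2.0]]
    Z = cluster_assign(M=3, M1=1, seed=0)
    print(Z, observed_outcomes(Z, Y1, Y0), average_treatment_effect(Y1, Y0))
```

In practice both potential outcomes are never observed for the same unit; the full arrays `Y1` and `Y0` are available here only because the example is a simulation.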

5.2 Some technical aspects for permutations

Permutation is a core element in the design and analysis of randomized experiments, and its development, such as CLTs and BEBs, has involved many technical tools including moment matching [91], coupling [93,133], and Stein’s method [105–107,110]. In this section, we discuss some technical aspects for analyzing permutation-related problems.

5.2.1 Hájek’s coupling

Hájek’s coupling is a technique developed in the study by Hájek [86] for proving CLTs in simple random sampling. The idea is to construct a coupling between simple random sampling and Bernoulli random sampling so that CLTs for independent and identically distributed (i.i.d.) sampling can be applied. Wang and Li [115] utilized Hájek’s coupling together with a multivariate BEB for sums of independent random vectors [116] to study rerandomization with diminishing covariate imbalance. The techniques are useful to establish theories for a wide range of permutational statistics (see, e.g., previous studies [122,133]).
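One common way to describe such a coupling is sketched below: draw a Bernoulli sample with inclusion probability $n/N$, and then remove or add uniformly chosen units until the sample has exactly $n$ units. The function name `hajek_coupling` is hypothetical, and the sketch is a textbook-style construction under these assumptions rather than a transcription of the argument in Hájek [86].

```python
import random


def hajek_coupling(N, n, rng):
    """Return (bernoulli_sample, srs_sample) as coupled sets of unit indices.

    The Bernoulli sample includes each unit independently with probability
    n / N. The simple random sample of size n is obtained from it by
    dropping or adding uniformly chosen units, so the two sets are nested.
    """
    p = n / N
    bern = {i for i in range(N) if rng.random() < p}
    srs = set(bern)
    if len(srs) > n:
        # Too many units: drop a uniformly chosen subset of the excess size.
        srs -= set(rng.sample(sorted(srs), len(srs) - n))
    elif len(srs) < n:
        # Too few units: add a uniformly chosen subset of the remaining units.
        rest = [i for i in range(N) if i not in srs]
        srs |= set(rng.sample(rest, n - len(srs)))
    return bern, srs


if __name__ == "__main__":
    rng = random.Random(0)
    bern, srs = hajek_coupling(N=20, n=8, rng=rng)
    print(sorted(bern), sorted(srs), len(bern ^ srs))
```

The two samples differ only in the units added or removed, so sums over the simple random sample stay close to sums over the Bernoulli sample, whose terms are independent.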

5.2.2 Double and multiple permutations

Nowadays, there are many new variants of permutations in designing randomized experiments. For example, Fredrickson and Chen [134] and Chen and Friedman [135] discussed permutation and randomization tests for analyzing network data. D’Amour and Airoldi [136] and Deng et al. [137] studied randomized experiments with dyadic outcomes, i.e., outcomes that measure the relationship between pairs of units. When randomization occurs at the unit level, the dyadic outcomes are in turn randomized with double permutations. Doubly indexed permutation statistics (DIPS) are useful in these settings because the dyadic potential outcomes are functions of pairs of treatments, and the statistics for studying causal effects in these problems are generally represented as DIPS. Bajari et al. [138,139] proposed multiple randomization designs for marketplaces in which multiple populations interact and causal questions regarding interference are of particular interest. In terms of technical tools that are potentially useful for analyzing double or multiple permutations, Chen et al. [107], Zhao et al. [140], and Reinert and Röllin [141], among others, used Stein’s method [106] to study the asymptotic properties of DIPS.

5.2.3 Concentration inequalities

Permutational (or combinatorial) concentration inequalities are another technical tool recognized in the literature. Bloniarz et al. [66] and Lei and Ding [64] used permutational concentration inequalities to analyze regression adjustment in CREs when the dimension of the covariates is diverging. It will be interesting to explore related research questions that involve delicate analysis of finite-sample properties of permutational statistics and call for the use of concentration inequalities.

6 Conclusion

In this review, we revisited the fundamental contributions of Neyman’s [1] seminal work regarding the introduction of potential outcomes, the promotion of physical randomization, and the emphasis of repeated sampling properties of statistics over the randomization distribution. These contributions lay down the foundation for the design and analysis of randomized experiments. We also reviewed permutational CLTs and BEBs in great detail, and listed applications of these technical results in randomization-based inference.

Beyond what we have covered in the review, many research works are closely related to Neyman [1]. From a technical point of view, many theoretical tools are not fully covered in the discussion. For example, when analyzing SREs, we need CLTs and BEBs that combine the independent permutational distributions [121,122]. This is also closely related to Rosenbaum’s sensitivity analysis for matched observational studies with biased permutations in each matched set [20,142,143]. As another example, for the design and analysis of adaptive experiments, a general martingale structure typically exists, which requires a martingale CLT or Berry–Esseen result [144,145].

From a practical point of view, many real-world examples can motivate the study of new designs, outcomes, assumptions, and causal estimands under the finite population framework. For example, interference among units is a common phenomenon in many experimental and observational studies. The study of interference and peer influence has motivated many new designs and methods, such as designing and analyzing bipartite experiments [8,146], multiple randomization [138], randomized experiments with network interference [147], group formation design [148,149], etc. Another example is randomization with missing observations or covariates. Zhao and Ding [150] discussed several strategies for randomization-based inference with missing covariates, and Zhao et al. [151] further studied covariate adjustment in randomized experiments with both missing outcomes and covariates. Censored survival outcomes are another type of missingness that occurs frequently in clinical trials. In these settings, to test the null hypothesis of no treatment effect for any unit, Rosenbaum [20] proposed randomization tests for censored outcomes using a partial ordering, and Zhang and Rosenberger [152] established asymptotic normality of the randomization distribution of the log-rank statistic. Both approaches require the assumption of identical potential censoring times under treatment and control. Recently, Li and Small [153] relaxed this assumption and proved that, under a Bernoulli randomized experiment with non-informative i.i.d. censoring, the log-rank test is asymptotically valid for testing Fisher’s null hypothesis of no treatment effect on any unit.

At the same time, there has been extensive progress in analyzing treatment effects from a super-population perspective, and many of these works share a similar spirit with randomization-based inference [154]. For example, Yang and Tsiatis [155] have suggested linear covariate adjustment with treatment–covariate interaction under a semiparametric model (see also previous studies [156,157]). In the presence of censoring, there have been many works studying semiparametric estimation of treatment effects [158–160], as well as covariate adjustment to improve inference efficiency [161–163].

Acknowledgements

We thank the reviewers for carefully reading the manuscript and providing many constructive suggestions for improving the manuscript.

  1. Funding information: X. L. was partly supported by the National Science Foundation under Grant DMS-2400961.

  2. Author contributions: All authors have accepted responsibility for the entire content of this manuscript and consented to its submission to the journal, reviewed all the results, and approved the final version of the manuscript. LS and XL worked together to frame the structure of the review, collect related literature, and organize the presentation of the methodology and theory for the design and analysis of randomized experiments.

  3. Conflict of interest: Authors state no conflict of interest.

References

[1] Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science. 1923/1990. pp. 465–72.

[2] Pitman EJ. Significance tests which may be applied to samples from any populations. Suppl J R Stat Soc. 1937;4(1):119–30. doi:10.2307/2984124

[3] Welch BL. On the z-test in randomized blocks and Latin squares. Biometrika. 1937;29(1/2):21–52. doi:10.1093/biomet/29.1-2.21

[4] Kempthorne O. The design and analysis of experiments. New York: Wiley; 1952. doi:10.1097/00010694-195205000-00012

[5] Kempthorne O. The randomization theory of experimental inference. J Amer Stat Assoc. 1955;50(271):946–67. doi:10.1080/01621459.1955.10501979

[6] Hudgens MG, Halloran ME. Toward causal inference with interference. J Amer Stat Assoc. 2008;103(482):832–42. doi:10.1198/016214508000000292

[7] Tchetgen EJT, VanderWeele TJ. On causal inference in the presence of interference. Stat Methods Med Res. 2012;21(1):55–75. doi:10.1177/0962280210386779

[8] Zigler CM, Papadogeorgou G. Bipartite causal inference with interference. Stat Sci Rev J Inst Math Stat. 2021;36(1):109. doi:10.1214/19-STS749

[9] Liu L, Wang Y, Xu Y. A practical guide to counterfactual estimators for causal inference with time-series cross-sectional data. Amer J Politic Sci. 2022;68:160–76. doi:10.1111/ajps.12723

[10] Sjölander A, Frisell T, Kuja-Halkola R, Öberg S, Zetterqvist J. Carryover effects in sibling comparison designs. Epidemiology. 2016;27(6):852–8. doi:10.1097/EDE.0000000000000541

[11] Imai K, Kim IS, Wang EH. Matching methods for causal inference with time-series cross-sectional data. Amer J Politic Sci. 2023;67(3):587–605. doi:10.1111/ajps.12685

[12] Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Amer Stat Assoc. 1996;91(434):444–55. doi:10.1080/01621459.1996.10476902

[13] VanderWeele T. Explanation in causal inference: methods for mediation and interaction. New York, NY: Oxford University Press; 2015. doi:10.1093/ije/dyw277

[14] VanderWeele TJ. Mediation analysis: a practitioner’s guide. Ann Rev Public Health. 2016;37:17–32. doi:10.1146/annurev-publhealth-032315-021402

[15] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688. doi:10.1037/h0037350

[16] Rubin DB. Comment: Neyman (1923) and causal inference in experiments and observational studies. Stat Sci. 1990;5(4):472–80. doi:10.1214/ss/1177012032

[17] Student. On testing varieties of cereals. Biometrika. 1923;15:271–93. doi:10.1093/biomet/15.3-4.271

[18] Fisher RA, Mackenzie WA. Studies in crop variation. II. The manurial response of different potato varieties. J Agricult Sci. 1923;13(3):311–20. doi:10.1017/S0021859600003592

[19] Fisher RA. The design of experiments. 1st ed. Edinburgh, London: Oliver and Boyd; 1935.

[20] Rosenbaum PR. Observational studies. New York, NY: Springer-Verlag; 2002. doi:10.1007/978-1-4757-3692-2

[21] Hinkelmann K, Kempthorne O. Design and analysis of experiments, Volume 1: Introduction to experimental design. vol. 1. Hoboken, NJ: John Wiley & Sons; 2007.

[22] Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. New York: Cambridge University Press; 2015. doi:10.1017/CBO9781139025751

[23] Splawa-Neyman J. Contributions to the theory of small samples drawn from a finite population. Biometrika. 1925;17:472–9. doi:10.1093/biomet/17.3-4.472

[24] Neyman J. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc Ser A: Stat Soc. 1934;97(4):558–606. doi:10.2307/2342192

[25] Neyman J, Iwaszkiewicz K. Statistical problems in agricultural experimentation. Suppl J R Stat Soc. 1935;2(2):107–80. doi:10.2307/2983637

[26] Fienberg SE, Tanur JM. Reconsidering the fundamental contributions of Fisher and Neyman on experimentation and sampling. Int Stat Rev/Revue Int de Stat. 1996;64:237–53. doi:10.2307/1403784

[27] Rubin DB. Causal inference using potential outcomes: Design, modeling, decisions. J Amer Stat Assoc. 2005;100(469):322–31. doi:10.1198/016214504000001880

[28] Scheffé H. The analysis of variance. New York: John Wiley & Sons; 1959.

[29] Copas JB. Randomization models for the matched and unmatched 2×2 tables. Biometrika. 1973;60(3):467–76. doi:10.1093/biomet/60.3.467

[30] Robins JM. Confidence intervals for causal parameters. Stat Med. 1988;7(7):773–85. doi:10.1002/sim.4780070707

[31] Hinkelmann K, Kempthorne O. Design and analysis of experiments, introduction to experimental design. vol. 1. New York: John Wiley & Sons; 2007. doi:10.1002/9780470191750

[32] Freedman DA. On regression adjustments to experimental data. Adv Appl Math. 2008;40(2):180–93. doi:10.1016/j.aam.2006.12.003

[33] Freedman DA. On regression adjustments in experiments with several treatments. Ann Appl Stat. 2008;2:176–96. doi:10.1214/07-AOAS143

[34] Lin W. Agnostic notes on regression adjustments to experimental data: Reexamining Freedman’s critique. Ann Appl Stat. 2013;7(1):295–318. doi:10.1214/12-AOAS583

[35] Dasgupta T, Pillai NS, Rubin DB. Causal inference from $2^K$ factorial designs by using potential outcomes. J R Stat Soc Ser B. 2015;77:727–53. doi:10.1111/rssb.12085

[36] Athey S, Imbens GW. The econometrics of randomized experiments. In: Banerjee A, Duflo E, editors. Handbook of economic field experiments. vol. 1. North-Holland, Amsterdam; 2017. p. 73–140.

[37] Fogarty CB. Regression assisted inference for the average treatment effect in paired experiments. Biometrika. 2018;105:994–1000. doi:10.1093/biomet/asy034

[38] Guo K, Basse G. The generalized Oaxaca-Blinder estimator. J Amer Stat Assoc. 2021;118:1–13. doi:10.1080/01621459.2021.1941053

[39] Wu J, Ding P. Randomization tests for weak null hypotheses in randomized experiments. J Amer Stat Assoc. 2021;116(536):1898–913. doi:10.1080/01621459.2020.1750415

[40] Rubin DB. Randomization analysis of experimental data: The Fisher randomization test comment. J Amer Stat Assoc. 1980;75(371):591–3. doi:10.2307/2287653

[41] Ding P. A paradox from randomization-based causal inference. Stat Sci. 2017;32:331–45. doi:10.1214/16-STS571

[42] Ding P, Dasgupta T. A randomization-based perspective on analysis of variance: a test statistic robust to treatment effect heterogeneity. Biometrika. 2018;105(1):45–56. doi:10.1093/biomet/asx059

[43] Zhao A, Ding P. Covariate-adjusted Fisher randomization tests for the average treatment effect. J Econ. 2021;225(2):278–94. doi:10.1016/j.jeconom.2021.04.007

[44] Cohen PL, Fogarty CB. Gaussian prepivoting for finite population causal inference. J R Stat Soc Ser B Stat Meth. 2022;84(2):295–320. doi:10.1111/rssb.12439

[45] Rosenbaum PR. Effects attributable to treatment: inference in experiments and observational studies within a discrete pivot. Biometrika. 2001;88:219–31. doi:10.1093/biomet/88.1.219

[46] Rigdon J, Hudgens MG. Exact confidence intervals in the presence of interference. Stat Probab Lett. 2015;105:130–5. doi:10.1016/j.spl.2015.06.011

[47] Li X, Ding P. Exact confidence intervals for the average causal effect on a binary outcome. Stat Med. 2016;35:957–60. doi:10.1002/sim.6764

[48] Caughey D, Dafoe A, Li X, Miratrix L. Randomization inference beyond the sharp null: bounded null hypotheses and quantiles of individual treatment effects. J R Stat Soc Ser B (Stat Meth). 2023;85:1471–91. doi:10.1093/jrsssb/qkad080

[49] Su Y, Li X. Treatment effect quantiles in stratified randomized experiments and matched observational studies. Biometrika. 2023;111:235–54. doi:10.1093/biomet/asad030

[50] Chen Z, Li X, Zhang B. The role of randomization inference in unraveling individual treatment effects in clinical trials: Application to HIV vaccine trials. 2023. arXiv: http://arXiv.org/abs/arXiv:231014399. doi:10.1515/scid-2024-0001

[51] Ding P. A first course in causal inference. 2023. arXiv: http://arXiv.org/abs/arXiv:230518793.

[52] Wu CJ, Hamada MS. Experiments: planning, analysis, and optimization. Hoboken, NJ: John Wiley & Sons; 2011.

[53] Hainmueller J, Hopkins DJ, Yamamoto T. Causal inference in conjoint analysis: Understanding multidimensional choices via stated preference experiments. Politic Anal. 2014;22(1):1–30. doi:10.1093/pan/mpt024

[54] Hainmueller J, Hopkins DJ. The hidden American immigration consensus: A conjoint analysis of attitudes toward immigrants. Amer J Politic Sci. 2015;59(3):529–48. doi:10.1111/ajps.12138

[55] Bauer DJ, Sterba SK, Hallfors DD. Evaluating group-based interventions when control participants are ungrouped. Multivariate Behav Res. 2008;43(2):210–36. doi:10.1080/00273170802034810

[56] Hallfors D, Cho H, Sanchez V, Khatapoush S, Kim HM, Bauer D. Efficacy vs effectiveness trial results of an indicated “model” substance abuse program: implications for public health. Amer J Public Health. 2006;96(12):2254–9. doi:10.2105/AJPH.2005.067462

[57] Branson Z, Dasgupta T. Sampling-based randomised designs for causal inference under the potential outcomes framework. Int Stat Rev. 2020;88:101–21. doi:10.1111/insr.12339

[58] Yang Z, Qu T, Li X. Rejective sampling, rerandomization, and regression adjustment in survey experiments. J Amer Stat Assoc. 2021;118:1207–21. doi:10.1080/01621459.2021.1984926

[59] Li X, Ding P. General forms of finite population central limit theorems with applications to causal inference. J Amer Stat Assoc. 2017;112(520):1759–69. doi:10.1080/01621459.2017.1295865

[60] Zhao A, Ding P. Covariate adjustment in multiarmed, possibly factorial experiments. J R Stat Soc Ser B Stat Methodol. 2023;85(1):1–23. doi:10.1093/jrsssb/qkac003

[61] Fisher RA. Statistical methods for research workers. 1st ed. Edinburgh: Oliver and Boyd; 1925.

[62] Cochran WG. Sampling techniques. Hoboken, NJ: John Wiley & Sons; 1977.

[63] Lu J. Covariate adjustment in randomization-based causal inference for 2K factorial designs. Stat Probabil Lett. 2016;119:11–20. 10.1016/j.spl.2016.07.010Search in Google Scholar

[64] Lei L, Ding P. Regression adjustment in completely randomized experiments with a diverging number of covariates. Biometrika. 2020 Dec;108(4):815–28. https://doi.org/10.1093. Search in Google Scholar

[65] Lu X, Yang F, Wang Y. Debiased regression adjustment in completely randomized experiments with moderately high-dimensional covariates. 2023. arXiv: http://arXiv.org/abs/arXiv:230902073. Search in Google Scholar

[66] Bloniarz A, Liu H, Zhang CH, Sekhon JS, Yu B. Lasso adjustments of treatment effect estimates in randomized experiments. Proc Nat Acad Sci. 2016;113(27):7383–90. 10.1073/pnas.1510506113Search in Google Scholar PubMed PubMed Central

[67] Cohen PL, Fogarty CB. No-harm calibration for generalized oaxaca-blinder estimators. 2020. arXiv: http://arXiv.org/abs/arXiv:201209246. Search in Google Scholar

[68] Morgan KL, Rubin DB. Rerandomization to improve covariate balance in experiments. Ann Stat. 2012;40(2):1263–82. 10.1214/12-AOS1008Search in Google Scholar

[69] Sprott D, Farewell V. Randomization in experimental science. Stat Papers. 1993;34:89–94. 10.1007/BF02925530Search in Google Scholar

[70] Rubin DB. Comment: The design and analysis of gold standard randomized experiments. J Amer Stat Assoc. 2008;103(484):1350–3. 10.1198/016214508000001011Search in Google Scholar

[71] Worrall J. Evidence: philosophy of science meets medicine. J Evaluat Clin Practice. 2010;16(2):356–62. 10.1111/j.1365-2753.2010.01400.x

[72] Cox D. Randomization in the design of experiments. Int Stat Rev. 2009;77(3):415–29. 10.1111/j.1751-5823.2009.00084.x

[73] Bruhn M, McKenzie D. In pursuit of balance: Randomization in practice in development field experiments. Amer Econ J Appl Econ. 2009;1(4):200–32.

[74] Maclure M, Nguyen A, Carney G, Dormuth C, Roelants H, Ho K, et al. Measuring prescribing improvements in pragmatic trials of educational tools for general practitioners. Basic Clin Pharm Toxicol. 2006;98(3):243–52. 10.1111/j.1742-7843.2006.pto_301.x

[75] Bruhn M, McKenzie D. In pursuit of balance: randomization in practice in development field experiments. Amer Econ J Appl Econ. 2009;1:200–32. 10.1257/app.1.4.200

[76] Lee JN, Morduch J, Ravindran S, Shonchoy A, Zaman H. Poverty and migration in the digital age: experimental evidence on mobile banking in Bangladesh. Amer Econ J Appl Econ. 2021;13:38–71. 10.1257/app.20190067

[77] Li X, Ding P, Rubin DB. Asymptotic theory of rerandomization in treatment-control experiments. Proc Nat Acad Sci. 2018;115(37):9157–62. 10.1073/pnas.1808191115

[78] Branson Z, Dasgupta T, Rubin DB. Improving covariate balance in 2^K factorial designs via rerandomization with an application to a New York City Department of Education high school study. Ann Appl Stat. 2016:1958–76. 10.1214/16-AOAS959

[79] Li X, Ding P, Rubin D. Rerandomization in 2^K factorial experiments. Ann Stat. 2020;48(1):43–63. 10.1214/18-AOS1790

[80] Wang X, Wang T, Liu H. Rerandomization in stratified randomized experiments. J Amer Stat Assoc. 2023;118(542):1295–304. 10.1080/01621459.2021.1990767

[81] Johansson P, Schultzberg M. Rerandomization: A complement or substitute for stratification in randomized experiments? J Stat Plan Inference. 2022;218:43–58. 10.1016/j.jspi.2021.09.002

[82] Li X, Ding P. Rerandomization and regression adjustment. J R Stat Soc Ser B Stat Meth. 2020;82(1):241–68. 10.1111/rssb.12353

[83] Zhao A, Ding P. No star is good news: A unified look at rerandomization based on p-values from covariate balance tests. J Econ. 2024;241(1):105724. 10.1016/j.jeconom.2024.105724

[84] Wang Y, Li X. Asymptotic theory of the best-choice rerandomization using the Mahalanobis distance. 2023. arXiv:2312.02513.

[85] Erdős P, Rényi A. On the central limit theorem for samples from a finite population. Publ Math Inst Hungarian Acad Sci. 1959;4:49–61.

[86] Hájek J. Limiting distributions in simple random sampling from a finite population. Publ Math Inst Hungarian Acad Sci. 1960;5:361–74.

[87] Madow WG. On the limiting distributions of estimates based on samples from finite universes. Ann Math Stat. 1948;19:535–45. 10.1214/aoms/1177730149

[88] David F. Limiting distributions connected with certain methods of sampling human populations. Stat Res Mem. 1938;2:69–90.

[89] Wald A, Wolfowitz J. Statistical tests based on permutations of the observations. Ann Math Stat. 1944;15(4):358–72. 10.1214/aoms/1177731207

[90] Noether GE. On a theorem by Wald and Wolfowitz. Ann Math Stat. 1949;20(3):455–8. 10.1214/aoms/1177730000

[91] Hoeffding W. A combinatorial central limit theorem. Ann Math Stat. 1951;22:558–66. 10.1214/aoms/1177729545

[92] Motoo M. On the Hoeffding’s combinatorial central limit theorem. Ann Inst Stat Math. 1956;8:145–54. 10.1007/BF02863580

[93] Hájek J. Some extensions of the Wald-Wolfowitz-Noether theorem. Ann Math Stat. 1961;32:506–23. 10.1214/aoms/1177705057

[94] Fraser D. A vector form of the Wald-Wolfowitz-Hoeffding theorem. Ann Math Stat. 1956;27:540–3. 10.1214/aoms/1177728279

[95] DiCiccio CJ, Romano JP. Robust permutation tests for correlation and regression coefficients. J Amer Stat Assoc. 2017;112(519):1211–20. 10.1080/01621459.2016.1202117

[96] Shi L, Ding P. Berry–Esseen bounds for design-based causal inference with possibly diverging treatment levels and varying group sizes. 2022. arXiv:2209.12345.

[97] Ding P, Feller A, Miratrix L. Decomposing treatment effect variation. J Amer Stat Assoc. 2019;114:304–17. 10.1080/01621459.2017.1407322

[98] Branson Z, Li X, Ding P. Power and sample size calculations for rerandomization. Biometrika. 2023;111:355–63. 10.1093/biomet/asad027

[99] Bentkus V. On the dependence of the Berry–Esseen bound on dimension. J Stat Plan Infer. 2003;113(2):385–402. 10.1016/S0378-3758(02)00094-0

[100] Chernozhukov V, Chetverikov D, Kato K. Central limit theorems and bootstrap in high dimensions. Ann Probability. 2017;45(4):2309. 10.1214/16-AOP1113

[101] Bentkus V. A Lyapunov-type bound in R^d. Theory Probabil Appl. 2005;49(2):311–23. 10.1137/S0040585X97981123

[102] Bhattacharya RN, Rao RR. Normal approximation and asymptotic expansions. Philadelphia, PA: SIAM; 2010. 10.1137/1.9780898719895

[103] von Bahr B. Remainder term estimate in a combinatorial limit theorem. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 1976;35(2):131–9. 10.1007/BF00533317

[104] Ho ST, Chen LH. An Lp bound for the remainder in a combinatorial central limit theorem. Ann Probability. 1978;6(2):231–49. 10.1214/aop/1176995570

[105] Bolthausen E. An estimate of the remainder in a combinatorial central limit theorem. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete. 1984;66(3):379–86. 10.1007/BF00533704

[106] Stein C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: Probability Theory. vol. 6. University of California Press; 1972. p. 583–603.

[107] Chen LH, Goldstein L, Shao QM. Normal approximation by Stein’s method. vol. 2. New York, NY: Springer; 2011. 10.1007/978-3-642-15007-4

[108] Bolthausen E, Götze F. The rate of convergence for multivariate sampling statistics. Ann Stat. 1993;21:1692–710. 10.1214/aos/1176349393

[109] Raič M. Multivariate normal approximation: Permutation statistics, local dependence and beyond; 2015.

[110] Chatterjee S, Meckes E. Multivariate normal approximation using exchangeable pairs. 2007. arXiv:math/0701464v1.

[111] Fang X, Röllin A. Rates of convergence for multivariate normal approximation with applications to dense graphs and doubly indexed permutation statistics. Bernoulli. 2015;21:2157–89. 10.3150/14-BEJ639

[112] Caughey D, Katsumata H, Yamamoto T. Item response theory for conjoint survey experiments. Working Paper; 2019.

[113] Zhirkov K. Estimating and using individual marginal component effects from conjoint experiments. Politic Anal. 2022;30(2):236–49. 10.1017/pan.2021.4

[114] Shi L, Wang J, Ding P. Forward screening and post-screening inference in factorial designs. 2023. arXiv:2301.12045.

[115] Wang Y, Li X. Rerandomization with diminishing covariate imbalance and diverging number of covariates. Ann Stat. 2022;50(6):3439–65. 10.1214/22-AOS2235

[116] Raič M. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli. 2019;25(4A):2824–53. 10.3150/18-BEJ1072

[117] Petersen RG. Agricultural field experiments: design and analysis. Boca Raton, FL: CRC Press; 1994.

[118] Goldner MG, Knatterud GL, Prout TE. Effects of hypoglycemic agents on vascular complications in patients with adult-onset diabetes: III. Clinical implications of UGDP results. JAMA. 1971;218(9):1400–10. 10.1001/jama.218.9.1400

[119] Chong A, Cohen I, Field E, Nakasone E, Torero M. Iron deficiency and schooling attainment in Peru. Amer Econ J Appl Econ. 2016;8(4):222–55. 10.1257/app.20140494

[120] Bickel PJ, Freedman DA. Asymptotic normality and the bootstrap in stratified sampling. Ann Stat. 1984;12:470–82. 10.1214/aos/1176346500

[121] Liu H, Yang Y. Regression-adjusted average treatment effect estimates in stratified randomized experiments. Biometrika. 2020;107(4):935–48. 10.1093/biomet/asaa038

[122] Liu H, Ren J, Yang Y. Randomization-based joint central limit theorem and efficient covariate adjustment in randomized block 2^K factorial experiments. J Amer Stat Assoc. 2022;119:1–15. 10.1080/01621459.2022.2102985

[123] Ball S, et al. Reading with television: an evaluation of The Electric Company. A report to the Children’s Television Workshop. Volumes 1 and 2. 1973.

[124] Imai K. Variance identification and efficiency analysis in randomized experiments under the matched-pair design. Stat Med. 2008;27(24):4857–73. 10.1002/sim.3337

[125] Fogarty CB. On mitigating the analytical limitations of finely stratified experiments. J R Stat Soc Ser B Stat Methodol. 2018;80(5):1035–56. 10.1111/rssb.12290

[126] Pashley NE, Miratrix LW. Insights on variance estimation for blocked and matched pairs designs. J Educat Behav Stat. 2021;46(3):271–96. 10.3102/1076998620946272

[127] Su F, Ding P. Model-assisted analyses of cluster-randomized experiments. J R Stat Soc Ser B Stat Meth. 2021;83(5):994–1015. 10.1111/rssb.12468

[128] Abadie A, Athey S, Imbens GW, Wooldridge JM. When should you adjust standard errors for clustering? Quarter J Econ. 2023;138(1):1–35. 10.1093/qje/qjac038

[129] Middleton JA, Aronow PM. Unbiased estimation of the average treatment effect in cluster-randomized experiments. Stat Politic Policy. 2015;6(1–2):39–75. 10.1515/spp-2013-0002

[130] Lu X, Liu T, Liu H, Ding P. Design-based theory for cluster rerandomization. Biometrika. 2023;110(2):467–83. 10.1093/biomet/asac045

[131] Schochet PZ, Pashley NE, Miratrix LW, Kautz T. Design-based ratio estimators and central limit theorems for clustered, blocked RCTs. J Amer Stat Assoc. 2022;117(540):2135–46. 10.1080/01621459.2021.1906685

[132] Athey S, Imbens GW. The econometrics of randomized experiments. In: Handbook of economic field experiments. vol. 1. Amsterdam: Elsevier; 2017. p. 73–140. 10.1016/bs.hefe.2016.10.003

[133] Hájek J. Asymptotic normality of simple linear rank statistics under alternatives. Ann Math Stat. 1968;39:325–46. 10.1214/aoms/1177698394

[134] Fredrickson MM, Chen Y. Permutation and randomization tests for network analysis. Soc Networks. 2019;59:171–83. 10.1016/j.socnet.2019.08.001

[135] Chen H, Friedman JH. A new graph-based two-sample test for multivariate and object data. J Amer Stat Assoc. 2017;112(517):397–409. 10.1080/01621459.2016.1147356

[136] D’Amour A, Airoldi E. Causal inference for dyadic outcomes in social network analysis. 2016.

[137] Deng L, Li Y, Zhang J, Wang Y, Chen C. Unbiased estimation for total treatment effect under interference using aggregated dyadic data. 2024. arXiv:2402.12653.

[138] Bajari P, Burdick B, Imbens GW, Masoero L, McQueen J, Richardson T, et al. Multiple randomization designs; 2021. arXiv:2112.13495.

[139] Bajari P, Burdick B, Imbens GW, Masoero L, McQueen J, Richardson TS, et al. Experimental design in marketplaces. Stat Sci. 2023;1(1):1–19. 10.1214/23-STS883

[140] Zhao L, Bai Z, Chao CC, Liang WQ. Error bound in a central limit theorem of double-indexed permutation statistics. Ann Stat. 1997;25(5):2210–27. 10.1214/aos/1069362395

[141] Reinert G, Röllin A. Multivariate normal approximation with Stein’s method of exchangeable pairs under a general linearity condition. Ann Probability. 2009;37(6):2150–73. 10.1214/09-AOP467

[142] Gastwirth JL, Krieger AM, Rosenbaum PR. Asymptotic separability in sensitivity analysis. J R Stat Soc Ser B. 2000;62:545–55. 10.1111/1467-9868.00249

[143] Wu D, Li X. Sensitivity analysis for quantiles of hidden biases in matched observational studies. 2023. arXiv:2309.06459.

[144] Hu F, Rosenberger WF. The theory of response-adaptive randomization in clinical trials. Hoboken, NJ: John Wiley & Sons; 2006. 10.1002/047005588X

[145] Hall P, Heyde CC. Martingale limit theory and its application. San Diego, CA: Academic Press; 2014.

[146] Harshaw C, Sävje F, Eisenstat D, Mirrokni V, Pouget-Abadie J. Design and analysis of bipartite experiments under a linear exposure-response model. Elect J Stat. 2023;17(1):464–518. 10.1214/23-EJS2111

[147] Leung MP. Causal inference under approximate neighborhood interference. Econometrica. 2022;90(1):267–93. 10.3982/ECTA17841

[148] Li X, Ding P, Lin Q, Yang D, Liu JS. Randomization inference for peer effects. J Amer Stat Assoc. 2019;114:1651–64. 10.1080/01621459.2018.1512863

[149] Basse G, Ding P, Feller A, Toulis P. Randomization tests for peer effects in group formation experiments. 2019. arXiv:1904.02308.

[150] Zhao A, Ding P. To adjust or not to adjust? Estimating the average treatment effect in randomized experiments with missing covariates. J Amer Stat Assoc. 2022;119:1–11. 10.1080/01621459.2022.2123814

[151] Zhao A, Ding P, Li F. Covariate adjustment in randomized experiments with missing outcomes and covariates. Biometrika. 2024;111:asae017. 10.1093/biomet/asae017

[152] Zhang Y, Rosenberger WF. On asymptotic normality of the randomization-based logrank test. Nonparametric Stat. 2005;17(7):833–9. 10.1080/10485250500270826

[153] Li X, Small DS. Randomization-based test for censored outcomes: a new look at the logrank test. Stat Sci. 2023;38(1):92–107. 10.1214/22-STS851

[154] Ding P, Li X, Miratrix LW. Bridging finite and super population causal inference. J Causal Infer. 2017;5:20160027. 10.1515/jci-2016-0027

[155] Yang L, Tsiatis AA. Efficiency study of estimators for a treatment effect in a pretest-posttest trial. Amer Stat. 2001;55:314–21. 10.1198/000313001753272466

[156] Rosenblum M, van der Laan MJ. Simple, efficient estimators of treatment effects in randomized trials using generalized linear models to leverage baseline variables. Int J Biostat. 2010;6:6. 10.2202/1557-4679.1138

[157] Bang H, Robins JM. Doubly robust estimation in missing data and causal inference models. Biometrics. 2005;61:962–73. 10.1111/j.1541-0420.2005.00377.x

[158] van der Laan MJ, Robins JM. Unified methods for censored longitudinal data and causality. New York, NY: Springer; 2003.

[159] Rubin D, van der Laan MJ. A doubly robust censoring unbiased transformation. Int J Biostat. 2007;3(1):4. 10.2202/1557-4679.1052

[160] van der Laan MJ, Rose S, et al. Targeted learning: causal inference for observational and experimental data. vol. 4. New York, NY: Springer; 2011. 10.1007/978-1-4419-9782-1

[161] Hernández AV, Eijkemans MJ, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power? Ann Epidemiol. 2006;16(1):41–8. 10.1016/j.annepidem.2005.09.007

[162] Lu X, Tsiatis AA. Improving the efficiency of the log-rank test using auxiliary covariates. Biometrika. 2008;95(3):679–94. 10.1093/biomet/asn003

[163] Moore KL, van der Laan MJ. Increasing power in randomized trials with right censored outcomes through covariate adjustment. J Biopharm Stat. 2009;19(6):1099–131. 10.1080/10543400903243017

Received: 2023-10-06
Revised: 2024-02-18
Accepted: 2024-04-22
Published Online: 2024-11-20

© 2024 the author(s), published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 12.7.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jci-2023-0067/html