We estimate the magnitude of attrition bias for 10 Randomized Controlled Trials (RCTs) in education. We make use of a unique feature of administrative school data in England that allows us to analyse post-test academic outcomes for nearly all students, including those who originally dropped out of the RCTs we analyse. We find that the typical magnitude of attrition bias is 0.015σ, with no estimate greater than 0.034σ. This suggests that, in practice, the risk of attrition bias is limited. However, this risk should not be ignored as we find some evidence against the common ‘Missing At Random’ assumption. Attrition appears to be more problematic for treated units. We recommend that researchers incorporate uncertainty due to attrition bias, as well as performing sensitivity analyses based on the types of attrition mechanisms that are observed in practice.
Multisite studies are a commonly used way to assess how a treatment works across contexts. In multisite randomized controlled trials (RCTs), cross-site treatment effect variance is a way to quantify treatment effect variation. However, there are no standard methods for estimating cross-site treatment effect variation designed for multisite regression discontinuity designs (RDDs). In this research, we fill this gap in the literature by developing and evaluating two methods for estimating cross-site treatment effect variance in RDDs. The first method combines a fixed intercepts/random coefficients (FIRC) model with a local linear RDD analysis. The second method borrows techniques from random effects meta-analysis and employs them with the RDD model. We find that although the FIRC model may look appealing ex post to a researcher because it has a smaller confidence interval than the random effects meta-analysis model, simulations show that the FIRC model estimates of the cross-site treatment effect standard deviation have substantial bias, poor coverage, and lack well-defined confidence intervals. In contrast, the random effects meta-analysis estimates of the cross-site treatment effect standard deviation have good coverage across a range of conditions. We then apply these models to a high school exit exam policy in Massachusetts that required students who passed the high school exit exam but were still determined to be nonproficient to complete an "Education Proficiency Plan". We find that students on the margin of proficiency required to complete an Education Proficiency Plan in math were seven percentage points more likely to complete a math course in their senior year. However, if we assume normality, the cross-high school treatment effect standard deviation was high enough in three cohorts for the treatment effect to have been negative in more than a third of high schools.
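The closing normality argument can be made concrete with a short calculation. The sketch below is illustrative only: it assumes site-level effects are normally distributed around the seven-percentage-point estimate quoted above, and the function names are hypothetical rather than taken from the paper. It solves for the cross-site standard deviation at which one third of sites would have a negative effect.

```python
from statistics import NormalDist


def share_negative(mean_effect: float, cross_site_sd: float) -> float:
    """Share of sites with a negative effect if site effects are N(mean, sd^2)."""
    return NormalDist(mu=mean_effect, sigma=cross_site_sd).cdf(0.0)


def sd_threshold(mean_effect: float, share: float) -> float:
    """Cross-site SD at which exactly `share` of sites have a negative effect.

    Valid for mean_effect > 0 and 0 < share < 0.5.
    P(effect < 0) = Phi(-mean / sd) = share  =>  sd = -mean / Phi^{-1}(share)
    """
    return -mean_effect / NormalDist().inv_cdf(share)


mean_effect = 0.07  # seven percentage points, as quoted in the abstract
threshold = sd_threshold(mean_effect, 1 / 3)
print(f"SD needed for one third of sites to be negative: {threshold:.3f}")
```

Under these assumptions, any cross-site standard deviation above the printed threshold implies a negative effect in more than a third of sites; the abstract does not report the cohort-specific standard deviations, so none are filled in here.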
Researchers face many choices when conducting large-scale multisite individually randomized controlled trials (RCTs). One of the most common quantities of interest in multisite RCTs is the overall average effect. Even this quantity is non-trivial to define and estimate. The researcher can target the average effect across individuals or across sites. Furthermore, the researcher can target the effect for the experimental sample or for a larger population. If treatment effects vary across sites, these estimands can differ. Once an estimand is selected, an estimator must be chosen. Standard estimators, such as fixed-effects regression, can be biased. We describe 15 estimators, consider which estimands they are appropriate for, and discuss their properties in the face of cross-site effect heterogeneity. Using data from 12 large multisite RCTs, we estimate the effect (and standard error) using each estimator and compare the results. We assess the extent to which these decisions matter in practice and provide guidance for applied researchers.
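A toy calculation shows how the estimands can diverge once effects vary across sites. The numbers below are invented for illustration and do not come from the 12 RCTs. The fixed-effects weights use the standard result that a regression with site dummies and a single treatment coefficient weights each site by n_j * p_j * (1 - p_j), where p_j is the site's treated share; whether this matches any particular estimator among the 15 is an assumption.

```python
# Hypothetical two-site trial: true site-level effects, sample sizes,
# and the share of each site's sample assigned to treatment.
site_effects = [0.30, 0.05]
site_sizes = [100, 900]
share_treated = [0.5, 0.2]


def weighted_mean(values, weights):
    """Weighted average of `values` using non-negative `weights`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Estimand 1: average effect across sites (each site counts equally).
site_average = weighted_mean(site_effects, [1] * len(site_effects))

# Estimand 2: average effect across individuals (sites weighted by size).
person_average = weighted_mean(site_effects, site_sizes)

# What fixed-effects regression targets: precision weights n * p * (1 - p).
fe_weights = [n * p * (1 - p) for n, p in zip(site_sizes, share_treated)]
fixed_effects_target = weighted_mean(site_effects, fe_weights)

print(f"site average:         {site_average:.3f}")
print(f"person average:       {person_average:.3f}")
print(f"fixed-effects target: {fixed_effects_target:.3f}")
```

In this example the fixed-effects target matches neither estimand, which is one way such an estimator can be biased when effects are heterogeneous and treated shares differ by site.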
This study examines whether unobserved factors substantially bias education evaluations that rely on the Conditional Independence Assumption. We add 14 new within‐study comparisons to the literature, all from primary schools in England. Across these 14 studies, we generate 42 estimates of selection bias using a simple approach to observational analysis. A meta‐analysis of these estimates suggests that the distribution of underlying bias is centered around zero. The mean absolute value of estimated bias is 0.03σ, and none of the 42 estimates are larger than 0.11σ. Results are similar for math, reading, and writing outcomes. Overall, we find no evidence of substantial selection bias due to unobserved characteristics. These findings may not generalize easily to other settings or to more radical educational interventions, but they do suggest that non‐experimental approaches could play a greater role than they currently do in generating reliable causal evidence for school education.
Matching for causal inference is a well-studied problem, but standard methods fail when the units to match are text documents: the high-dimensional and rich nature of the data renders exact matching infeasible, causes propensity scores to produce incomparable matches, and makes assessing match quality difficult. In this paper, we characterize a framework for matching text documents that decomposes existing methods into (1) the choice of text representation and (2) the choice of distance metric. We investigate how different choices within this framework affect both the quantity and quality of matches identified through a systematic multifactor evaluation experiment using human subjects. Altogether, we evaluate over 100 unique text-matching methods along with 5 comparison methods taken from the literature. Our experimental results identify methods that generate matches with higher subjective match quality than current state-of-the-art techniques. We enhance the precision of these results by developing a predictive model to estimate the match quality of pairs of text documents as a function of our various distance scores. This model, which we find successfully mimics human judgment, also allows for approximate and unsupervised evaluation of new procedures in our context. We then employ the identified best method to illustrate the utility of text matching in two applications. First, we engage with a substantive debate in the study of media bias by using text matching to control for topic selection when comparing news articles from thirteen news sources. We then show how conditioning on text data leads to more precise causal inferences in an observational study examining the effects of a medical intervention.
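The two-part decomposition (a text representation plus a distance metric) can be sketched in a few lines. This is a minimal illustration, not one of the methods evaluated in the paper: the bag-of-words representation, the cosine distance, the example documents, and all function names are assumptions chosen for brevity.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Representation choice: lowercase term counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Distance choice: 1 - cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def nearest_match(treated_doc, control_docs,
                  represent=bag_of_words, distance=cosine_distance):
    """Return (index, distance) of the control document closest to treated_doc."""
    target = represent(treated_doc)
    scores = [distance(target, represent(doc)) for doc in control_docs]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical documents for illustration only.
treated = "senate passes budget bill"
controls = ["senate debates budget bill", "local team wins final game"]
match_index, match_distance = nearest_match(treated, controls)
print(match_index, round(match_distance, 2))
```

Because `represent` and `distance` are passed in as arguments, swapping either one yields a different matching method, which mirrors how the framework enumerates combinations of the two choices.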