Research

The lab has a few general areas of focus, listed below along with a brief motivation and a description of current work. We are fortunate to have grant funding for several of these activities, which we acknowledge below as well.

Treatment effect variation

Characterizing treatment effect variation is a difficult problem in general, but it is integral to getting the most out of large-scale randomized trials, and this area undergirds almost everything we do. In both education and statistics, variation is key, and that is as true of treatment impacts as of school quality or uncertainty.

We have a wide variety of work in this area.  For example, much of our work has focused on variation across latent (unobserved) groups via a framework called "principal stratification." Other work has tried to decompose treatment variation into idiosyncratic (unobserved) and systematic (explainable by observed covariates) components, using robust methods that do not rely too heavily on assumptions.

Multisite and cluster randomized experiments

Cluster randomized and multisite experiments are both very common in education.  Cluster randomized experiments allow for treatment at the institution level (e.g., schools), which can give evaluations that more closely capture what a rolled-out policy might look like. But they can be hard to sufficiently power if the sites are highly variable. We are currently working on identifying how different analytic choices can lead to different estimates, under the IES grant "Identifying Best Practices for Estimating Average Treatment Effects in Cluster Randomized Trials" (IES R305D220046, joint with MDRC).
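To give a sense of why powering cluster randomized trials is hard, the textbook "design effect" for clustering (a standard result, not lab-specific software) shows how the intraclass correlation (ICC), a measure of between-site variability, inflates the sample size needed. The numbers below are illustrative.

```python
# The classic design effect for cluster randomization:
#   DE = 1 + (m - 1) * ICC
# where m is the cluster size and ICC is the intraclass correlation.
# The required sample size scales up by roughly this factor.

def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation factor from randomizing whole clusters."""
    return 1.0 + (cluster_size - 1) * icc

# With 30 students per school, even a modest ICC of 0.1 nearly
# quadruples the effective sample size needed.
print(design_effect(30, 0.0))   # 1.0 (no clustering penalty)
print(design_effect(30, 0.1))   # ~3.9
```

This is only the simplest version of the calculation; real power analyses also account for covariates, treatment effect variation, and the number of clusters.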

Multisite experiments provide a great opportunity in that they are effectively a collection of small randomized experiments with many shared properties. By leveraging this view, we can ask how treatment impacts vary across sites and attempt to understand more about the reasons a treatment may, or may not, have been successful. We are currently working on best practices for identifying the distribution of site-level impacts in these contexts.

Overall, we are also focused on how to easily design such experiments, providing new software for conducting power analyses and identifying how different analytic models can in fact target different quantities of interest, such as the average impact across sites versus the average impact across individuals.  These different quantities have different policy relevance.
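The distinction between site-averaged and individual-averaged impacts can be made concrete with a toy example (all numbers made up): when site sizes and site impacts are correlated, the two estimands diverge.

```python
# Hypothetical multisite trial: three sites, with per-site sample
# sizes and per-site average treatment effects (both invented).
site_sizes   = [10, 10, 80]      # students per site
site_impacts = [0.5, 0.5, 0.1]   # average effect at each site

# Average impact across sites: every site counts equally.
avg_across_sites = sum(site_impacts) / len(site_impacts)

# Average impact across individuals: larger sites count more.
n_total = sum(site_sizes)
avg_across_individuals = sum(
    n * b for n, b in zip(site_sizes, site_impacts)
) / n_total

print(round(avg_across_sites, 3))        # 0.367
print(round(avg_across_individuals, 3))  # 0.18
```

Here the large site has a small effect, so weighting by individuals halves the apparent impact; which number is policy-relevant depends on whether the unit of interest is the school or the student.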

Panel data and its discontents

When you have data on units across time, such as the number of tutoring centers per capita for different school neighborhoods, there is great opportunity to leverage past trends in order to assess the impacts of policy shifts. Many methods exist for this, such as the synthetic control method.  We are focused on how to use these tools most effectively in education contexts, and we are also investigating when different analytic choices are preferable given features such as how much past data one has. Much of our work in this area is funded by the IES grant "Improving methods for policy impact evaluation with group panel data in education research" (R305D200010, joint with UCB).
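The core synthetic control idea can be sketched in a few lines (toy data and a grid search, not our actual pipeline): weight the untreated units, with nonnegative weights summing to one, so the weighted "synthetic" unit tracks the treated unit's pre-treatment trend, then read off the post-treatment gap.

```python
# Toy panel: one treated unit and two untreated "donor" units,
# observed for three pre-treatment periods and one post period.
treated_pre  = [1.0, 2.0, 3.0]           # treated unit, before policy
treated_post = 5.0                        # treated unit, after policy
donors_pre   = [[0.0, 1.0, 2.0],          # donor A, before policy
                [2.0, 3.0, 4.0]]          # donor B, before policy
donors_post  = [3.0, 5.0]                 # donors, after policy

def pre_fit_error(w: float) -> float:
    """Squared pre-period error for convex weights (w, 1 - w)."""
    return sum((w * a + (1 - w) * b - y) ** 2
               for a, b, y in zip(*donors_pre, treated_pre))

# With two donors there is one free weight; a grid search suffices.
best_w = min((i / 1000 for i in range(1001)), key=pre_fit_error)

synthetic_post = best_w * donors_post[0] + (1 - best_w) * donors_post[1]
estimated_impact = treated_post - synthetic_post
print(best_w, estimated_impact)  # 0.5 1.0
```

Real applications optimize over many donors (e.g., with constrained least squares) and assess uncertainty with placebo tests, but the weighting logic is the same.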

Geographic and 2-dimensional regression discontinuity designs

Geographic regression discontinuity designs analyze data where a geographic border separates units that have been treated from units that have not. By focusing attention on units on either side of the border, we can hope to understand the impact of treatment, or provide sharper comparisons between the two regions, controlling for other characteristics.  This also applies to "2-dimensional" designs where, for example, a student may receive services if either a reading or a math test score falls below some threshold.

Text data, with a focus on causality

Text data are increasingly ubiquitous in the modern era. Consider, for example, internet forums, newspapers, or student essays. But text is also high-dimensional, messy, and complex, which makes evaluating contexts where text is a covariate or an outcome quite tricky. Currently, via a grant generously funded by the IES ("Practical tools for large-scale evaluation of text data in randomized trials in education," grant number R305D220032), we are focusing in particular on randomized trials where text is an outcome of the trial (e.g., "did we help students write awesome essays?"). Our goal is to use data science tools to get the most out of such evaluations.  This grant is joint with Drs. Reagan Mozer (Bentley) and Shireen Al-Adeimi (Michigan State).