About Luke Miratrix

Luke Miratrix is currently a Professor at the Harvard Graduate School of Education and affiliate faculty of the Department of Statistics. He primarily works on how to best use modern statistical methods in applied social science contexts. He directs the Miratrix C.A.R.E.S. lab, a group of students in statistics, education, and elsewhere dedicated to high-quality causal inference research in the social sciences.

His primary focus is on how to best analyze data in a transparent manner that allows engagement from diverse stakeholders while also preserving rigor. Miratrix is also interested in how to best use machine learning and other high-dimensional methodology for text analysis, with a focus on using text in causal inference settings.

While primarily focused on problems in education (ranging from early childhood impact evaluations to the design of risk detection methods for community college), he has also worked on projects in elections and voting systems, media analysis, behavioral political science, the effectiveness of regulatory agencies such as OSHA, pre-trial risk assessment systems and criminal justice reform, and human-computer interactions.

He received his Doctorate in Statistics from the University of California, Berkeley in spring 2012. His interest in statistics grew out of an interest in mathematics education, which developed during his seven years as a high school teacher and tutor. He also has a Master's in Computer Science from M.I.T., a Bachelor of Science in Computer Science from the California Institute of Technology, and a Bachelor of Arts in Mathematics from Reed College.

Selected Research Areas

Causal inference (propensity scores, matching, regression discontinuity designs, instrumental variables when forced, and so forth).
Principal stratification (a method for causal analysis that incorporates post-treatment covariates).
Assessing and characterizing variation in treatment effects (treatment heterogeneity).
Analyzing data from randomized trials, in particular multisite and cluster randomized trials.
High-dimensional and sparse-regression methods.
Bayesian modeling (e.g., Gaussian processes).
Random effect (multilevel) models.
Text as data, including text summarization, causal inference with text, and the use of LLMs for coding text.