4 How can machine learning improve the quality of social impact research?

Practitioners and policymakers who incorporate machine learning into the research and policy process can realize several benefits. The main ones are described below.

4.1 Personalized predictions and treatment assignment policies

Machine learning methods allow us to learn about what works and why in a data-driven way. They can discover unexpected patterns that a researcher would previously have noticed only if she had explicitly investigated them. Machine learning can make it easier to identify which interventions work well for some people but less well for others by estimating heterogeneous treatment effects. For example, a certain user interface may be easier or harder to use depending on the user’s English language skills and digital literacy. Estimates of heterogeneous treatment effects can be used to guide future intervention designs. They can also help shed light on the mechanisms that underlie treatment effects. For example, if an intervention works well for non-English speakers but has no effect on English speakers, that supports the hypothesis that the intervention works by reducing the language demands of the interface. Heterogeneous treatment effects can also be used to design targeted treatment assignment policies; once we have answered the question “for whom does this intervention work?” we can deliver the intervention to those who will benefit from it, and design alternative interventions for those who do not benefit from the initial approach.
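To make the idea of heterogeneous treatment effects concrete, the following minimal sketch estimates a separate treatment effect for each subgroup of a randomized experiment as a difference in mean outcomes, then derives a simple targeting rule. It is an illustration only: the function name `subgroup_effects`, the toy records, and the `min_effect` threshold are all hypothetical, the subgroups are taken as given rather than discovered by an algorithm, and no standard errors are computed.

```python
from collections import defaultdict
from statistics import mean


def subgroup_effects(records):
    """Difference in mean outcome (treated minus control) within each subgroup.

    `records` is an iterable of (group, treated, outcome) tuples from a
    randomized experiment. Subgroups missing either arm are skipped.
    """
    outcomes = defaultdict(lambda: {True: [], False: []})
    for group, treated, outcome in records:
        outcomes[group][bool(treated)].append(outcome)

    effects = {}
    for group, arms in outcomes.items():
        if arms[True] and arms[False]:
            effects[group] = mean(arms[True]) - mean(arms[False])
    return effects


# Hypothetical toy data: (subgroup, received the intervention?, outcome).
records = [
    ("non_english", True, 0.9), ("non_english", True, 0.7),
    ("non_english", False, 0.4), ("non_english", False, 0.2),
    ("english", True, 0.6), ("english", True, 0.8),
    ("english", False, 0.7), ("english", False, 0.7),
]

effects = subgroup_effects(records)

# Illustrative targeting rule: offer the intervention only to subgroups whose
# estimated effect exceeds an assumed, pre-specified threshold.
min_effect = 0.1
policy = {group: effect > min_effect for group, effect in effects.items()}

for group, effect in effects.items():
    print(f"{group}: estimated effect {effect:+.2f}, treat = {policy[group]}")
```

In this toy data the intervention appears to help the non-English-speaking subgroup and to have no effect on English speakers, so the rule targets only the former. In practice, estimates like these need uncertainty quantification before they are used to assign treatments.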

4.2 Reproducibility and transparency

P-hacking, or cherry-picking results, is a well-documented problem in quantitative research. We now have access to large and varied datasets, which drastically increases the number of choices a researcher can make when selecting a model. This can be good: it provides more flexibility and avenues for new research. However, it also involves significant risks. More choice means more opportunities to cherry-pick a model that shows the results we are looking for, since there is no way to track all the models or specifications that were considered but never chosen. In other words, more and better data can increase ‘researcher degrees of freedom’ while keeping the model selection process opaque and non-reproducible. Machine learning methods can help reduce these problems by delegating model selection to an algorithm. Since the algorithm is reproducible, we can trace the steps taken to reach the final model. However, it remains important to use machine learning carefully if estimation and hypothesis testing are the goal. Recent research develops methods that are reproducible and that also provide estimates of treatment effects and the ability to test hypotheses.
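As an illustration of what delegating model selection to an algorithm can look like, the sketch below chooses between two candidate models using k-fold cross-validation with a fixed random seed, so that anyone rerunning the procedure on the same data reaches the same choice and can inspect the score of every candidate considered. This is one common selection procedure, not necessarily the one used in the research referred to above; the candidate models, the function `select_model`, and the simulated data are assumptions made for the example.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean of the outcome."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line through the training data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


CANDIDATES = {"mean_only": fit_mean, "linear": fit_line}


def select_model(xs, ys, k=5, seed=0):
    """Return the candidate with the lowest k-fold cross-validated squared
    error, together with the score of every candidate (the audit trail)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # fixed seed: same folds every run
    folds = [idx[i::k] for i in range(k)]

    scores = {}
    for name, fit in CANDIDATES.items():
        errors = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            errors.extend((ys[i] - predict(xs[i])) ** 2 for i in fold)
        scores[name] = mean(errors)

    best = min(scores, key=scores.get)
    return best, scores


# Simulated data (an assumption for the example): a noisy linear relationship.
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]

best, scores = select_model(xs, ys)
print("selected:", best)
print("all candidates and scores:", scores)
```

Because the candidate set, the seed, and the selection rule are all written down, the full list of models that were considered but not chosen is part of the record rather than hidden in an analyst's notebook. Selecting a model this way does not by itself yield valid hypothesis tests, which is why the caution above still applies.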

4.3 Targeted data collection

Adaptive experimentation can help take maximum advantage of an experiment. For example, when conducting a trial to compare many treatment arms, machine learning can use outcome data collected during the early part of the trial to help assign the right treatment to the right person later in the trial (assuming that the outcome of interest, or a good proxy for it, can be observed within a relatively short time frame). Adaptive experiments of this kind, which rely on what the machine learning literature calls bandit algorithms, progressively discard or down-weight suboptimal treatment arms, and can therefore learn about optimal arms much faster than regular randomized controlled trials. Other experimental designs optimize the allocation of treatments for best-arm identification, that is, to output the arm that has the highest value at the end of the experiment.
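The sketch below illustrates the adaptive logic with Thompson sampling, one widely used bandit algorithm, chosen here purely as an example. It assumes a binary outcome that is observed immediately after each assignment, and it simulates participants using hypothetical success rates (`true_rates`) that a real experimenter would not know. Rather than dropping arms outright, this algorithm gradually assigns fewer participants to arms that appear to perform worse.

```python
import random


def thompson_sampling(true_rates, n_participants, seed=0):
    """Simulate an adaptive trial with Beta-Bernoulli Thompson sampling.

    `true_rates` holds the (in practice unknown) success probability of each
    arm and is used only to simulate outcomes. Returns the number of
    participants assigned to each arm and the successes observed per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_participants):
        # Draw a plausible success rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then assign the arm with the best draw.
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=draws.__getitem__)

        # Simulated outcome for this participant.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    counts = [s + f for s, f in zip(successes, failures)]
    return counts, successes


# Hypothetical three-arm trial in which the third arm is truly the best.
true_rates = [0.2, 0.3, 0.7]
counts, successes = thompson_sampling(true_rates, n_participants=2000)
print("participants per arm:", counts)
print("successes per arm:", successes)
```

In a simulation like this one, most participants end up assigned to the arm with the highest success rate, which is the sense in which the design learns about the best arm faster than an equal-allocation trial. The trade-off is that weaker arms receive few observations, so their effects are estimated less precisely.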

4.4 Speeds up innovation

Algorithms may do the work of many humans very quickly. While it may take an analyst weeks or months to consider which of a variety of models best explains the data, a machine learning method can determine this in minutes. This allows us to spend more time crafting the best policies and interventions.

4.5 Additional insights for future innovation

Using machine learning to estimate heterogeneous treatment effects can reveal which individuals were poorly served by prior interventions, pointing to segments of the population for whom different approaches are needed. Conversely, finding a group for whom an intervention works well can motivate doubling down on that intervention and improving it for that group.