Clustered Standard Errors in AB Tests | by Matteo Courthoud | Mar, 2024

What to do when the unit of observation differs from the unit of randomization

Matteo Courthoud

A/B tests are the gold standard of causal inference because, thanks to randomization, they allow us to make valid causal statements under minimal assumptions. By randomly assigning a treatment (a drug, ad, product, …), we can compare the outcome of interest (a disease, firm revenue, customer satisfaction, …) across subjects (patients, users, customers, …) and attribute the average difference in outcomes to the causal effect of the treatment.

Sometimes the unit of treatment assignment differs from the unit of observation. In other words, we do not decide whether to treat each observation independently, but rather assign treatment in groups. For example, we might decide to treat all customers in a certain region while observing outcomes at the customer level, or treat all articles of a certain brand while observing outcomes at the article level. Usually this happens because of practical constraints. In the first example, the so-called geo-experiments, it happens because cookie deprecation prevents us from tracking individual users.

When this happens, treatment assignment is no longer independent across observations. If a customer in a region is treated, the other customers in the same region are treated as well. If an article of a brand is not treated, neither are the other articles of the same brand. When doing inference, we have to take this dependence into account: standard errors, confidence intervals, and p-values should be adjusted. In this article, we will explore how to do that using cluster-robust standard errors.
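To make the adjustment concrete, here is a minimal Python sketch on simulated data. It is not the author's code. It assumes a binary treatment, a simple difference-in-means estimator, and the most basic cluster-robust variance with no small-sample correction, so the numbers may differ slightly from what statistical libraries report. The function name diff_in_means_se, the 40 simulated regions, and the size of the regional shock are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def diff_in_means_se(y, d, clusters):
    """Return (estimate, naive_se, clustered_se) for a binary treatment d.

    Hypothetical helper: no small-sample correction is applied.
    """
    n_t = sum(d)
    n_c = len(d) - n_t
    mean_t = sum(yi for yi, di in zip(y, d) if di) / n_t
    mean_c = sum(yi for yi, di in zip(y, d) if not di) / n_c

    # Contribution of each observation to the estimation error:
    # its residual, weighted by +1/n_t (treated) or -1/n_c (control).
    scores = [
        (yi - mean_t) / n_t if di else -(yi - mean_c) / n_c
        for yi, di in zip(y, d)
    ]

    # Naive (heteroskedasticity-robust) variance: observations are
    # treated as independent, so each score is squared on its own.
    naive_var = sum(s ** 2 for s in scores)

    # Cluster-robust variance: scores are summed within each cluster
    # first, so correlated errors inside a cluster do not cancel out.
    by_cluster = defaultdict(float)
    for s, g in zip(scores, clusters):
        by_cluster[g] += s
    cluster_var = sum(s ** 2 for s in by_cluster.values())

    return mean_t - mean_c, math.sqrt(naive_var), math.sqrt(cluster_var)


# Simulated geo-experiment (assumed setup): 40 regions, 50 customers each,
# treatment assigned at the region level, true treatment effect of zero.
rng = random.Random(0)
regions = list(range(40))
rng.shuffle(regions)
treated_regions = set(regions[:20])

y, d, clusters = [], [], []
for region in range(40):
    shock = rng.gauss(0, 1)  # shared by all customers in the region
    for _ in range(50):
        clusters.append(region)
        d.append(1 if region in treated_regions else 0)
        y.append(shock + rng.gauss(0, 1))

estimate, naive_se, clustered_se = diff_in_means_se(y, d, clusters)
print(f"estimate={estimate:.3f} naive_se={naive_se:.3f} clustered_se={clustered_se:.3f}")
```

Because customers in the same region share a common shock, the clustered standard error should come out noticeably larger than the naive one in this simulation. If every observation is placed in its own cluster, the two standard errors coincide.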

Imagine you are an online platform interested in increasing sales. You just had a great idea: showing a carousel of related articles at checkout to encourage customers to add other articles to their basket. To understand whether the carousel increases sales, you decide to A/B test it. In principle, you could just decide at random, for every order, whether to display the carousel or not. However, this would give…
