Independent re-analysis of MFA veg ads RCT data

By Jeff Kaufman 🔸 @ 2016-02-20T04:48 (+11)

If you would like to get people to stop eating animals, there are a lot of things you could do: protest meat-serving restaurants, hand out leaflets, show online ads, lobby companies to do "meatless mondays", etc. To compare these, it would be useful to know how much of an impact they have. A while ago I proposed a simple survey to measure the impact of online ads:

Show your ads, as usual.
Randomly divide people into control and experimental groups, 50-50.
Experimental group sees pro-veg page, control group sees something irrelevant.
Use retargeting cookies to pull people back in for a follow-up.
Ask people whether they eat meat.

Well, some people planned a study along these lines (methodology) and the results are now out. They randomized who saw the anti-meat videos, followed up with retargeting cookies, and asked people questions about their consumption of various animal products. This is the biggest study of its type I know of, and I'm very excited that it's now complete.

The biggest problem I see is that they ended up surveying many fewer people than they set out to. The methodology considered how many people would need to complete the survey to pick up changes of varying sizes, and concluded:

We need to get at minimum 3.2k people to take the survey to have any reasonable hope of finding an effect. Ideally, we'd want say 16k people or more.

They only got 2k responses, however, and only 1.8k valid ones. This means the study is "underpowered": even if an effect exists at the size the experimenters expected, there's a large chance the study wouldn't be able to clearly show the effect.
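To get a feel for what "underpowered" means here, the following is a rough sketch of a power calculation for comparing two proportions, using a normal approximation. The 6% and 8% vegetarian rates are illustrative assumptions, not figures taken from the study's methodology, and `approx_power` is a hypothetical helper name.

```python
from statistics import NormalDist

def approx_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (equal group sizes)."""
    norm = NormalDist()
    # Standard error of the difference in proportions under the alternative.
    variance_sum = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    se = (variance_sum / n_per_group) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    shift = abs(p_treatment - p_control) / se
    # Probability of landing beyond either critical value when the effect is real.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

# Assumed rates; group sizes roughly correspond to 1.8k, 3.2k and 16k total responses.
for n in (900, 1600, 8000):
    print(n, round(approx_power(0.06, 0.08, n), 2))
```

Under these assumed rates, power climbs steeply with sample size, which is consistent with the methodology treating 3.2k as a bare minimum and 16k as the preferred target.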

Still, let's work with what we have. To compensate for having minimal data, we should run a single test, the one we think is most applicable. Running multiple tests would mean we'd need to use a Bonferroni correction or something similar, and that dramatically decreases your statistical power.

Before looking at the data or reading the writeup, I committed (via email to David and Allison) to an approach: what I thought of as the simplest, most straightforward way of looking at it. I would categorize each respondent as "meat-eating" or "vegetarian" based on whether they reported eating any meat in the past two days, compute an effect size as the difference in the vegetarian rate between the two groups, and compute a p-value with a standard two-tailed t-test.

So what do we have to work with for questions? The survey asked, among other things:

In the past two days, how many servings have you had of the following foods? Please give your best guess.

Pork (ham, bacon, ribs, etc.)

Beef (hamburgers, meatballs, in tacos, etc.)

Dairy (milk, yogurt, cheese, etc.)

Eggs (omelet, in salad, etc.)

Chicken and Turkey (fried chicken, turkey sandwich, in soup, etc.)

Fish and Seafood (tuna, crab, baked fish, etc.)

This is potentially rich data, except I don't expect people's responses to be very good. If I tried to answer it, I'm sure I'd miss things for silly reasons, like forgetting what I had for dinner yesterday or not being sure what counts as a serving. On the other hand, if I had a policy for myself of not eating meat, it would be very easy to answer those questions! So I categorized people just as "eats meat" vs "doesn't eat meat".

There were 970 control and 1054 experimental responses in the dataset they released. Of these, only 864 (89%) and 934 (89%) fully filled out this set of questions. I counted someone as a meat-eater if they answered anything other than "0 servings" to any of the four meat-related questions, and a vegetarian otherwise. Totaling up responses I see:

	valid responses	vegetarians	%
control	864	55	6.4%
experimental	934	78	8.4%
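As a minimal sketch of the categorization and tally described above: the column names in `MEAT_QUESTIONS` are hypothetical (the released dataset may label them differently), and the code assumes each answer has already been parsed into a number of servings. The counts are the ones from the table.

```python
# Hypothetical column names for the four meat-related questions.
MEAT_QUESTIONS = ("pork", "beef", "chicken_turkey", "fish_seafood")

def is_vegetarian(response):
    """True if the respondent reported 0 servings for every meat question."""
    return all(response[q] == 0 for q in MEAT_QUESTIONS)

# Counts from the table: (valid responses, vegetarians).
counts = {"control": (864, 55), "experimental": (934, 78)}
rates = {group: veg / valid for group, (valid, veg) in counts.items()}

for group, rate in rates.items():
    print(f"{group}: {rate:.1%}")
print(f"difference: {rates['experimental'] - rates['control']:.1%}")
```

Dairy and eggs are deliberately left out of the check, since the categorization is about meat only.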

The bottom line is that the share of vegetarians was about 2 percentage points higher in the experimental group than in the control group (p=0.108; I originally reported p=0.053, see the update below). Honestly, this is far higher than I expected. We're surveying people who saw a single video four months ago, and we're seeing that about 2% more of them are vegetarian than they would have been otherwise.

Update 2016-02-20: I computed the p-value wrong; 0.053 was from a one-tailed test instead of a two-tailed test. The right p-value is 0.108. (I had used an online calculator intended for evaluating A/B tests that give you conversion numbers. It didn't specify one- or two-tailed, but since two-tailed is what you should use for A/B tests, that's what I thought it would be using. After Alexander, Michael, and Dan pointed out that it looked wrong, I recomputed the p-value computationally. [1])
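The one-tailed versus two-tailed distinction can be sketched with a pooled two-proportion z-test. This is a normal approximation swapped in for illustration, not the online calculator's method or the computation in the footnote, and `two_proportion_p_values` is a hypothetical helper name.

```python
from math import erfc, sqrt

def two_proportion_p_values(veg_a, n_a, veg_b, n_b):
    """Return (one_tailed, two_tailed) p-values from a pooled two-proportion z-test."""
    pooled = (veg_a + veg_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(veg_b / n_b - veg_a / n_a) / se
    two_tailed = erfc(z / sqrt(2))  # area in both tails beyond |z|
    one_tailed = two_tailed / 2     # area in a single tail
    return one_tailed, two_tailed

one_tailed, two_tailed = two_proportion_p_values(55, 864, 78, 934)
print(f"one-tailed: {one_tailed:.3f}, two-tailed: {two_tailed:.3f}")
```

With these counts the two-tailed value comes out at about 0.108, and the one-tailed value is exactly half of it. The one-tailed figure depends on details such as whether the variance is pooled, so it need not match the calculator's 0.053 to the last digit.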

This is a very different way of interpreting the study results than any of the writeups I've seen. Edge's Report, Mercy for Animals, and Animal Charity Evaluators all conclude that there was basically no effect. I think this mostly comes from their asking questions where I'd expect the data to be noisier, like looking at how much of various things people think they eat or their attitudes toward meat consumption, plus their asking lots of different questions and so needing to correct downward to compensate for the multiple comparisons.

(There's probably something interesting you could do comparing the responses to the attitude questions with whether people reported eating any meat. I started looking at this some, just roughly, but didn't get very far. Maybe there are hints that the ads do their work by reducing recidivism instead of convincing people to give up meat, but I'm too sleepy to figure this out. My work is all in this sheet.)

[1] See footnote on jefftk.com version; e-a.com doesn't preserve indentation in pre-tags.

undefined @ 2016-02-20T16:22 (+4)

Thanks very much for doing this Jeff. It's useful to have an independent re-analysis. My credence that these ads work is increased from knowing that the data has been re-analysed by someone who would have expected no effect, and in fact did find one. Even if the effect was reducing recidivism, that still seems pretty useful! Hopefully in the future there will be more studies done that actually get statistically significant results.

undefined @ 2016-02-20T05:01 (+4)

This is a very different way of interpreting the study results than any of the writeups I've seen. Edge's Report, Mercy for Animals, and Animal Charity Evaluators all conclude that there was basically no effect.

From your results it still would be reasonable to conclude that there's "no effect" since the p-value is >0.05, but the p-value is low enough that I would give a follow-up study a reasonably high chance of getting a statistically significant result.

Also: it looks like you're using a one-sided t-test to get your p-value. I don't know much about significance testing but wouldn't it be better to use a two-sided t-test? My understanding is that one-sided tests are sort of cheating by making your p-value half of what it really should be.

undefined @ 2016-02-20T06:08 (+3)

it looks like you're using a one-sided t-test to get your p-value.

I agree that a two-sided test would be the right thing to use here, and p-value calculations aren't something I fully understand. Is this calculation one-sided or two-sided?

undefined @ 2016-02-20T16:33 (+4)

It looks like the NORMDIST function on your sheet is taking the integral from 0 to z_score, which is one-sided. A two-sided test would take

(integral from 0 to z_score) + (integral from -z_score to infinity)

undefined @ 2016-02-20T06:50 (+2)

I can't tell what's being done in that calculation.

I'm getting a p-value of 0.108 from a Pearson chi-square test (with cell values 55, 809; 78, 856). A chi-square test and a two-tailed t-test should give very similar results with these data, so I agree with Michael that it looks like your p=0.053 comes from a one-tailed test.
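For anyone who wants to check this, here is a small sketch of a Pearson chi-square test on those four cell values using only the standard library. It assumes no continuity correction, and `pearson_chi_square_2x2` is a hypothetical helper name rather than a function from any statistics package.

```python
from math import erfc, sqrt

def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if group and vegetarian status were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

chi2, p = pearson_chi_square_2x2(55, 809, 78, 856)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

For a 2x2 table the uncorrected chi-square statistic equals the square of the pooled two-proportion z statistic, which is why this test and a two-tailed test on the proportions give essentially the same p-value.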

undefined @ 2016-02-20T14:46 (+1)

Yes, you're right. Sorry! I redid it computationally and also got 0.108. Post updated.