This is great research! But to me it looks like the "fact" message you gave was really an "opportunity" message, and the "opportunity" message was really... well, I don't know how to describe it! I think the takeaway, for talking to people with bachelor's degrees, is that opportunity is an effective mode of communication as long as it's "opportunity to make the world better", not "opportunity to be a great person".
I adapted that framing from Will MacAskill (an example of this starting at 12:45 in his podcast with Sam Harris here: https://www.samharris.org/podcast/item/being-good-and-doing-good). MacAskill refers to the framing as "Excited Altruism". It might come across better when he tells it than in a web survey, but I think it's pretty similar. I grouped this in with "opportunity", which I've also seen called "exciting opportunity" in the EA community (http://lukemuehlhauser.com/effective-altruism-as-opportunity-or-obligation/).
But, regardless of what it's called, I agree with you on the takeaway.
Ah, gotcha. But re: code review, even the most beautifully constructed chains can fail, and how you specify your model can easily cause things to go kabloom even if the machine's doing everything exactly how it's supposed to. And it only takes a few minutes to drag your log files into something like Tracer and do some basic peace-of-mind checks (and others, e.g. examine bivariate posterior distributions to assess non-identifiability w.r.t. your demographic params). More sophisticated diagnostics are scattered across a few programs but don't take too long to run either (unless you have e.g. hundreds or thousands of chains, like in marginal likelihood estimation w/ stepping stones... a friend's actually coming out with a program soon -- BONSAI -- that automates a lot of that grunt work, which might be worth looking out for!). :]
(on phone at gym with shit wifi so can't provide links/refs atm, sorry!)
Sounds interesting. Would love to take a look when you get a chance to provide the links.
Of course (though wheel reinvention can be super helpful educationally), but there are great free public R packages that interface to STAN (I use "rethinking" for my hierarchical Bayesian regression needs, but I think rstan would work, too), so going with someone's unnamed, private code isn't necessary imo. How much did the survey cost (was it a lot longer than the included Google Doc, then? E.g., did you have screening questions to make sure people read the paragraph?)? And model + MCMC specification can have lots of fiddly bits that can easily lead us astray, I'd say.
Yeah, the survey was a lot longer. Typically, general-public surveys will cost over 10 dollars per complete, so getting 1,200 cases for a survey like this can cost thousands of dollars.
I agree that model specification can be tricky, which is a reason I felt it was well worth it to use the proprietary software I had access to, which has been thoroughly vetted and code reviewed and is used frequently to run similar analyses, rather than trying to construct my own.
I did not make sure people read the paragraph. I discussed the issue a bit in my discussion section, but one way a web survey might understate the effect is if people would pay closer attention and respond better to a friend delivering the message. OTOH, surveys do have some potential vulnerability to the Hawthorne effect, though that didn't seem to express itself in the donations question.
Ah, I guess that's better than no control, and presumably paying attention to a paragraph of text doesn't make someone substantially more or less generous. Did you fit a bunch of models with different predictors and test for a sufficient improvement of fit with each? It might do to be wary of overfitting there... though since those aren't focal, Bayes tends to be pretty robust there, imo, so long as you used sensible priors.
"I used a multilevel model to estimate the effects among those with and without a bachelor's degree. So, the bachelor's estimate borrows power from those without a degree, reducing problems with overfitting."
If I'm understanding correctly, you had a hyperprior on the effect of education level? With just two options? IDK that that would help you much (if you had more: e.g. HS, BA/S, MS, PhD, etc. it might, but I'd try to preserve ordering there, myself).
"These models used STAN, which handles these multilevel models well. Convergence was assessed with Gelman-Rubin statistics."
STAN's great, but certainly not magic or perfect, and though idk them personally I'm sure its authors would strongly advocate paranoia about its output. So you got convergence with multiple (2?) chains from a random (hopefully) starting value? R_hats were all 1? That's good! Did all the other cheap diagnostics turn up ok (e.g. trace plots, autocorrelation times/ESS, marginal histograms, quick within-chain metrics, etc.)?
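For anyone following along, the basic (non-split) Gelman-Rubin R-hat mentioned here is easy to compute by hand; this is a minimal illustrative sketch in Python with simulated chains, not what Stan computes internally (Stan uses a more robust split-chain variant):

```python
import numpy as np

def gelman_rubin(chains):
    """Basic Gelman-Rubin R-hat for an (m_chains, n_draws) array of
    samples of one parameter; values near 1 suggest convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * means.var(ddof=1)               # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
good = rng.normal(size=(4, 1000))                    # 4 well-mixed chains
bad = good + np.array([[0.0], [0.0], [0.0], [5.0]])  # one chain stuck elsewhere
print(round(gelman_rubin(good), 3))  # close to 1
print(round(gelman_rubin(bad), 2))   # well above the usual 1.1 cutoff
```

The point of the between/within comparison: if any chain settles in a different region than the others, B blows up relative to W and R-hat moves away from 1.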
No; I did not fit multiple models. Lasso regression was used to fit a propensity model using the predictors.
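The proprietary implementation isn't available, but the general idea of a lasso (L1-penalized) logistic model for treatment assignment can be sketched with plain NumPy via proximal gradient descent; all data, names, and tuning values below are made up for illustration:

```python
import numpy as np

def lasso_logistic(X, y, lam=0.05, lr=0.5, iters=3000):
    """L1-penalized logistic regression via proximal gradient (ISTA):
    a gradient step on the average log-loss, then soft-thresholding,
    which drives uninformative coefficients to exactly zero."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        p_hat = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p_hat - y) / n
        w -= lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # lasso shrinkage
    return w

# simulate: only the first two covariates actually predict "treatment"
rng = np.random.default_rng(1)
X = rng.normal(size=(800, 6))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(800) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
w = lasso_logistic(X, y)
print(np.round(w, 2))  # irrelevant coefficients shrink to (near) zero
```

The sparsity is the selling point for a propensity model with many candidate survey predictors: covariates that don't predict assignment simply drop out.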
Using bachelor's vs. non-bachelor's has advantages in interpretability, so I think this was the right move for my purposes.
I did not spend an exorbitant amount of time investigating diagnostics, for the same reason I used a proprietary package: it has been built for running these tests at a production level and has been thoroughly code reviewed. I don't think it's worth the time to construct an overly customized analysis.
Ah, interesting! What package? I've never heard of something like that before. Usually in the cold, mechanical heart of every R package is the deep desire to be used and shared as far as possible. If it's just someone's personal interface code, why not use something more publicly available? Can you write out your basic script in pseudocode (or just math/words?)? Especially the model and MCMC specification bits?
Sure, in an ideal world, software would all be free for everyone; alas, we do not live in such a world :p. I used the proprietary package because it did exactly what I needed and doesn't require writing STAN code or anything myself. I'd rather not re-invent the wheel. I felt the tradeoff of transparency for efficiency and confidence in its accuracy was worth it, especially since I wouldn't be able to share the data either way (such are the costs of getting these questions on a 1200 person survey without paying a substantial amount).
But the basic model was just a multilevel binomial model predicting the dependent variable using the treatments and questions asked earlier in the survey as controls.
On a related (and elucidatory) note, could you more explicitly clarify which models you fitted, exactly?
It would be cool to provide the code, for both learning and verification purposes.
Unfortunately, because I used proprietary survey data/a proprietary R package to run this analysis, I don't think I'll be able to share the data and code.
Yay for Bayesian regression (binomial, I'm guessing? You re-binned your attitude and donations responses? I think an ordered logit would be more appropriate here and result in less loss of resolution, or even a Dirichlet, but then you'd lose yer ordering)! Those posteriors look decently tight, though I do have some questions!
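For concreteness on the ordered-logit suggestion: it maps one latent linear predictor through K-1 ordered cutpoints into K category probabilities, so a Likert-style response keeps its ordering without being collapsed into two bins. A toy sketch (the cutpoint values here are arbitrary):

```python
import numpy as np

def ordered_logit_probs(eta, cutpoints):
    """Category probabilities for an ordered logit:
    P(y <= k) = logistic(c_k - eta), then difference the CDF
    to get the probability mass of each ordered category."""
    cdf = 1.0 / (1.0 + np.exp(-(np.asarray(cutpoints) - eta)))
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    return np.diff(cdf)

# a 5-point response scale needs 4 ordered cutpoints
cuts = [-2.0, -0.5, 0.5, 2.0]
print(np.round(ordered_logit_probs(0.0, cuts), 3))  # mass symmetric around the middle
print(np.round(ordered_logit_probs(1.5, cuts), 3))  # positive effect shifts mass upward
```

A single treatment coefficient on eta then shifts the whole response distribution up or down the scale, which is the "less loss of resolution" point.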
I'm a little confused on what your control was, exactly. You have both points and distributions in your posterior plots, but you don't have any control paragraph blurb in your Google Doc questionnaire. How did you evaluate your control? Did you give them a paragraph entirely unrelated to EA? These plots are the posterior estimates for p_binomial when each dummy variable for treatment is 0? Is "average treatment effect" some posterior predictive difference from the control p (i.e. why it's exactly 0)?
On a related (and elucidatory) note, could you more explicitly clarify which models you fitted, exactly? Did you do any model comparison or averaging, or evaluate model adequacy? You mention "controlling for other variables in the survey" but I don't see any e.g. demographic questions in your questionnaire. You said you "examined these relationships overall and among the critical subgroup of those with at least a bachelor’s degree" -- did you do this by excluding everyone without a bachelor's, or by modeling the effects of educational attainment and then doing model comparison to test the legitimacy of those effects (I'd think looking at the posterior for the interaction between your paragraph and education dummies would be the clearest test)? Did you use diffuse, "uninformative" priors (and hyperpriors)? Which ones, exactly?
I assume that since this is a hierarchical analysis you used MCMC (HMC?) to do the fitting. Are your posterior distributions smoothed substantially, e.g. with a kernel density estimator? Or did you just get fantastic performance? What diagnostics did you run to ensure MCMC health? How many chains did you run? Did you use stopping rules? In my experience, hierarchical regression models can be pretty finicky to fit as they get more complex.
Kudos on not just using some wackily inappropriate out-of-the-box frequentist test!
edit: also, what are the boxplot-looking things? 95% HPDIs? CIs? Some other %? Ah wait, they're the sd of your marginal samples?
The respondents in a treatment were each shown a message and asked how compelling they thought it was. The control was shown no message.
Yeah; the plots are the predicted values for those given a particular treatment, and the Average Treatment Effect is the difference from the control.
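The mechanics of an average treatment effect computed this way can be sketched: subtract control-arm posterior draws of the predicted probability from treatment-arm draws, draw by draw. The Beta draws below are made-up stand-ins for real MCMC output:

```python
import numpy as np

rng = np.random.default_rng(3)
# stand-in "posterior draws" of the predicted donation probability;
# a real analysis would take these from the fitted model's samples
p_control = rng.beta(30, 70, size=4000)   # control arm, mean ~0.30
p_treat = rng.beta(38, 62, size=4000)     # one treatment arm, mean ~0.38
ate_draws = p_treat - p_control           # draw-by-draw treatment effect
lo, hi = np.quantile(ate_draws, [0.025, 0.975])  # 95% credible interval
print(round(ate_draws.mean(), 3), round(lo, 3), round(hi, 3))
```

Because the control's difference with itself is identically zero, its "ATE" plots as an exact point at 0, which answers the question about the plots above.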
I did not include every control used in the provided questionnaire. There were a mix of demographics/attitudinal/behavioral questions asked in the survey that I also used. These controls, particularly previous donations, were important for decreasing variance.
I used a multilevel model to estimate the effects among those with and without a bachelor's degree. So, the bachelor's estimate borrows power from those without a degree, reducing problems with overfitting.
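That "borrowing power" is the classic partial-pooling shrinkage: each subgroup estimate is pulled toward the grand mean in proportion to its noisiness relative to an assumed between-group spread. A toy sketch (all numbers, including the between-group sd tau, are invented):

```python
import numpy as np

def partial_pool(group_means, group_ses, tau):
    """Precision-weighted shrinkage of per-group estimates toward the
    grand mean; tau is the between-group sd (a hyperparameter that a
    full hierarchical model would itself estimate)."""
    group_means = np.asarray(group_means, dtype=float)
    group_ses = np.asarray(group_ses, dtype=float)
    w = tau**2 / (tau**2 + group_ses**2)  # 1 = no pooling, 0 = complete pooling
    grand = group_means.mean()
    return grand + w * (group_means - grand)

# a noisier bachelor's-subgroup effect gets pulled harder toward the
# overall effect than the larger, better-estimated non-bachelor's group
print(np.round(partial_pool([0.08, 0.02], [0.04, 0.02], tau=0.03), 4))
```

The shrunken estimates land between each raw group mean and the grand mean, which is exactly the overfitting protection described above.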
These models used STAN, which handles these multilevel models well. Convergence was assessed with Gelman-Rubin statistics.
I agree that the modal outcome of a Trump presidency is that he changes little and the Democrats come out stronger at the end of his presidency than they entered. However, I still think it would have been better that Clinton had won (even if we assume the same congress).
The most important reason is tail risk. As others have commented, the risk of nuclear war may be greater under Trump than it would have been under Clinton. So far, he seems to be pursuing a more conventional foreign policy than I feared, but I still believe the risk is higher than with Clinton. Additionally, I'm worried that the Trump presidency is increasing the salience of Russian hostility among Democrats and could increase the chance of conflict in the future even when a Democrat takes office.
Another area of concern is pandemics. Trump has expressed anti-vaccine sentiments and submitted budgets which cut pandemic preparedness. Furthermore, the overall level of incompetence in his administration and many of his appointees leaves me worried that the US response to a major pandemic could be diminished.
None of the above is likely to happen, but I'd much rather play it safe with a Clinton presidency. Additionally, even the modal outcome of a Trump presidency isn't all good for the liberals. Most notably, he'll almost certainly be able to move at least one conservative onto the Supreme Court and has a high chance of moving at least one more. If Trump replaces a liberal with a conservative on the court, the court will move to the right and it will likely be quite a while until Democrats retake it. With a Clinton presidency, liberals would have been able to achieve a majority on the court that would likely have lasted a long time itself.
Thanks for the write-up. I think you make a compelling case that this is more effective than canvassing, which can cost over 1,000 dollars per marginal vote in a competitive election like 2016. I do think there are a few ways your estimate may be an overestimate, though.
Of those who claimed they would follow through with vote trading, some may not have. You mention that there wouldn't have been much value to defecting. However, much of the value of a vote to an individual comes from tribal loyalties rather than affecting the outcome. That's why turnout is higher in safe states in a presidential election than in midterm elections, even when the midterm election is competitive. Some individuals may still have defected because of this.
Secondly, many of the 3rd party folks who made the trade could have voted for Clinton anyway. People who sign up for these sites are necessarily strategic thinkers. If they wanted more total votes for Stein/Johnson but recognized that a vote for Clinton was more important in a swing state, they might have signed up for the site to gain the Stein/Johnson voter but planned to vote for Clinton even if they didn't get a match. Additionally, even if they were acting in good faith when they signed up, they may have changed their minds as the election approached. Third parties are historically overestimated in polling compared to the election results, and 2016 was no exception: http://www.realclearpolitics.com/epolls/2016/president/us/general_election_trump_vs_clinton_vs_johnson_vs_stein-5952.html.
I don't think these problems are enough to reduce the value by an order of magnitude, but they are worth keeping in mind.
Additionally, while vote trading may be high EV now, I am skeptical that it is easy to scale. It's even more difficult to apply outside of presidential elections, so, unlike other potential political interventions, it will mostly be confined to every 4 years in one race. Furthermore, the individuals who signed up now may be lower cost to acquire than additional potential third party traders. They are likely substantially more strategic than the full population of 3rd party voters; in many years, the full population isn't that large to begin with. The cost per additional vote may be larger than your current estimates.
Nevertheless, I agree that right now it's probably more valuable than traditional canvassing and I'm glad people are putting resources into it.
This sounds really great to me. I love the idea of having more RCTs in the EA sphere. I would definitely record how much they are giving 1 year later.
I also think it's worth having a holdout set. People can pre-register the list of friends, then a random number generator can be used to randomly select some friends not to make an explicit GWWC pitch to. It's possible many of the friends/contacts who join GWWC and start donating are those who have already been exposed to EA ideas over a long period of time, and the effect size of the direct GWWC pitch isn't as large as it would appear. Having a holdout set would account for this. With a holdout set, CEA wouldn't have to worry about who they contact; the holdout would take care of this and make the estimate of the treatment effect unbiased.
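The holdout assignment described above is just a seeded random split of the pre-registered list; a minimal sketch (the holdout fraction and seed are arbitrary choices, not anything CEA has specified):

```python
import random

def assign_holdout(friends, holdout_frac=0.3, seed=42):
    """Randomly split a pre-registered friend list into a pitch group
    and an untouched holdout group, reproducibly via a fixed seed."""
    rng = random.Random(seed)
    shuffled = friends[:]          # copy so the registered list is untouched
    rng.shuffle(shuffled)
    k = int(len(shuffled) * holdout_frac)
    return shuffled[k:], shuffled[:k]   # (pitch group, holdout group)

pitch, holdout = assign_holdout([f"friend_{i}" for i in range(10)])
print(len(pitch), len(holdout))
```

Pre-registering the list before the split is what makes the later pitch-vs-holdout comparison an unbiased estimate of the pitch's effect.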
© 2017 Effective Altruism Forum