Comment author: Khorton 20 September 2017 09:18:03AM 1 point [-]

If possible, I'd reduce the reading age of the questions by using simpler words and shorter sentences. I consistently overestimate the reading ability of average citizens.

If these statements were really on a ballot, people would likely have seen advertisements or news clips about the proposal. Right now, respondents have never heard of these proposals. It's important that they understand what you're asking.

Comment author: Michael_S 20 September 2017 01:38:33PM 2 points [-]

I disagree. I believe good ballot measure polling should more accurately reflect the actual language that would appear on the ballot. There's a known bias towards voters being more likely to support simpler language.

Unless this is an extremely expensive measure (which it probably won't be), I don't think that assumption is correct. Most voters will probably never hear about the initiative before they see it on the ballot, or will have seen only a cursory ad that they barely paid attention to.

Comment author: Milan_Griffes 15 September 2017 01:54:09AM *  1 point [-]

Thanks for the comments!

"am I correct in interpreting that you assume 100% chance of passage in your model conditional on good polling?"

No, the best-guess input is an 80% chance of passage, conditional on good polling and sufficient funding (see row 81). What "good" means here is a little underspecified – an initiative that polls at 70% favorability would have a much higher probability of passing than one that polls at 56%.

"you seem to focus on the individual treatment cost of the intervention, which overwhelms the cost of the ballot measure."

Right. You could think of this analysis as trying to model whether psychedelic treatments for mental health conditions would be cost-effective if they were available today. For example, consider a promising intervention that would entirely cure someone's depression for a year, but costs $10,000,000 per treatment. We probably wouldn't want to run a ballot initiative to increase access to such an intervention, as it wouldn't be cost-effective even if it were easily accessible.
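To make that concrete, here's a toy check in R (all numbers are invented apart from the $10,000,000 figure above):

```r
# Toy check in R: is the treatment itself cost-effective, before we even
# think about campaign costs or the ~80% chance of passage (row 81)?
cost_per_tx    <- 10e6    # hypothetical cost of one treatment, in dollars
benefit_per_tx <- 1       # depression-years fully averted per treatment (hypothetical)

cost_per_tx / benefit_per_tx   # $10,000,000 per depression-year averted
# At that price, no plausible campaign cost or passage probability can make
# a ballot initiative to expand access look cost-effective.
```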

Comment author: Michael_S 15 September 2017 02:48:50AM 0 points [-]

Cool; I had missed that row. Yeah, if it polls at 70%, the chance of passage might be close to 80%. Conditional upon that level of support, your estimate seems reasonable to me (assuming the ballot summary language would not be far more complex than the polled language).

Yeah, I agree that the treatment being effective is a necessary precursor to it being a good law to pass by ballot initiative, and that this is part of the EV calculation for spending money on the ballot measure itself.

Comment author: Peter_Hurford  (EA Profile) 14 September 2017 11:45:00PM 1 point [-]

"I believe the gains from the ballot measure should be the estimated sum of the utility gains from people being able to purchase the drugs multiplied by the probability of passage; the costs should be how much it would cost to run the campaign. On the doc, you made the point that GiveWell doesn't include leverage on other funding in their estimates, but when it comes to ballot measures, leverage is exactly what you're trying to produce, so I think an estimate is important."

One potential way of thinking about this is that the ballot measure in itself does not accomplish much; it just "unlocks" the ability for people to more cheaply help themselves. This could be modeled as the costs of the ballot measure + the costs of people helping themselves over a stream of X years, put against the benefits of people helping themselves over X years. I would use 5 for X, assuming that a lot can change in 5 years and maybe drug legalization would happen anyway, but I think a higher value for X could also be justified.

This kind of (costs of unlocking + costs of what is unlocked over time) vs. benefits of what is unlocked over time is also how I model the cost-benefit of developing a new medicine (like a vaccine), since the medicine is useless unless it is actually given to people, which costs additional money.
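A minimal sketch of that accounting in R, with entirely made-up numbers:

```r
# "Unlocking" framing: the measure's costs and benefits over X years.
# Every number below is a made-up placeholder.
campaign_cost  <- 3e6     # cost of running the ballot campaign ($)
cost_per_user  <- 1000    # what each person pays per year for the unlocked treatment ($)
gain_per_user  <- 0.5     # wellbeing gained per person per year (arbitrary units)
n_users        <- 50000   # people who use the treatment each year
X              <- 5       # years before legalization might have happened anyway

total_cost    <- campaign_cost + cost_per_user * n_users * X
total_benefit <- gain_per_user * n_users * X

total_cost / total_benefit   # dollars per unit of benefit under this accounting
```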

Comment author: Michael_S 14 September 2017 11:52:15PM 1 point [-]

That seems similar to Milan_Griffes' approach. However, when we're comparing ballot measures to other opportunities, I think the relevant cost to EA would be the cost to launch the campaign. That's what EAs would actually be spending money on and what could be spent on other interventions.

We don't have to assume away the additional costs of getting the medicine, but those can be factored into the benefit (i.e., the net benefit is the gains people would get from the medicine minus the gains they lose from giving up the funds to purchase the drugs).
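To make the difference between the two accountings concrete, here's a rough R sketch with made-up numbers (including an invented dollars-to-wellbeing conversion):

```r
# Two ways to book the individual treatment spending (all numbers invented).
campaign_cost    <- 3e6                 # what EAs would actually spend ($)
tx_spending      <- 1000 * 50000 * 5    # what individuals spend over 5 years ($)
gross_benefit    <- 0.5 * 50000 * 5     # wellbeing gained over 5 years (units)
units_per_dollar <- 1 / 50000           # invented conversion: wellbeing per $ forgone

# Accounting A: treatment spending counted as a cost alongside the campaign
(campaign_cost + tx_spending) / gross_benefit    # $ per unit of benefit

# Accounting B (this comment): treatment spending netted out of the benefit,
# so the only cost charged against other EA opportunities is the campaign.
net_benefit <- gross_benefit - tx_spending * units_per_dollar
campaign_cost / net_benefit                      # $ per unit of benefit
```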

Comment author: Michael_S 14 September 2017 10:52:08PM *  5 points [-]

Hey; I made some comments on this on the doc, but I thought it was worth bringing them to the main thread and expanding.

First of all, I'm really happy to see other EAs looking at ballot measures. They're a potentially very high-EV method of passing policy and raising funding. They're particularly high value per dollar when advertising spending is limited or zero, since the increase in the probability of passage from getting a relatively popular measure onto the ballot at all is far larger than the increase from spending the same amount advertising for it.

Also, am I correct in interpreting that you assume a 100% chance of passage in your model conditional on good polling? Polling can help, but ballot measure polling does have a lot of error (in both directions). So even a measure that's popular in polling is hardly a guarantee of passage (http://themonkeycage.org/2011/10/when-can-you-trust-polling-about-ballot-measures/).

Finally, in your EV estimates, you seem to focus on the individual treatment cost of the intervention, which overwhelms the cost of the ballot measure. I don't think this is getting at the right question when it comes to running a ballot measure. I believe the gains from the ballot measure should be the estimated sum of the utility gains from people being able to purchase the drugs multiplied by the probability of passage; the costs should be how much it would cost to run the campaign. On the doc, you made the point that GiveWell doesn't include leverage on other funding in their estimates, but when it comes to ballot measures, leverage is exactly what you're trying to produce, so I think an estimate is important.

Comment author: kbog  (EA Profile) 01 April 2017 11:52:24PM *  4 points [-]

This is great research! But to me it looks like the "fact" message you gave was really an "opportunity" message, and the "opportunity" message was really... well, I don't know how to describe it! I think the takeaway, for talking to people with bachelor's degrees, is that opportunity is an effective mode of communication as long as it's "opportunity to make the world better", not "opportunity to be a great person".

Comment author: Michael_S 02 April 2017 12:31:11AM *  0 points [-]

Thanks!

I adapted that framing from Will MacAskill (an example of this starts at 12:45 in the podcast with Sam Harris here: https://www.samharris.org/podcast/item/being-good-and-doing-good). MacAskill refers to the framing as "Excited Altruism". It might come across better when he tells it than in a web survey, but I think it's pretty similar. I grouped this in with "opportunity", which I've also seen called "exciting opportunity" in the EA community (http://lukemuehlhauser.com/effective-altruism-as-opportunity-or-obligation/).

But, regardless of what it's called, I agree with you on the takeaway.

Comment author: nikvetr 01 April 2017 10:10:30PM *  1 point [-]

Ah, gotcha. But re: code review, even the most beautifully constructed chains can fail, and how you specify your model can easily cause things to go kabloom even if the machine's doing everything exactly how it's supposed to. And it only takes a few minutes to drag your log files into something like Tracer and do some basic peace-of-mind checks (and others, e.g. examining bivariate posterior distributions to assess non-identifiability w.r.t. your demographic params). More sophisticated diagnostics are scattered across a few programs but don't take too long to run either (unless you have e.g. hundreds or thousands of chains, like in marginal likelihood estimation w/ stepping stones... a friend's actually coming out with a program soon -- BONSAI -- that automates a lot of that grunt work, which might be worth looking out for!). :]
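For concreteness, the kind of quick checks I mean might look roughly like this in R (the samples object and parameter names are hypothetical placeholders):

```r
# Quick peace-of-mind checks on posterior samples; "posterior_samples"
# and the parameter names are hypothetical.
library(coda)

post <- as.mcmc(posterior_samples)   # matrix with one column per parameter

effectiveSize(post)    # low ESS suggests the chain hasn't mixed well
autocorr.diag(post)    # high autocorrelation -> run longer or thin

# Bivariate check for non-identifiability between two demographic parameters:
# a tight ridge or banana shape means the data can't separate them.
plot(as.matrix(post)[, "beta_educ"], as.matrix(post)[, "beta_age"],
     pch = 16, cex = 0.3, xlab = "beta_educ", ylab = "beta_age")
```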

(on phone at gym with shit wifi so can't provide links/refs atm, sorry!)

Comment author: Michael_S 01 April 2017 10:37:08PM *  0 points [-]

Sounds interesting. Would love to take a look when you get a chance to provide the links.

Comment author: nikvetr 01 April 2017 09:47:59PM *  0 points [-]

Of course (though wheel reinvention can be super helpful educationally), but there are great free public R packages that interface with STAN (I use "rethinking" for my hierarchical Bayesian regression needs, but I think Rstan would work too), so going with someone's unnamed, private code isn't necessary imo. How much did the survey cost (was it a lot longer than the included Google doc, then? e.g. did you have screening questions to make sure people read the paragraph?)? And model + MCMC specification can have lots of fiddly bits that can easily lead us astray, I'd say.

Comment author: Michael_S 01 April 2017 10:07:30PM *  -1 points [-]

Yeah, the survey was a lot longer. Typically, general-public surveys will cost over $10 per complete, so getting 1,200 cases for a survey like this can cost thousands of dollars.

I agree that model specification can be tricky, which is a reason I felt it was well worth using the proprietary software I had access to, which has been thoroughly vetted and code reviewed and is used frequently to run similar analyses, rather than trying to construct my own.

I did not make sure people read the paragraph. I discussed the issue a bit in my discussion section, but one way a web survey might understate the effect is if people would pay closer attention and respond better to a friend delivering the message. OTOH, surveys do have some potential vulnerability to the Hawthorne effect, though that didn't seem to express itself in the donations question.

Comment author: nikvetr 01 April 2017 09:39:32PM *  0 points [-]

Ah, I guess that's better than no control, and presumably paying attention to a paragraph of text doesn't make someone substantially more or less generous. Did you fit a bunch of models with different predictors and test for a sufficient improvement of fit with each? Might do to be wary of overfitting in that regard... though since those aren't focal, Bayes tends to be pretty robust there, imo, so long as you used sensible priors.

"I used a multilevel model to estimate the effects among those with and without a bachelor's degree. So, the bachelor's estimate borrow's power from those without a degree, reducing problems with over fitting."

If I'm understanding correctly, you had a hyperprior on the effect of education level? With just two options? IDK that that would help you much (if you had more: e.g. HS, BA/S, MS, PhD, etc. it might, but I'd try to preserve ordering there, myself).

"These models used STAN, which handles these multilevel models well. Convergence was assessed with gelman-rubin statistics."

STAN's great, but certainly not magic or perfect, and though idk them personally, I'm sure its authors would strongly advocate paranoia about its output. So you got convergence with multiple (2?) chains from random (hopefully) starting values? R_hats were all 1? That's good! Did all the other cheap diagnostics turn up ok (e.g. trace plots, autocorrelation times/ESS, marginal histograms, quick within-chain metrics, etc.)?
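For what it's worth, those cheap checks are only a few lines in R if you have the stanfit object (the "fit" object here is hypothetical):

```r
# Cheap convergence diagnostics on a hypothetical rstan fit object `fit`.
library(rstan)

summary(fit)$summary[, c("n_eff", "Rhat")]  # want Rhat ~ 1.00 and n_eff not tiny

stan_trace(fit)   # trace plots: chains should overlap, no trends or stickiness
stan_ac(fit)      # autocorrelation by lag (relates to effective sample size)
stan_hist(fit)    # marginal posterior histograms
```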

Comment author: Michael_S 01 April 2017 09:53:57PM -1 points [-]

No; I did not fit multiple models. Lasso regression was used to fit a propensity model using the predictors.
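(For anyone curious, that kind of lasso step might look roughly like the following in R with glmnet; the variable names are invented and this is not the proprietary code.)

```r
# Rough sketch of a lasso-penalized logistic propensity model with glmnet.
# Column names and the target variable are invented, not the actual survey's.
library(glmnet)

x <- model.matrix(~ age + gender + party + religiosity, data = survey)[, -1]
y <- survey$donated   # hypothetical binary target for the propensity score

cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)  # alpha = 1 -> lasso
survey$propensity <- as.numeric(
  predict(cv_fit, newx = x, s = "lambda.min", type = "response")
)
```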

Using bachelor's vs. non-bachelor's has advantages in interpretability, so I think this was the right move for my purposes.

I did not spend an exorbitant amount of time investigating diagnostics, for the same reason I used a proprietary package: it has been built to run these analyses at a production level and has been thoroughly code reviewed. I don't think it's worth the time to construct an overly customized analysis.

Comment author: nikvetr 01 April 2017 09:18:19PM 1 point [-]

Ah, interesting! What package? I've never heard of something like that before. Usually in the cold, mechanical heart of every R package is the deep desire to be used and shared as far as possible. If it's just someone's personal interface code, why not use something more publicly available? Can you write out your basic script in pseudocode (or just math/words)? Especially the model and MCMC specification bits?

Comment author: Michael_S 01 April 2017 09:39:50PM -1 points [-]

Sure, in an ideal world, software would all be free for everyone; alas, we do not live in such a world :p. I used the proprietary package because it did exactly what I needed and didn't require me to write STAN code or anything myself. I'd rather not re-invent the wheel. I felt the tradeoff of transparency for efficiency and confidence in its accuracy was worth it, especially since I wouldn't be able to share the data either way (such are the costs of getting these questions on a 1,200-person survey without paying a substantial amount).

But the basic model was just a multilevel binomial model predicting the dependent variable from the treatments, with questions asked earlier in the survey as controls.
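As a rough guess at what that structure could look like in the public rethinking package (variable names and priors are invented; this is not the proprietary code that was actually used):

```r
# Hypothetical multilevel binomial model in the rethinking package:
# treatment effect partially pooled across education groups, with a
# pre-treatment attitude question as a control. All names are invented.
library(rethinking)

fit <- ulam(
  alist(
    support ~ dbinom(1, p),
    logit(p) <- a + b_treat[educ] * treatment + b_ctrl * prior_attitude,
    a ~ dnorm(0, 1.5),
    b_ctrl ~ dnorm(0, 1),
    b_treat[educ] ~ dnorm(b_bar, sigma),  # partial pooling across groups
    b_bar ~ dnorm(0, 1),
    sigma ~ dexp(1)
  ),
  data = survey_list, chains = 2, cores = 2
)

precis(fit, depth = 2)
```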

Comment author: Peter_Hurford  (EA Profile) 01 April 2017 08:23:28PM 2 points [-]

On a related (and elucidatory) note, could you clarify more explicitly which models you fitted?

It would be cool to provide the code, for both learning and verification purposes.

Comment author: Michael_S 01 April 2017 09:04:13PM -1 points [-]

Unfortunately, because I used proprietary survey data and a proprietary R package to run this analysis, I don't think I'll be able to share the data or code.
