Comment author: SoerenMind
06 February 2017 11:41:46PM
5 points

An approximate solution is to exploit your best opportunity 90% of the time, then randomly select another opportunity to explore 10% of the time.

This is the epsilon-greedy strategy with epsilon = 0.1, which is probably a good rule of thumb when one's prior for each cause has a thin-tailed distribution (e.g. Gaussian). The optimal value of epsilon increases with the variance of our prior for each cause. So if our confidence interval for a cause's cost-effectiveness spans more than an order of magnitude (high variance), a higher value of epsilon could be better. The point is that the rule of thumb doesn't really apply when you think some causes are much better than others and you have plenty of uncertainty.

That said, if you had realistic priors for the effectiveness of each cause, you could calculate an optimal solution using Gittins indices.
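For concreteness, here is a rough sketch of the epsilon-greedy rule discussed above (not the Gittins-index solution). Everything in it is an assumption made for illustration: the function name `epsilon_greedy`, the Gaussian payoffs, the made-up "true" payoffs per cause, and the choice to explore only among the arms that are not currently best (textbook epsilon-greedy usually explores uniformly over all arms).

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, pulls=5000, noise_sd=1.0, seed=0):
    """Simulate epsilon-greedy on Gaussian arms (illustrative only).

    Returns (pull counts per arm, estimated mean payoff per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms  # running average of observed payoffs

    for _ in range(pulls):
        # Current best guess, based only on what has been observed so far.
        best = max(range(n_arms), key=lambda a: estimates[a])
        others = [a for a in range(n_arms) if a != best]
        if others and rng.random() < epsilon:
            arm = rng.choice(others)  # explore another opportunity
        else:
            arm = best  # exploit the best-looking opportunity
        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        # Incremental update of the running mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


if __name__ == "__main__":
    # Hypothetical payoffs for three causes; the numbers are invented.
    counts, estimates = epsilon_greedy([0.0, 1.0, 5.0])
    print("pulls per cause:", counts)
    print("estimated payoffs:", [round(e, 2) for e in estimates])
```

Rerunning this with a larger `epsilon` and more widely spread `true_means` is one crude way to get a feel for the variance point, though a toy simulation like this is not evidence about real cause selection.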

It might be interesting for someone to think more about multi-arm bandit problems, since it seems like it could be a good analogy for cause selection. An approximate solution is to exploit your best opportunity 90% of the time, then randomly select another opportunity to explore 10% of the time. https://en.wikipedia.org/wiki/Multi-armed_bandit

I'm doing some research along these lines with Bayesian Bandits.
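The comment doesn't say which Bayesian bandit method is meant. One common approach is Thompson sampling, so here is a minimal sketch of that, under loudly stated assumptions: each cause either "succeeds" or "fails" with a fixed hidden probability, priors are uniform Beta(1, 1), and the function name `thompson_sampling` and the example probabilities are invented for illustration.

```python
import random


def thompson_sampling(success_probs, pulls=2000, seed=0):
    """Thompson sampling on Bernoulli arms with Beta(1, 1) priors.

    Returns (pull counts per arm, posterior mean success rate per arm).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    alpha = [1.0] * n_arms  # prior pseudo-count of successes
    beta = [1.0] * n_arms   # prior pseudo-count of failures
    counts = [0] * n_arms

    for _ in range(pulls):
        # Draw one plausible success rate per arm from its posterior,
        # then pull the arm whose draw is highest.
        draws = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: draws[a])
        if rng.random() < success_probs[arm]:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
        counts[arm] += 1

    posterior_means = [alpha[a] / (alpha[a] + beta[a]) for a in range(n_arms)]
    return counts, posterior_means


if __name__ == "__main__":
    # Hypothetical success probabilities for three causes.
    counts, means = thompson_sampling([0.1, 0.2, 0.9])
    print("pulls per cause:", counts)
    print("posterior means:", [round(m, 2) for m in means])
```

Unlike a fixed epsilon, the amount of exploration here shrinks on its own as the posteriors tighten, which is part of why this family of methods gets suggested when priors are wide.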

Comment author: Michael_PJ
06 February 2017 11:35:08PM
0 points

1) I nearly added a section about whether exploration is funding- or talent-constrained! In short, I'm not sure, and I suspect it's different in different places. It sounds like OPP is probably talent-constrained, but other orgs may differ. In particular, if we wanted to try some of my other suggestions for improving exploration, like building institutions to start new orgs, then that's potentially quite funding-intensive.

2) I'm not sure whether multi-armed bandits actually model our situation, since I'm not sure if you can incorporate situations where you can change the efficiencies of your actions. What does "improving exploration capacity" look like in a multi-armed bandit? There may also be complications because we don't even know the size of the option set.

What does "improving exploration capacity" look like in a multi-armed bandit?

You could potentially model this as (a) an increase in the number of bandit pulls you can do in parallel (simple models assume only one pull at a time), (b) a decrease in the time between a bandit pull and the information being received (simple bandit models assume this is instantaneous), or (c) an increase in the accuracy of the information received from each bandit pull (simple models assume the information received is perfectly accurate).
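To make (a), (b) and (c) a bit more tangible, here is a toy sketch in which each of them is a parameter of a simulated bandit: `parallel_pulls`, `delay` and `noise_sd`. These names, the function `run_bandit`, the Gaussian noise model and the use of plain epsilon-greedy (exploring uniformly over all arms) are all my assumptions for illustration, not an established model of exploration capacity.

```python
import random
from collections import deque


def run_bandit(true_means, rounds=200, parallel_pulls=1, delay=0,
               noise_sd=1.0, epsilon=0.1, seed=0):
    """Return the average true payoff per pull earned by epsilon-greedy.

    parallel_pulls: pulls made per round, all using the same information (a)
    delay: extra rounds before a pull's result becomes visible (b)
    noise_sd: standard deviation of the noise on each observation (c)
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    in_flight = deque()  # (round when result arrives, arm, noisy observation)
    total_value = 0.0

    for t in range(rounds):
        # Only results whose delay has elapsed can update our estimates.
        while in_flight and in_flight[0][0] <= t:
            _, arm, observed = in_flight.popleft()
            counts[arm] += 1
            estimates[arm] += (observed - estimates[arm]) / counts[arm]
        for _ in range(parallel_pulls):
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=lambda a: estimates[a])
            # We earn the true payoff but only ever see a noisy version of it.
            total_value += true_means[arm]
            observed = rng.gauss(true_means[arm], noise_sd)
            in_flight.append((t + 1 + delay, arm, observed))
    return total_value / (rounds * parallel_pulls)


if __name__ == "__main__":
    means = [1.0, 2.0, 3.0]  # hypothetical payoffs
    print("baseline:      ", run_bandit(means))
    print("slow feedback: ", run_bandit(means, delay=50))
    print("noisy feedback:", run_bandit(means, noise_sd=10.0))
```

In this framing, "improving exploration capacity" would mean spending resources to move these parameters, which the plain bandit model treats as fixed. That is arguably the gap the question is pointing at.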

This seems likely to me, given that they certainly have more funding than they currently know how to spend. But since they are not openly hiring right now, I imagine they are probably not constrained by either talent or money.

## Comments (27)

Thanks for the post. I broadly agree.

There are some more remarks on "gaps" in EA here: https://80000hours.org/2015/11/why-you-should-focus-more-on-talent-gaps-not-funding-gaps/

Two quick additions:

1) I'm not sure spending on RCTs is especially promising. Well-run RCTs that actually have power to update you can easily cost tens of millions of dollars, so you'd need to be considering spending hundreds of millions for it to be worth it. We're only just getting to this scale. GiveWell has considered funding RCTs in the past and rejected it, I think for this reason (though I'm not sure).

2) It might be interesting for someone to think more about multi-armed bandit problems, since they seem like they could be a good analogy for cause selection. An approximate solution is to exploit your best opportunity 90% of the time, then randomly select another opportunity to explore 10% of the time. https://en.wikipedia.org/wiki/Multi-armed_bandit

Interesting!
