
A model of 80,000 Hours - Oxford Prioritisation Project

By Sindy Li, Fellow.

Created: 2017-05-12

Cross-posted from the Oxford Prioritisation Project blog. We're centralising all discussion on the Effective Altruism forum. To discuss this post, please comment here.

Summary: We built a quantitative model estimating the impact of 80,000 Hours. To measure our uncertainty, we built the model using Monte Carlo simulations in Guesstimate. This post acts as an appendix to that quantitative model.

1. Model mechanics

We would like to estimate the marginal impact of donating 10,000 GBP to 80,000 Hours (henceforth “80K”) now.

We approximate it by the average impact per dollar of 80K for the year 2016. We will discuss limitations of this approach later.

For a meta-charity like Giving What We Can (henceforth “GWWC”) which persuades people to donate to their recommended charities, an important metric is the “multiplier”, namely for each dollar of their operation cost (i.e. of donation to GWWC), how many dollars of donations to their recommended charities are generated.

For 80K, it is more complicated because they aim to cause plan changes, some being career changes and some being changes in donation plans. The main idea of our model is to convert all plan changes to equivalent amounts of donations to charities in order to get the “multiplier”, and then use the cost-effectiveness of charities receiving donation to calculate the cost-effectiveness of the plan changes.

Here are the steps.

Step 1: Measuring (and reweighing) 80K’s output

Their outputs are impact-adjusted plan changes. For each plan change that results from their work (which they collect through ways described here), they assign an impact score of 0.1, 1, or 10, depending on the magnitude of the impact. For some examples, see here.

To measure their output, we need to find out the number of plan changes assigned scores of 0.1, 1 and 10 over 2016. From here we see that they had 1414 raw plan changes (i.e. each counting as 1) and 910.9 impact-adjusted plan changes (i.e. each counting as its score). We don’t know the breakdown of the 1414 by score, so we eyeballed the third graph in this section and got:

●     0.1: 700

●     1: 650

●     10: 14

They add up to 1364, and are equivalent to 860 impact-adjusted plan changes, so we are underestimating their output a bit.

Then, we apply an adjustment. Since people self-report plan changes, there could be social desirability bias and overstating. We think this may be a bigger problem for smaller plan changes, and less so for major ones (scored 10). So we reweigh the 0.1’s and 1’s by 40% each, and do not adjust the 10’s.

This gives us 40%*0.1*700 + 40%*1*650 + 10*14 = 428 reweighed impact-adjusted plan changes.
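The step 1 arithmetic can be sketched in a few lines of Python (a minimal sketch using the eyeballed counts above; the variable names are mine):

```python
# Eyeballed plan-change counts for 2016, keyed by impact score
counts = {0.1: 700, 1: 650, 10: 14}

# Reweigh the 0.1- and 1-scored changes by 40% to adjust for possible
# self-report bias; leave the 10-scored changes unadjusted
weights = {0.1: 0.4, 1: 0.4, 10: 1.0}

reweighed_iapcs = sum(weights[s] * s * n for s, n in counts.items())
print(round(reweighed_iapcs))  # 428
```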

Step 2: Converting output (IAPCs) to equivalent $s in donation

They say (here): “A typical plan change scored 1 is someone who has taken the Giving What We Can pledge or decided to earn to give in a medium income career.” Assuming that the scores of 0.1, 1 and 10 correctly capture the relative magnitudes of impact of these plan changes, we can use a GWWC pledge as the benchmark for measuring the impact of any plan change.

What is the value of a GWWC pledge in terms of donations to GWWC recommended charities? They say (here, the third point under “Impact and cost-effectiveness”):  “In 2016, we caused 115 people to take the Giving What We Can (GWWC) 10% pledge. GWWC estimates this is worth about £5 million in donations to their recommended charities (counterfactually-adjusted, time-discounted, dropout adjusted).” So each pledge is worth on average 5,000,000/115 = 43,478 GBP of donation to GWWC’s recommended charities. (I’m not sure where the GWWC estimate comes from, and whether it applies to GWWC pledges in general or only the ones caused by 80K.)

428 reweighed IAPCs * (43,478 GBP per IAPC) ≈ 18.6m GBP in total donations (equivalent)
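This conversion is a single multiplication; as a sketch (the pledge value is 80K's 5m GBP estimate spread over 115 pledges, and 428 is the step 1 figure):

```python
# 80K's estimate: 115 pledges worth about 5m GBP of counterfactually-adjusted
# donations to GWWC-recommended charities
pledge_value_gbp = 5_000_000 / 115        # ≈ 43,478 GBP per pledge

reweighed_iapcs = 428                     # from step 1
total_donations_gbp = reweighed_iapcs * pledge_value_gbp
print(round(total_donations_gbp / 1e6, 1))  # 18.6 (million GBP)
```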

Note that we are assuming they are assigning impact correctly, namely:

●     Each plan change scored 1 is as effective as a GWWC pledge, i.e. as effective as 43,478 GBP of donation to GWWC’s recommended charities

●     A plan change scored 0.1 is 1/10 as effective as a plan change scored 1

●     A plan change scored 10 is 25 (rather than 10, since the 10’s are not reweighed while the 1’s are weighted by 40%) times as effective as a plan change scored 1

Step 3: Multiplier

Now, we get the total amount of (equivalent) donation 80K generated in 2016. We just need to divide by their operation cost in 2016 to get the multiplier. Their operation cost in 2016, including opportunity cost (of their staff not doing earning to give), is 500,000 GBP (financial cost only is 250,000 GBP). This is the number we use.

Multiplier = total donations (equivalent) generated / total operation cost = (18.6m)/(500k) ≈ 37, for 2016

Step 4: Cost-effectiveness: from donations to DALYs/$

For the last step, to get the cost-effectiveness of $1 in donation to 80K (i.e. $1 of their operation cost), we just need to multiply the “multiplier” by the cost-effectiveness of $1 in donation to GWWC recommended charities (again, under the assumptions about conversion of cost-effectiveness between plan changes).

For the cost-effectiveness of $1 in donation to GWWC recommended charities, we use AMF, or more precisely, GiveWell’s median value for AMF. It is about 0.011 DALY/$ (we convert a life saved equivalent to 35 DALYs, and use the median cost per life saved equivalent for AMF, $3,162, from here). Note that AMF does not have the highest cost-effectiveness (in median values) among GiveWell’s top charities.

So for each dollar in donation to 80K in 2016, there are roughly 37 * 0.011 ≈ 0.41 DALYs averted.
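To make the uncertainty explicit, steps 1 to 4 can be chained into a small Monte Carlo simulation in the spirit of the Guesstimate model. The ranges below are illustrative assumptions of mine, not the distributions in our actual Guesstimate model (and the sketch mixes GBP and USD in the same way the text above does):

```python
import random
import statistics

random.seed(0)  # for reproducibility

def simulate_dalys_per_pound() -> float:
    """One draw of the model: plan changes -> equivalent donations -> DALYs."""
    counts = {0.1: 700, 1: 650, 10: 14}
    # Illustrative uncertainty around the 40% self-report reweighing factor
    w_small = random.uniform(0.2, 0.6)
    iapcs = sum((w_small if score < 10 else 1.0) * score * n
                for score, n in counts.items())

    # GWWC pledge value, with illustrative +/-30% uncertainty around 43,478 GBP
    pledge_value = (5_000_000 / 115) * random.uniform(0.7, 1.3)

    operation_cost = 500_000  # 2016 cost, including opportunity cost
    multiplier = iapcs * pledge_value / operation_cost

    # AMF cost-effectiveness: 35 DALYs per life saved equivalent, with
    # illustrative uncertainty around the ~$3,162 median cost
    dalys_per_dollar = 35 / random.uniform(2_500, 4_000)
    return multiplier * dalys_per_dollar

draws = [simulate_dalys_per_pound() for _ in range(10_000)]
print(round(statistics.median(draws), 2))  # median simulated DALYs per GBP
```

Beyond a point estimate, the spread of `draws` is what feeds into a Bayesian aggregation, which is why we built the model in Guesstimate in the first place.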

 

2. Model limitations

1)    Marginal vs. Average

We wanted the marginal impact of donating 10,000 GBP to 80K now. We approximate it by the average impact per dollar of 80K for the year 2016.

Why 2016, rather than e.g. the historical average? Their financial costs per impact-adjusted plan change have been going down over the years (see the last row of the first table here), so 2016 average cost-effectiveness will probably be closer to 2017 numbers.

To get 2017 average cost-effectiveness, we need predictions about 2017, e.g. their operation cost and number of plan changes they will generate.[1]

From there, to get the marginal cost-effectiveness of a donation now, we also need to know what a marginal donation (of 10,000 GBP) will do, which depends on 1) their current financial situation, and 2) their plans for marginal donations.

In March 2017, they announced that they have reached their 2017 fundraising target (counting donations that have been promised but not yet received for 2017). What do they plan to do with additional funding? They say (on the same page) that “Based on this, I’d say we’re not heavily funding constrained; although we could still use additional money to do things like try to attract more experienced staff with higher salaries and a better office, or take more risk by trying to grow faster.” Previously, they said (see here): "Moreover, even if we made this target, it wouldn’t exhaust our room for more funding. If we raised more, we could increase our reserves, which would make it easier to attract staff, or we could pursue our expansion opportunities more aggressively (e.g. hire more, larger marketing budget)."

How should we think about all these things they could do with additional funding now?

●     Attracting more experienced staff with higher salaries and a nicer office: more experienced staff are more productive, which would raise average cost-effectiveness above the current level, so the marginal cost-effectiveness must be greater than the current average. (But this is not certain to happen: donating 10,000 GBP merely increases the probability that it happens, since fully funding new staff requires further donations, which are unlikely to come from other donors given that 80K has reached its target. So we would then adjust by some less-than-one probability.)

●     Increasing reserves: in addition to helping attract more experienced staff, this frees up staff time from fundraising (in future years), so the marginal cost-effectiveness would probably be somewhat below the current average (since presumably staff spend time on the most productive things first).

●     Expanding the marketing budget: we are not sure about its cost-effectiveness. (Someone thinks additional funding is unlikely to go to marketing, since they are running a marketing experiment now and there is not much value in expanding its size -- see point number 5 here, and here. But they also have less experimental marketing approaches that could be scaled up.)

Overall, it’s possible that marginal cost-effectiveness (of additional donation now, after they have reached 2017 fundraising target) will be lower than the average for 2017 without additional donation, but it may not be by much and we do not have a principled way of adjusting for it. So we end up not doing any adjustment.

Conceptually, some have argued that for meta-charities, marginal cost-effectiveness could be much lower than average cost-effectiveness (e.g. see here: “Meta Trap #6. Marginal impact may be much lower than average impact”). I find it helpful to think through the specifics like we did above. (In general, it seems helpful if charities can share their budget as well as current funding situations in a transparent way similar to 80K, to improve coordination among small, low information donors. An alternative is to do what GiveWell top charities do: share that information with GiveWell which updates recommended donation allocations and does regranting. Another alternative is to donate through a donor lottery or the EA fund where a single party collects information from the charity.)

On the contrary, some have argued for increasing returns in small organizations. I do not think we have a case of increasing returns in money (or staff time) here. Whether returns are increasing or decreasing in additional funding depends on how the funding is received. Expecting a large chunk of funding -- either receiving a large amount at once, or expecting a large total in small chunks when there is no lumpy investment or borrowing constraint -- could enable an organization to take more risk. Receiving unanticipated small amounts at a time, even if the total adds up to more, will probably just lead the organization to use the marginal dollar to “fund the activity with the lowest (estimated) cost-effectiveness”. 80K’s stated plan for marginal funding at this stage seems consistent with the latter, since marginal funding probably won’t come in large chunks, especially given that they aren’t funding constrained now. The scenario Ben Todd has in mind probably applies more when a large funder is considering how much to give to an organization. This may be another argument for entering a donor lottery or donating through the EA fund: giving a large and certain amount to a small organization lets it plan ahead for riskier but growth-enhancing strategies, and hence could be more valuable than uncoordinated small amounts, even if the latter add up to the same total (because the latter may be less certain).[2]

2)    Outcome measure

We relied heavily on the assumption that 80K measures impact correctly in their scoring system (except for applying our own reweighing). Is this a source of concern? I have not spot-checked the raw data on plan changes and score assignment (which ideally I should do as a donor), but someone else, external to 80K, experienced in the EA community, and facing a donation decision about them, did check it and told me that they thought the measure was good.

Another related issue is whether donations (e.g. GWWC pledges that are all assigned 1) have heterogeneous impacts. We assume that they are all on par with AMF. GWWC’s top charities largely coincide with GiveWell’s list, and among the latter AMF is in the middle in terms of cost-effectiveness. However, it is possible that some who took the pledge (or do earning to give) give to other charities whose cost-effectiveness is either lower than or hard to compare with these charities.

One more issue with the plan change scores, specific to certain types of changes, is that sometimes people change their plan to work on far future interventions, e.g. AI safety. Such interventions are higher return but also higher variance than conventional global health interventions (e.g. see our MIRI model). In our model we convert everything to donations to global health charities, which yields relatively lower returns and lower variance. What would happen if we adjusted for the fact that a fraction of the plan changes are in high return, high variance areas? (In our aggregation model we use Bayesian updating, so both return and variance matter.) We wish we had done this, but we haven’t.

Here is a simple argument for why 80K would probably still dominate our contender in the far future area, MIRI, even if we acknowledge that a part of 80K is now high return and high variance like MIRI. 80K in its current version (i.e. modeled exclusively as donations to global health) ended up winning in our aggregation (having a better posterior than MIRI). Converting a fraction of it to be like a far future intervention would produce something like a mini MIRI that is more cost-effective than the actual MIRI (since it’s cheaper to persuade people to go into AI safety, which is how 80K causes such changes, than to employ them at an organization like MIRI), so that part dominates MIRI; the remaining (global health) part also does, as we saw, so the combination should still dominate MIRI. This is a hand-waving, ex post argument that is not ideal. We wish we had compared the two versions of the 80K model in our aggregation. (This question was raised at our presentation; in building our model, we were thinking in a very simplified way that neglected this concern.)

3)    “Growth approach”

Some have proposed using the “growth approach” to evaluate a young non-profit, rather than calculating the marginal impact. Ideally, the considerations outlined in that article would be incorporated in the cost-effectiveness analysis, just as investments should be evaluated by net present value (the expected, discounted stream of future profits) rather than current-period profit. But in practice, cost-effectiveness analyses of charities often neglect such considerations, not to mention that these things are hard to incorporate in a quantitative model.[3]

In addition to the potential to expand the market mentioned in the article, other possibilities include giving money to ensure organizational survival and growth so that the organization can 1) learn by doing and improve, and 2) discover new opportunities that currently no one (including the funder or the organization itself) has thought of -- e.g. neither GiveWell nor its funders in the early days may have expected it to spin off the Open Philanthropy Project, and the same goes for Animal Charity Evaluators, which grew out of 80,000 Hours. Such “unknown unknowns” are hard to address directly, and fostering such opportunities requires identifying young organizations with good people that have the potential to learn and grow.

Given the importance of such considerations and the difficulty of modelling them quantitatively, to holistically evaluate an organization, especially a young one, there is an argument for using a qualitative approach and “cluster thinking”, in addition to a quantitative approach and “sequence thinking.”

4)    “Meta trap”

We are evaluating a meta-charity. A few people have expressed concerns about the “meta trap”.[4] I will discuss a few concerns in the context of our model.

Even if we end up deciding that we should donate the 10,000 GBP to 80K because it will have the highest impact there, does that mean it is the best place for all EA donors (or at least low information donors)? Not necessarily, due to the static and unilateral nature of our model. I discuss some scenarios where our model can be taken too far and hence be no longer valid.

Suppose all other EA donors take literally our conclusion that 80K is the most cost-effective place to donate this year. Even assuming 80K doesn’t run into diminishing returns (the standard “room for more funding” concern), in that they can still reach the same number of people for each additional dollar of donation, we will run into problems.

First, suppose all other EAs donate only to 80K this year. Then other EA top charities, including many object-level ones, may be left much shorter on funding. This is bad not only because at that point the marginal return of donating to them (even counting only the impact of their object-level work, e.g. distributing bednets) could exceed that of donating to 80K, but also because it may significantly weaken these organizations’ chances of survival or growth. These organizations not only carry out object-level work that improves lives, but also contribute to learning and capacity building (see Howie’s comment here; also related to trap #5 here), something not captured in our model.

Now, we might still be okay if the “new EAs” generated by 80K donate to recommended object-level charities, so those charities still have enough funding (in fact maybe even far more than the counterfactual, since now all pre-existing EA donors donate to 80K, and we assumed the multiplier is constant with scale -- a perhaps unrealistic assumption made just for the sake of the argument).

But imagine what happens when the “new EAs” become “seasoned EAs” next year, start to reason in the EA style (rather than simply following GWWC’s recommendations), and donate where money has the highest impact. Suppose they also take our recommendation literally, carry it into next year (an additional assumption), and all donate to 80K instead of to object-level charities. Then part of our model breaks down: the value of a GWWC pledge will be smaller than what we used, since over the lifetime of a “new EA” generated by 80K they will switch from donating to object-level charities to donating to 80K. If this happens to every generation of “new EAs” generated by 80K, then in the end very little of their lifetime donations will go to object-level charities, and most will go to 80K, which ends up generating little object-level donation.

Of course this is an extreme scenario: not only do we assume that people agree on the same highest-impact charity and make all their donations there, but also that every new generation keeps using our static model. (A more immediate, but similarly extreme, version of the scenario is when “new EAs” immediately become “seasoned EAs” and realize that instead of following GWWC’s current recommended object-level charities they should donate right now to 80K, in which case 80K’s chain of impact breaks down immediately, since it causes no object-level work to be done.) But we could imagine that even in a weaker version of this -- where our static recommendation for a unilateral donor is generalized inappropriately to all EA donors in a dynamic setting, but in a less extreme fashion -- 80K’s cost-effectiveness would still be undermined. (This is related to traps 2 and 4 discussed here.)

Now imagine a different scenario, where we still assume that all current EAs blindly follow our recommendation and make all their donations to 80K. People approached by 80K who could potentially become “new EAs” look at what current EAs do and find it dubious that these people are donating to build their movement rather than doing object-level work. The multiplier may still apply if current EAs manage to get “new EAs” to contribute to object-level work at the same rate, but they would be relying solely on new recruits for object-level work. Potential recruits may have doubts about joining the movement, and may think existing EAs are mistaken, or even brainwashed by self-serving “movement leaders” who care only about growing the “movement”. This could appear to be the case to outsiders even if existing EAs (including any movement leader) were truly trying to maximize impact. (This is another consequence of trap 2 discussed here: it may hurt the growth of the movement by making it look “dubious”.)

Note that in the above scenarios I am treating 80K more like GWWC, which gets people to donate to charities, whereas in fact 80K focuses on plan changes, of which donation is only a part. Hence these may not be the most appropriate examples, and I was only using them to illustrate some issues with taking a static model of a meta-charity that is aimed at a unilateral donor and generalizing it to the entire community in a dynamic setting.[5] This is not a shortcoming of our model per se, but of all models of this static and unilateral nature: due to these limitations they should not be generalized beyond their appropriate scope.[6]

 

In general, a lesson (from both the 80K model and our other quantitative models) is that many important considerations are very difficult (and perhaps impossible) to incorporate in a quantitative model. To really make the best judgement on whether to donate to a charity, qualitative arguments and “cluster thinking” may be valuable in addition to quantitative models.

 

Footnotes

[1] Some questions on this: what is the cost-effectiveness of different ways of reaching out to people (online materials, in person coaching etc.)? Online content is causing more of the impact-adjusted plan changes (see here), and is probably cheapest per plan change. See here for what they think are more/less useful.

[2] This mechanism is articulated in “5.2 The funding uncertainty problem” on this page about the EA fund.

[3] Note that GiveWell seems to take these into account in their charity recommendations, as quantitative cost-effectiveness analysis is only one element, and they often mention potential for learning and growth when talking about room for more funding.

[4] See here for a reply from Ben Todd of 80K.

[5] Some of the reasons here also point to the conclusion that even if some cause is found to be the most cost-effective at the moment (and even if it is not meta), the EA movement should not invest all of a given year’s resources in one place, due to the learning value of developing other causes and the possibility that the cost-effectiveness of different interventions changes over time. This is similar to some of the reasons why the Open Philanthropy Project selects multiple causes (another reason, worldview diversification, is also relevant for the EA movement overall, but beyond the scope of this model).

[6] Like the posts on “meta traps”, I am not arguing for less meta. We may well have too little meta, especially early in the movement, when capacity building could be relatively more important. And this should include not only groups like 80K and GWWC that increase the number of people in the EA movement, but also research groups like CEA that increase our knowledge and understanding of related issues. But that is a topic for another discussion.

 

This post was submitted for comment to 80,000 Hours before publication.

Comments (15)

Comment author: rohinmshah (EA Profile) 14 May 2017 12:56:54AM 5 points

Attracting more experienced staff with higher salary and nicer office: more experienced staff are more productive which would increase the average cost-effectiveness above the current level, so the marginal must be greater than the current average.

Wait, what? The costs are also increasing, it's definitely possible for marginal cost effectiveness to be lower than the current average. In fact, I would strongly predict it's lower -- if there's an opportunity to get better marginal cost effectiveness than average cost effectiveness, that begs the question of why you don't just cut funding from some of your less effective activities and repurpose it for this opportunity.

Given the importance of such considerations and the difficulty of modelling them quantitatively, to holistically evaluate an organization, especially a young one, there is an argument for using a qualitative approach and “cluster thinking”, in addition to a quantitative approach and “sequence thinking.”

Please do, I think an analysis of the potential for growth (qualitative or quantitative) would significantly improve this post, since that consideration could easily swamp all others.

Comment author: ThomasSittler 23 May 2017 10:37:49AM 1 point

Hi Rohin, thanks for the comment! :) My hunch is also that 80,000 Hours and most organisations have diminishing marginal cost-effectiveness. As far as I know from our conversations, on balance this is Sindy's view too.

The problem with qualitative considerations is that while they are in some sense useful standing on their own, they are very difficult to aggregate into a final decision in a principled way.

Modelling the potential for growth quantitatively would be good. Do you have a suggestion for doing so? The counterfactuals are hard.

Comment author: Ben_Todd 26 May 2017 04:44:46AM 0 points

My hunch is also that 80,000 Hours and most organisations have diminishing marginal cost-effectiveness. As far as I know from our conversations, on balance this is Sindy's view too.

You need to be very careful about what margin and output you're talking about.

As I discuss in my long comment above, I think it's unclear whether our annual ratio of cost per plan change will go up or down, and I think there's a good chance it continues to drop, as it has the last 4 years.

On the other hand, if you're talking about total value created per dollar (including all forms of value), then that seems like it's more likely to be going down. It seems intuitive that our earliest supporters who made 80k possible had more impact than supporters today.

Though even that's not clear. You could get increasing returns due to economies of scale or tipping point effects and so on.

Comment author: rohinmshah (EA Profile) 25 May 2017 02:04:49AM 0 points

Actually I was suggesting you use a qualitative approach (which is what the quoted section says). I don't think I could come up with a quantitative model that I would believe over my intuition, because as you said the counterfactuals are hard. But just because you can't easily quantify an argument doesn't mean you should discard it altogether, and in this particular case it's one of the most important arguments and could be the only one that matters, so you really shouldn't ignore it, even if it can't be quantified.

Comment author: Ben_Todd 26 May 2017 04:40:55AM 0 points

Wait, what? The costs are also increasing, it's definitely possible for marginal cost effectiveness to be lower than the current average.

Yes, agree with this. Like I say in the long comment above, I think that giving money to us right now probably has diminishing returns because we already made our funding targets for this year.

Comment author: Sindy_Li 24 May 2017 02:36:31AM 0 points

Rohin, for what you quoted about increasing returns, I was thinking only of the case of labor. Overall you are right that, if the organization has been maximizing cost-effectiveness, then they probably would have used the money they had before reaching fundraising targets in a way that makes it more cost-effective than money coming in later (assuming they are more certain about the amount of money up to the fundraising target, and less certain about money coming in after that).

Comment author: Jon_Behar 26 May 2017 12:07:32AM 2 points

Thanks for sharing this analysis (and the broader project)!

Given the lengthy section on model limitations, I would have liked to have seen a discussion of sensitivity to assumptions. The one that stood out to me was the estimate for the value of a GWWC Pledge, which serves as a basis for all your calcs. While it certainly seems reasonable to use their estimate as a baseline, there’s inherently a lot of uncertainty in estimating a multi-decade donation stream and adjusting for counter-factuals, time discounting, and attrition.

FWIW, I’m pretty dubious about the treatment of plan changes scored 10. The model implies each of those plan changes is worth >$500k (again, adjusted for counterfactuals, time discounting, and attrition), which is an extremely high hurdle to meet. If a university student tells me they're going to "become a major advocate of effective causes" (sufficient for a score of 10), I wouldn't think that has the same expected value as a half million dollars given to AMF today.

Comment author: Ben_Todd 26 May 2017 04:36:39AM 2 points

Hi Jon,

I would have liked to have seen a discussion of sensitivity to assumptions.

I agree - I think, however, you can justify the cost-effectiveness of 80k in multiple, semi-independent ways, which help to make the argument more robust:

https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/

FWIW, I’m pretty dubious about the treatment of plan changes scored 10. The model implies each of those plan changes is worth >$500k...If a university student tells me they're going to "become a major advocate of effective causes" (sufficient for a score of 10), I wouldn't think that has the same expected value as a half million dollars given to AMF today.

Yes, we only weigh them at 10, rather than 40. However, here are some reasons the 500k figure might not be out of the question.

First, we care about the mean value, not the median or threshold. Although some of the 10s will probably have less impact than 500k to AMF now, some of them could have far more. For instance, there's reason to think GPP might have had impact equivalent to over $100m given to AMF. https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#global-priorities-project

You only need a small number of outliers to pull up the mean a great deal.

Less extremely, some of the 10s are likely to donate millions to charity within the next few years.

Second, most of the 10s are focused on xrisk and meta-charity. Personally, I think efforts in these causes are likely at least 5-fold more cost-effective than AMF, so they'd only need to donate a 100k to have as much impact as 500k to AMF.

Comment author: Jon_Behar 26 May 2017 08:58:13PM 0 points

Fair point about outliers driving the mean. Does suggest that a cost-effectiveness estimate should just try to quantify those outliers directly instead of going through a translation.
E.g. if "some of the 10s are likely to donate millions to charity within the next few years", just estimate the value of that rather than assuming that giving will on average equal 10x GWWC's estimate for the value of a pledge.

Comment author: Ben_Todd 27 May 2017 03:19:31AM 1 point

Does suggest that a cost-effectiveness estimate should just try to quantify those outliers directly instead of going through a translation.

Yes, that's the main way I think about our impact. But I think you can also justify it on the basis of getting lots of people make moderate changes, so I think it's useful to consider both approaches.

Comment author: Ben_Todd 26 May 2017 04:29:17AM 1 point

Hi there,

Thanks for writing this. A couple of quick comments (these are not thoroughly checked - our annual reviews are the more reliable source of information):

How should we think about all these things they could do with additional funding now?

Given that we made the higher end of our funding targets, I'd guess that giving us money right now has diminishing returns compared to those we received earlier in the year. However, they are not super diminishing. First, they give us the option to grow faster. Second, if we don't take that option, then the worst case scenario is that we raise less money next funding round. This means you funge with our marginal donor in early 2018 (which might well be Open Phil), while also saving us time, and giving us greater financial strength in the meantime, which helps to attract staff.

Will our returns diminish from 2016 to 2017? That's less clear.

If you're looking at the ratio of costs to plan changes each year, as you do in your model, then there's a good chance the ratio goes down in 2017: past investments will pay off, we'll learn how to be more efficient, and we'll get economies of scale. More discussion here: https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#whats-the-marginal-cost-per-plan-change

On the other hand, if we invest a lot in long-term growth, then the short-term ratio will go up.

This shows some of the limitations of looking at the ratio of costs to plan changes each year, which we discuss more here: https://80000hours.org/2015/11/take-the-growth-approach-to-evaluating-startup-non-profits-not-the-marginal-approach/

If you're reading this and trying to evaluate 80,000 Hours, then I'd encourage you to consider other questions, which are glossed over in this analysis but are similarly or more important, such as:

1) Is the EA community more talent constrained than funding constrained?

2) Will 80k continue to grow rapidly?

3) How pressing are the problems of poor career choice and promoting EA?

4) How effective is AMF vs other EA causes? (80k isn't especially focused on global poverty)

5) Is 80k a well-run organisation with a good team?

You can see more of our thoughts on how to analyse a charity here: https://80000hours.org/articles/best-charity/

Comment author: Peter_Hurford  (EA Profile) 21 May 2017 11:17:47PM 1 point [-]

I think you should add more uncertainty to your model around the value of an 80K career change (in both directions). While 1 impact-adjusted change is approximately the value of a GWWC pledge, that doesn't mean the two are equal in both mean and standard deviation, as your model assumes, since the plan changes involve a wide variety of different possibilities.

It might be good to work with 80K to get some more detail about the kinds of career changes that are being made and try to model the types of career changes separately. Certainly, some people do take the GWWC pledge, and that is a change that is straightforwardly comparable with the value of the GWWC pledge (minus concerns about the counterfactual share of 80K), but other people make much higher-risk higher-reward career changes, especially in the 10x category.

Speaking just for myself, having looked at a few examples of the 80K 10x category, I've found them to be highly variable (including some changes that I'd personally judge as less valuable than the GWWC pledge). While this certainly is not a systematic analysis on my part, it suggests your model should include more uncertainty than it currently does.

Lastly, I think your model right now assumes 80K has 100% responsibility for all their career changes. Maybe this is completely fine because 80K already weights their reported career change numbers for counterfactuality? Or maybe there's some other good reason to not take this into account? I admit there's a good chance I'm missing something here, but it would be nice to see it addressed more specifically.

Comment author: Ben_Todd 26 May 2017 04:39:27AM 1 point [-]

Lastly, I think your model right now assumes 80K has 100% responsibility for all their career changes. Maybe this is completely fine because 80K already weights their reported career change numbers for counterfactuality? Or maybe there's some other good reason to not take this into account? I admit there's a good chance I'm missing something here, but it would be nice to see it addressed more specifically.

I don't think that's true, because the GWWC pledge value figures have been counterfactually adjusted, and because we don't count all of the people we've influenced to take the GWWC pledge.

More discussion here: https://80000hours.org/2016/12/has-80000-hours-justified-its-costs/#giving-what-we-can-pledges

While 1 impact-adjusted change is approximately the value of a GWWC pledge, that doesn't mean the two are equal in both mean and standard deviation, as your model assumes, since the plan changes involve a wide variety of different possibilities.

Agree with that - the standard deviation should be larger.

Comment author: Sindy_Li 24 May 2017 02:42:41AM 1 point [-]

Peter, indeed your point #2 about uncertainty is what I discuss in the last point of "2) Outcome measures", under "Model limitations". I argued in a handwaving way that because 80K still causes some lower-risk, lower-return global health type interventions -- which our aggregation model seems to favor, probably due to the Bayesian prior -- it will probably still beat MIRI, which focuses exclusively on high-risk, high-return things that the model seems to penalize. But yes, we should have modeled it in this way.
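To illustrate the point with a toy Monte Carlo sketch (plain Python with made-up numbers, not our actual Guesstimate model): two distributions for the value of a 10-rated plan change can share the same mean of 10 pledge-equivalents while differing greatly in spread, and only the wider one captures the outlier-driven tail Peter describes.

```python
import math
import random

random.seed(0)

# Toy sketch with hypothetical numbers (not 80K's actual data):
# value of a 10-rated plan change, in GWWC-pledge-equivalents.
N = 100_000

# Point-estimate version: every "10" is worth exactly 10 pledge-equivalents.
point = [10.0] * N

# Wider-uncertainty version: a lognormal with the same mean (10) but a heavy
# right tail, so a few outlier changes are worth far more than the average.
sigma = 1.0
mu = math.log(10) - 0.5 * sigma ** 2  # E[X] = exp(mu + sigma^2 / 2) = 10
spread = [random.lognormvariate(mu, sigma) for _ in range(N)]

mean_point = sum(point) / N
mean_spread = sum(spread) / N
# The two sample means are close, but the second distribution has a long tail
# of high-value outliers that a point estimate hides.
```

A pure mean-based comparison would treat these two cases identically; modeling the spread explicitly is what lets the aggregation step penalize or reward the tail.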

Comment author: ThomasSittler 23 May 2017 10:41:51AM 1 point [-]

One clarification is that our current model incorporates uncertainty at the stage where GWWC-donation-equivalents are converted to HEWALYs. We do not additionally have uncertainty on the value of a plan change scored "10" in terms of GWWC-donation-equivalents. We do have uncertainty on the 0.1s and 1s.