
A model of the Machine Intelligence Research Institute - Oxford Prioritisation Project

By Sindy Li, Fellow
Created: 2017-05-12

Cross-posted from the Oxford Prioritisation Project blog. We're centralising all discussion on the Effective Altruism forum. To discuss this post, please comment here.

Summary: We built a quantitative model estimating the impact of the Machine Intelligence Research Institute. To measure our uncertainty, we built the model using Monte Carlo simulations in Guesstimate. This post acts as an appendix to that quantitative model.

Model mechanics

MIRI conducts research on AI safety in order to reduce AI risk, i.e. the risk that humanity will go extinct due to highly capable artificial intelligence.

To estimate their cost-effectiveness, i.e. the impact of an additional donation, we proceed in 4 steps:

  • Step 1: Expected value of far future, in human-equivalent well-being-adjusted life-years (HEWALYs).

  • Step 2: AI risk, i.e. probability of extinction associated with AI

  • Step 3: How much an additional researcher at MIRI reduces AI risk

  • Step 4: How much an additional researcher at MIRI costs

Step 1: Expected value of far future

The expected number of far future HEWALYs that will be averted if we can prevent extinction is the product of

  • The expected population size of the far future

  • The expected number of years future beings are alive

  • The number of HEWALYs accrued per year alive

Note that we allow HEWALY per year alive to be negative in some cases. It’s normally distributed with mean 1 and standard deviation 1.24.

For our input values for these three parameters, see the Guesstimate model.
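For concreteness, here is a minimal Monte Carlo sketch of this step in Python. Only the HEWALY-per-year distribution, Normal(1, 1.24), comes from the description above; the population-size and duration distributions are placeholders chosen purely for illustration, so refer to the Guesstimate model for the actual inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000  # number of Monte Carlo samples

# Placeholder distributions for illustration only; the actual inputs are in
# the Guesstimate model. Lognormals are a common choice for quantities that
# span many orders of magnitude.
far_future_population = rng.lognormal(mean=np.log(1e12), sigma=3.0, size=N)
years_alive = rng.lognormal(mean=np.log(1e8), sigma=2.0, size=N)

# HEWALYs accrued per year alive: Normal(1, 1.24), as stated above
# (this allows negative values in some cases).
hewalys_per_year = rng.normal(loc=1.0, scale=1.24, size=N)

# Expected far-future HEWALYs at stake if extinction is prevented.
far_future_hewalys = far_future_population * years_alive * hewalys_per_year

print(f"mean: {far_future_hewalys.mean():.2e}  sd: {far_future_hewalys.std():.2e}")
```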

Step 2: AI risk, i.e., probability of extinction associated with AI

For this we use 2 approaches:

  1. Direct approach: team members reflect on our belief about the total existential risk (i.e. probability of extinction) associated with developing highly capable AI systems, bearing in mind all of the work on safety that will be done.

  2. Two-step approach: team members reflect on two things: the total existential risk (over the next century), bearing in mind all of the work on x-risk reduction that will be done, and the fraction of total existential risk associated with developing highly capable AI systems, bearing in mind all the work on safety that will be done. We multiply these together.

We assign ⅓ weight to approach 1 and ⅔ weight to approach 2, since approach 2 forces us to think more carefully by breaking the estimate down into 2 steps.
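As an illustrative sketch, the two approaches can be combined in a Monte Carlo setting by treating the ⅓/⅔ weights as a mixture over samples (each sample is drawn from approach 1 with probability ⅓ and from approach 2 with probability ⅔). Whether the weighting is implemented this way or as a weighted average of the two outputs is a modelling choice; the beta distributions below are placeholders, not our actual inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Approach 1: direct estimate of extinction risk from AI (placeholder distribution).
risk_direct = rng.beta(2, 18, size=N)

# Approach 2: total existential risk over the next century, times the fraction
# of that risk associated with highly capable AI (both placeholders).
total_xrisk = rng.beta(2, 8, size=N)
fraction_from_ai = rng.beta(3, 7, size=N)
risk_two_step = total_xrisk * fraction_from_ai

# Combine the approaches as a 1/3 : 2/3 mixture.
use_direct = rng.random(N) < 1 / 3
ai_risk = np.where(use_direct, risk_direct, risk_two_step)

print(f"mean AI extinction risk: {ai_risk.mean():.3f}")
```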

Step 3: How much an additional researcher at MIRI reduces AI risk

For this we use 2 approaches:

1. MIRI-specific approach:

For this we need some background. MIRI's main research agenda until recently was called "Agent Foundations for Aligning Machine Intelligence with Human Interests", in which they try to understand rational thinking in order to deal with the highly capable AI that will emerge in the future. This differs from the current mainstream approach to AI in academia and industry, which is more machine-learning based. (MIRI recently added a more machine-learning-based research agenda.) Most AI safety experts outside MIRI seem to think that MIRI's approach is less likely than the mainstream approach to be relevant to AI safety, but that if it does turn out to be relevant it will be highly valuable; and even if we are unsure about its relevance ex ante, their work is valuable because it gives the field a more diverse set of approaches to AI safety. The Open Philanthropy Project invited external experts to review MIRI's technical work on the agent foundations research agenda, and their conclusion was that MIRI has not made much progress on it (see here for their discussion of the relevance and quality of MIRI's recent research).

For this MIRI-specific approach we proceed in 4 steps:

  • 1. Fraction of AI risk that will be reduced if the AI alignment problem is solved.

  • 2. Fraction of the alignment problem that will be solved if MIRI solves their agent foundations research agenda.

Note: Paul Christiano, a technical researcher on AI safety, suggested 0.5 for input 1 and 0.15 for input 2. In the past he has collaborated with MIRI as an external researcher, and he is one of the few people in mainstream AI research who has worked closely with MIRI. (Though he notes that these are "the numbers off the top of my head, not the numbers on which my own evaluation is based".)

  • 3. Probability that MIRI solves their agent foundation research agenda.

Note: The most recent time I asked Paul about this, he suggested 0.15. He said "they are moving something like 1% of the way towards solving their goal per year", while acknowledging that the probability is not well defined and that the 1% figure is "pretty made up".

  • 4. By what fraction an additional researcher will increase MIRI's chance of solving their agent foundations research agenda:

An argument for using 0.05 is the following: MIRI currently has 10 technical researchers, so we assume the marginal researcher contributes 1/11 of the work, and we adjust downwards by ½ in case the marginal person is not as good as the existing average (giving roughly 0.05). Our actual input is the median of our team members' guesses.

Multiplying these four together we get the fraction of AI risk reduced by adding a researcher to MIRI.
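As a back-of-the-envelope check, the MIRI-specific approach is simply the product of the four inputs. The sketch below uses the illustrative point values mentioned in the text (0.5, 0.15, 0.15 and 0.05); the actual model uses distributions and our team's medians.

```python
# Back-of-the-envelope version of the MIRI-specific approach, using the
# illustrative point values mentioned in the text rather than the model's
# actual distributions.
risk_reduced_if_alignment_solved = 0.5     # input 1 (Paul's suggestion)
alignment_solved_if_agenda_solved = 0.15   # input 2 (Paul's suggestion)
p_miri_solves_agenda = 0.15                # input 3 (Paul's suggestion)
boost_from_extra_researcher = 0.05         # input 4 (the 1/11 * 1/2 argument)

fraction_of_ai_risk_reduced = (
    risk_reduced_if_alignment_solved
    * alignment_solved_if_agenda_solved
    * p_miri_solves_agenda
    * boost_from_extra_researcher
)
print(fraction_of_ai_risk_reduced)  # about 5.6e-04
```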

2. Community approach: developed by Owen Cotton-Barratt and Daniel Dewey.

First, we estimate the productivity of one AI safety researcher, in terms of the fraction of AI risk they reduce.

To do this, we assign a value to the size of the research community working on safety by the time we develop those potentially risky AI systems, say 500.

Then, we assign a value to the following: if we double the total amount of work that would be done on AI safety, what fraction of the bad scenarios should we expect this to avert? Say that’s 3%.

Then the average productivity of one researcher is 3%/(2*500), since 3% is the fraction of AI risk that 2*500 researchers' worth of work can avert.

We assume this is the productivity of a marginal researcher (i.e. assuming linearity so marginal equals average).

This is the direct effect of adding a researcher today. However, there is also an indirect effect: adding a researcher today may eventually cause more or fewer people to work in this area. We therefore also assign a value to this spillover effect: how many extra researchers eventually enter the field if we add one today (this equals 1 if there is no spillover effect, and is greater/less than 1 if the effect is positive/negative).

We multiply the direct effect by this number above to get the net effect of an additional researcher today.

Note that we are assuming that an additional MIRI researcher is as productive as an average researcher in the field (i.e. not only marginal = average, but marginal MIRI = average in the field).
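A similar sketch for the community approach, using the example figures above (a community of 500 safety researchers and 3% of bad scenarios averted by doubling the amount of work) and an assumed spillover multiplier of 1; again, the real inputs are distributions in the Guesstimate model.

```python
# Community approach with the example figures from the text.
community_size = 500          # safety researchers by the time risky AI systems arrive
averted_by_doubling = 0.03    # fraction of bad scenarios averted by doubling the work
spillover_multiplier = 1.0    # extra researchers eventually added per researcher added today

# Average productivity per researcher, following the reasoning in the text
# that 3% is what 2 * 500 researchers' worth of work averts.
productivity_per_researcher = averted_by_doubling / (2 * community_size)

# Net effect of one additional researcher today (marginal = average assumed).
fraction_of_ai_risk_reduced = productivity_per_researcher * spillover_multiplier
print(fraction_of_ai_risk_reduced)  # 3e-05
```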

We assign ½ weight to each of the 2 approaches above.

Step 4: How much an additional researcher at MIRI costs

For this we use 1–3 million USD as the cost of a researcher over their career.

To recap, we have 4 components now:

  • 1: Expected value of far future

  • 2: AI risk, i.e. probability of extinction associated with AI

  • 3: What fraction of the AI risk (in #2) is reduced by an additional researcher

  • 4: How much it costs to add a researcher to MIRI, in terms of lifetime salary

We compute (#1)*(#2)*(#3)/(#4) to get the number of HEWALYs per extra dollar donated to MIRI (see the sketch below). In addition to the assumptions above, we assume that MIRI will use the marginal donation to hire one more researcher, and that the probability of this happening is proportional to (donation received) / (total lifetime salary needed). This probably underestimates the probability of hiring (conditional on MIRI actually wanting to hire someone), since an expected donation should also increase the chance of hiring.
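To make the combination concrete, here is an end-to-end sketch with placeholder point values for each of the four components. It follows the (#1)*(#2)*(#3)/(#4) formula, but it will not reproduce the model's reported output below, since that comes from propagating full distributions through Guesstimate.

```python
# End-to-end combination with illustrative placeholder values; the real model
# uses full distributions for every component.
ev_far_future = 1e17          # 1: expected value of the far future, in HEWALYs (placeholder)
p_ai_extinction = 0.05        # 2: probability of extinction associated with AI (placeholder)
fraction_risk_reduced = 0.5 * 5.6e-4 + 0.5 * 3e-5  # 3: equal weights on the two approaches above
cost_per_researcher = 2e6     # 4: lifetime cost of an additional MIRI researcher, in USD

hewalys_per_dollar = ev_far_future * p_ai_extinction * fraction_risk_reduced / cost_per_researcher
print(f"{hewalys_per_dollar:.1e} HEWALYs per dollar (placeholder inputs)")
```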

The final output is about 2*10^5 HEWALYs per $ donated to MIRI (this is a point estimate from our range of outputs; see the Guesstimate model), with a standard deviation of 2*10^7.

Model limitations

  1. The inputs are highly subjective. For most parameters we have to rely on subjective beliefs that we are not very confident in and that have no empirical evidence to back them up. By contrast, the cost-effectiveness calculations for GiveWell's top charities also involve some subjective inputs (such as guesses about the persistence of the benefit from cash transfers, or subjective values for the tradeoff between saving lives and increasing consumption), but many of the inputs there do come from empirical evidence in research papers. Even Paul Christiano, who is an expert in the area and knows MIRI well, is not very confident about his numbers, which gives a sense of how difficult and speculative this exercise is.

  2. We did put large standard deviations around our point estimates. However, given how speculative these numbers are and how little confidence we have in them, it is unclear how to even think about those error bars. Holden Karnofsky has written a post on why we should give less weight to more speculative estimates, using the heuristic of Bayesian updating. Michael Dickens has a quantitative model that actually does this. However, neither we nor others seem to understand the mechanics of such models well (e.g. see Dickens' discussion of the prior here), so it does not seem compelling to take the results of such Bayesian quantitative models literally in guiding our decisions. (A toy sketch of the shrinkage idea appears after this list.)

  3. Related to point #1, even experts disagree a lot. E.g. the external AI safety experts that Open Phil invited to review MIRI's work disagree about the relevance and quality of that work. Paul, whose inputs we used, seems to be on the optimistic side among external AI safety technical experts. He also knows MIRI best, which could be a reason to trust his opinion more. However, his optimism could also be due to self-selection (perhaps he had an optimistic prior before starting to work with MIRI, which is why he worked with them, rather than working with MIRI causing him to be optimistic). In addition to disagreeing with each other, experts also seem to have non-quantitative considerations that may be important for their evaluation of MIRI; e.g. Paul said the numbers he gave were not the ones he actually uses to evaluate MIRI.

  4. We did not incorporate anything about MIRI’s new machine learning research agenda, since it is very new.

  5. We looked at the far future, which makes both the mean and the standard deviation huge. A useful robustness check would be to restrict the model to the next century (so the mean and standard deviation of HEWALYs averted would both be lower) and see how that changes the results (through the Bayesian updating aggregation). We have not done this. Our ignorance about how this would change the model again reflects our lack of intuition for the mechanics of such Bayesian models.

  6. We did not incorporate MIRI’s other effects apart from doing technical research. Historically they have had both positive and negative effects on publicizing AI safety. External experts familiar with MIRI believe that they are likely to do better in this dimension in the future.

  7. Dynamic incentives: if, as some external experts say, MIRI is working on an important problem but has not made much progress (and is not on a very promising path), are we rewarding them simply for working on something important rather than for doing a good job? This may create bad dynamic incentives for other potential recipients: it becomes more important for them to show that they are working on an important problem than to show that they are actually making progress or are on a promising path. This applies especially to groups taking a more unique and less replaceable approach: perhaps they could have spent more effort making their output better or more accessible to a "mainstream" audience, which is valuable in scientific research, but they now lack the incentive to do so because it matters less for attracting donations. That is, if a group can attract donations simply by showing that it is working on something "important" and "neglected", it has less incentive to work on (or demonstrate) "tractability".

  8. Related to dynamic incentives, a big and longstanding donor like Open Phil can give MIRI some money in order to learn about their future progress, which provides monitoring and incentives to improve; for such a donor it may therefore not be as problematic to give to a group that has not shown much "tractability" so far. As a small, one-time donor, we do not have the structure to provide monitoring and incentives, so the problem mentioned in #7 is more of a concern for us.

  9. Lastly, a point not about the model itself but about AI safety: within AI safety, there is perhaps an argument for looking beyond groups like FHI and MIRI that have been the center of attention, and considering other potential donation recipients. E.g. Open Phil has grantees in the area who are probably doing good work, some of whom may be small and able to use more funding. One related option is to look into the EA fund in this area.
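To illustrate the shrinkage intuition behind limitation 2 (this is emphatically not Dickens's actual model, and all numbers are made up), a toy normal-normal Bayesian update over log10(HEWALYs per dollar) shows how a sceptical prior pulls a noisy, speculative estimate back towards more ordinary values, and pulls harder the wider the estimate's error bars are.

```python
import numpy as np

# Toy normal-normal Bayesian update in log10 space; illustrative numbers only.
prior_mean, prior_sd = 0.0, 2.0        # sceptical prior: around 10^0 = 1 HEWALY per dollar
estimate_mean, estimate_sd = 5.3, 2.0  # our estimate: around 10^5.3 ~ 2e5 HEWALYs per dollar

posterior_precision = 1 / prior_sd**2 + 1 / estimate_sd**2
posterior_mean = (prior_mean / prior_sd**2 + estimate_mean / estimate_sd**2) / posterior_precision
posterior_sd = np.sqrt(1 / posterior_precision)

# With these numbers the posterior mean is 10^2.65, i.e. a few hundred HEWALYs
# per dollar: the speculative estimate is heavily discounted towards the prior.
print(f"posterior: 10^{posterior_mean:.2f} HEWALYs per dollar (log10 sd {posterior_sd:.2f})")
```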

This post was submitted for comment to the Machine Intelligence Research Institute before publication.

Comments (3)

Comment author: PeterMcCluskey 22 May 2017 02:41:16PM 5 points

Can you explain your expected far future population size? It looks like your upper bound is something like 10 orders of magnitude lower than Bostrom's most conservative estimates.

That disagreement makes all the other uncertainty look extremely trivial in comparison.

Comment author: ThomasSittler 23 May 2017 10:55:02AM 1 point

Do you mean Bostrom's estimate that "the Virgo Supercluster could contain 10^23 biological humans"? This did come up in our conversations. One objection that was raised is that humanity could go extinct, or for some other reason colonisation of the Supercluster could have a very low probability. There was significant disagreement among us, and if I recall correctly we chose the median of our estimates.

Do you think Bostrom is correct here? What probability distribution would you have chosen for the expected far future population size? :)

Comment author: PeterMcCluskey 23 May 2017 07:11:03PM 3 points

> colonisation of the Supercluster could have a very low probability.

What do you mean by very low probability? If you mean a one in a million chance, that's not improbable enough to answer Bostrom. If you mean something that would actually answer Bostrom, then please respond to the SlateStarCodex post Stop adding zeroes.

I think Bostrom is on the right track, and that any analysis which follows your approach should use at least a 0.1% chance of more than 10^50 human life-years.