Many people here, myself included, are very concerned about the risks from rapidly improving artificial general intelligence (AGI). A significant fraction of people in that camp give to the Machine Intelligence Research Institute, or recommend others do so.

Unfortunately, for those who lack the necessary technical expertise, this is partly an act of faith. I am in some position to evaluate the arguments about whether safe AGI is an important cause. I'm also in some position to evaluate the general competence and trustworthiness of the people working at MIRI. On those counts I am satisfied, though I know not everyone is.

However, I am in a poor position to evaluate:

  • The quality of MIRI's past research output.
  • Whether their priorities are sensible or clearly dominated by alternatives.
I could probably make some progress if I tried, but in any case I don't have the time to focus on this one question.

To get around this I have asked a few people who have more technical expertise or inside knowledge for their opinions. But I wish I had access to something more systematic and reliable.

This is not a unique situation - science funding boards are often in a poor position to judge the work they are funding, and so have to rely on carefully chosen experts to vet it.

I suggest we conduct a survey of people who are in an unusually good position to know whether MIRI a) is a good investment of skills and money, and b) should change its approach in order to do better.

The ideal person to oversee such a survey would:
  1. Have an existing reputation for trustworthiness and confidentiality.
  2. Think that AI risk is an important cause, but have no particular convictions about the best approach or organisation for dealing with it. They shouldn't have worked for MIRI in the past, but will presumably have some association with the general rationality or AI community.
I suggest the survey have the following traits:
  1. Involve 10-20 people, including a sample of present and past MIRI staff, people at organisations working on related problems (CFAR, FHI, FLI, AI Impacts, CSER, OpenPhil, etc), and largely unconnected math/AI/CS researchers.
  2. Results should be compiled by two or three people - ideally with different perspectives - who will summarise the results in such a way that nothing in the final report could identify what any individual wrote (unless they are happy to be named). Their goal should be purely to represent the findings faithfully, given the constraints of brevity and confidentiality.
  3. The survey should ask about:
    1. Quality of past output.
    2. Suitability of staff for their roles.
    3. Quality of current strategy/priorities.
    4. Quality of operations and other non-research aspects of implementation, etc.
    5. How useful more funding/staff would be.
    6. Comparison with the value of work done by other related organisations.
    7. Suggestions for how the work or strategy could be improved.
  4. Obviously participants should only comment on what they know about. The survey should link to MIRI's strategy and recent publications.
  5. MIRI should be able to suggest people to be contacted, but so should the general public through an announcement. They should also have a chance to comment on the survey itself before it goes out. Ideally it would be checked by someone who understands good survey design, as subtle aspects of wording can be important.
  6. It should be impressed on participants that open and thoughtful answers will maximise the chances of solving the problem of AI risk in the long run.
If conducted to a high standard I would find this survey convincing, in either direction.

MIRI/FHI's survey of expected timelines for the development of artificial intelligence has been a similarly valuable resource for discussing the issue with non-experts over the last few years.

This approach could be applied to other organisations as well. However, I feel it is most pressing for MIRI because i) it is so hard for someone like me to know what to say about the above, and ii) they want more money than they currently receive, so the evidence is decision-relevant.

I don't expect that this project would be prohibitively costly relative to its value. Ideally, it would only take 100-300 hours total, including time spent filling out the survey. MIRI currently spends around $2 million a year - including some highly skilled labour that is probably underpriced - so the opportunity cost would represent under 1% of their annual budget.

If anyone would like to volunteer, please do so here. I would be happy to advise, and also to try to find funders if a small grant would be helpful.

Thanks to Ozy for more or less suggesting the above and prompting me to write this.

Comments

Thanks for the write-up, Rob. OpenPhil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.

FYI, when it comes to evaluating our research progress, I doubt that the methods you propose would get you much Bayesian evidence. Our published output will look like round pegs shoved into square holes regardless of whether we're doing our jobs well or poorly, because we're doing research that doesn't fit neatly into an existing academic niche. Our objective is to make direct progress on what appear to us to be the main neglected technical obstacles to developing reliable AI systems in the long term, with a goal of shifting the direction of AI research in a big way once we hit certain key research targets; and we're specifically targeting research that isn't compatible with industry's economic incentives or academia's publish-or-perish incentives. To get information about how well we're doing our jobs, I think the key questions to investigate are (1) whether we've chosen good research targets; and (2) whether we're making good progress towards them.

We've been focusing our communication efforts mainly on helping people evaluate (1): I've been working on explaining our approach and agenda, and OpenPhil is also on the job. To investigate (2), we'd need to spend a sizable chunk of time with mathematically adept evaluators — we still haven't hit any of our key research targets, which means that evaluating our progress requires understanding our smaller results and why we think they're progress towards the big results. In practice, we've found that explaining this usually requires explaining why we think the big targets are vital, as this informs (e.g.) which shortcuts are and are not acceptable. I plan to wait until after the OpenPhil report is finished before taking on another time-intensive eval.

Fortunately, (2) will become much easier to evaluate as we achieve (or persistently fail to achieve) those key targets. This also provides us with an opportunity to test our approach and methodology. People who understand our approach and find it uncompelling often predict that some of the results we're shooting for cannot be achieved. This means we'll get some evidence about (1) as we learn more about (2). For example, last year I mentioned "naturalized AIXI" as an ambitious 5-year research target. If we are not able to make concrete progress towards that goal, then over the next four years, I will lose confidence in our approach and eventually change our course dramatically. Conversely, if we make discoveries that are important pieces of that puzzle, I'll update in favor of us being onto something, especially if we find puzzle pieces that knowledgeable critics predicted we wouldn’t find. This data will hopefully start rolling in soon, now that our research team is getting up to size.

("Concrete progress" / "important puzzle pieces" in this case are satisfactory asymptotic algorithms for any of: (1) reasoning under logical uncertainty; (2) identifying the best available decision with respect to a utility function; (3) performing induction from inside an environment; (4) identifying the referents of goals in realistic world-models; and (5) reasoning about the behavior of smarter reasoners; the last of which is hopefully a subset of 1 and 2. The linked papers give rough descriptions of what counts as 'satisfactory' in each case; I'll work to make the desiderata more explicit as time goes on.)

I think that it's probably quite important to define in advance what sorts of results would convince us that the quality of MIRI's performance is either sufficient or insufficient. Otherwise I expect those already committed to some belief about MIRI's performance to consider the survey evidence for their existing belief, even if another person with the opposite belief also considers it evidence for their belief.

Relatedly, I also worry about the uniqueness of the problem and how it might change what we consider a cause worth donating to. Although you don't seem to be saying that you could understand MIRI's arguments, see no flaws, and still be inclined to say "I still can't be sure that this is the right way to go," I expect that many people are averse to donating to causes like MIRI because the effectiveness of the proposed interventions does not admit of simple testing.

With existential risks, empirical testing is often impossible in the traditional sense, although sometimes possible in a limited sense. Results about sub-existential pandemic risk are probably at least somewhat relevant to the study of existential pandemic risk, for example. But it's not the same as distributing bed nets, looking at malaria incidence, adjusting, and reobserving. It's not as though we can perform an action, look through a time warp, and see whether or not the world ends in the future.

What I'm getting at is that, even if interventions on these problems are not actually untestable, it is worth imagining the implications if they were. I think there are some people who would refuse to donate to existential risk charities merely because other charities have interventions that can be tested for effectiveness, and this concerns me. If it is not by human failing that we don't test the effectiveness of our interventions, but rather the nature of the problem that makes such testing impossible, do you choose to do nothing? That is not a rhetorical question. I genuinely believe that we are confused about this, and that MIRI is an example of a cause that may be difficult to evaluate without resolving this confusion.

This is related to ambiguity aversion in cognitive science and decision theory. Ambiguity aversion appears in choices between betting on known and unknown risks, not in choices about whether to bet on an unknown risk in a non-comparative context. But effective altruists consider almost all charitable decisions within the context of cause prioritization, so we might expect EAs to encounter more comparative contexts than a random philanthropist, and thus to be more biased against ambiguous causes, even if the survey itself would technically be focusing on one cause. It's noteworthy that the expected utility formalism and human behavior differ here: the formalism prescribes indifference between bets with known and unknown probabilities when the bets have the same payoffs. (In reality the situation is not even this clear, since the payoffs of successfully intervening on malaria incidence and on human extinction are hardly equal.)

I think we must genuinely ask whether we should be averse to ambiguity in general, attempt to explain why this heuristic was evolutionarily adaptive, and see whether the problem of existential risk is a case where we should, or should not, use ambiguity aversion as a heuristic. After all, a humanity that attempts no interventions on existential risk merely because it cannot test their effectiveness is a humanity that ignores existential risk and goes extinct for it, even if we believed we were being virtuous philanthropists the entire time.
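
To make the indifference claim concrete, here is a schematic Ellsberg-style illustration (the urns, payoffs, and notation are additions for exposition, not part of the argument above):

```latex
% Bet: receive payoff x if a red ball is drawn, y otherwise.
% Urn A has a known composition with P(red) = 1/2.
% Urn B has an unknown composition, but the agent's subjective probability is also p = 1/2.
\begin{align*}
\mathbb{E}[U \mid \text{Urn A}] &= \tfrac{1}{2}\,U(x) + \tfrac{1}{2}\,U(y),\\
\mathbb{E}[U \mid \text{Urn B}] &= p\,U(x) + (1-p)\,U(y) = \tfrac{1}{2}\,U(x) + \tfrac{1}{2}\,U(y).
\end{align*}
% With equal payoffs and equal (subjective) probabilities the two expected utilities coincide,
% so the formalism is indifferent between the urns, yet people typically prefer Urn A:
% this preference is the ambiguity aversion referred to above.
```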

I admire the motivation, but worry about selection effects.

I'd guess the median computer science professor hasn't heard about MIRI's work. Within the class of people who know about MIRI-esque issues, I'd guess knowledge of MIRI and enthusiasm about MIRI will be correlated: if you think FAI is akin to overpopulation on Mars, you probably won't be paying close attention to the field. Thus those in a position to comment intelligently on MIRI's work will be selected (in part) for being favourably disposed to the idea behind it.

That isn't necessarily a showstopper, and it may be worth doing regardless. Perhaps multiple different attempts to gather ('survey' might be too strong a term) relevant opinion on the various points could be a good strategy. E.g.

  1. Similar to the FHI/MIRI timelines research, interrogating computer scientists as to their perception of AI risk, and the importance of alignment etc. would be helpful data.

  2. Folks at MIRI and peer organisations could provide impressions of their organisational efficacy. This sort of 'organisational peer review' could be helpful for MIRI to improve. Reciprocal arrangements between groups within EA reviewing each other's performance and suggesting improvements could be a valuable activity going forward.

  3. For technical facility, one obvious port of call would be academics who remarked on the probabilistic set theory paper, as well as MIRI workshop participants (especially those who did not end up working at MIRI). As a general metric (given MIRI's focus on research) a comparison of number of publications/$ or FTE research staff to other academic bodies would be interesting. My hunch is this would be unflattering to MIRI (especially when narrowing down to more technical/math heavy work) - but naively looking at publication count may do MIRI a disservice, given it is looking at weird and emergent branches of science.

Another possibility, instead of surveying people who already know about MIRI (and thus selection worries), is to pay someone independent to get to know about them. I know GiveWell made a fairly adverse review of MIRI's performance a few years ago. I'd be interested to hear what they think about them now. I'm unaware of 'academic auditors', but it might not be unduly costly to commission domain experts to have a look at the relevant issues. Someone sceptical of MIRI might suggest that this function is usually performed by academia at large, and that MIRI's relatively weak connection to academia in these technical fields is a black mark against it (albeit one I know they are working to correct).

A survey like this is probably a good idea, although it might not give us any evidence that isn't already publicly available. A non-AI risk expert already has quite a few indicators about MIRI's quality:

  1. It has gotten several dozen papers accepted to conferences.
  2. Some of these papers have a decent number of citations; many have ~5. (You can find citation counts on Google Scholar, but I don't know a better way to get this information than manually searching for each paper and looking at its citations; see the sketch after this list for one way it might be automated.) Many of the citations are by other MIRI papers; most are by people at or associated with MIRI/FHI/CSER, probably because these are the only groups doing real work on AI risk.
  3. MIRI regularly collaborates with other organizations or individuals working on AI risk, which suggests that these people value MIRI's contributions.
  4. Stuart Russell, one of the world's leading AI researchers, sits on MIRI's advisory board and appears to have plans to collaborate with them.
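
As a rough sketch of how the citation check in point 2 could be automated rather than done by hand (purely illustrative: it assumes the Semantic Scholar Graph API and its citationCount field, neither of which is mentioned in this thread, and the two titles are just examples from MIRI's publication list):

```python
import requests

API_URL = "https://api.semanticscholar.org/graph/v1/paper/search"

def citation_count(title):
    """Return the citation count of the best-matching paper, or None if nothing matches.

    Assumes the Semantic Scholar Graph API's /paper/search endpoint and its
    'citationCount' field; check the current API documentation before relying on this.
    """
    resp = requests.get(
        API_URL,
        params={"query": title, "fields": "title,citationCount", "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()
    matches = resp.json().get("data", [])
    return matches[0].get("citationCount") if matches else None

# Example usage with two titles taken from https://intelligence.org/all-publications/;
# a fuller check would loop over the whole publication list.
for title in ["Program Equilibrium in the Prisoner's Dilemma via Löb's Theorem",
              "Corrigibility"]:
    print(title, "->", citation_count(title))
```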

If we did a survey like this one, it would probably be largely redundant with the evidence we already have. The people surveyed would need to be AI risk researchers, which pretty much means a small handful of people at MIRI, FHI, FLI, etc. Lots of these people already collaborate with MIRI and cite MIRI papers. Still, we might be able to learn something from hearing their explicit opinions about MIRI, although I don't know what.

MIRI currently spends around $2 million a year - including some highly skilled labour that is probably underpriced

Their 2014 financials on https://intelligence.org/transparency/ say their total expenditures in 2014 were $948k. Their 2015 financials aren't up yet, and I think they did expand in 2015, but I don't think you can claim this unremarked. This is not a neutral error; if you make them look twice as big as they are, then you also make them look half as efficient.

I just took a look at their latest fundraising page:

"although we may still slow down or accelerate our growth based on our fundraising performance, our current plans assume a budget of roughly $1,825,000 per year."

https://intelligence.org/2015/12/01/miri-2015-winter-fundraiser/

So the $2 million figure may not be far off after all.

Ok, I admit I didn't think to check there. Arguing about the semantics of what "currently spends" means would be pointless, and I recognize that this remark was in the context of estimating how MIRI's future budget would be affected, but I do think that in the context of a discussion about evaluating past performance, it's important not to anchor people's expectations on a budget they don't have yet.

Presumably well-chosen participants in this survey, were it to occur, should not be left to rely on a barely related point in this post to inform them about MIRI's budget over the last decade.

Why wouldn't we just expect them to publish in peer-reviewed journals?

AI researchers don't usually publish in peer-reviewed journals; they present at conferences. MIRI has presented lots of papers at conferences.

See here: https://intelligence.org/all-publications/

Over the past few years, MIRI has published a couple dozen conference papers and a handful of journal articles.

I have nothing against that specifically, but publishing in peer-reviewed journals is very costly and slow. Most MIRI funders would think journals are currently biased against the relevant research, and that is one thing MIRI is trying to change. Knowing they are publishing papers also wouldn't speak to the strategy.

This survey makes sense. However, I have a few caveats:

Think that AI risk is an important cause, but have no particular convictions about the best approach or organisation for dealing with it. They shouldn't have worked for MIRI in the past, but will presumably have some association with the general rationality or AI community.

Why should the person overseeing the survey think AI risk is an important cause? Doesn't that self-select for people who are more likely to be positive toward MIRI than whatever the baseline is for all people familiar with AI risk (and, obviously, competent to judge who to include in the survey)? The ideal person to me would be neutral, and while finding someone who is truly neutral would likely prove impractical, selecting someone overtly positive would be a bad idea for the same reasons it would be to select someone overtly negative. The point is that the aim should be neutrality.

They should also have a chance to comment on the survey itself before it goes out. Ideally it would be checked by someone who understands good survey design, as subtle aspects of wording can be important.

There should be a set time frame for drafting a response to the survey before it goes public. A "chance" is too vague.

It should be impressed on participants that open and thoughtful answers will maximise the chances of solving the problem of AI risk in the long run.

Telling people to be open and thoughtful is great, but explicitly tying it to solving long-run AI risk primes them to give certain kinds of answers.

"Why should the person overseeing the survey think AI risk is an important cause?"

Because someone who believes it's a real risk has strong personal incentives to try to make the survey informative and report the results correctly (i.e. they don't want to die). Someone who believes it's a dumb cause would be tempted to discredit it by making MIRI look bad (or at least wouldn't be as trusted by prospective MIRI donors).

Such personal incentives are important, but, again, I didn't advocate getting someone hostile to AI risk. I proposed aiming for someone neutral. I know no one is "truly" neutral, but you have to weigh the potential positive personal incentives of someone invested against potential motivated thinking (or, more accurately in this case, "motivated selection").

Someone who was just neutral on the cause area would probably be fine, but I think there are few of those as it's a divisive issue, and they probably wouldn't be that motivated to do the work.

Why should the person overseeing the survey think AI risk is an important cause?

Because the purpose of the survey is to determine MIRI's effectiveness as a charitable organization. If one believes that there is a negligible probability that an artificial intelligence will cause the extinction of the human species within the next several centuries, then it immediately follows that MIRI is an extremely ineffective organization, as it would be designed to mitigate a risk that ostensibly does not need mitigating. The survey is moot if one believes this.

I don't disagree that someone who thinks there is a "negligible probability" of AI causing extinction would be unsuited to the task. That's why I said to aim for neutrality.

But I think we may be disagreeing over whether "thinks AI risk is an important cause" is too close to "is broadly positive towards AI risk as a cause area." I think so. You think not?

But I think we may be disagreeing over whether "thinks AI risk is an important cause" is too close to "is broadly positive towards AI risk as a cause area." I think so. You think not?

Are there alternatives to a person like this? It doesn't seem to me like there are.

"Is broadly positive towards AI risk as a cause area" could mean "believes that there should exist effective organizations working on mitigating AI risk", or could mean "automatically gives more credence to the effectiveness of organizations that are attempting to mitigate AI risk."

It might be helpful if you elaborated more on what you mean by 'aim for neutrality'. What actions would that entail, if you did that, in the real world, yourself? What does hiring the ideal survey supervisor look like in your mind if you can't use the words "neutral" or "neutrality" or any clever rephrasings thereof?

It might be helpful if you elaborated more on what you mean by 'aim for neutrality'. What actions would that entail, if you did that, in the real world, yourself?

I meant picking someone with no stake whatsoever in the outcome. Someone who, though exposed to arguments about AI risk, has no strong opinions one way or another. In other words, someone without a strong prior on AI risk as a cause area. Naturally, we all have priors, even if they are not explicit, so I am not proposing this as a disqualifying standard, just a goal worth shooting for.

An even broader selection tool that I think is worth considering alongside this is simply "people who know about AI risk", but that's basically the same as Rob's original point of "have some association with the general rationality or AI community."

Yup, MIRI is in a fairly unique situation with respect to the inscrutability of its research to a large and interested technical audience.

The proposal makes sense to me, though I think that if you want people to trust a survey, you need to exclude the organisation that is the subject of the survey from any involvement in it, including suggesting survey recipients.

One challenge is that, in the case of AI researchers, it might be hard to assess whether they have a particular conviction about how to deal with AI risk, because they are likely to have views about useful approaches to problems in AI and to have already thought about AI risk at least a little. You'd need to come up with some idea of how to judge bias.

The report should break out any differences between past/present employees and those suggested by MIRI, versus others. I think you need a mix of both insiders and outsiders to get an overall picture.