Many people here, myself included, are very concerned about the risks from rapidly improving artificial general intelligence (AGI). A significant fraction of people in that camp give to the Machine Intelligence Research Institute, or recommend others do so.
Unfortunately, for those who lack the necessary technical expertise, this is partly an act of faith. I am in some position to evaluate the arguments about whether safe AGI is an important cause. I'm also in some position to evaluate the general competence and trustworthiness of the people working at MIRI. On those counts I am satisfied, though I know not everyone is.
However, I am in a poor position to evaluate:
- The quality of MIRI's past research output.
- Whether their priorities are sensible or clearly dominated by alternatives.
My suggestion is to commission an independent, confidential survey of well-placed people, with the results compiled into an anonymised public report. Participants should:
- Have an existing reputation for trustworthiness and confidentiality.
- Think that AI risk is an important cause, but have no particular convictions about the best approach or organisation for dealing with it. They should not have worked for MIRI in the past, though they will presumably have some association with the general rationality or AI community.
- The survey should involve 10-20 people, including a sample of present and past MIRI staff, people at organisations working on related problems (CFAR, FHI, FLI, AI Impacts, CSER, OpenPhil, etc.), and largely unconnected maths/AI/CS researchers.
- Results should be compiled by two or three people - ideally with different perspectives - who will summarise the results in such a way that nothing in the final report could identify what any individual wrote (unless they are happy to be named). Their goal should be purely to represent the findings faithfully, given the constraints of brevity and confidentiality.
- The survey should ask about:
- Quality of past output.
- Suitability of staff for their roles.
- Quality of current strategy/priorities.
- Quality of operations and other non-research aspects of implementation, etc.
- How useful more funding/staff would be.
- Comparison with the value of work done by other related organisations.
- Suggestions for how the work or strategy could be improved.
- Obviously participants should only comment on what they know about. The survey should link to MIRI's strategy and recent publications.
- MIRI should be able to suggest people to be contacted, and the general public should be able to do the same through an open announcement. MIRI should also have a chance to comment on the survey itself before it goes out. Ideally it would be checked by someone who understands good survey design, as subtle aspects of wording can be important.
- The value of being open and thoughtful in their answers should be impressed on participants, as that candour maximises the chances of solving the problem of AI risk in the long run.
Thanks for the write-up, Rob. OpenPhil actually decided to evaluate our technical agenda last summer, and Holden put Daniel Dewey on the job. The report isn't done yet, in part because it has proven very time-intensive to fully communicate the reasoning behind our research priorities, even to someone with as much understanding of the AI landscape as Daniel Dewey. Separately, we have plans to get an independent evaluation of our organizational efficacy started later in 2016, which I expect to be useful for our admin team as well as prospective donors.
FYI, when it comes to evaluating our research progress, I doubt that the methods you propose would get you much Bayesian evidence. Our published output will look like round pegs shoved into square holes regardless of whether we're doing our jobs well or poorly, because we're doing research that doesn't fit neatly into an existing academic niche. Our objective is to make direct progress on what appear to us to be the main neglected technical obstacles to developing reliable AI systems in the long term, with a goal of shifting the direction of AI research in a big way once we hit certain key research targets; and we're specifically targeting research that isn't compatible with industry's economic incentives or academia's publish-or-perish incentives. To get information about how well we're doing our jobs, I think the key questions to investigate are (1) whether we've chosen good research targets; and (2) whether we're making good progress towards them.
We've been focusing our communication efforts mainly on helping people evaluate (1): I've been working on explaining our approach and agenda, and OpenPhil is also on the job. To investigate (2), we'd need to spend a sizable chunk of time with mathematically adept evaluators — we still haven't hit any of our key research targets, which means that evaluating our progress requires understanding our smaller results and why we think they're progress towards the big results. In practice, we've found that explaining this usually requires explaining why we think the big targets are vital, as this informs (e.g.) which shortcuts are and are not acceptable. I plan to wait until after the OpenPhil report is finished before taking on another time-intensive eval.
Fortunately, (2) will become much easier to evaluate as we achieve (or persistently fail to achieve) those key targets. This also provides us with an opportunity to test our approach and methodology. People who understand our approach and find it uncompelling often predict that some of the results we're shooting for cannot be achieved. This means we'll get some evidence about (1) as we learn more about (2). For example, last year I mentioned "naturalized AIXI" as an ambitious 5-year research target. If we are not able to make concrete progress towards that goal, then over the next four years, I will lose confidence in our approach and eventually change our course dramatically. Conversely, if we make discoveries that are important pieces of that puzzle, I'll update in favor of us being onto something, especially if we find puzzle pieces that knowledgeable critics predicted we wouldn’t find. This data will hopefully start rolling in soon, now that our research team is getting up to size.
("Concrete progress" / "important puzzle pieces" in this case are satisfactory asymptotic algorithms for any of: (1) reasoning under logical uncertainty; (2) identifying the best available decision with respect to a utility function; (3) performing induction from inside an environment; (4) identifying the referents of goals in realistic world-models; and (5) reasoning about the behavior of smarter reasoners; the last of which is hopefully a subset of 1 and 2. The linked papers give rough descriptions of what counts as 'satisfactory' in each case; I'll work to make the desiderata more explicit as time goes on.)