Hide table of contents

Comment Permalink

Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theorem-provers; (3) the reflective oracles result presented in "A Formal Solution to the Grain of Truth Problem," "Reflective Variants of Solomonoff Induction and AIXI," and "Reflective Oracles"; (4) and Vadim Kosoy's "Optimal Predictors" work. The papers we listed under 1, 2, and 4 then got used in an external review process they probably weren't very well-suited for.

I think this was more or less just an honest miscommunication. I told Nick in advance that I only assigned an 8% probability to external reviewers thinking the “Asymptotic Convergence…” result was "good" on its own (and only a 20% probability for "Inductive Coherence"). My impression of what happened is that Open Phil staff interpreted my pushback as saying that I thought the external reviews wouldn’t carry much Bayesian evidence (but that the internal reviews still would), where what I was trying to communicate was that I thought the papers didn’t carry very much Bayesian evidence about our technical output (and that I thought the internal reviewers would need to speak to us about technical specifics in order to understand why we thought they were important). Thus, we were surprised when their grant decision and write-up put significant weight on the internal reviews of those papers (and they were surprised that we were surprised). This is obviously really unfortunate, and another good sign that I should have committed more time and care to clearly communicating my thinking from the outset.

Regarding picking better papers for external review: We only put out 10 papers directly related to our technical agendas between Jan 2015 and Mar 2016, so the option space is pretty limited, especially given the multiple constraints Open Phil wanted to meet. Optimizing for technical impressiveness and non-obviousness as a stand-alone result, I might have instead gone with Critch's bounded Löb paper and the grain of truth problem paper over the AC/IC results. We did submit the grain of truth problem paper to Open Phil, but they decided not to review it because it didn't meet other criteria they were interested in.

If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I’m less pessimistic about building collaborations and partnerships, in part because we’re already on pretty good terms with other folks in the community, and in part because I think we have different models of how technical ideas spread. Regardless, I expect that with more and better communication, we can (upon re-evaluation) raise the probability of Open Phil staff that the work we’re doing is important.

More generally, though, I expect this task to get easier over time as we get better at communicating about our research. There's already a body of AI alignment research (and, perhaps, methodology) that requires the equivalent of multiple university courses to understand, but there aren't curricula or textbooks for teaching it. If we can convince a small pool of researchers to care about the research problems we think are important, this will let us bootstrap to the point where we have more resources for communicating information that requires a lot of background and sustained scholarship, as well as more of the institutional signals that this stuff warrants a time investment.

I can maybe make the time expenditure thus far less mysterious if I mention a couple more ways I erred in trying to communicate my model of MIRI's research agenda:

My early discussion with Daniel was framed around questions like "What specific failure mode do you expect to be exhibited by advanced AI systems iff their programmers don't understand logical uncertainty?” I made the mistake of attempting to give straight/non-evasive answers to those sorts of questions and let the discussion focus on that evaluation criterion, rather than promptly saying “MIRI's research directions mostly aren't chosen to directly address a specific failure mode in a notional software system” and “I don't think that's a good heuristic for identifying research that's likely to be relevant to long-run AI safety.”

I fell prey to the transparency illusion pretty hard, and that was completely my fault. Mid-way through the process, Daniel made a write-up of what he had gathered so far; this write-up revealed a large number of miscommunications and places where I thought I had transmitted a concept of mine but Daniel had come away with a very different concept. It’s clear in retrospect that we should have spent a lot more time with me having Daniel try to explain what he thought I meant, and I had all the tools to predict this in foresight; but I foolishly assumed that wouldn’t be necessary in this case.

(I plan to blog more about the details of these later.)

I think these are important mistakes that show I hadn't sufficiently clarified several concepts in my own head, or spent enough time understanding Daniel's position. My hope is that I can do a much better job of avoiding these sorts of failures in the next round of discussion, now that I have a better model of where Open Phil’s staff and advisors are coming from and what the review process looks like.

(I am correct in that Yuan previously worked for you, right?)

Yeah, though that was before my time. He did an unpaid internship with us in the summer of 2013, and we’ve occasionally contracted him to tutor MIRI staff. Qiaochu's also a lot socially closer to MIRI; he attended three of our early research workshops.

Unless and until then, I remain sceptical about MIRI's value.

I think that's a reasonable stance to take, and that there are other possible reasonable stances here too. Some of the variables I expect EAs to vary on include “level of starting confidence in MIRI's mathematical intuitions about complicated formal questions” and “general risk tolerance.” A relatively risk-intolerant donor is right to wait until we have clearer demonstrations of success; and a relatively risk-tolerant donor who starts without a very high confidence in MIRI's intuitions about formal systems might be pushed under a donation threshold by learning that an important disagreement has opened up between us and Daniel Dewey (or between us and other people at Open Phil).

Also, thanks for laying out your thinking in so much detail -- I suspect there are other people who had more or less the same reaction to Open Phil's grant write-up but haven't spoken up about it. I'd be happy to talk more about this over email, too, including answering Qs from anyone else who wants more of my thoughts on this.

See in context

MIRI Update and Fundraising Case

by So8res

Oct 9 201612 min read 16

18

Machine Intelligence Research Institute

Frontpage

MIRI Update and Fundraising Case

Update December 22: Our donors came together during the fundraiser to get us most of the way to our $750,000 goal. In all, 251 donors contributed $589,248, making this our second-biggest fundraiser to date. Although we fell short of our target by $160,000, we have since made up this shortfall thanks to November/December donors. I'm extremely grateful for this support, and will plan accordingly for more staff growth over the coming year.

As described in our post-fundraiser update, we are still fairly funding-constrained. December/January donations will have an especially large effect on our 2017–2018 hiring plans and strategy, as we try to assess our future prospects. For some external endorsements of MIRI as a good place to give this winter, see recent evaluations by Daniel Dewey, Nick Beckstead, Owen Cotton-Barratt, and Ben Hoskin.

The Machine Intelligence Research Institute is running its annual fundraiser, and we're using the opportunity to explain why we think MIRI's work is useful from an EA perspective. To that end, we'll be answering questions on a new "Ask MIRI Anything" EA Forum thread on Wednesday, October 12th.

MIRI is a nonprofit research group based in Berkeley, California. We do technical research that’s aimed at ensuring that smarter-than-human AI systems have a positive impact on the world. Some big recent developments:

We have a new paper out, "Logical Induction," introducing a method for assigning reasonable probabilities to conjectures in mathematics and computer science that outpaces deduction. See Shtetl-Optimized and n-Category Café for recent discussion on math blogs.

We started work on a new research agenda with a stronger focus on machine learning, “Alignment for Advanced Machine Learning Systems.” This agenda will be occupying about half of our time going forward, with the other half focusing on our agent foundations agenda.

The Open Philanthropy Project awarded MIRI a one-year $500,000 grant to scale up our research program, with a strong chance of renewal next year. This grant is not a full endorsement of MIRI, and in fact Open Phil has outlined a number of points of disagreement with MIRI's current research approach. I discuss some of the points of divergence below.

Our recent technical progress and our new set of research directions put us in a better position in this fundraiser to capitalize on donors' help and scale up our research activities further.

Our fundraiser, running from September 16 to October 31, has a $750,000 "basic" funding target, along with higher growth and stretch targets if we hit target 1. Our current progress:

Donate Now

Employer matching and pledges to give later this year also count towards the total. Click here to learn more.

These are ambitious goals. We feel reasonably good about our chance of hitting target 1, but it isn't a sure thing; we'll probably need to see support from new donors in order to hit our target, to offset the fact that a few of our regular donors are giving less than usual this year. To that end, we've written up an introduction to MIRI below, along with updates on what's new at MIRI and what we have planned next:

Why AI? · Why MIRI's approach? · What's new? · Goals for the field · Organizational goals

Why AI?

A number of EA organizations have said that they consider AI to be an important cause area. Here are what I see as the basic reasons for prioritizing research in this area.

Humanity's social and technological dominance stems primarily from our proficiency at reasoning, planning, and doing science (Armstrong). This relatively general problem-solving capacity is roughly what I have in mind when I talk about "intelligence" (Muehlhauser), including artificial intelligence.¹

The case for focusing on AI risk mitigation doesn't assume much about how future AI systems will be implemented or used. Here are the claims that I think are key:

1. Whatever problems/tasks/objectives we assign to advanced AI systems probably won't exactly match our real-world objectives. Unless we put in an (enormous, multi-generational) effort to teach AI systems every detail of our collective values (to the extent there is overlap), realistic systems will need to rely on imperfect approximations and proxies for what we want (Soares, Yudkowsky).

2. If the system’s assigned problems/tasks/objectives don’t fully capture our real objectives, it will likely end up with incentives that catastrophically conflict with what we actually want (Bostrom, Russell, Benson-Tilsen & Soares).

3. AI systems can become much more intelligent than humans (Bostrom), to a degree that would likely give AI systems a decisive advantage in arbitrary conflicts (Soares, Branwen).

4. It's hard to predict when smarter-than-human AI will be developed: it could be 15 years away, or 150 years (Open Philanthropy Project). Additionally, progress is likely to accelerate as AI approaches human capability levels, giving us little time to shift research directions once the finish line is in sight (Bensinger).

Stuart Russell's Cambridge talk is an excellent introduction to long-term AI risk.² Other leading AI researchers who have expressed these kinds of concerns about general AI include Francesca Rossi (IBM), Shane Legg (Google DeepMind), Eric Horvitz (Microsoft), Bart Selman (Cornell), Ilya Sutskever (OpenAI), Andrew Davison (Imperial College London), David McAllester (TTIC), Jürgen Schmidhuber (IDSIA), and Geoffrey Hinton (Google Brain).

Our take-away from this is that we should prioritize early research into aligning future AI systems with our interests, if we can find relevant research problems to study. AI alignment could easily turn out to be many times harder than AI itself, in which case research efforts are currently being wildly misallocated.

Alignment research can involve developing formal and theoretical tools for building and understanding AI systems that are stable and robust ("high reliability"), finding ways to get better approximations of our values in AI systems ("value specification"), and reducing the risks from systems that aren't perfectly reliable or value-specified ("error tolerance").

Why MIRI's approach?

Our recent back-and-forth with the Open Philanthropy Project provides a good jumping-off point for articulating how MIRI's approach differs from other perspectives in AI safety. Open Phil's scientific advisers (along with external reviewers) commented on some of our recent papers here, and I replied here.

One source of disagreement (though certainly not the only one, and not one that everyone at Open Phil considers crucial) is that MIRI is relatively pessimistic about the prospects of aligning advanced AI systems if we don't develop new paradigms that are more theoretically principled than present-day machine learning techniques.

Loosely speaking, we can imagine the space of all smarter-than-human AI systems as an extremely wide and heterogeneous space, in which "alignable AI designs" is a small and narrow target (and "aligned AI designs" smaller and narrower still). I think that the most important thing a marginal alignment researcher can do today is help ensure that the first generally intelligent systems humans design are in the “alignable” region.³ I think that this is unlikely to happen unless researchers have a fairly principled understanding of how the systems they're developing reason, and how that reasoning connects to the intended objectives.

Most of our work is therefore aimed at seeding the field with ideas that may inspire more AI research in the vicinity of (what we expect to be) alignable AI designs. When the first general reasoning machines are developed, we want the developers to be sampling from a space of designs and techniques that are more understandable and reliable than what’s possible in AI today.

Concretely, these three points are often what set apart our research priorities from other candidate approaches:

We prioritize research that we think could help inspire new AI techniques that are more theoretically principled than current techniques. In practice, this usually involves focusing on the biggest gaps in our current theories, in the hope of developing better and more general theories to undergird subsequent engineering work (Soares).

We focus more on AI systems' reasoning and planning, rather than on systems' goals, their input and output channels, or features of their environments.⁴

We avoid problems we think academic and industry researchers are well-positioned to address (Bensinger).

Unlike the ideas in the previous section, most of the ideas here (especially the second bullet point) haven't been written up yet, and they're only recently seeing much discussion. One of my priorities for the remainder of 2016 is to provide a fuller summary of my and other MIRI researchers' views on these topics.

What's new?

Since my last post to the EA Forum (for our July/August 2015 fundraiser), our main new results with associated write-ups have been:

"Logical Induction," and a chain of results leading up to it: "Asymptotic Logical Uncertainty and the Benford Test," "Inductive Coherence," and "Asymptotic Convergence in Online Learning with Unbounded Delays."
"Optimal Predictors," a forthcoming body of work on logical uncertainty by research associate Vadim Kosoy.
"A Formal Solution to the Grain of Truth Problem," the first reasonably general reduction of ideal game-theoretic reasoning to ideal decision-theoretic reasoning.
"Quantilizers," a proposed design for "mild optimizers" that satisfice their goals without maximizing.
"Parametric Bounded Löb’s Theorem and Robust Cooperation of Bounded Agents," a proof that MIRI's previous work on robust and inexploitable game-theoretic cooperation holds for computable systems, by way of new generalizations of Löb's theorem and Gödel’s second incompleteness theorem.

MIRI research fellows and associates co-authored two publications this year in a top AI venue, the 32nd Conference on Uncertainty in Artificial Intelligence: the grain of truth paper, and "Safely Interruptible Agents."

Since August 2015, we've hired two research fellows (Andrew Critch and Scott Garrabrant) and one assistant research fellow (Ryan Carey), and two of our associates (Abram Demski and Mihály Bárász) have signed on to become full-time researchers. We've expanded our associate program, run a seminar series (with a second one ongoing), run seven research workshops (including four during our first Colloquium Series on Robust and Beneficial AI), and sponsored MIRI Summer Fellows and MIRIx programs. We also expanded our operations team and moved into a larger office space.

For more details, see our 2015 in Review and 2016 Strategy Update posts.

Goals for the field

Researchers at MIRI are generally highly uncertain about how the field of AI will develop over the coming years, and there are many different scenarios that strike me as plausible. Conditional on a good outcome, though, I put a fair amount of probability on scenarios that more or less follow the following sketch:

In the short term, a research community coalesces, develops a good in-principle understanding of what the relevant problems are, and produces formal tools for tackling these problems. AI researchers move toward a minimal consensus about best practices, more open discussions of AI’s long-term social impact, a risk-conscious security mindset (Muehlhauser), and work on error tolerance and value specification.

In the medium term, researchers build on these foundations and develop a more mature understanding. As we move toward a clearer sense of what smarter-than-human AI systems are likely to look like — something closer to a credible roadmap — we imagine the research community moving toward increased coordination and cooperation in order to discourage race dynamics (Soares).

In the long term, we would like to see AI-empowered projects used to avert major AI mishaps while humanity works towards the requisite scientific and institutional maturity for making lasting decisions about the far future (Dewey). For this purpose, we’d want to solve a weak version of the alignment problem for limited AI systems — systems just capable enough to serve as useful levers for preventing AI accidents and misuse.

In the very long term, my hope is that researchers will eventually solve the “full” alignment problem for highly capable, highly autonomous AI systems. Ideally, we want to reach a position where engineers and operators can afford to take their time to dot every i and cross every t before we risk “locking in” any choices that have a large and irreversible effect on the future.

The above is a vague sketch, and we prioritize research we think would be useful in less optimistic scenarios as well. Additionally, “short term” and “long term” here are relative, and different timeline forecasts can have very different policy implications. Still, the sketch may help clarify the directions we’d like to see the research community move in.

Organizational goals

MIRI's current objective is to formalize and solve the problems outlined in our technical agendas. To make this more concrete, one (very ambitious) benchmark I set for us last year was to develop a fully naturalized AIXI by 2020. Logical induction looks like a promising step in that direction, though there are still a number of outstanding questions about how far this formalism (and reflective oracles, optimal predictors, and other frameworks we've developed) can get us towards a general-purpose theory of ideal bounded reasoning.

We currently employ seven technical research staff (six research fellows and one assistant research fellow) and seven research contractors (five associates and two interns).⁵ Our budget this year is about $1.75M, up from $1.65M in 2015 and $950k in 2014.⁶ Our eventual goal (subject to revision) is to grow until we have 13–17 technical research staff, at which point our budget would likely be in the $3–4M range. If we reach that point successfully while maintaining a two-year runway, we’re likely to shift out of growth mode.

Our budget estimate for 2017 is roughly $2–2.2M, which means that we’re entering this fundraiser with about 14 months’ runway. We’re uncertain about how many donations we’ll receive between November and next September,⁷ but projecting from current trends, we expect about 4/5ths of our total donations to come from the fundraiser and 1/5th to come in off-fundraiser.⁸ Based on this, we have the following fundraiser goals:

Basic target – $750,000. We feel good about our ability to execute our growth plans at this funding level. We’ll be able to move forward comfortably, albeit with somewhat more caution than at the higher targets.

Growth target – $1,000,000. This would amount to about half a year’s runway. At this level, we can afford to make more uncertain but high-expected-value bets in our growth plans. There’s a risk that we’ll dip below a year’s runway in 2017 if we make more hires than expected, but the growing support of our donor base would make us feel comfortable about taking such risks.

Stretch target – $1,250,000. At this level, even if we exceed my growth expectations, we’d be able to grow without real risk of dipping below a year’s runway. Past $1.25M we would not expect additional donations to affect our 2017 plans much, assuming moderate off-fundraiser support.⁹

If we hit our growth and stretch targets, we’ll be able to execute several additional programs we’re considering with more confidence. These include contracting a larger pool of researchers to do early work with us on logical induction and on our machine learning agenda, and generally spending more time on academic outreach, field-growing, and training or trialing potential collaborators and hires.

As always, you’re invited to get in touch if you have questions about our upcoming plans and recent activities. I’m very much looking forward to seeing what new milestones the growing alignment research community will hit in the coming year, and I’m very grateful for the thoughtful engagement and support that’s helped us get to this point.¹⁰

Donate Now
or
Pledge to Give

Notes

¹ Note that we don't assume that "human-level" artificial intelligence implies artificial consciousness, artificial emotions, or other human-like characteristics. When it comes to artificial intelligence, the only assumption is that if a carbon brain can solve a practical problem, a silicon brain can too (Chalmers). ↩

² For other good overviews of the problem, see Yudkowsky's So Far: Unfriendly AI Edition and Open Phil's cause report. ↩

³ Ensuring that highly advanced AI systems are in the “alignable” region isn’t necessarily a more important or difficult task than getting from “alignable” to “sufficiently aligned.” However, I expect the wider research community to be better-suited to the latter task. ↩

⁴ This is partly because of the first bullet point, and partly because we expect reasoning and planning to be a key part of what makes highly capable systems highly capable. To make use of these capabilities (and do so safely), I expect we need a good model of how the system does its cognitive labor, and how this labor ties in to the intended objective.

I haven’t provided any argument for this view here; for people who don’t find it intuitive, I plan to write more on this topic in the near future. ↩

⁵ This excludes Katja Grace, who heads the AI Impacts project using a separate pool of funds earmarked for strategy/forecasting research. It also excludes me: I contribute to our technical research, but my primary role is administrative. ↩

⁶ We expect to be slightly under the $1.825M budget we previously projected for 2016, due to taking on fewer new researchers than planned this year. ↩

⁷ We’re imagining continuing to run one fundraiser per year in future years, possibly in September. ↩

⁸ Separately, the Open Philanthropy Project is likely to renew our $500,000 grant next year, and we expect to receive the final ($80,000) installment from the Future of Life Institute’s three-year grants.

For comparison, our revenue was about $1.6 million in 2015: $167k in grants, $960k in fundraiser contributions, and $467k in off-fundraiser (non-grant) contributions. Our situation in 2015 was somewhat different, however: we ran two 2015 fundraisers, whereas we’re skipping our winter fundraiser this year and advising December donors to pledge early or give off-fundraiser. ↩

⁹ At significantly higher funding levels, we’d consider running other useful programs, such as a prize fund. Shoot me an e-mail if you’d like to talk about the details. ↩

¹⁰ Thanks to Rob Bensinger for helping draft this post. ↩

18 Reactions

Mentioned in

18Ask MIRI Anything (AMA)

More posts like this

Comments16

Sorted by

New & upvoted

Click to highlight new comments since: Today at 7:25 PM

Gregory Lewis8y5

Is the logical induction result going to be published/presented at a conference?

I would also be interested in your view on current and anticipated performance in publishing/presenting work in the future, and on gaining esteem and citations from academia. I read Open Phil's review where it concerned these to be fairly damning (and, I regret to say, it is not the only aspect of their review which raises concerns).

RobBensinger8y11

It's being submitted to journals; the newsletter and blog should mention when its publication status changes.

We definitely plan to publish more results in top venues like UAI. The advantage of having some AI safety research occur at a nonprofit like MIRI, outside of academia and industry, is that it allows us to prioritize work that isn't very well-aligned with industry's profit incentives or academia's publish-or-perish incentives. From that perspective, reinventing a publish-or-perish incentive for ourselves from scratch obviously isn't a good idea; in that case we'd be better off based at a university. Fortunately, "Logical Induction" looks like a good example of work that's in the intersection of 'useful as object-level technical research' and 'useful for sparking academic collaborations.' Plausibly reflective oracles is in this category too.

Building good relationships with academics and popularizing AI safety work is clearly high-value in its own right, though it isn't our main focus. "Optimal Predictors" is a good example of the kind of work that strikes me as object-level-useful but not very useful for academic outreach and relationship-building, though it's possible we'll revise it for publication at some point.

I'd be happy to talk about the Open Phil review in more detail -- though their focus is primarily on "is the agent foundations agenda useful for solving the alignment problem?" and "has MIRI made good technical progress on the agent foundations agenda?" Open Phil opted to exclude from the review process the only publication-optimized / academic-outreach-optimized paper we sent them ("A Formal Solution to the Grain of Truth Problem"). See Nate's responses to the reviews here + his predictions in Appendix B.

Gregory Lewis7y4

Many thanks for the reply, Rob, and apologies for missing the AMA - although this discussion may work better in this thread anyway.

Respectfully, my reading of the Open Phil report suggests it is more broadly adverse than you suggest: in broad strokes, the worries are 1) That the research MIRI is undertaking probably isn't that helpful at improving AI risk; and 2) The research output MIRI has made along these lines is in any case unimpressive. I am sympathetic to both lines of criticism, but I am more worried by the latter than the former: AI risk is famously recondrite, thus diversity of approaches seems desirable.

Some elements of Open Phil's remarks on the latter concern seem harsh to me - in particular the remark that the suite of papers presented would be equivalent to 1-3 year's work from an unsupervised grad student is inapposite given selection, and especially given the heartening progress of papers being presented at UAI (although one of these is by Armstrong, who I gather is principally supported by FHI).

Yet others are frankly concerning. It is worrying that many of the papers produced by MIRI were considered unimpressive. It is even more worrying that despite the considerable efforts Open Phil made to review MIRI's efficacy - comissioning academics to review, having someone spend a hundred hours looking at them, etc. - they remain unconvinced of the quality of your work. That they emphasize fairly research-independent considerations in offering a limited grant (e.g. involvement in review process, germinating SPARC, hedging against uncertainty of approaches) is hardly a ringing endorsement; that they expressly benchmark MIRI's research quality as less than a higher end academic grantee likewise; comparison to other grants Open Phil have made in the AI space (e.g. 1.1M to FLI, 5.5M for a new center at UC Berkeley) even more so.

It has been remarked on this forum before MIRI is a challenging organisation to evaluate as the output (technical research in computer science) is opaque to most without a particular quantitative background. MIRI's predictions and responses to Open Phil implies a more extreme position: even domain experts are unlikely to appreciate the value of MIRI's work without a considerable back-and-forth with MIRI itself. I confess scepticism at this degree of inferential distance, particularly given the Open Phil staff involved in this report involved several people who previously worked with MIRI.

I accept MIRI may not be targetting conventional metrics of research success (e.g. academic publications). Yet across most proxy indicators (e.g. industry involvement, academic endorsement, collaboration) for MIRI 'doing good research', the evidence remains pretty thin on the ground - and, as covered above, direct assessment of research quality by domain experts is mixed at best. I look forward to the balance of evidence shifting favourably: the new conference papers are promising, ditto the buzz around logical induction (although I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed). Yet this hope, alongside the earnest assurances of MIRI that - if only experts gave them the time - they would be persuaded of their value, is not a promissory note that easily justifies an organisation with a turnover of $2M/year, nor fundraising for over a million dollars more.

Gregory Lewis7y7

I take this opportunity to note I have made an even-odds bet with Carl Shulman for $1000, donated to the charity of the winner's choice over whether Open Phil's next review of MIRI has a more favourable evaluation of their research.

Gregory Lewis6y11

I am wiser, albeit poorer: the bet resolved in Carl's favour. I will edit this comment with the donation destination he selects, with further lamentations from me in due course.

Gregory Lewis6y7

Carl has gotten back to me with where he would like to donate his gains, ill-gotten through picking on epistemic inferiors - akin to crocodiles in the Serengeti river picking off particularly frail or inept wildebeest on their crossing. The $1000 will go to MIRI.

With cognitive function mildly superior to the median geriatric wildebeest, I can take some solace that these circumstances imply this sum is better donated by him than I, and that MIRI is doing better on a crucial problem for the far future than I had supposed.

RyanCarey6y6

Why do people keep betting against Carl Shulman!

So8res7y4

Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.

Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has had time yet to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up more write-ups on this topic, since the hope is that the write-ups will also help third parties better evaluate our research methods on their merits.

I think Open Phil's assessment that the papers they reviewed were ‘technically unimpressive’ is mainly based on the papers "Asymptotic Convergence in Online Learning with Unbounded Delays" and (to a lesser extent) "Inductive Coherence." These are technically unimpressive, in the sense that they're pretty easy results to get once you're looking for them. (The proof in "Asymptotic Convergence..." was finished in less than a week.) From my perspective the impressive step is Scott Garrabrant (the papers’ primary author) getting from the epistemic state (1) ‘I notice AIXI fails in reflection tasks, and that this failure is deep and can't be easily patched’ to:

(2) ‘I notice that one candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting patterns in whether a given theorem can be used to (dis)prove other theorems.”’
(3) ‘I notice that another candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting empirical patterns in whether a claim looks similar to a set of claims that turned out to be theorems.”’
(4) ‘I notice that the two most obvious and straightforward ways to formalize these two abilities don't let you get the other ability for free; in fact, the obvious and straightforward algorithm for the first ability precludes possessing the second ability, and vice versa.’

In contrast, I think the reviewers were mostly assessing how difficult it would be to get from 2/3/4 to a formal demonstration that there’s at least one real (albeit impractical) algorithm that can actually exhibit ability 2, and one that can exhibit ability 3. This is a reasonable question to look at, since it's a lot harder to retrospectively assess how difficult it is to come up with a semiformal insight than how difficult it is to formalize the insight; but those two papers weren't really chosen for being technically challenging or counter-intuitive. They were chosen because they help illustrate two distinct easy/straightforward approaches to LU that turned out to be hard to reconcile, and also because (speaking with the benefit of hindsight) conceptually disentangling these two kinds of approaches turned out to be one of the key insights leading to "Logical Induction."

I confess scepticism at this degree of inferential distance, particularly given the Open Phil staff involved in this report involved several people who previously worked with MIRI.

I wasn't surprised that there's a big inferential gap for most of Open Phil's technical advisors -- we haven't talked much with Chris/Dario/Jacob about the reasoning behind our research agenda. I was surprised by how big the gap was for Daniel Dewey, Open Phil's AI risk program officer. Daniel's worked with us before and has a lot of background in alignment research at FHI, and we spent significant time trying to understand each other’s views, so this was a genuine update for me about how non-obvious our heuristics are to high-caliber researchers in the field, and about how much background researchers at MIRI and FHI have in common. This led to a lot of wasted time: I did a poor job addressing Daniel's questions until late in the review process.

I'm not sure what prior probability you should have assigned to ‘the case for MIRI's research agenda is too complex to be reliably communicated in the relevant timeframe.’ Evaluating how promising basic research is for affecting the long-run trajectory of the field of AI is inherently a lot more complicated than evaluating whether AI risk is a serious issue, for example. I don't have as much experience communicating the former, so the arguments are still rough. There are a couple of other reasons MIRI's research focus might have more inferential distance than the typical alignment research project:

(a) We've been thinking about these problems for over a decade, so we've had time to arrive at epistemic states that depend on longer chains of reasoning. Similarly, we've had time to explore and rule out various obvious paths (that turn out to be dead ends).
(b) Our focus is on topics we don't expect to jibe well with academia and industry, often because they look relatively intractable and unimportant from standard POVs.
(c) ‘High-quality nonstandard formal intuitions’ are what we do. This is what put us ahead of the curve on understanding the AI alignment problem, and the basic case for MIRI (from the perspective of people like Holden who see our early analysis and promotion of the alignment problem as our clearest accomplishment) is that our nonstandard formal intuitions may continue to churn out correct and useful insights about AI alignment when we zero in on subproblems. MIRI and FHI were unusual enough to come up with the idea of AI alignment research in the first place, so they're likely to come up with relatively unusual approaches within AI alignment.

Based on the above, I think the lack of mutual understanding is moderately surprising rather than extremely surprising. Regardless, it’s clear that we need to do a better job communicating how we think about choosing open problems to work on.

I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed)

I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.

is not a promissory note that easily justifies an organization with a turnover of $2M/year, nor fundraising for over a million dollars more.

I think this is a reasonable criticism, and I'm hoping our upcoming write-ups will help address this. If your main concern is that Open Phil doesn't think our work on logical uncertainty, reflection, and decision-theoretic counterfactuals is likely to be safety-relevant, keep in mind that Open Phil gave us $500k expecting this to raise our 2016 revenue from $1.6-2 million (the amount of 2016 revenue we projected absent Open Phil's support) to $2.1-2.5 million, in part to observe the ROI of the added $500k. We've received around $384k in our fundraiser so far (with four days to go), which is maybe 35-60% of what we'd expect based on past fundraiser performance. (E.g., we received $597k in our 2014 fundraisers and $955k in our 2015 ones.) Combined with our other non-Open-Phil funding sources, that means we've so far received around $1.02M in 2016 revenue outside Open Phil, which is solidly outside the $1.6-2M range we've been planning around.

There are a lot of reasons donors might be retracting; I’d be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.

(In all of the above, I’m speaking only for myself; Open Phil staff and advisors don’t necessarily agree with the above, and might frame things differently.)

Sean_o_h7y3

"Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months." A quick note to say that comments that have made their way back to me from relevant circles agree with this. Also, my own impression - from within academia, but outside decision theory and AI - is that the level of recognition of, and respect for, MIRI's work is steadily rising in academia, although inferential gaps like what nate describes certainly exist, plus more generic cultural gaps. I've heard positive comments about MIRI's work from academics I wouldn't have expected even to have heard of MIRI. And my impression, from popping by things like Cambridge's MIRIx discussion group, is that they're populated for the most part by capable people with standard academic backgrounds who have become involved based on the merits of the work rather than any existing connection to MIRI (although I imagine some are or were lesswrong readers).

Gregory Lewis7y2

Nate, my thanks for your reply. I regret I may not have expressed myself well enough for your reply to precisely target the worries I expressed; I also regret insofar as you reply overcomes my poor expression, it make my worries grow deeper.

If I read your approach to the Open Phil review correctly, you submitted some of the more technically unimpressive papers for review because they demonstrated the lead author developing some interesting ideas for research direction, and that they in some sense lead up to the 'big result' (Logical Induction). If so, this looks like a pretty surprising error: one of the standard worries facing MIRI given its fairly slender publication record is the technical quality of the work, and it seemed pretty clear that was the objective behind sending them out for evaluation. Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

In candour, I think 'MIRI barking up the wrong tree' and/or (worse) 'MIRI not doing that much good research)' is a much better explanation for what is going on than 'inferential distance'. I struggle to imagine a fairer (or more propitious-to-MIRI) hearing than the Open Phil review: it involved two people (Dewey and Christiano) who previously worked with you guys, Dewey spent over 100 hours trying to understand the value of your work, they comissioned external experts in the field to review your work.

Suggesting that the fairly adverse review that results may be a product of lack of understanding makes MIRI seem more like a mystical tradition than a research group. If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.

I had Aaronson down as within MIRI's sphere of influence, but if I overstate I apologize (I am correct in that Yuan previously worked for you, right?)

I look forward to seeing MIRI producing or germinating some concrete results in decision theory. The 'underwhelming blockbuster' I referred to above was the TDT/UDT etc. stuff, which MIRI widely hyped but has since then languised in obscurity.

There are a lot of reasons donors might be retracting; I’d be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.

It may simply be the usual (albeit regrettable) trait of donors jockeying to be 'last resort' - I guess it would depend what the usual distribution of donations are with respect to fundraising deadlines.

If donors are retracting, I would speculate Open Phil's report may be implicated. One potential model would be donors interpreting Open Phil's fairly critical support to be an argument against funding further growth by MIRI, thus pulling back so MIRIs overall revenue hovers at previous year levels (I don't read in the Open Phil a report a particular revenus target they wanted you guys to have). Perhaps a simpler explanation would be having a large and respected org do a fairly in depth review and give a fairly mixed review makes previously enthusiastic donors update to be more tepid, and perhaps direct their donations to other players in the AI space.

With respect, I doubt I will change my mind due to MIRI giving further write-ups, and if donors are pulling back in part 'due to' Open Phil, I doubt it will change their minds either. It may be that 'High quality non-standard formal insights' is what you guys do, but the value of that is pretty illegible on its own: it needs to be converted into tangible accomplishments (e.g. good papers, esteem from others in the field, interactions in industry) first to convince people there is actually something there, but also as this probably the plausible route to this comparative advantage having any impact.

Thus far this has not happened to a degree commensurate with MIRI's funding base. I wrote four-and-a-half years ago that I was disappointed in MIRI's lack of tangible accomplishments: I am even more disappointed that I find my remarks now follow fairly similar lines. Happily it can be fixed - if the logical induction result 'takes off' as I infer you guys hope it does, it will likely fix itself. Unless and until then, I remain sceptical about MIRI's value.

So8res7y10

Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I can maybe make the time expenditure thus far less mysterious if I mention a couple more ways I erred in trying to communicate my model of MIRI's research agenda:

My early discussion with Daniel was framed around questions like "What specific failure mode do you expect to be exhibited by advanced AI systems iff their programmers don't understand logical uncertainty?” I made the mistake of attempting to give straight/non-evasive answers to those sorts of questions and let the discussion focus on that evaluation criterion, rather than promptly saying “MIRI's research directions mostly aren't chosen to directly address a specific failure mode in a notional software system” and “I don't think that's a good heuristic for identifying research that's likely to be relevant to long-run AI safety.”

I fell prey to the transparency illusion pretty hard, and that was completely my fault. Mid-way through the process, Daniel made a write-up of what he had gathered so far; this write-up revealed a large number of miscommunications and places where I thought I had transmitted a concept of mine but Daniel had come away with a very different concept. It’s clear in retrospect that we should have spent a lot more time with me having Daniel try to explain what he thought I meant, and I had all the tools to predict this in foresight; but I foolishly assumed that wouldn’t be necessary in this case.

(I plan to blog more about the details of these later.)

(I am correct in that Yuan previously worked for you, right?)

Unless and until then, I remain sceptical about MIRI's value.

RobBensinger7y2

Relevant update: Daniel Dewey and Nick Beckstead of Open Phil have listed MIRI as one of ten "reasonably strong options in causes of interest" for individuals looking for places to donate this year.

amc8y1

we prioritize research we think would be useful in less optimistic scenarios as well.

I don't think I've seen anything from MIRI on this before. Can you describe or point me to some of this research?

Jessica_Taylor8y2

I agree with Nate that there isn’t much public on this yet. The AAMLS agenda is predicated on a relatively pessimistic scenario: perhaps we won’t have much time before AGI (and therefore not much time for alignment research), and the technology AI systems are based on won’t be much more principled than modern-day deep learning systems. I’m somewhat optimistic that it’s possible to achieve good outcomes in some pessimistic scenarios like this one.

Sursee7y1

Totally agreed with Jessica Taylor! she said it all very well!

So8res8y2

There’s nothing very public on this yet. Some of my writing over the coming months will bear on this topic, and some of the questions in Jessica’s agenda are more obviously applicable in “less optimistic” scenarios, but this is definitely a place where public output lags behind our private research.

As an aside, one of our main bottlenecks is technical writing capability: if you have technical writing skill and you’re interested in MIRI research, let us know.