Comment author: Daniel_Dewey 10 July 2017 07:10:12PM 3 points

Thanks Nate!

The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

This is particularly helpful to know.

We worry about "unknown unknowns", but I’d probably give them less emphasis here. We often focus on categories of failure modes that we think are easy to foresee. As a rule of thumb, when we prioritize a basic research problem, it’s because we expect it to help in a general way with understanding AGI systems and make it easier to address many different failure modes (both foreseen and unforeseen), rather than because of a one-to-one correspondence between particular basic research problems and particular failure modes.

Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.

As an example, the reason we work on logical uncertainty isn’t that we’re visualizing a concrete failure that we think is highly likely to occur if developers don't understand logical uncertainty. We work on this problem because any system reasoning in a realistic way about the physical world will need to reason under both logical and empirical uncertainty, and because we expect broadly understanding how the system is reasoning about the world to be important for ensuring that the optimization processes inside the system are aligned with the intended objectives of the operators.

I'm confused by this as a follow-up to the previous paragraph. This doesn't look like an example of "focusing on categories of failure modes that are easy to foresee," it looks like a case where you're explicitly not using concrete failure modes to decide what to work on.

“how do we ensure the system’s cognitive work is being directed at solving the right problems, and at solving them in the desired way?”

I feel like this fits with the "not about concrete failure modes" narrative that I believed before reading your comment, FWIW.

Comment author: So8res 10 July 2017 10:46:05PM *  5 points

Can you give an example or two of failure modes or "categories of failure modes that are easy to foresee" that you think are addressed by some HRAD topic? I'd thought previously that thinking in terms of failure modes wasn't a good way to understand HRAD research.

I want to steer clear of language that might make it sound like we’re saying:

  • X 'We can't make broad-strokes predictions about likely ways that AGI could go wrong.'

  • X 'To the extent we can make such predictions, they aren't important for informing research directions.'

  • X 'The best way to address AGI risk is just to try to advance our understanding of AGI in a general and fairly undirected way.'

The things I do want to communicate are:

  • All of MIRI's research decisions are heavily informed by a background view in which there are many important categories of predictable failure, e.g., 'the system is steering toward edges of the solution space', 'the function the system is optimizing correlates with the intended function at lower capability levels but comes uncorrelated at high capability levels' (a toy numerical sketch of this one follows the list), 'the system has incentives to obfuscate and mislead programmers to the extent it models its programmers’ beliefs and expects false programmer beliefs to result in it better-optimizing its objective function.’

  • The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'.

  • There usually isn't a simple relationship between a particular open problem and a particular failure mode, but if we thought there were no way to predict in advance any of the ways AGI systems can go wrong, or if we thought a very different set of failures were likely instead, we'd have different research priorities.
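
A minimal numerical sketch of the second category above (a toy model offered purely for illustration, not anything drawn from MIRI's research): the true value of a candidate is standard-normal, and the proxy the optimizer sees adds a small but heavy-tailed error term. Under weak selection pressure the proxy-chosen candidate tends to score well on the true objective; under strong selection pressure, the choice comes to be dominated by the proxy's rare large errors.

```c
/* A toy numerical sketch of the "proxy decouples under strong optimization"
 * category (illustration only; not drawn from MIRI's agenda): the true value
 * of a candidate is standard-normal, and the proxy adds a small but
 * heavy-tailed (Cauchy) error term. */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double uniform01(void) {
    /* strictly inside (0, 1) so log() and tan() below stay finite */
    return (rand() + 1.0) / ((double)RAND_MAX + 2.0);
}

static double sample_normal(void) {          /* Box-Muller transform */
    double u1 = uniform01(), u2 = uniform01();
    return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
}

static double sample_cauchy(double scale) {  /* heavy-tailed proxy error */
    return scale * tan(M_PI * (uniform01() - 0.5));
}

/* Among n candidates, select the one with the highest *proxy* score and
 * return its *true* value, averaged over `trials` independent runs. */
static double true_value_of_proxy_best(int n, int trials) {
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        double best_proxy = -1e300, best_true = 0.0;
        for (int i = 0; i < n; i++) {
            double v = sample_normal();            /* true value */
            double proxy = v + sample_cauchy(0.1); /* proxy = value + error */
            if (proxy > best_proxy) { best_proxy = proxy; best_true = v; }
        }
        total += best_true;
    }
    return total / trials;
}

int main(void) {
    int pressures[] = { 10, 100, 1000, 10000, 100000 };
    srand(0);
    printf("candidates considered -> avg true value of proxy-chosen candidate\n");
    for (int i = 0; i < 5; i++) {
        printf("%7d -> %.2f\n", pressures[i],
               true_value_of_proxy_best(pressures[i], 200));
    }
    /* Typical pattern: with a handful of candidates the chosen one scores
     * well on the true objective; with many candidates its true score
     * collapses toward the population mean of 0, because selection comes
     * to be dominated by the proxy's rare large errors. */
    return 0;
}
```

In runs of this toy, the true value of the proxy-chosen candidate stops improving and eventually collapses toward the population mean as the number of candidates considered (a crude stand-in for capability) grows, even though that candidate's proxy score keeps climbing.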

Comment author: So8res 08 July 2017 09:10:42PM *  20 points

Thanks for this solid summary of your views, Daniel. For others’ benefit: MIRI and Open Philanthropy Project staff are in ongoing discussion about various points in this document, among other topics. Hopefully some portion of those conversations will be made public at a later date. In the meantime, a few quick public responses to some of the points above:

2) If we fundamentally "don't know what we're doing" because we don't have a satisfying description of how an AI system should reason and make decisions, then we will probably make lots of mistakes in the design of an advanced AI system.

3) Even minor mistakes in an advanced AI system's design are likely to cause catastrophic misalignment.

I think this is a decent summary of why we prioritize HRAD research. I would rephrase 3 as "There are many intuitively small mistakes one can make early in the design process that cause resultant systems to be extremely difficult to align with operators’ intentions.” I’d compare these mistakes to the “small” decision in the early 1970s to use null-terminated instead of length-prefixed strings in the C programming language, which continues to be a major source of software vulnerabilities decades later.
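
For readers who want the analogy spelled out, here is a minimal sketch (illustrative only; the `PString` type and `pstr_copy` function are invented for this example) of why the null-terminated convention is so vulnerability-prone: a bare `char *` carries no record of its buffer's capacity, so the standard copy idiom cannot check for overflow, whereas a length-prefixed representation can refuse an oversized copy.

```c
/* A toy contrast between C's null-terminated strings and a length-prefixed
 * alternative (hypothetical illustration; the type and function names here
 * are made up for this sketch). */
#include <stdio.h>
#include <string.h>

/* Length-prefixed string: capacity and length travel with the data, so a
 * copy routine can refuse to overflow instead of corrupting memory. */
typedef struct {
    size_t cap;
    size_t len;
    char   data[16];
} PString;

static int pstr_copy(PString *dst, const char *src, size_t src_len) {
    if (src_len > dst->cap) {
        return -1;                     /* refuse: not enough room */
    }
    memcpy(dst->data, src, src_len);
    dst->len = src_len;
    return 0;
}

int main(void) {
    char dst[8];
    const char *src = "this source is longer than eight bytes";

    /* Null-terminated convention: strcpy has no way to know dst's capacity,
     * because a char* carries no length information. The call below is the
     * classic overflow; it is left commented out because executing it is
     * undefined behavior. */
    /* strcpy(dst, src); */
    (void)dst;

    /* Length-prefixed convention: the copy checks capacity and fails
     * cleanly. */
    PString p = { .cap = sizeof p.data, .len = 0, .data = { 0 } };
    if (pstr_copy(&p, src, strlen(src)) != 0) {
        printf("copy refused: %zu bytes won't fit in %zu\n",
               strlen(src), p.cap);
    }
    return 0;
}
```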

I’d also clarify that I expect any large software product to exhibit plenty of actually-trivial flaws, and that I don’t expect that AGI code needs to be literally bug-free or literally proven-safe in order to be worth running. Furthermore, if an AGI design has an actually-serious flaw, the likeliest consequence that I expect is not catastrophe; it’s just that the system doesn’t work. Another likely consequence is that the system is misaligned, but in an obvious way that makes it easy for developers to recognize that deployment is a very bad idea. The end goal is to prevent global catastrophes, but if a safety-conscious AGI team asked how we’d expect their project to fail, the two likeliest scenarios we’d point to are "your team runs into a capabilities roadblock and can't achieve AGI" or "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time."

This case does not revolve around any specific claims about specific potential failure modes, or their relationship to specific HRAD subproblems. This case revolves around the value of fundamental understanding for avoiding "unknown unknown" problems.

We worry about "unknown unknowns", but I’d probably give them less emphasis here. We often focus on categories of failure modes that we think are easy to foresee. As a rule of thumb, when we prioritize a basic research problem, it’s because we expect it to help in a general way with understanding AGI systems and make it easier to address many different failure modes (both foreseen and unforeseen), rather than because of a one-to-one correspondence between particular basic research problems and particular failure modes.

As an example, the reason we work on logical uncertainty isn’t that we’re visualizing a concrete failure that we think is highly likely to occur if developers don't understand logical uncertainty. We work on this problem because any system reasoning in a realistic way about the physical world will need to reason under both logical and empirical uncertainty, and because we expect broadly understanding how the system is reasoning about the world to be important for ensuring that the optimization processes inside the system are aligned with the intended objectives of the operators.
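
For anyone unfamiliar with the term, here is a toy illustration of the distinction between logical and empirical uncertainty (my sketch of the concept, not MIRI's formalism): whether a given number is prime is fully determined by logic, yet a reasoner with bounded computation can still assign the question a calibrated probability before doing the work, and update to certainty afterward.

```c
/* A toy contrast between empirical and logical uncertainty (my illustration
 * of the distinction, not MIRI's formalism): whether n is prime is fully
 * determined by logic, yet a bounded reasoner that has not yet done the
 * computation can still assign it a calibrated probability. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

static bool is_prime(unsigned long long n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (unsigned long long d = 3; d * d <= n; d += 2) {
        if (n % d == 0) return false;
    }
    return true;
}

int main(void) {
    unsigned long long n = 1000000007ULL;  /* an arbitrary large odd number */

    /* Before computing: by the prime number theorem, the density of primes
     * near n is about 1/ln(n); conditioning on n being odd roughly doubles
     * that. The answer is already fixed -- we just haven't done the work. */
    double prior = 2.0 / log((double)n);
    printf("prior P(n is prime) ~= %.3f\n", prior);

    /* After computing: the deterministic check resolves the question to
     * probability 0 or 1. */
    printf("resolved: n is %s\n", is_prime(n) ? "prime" : "composite");
    return 0;
}
```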

A big intuition behind prioritizing HRAD is that solutions to “how do we ensure the system’s cognitive work is being directed at solving the right problems, and at solving them in the desired way?” are likely to be particularly difficult to hack together from scratch late in development. An incomplete (empirical-side-only) understanding of what it means to optimize objectives in realistic environments seems like it will force designers to rely more on guesswork and trial-and-error in a lot of key design decisions.

I haven't found any instances of complete axiomatic descriptions of AI systems being used to mitigate problems in those systems (e.g. to predict, postdict, explain, or fix them) or to design those systems in a way that avoids problems they'd otherwise face.

This seems reasonable to me in general. I’d say that AIXI has had limited influence in part because it’s combining several different theoretical insights that the field was already using (e.g., complexity penalties and backtracking tree search), and the synthesis doesn’t add all that much once you know about the parts. Sections 3 and 4 of MIRI's Approach provide some clearer examples of what I have in mind by useful basic theory: Shannon, Turing, Bayes, etc.

My perspective on this is a combination of “basic theory is often necessary for knowing what the right formal tools to apply to a problem are, and for evaluating whether you're making progress toward a solution” and “the applicability of Bayes, Pearl, etc. to AI suggests that AI is the kind of problem that admits of basic theory.” An example of how this relates to HRAD is that I think that Bayesian justifications are useful in ML, and that a good formal model of rationality in the face of logical uncertainty is likely to be useful in analogous ways. When I speak of foundational understanding making it easy to design the right systems, I’m trying to point at things like the usefulness of Bayesian justifications in modern ML. (I’m unclear on whether we miscommunicated about what sort of thing I mean by “basic insights”, or whether we have a disagreement about how useful principled justifications are in modern practice when designing high-reliability systems.)
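
To make the "Bayesian justifications in modern ML" point concrete, here is the standard textbook example (offered as a gloss, not as anything specific to MIRI's work): under a linear-Gaussian model $\mathbf{y} = Xw + \epsilon$ with noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ and prior $w \sim \mathcal{N}(0, \tau^2 I)$, L2 regularization falls out as MAP estimation:

$$
\hat{w}_{\mathrm{MAP}}
= \arg\max_{w}\,\bigl[\log p(\mathbf{y} \mid X, w) + \log p(w)\bigr]
= \arg\min_{w}\,\lVert \mathbf{y} - Xw \rVert_2^2 + \frac{\sigma^2}{\tau^2}\,\lVert w \rVert_2^2 .
$$

The principled derivation tells you not just that the regularizer works but why and when; the hope is that a formal account of reasoning under logical uncertainty would eventually offer analogous leverage.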

Comment author: Gregory_Lewis 27 October 2016 07:20:09PM *  2 points

Nate, my thanks for your reply. I regret I may not have expressed myself well enough for your reply to precisely target the worries I expressed; I also regret that, insofar as your reply overcomes my poor expression, it makes my worries grow deeper.

If I read your approach to the Open Phil review correctly, you submitted some of the more technically unimpressive papers for review because they demonstrated the lead author developing some interesting ideas about research direction, and because they in some sense lead up to the 'big result' (Logical Induction). If so, this looks like a pretty surprising error: one of the standard worries facing MIRI, given its fairly slender publication record, is the technical quality of the work, and it seemed pretty clear that assessing this quality was the objective behind sending the papers out for evaluation. Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

In candour, I think 'MIRI barking up the wrong tree' and/or (worse) 'MIRI not doing that much good research' is a much better explanation for what is going on than 'inferential distance'. I struggle to imagine a fairer (or more propitious-to-MIRI) hearing than the Open Phil review: it involved two people (Dewey and Christiano) who previously worked with you guys; Dewey spent over 100 hours trying to understand the value of your work; and they commissioned external experts in the field to review your work.

Suggesting that the fairly adverse review that results may be a product of lack of understanding makes MIRI seem more like a mystical tradition than a research group. If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.

I had Aaronson down as within MIRI's sphere of influence, but if I overstate I apologize (I am correct in that Yuan previously worked for you, right?)

I look forward to seeing MIRI producing or germinating some concrete results in decision theory. The 'underwhelming blockbuster' I referred to above was the TDT/UDT etc. stuff, which MIRI widely hyped but which has since languished in obscurity.

There are a lot of reasons donors might be retracting; I’d be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.

It may simply be the usual (albeit regrettable) trait of donors jockeying to be the 'donor of last resort' - I guess it would depend on what the usual distribution of donations is with respect to fundraising deadlines.

If donors are retracting, I would speculate that Open Phil's report may be implicated. One potential model would be donors interpreting Open Phil's fairly critical support as an argument against funding further growth by MIRI, and thus pulling back so that MIRI's overall revenue hovers at previous years' levels (I don't read in the Open Phil report a particular revenue target they wanted you guys to have). Perhaps a simpler explanation is that having a large and respected org do a fairly in-depth review and give a fairly mixed verdict makes previously enthusiastic donors update to be more tepid, and perhaps direct their donations to other players in the AI space.

With respect, I doubt I will change my mind due to MIRI giving further write-ups, and if donors are pulling back in part 'due to' Open Phil, I doubt it will change their minds either. It may be that 'high-quality non-standard formal insights' are what you guys do, but the value of that is pretty illegible on its own: it needs to be converted into tangible accomplishments (e.g. good papers, esteem from others in the field, interactions in industry), first to convince people there is actually something there, but also because this is probably the most plausible route by which this comparative advantage can have any impact.

Thus far this has not happened to a degree commensurate with MIRI's funding base. I wrote four-and-a-half years ago that I was disappointed in MIRI's lack of tangible accomplishments: I am even more disappointed that I find my remarks now follow fairly similar lines. Happily it can be fixed - if the logical induction result 'takes off' as I infer you guys hope it does, it will likely fix itself. Unless and until then, I remain sceptical about MIRI's value.

Comment author: So8res 29 October 2016 07:00:23PM *  10 points

Under whatever constraints Open Phil provided, I'd have sent the 'best by academic lights' papers I had.

We originally sent Nick Beckstead what we considered our four most important 2015 results, at his request; these were (1) the incompatibility of the "Inductive Coherence" framework and the "Asymptotic Convergence in Online Learning with Unbounded Delays" framework; (2) the demonstration in "Proof-Producing Reflection for HOL" that a non-pathological form of self-referential reasoning is possible in a certain class of theorem-provers; (3) the reflective oracles result presented in "A Formal Solution to the Grain of Truth Problem," "Reflective Variants of Solomonoff Induction and AIXI," and "Reflective Oracles"; and (4) Vadim Kosoy's "Optimal Predictors" work. The papers we listed under 1, 2, and 4 then got used in an external review process they probably weren't very well-suited for.

I think this was more or less just an honest miscommunication. I told Nick in advance that I only assigned an 8% probability to external reviewers thinking the “Asymptotic Convergence…” result was "good" on its own (and only a 20% probability for "Inductive Coherence"). My impression of what happened is that Open Phil staff interpreted my pushback as saying that I thought the external reviews wouldn’t carry much Bayesian evidence (but that the internal reviews still would), where what I was trying to communicate was that I thought the papers didn’t carry very much Bayesian evidence about our technical output (and that I thought the internal reviewers would need to speak to us about technical specifics in order to understand why we thought they were important). Thus, we were surprised when their grant decision and write-up put significant weight on the internal reviews of those papers (and they were surprised that we were surprised). This is obviously really unfortunate, and another good sign that I should have committed more time and care to clearly communicating my thinking from the outset.

Regarding picking better papers for external review: We only put out 10 papers directly related to our technical agendas between Jan 2015 and Mar 2016, so the option space is pretty limited, especially given the multiple constraints Open Phil wanted to meet. Optimizing for technical impressiveness and non-obviousness as a stand-alone result, I might have instead gone with Critch's bounded Löb paper and the grain of truth problem paper over the AC/IC results. We did submit the grain of truth problem paper to Open Phil, but they decided not to review it because it didn't meet other criteria they were interested in.

If MIRI is unable to convince someone like Dewey, the prospects of it making the necessary collaborations or partnerships with the wider AI community look grim.

I’m less pessimistic about building collaborations and partnerships, in part because we’re already on pretty good terms with other folks in the community, and in part because I think we have different models of how technical ideas spread. Regardless, I expect that with more and better communication, we can (upon re-evaluation) raise the probability Open Phil staff assign to the proposition that the work we’re doing is important.

More generally, though, I expect this task to get easier over time as we get better at communicating about our research. There's already a body of AI alignment research (and, perhaps, methodology) that requires the equivalent of multiple university courses to understand, but there aren't curricula or textbooks for teaching it. If we can convince a small pool of researchers to care about the research problems we think are important, this will let us bootstrap to the point where we have more resources for communicating information that requires a lot of background and sustained scholarship, as well as more of the institutional signals that this stuff warrants a time investment.

I can maybe make the time expenditure thus far less mysterious if I mention a couple more ways I erred in trying to communicate my model of MIRI's research agenda:

  1. My early discussion with Daniel was framed around questions like "What specific failure mode do you expect to be exhibited by advanced AI systems iff their programmers don't understand logical uncertainty?” I made the mistake of attempting to give straight/non-evasive answers to those sorts of questions and let the discussion focus on that evaluation criterion, rather than promptly saying “MIRI's research directions mostly aren't chosen to directly address a specific failure mode in a notional software system” and “I don't think that's a good heuristic for identifying research that's likely to be relevant to long-run AI safety.”

  2. I fell prey to the transparency illusion pretty hard, and that was completely my fault. Mid-way through the process, Daniel made a write-up of what he had gathered so far; this write-up revealed a large number of miscommunications and places where I thought I had transmitted a concept of mine but Daniel had come away with a very different concept. It’s clear in retrospect that we should have spent a lot more time with me having Daniel try to explain what he thought I meant, and I had all the tools to predict this in foresight; but I foolishly assumed that wouldn’t be necessary in this case.

(I plan to blog more about the details of these later.)

I think these are important mistakes that show I hadn't sufficiently clarified several concepts in my own head, or spent enough time understanding Daniel's position. My hope is that I can do a much better job of avoiding these sorts of failures in the next round of discussion, now that I have a better model of where Open Phil’s staff and advisors are coming from and what the review process looks like.

(I am correct in that Yuan previously worked for you, right?)

Yeah, though that was before my time. He did an unpaid internship with us in the summer of 2013, and we’ve occasionally contracted him to tutor MIRI staff. Qiaochu's also a lot socially closer to MIRI; he attended three of our early research workshops.

Unless and until then, I remain sceptical about MIRI's value.

I think that's a reasonable stance to take, and that there are other possible reasonable stances here too. Some of the variables I expect EAs to vary on include “level of starting confidence in MIRI's mathematical intuitions about complicated formal questions” and “general risk tolerance.” A relatively risk-intolerant donor is right to wait until we have clearer demonstrations of success; and a relatively risk-tolerant donor who starts without a very high confidence in MIRI's intuitions about formal systems might be pushed under a donation threshold by learning that an important disagreement has opened up between us and Daniel Dewey (or between us and other people at Open Phil).

Also, thanks for laying out your thinking in so much detail -- I suspect there are other people who had more or less the same reaction to Open Phil's grant write-up but haven't spoken up about it. I'd be happy to talk more about this over email, too, including answering Qs from anyone else who wants more of my thoughts on this.

Comment author: Gregory_Lewis 22 October 2016 06:58:07PM *  4 points

Many thanks for the reply, Rob, and apologies for missing the AMA - although this discussion may work better in this thread anyway.

Respectfully, my reading of the Open Phil report suggests it is more broadly adverse than you suggest: in broad strokes, the worries are 1) that the research MIRI is undertaking probably isn't that helpful for reducing AI risk; and 2) that the research output MIRI has produced along these lines is in any case unimpressive. I am sympathetic to both lines of criticism, but I am more worried by the latter than the former: AI risk is famously recondite, thus diversity of approaches seems desirable.

Some elements of Open Phil's remarks on the latter concern seem harsh to me - in particular, the remark that the suite of papers presented would be equivalent to 1-3 years' work from an unsupervised grad student seems inapposite given how the papers were selected, and especially given the heartening progress of papers being presented at UAI (although one of these is by Armstrong, who I gather is principally supported by FHI).

Yet others are frankly concerning. It is worrying that many of the papers produced by MIRI were considered unimpressive. It is even more worrying that, despite the considerable efforts Open Phil made to review MIRI's efficacy - commissioning academics to review the papers, having someone spend a hundred hours looking at your work, etc. - they remain unconvinced of its quality. That they emphasize fairly research-independent considerations in offering a limited grant (e.g. involvement in the review process, germinating SPARC, hedging against uncertainty of approaches) is hardly a ringing endorsement; that they expressly benchmark MIRI's research quality as less than that of a higher-end academic grantee likewise; comparison to other grants Open Phil have made in the AI space (e.g. $1.1M to FLI, $5.5M for a new center at UC Berkeley) even more so.

It has been remarked on this forum before that MIRI is a challenging organisation to evaluate, as its output (technical research in computer science) is opaque to most people without a particular quantitative background. MIRI's predictions and responses to Open Phil imply a more extreme position: even domain experts are unlikely to appreciate the value of MIRI's work without a considerable back-and-forth with MIRI itself. I confess scepticism at this degree of inferential distance, particularly given that the Open Phil staff involved in this report included several people who previously worked with MIRI.

I accept MIRI may not be targeting conventional metrics of research success (e.g. academic publications). Yet across most proxy indicators (e.g. industry involvement, academic endorsement, collaboration) for MIRI 'doing good research', the evidence remains pretty thin on the ground - and, as covered above, direct assessment of research quality by domain experts is mixed at best. I look forward to the balance of evidence shifting favourably: the new conference papers are promising, ditto the buzz around logical induction (although I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed). Yet this hope, alongside the earnest assurances of MIRI that - if only experts gave them the time - they would be persuaded of their value, is not a promissory note that easily justifies an organisation with a turnover of $2M/year, nor fundraising for over a million dollars more.

Comment author: So8res 27 October 2016 04:26:22PM *  4 points

Thanks for the response, Gregory. I was hoping to see more questions along these lines in the AMA, so I'm glad you followed up.

Open Phil's grant write-up is definitely quite critical, and not an endorsement. One of Open Phil's main criticisms of MIRI is that they don't think our agent foundations agenda is likely to be useful for AI alignment; but their reasoning behind this is complicated, and neither Open Phil nor MIRI has had time yet to write up our thoughts in any detail. I suggest pinging me to say more about this once MIRI and Open Phil have put up more write-ups on this topic, since the hope is that the write-ups will also help third parties better evaluate our research methods on their merits.

I think Open Phil's assessment that the papers they reviewed were ‘technically unimpressive’ is mainly based on the papers "Asymptotic Convergence in Online Learning with Unbounded Delays" and (to a lesser extent) "Inductive Coherence." These are technically unimpressive, in the sense that they're pretty easy results to get once you're looking for them. (The proof in "Asymptotic Convergence..." was finished in less than a week.) From my perspective the impressive step is Scott Garrabrant (the papers’ primary author) getting from the epistemic state (1) ‘I notice AIXI fails in reflection tasks, and that this failure is deep and can't be easily patched’ to:

  • (2) ‘I notice that one candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting patterns in whether a given theorem can be used to (dis)prove other theorems.”’
  • (3) ‘I notice that another candidate for “the ability AIXI is missing that would fix these deep defects” is “learning mathematical theorems while respecting empirical patterns in whether a claim looks similar to a set of claims that turned out to be theorems.”’
  • (4) ‘I notice that the two most obvious and straightforward ways to formalize these two abilities don't let you get the other ability for free; in fact, the obvious and straightforward algorithm for the first ability precludes possessing the second ability, and vice versa.’

In contrast, I think the reviewers were mostly assessing how difficult it would be to get from 2/3/4 to a formal demonstration that there’s at least one real (albeit impractical) algorithm that can actually exhibit ability 2, and one that can exhibit ability 3. This is a reasonable question to look at, since it's a lot harder to retrospectively assess how difficult it is to come up with a semiformal insight than how difficult it is to formalize the insight; but those two papers weren't really chosen for being technically challenging or counter-intuitive. They were chosen because they help illustrate two distinct easy/straightforward approaches to LU that turned out to be hard to reconcile, and also because (speaking with the benefit of hindsight) conceptually disentangling these two kinds of approaches turned out to be one of the key insights leading to "Logical Induction."

I confess scepticism at this degree of inferential distance, particularly given that the Open Phil staff involved in this report included several people who previously worked with MIRI.

I wasn't surprised that there's a big inferential gap for most of Open Phil's technical advisors -- we haven't talked much with Chris/Dario/Jacob about the reasoning behind our research agenda. I was surprised by how big the gap was for Daniel Dewey, Open Phil's AI risk program officer. Daniel's worked with us before and has a lot of background in alignment research at FHI, and we spent significant time trying to understand each other’s views, so this was a genuine update for me about how non-obvious our heuristics are to high-caliber researchers in the field, and about how much background researchers at MIRI and FHI have in common. This led to a lot of wasted time: I did a poor job addressing Daniel's questions until late in the review process.

I'm not sure what prior probability you should have assigned to ‘the case for MIRI's research agenda is too complex to be reliably communicated in the relevant timeframe.’ Evaluating how promising basic research is for affecting the long-run trajectory of the field of AI is inherently a lot more complicated than evaluating whether AI risk is a serious issue, for example. I don't have as much experience communicating the former, so the arguments are still rough. There are a couple of other reasons MIRI's research focus might have more inferential distance than the typical alignment research project:

  • (a) We've been thinking about these problems for over a decade, so we've had time to arrive at epistemic states that depend on longer chains of reasoning. Similarly, we've had time to explore and rule out various obvious paths (that turn out to be dead ends).
  • (b) Our focus is on topics we don't expect to jibe well with academia and industry, often because they look relatively intractable and unimportant from standard POVs.
  • (c) ‘High-quality nonstandard formal intuitions’ are what we do. This is what put us ahead of the curve on understanding the AI alignment problem, and the basic case for MIRI (from the perspective of people like Holden who see our early analysis and promotion of the alignment problem as our clearest accomplishment) is that our nonstandard formal intuitions may continue to churn out correct and useful insights about AI alignment when we zero in on subproblems. MIRI and FHI were unusual enough to come up with the idea of AI alignment research in the first place, so they're likely to come up with relatively unusual approaches within AI alignment.

Based on the above, I think the lack of mutual understanding is moderately surprising rather than extremely surprising. Regardless, it’s clear that we need to do a better job communicating how we think about choosing open problems to work on.

I note the blogging is by people already in MIRI's sphere of influence/former staff, and MIRI's previous 'blockbuster result' in decision theory has thus far underwhelmed

I don't think we've ever worked with Scott Aaronson, though we're obviously on good terms with him. Also, our approach to decision theory stirred up a lot of interest from professional decision theorists at last year's Cambridge conference; expect more about this in the next few months.

is not a promissory note that easily justifies an organization with a turnover of $2M/year, nor fundraising for over a million dollars more.

I think this is a reasonable criticism, and I'm hoping our upcoming write-ups will help address this. If your main concern is that Open Phil doesn't think our work on logical uncertainty, reflection, and decision-theoretic counterfactuals is likely to be safety-relevant, keep in mind that Open Phil gave us $500k expecting this to raise our 2016 revenue from $1.6-2 million (the amount of 2016 revenue we projected absent Open Phil's support) to $2.1-2.5 million, in part to observe the ROI of the added $500k. We've received around $384k in our fundraiser so far (with four days to go), which is maybe 35-60% of what we'd expect based on past fundraiser performance. (E.g., we received $597k in our 2014 fundraisers and $955k in our 2015 ones.) Combined with our other non-Open-Phil funding sources, that means we've so far received around $1.02M in 2016 revenue outside Open Phil, which is solidly outside the $1.6-2M range we've been planning around.

There are a lot of reasons donors might be retracting; I’d be concerned if the reason is that they're expecting Open Phil to handle MIRI's funding on their own, or that they're interpreting some action of Open Phil's as a signal that Open Phil wants broadly Open-Phil-aligned donors to scale back support for MIRI.

(In all of the above, I’m speaking only for myself; Open Phil staff and advisors don’t necessarily agree with the above, and might frame things differently.)

Comment author: John_Maxwell_IV 13 October 2016 01:21:15AM *  2 points

I sometimes see influential senior staff at MIRI make statements on social media that pertain to controversial moral questions. These statements are not accompanied by disclaimers that they are speaking on behalf of themselves and not their employer. Is it safe to assume that these statements represent the de facto position of the organization?

This seems relevant to your organizational mission since MIRI's goal is essentially to make AI moral, but a donor's notion of what's moral might not correspond with MIRI's position. Forcefully worded statements on controversial moral questions could also broadcast willingness to engage in brinkmanship re: a future AI arms race, if different teams in the arms race were staffed by people who fell on different sides of the question.

Comment author: So8res 13 October 2016 05:43:03PM *  9 points

Posts or comments on personal Twitter accounts, Facebook walls, etc. should not be assumed to represent any official or consensus MIRI position, unless noted otherwise. I'll echo Rob's comment here that "a good safety approach should be robust to the fact that the designers don’t have all the answers". If an AI project hinges on the research team being completely free from epistemic shortcomings and moral failings, then the project is doomed (and should change how it's doing alignment research).

I suspect we're on the same page about it being important to err in the direction of system designs that don't encourage arms races or other zero-sum conflicts between parties with different object-level beliefs or preferences. See also the CEV discussion above.

Comment author: Marylen 12 October 2016 05:27:08AM *  5 points

I believe that the best and biggest system of morality so far is the legal system. It is an enormous database where the fairest of men have built on the wisdom of their predecessors to strike a balance between fairness and avoiding chaos, and where bad or obsolete judgements are weeded out. It is a system of prioritisation of laws which could be encoded one day. I believe that it would be a great tool for addressing corrigibility and value learning. I'm a lawyer, and I'm afraid that MIRI may not understand all the potential of the legal system.

Could you tell me why the legal system would not be a great tool for addressing corrigibility and value learning in the near future?

I describe in a little more detail how I think it could be useful at: https://docs.google.com/document/d/1eRirDom-EA_CtLD9Q5T9hWLD6xEKJ80AL3h_u7K-ErA/edit?usp=sharing

Comment author: So8res 13 October 2016 12:03:00AM 10 points

In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I agree is useful).

In colloquial terms, MIRI is more focused on questions like “if we had a big corpus of information about human values, how could we design a system to learn from that corpus how to act as intended”, and less focused on the lack of corpus.

The reason that we have to work on corrigibility ourselves is that we need advanced learning systems to be corrigible before they’ve finished learning how to behave correctly from a large training corpus. In other words, there are lots of different training corpuses and goal systems where, if the system is fully trained and working correctly, we get corrigibility for free; the difficult part is getting the system to behave corrigibly before it’s smart enough to be doing corrigibility for the “right reasons”.

Comment author: Benito 12 October 2016 09:40:01AM 5 points

You often mention that MIRI is trying to not be a university department, so you can spend researcher time more strategically and not have the incentive structures of a university. Could you describe the main differences in what your researchers spend their time doing?

Also, I think I've heard the above used as an explanation of why MIRI's work often doesn't fit into standard journal articles at a regular rate. If you do think this, in what way does the research not fit? Are there no journals for it, or are you perhaps more readily throwing less-useful-but-interesting ideas away (or something else)?

Comment author: So8res 12 October 2016 11:52:03PM 3 points

Thanks, Benito. With regards to the second half of this question, I suspect that either you’ve misunderstood some of the arguments I’ve made about why our work doesn’t tend to fit into standard academic journals and conferences, or (alternatively) someone has given arguments for why our work doesn’t tend to fit into standard academic venues that I personally disagree with. My view is that our work doesn’t tend to fit into standard journals etc. because (a) we deliberately focus on research that we think academia and industry are unlikely to work on for one reason or another, and (b) we approach problems from a very different angle than the research communities that are closest to those problems.

One example of (b) is that we often approach decision theory not by following the standard philosophical approach of thinking about what decision sounds intuitively reasonable in the first person, but instead by asking “how could a deterministic robot actually be programmed to reliably solve these problems”, which doesn’t fit super well into the surrounding literature on causal vs. evidential decision theory. For a few other examples, see my response to (8) in my comments on the Open Philanthropy Project’s internal and external reviews of some recent MIRI papers.
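
To make the "how could a deterministic robot actually be programmed" framing concrete, here is a deliberately simplified Newcomb-style sketch (a toy of my own construction, not a representation of MIRI's decision-theory results): the predictor predicts by simply running the agent's decision procedure, so the agent's source code fixes both its action and the contents of the boxes.

```c
/* A deliberately simplified Newcomb-style setup, written as a program rather
 * than a first-person thought experiment (a toy sketch, not a representation
 * of MIRI's decision theory): the predictor predicts by running the agent's
 * own decision procedure. */
#include <stdio.h>

typedef enum { ONE_BOX, TWO_BOX } Choice;
typedef Choice (*Agent)(void);

static Choice one_boxer(void) { return ONE_BOX; }
static Choice two_boxer(void) { return TWO_BOX; }

/* The predictor has access to the agent's code and simply simulates it. */
static Choice predict(Agent agent) { return agent(); }

static int payoff(Agent agent) {
    /* The opaque box holds $1,000,000 iff the predictor expects one-boxing. */
    int opaque = (predict(agent) == ONE_BOX) ? 1000000 : 0;
    Choice action = agent();
    /* Two-boxing additionally takes the transparent $1,000. */
    return (action == TWO_BOX) ? opaque + 1000 : opaque;
}

int main(void) {
    printf("one-boxing program earns $%d\n", payoff(one_boxer));
    printf("two-boxing program earns $%d\n", payoff(two_boxer));
    return 0;
}
```

In this toy setup the one-boxing program ends up with $1,000,000 and the two-boxing program with $1,000; once everything is written down as code, the question becomes "which program is it better to be?", which is easier to pose precisely than the first-person version.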

Comment author: Dr_Manhattan 12 October 2016 08:59:46PM 0 points

I’d guess that humanity as a whole has a fairly low probability of success, with wide error bars.

Just out of curiosity, how would your estimate update if you had enough resources to do anything you deemed necessary, but not enough to affect the current trajectory of the field?

Comment author: So8res 12 October 2016 11:40:44PM 0 points

I'm not sure I understand the hypothetical -- most of the actions that I deem necessary are aimed at affecting the trajectory of the AI field in one way or another.

Comment author: ZachWeems 12 October 2016 01:57:22PM 5 points

It seems like people in academia tend to avoid mentioning MIRI. Has this changed in magnitude during the past few years, and do you expect it to change any more? Do you think there is a significant number of public intellectuals who believe in MIRI's cause in private while avoiding mention of it in public?

Comment author: So8res 12 October 2016 11:38:07PM 4 points

I think this has been changing in recent years, yes. A number of AI researchers (some of them quite prominent) have told me that they have largely agreed with AI safety concerns for some time, but have felt uncomfortable expressing those concerns until very recently. I do think that the tides are changing here, with the Concrete Problems in AI Safety paper (by Amodei, Olah, et al) perhaps marking the inflection point. I think that the 2015 FLI conference also helped quite a bit.

Comment author: turchin 12 October 2016 10:51:42PM 0 points

Thanks! Could you link to where you will write about this subject later?

Comment author: So8res 12 October 2016 11:34:08PM 2 points

I'm not exactly sure what venue it will show up in, but it will very likely be mentioned on the MIRI blog (or perhaps just posted there directly). intelligence.org/blog.
