Comment author: kierangreig 12 October 2016 04:29:43PM *  8 points [-]

1) What are the main points of disagreement MIRI has with Open Phil's technical advisors about the importance of Agent Foundations research for reducing risks from AI?

2) Is Sam Harris co-authoring a book with Eliezer on AI Safety? If yes, please provide further details.

3) How many hours do full time MIRI staff work in a usual working week?

4) What’s the biggest mistake MIRI made in the past year?

Comment author: So8res 12 October 2016 10:46:07PM 5 points [-]

Re: 1, "what are the main points of disagreement?" is itself currently one of the points of disagreement :) A lot of our disagreements (I think) come down to diverging inchoate mathematical intuitions, which makes it hard to precisely state why we think different problems are worth prioritizing (or to resolve the disagreements).

Also, I think that different Open Phil technical advisors have different disagreements with us. As an example, Paul Christiano and I seem to have an important disagreement about how difficult it will be to align AI systems if we don’t have a correct theoretically principled understanding of how the system performs its abstract reasoning. But while the disagreement seems to me and Paul to be one of the central reasons the two of us prioritize different projects, I think some other Open Phil advisors don’t see this as a core reason to accept/reject MIRI’s research directions.

Discussions are still ongoing, but Open Phil and MIRI are both pretty time-constrained organizations, so it may take a while for us to publish details on where and why we disagree. My own attempts to gesture at possible points of divergence have been very preliminary so far, and represent my perspective rather than any kind of MIRI / Open Phil consensus summary.

Re: 4, I think we probably spent too much time this year writing up results and research proposals. The ML agenda and “Logical Induction,” for example, were both important to get right, but in retrospect I think we could have gotten away with writing less, and writing it faster. Another candidate mistake is a set of communication errors I made when I was trying to explain the reasoning behind MIRI’s research agenda to Open Phil. I currently attribute the problem to my overestimating how many concepts we shared and falling prey to the illusion of transparency, in a way that burned a lot of time (though I’m not entirely confident in this analysis).

Comment author: turchin 12 October 2016 10:47:39AM 1 point [-]

One thing has always puzzled me about provable AI. Even if we were able to prove that an AI will do X and only X after unlimitedly many generations of self-improvement, it's still not clear how to choose the right X.

For example, we could be sure that a paperclip maximizer will still make paperclips after a billion generations.

So my question is: what are we proving about provable AI?

Comment author: So8res 12 October 2016 08:56:02PM 6 points [-]

As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to make an AI that is “provably X” for some value of X. I personally doubt AGI teams will be able to literally prove anything substantial about how well the system will work in practice, though I expect that they will be able to get some decent statistical guarantees.
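(To illustrate what a “statistical guarantee”, as opposed to a proof of correctness, typically looks like, here is a generic textbook example rather than anything drawn from the comment above: if some property of a system is checked on $n$ independently sampled inputs and is observed to fail on a fraction $\hat{\mu}$ of them, Hoeffding’s inequality implies that with probability at least $1 - \delta$ the true failure rate $\mu$ satisfies $|\hat{\mu} - \mu| \le \sqrt{\ln(2/\delta)/(2n)}$. That is a probabilistic bound on error rates, not a proof that the system never fails.)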

There are some big difficulties related to the problem of choosing the right objective to optimize, but currently, that’s not where my biggest concerns are. I’m much more concerned with scenarios where AI scientists figure out how to build misaligned AGI systems well before they figure out how to build aligned AGI systems, as that would be a dangerous regime. My top priority is making it the case that the first AGI designs humanity develops are the kinds of system it’s technologically possible to align with operator intentions in practice. (I’ll write more on this subject later.)

Comment author: amc 12 October 2016 05:36:20AM *  1 point [-]

"we prioritize research we think would be useful in less optimistic scenarios as well."

I don't think I've seen anything from MIRI on this before. Can you describe or point me to some of this research?

Comment author: So8res 12 October 2016 08:52:06PM 2 points [-]

There’s nothing very public on this yet. Some of my writing over the coming months will bear on this topic, and some of the questions in Jessica’s agenda are more obviously applicable in “less optimistic” scenarios, but this is definitely a place where public output lags behind our private research.

As an aside, one of our main bottlenecks is technical writing capability: if you have technical writing skill and you’re interested in MIRI research, let us know.

Comment author: Peter_Hurford  (EA Profile) 11 October 2016 09:39:07PM 11 points [-]

Two years ago, I asked why MIRI thought they had a "medium" probability of success and got a lot of good discussion. But MIRI's strategy has since changed dramatically. Any updates now on how MIRI defines success, what MIRI thinks their probability of success is, and why MIRI thinks that?

Comment author: So8res 12 October 2016 08:25:16PM 10 points [-]

I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress on any of these problems within 5 years. Scott Garrabrant then promptly developed logical induction, which represents serious progress on two (maybe three) of the big open problems. I consider this to be a good sign of progress, and that set of research priorities remains largely unchanged.

Jessica Taylor is now leading a new research program, and we're splitting our research time between this agenda and our 2014 agenda. I see this as a natural consequence of us bringing on new researchers with their own perspectives on various alignment problems, rather than as a shift in organizational strategy. Eliezer, Benya, and I drafted the agent foundations agenda when we were MIRI’s only full-time researchers; Jessica, Patrick, and Critch co-wrote a new agenda with their take once they were added to the team. The new agenda reflects a number of small changes: some updates that we’ve all made in response to evidence over the last couple of years, some writing-up of problems that we’d been thinking about for some time but which hadn’t made the cut into the previous agenda, and some legitimate differences in intuition and perspective brought to the table by Jessica, Patrick, and Critch. The overall strategy is still “do research that we think others won’t do,” and the research methods and intuitions we rely on continue to have a MIRI-ish character.

Regarding success probability, I think MIRI has a decent chance of success compared to other potential AI risk interventions, but AI risk is a hard problem. I’d guess that humanity as a whole has a fairly low probability of success, with wide error bars.

Unless I’m missing context, I think the “medium probability of success” language comes from old discussions on LessWrong about how to respond to Pascal’s mugging. (See Rob’s note about Pascalian reasoning here.) In that context, I think the main dichotomy Eliezer had in mind was between “tiny” probabilities (that can be practically ignored, like the odds of winning the Powerball) and strategically relevant probabilities like 1% or 10%. See Eliezer’s post here. I’m fine with calling the latter probabilities “medium-sized” when the contrast is with lottery-style odds, and calling them “small” in other contexts. With respect to ensuring that the first AGI designs developed by AI scientists are easy to align, I don’t think MIRI’s odds are stellar, though I do feel comfortable saying that they’re higher than 1%. Let me know if I’ve misunderstood the question you had in mind here.

Comment author: kbog  (EA Profile) 11 October 2016 09:26:55PM 5 points [-]

What does the internal drafting and review process look like at MIRI? Do people separate from the authors of a paper check all the proofs, math, citations, etc.?

Comment author: So8res 12 October 2016 08:17:09PM 4 points [-]

Yep, we often have a number of non-MIRI folks checking the proofs, math, and citations. I’m still personally fairly involved in the writing process (because I write fast, and because I do what I can to free up the researchers’ time to do other work); this is something I’m working to reduce. Technical writing talent is one of our key bottlenecks; if you like technical writing and are interested in MIRI’s research, get in touch.

Comment author: InquilineKea 11 October 2016 09:13:56PM 11 points [-]

What makes for an ideal MIRI researcher? How would that differ from being an ideal person who works for DeepMind, or who does research as an academic? Do MIRI employees have special knowledge of the world that most AI researchers (e.g. Hinton, Schmidhuber) don't have? What about the other way around? Is it possible for a MIRI researcher to produce relevant work even if they don't fully understand all approaches to AI?

How does MIRI aim to cover all possible AI systems (those based on symbolic AI, connectionist AI, deep learning, and other paradigms)?

Comment author: So8res 12 October 2016 07:19:15PM 8 points [-]

I largely endorse Jessica’s comment. I’ll add that I think the ideal MIRI researcher has their own set of big-picture views about what’s required to design aligned AI systems, and that their vision holds up well under scrutiny. (I have a number of heuristics for what makes me more or less excited about a given roadmap.)

That is, the ideal researcher isn’t just working on whatever problems catch their eye or look interesting; they’re working toward a solution of the whole alignment problem, and that vision regularly affects their research priorities.

Comment author: Peter_Hurford  (EA Profile) 12 October 2016 01:22:56AM *  10 points [-]

What kind of things, if true, would convince you that MIRI was not worth donating to? What would make you give up on MIRI?

Comment author: So8res 12 October 2016 06:47:48PM 5 points [-]

I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:

  1. I could be convinced that I was wrong about the type and quality of AI alignment research that the external community is able to do. There’s some inferential distance here, so I'm not expecting to explain my model in full, but in brief, I currently expect that there are a few types of important research that academia and industry won’t do by default. If I was convinced that either (a) there are no such gaps or (b) they will be filled by academia and industry as a matter of course, then I would downgrade my assessment of the importance of MIRI accordingly.
  2. I could learn that our research path was doomed, for one reason or another, and simultaneously learn that repurposing our skill/experience/etc. for other purposes was not worth the opportunity cost of all our time and effort.

Comment author: poppingtonic 11 October 2016 09:04:24PM *  7 points [-]

Quoting Nate's supplement from Open Phil's review of "Proof-producing reflection for HOL" (PPRHOL):

there are basic gaps in our models of what it means to do good reasoning (especially when it comes to things like long-running computations, and doubly so when those computations are the reasoner’s source code)

How far along the way are you towards narrowing these gaps, now that "Logical Induction" is a thing people can talk about? Are there variants of it that narrow these gaps, or are there planned follow-ups to PPRHOL that might improve our models? What kinds of experiments seem valuable for this subgoal?

Comment author: So8res 12 October 2016 05:45:34PM 10 points [-]

I endorse Tsvi's comment above. I'll add that it’s hard to say how close we are to closing basic gaps in our understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we're taking several different approaches to the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement, in HOL, a cellular automaton containing a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)
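For concreteness, here is a minimal toy sketch of the general shape of that experiment, written in Python rather than HOL and with every name invented for illustration; it is not MIRI’s implementation. The “world” is a tiny elementary cellular automaton, and the “agent” is simply handed the world’s source code as data, standing in for a reasoner that would prove theorems about that code and about itself.

```python
# Toy sketch (illustration only; not MIRI's HOL experiment).
# The "world" is an elementary cellular automaton; the "agent" receives the
# world's own source code as data, standing in for a reflective reasoner
# that would prove theorems about that code (and about itself).
import inspect

RULE = 110  # elementary cellular automaton rule number (arbitrary choice here)

def step(cells):
    """One synchronous update of the world, with wrap-around edges."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighborhood = (left << 2) | (center << 1) | right
        out.append((RULE >> neighborhood) & 1)
    return out

def agent_claim(world_source):
    """Stand-in 'reasoner': checks a trivial syntactic property of the world's
    source code. A real reflective reasoner would instead derive theorems
    about the world's behavior from this source."""
    return "RULE" in world_source and "def step" in world_source

world_source = inspect.getsource(step)
cells = [0, 0, 0, 1, 0, 0, 0, 1]
for _ in range(4):
    cells = step(cells)
print("world state after 4 steps:", cells)
print("agent's (trivial) claim about the world's code holds:", agent_claim(world_source))
```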

Comment author: poppingtonic 11 October 2016 08:39:51PM 7 points [-]

Thanks for doing this AMA! Which of the points in your strategy have you seen a need to update on, based on the unexpected progress of having published the "Logical Induction" paper (which I'm currently perusing)?

Comment author: So8res 12 October 2016 05:26:12PM 9 points [-]

Good question. The main effect is that it has increased my confidence that MIRI's mathematical intuitions, vague as they are, are good, and that our methodology for approaching big, vague problems actually works. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success I was tracking; and as I said in February, failure to achieve results of this caliber in a 5-year timeframe would have caused me to lose confidence in our approach. (As of last year, that seemed like a real possibility.) The logical induction result increases my confidence in our current course, but it doesn't shift it much.

Another reason logical induction doesn’t affect my strategy too much is that it isn’t that big a result. It’s one step on a path, and it’s definitely mathematically exciting, and it gives answers to a bunch of longstanding philosophical problems, but it’s not a tool for aligning AI systems on the object level. We’re building towards a better understanding of “good reasoning”, and we expect this to be valuable for AI alignment, and logical induction is a step in that direction, but it's only one step. It’s not terribly useful in isolation, and so it doesn’t call for much change in course.
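For readers who want the formal core of the result, the criterion at the center of the paper can be stated roughly as follows (this is a paraphrase, not a quotation from the paper): a market $\overline{\mathbb{P}} = (\mathbb{P}_1, \mathbb{P}_2, \dots)$ satisfies the logical induction criterion relative to a deductive process $\overline{D}$ if and only if there is no efficiently computable trader that exploits $\overline{\mathbb{P}}$ relative to $\overline{D}$, where a trader "exploits" the market if the set of values of its holdings over time is bounded below but unbounded above. Informally: no polynomial-time trading strategy can make unbounded profits off the market's prices while risking only a bounded loss.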
