Effective Altruism Forum


There are some big difficulties related to the problem of choosing the right objective to optimize, but currently, that’s not where my biggest concerns are. I’m much more concerned with scenarios where AI scientists figure out how to build misaligned AGI systems well before they figure out how to build aligned AGI systems, as that would be a dangerous regime. My top priority is making it the case that the first AGI designs humanity develops are the kinds of system it’s technologically possible to align with operator intentions in practice. (I’ll write more on this subject later.)

turchin8y0

Thanks! Could you link to where you will write about this subject later?


Ask MIRI Anything (AMA)

by RobBensinger

Oct 11 2016 · 1 min read · 77 comments


Ask Me Anything, Machine Intelligence Research Institute

Frontpage

Hi, all! The Machine Intelligence Research Institute (MIRI) is answering questions here tomorrow, October 12 at 10am PDT. You can post questions below in the interim.

MIRI is a Berkeley-based research nonprofit that does basic research on key technical questions related to smarter-than-human artificial intelligence systems. Our research is largely aimed at developing a deeper and more formal understanding of such systems and their safety requirements, so that the research community is better-positioned to design systems that can be aligned with our interests. See here for more background.

Through the end of October, we're running our 2016 fundraiser — our most ambitious funding drive to date. Part of the goal of this AMA is to address questions about our future plans and funding gap, but we're also hoping to get very general questions about AI risk, very specialized questions about our technical work, and everything in between. Some of the biggest news at MIRI since Nate's AMA here last year:

We developed a new framework for thinking about deductively limited reasoning, logical induction.
Half of our research team started work on a new machine learning research agenda, distinct from our agent foundations agenda.
We received a review and a $500k grant from the Open Philanthropy Project.

Likely participants in the AMA include:

Nate Soares, Executive Director and primary author of the AF research agenda
Malo Bourgon, Chief Operating Officer
Rob Bensinger, Research Communications Manager
Jessica Taylor, Research Fellow and primary author of the ML research agenda
Tsvi Benson-Tilsen, Research Associate

Nate, Jessica, and Tsvi are also three of the co-authors of the "Logical Induction" paper.

EDIT (10:04am PDT): We're here! Answers on the way!

EDIT (10:55pm PDT): Thanks for all the great questions! That's all for now, though we'll post a few more answers tomorrow to things we didn't get to. If you'd like to support our AI safety work, our fundraiser will be continuing through the end of October.


Comments (77)


InquilineKea8y12

What makes for an ideal MIRI researcher? How would that differ from being an ideal person who works for DeepMind, or who does research as an academic? Do MIRI employees have special knowledge of the world that most AI researchers (e.g. Hinton, Schmidhuber) don't have? What about the other way around? Is it possible for a MIRI researcher to produce relevant work even if they don't fully understand all approaches to AI?

How does MIRI aim to cover all possible AI systems (those based on symbolic AI, connectionist AI, deep learning, and other AI systems/paradigms)?

Jessica_Taylor8y10

The ideal MIRI researcher is someone who’s able to think about thorny philosophical problems and break off parts of them to formalize mathematically. In the case of logical uncertainty, researchers started by thinking about the initially vague problem of reasoning well about uncertain mathematical statements, turned some of these thoughts into formal desiderata and algorithms (producing intermediate possibility and impossibility results), and eventually found a way to satisfy many of these desiderata at once. We’d like to do a lot more of this kind of work in the future.

Probably the main difference between MIRI research and typical AI research is that we focus on problems of the form “if we had capability X, how would we achieve outcome Y?” rather than “how can we build a practical system achieving outcome Y?”. We focus less on computational tractability and more on the philosophical question of how we would build a system to achieve Y in principle, given e.g. unlimited computing resources or access to extremely powerful machine learning systems. I don’t think we have much special knowledge that others don’t have (or vice versa), given that most relevant AI research is public; ... (read more)

So8res

I largely endorse Jessica’s comment. I’ll add that I think the ideal MIRI researcher has their own set of big-picture views about what’s required to design aligned AI systems, and that their vision holds up well under scrutiny. (I have a number of heuristics for what makes me more or less excited about a given roadmap.) That is, the ideal researcher isn’t just working on whatever problems catch their eye or look interesting; they’re working toward a solution of the whole alignment problem, and that vision regularly affects their research priorities.

Peter Wildeford8y10

Two years ago, I asked why MIRI thought they had a "medium" probability of success and got a lot of good discussion. But now MIRI strategy has changed dramatically. Any updates now on how MIRI defines success, what MIRI thinks their probability of success is, and why MIRI thinks that?

So8res8y10

I don’t think of our strategy as having changed much in the last year. For example, in the last AMA I said that the plan was to work on some big open problems (I named 5 here: asymptotically good reasoning under logical uncertainty, identifying the best available decision with respect to a predictive world-model and utility function, performing induction from inside an environment, identifying the referents of goals in realistic world-models, and reasoning about the behavior of smarter reasoners), and that I’d be thrilled if we could make serious progress on any of these problems within 5 years. Scott Garrabrant then promptly developed logical induction, which represents serious progress on two (maybe three) of the big open problems. I consider this to be a good sign of progress, and that set of research priorities remains largely unchanged.

Jessica Taylor is now leading a new research program, and we're splitting our research time between this agenda and our 2014 agenda. I see this as a natural consequence of us bringing on new researchers with their own perspectives on various alignment problems, rather than as a shift in organizational strategy. Eliezer, Benya, and I drafted the... (read more)

Dr_Manhattan

Just out of curiosity, how would your estimate update if you had enough resources to do anything you deemed necessary, but not enough to affect the current trajectory of the field?

So8res

I'm not sure I understand the hypothetical -- most of the actions that I deem necessary are aimed at affecting the trajectory of the AI field in one way or another.

Dr_Manhattan

Ok, that's informative. So the dominant factor is not the ability to get to the finish line faster (which kind of makes sense)

Josh Jacobson8y9

What would be your next few hires, if resources allow?

Malo

When it comes to growth, at the moment our focus is on expanding the research team. As such, our next few hires are likely to be research fellows, and assistant research fellows[1] for both our agent foundations and machine learning technical agendas.

We have two new research fellows who are signed on to join the team, Abram Demski and Mihály Bárász. Abram and Mihály will both be more focused on the AF agenda, so I’m hoping our next couple hires after them will be on the ML side. We’re prioritizing people who can write well and quickly; if you or someone you know is interested and has that sort of skill, you’re encouraged to get in touch with Alex Vermeer.

As mentioned in MIRI Update and Fundraising Case, which Nate posted here a few days ago, in the medium term our current plan is to grow our research team to 13–17 people. Since we already have a pretty solid ops foundation, I don’t anticipate that we’ll need to increase our ops capacity very much to support a research team of that size, so unless our strategy changes significantly, I expect most of our upcoming hires will be researchers.

[1] At MIRI, research fellow is a full-time permanent position. A decent analogy in academia might be that research fellows are to assistant research fellows as full-time faculty are to post-docs. Assistant research fellowships are intended to be a more junior position with a fixed 1–2 year term.

Peter Wildeford8y9

What kind of things, if true, would convince you that MIRI was not worth donating to? What would make you give up on MIRI?

TsviBT

In my current view, MIRI’s main contributions are (1) producing research on highly-capable aligned AI that won’t be produced by default by academia or industry; (2) helping steer academia and industry towards working on aligned AI; and (3) producing strategic knowledge of how to reduce existential risk from highly-capable AI. I think (1) and (3) are MIRI’s current strong suits. This is not easy to verify without technical background and domain knowledge, but at least for my own thinking I’m impressed enough with these points to find MIRI very worthwhile to work with. If (1) were not strong, and (2) were no stronger than currently, I would trust (3) somewhat less, and I would give up on MIRI. If (1) became difficult or impossible because (2) was done, i.e. if academia and/or industry were already doing all the important safety research, I’d see MIRI as much less crucial, unless there was a pivot to remaining neglected tasks in reducing existential risk from AI. If (2) looked too difficult (though there is already significant success, in part due to MIRI, FHI, and FLI), and (1) were not proceeding fast enough, and my “time until game-changing AI” estimates were small enough, then I’d probably do something different.

Girish_Sastry

By (3), do you mean the publications that are listed under "forecasting" on MIRI's publications page?

So8res

I’ll interpret this question as “what are the most plausible ways for you to lose confidence in MIRI’s effectiveness and/or leave MIRI?” Here are a few ways that could happen for me:

1. I could be convinced that I was wrong about the type and quality of AI alignment research that the external community is able to do. There’s some inferential distance here, so I'm not expecting to explain my model in full, but in brief, I currently expect that there are a few types of important research that academia and industry won’t do by default. If I was convinced that either (a) there are no such gaps or (b) they will be filled by academia and industry as a matter of course, then I would downgrade my assessment of the importance of MIRI accordingly.

2. I could learn that our research path was doomed, for one reason or another, and simultaneously learn that repurposing our skill/experience/etc. for other purposes was not worth the opportunity cost of all our time and effort.

Peter Wildeford8y9

Would you rather prove the friendliness of 100 duck-sized horse AIs or one horse-sized duck AI?

TsviBT8y15

One horse-sized duck AI. For one thing, the duck is the ultimate (route) optimization process: you can ride it on land, sea, or air. For another, capabilities scale very nonlinearly in size; the neigh of even 1000 duck-sized horse AIs does not compare to the quack of a single horse-sized duck AI. Most importantly, if you can safely do something with 100 opposite-sized AIs, you can safely do the same thing with one opposite-sized AI.

In all seriousness though, we don't generally think in terms of "proving the friendliness" of an AI system. When doing research, we might prove that certain proposals have flaws (for example, see (1)) as a way of eliminating bad ideas in the pursuit of good ideas. And given a realistic system, one could likely prove certain high-level statistical features (such as “this component of the system has an error rate that vanishes under thus-and-such assumptions”), though it’s not yet clear how useful those proofs would be. Overall, though, the main challenges in friendly AI seem to be ones of design rather than verification. In other words, the problem is to figure out what properties an aligned system should possess, rather than to figure out how ... (read more)

Peter Wildeford8y9

How should a layman with only college-level mathematical knowledge evaluate the work that MIRI does?

RobBensinger

You can browse our papers and research summaries here and see if anything clicks, but failing that, I’m not sure there’s any simple heuristic I can suggest beyond “look for lots of separate lines of indirect evidence.”

One question is whether we’re working on the right problems for addressing AI risk. Relevant indicators that come to mind include:

* Stuart Russell’s alignment research group is interested in value learning and “theories of (bounded) rationality,” as well as corrigibility (1, 2).
* A number of our research proposals were cited in FLI’s research priorities document, and our agent foundations agenda received one of the larger FLI grants.
* FHI and DeepMind have collaborated on corrigibility work.
* The Open Philanthropy Project’s research advisors don’t think logical uncertainty, decision theory, or Vingean reflection are likely to be safety-relevant.
* The “Concrete Problems in AI Safety” agenda has some overlap with our research interests and goals (e.g., avoiding wireheading).

A separate question is whether we’re making reasonable progress on those problems, given that they’re the right problems. Relevant indicators that come to mind:

* An OpenPhil external reviewer described our HOL-in-HOL result as “an important milestone toward formal analysis of systems with some level of self-understanding.”
* OpenPhil’s internal and external reviewers considered a set of preliminary MIRI results leading to logical induction unimpressive.
* Our reflective oracles framework was presented at a top AI conference, UAI.
* Scott Aaronson thinks “Logical Induction” is important and theoretically interesting.
* We haven’t had any significant public endorsements of our work on decision theory by leading decision theorists.
* … and so on.

If you don’t trust MIRI or yourself to assess the situation, I don’t think there’s any shortcut besides trying to gather and weigh miscellaneous pieces of evidence. (Possibly the conclusion will be that some parts of MIRI’

kbog

"College level math" can mean a whole lot of things...

Peter Wildeford

Maybe interpret it as someone who would understand calculus and linear algebra, and who would know what a proof is, but not someone who would be able to read a MIRI paper and understand the technical details behind it.

kierangreig8y8

1) What are the main points of disagreement MIRI has with Open Phil's technical advisors about the importance of Agent Foundations research for reducing risks from AI?

2) Is Sam Harris co-authoring a book with Eliezer on AI Safety? If yes, please provide further details.

3) How many hours do full time MIRI staff work in a usual working week?

4) What’s the biggest mistake MIRI made in the past year?

So8res

Re: 1, "what are the main points of disagreement?" is itself currently one of the points of disagreement :) A lot of our disagreements (I think) come down to diverging inchoate mathematical intuitions, which makes it hard to precisely state why we think different problems are worth prioritizing (or to resolve the disagreements).

Also, I think that different Open Phil technical advisors have different disagreements with us. As an example, Paul Christiano and I seem to have an important disagreement about how difficult it will be to align AI systems if we don’t have a correct theoretically principled understanding of how the system performs its abstract reasoning. But while the disagreement seems to me and Paul to be one of the central reasons the two of us prioritize different projects, I think some other Open Phil advisors don’t see this as a core reason to accept/reject MIRI’s research directions.

Discussions are still ongoing, but Open Phil and MIRI are both pretty time-constrained organizations, so it may take a while for us to publish details on where and why we disagree. My own attempts to gesture at possible points of divergence have been very preliminary so far, and represent my perspective rather than any kind of MIRI / Open Phil consensus summary.

Re: 4, I think we probably spent too much time this year writing up results and research proposals. The ML agenda and “Logical Induction,” for example, were both important to get right, but in retrospect I think we could have gotten away with writing less, and writing it faster.

Another candidate mistake is some communication errors I made when I was trying to explain the reasoning behind MIRI’s research agenda to Open Phil. I currently attribute the problem to me overestimating how many concepts we shared, and falling prey to the illusion of transparency, in a way that burned a lot of time (though I’m not entirely confident in this analysis).

Malo

Re 2, Sam and Eliezer have been corresponding for a while now. They’ve been exploring the possibility of pursuing a couple of different projects together, including co-authoring a book or recording a dialogue of some sort and publishing it online. Sam discussed this briefly on an episode of his podcast. We’ll mention in the newsletter if things get more finalized.

Re 3, it varies a lot month-to-month and person-to-person. Looking at the data, the average and median are pretty close at somewhere between 40–50 hours a week depending on the month. During crunch times some people might be working 60–100-hour weeks.

I’ll also mention that although people at MIRI roughly track how many hours they spend working, and on what, I don’t put much weight on these numbers (especially for researchers). If a researcher comes up with a new idea in the shower, at the gym, on their walk to work, or whatever, I don’t expect them to log those hours as work time. (Fun fact: Scott came up with logical induction on his walk to work.) Many of us are thinking about work when we aren’t at our desks, so to speak. It’s also hard to compare someone who spends 80 hours working on a problem they love and find really exciting, to someone who spends 40 hours on really grueling tasks. I prefer to focus on how much people are getting done and how they are feeling.

Re 4, for me personally, I think my biggest mistake this year was not delegating enough after transitioning into the COO role. This caused a few ops projects to be blocked on me unnecessarily, which set a few ops projects back a few months. (For example, I finished our 2015-in-review document significantly later than I would have liked.)

John_Maxwell

Isaac Asimov wrote an essay on creativity, here's one of the interesting points:

Peter Wildeford

Relevant to 1: https://agentfoundations.org/item?id=1129

Ben Pace8y8

You say that MIRI is attempting to do research that is, on the margin, less likely to be prioritised by the existing AI community. Why, then, are you moving towards work in Machine Learning?

Jessica_Taylor

I think that the ML-related topics we spend the most effort on (such as those in the ML agenda) are currently quite neglected. See my other comment for more on how our research approach is different from that of most AI researchers. It’s still plausible that some of the ML-related topics we research would be researched anyway (perhaps significantly later). This is a legitimate consideration that is, in my view, outweighed by other considerations (such as the fact that less total safety research will be done if AGI comes soon, making such timelines more neglected; the fact that ML systems are easy to think about due to their concreteness; and the fact that it can be beneficial to “seed” the field with high-quality research that others can build on in the future). Additionally I think that AI alignment researchers should avoid ignoring huge theoretically-relevant parts of the problem. I would have quite a lot of difficulty thinking about AI alignment without thinking about how one might train learning systems to do good things using feedback. One of my goals with the ML agenda is to build theoretical tools that make it possible to think about the rest of the problem more clearly.

jimrandomh8y8

In 2013, MIRI announced it was shifting to do less outreach and more research. How has that shift worked out, and what's the current balance between these two priorities?

RobBensinger

The "more research" part has gone well: we added Benya and Nate in 2014, and Patrick, Jessica, Andrew, and Scott in 2015. We’re hoping to double the size of the research team over the next year or two. MIRI’s Research and All Publications pages track a lot of our output since then, and we’ve been pretty excited about recent developments there.

For “less outreach,” the absolute amount of outreach work we're doing is probably increasing at the moment, though it's shrinking as a proportion of our total activities as the research team grows. (Eyeballing it, right now I think we spend something like 6 hours on research per hour on outreach.) The character of our outreach is also quite different: more time spent dialoguing with AI groups and laying groundwork for research collaborations, rather than just trying to spread safety-relevant memes to various intellectuals and futurists.

The last two years have seen a big spike of interest in AI risk, and there's a lot more need for academic outreach now that it's easier to get people interested in these problems. On the other hand, there's also a lot more supply; researchers at OpenAI, Google, UC Berkeley, Oxford, and elsewhere who are interested in safety work often have a comparative advantage over us at reaching out to skeptics or researchers who are new to these topics. So the balance today is probably similar to what Luke and others at MIRI had in mind on a several-year timescale in 2013, though there was a period in 2014/2015 where we had more uncertainty about whether other groups would pop up to help meet the increased need for outreach.

[anonymous]8y7

A lot of the discourse around AI safety uses terms like "human-friendly" or "human interests". Does MIRI's conception of friendly AI take the interests of non-human sentient beings into consideration as well? Especially troubling to me is Yudkowsky's view on animal consciousness, but I'm not sure how representative his views are of MIRI in general.

(I realize that MIRI's research focuses mainly on alignment theory, not target selection, but I am still concerned about this issue.)

RobBensinger

“Human interests” is an unfortunate word choice; Nate talked about this last year too, and we’ve tried to avoid phrasings like that. Unfortunately, most ways of gesturing at the idea of global welfare aren’t very clear or widely understood, or they sound weird, or they borrow arguably speciesist language (“humane,” "humanitarian," “philanthropy”...). I’m pretty sure everyone at MIRI thinks we should value all sentient life (and extremely sure at least in the case of Eliezer, Nate, and myself), including sentient non-human animals and any sentient machines we someday develop. Eliezer thinks, as an empirical hypothesis, that relatively few animal species have subjective experience. Other people at MIRI, myself included, think a larger number of animal species have subjective experience. There's no "consensus MIRI view" on this point, but I think it's important to separate the empirical question from the strictly moral one, and I'm confident that if we learn more about what "subjective experience" is and how it's implemented in brains, then people at MIRI will update. It's also important to keep in mind that a good safety approach should be robust to the fact that the designers don’t have all the answers, and that humanity as a whole hasn’t fully developed scientifically (or morally).

AlexMennen

I am not a MIRI employee, and this comment should not be interpreted as a response from MIRI, but I wanted to throw my two cents in about this topic. I think that creating a friendly AI to specifically advance human values would actually turn out okay for animals. Such a human-friendly AI should optimize for everything humans care about, not just the quality of humans' subjective experience. Many humans care a significant amount about the welfare of non-human animals. A human-friendly AI would thus care about animal welfare by proxy through the values of humans. As far as I am aware, there is not a significant number of humans who specifically want animals to suffer. It is extremely common for humans to want things (like food with the taste and texture of bacon) that can currently be produced most efficiently at significant expense to non-human animals. However, it seems unlikely that a friendly AI would not be able to find an efficient way of producing bacon that does not involve actual pigs.

-2[anonymous]8y

AlexMennen

If many people intrinsically value the proliferation of natural Darwinian ecosystems, and the fact that animals in such ecosystems suffer significantly would not change their mind, then that could happen. If it's just that many people think it would be better for there to be more such ecosystems because they falsely believe that wild animals experience little suffering, and would prefer otherwise if their empirical beliefs were correct, then a human-friendly AI should not bring many such ecosystems into existence.

Squark

So you claim that you have values related to animals that most people don't have, and you want your eccentric values to be overrepresented in the AI? I'm asking unironically (personally I also care about wild animal suffering, but I also suspect that most people would care about it if they spent sufficient time thinking about it and looking at the evidence).

poppingtonic8y7

Quoting Nate's supplement from OpenPhil's review of "Proof-producing reflection for HOL" (PPRHOL):

there are basic gaps in our models of what it means to do good reasoning (especially when it comes to things like long-running computations, and doubly so when those computations are the reasoner’s source code)

How far along the way are you towards narrowing these gaps, now that "Logical Induction" is a thing people can talk about? Are there variants of it that narrow these gaps, or are there planned follow-ups to PPRHOL that might improve our models? What kinds of experiments seem valuable for this subgoal?

So8res8y10

I endorse Tsvi's comment above. I'll add that it’s hard to say how close we are to closing basic gaps in understanding of things like “good reasoning”, because mathematical insight is notoriously difficult to predict. All I can say is that logical induction does seem like progress to me, and we're taking various different approaches on the remaining problems. Also, yeah, one of those avenues is a follow-up to PPRHOL. (One experiment we’re running now is an attempt to implement a cellular automaton in HOL that implements a reflective reasoner with access to the source code of the world, where the reasoner uses HOL to reason about the world and itself. The idea is to see whether we can get the whole stack to work simultaneously, and to smoke out all the implementation difficulties that arise in practice when you try to use a language like HOL for reasoning about HOL.)

TsviBT8y10

Scott Garrabrant’s logical induction framework feels to me like a large step forward. It provides a model of “good reasoning” about logical facts using bounded computational resources, and that model is already producing preliminary insights into decision theory. In particular, we can now write down models of agents that use logical inductors to model the world---and in some cases these agents learn to have sane beliefs about their own actions, other agents’ actions, and how those actions affect the world. This, despite the usual obstacles to self-modeling.

Further, the self-trust result from the paper can be interpreted to say that a logical inductor believes something like “If my future self is confident in the proposition A, then A is probably true”. This seems like one of the insights that the PPRHOL work was aiming at, namely, writing down a computable reasoning system that asserts a formal reflection principle of itself. Such a reflection principle must be weaker than full logical soundness; a system that proved “If my future self proves A, then A is true” would be inconsistent. But as it turns out, the reflection principle is feasible if you replace “proves” with “assigns hi... (read more)
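The inconsistency claim in the comment above matches the standard statement of Löb's theorem. A minimal sketch follows, under assumptions that are not in the original comment: the "future self" is taken to reason in the same theory $T$, $T$ is taken to extend Peano Arithmetic, and $\Box A$ is shorthand for "$T$ proves $A$".

```latex
% Loeb's theorem, for a theory T extending Peano Arithmetic
% with standard provability predicate \Box:
\text{if } T \vdash \Box A \rightarrow A, \text{ then } T \vdash A.

% Apply it to A = \bot. A system asserting the full reflection
% schema for itself would prove \Box\bot \rightarrow \bot, hence:
T \vdash \Box\bot \rightarrow \bot \;\Longrightarrow\; T \vdash \bot.
```

Under these assumptions, a theory that proved "if I prove $A$, then $A$" for every $A$ would prove a contradiction, which is why the reflection principle described above has to be weaker than full soundness.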

Ben Pace8y6

Would you like to state any crisp predictions for how your Logical Uncertainty paper will be received, and/or the impact it will have?

Jessica_Taylor8y11

I’ll start by stating that, while I have some intuitions about how the paper will be received, I don’t have much experience making crisp forecasts, and so I might be miscalibrated. That said:

In my experience it’s pretty common for ML researchers who are more interested in theory and general intelligence to find Solomonoff induction and AIXI to be useful theories. I think “Logical Induction” will be generally well-received among such people. Let’s say 70% chance that at least 40% of ML researchers who think AIXI is a useful theory, and who spend a couple hours thinking about “Logical Induction” (reading the paper / talking to people about it), will think that “Logical Induction” is at least 1/3 as interesting/useful as AIXI. I think ML researchers who don’t find Solomonoff induction relevant to their interests probably won’t find “Logical Induction” compelling either. This forecast is based on my personal experience of really liking Solomonoff induction and AIXI (long before knowing about MIRI) but finding theoretical gaps in them, many of which “Logical Induction” resolves nicely, and from various conversations with ML researchers who like Solomonoff induction and AIXI.
I hav

... (read more)

Oribis8y6

Everyone knows who to look out for in the creation of AI, but who should we be paying attention to for solving the control problem? I know of Eliezer, Stuart Russell and the team mentioned above, but is there anyone else you would recommend following?

Malo

Over the past couple of years I’ve been excited to see the growth of the community of researchers working on technical problems related to AI alignment. Here is a quick and non-exhaustive list of people (and associated organizations) that I’m following (besides MIRI research staff and associates) in no particular order:

* Stuart Russell and the new Center for Human-Compatible AI.
* FHI’s growing technical AI safety team, which includes:
  * Stuart Armstrong, who is also a research associate at MIRI and co-author of Safely Interruptible Agents (with Laurent Orseau of DeepMind);
  * Eric Drexler;
  * Owain Evans; and
  * Jan Leike, who recently collaborated with MIRI on the paper A formal solution to the grain of truth problem.
* The authors of the Concrete Problems in AI Safety paper:
  * Dario Amodei, who is now at OpenAI, and Chris Olah and Dan Mané at Google Brain;
  * Jacob Steinhardt at Stanford;
  * Paul Christiano, who is a long-time MIRI collaborator currently at OpenAI (see also his writing at medium.com/ai-control); and
  * John Schulman, also at OpenAI.
* DeepMind’s AI safety team, led by Laurent Orseau.
* Various other individual academics; for a sampling see speakers at our Colloquium Series on Robust and Beneficial AI and grant recipients from the Future of Life Institute.

turchin8y6

If you find and prove the right strategy for FAI creation, how will you implement it? Will you send it to all possible AI creators, try to build your own AI, or ask the government to pass it as law?

Malo

First, note that we’re not looking for “proven” solutions; that seems unrealistic. (See comments from Tsvi and Nate elsewhere.) That aside, I’ll interpret this question as asking: “if your research programs succeed, how do you ensure that the results are used in practice?” This question has no simple answer, because the right strategy would likely vary significantly depending on exactly what the results looked like, our relationships with leading AGI teams at the time, and many other factors. For example:

* What sort of results do we have? The strategy is different depending on whether MIRI researchers develop a generic set of tools for aligning arbitrary AGI systems versus whether they develop a set of tools that only work for developing a sufficiently aligned very limited task-directed AI, and so on.[1]
* How dangerous do the results seem? Designs for alignable AI systems could feasibly yield insight into how to construct misaligned AI systems; in that case, we’d have to be more careful with the tools. (Bostrom wrote about issues surrounding openness here.)[2]

While the strategy would depend quite a bit on the specifics, I can say the following things in general:

* We currently have pretty good relationships with many of the leading AI teams, and most of the leading teams are fairly safety-conscious. If we made a breakthrough in AI alignment, and an expert could easily tell that the tools were useful upon inspection, I think it is very reasonable to expect that the current leading teams would eagerly adopt those tools.
* The “pass a law that every AGI must be built a certain way” idea does not seem feasible to me in this context.
* In the ideal case, the world will coordinate around the creation of AGI (perhaps via a single collaborative project), in which case there would be more or less only one team that needed to adopt the tools.

In short, my answer here is “AI scientists tend to be reasonable people, and it currently seems reasonable to expect that

poppingtonic8y6

Thanks for doing this AMA! Which of the points in your strategy have you seen a need to update on, based on the unexpected progress of having published the "Logical Induction" paper (which I'm currently perusing)?

So8res

Good question. The main effect is that I’ve increased my confidence in the vague MIRI mathematical intuitions being good, and the MIRI methodology for approaching big vague problems actually working. This doesn’t constitute a very large strategic shift, for a few reasons. One reason is that my strategy was already predicated on the idea that our mathematical intuitions and methodology are up to the task. As I said in last year’s AMA, visible progress on problems like logical uncertainty (and four other problems) was one of the key indicators of success that I was tracking; and as I said in February, failure to achieve results of this caliber in a 5-year timeframe would have caused me to lose confidence in our approach. (As of last year, that seemed like a real possibility.) The logical induction result increases my confidence in our current course, but it doesn't shift it much. Another reason logical induction doesn’t affect my strategy too much is that it isn’t that big a result. It’s one step on a path, and it’s definitely mathematically exciting, and it gives answers to a bunch of longstanding philosophical problems, but it’s not a tool for aligning AI systems on the object level. We’re building towards a better understanding of “good reasoning”, and we expect this to be valuable for AI alignment, and logical induction is a step in that direction, but it's only one step. It’s not terribly useful in isolation, and so it doesn’t call for much change in course.

So8res8y5

A question from Topher Halquist, on Facebook:

Has MIRI considered hiring a more senior math-Ph.D., to serve in a "postdoc supervisor"-type role?

We considered it, but decided against it because supervision doesn’t seem like a key bottleneck on our research progress. Our priority is just to find people who have the right kinds of math/CS intuitions to formalize the mostly-informal problems we’re working on, and I haven’t found that this correlates with seniority. That said, I'm happy to hire senior mathematicians if we find ones who want to work... (read more)

ZachWeems8y5

It seems like people in academia tend to avoid mentioning MIRI. Has this changed in magnitude during the past few years, and do you expect it to change any more? Do you think there is a significant number of public intellectuals who believe in MIRI's cause in private while avoiding mention of it in public?

So8res

I think this has been changing in recent years, yes. A number of AI researchers (some of them quite prominent) have told me that they have largely agreed with AI safety concerns for some time, but have felt uncomfortable expressing those concerns until very recently. I do think that the tides are changing here, with the Concrete Problems in AI Safety paper (by Amodei, Olah, et al.) perhaps marking the inflection point. I think that the 2015 FLI conference also helped quite a bit.

Ben Pace8y5

You often mention that MIRI is trying to not be a university department, so you can spend researcher time more strategically and not have the incentive structures of a university. Could you describe the main differences in what your researchers spend their time doing?

Also, I think I've heard the above used as an explanation of why MIRI's work often doesn't fit into standard journal articles at a regular rate. If you do think this, in what way does the research not fit? Are there no journals for it, or are you perhaps more readily throwing less-useful-but-interesting ideas away (or something else)?

So8res

Thanks, Benito. With regards to the second half of this question, I suspect that either you’ve misunderstood some of the arguments I’ve made about why our work doesn’t tend to fit into standard academic journals and conferences, or (alternatively) someone has given arguments for why our work doesn’t tend to fit into standard academic venues that I personally disagree with. My view is that our work doesn’t tend to fit into standard journals etc. because (a) we deliberately focus on research that we think academia and industry are unlikely to work on for one reason or another, and (b) we approach problems from a very different angle than the research communities that are closest to those problems. One example of (b) is that we often approach decision theory not by following the standard philosophical approach of thinking about what decision sounds intuitively reasonable in the first person, but instead by asking “how could a deterministic robot actually be programmed to reliably solve these problems”, which doesn’t fit super well into the surrounding literature on causal vs. evidential decision theory. For a few other examples, see my response to (8) in my comments on the Open Philanthropy Project’s internal and external reviews of some recent MIRI papers.

Malo

To the first part of your question, most faculty at universities have many other responsibilities beyond research which can include a mix of grant writing, teaching, supervising students, and sitting on various university councils. At MIRI most of these responsibilities simply don’t apply. We also work hard to remove as many distractions from our researchers as we can so they can spend as much of their time as possible actually making research progress.[1]

Regarding incentives, as Nate has previously discussed here on the EA Forum, our researchers aren’t subject to the same publish-or-perish incentives that most academics (especially early in their careers) are. This allows them to focus more on making progress on the most important problems, rather than trying to pump out as many papers as possible.

[1] For example, the ops team takes care of formatting and submitting all MIRI publications, we take on as much of grant application and management as is practical, we manage all the researcher conference travel booking, we provide food at the office, etc.

Marylen8y5

I believe that the best and biggest system of morality so far is the legal system. It is an enormous database in which the fairest of men have built on the wisdom of their predecessors to strike a balance between fairness and avoiding chaos, and in which bad or obsolete judgements are weeded out. It is a system of prioritisation of law which could be encoded one day. I believe that it would be a great tool for addressing corrigibility and value learning. I'm a lawyer and I'm afraid that MIRI may not understand all the potential of the legal system.

Could you tell me w... (read more)

So8res8y10

In short: there’s a big difference between building a system that follows the letter of the law (but not the spirit), and a system that follows the intent behind a large body of law. I agree that the legal system is a large corpus of data containing information about human values and how humans currently want their civilization organized. In order to use that corpus, we need to be able to design systems that reliably act as intended, and I’m not sure how the legal corpus helps with that technical problem (aside from providing lots of training data, which I agree is useful).

In colloquial terms, MIRI is more focused on questions like “if we had a big corpus of information about human values, how could we design a system to learn from that corpus how to act as intended”, and less focused on the lack of corpus.

The reason that we have to work on corrigibility ourselves is that we need advanced learning systems to be corrigible before they’ve finished learning how to behave correctly from a large training corpus. In other words, there are lots of different training corpuses and goal systems where, if the system is fully trained and working correctly, we get corrigibility for free; the difficult part is getting the system to behave corrigibly before it’s smart enough to be doing corrigibility for the “right reasons”.

-1

turchin

Agree

FeepingCreature8y5

Do you intend to submit Logical Induction to a relevant magazine for peer review and publication? Do you still hold with ~Eliezer2008 that people who currently object that MIRI doesn't participate in the orthodox scientific progress would still object for other reasons, even if you tried to address the lack of peer review?

Also why no /r/IAmA or /r/science AMA? The audience on this site seems limited from the start. Are you trying to target people who are already EAs in specific?

RobBensinger

We’re submitting “Logical Induction” for publication, yeah. Benya and Jessica (and Stuart Armstrong, a MIRI research associate based at FHI) co-authored papers in a top-10 AI conference this year, UAI, and we plan to publish in similarly high-visibility venues in the future. We’ve thought about doing a Reddit AMA sometime. It sounds fun, though it would probably need to focus more on basic background questions; EAs have a lot of overlapping knowledge, priorities, styles of thinking, etc. with MIRI, so we can take a lot of stuff for granted here that we couldn’t on /r/science. I usually think of orgs like FHI and Leverhulme CFI and Stuart Russell’s new alignment research center as better-suited to that kind of general outreach.

Girish_Sastry8y4

The authors of the "Concrete Problems in AI safety" paper distinguish between misuse risks and accident risks. Do you think in these terms, and how does your roadmap address misuse risk?

turchin8y4

If you get credible evidence that AGI will be created by Google in the next 5 years, what will you do?

kbog8y4

What does the internal drafting and review process look like at MIRI? Do people separate from the authors of a paper check all the proofs, math, citations, etc.?

So8res

Yep, we often have a number of non-MIRI folks checking the proofs, math, and citations. I’m still personally fairly involved in the writing process (because I write fast, and because I do what I can to free up the researchers’ time to do other work); this is something I’m working to reduce. Technical writing talent is one of our key bottlenecks; if you like technical writing and are interested in MIRI’s research, get in touch.

TsviBT

The Logical Induction paper involved multiple iterations of writing and reviewing, during which we refined the notation, terminology, proof techniques, theorem statements, etc. We also had a number of others comment on various drafts, pointing out wrong or unclear parts.

lincolnq8y3

What do you think of OpenAI?

In particular, it seems like OpenAI has managed to attract both substantial technical talent and a number of safety-conscious researchers.

1) It seems that, to at least some degree, you are competing for resources -- particularly talent but also "control of the AI safety narrative". Do you feel competitive with them, or collaborative, or a bit of both? Do you expect both organizations to be relevant for 5+ years or do you expect one to die off? What, if anything, would convince you that it would make sense to mer... (read more)

RobBensinger

I'd mostly put OpenAI in the same category as DeepMind: primarily an AI capabilities organization, but one that's unusually interested in long-term safety issues. OpenAI is young, so it's a bit early to say much about them, but we view them as collaborators and are really happy with "Concrete Problems in AI Safety" (joint work by people at OpenAI, Google Brain, and Stanford). We helped lead a discussion about AI safety at their recent unconference, contributed to some OpenAI Gym environments, and are on good terms with a lot of people there.

Some ways OpenAI's existence adjusts our strategy (so far):

1) OpenAI is in a better position than MIRI to spread basic ideas like 'long-run AI risk is a serious issue.' So this increases our confidence in our plan to scale back outreach, especially outreach toward more skeptical audiences that OpenAI can probably better communicate with.

2) Increasing the number of leading AI research orgs introduces more opportunities for conflicts and arms races, which is a serious risk. So more of our outreach time is spent on trying to encourage collaboration between the big players.

3) On the other hand, OpenAI is a nonprofit with a strong stated interest in encouraging inter-organization collaboration. This suggests OpenAI might be a useful mediator or staging ground for future coordination between leading research groups.

4) The increased interest in long-run safety issues from ML researchers at OpenAI and Google increases the value of building bridges between the alignment and ML communities. This was one factor going into our "Alignment for Advanced ML Systems" agenda.

5) Another important factor is that more dollars going into cutting-edge AI research shortens timelines to AGI, so we put incrementally more attention into research that's more likely to be useful if AGI is developed soon.

John_Maxwell8y2

I sometimes see influential senior staff at MIRI make statements on social media that pertain to controversial moral questions. These statements are not accompanied by disclaimers that they are speaking on behalf of themselves and not their employer. Is it safe to assume that these statements represent the de facto position of the organization?

This seems relevant to your organizational mission since MIRI's goal is essentially to make AI moral, but a donor's notion of what's moral might not correspond with MIRI's position. Forcefully worded statements on... (read more)

So8res

Posts or comments on personal Twitter accounts, Facebook walls, etc. should not be assumed to represent any official or consensus MIRI position, unless noted otherwise. I'll echo Rob's comment here that "a good safety approach should be robust to the fact that the designers don’t have all the answers". If an AI project hinges on the research team being completely free from epistemic shortcomings and moral failings, then the project is doomed (and should change how it's doing alignment research). I suspect we're on the same page about it being important to err in the direction of system designs that don't encourage arms races or other zero-sum conflicts between parties with different object-level beliefs or preferences. See also the CEV discussion above.

kbog8y2

I haven't seen much about coherent extrapolated volition published or discussed recently.

Can you give us the official word on the status of the theory?

RobBensinger

I discussed CEV some in this answer. I think the status is about the same: sounds like a vaguely plausible informal goal to shoot for in the very long run, but also very difficult to implement. As Eliezer notes in https://arbital.com/p/cev/, "CEV is rather complicated and meta and hence not intended as something you'd do with the first AI you ever tried to build." The first AGI systems people develop should probably have much more limited capabilities and much more modest goals, to reduce the probability of catastrophic accidents. See also Nate's paper "The Value Learning Problem."

Girish_Sastry8y2

Do you share Open Phil's view that there is a > 10% chance of transformative AI (defined as in Open Phil's post) in the next 20 years? What signposts would alert you that transformative AI is near?

Relatedly, suppose that transformative AI will happen within about 20 years (not necessarily a self improving AGI). Can you explain how MIRI's research will be relevant in such a near-term scenario (e.g. if it happens by scaling up deep learning methods)?

Jessica_Taylor

I share Open Phil’s view on the probability of transformative AI in the next 20 years. The relevant signposts would be answers to questions like “how are current algorithms doing on tasks requiring various capabilities”, “how much did this performance depend on task-specific tweaking on the part of programmers”, “how much is performance projected to improve due to increasing hardware”, and “do many credible AI researchers think that we are close to transformative AI”. In designing the new ML-focused agenda, we imagined a concrete hypothetical (which isn’t stated explicitly in the paper): what research would we do if we knew we’d have sufficient technology for AGI in about 20 years, and this technology would be qualitatively similar to modern ML technology such as deep learning? So we definitely intend for this research agenda to be relevant to the scenario you describe, and the agenda document goes into more details. Much of this research deals with task-directed AGI, which can be limited (e.g. not self-improving).

ZachWeems8y2

Question 2: Suppose tomorrow MIRI creates a friendly AGI that can learn a value system, make it consistent with minimal alteration, and extrapolate it in an agreeable way. Whose values would it be taught?

I've heard the idea of averaging all humans' values together and working from there. Given that ISIS is human and that many other humans believe that the existence of extreme physical and emotional suffering is good, I find that idea pretty repellent. Are there alternatives that have been considered?

RobBensinger

Right now, we're trying to ensure that people down the road can build AGI systems that it's technologically possible to align with operators' interests at all. We expect that early systems should be punting on those moral hazards and diffusing them as much as possible, rather than trying to lock in answers to tough philosophical questions on the first go. That said, we've thought about this some. One proposal by Eliezer years ago was coherent extrapolated volition (CEV), which (roughly) deals with this problem by basing decisions on what we'd do "if counterfactually we knew everything the AI knew; we could think as fast as the AI and consider all the arguments; [and] we knew ourselves perfectly and had better self-control or self-modification ability." We aren't shooting for a CEV-based system right now, but that sounds like a plausible guess about what we'd want researchers to eventually develop, when our institutions and technical knowledge are much more mature. It's clear that we want to take the interests and preferences of religious extremists into account in making decisions, since they're people too and their welfare matters. (The welfare of non-human sentient beings should be taken into account too.) You might argue that their welfare matters, but they aren't good sources of moral insight: "it's bad to torture people on a whim, even religious militants" is a moral insight you can already get without consulting a religious militant, and perhaps adding the religious militant's insights is harmful (or just unhelpful). The idea behind CEV might help here if we can find some reasonable way to aggregate extrapolated preferences. Rather than relying on what people want in today's world, you simulate what people would want if they knew more, were more reflectively consistent, etc. A nice feature of this idea is that ISIS-ish problems might go away, as more knowledge causes more irreligion. A second nice feature of this idea is that many religious extremists' rep

turchin8y1

One thing always puzzles me about provable AI. If we are able to prove that an AI will do X and only X after arbitrarily many generations of self-improvement, it is still not clear how to choose the right X.

For example, we could be sure that a paperclip maximizer will still make paperclips after a billion generations.

So my question is: what are we proving about provable AI?

So8res

As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to make an AI that is “provably X” for some value of X. I personally doubt AGI teams will be able to literally prove anything substantial about how well the system will work in practice, though I expect that they will be able to get some decent statistical guarantees. There are some big difficulties related to the problem of choosing the right objective to optimize, but currently, that’s not where my biggest concerns are. I’m much more concerned with scenarios where AI scientists figure out how to build misaligned AGI systems well before they figure out how to build aligned AGI systems, as that would be a dangerous regime. My top priority is making it the case that the first AGI designs humanity develops are the kinds of system it’s technologically possible to align with operator intentions in practice. (I’ll write more on this subject later.)

turchin

Thanks! Could you link to where you will write about this subject later?

So8res

I'm not exactly sure what venue it will show up in, but it will very likely be mentioned on the MIRI blog (or perhaps just posted there directly). intelligence.org/blog.

shegurin8y0

What would you do if you hadn't found a solution to the friendliness problem and it became clear that strong AI was within one year? What is the second-best option after trying to develop AI friendliness theory?