Comment author: jsteinhardt 10 July 2017 03:44:18AM 8 points [-]

Shouldn't this cut both ways? Paul has also spent far fewer words justifying his approach to others, compared to MIRI.

Personally, I feel like I understand Paul's approach better than I understand MIRI's approach, despite having spent more time on the latter. I actually do have some objections to it, but I feel it is likely to be significantly useful even if (as I, obviously, expect) my objections end up having teeth.

Comment author: Wei_Dai 11 July 2017 08:46:40AM 7 points [-]

Shouldn't this cut both ways? Paul has also spent far fewer words justifying his approach to others, compared to MIRI.

The fact that Paul hasn't had a chance to hear from many of his (would-be) critics and answer them means we don't have a lot of information about how promising his approach is, hence my "too early to call it more promising than HRAD" conclusion.

I actually do have some objections to it, but I feel it is likely to be significantly useful even if (as I, obviously, expect) my objections end up having teeth.

Have you written down these objections somewhere? My worry is basically that different people look at Paul's approach, each think of a different set of objections, and conclude "that's not so bad", without knowing that there's actually a whole bunch of other objections out there, including additional ones that people would find if they thought and talked about Paul's ideas more.

Comment author: Paul_Christiano 10 July 2017 05:37:42PM *  9 points [-]

I agree with this basic point, but I think on the other side there is a large gap in concreteness that makes it much easier to usefully criticize my approach (I'm at the stage of actually writing pseudocode and code that we can critique).

So far I think that the problems in my approach will also appear for MIRI's approach. For example:

  • Solomonoff induction or logical inductors have reliability problems that are analogous to reliability problems for machine learning (the toy sketch after this list makes the analogy concrete). So to carry out MIRI's agenda, either you need to formulate induction differently, or you need to somehow solve these problems. (And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.) I think Eliezer has long understood this problem and has alluded to it, but it hasn't been the topic of much discussion (I think largely because MIRI/Eliezer have so many other problems on their plates).
  • Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense: if you solve the agent foundations problems you will definitely have solved capability amplification, but not the other way around.
  • I've given some concrete definitions of deliberation/extrapolation, and there's been public argument about whether they really capture human values. I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list). If you want to actually give a satisfying definition of CEV, I feel you are probably going to have to go down the same path that started with this post. I suspect Eliezer has some ideas for how to avoid these problems, but at this point those ideas have been subject to even less public discussion than my approach.
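
To make the first bullet concrete: here is a toy finite-hypothesis caricature of Solomonoff induction (the hypothesis class and numbers are invented for illustration, not taken from MIRI's formalism). Two hypotheses that fit the observed data equally well can diverge arbitrarily on the very next prediction, which is the same flavor of failure as distributional shift in ML:

    # Toy Bayesian mixture over a finite hypothesis class (a caricature
    # of Solomonoff induction; everything here is invented for exposition).
    def mixture_predict(hypotheses, priors, history):
        """Posterior-weighted probability that the next bit is 1."""
        weights = []
        for h, p in zip(hypotheses, priors):
            likelihood = 1.0
            for t, bit in enumerate(history):
                prob_one = h(history[:t])
                likelihood *= prob_one if bit == 1 else (1 - prob_one)
            weights.append(p * likelihood)
        total = sum(weights)
        posterior = [w / total for w in weights]
        return sum(q * h(history) for q, h in zip(posterior, hypotheses))

    # Two hypotheses that agree on the first 10 bits and then diverge.
    always_one = lambda hist: 0.99
    one_then_zero = lambda hist: 0.99 if len(hist) < 10 else 0.01

    print(mixture_predict([always_one, one_then_zero], [0.5, 0.5], [1] * 10))
    # 0.5: the mixture is maximally uncertain exactly where its
    # hypotheses diverge, however well it fit the data it has seen.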

I agree there are further problems in my agenda that will be turned up by more discussion. But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

If you agree that many of my problems also come up eventually for MIRI's agenda, that's good news about the general applicability of MIRI's research (e.g., the reliability problems for Solomonoff induction may provide a good bridge between MIRI's work and mainstream ML). But I think it would also be a good reason to focus on the difficulties that are common to both approaches rather than on problems like decision theory / self-reference / logical uncertainty / naturalistic agents / ontology identification / multi-level world models / etc.

Comment author: Wei_Dai 11 July 2017 08:42:59AM 4 points [-]

And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.

I'm not sure which approaches you're referring to. Can you link to some details on this?

Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense: if you solve the agent foundations problems you will definitely have solved capability amplification, but not the other way around.

I don't understand how this is true. I can see how solving FAI implies solving capability amplification (just emulate the FAI at a low level *), but if all you had was a solution that allows a specific kind of agent (e.g., one with values well-defined apart from its implementation details) to keep those values as it self-modifies, how does that help a group of short-lived humans who don't know their own values break down an arbitrary cognitive task and perform it safely and as well as an arbitrary competitor?

(* Actually, even this isn't really true. In MIRI's approach, an FAI does not need to be competitive in performance with every AI design in every domain. I think the idea is to either convert mainstream AI research into using the same FAI design, or gain a decisive strategic advantage via superiority in some set of particularly important domains.)

My understanding is that MIRI's approach is to figure out how to safely increase capability by designing a base agent that can make safe use of arbitrary amounts of computing power and can safely improve itself by modifying its own design/code. The capability amplification approach is to figure out how to safely increase capability by taking a short-lived human as the given base agent, making copies of it, and organizing how the copies work together. These seem like very different problems with their own difficulties.
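
A toy, runnable caricature of the capability amplification side of this contrast (the task and the decomposition rule are invented for illustration; the actual proposal concerns open-ended cognitive work, not arithmetic):

    # The "short-lived" base agent can only do one small, bounded step;
    # copies of it are organized into a tree that solves a larger task.
    def base_agent(a, b):
        return a + b  # stand-in for one bounded unit of human work

    def amplify(task):
        """Sum a list no single copy of the base agent could handle."""
        if len(task) == 1:
            return task[0]
        mid = len(task) // 2
        # Delegate each half to another copy of the amplified agent,
        # then do a single bounded combining step at the top.
        return base_agent(amplify(task[:mid]), amplify(task[mid:]))

    print(amplify([3, 1, 4, 1, 5, 9, 2, 6]))  # 31

The difficulty raised above is exactly what this toy assumes away: whether an arbitrary cognitive task admits such a decomposition at all.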

I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list).

I agree that in this area MIRI's approach and yours face similar difficulties. People (including me) have criticized CEV for being vague and likely very difficult to define/implement, though, so MIRI is not exactly getting a free pass by being vague. (I.e., I assume Daniel already took this into account.)

But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

This seems like a fair point, and I'm not sure how to weight these factors either. Given that discussion isn't particularly costly relative to the potential benefits, an obvious solution is just to encourage more of it. Someone ought to hold a workshop to talk about your ideas, for example.

I think it would also be a good reason to focus on the difficulties that are common to both approaches

This makes sense.

Comment author: Daniel_Dewey 10 July 2017 07:22:05PM 3 points [-]

I think there's something to this -- thanks.

To add to Jacob's and Paul's comments: I think that while HRAD is more mature in the sense that more work has gone into solving HRAD problems and critiquing possible solutions, the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs. the justification for thinking Paul's approach is promising. In fact, I think the arguments for Paul's work being promising are more solid than those for HRAD, despite it only being Paul making those arguments -- I've had a much harder time understanding anything more nuanced than the basic case for HRAD I gave above, and a much easier time understanding why Paul thinks his approach is promising.

Comment author: Wei_Dai 11 July 2017 08:42:52AM *  1 point [-]

the gap seems much smaller to me when it comes to the justification for thinking HRAD is promising vs. the justification for thinking Paul's approach is promising

This seems wrong to me. For example, in the "learning to reason from humans" approaches, the goal isn't just to learn to reason from humans, but to do it in a way that maintains competitiveness with unaligned AIs. Suppose a human overseer disapproves of their AI using some set of potentially dangerous techniques; how can we then ensure that the resulting AI is still competitive? Once someone points this out, proponents of the approach, to continue thinking their approach is promising, would need to give some details about how they intend to solve this problem. As a result, the justification for thinking the approach is promising becomes more subtle and harder to understand. I think conversations like this have occurred for MIRI's approach far more than for Paul's, which may be a large part of why you find Paul's justifications easier to understand.
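
To illustrate the structure of the worry (every number here is invented): the overseer's disapproval restricts the aligned AI to a subset of the available techniques, and the open question is what ensures the resulting capability gap stays near zero.

    # Hypothetical capability scores for the techniques on the table.
    techniques = {
        "transparent_search": 0.90,
        "verified_planner":   0.85,
        "opaque_end_to_end":  0.97,  # disapproved: the overseer can't audit it
        "self_modification":  0.99,  # disapproved: potentially dangerous
    }
    approved = {"transparent_search", "verified_planner"}

    # An unaligned competitor optimizes over everything; the aligned
    # AI optimizes only over what the overseer approves.
    unaligned_best = max(techniques.values())
    aligned_best = max(v for k, v in techniques.items() if k in approved)
    print(f"competitiveness gap: {unaligned_best - aligned_best:.2f}")  # 0.09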

Comment author: Wei_Dai 09 July 2017 08:53:55AM 18 points [-]

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

From the perspective of an observer who can only judge from what's published online, I'm worried that Paul's approach only looks more promising than MIRI's because it's less "mature", having received less scrutiny and criticism from others. I'm not sure what's happening internally in various research groups, but the amount of online discussion about Paul's approach has to be at least an order of magnitude less than what MIRI's approach has received.

(Looking at the thread cited by Rob Bensinger, various people including MIRI people have apparently looked into Paul's approach but have not written down their criticisms. I've been trying to better understand Paul's ideas myself and point out some difficulties that others may have overlooked, but this is hampered by the fact that Paul seems to be the only person who is working on the approach and can participate on the other side of the discussion.)

I think Paul's approach is certainly one of the most promising approaches we currently have, and I wish people paid more attention to it (and/or wrote down their thoughts about it more), but it seems much too early to cite it as an example of an approach that is more promising than HRAD and therefore makes MIRI's work less valuable.

Comment author: Wei_Dai 12 June 2015 12:51:18AM 6 points [-]

It seems easy to imagine scenarios where MIRI's work is either irrelevant (e.g., mainstream AI research keeps going in a neuromorphic or heuristic trial-and-error direction and eventually "succeeds" that way) or actively harmful (e.g., it publishes ideas that eventually help others build UFAIs). I don't know how to tell whether MIRI's current strategy overall has positive expected impact. What's your approach to this problem?
