capybaralet comments on My current thoughts on MIRI's "highly reliable agent design" work - Effective Altruism Forum

Comment author: capybaralet 11 August 2017 01:15:51AM

My main comments:

  1. As others have mentioned: great post! Very illuminating!

  2. I agree value learning is the main technical problem, although I’d also note that value-learning-related techniques are becoming much more popular in mainstream ML these days, and hence less neglected. Stuart Russell has argued (and I largely agree) that things like inverse reinforcement learning (IRL) will naturally become a more popular research topic (but I’ve also argued this might not be net-positive for safety: http://lesswrong.com/lw/nvc/risks_from_approximate_value_learning/).

  3. My main comment with respect to the value of HRAD (3a) is that I think HRAD-style work is more about problem definitions than solutions. So I find it somewhat orthogonal to the other approach of “learning to reason from humans” (L2R). We don’t have the right problem definitions at the moment; we know that the RL framework is a leaky abstraction. I think MIRI has done the best job of identifying the problems that could result from our current leaky abstractions, and of working to address them by improving our understanding of what problems need to be solved.

  4. It’s also not clear that human reasoning can be safely amplified; the relative safety of existing humans may be due to our limited computational / statistical resources rather than to properties of our cognitive algorithms. But this argument is not as strong as it seems; see comment #3 below.

A few more comments:

  1. RE 3b: I don’t really think the AI community’s response to MIRI’s work is very informative, since it’s just not on people’s radar. The problems are not well known or understood, and the techniques are (AFAIK) not very popular or in vogue (although I’ve only been in the field for 4 years, and have only studied machine-learning-based approaches to AI). Decision theory was already a relatively well-known topic in philosophy, so I think philosophers would naturally be more receptive to these results.

  2. I’m unconvinced about the feasibility of Paul’s approach**, and share Wei Dai’s concerns about its hinging on a high level of competitiveness. But I think HRAD suffers from the same competitiveness issues (this does not seem to be MIRI’s view, which confuses me). This is why I think solving global coordination is crucial.

  3. A key missing (assumed?) argument here is that L2R can be a stepping stone, e.g. by providing narrow or non-superintelligent AI capabilities that can be applied to AI safety (AIS) problems (e.g. making much more progress on HRAD than MIRI can). To me this is a key argument for L2R over HRAD, and generally a source of optimism. I’m curious whether this argument plays a significant role in your thinking; in other words, is it that HRAD problems don’t need to be solved, or just that the most effective path to solving them goes through L2R? I’m also curious about the counter-argument for pursuing HRAD now: what role does MIRI anticipate safe, advanced (but not general / superhuman) intelligent systems playing in HRAD?

  4. An argument for more funding for MIRI that isn’t addressed is the apparent abundance of wealth at the disposal of Good Ventures. Since funding opportunities are generally scarce in AI safety, I think every decent opportunity should be aggressively pursued. I can see three plausible arguments for the low amount of funding to MIRI: 1) concern about steering other researchers in unproductive directions, 2) concern about bad PR, and 3) internal politics.

  5. Am I correct that there is a focus on shorter timelines (e.g. <20 years)?

Briefly, my overall perspective on the future of AI and its relevance to safety is:

  1. There ARE fundamental insights missing, but they are unlikely to be key to building highly capable OR safe AI.

  2. Fundamental insights might be crucial for achieving high confidence in a putatively safe AI (but perhaps not for developing an AI which is actually safe).

  3. The HRAD line of research is likely to uncover mostly negative results (à la AIXI’s arbitrary dependence on its prior).

  4. Theory is behind empiricism, and the gap is likely to grow; this is the main reason I’m a bit pessimistic about theory being useful. On the other hand, I think most paths to victory involve using capability control for as long as possible while transitioning to completely motivation-control-based approaches. So, conditioning on victory, it seems more likely that we solve more fundamental problems (i.e. “we have to solve these problems eventually”).

** The two main reasons are: 1) I don’t think it will be competitive, and 2) I suspect it will be difficult to prevent compounding errors in a bootstrapping process that yields superintelligent agents.