Daniel_Dewey comments on My current thoughts on MIRI's "highly reliable agent design" work - Effective Altruism Forum

Comment author: Daniel_Dewey 10 July 2017 08:17:50PM 1 point

Thanks Tobias.

In a hard / unexpected takeoff scenario, it's more plausible that we need to get everything more or less exactly right to ensure alignment, and that we have only one shot at it. This might favor HRAD because a less principled approach makes it comparatively unlikely that we get all the fundamentals right when we build the first advanced AI system.

FWIW, I'm not ready to cede the "more principled" ground to HRAD at this stage; to me, it seems like the distinction is more about which aspects of an AI system's behavior we're specifying manually, and which aspects we're setting it up to learn. As far as trying to get everything right the first time, I currently favor a corrigibility kind of approach, as I described in 3c above -- I'm worried that trying to solve everything formally ahead of time will actually expose us to more risk.