JesseClifton comments on My current thoughts on MIRI's "highly reliable agent design" work - Effective Altruism Forum

Comment author: JesseClifton 07 July 2017 10:13:46PM 1 point

Great piece, thank you.

Regarding "learning to reason from humans", to what extent do you think having good models of human preferences is a prerequisite for powerful (and dangerous) general intelligence?

Of course, the motivation to act on human preferences is another matter - but I wonder if at least the capability comes by default?

Comment author: Daniel_Dewey 10 July 2017 07:30:27PM 0 points

My guess is that the capability is extremely likely, and the main difficulties are motivation and reliability of learning (since in other learning tasks we might be satisfied with lower reliability that gets better over time, but in learning human preferences unreliable learning could result in a lot more harm).

Comment author: WillPearson 09 July 2017 10:07:31AM 0 points

My own 2 cents: it depends a bit on which form of general intelligence is made first. There are at least two possible models.

  1. Super intelligent agent with a specified goal
  2. External brain lobe

With the first, you need to be able to specify human preferences in the form of a goal, which enables it to pick the right actions.

The external brain lobe would start out not very powerful and would not come with any explicit goals; instead, it would be hooked into the human motivational system and develop goals shaped by human preferences.

HRAD is explicitly about the first. I would like both to be explored.

Comment author: JesseClifton 09 July 2017 05:17:07PM 0 points

Right, I'm asking how useful or dangerous your (1) could be if it didn't have very good models of human psychology - and therefore didn't understand things like "humans don't want to be killed".