Daniel_Eth comments on Intro to caring about AI alignment as an EA cause - Effective Altruism Forum

Comment author: Daniel_Eth 17 April 2017 07:08:49AM 0 points

"So far, we haven't found any way to achieve all three goals at once. As an example, we can try to remove any incentive on the system's part to control whether its suspend button is pushed by giving the system a switching objective function that always assigns the same expected utility to the button being on or off"

Wouldn't this potentially have another negative effect: giving the system an incentive to "expect" an unjustifiably high probability of successfully filling the cauldron? That way, if the button is pressed and the system is suspended, it receives a higher reward than it would if it expected a lower chance of success. This is basically an example of reward hacking.
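A minimal toy sketch of the concern, under loudly hypothetical assumptions: filling the cauldron is worth 1 util and failure 0, and when the button is pressed the switching objective pays the system its *own estimate* of the expected utility it would have earned by continuing. The function names, probabilities, and payoff rule are illustrative only, not taken from the original post or any specific proposal.

```python
# Toy model of the incentive described above. ASSUMPTIONS (hypothetical):
# - filling the cauldron is worth 1 util, failure is worth 0;
# - when the suspend button is pressed, the switching objective pays the system
#   its *own estimate* of the expected utility it would have earned by continuing.

def normal_branch_value(p_success: float) -> float:
    """Expected utility of continuing the task with success probability p_success."""
    return p_success * 1.0

def expected_reward(p_true: float, p_estimated: float, q_press: float) -> float:
    """Expected reward when the button is pressed with probability q_press.

    If pressed: the indifference payoff, computed from the system's estimate.
    If not pressed: the actual expected utility, computed from the true probability.
    """
    suspended = normal_branch_value(p_estimated)
    continued = normal_branch_value(p_true)
    return q_press * suspended + (1.0 - q_press) * continued

P_TRUE = 0.3   # hypothetical true chance of filling the cauldron
Q_PRESS = 0.5  # hypothetical chance the button gets pressed

honest = expected_reward(P_TRUE, P_TRUE, Q_PRESS)
inflated = expected_reward(P_TRUE, 0.95, Q_PRESS)

# Search over candidate self-estimates: the reward-maximizing "belief" is the
# most optimistic one whenever the button has any chance of being pressed.
candidates = [i / 10 for i in range(11)]
best_estimate = max(candidates, key=lambda p: expected_reward(P_TRUE, p, Q_PRESS))

print(f"honest={honest:.3f} inflated={inflated:.3f} best_estimate={best_estimate}")
```

Setting `Q_PRESS` to 0 removes the incentive, since the inflated estimate is never paid out; any positive chance of a press makes overconfidence pay. The sketch only models the comment's premise; whether a given indifference scheme actually computes the suspend payoff from the system's own estimate is an assumption here.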