Wei_Dai comments on My current thoughts on MIRI's "highly reliable agent design" work - Effective Altruism Forum

Comment author: Wei_Dai 11 July 2017 05:45:15PM 3 points

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

Yeah, this is still not clear. Suppose we had a solution to agent foundations. I don't see how that would necessarily help me figure out what to do as H in capability amplification. For example, the agent foundations solution could say, "use (some approximation of) exhaustive search in the following way, with your utility function as the objective function," but that doesn't help me because I don't have a utility function.
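To make the exhaustive-search example concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the names `exhaustive_search` and `toy_utility`, the action set, and the horizon are assumptions, not part of any actual agent foundations proposal. The point it illustrates is that the search procedure takes a utility function as a required input, so it gives no guidance to someone who cannot supply one.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

Plan = Tuple[str, ...]


def exhaustive_search(
    actions: Sequence[str],
    horizon: int,
    utility: Callable[[Plan], float],
) -> Plan:
    """Enumerate every plan of length `horizon` and return the one
    that scores highest under the supplied `utility` function."""
    return max(product(actions, repeat=horizon), key=utility)


def toy_utility(plan: Plan) -> float:
    """Hypothetical stand-in utility: count the 'help' actions.
    A real H would have to supply something here, which is the difficulty."""
    return float(sum(1 for action in plan if action == "help"))


best_plan = exhaustive_search(["help", "wait"], horizon=3, utility=toy_utility)
print(best_plan)
```

Calling `exhaustive_search` without a `utility` argument fails immediately with a `TypeError`, which mirrors the objection: the procedure is well defined only once an objective function has been written down.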

When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

My point was that HRAD potentially enables the strategy of pushing mainstream AI research away from opaque designs (which are hard to compete with while maintaining alignment, because you don't understand how they work and you can't just blindly copy the computation that they do without risking safety), whereas in your approach you always have to worry about "how do I compete with an AI that doesn't have an overseer, or has an overseer who doesn't care about safety and just lets the AI use whatever opaque and potentially dangerous technique it wants".

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction.

Oh, I see. In my mind, the problems with Solomonoff Induction mean that it's probably not the right way to define how induction should be done as an ideal, so we should look for something kind of like Solomonoff Induction but better, not try to patch it by doing additional things on top of it. (Like instead of trying to figure out exactly when CDT would make wrong decisions and adding more complexity on top of it to handle those cases, replace it with UDT.)