
Ryan Greenblatt

Member of Technical Staff @ Redwood Research
482 karma · Joined Sep 2022

Bio

This other Ryan Greenblatt is my old account[1]. Here is my LW account.

  1. ^ Account lost to the mists of time and expired university email addresses.

Comments (128) · Topic contributions (2)

Explicit +1 to what Owen is saying here.

(Given that I commented with some counterarguments, I thought I would explicitly note my +1 here.)

In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking unethical actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.

This reasoning seems to imply that you could use GPT-2 to oversee GPT-4 by bootstrapping from a chain of models at scales between GPT-2 and GPT-4. However, this isn't true: the weak-to-strong generalization paper finds that this doesn't work, and indeed that bootstrapping like this doesn't help at all for ChatGPT reward modeling (I believe it helps on chess puzzles and on nothing else they investigate).

I think this sort of bootstrapping argument might work if we could ensure that each model in the chain was sufficiently aligned and capable of reasoning that it would carefully reason about what humans would want if they were more knowledgeable, and then rate outputs based on this. However, I don't think GPT-4 is either aligned enough or capable enough for us to see this behavior. And I still think it's unlikely to work even under these generous assumptions (though I won't argue for this here).
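To make the bootstrapping scheme under discussion concrete, here is a minimal sketch of the iterated-oversight chain: each (hopefully aligned) weaker model rates the outputs of the next stronger model, and those ratings are used as its training signal. The class and helper names here are hypothetical stand-ins for illustration only, not the weak-to-strong paper's code or any real training pipeline.

```python
# Hypothetical sketch of the bootstrapping / iterated-oversight chain discussed above.
# StubModel and all helpers are illustrative placeholders, not a real pipeline.

class StubModel:
    """Placeholder model exposing the three operations the argument relies on."""

    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # A real model would produce an answer; the stub just echoes the prompt.
        return f"{self.name} answer to: {prompt}"

    def evaluate(self, output):
        # The overseer's rating of an output (a real system would use a reward model).
        return len(output) % 10  # arbitrary stand-in score

    def update(self, outputs, ratings):
        # A real pipeline would fine-tune on the (output, rating) pairs here.
        pass


def bootstrap_alignment(model_chain, prompts):
    """model_chain is ordered weakest -> strongest (e.g. GPT-2-sized up to GPT-4-sized).

    The claim under discussion: align model_chain[0], use it to rate and train
    model_chain[1], then use that model to oversee model_chain[2], and so on.
    The weak-to-strong generalization results suggest this chain doesn't actually
    help for ChatGPT reward modeling.
    """
    overseer = model_chain[0]  # assumed already aligned
    for stronger in model_chain[1:]:
        outputs = [stronger.generate(p) for p in prompts]
        ratings = [overseer.evaluate(o) for o in outputs]
        stronger.update(outputs, ratings)
        overseer = stronger  # the newly trained model oversees the next step
    return overseer  # the strongest model, hopefully still aligned


if __name__ == "__main__":
    chain = [StubModel("weak"), StubModel("medium"), StubModel("strong")]
    bootstrap_alignment(chain, ["Is this action unethical?"])
```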

In fact, it is difficult for me to name even a single technology that I think is currently underregulated by society.

The obvious examples would be synthetic biology, gain-of-function research, and similar.

I also think AI itself is currently massively underregulated, even entirely ignoring alignment difficulties. I think the probability that AI capable of accelerating AI R&D by 10x is created this year is around 3%. It would be extremely bad for US national interests if such an AI were stolen by foreign actors. This alone suffices to justify regulation ensuring very high levels of security, IMO. And this is setting aside ongoing IP theft and similar issues.

Sure, but there are many alternative explanations:

  • There is internal and external pressure to avoid downplaying AI safety.
  • Regulation is inevitable, so it would be better to ensure that you can at least influence it somewhat. Purely fighting against regulation might go poorly for you.
  • The leaders care at least a bit about AI safety, whether out of a bit of altruism or out of self-interest. (Or at least they aren't so constantly manipulative that they choose all their words to maximize their power.)

Not to mention that Big Tech companies whose business plans might be most threatened by "AI pause" advocacy are currently seeing more general "AI safety" arguments as an opportunity to achieve regulatory capture...

Why do you think this? It seems very unclear to me whether this is true.

I'm not sure that I buy that critics lack motivation. At least in the space of AI, there will be (and already are) people with immense financial incentive to ensure that x-risk concerns don't become very politically powerful.

Of course, it might be that the best move for these critics isn't to write careful and well-reasoned arguments, for whatever reason (e.g., doing so would draw more attention to x-risk, so ignoring it is better from their perspective).

Edit: this is mentioned in the post, but I'm a bit surprised that it isn't emphasized more.

because it feels very differently about "99% of humanity is destroyed, but the remaining 1% are able to rebuild civilisation" and "100% of humanity is destroyed, civilisation ends"

Maybe? This depends on what you think about the probability that intelligent life re-evolves on earth (it seems likely to me) and how good you feel about the next intelligent species on earth vs. humans.

the particular focus on extinction increases the threat from AI and engineered biorisks

IMO, most x-risk from AI probably comes not from literal human extinction but from AI systems acquiring most of the control over long-run resources while some/most/all humans survive, but fair enough.

The main counterargument is that the groups in power can now be immortal and digital minds will be possible.

See also: AGI and Lock-in

My views are reasonably messy, complicated, hard to articulate, and based on a relatively diffuse set of intuitions. I think we also reason about the situation in a pretty different way than you seem to (3). It wouldn't be impossible to write up a post on my views, but I would need to consolidate them and think about how exactly to express where I'm at. (Maybe 2-5 person-days of work.) I haven't really consolidated my views or reached something close to reflective equilibrium.

I also just think that arguing about pure philosophy very rarely gets anywhere and is very hard to make convincing in general.

I'm somewhat uncertain at the "inside view/mechanistic" level. (But my all-things-considered view involves partially deferring to some people, which makes me overall less worried that I should immediately reconsider my life choices.)

I think my views are compelling, but I'm not sure I'd say "very compelling".
