
Ryan Greenblatt

Member of Technical Staff @ Redwood Research
482 karma · Joined Sep 2022

Bio

This other Ryan Greenblatt is my old account[1]. Here is my LW account.

  1. ^ Account lost to the mists of time and expired university email addresses.

Comments (128) · Topic contributions (2)

Explicit +1 to what Owen is saying here.

(Given that I commented with some counterarguments, I thought I would explicitly note my +1 here.)

In particular, I am persuaded by the argument that, because evaluation is usually easier than generation, it should be feasible to accurately evaluate whether a slightly-smarter-than-human AI is taking unethical actions, allowing us to shape its rewards during training accordingly. After we've aligned a model that's merely slightly smarter than humans, we can use it to help us align even smarter AIs, and so on, plausibly implying that alignment will scale to indefinitely higher levels of intelligence, without necessarily breaking down at any physically realistic point.

This reasoning seems to imply that you could use GPT-2 to oversee GPT-4 by bootstrapping from a chain of models at scales between GPT-2 and GPT-4. However, this isn't true: the weak-to-strong generalization paper finds that this doesn't work, and indeed that bootstrapping like this doesn't help at all for ChatGPT reward modeling (I believe it helps on chess puzzles and on nothing else they investigate).

I think this sort of bootstrapping argument might work if we could ensure that each model in the chain was sufficiently aligned and capable of reasoning that it would carefully reason about what humans would want if they were more knowledgeable, and then rate outputs based on this. However, I don't think GPT-4 is either aligned enough or capable enough for us to see this behavior. And I still think it's unlikely to work even under these generous assumptions (though I won't argue for this here).
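To make the bootstrapping scheme under discussion concrete, here is a minimal sketch of the iterated-oversight chain: each (hopefully aligned) weaker model rates the outputs of the next stronger model, and those ratings are used as its training signal. The class and helper names here are hypothetical stand-ins for illustration only, not the weak-to-strong paper's code or any real training pipeline.

```python
# Hypothetical sketch of the bootstrapping / iterated-oversight chain discussed above.
# StubModel and all helpers are illustrative placeholders, not a real pipeline.

class StubModel:
    """Placeholder model exposing the three operations the argument relies on."""

    def __init__(self, name):
        self.name = name

    def generate(self, prompt):
        # A real model would produce an answer; the stub just echoes the prompt.
        return f"{self.name} answer to: {prompt}"

    def evaluate(self, output):
        # The overseer's rating of an output (a real system would use a reward model).
        return len(output) % 10  # arbitrary stand-in score

    def update(self, outputs, ratings):
        # A real pipeline would fine-tune on the (output, rating) pairs here.
        pass


def bootstrap_alignment(model_chain, prompts):
    """model_chain is ordered weakest -> strongest (e.g. GPT-2-sized up to GPT-4-sized).

    The claim under discussion: align model_chain[0], use it to rate and train
    model_chain[1], then use that model to oversee model_chain[2], and so on.
    The weak-to-strong generalization results suggest this chain doesn't actually
    help for ChatGPT reward modeling.
    """
    overseer = model_chain[0]  # assumed already aligned
    for stronger in model_chain[1:]:
        outputs = [stronger.generate(p) for p in prompts]
        ratings = [overseer.evaluate(o) for o in outputs]
        stronger.update(outputs, ratings)
        overseer = stronger  # the newly trained model oversees the next step
    return overseer  # the strongest model, hopefully still aligned


if __name__ == "__main__":
    chain = [StubModel("weak"), StubModel("medium"), StubModel("strong")]
    bootstrap_alignment(chain, ["Is this action unethical?"])
```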

In fact, it is difficult for me to name even a single technology that I think is currently underregulated by society.

The obvious examples would be synthetic biology, gain-of-function research, and similar.

I also think AI itself is currently massively underregulated, even entirely ignoring alignment difficulties. I think the probability that AI capable of accelerating AI R&D by 10x is created this year is around 3%. It would be extremely bad for US national interests if such an AI were stolen by foreign actors. This alone suffices to justify regulation ensuring very high levels of security, IMO. And this is setting aside ongoing IP theft and similar issues.

Sure, but there are many alternative explanations:

  • There is internal and external pressure to avoid downplaying AI safety.
  • Regulation is inevitable, so it would be better to ensure that you can at least influence it somewhat. Purely fighting against regulation might go poorly for you.
  • The leaders care at least a bit about AI safety, whether out of a bit of altruism or out of self-interest. (Or at least they aren't so constantly manipulative that they choose all their words to maximize their power.)

Not to mention that Big Tech companies whose business plans might be most threatened by "AI pause" advocacy are currently seeing more general "AI safety" arguments as an opportunity to achieve regulatory capture...

Why do you think this? It seems very unclear to me whether this is true.

I'm not sure that I buy that critics lack motivation. At least in the space of AI, there will be (and already are) people with immense financial incentive to ensure that x-risk concerns don't become very politically powerful.

Of course, it might be that the best move for these critics isn't to write careful and well-reasoned arguments, for whatever reason (e.g., doing so would draw more attention to x-risk, so ignoring it is better from their perspective).

Edit: this is mentioned in the post, but I'm a bit surprised that it isn't emphasized more.

because it feels very differently about "99% of humanity is destroyed, but the remaining 1% are able to rebuild civilisation" and "100% of humanity is destroyed, civilisation ends"

Maybe? This depends on what you think about the probability that intelligent life re-evolves on earth (it seems likely to me) and how good you feel about the next intelligent species on earth vs. humans.

the particular focus on extinction increases the threat from AI and engineered biorisks

IMO, most x-risk from AI probably comes not from literal human extinction but from AI systems acquiring most of the control over long-run resources while some/most/all humans survive, but fair enough.

The main counterargument is that the groups in power can now be immortal and digital minds will be possible.

See also: AGI and Lock-in

My views are reasonably messy, complicated, hard to articulate, and based on a relatively diffuse set of intuitions. I think we also reason about the situation in a pretty different way than you seem to (3). It wouldn't be impossible to write up a post on my views, but I would need to consolidate them and think about how exactly to express where I'm at. (Maybe 2-5 person-days of work.) I haven't really consolidated my views or reached something close to reflective equilibrium.

I also just think that arguing about pure philosophy very rarely gets anywhere and is very hard to make convincing in general.

I'm somewhat uncertain at the "inside view/mechanistic" level. (But my all-things-considered view involves partially deferring to some people, which makes me overall less worried that I should immediately reconsider my life choices.)

I think my views are compelling, but I'm not sure I'd say "very compelling".
