
1a3orn


Comments
3

Answer by 1a3orn

> just one sophisticated open-source LLM could wipe out everyone


1. LLMs -- and generative AI broadly speaking -- are best understood as [recapitulating](https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dataset/) their training data. Right now, they are unable to generalize far from their training data: for example, they cannot generalize from [A is B to B is A](https://arxiv.org/abs/2309.12288) type statements, and their capabilities are best understood by [looking at what they saw a lot during training](https://arxiv.org/abs/2309.13638). Thus, it's best not to think of them as repositories of potential new, world-crushing information, but as compressed and easily accessed stores of information that already existed in the world.

Note that the most advanced LLMs are currently unable to replace even junior software engineers -- even though they have read many hundreds of thousands of tutorials on the internet about how to be a junior software engineer. Given this, how likely is it that an advanced LLM will be agent-like enough to kill everyone when prompted to do so, carrying out a sequence of steps for which it has not read hundreds of thousands of tutorials on the internet?

2. Note that, as with every tool, the vast majority of people using open-source LLMs will be using them for good, including defending against people who wish to use them maliciously. Most forms of technology are neutralized in this fashion. For every 1 person who asks an open-source LLM to destroy the world, there will be 1000s of people asking how to defend against specific harms that could happen. That asymmetry is particularly important because LLMs (like humans) are better at answering more tightly scoped questions.

I think that it's conceivable that some forms of AI in general might not work like this, but it's immensely likely that LLMs in particular are the kind of thing where the good majority will easily outweigh the bad minority, given that they mostly raise the floor of competence rather than generate new information.

Encyclopedias, the internet, public education, etc -- all these things also make it easier for bad actors to do harm by making them smarter, but are considered obviously worth it by almost everyone. What would make LLMs different?

3. Consider that banning open-source LLMs is not risk-free! The more powerful you think LLMs are, the more oppressive any such rules will be -- the more this will bring about power struggles over what is permitted, and the more tightly contested control over such regulatory bodies will be.

If most existential risks to the world come from well-resourced actors for whom the presence of an open-source LLM is a small matter -- i.e., actors who could easily obtain an LLM through other means -- then by banning them you might very well be making the world more likely to be doomed, by preventing the vast majority from using such open-source systems to defend against other threats.

Would the AI Governance & Policy group consider hiring someone in AI policy who disagreed with various policies that organizations you've funded have promoted?

For instance, multiple organizations you've funded have released papers or otherwise advocated for strong restrictions on open-source AI -- would you consider hiring someone who disagrees substantially with their recommendations, or with many of the specific points they raise?

I think you've made a mistake in understanding what Quintin means.

Most of the examples you give of inability to control are "how an AI could escape, given that it wants to escape."

Quintin's examples of ease of control, however, are "how easy is it going to be to get the AI to want to do what we want it to do." The arguments he gives are to that effect, and the points you bring up are orthogonal to them.