
I am excited about AI developers implementing responsible scaling policies (RSPs); I’ve recently been spending time refining this idea and advocating for it. Most people I talk to are excited about RSPs, but there is also some uncertainty and pushback about how they relate to regulation. In this post I’ll explain my views on that:

  • I think that sufficiently good responsible scaling policies could dramatically reduce risk, and that preliminary policies like Anthropic’s RSP meaningfully reduce risk by creating urgency around key protective measures and increasing the probability of a pause if those measures can’t be implemented quickly enough.
  • I don’t think voluntary implementation of responsible scaling policies is a substitute for regulation. Voluntary commitments are unlikely to be universally adopted or to have adequate oversight, and I think the public should demand a higher degree of safety than AI developers are likely to voluntarily implement.
  • I think that developers implementing responsible scaling policies now increases the probability of effective regulation. If I instead thought it would make regulation harder, I would have significant reservations.
  • Transparency about RSPs makes it easier for outside stakeholders to understand whether an AI developer’s policies are adequate to manage risk, and creates a focal point for debate and for pressure to improve.
  • I think the risk from rapid AI development is very large, and that even very good RSPs would not completely eliminate that risk. A durable, global, effectively enforced, and hardware-inclusive pause on frontier AI development would reduce risk further. I think this would be politically and practically challenging and would have major costs, so I don’t want it to be the only option on the table. I think implementing RSPs can get most of the benefit, is desirable according to a broader set of perspectives and beliefs, and helps facilitate other effective regulation.

Why I’m excited about RSPs

I think AI developers are not prepared to work with very powerful AI systems. They don’t have the scientific understanding to deploy superhuman AI systems without considerable risk, and they do not have the security or internal controls to even safely train such models.

If protective measures didn’t improve then I think the question would be when rather than if development should be paused. I think the safest action in an ideal world would be pausing immediately until we were better prepared (though see the caveats in the next section). But the current level of risk is low enough that I think it is defensible for companies or countries to continue AI development if they have a sufficiently good plan for detecting and reacting to increasing risk.

If AI developers make these policies concrete and state them publicly, then I believe it puts the public and policymakers in a better place to understand what those policies are and to debate whether they are adequate. And I think the case for companies taking this action is quite strong—AI systems may continue to improve quickly, and a vague promise to improve safety at some unspecified future time isn’t enough.

I think that a good RSP will lay out specific conditions under which further development would need to be paused. Even though the goal is to avoid ever ending up in that situation, I think it’s important for developers to take the possibility seriously, to plan for it, and to be transparent about it with stakeholders.

Thoughts on an AI pause

If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal; this policy would come with other significant costs and currently seems unlikely to be implemented without much clearer evidence of serious risk. 

A unilateral pause on large AI training runs in the West, without a pause on new computing hardware, would have more ambiguous impacts on global catastrophic risk. The primary negative effects on risk are leading to faster catch-up growth in a later period with more hardware and driving AI development into laxer jurisdictions.

However, if governments shared my perspective on risk then I think they should already be implementing domestic policies that will often lead to temporary pauses or slowdowns in practice. For example, they might require frontier AI developers to implement additional protective measures before training larger models than those that exist today, and some of those protective measures may take a fairly long time (such as major improvements in risk evaluations or information security). Or governments might aim to limit the rate at which effective training compute of frontier models grows, in order to provide a smoother ramp for society to adapt to AI and to limit the risk of surprises.

I expect RSPs to help facilitate effective regulation

Regardless of whether risk mitigation takes the form of responsible scaling policies or something else, I think voluntary action by companies isn’t enough. If the risk is large then the most realistic approach is regulation and eventually international coordination. In reality I think the expected risk is large enough (including some risk of a catastrophe surprisingly soon) that a sufficiently competent state would implement regulation immediately.

I believe that AI developers implementing RSPs will make it easier rather than harder to implement effective regulation. RSPs provide a clear path to iteratively improving policy; they provide information about existing practices that can inform or justify regulation; and they build momentum around and legitimize the idea that serious precautions can be necessary for safe development. They are also a step towards building out the procedures and experience that would be needed to make many forms of regulation effective.

I’m not an expert in this area, and my own decisions are mostly guided by a desire to offer my honest assessments of the effects of different policies. That said, my impression from interacting with people who have more policy expertise is that they broadly agree that RSPs are likely to help rather than hurt efforts to implement effective regulation. I have mostly seen voluntary RSPs discussed, and have advocated for them, in contexts where it appears the most likely alternative is less rather than more action.

Anthropic’s RSP

I believe that Anthropic’s RSP is a significant step in the right direction. I would like to see pressure on other developers to implement policies that are at least this good, though I think there is a long way to go from there to an ideal RSP.

Some components I found particularly valuable:

  • Specifying a concrete set of evaluation results that would cause them to move to ASL-3. I think having concrete thresholds by which concrete actions must be taken is important, and I think the proposed threshold is early enough to trigger before an irreversible catastrophe with high probability (well over 90%).
  • Making a concrete statement about security goals at ASL-3—“non-state actors are unlikely to be able to steal model weights, and advanced threat actors (e.g. states) cannot steal them without significant expense”—and describing security measures they expect to take to meet this goal.
  • Requiring a definition and evaluation protocol for ASL-4 to be published and approved by the board before scaling past ASL-3.
  • Providing preliminary guidance about conditions that would trigger ASL-4 and the necessary protective measures to operate at ASL-4 (including security against motivated states, which I expect to be extremely difficult to achieve, and an affirmative case for safety that will require novel science).

Some components I hope will improve over time:

  • The flip side of specifying concrete evaluations right now is that they are extremely rough and preliminary. I think it is worth working towards better evaluations with a clearer relationship to risk.
  • In order for external stakeholders to have confidence in Anthropic’s security I think it will take more work to lay out appropriate audits and red teaming. To my knowledge this work has not been done by anyone and will take time. 
  • The process for approving changes to the RSP is publication and approval by the board. I think this ensures a decision will be made deliberately and is much better than nothing, but it would be better to have effective independent oversight.
  • To the extent that it’s possible to provide more clarity about ASL-4, doing so would be a major improvement by giving people a chance to examine and debate conditions for that level. To the extent that it’s not, it would be desirable to provide more concreteness about a review or decision-making process for deciding whether a given set of safety, security, and evaluation measures is adequate. 

I’m excited to see criticism of RSPs that focuses on concrete ways in which they fail to manage risk. Such criticism can help (i) push AI developers to do better, and (ii) argue to policymakers that we need regulatory requirements stronger than existing RSPs. That said, I think it is significantly better to have an RSP than to not have one, and don’t think that point should be lost in the discussion.

On the name “responsible scaling”

I believe that a very good RSP (of the kind I've been advocating for) could cut risk dramatically if implemented effectively, perhaps a 10x reduction. In particular, I think we will probably have stronger signs of dangerous capabilities before something catastrophic happens, and that realistic requirements for protective measures can probably lead to us either managing that risk or pausing when our protective measures are more clearly inadequate. This is a big enough risk reduction that my primary concern is about whether developers will actually adopt good RSPs and implement them effectively.

That said, I believe that even cutting risk by 10x still leaves us with a lot of risk; I think it’s reasonable to complain that private companies causing a 1% risk of extinction is not “responsible.” I also think the basic idea of RSPs should be appealing to people with a variety of views about risk, and a more pessimistic person might think that even if all developers implement very good RSPs there is still a 10%+ risk of a global catastrophe.
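To spell out the arithmetic with purely illustrative baseline numbers (these are not estimates I’m defending): a 10x reduction from a roughly 10% baseline risk leaves about 1%, while the same 10x reduction applied to a near-certain baseline still leaves about 10%.

$$
\text{residual risk} \;=\; \text{baseline risk} \times \tfrac{1}{10},
\qquad
10\% \times \tfrac{1}{10} = 1\%,
\qquad
100\% \times \tfrac{1}{10} = 10\%.
$$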

On the one hand, I think it’s good for AI developers to make and defend the explicit claim that they are developing the technology in a responsible way, and to be vulnerable to pushback when they can’t defend that claim. On the other hand, I think it’s bad if calling scaling “responsible” gives (or looks like an attempt to give) a false sense of security, whether about the remaining catastrophic risk or about social impacts beyond catastrophic risk.

So “responsible scaling policy” may not be the right name. I think the important thing is the substance: developers should clearly lay out a roadmap for the relationship between dangerous capabilities and necessary protective measures, should describe concrete procedures for measuring dangerous capabilities, and should lay out responses if capabilities pass dangerous limits without protective measures meeting the roadmap.

Comments

I kept responding to Paul’s arguments in private conversations, to the point that I decided to share my comments here.

  1. The hardware overhang argument has poor grounding.

Labs scaling up their models drives more investment in producing GPU chips with more flops (see Sam Altman’s play for the UAE chip factory) and with lower latency between them (see the EA start-up Fathom Radiant, which started out offering fibre-optic-connected supercomputers to OpenAI and has now probably shifted to Anthropic).

The increasing levels of model combinatorial complexity and outside signal connectivity become exponentially harder to keep safe. So the only viable pathway is to stop scaling further, rather than “helplessly” taking all the hardware that currently gets produced.

Further, AI Impacts found no historical analogues for a hardware overhang, and there are plenty of common-sense reasons why the argument’s premises are unsound.

The hardware overhang claim lacks grounding, but that hasn’t prevented alignment researchers from repeating it in a way that ends up weakening coordination efforts to restrict AI corporations.

  2. Responsible scaling policies have ‘safety-washing’ spelled all over them.

Consider the original formulation by Anthropic: “Our RSP focuses on catastrophic risks – those where an AI model directly causes large scale devastation.”

In other words: our company can keep scaling as long as our staff/trustees do not deem the risk of a new AI model directly causing a catastrophe to be sufficiently high.

Is that responsible?

It assumes that further scaling can be risk-managed, and that risk management protocols alone are enough.

Then, the company invents a new wonky risk management framework, ignoring established and more comprehensive practices.

Paul argues that this could be the basis for effective regulation. But Anthropic et al. lobbying national governments to enforce the use of that wonky risk management framework would make things worse.

It distracts from policy efforts to prevent the increasing harms. It creates a perception of safety (instead of actually ensuring safety).

That is ideal for AI corporations that want to keep scaling while circumventing accountability.

RSPs support regulatory capture. I want us to become clear about what we are dealing with.

Paul - you wrote that 'If the world were unified around the priority of minimizing global catastrophic risk, I think that we could reduce risk significantly further by implementing a global, long-lasting, and effectively enforced pause on frontier AI development—including a moratorium on the development and production of some types of computing hardware. The world is not unified around this goal....'

I think that underestimates the current public consensus and concerns about AI risk. The polls I've seen suggest widespread public hostility to AGI development, and skepticism about the AI industry's capacity to manage AI development safely. Indeed, public sentiment seems much closer to that of AI safety experts (e.g. within EA) than to the views of AI industry insiders (such as Yann LeCun) or to e/acc people who yearn for 'the Singularity'.

I'm still digesting the implications of these opinion polls, but I think they should nudge EAs towards a fairly significant update in our expectations about the role that the public could play in supporting an AI Pause. It's worth remembering that the public has seen depictions of dangerous AI in novels, movies, and TV series ever since the 1927 movie 'Metropolis' (or arguably even since the 1818 novel 'Frankenstein'). Ordinary folks are primed to understand that AI is very risky. They might not understand the details of technical AI alignment, or RSPs, or LLMs, or deep learning. But the political will seems to be there to support an AI Pause.

My worry is that we EAs have spent so many years assuming that the public can't understand AI risks that we're still pushing ahead on technical and policy solutions, because that's what we're used to doing. And we assume the political will isn't there to do anything more significant and binding in reducing X risk. But perhaps the public will really is there.

Executive summary: The post discusses the importance of responsible scaling policies (RSPs) in AI development, their relationship with regulation, and their potential to reduce risks associated with powerful AI systems.

Key points

  1. Responsible scaling policies (RSPs) play a crucial role in mitigating the risks associated with rapid AI development, as developers may not have the expertise or controls to handle superhuman AI systems safely. 
  2. RSPs create transparency and clear conditions under which AI development should be paused, improving public understanding and debate about AI safety measures.
  3. While RSPs are important, they are not a substitute for regulation, as voluntary commitments may lack universality and oversight.
  4. Implementing RSPs can increase the likelihood of effective regulation and provide a path for iterative policy improvements.
  5. The post acknowledges that even with RSPs, significant risks remain in rapid AI development, leading to the potential need for a global, hardware-inclusive pause in AI development. 
  6. The author suggests that RSPs, like Anthropic's, can reduce risk and create a framework for measuring and improving AI safety but still need refinement and audits.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

I'm curating this post. There have been several recent posts on the theme of RSPs. I'm featuring this one, but I recommend the other two posts to readers.

I particularly like that these posts say explicitly that they view these policies as good for eventual regulation.
