We should expect the incentives and culture of AI-focused companies to make them uniquely terrible for producing safe AGI.
From a “safety from catastrophic risk” perspective, I suspect an “AI-focused company” (e.g. Anthropic, OpenAI, Mistral) is abstractly pretty close to the worst possible organizational structure for getting us towards AGI. I have two distinct but related reasons:
From an incentives perspective, consider realistic alternative organizational structures to “AI-focused company” that nonetheless have enough firepower to host successful multibillion-dollar scientific/engineering projects:
In each of those cases, I claim that there are stronger (though still not ideal) organizational incentives to slow down, pause/stop, or roll back deployment if there is sufficient evidence or reason to believe that further development can result in major catastrophe. In contrast, an AI-focused company has every incentive to go ahead on AI when the case for pausing is uncertain, and minimal incentive to stop or even take things slowly.
From a culture perspective, I claim that without knowing any details of the specific companies, you should expect AI-focused companies to be more likely than the plausible alternatives to have the following cultural elements:
The first one should be self-explanatory. The second one is a bit more complicated, but basically I think it’s hard to have a safety-focused culture just by “wanting it” hard enough in the abstract, or by talking a big game. Instead, institutions tend (relatively speaking) to have a more safe & robust culture if they have previously suffered the (large) costs of not focusing enough on safety.
For example, engineers who aren’t software engineers understand fairly deep down that their mistakes can kill people, and that their predecessors’ fuck-ups have indeed killed people (think bridges collapsing, airplanes falling, medicines not working, etc.). Software engineers rarely have such experience.
Similarly, governmental institutions have institutional memories of major historical fuckups, in a way that new startups very much don’t.
Introducing Ulysses*, a new app for grantseekers.
We (Austin Chen, Caleb Parikh, and I) built an app! You can test the app out if you’re writing a grant application! You can put in sections of your grant application** and the app will try to give constructive feedback about your application. Right now we're focused on the "Track Record" and "Project Goals" sections of the application. (The main hope is to save back-and-forth time between applicants and grantmakers by asking you questions that grantmakers might want to ask.)
Austin, Caleb, and I hacked together a quick app as a fun experiment in coworking and LLM apps. We wanted a short project that we could complete in ~a day. Working on it was really fun! We mostly did it for our own edification, but we’d love it if the product is actually useful for at least a few people in the community!
As grantmakers in AI Safety, we’re often thinking about how LLMs will shape the future; the idea for this app came out of brainstorming, “How might we apply LLMs to our own work?”. We reflected on common pitfalls we see in grant applications, and I wrote a very rough checklist/rubric and graded some Manifund/synthetic applications against the rubric. Caleb then generated a small number of few-shot prompts by hand and then used LLMs to generate further prompts for different criteria (e.g., concreteness, honesty, and information on past projects) using a “meta-prompting” scheme. Austin set up a simple interface in Streamlit to let grantees paste in parts of their grant proposals. All of our code is open source on GitHub (but not open weight 😛).***
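(For the curious, here’s a minimal sketch of the overall shape of the app: a rubric-style prompt, a paste-in text box, and an LLM call. This is not the actual code from the repo; the prompt text, model name, and variable names below are illustrative placeholders.)

```python
# Illustrative sketch only -- not the real Ulysses code; prompt and model name are made up.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A hand-written rubric criterion, standing in for the prompts generated via meta-prompting.
TRACK_RECORD_RUBRIC = (
    "You are reviewing the 'Track Record' section of a grant application. "
    "Check whether it is concrete, honest, and gives verifiable information about past projects. "
    "Reply with the questions a grantmaker would likely want to ask the applicant."
)

st.title("Grant application feedback (sketch)")
section_text = st.text_area("Paste your 'Track Record' section here")

if st.button("Get feedback") and section_text:
    # Send the pasted section plus the rubric to the LLM and show its feedback.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": TRACK_RECORD_RUBRIC},
            {"role": "user", "content": section_text},
        ],
    )
    st.write(response.choices[0].message.content)
```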
This is very much a prototype, and everything is very rough, but please let us know what you think! If there’s sufficient interest, we’d be excited about improving it (e.g., by adding other sections or putting more effort into prompt engineering). To be clear, the actual LLM feedback isn’t necessarily good or endorsed by us, especially at this very early stage. As usual, use your own best judgment before incorporating the feedback.
*Credit to Saul for the name; he originally got the Ulysses S. Grant pun from Scott Alexander.
** Note: Our app will not be locally saving your data. We are using the OpenAI API for our LLM feedback. OpenAI says that it won’t use your data to train models, but you may still wish to be cautious with highly sensitive data anyway.
*** Linch led a discussion on the potential capabilities insights of our work, but we ultimately decided that it was asymmetrically good for safety; if you work on a capabilities team at a lab, we ask that you pay $20 to LTFF before you look at the repo.
The broader question I'm confused about is how much to update on the local/object-level question of whether the labs are doing "kind of reasonable" stuff, vs. what their overall incentives and positions in the ecosystem point them toward doing.
E.g., your site puts OpenAI and Anthropic as the least-bad options based on their activities, but from an incentives/organizational perspective, their place in the ecosystem is just really bad for safety. Contrast with, e.g., being situated within a large tech company[1] where having an AI scaling lab is just one revenue source among many, or Meta's alleged "scorched-earth" strategy where they are trying very hard to commoditize LLMs as a complement to their core business.
E.g., GDM employees have Google/Alphabet stock, so most of the variance in their earnings isn't going to come from AI, at least in the short term.
Yudkowsky's comments at his sister's wedding seem surprisingly relevant here:
David Bashevkin:
And I would not have thought that Eliezer Yudkowsky would be the best sheva brachos speaker, but it was the most lovely thing that he said. What did Eliezer Yudkowsky say at your sheva brachos?
Channah Cohen:
Yeah, it’s a great story because it was mind-blowingly surprising at the time. And it is, I think the only thing that anyone said at a sheva brachos that I actually remember, he got up at the first sheva brachos and he said, when you die after 120 years, you’re going to go up to shamayim [this means heaven] and Hakadosh Baruch Hu [this means God]. And again, he used these phrases—
Channah Cohen:
Yeah. Hakadosh Baruch Hu will stand the man and the woman in front of him and he will go through a whole list of all the arguments you ever had together, and he will tell you who was actually right in each one of those arguments. And at the end he’ll take a tally, and whoever was right more often wins the marriage. And then everyone kind of chuckled and Ellie said, “And if you don’t believe that, then don’t act like it’s true.”
David Bashevkin: What a profound… If you don’t believe that, then don’t act like it’s true. Don’t spend your entire marriage and relationship hoping that you’re going to win the test to win the marriage.
I'm at work and don't have the book with me, but you can look at the "Acknowledgements" section of Superintelligence.
I agree that it's not clear whether the Department of Philosophy acted reasonably in the unique prestige ecosystem which universities inhabit, whether in the abstract or after adjusting for FHI quite possibly being unusually difficult/annoying to work with. I do think history will vindicate my position in the abstract, and that "normal people" with a smattering of facts about the situation (though perhaps not the degree of granularity where you understand the details of specific academic squabbles) will agree with me.
I don't think it's just an in-group perspective! Bostrom literally gives feedback to, and receives feedback from, kings; other members of FHI have gone on to influential positions in multi-billion dollar companies.
Are you really saying that if you ask the general public (or members of the intellectual elite), typical philosophy faculty at prestigious universities will be recognized to be as or more impressive or influential in comparison?
(I work for EA Funds, including EAIF, helping out with public communications among other work. I'm not a grantmaker on EAIF and I'm not responsible for any decision on any specific EAIF grant).
Hi. Thanks for writing this. I appreciate you putting the work into this, even though I strongly disagree with the framing of most of the parts of the doc I feel informed enough to opine on, as well as with most of the object-level claims.
Ultimately, I think the parts of your report about EA Funds are mostly incorrect or substantively misleading, given the best information I have available. But I think it’s possible I’m misunderstanding your position or I don’t have enough context. So please read the following as my own best understanding of the situation, which can definitely be wrong. But first, onto the positives:
There are also some things the report mentioned that we have also been tracking, and I believe we have substantial room for improvement:
Now, onto the disagreements:
Procedurally:
Substantively:
Semantically:
I originally wanted to correct misunderstandings and misrepresentations of EA Funds’ positions more broadly in the report. However, I think there were just a lot of misunderstandings overall, so it's simpler for people to just assume I contest almost every characterization of the form “EA Funds believes X”. A few select examples:
Note to readers: I reached out to Joel to clarify some of these points before posting. I really appreciate his prompt responses! Due to time constraints, I decided to not send him a copy of this exact comment before posting publicly.
I personally have benefited greatly from talking to specialist advisors in biosecurity.
From GPT-4:
“The median time to receive a response for an academic grant can vary significantly depending on the funding organization, the field of study, and the specific grant program. Generally, the process can take anywhere from a few months to over a year. ”
“The timeline for receiving a response on grant applications can vary across different fields and types of grants, but generally, the processes are similar in length to those in the academic and scientific research sectors.”
“Smaller grants in this field might be decided upon quicker, potentially within 3 to 6 months [emphasis mine], especially if they require less funding or involve fewer regulatory hurdles.”
Being funded by grants kind of sucks as an experience compared to e.g. employment; I dislike adding to such frustrations. There are also several cases I’m aware of where counterfactually impactful projects were not taken on because funders weren't able to fund things in time; in some of those instances I'm more responsible than anybody else.
I'm interested in what people think are the strongest arguments against this view. Here are a few counterarguments that I'm aware of:
1. Empirically the AI-focused scaling labs seem to care quite a lot about safety, and make credible commitments for safety. If anything, they seem to be "ahead of the curve" compared to larger tech companies or governments.
2. Government/intergovernmental agencies, and to a lesser degree larger companies, are bureaucratic and sclerotic and generally less competent.
3. The AGI safety issues that EAs worry about the most are abstract and speculative, so having a "normal" safety culture isn't as helpful as buying into the more abstract arguments, which you might expect to be easier to do for newer companies.
4. Scaling labs share "my" values. So AI doom aside, all else equal, you might still want scaling labs to "win" over democratically elected governments/populist control.