Quick takes

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of four people, which worked on trying to understand language model features in context, leading to the release of an open-source "transformer debugger" tool. I resigned from OpenAI on February 15, 2024.
In my latest post I talked about whether unaligned AIs would produce more or less utilitarian value than aligned AIs. To be honest, I'm still quite confused about why many people seem to disagree with the view I expressed, and I'm interested in engaging more to get a better understanding of their perspective. At the least, I thought I'd write a bit more about my thoughts here and clarify my own views on the matter, in case anyone is interested in trying to understand my perspective. The core thesis I was trying to defend is the following:

My view: It is likely that by default, unaligned AIs—AIs that humans are likely to actually build if we do not completely solve key technical alignment problems—will produce utilitarian value comparable to that produced by humans, both directly (by being conscious themselves) and indirectly (via their impacts on the world). This is because unaligned AIs will likely both be conscious in a morally relevant sense and share human moral concepts, since they will be trained on human data.

Some people seem to merely disagree with my view that unaligned AIs are likely to be conscious in a morally relevant sense. And a few others have a semantic disagreement with me in which they define AI alignment in moral terms, rather than as the ability to make an AI share the preferences of its operator. But beyond these two objections, which I feel I understand fairly well, there's also significant disagreement about other questions. Based on my discussions, I've attempted to distill the following counterargument to my thesis, which I fully acknowledge does not capture everyone's views on this subject:

Perceived counter-argument: The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives. At present, only a small proportion of humanity holds partly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies. As a result, it is plausible that almost all value would be lost, from a utilitarian perspective, if AIs were unaligned with human preferences.

Again, I'm not sure whether this summary accurately represents what people believe. However, it's what some seem to be saying. I personally think this argument is weak, but I feel I've had trouble making my views clear on this subject, so I thought I'd try one more time to explain where I'm coming from. Let me respond to the two main parts of the argument in some detail:

(i) "The vast majority of utilitarian value in the future will come from agents with explicitly utilitarian preferences, rather than those who incidentally achieve utilitarian objectives."

My response: I am skeptical of the notion that the bulk of future utilitarian value will originate from agents with explicitly utilitarian preferences. This clearly does not reflect our current world, where the primary sources of happiness and suffering are not the result of deliberate utilitarian planning. Moreover, I do not see compelling theoretical grounds to anticipate a major shift in this regard. I think the intuition behind the argument here is something like this: In the future, it will become possible to create "hedonium"—matter that is optimized to generate the maximum amount of utility or well-being.
If hedonium can be created, it would likely be vastly more important than anything else in the universe in terms of its capacity to generate positive utilitarian value. The key assumption is that hedonium would primarily be created by agents who have at least some explicit utilitarian goals, even if those goals are fairly weak. Given the astronomical value that hedonium could potentially generate, even a tiny fraction of the universe's resources being dedicated to hedonium production could outweigh all other sources of happiness and suffering. Therefore, if unaligned AIs would be less likely to produce hedonium than aligned AIs (due to not having explicitly utilitarian goals), this would be a major reason to prefer aligned AI, even if unaligned AIs would otherwise generate comparable levels of value to aligned AIs in all other respects.

If this is indeed the intuition driving the argument, I think it falls short for a straightforward reason. The creation of matter-optimized-for-happiness is more likely to be driven by the far more common motives of self-interest and concern for one's inner circle (friends, family, tribe, etc.) than by explicit utilitarian goals. If unaligned AIs are conscious, they would presumably have ample motives to optimize for positive states of consciousness, even if not for explicitly utilitarian reasons.

In other words, agents optimizing for their own happiness, or the happiness of those they care about, seem likely to be the primary force behind the creation of hedonium-like structures. They may not frame it in utilitarian terms, but they will still be striving to maximize happiness and well-being for themselves and those they care about. And it seems natural to assume that, with advanced technology, they would optimize pretty hard for their own happiness and well-being, just as a utilitarian might optimize hard for happiness when creating hedonium.

In contrast to the number of agents optimizing for their own happiness, the number of agents explicitly motivated by utilitarian concerns is likely to be much smaller. Yet both forms of happiness will presumably be heavily optimized. So even if explicit utilitarians are more likely to pursue hedonium per se, their impact would likely be dwarfed by the efforts of the much larger group of agents driven by more personal motives for happiness-optimization. Since both groups would be optimizing for happiness, the fact that hedonium is similarly optimized for happiness doesn't seem to provide much reason to think that it would outweigh the utilitarian value of more mundane, and far more common, forms of utility-optimization.

To be clear, it's entirely possible that there's something about this argument that I'm missing, and there are a lot of potential objections I'm skipping over. But on a basic level, I mostly just lack the intuition that the thing we should care about, from a utilitarian perspective, is the existence of explicit utilitarians in the future, for the aforementioned reasons. The fact that our current world isn't well described by the idea that what matters most is the number of explicit utilitarians strengthens my point here.

(ii) "At present, only a small proportion of humanity holds partly utilitarian views. However, as unaligned AIs will differ from humans across numerous dimensions, it is plausible that they will possess negligible utilitarian impulses, in stark contrast to humanity's modest (but non-negligible) utilitarian tendencies."
My response: Since only a small portion of humanity is explicitly utilitarian, the argument's own logic suggests that there is significant potential for AIs to be even more utilitarian than humans, given the relatively low bar set by humanity's limited utilitarian impulses. While I agree we shouldn't assume AIs will be more utilitarian than humans without specific reasons to believe so, it seems entirely plausible that factors like selection pressures for altruism could lead to this outcome. Indeed, commercial AIs seem to be selected to be nice and helpful to users, which (at least superficially) seems "more utilitarian" than the default, primarily self-interested impulses of most humans. The fact that humans are only slightly utilitarian should mean that even small forces could cause AIs to exceed human levels of utilitarianism.

Moreover, as I've said previously, it's probable that unaligned AIs will possess morally relevant consciousness, at least in part due to the sophistication of their cognitive processes. They are also likely to absorb and reflect human moral concepts as a result of being trained on human-generated data. Crucially, I expect these traits to emerge even if the AIs do not share human preferences. To see where I'm coming from, consider how humans are routinely "misaligned" with each other, in the sense of not sharing each other's preferences, and yet still share moral concepts and a common culture. For example, an employee can share moral concepts with their employer while having very different consumption preferences. This is pretty much how I think we should primarily think about unaligned AIs that are trained on human data and shaped heavily by techniques like RLHF or DPO.

Given these considerations, I find it unlikely that unaligned AIs would completely lack any utilitarian impulses whatsoever. However, I do agree that even a small risk of this outcome is worth taking seriously. I'm simply skeptical that such low-probability scenarios should be the primary factor in assessing the value of AI alignment research. Intuitively, I would expect the arguments for prioritizing alignment to be more clear-cut and compelling than "if we fail to align AIs, then there's a small chance that these unaligned AIs might have zero utilitarian value, so we should make sure AIs are aligned instead". If low-probability scenarios are the strongest considerations in favor of alignment, that seems to undermine the robustness of the case for prioritizing this work. While it's appropriate to consider even low-probability risks when the stakes are high, I'm doubtful that small probabilities should be the dominant consideration in this context. I think the core reasons for focusing on alignment should probably be more straightforward and less reliant on complicated chains of logic than this type of argument suggests.

In particular, as I've said before, I think it's quite reasonable to think that we should align AIs to humans for the sake of humans. In other words, I think it's perfectly reasonable to admit that solving AI alignment might be a great thing for ensuring human flourishing in particular. But if you're a utilitarian, and not particularly attached to human preferences per se (i.e., you're non-speciesist), I don't think you should be highly confident that an unaligned AI-driven future would be much worse than an aligned one, from that perspective.
I've recently made an update to our Announcement on the future of Wytham Abbey, saying that, since that announcement, we have decided to use some of the proceeds for Effective Ventures' general costs.
Edgy and data-driven TED talk on how the older generations in America are undermining the youth. Worth a watch.  
Mobius (the Bay Area-based family foundation where I work) is exploring new ways to remove animals from the food system. We're looking for a part-time Program Manager to help get more talented people who are knowledgeable about farmed animal welfare and/or alternative proteins into US government roles. This entrepreneurial generalist would pilot a 3-6 month program to support promising students and early graduates with applying to and securing entry-level Congressional roles. We think success here could significantly improve thoughtful policymaking on farmed animal welfare and/or alternative proteins. You can see more about the role here.

Key details on the role:
* Application deadline: Tuesday 28 May, at 23:59 PT. Apply here.
* Contract: 15-20 hours per week for 3-6 months, with the possibility of extending.
* Location: Remote in the US.
* Salary: $29-38 per hour (equivalent to approx. $60,000-$80,000/year) depending on experience. For exceptional candidates, we’re happy to discuss higher compensation. This would be a contractor role, with no additional benefits.

Please share with potentially interested people!


Recent discussion

SummaryBot commented on Deep Honesty 4m ago

Most people avoid saying literally false things, especially if those could be audited, like making up facts or credentials. The reasons for this are both moral and pragmatic — being caught out looks really bad, and sustaining lies is quite hard, especially over time. Let...


Executive summary: Deep honesty, which involves explaining what you actually believe rather than trying to persuade others, can lead to better outcomes and deeper trust compared to shallow honesty, despite potential risks.

Key points:

  1. Shallow honesty means not saying false things, while deep honesty means explaining your true beliefs without trying to manage the other party's reactions.
  2. Deep honesty equips others to make best use of their private information along with yours, strengthening relationships, though it carries risks if not well-received.
  3. Deep hones...
titotal
6h
In most of the cases you cited, I think being more honest is a good goal. However, echoing Ulrik's concern here, the potential downsides of "deep honesty" are not limited just to the "deeply honest" person. For example, a boss being "deeply honest" about being sexually attracted to a subordinate is not generally virtuous: it could just make them uncomfortable, and could easily be sexual harassment. This isn't hypothetical; a high-up EA cited the similar concept of "radical openness" as a contributing factor to his sexual harassment.

White lies exist for a reason; there are plenty of cases where people are not looking for "radical honesty". Like, say you turn someone down for a date because they have a large disfiguring facial scar that makes them unattractive to you. Some people might want to know that this is the reason; other people might find it depressing to be told that a thing they have no control over makes them ugly. I think this is a clear case where the recipient should be the one asking: don't be "deeply honest" with someone about potentially sensitive subjects unprompted.

As another example, you mention being honest when people ask "how are you". Generally, it's a good idea to open up to your friends, and have them open up to you. But if your cashier asks "how are you", they are just being polite; don't trauma-dump to them about your struggles.
Ulrik Horn
10h
Maybe you hint at it in your text, but I want to emphasize that sometimes honesty can put the listener in a difficult situation. Such situations can range from scaring the listener to more serious things like involving them in a crime (with the possibility of them ending up in jail or worse). A couple of examples (I think there are many more!):
* You are angry with someone and you tell them how you feel like hitting them over the head with the beer bottle you are holding.
* You are filing your taxes incorrectly in order to evade taxes, and you tell your boss about it.
Just mentioning this because, in my experience, "lying" has a very practical, consequentialist "positive" aspect to it. Otherwise, I think you make good points about largely trying to be more honest. I try to do this myself in everything from expressing uncertainty when my kids ask me a question ("Dad, do ghosts exist?") to expressing my opinions and feelings here on the forum, risking backlash from prospective employers/grantmakers.

Summary: This post documents research by SatisfIA, an ongoing project on non-maximizing, "aspiration-based" designs for AI agents that fulfill goals specified by constraints ("aspirations") rather than maximizing an objective function. We aim to contribute to AI safety...


Executive summary: The SatisfIA project explores aspiration-based AI agent designs that avoid maximizing objective functions, aiming to increase safety by allowing more flexibility in decision-making while still providing performance guarantees.

Key points:

  1. Concerns about the inevitability and risks of AGI development motivate exploring alternative agent designs that don't maximize objective functions.
  2. The project assumes a modular architecture separating the world model from the decision algorithm, and focuses first on model-based planning before considering
...
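
To make the aspiration-based idea described above more concrete, here is a minimal, hypothetical sketch (not SatisfIA's actual algorithm or code, and all names and numbers are invented): a maximizer picks the single action with the highest expected value, while an aspiration-based agent accepts any action whose expected outcome lands inside a target interval, leaving slack in how that remaining freedom is used.

```python
import random

# Toy model: each action leads to outcome values with known probabilities.
# Purely illustrative; not the SatisfIA decision algorithm.
ACTIONS = {
    "cautious": {0.8: 0.5, 1.2: 0.5},   # outcome_value: probability
    "balanced": {1.0: 0.6, 2.0: 0.4},
    "extreme":  {0.0: 0.5, 5.0: 0.5},
}

def expected_value(outcomes):
    return sum(value * prob for value, prob in outcomes.items())

def maximizing_choice(actions):
    """Classic maximizer: pick the action with the highest expected value."""
    return max(actions, key=lambda a: expected_value(actions[a]))

def aspiration_choice(actions, aspiration=(0.9, 1.5)):
    """Aspiration-based choice: accept any action whose expected outcome
    falls inside the aspiration interval, then pick among those at random,
    rather than pinning down a single extreme action via argmax."""
    low, high = aspiration
    acceptable = [a for a in actions if low <= expected_value(actions[a]) <= high]
    return random.choice(acceptable) if acceptable else maximizing_choice(actions)

print(maximizing_choice(ACTIONS))   # "extreme": highest expected value (2.5)
print(aspiration_choice(ACTIONS))   # "cautious" or "balanced": both meet the aspiration
```

The only point of the toy example is that satisfying a constraint admits many acceptable policies, whereas hard maximization selects a single, often extreme, one.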

This announcement was written by Toby Tremlett, but don’t worry, I won’t answer the questions for Lewis.

Lewis Bollard, Program Director of Farm Animal Welfare at Open Philanthropy, will be holding an AMA on Wednesday 8th of May. Put all your questions for him on this thread...


What is your take on the impact of fishing on animal welfare? When it comes to aquatic lives, why should we focus only on farmed aquatic life and not fishing? Why not prioritise work towards a ban on all fishing, especially industrial fishing?

Some info: fishing alone is responsible for 90% of all lives killed per year for consumption; fishing is also destroying wild aquatic habitats more than we destroy land, and it is affecting all the wild lives that depend on them to survive. It is pretty much the largest ecocide in the world, and it is legalized. All t...

Aidan Kankyoku
14h
Looking at major changes societies have adopted in the past, the path to these changes has often been nonlinear. A frequently discussed example is the U.S. civil rights movement, where the extent of violent opposition reached a near zenith just before the movement's largest victories in the 1950s and '60s. Gay marriage in the U.S. was another example: in a 15-year period ending three years before marriage equality was decided by SCOTUS, advocates watched a wave of anti-gay-marriage state constitutional amendments succeed at the ballot 30-1. Women's suffrage, the New Deal, and (most extremely) the abolition of slavery were all immediately preceded by enormous levels of opposition and social strife. How, if at all, does OP account for the frequent nonlinearity of major societal changes when deciding what interventions to support on behalf of farmed animals?
David van Beveren
15h
How has your strategy for assessing potential grants evolved over the years, and what key factors do you now consider that you didn’t before?

I've been thinking about community improvement and I realise I don't know of any examples where a community had a flaw and fixed it without some deeply painful process.

Often there are discussions of flaws within EA with some implied notion that communities in general are...


Changing the language they used

My dance group switched from gendered terms for the roles to non-gendered (blog post) and from calling one of the dance moves "Gypsy" to "right shoulder round". This didn't involve strife in our specific community, though other dance communities had serious rifts over these two issues.

In the gendered terms case, our transition was the outcome of a long process with the community, including talking about various term options, trial dances, and then polling. We needed to do it this way because role terms are very visible.

In ...

Answer by John Salter, 5h
EA forum posts used to be written much worse than they are now in terms of conciseness, clarity, and ease of understanding. I believe a series of posts came out mocking it / arguing against it, and shortly afterwards writing habits changed.  

Anders Sandberg has written a “final report” released simultaneously with the announcement of FHI’s closure. The abstract and an excerpt follow.


Normally manifestos are written first, and then hopefully stimulate actors to implement their vision. This document is the reverse

...
Arepo
13h
@Habryka I'm just gonna call you out here. Someone -9ed my above comment in a single vote, and there are only about two people on the forum who that could be, one of whom is the person I was criticising. Given that I (I think clearly) meant this as a constructive remark, that you're one of the most influential people in the EA movement, and that EA is supposed to encourage transparency and criticism, this sends a fairly unambiguous signal that the latter isn't really true. In fact, I genuinely now imagine I've lost some small likelihood of being received positively by you if I ever approach Lightcone for support (and that I'm losing more by writing this). This seems like a bad sign for EA epistemic health. Please say if this wasn't you, and I'll retract and apologise.

I think many people have a voting power of 9. I do, and I know many people with more karma than me.

Habryka
12h
(I care quite a bit about votes being anonymous, so will generally glomarize in basically all situations where someone asks me about my voting behavior or the voting behavior of others, sorry about that)

Welcome! Use this thread to introduce yourself or ask questions about anything that confuses you. 

PS- this thread is usually entitled "Open thread", but I'm experimenting with a more descriptive title this time. 

Get started on the EA Forum

The "Guide to norms on...


Hi, I'm Sebastian. I'm a mathematics graduate who spent the last two years in entrepreneurship. About nine months ago, I started to actively engage with EA through the Biosecurity Fundamentals course by BlueDot Impact (although I've known about EA for a long time).

My current focus is biosecurity. I am particularly interested in biotech aspects such as DNA synthesis screening, wastewater surveillance and vaccine development. Since my expertise is mostly in data analysis, computation and software development (as opposed to bioinformatics or virology), I'm st...

TL;DR: Insider activism covers examples of concerned citizens participating in activism within or against the institutions they work in. This can occur due to natural conflicts of values or when a group of aligned people enter a particular organisation with the intention...

huw
10h
Thank you—I am a big believer in the power of collective action & have organised successful union drives & pay disputes in the past. I don't have a lot to add to your breakdown; I think this is a very promising area for EA to consider for almost every cause area (e.g. I would love to see a similar breakdown for current/future efforts in frontier AI labs).

Just strategically, I think the most promising insider activism campaign would be to partner with an existing union in a country with strong union protections; this way, you can leverage those protections to prevent retaliation against employee activists, as they can credibly claim they were organising for the union. I think, frankly, this rules out the U.S. as a starting point—you would want to build groundswell in places where the host companies can't cut employees off at the knees (the recent dismissals at Google are a strong reminder that if employees protest something the company has a stake in, they'll be fired at will with no consequences). Furthermore, unions have a lot of existing connections & skills in developing these campaigns and, as you've noted, regularly participate in employee activism directly or otherwise have a presence in other social movements. This comes with the trade-off of potentially alienating some employees (unions are almost exclusively left-wing and have established reputations), but I don't think there are many people (outside of the U.S.) who would be put off by a union and would've otherwise joined an employee activist drive.

Awesome, sure, that would be great experience for using similar collective action in other causes! I had similar thoughts when writing this: although we focus on animal advocacy, in principle all our approach reports could be used for other asks in other cause areas. I'd be interested in takes from those working in these domains.

Great idea: for the riskier actions, that could be a good approach if there are aligned unions. Do you think this is something you could have got the unions you helped on board with, either for animal advocacy or for other potentially impactful causes?

Once it finishes selling all of its assets, the company will have as much as $16.3 billion in cash to distribute, according to a company statement. It owes customers and other non-governmental creditors about $11 billion.

 

Depending on the type of claim they hold in the case, some creditors could recover as much as 142% of what they are owed. The vast majority of customers, however, will likely get paid 118% of what they had on the FTX platform the day the company entered Chapter 11 bankruptcy.
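
As a rough back-of-the-envelope check on the figures quoted above (purely illustrative arithmetic, not the actual bankruptcy waterfall, which depends on claim classes and on claims being fixed at petition-date crypto prices):

```python
# Illustrative arithmetic only, using the figures quoted in the article.
cash_to_distribute = 16.3e9   # upper-end cash estimate once assets are sold
non_gov_claims = 11.0e9       # owed to customers and other non-governmental creditors

coverage = cash_to_distribute / non_gov_claims
print(f"Aggregate coverage of non-governmental claims: {coverage:.0%}")  # ~148%

# Quoted recovery rates vary by claim class; most customers are repaid based on
# what they held on the platform at the Chapter 11 filing date.
quoted_recoveries = {"most customers": 1.18, "some creditors": 1.42}
for claim_class, rate in quoted_recoveries.items():
    print(f"{claim_class}: {rate:.0%} of their claim")
```

Since estimated cash exceeds non-governmental claims in aggregate, recoveries above 100% for those claimants are arithmetically possible, which is what the quoted 118%-142% figures reflect.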

 

Earlier this year, the company had about $6.4 billion in cash. The increase is due mostly to a general spike in prices for various cryptocurrencies, including Solana, a token heavily backed by convicted fraudster and FTX founder Sam Bankman-Fried. The company has also sold dozens of other assets, including various venture-capital projects like a stake in the artificial-intelligence company Anthropic.

...