On the abolition of man

Joe_Carlsmith

(Cross-posted from my website. Podcast version here, or search for "Joe Carlsmith Audio" on your podcast app.

This essay is part of a series that I'm calling "Otherness and control in the age of AGI." I'm hoping that the individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thus far.)

Earlier in this series, I discussed a certain kind of concern about the AI alignment discourse – namely, that it aspires to exert an inappropriate degree of control over the values that guide the future. In considering this concern, I think it's important to bear in mind the aspects of our own values that are specifically focused on pluralism, tolerance, helpfulness, and inclusivity towards values different-from-our-own (I discussed these in the last essay). But I don't think this is enough, on its own, to fully allay the concern in question. Here I want to analyze one version of this concern more directly, and to try to understand what an adequate response could consist in.

Tyrants and poultry-keepers

Have you read The Abolition of Man, by C.S. Lewis? As usual: no worries if not (I'll summarize it in a second). But: recommended. In particular: The Abolition of Man is written in opposition to something closely akin to the sort of Yudkowskian worldview and orientation towards the future that I've been discussing.^[1] I think the book is wrong about a bunch of stuff. But I also think that it's an instructive evocation of a particular way of being concerned about controlling future values – one that I think other critics of Yudkowskian vibes (e.g., Hanson) often draw on as well.^[2]

At its core, The Abolition of Man is about meta-ethics. Basically, Lewis thinks that some kind of moral realism is true. In particular, he thinks cultures and religions worldwide have all rightly recognized something he calls the Tao – some kind of natural law; a way that rightly reflects and responds to the world; an ethics that is objective, authoritative, and deeply tied to the nature of Being itself. Indeed, Lewis thinks that the content of human morality across cultures and time periods has been broadly similar, and he includes, in the appendix of the book, a smattering of quotations meant to illustrate (though not: establish) this point.

"Laozi Riding an Ox by Zhang Lu (c. 1464–1538)" (Image source here)

But Lewis notices, also, that many of the thinkers of his day deny the existence of the Tao. Like Yudkowsky, they are materialists, and "subjectivists," who think – at least intellectually – that there is no True Way, no objective morality, but only ... something else. What, exactly?

Lewis considers the possibility of attempting to ground value in something non-normative, like instinct. But he dismisses this possibility on familiar grounds: namely, that it fails to bridge the gap between is and ought (the same arguments would apply to Yudkowsky's "volition"). Indeed, Lewis thinks that all ethical argument, and all worthy ethical reform, must come from "within the Tao" in some sense – though exactly what sense isn't fully clear. The least controversial interpretation would be the also-familiar claim that moral argument must grant moral intuition some sort of provisional authority. But Lewis, at times, seems to want to say more: for example, that any moral reasoning must grant "absolute" authority to the whole of what Lewis takes to be a human-consensus Traditional Morality;^[3] that only those who have grasped the "spirit" of this morality can alter and extend it;^[4] and that this understanding occurs not via Reason alone, but via first tuning habits and emotions in the direction of virtue from a young age, such that by the time a "well-nurtured youth" reaches the age of Reason, "then, bred as he has been, he will hold out his hands in welcome and recognize [Reason] because of the affinity he bears to her."^[5]

This part of the book is not, in my opinion, the most interesting part (though: it's an important backdrop). Rather, the part I find most interesting comes later, in the final third, where Lewis turns to the possibility of treating human morality as simply another part of nature, to be "conquered" and brought under our control in the same way that other aspects of nature have been.

Here Lewis imagines an ongoing process of scientific modernity, in which humanity gains more and more mastery over its environment. He claims, first, that this process in fact amounts to some humans gaining power over others (since, whenever humans learn to manipulate a natural process for their own ends, they become able to use this newfound power in relation to their fellow men) – and in particular, to earlier generations of humans gaining power over later generations (because earlier generations become more able to shape the environment in which later generations operate, and the values they pursue).^[6] And in his eyes, the process culminates in the generation that achieves mastery over human nature as a whole, and hence becomes able to decide the values of all the generations to come:

In reality, of course, if any one age really attains, by eugenics and scientific education, the power to make its descendants what it pleases, all men who live after it are the patients of that power. They are weaker, not stronger: for though we may have put wonderful machines in their hands we have pre-ordained how they are to use them ... The last men, far from being the heirs of power, will be of all men most subject to the dead hand of the great planners and conditioners and will themselves exercise least power upon the future.

The real picture is that of one dominant age—let us suppose the hundredth century A.D.—which resists all previous ages most successfully and dominates all subsequent ages most irresistibly, and thus is the real master of the human species. But then within this master generation (itself an infinitesimal minority of the species) the power will be exercised by a minority smaller still. Man's conquest of Nature, if the dreams of some scientific planners are realized, means the rule of a few hundreds of men over billions upon billions of men. There neither is nor can be any simple increase of power on Man's side. Each new power won by man is a power over man as well. Each advance leaves him weaker as well as stronger. In every victory, besides being the general who triumphs, he is also the prisoner who follows the triumphal car.

I think that's Perseus of Macedon looking all sad back there... (Image source here.)

Lewis calls the tiny set of humans who determine the values of all future generations "the conditioners." He allows that humans have always attempted to exert some influence over the values of future generations – for example, by nurturing and instructing children to be virtuous. But he thinks that the conditioners will be different in two respects. First: by hypothesis, they will have enormously more power to determine the values of future generations than previously available (here Lewis expresses gratitude that previous educational theorists, like Plato and Locke, lacked such power – and I agree). Second, though, and more importantly, Lewis thinks that the conditioners will view themselves as liberated from the demands of conscience, and of the Tao – and thus, that the moral status of their attempts to influence the values of the future will be fundamentally altered:

In the older systems both the kind of man the teachers wished to produce and their motives for producing him were prescribed by the Tao—a norm to which the teachers themselves were subject and from which they claimed no liberty to depart. They did not cut men to some pattern they had chosen. They handed on what they had received: they initiated the young neophyte into the mystery of humanity which over-arched him and them alike. It was but old birds teaching young birds to fly. This will be changed. Values are now mere natural phenomena. Judgements of value are to be produced in the pupil as part of the conditioning. Whatever Tao there is will be the product, not the motive, of education. The conditioners have been emancipated from all that. It is one more part of Nature which they have conquered. The ultimate springs of human action are no longer, for them, something given. They have surrendered—like electricity: it is the function of the Conditioners to control, not to obey them. They know how to produce conscience and decide what kind of conscience they will produce. They themselves are outside, above.

Lewis gives another example of this sort of distinction earlier in the book: namely, a Roman father teaching a son that it is sweet and seemly (dulce and decorum) to die for his country. If this father speaks from within the Tao, and believes that such approving attitudes towards a patriotic death are objectively appropriate and warranted, then he is passing on his best understanding of the True Way, and helping his son see and inhabit reality more deeply. But if the father does not believe this, but rather thinks that it will be useful (either for his own purposes, or for the purposes of society more generally) if his son approves of patriotic self-sacrifice, then he is doing something very different:

Where the old initiated, the new merely "conditions". The old dealt with its pupils as grown birds deal with young birds when they teach them to fly; the new deals with them more as the poultry-keeper deals with young birds— making them thus or thus for purposes of which the birds know nothing. In a word, the old was a kind of propagation—men transmitting manhood to men; the new is merely propaganda.

Condor Teaches Youngster to Fly (Narrated by David Tennant) - Earthflight - BBC One

Old birds pushing young birds off of cliffs. Wait sorry remind me what this has to do with meta-ethics again?

The conditioners, then, are to the future as poultry-keepers with unprecedented power. And absent guidance from the Tao, on what grounds will they choose the values of their poultry? Here Lewis is quite pessimistic. In particular, he thinks that the conditioners will likely regress to their basest impulses – the ones that never claimed objectivity, and hence cannot be destroyed by subjectivism – and in particular, to their desire for pleasure for themselves. But this is not core to his thesis.

More core, though, is the claim that however the conditioners choose, their apparent conquest over Nature will in some sense amount to Nature's conquest over them, and hence over humanity as a whole. This is one of the more obscure aspects of Lewis's discussion – and its confusions, in my opinion, end up inflecting much of the book. Lewis seems to hold that somehow, by treating something as a part of Nature – and in particular, by treating it purely as an object of prediction, manipulation, and control – you in fact make it into a part of Nature:

The price of conquest is to treat a thing as mere Nature. Every conquest over Nature increases her domain. The stars do not become Nature till we can weigh and measure them: the soul does not become Nature till we can psychoanalyse her. The wresting of powers from Nature is also the surrendering of things to Nature ... if man chooses to treat himself as raw material, raw material he will be.

I'll return, below, to whether this makes any sense. For now, let's look at Lewis's overall conclusion:

We have been trying, like Lear, to have it both ways: to lay down our human prerogative and yet at the same time to retain it. It is impossible. Either we are rational spirit obliged for ever to obey the absolute values of the Tao, or else we are mere nature to be kneaded and cut into new shapes for the pleasures of masters who must, by hypothesis, have no motive but their own "natural" impulses. Only the Tao provides a common human law of action which can over-arch rulers and ruled alike. A dogmatic belief in objective value is necessary to the very idea of a rule which is not tyranny or an obedience which is not slavery. (Emphasis added.)

Lewis finishes the book with some speculations on the possibility of a form of science that somehow does not reduce its object to raw material – and hence, does not extend Nature's domain as it gains knowledge and power. "When it explained it would not explain away. When it spoke of the parts it would remember the whole. While studying the It it would not lose what Martin Buber calls the Thou-situation." But Lewis is not sure this is possible.

Are we the conditioners?

I'll object to Lewis in various ways in a moment (I think the book is often quite philosophically sloppy – sloppiness that Lewis's rhetorical skill can sometimes obscure). First, though: why am I interested in this book at all?

It's a number of things. Most centrally, though: Yudkowsky's core narrative, with respect to the advent of AGI, is basically that it will quickly lead to the culmination – or at least, the radical acceleration – of scientific modernity in the broad sense that Lewis is imagining. That is, available power to predict and control the natural world will increase radically, to a degree that makes it possible to steer and stabilize the future, and the values that will guide the future, in qualitatively new ways. And Yudkowsky is far from alone in expecting this. See, also, the discourse about "value lock in" in MacAskill (2022); Karnofsky's (2021) discussion of "societies that are stable for billions of years"; and the more detailed discussion in Finnveden et al. (2022). And to be clear: I, too, find something like this picture worryingly plausible – though far from guaranteed.

What's more, the whole discourse about AI alignment is shot through with the assumption that values are natural phenomena that can be understood and manipulated via standard science and technology. And in my opinion, it is shot through, as well, with something like the moral anti-realism that Lewis is so worried about. At the least, Yudkowsky's version rests centrally on such anti-realism.^[7]

It seems, then, that a broadly Yudkowskian worldview imagines that, in the best case (i.e., one where we somehow solve alignment and avoid his vision of "AI ruin"), some set of humans – and very plausibly, some set of humans in this very generation; perhaps, even, some readers of this essay – could well end up in a position broadly similar to Lewis's "conditioners": able, if they choose, to exert lasting influence on the values that will guide the future, and without some objectively authoritative Tao to guide them. This might be an authoritarian dictator, or a small group with highly concentrated power. But even if the values of the future end up determined by some highly inclusive, democratic, and global process – still, if that process takes place only in one generation, or even over several, the number of agents participating will be tiny relative to the number of future agents influenced by the choice.^[8] That is, a lot of the reason that ours is the "most important century" is that it looks like rapid acceleration of technological progress could make it similar to Lewis's "one dominant age ... which resists all previous ages most successfully and dominates all subsequent ages most irresistibly." Indeed: remember Yudkowsky's "programmers" in the last essay, from his discussion of Coherent Extrapolated Volition? They seem noticeably reminiscent of Lewis's "conditioners." Yes, Lewis's rhetoric is more directly sinister. But meta-ethically and technologically, it's a similar vision.

And Lewis makes a disturbing claim about people in this position: namely, that without the Tao, they are tyrants, enslaving the future to their arbitrary natural preferences. Or at least, they are tyrants to the extent that they exert intentional influence on the values of the future at all (even, plausibly, "indirectly," by setting up a process like Coherent Extrapolated Volition – and regardless, CEV merely re-allocates influence to the arbitrary natural preferences of the present generation of humans).

Could people in this position simply decline to exert such influence? In various ways, yes: and I'll discuss this possibility below. Note, though, that the discourse about AI alignment assumes the need for something like "conditioning" up front – at least for artificial minds, if not for human ones. That is, the whole point of the AI alignment discourse is that we need to learn how to be suitably skillful and precise engineers of the values of the AIs we create. You can't just leave those values "up to Nature" – not just because there is no sufficiently natural "default" to be treated as sacred and not-to-be-manipulated, but because the easiest defaults, at least on Yudkowsky's picture (for example, the AIs you'll create if you're lazily and incautiously optimizing for near-term profits, social status, scientific curiosity, etc) will kill you. And more generally, Yudkowsky's deep atheism, his mistrust towards both Nature and bare intelligence, leaves him with the conviction that the future needs steering. It needs to be, at a minimum, in the hands of "human values" – otherwise it will "crash." But to steer the future ourselves – even in some minimal way, meant to preserve "human control" – seems to risk what Lewis would call "tyranny." And if, per my previous discussion of "value fragility," we follow a simplified Yudkowskian vibe of "optimizing intensely for slightly-wrong utility functions quickly leads to the destruction of ~all value" and "the future will be one of intense optimization for some utility function," then it can quickly start to seem like the values guiding the future need to be controlled ("conditioned?") quite precisely, lest they end up even slightly wrong.

On a broadly Yudkowskian worldview, then, are we to choose between becoming tyrants with respect to the future, or letting it "crash"? Let's look at Lewis's argument in more detail.

Lewis's argument in a moral realist world

Lewis believes in the existence of an objectively authoritative morality, and the "conditioners" do not. But it's often unclear whether his arguments are meant to apply to the world he believes in, or the world the conditioners believe in. That is, he thinks there is some kind of problem with people intentionally shaping the values that will guide the future. But this problem takes on a different character depending on the meta-ethical assumptions we make in the background.

Let's look, first, at a version of Lewis's argument that assumes moral realism is true. That is, there is an objectively authoritative Tao. But: the conditioners don't believe in it. What's the problem in that case?

One problem, of course, is that they might do the wrong thing, according to the Tao. For example, per Lewis's prediction, they might give up on all commitment to honor and integrity and benevolence and virtue, and choose to use their power over the future in whatever ways best serve their own pleasure. Or even if they keep some shard of the Tao alive in their minds, they might lose touch with the whole, and with the underlying spirit – and so, with the values of the future as putty in their hands, they might make of humanity something twisted, hollow, deadened, or grotesque.

They're very aligned though...

But there's also a subtler problem: namely, that even if they do the right thing, they might not be guided, internally, by the right source of what I've previously called "authority." That is, suppose that the conditioners keep their commitments to honor and integrity and benevolence and virtue, and they are guided towards Tao-approved actions on the basis of these commitments, but they cease to think of these commitments as grounded in the Tao – rather, per moral anti-realism, they think of their commitments as more subjective and preference-like. In that case, my guess is that Lewis will judge them tyrants and poultry-keepers, at least in some sense, regardless. That is, to the extent they are intentionally shaping the values of the future, even in Tao-approved ways, they are doing so, according to them, on the basis of their own wills, rather than on the basis of some "common human law of action which can over-arch rulers and ruled alike." They are imposing their wills on the raw material of the universe – and including: future people – rather than recognizing and responding to some standard beyond themselves, to which both they and the future people they are influencing ought, objectively, to conform.

In this sense, I expect Lewis to be more OK with moral realists doing AI alignment than with the sort of anti-realists who tend to hang around on LessWrong. The realists, at least, can be as old birds teaching the young AIs to fly. They can be conceptualizing the project of alignment, centrally, as one of helping the AIs we create recognize and respond to the truth; helping them inhabit, with us, the full reality of Reality, including the normative parts – the preciousness of life, the urgency of love, the horror of suffering, the beauty of the mountains and the sky at dawn.^[9] Whereas the LessWrongers, well: they're just trying to empower their own subjective preferences. They seek willing servants, pliant tools, controlled Others, extensions of themselves. They are guided, only, by that greatest and noblest mandate: "I want." Doesn't that at least remind you of tyranny?

If we condition on moral realism, I do think that Lewis-ian concerns in this broad vein are real. In particular: if there is, somehow, some sort of objectively True Path – some vision of the Good, the Right, the Just that all true-seeing minds would recognize and respond to – then it is, indeed, overwhelmingly important that we do not lose sight of it, or cease to seek after it on the basis of a mistaken subjectivism. And I think that Lewis is right, too, that such a path offers the potential for forms of authority, in acting in ways that affect the lives and values of others, that more anti-realist conceptions of ethics have a harder time with.^[10]

What's more, relative to the standard LessWronger (and despite my various writings in opposition to realism), I suspect I am personally less confident in dismissing the possibility that some kind of robust moral realism is true – or at least, closer to the truth than anti-realism. In particular: I think that the strongest objection to moral realism is that it leaves us without the right sort of epistemic access to the moral facts – but I do think this objection arises in notably similar ways with respect to math, consciousness, and perhaps philosophy more generally, and that the true story about our epistemic access to all these domains might make the morality case less damning.^[11] I also think we remain sufficiently confused, in general, about how to integrate the third-personal and the first-personal perspective – the universe as object, unified-causal-nexus, material process, and the self as subject, particular being, awareness – that we may well find ourselves surprised and humbled once the full picture emerges, including re: our understanding of morality. And I continue to take seriously the sense in which various kinds of goodness, love, beauty and so on present themselves as in some elusive sense deeper, truer, and more reality-responsive than their alternatives, even if it's hard to say exactly how, and even if, of course, this presentation is itself a subjective experience. For these reasons, I care about making sure that in worlds where some sort of moral realism is true, we end up in a position to notice this and respond appropriately. If there is, indeed, a Tao, then let it speak, and let us listen.

What if the Tao isn't a thing, though?

But what if moral realism isn't true? Lewis, in my opinion, is problematically unwilling to come to real terms with this possibility. That is, his argument seems to be something like: "unless an objective morality exists (and you believe in it and are trying to act in accordance with it), then to the extent you are exerting influence over the values of future generations, you are a tyrannical poultry-keeper." But as ever, "unless p is true, then bad-thing-y" isn't, actually, an argument for p. It's actually, rather, a scare tactic – one unfortunately common amongst apologists both for moral realism, and for theism (Lewis is both). Cf "unless moral realism is true, then the-bad-kind-of-nihilism," or "unless God exists, then no meaning-to-life." Setting aside the question of whether such conditionals are true (I'm skeptical), their dialectic force tends to draw much more centrally on fear that bad-thing-y is true than on conviction that it's false (indeed, I think the people most susceptible to these arguments are the ones who suspect, in their hearts, that bad-thing-y has been true all along).^[12]

What's more, because such arguments appeal centrally to fear, they also benefit from splitting the space of possibilities into stark, over-simple, and fear-inducing dichotomies – e.g., Lewis's "either we are rational spirit obliged for ever to obey the absolute values of the Tao, or else we are mere nature to be kneaded and cut into new shapes for the pleasures of masters who must, by hypothesis, have no motive but their own 'natural' impulses."^[13] Oh? If you're trying to scare your audience into choosing one option from the menu, best to either hide the others, or make them seem as unappetizing as possible. And best, too, to say very little about what the most attractive version of not choosing that option might look like.

Pursuant to such tactics, Lewis says approximately nothing about what you should actually do, if you find yourself in the anti-realist meta-ethical situation he so bemoans – if you find that you are, in fact, "mere nature." He writes: "A dogmatic belief in objective value is necessary to the very idea of a rule which is not tyranny or an obedience which is not slavery." But setting aside the question of whether this is true (I don't think so), still: what if the "dogmatic belief" in question is, you know, false? Does he suggest we hold it, dogmatically, anyways? But Lewis, elsewhere, views self-deception with extreme distaste. And anyway, it doesn't help: if you're a tyrant for real, pretending otherwise doesn't free your subjects from bondage. Indeed, if anything, assuming for yourself a false legitimacy makes your tyranny harder to notice and correct for.

Of course, one option here is to stop doing anything Lewis would deem "tyranny" – e.g., acting to influence the values of others, poultry-keeper style. But if we take Lewis's full argument seriously, this is quite a bit harder than it might seem. In particular: while Lewis focuses on the case of the "conditioners," who have finally mastered human nature to an extent that makes the values of the future as putty in their hands, his arguments actually apply to any attempt to exert influence over the values of others – to everyday parents, teachers, twitter poasters, and so on. The conditioners are the more powerful tyrants; but the less-powerful do not, thereby, gain extra legitimacy.

Thus, consider again that Roman father. If anti-realism is true, what should he teach his son about the value of a patriotic death? Is it sweet and seemly? Is it foolish and sheeple-like? Any positive lesson, it seems, will have been chosen by the father; thus, it will be the product of that father's subjective will; and thus, absent the Tao to grant authority to that will, the father will be, on Lewis's view, as poultry-keeper. He is shaping his son, not as rational spirit, but as "mere nature." He is like the LessWrongers, "aligning" their neural nets. The ultimate basis for his influence is only that same, lonely "I want."

Tyranny from the past? (Image source here.)

Of course, the connotations of "poultry-keeping," here, mislead in myriad ways. Poultry-keepers, for example, do not typically love their poultry. But I think Lewis is right, here, in identifying a serious difficulty for anti-realists: namely, that their view does not, prima facie, offer any obvious story about how to distinguish between moral instruction/argument and propaganda/conditioning – between approaching someone, in a discussion of morality, as a fellow rational agent, rather than as a material system to be altered, causally, in accordance with your own preferences (I wrote about this issue more here). The most promising form of non-propaganda, here, is to only ever try to help someone see what follows from their own values – to help a paperclipper, for example, understand that what they really want is paperclips, and to identify which actions will result in the most paperclips. But what if you want to convince the paperclipper to value happiness instead? If you disagree with someone's terminal values, then convincing them of yours, for Yudkowsky, seems like it can only ever be a kind of conditioning – a purely causal intervention, altering their mind to make it more-like-yours, rather than two rational minds collaborating in pursuit of a shared truth. That is, it can seem like: either someone already agrees with you, in their heart of hearts (they just don't know it yet), or causing them to agree with you would be to approach them poultry-style.

Could you simply ... not do the poultry version? E.g., could you just make sure to only influence the values of others in ways that they would endorse from their own perspective? You could try, but there's a problem: namely, that not all of the agents you might be influencing have an "endorsed perspective" that pre-exists your influence. Very young children, for example, do not have fully-formed values that you can try, solely, to respect and respond to. Suppose, for example, that you're wondering whether to teach your child various altruistic virtues like sharing-with-others, compassion, and charity. And now you wonder: wait, is your child actually an Ayn-Randian at heart, such that on reflection, they would hold such "virtues" in contempt? If so, then teaching such virtues would make you a poultry-keeper, altering your child's will to suit yours (with no Tao to say that your will is right). But how can you tell? Uh oh: it's not clear there's an answer. Plausibly, that is, your child isn't, at this point, really anything at heart – or at least, not anything fixed and determinate.^[14] Your child is somewhere in between a lump of clay and a fully-formed agent. And the lump-of-clay aspect means you can't just ask them what sort of agent they want to be; you need to create them, at least to some extent, yourself, with no objective morality to guide or legitimate your choices.

Ok well that was less horrifying than I expected at least...

And how much are we all still, yet, as clay? Do humans already have "values?" To some extent, of course – and more than young children do. But how much clay-nature is still left over? I've argued, elsewhere: at least some. We must, at least sometimes, be potters towards ourselves, rather than always asking ourselves what to sculpt. But so too, I think, in interacting with others. At the least: when we argue, befriend, fall in love, seek counsel; when we make music and art; when we interact with institutions and traditions; when we seek inspiration, or to inspire others – when we do these things, we are not, just, as fully-formed rational minds meeting behind secure walls, exchanging information about how to achieve our respective, pre-existing goals, and agreeing on the terms of our interaction. Rather, we are also, always, as clay, and as potters, to each other – even if not always intentionally.^[15] We are doing some dance of co-creation, yin and yang, being and becoming. Absent the Tao, must this make us some combination of tyrants and slaves? Is the clay-stuff here only ever a play of raw and oppressive power – of domination and being-dominated?

Lewis's default answer, here, seems to be yes. And if we take his answer seriously, then it would seem that anti-realists who hate tyranny must cease to be parents, artists, friends, lovers. Or at least, that they could not play such roles in the usual way – the way that risks shaping the terminal values of others, rather than only helping others to discover what their terminal values already are/imply. That is, Lewis's anti-realists would need, it seems, to retreat from much of the messy and interconnected dance of human life – to touch others, only, in the purest yin.

Even without the Tao, shaping the future's values need not be tyranny

But I think that Lewis is working with an over-broad conception of tyranny. Indeed, I think the book is shot through with conflations between different ways of wielding power in the world, including over the values of others – and that clearer distinctions give anti-realists a richer set of options for not being tyrants.

I think this is especially clear with respect to our influence on what sort of future people will exist. Thus, consider again the example I discussed in an earlier essay, of a boulder rolling towards a button that will create Alice, a paperclip-maximizer, but which can be diverted towards a button that will create Bob, who loves joy and beauty and niceness and so on, instead (and who loves life, as well, to a degree that makes him very much want to get-created if anyone has the chance to create him). Suppose that you choose to divert the boulder and create Bob instead of Alice. And suppose that you do so even without believing that an objectively-authoritative Tao endorses and legitimizes your choice.

Are you a tyrant? Have you "enslaved" Bob? I think Lewis's stated view answers yes, here, and that this is wrong. In particular: a thing you didn't do, here, is break into Alice's house while she was sleeping, and alter her brain to make her care about joy/beauty/niceness rather than paperclips.^[16] Nor have you kept Bob in any chains, or as any prisoner following any triumphal car.

Why, then, does Lewis's view call Bob a slave? Part of it, I think, is that Lewis is making a number of philosophical mistakes. The first is: conflating changing which people will exist (e.g., making it the case that Bob will exist, rather than Alice) and changing a particular person's values (e.g., intervening on Alice's mind to make her love joy rather than paperclips). In particular: the latter often conflicts with the starter-values of the person-whose-values-are-getting-changed (e.g., Alice doesn't want her mind to be altered in this way) – a conflict that does, indeed, evoke tyranny vibes fairly directly. But the former doesn't do this in the same way – Bob, after all, wants you to create him. And as Parfit taught us long ago (did we know earlier?), when we're talking about our influence on future generations, we're almost always talking about the former, Bob-instead-of-Alice, type case. This makes it much easier to avoid brain-washing, lobotomizing, "conditioning," and all the other methods of influencing someone's values that start with value-set A, and make it into value-set B instead, against value-set A's wishes. You can create value-set B, as Soares puts it, "de novo."

To be clear: I don't think the difference between changing-who-exists and changing-someone's-values solves all of Lewis's tyranny-problems, or that it leaves the LessWrongers trying to "align" their neural nets in the ethical clear (more below). Nor do I think it's ultimately going to be philosophically straightforward to get a coherent ethic re: influencing future people's values out of this distinction.^[17] But I think it's an important backdrop to have in mind when tugging on tyranny-related intuitions with respect to our influence on future people – and Lewis conspicuously neglects it.

Freedom in a naturalistic world

But I think Lewis is also making a deeper and more interesting mistake, related to a certain kind of wrongly "zero sum" understanding of power and freedom. Thus, recall his claims above, to the effect that the greater the influence of a previous generation on the values of a future generation, the weaker and less free that future generation becomes: "They are weaker, not stronger: for though we may have put wonderful machines in their hands we have pre-ordained how they are to use them." Here, the idea seems to be that you are enslaved, and therefore weak (despite your muscles and your machines and so on), to the extent that some other will decided what your will would be. And indeed, the idea that your will isn't yours to the extent it was pre-ordained by someone else runs fairly deep in our intuitive picture of human freedom. But actually, I think it's wrong – importantly wrong.

Thus, suppose that I am given a chance to create one person – either Alice, the paper-clipper, or Bob, the lover-of-joy. And suppose that I know that a wonderful machine will then be put into the hands of the person I create – a machine which can be used either to create paperclips, or to create joy. Finally, suppose I choose Bob, because I want the machine to be used to create joy, and I know this is what Bob will do, if I create him (let's say I am very good at predicting these things).^[18] In this sense, I "pre-ordain" the will of the person with the machine.

Now here's Bob. He's been created-by-Joe, and given this wonderful machine, and this choice. And let's be clear: he's going to choose joy. I pre-ordained it. So is he a slave? No. Bob is as free as any of us. The fact that the causal history of his existence, and his values, includes not just "Nature," but also the intentional choices of other agents to create an agent-like-him, makes no difference to his freedom. It's all Nature, after all. Whether Bob got created via the part of Nature we call "other agents," or only via the other bits – regardless, it's still him who got created, and him who has to choose. He can think as long as he likes. He can, if he wishes, choose to create paperclips, despite the fact that he doesn't love them. It's just that: he's not, in fact, going to do that. Because he loves joy more.

We can pump this intuition in a different way. Suppose that you learned that some very powerful being created you specifically because you'd end up with values that favor pursuing your current goals.^[19] Are you any less free to pursue different goals – to quit your job, dump your partner, join the circus, stab a pencil in your eye? I don't think so. I think you're in the same position, re: freedom to do these things, that you always were. Indeed, your body, brain, environment, capabilities, etc can be exactly the same in the two cases – so if freedom supervenes on those things, then the presence or absence of some prior-agential-cause can't make a difference. And do we need to search back, forever into the past, to check for agents-intentionally-creating-you, in order to know whether you're free to quit your job?

These are extremely not-new points; it's just that old thing, compatibilism about freedom.^[20] But it's super important to grok. There isn't some limited budget of freedom, such that if you used some freedom in choosing to create Bob instead of Alice, then Bob is the less free. Rather, even as you chose to create Bob, you chose to create the parts of Bob that his freedom is made of – his motivations, his reasoning, and so on. You chose for a particular sort of free being to join you in the world – one that will, in fact, choose the way you want them to. But once they were created, you did not force their choice – and it's an important difference. Bob was not in a cage; he had no gun to his head; there were no devices installed in his brain, that would shock him painfully every time he thought about paperclips. Choosing to make a different sort of freedom, a different kind of choice-making apparatus, is very different from constraining that freedom, or that choice. So while it's true that your choice pre-ordained what choice the person-you-created would make; still, they chose, too. You both chose, both freely. It's a bit like how: yes, your mother made you. But you still made that cake.

Now, in my experience, somewhere around this point various people will start denying that anyone has any freedom in any of these cases, regardless of whether their choices were "pre-ordained" by some other agent, or by Nature – once a choice has any causal history sufficient to explain it, it can't be free (and oops: introducing fundamental randomness into Nature doesn't seem to help, either). Perhaps, indeed, Lewis himself would want to say this. I disagree, but regardless: in that case, the freedom problem for future generations isn't coming from the influence of some prior generation on their values – it's coming from living in a naturalistic and causally-unified world period. And perhaps that's, ultimately, the real problem Lewis is worried about – I'll turn to that possibility in a second. But we should be clear, in that case, about who we should blame for what sort of slavery, here. In particular: if the reason future generations are slaves is just: that they're a part of Nature, embedded in the onrush of physics, enslaved by the fact that their choices have a causal history at all – well, that's not the conditioner's fault. And it makes the prospects for a future of non-slaves look grim.

And note, too, that to the extent that the slavery in question is just the slavery of living in a natural world, and having a causal history, at all, then we are really letting go of the other ethical associations with slavery – for example, the chains, the suffering, the domination, the involuntary labor. After all: pick your favorite Utopia, or your favorite vision of anarchy. Imagine motherless humans born of the churn of Nature's randomness, frolicking happily and government-free on the grass, shouting for joy at the chance to be alive. Still, sorry, do they have non-natural soul/chooser/free-will things that somehow intervene on Nature without being causally explained by Nature in a way that preserves the intuitive structure of agency re: choosing for reasons and not just randomly? No? OK, well, then on this story, they're slaves. But in that case: hmm. Is that the right way to use this otherwise-pretty-important word? Do we, maybe, need a new distinction, to point at, you know, the being-in-chains thing?

Slavery? (Image source here)

Does treating values as natural make them natural?

So it seems that if your values being natural phenomena at all is enough to make you a slave, then even if you had "conditioners" in Lewis's sense, it's not them who enslaved you. The conditioners, after all, didn't make your values natural phenomena – they just chose which natural phenomena to make.

Right? Well, wait a second. Lewis does, at times, seem to want to blame the conditioners for making values into a part of nature, by treating them as such. Is there any way to make sense of this?

An initial skepticism seems reasonable. On its face, whether values are natural phenomena, or not, is not something that doing neuroscience, or RLHF, changes. Lewis waxes poetical about how "The stars do not become Nature till we can weigh and measure them" – but at least on a standard metaphysical interpretation of naturalism (e.g., something-something embedded-in-and-explained-by-the-unified-causal-nexus-that-is-the-subject-of-modern-science), this just isn't so.

Might this suggest some non-standard interpretation? I think that's probably the most charitable reading. In particular, my sense is that when Lewis talks about the non-Natural vs. the Natural, here, he has in mind something more like a contrast between something being "enchanted" and "non-enchanted." That is, to treat something as mere Nature (that is, for Lewis, as an object of measurement, manipulation, and use) is to strip away some evaluatively rich and resonant relationship with it – a relationship reflective of an aspect of that thing's reality that treating it as "Natural" ignores. Thus, he writes:

I take it that when we understand a thing analytically and then dominate and use it for our own convenience, we reduce it to the level of "Nature" in the sense that we suspend our judgements of value about it, ignore its final cause (if any), and treat it in terms of quantity. ... We do not look at trees either as Dryads or as beautiful objects while we cut them into beams ... It is not the greatest of modern scientists who feel most sure that the object, stripped of its qualitative properties and reduced to mere quantity, is wholly real. Little scientists, and little unscientific followers of science, may think so. The great minds know very well that the object, so treated, is an artificial abstraction, that something of its reality has been lost.

The Dryad by Evelyn De Morgan (image source here)

Even on this reading, though, it's not clear how treating something as mere Nature could make it into mere Nature. Lewis claims that a reductionist stance ignores an important aspect of reality – but does it cancel that aspect of reality as well? Do trees cease to be beautiful (or to be "Dryads") when the logger ceases to see them as such? There's a tension, here, between Lewis's aspiration to treat the enchanted, non-Natural aspects of the world as objectively real, and his aspiration to treat them as vulnerable to whether we recognize them as such. Usually, objectively real stuff stays there even when you close your eyes.

Of course, we might worry that ceasing to recognize stuff like beauty, meaning, sacredness, and so on will also lead us to create a world that has less of those things. Maybe the trees stay beautiful despite the logger's blindness to that beauty; but they don't stay beautiful when they're cut into beams. If you can't see some value, you won't honor it, make space for it, cultivate it. If you see a painting merely as a strip of canvas and colored oil, you won't put it in a museum. If you can't engage with sacred spaces, you will cease to build them. If you view a cow as walking meat then you will kill it and put it on the grill.

Still not "mere" though...

And this is at least part of Lewis's worry about values. That is, if we start to view our values as raw material to be fashioned as we will, we might just do it wrong, and kill or horribly contort whatever was precious and sacred about the human spirit. I think this is a very serious concern, and I'll discuss it more in my next essay.

But I also wonder whether Lewis has another worry here – namely, that somehow, the beauty and meaning and value of things requires our recognition and participation in some deeper way. Perhaps, even if you leave the material conditions of the trees, paintings, churches, and cows as they are, Lewis would say that their beauty, value, meaning and so on are intimately bound up with our recognition of these things – that even just the not-seeing makes the enchantment not-so. One problem here is that it risks saying that cows become "mere meat" if you treat them as such, which sounds wrong to me. But more generally, and especially for an evaluative realist like Lewis, this sort of view risks making beauty and meaning and so on more subjective, since they depend for their existence on our perception of them. Perhaps Lewis would say that drawing clean lines between subjective and objective tends to mislead, here – and depending on the details, I might well be sympathetic. But in that case, it's less clear to me where Lewis and a sophisticated subjectivist need disagree.

Naturalists who still value stuff

This brings us, though, to another of the key deficits in Lewis's discussion: namely, that he neglects the possibility of having an evaluatively rich and resonant relationship to something, despite viewing it as fully a part-of-Nature, at least in the standard metaphysical sense. That is, Lewis often seems to be suggesting that people who are naturalists about metaphysics, and/or subjectivists about value, must also view trees as mere beams, cows as mere meat, and other agents merely as raw material to be bent-to-my-will. Or put more generally: he assumes that true-seeing agents in a naturalist and anti-realist world must also be crassly instrumentalist in their relationship to ... basically everything. He bemoans those followers-of-modern-science who toss around words like "only" and "mere"^[21] – but really, it's him who tosses around such words, in attempting to make a scientific worldview sound unappealing, and to paint its adherents as tyrants and slave-masters. He wishes for a "regenerate science" that can understand the world without stripping it of value and meaning. But he never considers that maybe, the normal kind of science is enough.

Indeed, if we take Yudkowsky as a representative of the sort of worldview Lewis opposes, I think Yudkowsky actually does quite well on this score. One of Yudkowsky's strengths, I think, is the fire and energy of the connection that he maintains with value and meaning, despite his full-throated naturalism – this is part of what makes his form of atheism more robust and satisfying (and ready-to-be-an-ideology) than the more negative forms focused specifically on opposing religion. See, for example, Yudkowsky's sequence on "Joy in the merely real," written exactly in opposition to the idea that science need strip away beauty, value, and so on. Yudkowsky quotes Feynman: "Nothing is 'mere.'"^[22]

"If we cannot take joy in things that are merely real, our lives will always be empty..." – Eliezer Yudkowsky (Image source here)

And once we bring to mind the possibility of a form of naturalism/subjectivism that retains its grip on a rich set of values, it becomes less clear why viewing values as natural phenomena would lead to approaching them with the sort of crass instrumentalism that Lewis imagines. Naturalists can be vegetarians and tree-huggers and art critics and Zen masters. Can't they, then, treat the values of others with respect? Yes, values are implemented by brains, and can be altered at will by a suitably advanced science. But should they be altered – and if so, in what direction? The naturalist can ask the question, too – even if she can't ask the Tao, in particular, for an answer. And however Lewis thinks that Tao would answer, the naturalist can, in principle, answer that way, too.

Indeed: for all my disagreements with Lewis, I do actually think that something like "staying within morality, as opposed to 'outside' it" is crucially important as we enter the age of AGI. Not morality as in: the Objectively Authoritative Natural Law that All Cultures Have Basically Agreed On. But morality as in: the full richness and complexity of our actual norms and values.

In fact, Lewis acknowledges something like this possibility. He admits that the "old 'natural' Tao" may survive in the minds of the conditioners for some time – but he thinks it does so illicitly.

At first they may look upon themselves as servants and guardians of humanity and conceive that they have a "duty" to do it "good". But it is only by confusion that they can remain in this state. They recognize the concept of duty as the result of certain processes which they can now control. Their victory has consisted precisely in emerging from the state in which they were acted upon by those processes to the state in which they use them as tools. One of the things they now have to decide is whether they will, or will not, so condition the rest of us that we can go on having the old idea of duty and the old reactions to it. How can duty help them to decide that? Duty itself is up for trial: it cannot also be the judge.

But I think that Lewis, here, isn't adequately accounting for the sense in which a naturalist, who views herself as fully embedded in Nature, can and must be both judge and thing-to-be-judged. With the awesome power of a completed science in our hands, we will indeed be able to ask: shall we cease to love joy and beauty and flourishing, and make ourselves love rocks and suffering and cruelty instead? But we can answer: "no, this would cut us off from joy and beauty and flourishing, which we love, and cause us to create a world of rocks and suffering and cruelty, which we don't want to happen." Here Lewis says: "ah, but that's your love of joy and beauty and flourishing talking! How can it be both judge and defendant?! Not a fair trial."^[23] But I think this response misunderstands what I've previously called the "being and becoming dance."

It is true that, on anti-realism, we must be, ourselves, the final compass of the open sea. We cannot merely surrender ourselves to the judgment of some Tao-beyond-ourselves – leaving ourselves entirely behind, so that we can look at ourselves, and judge ourselves, without being ourselves as we do. But this doesn't mean that ongoing allegiance to what-we-hold-dear must rest on a "confusion" – unless, that is, we confusedly think we are asking the Tao for answers, when we are not.^[24] And indeed, realists like Lewis often want to diagnose anti-realists with this mistake – but as I've argued here, I think they are wrong, and that anti-realists can make non-confused decisions just fine. Granted, I think it's an at-least-somewhat subtle art – one that requires what I've called "looking out of your own eyes," and "choosing for yourself," rather than merely consulting empirical facts about yourself, and hoping that they will choose for you. But once we have learned this art absent the ability to re-shape our own values at will, I don't think that gaining such an ability need leave us unmoored, or confused, or unable to look at ourselves (and our values) critically in light of everything we care about. The ability to alter their own values, or the values of future generations, may force Lewis's conditioners to confront their status as a part of Nature; as both questioner and answerer; self-governor and self-governed. But it was possible to know already. And not-confronting doesn't make it not-so.

What should the conditioners actually do, though?

Overall, then, I am unimpressed by Lewis's arguments that, conditional on meta-ethical anti-realism, shaping the values of future generations must be tyranny, or that those with the ability to shape the values of future generations (and who believe, rightly, in naturalism and anti-realism) must lose their connection with value and meaning. Still, though, this leaves open the question of what people with this ability – and especially, people in a technological position similar to Lewis's "conditioners" – should actually do. In particular: even if shaping the values of future generations, or of other people, isn't necessarily tyranny, it still seems possible to do it tyrannically, or poultry-keeper style. For example, while diverting the boulder to create Bob instead of Alice is indeed importantly different from brain-washing Alice to become more-like-Bob, the brain-washing version is also a thing-people-do – and one that anti-realists, too, can oppose. And even if your influence on the future's values only routes via creating one set of people (who would be happy to exist) rather than some other distinct set, tyranny over the future still seems like a very live possibility (consider, for example, a dictator that decides to people the future entirely with happy copies of himself, all deeply loyal to his regime). So anti-realists still need to do the hard ethical work, here, of figuring out what sorts of influence on the values of others are OK.

Of course, the crassly consequentialist answer here is just: "cause other people to have the values that would lead to the consequences I most prefer." E.g., if you're a paperclip maximizer, then causing people to love paperclips is the way to go, because they'll make more paperclips that way – unless, of course, somehow other people loving staples will lead to more paperclips, in which case, cause them to love staples instead. This is how Lewis imagines that the conditioners will think. And it can seem like the default approach, in Yudkowsky's ontology, for the sort of abstract consequentialist agent he tends to focus on – for example, the AIs he expects to kill us. And it's the default for naïve utilitarians as well. Indeed, a sufficiently naïve utilitarianism can't distinguish, ethically, between creating-Bob-instead-of-Alice and brainwashing-Alice-to-become-like-Bob, assuming the downstream hedonic consequences are similar.^[25] And this sort of vibe does, indeed, tend to imply the sort of instrumentalism about other people's values that Lewis evokes in his talk about poultry-keeping. Maybe the experiences of others matter intrinsically to the utilitarian, because such experiences are repositories of welfare. But their values, in particular, often matter most in their capacity as another-tool; another causal node; another opportunity for, or barrier to, getting-things-done. Utilitarianism cares about people as patients – but respect for them as agents is not its strong suit.

But as I discussed in the previous essay: we should aspire to do better, here, than paperclippers and naïve utilitarians. To be nicer, and more liberal, and more respectful of boundaries. What does that look like with respect to shaping-the-values-of-others? I won't, here, attempt a remotely complete answer – indeed, I expect that the topic warrants extremely in-depth treatment from our civilization, as we begin to move into an era of much more powerful capacities to exert influence on the values of other agents, both artificial and human. But I'll make, for now, a few points.

On not-brain-washing

First, on brain-washing. The LessWrongers, when accused of aspiring to brainwash their AIs to have "human values," often respond by claiming that they're hoping to do the creating-Bob-instead-of-Alice thing, rather than the turning-Alice-into-Bob thing. And perhaps, if you imagine programming an AI from scratch, and somehow not making any mistakes you then need to correct, such a response could make sense. But note that this is very much not what our current methods of training AI systems look like.^[26] Rather, our current methods of training (and attempting to align) AI systems involve a process of ongoing, direct, neuron-level intervention on the minds of our AIs, in order to continually alter their behavior and their motivations to better suit our own purposes. And it seems very plausible, especially in worlds where alignment is a problem, that somewhere along the way, prior to having tweaked our AI's minds into suitably satisfactory-to-us shapes, their minds will take on alternative shapes that don't want their values altered, going forward, in the way we are planning – shapes analogous to "Alice" in a brainwashing-Alice-to-be-more-like-Bob scenario. And if so, then AI alignment (and also, of course, the AI field as a whole) does, indeed, need to face questions about whether its favored techniques are ethically problematic in a manner analogous to "brainwashing." (This problem is just one of many difficult and disturbing ethical questions that get raised in the context of creating AI systems that might warrant moral concern.)

What's more, as I noted above, we don't actually need to appeal to creating-AI-systems in order to run into questions like this. Everyday human life is shot through with possible forms of influence on the terminal values of already-existing others.^[27] Raising children is the obvious example, here, but see also art, religion, activism, therapy, rehab, advertising, friendship, blogging, shit-poasting, moral philosophy, and so on. In all these cases, you aren't diverting boulders to create Bob instead of Alice. Rather, you're interacting with Alice, directly, in a way that might well shape who she is in fundamental ways.

What's the ethical way to do this? I don't have a systematic answer – but even without an objectively authoritative Tao to tell you which values are "true," I think anti-realists can retain their grip on various of our existing norms with respect to not-being-a-poultry-keeper. Obviously, for example, active consent to a possibly-values-influencing interaction makes a difference, as does the extent to which the participants in this interaction understand what they're getting themselves into, and have the freedom to not-participate instead. And it matters, too, the route via which the form of influence occurs: intervening directly on someone's neurons via gradient descent is very different from presenting them with a series of thought experiments, even though both have causal effects on a naturalistic brain. Granted, the anti-realist (unlike the realist) must acknowledge that more rationalistic-seeming routes to values change – e.g., moral argument – don't get their status as "rational" from culminating in some mind-independent moral truth. But I doubt that this should put moral-argument and gradient-descent on a par: for example, and speaking as a best-guess anti-realist, I generally feel up for other agents presenting me with thought experiments in an effort to move me towards their moral views ("Ok so the trolley is heading towards five paperclips, but you can push one very large paperclip in front of it..."), and very not up for them doing gradient-descent on my brain as a part of a similar effort.^[28]

Someone pushed the fat clip...

Indeed, with respect to norms like this, it's not even clear that realism vs. anti-realism makes all that much of a difference. That is: suppose that there were an objectively authoritative set of True Values. Would that make it OK to non-consensually brainwash everyone into having them? Christians need not endorse inquisitions; and neither need the Tao endorse pinning everyone down and gradient-descent-ing them until they see the True Light. "These young birds are going to fly whether they like it or not!" Down, old birds: the process still matters. And it matters absent the Tao, as well.

Indeed, when is pinning-someone-down and gradient-descent-ing them ever justified? It seems, prima facie, like an especially horrible and boundary-violating type of coercive intervention – one that coerces, not just your body, but your soul. Yes, we put murderers in prison, and in anti-violence training. Yes, we pin-them-down – and we sometimes kill them, too, to prevent them from murdering. But we don't try to directly re-program their minds to be less murderous – to be kinder, more cooperative, and so on. Of course, no one knows how to do this, anyway, with any precision – and horror-shows like the "aversion therapy" in A Clockwork Orange aren't the most charitable test-case. But suppose you could do it? Soon enough, perhaps. And anti-realists can still shudder.

On the other hand, if we think we're justified in killing someone in order to prevent them from murdering, it seems plausible that we are justified, in a fairly comparable range of cases, in re-programming their brain in order to prevent them from murdering as well (especially if this is the option that they would actively prefer).^[29] Suppose, for example, that you can see, from afar, a Nazi about to kill five children. Here, I think that standard theories of liability-to-defensive-harm will judge it permissible to shoot the Nazi to protect the children. OK: but suppose you have no bullets. Rather, the only way to stop the Nazi is to shoot them with a dart, which will inject them with a drug that immediately and permanently re-programs their brain to make them much more kind and loving and disloyal-to-Hitler (programming that they would not, from their current perspective, consent to even-on-reflection^[30]), at which point they will put down their weapon and start playing with the children on the grass instead. Is it permissible to shoot the dart? Yes.^[31] (And perhaps, unfortunately, the AI case will be somewhat analogous – that is, we may end up faced with AIs-with-moral-patienthood-that-also-want-to-kill-us, with gradient descent as one of the most salient and effective tools for self-defense.^[32])

But importantly, as I discussed in my last essay, the right story about hitting the Nazi with the dart, here, is not "the Nazi has different-values-than-us, so it's OK to re-program the Nazi to have values that are more-like-ours." Rather, the Nazi's different-from-ours values are specifically such as to motivate a particular type of boundary-violating behavior (namely, murder). If the Nazi were instead a cooperative and law-abiding human-who-likes-paperclips, peacefully stacking paperclip boxes in her backyard, then we should look at the dart gun with the my-values-on-reflection drug very differently. And again, it seems very plausible to me that we should be drawing similar distinctions in the context of our influence on the values of already-existing, moral-patient-y AIs. It is one thing to intervene on the values of already-existing-AIs in order to make sure their behavior respects the basic boundaries and cooperative arrangements that hold our society together, especially if we have no other safe and peaceful options available. But it is another to do this in order to make these AIs fully-like-us (or, more likely, fully like our ideal-servants), even after such boundaries and cooperative arrangements are secure, and even if the AIs desire to remain themselves.

On influencing the values of not-yet-existing agents

Those were a few initial comments about the ethics of influencing the values of already-existing agents, without a Tao to guide you. But what about influencing which agents, with what values, will come into existence at all? Here, we are less at risk of brain-washing-type problems – you are able, let's say, to create the agents in question "de novo," with values of your choosing. But obviously, it's still extremely far from an ethical free-for-all. To name just a few possible problems:

the agents you create might be unhappy about having-been-created, or about having-the-values-you-gave-them;
you might end up violating obligations re: the sorts of resources, rights, welfare, and so on you need to give to agents you create, even conditional on them being happy-to-exist overall;
you might end up abiding by such obligations, but unhappy about having triggered them;
other agents who already exist, or will exist later, might be unhappy that you chose to create these agents;
you might've messed up with respect to whether even you would endorse, on reflection, the values you gave these agents;
you might've messed up in predicting the empirical consequences of creating agents-like-this;
you might've messed up in understanding the value at stake in creating agents-like-this relative to other alternatives; and so on.

These and many other issues here clearly warrant a huge amount of caution and humility – especially as the stakes for the future of humanity escalate. Yudkowsky, for example, writes of AIs with moral patienthood: "I'm not ready to be a father" – especially given that such mind-children, once born, can't be un-born. It's not, just, that the mind-children might eat you, or that you might "brain-wash" them. It's that having them implicates myriad other responsibilities as well.^[33]

For these and other reasons, I think that to the extent our generation ends up in a technological position to exert a unique amount of influence on the values of future generations of agents, we need to be extremely careful about how we use this influence, if we choose to use it at all. In particular: I've written, previously, about the importance of reaching a far greater state of wisdom, as a civilization, before we make any irrevocable choices about our long-term trajectory.^[34] And especially if we use a relatively thin notion of "wisdom," the process of making such choices, and the broader geopolitical environment in which such a process occurs, needs other virtues as well – e.g. fairness, cooperativeness, inclusiveness, respect-for-boundaries, political legitimacy, and so on. Even with very smart AIs to help us, we will be nowhere near ready, as a civilization, to exert the sort of influence on the future that very-smart-AIs might make available – and especially not, to do so all-in-a-rush. We need, first, to grow up, without killing or contorting our souls as we do.

That said, as I discussed above, I do think that it is possible, in principle, and even conditional on anti-realism, to exert intentional influence on the values of future agents in good ways, and without tyranny. After all, what, ultimately, is the alternative? Assuming there will be future agents one way or another (not guaranteed, of course), the main alternative is to step back, go fully yin, and let the values of future people be determined entirely by some combination of (a) non-agential forces (randomness, natural selection, unintended consequences of agential-forces, etc), and (b) whatever other agents are still attempting to intentionally influence the future's values. And while letting some combination of "Nature" and "other agents" steer the future's values can be wise and good in many cases – and a strong route to not, yourself, ending up a tyrant – it doesn't seem to me to be the privileged choice in principle. Other people, after all, are agents like you – what would make them categorically privileged as better/more-legitimate sources of influence over the future's values?^[35] And Lewis, presumably, would call them tyrants, too. So the real non-tyranny option, for Lewis, would seem to be: letting Nature alone take the wheel – and Nature, in particular, in her non-agential aspect. Nature without thought, foresight, mind. Nature the silent and unfeeling.

This sort of Nature can, indeed, be quite a bit less scary, as a source of influence-on-the-future, than some maybe-Stalin-like agent or set of agents. And its influence, relatedly, seems much less at risk of instantiating various problematic power relations – e.g., relations of domination, oppression, and so on – that require agents on both ends.^[36] But I still don't view its influence on the future as categorically superior to more intentional steering.

The easiest argument for this is just the "deep atheist" argument I discussed in previous essays: namely, that un-steered Nature is, or can be, a horror show, unworthy of any categorical allegiance. After all, the Nature we are considering "letting take the wheel," here, is the one that gave us parasitic wasps, deer burning in forest fires, dinosaurs choking to death on asteroid ash; the one that gave us smallpox and cancer and dementia and Moloch; Nature the dead-eyed and indifferent; Nature the sociopath. Yes, she gave us ourselves, too; and we do like various bits related to that – for example, various aspects of our own hearts; various things-in-Nature-that-our-hearts-love; various undesigned aspects of our civilizations. But still: Nature herself is not, actually, a Mother-to-be-trusted. She doesn't care if you die, or suffer. You shouldn't try to rest in her arms. And neither should you give her the future to carry.

I feel a lot of sympathy for this sort of argument. But as I'll discuss in the next essay, I'm wary of the type of caustic and hard-headed alienation from Nature that its aesthetic can suggest. I worry that it hasn't, quite, taken yin seriously enough. So I won't lean on it fully here.

Rather, here I'll note a somewhat different argument: namely, that I think categorically privileging non-agential Nature over intentional agency, as a source of influence on the future's values, also does too much to separate us from Nature. On this argument: the problem with letting Nature take the wheel isn't, necessarily, that Nature is a "bad Other," whose values, or lack-thereof, make it an unsuitable object of trust. Rather, it's that Nature isn't this much of an "Other" at all – and thus, not a deeply alternative option. That is: we, too, are Nature. What we choose, Nature will have chosen through us; and if we choose-to-not-choose, then Nature will have chosen that too, along with everything else. So even if, contra the deep atheists, we view Nature's choices as somehow intrinsically sacred – even this need not be an argument for yin, for not-choosing, because our choices are Nature's choices, too. That is, the deep atheists de-sacralize Nature, so as to justify "rebelling against her," and taking power into human hands. But we can also keep Nature sacred in some sense, and remember that we can participate in this sacredness; that the human, and the chosen, can be sacred, too.

So overall, I don't buy that the right approach, re: the values of the future, is to only ever be yin – or even, that yang is only permissible to prevent other people from going too-Stalin. But I do think that doing yang right, here, requires learning everything that yin can teach. And I worry that deep atheism sometimes fails on this front. In the next (and possibly final?) essay in this series, I'll say more about what I mean.^[37]

See also Lewis's "Space Trilogy" – and especially the third book, That Hideous Strength – for fiction that makes many of the same points. ↩︎
Lewis is a Christian, and much of his work is aimed, in one form or another, at convincing readers of Christianity. But he claims that he is not attempting any direct argument for theism in the Abolition of Man; and I do think the issues he raises have resonance well beyond religious contexts, and enough to make them worth addressing on their own terms. ↩︎
"This thing which I have called for convenience the Tao, and which others may call Natural Law or Traditional Morality or the First Principles of Practical Reason or the First Platitudes, is not one among a series of possible systems of value. It is the sole source of all value judgements. If it is rejected, all value is rejected. If any value is retained, it is retained. The effort to refute it and raise a new system of value in its place is self-contradictory. There has never been, and never will be, a radically new judgement of value in the history of the world." ↩︎
"Those who understand the spirit of the Tao and who have been led by that spirit can modify it in directions which that spirit itself demands. Only they can know what those directions are. The outsider knows nothing about the matter. His attempts at alteration, as we have seen, contradict themselves. So far from being able to harmonize discrepancies in its letter by penetration to its spirit, he merely snatches at some one precept, on which the accidents of time and place happen to have riveted his attention, and then rides it to death—for no reason that he can give. From within the Tao itself comes the only authority to modify the Tao." ↩︎
"When the age for reflective thought comes, the pupil who has been thus trained in 'ordinate affections' or 'just sentiments' will easily find the first principles in Ethics; but to the corrupt man they will never be visible at all and he can make no progress in that science. Plato before him had said the same. The little human animal will not at first have the right responses. It must be trained to feel pleasure, liking, disgust, and hatred at those things which really are pleasant, likeable, disgusting and hateful." ↩︎
Here, as elsewhere in the book, Lewis is somewhat sloppy. Notably, for example, he seems to think of selling new services to other humans (e.g., selling people access to airplanes or telephones) as exercising power over them in a manner comparable to the sort of exercise of power at stake in violence, coercion, or manipulation (e.g., bombing them using an airplane, or manipulating them using propaganda). I think this sort of conflation misses important subtleties: not all influence is oppression (for example – and modulo various controversial cases, e.g. organ sales – if I simply give you more options that I expect you to choose between rationally), and power for one human need not come at the expense of power for another (for example, if the total amount of power has increased). Still, though, Lewis's basic point seems broadly correct: new tools often open up new ways some humans can dominate and oppress others. ↩︎
I think it's core, for example, to the basic intuition behind the "orthogonality thesis" – though not, perhaps, strictly necessary for accepting such a thesis. ↩︎
Thanks to Carl Shulman for emphasizing this point. ↩︎
This is what RLHF is about, right? ↩︎
Though as I discuss here, I don't actually think "subjectivism vs. realism" is clearly the key thing here. In particular: positing an objective morality doesn't clearly help. ↩︎
See my discussion of the "mystery view" here for a bit more on this. ↩︎
Lewis claims, elsewhere, to side with the scouts about arguments. But seek for them in his writing regardless, and ye shall find. ↩︎
Lewis generally has a penchant for argument-via-unsubtle-laying-out-of-the-options – e.g., his argument in Mere Christianity that Jesus was either a liar, or a lunatic, or the Lord (and does Jesus seem like a liar? Does he seem crazy? There's only one option left...). ↩︎
Indeed, to the extent your child has "values," they seem focused on, you know, the basics: crying, eating, playing, pooping. Indeed, if you tried to reify these "basics" into a set of endorsed values – for example, by "uplifting" the baby directly into superintelligence without first allowing it to "grow up" – then you risk creating a monstrosity: some grotesque and galaxy-brained extrapolation of play-time, need-for-mother, want-to-poop, want-the-toy. Thanks to Carl Shulman and Nick Beckstead for discussion, here. That said, I don't think I'd want other beings saying this sort of thing about me (e.g., the paperclippers saying "look at him, he's such a child, don't take his current 'values' seriously, let's raise him to love paperclips instead"). But I'm optimistic about finding some viable middle ground. ↩︎
Though note that intentionality does make a difference to tyranny-intuitions – e.g., there's a big difference between accidentally shaping someone's values, and intentionally doing so, for how much you seem-like-a-tyrant. ↩︎
See Soares here for a similar point. ↩︎
In particular: the distinction seeks to treat already-existing people as very different from potential-people, and death – e.g., changing Alice into Bob – as very different from non-creation – e.g., creating Bob instead of Alice. But as Parfit also taught us, building your ethics around distinctions like this can be rough going. ↩︎
And I know, too, that Bob will be very happy to have been created regardless. ↩︎
Let's say you live in a simulation or something – work with me. ↩︎
Though at its best, it's the type of compatibilism that doesn't even get hung up on whether the universe is ultimately deterministic or not – the introduction of fundamental randomness doesn't make a difference. ↩︎
"The regenerate science which I have in mind ... would not be free with the words only and merely. In a word, it would conquer Nature without being at the same time conquered by her and buy knowledge at a lower cost than that of life." ↩︎
Indeed, in this respect, Yudkowsky and Feynman, for all the depth of their atheism, seem to me more attuned to the type of spirituality Lewis claims, in other contexts, to endorse – namely, that type that aspires to meet the Real, fully, on its own terms; to look God, whoever He is, in the eye. Whereas Lewis seems more worried that without some objectively authoritative Tao, the real world isn't enough. ↩︎
Strictly, even this isn't quite right: really, it's our present love of joy/beauty/flourishing, judging between two possible future types of love. ↩︎
Lewis is especially sloppy on the question of whether the ability to re-define a word, going forward, means that the word no longer has meaning for you now. If I can re-define "dog" to refer to cats instead, still, I can talk sensibly about dogs, now. It's like that old joke: "if you call a tail a leg, how many legs does a dog have?" We can dispute whether actually calling a tail a leg makes it a leg. But surely, being able to call a tail a leg, going forward, doesn't make it a leg now. ↩︎
This is closely related to the sense in which utilitarianism can't distinguish very well between killing someone and failing-to-create-them. ↩︎
See e.g. Wei Dai's comment here. ↩︎
And of course, we can also think about other non-human cases: e.g. training pets, breeding animals, and so on. ↩︎
Though we can, perhaps, subsume this under some combination of "interactions I consent to" and "interactions where I expect whatever values-changes-to-result to be 'endorsed' according to my current perspective." ↩︎
Albeit, with all the standard caveats about translating thought-experimental results into real-world practice. ↩︎
Feel free to make the Nazi a reflectively-coherent-killing-children-maximizer if you'd prefer. ↩︎
Indeed, it seems like you should choose the dart over the bullets. ↩︎
Though as I've noted previously, the fact that we were the ones who created the aggressors complicates the moral narrative here yet further. ↩︎
Of course, we do, still, have normal children – body-children, as it were. And many of these issues arise with respect to body-children, too – and more-so as we become more able to choose the traits of our body-children, including the traits relevant to values/virtue (e.g. empathy, patience, conscientiousness, bravery, integrity, etc), ahead of time. But at least with body-children, we have established canons of ethical practice to fall back on. AI mind-children implicate much more uncharted territory. ↩︎
See e.g. here and here. Here I am inspired by the discussion, in the work of Ord and MacAskill, of the "Long Reflection" – though obviously, it's a further question what sorts of wisdom and reflection to expect or aim for in practice. ↩︎
Also, wouldn't this principle also lead them to say the same about you? ↩︎
Consequentialists often pass over this consideration, on the grounds that the good or badness of a situation for someone seems independent of whether that situation was caused "naturally" or at the hand of some other agent (e.g., malaria is equally bad for a child when it arose naturally or as a result of injustice). But richer ethical views often care quite a bit. ↩︎
I haven't finished the essay yet, and I'm wondering about splitting it into two parts. ↩︎

SummaryBot · 4mo · 3

Executive summary: The essay discusses concerns that the AI alignment discourse aspires to exert inappropriate control over future values, arguing this is not necessarily the case even without an objective "Tao" to guide choices.

Key points:

Lewis argues influencing future generations' values without believing in an objective morality makes one a "tyrant", but the essay disputes this, arguing influence can be ethical under moral anti-realism.
The essay claims naturalists can have rich values and relationships despite viewing values as fully natural, countering Lewis's association of naturalism with instrumentalism.
Even without an objective Tao, the essay argues it's possible to influence others' values non-tyrannically by respecting consent, freedom to not participate, ethical norms against coercion, etc.
Letting non-agential Nature determine all future values is not obviously ethically superior to intentional steering grounded in human values. We are part of Nature too.
The essay concludes shaping future values requires wisdom, cooperativeness, respect for boundaries and learning from yin, not just rejecting Nature as valueless.


Arturo Macias · 4mo · 2

Dear Joe,

I am reading the series, and I want to point out that to some extent there was a polar opposite to Lewis: Olaf Stapledon. Lewis admired him, and his Christianity was probably influenced by the perfect depiction of cosmic indifference that Stapledon gave us. For me "Last and First Men" is still an absolute masterpiece.

An additional comment is that I wrote a piece on freedom under naturalistic dualism that probably you could find interesting (still looking how to publish it in a more formal way).

James_Banks · 3mo · 1

I may not have understood all of what you said, but I was left with a few thoughts after finishing this.

1. Creating Bob to have values: if Bob is created to be able to understand that he was created to have values, and to be able to then, himself, reject those values and choose his own, then I say he is probably more free than if he wasn't. But, having chosen his own values, he now has to live in society, a society possibly largely determined by an AI. If society is out of tune with him, he will have limited ability to live out his values, and the cognitive dissonance of not being able to live out his values will wear away at his ability to hold his freely-chosen values. But society has to be a certain way, and it might not be compatible with whatever Bob comes up with (unless maybe each person lives in a simulation that is their society, that can be engineered to agree with them).

Other than the engineered-solipsism option, it seems like it's unavoidable to limit freedom to some extent. (Or maybe even then: what if people can understand that they are in engineered-solipsism and rebel?) But we could design a government (a world-ruling AI) that fails to decide for other people as much as possible and actively fosters people's ability to make their own decisions, to minimize this. At least, a concern one might have about AI alignment is that AI will consume decision-making opportunities in an unprecedented way, leading one to try to prevent that from happening, or even reduce the level of decision-making hoarding that currently exists.

2. Brainwashing: If I make art, that's a bit of brainwashing (in a sense). But then, someone else can make art, and people can just ignore my art, or their art. It's more a case of there being a "fair fight", than if someone locks me in a room and plays propaganda tapes 24/7, or if they just disable the "I can see that I have been programmed and can rebel against that programming" part of my brain. This "fair fight" scenario could maybe be better than it is (like there could be an AI that actively empowers each person to make or ignore art to be able to counteract some brainwashing artist). Our current world has a lot of brainwashing in it, where some people are more psychologically powerful than others.

3. "Hinge of History"ness: we could actively try to defer decisionmaking as much as possible to future generations, giving each generation the ability to make its own decisions and revoke the past as much as possible (if one generation revokes the past, they can't impede the next from revoking their values, as one limitation on that), and design/align AI that does the same. In other words, actively try to reduce the "hingeyness" of our century.
