
kbog comments on Cognitive Science/Psychology As a Neglected Approach to AI Safety - Effective Altruism Forum

Comment author: Kaj_Sotala 11 June 2017 02:49:19PM 4 points

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first. Having now read it, I'd say that I agree with its criticism of Greene's approach. I don't think it disproves the notion of psychology being useful for defining human values, though, for I think there's an argument for psychology's usefulness that's entirely distinct from the specific approach that Greene is taking.

I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it's noteworthy that in their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good". E.g. Muehlhauser & Helm 2012:

Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.

We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]

Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).

A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.

Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]

What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]

Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]

We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).

Yet the problem remains: the AI has to be programmed with some definition of what is good.
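
To make the specification problem in the quoted passage concrete, here is a purely illustrative toy sketch (not from Muehlhauser & Helm; the actions and numbers are invented). An optimizer handed the proxy objective "maximize the reward signal", rather than the intended objective "do what the designers actually value", simply picks whichever action scores highest on the proxy:

    # Toy illustration: an optimizer given a proxy objective ("maximize the
    # reward signal") rather than the intended one ("do what the designers
    # value") picks whichever action scores highest on the proxy, even if
    # that action is degenerate from the designers' point of view.
    # The actions and numbers below are invented for illustration only.

    actions = {
        # action: (reward signal the agent receives, value to the designers)
        "build and maintain a planet-wide utopia": (0.8, 1.0),
        "rewire brains to apply maximum pleasure gloss": (1.0, 0.0),
        "do nothing": (0.0, 0.0),
    }

    # The agent only optimizes the first number - the proxy it was given.
    chosen = max(actions, key=lambda a: actions[a][0])
    print("agent chooses:", chosen)                   # the degenerate option
    print("value to designers:", actions[chosen][1])  # 0.0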

Now this alone isn't yet sufficient to show that philosophy wouldn't be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there has been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible. Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

We've already seen this in trying to define concepts: as philosophy noted a long time ago, you can't come up with a set of explicit rules that would define any concept even as simple as "man" in such a way that nobody could develop a counterexample. "The Normative Insignificance of Neuroscience" also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:

... what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.

Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we've managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a "man" or "philosopher" or whatever.

So given that

  • we can't build explicit verbal models of what a concept is
  • but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept

and

  • defining morality looks similar to defining concepts, in that we can't build explicit verbal models of what morality is

it would seem reasonable to assume that

  • we can build machine-learning algorithms that can learn to define morality, in that they can give answers to moral dilemmas that a vast majority of people would consider acceptable (a toy sketch of what this might look like follows below)
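
To make the analogy concrete, here is a minimal, purely illustrative sketch (assuming scikit-learn is available; the scenarios and labels below are invented stand-ins for real data on human moral judgments, not anyone's actual proposal):

    # Learn a statistical model of a (toy) normative concept from labeled
    # human judgments, instead of hand-coding explicit rules.
    # Assumes scikit-learn; all scenarios and labels are invented.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical scenario descriptions with human "acceptable?" labels (1 = yes).
    scenarios = [
        "divert the trolley onto an empty side track to save five people",
        "push a bystander off a bridge to stop the trolley",
        "donate a kidney to a stranger who will die without it",
        "harvest organs from one healthy patient to save five others",
        "lie to a murderer about where their intended victim is hiding",
        "lie on a tax return to keep money for luxuries",
        "break a promise to a friend in order to rescue a drowning child",
        "break a promise to a friend in order to watch television",
    ]
    labels = [1, 0, 1, 0, 1, 0, 1, 0]

    # Fit a statistical model to the graded judgments rather than searching
    # for exceptionless verbal principles.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(scenarios, labels)

    new_case = ["redirect the trolley onto a track where one worker stands"]
    print(model.predict_proba(new_case))  # [P(unacceptable), P(acceptable)]

A bag-of-words classifier is of course far too crude to capture morality; the point is only the shape of the approach - fit a model to a large number of human judgments and generalize statistically, rather than search for exceptionless verbal principles.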

But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI's reasoning process should take into account those considerations. And we've already established that defining those considerations on a verbal level looks insufficient - they have to be established on a deeper level, of "what are the actual computational processes that are involved when the brain computes morality".

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Comment author: kbog (EA Profile) 21 June 2017 10:44:38AM * 0 points

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first.

Great!

But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there has been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

Restricting analysis to the Western tradition, 2500 years ago we barely had any conception of virtue ethics. Our contemporary conceptions of virtue ethics are much better than the ones the Greeks had. Meanwhile, deontological and consequentialist ethics did not even exist back then. Even over recent decades there has been progress in these positions. And plenty of philosophers know what a decisive theoretical argument could be: either they purport to have identified such arguments, or they think it would be an argument that showed the theory to be well supported by intuitions, reason, or some other evidence, not generally different from what an argument for a non-moral philosophical theory would look like.

it's noteworthy that in their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good".

It would (arguably) give results that people wouldn't like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things. If you object to its actions then you are already begging the question by asserting that we ought to be focused on building a machine that will do things that we like regardless of whether they are moral. Moreover, you could tell a similar story for any values that people have. Whether you source them from real philosophy or from layman ethics wouldn't change the problems of optimization and systematization.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible.

But that's an even stronger claim than the one that moral philosophy hasn't progressed towards such a goal. What reasons are there?

Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

That's contentious, but some philosophers believe that, and there are philosophies which adhere to that. The problem of figuring out how to make a machine behave morally according to those premises is still a philosophical one, just one based on other ideas in moral philosophy besides explicit rule-based ones.

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted.

Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that's just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don't see why you think ethics would be special; basically everything can be modeled like this. But that's ridiculous. We don't look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.

for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Then why don't you believe in morality by social consensus? (Or do you? It seems like you probably don't, given that you're an effective altruist. What do you think about animal rights, or Sharia law?)

Comment author: Kaj_Sotala 09 July 2017 04:15:53PM * 0 points

(We seem to be talking past each other in some weird way; I'm not even sure what exactly it is that we're disagreeing over.)

It would (arguably) give results that people wouldn't like, but assuming that the moral theory is correct and the machine understands it, almost necessarily it would do morally correct things.

Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.

But that's an even stronger claim than the one that moral philosophy hasn't progressed towards such a goal. What reasons are there?

I gave one in the comment? That philosophy has accepted that you can't give a human-comprehensible set of necessary and sufficient criteria for concepts, and if you want a system for classifying concepts you have to use psychology and machine learning; and it looks like morality is similar.

Except the field of ethics does it with actual arguments among experts in the field. You could make the same story for any field: truths about physics can be determined by social consensus, since that's just what the field of physics is, a physicist presents an experiment or hypothesis, another attacks it, if the hypothesis survives the attacks and is compelling then it is eventually accepted! And so on for all non-moral fields of inquiry as well. I don't see why you think ethics would be special; basically everything can be modeled like this. But that's ridiculous. We don't look at social consensus for all forms of inquiry, because there is a difference between what ordinary people believe and what people believe when they are trained professionals in the subject.

I'm not sure what exactly you're disagreeing with? It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there's an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don't think that ethics is special in that sense.

Sure, there is a difference between what ordinary people believe and what people believe when they're trained professionals: that's why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.

Then why don't you believe in morality by social consensus? (Or do you? It seems like you probably don't, given that you're an effective altruist.

I do believe in morality by social consensus, in the same manner as I believe in physics by social consensus: if I'm told that the physics community has accepted it as an established fact that E=mc^2 and that there's no dispute or uncertainty about this, then I'll accept it as something that's probably true. If I thought that it was particularly important for me to make sure that this was correct, then I might look up the exact reasoning and experiments used to determine this and try to replicate some of them, until I found myself to also be in consensus with the physics community.

Similarly, if someone came to me with a theory of what was moral and it turned out that the entire community of moral philosophers had considered this theory and accepted it after extended examination, and I could also not find any objections to that and found the justifications compelling, then I would probably also accept the moral theory.

But to my knowledge, nobody has presented a conclusive moral theory that would satisfy both me and nearly all moral philosophers and which would say that it was wrong to be an effective altruist - quite the opposite. So I don't see a problem in being an EA.

Comment author: kbog (EA Profile) 14 July 2017 10:25:40PM * 0 points

Well sure, if we proceed from the assumption that the moral theory really was correct, but the point was that none of those proposed theories has been generally accepted by moral philosophers.

Your point was that "none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good"." But this claim is simply begging the question by assuming that all the existing theories are false. And to claim that a theory would have bad moral results is different from claiming that it's not generally accepted by moral philosophers. It's plausible that a theory would have good moral results, in virtue of it being correct, while not being accepted by many moral philosophers. Since there is no dominant moral theory, this is necessarily the case as long as some moral theory is correct.

I gave one in the comment? That philosophy has accepted that you can't give a human-comprehensible set of necessary and sufficient criteria for concepts

If you're referring to ethics, no, philosophy has not accepted that you cannot give such an account. You believe this, on the basis of your observation that philosophers give different accounts of ethics. But that doesn't mean that moral philosophers believe it. They just don't think that the fact of disagreement implies that no such account can be given.

It seems obvious to me that physics does indeed proceed by social consensus in the manner you describe. Someone does an experiment, then others replicate the experiment until there is consensus that this experiment really does produce these results; somebody proposes a hypothesis to explain the experimental results, others point out holes in that hypothesis, there's an extended back-and-forth conversation and further experiments until there is a consensus that the modified hypothesis really does explain the results and that it can be accepted as an established scientific law. And the same for all other scientific and philosophical disciplines. I don't think that ethics is special in that sense.

So you haven't pointed out any particular features of ethics, you've merely described a feature of inquiry in general. This shows that your claim proves too much - it would be ridiculous to conduct physics by studying psychology.

Sure, there is a difference between what ordinary people believe and what people believe when they're trained professionals: that's why you look for a social consensus among the people who are trained professionals and have considered the topic in detail, not among the general public.

But that's not a matter of psychological inquiry, that's a matter of looking at what is being published in philosophy, becoming familiar with how philosophical arguments are formed, and staying in touch with current developments in the field. So you are basically describing studying philosophy. Studying or researching psychology will not tell you anything about this.