Comment author: Kaj_Sotala 08 July 2017 11:32:13PM 8 points [-]

I haven't found any instances of complete axiomatic descriptions of AI systems being used to mitigate problems in those systems (e.g. to predict, postdict, explain, or fix them) or to design those systems in a way that avoids problems they'd otherwise face. [...] It seems plausible that the kinds of axiomatic descriptions that HRAD work could produce would be too taxing to be usefully applied to any practical AI system.

I wonder if a slightly analogous example could be found in the design of concurrent systems.

As you may know, it's surprisingly difficult to design software that has multiple concurrent processes manipulating the same data. You typically either screw up by letting the processes edit the same data at the same time or in the wrong order, or by having them wait for each other forever.

So to help reason more clearly about this kind of thing, people developed different forms of temporal logic that let them express in a maximally unambiguous form different desiderata that they have for the system. Temporal logic lets you express statements that say things like "if a process wants to have access to some resource, it will eventually enter a state where it has access to that resource". You can then use temporal logic to figure out how exactly you want your system to behave, in order for it to do the things you want it to do and not run into any problems.
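As a toy illustration of what such properties look like (this is plain Python, not a real temporal-logic tool like TLA+ or SPIN, and the state encoding and property names are my own invention), here is a sketch of checking an "eventually" and a "leads-to" property over finite execution traces:

```python
# Illustrative sketch only: checking LTL-flavored properties over
# finite execution traces. A trace is a list of states; a state maps
# each process name to its current status.

def eventually(trace, predicate):
    """LTL-style 'F p': the predicate holds in some state of the trace."""
    return any(predicate(state) for state in trace)

def leads_to(trace, p, q):
    """LTL-style 'p ~> q': whenever p holds, q holds then or at some later state."""
    return all(
        any(q(later) for later in trace[i:])
        for i, state in enumerate(trace)
        if p(state)
    )

# A toy trace: process A requests a resource, then acquires it.
trace = [
    {"A": "idle"},
    {"A": "waiting"},   # A wants the resource
    {"A": "holding"},   # A eventually gets it
    {"A": "idle"},
]

wants = lambda s: s["A"] == "waiting"
has = lambda s: s["A"] == "holding"

print(eventually(trace, has))        # True: A holds the resource at some point
print(leads_to(trace, wants, has))   # True: every request is eventually granted

# A trace that violates the property (A starves, within this finite view):
starved = [{"A": "idle"}, {"A": "waiting"}, {"A": "waiting"}]
print(leads_to(starved, wants, has))  # False
```

Real model checkers verify such properties over all reachable states of a formal model rather than over sampled finite traces, but the flavor of the desiderata is the same.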

Building a logical model of how you want your system to behave is not the same thing as building the system. The logic only addresses one set of desiderata: there are many others it doesn't address at all, like what you want the UI to be like and how to make the system efficient in terms of memory and processor use. It's a model that you can use for a specific subset of your constraints, both for checking whether the finished system meets those constraints, and for building a system so that it's maximally easy for it to meet those constraints. Although the model is not a whole solution, having the model at hand before you start writing all the concurrency code is going to make things a lot easier for you than if you didn't have any clear idea of how you wanted the concurrent parts to work and were just winging it as you went.

So similarly, if MIRI developed HRAD into a sufficiently sophisticated form, it might yield a set of formal desiderata of how we want the AI to function, as well as an axiomatic model that can be applied to a part of the AI's design, to make sure everything goes as intended. But I would guess that it wouldn't really be a "complete axiomatic description" of the system, in the way that temporal logics aren't a complete axiomatic description of modern concurrent systems.

Comment author: PeterMcIntyre  (EA Profile) 22 June 2017 04:10:14PM *  1 point [-]

We agree these are technical problems, but for most people, all else being equal, it seems more useful to learn ML rather than cog sci/psych. Caveats: 1. Personal fit could dominate this equation though, so I'd be excited about people tackling AI safety from a variety of fields. 2. It's an equilibrium. The more people already attacking a problem using one toolkit, the more we should be sending people to learn other toolkits to attack it.

Comment author: Kaj_Sotala 22 June 2017 08:38:59PM 1 point [-]

it seems more useful to learn ML rather than cog sci/psych.

Got it. To clarify: if the question is framed as "should AI safety researchers learn ML, or should they learn cogsci/psych", then I agree that it seems better to learn ML.

Comment author: PeterMcIntyre  (EA Profile) 20 June 2017 06:58:24PM 1 point [-]

Hi Kaj,

Thanks for writing this. Since you mention some 80,000 Hours content, I thought I’d respond briefly with our perspective.

We had intended the career review and AI safety syllabus to be about what you’d need to do from a technical AI research perspective. I’ve added a note to clarify this.

We agree that there are a lot of approaches you could take to tackle AI risk, but currently expect that technical AI research will be where a large amount of the effort is required. However, we've also advised many people on non-technical routes to impacting AI safety, so don't think it's the only valid path by any means.

We’re planning on releasing other guides and paths for non-technical approaches, such as the AI safety policy career guide, which also recommends studying political science and public policy, law, and ethics, among others.

Comment author: Kaj_Sotala 20 June 2017 09:09:49PM *  2 points [-]

Hi Peter, thanks for the response!

Your comment seems to suggest that you don't think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn't make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?

I agree that the "understanding psychology may help persuade more people to work on/care about AI safety" and "analyzing human intelligences may suggest things about takeoff scenarios" points aren't related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.

Comment author: LanceSBush 12 June 2017 02:54:30PM 1 point [-]

Whoops. I can see how my responses didn't make my own position clear.

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

I consider it a likely futile effort to integrate important and substantive discussions into contemporary moral philosophy. If engaging with moral philosophy introduces unproductive digressions/confusions/misplaced priorities into the discussion it may do more harm than good.

I'm puzzled by this remark:

I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.

I view utilitronium as an end, not a means. It is a logical consequence of wanting to maximize aggregate utility and is more or less a logical entailment of my moral views. I favor the production of whatever physical state of affairs yields the highest aggregate utility. This is, by definition, "utilitronium." If I'm using the term in an unusual way I'm happy to propose a new label that conveys what I have in mind.

Comment author: Kaj_Sotala 16 June 2017 02:41:54PM *  0 points [-]

I am an anti-realist, and I think the prospects for identifying anything like moral truth are very low. I favor abandoning attempts to frame discussions of AI or pretty much anything else in terms of converging on or identifying moral truth.

Ah, okay. Well, in that case you can just read my original comment as an argument for why one would want to use psychology to design an AI that was capable of correctly figuring out just a single person's values and implementing them, as that's obviously a prerequisite for figuring out everybody's values. The stuff that I had about social consensus was just an argument aimed at moral realists, if you're not one then it's probably not relevant for you.

(my values would still say that we should try to take everyone's values into account, but that disagreement is distinct from the whole "is psychology useful for value learning" question)

I'm puzzled by this remark:

Sorry, my mistake - I confused utilitronium with hedonium.

Comment author: LanceSBush 12 June 2017 12:52:12AM 4 points [-]

It's certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?

Sure. That isn't my primary objection though. My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.

Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you'd be convinced that this solution really does satisfy all the things you care about - and all the things that most other people care about, too.

I want to convert all matter in the universe to utilitronium. Do you think it is likely that an AI that factored in the values of all humans would yield this as its solution? I do not. Since I think the expected utility of most other likely solutions, given what I suspect about other people's values, is far less than this, I would view almost any scenario other than imposing my values on everyone else to be a cosmic disaster.

Comment author: Kaj_Sotala 12 June 2017 06:57:32AM *  0 points [-]

My main objection is that even if we pursue this project, it does not achieve the heavy metaethical lifting you were alluding to earlier. It doesn’t demonstrate nor provide any particularly good reason to regard the outputs of this process as moral truth.

Well, what alternative would you propose? I don't see how it would even be possible to get any stronger evidence for the moral truth of a theory, than the failure of everyone to come up with convincing objections to it even after extended investigation. Nor do I see a strategy for testing the truth which wouldn't at some point reduce to "test X gives us reason to disagree with the theory".

I would understand your disagreement if you were a moral antirealist, but your comments seem to imply that you do believe that a moral truth exists and that it's possible to get information about it, and that it's possible to do "heavy metaethical lifting". But how?

I want to convert all matter in the universe to utilitronium.

I think anything as specific as this sounds worryingly close to wanting an AI to implement favoritepoliticalsystem.

What the first communist revolutionaries thought would happen, as the empirical consequence of their revolution, was that people’s lives would improve: laborers would no longer work long hours at backbreaking labor and make little money from it. This turned out not to be the case, to put it mildly. But what the first communists thought would happen, was not so very different from what advocates of other political systems thought would be the empirical consequence of their favorite political systems. They thought people would be happy. They were wrong.

Now imagine that someone should attempt to program a “Friendly” AI to implement communism, or libertarianism, or anarcho-feudalism, or favoritepoliticalsystem, believing that this shall bring about utopia. People’s favorite political systems inspire blazing suns of positive affect, so the proposal will sound like a really good idea to the proposer.

We could view the programmer’s failure on a moral or ethical level—say that it is the result of someone trusting themselves too highly, failing to take into account their own fallibility, refusing to consider the possibility that communism might be mistaken after all. But in the language of Bayesian decision theory, there’s a complementary technical view of the problem. From the perspective of decision theory, the choice for communism stems from combining an empirical belief with a value judgment. The empirical belief is that communism, when implemented, results in a specific outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is ultimately an empirical prediction; even the part about happiness is a real property of brain states, though hard to measure. If you implement communism, either this outcome eventuates or it does not. The value judgment is that this outcome satisfices or is preferable to current conditions. Given a different empirical belief about the actual real-world consequences of a communist system, the decision may undergo a corresponding change.

We would expect a true AI, an Artificial General Intelligence, to be capable of changing its empirical beliefs (or its probabilistic world-model, et cetera). If somehow Charles Babbage had lived before Nicolaus Copernicus, and somehow computers had been invented before telescopes, and somehow the programmers of that day and age successfully created an Artificial General Intelligence, it would not follow that the AI would believe forever after that the Sun orbited the Earth. The AI might transcend the factual error of its programmers, provided that the programmers understood inference rather better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers need not know the math of Newtonian mechanics, only the math of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you’re programming means instead of ends. You’re programming in a fixed decision, without that decision being re-evaluable after acquiring improved empirical knowledge about the results of communism. You are giving the AI a fixed decision without telling the AI how to re-evaluate, at a higher level of intelligence, the fallible process which produced that decision.

Comment author: LanceSBush 11 June 2017 06:12:57PM 2 points [-]

Hi Kaj,

Even if we found the most agreeable available set of moral principles, that amount may turn out not to constitute the vast majority of people. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people. People may just have irreconcilable values. You state that:

“For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.”

Suppose this is the best we can do. It doesn’t follow that the outputs of this exercise are “true.” I am not sure in what sense this would constitute a true set of moral principles.

More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable. On the contrary, I want everyone to share my moral views, because this is what, fundamentally, I care about. The notion that we should care about what others care about, and implement whatever the consensus is, seems to presume a very strong and highly contestable metaethical position that I do not accept and do not think others should accept.

Comment author: Kaj_Sotala 11 June 2017 07:14:35PM 0 points [-]

Even if we found the most agreeable available set of moral principles, that amount may turn out not to constitute the vast majority of people. It may not even reach a majority at all. It is possible that there simply is no moral theory that is acceptable to most people.

It's certainly possible that this is the case, but looking for the kind of solution that would satisfy as many people as possible certainly seems like the thing we should try first and only give it up if it seems impossible, no?

More importantly, it is unclear whether or not I have any rational or moral obligation to care about the outputs of this exercise. I do not want to implement the moral system that most people find agreeable.

Well, the ideal case would be that the AI would show you a solution which it had found, and upon inspecting it and thinking it through you'd be convinced that this solution really does satisfy all the things you care about - and all the things that most other people care about, too.

From a more pragmatic perspective, you could try to insist on an AI which implemented your values specifically - but then everyone else would also have a reason to fight to get an AI which fulfilled their values specifically, and if it was you versus everyone else in the world, it seems like a pretty high probability that somebody else would win. Which means that your values would have a much higher chance of getting shafted than if everyone had agreed to go for a solution which tried to take everyone's preferences into account.

And of course, in the context of AI, everyone insisting on their own values and their values only means that we'll get arms races, meaning a higher probability of a worse outcome for everyone.

See also Gains from Trade Through Compromise.

Comment author: Kaj_Sotala 11 June 2017 02:49:19PM 3 points [-]

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first. Having now read it, I'd say that I agree with its claims with regard to criticism of Greene's approach. I don't think it disproves the notion of psychology being useful for defining human values, though, for I think there's an argument for psychology's usefulness that's entirely distinct from the specific approach that Greene is taking.

I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it's noteworthy that at their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good". E.g. Muehlhauser & Helm 2012:

Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.

We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]

Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).

A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.

Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]

What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]

Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]

We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).

Yet the problem remains: the AI has to be programmed with some definition of what is good.

Now this alone isn't yet sufficient to show that philosophy wouldn't be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there has been any major progress towards solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible. Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

We've already seen this in trying to define concepts: as philosophy noted a long time ago, you can't come up with a set of explicit rules that would define even a concept as simple as "man" in such a way that nobody could develop a counterexample. "The Normative Insignificance of Neuroscience" also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:

... what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.

Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we've managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a "man" or "philosopher" or whatever.
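As a minimal sketch of that idea (the features and numbers here are invented purely for illustration): a statistical prototype learned from examples can classify borderline instances without any explicit necessary-and-sufficient definition ever being written down:

```python
# Illustrative sketch: learning a fuzzy concept from examples via a
# statistical prototype, instead of via explicit defining rules.
# Each example is a feature vector; the "concept" is a learned centroid.

def centroid(examples):
    """Average the feature vectors to get a prototype of the concept."""
    n = len(examples)
    return [sum(x[i] for x in examples) / n for i in range(len(examples[0]))]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

# Hypothetical training data for the concept "bird"; features might be
# (has_feathers, flies, lays_eggs) -- entirely made up for this sketch.
birds = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [0.9, 1.0, 1.0]]
not_birds = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

bird_proto = centroid(birds)
other_proto = centroid(not_birds)

def is_bird(x):
    # Classify by proximity to the learned prototype: no necessary-and-
    # sufficient conditions anywhere, so counterexamples to any single
    # rule don't break the classifier.
    return distance(x, bird_proto) < distance(x, other_proto)

print(is_bird([1.0, 0.0, 1.0]))  # feathered but flightless: True
print(is_bird([0.0, 1.0, 0.0]))  # flies but featherless: False
```

A nearest-centroid rule is about the simplest statistical model there is; the neural nets mentioned above build vastly richer models, but the point is the same: the "definition" lives in learned statistics over examples, not in explicit verbal criteria.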

So given that

  • we can't build explicit verbal models of what a concept is
  • but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept

and

  • defining morality looks similar to defining concepts, in that we can't build explicit verbal models of what morality is

it would seem reasonable to assume that

  • we can build machine-learning algorithms that can learn to define morality, in the sense that they can give answers to moral dilemmas that a vast majority of people would consider acceptable

But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI's reasoning process should take into account those considerations. And we've already established that defining those considerations on a verbal level looks insufficient - they have to be established on a deeper level, of "what are the actual computational processes that are involved when the brain computes morality".

Yes, I am here assuming "what is good" to equate to "what do human brains consider good", in a way that may be seen as reducing to "what would human brains accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, if the argument survives attacks and is compelling then it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Comment author: Kaj_Sotala 11 June 2017 02:50:25PM *  0 points [-]

Also, I find pretty compelling the argument that the classical definition of moral philosophy in trying to define "the good" is both impossible and not even a particularly good target to aim at, and that trying to find generally-agreeable moral solutions is something much more useful; and if we accept this argument, then moral psychology is relevant, because it can help us figure out generally-agreeable solutions.

As Martela (2017) writes:

...there is a deeper point in Williams's book that is even harder to rebut. Williams asks: What can an ethical theory do, if we are able to build a convincing case for one? He is skeptical about the force of ethical considerations and reminds us that even if we were to have a justified ethical theory, the person in question might not be concerned about it. Even if we could prove to some amoralists that what they are about to do is (a) against some universal ethical standard, (b) is detrimental to their own well-being, and/or (c) is against the demands of rationality or internal coherence, they still have the choice of whether to care about this or not. They can choose to act even if they know that what they are about to do is against some standard that they believe in. Robert Nozick—whom Williams quotes—describes this as follows: “Suppose that we show that some X he [the immoral man] holds or accepts or does commits him to behaving morally. He now must give up at least one of the following: (a) behaving immorally, (b) maintaining X, (c) being consistent about this matter in this respect. The immoral man tells us, ‘To tell you the truth, if I had to make the choice, I would give up being consistent’” (Nozick 1981, 408).

What Williams in effect says is that the noble task of finding ultimate justification for some ethical standards could not—even if it was successful—deliver any final argument in practical debates about how to behave. “Objective truth” would have only the motivational weight that the parties involved choose to give to it. It no longer is obvious what a philosophical justification of an ethical standard is supposed to do or even “why we should need such a thing” (Williams 1985, 23).

Yet when we look at many contemporary ethical debates, we can see that they proceed as if the solutions to the questions they pose would matter. In most scientific disciplines the journal articles have a standard section called “practical bearings,” where the practical relevance of the accumulated results are discussed. Not so for metaethical articles, even though they otherwise simulate the academic and peer-reviewed writing style of scientific articles. When we read someone presenting a number of technical counterarguments against quasi-realist solutions to the Frege-Geach problem, there usually is no debate about what practical bearings the discussion would have, whether these arguments would be successful or not. Suppose that in some idealized future the questions posed by the Frege-Geach problem would be conclusively solved. A new argument would emerge that all parties would see as so valid and sound that they would agree that the problem has now been finally settled. What then? How would ordinary people behave differently, after the solution has been delivered to them? I would guess it is fair to say—at least until it is proven otherwise—that the outcome of these debates is only marginally relevant for any ordinary person's ethical life. [...]

This understanding of morality means that we have to think anew what moral inquiry should aim at. [...] Whatever justification can be given for one moral doctrine over the other, it has to be found in practice—simply because there are no other options available. Accordingly, for pragmatists, moral inquiry is in the end directed toward practice, its successfulness is ultimately judged by the practical bearings it has on people's experiences: “Unless a philosophy is to remain symbolic—or verbal—or a sentimental indulgence for a few, or else mere arbitrary dogma, its auditing of past experience and its program of values must take effect in conduct” (Dewey 1916, 315). Moral inquiry should thus aim at practice; its successfulness is ultimately measured by how it is able to influence people's moral outlook and behavior. [...]

Moral principles, ideals, rules, theories, or conclusions should thus be seen “neither as a cookbook, nor a remote calculus” (Pappas 1997, 546) but as instruments that we can use to understand our behavior and change it for the better. Instead of trying to discover the correct ethical theories, the task becomes one of designing the most functional ethical theories. Ethics serves certain functions in human lives and in societies, and the task is to improve its ability to serve these functions (Kitcher 2011b). In other words, the aim of ethical theorizing is to provide people with tools (see Hickman 1990, 113–14) that help them in living their lives in a good and ethically sound way. [...]

It is true that the lack of foundational principles in ethics denies the pragmatist moral philosopher the luxury of being objectively right in some moral question. In moral disagreements, a pragmatist cannot “solve” the disagreement by relying on some objective standards that deliver the “right” and final answer. But going back to Williams's argument raised at the beginning of this article, we can ask what would it help if we were to “solve” the problem. The other party still has the option to ignore our solution. Furthermore, despite the long history of ethics we still haven't found many objective standards or “final solutions” that everyone would agree on, and thus it seems that waiting for such standards to emerge is futile.

In practice, there seem to be two ways in which moral disagreements are resolved. The first is brute force. In some moral disputes I am in a position in which I can force the other party to comply with my standards whether or not that party agrees with me. The state, with its monopoly on the legitimate use of violence, can force its citizens to comply with certain laws even when those citizens' personal moral codes disagree with the law. The second way to resolve a moral disagreement is to find some common ground, some standards that the other believes in, and to build from there a case for one's own position.

In the end, it might be beneficial that pragmatism annihilates the possibility of believing that I am absolutely right and the other party is absolutely wrong. As Margolis notes: “The most monstrous crimes the race has ever (been judged to have) perpetrated are the work of the partisans of ‘right principles’ and privileged revelation” (1996, 213). Instead of dismissing the other's perspective as wrong, one must try to understand it in order to find common ground and shared principles that might help the dialogue around the problem progress. If one really wants to change the other party's opinion, then instead of invoking some objective standards one should invoke standards that the other already believes in. This means that one has to listen to the other person and try to see the world from his or her point of view. Only through understanding the other's perspective can one have a chance to find a way to change it—or to change one's own opinion, if this learning process should lead to that. One can aim to clarify the other's points of view, unveil their hidden assumptions and values, or challenge their arguments, but one must do this by drawing on principles and values that the other is already committed to if one is to have a real impact on the other's way of seeing the world, or actually to resolve the disagreement. I believe that this kind of approach, rather than a claim to a more objective position, has a much better chance of actually building common understanding around the moral issue at hand.

Comment author: kbog  (EA Profile) 05 June 2017 06:59:25PM -1 points [-]

Defining just what it is that human values are. The project of AI safety can roughly be defined as "the challenge of ensuring that AIs remain aligned with human values", but it's also widely acknowledged that nobody really knows what exactly human values are - or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program.

Defining human values, at least in the prescriptive sense, is not a psychological issue at all. It's a philosophical issue. Certain philosophers have believed that psychology can inform moral philosophy, but it's a stretch to call even the experimental-philosophy work of someone like Joshua Greene a psychology-focused research program, and the whole approach is dubious - see, e.g., The Normative Insignificance of Neuroscience (http://www.pgrim.org/philosophersannual/29articles/berkerthenormative.pdf). Of course, a new wave of pop-philosophers and internet bloggers have made silly claims that moral philosophy can be completely solved by psychology and neuroscience, but this extreme view is ridiculous on its face.

What people believe doesn't tell us much about what actually is good. The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

Comment author: Kaj_Sotala 11 June 2017 02:49:19PM 3 points [-]

It took me a while to respond to this because I wanted to take the time to read "The Normative Insignificance of Neuroscience" first. Having now read it, I'd say that I agree with its criticisms of Greene's approach. I don't think it disproves the notion of psychology being useful for defining human values, though, for I think there's an argument for psychology's usefulness that's entirely distinct from the specific approach that Greene is taking.

I start from the premise that the goal of moral philosophy is to develop a set of explicit principles that would tell us what is good. Now this is particularly relevant for designing AI, because we also want our AIs to follow those principles. But it's noteworthy that in their current state, none of the existing ethical theories are up to the task of giving us such a set of principles that, when programmed into an AI, would actually give results that could be considered "good". E.g. Muehlhauser & Helm 2012:

Let us consider the implications of programming a machine superoptimizer to implement particular moral theories.

We begin with hedonistic utilitarianism, a theory still defended today (Tännsjö 1998). If a machine superoptimizer’s goal system is programmed to maximize pleasure, then it might, for example, tile the local universe with tiny digital minds running continuous loops of a single, maximally pleasurable experience. We can’t predict exactly what a hedonistic utilitarian machine superoptimizer would do, but we think it seems likely to produce unintended consequences, for reasons we hope will become clear. [...]

Suppose “pleasure” was specified (in the machine superoptimizer’s goal system) in terms of our current understanding of the human neurobiology of pleasure. Aldridge and Berridge (2009) report that according to “an emerging consensus,” pleasure is “not a sensation” but instead a “pleasure gloss” added to sensations by “hedonic hotspots” in the ventral pallidum and other regions of the brain. A sensation is encoded by a particular pattern of neural activity, but it is not pleasurable in itself. To be pleasurable, the sensation must be “painted” with a pleasure gloss represented by additional neural activity activated by a hedonic hotspot (Smith et al. 2009).

A machine superoptimizer with a goal system programmed to maximize human pleasure (in this sense) could use nanotechnology or advanced pharmaceuticals or neurosurgery to apply maximum pleasure gloss to all human sensations—a scenario not unlike that of plugging us all into Nozick’s experience machines (Nozick 1974, 45). Or, it could use these tools to restructure our brains to apply maximum pleasure gloss to one consistent experience it could easily create for us, such as lying immobile on the ground.

Or suppose “pleasure” was specified more broadly, in terms of anything that functioned as a reward signal—whether in the human brain’s dopaminergic reward system (Dreher and Tremblay 2009), or in a digital mind’s reward signal circuitry (Sutton and Barto 1998). A machine superoptimizer with the goal of maximizing reward signal scores could tile its environs with trillions of tiny minds, each one running its reward signal up to the highest number it could. [...]

What if a machine superoptimizer was programmed to maximize desire satisfaction in humans? Human desire is implemented by the dopaminergic reward system (Schroeder 2004; Berridge, Robinson, and Aldridge 2009), and a machine superoptimizer could likely get more utility by (1) rewiring human neurology so that we attain maximal desire satisfaction while lying quietly on the ground than by (2) building and maintaining a planet-wide utopia that caters perfectly to current human preferences. [...]

Consequentialist designs for machine goal systems face a host of other concerns (Shulman, Jonsson, and Tarleton 2009b), for example the difficulty of interpersonal comparisons of utility (Binmore 2009), and the counterintuitive implications of some methods of value aggregation (Parfit 1986; Arrhenius 2011). [...]

We cannot show that every moral theory yet conceived would produce substantially unwanted consequences if used in the goal system of a machine superoptimizer. Philosophers have been prolific in producing new moral theories, and we do not have the space here to consider the prospects (for use in the goal system of a machine superoptimizer) for a great many modern moral theories. These include rule utilitarianism (Harsanyi 1977), motive utilitarianism (Adams 1976), two-level utilitarianism (Hare 1982), prioritarianism (Arneson 1999), perfectionism (Hurka 1993), welfarist utilitarianism (Sen 1979), virtue consequentialism (Bradley 2005), Kantian consequentialism (Cummiskey 1996), global consequentialism (Pettit and Smith 2000), virtue theories (Hursthouse 2012), contractarian theories (Cudd 2008), Kantian deontology (R. Johnson 2010), and Ross’ prima facie duties (Anderson, Anderson, and Armen 2006).

Yet the problem remains: the AI has to be programmed with some definition of what is good.

Now this alone isn't yet sufficient to show that philosophy wouldn't be up to the task. But philosophy has been trying to solve ethics for at least the last 2500 years, and it doesn't look like there has been any major progress toward solving it. The PhilPapers survey didn't show any of the three major ethical schools (consequentialism, deontology, virtue ethics) being significantly more favored by professional philosophers than the others, nor does anyone - to my knowledge - even know what a decisive theoretical argument in favor of one of them could be.

And at this point, we have pretty good theoretical reasons for believing that the traditional goal of moral philosophy - "developing a set of explicit principles for telling us what is good" - is in fact impossible. Or at least, it's impossible to develop a set of principles that would be simple and clear enough to write down in human-understandable form and which would give us clear answers to every situation, because morality is too complicated for that.

We've already seen this in trying to define concepts: as philosophy noted a long time ago, you can't come up with a set of explicit rules that would define even a concept as simple as "man" in such a way that nobody could develop a counterexample. "The Normative Insignificance of Neuroscience" also notes that the situation in ethics looks similar to the situation with trying to define many other concepts:

... what makes the trolley problem so hard—indeed, what has led some to despair of our ever finding a solution to it—is that for nearly every principle that has been proposed to explain our intuitions about trolley cases, some ingenious person has devised a variant of the classic trolley scenario for which that principle yields counterintuitive results. Thus as with the Gettier literature in epistemology and the causation and personal identity literatures in metaphysics, increasingly baroque proposals have given way to increasingly complex counterexamples, and though some have continued to struggle with the trolley problem, many others have simply given up and moved on to other topics.

Yet human brains do manage to successfully reason with concepts, despite it being impossible to develop a set of explicit necessary and sufficient criteria. The evidence from both psychology and artificial intelligence (where we've managed to train neural nets capable of reasonably good object recognition) is that a big part of how they do it is by building up complicated statistical models of what counts as a "man" or "philosopher" or whatever.
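As a cartoon of that claim (the concept, the features, and the examples below are all invented purely for illustration), a learner can come to classify instances of a concept from labelled examples alone, with no explicit necessary-and-sufficient definition ever being written down:

```python
# Toy sketch: learning the concept "bird" from labelled examples,
# without any explicit rule-based definition of what a bird is.
# Binary features (invented): has_feathers, flies, lays_eggs, has_fur.

def nearest_neighbour(train, query):
    """Classify `query` by the label of the closest training example,
    using Hamming distance over the binary feature vectors."""
    def dist(a, b):
        return sum(x != y for x, y in zip(a, b))
    features, label = min(train, key=lambda ex: dist(ex[0], query))
    return label

train = [
    ((1, 1, 1, 0), "bird"),      # sparrow
    ((1, 0, 1, 0), "bird"),      # penguin: flightless, still a bird
    ((0, 0, 1, 0), "not-bird"),  # lizard
    ((0, 1, 0, 1), "not-bird"),  # bat: flies, but no feathers
]

print(nearest_neighbour(train, (1, 1, 1, 0)))  # eagle-like query -> "bird"
```

No line of this program states what a bird *is*; the statistics of the labelled examples carry that information, which is the same basic point as with the neural-net object recognizers mentioned above (though those, of course, use far richer statistical models than a nearest-neighbour lookup).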

So given that

  • we can't build explicit verbal models of what a concept is
  • but we can build machine-learning algorithms that use complicated statistical analysis to identify instances of a concept

and

  • defining morality looks similar to defining concepts, in that we can't build explicit verbal models of what morality is

it would seem reasonable to assume that

  • we can build machine-learning algorithms that can learn to define morality, in the sense that they can give answers to moral dilemmas that a vast majority of people would consider acceptable
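To make that last bullet slightly more concrete, here is a deliberately minimal sketch of what "learning moral judgments from examples" could even mean (the scenarios, labels, and word-overlap similarity measure are all invented for illustration; any real proposal would need vastly richer models and training data):

```python
# Toy sketch: a "classifier" that learns acceptable/unacceptable from
# labelled example judgments, rather than from explicit moral principles.
# All scenarios and labels are invented for illustration only.

labelled = [
    ("lie to protect a friend from embarrassment", "acceptable"),
    ("lie under oath for personal profit", "unacceptable"),
    ("break a promise to rescue a drowning child", "acceptable"),
    ("break a promise out of laziness", "unacceptable"),
]

def judge(scenario):
    """Label a new scenario with the label of the most similar
    labelled example, where similarity is crude word overlap."""
    def overlap(a, b):
        return len(set(a.split()) & set(b.split()))
    text, label = max(labelled, key=lambda ex: overlap(ex[0], scenario))
    return label

print(judge("lie under oath to profit"))  # -> "unacceptable"
```

The point of the sketch is only the shape of the approach: the system's "moral knowledge" lives in the labelled examples and the statistical similarity measure, not in any explicitly stated principle - which is exactly why, as the next paragraph argues, getting the underlying model right would require psychological data about how humans actually compute moral judgments.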

But here it looks likely that we need information from psychology to narrow down what those models should be. What humans consider to be good has likely been influenced by a number of evolutionary idiosyncrasies, so if we want to come up with a model of morality that most humans would agree with, then our AI's reasoning process should take those considerations into account. And we've already established that defining those considerations on a verbal level looks insufficient - they have to be established on a deeper level: that of the actual computational processes involved when the brain computes morality.

Yes, I am here assuming "what is good" to equate to "what human brains consider good", in a way that may be seen as reducing to "what human brains would accept as a persuasive argument for what is good". You could argue that this is flawed, because it's getting dangerously close to defining "good" by social consensus. But then again, the way the field of ethics itself proceeds is basically the same: a philosopher presents an argument for what is good, another attacks it, and if the argument survives the attacks and is compelling, it is eventually accepted. For empirical facts we can come up with objective tests, but for moral truths it looks to me unavoidable - due to the is-ought gap - that some degree of "truth by social consensus" is the only way of figuring out what the truth is, even in principle.

Comment author: Gram_Stone 07 June 2017 12:04:32AM 0 points [-]

Also, have you seen this AI Impacts post and the interview it links to? I would expect so, but it seems worth asking. Tom Griffiths makes similar points to the ones you've made here.

Comment author: Kaj_Sotala 09 June 2017 12:31:02PM 0 points [-]

I'd seen that, but re-reading it was useful. :)

Comment author: SoerenMind  (EA Profile) 08 June 2017 09:00:41PM *  3 points [-]

I got linked here while browsing a pretty random blog on deep learning, you're getting attention! (https://medium.com/intuitionmachine/seven-deadly-sins-and-ai-safety-5601ae6932c3)

Comment author: Kaj_Sotala 09 June 2017 11:31:09AM 1 point [-]

Neat, thanks for the find. :)
