Comment author: kbog (EA Profile) 05 June 2017 06:59:25PM 0 points

Defining just what it is that human values are. The project of AI safety can roughly be defined as "the challenge of ensuring that AIs remain aligned with human values", but it's also widely acknowledged that nobody really knows what exactly human values are - or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be understood with a psychology-focused research program.

Defining human values, at least in the prescriptive sense, is not a psychological issue at all. It's a philosophical issue. Certain philosophers have believed that psychology can inform moral philosophy, but it's a stretch to say that even the work of someone like Joshua Greene in experimental philosophy constitutes a psychology-focused research program, and the whole approach is dubious - see, e.g., The Normative Insignificance of Neuroscience (http://www.pgrim.org/philosophersannual/29articles/berkerthenormative.pdf). Of course, a new wave of pop-philosophers and internet bloggers has made silly claims that moral philosophy can be completely solved by psychology and neuroscience, but this extreme view is ridiculous on its face.

What people believe doesn't tell us much about what actually is good. The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

Comment author: Gram_Stone 07 June 2017 02:32:36AM 1 point

Your comment reads strangely to me because your thoughts seem to fall into a completely different groove from mine. The problem statement is perhaps: write a program that does what-I-want, indefinitely. Of course, this could involve a great deal of extrapolation.

The fact that I am even aspiring to write such a program means that I am assuming that what-I-want can be computed. Presumably, at least some portion of the relevant computation, the one that I am currently denoting 'what-I-want', takes place in my brain. If I want to perform this computation in an AI, then it would probably help to at least be able to reproduce whatever portion of it takes place in my brain. People who study the mind and brain happen to call themselves psychologists and cognitive scientists. It's weird to me that you're arguing about how to classify Joshua Greene's research; I don't see why it matters whether we call it philosophy or psychology. I generally find it suspicious when anyone makes a claim of the form: "Only the academic discipline that I hold in high esteem has tools that will work in this domain." But I won't squabble over words if you think you're drawing important boundaries; what do you mean when you write 'philosophical'? Maybe you're saying that Greene, despite his efforts to inquire with psychological tools, elides into 'philosophy' anyway, so like, what's the point of pretending it's 'moral philosophy' via psychology? If that's your objection, that he 'just ends up doing philosophy anyway', then what exactly is he eliding into, without using the words 'philosophy' or 'philosophical'?

More generally, why is it that we should discard the approach because it hasn't made itself obsolete yet? Should the philosophers give up because they haven't made their approach obsolete yet either? If there's any reason that we should have more confidence in the ability of philosophers than cognitive scientists to contribute towards a formal specification of what-I-want, that reason is certainly not track record.

What people believe doesn't tell us much about what actually is good.

I don't think anyone who has read or who likely will read your comment equates testimony or social consensus with what-is-good.

The challenge of AI safety is the challenge of making AI that actually does what is right, not AI that does whatever it's told to do by a corrupt government, a racist constituency, and so on.

It's my impression that AI safety researchers are far more concerned about unaligned AGIs killing everyone than they are about AGIs that are successfully designed by bad actors to do a specific, unimaginative thing without killing themselves and everyone else in the process.

Of course, a new wave of pop-philosophers and internet bloggers has made silly claims that moral philosophy can be completely solved by psychology and neuroscience, but this extreme view is ridiculous on its face.

Bleck, please don't ever give me a justification to link a Wikipedia article literally named pooh-pooh.

Comment author: Kaj_Sotala 05 June 2017 04:40:34PM 0 points

This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf

Yay, correctly guessed which article that was before clicking on the link. :-)

Comment author: Gram_Stone 07 June 2017 12:04:32AM 0 points

Also, have you seen this AI Impacts post and the interview it links to? I would expect so, but it seems worth asking. Tom Griffiths makes similar points to the ones you've made here.

Comment author: Gram_Stone 05 June 2017 02:24:25PM * 9 points

I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written down and drawn a connection between. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, that ontology identification seems philosophically difficult, and that studying ontology identification in humans is therefore plausibly a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that quite badly needed to be written.

Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.

Comment author: Telofy (EA Profile) 26 February 2016 07:22:43PM 4 points

I agree with the sentiment that is epitomized in the section that Michael quoted. That said:

There are a million other things that the founders of the Against Malaria Foundation could have done, but they took the risk of riding on distributing bed nets, even though they had yet to see it actually work.

In 2004 they already had a large body of evidence to draw on to make the educated guess that if it has worked before, it will probably work again. And I’m also using AMF as an analogy here. It’s common practice to test an intervention through RCTs and other trials, and if it works, to then roll it out at large scale without any more trials (apart from some cheap proxy measures without a control group). It’s this experience that allows the incarcerated EAs to make educated guesses without further feedback loops.

AI risk, however, is novel and unusual in many ways, so there is little experience like that to inform any guesses, little experience that extrapolates to the field. We’re at the stage where J-PAL would come up with interventions and run RCTs on them to see if any of them have any positive effect, but we can’t do that.

But “little experience” was not meant as a facetious overstatement. There are some interventions where many people have somewhat more solidly positive priors, like awareness-raising among AI researchers.

So while I agree with Jeff that the extreme dearth of feedback loops in the field is a great handicap for any proposed intervention, I also agree with you that we should tend to that dying person first and then fix the tire.

Comment author: Gram_Stone 26 February 2016 11:46:46PM 1 point

I agree with this. The right way to take this further is by getting rid of leaky generalizations like 'Evidence is good, no evidence is bad,' and also by pointing out what you pointed out: is the evidence still virtuous if it's from the past and you're reasoning from it? Confused questions like that are a sign that things have been oversimplified. I've had time to think about the more general issues behind this since I wrote it; I actually posted it on LW over two weeks ago. (I've been waiting for karma.) In the interim, I found an essay on Facebook by Eliezer Yudkowsky that gets to the core of why these are bad heuristics, among other things.

Comment author: MichaelDickens (EA Profile) 26 February 2016 05:06:32PM 5 points

I found a lot of this post disconcerting because of how often you linked to LessWrong posts, even when doing so didn't add anything. I think it would be better if you didn't rely on LW concepts so much and just said what you want to say without making outside references.

[I]magine that you are at some point on a long road, truly in the middle of nowhere, and you see a man whose car has a flat tire. You know that someone else may not drive by for hours, and you don't know how well-prepared the man is for that eventuality. You consider stopping your car to help; you have a spare, you know how to change tires, and you've seen it work before. And if you don't do it right the first time for some weird reason, you can always try again.

But suddenly, you notice that there is a person lying motionless on the ground, some ways down the road; far, but visible. There's no cellphone service, it would take an ambulance hours to get here unless they happened to be driving by, and you have no medical training or experience.

I don't know about you, but even if I'm having an extremely hard time thinking of things to do about a guy dying on my watch in the middle of nowhere, the last thing I do is say, "I have no idea what to do if I try to save that guy, but I know exactly how to change a tire, so why don't I just change the tire instead." Because even if I don't know what to do, saving a life is so much more important than changing a tire that I don't care about the uncertainty.

I really like this bit.

Comment author: Gram_Stone 26 February 2016 11:21:33PM 2 points

I really like this bit.

Thank you.

I found a lot of this post disconcerting because of how often you linked to LessWrong posts, even when doing so didn't add anything. I think it would be better if you didn't rely on LW concepts so much and just said what you want to say without making outside references.

I mulled over this article for quite a while before posting it, and this included pruning many hyperlinks deemed unnecessary. Of course, the links that remain are meant to produce a more concise article, not a more opaque one, so what you say is unfortunate to read. I would be interested in some specific examples of links or idiosyncratic language that either add no value to the article or subtract value from it.

It sure isn't good if I'm coming off as a crank though. I consider the points within this article very important.

On 'Why Global Poverty?' and Arguments from Unobservable Impacts

(Cross-posted from LessWrong.) For context, Jeff Kaufman delivered a speech on effective altruism and cause prioritization at EA Global 2015 entitled 'Why Global Poverty?', which he has transcribed and made available here. It's certainly worth reading. I was dissatisfied with this speech in some ways. For the...
Comment author: Marcus_A_Davis 20 February 2016 04:37:50AM 0 points

I don't disagree that someone who thinks there is a "negligible probability" of AI causing extinction is not suited for the task. That's why I said to aim for neutrality.

But I think we may be disagreeing over whether "thinks AI risk is an important cause" is too close to "is broadly positive towards AI risk as a cause area." I think so. You think not?

Comment author: Gram_Stone 20 February 2016 04:57:36AM 0 points

But I think we may be disagreeing over whether "thinks AI risk is an important cause" is too close to "is broadly positive towards AI risk as a cause area." I think so. You think not?

Are there alternatives to a person like this? It doesn't seem to me like there are.

"Is broadly positive towards AI risk as a cause area" could mean "believes that there should exist effective organizations working on mitigating AI risk", or could mean "automatically gives more credence to the effectiveness of organizations that are attempting to mitigate AI risk."

It might be helpful if you elaborated more on what you mean by 'aim for neutrality'. What actions would that entail, if you did that, in the real world, yourself? What does hiring the ideal survey supervisor look like in your mind if you can't use the words "neutral" or "neutrality" or any clever rephrasings thereof?

Comment author: Marcus_A_Davis 20 February 2016 04:13:56AM 0 points

This survey makes sense. However, I have a few caveats:

Think that AI risk is an important cause, but have no particular convictions about the best approach or organisation for dealing with it. They shouldn't have worked for MIRI in the past, but will presumably have some association with the general rationality or AI community.

Why should the person overseeing the survey think AI risk is an important cause? Doesn't that self-select for people who are more likely to be positive toward MIRI than whatever the baseline is for all people familiar with AI risk (and, obviously, competent to judge who to include in the survey)? The ideal person to me would be neutral, and while, of course, finding someone who is truly neutral would likely prove impractical, selecting someone overtly positive would be a bad idea for the same reasons it would be to select someone overtly negative. The point is that the aim should be towards neutrality.

They should also have a chance to comment on the survey itself before it goes out. Ideally it would be checked by someone who understands good survey design, as subtle aspects of wording can be important.

There should be a set time frame to draft a response to the survey before it goes public. A "chance" is too vague.

It should be impressed on participants the value of being open and thoughtful in their answers for maximising the chances of solving the problem of AI risk in the long run.

Telling people to be open and thoughtful is great, but explicitly tying it to solving long run AI risk primes them to give certain kinds of answers.

Comment author: Gram_Stone 20 February 2016 04:25:39AM 0 points

Why should the person overseeing the survey think AI risk is an important cause?

Because the purpose of the survey is to determine MIRI's effectiveness as a charitable organization. If one believes that there is a negligible probability that an artificial intelligence will cause the extinction of the human species within the next several centuries, then it immediately follows that MIRI is an extremely ineffective organization, as it would be designed to mitigate a risk that ostensibly does not need mitigating. The survey is moot if one believes this.

Comment author: Gram_Stone 19 February 2016 05:49:51PM 8 points

I think that it's probably quite important to define in advance what sorts of results would convince us that the quality of MIRI's performance is either sufficient or insufficient. Otherwise I expect those already committed to some belief about MIRI's performance to consider the survey to be evidence for their existing belief, even if another person with the opposite belief also considers it evidence for their belief.

Relatedly, I also worry about the uniqueness of the problem and how it might change what we consider a cause worth donating to. Although you don't seem to be thinking that you could understand MIRI's arguments and see no flaws and still be inclined to say "I still can't be sure that this is the right way to go," I expect that many people are averse to donating to causes like MIRI because the effectiveness of the proposed interventions does not admit of simple testing. With existential risks, empirical testing is often impossible in the traditional sense, although sometimes possible in a limited sense. Results about sub-existential pandemic risk are probably at least somewhat relevant to the study of existential pandemic risk, for example. But it's not the same as distributing bed nets, looking at the malaria incidence, adjusting, reobserving, and so on and so on. It's not like we can perform an action, look through a time warp, and see whether or not the world ends in the future.

What I'm getting at is that, even if this is not really the nature of these problems, even if interventions on these problems are in fact testable to some degree, we might imagine the implications if they were genuinely untestable. I think that there are some people who would refuse to donate to existential risk charities merely because other charities have interventions testable for effectiveness. And this concerns me. If it is not by human failing that we don't test the effectiveness of our interventions, but rather the nature of the problem that makes such testing impossible, do you choose to do nothing? That is not a rhetorical question. I genuinely believe that we are confused about this and that MIRI is an example of a cause that may be difficult to evaluate without resolving this confusion.

This is related to ambiguity aversion in cognitive science and decision theory. Even though ambiguity aversion appears in choices between betting on known and unknown risks, and not in choices to bet or not to bet on unknown risks in non-comparative contexts, effective altruists consider almost all charitable decisions within the context of cause prioritization. This means we might expect EAs to encounter more comparative contexts than a random philanthropist, and thus to exhibit more bias towards causes with ambiguity, even if the survey itself would technically be focusing on one cause. It's noteworthy that the expected utility formalism and human behavior differ here: the formalism prescribes indifference between bets with known and unknown probabilities when each bet has the same payoffs. (In reality the situation is not even this clear, for the payoffs of successfully intervening upon malaria incidence as opposed to human extinction are hardly equal.) I think we must genuinely ask whether we should be averse to ambiguity in general, attempt to explain why this heuristic was evolutionarily adaptive, and see whether the problem of existential risk is a case where we should, or should not, use ambiguity aversion as a heuristic. After all, a humanity that attempts no interventions on the problem of existential risk merely because it cannot test the effectiveness of its interventions is a humanity that ignores existential risk and goes extinct for it, even if we believed that we were being virtuous philanthropists the entire time.
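The indifference claim in the preceding paragraph can be made concrete with a minimal sketch (not from the comment above): a Bayesian expected-utility agent, assumed here to hold a uniform prior over the unknown win probability, values a known 50% bet and an ambiguous bet with the same payoffs identically. The payoff numbers and helper names are purely illustrative.

```python
# Illustrative sketch: under expected utility, a bet with a known 50% win
# probability and a bet whose win probability is unknown but uniformly
# distributed over [0, 1] have the same expected value, even though humans
# typically prefer the known bet (ambiguity aversion).

import numpy as np

payoff_win, payoff_lose = 100.0, 0.0  # hypothetical payoffs, same for both bets

def expected_utility(p_win):
    # Expected utility of a bet that pays payoff_win with probability p_win
    # and payoff_lose otherwise.
    return p_win * payoff_win + (1.0 - p_win) * payoff_lose

# Known-probability bet: the agent knows p = 0.5 exactly.
eu_known = expected_utility(0.5)

# Ambiguous bet: p is unknown; assume a uniform prior and average over it.
rng = np.random.default_rng(0)
prior_samples = rng.uniform(0.0, 1.0, size=100_000)
eu_ambiguous = expected_utility(prior_samples).mean()

print(eu_known, eu_ambiguous)  # both approximately 50.0: the formalism is indifferent
```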

Comment author: Linch 14 February 2016 01:58:16AM 1 point

"And I would argue that any altruist is doing the same thing when they have to choose between causes before they can make observations. There are a million other things that the founders of the Against Malaria Foundation could have done, but they took the risk of riding on distributing bed nets, even though they had yet to see it actually work."

This point should be rewritten, I think. I'm not sure what the "it" you're talking about here actually is.

Comment author: Gram_Stone 16 February 2016 12:32:53AM * 2 points

Sorry about the confusion. I meant to say that even though the Against Malaria Foundation observes evidence of the effectiveness of its interventions all of the time, and this is good, the founders of the Against Malaria Foundation had to choose an initial action before they had made any observations about the effectiveness of their interventions. Presumably, there was some first village or region of trial subjects that first empirically demonstrated the effectiveness of durable, insecticidal bed nets. But before this first experiment, AMF presumably had to rely on correct reasoning alone, without corroborative observations to support their arguments. Nonetheless, their reasoning was correct. Experiment is a way to increase our confidence in our reasoning, and it is good to use it when it's available, but we can have confidence at times without it. I use these points to argue that people routinely reason successfully without being able to test the effectiveness of their actions, and that they often have to.

The more general point is that people often use a very simple heuristic to decide whether or not something academic is worthy of interest: is it based on evidence and empirical testing? 'Evidence-based medicine' is synonymous with 'safe, useful medicine,' depending on who you ask. Things are bad if they are not based on evidence. But in the case of existential risk interventions, it is a property of the situation that we cannot empirically test the effectiveness of our interventions. It is thus necessary to reason without conducting empirical tests. This is a reason to take the problem more seriously because of its difficulty, as opposed to the reaction of some others, who treat the 'lack of evidence-based methods' as a point against trying to solve the problem at all.

And in the case of some risks, like AI, it is actually dangerous to conduct empirical testing. It's plausible that sufficiently intelligent unsafe AIs would mimic safe AIs until they gain a decisive strategic advantage. See Bostrom's 'treacherous turn' for more on this.
