Comment author: Zeke_Sherman 28 March 2017 05:53:25PM *  1 point [-]

Thanks for the comments.

Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.

Evolution favors replication. But patience and resource acquisition aren't obviously correlated with any sort of value; if anything, better resource-acquirers are destructive and competitive. The claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic. Furthermore, competitive dynamics lead to systematic failures. See the citation.

Shulman's post assumes that once somewhere is settled, it's permanently inhabited by the same tribe. But I don't buy that. Agents can still spread through violence or through mimicry (remember the quote on fifth-generation warfare).

It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it.

All I am saying is that the argument applies to this issue as well.

Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each others' situations, or to understand what we would believe if we viewed others' private information.

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict. Better technology yields better monitoring, but also better hiding - which is easier, monitoring ICBMs in the 1970s or monitoring cyberweapons today?

One of the most critical pieces of information in these cases is intentions, which are easy to keep secret and will probably remain so for a long time.

By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level?

Yes, or even implementable in current systems.

I think the mandate of AI alignment easily covers the failure modes you have in mind here.

The failure modes here arise in a different context, where the existing research is often less relevant or not relevant at all. Whatever you put under the umbrella of alignment, there is a difference between looking at a particular system under the assumption that it will rebuild the universe in accordance with its value function, and looking at how systems interact in varying numbers. If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics. Figuring out how to optimize large systems of agents is absolutely not a simple matter of figuring out how to build one good agent and then replicating it as much as possible.

Comment author: Paul_Christiano 28 March 2017 10:53:56PM 3 points [-]

If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics

I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")

Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?

the claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic

If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted. I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)

Yes, or even implementable in current systems.

We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will "do what we want" we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict.

You're right.

Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement; I don't think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I'm all for evaluating it on its merits. So maybe we are in agreement here.

In response to Utopia In The Fog
Comment author: Paul_Christiano 28 March 2017 04:34:18PM 8 points [-]

It's great to see people thinking about these topics and I agree with many of the sentiments in this post. Now I'm going to write a long comment focusing on those aspects I disagree with. (I think I probably agree with more of this sentiment than most of the people working on alignment, and so I may be unusually happy to shrug off these criticisms.)

Contrasting "multi-agent outcomes" and "superintelligence" seems extremely strange. I think the default expectation is a world full of many superintelligent systems. I'm going to read your use of "superintelligence" as "the emergence of a singleton concurrently with the development of superintelligence."

I don't consider the "single superintelligence" scenario likely, but I don't think that has much effect on the importance of AI alignment research or on the validity of the standard arguments. I do think that the world will gradually move towards being increasingly well-coordinated (and so talking about the world as a single entity will become increasingly reasonable), but I think that we will probably build superintelligent systems long before that process runs its course.

The future looks broadly good in this scenario given approximately utilitarian values and the assumption that ems are conscious, with a large growing population of minds which are optimized for satisfaction and productivity, free of disease and sickness.

On total utilitarian values, the actual experiences of brain emulations (including whether they have any experiences) don't seem very important. What matters are the preferences according to which emulations shape future generations (which will be many orders of magnitude larger).

"freewheeling evolutionary developments, while continuing to produce complex and intelligent forms of organization, lead to the gradual elimination of all forms of being that we care about"

Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.

(Evolution might select for particular values, e.g. if it's impossible to reliably delegate or if it's very expensive to build systems with stable values. But (a) I'd bet against this, and (b) understanding this phenomenon is precisely the alignment problem!)

(I discuss several of these issues here, Carl discusses evolution here.)

Whatever the type of agent, arms races in future technologies would lead to opportunity costs in military expenditures and would interfere with the project of improving welfare. It seems likely that agents designed for security purposes would have preferences and characteristics which fail to optimize for the welfare of themselves and their neighbors. It’s also possible that an arms race would destabilize international systems and act as a catalyst for warfare.

It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it. If there weren't competitive pressure / selection pressure to adopt future AI systems, then alignment would be much less urgent since we could just take our time.

There may be other interventions that improve coordination/peace more broadly, or which improve coordination/peace in particular possible worlds etc., and those should be considered on their merits. It seems totally plausible that some of those projects will be more effective than work on alignment. I'm especially sympathetic to your first suggestion of addressing key questions about what will/could/should happen.

Not only is this a problem on its own, but I see no reason to think that the conditions described above wouldn’t apply for scenarios where AI agents turned out to be the primary actors and decisionmakers rather than transhumans or posthumans.

Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each others' situations, or to understand what we would believe if we viewed others' private information.

More generally, we would like to avoid destructive conflict and are continuously developing new tools for getting what we want / becoming smarter and better-informed / etc.

And on top of all that, the historical trend seems to basically point to lower and lower levels of violent conflict, though this is in a race with greater and greater technological capacity to destroy stuff.

I would be more than happy to bet that the intensity of conflict declines over the long run. I think the question is just how much we should prioritize pushing it down in the short run.

“the only way to avoid having all human values gradually ground down by optimization-competition is to install a Gardener over the entire universe who optimizes for human values.”

I disagree with this. See my earlier claim that evolution only favors patience.

I do agree that some kinds of coordination problems need to be solved, for example we must avoid blowing up the world. These are similar in kind to the coordination problems we confront today though they will continue to get harder and we will have to be able to solve them better over time---we can't have a cold war each century with increasingly powerful technology.

There is still value in AI safety work... but there are other parts of the picture which need to be explored

This conclusion seems safe, but it would be safe even if you thought that early AI systems will precipitate a singleton (since one still cares a great deal about the dynamics of that transition).

Better systems of machine ethics which don’t require superintelligence to be implemented (as coherent extrapolated volition does)

By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level? That will work even if we need to solve alignment prior long before the emergence of a singleton? I'd endorse both of those desiderata.

I think the main difference in alignment work for unipolar vs. multipolar scenarios is how high we draw the bar for "aligned AI," and in particular how closely competitive it must be with unaligned AI. I probably agree with your implicit claim, that they either must be closely competitive or we need new institutional arrangements to avoid trouble.

Rather than having a singleminded focus on averting a particular failure mode

I think the mandate of AI alignment easily covers the failure modes you have in mind here. I think most of the disagreement is about what kinds of considerations will shape the values of future civilizations.

both working on arguments that agents will be linked via a teleological thread where they accurately represent the value functions of their ancestors

At this level of abstraction I don't see how this differs from alignment. I suspect the details differ a lot, in that the alignment community is very focused on the engineering problem of actually building systems that faithfully pursue particular values (and in general I've found that terms like "teleological thread" tend to be linked with persistently low levels of precision).

Comment author: Paul_Christiano 25 March 2017 11:03:36PM 3 points [-]

I owe Michael Nielsen $60k to donate as he pleases if [beacon.nist.gov](beacon.nist.gov/home) is between 0000000000... and 028F5C28F5... at noon PST on 2017/4/2.
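(A quick arithmetic aside, based on my own reading of the bet rather than anything stated above: if the beacon output is uniform, its leading ten hex digits fall below 028F5C28F5 about 1% of the time, so this is effectively a 1%-odds randomized commitment. A minimal check under that assumption:)

```python
# Sketch: what fraction of 10-hex-digit prefixes fall below the quoted bound?
# (Assumes the bet compares the beacon's leading hex digits against the bounds;
# that framing is my interpretation, not something stated in the comment.)
threshold = int("028F5C28F5", 16)  # upper bound quoted in the bet
total = 16 ** 10                   # number of possible 10-hex-digit prefixes
print(threshold / total)           # roughly 0.01, i.e. about a 1% chance of paying out
```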

Comment author: sdspikes 01 March 2017 01:50:13AM 1 point [-]

As a Stanford CS (BS/MS '10) grad who took AI/Machine Learning courses in college from Andrew Ng, worked at Udacity with Sebastian Thrun, etc. I have mostly been unimpressed by non-technical folks trying to convince me that AI safety (not caused by explicit human malfeasance) is a credible issue.

Maybe I have "easily corrected, false beliefs" but the people I've talked to at MIRI and CFAR have been pretty unconvincing to me, as was the book Superintelligence.

My perception is that MIRI has focused in on an extremely specific kind of AI that to me seems unlikely to do much harm unless someone is recklessly playing with fire (or intentionally trying to set one). I'll grant that that's possible, but that's a human problem, not an AI problem, and requires a human solution.

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

But maybe you do make friendly nuclear power plants? Not sure if this analogy worked out for me or not.

Comment author: Paul_Christiano 01 March 2017 02:27:27AM *  7 points [-]

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

The difficulty of the policy problem depends on the quality of our technical solutions: how large an advantage can you get by behaving unsafely? If the answer is "you get big advantages for sacrificing safety, and a small group behaving unsafely could cause a big problem" then we have put ourselves in a sticky situation and will need to conjure up some unusually effective international coordination.

A perfect technical solution would make the policy problem relatively easy---if we had a scalable+competitive+secure solution to AI control, then there would be minimal risk from reckless actors. On the flip side, a perfect policy solution would make the technical problem relatively easy since we could just collectively decide not to build any kind of AI that could cause trouble. In reality we are probably going to need both.

(I wrote about this here.)

You could hold the position that the advantages from building uncontrolled AI will predictably be very low even without any further work. I disagree strongly with that and think that it contradicts the balance of public argument, though I don't know if I'd call it "easily corrected."

Comment author: capybaralet 04 January 2017 07:32:22PM *  2 points [-]

I think I was too terse; let me explain my model a bit more.

I think there's a decent chance (OTTMH, let's say 10%) that without any deliberate effort we make an AI which wipes out humanity but is nonetheless more ethically valuable than us (although not more than something which we deliberately design to be ethically valuable). This would happen, e.g., if it were the default outcome (say, if it turns out to be the case that intelligence ~ ethical value). This may actually be the most likely path to victory.*

There's also some chance that all we need to do to ensure that AI has (some) ethical value (e.g. due to having qualia) is X. In that case, we might increase our chance of doing X by understanding qualia a bit better.

Finally, my point was that I can easily imagine a scenario in which our alternatives are:

1. Build an AI with a 50% chance of being aligned and a 50% chance of just being an AI (with P(AI has property X) = 90% if we understand qualia better, 10% otherwise).
2. Allow our competitors to build an AI with ~0% chance of being ethically valuable.

So then we obviously prefer option 1, and if we understand qualia better, option 1 becomes better.
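(A rough expected-value sketch of that comparison. The 50%/90%/10% figures come from the scenario above; the utility numbers are hypothetical placeholders introduced purely for illustration.)

```python
# Hypothetical utilities (placeholders for illustration, not from the comment).
V_ALIGNED = 1.0      # value of an aligned AI
V_UNALIGNED_X = 0.3  # value of an unaligned AI that still has property X (e.g. qualia)
V_NO_X = 0.0         # value of an AI without property X

def option_1(p_x):
    # 50% aligned, 50% "just an AI" whose value hinges on property X
    return 0.5 * V_ALIGNED + 0.5 * (p_x * V_UNALIGNED_X + (1 - p_x) * V_NO_X)

option_2 = 0.0  # competitor's AI: ~0% chance of being ethically valuable

print(option_1(p_x=0.9))  # ~0.635, if we understand qualia better
print(option_1(p_x=0.1))  # ~0.515, if we don't
print(option_2)           # 0.0, so option 1 wins either way and improves with qualia work
```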

* I notice as I type this that this may have some strange consequences re: high-level strategy; e.g. maybe it's better to just make something intelligent ASAP and hope that it has ethical value, because this reduces *its* X-risk, and we might not be able to do much to change the distribution of ethical value produced by the AI we create anyhow. I tend to think that we should aim to be very confident that the AI we build is going to have lots of ethical value, but this may only make sense if we have a pretty good chance of succeeding.

Comment author: Paul_Christiano 20 January 2017 06:58:42PM 1 point [-]

Ah, that makes a lot more sense, sorry for misinterpreting you. (I think Toby has a view closer to the one I was responding to, though I suspect I am also oversimplifying his view.)

I agree that there are important philosophical questions that bear on the goodness of building various kinds of (unaligned) AI, and I think that those questions do have impact on what we ought to do. The biggest prize is if it turns out that some kinds of unaligned AI are much better than others, which I think is plausible. I guess we probably have similar views on these issues, modulo me being more optimistic about the prospects for aligned AI.

I don't think that an understanding of qualia is an important input into this issue though.

For example, from a long-run ethical perspective, whether or not humans have qualia is not especially important, and what mostly matters is human preferences (since those are what shape the future). If you created a race of p-zombies that nevertheless shared our preferences about qualia, I think it would be fine. And "the character of human preferences" is a very different kind of object than qualia. These questions are related in various ways (e.g. our beliefs about qualia are related to our qualia and to philosophical arguments about consciousness), but after thinking about that a little bit I think it is unlikely that the interaction is very important.

To summarize, I do agree that there are time-sensitive ethical questions about the moral value of creating unaligned AI. This was item 1.2 in this list from 4 years ago. I could imagine concluding that the nature of qualia is an important input into this question, but don't currently believe that.

Comment author: vollmer 08 December 2016 01:01:38PM *  0 points [-]

(If EAs end up committing a significant proportion of the $100k (or even all of it), will Paul reduce his 'haircut' or not?)

Comment author: Paul_Christiano 31 December 2016 03:32:26AM 2 points [-]

It looks like the total will be around $50k, so I'm going to reduce the cut to 0.5%.

Comment author: capybaralet 20 December 2016 01:00:19AM 1 point [-]

Hey, I (David Krueger) remember we spoke about this a bit with Toby when I was at FHI this summer.

I think we should be aiming for something like CEV, but we might not get it, and we should definitely consider scenarios where we have to settle for less.

For instance, some value-aligned group might find that its best option (due to competitive pressures) is to create an AI which has a 50% probability of being CEV-like or "aligned via corrigibility", but has a 50% probability of (effectively) prematurely settling on a utility function whose goodness depends heavily on the nature of qualia.

If (as I believe) such a scenario is likely, then the problem is time-sensitive.

Comment author: Paul_Christiano 20 December 2016 03:20:18AM *  2 points [-]

(effectively) prematurely settling on a utility function whose goodness depends heavily on the nature of qualia

This feels extremely unlikely; I don't think we have plausible paths to obtaining a non-negligibly good outcome without retaining the ability to effectively deliberate about e.g. the nature of qualia. I also suspect that we will be able to solve the control problem, and if we can't then it will be because of failure modes that can't be avoided by settling on a utility function. Of course "can't see any way it can happen" is not the same as "am justifiably confident it won't happen," but I think in this case it's enough to get us to pretty extreme odds.

More precisely, I'd give 100:1 against: (a) we will fail to solve the control problem in a satisfying way, (b) we will fall back to a solution which depends on our current understanding of qualia, (c) the resulting outcome will be non-negligibly good according to our view about qualia at the time that we build AI, and (d) it will be good because we hold that view about qualia.

(My real beliefs might be higher than 1% just based on "I haven't thought about it very long" and peer disagreement. But I think it's more likely than not that I would accept a bet at 100:1 odds after deliberation, even given that reasonable people disagree.)

(By "non-negligibly good" I mean that we would be willing to make some material sacrifice to improve its probability compared to a barren universe, perhaps of $1000/1% increase. By "because" I mean that the outcome would have been non-negligibly worse according to that view if we had not held it.)

I'm not sure if there is any way to turn the disagreement into a bet. Perhaps picking an arbiter and looking at their views in a decade? (e.g. Toby, Carl Shulman, Wei Dai?) This would obviously involve less extreme odds.

Probably more interesting than betting is resolving the disagreement. This seems to be a slightly persistent disagreement between me and Toby, I have never managed to really understand his position but we haven't talked about it much. I'm curious about what kind of solutions you see as plausible---it sounds like your view is based on a more detailed picture rather than an "anything might happen" view.

Comment author: JoshYou 05 December 2016 02:25:48AM 1 point [-]

I'm still pretty confused about why you think donating 10% has to be time-consuming. People who outsource their donation decisions to, say, GiveWell might only spend a few hours a year (or a few minutes, depending on how literally we interpret "outsourcing") deciding where to donate.

Comment author: Paul_Christiano 20 December 2016 03:11:03AM *  5 points [-]

I think that donor lotteries are a considerably stronger argument than GiveWell for the claim "donating 10% doesn't have to be time-consuming."

Your argument (with GiveWell in place of a lottery) requires that either (a) you think that GiveWell charities are clearly the best use of funds, or (b) by "doesn't have to be time-consuming" you mean "if you don't necessarily want to do the most good." I don't think you should be confused about why someone would disagree with (a), nor about why someone would think that (b) is a silly usage.

If there were low-friction donor lotteries, I suspect that most small GiveWell donors would be better-served by gambling up to perhaps $1M and then thinking about it at considerably greater length. I expect a significant fraction of them would end up funding something other than GiveWell top charities.
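(A minimal numerical sketch of why the lottery argument goes through; the figures are hypothetical, not from the comment. Each participant's expected allocation is unchanged, but only the winner has to spend the research time.)

```python
# Minimal donor-lottery sketch with made-up numbers.
my_donation = 5_000        # hypothetical contribution
pot = 1_000_000            # hypothetical total lottery pot
p_win = my_donation / pot  # chance of winning the right to direct the whole pot

expected_allocation = p_win * pot
print(expected_allocation)  # 5000.0: same expected amount as donating directly,
                            # but the research effort is concentrated in one winner
```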

(I was originally supportive but kind of lukewarm about donor lotteries, but I think I've now come around to Carl's level of enthusiasm.)

Comment author: Ben_Todd 06 December 2016 12:35:08AM 5 points [-]

Thanks for moving this post here rather than FB. I think it's a good discussion; however, I wanted to flag:

None of these criticisms are new to me. I think all of them have been discussed in some depth within CEA.

This makes me wonder if the problem is actually a failure of communication. Unfortunately, issues like this are costly to communicate outside of the organisation, and it often doesn't seem like the best use of time, but maybe that's wrong.

Given this, I think it also makes sense to run critical posts past the organisation concerned before posting. They might have already dealt with the issue, or have plans to do so, in which case posting the criticism is significantly less valuable (because it incurs similar costs to the org but with fewer benefits). It also helps the community avoid re-treading the same ground.

Comment author: Paul_Christiano 20 December 2016 03:00:02AM *  2 points [-]

I assume this discussion is mostly aimed at people outside of CEA who are considering whether to take and help promote the pledge. I think there are many basic points which those people should probably understand but which CEA (understandably) isn't keen to talk about, and it is reasonable for people outside of CEA to talk about them instead.

I expect this discussion wasn't worth the time at any rate, but it seems like sharing it with CEA isn't really going to save time on net.

Comment author: Robert_Wiblin 09 December 2016 10:47:22PM 2 points [-]

Firstly: I think we should use the interpretation of the pledge that produces the best outcome. The interpretation GWWC and I apply is a completely mainstream use of the term 'pledge' (e.g. you 'pledge' to stay with the person you marry, but people nonetheless get divorced if they think the marriage is too harmful to continue).

A looser interpretation is better because more people will be willing to participate, and each person gains from a smaller and more reasonable push towards moral behaviour. We certainly don't want people to be compelled to do things they think are morally wrong - that doesn't achieve an EA goal. That would be bad. Indeed, it's the original complaint here.

Secondly: An "evil future you" who didn't care about the good you can do through donations probably wouldn't care much about keeping promises made by a different kind of person in the past either, I wouldn't think.

Thirdly: The coordination thing doesn't really matter here because you are only 'cooperating' with your future self, who can't really reject you because they don't exist yet (unlike another person who is deciding whether to help you).

One thing I suspect is going on here is that people on the autism spectrum interpret all kinds of promises to be more binding than neurotypical people do (e.g. https://www.reddit.com/r/aspergers/comments/46zo2s/promises/). I don't know if that applies to any individual here specifically, but I think it explains how some of us have very different intuitions. But I expect we will be able to do more good if we apply the neurotypical intuitions that most people share.

Of course if you want to make it fully binding for yourself, then nobody can really stop you.

Comment author: Paul_Christiano 20 December 2016 02:47:51AM 6 points [-]

Secondly: An "evil future you" who didn't care about the good you can do through donations probably wouldn't care much about keeping promises made by a different kind of person in the past either, I wouldn't think.

[...] there's no point having a commitment device to prompt you to follow through on something you don't think you should do

Usually we promise to do something that we would not have done otherwise, i.e. which may not be in line with our future self's interests. The promise "I will do X if my future self wants to" is gratuitous.

When I promise to do something I will try to do it, even if my preferences change. Perhaps you are reading "evil" as meaning "lacks integrity" rather than "is not altruistic," but in context that doesn't make much sense.

It seems reasonable for GWWC to say that the GWWC pledge is intended more as a statement of intent than as a commitment; it would be interesting to understand whether this is how most people who come into contact with GWWC perceive the pledge. If there is systematic misperception, it seems like the appropriate response is "oops, sorry" and to fix the misperception.

Thirdly: The coordination thing doesn't really matter here because you are only 'cooperating' with your future self, who can't really reject you because they don't exist yet (unlike another person who is deciding whether to help you).

It does not seem to me that the main purpose of taking the GWWC pledge, nor its main effect, is to influence the pledger's behavior.
