Comment author: Paul_Christiano 06 August 2017 06:38:41PM 5 points

This seems to confuse costs and benefits; I don't understand the analysis. (ETA: the guesstimate makes more sense.)

I'm going to assume that a unit of blood is the amount that a single donor gives in a single session. (ETA: apparently a donation is 0.5 units of red blood cells. The analysis below is correct only if red blood cells are 50% of the value of a donation. I have no idea what the real ratio is. If red blood cells are most of the value, adjust all the values downwards by a factor of 2.)

The cost of donating a unit is perhaps 30 minutes (YMMV), and has nothing to do with 120 pounds. (The cost from having less blood for a while might easily dwarf the time cost, I'm not sure. When I've donated the time cost was significantly below 30 minutes.)

Under the efficient-NHS hypothesis, the value of marginal blood to the healthcare system is 120 pounds. We can convert this to QALYs using the marginal rate of (20,000 pounds / QALY), to get 0.6% of a QALY.

If you value all QALYs equally and think that marginal AMF donations buy them at 130 pounds / QALY, then your value for QALYs should be at most 130 pounds / QALY (otherwise you should just donate more). It should be exactly 130 pounds / QALY if you are an AMF donor (otherwise you should just donate less).

So 0.6% of a QALY should be worth about 0.8 pounds. If it takes 30 minutes to produce a unit of blood which is worth 0.6% of a QALY, then it should be producing value at 1.6 pounds / hour.

If the healthcare system was undervaluing blood by one order of magnitude, this would be 16 pounds / hour. So I think "would have to be undervaluing the effectiveness of blood donations by 2 orders of magnitude" is off by about an order of magnitude.
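
For concreteness, here is the arithmetic above as a short Python sketch. Every number is a figure quoted in this comment (not an independent estimate), and the time figure assumes the full 30 minutes per donation:

```python
# Back-of-the-envelope version of the calculation above; all inputs are the
# comment's assumptions, not established facts.
nhs_value_per_unit = 120.0     # pounds the NHS pays per unit of blood
nhs_marginal_rate = 20_000.0   # pounds per QALY at the NHS funding margin
amf_rate = 130.0               # pounds per QALY via marginal AMF donations
time_per_donation = 0.5        # hours per donation (the "30 minutes" figure)

qalys_per_unit = nhs_value_per_unit / nhs_marginal_rate   # 0.006 QALY (0.6%)
pounds_per_unit = qalys_per_unit * amf_rate               # ~0.78 pounds
pounds_per_hour = pounds_per_unit / time_per_donation     # ~1.6 pounds / hour

print(f"{qalys_per_unit:.1%} of a QALY, worth ~{pounds_per_unit:.2f} pounds, "
      f"i.e. ~{pounds_per_hour:.1f} pounds / hour")
# If the NHS undervalues blood by one order of magnitude, multiply by 10:
# ~16 pounds / hour, as in the next paragraph.
```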

The reason this seems so inefficient has little to do with EA's quantitative mindset, and everything to do with the utilitarian perspective that all QALYs are equal. The revealed preferences of most EAs imply that they value their QALYs much more highly than those of AMF beneficiaries. Conventional morality suggests that people extend some of their concern for themselves to their peers, which probably leads to much higher values for marginal UK QALYs than for AMF beneficiary QALYs.

I think that for most EAs donating blood is still not worthwhile even according to (suitably quantitatively refined) common-sense morality. But for those who value their time at less than 20 pounds / hour and take the numbers in the OP seriously, I think that "common-sense" morality does strongly endorse donating blood. (Obviously this cutoff is based on my other quantitative views, which I'm not going to get into here).

(Note: I would not be surprised if the numbers in the post are wrong in one way or another, so don't really endorse taking any quantitative conclusions literally rather than as a prompt to investigate the issue more closely. That said, if you are able to investigate this question usefully I suspect you should be earning more than 20 pounds / hour.)

I'm very hesitant about EAs giving up on common-sense morality based on naive utilitarian calculations. In the first place, I don't think that most EAs' moral reasoning is sufficiently sophisticated to outweigh simple heuristics like "when there are really big gains from trade, take them" (if society is willing to pay 240 pounds / hour for your time, and you value it at 16 pounds / hour, those are pretty big gains from trade). In the second place, even a naive utilitarian should be concerned that the rest of the world will be uncooperative with and unhappy with utilitarians if we are less altruistic than normal people in the ways that matter to our communities.

Comment author: Wei_Dai 11 July 2017 08:42:59AM 4 points

And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.

I'm not sure which approaches you're referring to. Can you link to some details on this?

Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (that if you solve the agent foundations you will definitely solve capability amplification, but not the other way around).

I don't understand how this is true. I can see how solving FAI implies solving capability amplification (just emulate the FAI at a low level *), but if all you had was a solution that allows a specific kind of agent (e.g., one with values well-defined apart from its implementation details) to keep those values as it self-modifies, how does that help a group of short-lived humans who don't know their own values break down an arbitrary cognitive task and perform it safely and as well as an arbitrary competitor?

(* Actually, even this isn't really true. In MIRI's approach, an FAI does not need to be competitive in performance with every AI design in every domain. I think the idea is to either convert mainstream AI research into using the same FAI design, or gain a decisive strategic advantage via superiority in some set of particularly important domains.)

My understanding is, MIRI's approach is to figure out how to safely increase capability by designing a base agent that can make safe use of arbitrary amounts of computing power and can safely improve itself by modifying its own design/code. The capability amplification approach is to figure out how to safely increase capability by taking a short-lived human as the given base agent, making copies of it and organizing how the copies work together. These seem like very different problems with their own difficulties.

I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list).

I agree that in this area MIRI's approach and yours face similar difficulties. People (including me) have criticized CEV for being vague and likely very difficult to define/implement though, so MIRI is not exactly getting a free pass by being vague. (I.e., I assume Daniel already took this into account.)

But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

This seems like a fair point, and I'm not sure how to weight these factors either. Given that discussion isn't particularly costly relative to the potential benefits, an obvious solution is just to encourage more of it. Someone ought to hold a workshop to talk about your ideas, for example.

I think it would also be a good reason to focus on the difficulties that are common to both approaches

This makes sense.

Comment author: Paul_Christiano 11 July 2017 04:04:41PM 3 points

On capability amplification:

MIRI's traditional goal would allow you to break cognition down into steps that we can describe explicitly and implement on transistors, things like "perform a step of logical deduction," "adjust the probability of this hypothesis," "do a step of backwards chaining," etc. This division does not need to be competitive, but it needs to be reasonably close (close enough to obtain a decisive advantage).

Capability amplification requires breaking cognition down into steps that humans can implement. This decomposition does not need to be competitive, but it needs to be efficient enough that it can be implemented during training. Humans can obviously implement more than transistors, the main difference is that in the agent foundations case you need to figure out every response in advance (but then can have a correspondingly greater reason to think that the decomposition will work / will preserve alignment).

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and whether it would have an effect on your view.

On competitiveness:

I would prefer to be competitive with non-aligned AI, rather than count on forming a singleton, but this isn't really a requirement of my approach. When comparing the difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

On reliability:

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction. Both of these seem likely applicable to ML models, though would depend on how exactly they play out.

On the ML side, I think the other promising approaches involve either adversarial training or ensembling / unanimous votes, which could be applied to the agent foundations problem.

Comment author: Wei_Dai 09 July 2017 08:53:55AM 18 points

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

From the perspective of an observer who can only judge from what's published online, I'm worried that Paul's approach only looks more promising than MIRI's because it's less "mature", having received less scrutiny and criticism from others. I'm not sure what's happening internally in various research groups, but the amount of online discussion about Paul's approach has to be at least an order of magnitude less than what MIRI's approach has received.

(Looking at the thread cited by Rob Bensinger, various people including MIRI people have apparently looked into Paul's approach but have not written down their criticisms. I've been trying to better understand Paul's ideas myself and point out some difficulties that others may have overlooked, but this is hampered by the fact that Paul seems to be the only person who is working on the approach and can participate on the other side of the discussion.)

I think Paul's approach is certainly one of the most promising approaches we currently have, and I wish people paid more attention to it (and/or wrote down their thoughts about it more), but it seems much too early to cite it as an example of an approach that is more promising than HRAD and therefore makes MIRI's work less valuable.

Comment author: Paul_Christiano 10 July 2017 05:37:42PM 9 points

I agree with this basic point, but I think on the other side there is a large gap in concreteness that makes it much easier to usefully criticize my approach (I'm at the stage of actually writing pseudocode and code which we can critique).

So far I think that the problems in my approach will also appear for MIRI's approach. For example:

  • Solomonoff induction or logical inductors have reliability problems that are analogous to reliability problems for machine learning. So to carry out MIRI's agenda either you need to formulate induction differently, or you need to somehow solve these problems. (And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.) I think Eliezer has long understood this problem and has alluded to it, but it hasn't been the topic of much discussion (I think largely because MIRI/Eliezer have so many other problems on their plates).
  • Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (that if you solve the agent foundations you will definitely solve capability amplification, but not the other way around).
  • I've given some concrete definitions of deliberation/extrapolation, and there's been public argument about whether they really capture human values. I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list). If you want to actually give a satisfying definition of CEV, I feel you are probably going to have to go down the same path that started with this post. I suspect Eliezer has some ideas for how to avoid these problems, but at this point those ideas have been subject to even less public discussion than my approach.

I agree there are further problems in my agenda that will be turned up by my discussion. But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

If you agree that many of my problems also come up eventually for MIRI's agenda, that's good news about the general applicability of MIRI's research (e.g. the reliability problems for Solomonoff induction may provide a good bridge between MIRI's work and mainstream ML), but I think it would also be a good reason to focus on the difficulties that are common to both approaches rather than to problems like decision theory / self-reference / logical uncertainty / naturalistic agents / ontology identification / multi-level world models / etc.

Comment author: Kerry_Vaughan 07 July 2017 10:55:00PM 2 points

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

I haven't thought about this in detail, but whether the evidence in this section justifies the claim in 3c might depend, in part, on what you think the AI Safety project is trying to achieve.

On first pass, the "learning to reason from humans" project seems like it may be able to quickly and substantially reduce the chance of an AI catastrophe by introducing human guidance as a mechanism for making AI systems more conservative.

However, it doesn't seem like a project that aims to do either of the following:

(1) Reduce the risk of an AI catastrophe to zero (or near zero)
(2) Produce an AI system that can help create an optimal world

If you think either (1) or (2) are the goals of AI Safety, then you might not be excited about the "learning to reason from humans" project.

You might think that "learning to reason from humans" doesn't accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can't rule out AI catastrophes with high certainty unless you can "peer inside the machine" so to speak. HRAD might allow you to peer inside the machine and make statements about what the machine will do with extremely high certainty.

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

Comment author: Paul_Christiano 08 July 2017 04:05:16PM 8 points

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

A human can spend an hour on a task, and train an AI to do that task in milliseconds.

Similarly, an aligned AI can spend an hour on a task, and train its successor to do that task in milliseconds.

So you could hope to have a sequence of nice AIs, each significantly smarter than the last, eventually reaching the limits of technology while still reasoning in a way that humans would endorse if they knew more and thought faster.

(This is the kind of approach I've outlined and am working on, and I think that most work along the lines of "learn from human reasoning" will make a similar move.)
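
To make the shape of this loop concrete, here is a toy, self-contained sketch; it is an illustration only, not the actual proposal. An "agent" here just sums short lists, "amplification" is the slow-but-trusted composite of the current agent plus task decomposition, and "distillation" is idealised as producing a fast successor with exactly the amplified capability; all names are placeholders.

```python
# Toy amplify-then-distill loop (an illustration under the assumptions stated
# above, not an actual alignment scheme).

def make_agent(max_len):
    def agent(xs):
        # This agent is only trusted on lists of up to max_len items.
        assert len(xs) <= max_len, "beyond this agent's capability"
        return sum(xs)
    agent.max_len = max_len
    return agent

def amplify(agent, xs):
    # Slow but trusted: split the task into pieces the current agent can handle.
    mid = len(xs) // 2
    return agent(xs[:mid]) + agent(xs[mid:])

def distill(agent):
    # Idealised training: the fast successor matches the amplified composite,
    # which can handle lists twice as long as the agent alone.
    successor = make_agent(agent.max_len * 2)
    example = list(range(successor.max_len))
    assert successor(example) == amplify(agent, example)  # sanity check
    return successor

agent = make_agent(1)      # stand-in for the human overseer's direct judgement
for _ in range(4):         # each round trains a more capable, still-fast agent
    agent = distill(agent)

print(agent.max_len)             # 16: capability grows each round
print(agent(list(range(10))))    # 45
```

The point of the toy is only the shape of the loop: in each round a slow process built from the current trusted agent supervises the training of a faster successor, so capability can grow while every training signal comes from (an amplified version of) trusted judgement.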

Comment author: BenHoffman 01 May 2017 01:11:05AM 4 points

SlateStarScratchpad claims (with more engagement here) that the literature mainly shows poor outcomes for the children of parents who like hitting their kids or who beat them severely, and that if you control for things like heredity or harsh beatings it’s not obvious that mild corporal punishment is more harmful than other common punishments.

My best guess is that children are very commonly abused (and not just by parents - also by schools), but I don't think the line between physical and nonphysical punishments is all that helpful for understanding the true extent of this.

Comment author: Paul_Christiano 01 May 2017 02:08:32PM 7 points

Scott links to this study, which is more convincing. They measure the difference between "physical mild (slap, spank)" and "physical harsh (use weapon, punch, kick)" punishment, with ~10% of children in the latter category. They consider children of twins to control for genetic confounders, and find something like a 0.2 SD effect on measures of behavioral problems at age 25. There is still confounding (e.g. households where parents beat their kids may be worse in other ways), and the effects are smaller and for rarer forms of punishment, but it is getting somewhere.

Comment author: Paul_Christiano 01 May 2017 12:22:04AM 17 points

The reported correlations between physical punishment and life outcomes, which underlie the headline $3.6 trillion / year figure, seem unlikely to be causal. I only clicked on the first study, but it made very little effort to control for any of the obvious confounders. (The two relevant controls are mother's education and presence of the father.) The confounding is sufficiently obvious and large that the whole exercise seems kind of crazy. On top of that, as far as I can tell, a causal effect of this size would be inconsistent with adoption studies.

It would be natural to either start with the effect on kids' welfare, which seems pretty easy to think about, or else make a much more serious effort to actually figure out the long-term effects.

Comment author: Zeke_Sherman 28 March 2017 05:53:25PM 1 point

Thanks for the comments.

Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.

Evolution favors replication. But patience and resource acquisition aren't obviously correlated with any sort of value; if anything, better resource-acquirers are destructive and competitive. The claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic. Furthermore, competitive dynamics lead to systematic failures. See the citation.

Shulman's post assumes that once somewhere is settled, it's permanently inhabited by the same tribe. But I don't buy that. Agents can still spread through violence or through mimicry (remember the quote on fifth-generation warfare).

It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it.

All I am saying is that the argument applies to this issue as well.

Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each other's situations, or to understand what we would believe if we viewed others' private information.

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict. Better technology yields better monitoring, but also better hiding - which is easier, monitoring ICBMs in the 1970s or monitoring cyberweapons today?

One of the most critical pieces of information in these cases is intentions, which are easy to keep secret and will probably remain so for a long time.

By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level?

Yes, or even implementable in current systems.

I think the mandate of AI alignment easily covers the failure modes you have in mind here.

The failure modes here arise in a different context, where the existing research is often less relevant or not relevant at all. Whatever you put under the umbrella of alignment, there is a difference between looking at a particular system with the assumption that it will rebuild the universe in accordance with its value function, and looking at how systems interact in varying numbers. If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics. Figuring out how to optimize large systems of agents is absolutely not a simple matter of figuring out how to build one good agent and then replicating it as much as possible.

Comment author: Paul_Christiano 28 March 2017 10:53:56PM 4 points

If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics

I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")

Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?

the claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic

If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted. I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)

Yes, or even implementable in current systems.

We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will "do what we want" we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict.

You're right.

Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement; I don't think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I'm all for evaluating it on its merits. So maybe we are in agreement here.

In response to Utopia In The Fog
Comment author: Paul_Christiano 28 March 2017 04:34:18PM 12 points

It's great to see people thinking about these topics and I agree with many of the sentiments in this post. Now I'm going to write a long comment focusing on those aspects I disagree with. (I think I probably agree with more of this sentiment than most of the people working on alignment, and so I may be unusually happy to shrug off these criticisms.)

Contrasting "multi-agent outcomes" and "superintelligence" seems extremely strange. I think the default expectation is a world full of many superintelligent systems. I'm going to read your use of "superintelligence" as "the emergence of a singleton concurrently with the development of superintelligence."

I don't consider the "single superintelligence" scenario likely, but I don't think that has much effect on the importance of AI alignment research or on the validity of the standard arguments. I do think that the world will gradually move towards being increasingly well-coordinated (and so talking about the world as a single entity will become increasingly reasonable), but I think that we will probably build superintelligent systems long before that process runs its course.

The future looks broadly good in this scenario given approximately utilitarian values and the assumption that ems are conscious, with a large growing population of minds which are optimized for satisfaction and productivity, free of disease and sickness.

On total utilitarian values, the actual experiences of brain emulations (including whether they have any experiences) don't seem very important. What matters are the preferences according to which emulations shape future generations (which will be many orders of magnitude larger).

"freewheeling evolutionary developments, while continuing to produce complex and intelligent forms of organization, lead to the gradual elimination of all forms of being that we care about"

Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.

(Evolution might select for particular values, e.g. if it's impossible to reliably delegate or if it's very expensive to build systems with stable values. But (a) I'd bet against this, and (b) understanding this phenomenon is precisely the alignment problem!)

(I discuss several of these issues here, Carl discusses evolution here.)

Whatever the type of agent, arms races in future technologies would lead to opportunity costs in military expenditures and would interfere with the project of improving welfare. It seems likely that agents designed for security purposes would have preferences and characteristics which fail to optimize for the welfare of themselves and their neighbors. It’s also possible that an arms race would destabilize international systems and act as a catalyst for warfare.

It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it. If there weren't competitive pressure / selection pressure to adopt future AI systems, then alignment would be much less urgent since we could just take our time.

There may be other interventions that improve coordination/peace more broadly, or which improve coordination/peace in particular possible worlds etc., and those should be considered on their merits. It seems totally plausible that some of those projects will be more effective than work on alignment. I'm especially sympathetic to your first suggestion of addressing key questions about what will/could/should happen.

Not only is this a problem on its own, but I see no reason to think that the conditions described above wouldn’t apply for scenarios where AI agents turned out to be the primary actors and decisionmakers rather than transhumans or posthumans.

Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each other's situations, or to understand what we would believe if we viewed others' private information.

More generally, we would like to avoid destructive conflict and are continuously developing new tools for getting what we want / becoming smarter and better-informed / etc.

And on top of all that, the historical trend seems to basically point to lower and lower levels of violent conflict, though this is in a race with greater and greater technological capacity to destroy stuff.

I would be more than happy to bet that the intensity of conflict declines over the long run. I think the question is just how much we should prioritize pushing it down in the short run.

“the only way to avoid having all human values gradually ground down by optimization-competition is to install a Gardener over the entire universe who optimizes for human values.”

I disagree with this. See my earlier claim that evolution only favors patience.

I do agree that some kinds of coordination problems need to be solved, for example we must avoid blowing up the world. These are similar in kind to the coordination problems we confront today though they will continue to get harder and we will have to be able to solve them better over time---we can't have a cold war each century with increasingly powerful technology.

There is still value in AI safety work... but there are other parts of the picture which need to be explored

This conclusion seems safe, but it would be safe even if you thought that early AI systems will precipitate a singleton (since one still cares a great deal about the dynamics of that transition).

Better systems of machine ethics which don’t require superintelligence to be implemented (as coherent extrapolated volition does)

By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level? That will work even if we need to solve alignment prior long before the emergence of a singleton? I'd endorse both of those desiderata.

I think the main difference in alignment work for unipolar vs. multipolar scenarios is how high we draw the bar for "aligned AI," and in particular how closely competitive it must be with unaligned AI. I probably agree with your implicit claim, that they either must be closely competitive or we need new institutional arrangements to avoid trouble.

Rather than having a singleminded focus on averting a particular failure mode

I think the mandate of AI alignment easily covers the failure modes you have in mind here. I think most of the disagreement is about what kinds of considerations will shape the values of future civilizations.

both working on arguments that agents will be linked via a teleological thread where they accurately represent the value functions of their ancestors

At this level of abstraction I don't see how this differs from alignment. I suspect the details differ a lot, in that the alignment community is very focused on the engineering problem of actually building systems that faithfully pursue particular values (and in general I've found that terms like "teleological thread" tend to be linked with persistently low levels of precision).

Comment author: Paul_Christiano 25 March 2017 11:03:36PM 4 points

I owe Michael Nielsen $60k to donate as he pleases if [beacon.nist.gov](beacon.nist.gov/home) is between 0000000000... and 028F5C28F5... at noon PST on 2017/4/2.
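
(For reference: on my reading, that range corresponds to roughly a 1-in-100 chance, assuming the beacon output is uniformly random and "between" refers to its leading ten hex digits, bounds inclusive. A quick check of that reading:)

```python
# Assumes a uniformly random beacon value and inclusive bounds on its leading
# ten hex digits; both are my reading of the bet, not stated in the comment.
lo = int("0000000000", 16)
hi = int("028F5C28F5", 16)
probability = (hi - lo + 1) / 16 ** 10
print(probability)  # ~0.01, i.e. about a 1% chance of paying out
```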

Comment author: sdspikes 01 March 2017 01:50:13AM 1 point

As a Stanford CS (BS/MS '10) grad who took AI/Machine Learning courses in college from Andrew Ng, worked at Udacity with Sebastian Thrun, etc., I have mostly been unimpressed by non-technical folks trying to convince me that AI safety (not caused by explicit human malfeasance) is a credible issue.

Maybe I have "easily corrected, false beliefs" but the people I've talked to at MIRI and CFAR have been pretty unconvincing to me, as was the book Superintelligence.

My perception is that MIRI has focused in on an extremely specific kind of AI that to me seems unlikely to do much harm unless someone is recklessly playing with fire (or intentionally trying to set one). I'll grant that that's possible, but that's a human problem, not an AI problem, and requires a human solution.

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

But maybe you do make friendly nuclear power plants? Not sure if this analogy worked out for me or not.

Comment author: Paul_Christiano 01 March 2017 02:27:27AM 7 points

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

The difficulty of the policy problem depends on the quality of our technical solutions: how large an advantage can you get by behaving unsafely? If the answer is "you get big advantages for sacrificing safety, and a small group behaving unsafely could cause a big problem" then we have put ourselves in a sticky situation and will need to conjure up some unusually effective international coordination.

A perfect technical solution would make the policy problem relatively easy---if we had a scalable+competitive+secure solution to AI control, then there would be minimal risk from reckless actors. On the flip side, a perfect policy solution would make the technical problem relatively easy since we could just collectively decide not to build any kind of AI that could cause trouble. In reality we are probably going to need both.

(I wrote about this here.)

You could hold the position that the advantages from building uncontrolled AI will predictably be very low even without any further work. I disagree strongly with that and think that it contradicts the balance of public argument, though I don't know if I'd call it "easily corrected."
