Comment author: ThomasSittler 30 September 2017 09:00:14PM *  3 points [-]

Internally we value the average CEA staff hour at ~$75

This is $75 roughly in "EA money" (i.e. OpenPhil's last dollar), yes? It's significantly lower than I thought. However, I suspect that this intuition was biased (upward), because I more often think in terms of "non-EA money". In non-EA money, CEA time would have a much higher nominal value. But if you think EA money can be used to buy good outcomes very cost-effectively (even at the margin) then $75 could make sense.

Comment author: Paul_Christiano 01 October 2017 04:33:18PM 2 points [-]

However, I suspect that this intuition was biased (upward), because I more often think in terms of "non-EA money". In non-EA money, CEA time would have a much higher nominal value. But if you think EA money can be used to buy good outcomes very cost-effectively (even at the margin) then $75 could make sense.

Normally people discuss the value of time by figuring out how many dollars they'd spend to save an hour. It's kind of unusual to ask how many dollars you'd have someone else spend so that you save an hour.

Comment author: Geuss 15 September 2017 10:04:41AM *  0 points [-]

I don't want to be too harsh, but this is the apotheosis of obtuse Oxford-style analytic philosophy. You can make whatever conceptual distinctions you like, but you should really be starting from the historical and sociological reality of capitalism. The case for why capitalism generates selfish motivations is not obscure.

Capitalism is a set of property relations that emerged in early modern England because its weak feudal aristocracy had no centralised apparatus by which to extract value from peasants, and so turned to renting out land to the unusually large number of tenants in the country - generating (a) competitive market pressures to maximise productivity; (b) landless peasants that were suddenly deprived of the means of subsistence farming. The peasants were forced to sell their labour - the labour they had heretofore been performing for themselves, on their own terms - to the emerging class of agrarian capitalists, who extracted a portion of their product to re-invest in their holdings.

The capitalists have to maximise productivity through technological innovation, wage repression, and so forth, or they are run into the ground and bankrupted by market competition. There is, as such, a set of self-interested motivations which one acquires if one wants to be a successful and lasting capitalist. It is a condition of the role within the structure of the market. The worker has to, on the other hand, sell themselves to those with a monopoly of the means of subsistence or face starvation. To do so they have to acquire the skills, comport and obedience to be attractive to the capitalist class. Again, one has to acquire certain self-interested motivations as a condition of the role within the market. Finally, capitalism requires a sufficiently self-interested culture such that it can sustain compounding capital accumulation through the sale of ever-greater commodities.

Comment author: Paul_Christiano 16 September 2017 03:17:54AM *  3 points [-]

Finally, capitalism requires a sufficiently self-interested culture such that it can sustain compounding capital accumulation through the sale of ever-greater commodities.

This is a common claim, but seems completely wrong. An economy of perfectly patient agents will accumulate capital much faster than a community that consumes 50% of its output. The patient agents will invest in infrastructure and technology and machines and so on to increase their future wealth.
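(A minimal numerical sketch of this point: the 5% return on capital and the 50-year horizon below are my own illustrative assumptions, not figures from the discussion.)

```python
# Illustrative only: compare capital accumulation under full reinvestment
# versus consuming 50% of output. Growth rate and horizon are assumed.

def capital_after(years, savings_rate, rate_of_return=0.05, initial_capital=1.0):
    """Capital stock after `years` if a fraction `savings_rate` of each
    period's output (the return on capital) is reinvested."""
    capital = initial_capital
    for _ in range(years):
        capital += savings_rate * rate_of_return * capital
    return capital

patient = capital_after(50, savings_rate=1.0)    # perfectly patient agents
consumers = capital_after(50, savings_rate=0.5)  # consume half of output

print(f"patient agents: {patient:.1f}x initial capital")    # ~11.5x
print(f"50% consumers:  {consumers:.1f}x initial capital")  # ~3.4x
```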

The capitalists have to maximise productivity through technological innovation, wage repression, and so forth, or they are run into the ground and bankrupted by market competition

In an efficient market, the capitalists earn rents on their capital whatever they do.

Comment author: kbog  (EA Profile) 29 August 2017 09:11:25AM *  2 points [-]

Cheap autonomous weapons could greatly decrease the cost of ending life---within a decade they could easily be the cheapest form of terrorism by far, and may eventually be the cheapest mass destruction in general. Think insect-sized drones carrying toxins or explosive charges that are lethal if detonated inside the skull.

That sounds a lot more expensive than bullets. You can already kill someone for a quarter.

If weaponized insect drones become cheap enough for terrorists to build and use, then counterterrorist organizations will be able to acquire large numbers of potent surveillance tools to find and eliminate their centers of fabrication.

You should have a low prior for new military technologies to alter the balance of power between different types of groups in a specific way. It hasn't happened much in history, because competing groups can take advantage of these technologies too.

The greater the military significance of AI, the more difficult it becomes for states to share information and coordinate regarding its development. This might be bad news for safety.

Of course the military AIs will be kept secret. But the rest of AI work won't be like that. Cruise liners aren't less safe because militaries are secretive about warship design.

Plus, Armstrong, Bostrom, and Shulman's "Racing to the Precipice" paper shows that uncertainty about other nations' AI capabilities actually makes things safer.

Comment author: Paul_Christiano 29 August 2017 05:16:46PM 4 points [-]

That sounds a lot more expensive than bullets. You can already kill someone for a quarter.

The main cost of killing someone with a bullet is labor. The point is that autonomous weapons reduce the labor required.

alter the balance of power between different types of groups in a specific way.

New technologies do often decrease the cost of killing people and increase the number of civilians who can be killed by a group of fixed size (see: guns, explosives, nuclear weapons).

Comment author: Paul_Christiano 29 August 2017 04:50:34AM *  5 points [-]

The two arguments I most often hear are:

  • Cheap autonomous weapons could greatly decrease the cost of ending life---within a decade they could easily be the cheapest form of terrorism by far, and may eventually be the cheapest mass destruction in general. Think insect-sized drones carrying toxins or explosive charges that are lethal if detonated inside the skull.

  • The greater the military significance of AI, the more difficult it becomes for states to share information and coordinate regarding its development. This might be bad news for safety.

Comment author: Paul_Christiano 06 August 2017 06:38:41PM *  5 points [-]

This seems to confuse costs and benefits; I don't understand the analysis. (ETA: the guesstimate makes more sense.)

I'm going to assume that a unit of blood is the amount that a single donor gives in a single session. (ETA: apparently a donation is 0.5 units of red blood cells. The analysis below is correct only if red blood cells are 50% of the value of a donation. I have no idea what the real ratio is. If red blood cells are most of the value, adjust all the values downwards by a factor of 2.)

The cost of donating a unit is perhaps 30 minutes (YMMV), and has nothing to do with 120 pounds. (The cost from having less blood for a while might easily dwarf the time cost, I'm not sure. When I've donated the time cost was significantly below 30 minutes.)

Under the efficient-NHS hypothesis, the value of marginal blood to the healthcare system is 120 pounds. We can convert this to QALYs using the marginal rate of (20,000 pounds / QALY), to get 0.6% of a QALY.

If you value all QALYs equally and think that marginal AMF donations buy them at 130 pounds / QALY, then your value for QALYs should be at most 130 pounds / QALY (otherwise you should just donate more). It should be exactly 130 pounds / QALY if you are an AMF donor (otherwise you should just donate less).

So 0.6% of a QALY should be worth about 0.8 pounds. If it takes 30 minutes to produce a unit of blood which is worth 0.6% of a QALY, then it should be producing value at 1.6 pounds / hour.

If the healthcare system was undervaluing blood by one order of magnitude, this would be 16 pounds / hour. So I think "would have to be undervaluing the effectiveness of blood donations by 2 orders of magnitude" is off by about an order of magnitude.
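(For concreteness, here is the arithmetic above as a short sketch; all figures are the ones given in this comment and the original post.)

```python
# The conversion described above, using the figures from the comment/post.
BLOOD_VALUE_GBP = 120        # NHS's implied value of a marginal unit of blood
NHS_GBP_PER_QALY = 20_000    # NHS marginal rate (pounds per QALY)
AMF_GBP_PER_QALY = 130       # assumed marginal cost of a QALY via AMF
DONATION_HOURS = 0.5         # ~30 minutes per donation

qalys_per_unit = BLOOD_VALUE_GBP / NHS_GBP_PER_QALY   # 0.006 QALY (0.6%)
value_in_gbp = qalys_per_unit * AMF_GBP_PER_QALY      # ~0.78 pounds
gbp_per_hour = value_in_gbp / DONATION_HOURS          # ~1.6 pounds / hour

print(f"{qalys_per_unit:.3f} QALY per unit of blood")
print(f"worth about {value_in_gbp:.2f} pounds at AMF prices")
print(f"about {gbp_per_hour:.1f} pounds / hour of donor time")
print(f"about {10 * gbp_per_hour:.0f} pounds / hour if blood is undervalued 10x")
```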

The reason this seems so inefficient has little to do with EA's quantitative mindset, and everything to do with the utilitarian perspective that all QALYs are equal. The revealed preferences of most EAs imply that they value their QALYs much more highly than those of AMF beneficiaries. Conventional morality suggests that people extend some of their concern for themselves to their peers, which probably leads to much higher values for marginal UK QALYs than for AMF beneficiary QALYs.

I think that for most EAs donating blood is still not worthwhile even according to (suitably quantitatively refined) common-sense morality. But for those who value their time at less than 20 pounds / hour and take the numbers in the OP seriously, I think that "common-sense" morality does strongly endorse donating blood. (Obviously this cutoff is based on my other quantitative views, which I'm not going to get into here).

(Note: I would not be surprised if the numbers in the post are wrong in one way or another, so don't really endorse taking any quantitative conclusions literally rather than as a prompt to investigate the issue more closely. That said, if you are able to investigate this question usefully I suspect you should be earning more than 20 pounds / hour.)

I'm very hesitant about EAs giving up on common-sense morality based on naive utilitarian calculations. In the first place, I don't think that most EAs' moral reasoning is sufficiently sophisticated to outweigh simple heuristics like "when there are really big gains from trade, take them" (if society is willing to pay 240 pounds / hour for your time, and you value it at 16 pounds / hour, those are pretty big gains from trade). In the second place, even a naive utilitarian should be concerned that the rest of the world will be uncooperative with and unhappy with utilitarians if we are less altruistic than normal people in the ways that matter to our communities.

Comment author: Wei_Dai 11 July 2017 08:42:59AM 4 points [-]

And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.

I'm not sure which approaches you're referring to. Can you link to some details on this?

Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (that if you solve the agent foundations you will definitely solve capability amplification, but not the other way around).

I don't understand how this is true. I can see how solving FAI implies solving capability amplification (just emulate the FAI at a low level *), but if all you had was a solution that allows a specific kind of agent (e.g., with values well-defined apart from its implementation details) keep those values as it self-modifies, how does that help a group of short-lived humans who don't know their own values break down an arbitrary cognitive task and perform it safely and as well as an arbitrary competitor?

(* Actually, even this isn't really true. In MIRI's approach, an FAI does not need to be competitive in performance with every AI design in every domain. I think the idea is to either convert mainstream AI research into using the same FAI design, or gain a decisive strategic advantage via superiority in some set of particularly important domains.)

My understanding is that MIRI's approach is to figure out how to safely increase capability by designing a base agent that can make safe use of arbitrary amounts of computing power and can safely improve itself by modifying its own design/code. The capability amplification approach is to figure out how to safely increase capability by taking a short-lived human as the given base agent, making copies of it and organizing how the copies work together. These seem like very different problems with their own difficulties.

I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list).

I agree that in this area MIRI's approach and yours face similar difficulties. People (including me) have criticized CEV for being vague and likely very difficult to define/implement though, so MIRI is not exactly getting a free pass by being vague. (I.e., I assume Daniel already took this into account.)

But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

This seems like a fair point, and I'm not sure how to weight these factors either. Given that discussion isn't particularly costly relative to the potential benefits, an obvious solution is just to encourage more of it. Someone ought to hold a workshop to talk about your ideas, for example.

I think it would also be a good reason to focus on the difficulties that are common to both approaches

This makes sense.

Comment author: Paul_Christiano 11 July 2017 04:04:41PM *  3 points [-]

On capability amplification:

MIRI's traditional goal would allow you to break cognition down into steps that we can describe explicitly and implement on transistors, things like "perform a step of logical deduction," "adjust the probability of this hypothesis," "do a step of backwards chaining," etc. This division does not need to be competitive, but it needs to be reasonably close (close enough to obtain a decisive advantage).

Capability amplification requires breaking cognition down into steps that humans can implement. This decomposition does not need to be competitive, but it needs to be efficient enough that it can be implemented during training. Humans can obviously implement more than transistors; the main difference is that in the agent foundations case you need to figure out every response in advance (but then you can have a correspondingly greater reason to think that the decomposition will work / will preserve alignment).

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

On competitiveness:

I would prefer to be competitive with non-aligned AI, rather than count on forming a singleton, but this isn't really a requirement of my approach. When comparing the difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

On reliability:

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction. Both of these seem likely applicable to ML models, though would depend on how exactly they play out.

On the ML side, I think the other promising approaches involve either adversarial training or ensembling / unanimous votes, which could be applied to the agent foundations problem.
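(To illustrate the ensembling / unanimous-votes idea in the most generic way: the sketch below is not a description of any particular system, and the model functions and the fallback are placeholders.)

```python
# Generic sketch: act only when every member of an ensemble agrees on the
# action; otherwise fall back to a conservative default (defer, ask a human).
def unanimous_vote(models, x, safe_default):
    """Return the shared action if all models agree on input x, else defer."""
    actions = [model(x) for model in models]
    if all(action == actions[0] for action in actions):
        return actions[0]
    return safe_default

# Toy usage with three stand-in 'models' classifying the sign of a number.
models = [
    lambda v: "positive" if v > 0 else "non-positive",
    lambda v: "positive" if v >= 0 else "non-positive",
    lambda v: "positive" if v > 1 else "non-positive",
]
print(unanimous_vote(models, 5.0, safe_default="defer"))  # all agree -> "positive"
print(unanimous_vote(models, 0.5, safe_default="defer"))  # disagreement -> "defer"
```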

Comment author: Wei_Dai 09 July 2017 08:53:55AM 18 points [-]

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

From the perspective of an observer who can only judge from what's published online, I'm worried that Paul's approach only looks more promising than MIRI's because it's less "mature", having received less scrutiny and criticism from others. I'm not sure what's happening internally in various research groups, but the amount of online discussion about Paul's approach has to be at least an order of magnitude less than what MIRI's approach has received.

(Looking at the thread cited by Rob Bensinger, various people including MIRI people have apparently looked into Paul's approach but have not written down their criticisms. I've been trying to better understand Paul's ideas myself and point out some difficulties that others may have overlooked, but this is hampered by the fact that Paul seems to be the only person who is working on the approach and can participate on the other side of the discussion.)

I think Paul's approach is certainly one of the most promising approaches we currently have, and I wish people paid more attention to it (and/or wrote down their thoughts about it more), but it seems much too early to cite it as an example of an approach that is more promising than HRAD and therefore makes MIRI's work less valuable.

Comment author: Paul_Christiano 10 July 2017 05:37:42PM *  9 points [-]

I agree with this basic point, but I think on the other side there is a large gap in concreteness that makes it much easier to usefully criticize my approach (I'm at the stage of actually writing pseudocode and code which we can critique).

So far I think that the problems in my approach will also appear for MIRI's approach. For example:

  • Solomonoff induction or logical inductors have reliability problems that are analogous to reliability problems for machine learning. So to carry out MIRI's agenda either you need to formulate induction differently, or you need to somehow solve these problems. (And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.) I think Eliezer has long understood this problem and has alluded to it, but it hasn't been the topic of much discussion (I think largely because MIRI/Eliezer have so many other problems on their plates).
  • Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (that if you solve the agent foundations you will definitely solve capability amplification, but not the other way around).
  • I've given some concrete definitions of deliberation/extrapolation, and there's been public argument about whether they really capture human values. I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list). If you want to actually give a satisfying definition of CEV, I feel you are probably going to have to go down the same path that started with this post. I suspect Eliezer has some ideas for how to avoid these problems, but at this point those ideas have been subject to even less public discussion than my approach.

I agree there are further problems in my agenda that will be turned up by my discussion. But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

If you agree that many of my problems also come up eventually for MIRI's agenda, that's good news about the general applicability of MIRI's research (e.g. the reliability problems for Solomonoff induction may provide a good bridge between MIRI's work and mainstream ML), but I think it would also be a good reason to focus on the difficulties that are common to both approaches rather than to problems like decision theory / self-reference / logical uncertainty / naturalistic agents / ontology identification / multi-level world models / etc.

Comment author: Kerry_Vaughan 07 July 2017 10:55:00PM 2 points [-]

3c. Other research, especially "learning to reason from humans," looks more promising than HRAD (75%?)

I haven't thought about this in detail, but you might think that whether the evidence in this section justifies the claim in 3c depends, in part, on what you think the AI Safety project is trying to achieve.

On first pass, the "learning to reason from humans" project seems like it may be able to quickly and substantially reduce the chance of an AI catastrophe by introducing human guidance as a mechanism for making AI systems more conservative.

However, it doesn't seem like a project that aims to do either of the following:

(1) Reduce the risk of an AI catastrophe to zero (or near zero)
(2) Produce an AI system that can help create an optimal world

If you think either (1) or (2) are the goals of AI Safety, then you might not be excited about the "learning to reason from humans" project.

You might think that "learning to reason from humans" doesn't accomplish (1) because a) logic and mathematics seem to be the only methods we have for stating things with extremely high certainty, and b) you probably can't rule out AI catastrophes with high certainty unless you can "peer inside the machine" so to speak. HRAD might allow you to peer inside the machine and make statements about what the machine will do with extremely high certainty.

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

Comment author: Paul_Christiano 08 July 2017 04:05:16PM *  8 points [-]

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

A human can spend an hour on a task, and train an AI to do that task in milliseconds.

Similarly, an aligned AI can spend an hour on a task, and train its successor to do that task in milliseconds.

So you could hope to have a sequence of nice AIs, each significantly smarter than the last, eventually reaching the limits of technology while still reasoning in a way that humans would endorse if they knew more and thought faster.

(This is the kind of approach I've outlined and am working on, and I think that most work along the lines of "learn from human reasoning" will make a similar move.)
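(A toy, runnable sketch of the loop being described, in my own rendering rather than Paul's: the task is summing a list, the "human" can only handle trivially short lists, amplification splits the task and delegates the pieces to the current agent, and distillation is stood in for by caching, where a real system would train a fast model.)

```python
# Toy illustration of iterated amplification and distillation (my rendering,
# not Paul's actual scheme). The "task" is summing a list of numbers.

def human(xs):
    """The base agent: only handles trivially small tasks directly."""
    if len(xs) <= 1:
        return xs[0] if xs else 0
    raise ValueError("too hard for the unamplified base agent")

def amplify(agent):
    """Amplification: a slower overseer splits the task in half and delegates
    each half to the current (fast) agent, then combines the answers."""
    def amplified(xs):
        if len(xs) <= 1:
            return agent(xs)
        mid = len(xs) // 2
        return agent(xs[:mid]) + agent(xs[mid:])
    return amplified

def distill(slow_process):
    """Stand-in for training a fast successor to imitate the slow process.
    Here we just cache its answers; a real system would fit a model."""
    cache = {}
    def fast(xs):
        key = tuple(xs)
        if key not in cache:
            cache[key] = slow_process(xs)
        return cache[key]
    return fast

agent = human
for _ in range(3):                  # a few amplify/distill generations
    agent = distill(amplify(agent))

# The third-generation agent handles tasks the original base agent could not.
print(agent([1, 2, 3, 4, 5]))       # -> 15
```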

Comment author: BenHoffman 01 May 2017 01:11:05AM *  4 points [-]

SlateStarScratchpad claims (with more engagement here) that the literature mainly shows that parents who like hitting their kids or beat them severely do poorly, and that if you control for things like heredity or harsh beatings it’s not obvious that mild corporal punishment is more harmful than other common punishments.

My best guess is that children are very commonly abused (and not just by parents - also by schools), but I don't think the line between physical and nonphysical punishments is all that helpful for understanding the true extent of this.

Comment author: Paul_Christiano 01 May 2017 02:08:32PM 7 points [-]

Scott links to this study, which is more convincing. They measure the difference between "physical mild (slap, spank)" and "physical harsh (use weapon, punch, kick)" punishment, with ~10% of children in the latter category. They consider children of twins to control for genetic confounders, and find something like a 0.2 SD effect on measures of behavioral problems at age 25. There is still confounding (e.g. households where parents beat their kids may be worse in other ways), and the effects are smaller and for rarer forms of punishment, but it is getting somewhere.

Comment author: Paul_Christiano 01 May 2017 12:22:04AM *  17 points [-]

The reported correlations between physical punishment and life outcomes, which underlie the headline $3.6 trillion / year figure, seem unlikely to be causal. I only clicked on the first study, but it made very little effort to control for any of the obvious confounders. (The two relevant controls are mother's education and presence of the father.) The confounding is sufficiently obvious and large that the whole exercise seems kind of crazy. On top of that, as far as I can tell, a causal effect of this size would be inconsistent with adoption studies.

It would be natural to either start with the effect on kids' welfare, which seems pretty easy to think about, or else make a much more serious effort to actually figure out the long-term effects.
