Comment author: SoerenMind  (EA Profile) 20 September 2017 08:18:50PM *  1 point [-]

After some clarification, Dayan thinks that vigour is not the thing I was looking for.

We discussed this a bit further and he suggested that the temporal difference error does track pretty closely what we mean by happiness/suffering, at least as far as the zero point is concerned. Here's a paper making the case (but it has limited scope IMO).

If that's true, we wouldn't need, e.g., the theory that the zero point exists to keep firing rates close to zero.

The only problem with TD errors seems to be that they don't account for the difference between wanting and liking. But it's currently just unresolved what the function of liking is. So I came away with the impression that liking vs. wanting, not the zero point, is the central question.

I've seen one paper suggesting that liking is basically the consumption of rewards, which would bring us back to the question of the zero point though. But we didn't find that theory satisfying. E.g. food is just a proxy for survival. And as the paper I linked shows, happiness can follow TD errors even when no rewards are consumed.

Dayan mentioned that liking may even be an epiphenomenon of some things that are going on in the brain when we eat food/have sex etc, similar to how the specific flavour of pleasure we get from listening to music is such an epiphenomenon. I don't know if that would mean that liking has no function.

Any thoughts?

Comment author: Brian_Tomasik 21 September 2017 03:37:07AM 0 points [-]

Interesting. :)

Daswani and Leike (2015) also define (p. 4) happiness as the temporal difference error (in an MDP), and for model-based agents, the definition is, in my interpretation, basically the common Internet slogan that "happiness = reality - expectations". However, the authors point out (p. 2) that pleasure = reward != happiness. This still leaves open the issue of what pleasure is.

Personally I think pleasure is more morally relevant. In Tomasik (2014), I wrote (p. 11):

After training, dopamine spikes when a cue appears signaling that a reward will arrive, not when the reward itself is consumed [Schultz et al., 1997], but we know subjectively that the main pleasure of a reward comes from consuming it, not predicting it. In other words, in equation (1), the pleasure comes from the actual reward r, not from the amount of dopamine δ.
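
For reference, equation (1) there is (roughly) the standard temporal-difference update. Here's a minimal tabular sketch in my own notation, just to make the r vs. δ distinction concrete (a simplified sketch, not the exact formulation from the paper):

    # Tabular TD(0) sketch: delta is the dopamine-like prediction error, while
    # r is the raw reward actually received. The claim in the quote above is
    # that pleasure tracks r, not delta.

    gamma = 0.9   # discount factor
    alpha = 0.1   # learning rate
    V = {}        # state -> value estimate (defaults to 0)

    def td_update(state, reward, next_state):
        """One TD(0) step; returns the prediction error delta."""
        delta = reward + gamma * V.get(next_state, 0.0) - V.get(state, 0.0)
        V[state] = V.get(state, 0.0) + alpha * delta
        return delta

    # After enough training, a cue that reliably predicts a cookie produces a
    # large positive delta when the cue appears and a delta near zero when the
    # fully predicted cookie is eaten, even though r is largest at consumption.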

In this post commenting on Daswani and Leike (2015), I said:

I personally don't think the definition of "happiness" that Daswani and Leike advance is the most morally relevant one, but the authors make an interesting case for their definition. I think their definition corresponds most closely with "being pleased of one's current state in a high-level sense". In contrast, I think raw pleasure/pain is most morally significant. As a simple test, ask whether you'd rather be in a state where you've been unexpectedly notified that you'll get a cookie in a few minutes or whether you'd rather be in the state where you actually eat the cookie after having been notified a few minutes earlier. Daswani and Leike's definition considers being notified about the cookie to be happiness, while I think eating the cookie has more moral relevance.


Dayan mentioned that liking may even be an epiphenomenon of some things that are going on in the brain when we eat food/have sex etc, similar to how the specific flavour of pleasure we get from listening to music is such an epiphenomenon.

I'm not sure I understand, but I wrote a quick thing here inspired by this comment. Do you think that's what he meant? If so, may I attribute the idea to him/you? It seems fairly plausible. :) Studying what separates red from blue might help shed light on this topic.

In response to S-risk FAQ
Comment author: gworley3  (EA Profile) 18 September 2017 07:30:39PM 4 points [-]

One thing I find meta-interesting about s-risk is that it's included in the sort of thing we were pointing at in the late 90s before we started talking about x-risk. So to my mind, s-risk has always been part of the x-risk mitigation program, but, as you make clear, that's not how it's been communicated.

I wonder if there are types of risk to the long-term future that we implicitly would like to avoid but have accidentally excluded from the explicit definitions of both x-risk and s-risk.

In response to comment by gworley3  (EA Profile) on S-risk FAQ
Comment author: Brian_Tomasik 19 September 2017 12:11:11AM 3 points [-]

the sort of thing we were pointing at in the late 90s before we started talking about x-risk

I'd be interested to hear more about that if you want to take the time.

Comment author: SoerenMind  (EA Profile) 27 August 2017 11:54:13AM 1 point [-]

I feel like there's a difference between (a) an agent inside the room who hasn't yet pressed the lever to get out and (b) the agent not existing at all.

Yes, that's probably the right way to think about it. I'm also considering an alternative, though: since we're describing the situation with a simple computational model, we shouldn't assume that there's anything going on that isn't captured by the model. E.g., if the agent in the room is depressed, it will be performing 'mental actions' - imagining depressing scenarios, etc. But we may have to assume that away, similar to how high-school physics assumes away friction.

So we're left with an agent that decides initially that it won't do anything at all (not even updating its beliefs) because it doesn't want to be outside of the room, and then remains inactive. The question arises whether that's an agent at all and whether it's meaningfully different from unconsciousness.

Comment author: Brian_Tomasik 27 August 2017 11:42:29PM *  0 points [-]

So we're left with an agent that decides initially that it won't do anything at all (not even updating its beliefs) because it doesn't want to be outside of the room, and then remains inactive. The question arises whether that's an agent at all and whether it's meaningfully different from unconsciousness.

Hm. :) Well, what if the agent did do stuff inside the room but still decided not to go out? We still wouldn't be able to tell if it was experiencing net positive, negative, or neutral welfare. Examples:

  1. It's winter. The agent is cold indoors and is trying to move to the warm parts of the room. We assume its welfare is net negative. But it doesn't go outside because it's even colder outside.

  2. The agent is indoors having a party. We assume it's experiencing net positive welfare. It doesn't want to go outside because the party is inside.

We can reproduce the behavior of these agents with reward/punishment values that are all positive numbers, all negative numbers, or a combination of the two. So if we omit the higher-level thoughts of the agents and just focus on the reward numbers at an abstract level, it doesn't seem like we can meaningfully distinguish positive or negative welfare. Hence, the sign of welfare must come from the richer context that our human-centered knowledge and evaluations bring?
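
As a toy illustration (the MDP and all the numbers below are made up, just a sketch): shifting every reward by a constant leaves the optimal policy of a non-terminating discounted MDP unchanged, so the same "stay at the party" behavior is consistent with all-positive or all-negative reward numbers.

    import numpy as np

    # Tiny 2-state, 2-action MDP with no terminal states (made-up numbers).
    # States: 0 = "at the party", 1 = "outside". Actions: 0 = "stay", 1 = "move".
    P = np.zeros((2, 2, 2))      # P[s, a, s'] = transition probability
    P[0, 0, 0] = 1.0             # staying keeps you where you are
    P[1, 0, 1] = 1.0
    P[0, 1, 1] = 1.0             # moving switches rooms
    P[1, 1, 0] = 1.0

    R = np.array([[ 1.0, -0.5],  # R[s, a] = reward for taking action a in state s
                  [-1.0,  0.2]])
    gamma = 0.9

    def optimal_policy(R):
        """Greedy policy from value iteration on the MDP above."""
        V = np.zeros(2)
        for _ in range(1000):
            Q = R + gamma * (P @ V)   # Q[s, a] = R[s, a] + gamma * E[V(s')]
            V = Q.max(axis=1)
        return Q.argmax(axis=1)

    # Adding a constant c to every reward just adds c / (1 - gamma) to every
    # state's value, so the greedy policy is identical either way.
    print(optimal_policy(R))          # [0 1]: stay at the party; go back if outside
    print(optimal_policy(R - 5.0))    # same policy, though every reward is negative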

Of course, qualia nonrealists already knew that the sign and magnitude of an organism's welfare are things we make up. But most people can agree upon, e.g., the sign of the welfare of the person at the party. In contrast, there doesn't seem to be a principled way that most people would agree upon for us to attribute a sign of welfare to a simple RL agent that reproduces the high-level behavior of the person at the party.

Comment author: SoerenMind  (EA Profile) 26 August 2017 10:11:51AM *  1 point [-]

Thanks for the reply. I think I can clarify the issue about discrete time intervals. I'd be curious about your thoughts on the last sentence of my comment above, if you have any.

Discrete time

So it seems like a time step is defined as the interval between one action and the next?

Yes. But in a semi-Markov Decision Process (SMDP) this is not the case. SMDPs allow temporally extended actions and are commonly used in RL research. Dayan's papers use a continuous-time SMDP. You can still have RL agents in this formalism, and it tracks our situation more closely. But I don't think the formalism matters for our discussion, because you can approximate it arbitrarily well with a standard MDP - I'll explain below.

The continuous-time experiment looks roughly like this: Imagine you're in a room and you have to press a lever to get out - and get back to what you would normally be doing, earning an average reward rho per second. However, the lever is hard to press. You can press it hard and fast or lightly and slowly, taking a total time T to complete the press. The total energy cost of pressing is 1/T, so ideally you'd press very slowly, but that would mean you couldn't be outside the room during that time (opportunity costs).

In this setting, the 'action' is just the time T that you take to press the lever. We can easily approximate this with a standard MDP. E.g. you could take action 1, which completes the press in one time step, costing you 1/1 = 1 reward in energy. Or you could take action 2, which you would have to take twice to complete the press, costing you only 1/2 reward in total (so 1/4 for each time you take action 2). And so forth. Does that make sense?
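
Here's a quick sketch of that encoding with toy numbers (rho, the average reward per step outside the room, is made up, and this is my own toy version rather than anything from the papers):

    # Choosing "speed n" means the press takes n time steps, with the total
    # energy cost 1/n charged as 1/n**2 per step, and the average reward rho
    # forgone on each of those n steps.

    rho = 0.25   # assumed average reward per step outside the room (made up)

    for n in range(1, 7):
        energy_per_step = (1.0 / n) / n        # total 1/n spread over n steps
        total_energy = n * energy_per_step     # recovers the original 1/n cost
        opportunity = rho * n                  # reward forgone while pressing
        print(f"speed {n}: energy {total_energy:.3f}, "
              f"opportunity cost {opportunity:.2f}, "
              f"total {total_energy + opportunity:.3f}")

    # With rho = 0.25 the total cost bottoms out at n = 2, matching the
    # continuous-time optimum T* = 1/sqrt(rho) = 2 for this toy cost function.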

Zero point

Of course, if you don't like it outside the room at all, you'll never press the lever - so there is a 'zero point' in terms of how much you like it outside. Below that point you'll never press the lever.

It seems like vigor just says that what you're doing is better than not doing it?

I'm not entirely sure what you mean, but I'll clarify that acting vigorously doesn't say anything about whether the agent is currently happy. It may well act vigorously just to escape punishment. Similarly, an agent that currently works to increase its lifetime doesn't necessarily feel good, but its work still implies that it thinks the additional lifetime it gets will be good.

But I think your criticism may be the same as what I said in the edit above - that there is an unwarranted assumption that the agent is at the zero-point before it presses the lever. In the experiments this is assumed because there are no food rewards or shocks during that time. But you could still imagine that a depressed rat would feel bad anyway.

The theory that takes nonexistence as the zero point kind of does the same thing, though. Although nonexistence is arguably a definite zero point, the agent's utility function might still extend beyond its lifetime...

Does this clarify the case?

Comment author: Brian_Tomasik 26 August 2017 11:24:02PM 1 point [-]

Your explanation was clear. :)

acting vigorously doesn't say anything about whether the agent is currently happy

Yeah, I guess I meant the trivial observation that you act vigorously if you judge that doing so has higher expected total discounted reward than not doing so. But this doesn't speak to whether, after making that vigorous effort, your experiences will be net positive; they might just be less negative.

Of course, if you don't like it outside the room at all, you'll never press the lever - so there is a 'zero point' in terms of how much you like it outside.

...assuming that sticking around inside the room is neutral. This gets back to the "unwarranted assumption that the agent is at the zero-point before it presses the lever."

The theory that takes nonexistence as the zero point kind of does the same thing, though.

Hm. :) I feel like there's a difference between (a) an agent inside the room who hasn't yet pressed the lever to get out and (b) the agent not existing at all. For (a), it seems we ought to be able to give a (qualia and morally nonrealist) answer about whether its experiences are positive or negative or neutral, while for (b), such a question seems misplaced.

If it were a human in the room, we could ask that person whether her experiences before lever pressing were net positive or negative. I guess such answers could vary a lot between people based on various cultural, psychological, etc. factors unrelated to the activity level of reward networks. If so, perhaps one position could be that the distinction between positive vs. negative welfare is a pretty anthropomorphic concept that doesn't travel well outside of a cognitive system capable of making these kinds of judgments. Intuitively, I feel like there is more to the sign of one's welfare than these high-level, potentially idiosyncratic evaluations, but it's hard to say what.

I suppose another approach could be to say that the person in the room definitely is at welfare 0 (by fiat) based on lack of reward or punishment signals, regardless of how the person evaluates her welfare verbally.

Comment author: SoerenMind  (EA Profile) 21 August 2017 11:48:23AM *  1 point [-]

I've had a look into Dayan's suggested papers - they imply an interesting theory. I'll put my thoughts here so the discussion can be public. The theory contradicts the one you link above where the separation between pain and pleasure is a contingency of how our brain works.

You've written about another (very intuitive) theory, where the zero-point is where you'd be indifferent between prolonging and ending your life:

"This explanation may sound plausible due to its analogy to familiar concepts, but it seems to place undue weight on whether an agent’s lifetime is fixed or variable. Yet I would still feel pain and pleasure as being distinct even if I knew exactly when I would die, and a simple RL agent has no concept of death to begin with."

Dayan's research suggests that the zero-point will also come up in many circumstances relating to opportunity costs, which would deal with that objection. To simplify, let's say the agent expects a fixed average rate of return rho for the foreseeable future. It is faced with a problem where it can either act fast (high energy expenditure) or act slowly (high opportunity costs, since it won't get the average return for a while). If rho is negative or zero, there is no need to act quickly at all because there are no opportunity costs. But the higher the opportunity costs get, the faster the agent will want to get back to earning its average reward, so it will act quickly despite the immediate cost.

The speed with which the agent acts is called vigour in Dayan's research. The agent's vigour mathematically implies an average rate of return if the agent is rational. There can be other reasons for low vigour, such as a task that requires patience - they have some experiments here in figure 1. In their experiment, the optimal vigour (one over tau*) is proportional to the square root of the average return. A recent paper has confirmed the predictions of this model in humans.
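
As a rough numerical check on that square-root relationship, here's a sketch of a simplified version of the trade-off (my simplification - an energy cost of 1/tau plus the forgone average reward rho per unit of latency - not the exact equations from the papers):

    import numpy as np

    # Sweep the average reward rho and find the latency tau that minimises
    # 1/tau + rho * tau. The resulting vigour 1/tau* should track sqrt(rho).
    taus = np.linspace(0.01, 20.0, 200000)

    for rho in (0.1, 0.4, 1.6):
        cost = 1.0 / taus + rho * taus              # energy + opportunity cost
        tau_star = taus[np.argmin(cost)]
        print(f"rho = {rho:.1f}: vigour 1/tau* = {1 / tau_star:.2f}, "
              f"sqrt(rho) = {np.sqrt(rho):.2f}")

    # Quadrupling rho doubles the optimal vigour, i.e. 1/tau* is proportional
    # to the square root of the average return in this simplified model.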

So when is an agent happy according to this model?

The model would imply that the agent has positive welfare when it treats its current state as creating positive opportunity costs while it's doing other things (and vice versa for negative welfare). This would also apply to your example where the agent expends resources to increase or decrease its lifetime.

What I like about this is that the welfare depends on the agent's behaviour and not on the way the rewards are internally processed and represented as numbers, which is arbitrary.

I'm still not sure how you would go about calculating the welfare of an agent if you don't have a nice experimental setup like Dayan's. That might be amenable to more thinking. Moreover, all welfare is still relative and it doesn't allow comparisons between agents.

Edit: I'm not sure, though, whether this creates a problem: we now have to assume that the 'inactive' time, where the agent doesn't get its average reward, is the zero baseline, which is also arbitrary.

Comment author: Brian_Tomasik 26 August 2017 04:23:02AM *  1 point [-]

Thanks!! Interesting. I haven't read the linked papers, so let me know if I don't understand properly (as I probably don't).

I've always thought of simple RL agents as getting a reward at fixed time intervals no matter what they do, in which case they can't act faster or slower. For example, if they skip pressing a lever, they just get a reward of 0 for that time step. Likewise, in an actual animal, the animal's reward neurons don't fire during the time when the lever isn't being pressed, which is equivalent to a reward of 0.

Of course, animals would prefer to press the lever more often to get a positive reward rather than a reward of 0, but this would be true whether the lever gave positive reward or merely relief from punishment. For example, maybe the time between lever presses is painful, and the pressed lever is merely less painful. This could be the experience of, e.g., a person after a breakup consuming ice cream scoops at a higher rate than normal to escape her pain: even with the increased rate of ice cream intake, she may still have negative welfare, just less negative. It seems like vigor just says that what you're doing is better than not doing it?

For really simple RL agents like those living in Grid World, there is no external clock. Time is sort of defined by when the agent takes its next step. So it's again not clear if a "rate of actions" explanation can help here (but if it helps for more realistic RL agents, that's cool!).

This answer says that for a Markov Decision Process, "each action taken is done in a time step." So it seems like a time step is defined as the interval between one action and the next?
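
To make sure we're picturing the same thing, here's the minimal agent-environment loop I have in mind (the environment is a made-up toy):

    import random

    def env_step(state, action):
        """Toy environment: reward 1 for pressing the lever, otherwise 0."""
        reward = 1.0 if action == "press" else 0.0
        return state, reward            # the state never changes in this toy

    state = "room"
    for t in range(5):                  # t is the only clock there is
        action = random.choice(["press", "wait"])
        state, reward = env_step(state, action)
        print(t, action, reward)

    # "Time" advances exactly when the agent acts, and a reward (possibly 0)
    # arrives on every step.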

Comment author: RandomEA 05 August 2017 08:34:38PM 1 point [-]

One possible benefit of blood, kidney, and bone marrow donations is that they could demonstrate that EAs actually do care about other people in their country (which could help with movement building), but such donations can only be associated with EA if they are in fact effective on the margin (which does not seem to be the case with blood donations).

Comment author: Brian_Tomasik 06 August 2017 09:56:53AM 0 points [-]

You could put blood donation into the "relaxation" or "fun social activity" category.

Comment author: MikeJohnson 01 August 2017 09:07:05PM *  0 points [-]

That's no reason to believe that analytic functionalism is wrong, only that it is not sufficient by itself to answer very many interesting questions.

I think that's being generous to analytic functionalism. As I suggested in Objection 2,

In short, FRI’s theory of consciousness isn’t actually a theory of consciousness at all, since it doesn’t do the thing we need a theory of consciousness to do: adjudicate disagreements in a principled way. Instead, it gives up any claim on the sorts of objective facts which could in principle adjudicate disagreements.

.

I only claim that most physical states/processes have only a very limited collection of computational states/processes that it can reasonably be interpreted as[.]

I'd like to hear more about this claim; I don't think it's ridiculous on its face (per Brian's and Michael_PJ's comments), but it seems a lot of people have banged their heads against this without progress, and my prior is that formalizing it is a lot harder than it looks (it may be unformalizable). If you could formalize it, that would have a lot of value for a lot of fields.

So although I used that critique of IIT as an example, I was mainly going off of intuitions I had prior to it. I can see why this kind of very general criticism from someone who hasn't read the details could be frustrating, but I don't expect I'll look into it enough to say anything much more specific.

I don't expect you to either. If you're open to a suggestion about how to approach this in the future, though, I'd offer that if you don't feel like reading something but still want to criticize it, instead of venting your intuitions (which could be valuable, but don't seem calibrated to the actual approach I'm taking), you should press for concrete predictions.

The following phrases seem highly anti-scientific to me:

sounds wildly implausible | These sorts of theories never end up getting empirical support, although their proponents often claim to have empirical support | I won't be at all surprised if you claim to have found substantial empirical support for your theory, and I still won't take your theory at all seriously if you do, because any evidence you cite will inevitably be highly dubious | The heuristic that claims that a qualia-related concept is some simple other thing are wrong, and that claims of empirical support for such claims never hold up | I am almost certain that there are trivial counterexamples to the Symmetry Theory of Valence

I.e., these statements seem to lack epistemological rigor, and seem to absolutely prevent you from updating in response to any evidence I might offer, even in principle (i.e., they're actively hostile to your improving your beliefs, regardless of whether I am or am not correct).

I don't think your intention is to be closed-minded on this topic, and I'm not saying I'm certain STV is correct. Instead, I'm saying you seem to be overreacting to some stereotype you initially pattern-matched me as, and I'd suggest talking about predictions is probably a much healthier way to move forward if you want to spend more time on this. (Thanks!)

Comment author: Brian_Tomasik 02 August 2017 09:11:56AM 1 point [-]

I only claim that most physical states/processes have only a very limited collection of computational states/processes that it can reasonably be interpreted as[.]

I haven't read most of this paper, but it seems to argue that.

Comment author: Brian_Tomasik 02 August 2017 08:34:46AM *  8 points [-]

I'd be interested in literature on this topic as well, because it seems to bedevil all far-future-aware EA work.

Some articles:

Comment author: AlexMennen 31 July 2017 01:29:48PM 1 point [-]

That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.

If you expanded on this, I would be interested.

Comment author: Brian_Tomasik 01 August 2017 09:31:46AM 0 points [-]

I didn't have in mind anything profound. :) The idea is just that "degree of information integration" is one interesting metric along which to compare minds, along with metrics like "number of neurons", "number of synapses", "number of ATP molecules consumed per second", "number of different brain structures", "number of different high-level behaviors exhibited", and a thousand other similar things.

Comment author: AlexMennen 30 July 2017 10:17:36PM *  6 points [-]

Speaking of the metaphysical correctness of claims about qualia sounds confused, and I think precise definitions of qualia-related terms should be judged by how useful they are for generalizing our preferences about central cases. I expect that any precise definition for qualia-related terms that anyone puts forward before making quite a lot of philosophical progress is going to be very wrong when judged by usefulness for describing preferences, and that the vagueness of the analytic functionalism used by FRI is necessary to avoid going far astray.

Regarding the objection that shaking a bag of popcorn can be interpreted as carrying out an arbitrary computation, I'm not convinced that this is actually true, and I suspect it isn't. It seems to me that the interpretation would have to be doing essentially all of the computation itself, and it should be possible to make precise the sense in which brains and computers simulating brains carry out a certain computation that waterfalls and bags of popcorn don't. The defense of this objection that you quote from McCabe is weak; the uncontroversial fact that many slightly different physical systems can carry out the same computation does not establish that an arbitrary physical system can be reasonably interpreted as carrying out an arbitrary computation.

I think the edge cases that you quote Scott Aaronson bringing up are good ones to think about, and I do have a large amount of moral uncertainty about them. But I don't see these as problems specific to analytic functionalism. These are hard problems, and the fact that some more precise theory about qualia may be able to easily answer them is not a point in favor of that theory, since wrong answers are not helpful.

The Symmetry Theory of Valence sounds wildly implausible. There are tons of claims that people put forward, often contradicting other such claims, that some qualia-related concept is actually some other simple thing. For instance, I've heard claims that goodness is complexity and that what humans value is increasing complexity. Complexity and symmetry aren't quite opposites, but they're certainly anti-correlated, and both theories can't be right. These sorts of theories never end up getting empirical support, although their proponents often claim to have empirical support. For example, proponents of Integrated Information Theory often cite the fact that the cerebrum has a higher Phi value than the cerebellum does as support for the hypothesis that Phi is a good measure of the amount of consciousness a system has, as if comparing two data points were enough to support such a claim. It turns out that large regular rectangular grids of transistors, and the operation of multiplication by a large Vandermonde matrix, both have arbitrarily high Phi values, and yet the claim that Phi measures consciousness still survives and claims empirical support, despite this damning disconfirmation. And I think the "goodness is complexity" people also provided examples of good things that they thought they had established are complex and bad things that they thought they had established are not. I know this sounds totally unfair, but I won't be at all surprised if you claim to have found substantial empirical support for your theory, and I still won't take your theory at all seriously if you do, because any evidence you cite will inevitably be highly dubious. The heuristic that claims that a qualia-related concept is some simple other thing are wrong, and that claims of empirical support for such claims never hold up, seems to be pretty well supported. I am almost certain that there are trivial counterexamples to the Symmetry Theory of Valence, even though perhaps you may have developed a theory sophisticated enough to avoid the really obvious failure modes, like claiming that a square experiences more pleasure and less suffering than a rectangle because its symmetry group is twice as large.

Comment author: Brian_Tomasik 31 July 2017 05:13:53AM 0 points [-]

To steelman the popcorn objection, one could say that separating "normal" computations from popcorn shaking requires at least certain sorts of conditions on what counts as a valid interpretation, and such conditions increase the arbitrariness of the theory. Of course, if we adopt a complexity-of-value approach to moral value (as I and probably you think we should), then those conditions on what counts as a computation may be minimal compared with the other forms of arbitrariness we bring to bear.

I haven't read Principia Qualia and so can't comment competently, but I agree that symmetry seems like not the kind of thing I'm looking for when assessing the moral importance of a physical system, or at least it's not more than one small part of what I'm looking for. Most of what I care about is at the level of ordinary cognitive science, such as mental representations, behaviors, learning, preferences, introspective abilities, etc.

That said, I do think theories like IIT are at least slightly useful insofar as they expand our vocabulary and provide additional metrics that we might care a little bit about.
