Integrity for consequentialists

(Cross-posted from the sideways view.)

For most people I don't think it's important to have a really precise definition of integrity. But if you really want to go all-in on consequentialism then I think it's useful. Otherwise you risk being stuck with a flavor of consequentialism that is either short-sighted or terminally timid.

I.

I aspire to make decisions in a pretty simple way. I think about the consequences of each possible action and decide how much I like them; then I select the action whose consequences I like best.

To make decisions with integrity, I make one change: when I imagine picking an action, I pretend that picking it causes everyone to know that I am the kind of person who picks that option.

If I'm considering breaking a promise to you, and I am tallying up the costs and benefits, I consider the additional cost of you having known that I would break the promise under these conditions. If I made a promise to you, it's usually because I wanted you to believe that I would keep it. So you knowing that I wouldn't keep the promise is usually a cost, often a very large one.
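Here is a minimal Python sketch of the two decision rules, applied to the promise example. The payoffs are invented for illustration. The names `choose`, `direct_value`, and `cost_if_known` are hypothetical and do not come from any library. The sketch shows only the structure: the integrity rule adds one extra term per action, the cost of everyone knowing that I am the kind of person who picks it.

```python
from typing import Callable, Optional, Sequence


def choose(
    actions: Sequence[str],
    direct_value: Callable[[str], float],
    cost_if_known: Optional[Callable[[str], float]] = None,
) -> str:
    """Return the action whose consequences I like best.

    Plain consequentialism scores each action by its direct consequences only.
    Passing `cost_if_known` adds the integrity adjustment: each action is also
    charged the cost of everyone knowing I am the kind of person who picks it.
    """

    def score(action: str) -> float:
        penalty = cost_if_known(action) if cost_if_known is not None else 0.0
        return direct_value(action) - penalty

    return max(actions, key=score)


# Invented numbers: breaking the promise gains a little now, but if you had
# known I would break it, you would not have relied on the promise at all.
direct = {"keep promise": 0.0, "break promise": 5.0}
known_cost = {"keep promise": 0.0, "break promise": 20.0}

print(choose(list(direct), direct.__getitem__))                          # break promise
print(choose(list(direct), direct.__getitem__, known_cost.__getitem__))  # keep promise
```

With these made-up numbers, the plain rule breaks the promise and the integrity rule keeps it. In practice the hard part is estimating `cost_if_known`, which this sketch simply takes as given.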

(Image not reproduced. Caption: "Optimal summary of this post.")

If I'm considering sharing a secret you told me, and I am tallying up the costs and benefits, I consider the additional cost of you having known that I would share this secret. In many cases, that would mean that you wouldn't have shared it with me---a cost which is usually larger than whatever benefit I might gain from sharing it now.

If I'm considering having a friend's back, or deciding whether to be mean, or thinking about what exactly counts as "betrayal," I'm doing the same calculus. (In practice there are many cases where I am pathologically unable to be really mean. One motivation for being really precise about integrity is recovering the ability to engage in normal levels of being a jerk when it's actually a good idea.)

This is a weird kind of effect, since it goes backwards in time and it may contradict what I've actually seen. If I know that you decided to share the secret with me, what does it mean to imagine my decision causing you not to have shared it?

It just means that I imagine the counterfactual where you didn't share the secret, and I think about just how bad that would have been---making the decision as if I did not yet know whether you would share it or not.
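One way to make this counterfactual concrete is to score policies rather than single choices, judging them from the point before you decided whether to confide in me. This is a toy model with two assumptions. First, the payoffs are hypothetical. Second, you are assumed to predict my policy perfectly and to confide only if I would keep the secret.

```python
# Hypothetical payoffs, chosen only to show the structure of the argument.
VALUE_OF_BEING_CONFIDED_IN = 10.0  # what I get from you sharing the secret at all
GAIN_FROM_LEAKING = 3.0            # extra benefit if I share the secret later


def you_share(my_policy: str) -> bool:
    """Assumption: you predict my policy and confide only if I would keep the secret."""
    return my_policy == "keep"


def value_after_you_shared(my_policy: str) -> float:
    """Naive evaluation: treat the fact that you already shared as fixed."""
    bonus = GAIN_FROM_LEAKING if my_policy == "leak" else 0.0
    return VALUE_OF_BEING_CONFIDED_IN + bonus


def value_before_you_decided(my_policy: str) -> float:
    """Integrity evaluation: my policy also decides whether you share at all."""
    if not you_share(my_policy):
        return 0.0
    return value_after_you_shared(my_policy)


policies = ["keep", "leak"]
print(max(policies, key=value_after_you_shared))    # leak
print(max(policies, key=value_before_you_decided))  # keep
```

The naive evaluation favors leaking because it conditions on what I have already seen. The policy-level evaluation favors keeping, because in the counterfactual where I am a leaker, I never hear the secret. Real people predict imperfectly, which this sketch ignores.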

I find the ideal of integrity very viscerally compelling, significantly more so than other abstract beliefs or principles that I often act on.

II.

This can get pretty confusing, and at the end of the day this simple statement is just an approximation. I could run through a lot of confusing examples and maybe sometime I should, but this post isn't the place for that.

I'm not going to use some complicated reasoning to explain why screwing you over is consistent with integrity; I am just going to be straightforward. I think "being straightforward" is basically what you get if you do the complicated reasoning right. You can believe that or not, but one consequence of integrity is that I'm not going to try to mislead you about it. Another consequence is that when I'm dealing with you, I'm going to interpret integrity the way I want you to think I interpret it.

Integrity doesn't mean merely keeping my word. To the extent that I want to interact with you, I will be the kind of person you will predictably be glad to have interacted with. To that end, I am happy to do nice things that have no direct good consequences for me. I am hesitant to be vengeful; but if I think you've wronged me because you thought it would have no bad consequences for you, I am willing to do malicious things that have no direct good consequences for me.

On the flip side, integrity does not mean that I always keep my word. If you ask me a question that I don't want to answer, and me saying "I don't think I should answer that" would itself reveal information that I don't want to reveal, then I will probably lie. If I say I will do something, then I will try to do it, but that commitment just gets tallied up like any other cost or benefit; it's not a hard-and-fast rule. None of these cases are going to feel like gotchas; they are easy to predict given my definition of integrity, and I think they are in line with common-sense intuitions about being basically good.

Some examples where things get more complicated: if we were trying to think of the same number between 1 and 20, I wouldn't assume that we are going to win because by choosing 17 I cause you to know that I'm the kind of person who picks 17. And if you invade Latvia I'm not going to bomb Moscow, assuming that by being arbitrarily vindictive I guarantee your non-aggression. If you want to figure out what I'd do in these cases, think UDT (updateless decision theory) + the arguments in the rest of this post + a reasonable account of logical uncertainty. Or just ask. Sometimes the answer in fact depends on open philosophical questions. But while I find that integrity comes up surprisingly often, really hard decision-theoretic cases come up about as rarely as you'd expect.

A convenient thing about this form of integrity is that it basically means behaving in the way that I'd want to claim to behave in this blog post. If you ask me "doesn't this imply that you would do X, which you only refrained from writing down because it would reflect poorly on you?" then you've answered your own question.

III.

Why would I do this? At face value it may look a bit weird. People's expectations about me aren't shaped by a magical retrocausal influence from my future decision. Instead they are shaped by a messy basket of factors:

  • Their past experiences with me.
  • Their past experiences with other similar people.
  • My reputation.
  • Abstract reasoning about what I might do.
  • Attempts to "read" my character and intentions from body language, things I say, and other intuitive cues.
  • (And so on.)

In some sense, the total "influence" of these factors must add up to 100%.

I think that basically all of these factors give reasons to behave with integrity:

  • My decision is going to have a causal influence on what you think of me.
  • My decision is going to have a causal influence on what you think of other similar people. I want to be nice to those people. But also my decision is correlated with their decisions (more so the more they are like me) and I want them to be nice to me.
  • My decision is going to have a direct effect on my reputation.
  • My decision has logical consequences on your reasoning about my decision. After all, I am running a certain kind of algorithm and you have some ability to imperfectly simulate that algorithm.
  • To the extent that your attempts to infer my character or intention are unbiased, being the kind of person who will decide in a particular way will actually cause you to believe I am that kind of person.
  • (And so on.)

The strength of each of those considerations depends on how significant each factor was in determining their views about me, and that will vary wildly from person to person and case to case. But if the total influence of all of these factors is really 100%, then just replacing them all with a magical retrocausal influence is going to result in basically the same decision.
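The "100%" argument can be illustrated with a toy linear model. The model is my assumption; the argument does not depend on it. In the model, your estimate of my disposition is a weighted average of the factors listed above, with weights summing to 1. Factors that respond to my actual disposition (through causation, correlation, or your simulation of my algorithm) pass that disposition through, and the rest report a fixed prior. The factor names and weights are hypothetical.

```python
# Hypothetical weights for the factors listed above; they sum to exactly 1.
WEIGHTS = {
    "past experiences with me": 0.25,
    "experiences with similar people": 0.125,
    "my reputation": 0.25,
    "abstract reasoning about me": 0.125,
    "reading my character": 0.25,
}


def your_estimate(my_disposition: float, responsive: set, prior: float = 0.5) -> float:
    """Your estimate of how likely I am to behave well, as a weighted mix of factors.

    A factor in `responsive` reflects my actual disposition; any other factor
    just reports a fixed prior that my decision cannot move.
    """
    return sum(
        weight * (my_disposition if name in responsive else prior)
        for name, weight in WEIGHTS.items()
    )


def influence(responsive: set) -> float:
    """How much your estimate moves when my disposition goes from 0 to 1."""
    return your_estimate(1.0, responsive) - your_estimate(0.0, responsive)


everything = set(WEIGHTS)
print(influence(everything))                      # 1.0, same as the retrocausal model
print(influence(everything - {"my reputation"}))  # 0.75
```

When every factor responds, your estimate moves one-for-one with my disposition, just as it would under the magical retrocausal model. Removing one channel (here, reputation) shrinks the effect by that channel's weight.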

Some of these considerations are only relevant because I make decisions using UDT rather than causal decision theory. I think this is the right way to make decisions (or at least the way that you should decide to make decisions), but your views may vary. At any rate, it's the way that I make decisions, which is all that I'm describing here.

IV.

What about a really extreme case, where definitely no one will ever learn what I did, and where they don't know anything about me, and where they've never interacted with me or anyone similar to me before? In that case, should I go back to being a consequentialist jerk?

There is a temptation to reject this kind of crazy thought experiment, since there are never literally zero causal effects. But like most thought experiments, it is intended to explore an extreme point in the space of possibilities:

(Figure not reproduced: two extreme points.)

Of course we don't usually encounter these extreme cases; most of our decisions sit somewhere in between. The extreme cases are mostly interesting to the extent that realistic situations are in between them and we can usefully interpolate.

For example, you might think that the picture looks something like this:

(Figure not reproduced: a straight line interpolating between the two extreme cases.)

On this view, if I would be a jerk when definitely for sure no one will know, then presumably I am at least a little bit of a jerk when it sure seems like no one will know.

But actually I don't think the graph looks like this.

Suppose that Alice and Bob interact, and Alice has either a 50% or a 5% chance of detecting Bob's jerk-like behavior. In either case, if she detects bad behavior she is going to make an update about Bob's characteristics. But there are several reasons to expect that, in the 5% case, her update will be about 10x larger if detection actually happens:

  • If Alice is attempting to impose incentives to elicit pro-social behavior from Bob, then the size of the disincentive needs to be 10x larger. This effect is tempered somewhat if imposing twice as large a cost is more than twice as costly for Alice, but still we expect a significant compensating factor.
  • For whatever reference class Alice is averaging over (her experiences with Bob, her experiences with people like Bob, other people's experiences with Bob...) Alice has 1/10th as much data about cases with a 5% chance of discovery, and so (once the total number of data points in the class is reasonably large) each data point has nearly 10x as much influence.
  • In general, I think that people are especially suspicious of people cheating when they probably won't get caught (and consider it more serious evidence about "real" character), in a way that helps compensate for whatever gaps exist in the last two points.
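Here is a rough numerical sketch of the first two bullets. The modeling choices are my own assumptions, not the post's. Alice picks penalties that hold Bob's expected cost fixed. She summarizes each kind of situation with a Beta-binomial posterior mean, starting from a uniform Beta(1, 1) prior. `required_penalty` and `weight_per_observation` are hypothetical helper names.

```python
def required_penalty(detection_probability: float, target_expected_cost: float) -> float:
    """Penalty Alice must impose on detection so that Bob's expected cost,
    detection_probability * penalty, equals the target."""
    return target_expected_cost / detection_probability


def weight_per_observation(n_observations: int, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
    """Weight of each observation in a Beta(a, b)-binomial posterior mean.

    The posterior mean (successes + a) / (n + a + b) is a weighted average of
    the prior mean and the n observations, each observation getting weight
    1 / (n + a + b).
    """
    return 1.0 / (n_observations + prior_a + prior_b)


# Bullet 1: the penalty scales like 1/p, so 5% detection needs 10x the
# penalty that 50% detection needs for the same expected cost.
penalty_ratio = required_penalty(0.05, 1.0) / required_penalty(0.5, 1.0)
print(round(penalty_ratio, 6))  # 10.0

# Bullet 2: with equally many opportunities in each kind of situation, Alice
# ends up with 10x fewer data points about the 5% situations.
for opportunities in (1_000, 100_000):
    at_50 = opportunities // 2   # data points from 50%-detection situations
    at_5 = opportunities // 20   # data points from 5%-detection situations
    ratio = weight_per_observation(at_5) / weight_per_observation(at_50)
    print(opportunities, round(ratio, 2))  # ratio approaches 10 as data grows
```

Because of the prior pseudo-counts, each data point carries nearly 10x the weight rather than exactly 10x. The ratio is about 9.65 with 1,000 opportunities and gets closer to 10 as the data grows, which matches the caveat that the class must be reasonably large.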

In reality, I think the graph is closer to this:

(Figure not reproduced: a curve rather than a straight line.)

Our original thought experiment is an extremely special case, and the behavior changes rapidly as soon as we move a little bit away from it.

At any rate, these considerations significantly limit the applicability of intuitions from pathological scenarios, and tend to push optimal behavior closer to behaving with integrity.

This effect is especially pronounced when there are many possible channels through which my behavior can affect others' judgments, since then a crazy extreme case must be extreme with respect to every one of these indicators: my behavior must be unobservable, the relevant people must have no ability to infer my behavior from tells in advance, they must know nothing about the algorithm I am running, and so on.

V.

Integrity has one more large advantage: it is often very efficient. Being able to make commitments is useful, as a precondition for most kinds of positive-sum trade. Being able to realize positive-sum trades, without needing to make explicit commitments, is even more useful. (On the revenge side things are a bit more complicated, and I'm only really keen to be vengeful when the behavior was socially inefficient in addition to being bad for my values.)

I'm generally keen to find efficient ways to do good for those around me. For one, I care about the people around me. For two, I feel pretty optimistic that if I create value, some of it will flow back to me. For three, I want to be the kind of person who is good to be around.

So if the optimal level of integrity from a social perspective is 100%, but the optimal level from my personal perspective would be something close to 100%, I am more than happy to just go with 100%. I think this is probably one of the most cost-effective ways I can sacrifice a (tiny) bit of value in order to help those around me.

On top of that:

  • Integrity is most effective when it is straightforward rather than conditional.
  • "Behave with integrity" is a whole lot simpler (computationally and psychologically) than executing a complicated calculation to decide exactly when you can skimp.
  • Humans have a bunch of emotional responses that seem designed to implement integrity---e.g. vengefulness or a desire to behave honorably---and I find that behaving with integrity also ticks those boxes.

After putting all of this together, I feel like the calculus is pretty straightforward. So I usually don't think about it, and just (aspire to) make decisions with integrity.

VI.

Many consequentialists claim to adopt firm rules like "my word is inviolable" and then justify those rules on consequentialist grounds. But I think on the one hand that approach is too demanding---the people I know who take promises most seriously basically never make them---and on the other it does not go far enough---someone bound by the literal content of their word is only a marginally more useful ally than someone with no scruples at all.

Personally, I get a lot of benefit from having clear definitions; I feel like the operationalization of integrity in this post has worked pretty well, and much better than the deontological constraints it replaced.  That said, I'm always interested in adopting something better, and would love to hear pushback or arguments for alternative norms.

Comments

Comment author: purplepeople 17 December 2016 06:59:32PM 2 points

This is an excellent post. I've been struggling myself to understand to what extent deontological values and the inherent irrationality of humans need to be factored into consequentialist decision making. I've become more and more convinced that values and social norms matter much more than I had previously thought.

Comment author: Maxdalton 15 November 2016 12:57:25PM -2 points

This seems to be an interesting approach to this question. However, for a top level post in this forum, I would like to see more of an attempt to link this directly to effective altruism, which, as many have noted, is not simply consequentialism. There is no mention of 'effective altruism', 'charity', 'career', 'poverty', 'animal' or 'existential risk' (of course effective altruism is broader than these things, but I think this is indicative).

(Writing in a personal capacity)

Comment author: casebash 15 November 2016 04:23:48PM 5 points

Effective altruism is strongly linked with consequentialism, so much so that I don't think a more explicit link is required.

Comment author: Ben_Todd 04 December 2016 01:07:43AM 1 point

I found Paul's post useful, but I think it would have been good to point out that EA is not a type of consequentialism, since that's a misconception I think we should try to stamp out.

Comment author: TruePath 02 December 2016 04:19:21PM 0 points

I think this post is confused on a number of levels.

First, as far as ideal behavior is concerned, integrity isn't a relevant concept. The ideal utilitarian agent will simply always behave in the manner that optimizes expected future utility factoring in the effect that breaking one's word or other actions will have on the perceptions (and thus future actions) of other people.

Now the post rightly notes that as limited human agents we aren't truly able to engage in this kind of analysis. Because of both our computational limitations and our inability to deceive perfectly, it is beneficial to adopt heuristics about not lying, not stabbing people in the back, etc. (which we may judge to be worth abandoning in exceptional situations).

However, the post gives us no reason to believe its particular interpretation of integrity, "being straightforward," is the best such heuristic. It merely asserts the author's belief that this somehow works out to be the best.

This brings us to the second major point. Even though the post acknowledges that the very reason for considering integrity is that, "I find the ideal of integrity very viscerally compelling, significantly more so than other abstract beliefs or principles that I often act on," the post proceeds to act as if it were considering what kind of integrity-like notion would be appropriate to design into (or socially construct in) some alternative society of purely rational agents.

Obviously, the way we should act depends hugely on the way in which others will interpret our actions and respond to them. In the actual world, WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST. It doesn't matter whether some alternative notion of integrity might have been better to have: if we don't show integrity in the traditional manner, we will be punished.

In particular, "being straightforward" will often needlessly imperil people's estimation of our integrity. For example, consider the usual kinds of assurances we give to friends and family that we "will be there for them no matter what" and that "we wouldn't ever abandon them." In truth, pretty much everyone, if presented with sufficient data showing their friend or family member to be a horrific serial killer with every intention of continuing to torture and kill people, would turn them in even in the face of protestations of innocence. Does that mean that instead of saying "I'll be there for you whatever happens" we should say "I'll be there for you as long as the balance of probability doesn't suggest that supporting you will cost more than 5 QALYs" (quality adjusted life years)?

No, because being straightforward in that sense causes most people to judge us as weird and abnormal and thereby trust us less. Even though everyone understands at some level that these kinds of assurances are only true ceteris paribus, actually being straightforward about that fact is unusual enough that it causes other people to suspect that they don't understand our emotions/motivations and thus to give us less trust.


In short: yes, the obvious point that we should adopt some kind of heuristic of keeping our word and otherwise modeling integrity is true. However, the suggestion that this nice simple heuristic is somehow the best one is completely unjustified.

Comment author: Paul_Christiano 03 December 2016 03:08:40AM 4 points

I apologize in advance if I'm a bit snarky.

"The ideal utilitarian agent will simply always behave in the manner that optimizes expected future utility factoring in the effect that breaking one's word or other actions will have on the perceptions (and thus future actions) of other people."

This view is not broadly accepted amongst the EA community. At the very least, this view is self-defeating in the following sense: such an "ideal utilitarian" should not try to convince other people to be an ideal utilitarian, and should attempt to become a non-ideal utilitarian ASAP (see e.g. Parfit's hitchhiker for the standard counterexample, though obviously there are more realistic cases).

"However, the post gives us no reason to believe its particular interpretation of integrity, 'being straightforward,' is the best such heuristic. It merely asserts the author's belief that this somehow works out to be the best."

I argued for my conclusion. You may not buy the arguments, and indeed they aren't totally tight, but calling it "mere assertion" seems silly.

"the very reason for considering integrity is that, 'I find the ideal of integrity very viscerally compelling, significantly more so than other abstract beliefs or principles that I often act on.'"

This is neither true, nor what I said.

"WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST"

This is what it looks like when something is asserted without argument.

I do agree roughly with this sentiment, but only if it is interpreted sufficiently broadly that it is consistent with my post.

"Does that mean that instead of saying 'I'll be there for you whatever happens' we should say 'I'll be there for you as long as the balance of probability doesn't suggest that supporting you will cost more than 5 QALYs' (quality adjusted life years)?"

I tried to spell out pretty explicitly what I recommend in the post, right at the beginning ("when I imagine picking an action, I pretend that picking it causes everyone to know that I am the kind of person who picks that option"), and it clearly doesn't recommend anything like this.

You seem to use "being straightforward" in a different way than I do. Saying "I'll be there for you whatever happens" is straightforward if you actually mean the thing that people will understand you as meaning.

Comment author: Robert_Wiblin 20 January 2017 08:54:42PM 0 points

"WE WILL BE TRUSTED TO THE EXTENT WE RESPECT THE STANDARD SOCIETAL NOTIONS OF INTEGRITY AND TRUST"

I think there is a lot to this, but I feel it can be subsumed into Paul's rule of thumb:

  • You should follow a standard societal notion of what is decent behaviour (unless you say ahead of time that you won't in this case) if you want people to have always thought that you are the kind of person who does that.

Because following standard social rules that everyone assumes to exist is an important part of being able to coordinate with others without very high communication and agreement overheads, you want to at least meet that standard (including following some norms you might have reservations about). Of course this doesn't preclude you meeting a higher standard if having a reputation for going above and beyond would be useful to you (as Paul argues it often is for most of us).

Comment author: Pilif 25 November 2016 05:56:28PM 0 points

Can you show an example of how this set of rules helps you to "recover the ability to engage in normal levels of being a jerk when it's actually a good idea"?

Comment author: Paul_Christiano 03 December 2016 04:43:17AM 1 point

Suppose I am considering saying something mean about someone in a context where they won't hear me, and I would be unwilling to say the same thing to their face. I have a hard time with this in general. But there are cases where it is OK according to this heuristic (when they'd be fine knowing that I would say that kind of thing about them under those conditions), and I think those are the cases that I endorse-on-reflection.