
Is it better to be a wild rat or a factory farmed cow? A systematic method for comparing animal welfare.

TLDR: We looked at a lot of different systems to compare welfare and ended up combining a few common ones into a weighted animal welfare index (or welfare points for short). We think this system captures a broad range of ethical considerations and should be applicable across a wide range of both farm and wild animals in a way that allows us to compare interventions.  

The goal of Charity Entrepreneurship is to compare different charitable interventions and actions so that new, strong charities can be founded. One necessary step in that process is having a way to compare different animals in different conditions. For example, how does moving a chicken from a battery cage to a cage-free system compare, welfare-wise, for the chicken? Or how does giving up red meat, thus resulting in one fewer cow being brought into existence, compare to an insect dying more humanely because of a change in which insecticide is used? These are complex questions surrounded by both ethical and epistemic uncertainty. In the health community, DALYs have become a fairly common and established metric. Sadly, there is not the same level of consensus within the animal rights community. We expected there would be multiple competing systems, so we first outlined what we would look for in a system to assess how helpful it would be to us. This could be described as the “goal” or purpose of the metric. Of course, the fundamental goal is to help us evaluate different possible actions, but more specifically, we broke down what we were looking for into the criteria below.

Underlying goals of metrics

  • Proxies’ ethical value accuracy
    • Strength of correlation between the metric and ethical value
    • Encapsulation - captures a broad range of what is important
    • Directness
    • Gamability
  • Cross-applicability
    • Cross-intervention applicability 
    • Cross-animal applicability 
    • Ethical robustness
    • Externally understandable
    • External precedent of use
  • Operationalizability
    • Amenable to numerical quantification 
    • Ease/speed of use
    • Objectiveness 
    • Generates few false positives or false negatives
    • Intuitive to work with
    • Easy to collect
    • Easy to explain

After establishing what we were looking for, the next step was to look at all current systems and see whether any of them was suitable for, or could be partly used by, an organization like ours. We ended up finding quite a wide range.

EA community

We first looked within the EA community, since there had been some solid attempts at quantification there. The ones below are just a few of many examples.

Within the EA community

These metrics were generally very hard, quantified, and often even explicitly cost-effectiveness focused. Sadly, they were also extremely specific and not built for generalization across different interventions and charities. Thus, for our purposes, they were more helpful as inspiration for the factors to consider, or standards that we would want to be able to measure, rather than for practical cross-intervention use.

Biology-based markers

The next set of metrics we looked at was biology-based markers. We had some background knowledge about cortisol readings as a measure of stress and hoped that we would find other objective markers that could make up part of a more inclusive system and add some objectivity to otherwise soft systems. Some of the ones we considered are listed below, although there are many other possible biological indicators.

Biology-based markers

  • Cortisol
  • Dopamine
  • Endocrine changes
  • Circulating catecholamines and corticosteroids
  • Death rate
  • Behavior changes
  • Visible injury rate 
  • Reduced life expectancy
  • Impaired growth
  • Impaired reproduction
  • Body damage
  • Disease 
  • Immunosuppression
  • Adrenal activity
  • Behavior anomalies
  • Self-narcotization

Biological markers were useful in that they were much less subjective than other metrics, but sadly it was also very hard to find consistent data across animals on many of them (with the death rate being a notable exception). We ended up thinking these would make up a part of a larger system, but even an index of them would not be inclusive enough to cover all the welfare situations that could occur.

Academic measures of quality

The third type of system we considered was “academic measures of quality of life”. WAS research had a great summary of many of the different systems used, but we also looked outside of their research for other possible systems.

Academic measures 

  • Five freedoms
  • The Five Domains model
  • Five Provisions model
  • Botreau’s twelve criteria
  • McMillan’s five elements, which play a fundamental role in quality of life
  • Fraser’s four core values of animal welfare
  • Webster’s three questions of animal welfare
  • Taylor and Mills’s domains for assessing companion animals’ quality of life
  • Swaisgood’s ten motivational theories which have currency among animal-welfare researchers

Many of these systems were beautifully comprehensive and described metrics and criteria in such a way that they would be cross-applicable to a wide range of animals across a wide range of conditions. Some even specified different grade levels (although these were generally not numeric) to provide more consistency across reports. It seemed possible that some researchers would have already used these systems, though sadly, we did not find much research showcasing their modern practical use. The main drawback of these systems was their subjectivity. Even in the ones with specific grade levels, a lot would be left up to the evaluator in making calls between one situation and another: for example, how does not being fed for several days, while being otherwise perfectly fed, compare to semi-chronic but low-level hunger? Overall, we took a large number of elements of our system from the Five Domains model, which felt like the most extensively quantified and broadest of these models.

Systems used in global poverty

Next, we considered the current systems used in global poverty alleviation and other cause assessment areas. We thought it might be possible to modify one of these metrics to be usefully applicable to animals. 

Modified poverty based metrics

  • Animal QALYs
  • Animal DALYs
  • Animal Income
  • Animal subjective well-being estimates
  • Equivalent lives saved
  • Preference from behind the veil of ignorance 

Generally, these metrics were either inapplicable (e.g. income) or would have required considerably more time to modify and put into the animal welfare context (e.g. DALYs do not have a way to represent a net negative existence, which is a key consideration in the case of factory farmed animals).

Creating our own system

Finally, we considered creating a cross-applicable system from scratch.

Our own ideas for possible systems

  • SAD - suffering-adjusted life-day
  • Sentience-adjusted suffering years
  • Net negative lives averted 
  • Total world net expected value 
  • Numerical criteria for animals’ quality of life, e.g. a -100 to 100 rating

We did end up using some of the ideas drawn from considering this option but, overall, found that taking elements from other systems would both increase quality and reduce the time that we would otherwise spend on creating a new system from scratch. 

Results: an inclusive index 

We ended up putting many of these systems onto a spreadsheet and comparing them on the original metric criteria we had derived. Some criteria ended up getting narrowed down. For example, we combined various biological markers into a single “biological markers” category. Some criteria were made more numerical and cross-comparable, for example, by translating the Five Domains model into number-based scores instead of grades. Other elements, such as death rate, were given their own category and weighting based on how well they met the top-line criteria. Most criteria were ruled out as redundant or not helpful for our purposes.

We ended up with 8 criteria, each with an importance weighting. Combined, the weightings give a range from +100 (an ideal life) to -100 (a perfectly unideal life), with 0 representing uncertainty about whether the life is net positive or negative. Each area can have a positive or negative welfare score and is rated independently, which gives a more robust cluster approach to the overall endline score. The weighting of each factor differs depending on how well it scored on our original metric criteria. For example, death rate gets a relatively higher weighting (20 welfare points) than our index of other biological markers (4 welfare points) because it is easier to work with and more clearly related to direct animal suffering (e.g. we are more confident that very high and painful death rates indicate a life not worth living than we are that the more abstract biological markers do).

Factors we ended up using:

  • Death rate/reason - 20
  • Human preference from behind the veil of ignorance - 20
  • Disease/injury/functional impairment - 17
  • Thirst/hunger/malnutrition - 15
  • Anxiety/fear/pain/distress - 15
  • Environmental challenge - 5
  • Index of Biological markers - 4
  • Behavioral/interactive restriction - 4
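
A minimal sketch of how the factors above might be combined into a single welfare-point total. It assumes each factor is scored somewhere between minus its weight and plus its weight and that the total is a plain sum, which is one reading of the description above rather than the spreadsheet's confirmed formula. The dictionary keys, the function name `welfare_points`, and the example scores are hypothetical.

```python
# Maximum welfare points per factor, as listed above (they sum to 100).
# The key names are illustrative shorthand, not taken from the spreadsheet.
WEIGHTS = {
    "death_rate_reason": 20,
    "veil_of_ignorance_preference": 20,
    "disease_injury_impairment": 17,
    "thirst_hunger_malnutrition": 15,
    "anxiety_fear_pain_distress": 15,
    "environmental_challenge": 5,
    "biological_markers_index": 4,
    "behavioral_interactive_restriction": 4,
}


def welfare_points(scores):
    """Sum per-factor scores, each bounded by +/- that factor's weight.

    Assumption: a factor with no score supplied contributes 0,
    i.e. it is treated as uncertain rather than as good or bad.
    """
    total = 0.0
    for factor, score in scores.items():
        weight = WEIGHTS[factor]  # raises KeyError for an unknown factor
        if not -weight <= score <= weight:
            raise ValueError(f"{factor}: {score} is outside +/-{weight}")
        total += score
    return total


# Hypothetical scores for illustration only; not taken from any report.
example = {
    "death_rate_reason": -12,
    "thirst_hunger_malnutrition": 5,
    "biological_markers_index": -1,
}
print(welfare_points(example))
```

Bounding each score by its weight keeps the total inside the -100 to +100 range no matter how the individual factors are rated.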

Our full spreadsheet with factors, scores, and metric criteria scores gives a deeper sense of why different areas were given the weighting they were, as well as a narrative explanation of what a negative, middling, and positive score would look like in each category. 

Overall, we felt this system gave us a good balance between the more subjective metrics, which can capture more data, and the harder metrics, which are more objective. We feel that this system could be used across a wide range of both animals and interventions, and lead to cross-comparable results.

Comments (13)

Comment author: saulius 18 September 2018 08:45:24AM 11 points

Thank you for tackling a very important problem. But currently I feel I’d be lost when trying to apply this model because more explanation is needed for many factors. For example, how does the cortisol level weigh against the dopamine level? And what levels are good? How should the various listed factors be measured and weighted to assess anxiety? Etc.

Some examples of this model being applied would be very helpful for understanding the model. Is that the next step in your research?

Comment author: Joey 18 September 2018 05:22:23PM 6 points

Yes indeed, that is the next step. We plan on applying this system to ~15 animal situations and doing a 1-5 hour report on each. This would be both for different animals (e.g. wild rats and factory farmed cows) and different welfare situations for the same animal (e.g. a report each for battery caged laying hens vs enriched cage laying hens).

On biological markers specifically, from the research we have done so far, it's very hard to find any consistent biological markers, not to mention situations where we have a bunch that we can cross compare on the same animal. Generally a good score might look like “some cortisol tests have been done on rats in an ideal living situation vs wild rats and the cortisol levels are about the same” where if the same study was done but the cortisol levels were much higher in the wild rats, that would be an indication of lower wild rat welfare.

Comment author: Jamie_Harris 28 September 2018 10:41:24PM 2 points

I wanted to echo all of Saulius' points (including the thanks for doing this!).

To clarify your response here: all of the rankings are essentially subjective judgements, based on whatever evidence you have available in that category? So in the example above, if those cortisol tests were somehow your only evidence in the "index of biological markers" category, you would just decide a score that you felt represented the appropriate level of badness for the wild rat "index of biological markers" score?

I'm also wondering if you're going to use the method to compare humans to non-human animals? Some of the biological measures we could use fall down when we think about how humans fit in, e.g. neuron count. Including humans in comparative measures seems valuable for reflecting on/testing intuitions we might otherwise have about cross-species comparisons.

Comment author: Joey 01 October 2018 05:01:50PM 1 point

Re: biological markers, the ideal situation would be to have multiple markers for the animal in an ideal life vs its current life vs a perfectly unideal life; scores would then be given based on how its current life compares. In practice, sometimes we have found data on a happy life vs a standard life for an animal and can get some sense of how far away these are from each other, but often we have found no applicable data at all for this section. Our reports are very time capped (5 hours or less depending on the importance of the animal), so we do not dive deep into the mechanisms.
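
One way the comparison described above could be made numerical is sketched below. It places a current marker reading on a straight line between the unideal-life and ideal-life readings and maps the result onto the 4 points available to the biological markers index. The linear mapping, the clamping, the function name `marker_score`, and the readings used are all assumptions for illustration, not the method used in the reports.

```python
def marker_score(current, ideal, unideal, max_points=4.0):
    """Map one marker reading to [-max_points, +max_points].

    Works whether a higher reading is worse (e.g. a stress hormone,
    where ideal < unideal) or better (where ideal > unideal).
    """
    if ideal == unideal:
        raise ValueError("ideal and unideal readings must differ")
    # position is 1.0 at the ideal reading and 0.0 at the unideal reading
    position = (current - unideal) / (ideal - unideal)
    # Clamp so readings beyond either reference do not exceed the bounds.
    position = min(1.0, max(0.0, position))
    return max_points * (2.0 * position - 1.0)


# Hypothetical, unitless readings where a higher value is worse.
print(marker_score(current=25.0, ideal=10.0, unideal=30.0))
```

With several markers available, the individual scores could be averaged, though as noted above the data needed for even one marker is often missing.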

Humans from different situations will be ranked as well. I agree having them as a comparative measure for cross-species comparison allows for much easier intuition checks.

Comment author: saulius 18 September 2018 08:49:59AM 2 points

Also, I think the link "WAS research had a great summary" does not link to where you intended.

Comment author: Joey 18 September 2018 05:22:09PM 1 point

Thanks. Fixed.

Comment author: Naryan 18 September 2018 03:49:55PM 5 points

Great to see this being looked at. Do you have any examples of this method in use? I'd be interested to see various animals and situations ranked using this method - as it could provide a baseline to quantify the benefits of various interventions.

I also attempted to create my own method of comparing animal suffering while I was calculating the value of going vegetarian. I'll provide a quick summary here, and would love to hear if anyone else has tried something similar.

The approach was to create an internally consistent model based upon my naive intuitions and what data I could find. I spent a while tuning the model so that various trade-offs would make sense and didn't lead to incoherent preferences. It is super rough, but was a first step in my self-examination of ethics.

  1. I created a scale of the value of [human/animal] experience from torture (-1000) to self-actualization (+5) with neutral at 0.
  2. I guessed where various animal experiences fell on the scale, averaged over a lifetime. This is a very weak part of the model - and where Joey's method could really come in handy.
  3. I then multiplied the experience by the lifespan of the animal (as a percentage of human life).
  4. Finally, I added a 'cognitive/subjectivity' multiplier based on the animal's intelligence. This is contentious, but helps so I don't value the long-lived cicada (insect) the same as a human. This follows from other ethical considerations in my model, but some people prefer to remove this step.

The output of this rough model was to value various animal lives as a percentage of human lives - a more salient/comparable measure for me.
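
The four steps above can be sketched roughly as follows. The 80-year human lifespan, the function names, and every input value are hypothetical placeholders; the comment gives only the scale endpoints (-1000 to +5) and the overall structure.

```python
def life_value(experience, lifespan_years, cognitive_multiplier,
               human_lifespan_years=80.0):
    """Steps 1-4: experience x lifespan fraction x cognitive multiplier.

    experience: average lifetime experience on the -1000..+5 scale.
    human_lifespan_years: assumed reference lifespan (not from the comment).
    """
    if not -1000 <= experience <= 5:
        raise ValueError("experience must lie between -1000 and +5")
    lifespan_fraction = lifespan_years / human_lifespan_years
    return experience * lifespan_fraction * cognitive_multiplier


def percent_of_human_life(animal_value, human_value):
    """Express an animal life's value as a percentage of a human life's."""
    if human_value == 0:
        raise ValueError("human reference value must be non-zero")
    return 100.0 * animal_value / human_value


# Hypothetical inputs for illustration only.
human = life_value(experience=2, lifespan_years=80, cognitive_multiplier=1.0)
animal = life_value(experience=-10, lifespan_years=8, cognitive_multiplier=0.5)
print(percent_of_human_life(animal, human))
```

Setting `cognitive_multiplier` to 1.0 for every species corresponds to dropping step 4, as some people prefer.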

This model was built over about 5 hours and is still updating as I have more conversations around animal suffering. Would love to hear if anyone else tried a different strategy!

Comment author: Joey 18 September 2018 05:22:38PM 4 points

Examples coming soon. We are currently aiming to have ~15 done and published by 10/7/18. The full goal of this project is to create a consistent, systematic baseline to quantify the benefits of various interventions, which would then allow us to compare specific charity ideas and rank what might be the best few to found within the animal movement.

http://everydayutilitarian.com/essays/how-much-suffering-is-in-the-standard-american-diet/ is the closest thing to calculating the value of going vegetarian that I know.

Comment author: sirshred 09 October 2018 05:26:32PM 0 points

Please link to the examples here when they are finished, thanks!

Comment author: saulius 18 September 2018 04:46:39PM 3 points

I tried to do something similar when deciding where to donate. The most significant difference was step 4. I used neuron count as a multiplier. For example, according to http://reflectivedisequilibrium.blogspot.com/2013/09/how-is-brain-mass-distributed-among.html, cows on average have 13.6 times more neurons than chickens. So in my model, one minute of cow's life was 13.6 times more important than one minute of chicken's life of comparable quality. I've seen some people comparing the square root of neuron count instead. http://ethical.diet/ makes it easy to make these kinds of comparisons for farm animals.
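
A small sketch of the two weighting options mentioned above, taking the ratio of neuron counts either directly or through a square root. The function name `relative_weight` and its `transform` parameter are hypothetical; the only figure used is the 13.6 cow-to-chicken ratio quoted above.

```python
import math


def relative_weight(neuron_ratio, transform="linear"):
    """Moral weight of species A relative to species B.

    neuron_ratio: neurons in A divided by neurons in B.
    transform: "linear" uses the ratio as-is, "sqrt" takes its square root.
    """
    if neuron_ratio <= 0:
        raise ValueError("neuron_ratio must be positive")
    if transform == "linear":
        return neuron_ratio
    if transform == "sqrt":
        return math.sqrt(neuron_ratio)
    raise ValueError(f"unknown transform: {transform}")


COW_TO_CHICKEN = 13.6  # ratio quoted in the comment above

print(relative_weight(COW_TO_CHICKEN))          # linear weighting
print(relative_weight(COW_TO_CHICKEN, "sqrt"))  # square-root weighting
```

The square root compresses the gap between species considerably: the same ratio yields a weight of roughly 3.7 instead of 13.6.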

Comment author: Foster 05 October 2018 12:15:38PM 2 points

This looks promising!

I often find myself second guessing estimations of animal charity effectiveness as it feels like they might have cherry-picked their 'moral metric'. Breaking it down in this way seems like a laudable and structured approach for assessing an issue with quite so many unknown variables.

Things that excited me:

  • I could imagine a report where, for a given intervention, each of these is estimated, confidence weightings given and explanations of evidence, priors and reasonings for each estimation. Reading that would have given me more confidence when I was earlier in my journey re animal suffering.

  • Complex, intuition-challenging problems broken down into smaller, more intuition-friendly problems seems valuable.

  • I'd guess it's likely that making many weighted judgement calls and making gut checks from many angles will result in answers closer aligned with our values.

Comment author: MikeJohnson 18 September 2018 08:00:22PM 2 points

Glad to see work on this.

It seems to me there are two questions here: (1) what are the average effects of different environments (e.g. wilderness; factory farm) on animal well-being? (2) what is the average hedonic well-being of different species?

It feels like you're attempting to find a method that will give the combined score for any given animal. But maybe it'd be best to focus on each individually. Some of the methods you mentioned (e.g. cortisol levels, behavior anomalies, self-narcotization) seem fairly solid for addressing (1), if you had more data. What's the biggest hurdle to gathering more data? Can you think of any clever ways to gather lots of data cheaply? Basically it seems really useful to try to build an intra-species hedonic comparison first, and worry about inter-species comparisons later.

That said-- on inter-species comparisons, I don't think any of the methods you mention are likely to give a good answer to (2), especially as none deal directly with brain activity. It's possible (although I don't know for sure) that some of QRI's work is relevant here- essentially, we have a method ('CDNS') that could be adapted to estimate the degree to which a given connectome is naturally 'tuned' toward harmony or dissonance. This would face many of the same data & validation challenges you mention for other proxy measures, but essentially I'm skeptical that it's possible to address (2) without something like what QRI is doing, that actually looks at brain activity and doesn't rely on hard-coded assumptions about things that could be species-specific and are probably leaky anyway (e.g., brain region X is associated with pain).

If it checks out, this could give a rough inter-species comparison of natural hedonic set-points between literally any two connectomes-- cows, chickens, rats, grasshoppers, mosquitos, humans. Probably not an end-all-be-all, but a useful tool in the toolbox. More on our 'CDNS' method.

Comment author: RobertDaoust 05 October 2018 01:03:08PM 1 point

Good work! I am including a link to it in my Preparatory Notes for the Measurement of Suffering, where perhaps you will find other useful measuring methods.