Beware the Straw Man

Many animal shelters regularly use standardized tests to assess the behavior of dogs and to determine adoption suitability. However, while the use of these tests has become ubiquitous, there is a distinct lack of research demonstrating their reliability or validity. In other words, while testing a dog’s degree of friendliness, aggression, and fear prior to adoption makes intuitive sense and feels like a good idea, we do not actually know whether it works. This is an important question to raise (and I am by no means the first to raise it), because when these tests are administered as a method for predicting a dog’s future behavior, the dog’s performance on the test often determines whether or not he has a future.


My previous blog, “This test that you keep using……”, reviewed a study that examined the predictive value of a single subtest of a standard behavior test: the fake hand as a test for food aggression (1). That paper is important because it brought the fake hand test under needed scientific scrutiny and because it drew additional attention to the need for scientific validation of all behavior tests that are used with shelter and rescue dogs.

Though still limited in number, such studies are being conducted and published. For example, an Australian behaviorist, Kate Mornement, has been studying behavior assessments used by animal shelters for the last several years as part of her PhD research at Monash University (2,3). Most recently, she and her research team examined the effectiveness of a behavior assessment program called the BARK protocol (4).


Kate Mornement, Pets Behaving Badly

The study: Kate and her research team first worked with a focus panel of nine canine experts to develop a standardized 12-subtest behavior assessment that was labeled the Behavioural Assessment for Rehoming K-9s (BARK) program. The BARK battery of tests was designed to assess five primary behavior traits: anxiety, compliance, fear, friendliness, and activity level. Following development, the BARK test’s reliability and validity were studied in a shelter setting over a 12-month period. Several measures of its effectiveness were examined: inter-rater reliability (the degree to which different evaluators agreed when assessing the same dog), test-retest (the degree to which a dog’s score was stable over time; in this case dogs were retested 24 hours following their initial assessment), and predictive validity (the accuracy with which in-shelter test results predicted a dog’s in-home behavior). Predictive validity was assessed by surveying adoptive owners several months following adoption regarding their dog’s degree of anxiety, fear, friendliness, compliance, and activity level.


 Results: Several results of this study are of value to shelter professionals and dog folks:

  1. Inter-rater reliability: Scoring for the five behavior categories showed statistically significant and moderate agreement between evaluators when testing the same dog. This means that two evaluators (who in this study were highly experienced researchers) generally rated dogs similarly. Some associations were stronger than others, with the assessment for fearful behavior showing the strongest correlation between scorers.
  2. Test-retest reliability: The test-retest reliability was significant for some traits and non-significant for others, resulting in overall weak reliability for the entire group of subtests. This means that a dog’s scores were not always consistent (stable) over time (in this case, just 24 hours) while in the shelter environment. Similar to inter-rater reliability scoring, the tests that reflected a dog’s degree of fear had the strongest correlations.
  3. Predictive validity: A group of 67 dogs who had been adopted into homes were subsequently assessed via owner interviews. Owner assessments were compared with the dogs’ in-shelter BARK scores. Overall, the predictive value of the BARK test was found to be poor. Only two of the five behavior categories, fear and friendliness, had statistically significant correlations between in-home behavior and BARK test scores. However, even these associations were not strong (r = 0.42 and r = 0.49, respectively). There were no significant correlations between in-shelter BARK scores and in-home reports of anxiety, compliance, or activity level.

Take Away for Dog Folks: These results suggest that a standardized behavior test, administered to shelter dogs in a shelter environment, may not be a reliable indicator of a dog’s future behavior.   


Soapbox time: These results (and those of Marder et al) raise several questions. Perhaps single-session tests designed to measure major behavior categories can work, and all that is needed is additional attention to designing the right types of subtests. Or perhaps it is more important to examine differences among shelters in terms of staff experience, time availability, adoption standards, and the number of animals that are cared for, and to attempt to design behavior assessments that can be modified to fit individual shelters’ needs. Or perhaps it is time to rethink the entire use of these tests and to consider not using them at all.

Because these tests have become so entrenched in shelter and rescue dog culture, this last suggestion is not only often overlooked but also has the potential to raise much ire. Typically, responses to it center around three objections, all of which qualify in one way or another as straw man arguments and can effectively derail thoughtful discourse.


 These are:

  1. If we stopped using the [insert branded test name here] behavior test, we would not have a way to assess dogs’ behavior prior to putting them up for adoption. [Setting up a false dichotomy: It is not an either/or issue. There are other, potentially better, approaches to monitoring and assessing shelter dog behavior than single-session standardized tests. Not using the [***] test does not require you to use nothing at all to assess behavior].
  2. I have seen the tests work with my own eyes; if it prevents a single dog who is aggressive from going up for adoption from my shelter, it is worth using. [False proposition: For example, Marder’s data show that yes, some dogs are correctly identified as food aggressive. However, others are missed, and some  dogs who are not aggressive are misidentified as such. A poor diagnostic test that gets one right once in a while cannot be defended as a valid diagnostic test].
  3. But, what about the children? We cannot risk adopting out a dog who might bite a child!!?! (This last is typically uttered with a rising voice and a hint of hysteria.) [Classic straw man: Redefining the argument to imply that those who question the use of the beloved test advocate the release of baby-killing canines into communities. This is misdirection at its best: invoking the emotional “save the children” chant works to derail discourse every time. First, no one denies that it is important to ensure that only dogs who are safe are placed into homes with children. Second, the importance of identifying dogs who may be aggressive to children is not the same issue as whether one continues to use tests that appear to be unreliable. In other words, an inaccurate test would not help you to save those children that you are so concerned about…]


Straw man arguments, in addition to being logically invalid, keep people from paying attention to the evidence and from admitting that there may be a problem with the use of behavior tests to assess shelter dogs. If we can keep these arguments at bay and instead encourage discussion of where the science seems to be leading us, we may find that there are alternatives to the current standardized, single-session behavior tests. Improved design of the tests is one, as is custom-designing tests to meet shelters’ needs, as is developing an approach that is more longitudinal in nature – for example, having shelter staff note simple behaviors once or twice a day while feeding, cleaning, and exercising dogs to provide a longer-term and cumulative record of each dog’s behavior. Longitudinal data, like single-session behavior tests, would require validation through scientific testing. Conversely, to continue to champion a battery of tests that has not yet held up under scientific scrutiny and has been shown to be significantly deficient in at least some areas seems to be helping no one, least of all the dogs who are tested……..and fail.

References Cited:

  1. Marder AR, Shabelansky A, Patronek GJ, Dowling-Guyer S, D’Arpino SS. Food-related aggression in shelter dogs: A comparison of behavior identified by a behavior evaluation in the shelter and owner reports after adoption. Applied Animal Behaviour Science 2013; 148:150-156.
  2. Mornement K, Toukhsati S, Coleman G, et al. Reliability, validity and feasibility of existing tests of canine behavior. AIAM Annual Conference on Urban Animal Management, Proceedings. 2009;11-18.
  3. Mornement KM, Coleman GJ, Toukhsati S, Bennett PC. A review of behavioural assessment protocols used by Australian animal shelters to determine adoption suitability of dogs. Journal of Applied Animal Welfare Science 2010; 13:314-329.
  4. Mornement KM, Coleman GJ, Toukhsati S, Bennett PC. Development of the behavioural assessment for re-homing K9’s (B.A.R.K.) protocol. Applied Animal Behaviour Science 2013; Article in Press.


This test that you keep using……

The availability heuristic is a common source of cognitive error that influences our ability to make accurate decisions. It operates full-force whenever we base a decision upon evidence that is easily available (i.e., dramatic, obvious, easily measured) but that may not actually reflect reality. In practice, this means that we pay more attention to evidence that is salient (obvious and dramatic) and tend to ignore evidence that may be more compelling but not quite so sensational.

Take, for example, shark attacks. The feeling that shark attacks are far more common, and that we are at far greater risk, than is actually the case occurs because of the extensive and sensationalist media coverage that a single shark encounter attracts. As a result, when you consider going to the beach this summer, an image of a shark pops into your mind because such an image is highly available to you. However, while shark attacks can and do happen, the actual risk is much lower than our perceptions lead us to believe.

PERCEPTION (shark) vs. REALITY (coconuts)

I will return to the significance of the availability error shortly. Let’s now turn to an important dog topic – the expression of food-related aggression in dogs. (There will be a tie-in, I promise. 🙂 )

Background information: Food-related aggression (FA) is a specific subtype of resource guarding in dogs. Its expression can vary in intensity, from a dog who simply shows tenseness near his food bowl, to freezing, growling, or biting a person who interferes with the dog while he or she is eating. Most of the standardized behavior evaluations that are used by shelters and rescue groups include an assessment for FA. For reasons of safety, many use a fake plastic or rubber hand that is attached to a long stick for this test. Although procedures vary somewhat, the test for FA involves interfering with the dog while he is eating from a bowl, first by placing the fake hand into the bowl and pulling it away, and then by attempting to push the dog’s face away from his food by pressing the instrument alongside the dog’s face. The validity of this test, meaning its ability to correctly identify dogs who do (and do not) truly have FA, is an important issue because dogs who exhibit FA during a behavior evaluation are almost always identified as an adoption risk, which can lead to reduced opportunities for finding a home, and at some shelters, to automatic euthanasia.


2004 Study: Despite its ubiquitous inclusion in behavior tests, few studies have actually examined the reliability of the fake hand test for FA. A few years ago, a group of researchers at Cornell conducted a study with dogs who had a history of various forms of aggression, including FA (1). They found a positive and statistically significant correlation between an aggressive response toward the fake hand and previously exhibited aggression in the dog. However, the relationship was weak, and a substantial number of dogs who were NOT aggressive also reacted to the hand when tested. The authors recommended caution when using a fake hand in behavior tests because of the high number of both false positive and false negative responses that they found. A limitation of this study was that, because the researchers used dogs with a known history of different types of aggression who were already in their permanent homes, they could not draw conclusions about the predictive value of the test. To do that, a study was needed that examined how well the fake hand test, when administered to dogs in a shelter environment, correlates with dogs’ future behavior when living in homes. Such a study was published in September 2013 in the journal Applied Animal Behaviour Science (2).

2013 Study:  Dr. Amy Marder and her colleagues at the Center for Shelter Dogs in Boston, MA  tested a group of 97 dogs using a standardized canine behavior evaluation that included a test for FA. Dogs showing extreme aggression or multiple forms of aggression were excluded from the study for ethical and safety reasons. Following testing, all of the dogs were adopted into homes. Adopters of dogs who showed food aggression (FA+) were provided with additional instructions for handling the dog during feeding times, but the dogs themselves received no additional training or behavior modification prior to adoption. Adoptive owners were surveyed to assess the dog’s behavior in the home at 3 days, 3 weeks and 3 months following adoption.

Results: Of the group of 97 tested dogs, 20 dogs (21 %) reacted aggressively to the hand and were classified as FA+; 77 dogs did not react and were identified as FA-. Of the 20 dogs who were classified as FA+, approximately half (11/20, 55 %) were reported by their owners to show food-related aggression while in the home and nine of the FA+ dogs (45 %) showed no signs of food aggression when in the home. Of the 77 dogs who were classified as FA-, the majority (60/77, 78 %) were also FA- when in their adoptive homes. However, 17 dogs from this group, 22 %, did show signs of FA when in the home, even though they had tested negative for FA while in the shelter. A final result was that the majority of owners of dogs who were showing FA in the home reported that they did not consider their dog’s behavior to be problematic and that they would definitely adopt the same dog again.

Take away for dog folks:

  1. The authors found that the negative predictive value of the test was high since 78 percent of dogs who tested negative in a shelter environment showed no food aggressive behaviors when in their adoptive home. (This is good).
  2. The positive predictive value of the test was low since only 55 percent of dogs who tested positive in the shelter environment showed food aggression when in the home (This is bad).
  3. Owners may perceive food-related aggression as much less problematic than do shelter staff and may have little trouble managing dogs who are reactive around their food bowls.
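The two predictive values above follow directly from the four cell counts reported in the results section. A minimal sketch in Python (the counts come from the study as summarized above; the variable names are mine, not the study’s):

```python
# Counts from Marder et al. (2013), as summarized above:
# 20 dogs tested FA+ in the shelter; 11 of them showed FA in the home.
# 77 dogs tested FA-; 60 of them showed no FA in the home.
true_pos = 11   # tested FA+, showed FA in the home
false_pos = 9   # tested FA+, showed no FA in the home
true_neg = 60   # tested FA-, showed no FA in the home
false_neg = 17  # tested FA-, showed FA in the home

# Positive predictive value: of the dogs the test flags, how many truly have FA?
ppv = true_pos / (true_pos + false_pos)   # 11/20 = 0.55

# Negative predictive value: of the dogs the test clears, how many truly lack FA?
npv = true_neg / (true_neg + false_neg)   # 60/77 ≈ 0.78

print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")  # PPV = 55%, NPV = 78%
```

Note that both values depend on how common FA is in the tested population; a shelter with a different base rate of FA would see different predictive values from the same test.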

If that information does not give you enough to chew upon, let me contribute an additional question to this controversial (and apparently quite polarized) topic. What do these data say about the test itself?

There is really no question that the data presented in this study, along with the Cornell study, suggest something additional. Realizing that this is a sacred cow to those who are highly committed to their fake hands, I offer up the suggestion that perhaps the fake-hand test is not measuring what its users think it is measuring. (In other words, it is not a valid test of FA).

This test that you keep using……..

Here is why (stay with me here; this gets long, but it is worth the ride…..): The researchers reported positive and negative predictive values for the fake hand test (numbers noted above), but they also had data available to calculate two additional measures of a diagnostic test’s validity. These are referred to as sensitivity (a test’s ability to correctly identify all truly positive cases) and specificity (its ability to correctly identify all truly negative cases). I went ahead and crunched these numbers using the data that the paper provided and found this:

  • Fake Hand Test Sensitivity = 39 %. This means that the fake hand correctly identified only 39 percent of the dogs who actually had FA. The flip side of this statistic is probably more important: 61 percent of the dogs who truly had FA were missed by the test and labeled FA-. Although there is no absolute cutoff for an acceptable sensitivity value, I do not think anyone would try to argue that a 39 percent detection rate signifies a valid test. (Especially in light of the fact that a positive result for this particular test can mean the end of life for the dog.)
  • Fake Hand Test Specificity = 87 %. This means that the test correctly cleared the majority of the dogs who did not actually have FA; only 13 percent of the truly non-reactive dogs were incorrectly flagged as FA+. While this is a desirable value for the test, high specificity alone is not enough.
  • Supporting data? This was not the first study that has examined the use of the fake hand in behavior evaluations, but it is the first study that has measured the predictive value of the test. It is important to note that to date, there are no published studies that provide data showing that using a fake hand to diagnose food reactivity in dogs is a highly reliable test. None.
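For anyone who wants to repeat the arithmetic, sensitivity and specificity can be computed from the same four cell counts (a sketch; the counts are taken from the study’s results as summarized above, and the variable names are mine):

```python
# Same counts as reported in the study's results section.
true_pos = 11   # tested FA+ and showed FA in the home
false_neg = 17  # tested FA- but showed FA in the home
true_neg = 60   # tested FA- and showed no FA in the home
false_pos = 9   # tested FA+ but showed no FA in the home

# Sensitivity: of all dogs that truly have FA, what fraction did the test catch?
sensitivity = true_pos / (true_pos + false_neg)   # 11/28 ≈ 0.39

# Specificity: of all dogs without FA, what fraction did the test clear?
specificity = true_neg / (true_neg + false_pos)   # 60/69 ≈ 0.87

print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
# sensitivity = 39%, specificity = 87%
```

Unlike the predictive values, sensitivity and specificity are properties of the test itself and do not change with the prevalence of FA in the population being tested, which is why they are the more direct measures of the test’s validity.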

Which raises the question – Why do temperament tests that are used with shelter dogs continue to include the fake hand as a test for food aggression?

Seems a bit illogical, doesn’t it?

There are a few possibilities:

  1. It is simple and measurable: Unlike much of what we do in behavior and training, the fake hand test is pretty easy to administer and to score. Therefore, it is a shoo-in for inclusion in a battery of tests that can be quickly administered to a lot of dogs by personnel who have varying levels of expertise.
  2. The use of the fake hand is well-established: Many, but not all, of the behavior assessment tests that are used in shelters today include a test for FA that uses a fake hand (3,4). Many of these tests are highly standardized and include specific training programs for shelter staff who administer them. However, while proponents of the fake hand insist that a set of clear and very specific steps are used in the test’s administration (i.e. how far to stand away from the bowl, how many times the dog’s face is pushed, how to manipulate the bowl), such protestations are a moot point since none of the specific guidelines for administering the tests have been validated either.
  3. The results are dramatic and salient – i.e. AVAILABLE: A dog who reacts aggressively when a fake hand is shoved in his face while he is eating provides us with an example of the availability heuristic in action. Aggressive responses in dogs elicit dramatic and involuntary reactions in those who witness the response –  a rush of adrenaline, a bit of fear, perhaps even a little bit of the “stopping to watch a car wreck” feeling, if you will.  Just as we react strongly (and illogically) to reports of shark attacks, so too might an evaluator react emotionally to an aggressing dog. The fallout is that the aggression that is provoked by a fake hand during a behavior test may acquire more significance than it actually has in real life.  (This is supported by Dr. Marder’s results when interviewing owners of FA+ dogs, who did not see FA as such a big deal). And, because the provoked aggressive response in the dog is dramatic and obvious, the evaluator now feels compelled to do something about the reaction that was provoked – special adopts, no adopt, euthanize.

AVAILABILITY HEURISTIC – DRAMATIC IMAGES STICK WITH US

Here’s a bombshell…. Perhaps poking a dog in the face with a fake hand while he is eating in a shelter environment is not a valid way to test for food aggression: The sensitivity statistic of 39 % suggests that the test misses most of the dogs who actually have FA, while the positive predictive value of 55 % suggests that nearly half of the dogs who react when tested with a fake hand are not actually food aggressive. At the very least, this paper and these particular statistics suggest that the presumed test for FA using a fake hand is not testing for the thing that proponents think it is testing for. Additionally, the availability error may lead those who regularly administer this test to assign excessive significance to FA because of the salience of provoked responses during the test and highly inflated perceptions of risk to owners. Given that the fake hand test leads to decisions that severely reduce a dog’s chances of being adopted into a home, or may even result in the death of the dog, this is a possibility that must be raised and considered.

Cited References:

  1. Kroll TL, Houpt KA, Erb HN. The use of novel stimuli as indicators of aggressive behavior in dogs. Journal of the American Animal Hospital Association 2004; 40:13-19.
  2. Marder AR, Shabelansky A, Patronek GJ, Dowling-Guyer S, D’Arpino SS. Food-related aggression in shelter dogs: A comparison of behavior identified by a behavior evaluation in the shelter and owner reports after adoption. Applied Animal Behaviour Science 2013; 148:150-156.
  3. Barnard S, Siracusa C, Reisner I, et al. Validity of model devices used to assess canine temperament in behavioral tests. Applied Animal Behaviour Science 2012; 138:79-87.
  4. Taylor KD, Mills DS. The development and assessment of temperament tests for adult companion dogs. Journal of Veterinary Behavior 2006; 1:94-108.