Many animal shelters regularly use standardized tests to assess the behavior of dogs and to determine adoption suitability. However, while the use of these tests has become ubiquitous, there is a distinct lack of research demonstrating their reliability or validity. In other words, while testing a dog’s degree of friendliness, aggression, and fear prior to adoption makes intuitive sense and feels like a good idea, we do not know whether it actually works. This is an important question to raise (and I am by no means the first to raise it), because when these tests are administered as a method for predicting a dog’s future behavior, the dog’s performance on the test often determines whether or not he has a future.
My previous blog post, “This test you keep using…..”, reviewed a study that examined the predictive value of a single subtest of a standard behavior test, the fake hand as a test for food aggression (1). That paper’s results are important because they brought the fake hand test under needed scientific scrutiny and drew additional attention to the need for scientific validation of all behavior tests that are used with shelter and rescue dogs.
Though still limited, these studies are being conducted and published. For example, an Australian behaviorist, Kate Mornement, has been studying behavior assessments used by animal shelters for the last several years as part of her PhD research at Monash University (2,3). Most recently, she and her research team examined the effectiveness of a behavior assessment program called the BARK protocol (4).
The study: Kate and her research team first worked with a focus panel of nine canine experts to develop a standardized 12-subtest behavior assessment that was labeled the Behavioural Assessment for Rehoming K-9s (BARK) program. The BARK battery of tests was designed to assess five primary behavior traits: anxiety, compliance, fear, friendliness, and activity level. Following development, the BARK test’s reliability and validity were studied in a shelter setting over a 12-month period. Three measures of its effectiveness were examined: inter-rater reliability (the degree to which different evaluators agreed when assessing the same dog), test-retest reliability (the degree to which a dog’s score was stable over time; in this case dogs were retested 24 hours following their initial assessment), and predictive validity (the accuracy with which in-shelter test results predicted a dog’s in-home behavior). Predictive validity was assessed by surveying adoptive owners several months following adoption regarding their dog’s degree of anxiety, fear, friendliness, compliance, and activity level.
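For readers who like to see the moving parts, here is a minimal sketch of how those three measures can each be expressed as a correlation between two sets of scores for the same dogs. It assumes a simple Pearson correlation; the study may well have used different statistics. The scores, the 1-to-5 scale, and the variable names are all hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL fear scores (1-5 scale) for six dogs; not data from the study.
rater_a_day1 = [1, 2, 2, 4, 5, 3]  # evaluator A, first assessment
rater_b_day1 = [1, 3, 2, 4, 4, 3]  # evaluator B, same dogs, same session
rater_a_day2 = [2, 2, 3, 3, 5, 1]  # evaluator A, retest 24 hours later
owner_report = [1, 1, 3, 4, 3, 2]  # owner ratings months after adoption

# Each measure compares the first assessment with a different second score.
inter_rater = pearson_r(rater_a_day1, rater_b_day1)  # do evaluators agree?
test_retest = pearson_r(rater_a_day1, rater_a_day2)  # is the score stable?
predictive = pearson_r(rater_a_day1, owner_report)   # does it match home life?

print(f"inter-rater: {inter_rater:.2f}")
print(f"test-retest: {test_retest:.2f}")
print(f"predictive:  {predictive:.2f}")
```

The point of the sketch is only that the same question is asked three times: how closely does the in-shelter score track a second score for the same dog, whether that second score comes from another evaluator, a later session, or the dog’s new owner?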
Results: Several results of this study are of value to shelter professionals and dog folks:
- Inter-rater reliability: Scoring for the five behavior categories showed statistically significant and moderate agreement between evaluators when testing the same dog. This means that two evaluators (who in this study were highly experienced researchers) generally rated dogs similarly. Some associations were stronger than others, with the assessment for fearful behavior showing the strongest correlation between scorers.
- Test-retest reliability: The test-retest reliability was significant for some traits and non-significant for others, resulting in overall weak reliability for the entire group of subtests. This means that a dog’s scores were not always consistent (stable) over time (in this case, just 24 hours) while in the shelter environment. Similar to inter-rater reliability scoring, the tests that reflected a dog’s degree of fear had the strongest correlations.
- Predictive validity: A group of 67 dogs who had been adopted into homes were subsequently assessed via owner interviews. Owner assessments were compared with the dogs’ in-shelter BARK scores. Overall, the predictive value of the BARK test was found to be poor. Only two of the five behavior categories had statistically significant correlations between in-home behavior and BARK test scores: fear and friendliness. However, even these associations were not strong (r = 0.42 and r = 0.49, respectively). There were no significant correlations between in-home reports of anxiety, compliance, and activity level and in-shelter BARK scores.
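To put the two significant correlations in perspective, one common rule of thumb is to square r, which gives the share of variation in one measure that is statistically accounted for by the other. The short sketch below does that arithmetic for the two reported values. It assumes the reported r values can be read as Pearson-type correlation coefficients, which is my assumption and not something stated above; the function name is mine.

```python
# Correlations reported for the two significant categories.
reported_r = {"fear": 0.42, "friendliness": 0.49}


def variance_explained(r):
    """Coefficient of determination: the square of a correlation coefficient."""
    return r ** 2


for trait, r in reported_r.items():
    share = variance_explained(r)
    print(f"{trait}: r = {r:.2f}, r^2 = {share:.2f} ({share:.0%} of variance)")
```

Under that assumption, the in-shelter scores account for roughly 18% of the variation in owner-reported fear and roughly 24% of the variation in owner-reported friendliness, which leaves most of the variation in in-home behavior unexplained by the test.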
Soapbox time: These results (and those of Marder et al. (1)) raise several questions. Perhaps single-session tests designed to measure major behavior categories can work and all that is needed is additional attention to designing the right types of subtests. Or, perhaps it is more important to examine differences among shelters in terms of staff experience, time availability, adoption standards, and the number of animals that are cared for, and to attempt to design behavior assessments that can be modified to fit individual shelters’ needs. Or, perhaps it is time to rethink the entire use of these tests and to consider not using them at all.
Because these tests have become so entrenched in shelter and rescue dog culture, this last suggestion is not only often overlooked but also has the potential to raise much ire. Typically, responses to this suggestion center around three objections, all of which qualify in one way or another as straw man arguments and which can effectively derail thoughtful discourse.
- If we stopped using the [insert branded test name here] behavior test, we would not have a way to assess dogs’ behavior prior to putting them up for adoption. [Setting up a false dichotomy: It is not an either/or issue. There are other, potentially better, approaches to monitoring and assessing shelter dog behavior than single-session standardized tests. Not using the [***] test does not require you to use nothing at all to assess behavior.]
- I have seen the tests work with my own eyes; if it prevents a single dog who is aggressive from going up for adoption from my shelter, it is worth using. [False proposition: For example, Marder’s data show that yes, some dogs are correctly identified as food aggressive. However, others are missed, and some dogs who are not aggressive are misidentified as such. A poor diagnostic test that gets one right once in a while cannot be defended as a valid diagnostic test.]
- But, what about the children? We cannot risk adopting out a dog who might bite a child!!?! (This last is typically uttered with rising voice and a hint of hysteria.) [Classic straw man: Redefining the argument to imply that those who question the use of the beloved test advocate the release of baby-killing canines into communities. This is misdirection at its best, as invoking the emotional “save the children” chant works to derail discourse every time. First, no one denies that it is important to ensure that only dogs who are safe are placed into homes with children. Second, the fact that it is important to identify dogs who may be aggressive to children is not the same issue as whether or not one continues to use tests that appear to be unreliable. In other words, an inaccurate test would not help you to save those children that you are so concerned about…]
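The second objection is worth a quick back-of-the-envelope check. The sketch below shows how a test that does catch some truly aggressive dogs can still be wrong about most of the dogs it flags. It uses the standard definitions of positive and negative predictive value; the sensitivity, specificity, and prevalence figures are purely hypothetical placeholders that I chose for illustration, and they are not taken from Marder’s study or any other.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test, given rates expressed as fractions 0-1."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence               # aggressive and flagged
    false_neg = (1 - sensitivity) * prevalence        # aggressive but missed
    false_pos = (1 - specificity) * (1 - prevalence)  # not aggressive, flagged
    true_neg = specificity * (1 - prevalence)         # not aggressive, cleared
    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")
    ppv = true_pos / (true_pos + false_pos)  # share of flagged dogs truly aggressive
    npv = true_neg / (true_neg + false_neg)  # share of cleared dogs truly safe
    return ppv, npv


# HYPOTHETICAL inputs for illustration only; not results from any study.
ppv, npv = predictive_values(sensitivity=0.70, specificity=0.80, prevalence=0.10)
print(f"Flagged dogs who are truly aggressive: {ppv:.0%}")
print(f"Cleared dogs who are truly not aggressive: {npv:.0%}")
```

With these made-up inputs, only 28% of the dogs who fail would actually be aggressive, even though the test catches 70% of the aggressive dogs. Different inputs give different answers, which is exactly why the real numbers need to come from validation studies and not from what we have seen with our own eyes.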
Straw man arguments, in addition to being logically invalid, function to keep people from paying attention to the evidence and from admitting that there may be a problem with the use of behavior tests to assess shelter dogs. If we can keep these at bay and instead encourage discussion of where the science seems to be leading us, we may find that there are potential alternatives to the current standardized, single-session behavior tests. Improved design of the tests is one, as is custom-designing tests to meet shelters’ needs, as is developing an approach that is more longitudinal in nature (for example, having shelter staff note simple behaviors once or twice a day while feeding, cleaning, and exercising dogs, to provide a longer-term and cumulative record of each dog’s behavior). Longitudinal data, like single-session behavior tests, would require validation through scientific testing. Conversely, continuing to champion a battery of tests that have not yet held up under scientific scrutiny, and that have been shown to be significantly deficient in at least some areas, seems to help no one, least of all the dogs who are tested... and fail.
1. Marder AR, Shabelansky A, Patronek GJ, Dowling-Guyer S, D’Arpino SS. Food-related aggression in shelter dogs: A comparison of behavior identified by a behavior evaluation in the shelter and owner reports after adoption. Applied Animal Behaviour Science 2013; 148:150-156.
2. Mornement K, Toukhsati S, Coleman G, et al. Reliability, validity and feasibility of existing tests of canine behavior. AIAM Annual Conference on Urban Animal Management, Proceedings. 2009;11-18.
3. Mornement KM, Coleman GJ, Toukhsati S, Bennett PC. A review of behavioural assessment protocols used by Australian animal shelters to determine adoption suitability of dogs. Journal of Applied Animal Welfare Science 2010; 13:314-329.
4. Mornement KM, Coleman GJ, Toukhsati S, Bennett PC. Development of the behavioural assessment for re-homing K9’s (B.A.R.K.) protocol. Applied Animal Behaviour Science 2013; Article in Press.