Discriminant Validity: Excessive Neglect & Misleading Claims
Providing evidence that a multi-item scale measures a particular construct and not other constructs with which it might be confused is an important part of social science research. This part of a scale’s psychometric quality is what I will refer to here as discriminant validity (DV hereafter).1 Support for DV should be provided in the development of a scale if researchers care about the quality of their analyses, results, and conclusions. It also makes sense for DV to be tested any time researchers intend for their work to be published in a scholarly journal, even if they borrowed a scale developed by others in the past. Yet testing and reporting the results of DV tests in either case is not typical, at least not in the scholarly journals I examine for my work.2
Too many authors state that a scale is valid based on studies done years or decades earlier. While it is good practice to borrow scales for which DV has been supported in the past, that does not mean they have DV in the context of one’s current study. People and contexts can change over time, and that affects DV. Further, when authors say that a scale has DV yet do not provide evidence from their own study, it is like saying to the manuscript reviewers and readers, “don’t expect me to provide evidence of DV because this is an established measure for which others have provided sufficient evidence.” This is what I call the perpetual validity defense. It is another pet peeve of mine that I have written about previously. Even when a study tests for DV and the results indicate that X and Y are distinct, it should not be assumed that X has DV with respect to measures of other constructs. In other words, it is misleading to make a general statement such as “X has discriminant validity;” it is more precise to say something like “there is evidence of discriminant validity between X and Y in this study.”
While my focus here is the empirical domain, where the distinctiveness of multi-item scales is the issue, assessing DV with respect to constructs is an important matter as well. Testing a construct’s DV with respect to other constructs involves theory, expert insight, and a battery of tests that examine the various types of validity, not just DV. Further, as discussed in detail by Voorhees et al. (2016), when constructs are thought to be similar but not the same, testing DV is quite challenging.
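To make the empirical side of this concrete: one DV test that Voorhees et al. (2016) examine is the heterotrait-monotrait ratio of correlations (HTMT), which compares the average correlation between the items of two scales to the average correlations within each scale. The sketch below is my own minimal illustration of that idea; the function and the toy correlation matrix are assumptions for demonstration, not material from the article.

```python
import numpy as np

def htmt(corr, items_x, items_y):
    """Heterotrait-monotrait ratio (HTMT) for two multi-item scales.

    corr    : full item-by-item correlation matrix (absolute values are used)
    items_x : indices of the items measuring construct X
    items_y : indices of the items measuring construct Y

    HTMT = mean between-construct item correlation divided by the
    geometric mean of the average within-construct item correlations.
    Values approaching 1 suggest the two scales may not be distinct.
    """
    corr = np.abs(np.asarray(corr, dtype=float))

    # Average correlation between X's items and Y's items (heterotrait)
    hetero = corr[np.ix_(items_x, items_y)].mean()

    # Average within-construct correlation, excluding the diagonal (monotrait)
    def mono(items):
        sub = corr[np.ix_(items, items)]
        n = len(items)
        return (sub.sum() - n) / (n * (n - 1))

    return hetero / np.sqrt(mono(items_x) * mono(items_y))

# Toy data: items 0-1 measure X, items 2-3 measure Y.
# Within-scale correlations (0.8, 0.7) exceed between-scale ones (0.3).
corr = [[1.0, 0.8, 0.3, 0.3],
        [0.8, 1.0, 0.3, 0.3],
        [0.3, 0.3, 1.0, 0.7],
        [0.3, 0.3, 0.7, 1.0]]
print(htmt(corr, [0, 1], [2, 3]))  # well below 1
```

In this made-up data set the between-scale correlations are clearly weaker than the within-scale ones, so the ratio comes out low, which would count as evidence of DV between X and Y in that data set and nothing more, consistent with the point above about precise claims.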
The bottom line is that readers deserve to know the quality of a measure, and proper information about DV is a big part of that. It should be tested and reported much more than it currently is. Indeed, it should be a requirement for publication in top-level scholarly journals that take measurement seriously.
1. The purpose here is not to critique different procedures for testing for DV but rather to discuss claims made when describing a scale’s validity. For a more detailed examination and discussion of DV and how it should be tested, see Voorhees, Clay M., Michael K. Brady, Roger Calantone, and Edward Ramirez (2016), “Discriminant Validity Testing in Marketing: An Analysis, Causes for Concern, and Proposed Remedies,” Journal of the Academy of Marketing Science, 44 (1), 119–134.
2. An examination conducted by Voorhees et al. (2016, p. 133) of seven marketing journals covering 2008 to 2012 concluded that “only 14.7% of marketing research studies formally document discriminant validity prior to examining relationships among variables.” For comparison, I did a quick examination of the scales in Volume 10 of the Marketing Scales Handbook, which covered five of the seven journals used by Voorhees et al. (2016). For that purpose, the data were limited to articles published in 2016 and 2017 that reported consumer research using multi-item scales. The percentage of articles in which the authors reported DV tests was around 27%.