Poor Choice of Scales (Single- vs. Multi-Item)
Why do so many market researchers rely heavily, if not exclusively, on single-item measures of theoretical constructs in their consumer insight studies? My guess is that it is some variation of one main thing: convenience. It is easier for a researcher to develop, administer, and analyze questionnaires with single-item measures of the constructs of interest. It is also simpler and faster for respondents to complete questionnaires, particularly when there are many constructs being measured and time is extremely limited.
There must also be the assumption that single-item (SI) scales are as accurate, or almost as accurate, as multi-item (MI) scales. But is that true? This issue has been examined in several scholarly studies with varying answers and recommendations. In this posting, beyond drawing on my own experience, I will draw on the findings of the most recent and extensive set of studies of which I am aware: Diamantopoulos et al. (2012). Among the studies they conducted was a set of Monte Carlo style simulations, manipulating several data and measurement characteristics in 181,758 runs. It does not get more thorough than that.
Based on those thousands of runs made by Diamantopoulos et al. (2012), the results clearly showed that the predictive validity of SI scales varies across constructs, product categories, and stimuli (e.g., brands). This means that even if a SI scale is judged to have performed well in one context it gives a researcher little confidence that the measure will do as well in other situations.
Having said that, I do not want to be so dogmatic as to claim one should never use a SI scale. Blind loyalty to one or the other is not being advocated here. There are many issues that can be taken into account when choosing between a SI and a MI scale. For brevity, I have limited myself to just five here.
- Is the variable to be measured a fact that respondents ought to be able to answer or is it one that is more subjective or abstract? Simple, factual information that is expected to be clear in the respondents' minds might be safely measured with SI scales. Examples would be demographics (e.g., are you legally married), product ownership (e.g., do you currently own a truck), or a simple behavior (e.g., did you attend a professional sporting event last month). Obviously, that leaves a whole lot of questions for which answers are not so simple or factual. Generally, attitudes, emotions, lifestyle, personality traits, intentions, and most psychological constructs cannot be adequately measured with one question. Instead, they are best measured with MI scales to fully capture the richness of the construct and to reduce measurement error or, at least, be able to quantify the error in the measure.
- How important is measurement quality to the study and those who use the results? The more you care about measurement quality, the more you should use a validated MI scale. Two key types of measurement quality are reliability and validity. Explaining them is beyond the scope of this short blog, but suffice it to say here that researchers should understand and care about these two issues. If they do not, users of their research should not place any confidence in the accuracy of the results. Measuring what people think and feel is not some sort of art where one measure is just as good as another. While judgment and creativity can play roles in developing measures, ultimately, there is science that can help test how good scales are. It can be determined how well they measure what they are supposed to measure. At the other extreme, there are times when accuracy is not as critical and SI scales may be adequate. For example, flash polls filled with SI scales are often taken nowadays. It does not matter too much if the scores are off by several points this way or that. The results do not affect the investment of capital, closing stores, laying off employees, etc.
- Will you be examining the correlation between multiple psychological constructs? Poor reliability attenuates correlations (e.g., Nunnally and Bernstein 1994, p. 212). In other words, two constructs may be related but, if the measures of each construct have low reliability, the relationship may not be detected. While this is a potential problem with both MI and SI scales, it is more likely with SI measures because they tend to be less reliable than MI scales. In a related issue, if you are going to use the data in structural equation modeling where relationships among several constructs are being estimated, then use of MI scales is even more critical. That is because measurement error should be taken into account in the model, which is problematic with SI scales. So, if you are examining multiple psychological constructs for their inter-relationships and understand it is unrealistic to have perfect (error-free) SI indicators of those constructs, then use MI scales. At the other extreme, if simple main effects are the focus of a study, then SI measures may be sufficient for your purpose.
- Will you be using the measure to compare several groups or how the same group changes over time? The more you plan to reuse a measure and compare the results, the more you should use a validated MI scale. For example, a very important construct to most organizations is customer satisfaction and you probably want to track it over time. A MI scale should definitely be used for it since critical managerial decisions are likely to be based on the results and you want those results to be as precise as possible. At the other extreme, there may be several variables examined in a survey that are less critical to management at the time. For those constructs that are not the primary focus of a particular study, are rather general in nature, or are being examined on an exploratory basis, it may be acceptable to measure them with SI scales.
- Do you already have data from past studies that empirically indicate a particular item performs well compared to a MI measure? If so, then maybe you can determine if one of the items performed so well compared to the complete scale in its predictive validity and other qualities that the SI version could be used in a future study by itself. However, even then, the SI measure is prone to contextual effects (Diamantopoulos et al. 2012). In other words, when the item is used by itself in another context you will not know how well it performed compared to what would have been possible with the MI scale, especially if a different criterion variable is involved, e.g., purchase behavior vs. satisfaction. As for what to do if you have no data from a past study in which a MI scale was used for the construct of interest, good luck. In such a situation, you have no data to indicate that any particular item is the one best measure of the construct and what its error is.
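To make the point about attenuated correlations more concrete, here is a minimal sketch in Python. It uses the classical test theory attenuation formula (expected observed correlation = true correlation × square root of the product of the two reliabilities). The true correlation and the reliability values below are hypothetical numbers chosen only for illustration; they are not taken from Diamantopoulos et al. (2012) or any other study.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation between two measures, given the true
    correlation between the constructs and the reliability of each measure
    (classical test theory attenuation formula)."""
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical values for illustration only.
true_r = 0.50
r_with_mi = attenuated_correlation(true_r, 0.90, 0.90)  # two fairly reliable MI scales
r_with_si = attenuated_correlation(true_r, 0.50, 0.50)  # two less reliable SI measures

print(f"Observed r with MI scales:   {r_with_mi:.2f}")  # 0.45
print(f"Observed r with SI measures: {r_with_si:.2f}")  # 0.25
```

Under these assumed reliabilities, the same underlying relationship looks only half as strong when both constructs are measured with the less reliable measures, which is how a real relationship can go undetected.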
If you have been able to answer all of the questions in favor of SI scales, congratulations. Maybe you are one of the lucky ones who can gather the data you need in a particular study without the need to use MI scales. For the others, it is time to get serious about using MI scales in those circumstances in which they are justified. There are decades' worth of articles, books, and courses on the topic. Some of them are listed here.
I have no doubt that there are plenty of practitioners who are determined to use SI measures but want to do so thoughtfully. For them, the challenge is that there are no accepted guidelines at this time for creating and validating SI measures for use in the study of consumer psychology. Keep in mind that those who say there are procedures for selecting "good" SI scales are probably referring to what can be done after data have been collected with a MI scale. If data have been collected for a MI scale then for that data set, yes, it can be determined how well each item performed relative to the full scale. But, by that point, the very benefit of using a SI scale has been lost since a MI scale had to be used to make the performance test possible!
In a similar vein, I am concerned when researchers use my books to cherry-pick an item from a MI scale and use it by itself rather than the full scale. They may believe that if the MI scale was reliable and valid then any one item taken from it is just about as good. Yet, part of the reason a MI scale is reliable is that the random error in any one item is offset by the random errors in the other items. That is not possible with SI measures. Ultimately, researchers are advised to heed the conclusion drawn by Diamantopoulos et al. (2012, p. 446) based on the results of their 181,758 simulations: "opting for SI measures in most empirical settings is a risky decision as the set of circumstances that would favor their use is unlikely to be frequently encountered in practice."
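The idea that random errors in individual items offset one another can be illustrated with a small simulation. The sketch below rests on simplifying assumptions that are mine, not the cited authors': every item equals the respondent's true score plus independent random error of the same size, and the sample size, number of items, error level, and seed are all arbitrary choices.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_respondents=5000, n_items=5, error_sd=1.0, seed=42):
    """Return (r_single, r_composite): how well one item vs. the mean of
    n_items items tracks the simulated true score."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, 1) for _ in range(n_respondents)]
    # Each item = true score + its own independent random error (assumption).
    items = [[t + rng.gauss(0, error_sd) for _ in range(n_items)]
             for t in true_scores]
    single_item = [row[0] for row in items]
    composite = [sum(row) / n_items for row in items]
    return pearson(true_scores, single_item), pearson(true_scores, composite)


r_single, r_composite = simulate()
print(f"Single item vs. true score:   {r_single:.2f}")
print(f"5-item mean vs. true score:   {r_composite:.2f}")
```

With these settings, theory puts the single-item correlation near 0.71 and the five-item correlation near 0.91, because averaging shrinks the error variance to one fifth. Real items are rarely this well behaved, so the sketch shows the mechanism rather than the size of the gain to expect in practice.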
Diamantopoulos, Adamantios, Marko Sarstedt, Christoph Fuchs, Petra Wilczynski, and Sebastian Kaiser (2012), "Guidelines for Choosing between Multi-item and Single-item Scales for Construct Measurement: A Predictive Validity Perspective," Journal of the Academy of Marketing Science, 40 (3), 434-449.
Nunnally, Jum C. and Ira H. Bernstein (1994), Psychometric Theory, New York: McGraw-Hill.