Failure to Consider Alternative Measures
When it comes time to decide what scales to use in a study, how much effort do researchers put into comparing alternative measures? My review of scales over the last two decades leads me to believe that far too many researchers simply use whichever measure is most convenient. In some cases, those measures may have been appropriate for the purpose. But it is clear to me that too many times scales of unknown or dubious validity have been used when better ones were available.
As with almost any decision we make as professionals, the ideal process is to figure out what construct is to be measured and then search for alternative measures. Not all scales are created equal! While there have been many books and articles discussing methods for constructing scales, I am aware of far fewer that discuss the steps for selecting from among scales that have already been developed. I proposed some steps several years ago (Bruner 2003) and, while I won’t go into all of the details here, let me mention some of the main things to look for when comparing alternative measures.
Length: How many items are in the scale? As discussed in a previous blog, you want neither too many nor too few items. You should also take the nature of the questionnaire into account. What else is to be measured in the instrument? The more constructs being measured in a given period of time, the shorter each scale will need to be. Also, how important is the construct to the overall study? The more important the construct, the longer a scale may need to be to increase its internal consistency and cover all of its facets. Having said all of that, most constructs we measure in marketing can be measured quite validly with scales of 3-5 items, but it has to be the "right" set of items.
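To make the link between length and internal consistency concrete, here is a rough sketch based on the standardized form of Cronbach's alpha (equivalent to the Spearman-Brown formula). It is an illustration, not something taken from Bruner (2003): it assumes every pair of items correlates at the same average level, and the function name and the 0.5 correlation are hypothetical values chosen for the example.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized alpha for a scale of n_items whose items
    correlate, on average, at mean_inter_item_r (an assumed value)."""
    if n_items < 2:
        raise ValueError("a multi-item scale needs at least 2 items")
    k, r = n_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


if __name__ == "__main__":
    # Hypothetical average inter-item correlation of 0.5.
    for k in (2, 3, 5, 10):
        print(k, "items ->", round(standardized_alpha(k, 0.5), 3))
```

Under these assumptions, the gain from each added item shrinks as the scale grows, which is one reason a well-chosen short scale can be adequate.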
Format: Is a certain format preferred over another? Most scales used in scholarly marketing research use one of two formats: Likert-type or semantic differential. Comparing the pros and cons of these two formats is beyond the scope of this blog, but suffice it to say that one style might fit your instrument or the construct being measured better than the other. The Likert-type usually provides sentences with which respondents indicate their level of agreement. In contrast, directions used with semantic differentials usually identify an object and ask respondents to use a list of opposing terms or short phrases to describe the object (an ad, a product, a company). Semantic differentials can be a very simple way to gauge what people think or feel about objects. But for deeper issues such as personality traits, values, and motivations, full sentences are needed that can be crafted to capture the nuances of a construct.
Reliability: One of the reasons that researchers use multi-item scales is to increase the reliability of the measures. (See my previous blog on this topic.) Select a scale with high reliability, usually interpreted as above .80 on the typical statistics such as Cronbach’s alpha and construct reliability. The lower a scale’s internal consistency, especially when it is below .70, the more it attenuates measured relationships, i.e., the scale is a less precise measure and may not detect relationships that actually exist.
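For readers who want to check a reported alpha or see what attenuation looks like in numbers, the following is a minimal sketch. It uses the standard textbook formula for Cronbach's alpha and the classical-test-theory attenuation formula (observed correlation = true correlation times the square root of the product of the two reliabilities). The function names and the small response matrix are hypothetical and are not drawn from any actual scale.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row per respondent,
    one column per item), using sample variances."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    item_columns = list(zip(*responses))
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


def attenuated_r(true_r, reliability_x, reliability_y):
    """Correlation expected between two measures, given the true
    correlation between the constructs and each scale's reliability."""
    return true_r * sqrt(reliability_x * reliability_y)


if __name__ == "__main__":
    # Hypothetical data: 6 respondents answering a 3-item, 7-point scale.
    data = [
        [6, 7, 6],
        [5, 5, 4],
        [2, 3, 2],
        [7, 6, 7],
        [4, 4, 5],
        [3, 2, 3],
    ]
    print("alpha:", round(cronbach_alpha(data), 3))
    # A true correlation of .50 as seen through scales of differing reliability.
    for rel in (0.90, 0.80, 0.60):
        print(rel, "->", round(attenuated_r(0.50, rel, rel), 3))
```

With these formulas, two scales that each have a reliability of .60 would be expected to show a true correlation of .50 as only .30, which illustrates how a weak measure can hide a relationship that actually exists.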
Face Validity: One of the most important things in choosing a scale is to have someone very familiar with the construct look at the scale’s items and judge how well they seem to capture the intended construct. As I discussed in the last blog, scale names can be misleading. Look beyond the title.
Construct Validity: Most scales used in academia are seriously lacking in evidence of their construct validity. If that is true in scholarly research, where standards for justifying methodology are “high,” then I can only imagine that evidence of validity is even scarcer in industry, where critical review by an external group of experts is much less likely. I do not want to suggest that a scale is invalid just because its validity is unknown. NO. The point, however, is that the researcher does not know how well the measure actually measures what it is supposed to measure beyond what can be determined by looking at the scale items (face validity). There are several other forms of validity that are more complicated and cannot be readily judged without being able to analyze the full data set, if not multiple data sets. Instead, the researcher looking for a scale must scrutinize the evidence that has been provided by previous users.
History: All other things being equal, it may make the most sense to use a scale that has been used a lot over time rather than one that is new. For many well-known constructs there are multiple scales available to choose from, but each scale is a little different and those differences can affect results. If it is important to be able to directly compare your results to those of others who studied the construct or a relationship, then use the measures that they did unless there are very good reasons not to. (Read more about this issue, "equivalency," in Bruner 2003.)
The bottom line is that just because several measures have the same name does not mean they are of the same quality or are equally appropriate for a purpose. Part of being a good researcher is choosing the right tools rather than just using whatever is most convenient. Luckily, comparing alternative measures of constructs is easier now than it has ever been. The growing library of scales provided at the Marketing Scales website allows alternative measures to be located easily, and the reviews provide the information needed to choose between them.
Bruner II, Gordon C. (2003), “Combating Scale Proliferation,” Journal of Targeting, Measurement & Analysis, 11 (4), 362-372.