Use the "Right" Number of Scale Items
This is an issue that most of us who develop measures struggle with: how many items “should” we use to measure a psychological construct? My pet peeve is that some scale developers use far too few while others use far more than they have to. By and large, it is probably the practitioners who use too few, sometimes just one item, while it is the scholars who use more than they need to.
First, let's back up and remember that psychological variables are difficult to measure reliably when using just one item. In other words, there is unlikely to be one item that perfectly measures the construct you are interested in. Thus, any one item you do select carries an unknown amount of error. Given that using more items of the appropriate sort increases reliability, and that the error can then be estimated, multi-item scales have a great psychometric advantage. Further, knowing the reliability of a scale helps as you estimate the degree to which it validly measures a construct.
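To make the point about estimating error concrete, here is a minimal sketch of one common internal consistency estimate, Cronbach's alpha, computed from raw item responses. The function name `cronbach_alpha` and the response data are hypothetical and made up purely for illustration; this is one way to estimate reliability, not the only one.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Estimate Cronbach's alpha.

    responses: list of respondents, each a list of k item scores
    (hypothetical layout, one row per respondent).
    """
    k = len(responses[0])
    if k < 2:
        # With a single item there is nothing to compare it against,
        # so internal consistency cannot be estimated.
        raise ValueError("alpha needs at least two items")
    # Transpose so each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(scores) for scores in items)
    # Variance of each respondent's summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up data: 5 respondents answering 4 items on a 1-5 scale.
sample = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(sample), 3))
```

Note that the function refuses to run on a single item, which is the practical meaning of "an unknown amount of error": with one item there is no second measurement against which to check consistency.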
The many benefits of multiple items do not, however, justify the use of dozens of items to measure any one construct. Rarely do scales need to have lots of items. I don't understand why, to this day, you see scales developed with more than 10 items. I can safely bet that the more items a scale has, the more likely they are to fragment into several dimensions. Since any set of items is supposed to measure just one construct (unidimensionality), multi-dimensionality becomes more likely as the number of items in a scale increases. Scholars, please keep in mind that the purpose of having multiple items in the first place is to triangulate measurement of a construct, enable estimation of its reliability (via internal consistency), and tap into the main facets of a construct. The items used in a scale are merely a representative sample of statements regarding the construct, and there is no need to include every possible statement that can be made about the object.
The bottom line is that a scale with 3 to 8 well-selected and tested items should be sufficient to be reliable, valid, and unidimensional (e.g., Bagozzi and Baumgartner 1994; Green and Rao 1970; Stanton et al. 2002). Admittedly, there is more to it than that, especially when you are trying to identify the best small set of items that will achieve the various types of validation, but the point made here is that aiming for something greater than 1 and smaller than 10 is a worthy goal.
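A rough way to see why the gains flatten out is the Spearman-Brown relationship behind standardized alpha, which projects reliability from the number of items and their average inter-item correlation. The sketch below is illustrative only: the function name `projected_reliability` is hypothetical, the average correlation of 0.4 is an assumed value, and the formula takes for granted that added items are as good as the existing ones and that the scale stays unidimensional, which is exactly what becomes harder as scales grow.

```python
def projected_reliability(num_items, mean_inter_item_r):
    """Standardized alpha (Spearman-Brown form) for num_items items
    sharing an average inter-item correlation of mean_inter_item_r."""
    if num_items < 1:
        raise ValueError("num_items must be at least 1")
    k, r = num_items, mean_inter_item_r
    return (k * r) / (1 + (k - 1) * r)


# Assumed average inter-item correlation, chosen only for illustration.
ASSUMED_R = 0.4

for k in (1, 3, 5, 8, 10, 20):
    print(k, round(projected_reliability(k, ASSUMED_R), 3))
```

Under these assumptions most of the improvement arrives within the first handful of items, and going from 10 items to 20 buys comparatively little.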
Bagozzi, Richard P. and Hans Baumgartner (1994), “The Evaluation of Structural Equation Models and Hypothesis Testing,” in Principles of Marketing Research, R.P. Bagozzi, ed. Cambridge, MA: Blackwell Publishers, 386-422.
Green, Paul E. and Vithala R. Rao (1970), “Rating Scales and Information Recovery: How Many Scales and Response Categories to Use?” Journal of Marketing, 34 (July), 33-39.
Stanton, Jeffrey M., Evan F. Sinar, William K. Balzer, and Patricia C. Smith (2002), “Issues and Strategies for Reducing the Length of Self-Report Scales,” Personnel Psychology, 55 (1), 167-94.