Our Limited Improvement in Scale Development
Recently, I went through hundreds of scale reviews that I have written over the last two decades. This effort was part of preparing the scales database. Most of the reviews were written years ago for the Marketing Scales Handbooks but had to be organized, cleaned up, and coded for loading into the database. In the process of editing all of those reviews, I was reminded of how many scales in the early days of my work were really bad. If I came across them now, I would probably not waste time reviewing them, regardless of what journal they were published in or who the authors were. I was even tempted to delete some of them from the database. I did not do so, however, in the hope that contemporary researchers could still learn something of value from them, borrowing what is useful and improving upon it.
What's so bad about these old scales, you ask? A lot of them had poor reliability: not just low (internal consistencies between .60 and .70) but really bad (below .60). They simply were not reliable. That means they were not valid, and that, in turn, means conclusions based on their usage are highly suspect.
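To make the reliability figures above concrete, here is a minimal sketch in Python (with simulated, hypothetical data; nothing here comes from the scales being discussed) of how the most common internal consistency statistic, Cronbach's alpha, is computed for a multi-item scale:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of summed scale scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical data: 200 respondents answering 4 items that all tap
# a single latent construct, so alpha should be comfortably high.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 4))

alpha = cronbach_alpha(items)
```

By the rule of thumb in the paragraph above, a set of items whose alpha came out below .60 on data like these would be flagged as unreliable.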
I was also reminded of the number of scales that were multidimensional. In some cases, it was just my guess that they were multidimensional, based on their low reliabilities or lack of face validity. In other cases, I didn't have to guess; the data provided by the authors indicated they were multidimensional. Some of the worst cases had dozens of items, and factor analyses clearly showed they were measuring more than one construct.
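As an illustration of the kind of check that exposes such scales, examining the eigenvalues of the inter-item correlation matrix (the Kaiser criterion commonly used alongside exploratory factor analysis) will usually reveal when a pooled item set is measuring more than one construct. This sketch uses simulated data, not any of the scales reviewed here:

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulate a 6-item "scale" that actually mixes two unrelated
# constructs: items 1-3 load on one factor, items 4-6 on another.
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
items = np.hstack([
    f1 + 0.6 * rng.normal(size=(300, 3)),
    f2 + 0.6 * rng.normal(size=(300, 3)),
])

corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
# Kaiser criterion: retain factors with eigenvalue > 1
n_factors = int((eigenvalues > 1.0).sum())
```

A unidimensional scale would yield one dominant eigenvalue; here the criterion correctly flags two underlying factors, which is exactly the pattern the factor analyses of those problem scales revealed.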
Our understanding of formative vs. reflective measures was also weak 20 years ago. Given that, there are cases of formative scales in the early volumes being treated as reflective measures. They got into the books because they were treated in our top journals as reflective measures and my own understanding of formative scales was poor enough at that time that I did not reject them or warn readers about them.
I am glad to report that my work with scales developed in more recent years doesn't tend to show the depth of the problems mentioned above. Reliabilities are rarely, if ever, below .60 anymore, and there is greater concern for dimensionality. Having said that, plenty of problems remain, as this series of pet-peeve posts attests. I still see far too many scales with one or more of the following problems: low reliability, no evidence of validity, suspicious dimensionality, questionable face validity, improper credit given for the work of others being borrowed or modified, and (my top pet peeve) new scales being created for a construct when several good ones are already available.
Yes, we know more than we did 20 years ago, and maybe in some general sense the average scale in our top journals is better than it was then. But the variance in quality is still higher than one would expect of a field's top journals. Clearly, we have come far in the 20 years I have been reviewing scales, but we still have a long way to go. The fundamental question is: are we going to grow in our treatment of measurement as a science, or will we languish with the view that it is really more of an art?