Construct measurement in research
If women earn less than men for the same job, is that gender prejudice? Note that the satisfaction scale discussed earlier is not strictly an interval scale, because we cannot say whether the difference between ‘strongly satisfied’ and ‘somewhat satisfied’ is the same as that between ‘neutral’ and ‘somewhat satisfied’ or between ‘somewhat dissatisfied’ and ‘strongly dissatisfied’. For instance, diamonds can scratch all other naturally occurring minerals on earth, and hence diamond is the “hardest” mineral. A construct is an abstract idea inferred from specific instances that are thought to be related. If an employment status item is modified to allow for more than two possible values (e.g., unemployed, full-time, part-time, and retired), it is no longer binary, but it remains a nominal scaled item. For instance, there may be certain tribes in the world who lack prejudice and who cannot even imagine what this concept entails. Other nominal attributes include religious affiliation (Christian, Muslim, Jew, etc.). However, instead of relying entirely on statistical analysis for item selection, a better strategy may be to examine the candidate items at each level and select the statement that is clearest and makes the most sense. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. However, in semantic differential scales, the statement remains constant, while the anchors (adjective pairs) change across items. For instance, we can create a customer satisfaction indicator with five attributes: strongly dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, and strongly satisfied, and assign the numbers one through five to these attributes respectively, so that we can use sophisticated statistical tools for quantitative data analysis. These items are generated by experts who know something about the construct being measured.
For instance, the word ‘prejudice’ conjures a certain image in our mind; however, we may struggle if we were asked to define exactly what the term meant. A unidimensional scale measures constructs along a single scale, ranging from high to low. Notice that the scale is now almost cumulative when read crosswise from left to right. For instance, you’ll need to decide how you will categorise occupations, particularly since some occupations may have changed with time (e.g., there were no web developers before the Internet). The process of creating the indicators is called scaling. Are there different kinds of prejudice, and if so, what are they? For instance, if we conceptualise a person’s academic aptitude as consisting of two dimensions (mathematical and verbal ability), then academic aptitude is a multidimensional construct. Examples include simple constructs such as a person’s weight, wind speed, and probably even complex constructs like self-esteem (if we conceptualize self-esteem as consisting of a single dimension, which, of course, may be an unrealistic assumption). For judges with the same number of “yes” responses, the statements can be sorted from left to right, from the most agreements to the fewest. This is a composite (multi-item) scale where respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. If churchgoers believe that non-believers … In the context of survey research, a construct is the abstract idea, underlying theme, or subject matter that one wishes to measure using survey questions. Allowed measures of central tendency include the mean, median, and mode, and allowed measures of dispersion include the range and standard deviation. Like previous scaling methods, the Guttman method also starts with a clear definition of the construct of interest, and then uses experts to develop a large set of candidate items.
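The sorting step described above for Guttman scaling can be illustrated with a minimal Python sketch. It assumes the judges' answers are stored as rows of 1 (“yes”) and 0 (“no”); the function name `sort_guttman_matrix` and the sample data are hypothetical illustrations, not part of the original method description.

```python
def sort_guttman_matrix(judges, items, matrix):
    """Sort a judges-by-items yes/no matrix for a Guttman analysis.

    Rows (judges) are ordered from most to fewest "yes" answers, and
    columns (items) from most to fewest agreements across judges.
    """
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(len(items))]

    # sorted() is stable, so ties keep their original relative order.
    row_order = sorted(range(len(judges)), key=lambda i: -row_totals[i])
    col_order = sorted(range(len(items)), key=lambda j: -col_totals[j])

    sorted_matrix = [[matrix[i][j] for j in col_order] for i in row_order]
    return ([judges[i] for i in row_order],
            [items[j] for j in col_order],
            sorted_matrix)


# Hypothetical data: 3 judges rating 3 candidate items (1 = yes, 0 = no).
judges = ["J1", "J2", "J3"]
items = ["item_a", "item_b", "item_c"]
matrix = [
    [0, 1, 0],  # J1
    [1, 1, 1],  # J2
    [0, 1, 1],  # J3
]

sorted_judges, sorted_items, sorted_matrix = sort_guttman_matrix(judges, items, matrix)
for judge, row in zip(sorted_judges, sorted_matrix):
    print(judge, row)
```

With this toy data the sorted matrix forms a perfect staircase of ones, which is the cumulative pattern the method looks for; real responses will usually deviate from it.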
All measures of central tendency, including geometric and harmonic means, are allowed for ratio scales, as are ratio measures, such as the studentised range or coefficient of variation. This can be done by grouping items with a common median, and then selecting the item with the smallest inter-quartile range within each median group. Even if we assign unique numbers to each value, for instance 1 for male and 2 for female, the numbers don’t really mean anything (i.e., 1 is not less than or half of 2) and could easily have been represented non-numerically, such as M for male and F for female. First, indexes often comprise components that are very different from each other (e.g., income, education, and occupation in the SES index) and are measured in different ways. Likert items allow for more granularity (more finely tuned response) than binary items, including whether respondents are neutral to the statement. These constructs can be measured using a single measure or test. The central tendency measure of an ordinal scale can be its median or mode, and means are uninterpretable. Perceived severity is one aspect of the health belief model. Typical marketing constructs are brand loyalty, satisfaction, preference, awareness, and knowledge. How many scale attributes should you use (e.g., 1–10; 1–7; −3 to +3)?
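The item-selection rule mentioned above (group candidate items by their median rating, then keep the item with the smallest inter-quartile range in each group) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the ratings are invented, the names `interquartile_range` and `select_items` are hypothetical, and the quartiles come from the default method of `statistics.quantiles`, which may differ slightly from other quartile conventions.

```python
from collections import defaultdict
from statistics import median, quantiles


def interquartile_range(ratings):
    """Return Q3 - Q1 using the default method of statistics.quantiles."""
    q1, _, q3 = quantiles(ratings, n=4)
    return q3 - q1


def select_items(ratings_by_item):
    """For each distinct median, pick the item with the smallest IQR."""
    groups = defaultdict(list)
    for item, ratings in ratings_by_item.items():
        groups[median(ratings)].append((interquartile_range(ratings), item))
    # min() compares the IQR first, so the item judges agree on most wins.
    return {m: min(candidates)[1] for m, candidates in sorted(groups.items())}


# Hypothetical ratings from five judges on a 1 to 11 scale.
ratings_by_item = {
    "A": [2, 2, 2, 2, 2],   # median 2, judges fully agree
    "B": [1, 2, 2, 3, 4],   # median 2, more spread
    "C": [6, 6, 6, 7, 5],   # median 6, little spread
    "D": [3, 6, 6, 9, 10],  # median 6, wide spread
}

selected = select_items(ratings_by_item)
print(selected)
```

Here items A and C are retained because, within their median groups, the judges' ratings vary least.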
Following this rating, specific items can be selected for the final scale in one of several ways: by computing bivariate correlations between judges’ ratings of each item and the total item (created by summating all individual items for each respondent), and throwing out items with low (e.g., less than 0.60) item-to-total correlations, or by averaging the rating for each item for the top quartile and the bottom quartile of judges, doing a t-test for the difference in means, and selecting items that have high t-values (i.e., those that discriminate best between the top and bottom quartile responses). Values of attributes may be quantitative (numeric) or qualitative (non-numeric). A six-item binary scale for measuring political activism asks: Have you ever written a letter to a public official? Have you ever signed a political petition? Have you ever donated money to a political cause? Have you ever donated money to a candidate running for public office? Have you ever written a political letter to the editor of a newspaper or magazine? Have you ever persuaded someone to change his/her voting plans? Again, this process may involve a lot of subjectivity. Sophisticated transformations such as positive similar (e.g., multiplicative or logarithmic) are also allowed. Indicators operate at the empirical level, in contrast to constructs, which are conceptualized at the theoretical level. A typical example of a six-item Likert scale for the “employment self-esteem” construct is shown in Table 6.3. However, researchers sometimes wish to summarise measures of two or more constructs to create a set of categories or types called a typology. The idea is that people who agree with one item on this list also agree with all previous items. Next, a panel of judges is recruited to select specific items from this candidate pool to represent the construct of interest.
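The first selection rule described above (drop items whose item-to-total correlation falls below a cut-off such as 0.60) can be sketched in Python. This is only an illustration: the ratings are invented, the helper names are hypothetical, and the total used here includes the item itself, which is one of several possible conventions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def item_total_correlations(ratings):
    """Correlate each item (column) with the summated total of each row."""
    totals = [sum(row) for row in ratings]
    n_items = len(ratings[0])
    return [pearson([row[j] for row in ratings], totals) for j in range(n_items)]


def retain_items(ratings, threshold=0.60):
    """Return the indexes of items whose correlation meets the threshold."""
    correlations = item_total_correlations(ratings)
    return [j for j, r in enumerate(correlations) if r >= threshold]


# Hypothetical ratings: one row per rater, one column per candidate item.
ratings = [
    [5, 4, 1],
    [4, 4, 5],
    [3, 3, 2],
    [2, 2, 4],
    [1, 1, 3],
]

correlations = item_total_correlations(ratings)
kept = retain_items(ratings)
print([round(r, 2) for r in correlations], kept)
```

In this toy data the third item tracks the total only weakly, so it is the one a 0.60 cut-off would discard.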
More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. A multi-dimensional typology of newspapers. Since most scales employed in social science research are unidimensional, we will next examine three approaches for creating unidimensional scales. An example is the temperature scale (in Fahrenheit or Celsius), where the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. In practice, we seldom find a set of items that matches this cumulative pattern perfectly. However, social science researchers often ‘pretend’ (incorrectly) that these differences are equal so that we can use statistical techniques for analysing ordinal scaled data. Each month, government employees call all over the country to get the current prices of more than 80,000 items. When evaluating the severity of a disease, an individual should consider both medical consequences (death and disability) and social consequences (family life, career, and social relationships) of the disea… The next chapter will examine how to evaluate the reliability and validity of the scales developed using the above approaches. Conceptualization is the mental process by which fuzzy and imprecise constructs (concepts) and their constituent components are defined in concrete and precise terms. Finally, what procedure would you use to generate the scale items (e.g., Thurstone, Likert, or Guttman method) or index components? I think construct validity is close to the concept of sensitivity.
Measurement involves the operations used to construct variables, and the development and application of instruments or tests to quantify these variables [Kimberlin & Winterstein, 2008]. Permissible statistics are chi-square and frequency distribution, and only a one-to-one (equality) transformation is allowed (e.g., 1 = Male, 2 = Female). Designed by Louis Guttman, this composite scale uses a series of items arranged in increasing order of intensity of the construct of interest, from least intense to most intense. The conceptualization process is all the more important because of the imprecision, vagueness, and ambiguity of many social science constructs. Do you mind immigrants being citizens of your country? Do you mind immigrants living in your own neighborhood? Would you mind living next door to an immigrant? Would you mind having an immigrant as your close friend? Would you mind if someone in your family married an immigrant? Semantic differential is believed to be an excellent technique for measuring people’s attitudes or feelings toward objects, events, or behaviours. This is why the research literature often includes different conceptual definitions of the same construct. Stevens (1946) said, “Scaling is the assignment of objects to numbers according to a rule.” This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research. Ratio scales are those that have all the qualities of nominal, ordinal, and interval scales, and in addition, also have a “true zero” point (where the value zero implies lack or non-availability of the underlying construct). Strategic management researchers have emphasized concept development but generally have ignored construct measurement issues.
For example, male and female (or M and F, or 1 and 2) are two levels of the indicator ‘gender’. For instance, the method of paired comparison requires each judge to make a judgment between each pair of statements (rather than rate each statement independently on a 1 to 11 scale). The median value of each scale item represents the weight to be used for aggregating the items into a composite scale score representing the construct of interest. Scales can be unidimensional or multidimensional, based on whether the underlying construct is unidimensional (e.g., weight, wind speed, firm size) or multidimensional (e.g., academic aptitude, intelligence). But how do we create the indicators themselves? A reflective indicator is a measure that ‘reflects’ an underlying construct. But in real life, we tend to treat this concept as real. One important decision in conceptualising constructs is specifying whether they are unidimensional or multidimensional. Answering all of these questions is the key to measuring the prejudice construct correctly. A monotonically increasing transformation (which retains the ranking) is allowed. Once a theoretical construct is defined, exactly how do we measure it? Indicators representing constructs at the empirical level are called variables. With a lot of statements, this approach can be enormously time-consuming and unwieldy compared to the method of equal-appearing intervals. The process of understanding what is included and what is excluded in the concept of prejudice is the conceptualization process. Ordinal scales can also use attribute labels (anchors) such as “bad”, “medium”, and “good”, or “strongly dissatisfied”, “somewhat dissatisfied”, “neutral”, “somewhat satisfied”, and “strongly satisfied”.
These scales are used for variables or indicators that have mutually exclusive attributes. A group of judges then rates each candidate item as “yes” if they view the item as being favorable to the construct and “no” if they see the item as unfavorable. Theoretical propositions consist of relationships between abstract constructs. While some constructs in social science research, such as a person’s age, weight, or a firm’s size, may be easy to measure, other constructs, such as creativity, prejudice, or alienation, may be considerably harder to measure. Perceived severity refers to an individual’s belief about the seriousness of contracting an illness or disease, or the severity of the consequences of leaving it untreated. Attaching a rating scale to a statement or instrument is not scaling. Note that any item with reversed meaning from the original direction of the construct must be reverse coded (i.e., 1 becomes a 5, 2 becomes a 4, and so forth) before summating. The Likert method, a unidimensional scaling method developed by Murphy and Likert (1938), is quite possibly the most popular of the three scaling approaches described in this chapter. In this chapter, we offer an overview of the measurement process, commonly portrayed in terms of technical issues, such as validity or reliability. For any conceptual definition of a construct, there will be many different operational definitions or ways of measuring it. Likewise, a customer satisfaction scale may be constructed to represent five attributes: ‘strongly dissatisfied’, ‘somewhat dissatisfied’, ‘neutral’, ‘somewhat satisfied’ and ‘strongly satisfied’.
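The reverse-coding rule noted above (1 becomes 5, 2 becomes 4, and so forth, before summating) can be sketched in Python. This is a minimal illustration assuming a five-point response format; the function names, the sample responses, and the choice of which items are reverse-worded are all hypothetical.

```python
def reverse_code(value, low=1, high=5):
    """Mirror a response around the scale midpoint (1 <-> 5, 2 <-> 4)."""
    return low + high - value


def likert_score(responses, reversed_items=frozenset(), low=1, high=5):
    """Sum the item responses after reverse coding the flagged items.

    `reversed_items` holds the zero-based positions of items whose wording
    runs against the direction of the construct.
    """
    total = 0
    for index, value in enumerate(responses):
        if not low <= value <= high:
            raise ValueError(f"response {value} is outside {low}..{high}")
        if index in reversed_items:
            value = reverse_code(value, low, high)
        total += value
    return total


# Hypothetical answers to a six-item scale; items 1 and 3 are reverse-worded.
responses = [4, 2, 5, 1, 3, 4]
score = likert_score(responses, reversed_items={1, 3})
print(score)  # 4 + 4 + 5 + 5 + 3 + 4 = 25
```

Keeping the reversal in one small function makes it easy to reuse the same rule for scales with a different number of response points by changing `low` and `high`.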
Operationalization refers to the process of developing indicators or items for measuring these constructs. Each of these methods is discussed here. A classic example in the natural sciences is Mohs’ scale of mineral hardness, which characterizes the hardness of various minerals by their ability to scratch other minerals. This typology can be used to categorise newspapers into one of four ‘ideal types’ (A through D), identify the distribution of newspapers across these ideal types, and perhaps even create a classificatory model for classifying newspapers into one of these four ideal types depending on other attributes. Social Science Research: Principles, Methods and Practices (Revised edition) by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted. In the latter case, we can say that respondents who are “somewhat satisfied” are less satisfied than those who are “strongly satisfied”, but we cannot quantify their satisfaction levels. Construct validity is the most important, because it tells us whether we are able to correctly measure what we are supposed to measure. There are two major issues that will be considered here. What is the goal? The three most popular unidimensional scaling methods are: Thurstone’s equal-appearing scaling, Likert’s summative scaling, and Guttman’s cumulative scaling. Levels of measurement, also called rating scales, refer to the values that an indicator can take (but say nothing about the indicator itself). If someone says bad things about other racial groups, is that racial prejudice?
Should you use a scale, index, or typology? Another example of an index is socio-economic status (SES), also called the Duncan socio-economic index (SEI). Allowed scale transformations are positive linear. Designed by Guttman (1950), the cumulative scaling method is based on Emory Bogardus’ social distance technique, which assumes that people’s willingness to participate in social relations with other people varies in degrees of intensity, and measures that intensity using a list of items arranged from ‘least intense’ to ‘most intense’. Justice, Beauty, Happiness, and Health are all constructs. Construct measurement represents a key task for any scholar attempting to develop a theoretical contribution or an empirical study. In others, researchers are still in the process of deciding which of various conceptual definitions is the best. In this module, it will be assumed that all measures have an acceptable level of reliability and validity. Interval scales allow us to examine ‘how much more’ one attribute is when compared to another, which is not possible with nominal or ordinal scales.
If you have a proposition stating that “compassion is positively related to empathy”, you cannot test that proposition unless you can conceptually separate empathy from compassion and then empirically measure these two very similar constructs correctly. First, conceptualize (define) the index and its constituent components. However, the scale does not indicate the actual hardness of these minerals, or even provide a relative assessment of their hardness. The three approaches are similar in many respects, with the key differences being the rating of the scale items by judges and the statistical methods used to select the final items. For instance, if religiosity is defined as being composed of a belief dimension, a devotional dimension, and a ritual dimension, then indicators chosen to measure each of these different dimensions will be considered formative indicators. Nominal scales merely offer names or labels for different attribute values. Measurement is the assigning of numbers to observations in order to quantify phenomena. Third, create a rule or formula for calculating the index score. We now have a scale which looks like a ruler, with one item or statement at each of the 11 points on the ruler (and weighted as such). Note that many variables in social science research are qualitative, even when represented in a quantitative manner. The final scale items are selected as statements that are at equal intervals across a range of medians. Unidimensional constructs are measured using reflective indicators, even though multiple reflective indicators may be used for measuring abstruse constructs such as self-esteem.
Some studies have used a “forced choice approach” to force respondents to agree or disagree with the Likert statement by dropping the neutral mid-point and using an even number of values, but this is not a good strategy because some people may indeed be neutral to a given statement and the forced choice approach does not provide them the opportunity to record their neutral stance. Each item in this scale is a binary item, and the total number of “yes” indicated by a respondent (a value from 0 to 6) can be used as an overall measure of that person’s political activism. For example, a survey researcher who claims construct validity for a measure of satisfaction will have to demonstrate in a scientific manner that satisfied respondents behave differently from dissatisfied respondents.
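The scoring of the six-item binary activism scale described above, where the score is simply the number of “yes” answers, can be sketched in Python. The function name `activism_score` and the sample answers are hypothetical and serve only to illustrate the counting rule.

```python
def activism_score(responses):
    """Count the "yes" answers; the result ranges from 0 to len(responses)."""
    return sum(1 for answer in responses if answer)


# Hypothetical answers of one respondent to the six yes/no items
# (True = yes, False = no).
example = [True, True, False, False, True, False]
print(activism_score(example))  # 3
```

Because every item counts equally, two respondents with the same score may have said “yes” to different items; the score reflects only how many activities they report.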