# Statistical Analysis - Marketing Research

Statistical analysis helps researchers and managers answer one of two questions: Does a specific result differ significantly from another result or from an expected result, or is a specific result associated with or predicted by some other result or results, or is this just due to chance?

Normalization.

The fact that one can describe a particular case as being so many standard deviations away from the mean introduces one other important role that this dispersion measure can serve for researchers. A frequent problem when comparing responses to certain kinds of psychological questions across respondents is that people differ in how much of a given rating scale they tend to use. For instance, when rating different stores on an 11-point interval scale, an extroverted respondent may use the full range from, say, 2 to 11, while a more restrained respondent may venture ratings only between 4 and 7. If we compared only their raw scores, the computer would treat a score of 7 as essentially the same for both. But as we have seen, a 7 for the extrovert is just barely above average; for the introvert, it represents outright enthusiasm, the highest score he or she gives. To accommodate these basic differences across individuals (or sometimes across questions), it is customary to transform the original respondent scores into scores measured as numbers of standard deviations from the respondent’s own mean. Thus, a computer would be instructed to subtract that respondent’s mean on all similar scales from each of his or her original scores and then divide the result by that respondent’s personal standard deviation on those scales. This is called normalization of the data. By this procedure, the introvert’s score of 7 will be transformed into a higher normalized score than the extrovert’s score of 7. Normalization is also a way of making many different kinds of variables comparable, that is, of expressing them all in standard deviation units. This approach is often used when employing multiple regression equations.
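A minimal sketch of normalization, assuming each respondent's scores are expressed as standard deviations away from that respondent's own mean (the usual z-score). The ratings below are hypothetical, chosen so the extrovert spans 2 to 11 and the introvert 4 to 7 as in the text; the helper name `normalize_respondent` is illustrative, and the population standard deviation is an assumption (the sample version would rescale both respondents similarly).

```python
from statistics import mean, pstdev

def normalize_respondent(scores):
    """Express one respondent's ratings in units of his or her own standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)  # population SD of this respondent's own ratings (an assumption)
    return [(s - m) / sd for s in scores]

# Hypothetical ratings of five stores on the 11-point scale
extrovert = [2, 5, 7, 9, 11]
introvert = [4, 5, 5, 6, 7]

z_extrovert = normalize_respondent(extrovert)
z_introvert = normalize_respondent(introvert)

# The same raw 7 becomes very different normalized scores
print(round(z_extrovert[2], 2), round(z_introvert[4], 2))  # 0.06 1.57
```

For the extrovert a 7 sits barely above his or her mean, while for the introvert it lies more than one and a half personal standard deviations above it.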

Such analyses are typically performed on one of three kinds of data: frequencies, means, or proportions. Do more people buy from a jumbled or a neat display? Is the proportion of target audience members noticing a newspaper ad different for men and women? Can mean donation levels be predicted from occupation, income, and family size?

The sections that follow introduce the major statistical analysis techniques that a beginning researcher may wish to use. The techniques are organized on the basis of the kinds of data for which they are most appropriate: nominal, ordinal, or metric. However, we need first to understand the concept of significance.

Levels of Significance

An important question in statistical analysis is what we mean when we say there is a very low probability of a particular result being due to chance. If that probability is very low, we may decide that our actual results really are different from the expected results and take some action on them. But suppose the analysis showed a .15 probability that a difference at least this large would have turned up by chance even if there were really no difference. Should we act on this, or should we act only if that probability is .05 or lower? That is, what is the appropriate level of significance? In classical statistics, statisticians tend to use either the .05 or the .01 level of significance as the cutoff for concluding that a result is significant. In my opinion, this classical notion is of little relevance to marketing decision makers, especially in this age of computer analysis. Historically, statisticians have instructed us that good science involves constructing hypotheses, usually in null form (that there is no difference or association), before the results are in (so we are not tempted to test what we have in fact already found) and setting, in advance, a level of statistical significance beyond which we would reject the null hypothesis. This cutoff is the probability we are willing to accept of wrongly rejecting the null hypothesis when it is in fact true. Classically, it was set at either .05 or .01, depending on how tough the researcher wanted to be before accepting a positive outcome. But these levels are arbitrary. Why not .045 or .02, for example? Furthermore, they ignore the important managerial context. The real issue is not whether the data are sufficiently strong to permit us to make statements about the truth but whether the results are strong enough to permit the manager to take action. Implicit in this action orientation is the view that

1. it is the manager’s perception of the significance of the result that is relevant, not the researcher’s use of some classical cutoff;
2. significance is really a matter of whether the result will lead to action; and
3. significance is ultimately a matter not just of statistical probability but also of the manager’s prior information, prior conviction about which way to act, and the stakes involved.

In this conceptualization, it becomes obvious that the researcher’s responsibility is simply to report the exact probability associated with a result and then let the manager decide whether this is significant in terms of the decision at hand. In some cases (for example, where the stakes are low and management is already leaning toward a particular course of action), a probability of .15 or lower may be acceptable. In other cases, where the stakes are larger and management is quite unsure what is best, only a probability of .03 or lower will decide the matter. In modern managerial decision making, the classical role of the .05 and .01 levels of significance should be irrelevant.

Nominal Data: The Chi Square Test

Where we have nominal data, we are forced to analyze frequency counts, since there are no means and variances. Two kinds of questions are typically asked of these frequency counts. When looking at only one variable, we usually ask whether the results in the study differ from some expected distribution (often referred to as the model). For example, we might wish to know whether the distribution of occupations in a target population is different from that found in the metropolitan area as a whole or in an earlier study. The second kind of analysis asks whether the distribution of one variable is associated with another, for example, whether occupation depends on the geographical area of the respondent. The appropriate statistical test for either type of analysis is called the chi square (χ²) test. Because it is especially appropriate for nominal data and because it can also be used with higher-order data (ordinal or metric values grouped into categories), chi square may well be the most frequently used statistical test in marketing research.

The chi square test is exceedingly simple to understand and almost as easy to compute. I have calculated the chi square on the backs of envelopes on airplanes, on my pocket calculator during a client meeting, and countless times in the classroom. All that is needed is the raw frequency count (Fi) for each value of the variable you are analyzing and a corresponding expected frequency (Ei). The chi square technique then calculates the difference between these two, squares the result, and divides by the expected frequency. It sums these calculations across all the values (cells) of the variable to get the total chi square value. (Division by the expected frequencies ensures that a given absolute difference counts for less in a cell where many respondents are expected than the same absolute difference does in a cell where few are expected.)
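Written as a formula, with F_i the actual and E_i the expected frequency in cell i and k the number of cells, the calculation just described is:

$$
\chi^2 = \sum_{i=1}^{k} \frac{(F_i - E_i)^2}{E_i}
$$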

Comparison to a Given Distribution. Suppose that prior to an election a political candidate has her campaign staff interview a sample of shoppers outside a particular department store on a random sample of days and nights. The candidate wants to know whether the shoppers’ party affiliations differ from what would be expected if her staff had obtained a random sample of all voters in her district. Of a sample of 130 shoppers, 80 said they were Democrats, 30 said they were Republicans, and 20 said they were Independents. Suppose that voter registrations in the district show that 54 percent of all voters are Democrats, 27 percent are Republicans, and the rest (19 percent) are Independents. The question is, Do the affiliations of the shoppers differ from the expected pattern? The expected frequencies are 54, 27, and 19 percent of 130, that is, 70.2, 35.1, and 24.7 shoppers, and the chi square is calculated by comparing these with the actual counts of 80, 30, and 20.

If the analysis is done by hand, the analyst then refers to a chi square table that indicates the likelihood of obtaining the calculated total chi square value (or a greater one) if the actual and expected frequencies were really the same. (If the calculation is done by computer, this probability will be printed on the output.) If the probability is very low, it means that the results are clearly not what was expected. For example, a chi square probability of .06 means that if the two distributions were really the same, a difference this large or larger would turn up only 6 percent of the time by chance alone. It is important to use the appropriate degrees of freedom when determining the probability. (The computer does this automatically.) Degrees of freedom is a measure that reflects the number of cells in a table that can take any value, given the marginal totals. In the example, we estimated the expected number of cases for three cells. Since we started with 130 cases, once we had calculated the expected frequencies in any two cells, the remaining cell had no freedom to assume any value at all; it was perfectly determined. Thus, two of the cells were free to take on any amount and one cell was not. Therefore, degrees of freedom in this case is two: the number of cells minus one. In a cross-tabulation, degrees of freedom is (r – 1)(c – 1), where r is the number of rows and c the number of columns.
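The shopper example can be worked through in a few lines of Python. This is a minimal sketch: the counts and percentages come from the text, the function name is illustrative, and the tail probability uses the closed form exp(−χ²/2), which is exact only for two degrees of freedom (for other cases a chi square table or a library routine such as `scipy.stats.chi2.sf` would be needed).

```python
import math

def chi_square_statistic(observed, expected):
    """Sum over cells of (F_i - E_i)^2 / E_i."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

observed = [80, 30, 20]             # Democrats, Republicans, Independents among 130 shoppers
shares = [0.54, 0.27, 0.19]         # district registrations; Independents are "the rest"
n = sum(observed)
expected = [n * s for s in shares]  # about 70.2, 35.1, 24.7

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1              # number of cells minus one
p = math.exp(-chi2 / 2)             # upper-tail probability; this closed form holds only for df = 2
print(f"chi square = {chi2:.2f}, df = {df}, p = {p:.2f}")
# chi square = 3.00, df = 2, p = 0.22
```

A probability of about .22 says that, even if the shoppers' affiliations matched the district's, a gap at least this large would arise by chance roughly one time in five.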

Actual and Expected Party Affiliation.

Cross-Tabulations. A second research issue involving nominal values is whether two or more nominal variables are independent of each other or are associated. In the example, we might ask whether the distribution of party affiliations differs between men and women. The chi square procedure used in this case is very similar to that in the previous case, and the formula is unchanged. That is, we are again simply asking the chi square technique to tell us whether the actual results do or do not fit a model. Suppose we had surveyed eighty men and fifty women in the study, and their party affiliations were those reported in Table 10.3. Are these distributions affected by the sex of the shopper, or are they independent? To answer this question, we must first construct a set of expectations for each of the cells and then go through the cells one by one, computing chi square values that compare expected with actual outcomes. As in all other cross-tabulations, we are testing the null hypothesis that there is no relationship between the variables, that is, that they are independent.

The first step is to figure out what the expected frequencies would be if the two variables were really independent. This is easy; if they were independent, the distribution within the sexes would be identical. Thus, in the example, we would hypothesize that the proportion of Democrats, Republicans, and Independents is the same for the two sexes. In the actual results, we can see that only slightly over half the men (actually 54 percent) are Democrats, but nearly three-quarters of the women are (74 percent). Therefore, we must ask whether the proportion of Democrats depends on one’s sex or whether any apparent association is due to chance. The expected frequencies based on a no-difference model are given on the right-hand side of the table. Each expected frequency is the row total multiplied by the column total, divided by the grand total; for example, since 80 of the 130 shoppers are Democrats, we would expect 80 × 80/130, or about 49.2, of the 80 men to be Democrats. (Note that the marginal totals have to be the same for the actual and expected frequencies.)

The calculated chi square is 5.27. Is this significant? As noted, degrees of freedom is the number of rows minus one, multiplied by the number of columns minus one. With three party affiliations and two sexes, it is (r – 1)(c – 1) = (3 – 1)(2 – 1) = 2. (The correctness of this can be seen by arbitrarily filling two cells of the expected frequency section. Note that the other four cells can then take on only one value, given the marginal totals.) With two degrees of freedom, we would conclude that the probability of obtaining a chi square this large if sex and party affiliation were really independent is between .05 and .10. It is now up to the manager to decide whether this is enough certainty on which to act (that is, to assume that female shoppers are much better targets for Democratic party candidates).
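The expected-frequency rule and the degrees-of-freedom formula can be sketched as below. The text gives only the Democrat percentages by sex, so this sketch assumes 43 Democrats among the 80 men (54 percent) and 37 among the 50 women (74 percent) and collapses Republicans and Independents into one "all others" row; the full three-row table is not reproduced. The final lines check the reported chi square of 5.27 on two degrees of freedom, again using the closed form that holds only for df = 2. Function names are illustrative.

```python
import math

def expected_frequencies(table):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

def degrees_of_freedom(table):
    """(r - 1)(c - 1) for a table with r rows and c columns."""
    return (len(table) - 1) * (len(table[0]) - 1)

# Rows: Democrats, all other shoppers; columns: men, women (assumed counts, see above)
collapsed = [[43, 37],
             [37, 13]]
expected = expected_frequencies(collapsed)
print([[round(e, 1) for e in row] for row in expected])  # [[49.2, 30.8], [30.8, 19.2]]

# Full table: 3 party affiliations x 2 sexes, reported chi square 5.27
df_full = (3 - 1) * (2 - 1)
p_reported = math.exp(-5.27 / 2)  # upper-tail probability, valid for exactly 2 df
print(df_full, round(p_reported, 3))  # 2 0.072
```

The probability of about .07 agrees with the statement that it lies between .05 and .10.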

Some Caveats. There are two things to guard against in carrying out a chi square analysis, since the computation of chi square is sensitive to very small expected cell frequencies and to large absolute sample sizes. To guard against the danger of small expected cell sizes, a good rule of thumb is not to calculate (or trust) a chi square when the expected frequency for any cell is five or less. Cells may be added together (collapsed) to meet the minimum requirement.

With respect to total sample size, it turns out that chi square is directly proportional to the number of cases used in its calculation. Thus, if one multiplied the cell values in Table 10.3 by 10, the calculated chi square value would be ten times larger and very significant rather than barely significant, as it is now. There are statistical corrections for large sample sizes that more experienced researchers use in such cases. Neophyte researchers should simply be aware that large sample sizes can result in bloated chi squares and for this reason should be especially careful when comparing chi squares across studies where differences in significance levels may be due to nothing more than differences in sample sizes.
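Both caveats can be checked mechanically. This sketch flags cells whose expected frequency is five or less (the rule of thumb above) and shows that multiplying every count by 10 multiplies chi square by exactly 10. It reuses the shopper goodness-of-fit figures rather than Table 10.3, whose full counts are not given in the text; `small_cells` is an illustrative helper name.

```python
def chi_square_statistic(observed, expected):
    """Sum over cells of (F_i - E_i)^2 / E_i."""
    return sum((f - e) ** 2 / e for f, e in zip(observed, expected))

def small_cells(expected, minimum=5):
    """Positions of cells whose expected frequency is at or below the rule-of-thumb minimum."""
    return [i for i, e in enumerate(expected) if e <= minimum]

observed = [80, 30, 20]
expected = [70.2, 35.1, 24.7]

base = chi_square_statistic(observed, expected)
# Same proportions, ten times the sample: every term (10F - 10E)^2 / 10E is 10 times larger
scaled = chi_square_statistic([10 * f for f in observed], [10 * e for e in expected])
print(small_cells(expected), round(scaled / base, 6))  # [] 10.0
```

The proportions are identical in both runs; only the sample size changed, which is why comparing chi squares across studies of different sizes can mislead.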

Party Affiliation by Sex.

Metric Data: t Tests
The next most frequently used statistical test in marketing is the t test. Because it is applicable only to interval or ratio data and assumes that the data are normally distributed, it is called a parametric test. It is used to compare two estimates, such as means or proportions, and to assess the probability that they are drawn from the same population. It is computed in slightly different ways depending on whether one is analyzing independent or nonindependent measures.

t Test for Independent Measures. The t test can be used to compare means or proportions from two independent samples. For example, the t test can be used to indicate whether a sample of donors in New York gave larger average donations than a sample in San Antonio. The procedure is first to estimate the (combined) standard errors of these means. (Remember that the standard error is a measure of the spread of a hypothetical series of means produced from the same sampling procedure carried out over and over again, in this case, in New York and San Antonio.) One then divides the difference between the means by the combined standard error, which is actually a combined standard error of the difference in means. This in effect indicates how many standard errors apart the two means are. (To combine the standard errors and conduct this test, the original data in the samples must be normally distributed and have equal variances. If these assumptions do not appear to be met, a more sophisticated analysis should be conducted.)

The resulting figure is called a t statistic if the sample size is small (under thirty) and a z statistic if it is large. This statistic then allows us to state the probability of obtaining a difference this large if the two means were really equal (that is, if both samples were drawn from a more general population of all donors). A low probability indicates that they are different. The same analysis could be conducted comparing two proportions instead of two means.
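A minimal sketch of the independent-measures t test with a pooled (equal-variance) standard error of the difference, as described above. The donation figures are invented for illustration. The sketch stops at the t statistic and its degrees of freedom; the probability would come from a t table or a statistics package (for example, `scipy.stats.ttest_ind`, whose `equal_var=False` option gives Welch's version, one common "more sophisticated analysis" when variances look unequal).

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming normal data with equal variances."""
    na, nb = len(a), len(b)
    # Pooled estimate of the common variance, weighted by degrees of freedom
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
    return (mean(a) - mean(b)) / se_diff, na + nb - 2

new_york = [50, 60, 70, 80, 90]     # hypothetical donations
san_antonio = [35, 40, 50, 60, 65]  # hypothetical donations
t, df = pooled_t(new_york, san_antonio)
print(f"t = {t:.2f} with {df} degrees of freedom")  # t = 2.20 with 8 degrees of freedom
```

The 20-dollar gap in means is about 2.2 standard errors of the difference.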
The t test can also serve two other purposes that are often important in research. First, it can test whether a mean or proportion for a single sample is different from an expected value. For example, a researcher could determine whether the average household size in a sample differs from the Bureau of the Census figure for the area. Second, using this same logic, the t test can assess whether the coefficients in a multiple regression equation are really zero, as indicated in Exhibit 10.2.

t Test for Nonindependent Measures. Sometimes we wish to see whether the means or proportions for the answers to one question in a study are different from similar means or proportions elsewhere in the same study or in a later study of the same sample. For example, we may wish to know whether respondents’ evaluations of one positioning statement for an organization are more or less favorable than their evaluations of another. (Note that the means or proportions must be in the same units.) Since this procedure compares respondents with themselves, the two measures are not independent. In this case, the computer takes each respondent’s pair of answers and computes a difference. It then produces a t statistic and the probability of obtaining a mean difference this large if the mean of all the differences between pairs of answers were really zero. If the probability is low, we would conclude that the respondents did perceive the concepts as different.
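Both uses can be sketched with the same ingredient, a mean divided by its standard error. All numbers here are hypothetical: the household sizes and the "census" value of 2.6 are invented, as are the two sets of ratings for the positioning statements. Again only t is computed; `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` are library routines that also return the probability.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, expected_value):
    """t for the difference between a sample mean and a fixed expected value."""
    se = stdev(sample) / math.sqrt(len(sample))
    return (mean(sample) - expected_value) / se

def paired_t(first, second):
    """t for the mean of within-respondent differences (nonindependent measures)."""
    diffs = [a - b for a, b in zip(first, second)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se

household_sizes = [2, 3, 4, 2, 5, 3, 4, 1]  # hypothetical sample
census_mean = 2.6                           # hypothetical census figure
t_household = one_sample_t(household_sizes, census_mean)

statement_a = [7, 8, 6, 9, 7, 8]  # each respondent rates both positionings (hypothetical)
statement_b = [5, 7, 6, 6, 5, 7]
t_positioning = paired_t(statement_a, statement_b)

print(f"{t_household:.2f} {t_positioning:.2f}")  # 0.86 3.50
```

Pairing matters in the second case: working on each respondent's own difference removes the respondent-to-respondent spread that an independent-samples test would count as noise.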

Metric Data: Analysis of Variance

Very often, managers are interested in learning about differences across many groups or cases. A useful tool for these purposes is analysis of variance. There are two main variations, referred to as one-way and N-way analysis of variance.

One-Way Analysis of Variance. Suppose we want to compare three or more means, for example, to ask whether mean donation levels differ across five cities. The parametric test to use here is the one-way analysis of variance (ANOVA), which is, in a sense, an extension of the t test described above. For the t test, we compared the difference between two means to an estimate of the random variance of those means (expressed as the standard error of the difference). The more general ANOVA technique proceeds in essentially the same way. It calculates a measure of variance across all the means (for example, the five cities) and then compares this to a measure of random variance, in this case the combined variances within the five cities. Specifically, ANOVA divides the variance across the cities by the variance within cities to produce a test statistic and a probability of significance. The test statistic here is called an F ratio (with only two groups, it is simply the square of the t statistic). Again, a high F ratio and a low probability are interpreted to mean that the variance across the cities is greater than would be expected by chance.
Note that we have not concluded that any one city is different from any other specific city, only that there is significant variance among them all. We may have a hypothesis that a specific pair of cities is different. In this case, we would simply have the computer run a t test on the difference between the two means.
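A minimal sketch of the one-way ANOVA just described: variance across the group means divided by the pooled variance within groups. The donations for five cities (three donors each) are hypothetical. The result is an F ratio with its two degrees-of-freedom figures; the probability would come from an F table or a routine such as `scipy.stats.f_oneway`.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Variance across the group means (between groups)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Random variance: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

donations = [       # hypothetical donations, three donors in each of five cities
    [10, 12, 14],
    [14, 16, 18],
    [9, 11, 13],
    [15, 17, 19],
    [12, 14, 16],
]
f, df1, df2 = one_way_anova(donations)
print(f"F({df1}, {df2}) = {f:.3f}")  # F(4, 10) = 4.875
```

With only two groups this function returns the square of the pooled t statistic, which is the sense in which t is a special case of F.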
N-Way ANOVA. Analysis of variance is probably most often used as the primary statistical tool for analyzing the results of experiments. It is highly flexible and can be used for quite complicated designs. Consider by way of illustration a simple study of the effects on museum gift shop sales of

1. offering or not offering free beverages and
2. using each of four different types of background music

Suppose the researcher conducted a fully factorial experiment (that is, one testing every possible combination) in which each of the four kinds of music was tried out for a specified period of time at a sample of gift shops with and without a free beverage service. Hypothetical results for each of the eight combinations, each tried in five shops, are shown in Table 10.4.

Sales Results of Hypothetical Experiment

N-way analysis of variance proceeds to analyze these results in the same way as we did in one-way ANOVA. We first compute an estimate of random variance, in this case, the combination of the variances within each of the eight cells. We (actually a computer does this) then calculate three tests for significant effects. The computer program asks:

• Is there a main effect due to having the beverages present or not, that is, is the variance across the two beverage conditions significantly greater than the random variance when the music treatments are controlled?
• Is there a main effect due to the different music types (ignoring beverage treatment)?
• Is there an interaction effect due to the combination of music and beverage service; that is, are the results in each of the eight treatment cells higher or lower than would be predicted from simply adding the two main effects together?
In each case, the computer reports an F ratio and the probability of obtaining an F ratio that large if there were no effect. A glance at the means for the various cells in Table 10.4 suggests that (1) beverages yield more sales than no beverages, (2) semiclassical music yields the most sales and contemporary pop the least, and (3) beverages added to classical or semiclassical music increase sales, but when added to contemporary or pop music decrease sales. Are these results statistically significant? The ANOVA results for the data in Table 10.4 answer this question.

Here, we see that the type of music does have a significant effect: the differences in means with the beverage treatment controlled seem to be real. However, despite appearances, the presence or absence of beverages has no significant effect. Finally, there is an interaction effect indicating that the combination of beverages and classical or semiclassical music is the manager’s best bet. It may be that the two create a much more desirable ambience. At least our statistics kept us from concluding that beverages by themselves would be a good addition.
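The three tests can be sketched for a balanced two-factor design. This is an illustration under stated assumptions, not a reproduction of Table 10.4, whose figures are not in the text: the cell means below are invented, every cell gets the same five-shop spread, and the decomposition used (main effects from row and column means, interaction as what the cell means add beyond them) is valid only when every cell has the same number of observations.

```python
from statistics import mean

def two_way_anova(cells):
    """F ratios for a balanced two-factor design.

    cells[a][b] holds the observations for level a of factor A and level b of
    factor B; every cell must contain the same number of observations."""
    na, nb, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean([m for row in cell_means for m in row])
    a_means = [mean(row) for row in cell_means]
    b_means = [mean([cell_means[a][b] for a in range(na)]) for b in range(nb)]

    ss_a = nb * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = na * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((m - grand) ** 2 for row in cell_means for m in row)
    ss_ab = ss_cells - ss_a - ss_b  # interaction: what the main effects leave unexplained
    ss_within = sum((x - cell_means[a][b]) ** 2
                    for a in range(na) for b in range(nb) for x in cells[a][b])

    df_a, df_b = na - 1, nb - 1
    df_ab, df_within = df_a * df_b, na * nb * (n - 1)
    ms_within = ss_within / df_within  # the estimate of random variance
    return {
        "A": ((ss_a / df_a) / ms_within, df_a, df_within),
        "B": ((ss_b / df_b) / ms_within, df_b, df_within),
        "AxB": ((ss_ab / df_ab) / ms_within, df_ab, df_within),
    }

# Columns: classical, semiclassical, contemporary, pop (invented sales means)
mean_sales = [[20, 22, 18, 16],  # no free beverages
              [24, 27, 15, 14]]  # free beverages
spread = [-2, -1, 0, 1, 2]       # five shops per cell, same spread everywhere for simplicity
cells = [[[m + d for d in spread] for m in row] for row in mean_sales]

results = two_way_anova(cells)
for effect, label in [("A", "beverages"), ("B", "music"), ("AxB", "interaction")]:
    f, df1, df2 = results[effect]
    print(f"{label}: F({df1}, {df2}) = {f:.2f}")
```

Each F ratio would then be referred to an F table with the degrees of freedom shown; unbalanced designs need the more general procedures that statistical packages provide.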

Association: Nonmetric and Metric Data

In many cases, we wish to know whether variables are associated with each other, either singly or in sets. If variables are associated, we may be able to use one variable or set of variables to predict another. Furthermore, if we have some plausible prior theory, we may also say that one variable or set of variables explains or causes the other (although always remember that association is not causation). We have already discussed nominal measures of association using chi square. Other measures of association can be computed for both ranked data and metric data.

Ordinal Data: Spearman Rank-Order Correlation. Spearman’s rank-order correlation procedure can compare the rankings of two variables, for example, a hospital’s rankings on staff knowledge and friendliness. Spearman’s rho coefficient indicates whether a higher ranking on one variable is associated with a higher (or lower) ranking on some other variable. If the two rankings move in the same direction, the sign of rho will be positive. If the rankings move in opposite directions, rho will have a negative sign. Rho can therefore range from –1 to +1; the closer its absolute value is to one, the more we can conclude that the rankings really are associated.
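One standard way to compute rho, sketched here, is to convert each variable to ranks (giving tied values the average of their ranks) and then compute an ordinary Pearson correlation on those ranks; when there are no ties this equals the familiar shortcut 1 − 6Σd²/(n(n² − 1)). The rankings of five hospitals are hypothetical, and the helper names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

knowledge = [1, 2, 3, 4, 5]     # five hypothetical hospitals ranked on staff knowledge
friendliness = [2, 1, 4, 3, 5]  # the same hospitals ranked on friendliness
print(round(spearman_rho(knowledge, friendliness), 2))  # 0.8
```

Reversing one ranking gives a rho of −1, the negative end of the range.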

Metric Data: Pearson Product Moment Correlation. This approach seeks the same result as the Spearman analysis but is used for interval or ratio-scaled variables. The Pearson correlation coefficient, called r, can be positive or negative and ranges from –1 to +1. Most computer programs produce both the Pearson r and the probability of obtaining an r this large if the true correlation were zero. Its square, the Pearsonian r², indicates the proportion of the variance in one variable that is explained by its linear relationship with the other.

Metric Data: Simple Correlations. Another use of the Pearsonian correlation coefficient is as a measure of the extent to which a straight line plotted through the points representing pairs of measurements fits the data poorly (and has a low r) or rather well (and therefore has a high r). The accompanying figure shows a line with a good fit (5a) and a line with a poor fit (5b).
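The link between r, the fitted straight line, and r² can be checked directly. This sketch computes Pearson's r, fits the least squares line, and confirms that r² equals one minus the share of variance left in the residuals. The five data pairs are hypothetical.

```python
import math

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least squares straight line y = intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

x = [1, 2, 3, 4, 5]  # hypothetical paired measurements
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
intercept, slope = fit_line(x, y)

# Share of the variance in y accounted for by the line
fitted = [intercept + slope * a for a in x]
my = sum(y) / len(y)
explained = 1 - sum((b - f) ** 2 for b, f in zip(y, fitted)) / sum((b - my) ** 2 for b in y)
print(round(r, 3), round(r * r, 3), round(explained, 3))  # 0.775 0.6 0.6
```

A line that fit poorly would leave more variance in the residuals, lowering both the explained share and r.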

Metric Data: Multiple Regression. If one variable is good at predicting another variable, the researcher may wish to look further to see whether a second or third variable will help improve this explanatory power. Multiple linear regression is the technique most often used for this. In a manner similar to simple two-variable correlation, multiple regression seeks to construct a linear combination of two or more independent variables (which may be metric or dichotomous) that predicts the value of a dependent metric variable. (A dichotomous variable in a regression equation is a special case of a nominal variable where the values zero and one are used to indicate the presence or absence of some characteristic, for example, being a woman or being married.) An example would be using age, income, education, size of household, and sex to predict the number of hours a month a person would spend exercising.

Two Hypothetical Regression Lines.

If a researcher has a great many variables that might be used in such a multiple regression but is not sure which to use, there are two basic approaches to finding the best set (using a computer). The first approach is theory driven. The researcher can specify the set of variables in advance, usually on the basis of some theory about which variables ought to predict well. Since the computer will print out, for each variable, a t statistic and the probability of obtaining a coefficient that large if the true coefficient were zero, the researcher can then look at the initial output, eliminate predictors with high probabilities (that is, those that are not significant), and rerun the analysis. This may have to be done two or three times before the final best set of predictor variables is determined.

Alternatively, the researcher can ask the computer to look for the best set of predictors among what is usually a very large set of candidate variables using a procedure called stepwise regression analysis. Under this procedure, the computer takes the original variance in the dependent variable and proceeds to enter into the equation the predictor with the highest explanatory power (for example, the highest simple r). It then subtracts out the variance explained by this variable, computes new correlations between each remaining potential predictor variable and the adjusted dependent variable, and then picks the variable with the highest correlation at this step. In this manner, variables from the candidate set are added to the prediction equation until they are all exhausted or some predetermined stopping point is reached (such as when the variance about to be explained at the next step is less than 1 percent). Stepwise regression is a good technique at the exploratory stage of a study. However, it should be used with care, since a truly significant predictor can be missed because it happens to be highly correlated with a variable entered at an early step in the analysis. Furthermore, if the sample is large enough, researchers should eventually test the model discovered on a different subsample from the one on which it was developed. Once the multiple regression analysis is done, the researcher will wish to look at three measures produced by the computer program:

• Multiple R measures how well the equation fits the data. The probability of obtaining a multiple R this large if the true value were zero is important as an indicator that the equation really does predict.
• Multiple R² is analogous to the Pearsonian r². It indicates the proportion of variance in the dependent variable accounted for by the linear combination of the predictor variables. It is as important as the probability just described, because with a large data set it is often possible to have a highly significant multiple R for an equation that explains very little of the variance in the dependent variable.
• Standardized variable coefficients. The researcher will also wish to know the relative contribution of each of the independent variables to the overall prediction equation. He or she could look at the relative size of the coefficients for each predictor variable to try to learn this, but this would be misleading because the variables are usually in different units; for example, the coefficient for income may be very small because income values run into the thousands or tens of thousands of dollars, while sex may have a large coefficient because it can be only 0 or 1. The solution to this dilemma is to convert all of the variables into standard deviation units. The resulting beta coefficients (as they are called) are simply the original coefficients (often called B coefficients) multiplied by the standard deviation of the respective predictor and divided by the standard deviation of the dependent variable. Variables with larger beta coefficients (in absolute value) can be considered to make more of a contribution to the overall prediction than those with smaller coefficients.
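The stepwise idea and the three measures can be sketched together in plain Python. Several assumptions apply. Least squares is solved here from the normal equations by Gauss-Jordan elimination, which is adequate for a handful of well-behaved predictors but is not how statistical packages do it. The stepwise rule adds whichever candidate most increases R², a common forward-selection variant, rather than correlating each candidate with the adjusted dependent variable as described above. The 1 percent stopping rule mirrors the text's example. The data and the names `x1`, `x2`, `x3` are hypothetical, with `x3` a 0/1 dichotomous predictor.

```python
from statistics import mean, stdev

def ols(columns, y):
    """Least squares coefficients [intercept, b1, b2, ...] for y on the predictor columns.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for a few well-conditioned predictors."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(columns, y):
    """Multiple R squared: share of the variance in y accounted for by the equation."""
    b = ols(columns, y)
    fitted = [b[0] + sum(bj * col[i] for bj, col in zip(b[1:], columns)) for i in range(len(y))]
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def standardized_betas(columns, y):
    """Beta coefficients: each B coefficient times sd(predictor) / sd(y)."""
    b = ols(columns, y)
    return [bj * stdev(col) / stdev(y) for bj, col in zip(b[1:], columns)]

def forward_stepwise(candidates, y, min_gain=0.01):
    """Repeatedly add the candidate that raises R squared the most; stop when the
    best remaining gain is below min_gain (here 1 percent of the variance)."""
    chosen, current = [], 0.0
    remaining = dict(candidates)
    while remaining:
        gains = {name: r_squared([candidates[c] for c in chosen] + [col], y) - current
                 for name, col in remaining.items()}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        chosen.append(best)
        current += gains[best]
        del remaining[best]
    return chosen, current

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [5, 3, 8, 1, 7, 2, 6, 4]
x3 = [1, 0, 1, 0, 1, 0, 1, 0]  # a dichotomous (0/1) predictor
noise = [0.2, -0.1, 0.0, 0.1, -0.2, 0.1, 0.0, -0.1]
y = [3 * a + 5 * c + e for a, c, e in zip(x1, x3, noise)]
candidates = {"x1": x1, "x2": x2, "x3": x3}

chosen, r2 = forward_stepwise(candidates, y)
cols = [candidates[name] for name in chosen]
print("selected:", chosen, "R2:", round(r2, 3), "R:", round(r2 ** 0.5, 3))
print("B:", [round(b, 2) for b in ols(cols, y)[1:]])
print("beta:", [round(b, 2) for b in standardized_betas(cols, y)])
```

In this made-up data the 0/1 predictor has the larger B coefficient but the smaller beta, which is the distortion the standardized coefficients are meant to remove. Regressing z-scored y on z-scored predictors reproduces the same betas.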
