Chapter 24
Multivariate Statistical Analysis
AT-A-GLANCE
I. Introduction
A. What is multivariate data analysis?
B. The “variate” in multivariate
II. Classifying Multivariate Techniques
A. Dependence techniques
B. Interdependence techniques
C. Influence of measurement scales
III. Analysis of Dependence
A. Multiple regression analysis
A simple example
Regression coefficients in multiple regression
R2 in multiple regression
Statistical significance in multiple regression
Steps in interpreting a multiple regression model
B. ANOVA (n-way) and MANOVA
N-way (univariate) ANOVA
Interpreting MANOVA
C. Discriminant analysis
IV. Analysis of Interdependence
A. Factor analysis
How many factors
Factor loadings
Factor rotation
Data reduction technique
Creating composite scales with factor results
Communality
Total variance explained
B. Cluster analysis
C. Multidimensional scaling
LEARNING OUTCOMES
1. Understand what multivariate statistical analysis involves and know the two types of multivariate
analysis
2. Interpret results from multiple regression analysis
3. Interpret results from multivariate analysis of variance (MANOVA)
4. Interpret basic exploratory factor analysis results
5. Know what multiple discriminant analysis can be used to do
6. Understand how cluster analysis can identify market segments
CHAPTER VIGNETTE: Cow-A-Bunga Never Goes Out of Style
The yearning to hold on to the past is a common psychological experience. Nostalgia sells, and
researchers are very interested in understanding exactly what nostalgia is, who is most prone to react to it,
and how it contributes to business success. Toy shelves are filled with throwback versions of toys adults
grew up with, appliance designs echo the retro look of the 1950s, and advertisers use nostalgic
advertising to help consumers relive the past. Researchers are working on numerous issues related to
nostalgia, such as how it can be measured, what emotions are associated with it, whether it can be used to
segment markets, and so on. Nostalgia is a complex experience involving multiple thoughts and feelings, and
multivariate research procedures can consider the effects of multiple variables simultaneously.
SURVEY THIS!
Students are asked to examine the survey questions that deal with satisfaction with the business school
experience. When finished with the chapter:
1. Run a factor analysis on the 8 questions from “Teachers Knowledge of Topics” through
“Your Overall Academic Performance”
a. How many factors are retained?
b. What would you “name” these factors?
c. Create a summated scale for each factor.
2. Run a multiple regression analysis with the “Overall Experience” question as the dependent
measure and the summed scale(s) as the independent measure(s). Also include gender as an
independent variable (dummy variable). Interpret the results.
a. Is the overall model significant?
b. Which of the independent variables are significant?
c. How much variance in “Overall Experience” is explained by the predictor variables?
d. Which of the independent variables is most important in determining satisfaction
with the “Overall Experience”?
RESEARCH SNAPSHOTS
Too Much of a Good Thing!
Researchers often test hypotheses by examining regression coefficients. Financial data can be
problematic to analyze, and an example is given. While the model F is highly significant, and the
model R2 is high, the results for the independent variable tests show a different story. Only one
independent variable is significant at the Type I error rate of .05, and the coefficients do not
make sense because the coefficients for two of the independent variables are beyond the range
they should theoretically take (i.e., −1.0 to 1.0). Two of the VIF values are in the 50s, and
generally, when VIF values approach 5 or greater, problems with multicollinearity can
be expected. In this case, the researcher may wish to rerun the model after dropping one of the
offending variables.
How to Get MANOVA Results
A department store developer gathered data looking at the effect of nostalgia on customer
impressions. A field experiment was set up in which a key department was either given a modern
design or a retro design. Since two related dependent variables were involved (interest and
excitement), MANOVA is the appropriate technique and can be conducted using SPSS. The
dialog box for SPSS is given, and fixed factors include the experimental variable, respondent sex
with age included as a covariate or control variable. The SPSS output is summarized:
1. Multivariate results: Wilks’ lambda, the overall multivariate F, and the associated p-value.
2. The univariate model F statistics for each dependent variable with significance level.
3. The individual effects associated with one dependent variable are interpreted.
4. The individual effects associated with the other dependent variable are interpreted.
5. Review the means for each experimental cell as well as the covariate results.
Getting Factor Results with SAS or SPSS
While researchers may choose to use a spreadsheet to produce simple or even multiple regression
results, they will almost always turn to a specialized program for procedures like factor analysis.
The instructions for using both SAS and SPSS for getting factor results are given.
OUTLINE
I. INTRODUCTION
What is Multivariate Data Analysis?
Research that involves three or more variables, or that is concerned with underlying
dimensions among multiple variables, will involve multivariate statistical analysis.
The “Variate” in Multivariate
Another distinguishing characteristic of multivariate analysis is the variate, which is a
mathematical way in which a set of variables can be represented with one equation.
Variates are formed as a linear combination of variables each contributing to the overall
meaning of the variate based on an empirically derived weight.
Mathematically, the variate is a function of the measured variables involved in an
analysis:
Vk = f(X1, X2, …, Xm)
Vk is the kth variate. Every analysis could involve multiple sets of variables, each
represented by a variate.
X1 to Xm represent the measured variables.
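The linear-combination idea can be sketched in a few lines of Python; the variable values and weights below are invented for illustration (in a real analysis the weights are empirically derived from the data):

```python
# A variate is a weighted linear combination of measured variables.
import numpy as np

X = np.array([[3.0, 4.0, 5.0],    # one row per respondent,
              [2.0, 5.0, 1.0]])   # one column per measured variable
weights = np.array([0.4, 0.3, 0.3])  # hypothetical weights

V = X @ weights  # Vk = w1*X1 + w2*X2 + w3*X3 for each respondent
print(V)         # [3.9, 2.6]
```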
II. CLASSIFYING MULTIVARIATE TECHNIQUES
Exhibit 24.1 presents a very basic classification of multivariate data analysis procedures.
Two basic groups are dependence methods and interdependence methods.
Dependence Techniques
When hypotheses involve a distinction between independent and dependent variables, a
dependence technique is needed.
Dependence methods include:
multiple regression analysis
multiple discriminant analysis
multivariate analysis of variance
structural equations modeling
Interdependence Techniques
When researchers examine questions that do not distinguish between independent and
dependent variables, interdependence techniques are used.
No one variable or variable subset is to be predicted from or explained by the others.
The most common interdependence methods are:
factor analysis
cluster analysis
multidimensional scaling
Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is
appropriate for the data.
Exhibits 24.2 and 24.3 show that selection of a multivariate technique requires
consideration of the types of measures used for both independent and dependent sets of
variables, and the exhibits refer to nominal and ordinal scales as nonmetric and interval
and ratio scales as metric.
III. ANALYSIS OF DEPENDENCE
Multivariate dependence techniques are variants of the general linear model (GLM), which
is a way of modeling some process based on how different variables cause fluctuations from
the average dependent variable.
Fluctuations can come in the form of group means that differ from the overall mean (i.e.,
ANOVA) or in the form of a significant slope coefficient (i.e., regression).
The basic idea can be thought of as follows:
Ŷi = μ + ΔX + ΔF + ΔXF
Here, μ is a constant (the overall mean), and the Δ terms represent fluctuations due to
continuous variables (X), experimental factors (F), and their interaction (XF).
Realize that Y could represent multiple dependent variables, just as X and F could represent
multiple independent variables.
Multiple regression, n-way ANOVA, and MANOVA are common forms that the GLM can
take.
Multiple Regression Analysis
Multiple regression analysis is an extension of simple regression analysis allowing a
metric dependent variable to be predicted by multiple independent variables.
Chapter 23 illustrated simple linear regression analysis where one dependent variable is
explained by one independent variable.
Reality suggests that several factors are likely to affect a dependent variable. If this is the
case, the problem requires identification of a linear relationship with multiple regression
analysis. The multiple regression equation is:
Yi = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + ei
Thus, as a form of the GLM, dependent variable predictions (Ŷ) are made by
adjusting the constant (b0 – which would be equal to the mean if all slope coefficients
are zero) based on the slope coefficients associated with each independent variable.
Less than interval (nonmetric) independent variables can be used in multiple
regression by implementing dummy variables; a dummy variable uses a 1 and a 0 to
code the two levels of a dichotomous variable.
A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent
variable) using a sample of stores from Canada and Europe, and several
hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no
sales office is present.
H3: Grammar school enrollment in a community is related positively to
sales.
The independent variables are italicized, and the presence of a sales office is a
categorical variable that can be represented with dummy coding (0 = no office in
a particular region, 1 = office in the region).
Results:
Ŷ = 102.18 + .387X1 + 115.2X2 + 6.73X3
Coefficient of multiple determination (R2) = 0.845
F-value = 14.6; p < .05
The regression equation indicates that sales are positively related to X1, X2, and
X3, and each coefficient shows the effect on the dependent variable of a 1-unit
increase in that independent variable (e.g., with sales measured in thousands of
dollars, the value b2 = 115.2 indicates that an increase of $115,200 in toy sales is
expected with each additional unit of X2).
Because the effect associated with X1 is positive, the sign of the regression
coefficient is opposite the prediction, and H1 is not supported.
If the coefficients of the other two independent variables are statistically
significant, the hypotheses will be supported because the effects are in the
hypothesized direction.
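A minimal sketch of how a model like this could be fit in Python with statsmodels; the file name and column names are hypothetical stand-ins for sales, X1 (competitor sales), X2 (sales office dummy), and X3 (school enrollment):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per store/community.
df = pd.read_csv("toy_stores.csv")

# sales_office is a 0/1 dummy (0 = no office in the region, 1 = office).
model = smf.ols("sales ~ competitor_sales + sales_office + enrollment",
                data=df).fit()
print(model.summary())  # intercept and slopes, R-squared, model F, p-values
```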
Regression Coefficients in Multiple Regression
Multiple regression involves multiple slope estimates, or regression weights.
One challenge in regression models is to understand how one independent
variable affects the dependent variable considering the effect of the other
independent variables.
Regression coefficients are unaffected by each other only when
independent variables are independent.
Conventional regression methods provide standardized parameter estimates, β1,
β2, and so on, that can be thought of as partial regression coefficients.
The correlation between Y and X1, controlling for the correlation that X2
has with Y, is called partial correlation.
As long as the correlation between independent variables is modest,
partial regression coefficients adequately represent the relationships.
When researchers want to know which independent variable is most predictive of
the dependent variable, the standardized regression coefficient (β) is used.
β provides a constant scale: the greater the absolute value of the standardized
coefficient, the more that particular independent variable is responsible for
explaining the dependent variable.
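One common way to obtain standardized (β) coefficients is to z-score all variables and refit the model; a sketch under that assumption, continuing the hypothetical data frame from the earlier regression sketch (the 0/1 dummy is z-scored here purely for comparability):

```python
import statsmodels.formula.api as smf

z = (df - df.mean()) / df.std()  # standardize every column
std_model = smf.ols("sales ~ competitor_sales + sales_office + enrollment",
                    data=z).fit()
# The predictor with the largest absolute standardized coefficient is the
# most responsible for explaining the dependent variable.
print(std_model.params.abs().sort_values(ascending=False))
```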
R2 in Multiple Regression
The coefficient of multiple determination in multiple regression indicates the
percentage of variation in Y explained by all independent variables.
If the two independent variables are truly independent (uncorrelated with each
other), the R2 for a multiple regression model equals the sum of the separate R2
values that would result from two separate simple regression models.
More typically, the independent variables are related to one another, meaning that
the model R2 from a multiple regression model will be less than the sum of the
separate R2 values resulting from individual simple regression models.
This reduction in R2 is proportionate to the extent to which the
independent variables are inter-related or collinear.
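The additivity claim can be checked with simulated data; a minimal sketch in which two predictors are constructed to be (nearly) uncorrelated:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)  # ~uncorrelated
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.standard_normal(n)

r2_full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().rsquared
r2_x1 = sm.OLS(y, sm.add_constant(x1)).fit().rsquared
r2_x2 = sm.OLS(y, sm.add_constant(x2)).fit().rsquared
print(r2_full, r2_x1 + r2_x2)  # approximately equal for uncorrelated predictors
```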
Statistical Significance in Multiple Regression
An F-test is used to test statistical significance by comparing the variation
explained by the regression equation to the residual error variation.
The F-test allows for testing of the relative magnitudes of the sum of squares due
to the regression (SSR) and the error sum of squares (SSE):
F = (SSR / k) / (SSE / (n − k − 1)) = MSR / MSE
where
k = number of independent variables
n = number of observations
MSR = Mean Squares Regression
MSE = Mean Squares Error
Degrees of freedom for the F-test (d.f.) are:
d.f. for the numerator = k
d.f. for the denominator = n − k − 1
In practice, statistical programs will report the p-value associated with the F-test
directly.
Similarly, the programs report the statistical test for each independent variable.
Independent variables with p-values below the acceptable Type I error rate are
considered significant predictors of the dependent variable.
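The same F statistic and p-value can be computed by hand from the sums of squares; the numbers below are illustrative, not taken from the chapter:

```python
from scipy import stats

SSR, SSE = 845.0, 155.0  # hypothetical sums of squares
k, n = 3, 20             # 3 independent variables, 20 observations

MSR = SSR / k                  # mean squares regression
MSE = SSE / (n - k - 1)        # mean squares error
F = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)  # upper-tail probability of F
print(F, p_value)
```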
Steps in Interpreting a Multiple Regression Model
1. Examine the model F-test – if it is not significant, the model should be dismissed
and there is no need to proceed to further steps.
2. Examine the individual statistical tests for each parameter estimate – independent
variables with significant results can be considered a significant explanatory
variable.
3. Examine the model R2 – no cut-off values exist, but the absolute value is more
important when the researcher is more interested in prediction than explanation.
4. Examine collinearity diagnostics – multicollinearity in regression analysis refers
to how strongly interrelated the independent variables in a model are.
When multicollinearity is too high, the individual parameter estimates
become difficult to interpret.
Most regression programs can compute variance inflation factors (VIF)
for each variable, and a rule of thumb is a VIF above 5.0 suggests
problems with multicollinearity.
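A sketch of the VIF check using statsmodels, continuing the hypothetical toy-store data frame from the earlier regression sketch:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["competitor_sales", "sales_office", "enrollment"]])
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# Rule of thumb: a VIF above 5.0 suggests problematic multicollinearity.
```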
ANOVA (n-way) and MANOVA
Also represents a form of the GLM.
ANOVA can be extended beyond one-way ANOVA to predict a dependent variable with
multiple categorical independent variables.
Multivariate analysis of variance (MANOVA) is a multivariate technique that predicts
multiple continuous dependent variables with multiple independent variables.
Independent variables are categorical, although a continuous control variable can be
included in the form of a covariate.
N-way (univariate) ANOVA
The interpretation of an n-way ANOVA model follows closely from the regression
results described previously.
The steps involved are essentially the same with the addition of interpreting
differences between means:
1. Examine the overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by
examining the group means (see Chapter 12).
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination (see the
graphical representation as illustrated in Chapter 12).
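A minimal two-way (n-way) ANOVA sketch with statsmodels; the experiment data frame and its columns (a retro/modern design factor, respondent sex, and a continuous dependent variable) are hypothetical:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# exp_df is hypothetical: columns design (retro/modern), sex, and interest.
model = smf.ols("interest ~ C(design) * C(sex)", data=exp_df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F-test per main effect + interaction
```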
Interpreting MANOVA
MANOVA models produce an additional layer of testing.
The first layer involves the multivariate F-test, which is based on a statistic called
Wilks’ lambda (Λ).
Examines whether or not an independent variable explains significant variation
among the dependent variables within the model.
If Λ is significant, then the F-test results from individual univariate regression models
nested within the MANOVA model are interpreted.
The rest of the interpretation results follow from the one-way ANOVA or multiple
regression model results above.
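A sketch of the MANOVA layer with statsmodels; “interest” and “excitement” are the two related dependent variables from the Research Snapshot, while the factor, covariate, and data frame names are assumptions:

```python
from statsmodels.multivariate.manova import MANOVA

# Two dependent variables on the left; categorical factors plus the age
# covariate on the right (exp_df is the hypothetical experiment data).
mv = MANOVA.from_formula("interest + excitement ~ C(design) + C(sex) + age",
                         data=exp_df)
print(mv.mv_test())  # reports Wilks' lambda and multivariate F per effect
```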
Discriminant Analysis
Researchers often need to produce a classification of sampling units, and this process
may involve using a set of independent variables to decide if a sampling unit belongs in
one group or another.
The challenge is to find the discriminating variables to use in a predictive equation that
will produce better than chance assignment of the individuals to the groups.
Discriminant analysis is a multivariate technique that predicts a categorical dependent
variable based on a linear combination of independent variables.
A linear combination of independent variables that explains group membership is known
as a discriminant function.
The researcher’s task is to derive the coefficients of the discriminant function (a straight
line).
The following linear function is used:
Zi = b1X1i + b2X2i + . . . + bnXni
where
Zi = ith applicant’s discriminant score
bn = discriminant coefficient for the nth variable
Xni = ith applicant’s value on the nth independent variable
Using scores for all the individuals in the sample, a discriminant function is determined
based on the criterion that the groups be maximally differentiated on the set of
independent variables.
Suppose a personnel manager wanting to predict whether an applicant will succeed on
the basis of age, sales aptitude test scores, and mechanical ability scores finds the
standardized weights in the equation to be
Z = b1X1 + b2X2 + b3X3
= 0.069X1 + 0.013X2 + 0.0007X3
This means that age (X1) is much more important than sales aptitude test scores (X2)
on whether or not an applicant will succeed. Mechanical ability (X3) has relatively
minor discriminating power.
In the computation of the linear discriminant function, weights are assigned to the
variables to maximize the ratio of the difference between the means of the two groups to
the standard deviation within groups.
The standardized discriminant coefficients, or weights, provide information about the
relative importance of each of these variables in discriminating between the two groups.
A major goal of discriminant analysis is to perform a classification function.
To determine whether the discriminant analysis can be used as a good predictor,
information provided in the “confusion matrix” is used.
Tests can be performed to determine if the rate of correct classification is statistically
significant.
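A hedged sketch of the classification step with scikit-learn’s linear discriminant analysis; the applicant data frame and its columns are hypothetical (and in practice classification accuracy should be judged on a holdout sample, not the data used to fit the function):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

X = applicants[["age", "aptitude_score", "mechanical_score"]]
y = applicants["succeeded"]  # group membership: 0 = fail, 1 = succeed

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                            # discriminant function weights
print(confusion_matrix(y, lda.predict(X)))  # the "confusion matrix"
```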
IV. ANALYSIS OF INTERDEPENDENCE
The purpose of the analysis of interdependence is to further understand the structure of a set
of variables or objects.
Factor Analysis
Factor analysis is the prototypical multivariate interdependence technique; it
statistically identifies a reduced number of factors from a larger number of
measured variables.
The factors themselves are not measures, but instead, they are identified by forming a
variate using the measured variables.
Factors are usually latent constructs (e.g., attitudes or satisfaction) or an index (e.g., social
class).
A researcher need not distinguish between independent and dependent variables.
Can be divided into two types:
1. Exploratory factor analysis (EFA) – performed when the researcher is uncertain about
how many factors may exist among a set of variables.
2. Confirmatory factor analysis (CFA) performed when the researcher has strong
theoretical expectations about the factor structure before performing the analysis. A
good tool for assessing construct validity.
More than one technique exists for estimating the variates that form the factors, but the
general idea is to mathematically produce variates that explain the most total variance
among the set of variables being analyzed.
EFA provides two important pieces of information:
1. How many factors exist among a set of variables?
2. What variables match up or “load on” which factors?
How Many Factors?
Often, the researcher asks the question, “How many factors exist among a
large number of variables?”
The question is usually addressed based on the eigenvalues for a factor solution.
The most common rule is to base the number of factors on the number of eigenvalues
greater than 1.0, which is the default rule for most statistical programs.
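The eigenvalue rule can be applied directly to the correlation matrix of the items; `items` is a hypothetical DataFrame of survey responses:

```python
import numpy as np

eigenvalues = np.linalg.eigvalsh(items.corr().values)  # symmetric matrix
n_factors = int((eigenvalues > 1.0).sum())  # count eigenvalues above 1.0
print(sorted(eigenvalues, reverse=True), n_factors)
```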
Factor Loadings
A factor loading indicates how strongly correlated a factor is with a measured
variable.
EFA depends on the loadings for proper interpretation.
A latent construct can be interpreted based on the pattern of loadings and the content
of the variables; thus, the latent construct is measured indirectly by the variables.
Loading estimates are provided by factor analysis programs (see Exhibit 24.6).
Factors are interpreted by examining any patterns that emerge from the factor results.
Factor Rotation
Factor rotation is a mathematical way of simplifying factor results.
The most common type is called varimax.
Involves creating new reference axes for a given set of variables because an initial
factor solution is often difficult to interpret.
Rotation clears things up by producing more obvious patterns of loadings.
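A minimal EFA sketch with scikit-learn, which offers a varimax rotation option (scikit-learn 0.24 and later); the item set and the two-factor solution are assumptions:

```python
from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)  # items: the hypothetical DataFrame of survey responses

loadings = fa.components_.T  # rows = variables, columns = factors
print(loadings)  # interpret each factor from its pattern of high loadings
```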
Data Reduction Technique
Factor analysis is considered a data reduction technique.
These techniques allow a researcher to summarize information from many variables
into a reduced set of variates or composite variables.
Advantageous for many reasons:
In general, the rule of parsimony suggests an explanation involving fewer
components is better than one involving many more.
A way of identifying which variables among a large set might be important in
some analysis.
Simplifies decision making.
Creating Composite Scales with Factor Results
When a clear pattern of loadings exists, the researcher can sum the variables with
high loadings to create a summated scale, which can then be tested for reliability
using coefficient alpha.
Composite scales can then be used in another multivariate technique (i.e., regression).
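A sketch of building a summated scale and checking it with coefficient (Cronbach’s) alpha, computed from its textbook formula; the item names are hypothetical high-loading items from the EFA sketch above:

```python
# Sum the items that load highly on one factor into a composite scale.
scale_items = items[["q1", "q2", "q3"]]  # hypothetical high-loading items
summated = scale_items.sum(axis=1)       # the summated scale

# Coefficient alpha = k/(k-1) * (1 - sum of item variances / scale variance)
k = scale_items.shape[1]
alpha = (k / (k - 1)) * (1 - scale_items.var(ddof=1).sum()
                         / summated.var(ddof=1))
print(alpha)  # values around .7 or higher are commonly considered acceptable
```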
Communality
Communality is a measure of the percentage of a variable’s variation that is explained
by the factors.
A relatively high communality indicates that a variable has much in common with the
other variables taken as a group.
For any variable, it is equal to the sum of the squared loadings for that variable, and
these values are shown on factor analysis printouts.
Total Variance Explained
If each loading is squared and totaled, that total divided by the number of variables
provides an estimate of the variance in a set of variables explained by a factor.
This explanation of variance is much the same as R2 in multiple regression.
These values are computed by the statistics program.
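Both quantities follow directly from the loading matrix; a short sketch continuing the EFA example above (rows = variables, columns = factors):

```python
import numpy as np

sq = np.asarray(loadings) ** 2
communalities = sq.sum(axis=1)  # per variable: share explained by all factors
variance_explained = sq.sum(axis=0) / sq.shape[0]  # per factor, over variables
print(communalities, variance_explained)
```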
Cluster Analysis
Cluster analysis is a multivariate approach for identifying objects or individuals that are
similar to one another in some respect.
An important tool for identifying market segments.
Classifies individuals or objects into a small number of mutually exclusive and
exhaustive groups.
Clusters should have high internal (within-cluster) homogeneity and high external
(between-cluster) heterogeneity.
The logic of cluster analysis is to group individuals or objects by their similarity or
distance from each other.
The actual mathematical procedures for deriving clusters will not be dealt with here as
the purpose is only to introduce the technique.
Differs from factor analysis because in factor analysis, the researcher might search for
constructs that underlie the variables (i.e., population, retail sales, number of retail
outlets); in cluster analysis the researcher would seek constructs that underlie the objects
(i.e., cities).
Differs from multiple discriminant analysis in that the groups are not predefined.
The purpose of cluster analysis is to determine how many groups really exist and to
define their composition.
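A k-means sketch of the city-segmentation idea with scikit-learn; the data frame, its columns, and the three-cluster choice are assumptions (in practice the number of clusters is judged against fit diagnostics):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize so no single variable dominates the distance calculation.
X = StandardScaler().fit_transform(
    cities[["population", "retail_sales", "retail_outlets"]])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # mutually exclusive cluster assignment for each city
```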
Multidimensional Scaling
Multidimensional scaling provides a means for measuring objects in multidimensional
space on the basis of respondents’ judgments of the similarity of objects.
The perceptual difference among objects is reflected in the relative distance among
objects in multidimensional space.
In the most common form, subjects are asked to evaluate an object’s similarity to other
objects.
Exhibit 24.9 shows a perceptual map in two-dimensional space.
The labeling of the dimension axes is a task of interpretation for the researcher and is not
statistically determined.
There are multiple ways of using multivariate procedures to generate a perceptual map
(see Exhibit 24.10 for a summary).
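A minimal perceptual-map sketch using metric MDS on a precomputed dissimilarity matrix; the four-object matrix below is invented for illustration (larger value = judged less similar):

```python
import numpy as np
from sklearn.manifold import MDS

dissim = np.array([[0.0, 2.0, 5.0, 6.0],
                   [2.0, 0.0, 4.0, 5.0],
                   [5.0, 4.0, 0.0, 1.0],
                   [6.0, 5.0, 1.0, 0.0]])
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
print(coords)  # plot these points; axis labels are the researcher's to interpret
```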