Chapter 24
Multivariate Statistical Analysis
AT-A-GLANCE
I. Introduction
A. What is multivariate data analysis?
B. The “variate” in multivariate
II. Classifying Multivariate Techniques
A. Dependence techniques
B. Interdependence techniques
C. Influence of measurement scales
III. Analysis of Dependence
A. Multiple regression analysis
A simple example
Regression coefficients in multiple regression
R2 in multiple regression
Statistical significance in multiple regression
Steps in interpreting a multiple regression model
B. ANOVA (n-way) and MANOVA
N-way (univariate) ANOVA
Interpreting MANOVA
C. Discriminant analysis
IV. Analysis of Interdependence
A. Factor analysis
How many factors
Factor loadings
Factor rotation
Data reduction technique
Creating composite scales with factor results
Communality
Total variance explained
B. Cluster analysis
C. Multidimensional scaling
LEARNING OUTCOMES
1. Understand what multivariate statistical analysis involves and know the two types of multivariate
analysis
2. Interpret results from multiple regression analysis
3. Interpret results from multivariate analysis of variance (MANOVA)
4. Interpret basic exploratory factor analysis results
5. Know what multiple discriminant analysis can be used to do
6. Understand how cluster analysis can identify market segments
CHAPTER VIGNETTE: Cow-A-Bunga Never Goes Out of Style
The yearning to hold on to the past is a common psychological experience. Nostalgia sells, and
researchers are very interested in understanding exactly what nostalgia is, who is most prone to react to it,
and how it contributes to business success. Toy shelves are filled with throwback versions of toys adults
grew up with, appliance designs echo the retro look of the 1950s, and advertisers use nostalgic
advertising to help consumers relive the past. Researchers are working on numerous issues related to
nostalgia, such as how it can be measured, what emotions are associated with it, whether it can be used to
segment markets, and so on. Nostalgia is a complex experience involving multiple thoughts and feelings, and
multivariate research procedures can consider the effects of multiple variables simultaneously.
SURVEY THIS!
Students are asked to examine the survey questions that deal with satisfaction with the business school
experience. When finished with the chapter:
1. Run a factor analysis on the 8 questions from “Teachers Knowledge of Topics” through
“Your Overall Academic Performance”
a. How many factors are retained?
b. What would you “name” these factors?
c. Create a summated scale for each factor.
2. Run a multiple regression analysis with the “Overall Experience” question as the dependent
measure and the summed scale(s) as the independent measure(s). Also include gender as an
independent variable (dummy variable). Interpret the results.
a. Is the overall model significant?
b. Which of the independent variables are significant?
c. How much variance in “Overall Experience” is explained by the predictor variables?
d. Which of the independent variables is most important in determining satisfaction
with the “Overall Experience”?
RESEARCH SNAPSHOTS
Too Much of a Good Thing!
Researchers often test hypotheses by examining regression coefficients. Financial data can be
problematic to analyze, and an example is given. While the model F is highly significant, and the
model R2 is high, the results for the independent variable tests show a different story. Only one
independent variable is significant at the Type I error rate of .05, and the coefficients do not
make sense because the coefficients for two of the independent variables are beyond the range
they should theoretically take (i.e., −1.0 to 1.0). Two of the VIF values are in the 50s, and
generally, when VIF values approach 5 or greater, problems with multicollinearity can
be expected. In this case, the researcher may wish to rerun the model after dropping one of the
offending variables.
How to Get MANOVA Results
A department store developer gathered data looking at the effect of nostalgia on customer
impressions. A field experiment was set up in which a key department was either given a modern
design or a retro design. Since two related dependent variables were involved (interest and
excitement), MANOVA is the appropriate technique and can be conducted using SPSS. The
dialog box for SPSS is given, and fixed factors include the experimental variable, respondent sex
with age included as a covariate or control variable. The SPSS output is summarized:
1. Multivariate results: Wilks’ lambda, the overall multivariate F, and the associated p-value.
2. The univariate model F statistics for each dependent variable with significance level.
3. The individual effects associated with one dependent variable are interpreted.
4. The individual effects associated with the other dependent variable are interpreted.
5. Review the means for each experimental cell as well as the covariate results.
Getting Factor Results with SAS or SPSS
While researchers may choose to use a spreadsheet to produce simple or even multiple regression
results, they will almost always turn to a specialized program for procedures like factor analysis.
The instructions for using both SAS and SPSS for getting factor results are given.
OUTLINE
I. INTRODUCTION
What is Multivariate Data Analysis?
Research that involves three or more variables, or that is concerned with underlying
dimensions among multiple variables, will involve multivariate statistical analysis.
The “Variate” in Multivariate
Another distinguishing characteristic of multivariate analysis is the variate, which is a
mathematical way in which a set of variables can be represented with one equation.
Variates are formed as a linear combination of variables each contributing to the overall
meaning of the variate based on an empirically derived weight.
Mathematically, the variate is a function of the measured variables involved in an
analysis:
Vk = f(X1, X2, …, Xm)
Vk is the kth variate. Every analysis could involve multiple sets of variables, each
represented by a variate.
X1 to Xm represent the measured variables.
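The linear-combination idea can be sketched in a few lines of Python; the variable values and weights below are invented for illustration (in a real analysis the weights are empirically derived from the data):

```python
# A variate is a weighted linear combination of measured variables.
import numpy as np

X = np.array([[3.0, 4.0, 5.0],    # one row per respondent,
              [2.0, 5.0, 1.0]])   # one column per measured variable
weights = np.array([0.4, 0.3, 0.3])  # hypothetical weights

V = X @ weights  # Vk = w1*X1 + w2*X2 + w3*X3 for each respondent
print(V)         # [3.9, 2.6]
```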
II. CLASSIFYING MULTIVARIATE TECHNIQUES
Exhibit 24.1 presents a very basic classification of multivariate data analysis procedures.
Two basic groups are dependence methods and interdependence methods.
Dependence Techniques
When hypotheses involve a distinction between independent and dependent variables, a
dependence technique is needed.
Dependence methods include:
multiple regression analysis
multiple discriminant analysis
multivariate analysis of variance
structural equations modeling
Interdependence Techniques
When researchers examine questions that do not distinguish between independent and
dependent variables, interdependence techniques are used.
No one variable or variable subset is to be predicted from or explained by the others.
The most common interdependence methods are:
factor analysis
cluster analysis
multidimensional scaling
Influence of Measurement Scales
The nature of the measurement scales will determine which multivariate technique is
appropriate for the data.
Exhibits 24.2 and 24.3 show that selection of a multivariate technique requires
consideration of the types of measures used for both independent and dependent sets of
variables, and the exhibits refer to nominal and ordinal scales as nonmetric and interval
and ratio scales as metric.
III. ANALYSIS OF DEPENDENCE
Multivariate dependence techniques are variants of the general linear model (GLM), which
is a way of modeling some process based on how different variables cause fluctuations from
the average dependent variable.
Fluctuations can come in the form of group means that differ from the overall mean (i.e.,
ANOVA) or in the form of a significant slope coefficient (i.e., regression).
The basic idea can be thought of as follows:
Ŷi = μ + ΔX + ΔF + ΔXF
Here, μ is a constant (the overall mean), and the Δ terms represent fluctuations due to
continuous variables (X), experimental factors (F), and their interaction (XF).
Realize that Y could represent multiple dependent variables, just as X and F could represent
multiple independent variables.
Multiple regression, n-way ANOVA, and MANOVA are common forms that the GLM can
take.
Multiple Regression Analysis
Multiple regression analysis is an extension of simple regression analysis allowing a
metric dependent variable to be predicted by multiple independent variables.
Chapter 23 illustrated simple linear regression analysis where one dependent variable is
explained by one independent variable.
Reality suggests that several factors are likely to affect a dependent variable. If this is the
case, the problem requires identification of a linear relationship with multiple regression
analysis. The multiple regression equation is:
Yi = b0 + b1X1 + b2X2 + b3X3 + … + bnXn + ei
Thus, as a form of the GLM, dependent variable predictions (Ŷ) are made by
adjusting the constant (b0 – which would be equal to the mean if all slope coefficients
are zero) based on the slope coefficients associated with each independent variable.
Less than interval (nonmetric) independent variables can be used in multiple
regression by implementing dummy variables; a dummy variable uses a 1 and a 0 to
code the two levels of a dichotomous variable.
A Simple Example
Assume that a toy manufacturer wishes to explain store sales (dependent
variable) using a sample of stores from Canada and Europe, and several
hypotheses are offered:
H1: Competitor’s sales are related negatively to sales.
H2: Sales are higher in communities with a sales office than when no
sales office is present.
H3: Grammar school enrollment in a community is related positively to
sales.
The independent variables are italicized, and the presence of a sales office is a
categorical variable that can be represented with dummy coding (0 = no office in
a particular region, 1 = office in the region).
Results:
Ŷ = 102.18 + .387X1 + 115.2X2 + 6.73X3
Coefficient of multiple determination (R2) = 0.845
F-value = 14.6; p < .05
The regression equation indicates that sales are positively related to X1, X2, and
X3, and each coefficient shows the effect on the dependent variable of a 1-unit
increase in that independent variable (e.g., with sales measured in thousands of
dollars, the value b2 = 115.2 indicates that an increase of $115,200 in toy sales is
expected with each additional unit of X2).
Because the effect associated with X1 is positive, the sign of the regression
coefficient is opposite the prediction, and H1 is not supported.
If the coefficients of the other two independent variables are statistically
significant, the hypotheses will be supported because the effects are in the
hypothesized direction.
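A minimal sketch of how a model like this could be fit in Python with statsmodels; the file name and column names are hypothetical stand-ins for sales, X1 (competitor sales), X2 (sales office dummy), and X3 (school enrollment):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per store/community.
df = pd.read_csv("toy_stores.csv")

# sales_office is a 0/1 dummy (0 = no office in the region, 1 = office).
model = smf.ols("sales ~ competitor_sales + sales_office + enrollment",
                data=df).fit()
print(model.summary())  # intercept and slopes, R-squared, model F, p-values
```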
Regression Coefficients in Multiple Regression
Multiple regression involves multiple slope estimates, or regression weights.
One challenge in regression models is to understand how one independent
variable affects the dependent variable considering the effect of the other
independent variables.
Regression coefficients are unaffected by each other only when
independent variables are independent.
Conventional regression methods provide standardized parameter estimates, β1,
β2, and so on, that can be thought of as partial regression coefficients.
The correlation between Y and X1, controlling for the correlation that X2
has with Y, is called partial correlation.
As long as the correlation between independent variables is modest,
partial regression coefficients adequately represent the relationships.
When researchers want to know which independent variable is most predictive of
the dependent variable, the standardized regression coefficient (β) is used.
β provides a constant scale: the greater the absolute value of the standardized
coefficient, the more that particular independent variable is responsible for
explaining the dependent variable.
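One common way to obtain standardized (β) coefficients is to z-score all variables and refit the model; a sketch under that assumption, continuing the hypothetical data frame from the earlier regression sketch (the 0/1 dummy is z-scored here purely for comparability):

```python
import statsmodels.formula.api as smf

z = (df - df.mean()) / df.std()  # standardize every column
std_model = smf.ols("sales ~ competitor_sales + sales_office + enrollment",
                    data=z).fit()
# The predictor with the largest absolute standardized coefficient is the
# most responsible for explaining the dependent variable.
print(std_model.params.abs().sort_values(ascending=False))
```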
R2 in Multiple Regression
The coefficient of multiple determination in multiple regression indicates the
percentage of variation in Y explained by all independent variables.
If the two independent variables are truly independent (uncorrelated with each
other), the R2 for a multiple regression model equals the sum of the separate R2
values that would result from two separate simple regression models.
More typically, the independent variables are related to one another, meaning that
the model R2 from a multiple regression model will be less than the sum of the
separate R2 values resulting from individual simple regression models.
This reduction in R2 is proportionate to the extent to which the
independent variables are inter-related or collinear.
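The additivity claim can be checked with simulated data; a minimal sketch in which two predictors are constructed to be (nearly) uncorrelated:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)  # ~uncorrelated
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.standard_normal(n)

r2_full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().rsquared
r2_x1 = sm.OLS(y, sm.add_constant(x1)).fit().rsquared
r2_x2 = sm.OLS(y, sm.add_constant(x2)).fit().rsquared
print(r2_full, r2_x1 + r2_x2)  # approximately equal for uncorrelated predictors
```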
Statistical Significance in Multiple Regression
An F-test is used to test statistical significance by comparing the variation
explained by the regression equation to the residual error variation.
The F-test allows for testing of the relative magnitudes of the sum of squares due
to the regression (SSR) and the error sum of squares (SSE):
F = (SSR / k) / (SSE / (n − k − 1)) = MSR / MSE
where
k = number of independent variables
n = number of observations
MSR = Mean Squares Regression
MSE = Mean Squares Error
Degrees of freedom for the F-test (d.f.) are:
d.f. for the numerator = k
d.f. for the denominator = n − k − 1
In practice, statistical programs will report the p-value associated with the F-test
directly.
Similarly, the programs report the statistical test for each independent variable.
Independent variables with p-values below the acceptable Type I error rate are
considered significant predictors of the dependent variable.
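The same F statistic and p-value can be computed by hand from the sums of squares; the numbers below are illustrative, not taken from the chapter:

```python
from scipy import stats

SSR, SSE = 845.0, 155.0  # hypothetical sums of squares
k, n = 3, 20             # 3 independent variables, 20 observations

MSR = SSR / k                  # mean squares regression
MSE = SSE / (n - k - 1)        # mean squares error
F = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)  # upper-tail probability of F
print(F, p_value)
```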
Steps in Interpreting a Multiple Regression Model
1. Examine the model F-test – if it is not significant, the model should be dismissed
and there is no need to proceed to further steps.
2. Examine the individual statistical tests for each parameter estimate – independent
variables with significant results can be considered a significant explanatory
variable.
3. Examine the model R2 – no cut-off values exist, but the absolute value is more
important when the researcher is more interested in prediction than explanation.
4. Examine collinearity diagnostics – multicollinearity in regression analysis refers
to how strongly interrelated the independent variables in a model are.
When multicollinearity is too high, the individual parameter estimates
become difficult to interpret.
Most regression programs can compute variance inflation factors (VIF)
for each variable, and a rule of thumb is a VIF above 5.0 suggests
problems with multicollinearity.
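A sketch of the VIF check using statsmodels, continuing the hypothetical toy-store data frame from the earlier regression sketch:

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["competitor_sales", "sales_office", "enrollment"]])
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
# Rule of thumb: a VIF above 5.0 suggests problematic multicollinearity.
```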
ANOVA (n-way) and MANOVA
Also represents a form of the GLM.
ANOVA can be extended beyond one-way ANOVA to predict a dependent variable with
multiple categorical independent variables.
Multivariate analysis of variance (MANOVA) is a multivariate technique that predicts
multiple continuous dependent variables with multiple independent variables.
Independent variables are categorical, although a continuous control variable can be
included in the form of a covariate.
N-way (univariate) ANOVA
The interpretation of an n-way ANOVA model follows closely from the regression
results described previously.
The steps involved are essentially the same with the addition of interpreting
differences between means:
1. Examine the overall model F-test result. If significant, proceed.
2. Examine individual F-tests for individual variables.
3. For each significant categorical independent variable, interpret the effect by
examining the group means (see Chapter 12).
4. For each significant, continuous covariate, interpret the parameter estimate (b).
5. For each significant interaction, interpret the means for each combination (see the
graphical representation as illustrated in Chapter 12).
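A minimal two-way (n-way) ANOVA sketch with statsmodels; the experiment data frame and its columns (a retro/modern design factor, respondent sex, and a continuous dependent variable) are hypothetical:

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

# exp_df is hypothetical: columns design (retro/modern), sex, and interest.
model = smf.ols("interest ~ C(design) * C(sex)", data=exp_df).fit()
print(sm.stats.anova_lm(model, typ=2))  # F-test per main effect + interaction
```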
Interpreting MANOVA
MANOVA models produce an additional layer of testing.
The first layer involves the multivariate F-test, which is based on a statistic called
Wilks’ lambda (Λ).
Examines whether or not an independent variable explains significant variation
among the dependent variables within the model.
If Λ is significant, then the F-test results from individual univariate regression models
nested within the MANOVA model are interpreted.
The rest of the interpretation results follow from the one-way ANOVA or multiple
regression model results above.
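A sketch of the MANOVA layer with statsmodels; “interest” and “excitement” are the two related dependent variables from the Research Snapshot, while the factor, covariate, and data frame names are assumptions:

```python
from statsmodels.multivariate.manova import MANOVA

# Two dependent variables on the left; categorical factors plus the age
# covariate on the right (exp_df is the hypothetical experiment data).
mv = MANOVA.from_formula("interest + excitement ~ C(design) + C(sex) + age",
                         data=exp_df)
print(mv.mv_test())  # reports Wilks' lambda and multivariate F per effect
```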
Discriminant Analysis
Researchers often need to produce a classification of sampling units, and this process
may involve using a set of independent variables to decide if a sampling unit belongs in
one group or another.
The challenge is to find the discriminating variables to use in a predictive equation that
will produce better than chance assignment of the individuals to the groups.
Discriminant analysis is a multivariate technique that predicts a categorical dependent
variable based on a linear combination of independent variables.
A linear combination of independent variables that explains group membership is known
as a discriminant function.
The researcher’s task is to derive the coefficients of the discriminant function (a straight
line).
The following linear function is used:
Zi = b1X1i + b2X2i + . . . + bnXni
where
Zi = ith applicant’s discriminant score
bn = discriminant coefficient for the nth variable
Xni = ith applicant’s value on the nth independent variable
Using scores for all the individuals in the sample, a discriminant function is determined
based on the criterion that the groups be maximally differentiated on the set of
independent variables.
Suppose a personnel manager wanting to predict whether an applicant will succeed on
the basis of age, sales aptitude test scores, and mechanical ability scores finds the
standardized weights in the equation to be
Z = b1X1 + b2X2 + b3X3
= 0.069X1 + 0.013X2 + 0.0007X3
This means that age (X1) is much more important than sales aptitude test scores (X2)
on whether or not an applicant will succeed. Mechanical ability (X3) has relatively
minor discriminating power.
In the computation of the linear discriminant function, weights are assigned to the
variables to maximize the ratio of the difference between the means of the two groups to
the standard deviation within groups.
The standardized discriminant coefficients, or weights, provide information about the
relative importance of each of these variables in discriminating between the two groups.
A major goal of discriminant analysis is to perform a classification function.
To determine whether the discriminant analysis can be used as a good predictor,
information provided in the “confusion matrix” is used.
Tests can be performed to determine if the rate of correct classification is statistically
significant.
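A hedged sketch of the classification step with scikit-learn’s linear discriminant analysis; the applicant data frame and its columns are hypothetical (and in practice classification accuracy should be judged on a holdout sample, not the data used to fit the function):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

X = applicants[["age", "aptitude_score", "mechanical_score"]]
y = applicants["succeeded"]  # group membership: 0 = fail, 1 = succeed

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.coef_)                            # discriminant function weights
print(confusion_matrix(y, lda.predict(X)))  # the "confusion matrix"
```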
IV. ANALYSIS OF INTERDEPENDENCE
The purpose of the analysis of interdependence is to further understand the structure of a set
of variables or objects.
Factor Analysis
Factor analysis is the prototypical multivariate interdependence technique; it
statistically identifies a reduced number of factors from a larger number of
measured variables.
The factors themselves are not measures, but instead, they are identified by forming a
variate using the measured variables.
Factors are usually latent constructs (e.g., attitudes or satisfaction) or an index (e.g., social
class).
A researcher need not distinguish between independent and dependent variables.
Can be divided into two types:
1. Exploratory factor analysis (EFA) – performed when the researcher is uncertain about
how many factors may exist among a set of variables.
2. Confirmatory factor analysis (CFA) performed when the researcher has strong
theoretical expectations about the factor structure before performing the analysis. A
good tool for assessing construct validity.
More than one technique exists for estimating the variates that form the factors, but the
general idea is to mathematically produce variates that explain the most total variance
among the set of variables being analyzed.
EFA provides two important pieces of information:
1. How many factors exist among a set of variables?
2. What variables match up or “load on” which factors?
How Many Factors?
Often, the researcher asks the question, “How many factors exist among a
large number of variables?”
The question is usually addressed based on the eigenvalues for a factor solution.
The most common rule is to base the number of factors on the number of eigenvalues
greater than 1.0, which is the default rule for most statistical programs.
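The eigenvalue rule can be applied directly to the correlation matrix of the items; `items` is a hypothetical DataFrame of survey responses:

```python
import numpy as np

eigenvalues = np.linalg.eigvalsh(items.corr().values)  # symmetric matrix
n_factors = int((eigenvalues > 1.0).sum())  # count eigenvalues above 1.0
print(sorted(eigenvalues, reverse=True), n_factors)
```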
Factor Loadings
A factor loading indicates how strongly correlated a factor is with a measured
variable.
EFA depends on the loadings for proper interpretation.
A latent construct can be interpreted based on the pattern of loadings and the content
of the variables; thus, the latent construct is measured indirectly by the variables.
Loading estimates are provided by factor analysis programs (see Exhibit 24.6).
Factors are interpreted by examining any patterns that emerge from the factor results.
Factor Rotation
Factor rotation is a mathematical way of simplifying factor results.
The most common type is called varimax.
Involves creating new reference axes for a given set of variables because an initial
factor solution is often difficult to interpret.
Rotation clears things up by producing more obvious patterns of loadings.
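A minimal EFA sketch with scikit-learn, which offers a varimax rotation option (scikit-learn 0.24 and later); the item set and the two-factor solution are assumptions:

```python
from sklearn.decomposition import FactorAnalysis

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)  # items: the hypothetical DataFrame of survey responses

loadings = fa.components_.T  # rows = variables, columns = factors
print(loadings)  # interpret each factor from its pattern of high loadings
```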
Data Reduction Technique
Factor analysis is considered a data reduction technique.
These techniques allow a researcher to summarize information from many variables
into a reduced set of variates or composite variables.
Advantageous for many reasons:
In general, the rule of parsimony suggests an explanation involving fewer
components is better than one involving many more.
A way of identifying which variables among a large set might be important in
some analysis.
Simplifies decision making.
Creating Composite Scales with Factor Results
When a clear pattern of loadings exists, the researcher can sum the variables with
high loadings to create a summated scale, which can then be tested for reliability
using coefficient alpha.
Composite scales can then be used in another multivariate technique (i.e., regression).
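A sketch of building a summated scale and checking it with coefficient (Cronbach’s) alpha, computed from its textbook formula; the item names are hypothetical high-loading items from the EFA sketch above:

```python
# Sum the items that load highly on one factor into a composite scale.
scale_items = items[["q1", "q2", "q3"]]  # hypothetical high-loading items
summated = scale_items.sum(axis=1)       # the summated scale

# Coefficient alpha = k/(k-1) * (1 - sum of item variances / scale variance)
k = scale_items.shape[1]
alpha = (k / (k - 1)) * (1 - scale_items.var(ddof=1).sum()
                         / summated.var(ddof=1))
print(alpha)  # values around .7 or higher are commonly considered acceptable
```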
Communality
Communality is a measure of the percentage of a variable’s variation that is explained
by the factors.
A relatively high communality indicates that a variable has much in common with the
other variables taken as a group.
For any variable, it is equal to the sum of the squared loadings for that variable, and
these values are shown on factor analysis printouts.
Total Variance Explained
If each loading is squared and totaled, that total divided by the number of variables
provides an estimate of the variance in a set of variables explained by a factor.
This explanation of variance is much the same as R2 in multiple regression.
These values are computed by the statistics program.
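Both quantities follow directly from the loading matrix; a short sketch continuing the EFA example above (rows = variables, columns = factors):

```python
import numpy as np

sq = np.asarray(loadings) ** 2
communalities = sq.sum(axis=1)  # per variable: share explained by all factors
variance_explained = sq.sum(axis=0) / sq.shape[0]  # per factor, over variables
print(communalities, variance_explained)
```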
Cluster Analysis
Cluster analysis is a multivariate approach for identifying objects or individuals that are
similar to one another in some respect.
An important tool for identifying market segments.
Classifies individuals or objects into a small number of mutually exclusive and
exhaustive groups.
Clusters should have high internal (within-cluster) homogeneity and high external
(between-cluster) heterogeneity.
The logic of cluster analysis is to group individuals or objects by their similarity or
distance from each other.
The actual mathematical procedures for deriving clusters will not be dealt with here as
the purpose is only to introduce the technique.
Differs from factor analysis because in factor analysis, the researcher might search for
constructs that underlie the variables (i.e., population, retail sales, number of retail
outlets); in cluster analysis the researcher would seek constructs that underlie the objects
(i.e., cities).
Differs from multiple discriminant analysis in that the groups are not predefined.
The purpose of cluster analysis is to determine how many groups really exist and to
define their composition.
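A k-means sketch of the city-segmentation idea with scikit-learn; the data frame, its columns, and the three-cluster choice are assumptions (in practice the number of clusters is judged against fit diagnostics):

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Standardize so no single variable dominates the distance calculation.
X = StandardScaler().fit_transform(
    cities[["population", "retail_sales", "retail_outlets"]])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # mutually exclusive cluster assignment for each city
```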
Multidimensional Scaling
Multidimensional scaling provides a means for measuring objects in multidimensional
space on the basis of respondents’ judgments of the similarity of objects.
The perceptual difference among objects is reflected in the relative distance among
objects in multidimensional space.
In the most common form, subjects are asked to evaluate an object’s similarity to other
objects.
Exhibit 24.9 shows a perceptual map in two-dimensional space.
The labeling of the dimension axes is a task of interpretation for the researcher and is not
statistically determined.
There are multiple ways of using multivariate procedures to generate a perceptual map
(see Exhibit 24.10 for a summary).
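A minimal perceptual-map sketch using metric MDS on a precomputed dissimilarity matrix; the four-object matrix below is invented for illustration (larger value = judged less similar):

```python
import numpy as np
from sklearn.manifold import MDS

dissim = np.array([[0.0, 2.0, 5.0, 6.0],
                   [2.0, 0.0, 4.0, 5.0],
                   [5.0, 4.0, 0.0, 1.0],
                   [6.0, 5.0, 1.0, 0.0]])
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dissim)
print(coords)  # plot these points; axis labels are the researcher's to interpret
```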