Chapter 23 Lecture Note

Authors: Barry J. Babin, Jon C. Carr, Mitch Griffin, William G. Zikmund

Chapter 23
Bivariate Statistical Analysis:
Measures of Association
AT-A-GLANCE
I. The Basics
II. Simple Correlation Coefficient
A. An example
B. Correlation, covariance and causation
C. Coefficient of determination
D. Correlation matrix
III. Regression Analysis
A. The regression equation
B. Parameter estimate choices
Raw regression estimates (b1)
Standardized regression estimates (β)
C. Visual estimation of a simple regression model
Errors in prediction
D. Ordinary least-squares (OLS) method of regression analysis
Statistical significance of regression model
R2
Interpreting regression output
Plotting the OLS regression line
Simple regression and hypothesis testing
IV. Appendix 23A: Arithmetic Behind OLS
LEARNING OUTCOMES
1. Apply and interpret simple bivariate correlations
2. Interpret a correlation matrix
3. Understand simple (bivariate) regression
4. Understand the least-squares estimation technique
5. Interpret regression output including the tests of hypotheses tied to specific parameter coefficients
CHAPTER VIGNETTE: Bringing Your Work to Your Home (and
Bringing Your Home to Work)
Our understanding of the work and family interface has changed substantially in recent years. The idea
that work roles and family roles could be at odds with one another is nowadays referred to as work-family
conflict (WFC)—conflict that results when the demands and responsibilities of one role “spill over” into
the other role. Researchers have begun to examine and explore the many different work and family
characteristics (i.e., independent variables) that can predict WFC (a dependent variable), with the goal of
providing insights into the causes and consequences of this phenomenon.
SURVEY THIS!
Based on the variables list, do the following:
1. Choose 3 variables (independent variables) that you think would predict satisfaction
(dependent variable).
2. Conduct a bivariate correlation analysis for all of your selected variables—do they show the
correct sign? Are they significantly related?
3. Using those same independent and dependent variables, conduct a simple regression analysis.
What do you find?
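For those working through steps 2 and 3 in software, a minimal Python sketch is shown below; the data frame, variable names, and values are invented placeholders, and the actual survey variables would be substituted.

```python
# A sketch (not from the survey) of running a bivariate correlation and a
# simple regression for each chosen independent variable; all names and
# numbers below are hypothetical placeholders.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "work_hours":   [35, 40, 45, 50, 55, 60, 38, 48],   # hypothetical IVs
    "flexibility":  [6, 5, 4, 3, 2, 1, 5, 3],
    "support":      [5, 6, 4, 3, 3, 2, 6, 4],
    "satisfaction": [6, 6, 5, 4, 3, 2, 6, 4],            # hypothetical DV
})

dependent = "satisfaction"
for iv in ["work_hours", "flexibility", "support"]:
    r, p = stats.pearsonr(df[iv], df[dependent])     # step 2: correlation and its p-value
    res = stats.linregress(df[iv], df[dependent])    # step 3: simple regression
    print(f"{iv}: r = {r:.3f} (p = {p:.3f}); slope = {res.slope:.3f} (p = {res.pvalue:.3f})")
```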
RESEARCH SNAPSHOTS
What Makes Attractiveness?
What are the things that make someone attractive? Companies that hire people to sell fashion are
interested in this question, and a correlation matrix is given that shows how different
characteristics relate to each other. Variables include a measure of fit (i.e., how well the person
matches a fashion retail concept), attractiveness, weight, age, manner of dress, and hairstyle. A
sample of consumers rated a model shown in a photograph on those characteristics. The results
suggest that if the model seems to “fit” the store concept, she seems attractive. If she is too big,
she is less attractive. Age is unrelated to attractiveness or fit, and modernness and coldness are
associated with lower attractiveness. The steps for using SPSS to find correlations are given.
Size and Weight
America seems obsessed with weight control. The previous research snapshot gave correlations
between factors related to attractiveness. What if the following hypothesis were tested: H1:
Perceptions that a female model is overweight are related negatively to perceptions of
attractiveness. This can be tested with a simple regression, and the results support the hypothesis.
The β = -.275 is both in the expected direction (negative) and significant (p < .05). Therefore, a
person perceived as “too fat” is seen as less attractive.
OUTLINE
I. THE BASICS
The mathematical symbol X is commonly used for an independent variable, and Y typically
denotes a dependent variable.
The chi-square (χ²) test provides information about whether two or more less-than-interval
variables are interrelated.
Measurement characteristics influence which measure of association is most appropriate
(see Exhibit 23.1).
II. SIMPLE CORRELATION COEFFICIENT
The most popular technique for indicating the relationship of one variable to another is
correlation.
A correlation coefficient is a statistical measure of the covariation or association between
two variables.
Covariance is the extent to which a change in one variable corresponds systematically to a
change in another; the correlation coefficient can be thought of as a standardized covariance.
When correlations estimate relationships between continuous variables, the Pearson
product-moment correlation is appropriate.
The correlation coefficient, r, ranges from +1.0 to -1.0.
If the value of r equals +1.0, a perfect positive relationship exists.
If the value of r equals -1.0, a perfect negative relationship exists.
No correlation is indicated if r equals 0.
A correlation coefficient indicates both the magnitude of the linear relationship and the
direction of that relationship.
The formula for calculating the correlation coefficient for two variables X and Y is as
follows:
rxy = ryx = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² Σ(Yi − Ȳ)²]
where the symbols X̄ and Ȳ represent the sample averages of X and Y,
respectively.
An alternative way to express the correlation formula is:
rxy = ryx = σxy / √(σx² σy²)
where
σx² = variance of X
σy² = variance of Y
σxy = covariance of X and Y
with
σxy = Σ(Xi − X̄)(Yi − Ȳ) / N
If associated values Xi and Yi differ from their means in the same direction, their covariance
will be positive; covariance will be negative if the values of Xi and Yi have a tendency to
deviate in opposite directions.
The Pearson correlation coefficient is a standardized measure of covariance, and researchers
find it useful because they can compare two correlations without regard for the amount of
variance exhibited by each variable separately.
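Although the text computes correlation by hand, a short sketch can make the deviation-score formula concrete. The example below uses invented numbers (not the chapter's data) and checks the hand calculation against NumPy's built-in routine.

```python
# A minimal sketch (numbers invented) computing the correlation coefficient
# directly from the deviation-score formula, then verifying it with NumPy.
import numpy as np

x = np.array([40.6, 40.7, 40.4, 39.8, 39.2, 38.9])  # hypothetical X values
y = np.array([3.1, 3.5, 4.4, 5.5, 6.7, 7.0])        # hypothetical Y values

dev_x = x - x.mean()   # (Xi - Xbar)
dev_y = y - y.mean()   # (Yi - Ybar)

# r = sum of cross-products of deviations divided by the square root of the
# product of the summed squared deviations.
r = (dev_x * dev_y).sum() / np.sqrt((dev_x ** 2).sum() * (dev_y ** 2).sum())

print(round(r, 3))
print(round(np.corrcoef(x, y)[0, 1], 3))   # built-in result matches the hand calculation
```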
An Example
While researchers do not need to calculate correlation manually, the calculation process
helps illustrate exactly what is meant by correlation and covariance.
Consider an investigation made to determine whether the average number of hours
worked in manufacturing industries is related to unemployment.
Exhibit 23.3 shows the correlation between the two variables is -.635, indicating an
inverse (negative) relationship (i.e., when number of hours goes up, unemployment
comes down).
Correlation, Covariance and Causation
Recall that concomitant variation is one condition needed to establish a causal
relationship between two variables.
When two variables covary, they display concomitant variation.
This systematic covariation does not in and of itself establish causality; the relationship
would also need to be nonspurious, and any hypothesized “cause” would have to
occur before any subsequent effect.
Coefficient of Determination
If we wish to know the proportion of variance in Y explained by X (or vice versa), we
can calculate the coefficient of determination (R2) by squaring the correlation
coefficient:
R² = Explained variance / Total variance
The coefficient of determination, R2, measures that part of the total variance of Y that
is accounted for by knowing the value of X.
R-squared really is just r squared.
Correlation Matrix
A correlation matrix is the standard form of reporting observed correlations among
multiple variables.
Each entry represents the bivariate relationship between a pair of variables.
Exhibit 23.4 shows a correlation matrix.
Note that the main diagonal consists of correlations of 1.00, which will always be the
case when a variable is correlated with itself.
Had this been a covariance matrix, the diagonal would display the variance for any
given variable.
The procedure for determining statistical significance is the t-test of the significance
of a correlation coefficient.
Typically it is hypothesized that r = 0, and then a t-test is performed.
Statistical programs usually indicate the p-value associated with each correlation
and/or star significant correlations using asterisks.
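As a rough illustration of how such a matrix and its significance tests might be produced outside SPSS, the sketch below uses Python with invented variable names and ratings.

```python
# A hedged sketch of a correlation matrix with significance tests; the
# variables echo the attractiveness snapshot but the ratings are invented.
import pandas as pd
from scipy import stats

df = pd.DataFrame({
    "fit":            [5, 4, 6, 7, 3, 5, 6, 4],
    "attractiveness": [6, 4, 6, 7, 2, 5, 7, 3],
    "weight":         [3, 5, 2, 1, 6, 4, 2, 5],
})

print(df.corr().round(3))   # correlation matrix; the main diagonal is always 1.00

# Test each correlation against the hypothesis that it equals zero,
# starring those significant at the .05 level.
cols = list(df.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        r, p = stats.pearsonr(df[a], df[b])
        flag = "*" if p < 0.05 else ""
        print(f"{a} vs {b}: r = {r:.3f}, p = {p:.3f}{flag}")
```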
III. REGRESSION ANALYSIS
Regression is another technique for measuring the linear association between a dependent and
an independent variable.
Although simple regression and correlation are mathematically equivalent in most respects,
regression is a dependence technique whereas correlation is an interdependence technique.
A dependence technique makes a distinction between dependent and independent
variables.
An interdependence technique does not make this distinction and simply is concerned
with how variables relate to one another.
Simple regression links a dependent (or criterion) variable, Y, to an independent (or predictor)
variable, X.
Regression analysis attempts to predict the values of a continuous, interval-scaled dependent
variable from the specific values of the independent variable.
The Regression Equation
Simple (bivariate) linear regression investigates a straight-line relationship of the type:
Y = α + βX
where Y is a continuous dependent variable, X is an independent variable that is usually
continuous, although dichotomous nominal or ordinal variables can be included in the
form of a dummy variable.
Alpha (α) and beta (β) are two parameters that must be estimated so that the equation
best represents a given set of data.
Together they determine the height of the regression line and the angle of the line relative to
horizontal.
Regression techniques have the job of estimating values for these parameters that
make the line fit the observations the best.
α represents the Y intercept (where the line crosses the y-axis).
β is the slope coefficient.
Parameter Estimate Choices
The estimates for α and β are the key to regression analysis.
In most business research, the estimate of β is most important, because the explanatory
power of regression rests with this coefficient: it captures the direction and
strength of the relationship between the independent and dependent variable.
The Y-intercept term is sometimes referred to as a constant because α represents a fixed
point.
An estimated slope coefficient (β) is sometimes referred to as a regression weight,
regression coefficient, parameter estimate, or sometimes even as a path estimate because
of the way hypothesized causal relationships are often represented in diagrams.
These terms are used interchangeably.
Parameter estimates can be presented in either raw or standardized form.
A potential problem with raw parameter estimates is that they reflect
the measurement scale range.
A standardized regression coefficient (β) provides a common metric allowing
regression results to be compared to one another no matter what the original scale
range may have been.
The standardized y-intercept term is always 0.
The most common short-hand is as follows:
B0 or b0 = raw (unstandardized) y-intercept term (what is referred to as α above).
B1 or b1 = raw regression coefficient or estimate.
β1 = standardized regression coefficient.
Raw Regression Estimates (b1)
Have the advantage of retaining the scale metric, which is also their key
disadvantage.
Should the standardized or unstandardized coefficients be interpreted?
If the purpose of the regression analysis is forecasting, then raw parameter
estimates must be used; that is, when the researcher is interested only in
prediction.
Standardized Regression Estimates (β)
Have the advantage of a constant scale.
When should standardized regression estimates be used?
When the researcher is testing explanatory hypotheses; that is, when the
purpose of the research is more explanation than prediction.
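To make the raw-versus-standardized distinction concrete, the sketch below (invented data) computes both forms of the slope for a simple regression; the variable names and values are hypothetical.

```python
# A sketch (data invented) contrasting the raw slope b1 with the standardized
# coefficient; in simple regression the standardized beta equals Pearson's r.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])   # note the wide scale range
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # raw slope, in Y-units per X-unit
b0 = y.mean() - b1 * x.mean()                          # raw intercept

beta = b1 * np.std(x, ddof=1) / np.std(y, ddof=1)      # standardized slope (unit-free)

print(round(b1, 4), round(b0, 3))         # raw estimates, useful for forecasting
print(round(beta, 3))                     # standardized estimate, useful for explanation
print(round(np.corrcoef(x, y)[0, 1], 3))  # equals beta in the bivariate case
```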
Visual Estimation of a Simple Regression Model
Simple regression involves finding a best-fit line given a set of observations plotted in
two-dimensional space.
Many ways exist to estimate where this line should go: instrumental variables, maximum
likelihood, visual estimation, and ordinary least squares (OLS).
This book focuses on the latter two.
Exhibit 23.7 plots data in a scatter diagram.
The vertical axis indicates the value of the dependent variable, Y.
The horizontal axis indicates the value of the independent variable, X.
Each single point in the diagram represents an observation of X and Y at a given point
in time.
The values are simply points in a Cartesian plane.
One way to determine the relationship between X and Y is to simply visually draw the
best fit straight line through the points in the figure.
That is, try to draw a line that goes through the center of the plot of points.
The better one can estimate where the best fit line should be, the less will be the error in
prediction.
Errors in Prediction
The goal of regression analysis is an estimation technique that places the line
so that the total of all errors in prediction over all observations is minimized.
Ordinary Least-Squares (OLS) Method of Regression Analysis
OLS is a relatively straight forward mathematical technique that guarantees that the
resulting straight line will produce the least possible total error in using X to predict Y.
The logic is based on how much better a regression line can predict values of Y compared
to simply using the mean as a prediction for all observations.
Unless the dependent and independent variables are perfectly related, no straight line can
connect all observations.
The procedure used in the least-squares method generates a straight line that minimizes
the sum of squared deviations of the actual values from this predicted regression line.
No other line can produce less error.
Using the symbol e to represent the deviations of the observations from the regression
line, the least-squares criterion is as follows:
Σ (i = 1 to n) ei² is minimum
where
ei = Yi − Ŷi (the residual)
Yi = actual value of the dependent variable
Ŷi = estimated value of the dependent variable (“Y hat”)
n = number of observations
i = number of the particular observation
The general equation for a straight line is Y = b0 + b1X, but a more appropriate
estimating equation includes an allowance for error:
Ŷi = b0 + b1Xi + ei
The equation means that the predicted value for any value of X (Xi) is determined as a
function of the estimated slope coefficient, plus the estimated intercept coefficient + some
error.
The raw parameter estimates can be found using the following formulas:
b1 = [n(ΣXiYi) − (ΣXi)(ΣYi)] / [n(ΣXi²) − (ΣXi)²]
and
b0 = Ȳ − b1X̄
where
Yi = ith observed value of the dependent variable
Xi = ith observed value of the independent variable
Ȳ = mean of the dependent variable
X̄ = mean of the independent variable
n = number of observations
b0 = intercept estimate
b1 = slope estimate (regression weight)
The standardized regression coefficient from a simple regression equals the Pearson
correlation coefficient for the two variables.
See Appendix 23A for the arithmetic necessary to calculate the parameter estimates.
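As a quick illustration of these formulas (separate from the Appendix 23A arithmetic), the sketch below applies them to a small invented data set.

```python
# A sketch (data invented) applying the closed-form OLS formulas for b1 and b0.
import numpy as np

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([3.0, 6.0, 7.0, 10.0, 12.0, 15.0])
n = len(x)

# b1 = [n(sum XiYi) - (sum Xi)(sum Yi)] / [n(sum Xi^2) - (sum Xi)^2]
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)

# b0 = Ybar - b1 * Xbar
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x      # predicted values
e = y - y_hat            # residuals, ei = Yi - Yhat_i

print(round(b0, 3), round(b1, 3))
print(round((e ** 2).sum(), 4))   # no other straight line gives a smaller sum of squared errors
```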
Statistical Significance of Regression Model
Like ANOVA, an F-test provides a way of testing the statistical significance of the
regression model.
The overall F-test for regression is illustrated in Exhibit 23.7.
1. The total line (the blue and red portions together) represents the total deviation of the
observation from the mean:
Yi − Ȳ
2. The blue portion represents how much of the total deviation is explained by the
regression line:
Ŷi − Ȳ
3. The red portion represents how much of the total deviation is not explained by the
regression line (also equal to ei):
Yi − Ŷi
These three components are mathematically related because the total deviation is a sum
of what is explained by the regression line and what is not explained by the regression
line:
(Yi − Ȳ) = (Ŷi − Ȳ) + (Yi − Ŷi)
Total deviation (SST) = Deviation explained by the regression (SSR) + Deviation unexplained by the regression (SSE)
Just as in ANOVA, the total deviation represents the total variation to be explained.
Partitioning of the variation into components allows us to form a ratio of the explained
variation versus the unexplained variation:
SST = SSR + SSE
An F-test, or an analysis of variance, can be applied to a regression to test the relative
magnitude of the SSR (Sums of Squares Regression) and SSE (Sums of Squared
Errors) with their appropriate degrees of freedom.
The equation for the F-test is:
F(k, n − k − 1) = (SSR / k) / (SSE / (n − k − 1)) = MSR / MSE
Where,
MSR is an abbreviation for Mean Squared Regression
MSE is an abbreviation for Mean Squared Error
k is the number of independent variables (always 1 for simple regression)
n is the sample size
Again, researchers need not calculate this by hand as regression programs will produce an
“ANOVA” table which will provide the F-value, a p-value (significance level) and
generally show the partitioned variation in some form.
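The partitioning and the F statistic can be reproduced in a few lines. The sketch below uses invented data and scipy's F distribution for the p-value, mirroring what a regression program's ANOVA table reports.

```python
# A sketch (invented data) of the variance partition SST = SSR + SSE and the
# overall F-test for a simple regression.
import numpy as np
from scipy import stats

x = np.array([2.0, 4.0, 5.0, 7.0, 8.0, 10.0])
y = np.array([3.0, 5.5, 7.5, 9.0, 12.5, 14.0])
n, k = len(x), 1                                   # k = 1 for simple regression

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()        # total deviation
ssr = ((y_hat - y.mean()) ** 2).sum()    # explained by the regression
sse = ((y - y_hat) ** 2).sum()           # unexplained (residual)

f_value = (ssr / k) / (sse / (n - k - 1))
p_value = stats.f.sf(f_value, k, n - k - 1)        # upper-tail probability

print(round(sst, 3), round(ssr + sse, 3))          # SST equals SSR + SSE
print(round(f_value, 2), round(p_value, 4))
print(round(ssr / sst, 3))                         # R-squared = SSR / SST
```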
R2
The coefficient of determination, R2, reflects the proportion of variance explained by
the regression line. It can be found with this formula:
R² = SSR / SST
For example, a coefficient of determination of .875 may be interpreted to mean that
87.5 percent of the variation in the dependent variable was explained by associating
the variable with the independent variable (however, in practice, do not expect to
often see a simple regression result with an R2 as high as this example).
What is an “acceptable” R2 value?
Depends on so many factors that a single precise guideline is inappropriate.
The focus should be on the F-test.
Interpreting Regression Output
Exhibit 23.8 provides a typical output for regression analysis.
Interpreting simple regression output is a simple two-step process:
1. Interpret the overall significance of the model.
a. The output will include a “model F” and a significance value.
b. The coefficient of determination or R2 can be interpreted.
2. The individual parameter coefficient is interpreted.
a. The t-value associated with the slope coefficient can be interpreted. For
simple regression, the p-value for the model F and for the t-test of the
individual regression weight will be the same.
b. A t-test for the intercept term (constant) is also provided; however, it is
seldom of interest because the explanatory power rests in the slope
coefficient.
c. If a need to forecast sales exists, the estimated regression equation is
needed.
Plotting the OLS Regression Line
To plot a regression line on the scatter diagram, only two predicted values of Y need to be
plotted; the line is then drawn through those two points.
To determine the error (residual) of any observation, the predicted value of Y is first
calculated. The predicted value is then subtracted from the actual value.
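A brief sketch of both points follows, using invented data; matplotlib is assumed to be available for the plot.

```python
# A sketch (invented data) of drawing the OLS line from just two predicted
# values of Y and computing one observation's residual.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 2.8, 4.1, 4.7, 6.2, 6.6])

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Two predicted values are enough to plot the straight line.
x_line = np.array([x.min(), x.max()])
plt.scatter(x, y, label="observations")
plt.plot(x_line, b0 + b1 * x_line, label="OLS line")
plt.legend()
plt.show()

# Residual of the third observation: actual value minus predicted value.
y_hat_3 = b0 + b1 * x[2]
print(round(y[2] - y_hat_3, 3))
```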
Simple Regression and Hypothesis Testing
The explanatory power of regression lies in hypothesis testing.
Regression is often used to test relational hypotheses.
The outcome of the hypothesis test involves two conditions that must be satisfied:
1. The regression weight must be in the hypothesized direction.
2. The t-test associated with the regression weight must be significant.
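The two conditions can be checked directly from regression output. The sketch below uses invented ratings loosely patterned on the Size and Weight snapshot, with scipy's linregress supplying the slope and its two-tailed p-value.

```python
# A sketch (data invented) of checking the two conditions for a relational
# hypothesis such as H1: X is negatively related to Y.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])   # e.g., perceived heaviness ratings
y = np.array([6.5, 6.0, 5.2, 4.8, 3.9, 3.1, 2.6])   # e.g., attractiveness ratings

res = stats.linregress(x, y)     # slope, intercept, r, two-tailed p-value, std. error

right_direction = res.slope < 0  # condition 1: weight in the hypothesized direction
significant = res.pvalue < 0.05  # condition 2: t-test of the regression weight

print(round(res.slope, 3), round(res.pvalue, 4))
print("H1 supported" if right_direction and significant else "H1 not supported")
```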
IV. APPENDIX 23A: ARITHMETIC BEHIND OLS
Data from Exhibit 23.6 are used to solve for the parameter estimates using the OLS
equations.