June 7, 2019

CHAPTER 14

Determining Relationships

LEARNING OBJECTIVES

To learn what is meant by a “relationship” between two variables

To understand when and how cross-tabulations with chi-square analysis are applied

To become knowledgeable about the use and interpretation of correlations

To learn about the application and interpretation of regression analysis

To become proficient in the use of the XL Data Analyst to execute various types of

relationship analyses

CHAPTER OUTLINE

What Is a Relationship Between Two Variables?

Categorical Variables Relationships

Cross-Tabulation Analysis

Types of Frequencies and Percentages in a Cross-Tabulation Table

Chi-Square Analysis of a Cross-Tabulation Table

How to Present a Significant Cross-Tabulation Finding

How to Perform Cross-Tabulation Analysis with the XL Data Analyst

Correlation: Assessing Metric Variables Relationships

Correlation Coefficients and Covariation

Statistical Significance of a Correlation

Rules of Thumb for Correlation Strength

The Pearson Product Moment Correlation Coefficient

How to Perform Correlation Analysis with the XL Data Analyst

How to Present Correlation Findings

Regression Analysis

Computing the Intercept and Slope for Bivariate Regression

Testing for Statistical Significance of the Intercept and the Slope

Making a Prediction with Bivariate Regression Analysis

Multiple Regression Analysis

Working with Multiple Regression

Using “Dummy” Independent Variables

Three Uses of Multiple Regression

How to Use the XL Data Analyst to Perform Regression Analysis

How to Present Regression Analysis Findings

Final Comments on Multiple Regression Analysis

Flow Chart on Relationship Analyses

KEY TERMS

Additivity

Bivariate regression analysis

Chi-square analysis

Coefficient of determination

Column percentages table

Correlation coefficient

Covariation

Linear relationship

Multiple R

Multiple regression analysis

Null hypothesis for a correlation

“Observed frequencies”

Pearson product moment correlation

Regression analysis

Cross-tabulation analysis

Cross-tabulation cell

Cross-tabulation table

Dependent variable

Dummy independent variable

“Expected frequencies”

Frequencies table

Independent variable

Intercept

Least squares criterion

Relationship

Residuals

Row percentages table

R-Square value

Scatter diagram

Screening device

Slope

Standard error of the estimate

Standardized beta coefficient

Straight line formula

TEACHING SUGGESTIONS

1. In this edition, we dropped the notion of the use of Boolean operators to describe how

cross-tabulations are determined. Instructors may retain this approach in their class

presentations as it utilizes a concept that students know and use (AND, OR, etc. in search

engines) with a concept that is new to them (cross-tabulations of categorical variables). The

description relies heavily on students’ understanding that the category or group labels are

used. It is recommended that instructors strongly emphasize that group labels are being used

throughout their coverage of cross-tabulations.

2. Cross-tabulation with Chi-square analysis is a nonparametric statistical concept.

Nonparametic procedures, of course, are very different from parametric statistical techniques.

Instructors who are comfortable with them may wish to delve deeper into nonparametric

statistical techniques (assumptions, distributions, scales, etc.) as a way of helping students

keep cross-tabulation analysis separate from correlation or other analyses.

3. Experience has taught that students will become confused with initial encounters with the

various types of frequencies and percentages possible in cross-tabulation tables. The chapter

takes students step-by-step through these slowly; however, Instructors should consider using

class time to review the various steps so students will gain a conceptual understanding of

what each type is and how it is used. At the very least, students should understand how row

and column percentages are useful in identifying the underlying association once statistical

significance is found. This fact is underlined by the XL Data Analyst, which provides row

and column percentages only if the cross-tabulation relationship is significant at the 95

percent level of confidence.

4. The null hypothesis is omnipresent in associative analysis tests, and it is the foundation for

practically all statistical tests. We recommend that Instructors continually remind students of

the null hypothesis of no association as they review the various associative analysis tests. It

may be worthwhile to remind students that the null hypothesis is present in statistical

inference tests such as t-tests (no difference between the means of the two groups), or

analysis of variance (no difference between any two group means). For Instructors’

information, the null hypothesis concept is emphasized in the next chapter, particularly with

descriptions of bivariate and multiple regression analyses.

5. For Instructors who want their student to actually compute statistics, consider end-of-chapter

questions numbers 14 and 15. Number 14 provides observed frequencies in a cross-tabulation

along with the computed Chi-square value. It has only four cells. The answer to question 14

below has the XL Data Analyst cross-tabulation output with the row and cell percentages, so

Instructors will be spared the requirement to compute them. Question 15 will require students

to set up the cross-tabulation table and to determine the observed frequencies, plus do all

other computations.

6. For Instructors who want to emphasize the scatter diagram interpretation of a correlation,

consider using Application Question 16. The correlation matrix for all five variables is

provided in the answer to question 16 below as are all the scatter diagrams. Students should

be able to build the data set in Excel for Windows quickly, and they can have Excel create all

possible scatter diagrams.

7. There are many nuances to regression analysis not treated in this chapter’s introduction to the

topic. The intent is to describe the basic concepts and to have students identify their related

values on a printout. Instructors should be aware that in no way will students become more

than fundamentally knowledgeable about regression, running it and interpreting the findings.

8. A constraint in Excel is that it will allow no more than 16 independent variables when

performing regression analysis. When students (or instructors) are confronted with a larger

number of independent variables, we recommend the following approach.

a. Identify groups of independent variables such as demographics, lifestyle

characteristics, usage and/or behavioral variables, performance evaluations and/or

satisfaction variables, etc.

b. Use regression analysis to identify the statistically significant independent variables

in the first group (such as usage and/or behavioral variables).

c. Add the next group to these variables (such as lifestyle characteristics) and determine

the statistically significant independent variables. (It is necessary to rerun the

regression each time as the addition of new independent variables affects the outcome

every time due to changes in sample size, non-inclusion of missing observations, etc.)

d. Continue with this approach until the final set of statistically significant independent

variables are determined.

9. When running multiple regression, the XL Data Analyst checks for multicolinearity among

the independent variables. That is, a requirement of multiple regression is that there is not

high correlation between any two independent variables. Using a “tolerance” test, the

program will detect the level of multicolinearity and issue a warning if the level is too high.

Processing stops. Users should systematically find the two independent variables with the

highest correlation and eliminate one from the multiple regression analysis. This approach

may take several iterations if several independent variables are highly correlated.