978-0134167404 Chapter 15 Lecture Note Part 1

CHAPTER 15

UNDERSTANDING REGRESSION ANALYSIS BASICS

LEARNING OBJECTIVES

In this chapter you will learn:

15-1 What is bivariate linear regression analysis including basic concepts such as terms,

assumptions, and equation

15-2 What is multiple regression analysis including the basic underlying conceptual

model, terms, assumptions, and computations

15-3 What is stepwise multiple regression including how to do it with SPSS

15-4 Some warnings regarding the use of multiple regression analysis

15-5 How to report multiple regression findings to clients

CHAPTER OUTLINE

Bivariate Linear Regression Analysis

• Basic Concepts in Regression Analysis

o Independent and dependent variables

o Computing the slope and the intercept

• How to Improve a Regression Analysis Finding

Multiple Regression Analysis

• An Underlying Conceptual Model

• Multiple Regression Analysis Described

o Basic assumptions in multiple regression

• “Trimming” the Regression for Significant Findings

• Special Uses of Multiple Regression Analysis

o Using a “dummy” independent variable

o Using standardized betas to compare the importance of independent

variables

o Using multiple regression as a screening device

o Interpreting the findings of multiple regression analysis

Stepwise Multiple Regression

• How to do Stepwise Multiple Regression with SPSS

• Step-By-Step Summary of How to Perform Multiple Regression Analysis

Warnings Regarding Multiple Regression Analysis

Reporting Regression Findings to Clients

KEY TERMS

Regression analysis                  Bivariate regression

Intercept                            Slope

Dependent variable                   Independent variable

Least-squares criterion              Outlier

General conceptual model             Multiple regression analysis

Regression plane                     Additivity

Independence assumption              Multicollinearity

Variance inflation factor (VIF)      Dummy independent variable

Standardized beta coefficient        Screening device

Stepwise multiple regression

TEACHING SUGGESTIONS

1. Students may need some additional help understanding the difference between

extrapolation and building a predictive model. A way to help them comprehend the

difference is to note that extrapolation always relies on some pattern that is seen over

time, while prediction requires the use of a factor other than time. Extrapolation uses

the average change in the focal variable per relevant time period, while prediction

uses the average change in the focal variable per relevant unit of the other variable.

You can use sales and marketing variables as an example. If sales have increased at

10% per year for the past 5 years, you can extrapolate that they will increase 10% in

the coming year. However, if sales have increased by 20% for every 10% decrease in

price, you can predict that they will increase by 20% if the price is decreased by 10%.

In both cases, however, all other variables are assumed to have the same influence as

in the past.

2. The analysis of residuals underpins assessment of the goodness of a predictive model,

and it is an important foundational concept. To help students understand analysis of

residuals, consider the following in-class exercise.

Show students the following number series, and ask them what straight line formula

will correctly predict the next number.

15, 20, 25, 30, ?

To find the intercept, use y = a + bx, and set x = 0, or

a + b (0) = 15

a = 15

Next, experiment with different values of b, and look at how close the results are to

the given series.

Series (y)     15, 20, 25, 30,  ?

Let x =         0,  1,  2,  3,  4     Residual (sum: 0-3)

15 + 1x        15, 16, 17, 18, 19     24

15 + 2x        15, 17, 19, 21, 23     18

15 + 3x        15, 18, 21, 24, 27     12

15 + 4x        15, 19, 23, 27, 31      6

15 + 5x        15, 20, 25, 30, 35      0

The residual (sum: 0-3) is the sum of the differences between the series and the values

each equation predicts for x = 0 through 3. The 15 + 5x equation has the lowest residual,

so it is the best predictive model; because its residual is 0, it reproduces the series

exactly, and its prediction of 35 for x = 4 is correct.

3. Use of the Novartis data to illustrate bivariate regression is intentional as it explicitly

ties regression to correlation. The text notes that the same data is used, but it is

worthwhile to point out the connection to students who may have skipped over this

point or otherwise overlooked it.

4. There are many nuances to regression analysis not treated in this chapter’s

introduction to the topic. The intent is to describe the basic concepts and to have

students identify their related values on a printout. SPSS, on the other hand, does

provide for a number of statistical options that are beyond the scope of the chapter,

particularly in the case of multiple regression. Some instructors who desire more in-

depth coverage of this technique may do so with their own materials and rely on

SPSS to accommodate this deeper coverage.

5. Regression analysis is complicated and difficult for undergraduate students to

understand. To help with the comprehension of regression analysis, we have

provided a number of regression application examples. If one’s students relate well

to concrete examples, it may be beneficial to use these examples in class or to go over

them in more detail than the examples in earlier chapters.

6. The section on the underlying conceptual model for multiple regression analysis has

two pedagogical benefits.

First, it can be used to help students understand the distinction between independent

and dependent variables. The independent variables come from the constructs that

are on the outside of the diagram that have arrows pointing toward the center.

Dependent variables emanate from the center of the diagram, and the diagram implies

that the central variables (dependent) are affected or influenced by the surrounding

variables (independent).

Second, the abbreviated lists of examples of variables for each circle in the diagram

should help students to identify the specific variables (such as demographic variables)

that would or could be used in the multiple regression model.

7. In earlier editions of the textbook, the predictive analysis chapter included a section

on time series analysis. This topic was deleted, and the section on multiple regression

was expanded in response to what the authors perceived to be a low level of interest

in time series analysis by adopting instructors. The Student Version of SPSS does

have time series (exponential smoothing) analysis capabilities as well as graphing

procedures for time series data. Instructors who wish to teach time series analysis

concepts can still do so using SPSS; however, they will need to draw from sources

other than the textbook for reading or study materials for their students.

8. Because of the many assumptions of regression analysis that can be easily violated

with a tool such as SPSS, we emphasize caution when unleashing students on

multiple regression analysis. We have provided some readable references in the

endnotes (13 and 14) that we list below in case the Instructor wants his or her students

to be exposed to practitioner-oriented literature on this topic. (The Quirk’s Marketing

Research Review articles are available at www.quirks.com.)

See for example, Kennedy, Peter (2005, Winter), Oh No! I got the wrong sign! What

should I do? Journal of Economic Education, Vol. 36, No. 1, 77-92.

9. The Auto Concepts Segmentation Analysis dataset does not yield good predictive

model results. We note that survey data typically does not, but Instructors may want

to emphasize that surveys have restrictive scales that greatly dampen the variance and

give regression little to work with. If Instructors have more illustrative datasets that

generate tighter predicted confidence intervals, they should consider using them. On

the other hand, the Auto Concepts Segmentation Analysis data set is a good vehicle to

demonstrate the “screening” technique use of multiple regression, and its use in Case

15.2 is a good teaching instrument.
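
The in-class residual exercise in teaching suggestion 2 can be reproduced with a short script. The following is a minimal sketch, not part of the textbook materials: it fixes the intercept at 15, tries slopes 1 through 5, and sums the differences between the series and each candidate line for x = 0 through 3. The names `residual_sum`, `residual_by_slope`, and `best_slope` are illustrative. The sketch sums absolute differences, as the exercise does; the least-squares criterion listed in the key terms would square each difference instead.

```python
# Reproduces the residual exercise from teaching suggestion 2.
series = [15, 20, 25, 30]   # observed y values for x = 0, 1, 2, 3
intercept = 15              # a, found by setting x = 0 in y = a + bx


def residual_sum(slope):
    """Sum of absolute differences between the series and a + b*x."""
    return sum(abs(y - (intercept + slope * x)) for x, y in enumerate(series))


# Try each candidate slope b = 1..5, as in the exercise table.
residual_by_slope = {b: residual_sum(b) for b in range(1, 6)}

# The best candidate is the one with the smallest residual sum.
best_slope = min(residual_by_slope, key=residual_by_slope.get)

# Predict the next number in the series (x = 4).
next_value = intercept + best_slope * len(series)

for b, total in residual_by_slope.items():
    print(f"15 + {b}x  residual sum = {total}")
print(f"best slope = {best_slope}, predicted next value = {next_value}")
```

Students can change the series or the range of slopes to see how the residual sum responds when no candidate line fits exactly.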

ACTIVE LEARNING EXERCISES

The General Conceptual Model for Global Motors

What is the general conceptual model apparent in the Auto Concepts survey data set?

Lifestyle is measured as follows.

Attitudes and beliefs are:

• I am worried about global warming.

• Global warming is a real threat.

• We need to do something to slow global warming.

Media habits are:

Past behavior is

Nick will gain market segmentation implications from the demographics, the lifestyle

variables, and the past behavior (type of vehicle owned), and promotional strategy

implications from the media habits and the attitudes and beliefs variables.

Segmentation Associates, Inc.

This active learning exercise requires students to interpret the results of multiple

regression and to apply them to market segmentation and target marketing considerations

using the underlying conceptual model concept described in the chapter. It also

illustrates the use of multiple regression to identify market segment differences.

1. What is the underlying conceptual model used by Segmentation Associates that is

apparent in these three sets of findings?

2. What are the segmentation variables that distinguish economy automobile buyers and

in what ways?

Segmentation Variable        Compact Automobile Buyers

Demographics
  Age                        -.28
  Education                  -.12
  Family size                +.39
  Income                     -.15

Life Style/Values
  Active
  American pride             +.30
  Bargain hunter             +.45
  Conservative
  Cosmopolitan               -.40
  Embrace change             -.30
  Family values              +.69
  Financially secure         -.28
  Optimistic

3. What are the segmentation variables that distinguish sports car buyers and in what

ways?

Segmentation Variable        Sports Car Buyers

Demographics
  Age                        -.15
  Education                  +.38
  Family size                -.35
  Income                     +.25

Life Style/Values
  Active                     +.59
  American pride
  Bargain hunter             -.33
  Conservative               -.38
  Cosmopolitan               +.68
  Embrace change             +.65
  Family values
  Financially secure         +.21
  Optimistic                 +.71

4. What are the segmentation variables that distinguish luxury automobile buyers and in

what ways?

Segmentation Variable        Luxury Automobile Buyers

Demographics
  Age                        +.59
  Education
  Family size
  Income                     +.68

Life Style/Values
  Active                     -.39
  American pride             +.24
  Bargain hunter
  Conservative               +.54
  Cosmopolitan
  Embrace change
  Family values              +.21
  Financially secure         +.50
  Optimistic                 +.37

Luxury car buyers are older with higher incomes. They are conservative, financially

secure, and optimistic. They do not lead active lives, and they believe in family

values and American pride.
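
The same reading of signs and sizes can be sketched in a few lines of code. The values below are copied from the luxury automobile buyers table; variables with no reported value are omitted. The function name `profile` is illustrative, and ranking the variables by absolute size assumes the reported values are comparable across variables (for example, standardized betas), which the exercise does not state explicitly.

```python
# Luxury automobile buyer values copied from the exercise table.
# Variables with no reported value are left out.
luxury = {
    "Age": 0.59,
    "Income": 0.68,
    "Active": -0.39,
    "American pride": 0.24,
    "Conservative": 0.54,
    "Family values": 0.21,
    "Financially secure": 0.50,
    "Optimistic": 0.37,
}


def profile(coefficients):
    """Split variables by sign, each group ordered by absolute size."""
    ranked = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)
    positive = [name for name, value in ranked if value > 0]
    negative = [name for name, value in ranked if value < 0]
    return positive, negative


positive, negative = profile(luxury)
print("Positively related:", ", ".join(positive))
print("Negatively related:", ", ".join(negative))
```

Swapping in the compact or sports car values gives the corresponding profiles for questions 2 and 3.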

SYNTHESIZE YOUR LEARNING

Alpha Airlines

Students must assess the scaling assumptions underlying the questions on the survey, and

for each question, they must identify the proper form of analysis.

1. What is the target market profile of each of the following types of Alpha Airlines

traveler? That is, what demographic and lifestyle factors are related to the number of

miles traveled on Alpha Airlines for each of the following types?

a. Domestic business traveler

b. Domestic tourist traveler

c. International business traveler

d. International tourist traveler

2. Are there differences in the desirabilities of each of the five potential new Alpha

Airlines services with respect to:

a. Gender?

b. Belonging (or not) to Alpha Airlines frequent-flyer program?

c. Belonging (or not) to Alpha Airlines Prestige Club (private lounge areas in

some airports)?

d. Use or nonuse of Alpha Airlines’ website to book most of your flights?

e. Usual class of seating (business versus economy class) on Alpha Airlines?

3. Do relationships exist for estimated number of air flight trips in each of the past 3

years on any airline with:

a. Age?

b. Income?

c. Education?

d. Any of the lifestyle dimensions?

4. Do associations exist for (1) participating or not in Alpha Airlines frequent-flyer

program, (2) membership or not to Alpha Airlines Prestige Club (private lounge areas in

some airports), and/or (3) use or not of Alpha Airlines website to book most flights with:

a. Gender?

b. Marital status?

c. Usual class of seating (business versus economy class) on Alpha Airlines?

ANSWERS TO END-OF-CHAPTER QUESTIONS

1. Use an x-y graph to construct and explain a reasonably simple linear model for each

of the following cases:

A reasonable model is described under each case.

a. What is the relationship between gasoline prices and distance traveled for family

automobile touring vacations?

b. How do hurricane force wind warnings (e.g., Category 1, Category 2, etc.) relate

to purchases of flashlight batteries in the expected landfall area?

c. What is the relationship between carry-on luggage and charges for checking

luggage on airlines?

2. Indicate what the scatter diagram and probable regression line would look like for

two variables that are correlated in each of the following ways. In each instance,

assume a negative intercept.

a. -0.89

b. +0.48

3. Circle K runs a contest inviting customers to fill out a registration card. In exchange,

they are eligible for a grand-prize drawing of a trip to Alaska. The card asks for the

customer’s age, education, gender, estimated weekly purchases (in dollars) at the

Circle K, and approximate distance the Circle K is from his or her home. Identify

each of the following if a multiple regression analysis was to be performed.

a. Independent variable

b. Dependent variable

c. Dummy variable

4. Explain what is meant by the independence assumption in multiple regression. How

can you examine your data for independence, and what statistic is issued by most

statistical analysis programs? How is this statistic interpreted? That is, what would

indicate the presence of multicollinearity, and what would you do to eliminate it?

5. What is multiple regression? Specifically, what is “multiple” about it, and how does

the formula for multiple regression appear? In your indication of the formula,

identify the various terms and also indicate the signs (positive or negative) that they

may take on.

6. If one uses the “enter” method for multiple regression analysis, what statistics on an

SPSS output should be examined to assess the result? Indicate how you would

determine each of the following:

a. Variance explained in the dependent variable by the independent variables

b. Statistical significance of each of the independent variables

c. Relative importance of the independent variables in predicting the dependent

variable

7. Explain what is meant by the notion of “trimming” a multiple regression result. Use

the following example to illustrate your understanding of this concept.

A bicycle manufacturer maintains records over 20 years of the following: retail price

in dollars, co-operative advertising amount in dollars, competitors’ average retail

price in dollars, number of retail locations selling the bicycle manufacturer’s brand,

and whether or not the winner of the Tour de France was riding the manufacturer’s

brand (coded as a dummy variable where 0=no, and 1=yes).

The initial multiple regression result determines the following:

Variable                                         Significance Level

Average retail price in dollars                  .001

Cooperative advertising amount in dollars        .202

Competitors’ average retail price in dollars     .028

Number of retail locations                       .591

Tour de France                                   .032

Using the “enter” method in SPSS, what would be the trimming steps you would

expect to undertake to identify the significant multiple regression result? Explain

your reasoning.
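
The first trimming pass for this example can be sketched as follows. This is a minimal illustration, not SPSS output: the significance levels are the ones listed in the question, the 0.05 cutoff is an assumption (the conventional 95% level of confidence), and the names `ALPHA` and `split_by_significance` are hypothetical. The sketch covers one pass only. After the flagged variables are removed, the regression must be rerun, because the significance levels of the remaining variables can change, and the check is repeated until every remaining variable is significant.

```python
# Significance levels from the initial multiple regression result.
significance = {
    "Average retail price in dollars": 0.001,
    "Cooperative advertising amount in dollars": 0.202,
    "Competitors' average retail price in dollars": 0.028,
    "Number of retail locations": 0.591,
    "Tour de France": 0.032,
}

ALPHA = 0.05  # assumed cutoff; not specified in the question


def split_by_significance(levels, alpha=ALPHA):
    """Return (keep, drop) lists of variable names for one trimming pass."""
    keep = [name for name, p in levels.items() if p <= alpha]
    drop = [name for name, p in levels.items() if p > alpha]
    return keep, drop


keep, drop = split_by_significance(significance)
print("Keep for the rerun:", keep)
print("Remove before the rerun:", drop)
```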

$$y = a + b_1x_1 + b_2x_2 + b_3x_3 + \cdots + b_nx_n$$
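
To make the terms of the multiple regression equation y = a + b1x1 + b2x2 + ... + bnxn concrete, the sketch below evaluates it for one observation. The function name `predict` and all of the numbers are hypothetical and chosen only for illustration; they do not come from the textbook or from any dataset. The intercept a and each slope b may be positive or negative, which the example shows by including one negative slope.

```python
def predict(intercept, slopes, values):
    """Evaluate y = a + b1*x1 + ... + bn*xn for one observation."""
    if len(slopes) != len(values):
        raise ValueError("need exactly one x value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, values))


# Hypothetical coefficients and inputs, for illustration only.
a = 2.0                 # intercept
b = [0.5, -1.5, 3.0]    # slopes b1, b2, b3 (b2 is negative)
x = [4.0, 2.0, 1.0]     # independent variable values x1, x2, x3

y = predict(a, b, x)    # 2.0 + 0.5*4.0 - 1.5*2.0 + 3.0*1.0
print(y)
```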