978-1111826925 Chapter 17 Lecture Note Part 1

Authors: Barry J. Babin, Jon C. Carr, Mitch Griffin, William G. Zikmund

Chapter 17
Determination of Sample Size:
A Review of Statistical Theory
AT-A-GLANCE
I. Introduction
A. Descriptive and inferential statistics
B. Sample statistics and population parameters
II. Making Data Usable
A. Frequency distributions
B. Proportions
C. Measures of central tendency
The mean
The median
The mode
D. Measures of dispersion
The range
Why use the standard deviation?
Variance
Standard deviation
III. The Normal Distribution
IV. Population Distribution, Sample Distribution, and Sampling Distribution
V. Central-Limit Theorem
VI. Estimation of Parameters
A. Point estimates
B. Confidence intervals
VII. Sample Size
A. Random error and sample size
B. Factors in determining sample size for questions involving means
C. Estimating sample size for questions involving means
D. The influence of population size on sample size
E. Factors in determining sample size for proportions
F. Calculating sample size for sample proportions
G. Determining sample size on the basis of judgment
H. Determining sample size for stratified and other probability samples
I. Determining level of precision after data collection
VIII. A Reminder About Statistics
LEARNING OUTCOMES
1. Understand basic statistical terminology
2. Interpret frequency distributions, proportions, and measures of central tendency and dispersion
3. Distinguish among population, sample, and sampling distributions
4. Explain the central-limit theorem
5. Summarize the use of confidence interval estimates
6. Discuss major issues in specifying sample size
CHAPTER VIGNETTE: Federal Reserve Finds Cards Are Replacing Cash
Payment options have gone high tech, and today’s spenders are more likely to pay with a debit or credit
card or through a variety of methods for electronic transfer of funds. Researchers at the Federal Reserve
conducted surveys of depository institutions (i.e., banks, savings and loans, credit unions) asking them to
report the number of each type of payment the institutions processed. The Fed carefully designed the
sample, including the number of institutions to contact. The total number of institutions was known
(14,117), and the researchers had to select enough from this population to be confident that the answers
would be representative of transactions nationwide. A stratified random sample was used so that each
type of institution would be included. They determined that 2,700 institutions would need to be sampled
so that they could say that the results, with 95 percent confidence, were accurate to within 5 percent of
the responses. In all, 1,500 institutions responded, and their responses confirmed earlier analysis showing that
the number of checks is declining while the number of electronic payments is increasing.
SURVEY THIS!
Students are asked to consider the question on the survey that asks if the respondent is employed:
1. What percentage of respondents do you think will answer “yes”?
2. Based on your estimate, how many respondents would you need to be 95 percent confident your
responses are ±5 percent of the population proportion?
3. Look at the data from your class. At the 95 percent confidence level, how precise is the measure
regarding employment status?
Consider the question that asks how “interesting” or “boring” they feel their life is:
4. Review the “rule of thumb” provided in the chapter regarding estimating the value of the standard
deviation of a scale. How many respondents would you need to be 95 percent confident your
responses are ±5 percent of the population proportion?
5. What if you want to be 99 percent confident—what would the sample size have to be?
6. Look at the data from your class. At the 95 percent confidence level, how precise is this
measure?
RESEARCH SNAPSHOTS
The Well-Chosen Average
The average pay of the people who work in an establishment may mean something or it may not.
If the average is a median, that means half the employees make more than that and half make less.
But if it is a mean, you may be getting nothing more than the average of one large income (i.e.,
the proprietor’s) and the salaries of a crew of underpaid workers. A simple example is given
where the “average wage” is $57,000. However, the mode, the wage that occurs most
frequently, is $20,000, and the median is $30,000, which means that half the people
earn more than this and half earn less. Do politicians use statistics to lie or do the statistics lie
when they claim that the “rich do not pay taxes”? Two websites are given to look at the tax
figures.
Sampling the World
The Gallup organization’s WorldView generates a comprehensive snapshot of the opinions of
people from across the globe. The company collects data from over 150 countries on various
topics using representative samples from 98 percent of the world’s adult population, with samples
generally consisting of 1,000 individuals. Gallup’s WorldView program provides extremely
valuable information on poorer and more rural populations. To accurately reflect the world’s
sentiments and opinions, you must have an accurate sample of the world’s population.
Target and Wal-Mart Shoppers Really Are Different
Scarborough Research sampled over 200,000 adults to compare consumers who shop exclusively
at either Target or Wal-Mart. The largest share (40 percent) named both stores when asked to
identify the stores at which they had shopped during the preceding three months. However, 31
percent shopped at Wal-Mart but not Target, and 12 percent shopped at Target but not Wal-Mart.
Compared with Wal-Mart-only shoppers, Target-only shoppers were younger, more likely to have a
high household income, and more likely to shop at upscale stores and to visit many different
stores. The Wal-Mart-only shoppers were more likely to shop at discounters (e.g., Dollar General
and Kmart) and were more likely to be at least 50 years old. Given a U.S. adult population of
approximately 220 million, do you think the sample size was adequate to make these
comparisons?
OUTLINE
I. INTRODUCTION
Descriptive and Inferential Statistics
There are two applications of statistics: (1) to describe characteristics of the population or
sample (descriptive statistics) and (2) to generalize from the sample to the population
(inferential statistics).
Sample Statistics and Population Parameters
Sample statistics are variables in the sample or measures computed from the sample
data.
Population parameters are variables or measured characteristics of the population.
We will generally use Greek lowercase letters to denote population parameters (e.g., μ or
σ) and English letters to denote sample statistics (e.g., X̄ or S).
II. MAKING THE DATA USABLE
To make data usable, this information must be organized and summarized.
Methods for doing this include:
frequency distributions
proportions
measures of central tendency and dispersion
Frequency Distributions
Constructing a frequency table or frequency distribution is one of the most common
means of summarizing a set of data.
The frequency of a value is the number of times a particular value of a variable occurs.
Exhibit 17.1 represents a frequency distribution of respondents’ answers to a question
asking how much customers had deposited in the savings and loan.
It is also quite simple to construct a distribution of relative frequency, or a percentage
distribution, which is developed by dividing the frequency of each value by the total
number of observations and multiplying the result by 100 (see Exhibit 17.2).
Probability is the long-run relative frequency with which an event will occur.
Inferential statistics uses the concept of a probability distribution, which is conceptually
the same as a percentage distribution except that the data are converted into probabilities
(see Exhibit 17.3).
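The three distributions described above can be sketched in a few lines of Python. This is a minimal illustration only: the `answers` list is hypothetical data invented for the example and does not reproduce Exhibits 17.1 through 17.3.

```python
from collections import Counter

# Hypothetical survey answers (deposit brackets); NOT the data from Exhibit 17.1.
answers = [
    "under $3,000", "$3,000-$4,999", "under $3,000", "$5,000-$9,999",
    "$3,000-$4,999", "under $3,000", "$10,000 or more", "$3,000-$4,999",
]

# Frequency distribution: how many times each value occurs.
frequency = Counter(answers)
n = sum(frequency.values())  # total number of observations

# Percentage distribution: frequency / total observations * 100.
percentage = {value: 100 * count / n for value, count in frequency.items()}

# Probability distribution: the same relative frequencies on a 0-to-1 scale.
probability = {value: count / n for value, count in frequency.items()}

for value in frequency:
    print(f"{value:<16} {frequency[value]:>3} {percentage[value]:6.1f}% {probability[value]:.3f}")
```

The percentages always sum to 100 and the probabilities to 1, which is a quick check that no observation was dropped.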
Proportions
A proportion indicates the percentage of population elements that successfully meet
some standard on the particular characteristic.
May be expressed as a percentage, a fraction, or a decimal number.
Measures of Central Tendency
There are three ways to measure the central tendency, and each has a different meaning.
Mean
The mean is simply the arithmetic average, and it is a common measure of central
tendency.
It is the sum of all the observations divided by the number of observations.
Often we will not have enough data to calculate the population mean, μ, so we will
calculate a sample mean, X̄ (read as “X bar”).
Median
The median is the midpoint of the distribution, or the 50th percentile.
In other words, the median is the value below which half the values in the sample
fall.
Mode
The mode is the measure of central tendency that merely identifies the value that
occurs most often.
Determined by listing each possible value and noting the number of times each value
occurs.
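As a rough illustration of how the three measures can disagree, the sketch below uses Python's standard `statistics` module on a hypothetical payroll with one very large income. The `salaries` figures are assumptions made up for this example, not the numbers from the chapter's Research Snapshot.

```python
import statistics

# Hypothetical annual salaries: six employees plus one proprietor (assumed data).
salaries = [20_000, 20_000, 20_000, 30_000, 35_000, 45_000, 250_000]

mean_salary = statistics.mean(salaries)      # sum of observations / number of observations
median_salary = statistics.median(salaries)  # midpoint (50th percentile)
mode_salary = statistics.mode(salaries)      # value that occurs most often

print(f"mean   = {mean_salary:,.0f}")
print(f"median = {median_salary:,.0f}")
print(f"mode   = {mode_salary:,.0f}")
```

With these assumed figures the single large salary pulls the mean well above both the median and the mode, which is why the choice of "average" matters.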
Measures of Dispersion
Accurate analysis of data also requires knowing the tendency of observations to depart
from the central tendency.
Thus, another way to summarize the data is to calculate the dispersion of the data, or how
the observations vary from the mean.
There are several measures of dispersion discussed below.
Range
The simplest measure of dispersion.
It is the distance between the smallest and largest values of a frequency distribution.
Does not take into account all the observations; it merely tells us about the extreme
values of the distribution.
While we do not expect all observations to be exactly like the mean, in a skinny
distribution they will be a short distance from the mean, while in a fat distribution
they will be spread out (see Exhibit 17.6).
The interquartile range is the range encompassing the middle 50 percent of the
observations (i.e., the range between the bottom quartile and the top quartile).
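A minimal sketch of the range and the interquartile range follows, using hypothetical `data`. Quartile conventions differ between textbooks and software; this example relies on the default ("exclusive") method of `statistics.quantiles`, which is an assumption and may not match the convention used in the exhibits.

```python
import statistics

# Hypothetical observations (assumed data for illustration only).
data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 21]

# Range: distance between the smallest and largest values.
value_range = max(data) - min(data)

# Quartiles: cut points that divide the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: spread of the middle 50 percent of the observations.
iqr = q3 - q1

print(f"range = {value_range}")
print(f"bottom quartile = {q1}, top quartile = {q3}, interquartile range = {iqr}")
```

Here the single extreme value 21 widens the range considerably but has little effect on the interquartile range.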
Why Use the Standard Deviation?
It is perhaps the most valuable index of spread, or dispersion.
Learning about the standard deviation will be easier if we present several other
measures of dispersion that may be used. Each of these has certain limitations that
the standard deviation does not.
Deviation—a method of calculating how far any observation is from the
mean.
The average deviation is determined by calculating the deviation score of each
observation value (i.e., its difference from the mean) and summing up the
scores; then we divide by the sample size (n).
While this means of calculating a measure of spread seems of
interest, it is never used because the positive deviation scores are
always canceled out by the negative deviation scores, thus leaving an
average deviation value of zero.
One might correct the disadvantage of the average deviation by
computing the absolute values of the deviations. We could ignore all
the positive and negative signs and utilize only the absolute values of
each deviation to give us the mean absolute deviation, but there are
some technical mathematical problems that make the mean absolute
deviation less valuable than some other measures.
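The cancellation problem described above is easy to see numerically. The sketch below uses hypothetical `data` (an assumption for illustration) to show that the average deviation comes out to zero while the mean absolute deviation does not.

```python
# Hypothetical observations (assumed data for illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Deviation score of each observation: its difference from the mean.
deviations = [x - mean for x in data]

# Average deviation: positive and negative scores cancel, leaving zero.
average_deviation = sum(deviations) / n

# Mean absolute deviation: ignore the signs before averaging.
mean_absolute_deviation = sum(abs(d) for d in deviations) / n

print(f"mean = {mean}")
print(f"average deviation = {average_deviation}")
print(f"mean absolute deviation = {mean_absolute_deviation}")
```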
Variance
This is another means of eliminating the sign problem caused by the
negative deviations canceling out the positive deviations.
Useful for describing the sample variability.
The procedure is to square the deviation scores, sum them, and divide by the
number of observations. (In actual fact n - 1 rather than n is used in
most pragmatic business research problems.)
A very good index of the degree of dispersion.
The variance, S², will be equal to zero if, and only if, each and
every observation in the distribution is the same as the mean.
Standard Deviation
The variance does have one major drawback—it reflects a unit of
measurement that has been squared.
Because of this, statisticians have taken the square root of the
variance.
The square root of the variance for a distribution is called the
standard deviation.
Exhibit 17.7 illustrates the calculation of a standard deviation.
S is the symbol for the sample standard deviation, while σ is the
symbol for the population standard deviation.
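The variance and standard deviation calculations can be sketched as follows. The `data` values are hypothetical and are not the figures from Exhibit 17.7; the example shows both the n - 1 divisor (sample) and the n divisor (population) and cross-checks the hand calculation against Python's `statistics` module.

```python
import math
import statistics

# Hypothetical observations (assumed data for illustration only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Squaring each deviation removes the sign problem.
sum_of_squares = sum((x - mean) ** 2 for x in data)

sample_variance = sum_of_squares / (n - 1)   # S squared, divisor n - 1
population_variance = sum_of_squares / n     # sigma squared, divisor n

# The square root returns the measure to the original units.
sample_std = math.sqrt(sample_variance)          # S
population_std = math.sqrt(population_variance)  # sigma

# Cross-check against the standard library.
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(population_std, statistics.pstdev(data))

print(f"S^2 = {sample_variance:.3f}, S = {sample_std:.3f}")
print(f"sigma^2 = {population_variance:.3f}, sigma = {population_std:.3f}")
```

The sample figures are slightly larger than the population figures because dividing by n - 1 rather than n inflates the result, most noticeably in small samples.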
III. THE NORMAL DISTRIBUTION
One of the most common probability distributions in statistics is the normal distribution,
commonly represented by the normal curve.
It is bell shaped, and almost all (about 99.7 percent) of its values are within 3 standard
deviations of its mean.
The standardized normal distribution is a specific normal curve that has several
characteristics:
1. It is symmetrical about its mean.
2. The mode identifies its highest point, which is also the mean and median, and marks the
vertical line about which this curve is symmetrical.
3. The normal curve has an infinite number of cases (it is a continuous distribution), and
the total area under the curve (the total probability) equals 1.0.
4. The standardized normal distribution has a mean of 0 and a standard deviation of 1.
Exhibit 17.9 illustrates these properties, and Exhibit 17.10 is a summary version of the typical
standardized normal table found at the end of most statistics textbooks.
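The properties listed above can be checked numerically. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; it is an illustration of the properties, not a replacement for the table in Exhibit 17.10.

```python
from statistics import NormalDist

# Standardized normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)

# Symmetry about the mean: the area below -z equals the area above +z.
z = 1.25
area_below_minus_z = std_normal.cdf(-z)
area_above_plus_z = 1 - std_normal.cdf(z)

# Share of values within 1, 2, and 3 standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"area below -{z}: {area_below_minus_z:.4f}, area above +{z}: {area_above_plus_z:.4f}")
for k, share in within.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

The last line printed shows that roughly 99.7 percent of values lie within 3 standard deviations of the mean, which is the sense in which "almost all" values fall in that band.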
The standardized normal distribution is a purely theoretical probability distribution, but it is
the most useful distribution in inferential statistics.
The standardized normal distribution is extremely valuable because we can translate or
transform any normal variable, X, into the standardized value, Z.
This has many pragmatic implications for the business researcher.
The standardized normal table in the back of most statistics and research books allows us
to evaluate the probability of the occurrence of certain events without any difficulty.
Computing the standardized value, Z, of any measurement expressed in original units is
simple:
Subtract the mean from the value to be transformed, and divide by the standard deviation
(all expressed in original units).
In the formula, note that σ, the population standard deviation, is used for calculation:
$$Z = \frac{X - \mu}{\sigma}$$
where μ = hypothesized or expected value of the mean
Example: Suppose that a toy manufacturer has experienced mean sales, μ, of 9,000 units
and a standard deviation, σ, of 500 units during the month of September. The
production manager wishes to know if wholesalers will demand between 7,500 and 9,625
units during the month of September in the upcoming year. Because there are no tables
in the back of our textbook showing the distribution for a mean of 9,000 and a standard
deviation of 500, we must transform our distribution of toy sales, X, into the standardized
form utilizing our simple formula.
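$$Z = \frac{7{,}500 - 9{,}000}{500} = -3.00$$
$$Z = \frac{9{,}625 - 9{,}000}{500} = 1.25$$
When Z = -3.00, the area under the curve between Z and the mean (probability) equals .499
When Z = 1.25, the area under the curve between the mean and Z (probability) equals .394
Thus, the total area under the curve between the two values is .499 + .394 = .893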
The area under the curve portraying this computation is the shaded area in
Exhibit 17.12. Thus, the sales manager knows there is a .893 probability that
sales will be between 7,500 and 9,625.
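The toy-sales example can be reproduced without a printed table. In this sketch the helper `z_score` is a hypothetical name introduced for illustration, and `statistics.NormalDist` (Python 3.8 or later) stands in for the standardized normal table.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Standardize a value: subtract the mean, divide by the standard deviation."""
    return (x - mu) / sigma


mu, sigma = 9_000, 500  # mean and standard deviation of September toy sales

z_low = z_score(7_500, mu, sigma)   # lower bound in standardized units
z_high = z_score(9_625, mu, sigma)  # upper bound in standardized units

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area between each Z value and the mean (Z = 0), as a table would report it.
area_low = std_normal.cdf(0) - std_normal.cdf(z_low)
area_high = std_normal.cdf(z_high) - std_normal.cdf(0)

probability = area_low + area_high

print(f"Z for 7,500 = {z_low:.2f}, area to mean = {area_low:.3f}")
print(f"Z for 9,625 = {z_high:.2f}, area to mean = {area_high:.3f}")
print(f"P(7,500 <= sales <= 9,625) = {probability:.3f}")
```

The two areas are added because the interval straddles the mean: one piece lies to the left of it and the other to the right.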
