Confirmatory Data Analysis
• Stage 3: Confirm what the data reveal
▪ Confidence Intervals
▪ Null Hypothesis Significance Testing (NHST)
o Most common approach for data analysis
o Be cautious when using NHST
• Null Hypothesis Significance Testing
▪ Goal: To determine whether mean differences among groups in an
experiment are greater than differences expected simply because of
chance (error variation)
▪ Step 1: Assume the groups do not differ (H0)
o Null hypothesis
o Assume IV had no effect
▪ Step 2: Compute appropriate statistic to test for group differences
o t-test for 2 groups, F-test for 2+ groups
▪ Step 3: Obtain probability value for statistic and compare to level of
significance (α, alpha, typically p < .05)
▪ Step 4: Is a finding “Statistically Significant”?
o Outcome has small likelihood of occurring under H0
o “small likelihood”: p < .05
o Reject H0 and conclude IV had an effect on DV
o Difference between means is larger than what would be expected
if error variation (chance) alone caused the outcome.
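A minimal sketch of these four steps in Python, assuming scipy is available; the group scores and variable names are invented for illustration:

```python
# Hypothetical two-group experiment: does the IV affect the DV?
from scipy import stats

treatment = [12, 15, 14, 16, 13, 17, 15, 14]  # invented DV scores
control   = [10, 12, 11, 13, 12, 11, 10, 12]

alpha = 0.05  # level of significance

# Step 1 is implicit: ttest_ind tests H0 that the two means are equal.
# Step 2: compute the appropriate statistic (t-test for 2 groups).
t_stat, p_value = stats.ttest_ind(treatment, control)

# Steps 3-4: compare the obtained p value to alpha and decide.
if p_value < alpha:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: reject H0; the IV had an effect")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.3f}: do not reject H0")
```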
Interpreting NHST
• What does a statistically significant outcome tell us?
▪ Outcome at p ≈ .05 has about a 50/50 chance of being repeated (at p
< .05) in an exact replication.
▪ As probability of observed outcome decreases (e.g., p = .025, p =
.006), probability of observing a statistically significant outcome (p <
.05) in an exact replication increases.
▪ APA recommends reporting exact probability of each statistical test.
• What do we conclude when a finding is not statistically significant?
▪ Do not reject H0 of no difference.
▪ Do not accept H0.
▪ We cannot make a conclusion about effect of IV.
o Some factor in experiment may have prevented us from observing
a true effect of the IV.
o Most common factor: too few participants
• Errors in NHST decisions
▪ NHST decisions are based on probabilities, so errors are possible
▪ Type I error: Null hypothesis is rejected when it is really true (i.e., no
true effect of IV); the probability of a Type I error equals alpha (the level of significance)
▪ Type II error: Null hypothesis is false but it’s not rejected (i.e., IV truly
has an effect that wasn’t detected)
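A small simulation (hypothetical parameters, assuming numpy and scipy are available) illustrating why the Type I error rate equals alpha: when H0 is true by construction, about 5% of tests come out "significant" anyway:

```python
# Simulate many experiments in which H0 is true (no IV effect).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_sims, n_per_group = 0.05, 10_000, 20

type_i_errors = 0
for _ in range(n_sims):
    a = rng.normal(0, 1, n_per_group)  # both groups drawn from the
    b = rng.normal(0, 1, n_per_group)  # same population: H0 is true
    if stats.ttest_ind(a, b).pvalue < alpha:
        type_i_errors += 1             # rejected a true H0

print(f"Type I error rate: {type_i_errors / n_sims:.3f}")  # close to .05
```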
▪ Researchers are always tentative about their claims
o e.g., findings “support” hypothesis (do not “prove” it)
Experimental Sensitivity and Statistical Power
• Sensitivity of an experiment
▪ Likelihood an experiment will detect an effect of the IV when, in fact, the
IV has an effect
▪ Sensitivity affected by good research design and methods
o hold conditions constant, reduce variability
• Power of a statistical test
▪ Likelihood a statistical test will allow researchers to correctly reject a false H0
▪ 3 factors affect power
o level of significance (alpha)
o size of IV effect
o sample size (N)
▪ Best way to increase power: Increase sample size
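A hedged sketch of the sample-size side of power analysis, assuming statsmodels is available: solve for the per-group n needed to detect a medium effect (d = .50) with power = .80 at alpha = .05.

```python
from statsmodels.stats.power import TTestIndPower

# Leaving nobs1 unspecified tells solve_power to solve for it.
n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                          alpha=0.05,
                                          power=0.8)
print(f"required n per group: {n_per_group:.1f}")  # roughly 64
```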
NHST: Comparison of Two Means
• Inferential statistical tests
▪ When two means are from independent groups
o use independent groups t-test
▪ When two means are from repeated measures design or matched
groups design
o use within-subjects (repeated measures) t-test
• Independent Groups t-test
▪ Conceptual definition: t = (difference between means) / (standard error of the mean difference)
▪ Compute t-statistic using statistical software or by hand using formula
▪ Obtain probability of t-statistic from output or t Table (df = N − 2)
▪ Compare probability to level of significance (typically p < .05)
▪ If observed p value is < .05, reject H0
o Conclude IV produced an effect on DV
▪ If observed p value is > .05, do not reject H0
o Withhold judgment about effect of IV on DV
o Determine power of statistical test
• Effect size: Cohen’s d
▪ Formula: d = 2t / √df (worked example below)
▪ Interpretation: Small effect: d = .20
Medium effect: d = .50
Large effect: d = .80
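A worked example of the formula above with an invented t value and sample size:

```python
import math

t, N = 2.50, 40            # hypothetical t statistic and total N
df = N - 2                 # df for an independent-groups t-test
d = (2 * t) / math.sqrt(df)
print(f"d = {d:.2f}")      # 0.81, a large effect by Cohen's guidelines
```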
Significance
• Statistical significance is not the same as scientific significance or
practical/clinical significance.
• Scientific significance depends on
▪ Nature of variables under study
▪ Internal validity of a study
o a study with confounds can produce statistically significant
effects (that cannot be interpreted)
▪ Other criteria, such as effect size
• Practical and clinical significance depend on
▪ External validity of a finding
▪ Effect size
▪ Practical considerations regarding the cost and ease with which a
treatment can be implemented
Recommendations for Comparisons of Two Means
• Remember there are several ways to provide evidence for a claim about
behavior.
▪ NHST
▪ Confidence Interval (CI)
▪ NHST is most common; APA recommends CIs
• Use simplest analysis
• Include descriptive statistics (M, SD) and effect size (Cohen’s d)
• Understand limitations of NHST and claims that can (and cannot) be
made.
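A minimal sketch of these reporting recommendations for one group's scores (invented data, assuming numpy and scipy are available): descriptive statistics plus a 95% CI.

```python
import numpy as np
from scipy import stats

group = np.array([12, 15, 14, 16, 13, 17, 15, 14], dtype=float)

m, sd, n = group.mean(), group.std(ddof=1), len(group)
# 95% CI for the mean, using the t distribution with n - 1 df.
ci_low, ci_high = stats.t.interval(0.95, n - 1, loc=m,
                                   scale=sd / np.sqrt(n))
print(f"M = {m:.2f}, SD = {sd:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```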
Data Analysis: More than Two Conditions
• Experiments often have more than 2 conditions
▪ Single-factor (IV) experiment with 3 or more levels
▪ Complex design experiment with 2 or more IVs
• Analysis of Variance (ANOVA)
▪ Most frequently used statistical procedure for more than 2 conditions
▪ Uses NHST
▪ Identifies whether IV produces statistically significant effect on DV
▪ Logic of ANOVA: Identify sources of variation in the data
o Error variation (“chance”)
o Systematic variation (effect of IV)
▪ Error variation (within-group)
o In a properly conducted random groups design, differences within
each group should be due to error variation alone.
▪ Differences among participants (individual differences)
▪ Hold conditions constant to reduce error variation.
▪ Systematic variation (between-group)
o Second source of variation is between groups: the effect of the
different IV conditions
o If H0 is true (no effect of IV – no difference between groups), any
observed difference among groups is due to error variation alone.
o If H0 is false (IV has effect)
▪ Means for experimental conditions should differ
▪ Differences should be systematic (due to IV)
▪ Differences among group means are due to effect of IV
(systematic variation) plus error variation.
ANOVA: The F-test
• Determines whether variation in data due to IV is larger than what would
be expected based on error variation alone
• Conceptual definition
F = (variation between groups) / (variation within groups)
▪ “variation between groups” = systematic variation + error variation
▪ “variation within groups” = error variation
▪ Therefore: F = (systematic variation + error variation) / (error variation)
• Logic
▪ If H0 is true, there is no systematic variation between groups (no
effect of IV)
▪ The F ratio then has an expected value of 1.0 (systematic variation is zero):
F = (0 + error variation) / (error variation) = 1.0
▪ As systematic variation increases (due to the effect of the IV), the expected
value of the F ratio becomes greater than 1.0:
F = (↑systematic variation + error variation) / (error variation) > 1.0
• Use NHST to determine how much greater than 1.0 the F statistic must be
to reach statistical significance.
NHST with ANOVA
• Step 1: Assume H0 – no effect of IV
• Step 2: Compute ANOVA F-tests and obtain p values for F statistics
• Step 3: Compare p values with level of significance (p < .05)
• Decisions
▪ If observed p value is < .05, reject H0 and claim IV produced an
effect.
o There is a difference somewhere among the means.
▪ If observed p value is > .05, do not reject H0.
o There is insufficient evidence to claim IV produced an effect.
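A minimal sketch of these steps with scipy's one-way ANOVA (invented data for one IV with 3 conditions):

```python
from scipy import stats

cond1 = [3, 4, 5, 4, 3]   # hypothetical DV scores per condition
cond2 = [6, 7, 6, 8, 7]
cond3 = [5, 5, 6, 4, 5]

# Steps 1-2: H0 (all condition means equal) is implicit in the F-test.
f_stat, p_value = stats.f_oneway(cond1, cond2, cond3)

# Step 3 and decision: compare p to the .05 level of significance.
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: a difference exists somewhere among the means.")
else:
    print("Do not reject H0: insufficient evidence of an IV effect.")
```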
• ANOVA Summary Table
▪ Provides statistics about sources of variation in data from an
experiment (example: one IV with 4 conditions)
Source            Sum of Squares (SS)   df   Mean Square (MS)     F      p
Group (between)         54.55            3        18.18          7.80   .002
Error (within)          37.20           16         2.33
Total                   91.75           19
▪ Mean Square for “Group” IV = systematic + error variation
Mean Square Error (MSE) = estimate of error variation
F-test: MSGroup ÷ MSE (18.18 ÷ 2.33 = 7.80)
• Result: F(3, 16) = 7.80, p = .002
• Conclusion
▪ F statistic is statistically significant at p < .05
▪ IV produced an effect on DV: the 4 Group means differ somewhere
▪ F-test doesn’t tell us which of the means for the 4 conditions differ.
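A quick check of the summary-table arithmetic above, assuming scipy is available:

```python
from scipy import stats

ms_group, ms_error = 18.18, 2.33
f = ms_group / ms_error              # 7.80, as in the table
p = stats.f.sf(f, 3, 16)             # P(F >= 7.80) with (3, 16) df
print(f"F(3, 16) = {f:.2f}, p = {p:.3f}")  # p = .002
```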
▪ Examine means to locate source of effect
o Use descriptive statistics and comparisons of means two at a time
Effect Size and ANOVA
• Measure “strength of association” between IV and DV
• Estimate the proportion of variance in participants’ scores that is due to
effect of IV
• Larger effect sizes indicate the IV accounts for (“explains”) more of the
variability in participants’ performance than smaller effect sizes do
• ANOVA: Effect size measure is “eta-squared” (η²)
▪ Calculate η² from values in ANOVA Summary Table or report of an F-test
η² = (Sum of Squares Between Groups) / (Total Sum of Squares)
or η² = [(F)(df effect)] / {[(F)(df effect)] + (df error)}
• Another effect size measure for 3 or more groups is Cohen’s f
f = √[η² / (1 − η²)] (worked example below)
▪ Cohen’s guidelines for effect size of f
o small: f = .10
o medium: f = .25
o large: f = .40
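Worked values from the ANOVA Summary Table shown earlier:

```python
import math

ss_between, ss_total = 54.55, 91.75
eta_sq = ss_between / ss_total            # proportion of variance ~ .59
f = math.sqrt(eta_sq / (1 - eta_sq))      # Cohen's f ~ 1.21 (large)
print(f"eta-squared = {eta_sq:.2f}, f = {f:.2f}")
```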
Describing Effects in Multi-Group Experiments
• Following a statistically significant omnibus F-test
▪ Identify which of the group means differ
▪ Use “comparison of two means”
• Example: Suppose an experiment has an IV with 3 conditions, a
treatment and 2 control conditions
▪ The F-test is statistically significant.
▪ Which of the 3 means differ?
• One possible comparison: Is the mean for the treatment group different
from the average of the means for the 2 control groups?
• Formula
t = (M1 − M2) / √[MSE (1/n1 + 1/n2)]
▪ MSE from ANOVA Summary Table
▪ n1 and n2 are sample sizes associated with each Mean
▪ Check statistical significance using Table A.2 or Internet sites for t-
tests
• Cohen’s d formula
d = 2t / √(df error) (worked example below)
▪ Small effect: d = .20
Medium effect: d = .50
Large effect: d = .80
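A worked sketch of the comparison formulas above; MSE and df error are taken from the ANOVA Summary Table shown earlier, while the two means and group sizes are invented:

```python
import math

m1, m2 = 7.2, 4.9          # hypothetical means (e.g., treatment vs. control)
n1, n2 = 5, 5              # sample sizes associated with each mean
mse, df_error = 2.33, 16   # error term from the omnibus ANOVA

t = (m1 - m2) / math.sqrt(mse * (1 / n1 + 1 / n2))
d = (2 * t) / math.sqrt(df_error)
print(f"t({df_error}) = {t:.2f}, d = {d:.2f}")
```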
Repeated Measures ANOVA
• Procedures and Logic
▪ Similar NHST steps as for independent groups design
▪ Differences
o For complete repeated measures design, first compute a summary
score (e.g., M, Md) for each participant for each condition
o Then summarize performance for each condition across all
participants
o Estimate of error variation: “Residual variation”
• Residual variation
▪ Variation that remains when systematic variation due to IV and
participants is removed from the estimate of total variation
▪ Variation due to participants is eliminated in repeated measures
designs
o Same individuals participate in each condition
o Repeated measures designs are more sensitive because variation
caused by different participants in conditions is eliminated
o More sensitive = better able to detect effect of IV
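A hedged sketch of a repeated measures ANOVA using statsmodels' AnovaRM (assumed available); the 5 participants, 3 conditions, and scores below are invented:

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Each participant contributes one score in every condition.
data = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5],
    "condition": ["a", "b", "c"] * 5,
    "score":     [3, 5, 4, 4, 6, 5, 2, 5, 4, 3, 6, 5, 4, 7, 5],
})

# Participant variation is removed from the error term, leaving the
# residual variation described above as the denominator of F.
result = AnovaRM(data, depvar="score", subject="subject",
                 within=["condition"]).fit()
print(result)
```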
ANOVA for Complex Designs
• Complex Designs: 2 or more IVs, each with at least 2 levels
• ANOVA indicates
▪ main effects for each IV
▪ interaction effects between IVs
• Procedure for analysis depends on whether interaction effect is
statistically significant.
• Analysis of complex design with an interaction effect
▪ Identify source of interaction
o simple main effects and comparisons of two means
▪ Simple main effect
o Effect of an IV at one level of 2nd IV
o If simple main effect is statistically significant and IV has 3 or more
levels, compare means 2 at a time.
▪ After simple main effects are analyzed, examine main effects.
▪ Use confidence intervals.
o If confidence intervals do not overlap, then a difference between
population means is likely.
• Analysis of complex design with no interaction effect
▪ If omnibus (overall) ANOVA indicates interaction effect is not
statistically significant then
▪ Examine main effects
▪ If main effect(s) is statistically significant and there are 3 or more
groups, compare means 2 at a time and draw confidence intervals.
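A minimal sketch of a 2 x 2 complex-design ANOVA with statsmodels (assumed available); the IVs "a" and "b" and DV "y" are hypothetical:

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

data = pd.DataFrame({
    "a": ["a1"] * 8 + ["a2"] * 8,              # first IV, 2 levels
    "b": (["b1"] * 4 + ["b2"] * 4) * 2,        # second IV, 2 levels
    "y": [3, 4, 3, 5, 6, 7, 6, 8, 4, 5, 4, 6, 5, 6, 5, 7],
})

# The table reports both main effects and the a x b interaction.
model = ols("y ~ C(a) * C(b)", data=data).fit()
print(anova_lm(model, typ=2))
# If the C(a):C(b) row is significant, probe simple main effects
# before interpreting the main effects.
```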
Reporting Results of a Complex Design
• Include the following
▪ Description of independent and dependent variables
▪ Summary statistics for cells of the design
o Text, Table, or Figure (depending on number of conditions and
effects)
o Confidence Intervals for group means
o Effect sizes
▪ Results of ANOVA
o Interaction effects and main effects
o Simple main effects and comparisons of means 2 at a time
o Power analysis for nonsignificant effects
▪ Verbal description of effects
o Include conclusions about effects of IVs