11 – 1 Compensation Thirteenth Edition Gerhart Newman Milkovich
CHAPTER ELEVEN
PERFORMANCE APPRAISALS
Overview
This chapter discusses the difficulties associated with measuring performance, particularly
when using subjective procedures. Performance reviews are used for a wide variety of
organizational decisions, one of which is to guide the allocation of merit increases.
Unfortunately, the link between performance ratings and organizational outcomes may be
lacking. Performance ratings—things entered into an employee’s record—are influenced by
numerous factors besides the employee behaviors observed by raters. These factors include:
The central focus of this chapter is on the strategies to improve the understanding and
measurement of job performance. These strategies address the following issues:
The various appraisal formats and suggestions to improve them
How to select the right raters
Next, the chapter outlines the key elements of the performance evaluation process that help
ensure a good appraisal outcome. Legal issues: Equal Employment Opportunity (EEO) and
Learning Objectives
Define the role of performance appraisals including performance metrics.
Identify strategies for better understanding and measuring job performance.
Chapter Eleven: Performance Appraisals 11 – 2
Lecture Outline: Overview of Major Topics
I. The Role of Performance Appraisals in Compensation Decisions
A. Performance Metrics
II. Strategies for Better Understanding and Measuring Job Performance
A. The Balanced Scorecard Approach
B. Strategy 1: Improve Appraisal Formats
C. Strategy 2: Select the Right Raters
D. Strategy 3: Understand How Raters Process Information
III. Putting It All Together: The Performance Evaluation Process
IV. Equal Employment Opportunity and Performance Evaluation
V. Tying Pay to Subjectively Appraised Performance
A. Performance- and Position-Based Guidelines
B. Designing Merit Guidelines
VI. Your Turn: Performance Appraisal at American Energy Development
VII. Appendix 11-A: Balanced Scorecard Example: Department of Energy (Federal Personal
Property Management Program)
VIII. Appendix 11-B: Sample Appraisal Form for Leadership Dimension: Pfizer Pharmaceutical
Lecture Outline: Summary of Key Chapter Points
I. The Role of Performance Appraisals in Compensation Decisions
Performance reviews are used for a wide variety of decisions in organizations, only
one of which is to guide the allocation of merit increases.
o Unfortunately, the link between performance ratings and these outcomes may be lacking.
Performance ratings (the things employers enter into an employee’s permanent
record) are influenced by a host of factors besides the employee behaviors observed
by raters:
o Organization values (e.g., valuing technical skills or interpersonal skills more
highly)
o Competition among departments
Thus, employees often voice frustration about the appraisal process.
o The biggest complaint from employees (and managers too) is that appraisals are
A. Performance Metrics
There have been huge advances in the development of metrics.
o The first observation is that pay-for-performance programs evolve along
multiple dimensions.
o Finding good results-oriented measures at the individual level is very
difficult.
o Just because something is quantifiable, though, does not mean it is an
objective measure of performance.
Copyright © McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill
Education.
Such potential for subjectivity has led some experts to warn that so-called
objective data can be criterion-deficient: they might not capture all the
relevant aspects of performance.
o Despite these concerns, most HR professionals probably would prefer to
work with quantitative data.
One of the biggest attacks against appraisals in general, and subjective
appraisals in particular, comes from top names in the total quality
management area.
o W. Edwards Deming contended that the work situation (not the individual) is
the major determinant of performance.
Variation in performance arises many times because employees don’t
Some experts argue that rather than throwing out the entire appraisal process,
total quality management principles should be applied to improve it.
o One way to improve performance appraisals would be to recognize that
II. Strategies for Better Understanding and Measuring Job Performance
Efforts to improve the performance rating process take several forms.
o First, researchers and compensation people alike devote considerable energy to
defining job performance: what exactly should be measured when evaluating
employees?
o Managers can be grouped into one of three categories, based on the types of
employee behaviors they focus on.
One group looks strictly at task performance, how the employees perform the
o Studies that examine more specific factors focus on such performance dimensions
as:
Planning and organizing
A. The Balanced Scorecard Approach
A balanced scorecard approach is a way to look at what contributes value in
an organization.
o It acknowledges that bottom-line success depends on satisfied customers
buying products and services from effective and satisfied employees who
both serve the customers and produce goods (or deliver services) in the
most operationally efficient way possible.
o If this is true, then employers need to measure all four of the following
dimensions and be prepared to say that success depends on high scores for
each:
Customer satisfaction
Besides the widespread enthusiasm in industry for this approach, there are data
suggesting that implementing a balanced scorecard can have positive
impacts on the bottom line and on rating accuracy.
o Appendix 11-A shows a balanced scorecard used by the Department of
Energy.
A second direction for performance research notes that the definition of
performance and its components is expanding.
o Jobs are becoming more dynamic, and the need for employees to adapt
A third direction for improving the quality of performance ratings centers on
identifying the best appraisal format.
The fourth direction identifies possible groups of raters (supervisors, peers,
subordinates, customers, self) and examines whether a given group provides
job performance and translate it into performance ratings.
Finally, data also suggest that raters can be trained to increase the accuracy of
their ratings.
B. Strategy 1: Improve Appraisal Formats
Types of Formats
o Evaluation formats can be divided into two general categories:
Ranking
Rating
o Ranking formats require that the rater compare employees against each
other to determine the relative ordering of the group on some performance
measure.
Exhibit 11.1 illustrates three different methods of ranking employees.
The straight ranking procedure is just that: employees are ranked
relative to each other.
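Alternation ranking recognizes that raters are better at ranking
people at the extreme ends of a distribution: the rater picks the best
employee, then the worst, then the next best and the next worst, and so on
until everyone is ranked.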
The paired-comparison ranking method forces raters to make
ranking judgments about discrete pairs of people.
o Each individual is compared separately with all others in the
work group.
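The three ranking procedures can be sketched in Python. This is an illustrative sketch, not a method from the text: the employee names and the `impressions` scores are hypothetical stand-ins for a rater's judgments, and `better()` plays the role of the rater deciding which of two people performs better. The paired-comparison tally assumes each pair is judged exactly once, so n employees require n(n − 1)/2 judgments.

```python
from itertools import combinations

# Hypothetical rater impressions (higher = better); they only stand in
# for the judgments a real rater would make.
impressions = {"Ana": 7.5, "Ben": 6.0, "Cruz": 8.8, "Dee": 5.1, "Eli": 6.9}

def better(a, b):
    """Stand-in for the rater's judgment: True if a outperforms b."""
    return impressions[a] > impressions[b]

def straight_rank(names):
    """Straight ranking: order everyone from best to worst in one pass."""
    return sorted(names, key=impressions.get, reverse=True)

def alternation_rank(names):
    """Alternation ranking: pick the best remaining, then the worst
    remaining, and repeat, filling the list from both ends."""
    pool = list(names)
    top, bottom = [], []
    while pool:
        best = max(pool, key=impressions.get)
        top.append(best)
        pool.remove(best)
        if pool:
            worst = min(pool, key=impressions.get)
            bottom.append(worst)
            pool.remove(worst)
    # Worst-picked employees were collected from the bottom up.
    return top + bottom[::-1]

def paired_comparison_rank(names):
    """Paired comparison: judge every discrete pair once, then rank
    employees by how many comparisons they won."""
    wins = {n: 0 for n in names}
    pairs = list(combinations(names, 2))  # n(n - 1)/2 pairs
    for a, b in pairs:
        wins[a if better(a, b) else b] += 1
    ranking = sorted(names, key=wins.get, reverse=True)
    return ranking, wins, len(pairs)

names = list(impressions)
print(straight_rank(names))
print(alternation_rank(names))
ranking, wins, n_pairs = paired_comparison_rank(names)
print(ranking, wins, n_pairs)
```

With consistent judgments, as here, all three procedures produce the same order; they differ in how much the rater must hold in mind at once. Real raters can make intransitive paired judgments (A over B, B over C, C over A), which the win tally absorbs but a single straight ranking cannot express.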
o The second category of appraisal formats, ratings, is generally more
popular than ranking systems.
This popularity is not supported by evidence that rating formats are
particularly valid.
o The various rating formats have two elements in common.
First, in contrast to ranking formats, rating formats require raters to
evaluate employees on some absolute standard rather than relative to
other employees.
o It is the types of descriptors used in anchoring this continuum that provide
the major difference in rating scales. These descriptors may be:
Adjectives
o When adjectives are used as anchors, the format is called a standard
rating scale.
Exhibit 11.2 shows a typical rating scale with adjectives as anchors.
o When behaviors are used as anchors, behaviorally anchored rating scales
(BARS) seem to be the most common format.
By anchoring scales with concrete behaviors, firms adopting a BARS
format hope to make evaluations less subjective.
Overall employee performance is calculated as a weighted average of
the ratings on all performance dimensions in both the standard rating
scale and BARS.
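The weighted average itself is simple arithmetic. The sketch below assumes hypothetical dimensions, weights, and ratings on a 1 to 5 scale; none of these come from the text. Dividing by the sum of the weights keeps the result on the rating scale even if the weights do not add to 1.

```python
# Hypothetical performance dimensions and weights (not from the text).
weights = {"job knowledge": 0.4, "quality of work": 0.4, "cooperation": 0.2}

# Hypothetical ratings for one employee on a 1-5 scale; the same
# arithmetic applies whether the anchors are adjectives or behaviors.
ratings = {"job knowledge": 4, "quality of work": 3, "cooperation": 5}

def overall_rating(ratings, weights):
    """Weighted average of dimension ratings, normalized by total weight."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(ratings[d] * w for d, w in weights.items()) / total_weight

print(round(overall_rating(ratings, weights), 2))  # 3.8
```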
o In addition to adjectives and behaviors, outcomes are also used as
standards. The most common form is Management by Objectives
(MBO).
As a first step, organization objectives are identified from the strategic
plan of the company.
Each successively lower level in the organizational hierarchy is
charged with identifying work objectives that will support attainment
of organizational goals.
Results are then compared against objectives, and a performance
rating is determined based on how well the objectives were met.
A review of firms using MBO indicates generally positive effects on
performance, both for individuals and for the organization.
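The MBO step of comparing results against objectives might be sketched as follows. Everything specific here is a hypothetical assumption: the objectives, targets, and results; the choice to average percentage attainment across objectives; and the cutoffs that turn attainment into a rating label. It also assumes every objective is one where more is better.

```python
# Hypothetical objectives: (description, target, actual result).
objectives = [
    ("New accounts opened", 40, 46),
    ("Complaints resolved within 48 hours (%)", 95, 90),
    ("Training hours completed", 20, 20),
]

# Hypothetical cutoffs mapping average attainment to a rating label,
# checked from highest to lowest.
CUTOFFS = [(1.10, "Exceeds objectives"),
           (0.95, "Meets objectives"),
           (float("-inf"), "Below objectives")]

def mbo_rating(objectives, cutoffs=CUTOFFS):
    """Average the fraction of each target achieved, then apply cutoffs."""
    attainment = [actual / target for _, target, actual in objectives]
    average = sum(attainment) / len(attainment)
    for threshold, label in cutoffs:
        if average >= threshold:
            return average, label

average, label = mbo_rating(objectives)
print(f"{average:.1%} -> {label}")
```

Averaging lets one overachieved objective offset a missed one; a firm could instead rate each objective separately so that shortfalls stay visible.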
o A final type of appraisal format does not easily fall into any of the
categories yet discussed.
In an essay format, supervisors answer open-ended questions, in essay
form, describing employee performance.
Evaluating Performance Appraisal Formats
o A good performance appraisal format scores well on five dimensions:
Employee development potential (amount of feedback about
performance that the format offers)
Administrative ease
The five main criteria are explained below.
Employee development criterion
Feedback has a positive impact on job performance.
Administrative criterion
Ease of use of evaluation results for administrative decisions
concerning wage increases, promotions, demotions, terminations,
Typically, this is a numerical rating of performance.
Personnel research criterion
Does the instrument lend itself well to validating employment
tests?
Cost criterion
Does the evaluation form initially require a long time to be
developed?
Is it time-consuming for supervisors to use the form in rating their
employees?
Is it expensive to use?
Validity criterion
By far the most research on formats in recent years has focused on
o Exhibit 11.7 provides a report card on the five most common rating
formats relative to the criteria just discussed.
o The choice of an appraisal format depends on the types of tasks being
performed.
Tasks can be ordered along a continuum from those that are very
routine to those for which the appropriate behavior for goal
tasks that meet the assumptions for that format.
At one extreme of the continuum are behavior-based evaluation
procedures that define specific performance expectations against
which employee performance is evaluated.
At the other extreme of the continuum are tasks that are highly
uncertain in nature.
C. Strategy 2: Select the Right Raters
A second way that firms have tried to improve the accuracy of performance
ratings is by focusing on who might conduct the ratings and which of these
sources is most likely to be accurate.
To lessen the impact of one reviewer, and to increase participation in the
Despite the positive responses from those who have implemented the
360-degree feedback system, most companies today still use it only for
evaluating their top-level personnel and for employee development rather
than for appraisals or pay decisions.
Supervisors as Raters
o Some estimates indicate that more than 80% of the input for performance
ratings comes from supervisors.
is required for any given level of performance rating.
o On the negative side, supervisors are particularly prone to halo and
leniency errors.
Peers as Raters
o One of the major strengths of using peers as raters is that they work more
closely with the ratee and probably have an undistorted perspective of
typical performance, particularly in group assignments.
o Balanced against this positive are at least two powerful negatives:
Peers may have little or no experience in conducting appraisals,
Self as Rater
o Some organizations have experimented with self-ratings.
o Self-ratings are done by someone who has the most complete knowledge
about the ratee’s performance.
Customer as Rater
o This is the era of the customer.
Subordinate as Rater
o The notion of subordinates as raters is appealing since most employees
want to be successful with the people who report to them.
o Hearing how they are viewed by their subordinates gives managers the chance
to see both their strengths and their weaknesses as leaders and to modify
their behavior.
o The difficulty with this type of rating is in attaining candid reviews and
also in counseling the ratee on how to deal with the feedback.
o Research shows that subordinates prefer to give their feedback to
managers anonymously.
o If their identity is known, they give artificially inflated ratings of their
supervisors.
Interrater Reliability (and Multiple Raters)
o As we saw in Chapter 6, an important criterion for any measure is that
variance in scores due to error be minimized and variance in scores due to
true differences be maximized.
o Reliability is defined as true variance divided by the sum of true variance
and error variance.
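This definition can be checked with a small simulation, assuming the classical model in which each observed rating is a true score plus independent random error. Under that assumption, the correlation between two raters who rate the same employees estimates the reliability of a single rater's ratings. The variances, sample size, and seed are arbitrary choices for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_two_raters(n, true_sd, error_sd, seed=1):
    """Each rater's score = employee's true score + that rater's own error."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, true_sd) for _ in range(n)]
    rater_a = [t + rng.gauss(0, error_sd) for t in true_scores]
    rater_b = [t + rng.gauss(0, error_sd) for t in true_scores]
    return rater_a, rater_b

true_sd, error_sd = 1.0, 1.0
# Reliability = true variance / (true variance + error variance)
theoretical = true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)
a, b = simulate_two_raters(20000, true_sd, error_sd)
print(theoretical, round(pearson(a, b), 3))
```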
When ratings are used, one clear path to maximizing reliability (here,
interrater reliability) is to use multiple raters.
There is a formula known as the Spearman-Brown prophecy formula
which tells how interrater reliability will change as we add more
raters. See Exhibit 11.8.
$$r_{kk} = \frac{k\,r_{ij}}{1 + (k-1)\,r_{ij}}$$
Where $r_{kk}$ is the expected interrater reliability,
$k$ is the number of raters to be used, and
$r_{ij}$ is the average correlation between pairs of individual raters.
As an example, in the following data, three raters each rate the same
five employees.
The interrater correlations, $r_{ij}$, are:
estimated by Spearman-Brown would be:
$$r_{11} = \frac{1(.45)}{1 + (1-1)(.45)} = .45$$
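If, however, we were to use the average of all three ratings for each
employee, the expected interrater reliability would be:
$$r_{33} = \frac{3(.45)}{1 + (3-1)(.45)} = \frac{1.35}{1.90} \approx .71$$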
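The Spearman-Brown calculation is easy to script. The sketch below reproduces the example's values (.45 for one rater, about .71 for the average of three) and tabulates reliability for one to five raters. The `raters_needed` helper is an addition, not from the text: it rearranges the same formula to ask how many raters a target reliability would take.

```python
import math

def spearman_brown(r_ij, k):
    """Expected reliability of the average of k raters' ratings, given
    the average correlation r_ij between individual raters."""
    return k * r_ij / (1 + (k - 1) * r_ij)

def raters_needed(r_ij, target):
    """Smallest whole number of raters whose averaged ratings reach the
    target reliability (Spearman-Brown solved for k)."""
    k = target * (1 - r_ij) / (r_ij * (1 - target))
    # Small tolerance so float noise on an exact integer does not round up.
    return math.ceil(k - 1e-9)

for k in range(1, 6):
    print(k, round(spearman_brown(0.45, k), 2))

print(raters_needed(0.45, 0.80))  # 5
```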
o As an aside, because each rater rates the same five employees,
we can determine here that the three raters differ in their
           Performance Rating Assigned by:
Employee   Rater 2   Rater 3   Mean
A          4         5         4.0
B          5         4         4.3
C          3         4         3.0
D          6         5         4.7
E          4         5         4.7
Mean       4.4       4.6       4.2
SD         1.0       0.5       0.7
o To see the consequences of different interrater reliabilities, we can
compute confidence intervals.
As a first step, we compute the so-called true score:
true score = mean score + (observed score − mean score) × interrater reliability
SD is the standard deviation of the employees’ mean scores.
To form a 95% confidence interval, we use the formula
This is a wide interval. What if we used the average of three ratings
instead?
o We know from the Spearman-Brown formula that this produces an interrater
reliability of .71.
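The true-score step and the interval can be sketched as follows. The true-score function follows the formula in the text. The interval formula itself is not preserved in this outline, so the code ASSUMES one common choice, the standard error of measurement SD × √(1 − r); treat the numbers as illustrative. The mean (4.2) and SD (0.7) come from the example table; the observed score of 4.7 is picked only for illustration.

```python
import math

def true_score(observed, mean, reliability):
    """mean + (observed - mean) x reliability: pulls the observed score
    toward the group mean when reliability is low."""
    return mean + (observed - mean) * reliability

def confidence_interval(observed, mean, sd, reliability, z=1.96):
    """95% interval around the true score.
    ASSUMPTION: half-width = z * SD * sqrt(1 - reliability), i.e. the
    standard error of measurement; the text's exact formula may differ."""
    center = true_score(observed, mean, reliability)
    half_width = z * sd * math.sqrt(1 - reliability)
    return center - half_width, center + half_width

MEAN, SD, OBSERVED = 4.2, 0.7, 4.7  # mean and SD from the example table

for r in (0.45, 0.71):  # one rater vs. the average of three raters
    low, high = confidence_interval(OBSERVED, MEAN, SD, r)
    print(f"r = {r:.2f}: {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Under this assumption the three-rater interval is noticeably narrower than the single-rater one, which is the point the example makes about precision.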
o To summarize, as this example shows, using more raters, all else equal,
improves interrater reliability, which directly translates into more precise
estimates of performance.
D. Strategy 3: Understand How Raters Process Information
A third way to improve performance ratings is to understand how raters
think: what else influences ratings besides an employee’s performance?
Research exploring how raters process information about the performance of
the people they rate indicates that the following processes occur:
o The rater observes the behavior of a ratee.
o The rater encodes this behavior as part of a total picture of the ratee, i.e.,
Quite unintentionally, this process can produce errors, and they can occur at
any stage.
Errors in the Rating Process
o Studies show that performance actually does play an important role,
perhaps the major role, in determining how a supervisor rates a
subordinate.
Common Errors in Appraising Performance: Criterion Contamination
o Criterion contamination, or allowing non-performance factors to affect
performance scores, occurs in every company and every job, and probably
affects everyone sometime during their careers.
o One survey of 1,816 organizations reported that only 4.6% of the
o Employees, quite naturally, will be reluctant to have pay systems tied to
such error-ridden performance ratings.
o There are several factors that lead raters to give inaccurate appraisals:
Guilt
Embarrassment about giving praise
o Companies and researchers alike have expended considerable time and
money to identify ways job performance can be measured better.
Errors in Observation (Attention)
o Generally, researchers have varied three types of input information to see
what raters pay attention to when they are collecting information for
performance appraisals.
o First, it appears that raters are influenced by general appearance
characteristics of the ratees.
o Researchers also look at change in performance over time to see if this
influences performance ratings.
(average) of performance is controlled.
o Workers who start out high in performance and then get worse are rated
lower than workers who remain consistently low.
Errors in Storage and Recall
o Research suggests that raters store information in the form of traits.
More importantly, they tend to recall information in the form of trait
categories.
For example, a rater observes a specific behavior such as an employee
resting during work hours.
o The entire rating process may be heavily influenced by the trait categories
that the rater adopts, regardless of their accuracy.
o Errors in storage and recall also appear to arise from memory decay.
Errors in the Actual Evaluation
o The context of the actual evaluation process also can influence
evaluations.
Several researchers indicate that the purpose of an evaluation affects
the rating process.
o Supervisors also tend to weigh negative attributes more heavily than
positive attributes.
o If the purpose of evaluation is to divide up a fixed pot of merit increases,
ratings also tend to be less accurate.
Supervisors who know ratings will be used to determine merit