11 – 1 Compensation – Thirteenth Edition Gerhart │Newman │Milkovich

CHAPTER ELEVEN

PERFORMANCE APPRAISALS

Overview

This chapter discusses the difficulties associated with measuring performance, particularly

when using subjective procedures. Performance reviews are used for a wide variety of

organizational decisions, one of which is to guide the allocation of merit increases.

Unfortunately, the link between performance ratings and organizational outcomes may be

lacking. Performance ratings—things entered into an employee’s record—are influenced by

numerous factors besides the employee behaviors observed by raters. These factors include:

The central focus of this chapter is on the strategies to improve the understanding and

measurement of job performance. These strategies address the following issues:

• The various appraisal formats and suggestions to improve them

• How to select the right raters

Next, the key elements of the performance evaluation process that ensure a good outcome in

the appraisal process are outlined. Legal issues—Equal Employment Opportunity (EEO) and

Learning Objectives

• Define the role of performance appraisals including performance metrics.

• Identify strategies for better understanding and measuring job performance.

Lecture Outline: Overview of Major Topics

I. The Role of Performance Appraisals in Compensation Decisions

A. Performance Metrics

II. Strategies for Better Understanding and Measuring Job Performance

A. The Balanced Scorecard Approach

B. Strategy 1: Improve Appraisal Formats

C. Strategy 2: Select the Right Raters

III. Putting It All Together: The Performance Evaluation Process

IV. Equal Employment Opportunity and Performance Evaluation

V. Tying Pay to Subjectively Appraised Performance

A. Performance- and Position-Based Guidelines

B. Designing Merit Guidelines

VI. Your Turn: Performance Appraisal at American Energy Development

VII. Appendix 11-A: Balanced Scorecard Example: Department of Energy (Federal Personal

Property Management Program)

VIII. Appendix 11-B: Sample Appraisal Form for Leadership Dimension: Pfizer Pharmaceutical

Lecture Outline: Summary of Key Chapter Points

I. The Role of Performance Appraisals in Compensation Decisions

• Performance reviews are used for a wide variety of decisions in organizations—only

one of which is to guide the allocation of merit increases.

o Unfortunately, the link between performance ratings and these outcomes is not

• Performance ratings, the things employers enter into an employee’s permanent

record, are influenced by a host of factors besides the employee behaviors observed

by raters:

o Organization values (e.g., valuing technical skills or interpersonal skills more

highly)

o Competition among departments

• Thus, employees often voice frustration about the appraisal process.

o The biggest complaint from employees (and managers too) is that appraisals are

A. Performance Metrics

• There have been huge advances in the development of metrics.

o The first observation is that pay for performance programs evolve along

multiple dimensions.

o Finding good results-oriented measures at the individual level is very

difficult.

o Just because something is quantifiable, though, does not mean it is an

objective measure of performance.

▪ Such potential for subjectivity has led some experts to warn that so-

called objective data can be criterion-deficient and might not provide

all the details.

o Despite these concerns, most HR professionals probably would prefer to

work with quantitative data.

• One of the biggest attacks against appraisals in general, and subjective

appraisals in particular, comes from top names in the total quality

management area.

o W. Edwards Deming contended that the work situation (not the individual) is

the major determinant of performance.

▪ Variation in performance arises many times because employees don’t

• Some experts argue that rather than throwing out the entire appraisal process,

total quality management principles should be applied to improve it.

o One way to improve performance appraisals would be to recognize that

II. Strategies for Better Understanding and Measuring Job Performance

• Efforts to improve the performance rating process take several forms.

o First, researchers and compensation people alike devote considerable energy to

defining job performance: what exactly should be measured when evaluating

employees?

o Managers can be grouped into one of three categories, based on the types of

employee behaviors they focus on.

▪ One group looks strictly at task performance, how the employees perform the

o Studies that examine more specific factors focus on such performance dimensions

as:

▪ Planning and organizing

A. The Balanced Scorecard Approach

• A balanced scorecard approach is a way to look at what contributes value in

an organization.

o It acknowledges that bottom-line success depends on satisfied customers

buying products and services from effective and satisfied employees who

both serve the customers and produce goods (or deliver services) in the

most operationally efficient way possible.

o If this is true, then employers need to measure all four of the following

dimensions and be prepared to say that success depends on high scores for

each:

▪ Customer satisfaction

• Besides the widespread enthusiasm in industry for this approach, there are data

that suggest implementation of a balanced scorecard can have positive

impacts on the bottom line and on rating accuracy.

o Appendix 11-A shows a balanced scorecard used by the Department of

Energy.

• A second direction for performance research notes that the definition of

performance and its components is expanding.

o Jobs are becoming more dynamic, and the need for employees to adapt

• A third direction for improving the quality of performance ratings centers on

identifying the best appraisal format.

• The fourth direction identifies possible groups of raters (supervisors, peers,

subordinates, customers, self) and examines whether a given group provides

job performance and translate it into performance ratings.

• Finally, data also suggest that raters can be trained to increase the accuracy of

their ratings.

B. Strategy 1: Improve Appraisal Formats

Types of Formats

o Evaluation formats can be divided into two general categories:

▪ Ranking

▪ Rating

o Ranking formats require that the rater compare employees against each

other to determine the relative ordering of the group on some performance

measure.

▪ Exhibit 11.1 illustrates three different methods of ranking employees.

• The straight ranking procedure is just that: employees are ranked

relative to each other.

• Alternation ranking recognizes that raters are better at ranking

people at the extreme ends of the distribution.

• The paired-comparison ranking method forces raters to make

ranking judgments about discrete pairs of people.

o Each individual is compared separately with all others in the

work group.
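
The paired-comparison procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own procedure: the employee names and the rater's judgments are hypothetical, and the function name `paired_comparison_ranking` is invented for this sketch. Each pair is judged once, and the person who wins the most comparisons is ranked at the top of the group.

```python
from itertools import combinations


def paired_comparison_ranking(employees, judgments):
    """Rank employees by how many pairwise comparisons each one wins.

    judgments maps each pair (a, b), in the order produced by
    itertools.combinations(employees, 2), to the name of the winner.
    """
    wins = {name: 0 for name in employees}
    for pair in combinations(employees, 2):
        wins[judgments[pair]] += 1
    # Most wins first; ties keep their original order (sorted is stable).
    ranking = sorted(employees, key=lambda name: wins[name], reverse=True)
    return ranking, wins


# Hypothetical work group and hypothetical rater judgments.
employees = ["Ana", "Ben", "Cy", "Dee"]
judgments = {
    ("Ana", "Ben"): "Ana",
    ("Ana", "Cy"): "Cy",
    ("Ana", "Dee"): "Ana",
    ("Ben", "Cy"): "Cy",
    ("Ben", "Dee"): "Dee",
    ("Cy", "Dee"): "Cy",
}

ranking, wins = paired_comparison_ranking(employees, judgments)
print(ranking)  # ['Cy', 'Ana', 'Dee', 'Ben']
print(wins)     # {'Ana': 2, 'Ben': 0, 'Cy': 3, 'Dee': 1}
```

A group of n employees requires n(n - 1)/2 judgments (six for the four people here), so the number of comparisons grows quickly as the work group gets larger.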

o The second category of appraisal formats, ratings, is generally more

popular than ranking systems.

▪ The popularity is not supported by evidence that rating formats are

particularly valid.

o The various rating formats have two elements in common.

▪ First, in contrast to ranking formats, rating formats require raters to

evaluate employees on some absolute standard rather than relative to

other employees.

o It is the types of descriptors used in anchoring this continuum that provide

the major difference in rating scales. These descriptors may be:

▪ Adjectives

o When adjectives are used as anchors, the format is called a standard

rating scale.

▪ Exhibit 11.2 shows a typical rating scale with adjectives as anchors.

o Switching to behaviors as anchors, behaviorally anchored rating scales

(BARS) seem to be the most common format using behaviors as

descriptors.

▪ By anchoring scales with concrete behaviors, firms adopting a BARS

format hope to make evaluations less subjective.

▪ Overall employee performance is calculated as a weighted average of

the ratings on all performance dimensions in both the standard rating

scale and BARS.
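
The weighted-average calculation mentioned above might look like the following sketch. The dimension names, ratings, and weights are hypothetical and chosen only for illustration; an organization would substitute its own dimensions and weights.

```python
def overall_rating(ratings, weights):
    """Weighted average of dimension ratings.

    ratings and weights are dicts keyed by performance dimension.
    Weights need not sum to 1; they are normalized here.
    """
    total_weight = sum(weights[dim] for dim in ratings)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(ratings[dim] * weights[dim] for dim in ratings)
    return weighted_sum / total_weight


# Hypothetical dimensions, ratings (1-5 scale), and weights.
ratings = {"leadership": 4, "job knowledge": 5, "work output": 3}
weights = {"leadership": 0.2, "job knowledge": 0.3, "work output": 0.5}

overall = overall_rating(ratings, weights)
print(round(overall, 2))  # 3.8
```

Because "work output" carries half the total weight in this example, the low rating on that dimension pulls the overall score below the simple average of 4.0.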

o In addition to adjectives and behaviors, outcomes also are used as a

standard. The most common form is management by objectives

(MBO).

▪ As a first step, organization objectives are identified from the strategic

plan of the company.

▪ Each successively lower level in the organizational hierarchy is

charged with identifying work objectives that will support attainment

of organizational goals.

▪ Results are then compared against objectives, and a performance

rating is determined based on how well the objectives were met.

▪ A review of firms using MBO indicates generally positive

improvements in performance both for individuals and for the

organization.

o A final type of appraisal format does not easily fall into any of the

categories yet discussed.

▪ In an essay format, supervisors answer open-ended questions, in essay

form, describing employee performance.

Evaluating Performance Appraisal Formats

o A good performance appraisal format scores well on five dimensions:

▪ Employee development potential (amount of feedback about

performance that the format offers)

▪ Administrative ease

The five main criteria are explained below.

▪ Employee development criterion

• Feedback has a positive impact on job performance.

▪ Administrative criterion

• Ease of use of evaluation results for administrative decisions

concerning wage increases, promotions, demotions, terminations,

• Typically, this is a numerical rating of performance.

▪ Personnel research criterion

• Does the instrument lend itself well to validating employment

tests?

▪ Cost criterion

• Does the evaluation form initially require a long time to be

developed?

• Is it time-consuming for supervisors to use the form in rating their

employees?

• Is it expensive to use?

▪ Validity criterion

• By far the most research on formats in recent years has focused on

o Exhibit 11.7 provides a report card on the five most common rating

formats relative to the criteria just discussed.

o The choice of an appraisal format is dependent on the types of tasks being

performed.

▪ Tasks can be ordered along a continuum from those that are very

routine to those for which the appropriate behavior for goal

tasks that meet the assumptions for that format.

▪ At one extreme of the continuum are behavior-based evaluation

procedures that define specific performance expectations against

which employee performance is evaluated.

▪ At the other extreme of the continuum are tasks that are highly

uncertain in nature.

C. Strategy 2: Select the Right Raters

• A second way that firms have tried to improve the accuracy of performance

ratings is by focusing on who might conduct the ratings and which of these

sources is more likely to be accurate.

• To lessen the impact of one reviewer, and to increase participation in the

• Regardless of the positive responses from those who have implemented the

360-degree feedback system, today most companies still use it only for

evaluation of their top-level personnel and for employee development rather

than for appraisals or pay decisions.

Supervisors as Raters

o Some estimates indicate that more than 80% of the input for performance

ratings comes from supervisors.

is required for any given level of performance rating.

o On the negative side, supervisors are particularly prone to halo and

leniency errors.

Peers as Raters

o One of the major strengths of using peers as raters is that they work more

closely with the ratee and probably have an undistorted perspective of

typical performance, particularly in group assignments.

o Balanced against this positive are at least two powerful negatives:

▪ Peers may have little or no experience in conducting appraisals,

Self as Rater

o Some organizations have experimented with self-ratings.

o Self-ratings are done by someone who has the most complete knowledge

about the ratee’s performance.

Customer as Rater

o This is the era of the customer.

Subordinate as Rater

o The notion of subordinates as raters is appealing since most employees

want to be successful with the people who report to them.

o Hearing how they are viewed by their subordinates gives them the chance

both to see their strengths and weaknesses as leaders and to modify

their behavior.

o The difficulty with this type of rating is in attaining candid reviews and

also in counseling the ratee on how to deal with the feedback.

o Research shows that subordinates prefer to give their feedback to

managers anonymously.

o If their identity is known, they give artificially inflated ratings of their

supervisors.

Interrater Reliability (and Multiple Raters)

o As we saw in Chapter 6, an important criterion for any measure is that

variance in scores due to error be minimized and variance in scores due to

true differences be maximized.

o Reliability is defined as true variance divided by the sum of true variance

and error variance.

▪ One clear path to maximizing reliability when ratings are used (interrater

reliability) is to use multiple raters.

▪ There is a formula known as the Spearman-Brown prophecy formula

which tells how interrater reliability will change as we add more

raters. See Exhibit 11.8.

$$ r_{kk} = \frac{k \, r_{ij}}{1 + (k - 1) \, r_{ij}} $$

▪ Where $r_{kk}$ is the expected interrater reliability.

▪ $k$ is the number of raters to be used.

▪ As an example, in the following data, three raters each rate the same

five employees.

• The interrater correlations, $r_{ij}$, are:

estimated by Spearman-Brown would be:

$$ r_{11} = \frac{1(.45)}{1 + (1 - 1)(.45)} = .45 $$

• If, however, we were to use the average of all three ratings for each

$$ r_{33} = \frac{3(.45)}{1 + (3 - 1)(.45)} = .71 $$

o As an aside, because each rater rates the same five employees,

we can determine here that the three raters differ in their

Employee    Performance Rating Assigned by:      Mean
            Rater 1    Rater 2    Rater 3
A           3          4          5          4.0
B           4          5          4          4.3
C           2          3          4          3.0
D           4          6          5          4.7
E           5          4          5          4.7
Mean        3.6        4.4        4.6        4.2

SD: 1.0, 0.5, 0.7
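
As a rough check on the example, the sketch below computes the pairwise interrater correlations from the individual ratings in the table above and applies the Spearman-Brown prophecy formula. It assumes that the .45 used in the example is the mean of the three pairwise Pearson correlations; the function names are invented for this sketch.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def spearman_brown(r_ij, k):
    """Expected reliability of the average of k raters."""
    return k * r_ij / (1 + (k - 1) * r_ij)


# Ratings of employees A-E, taken from the table.
ratings = {
    "Rater 1": [3, 4, 2, 4, 5],
    "Rater 2": [4, 5, 3, 6, 4],
    "Rater 3": [5, 4, 4, 5, 5],
}

pairwise = [pearson(ratings[a], ratings[b]) for a, b in combinations(ratings, 2)]
mean_r = mean(pairwise)

print(round(mean_r, 2))                     # 0.45
print(round(spearman_brown(mean_r, 1), 2))  # 0.45 (a single rater)
print(round(spearman_brown(mean_r, 3), 2))  # 0.71 (average of three raters)
```

Under that assumption, the results agree with the .45 and .71 figures used in the example.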

o To see the consequences of different interrater reliabilities, we can

compute confidence intervals.

▪ As a first step, we compute the so-called true score, which equals

mean score + (observed score – mean score) × interrater reliability

• SD is the standard deviation of the mean score of each employee.

▪ To form a 95% confidence interval, we use the formula

• This is a wide interval. What if we used the average of three ratings

instead?

o We know that this produces an interrater reliability of .71 using

Spearman-Brown.

o To summarize, as this example shows, using more raters, all else equal,

improves interrater reliability, which directly translates into more precise

estimates of performance.
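
The effect of reliability on precision can be illustrated with a short sketch. The true-score formula is the one given above. The interval formula is an assumption: it uses the standard error of measurement, SD × sqrt(1 − reliability), a common psychometric convention that may differ from the formula the chapter uses. The observed score of 5.0 is hypothetical, the mean (4.2) and SD (0.7) are taken from the example data, and the same SD is applied to both cases for simplicity.

```python
import math


def estimated_true_score(observed, group_mean, reliability):
    """True-score estimate as described in the text."""
    return group_mean + (observed - group_mean) * reliability


def confidence_interval(observed, group_mean, sd, reliability, z=1.96):
    """Approximate 95% interval around the estimated true score.

    ASSUMPTION: half-width = z * SD * sqrt(1 - reliability), i.e. the
    standard error of measurement; not necessarily the chapter's formula.
    """
    center = estimated_true_score(observed, group_mean, reliability)
    half_width = z * sd * math.sqrt(1.0 - reliability)
    return center - half_width, center + half_width


group_mean, sd = 4.2, 0.7  # from the example data
observed = 5.0             # hypothetical observed score

one_rater = confidence_interval(observed, group_mean, sd, reliability=0.45)
three_raters = confidence_interval(observed, group_mean, sd, reliability=0.71)

width_one = one_rater[1] - one_rater[0]
width_three = three_raters[1] - three_raters[0]
print(round(width_one, 2), round(width_three, 2))
```

Whatever interval formula is used, the qualitative point is the same: the higher reliability obtained by averaging three ratings yields a narrower interval than a single rating does.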

D. Strategy 3: Understand How Raters Process Information

• A third way to improve performance ratings is to understand how raters

think—what else influences ratings besides an employee’s performance?

• Research exploring how raters process information about the performance of

the people they rate indicates that the following kinds of processes occur.

o The rater observes the behavior of a ratee.

o The rater encodes this behavior as part of a total picture of the ratee, i.e.,

• Quite unintentionally, this process can produce errors, and they can occur at

any stage.

Errors in the Rating Process

o Studies show that performance actually does play an important role,

perhaps the major role, in determining how a supervisor rates a

subordinate.

Common Errors in Appraising Performance: Criterion Contamination

o Criterion contamination, or allowing non-performance factors to affect

performance scores, occurs in every company and every job, and probably

affects everyone sometime during their careers.

o One survey of 1,816 organizations reported that only 4.6% of the

o Employees, quite naturally, will be reluctant to have pay systems tied to

such error-ridden performance ratings.

o There are several factors that lead raters to give inaccurate appraisals:

▪ Guilt

▪ Embarrassment about giving praise

o Companies and researchers alike have expended considerable time and

money to identify ways job performance can be measured better.

Errors in Observation (Attention)

o Generally, researchers have varied three types of input information to see

what raters pay attention to when they are collecting information for

performance appraisals.

o First, it appears that raters are influenced by general appearance

characteristics of the ratees.

o Researchers also look at change in performance over time to see if this

influences performance ratings.

(average) of performance is controlled.

o Workers who start out high in performance and then get worse are rated

lower than workers who remain consistently low.

Errors in Storage and Recall

o Research suggests that raters store information in the form of traits.

▪ More importantly, they tend to recall information in the form of trait

categories.

▪ For example, a rater observes a specific behavior such as an employee

resting during work hours.

o The entire rating process may be heavily influenced by the trait categories

that the rater adopts, regardless of their accuracy.

o Errors in storage and recall also appear to arise from memory decay.

Errors in the Actual Evaluation

o The context of the actual evaluation process also can influence

evaluations.

▪ Several researchers indicate that the purpose of an evaluation affects

the rating process.

o Supervisors also tend to weigh negative attributes more heavily than

positive attributes.

o If the purpose of evaluation is to divide up a fixed pot of merit increases,

ratings also tend to be less accurate.

▪ Supervisors who know ratings will be used to determine merit