Chapter 19 Lecture Note

Authors: Barry J. Babin, Jon C. Carr, Mitch Griffin, William G. Zikmund

Part Six
Data Analysis and Presentation
Chapter 19
Editing and Coding: Transforming Raw Data into
Information
AT-A-GLANCE
I. Stages of Data Analysis
II. Editing
A. Field editing
B. In-house editing
Illustrating inconsistency – fact or fiction?
Take action when response is obviously an error
Editing technology
C. Editing for completeness
D. Editing questions answered out of order
E. Facilitating the coding process
Editing and tabulating “don’t know” answers
F. Pitfalls of editing
G. Pretesting edit
III. Coding
A. Coding qualitative responses
Unstructured qualitative responses (Long interview)
Structured qualitative responses
Data file terminology
B. The data file
C. Code construction
D. Precoding fixed-alternative questions
E. More on coding open-ended questions
F. Devising a coding scheme
G. Code book
H. Editing and coding combined
I. Computerized survey data processing
J. Error checking
LEARNING OUTCOMES
1. Know when a response is really an error and should be edited
2. Appreciate coding of pure qualitative research
3. Understand the way data are represented in a data file
4. Understand the coding of structured responses including a dummy variable approach
5. Appreciate the ways that technological advances have simplified the coding process
CHAPTER VIGNETTE: Coding What a Person’s Face “Says”
Technological advances now allow business researchers to collect and code data based not on what people say, but on what their faces "say." Sensory Logic uses the Facial Action Coding System (FACS). Eye movement and facial coding have advanced to the point where respondents' physical data can be captured in real time for research purposes. Facial coding reveals a person's engagement, positive and negative emotional states given a particular stimulus, and the impact or appeal of what the person is responding to. Eye tracking can tell researchers exactly what a person is looking at and, based on almost imperceptible muscle changes in facial expressions, code the person's emotional state. FACS is used in a number of consumer and market research environments.
SURVEY THIS!
How are data entry, editing, and coding made easier by a Qualtrics-type data approach relative to a paper-and-pencil survey approach? Do any of the questions in the survey present particular coding problems? Can any be coded using dummy coding? What type of coding would you suggest for the question about your boss and animals shown here?
RESEARCH SNAPSHOTS
Do You Have Integrity?
Data integrity is essential to successful research and decision making. Sometimes this is a question of ethics (e.g., an interviewer or coder simply makes up data), but data integrity can also suffer simply because the data are edited or coded poorly. Coding should be consistent, and this is particularly important for companies that share or sell secondary data. Occupations need a common coding scheme, just as product classes, industries, and numerous other data values do. Fortunately, standard codes exist (e.g., NAICS and SIC codes and the postal service guidelines). Without a standardized approach, analysts may never be quite sure what they are looking at from one data set to another.
Building a Multi-petabyte Data System
What is a petabyte? It is 1,000,000 gigabytes. Who would need such a large data system? The world's largest retailer, Walmart, with over 800 million transactions tied to over 30 million customers each day. The design of the data system is a critical need for Walmart and is key to its success. Walmart appears to have made the investments needed to grow its data warehouse into the future; there are even plans for data marts, which are smaller, subject-specific data systems that can handle the needs of a particular business area.
Coding Data “On-the-Go”
Data collection used to require workers to stop what they were doing to enter data into a system. Now data can be entered hands-free through voice commands using Vangard's AccuSpeech and Mobile Voice Platform (MVP), a mobile enterprise system that uses cellular phone technology and proprietary voice-recognition software to execute commands that store, code, or recode data.
OUTLINE
I. STAGES OF DATA ANALYSIS
Raw data are recorded just as the respondent indicated, and they may not be in a form that lends itself well to data analysis.
Raw data will often also contain errors both in the form of respondent errors and
nonrespondent errors (i.e., errors made by an interviewer or by a person creating an
electronic data file of responses).
Exhibit 19.1 provides an overview of data analysis.
The first two stages (editing and coding) result in an electronic file suitable for data analysis.
An important part of the editing, coding, and filing stages is checking for errors.
Data integrity refers to the notion that the data file actually contains the information that the
researcher promised the decision maker.
II. EDITING
Fieldwork often produces data containing mistakes.
Sometimes, responses may be contradictory.
Editing is the process of checking and adjusting the data for omissions, legibility, and
consistency.
At times, the editor may need to reconstruct data.
Field Editing
Field supervisors often are responsible for conducting preliminary field editing on the
same day as the interview.
Field editing is used to:
1. Identify technical omissions such as a blank page on an interview form.
2. Check legibility of handwriting for open-ended responses.
3. Clarify responses that are logically or conceptually inconsistent.
Particularly useful when personal interviews have been used to gather data.
May also be used to spot the need for further interviewer training or to correct faulty
procedures.
In-House Editing
Early reviewing of the data is not always possible.
In-house editing rigorously investigates the results of data collection.
The research supplier or the research department normally has a centralized office staff to
perform the editing and coding function.
Illustrating Inconsistency – Fact or Fiction?
Consider a situation in which a telephone interviewer has been instructed to interview
only registered voters in a state where voters must be at least 18 years old.
If the editor’s review indicates that one respondent was only 17 years old, the editor’s
task is to correct this mistake by deleting this response because this respondent
should never have been considered as a sampling unit.
The sampling units (respondents) should all be consistent with the defined
population.
The editor should also check for consistency within the data collection framework.
Take Action When Response Is Obviously an Error
In all but the most obvious situations, a change should be made only when multiple pieces of evidence indicate that a response is a mistake and when the likely true response is obvious.
A data record may sometimes contain data on variables that the respondent should
never have been asked.
The editor may check other responses to make sure that the screening question
was answered accurately.
Editing Technology
Computer routines can check for inconsistencies automatically.
For electronic questionnaires, rules can be entered that prevent inconsistent responses from ever being stored in the file used for data analysis.
In fact, the rules can even be preprogrammed to prevent many inconsistent
responses.
Electronic questionnaires can also prevent a respondent from being directed to
the wrong set of questions based on a screening question response.
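To make this concrete, below is a minimal sketch of the kind of automated consistency checks such routines perform. It is written in Python, and the record layout, field names, and edit rules are invented for the example.
```python
# A sketch of automated in-house edit checks over a small data file,
# here a list of dictionaries; fields and rules are hypothetical.

records = [
    {"id": 1, "age": 17, "registered_voter": "Y", "smokes": "N", "brand": "Brand A"},
    {"id": 2, "age": 34, "registered_voter": "Y", "smokes": "Y", "brand": "Brand B"},
]

def edit_checks(record):
    """Return a list of consistency problems found in one record."""
    problems = []
    # Screening rule: only registered voters aged 18+ belong in the sample.
    if record["registered_voter"] == "Y" and record["age"] < 18:
        problems.append("under-age respondent recorded as a registered voter")
    # Skip-pattern rule: nonsmokers should have no cigarette brand recorded.
    if record["smokes"] == "N" and record["brand"]:
        problems.append("brand recorded for a nonsmoker")
    return problems

for rec in records:
    for problem in edit_checks(rec):
        print(f"Record {rec['id']}: {problem}")
```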
Editing for Completeness
In some cases the respondent may have answered only one portion of a two-part question.
Item nonresponse is the technical term for unanswered questions on an otherwise
complete questionnaire.
Specific decision rules for handling this problem should be meticulously outlined in the
editor’s instructions.
In many situations the decision rule is to do nothing with the missing data and simply
leave the item blank.
However, when the relationship between two questions is important, the editor may insert
a plug value, which might be an average or neutral value.
Several choices are available (see the sketch after this subsection):
1. Leave the response blank; this is not a bad option unless a response for that particular respondent is crucial, which is rarely the case.
2. Plug in alternating choices for missing data (e.g., yes the first time, no the second time, yes the third time, and so forth).
3. Randomly select an answer.
4. Impute the missing value based on the respondent's choices to other questions; this is a good option if the response is important or if the effective sample size would be too small after deleting all respondents with missing responses.
Missing data were a bigger problem when many statistical software programs required complete data for an analysis to run.
Other routines may require that an entire sampling unit be eliminated from analysis if
even a single response is missing (list-wise deletion).
Today, most statistical programs can accommodate an occasional missing response
through the use of pairwise deletion, which means the data that the respondent did
provide can still be used in statistical analysis.
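The pandas sketch below walks through these options: leaving the item blank, list-wise deletion, pair-wise use of whatever each respondent provided, and imputing a plug value. The variables and values are hypothetical, and pandas is used only as one convenient tool, not as the textbook's own procedure.
```python
# A sketch of common item-nonresponse treatments; columns are invented.
import pandas as pd

df = pd.DataFrame({
    "satisfaction": [5, 4, None, 3],
    "loyalty":      [4, None, 2, 3],
})

# Option 1: leave the item blank (stored as NaN, a missing-value marker).
print(df)

# List-wise deletion: drop any respondent with a missing value anywhere.
print(df.dropna())

# Pair-wise use: each statistic uses the data each respondent did provide.
print(df["satisfaction"].mean(), df["loyalty"].mean())

# Imputation with a plug value, here the column mean.
print(df.fillna(df.mean()))
```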
Editing Questions Answered Out of Order
Another task an editor may face is rearranging the answers given to open-ended questions (e.g., in a focus group interview).
If the editor is asked to list answers to all questions in a specific order, the editor may
move certain answers to the section related to the skipped question.
Facilitating the Coding Process
While all the previously described editing activities will help the coders, several editing
procedures are specifically designed to simplify the coding process.
Editing and Tabulating “Don’t Know” Answers
In many situations the respondent will answer “don’t know.”
A legitimate “don’t know” response is the same as “no opinion.”
A reluctant “don’t know” is given when an individual simply does not want to answer
a question.
If the individual does not understand the question, he or she may give a confused
“don’t know” answer.
In some situations the editor can separate the legitimate “don’t knows” from the other
“don’t knows.”
The editor may try to identify the meaning of the “don’t know” answer from other
data provided on the questionnaire.
Pitfalls of Editing
Subjectivity can enter into the editing process.
Data editors should be intelligent, experienced, and objective.
A systematic procedure for assessing the questionnaires should be developed by the
research analyst so that the editor has clearly defined decision rules to follow.
Pretesting Edit
Editing questionnaires during the pretest stage can prove very valuable.
May identify poor instructions or inappropriate question wording.
III. CODING
Editing may be differentiated from coding, which is the assignment of numerical scores or
classifying symbols to previously edited data.
Careful editing can make coding easier.
Codes are meant to represent the meaning in the data.
Assigning numerical symbols permits the transfer of data from questionnaires or interview
forms to a computer.
Codes are often, but not always, numerical symbols; however, they are more broadly defined as rules for interpreting, classifying, and recording the data.
In qualitative research, numbers are seldom used for codes.
Coding Qualitative Responses
Unstructured Qualitative Responses (Long Interview)
Qualitative coding was introduced in Chapter 7 (e.g., hermeneutic units, networks, or grounded theory).
The codes are usually words or phrases that represent themes.
Structured Qualitative Responses
Qualitative responses to structured questions (e.g., yes/no) can be stored in a data file with letters (e.g., "Y" or "N") or as numbers; even when numbers are used, the variable is classificatory, simply separating the positive from the negative responses.
The researcher may consider adopting dummy coding for dichotomous responses (e.g., yes/no), assigning "0" to one category and "1" to the other.
Dummy coding provides the researcher with more flexibility in how structured,
qualitative responses are analyzed statistically.
Because a dummy variable can only represent two categories, multiple dummy
variables are needed to represent a single qualitative response that can take on more
than two categories.
The rule is that if k is the number of categories for a qualitative variable, k-1 dummy
variables are needed to represent the variable.
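The k - 1 rule is easy to see in code. Here is a minimal pandas illustration; the variable and its three categories are made up for the example.
```python
# Dummy coding a three-category (k = 3) qualitative variable.
import pandas as pd

responses = pd.DataFrame({"region": ["North", "South", "West", "North"]})

# drop_first=True leaves k - 1 = 2 dummy variables; the dropped
# category ("North") is the baseline, coded 0 on both dummies.
dummies = pd.get_dummies(responses["region"], drop_first=True)
print(dummies)
```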
Data File Terminology
Most terminology describing files goes back to the early days of computers, which
produced results that were stored on actual computer cards.
Researchers organize coded data into fields, records, and files.
A field is a collection of characters (a character is a single number, letter, or special
symbol such as a question mark) that represents a single type of data, usually a
variable.
Text variables are represented by string characters, which is computer terminology for a series of alphabetic (nonnumeric) characters that may form a word. String characters often fill long fields of eight or more characters.
In contrast, a dummy variable is a numeric variable that needs only 1 character to
form a field.
A record is a collection of related fields, and was the way a single, complete
computer card was represented.
Researchers may use the term to refer to one respondent’s data.
A data file is a collection of related records that make up a data set.
Value labels are extremely useful and allow a word or short phrase to be associated
with numeric coding.
The Data File
Data are generally stored in a matrix that resembles a common spreadsheet file.
The data file stores data from a research project and is typically represented as a rectangular arrangement (matrix) of data in rows and columns.
Typically, each row represents a respondent’s scores on each variable and each
column represents a variable for which there is a value for every respondent.
A spreadsheet like Excel is an acceptable way to store a data file, and increasingly, statistical programs (e.g., SPSS, SAS, and others) can work easily with an Excel spreadsheet.
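As a small sketch, the Python snippet below builds such a matrix with one row per respondent and one column per variable, and attaches a value label to a numeric code; all names and values are invented.
```python
# A respondent-by-variable data matrix with value labels; hypothetical data.
import pandas as pd

data = pd.DataFrame({
    "id":           [1, 2, 3],   # questionnaire number
    "sex":          [1, 2, 1],   # 1 = Male, 2 = Female
    "satisfaction": [5, 3, 4],   # 1-5 rating scale
})

# Value labels associate a word with each numeric code.
sex_labels = {1: "Male", 2: "Female"}
data["sex_label"] = data["sex"].map(sex_labels)
print(data)
```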
Code Construction
There are two basic rules for code construction:
1. Coding categories should be totally exhaustive, meaning that a coding category
should exist for all possible responses.
2. Coding categories should be mutually exclusive (independent), meaning that there
should be no overlap among the categories.
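Both rules can be illustrated with a simple bracketing example; the income brackets below are hypothetical. The bins are exhaustive (zero to infinity covers every possible income) and mutually exclusive (each value falls into exactly one bracket).
```python
# Coding income into categories that are exhaustive and mutually exclusive.
import pandas as pd

incomes = pd.Series([12_000, 48_000, 95_000, 250_000])

codes = pd.cut(
    incomes,
    bins=[0, 25_000, 50_000, 100_000, float("inf")],  # covers all incomes
    labels=[1, 2, 3, 4],  # one code per non-overlapping bracket
)
print(codes)
```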
Precoding Fixed-Alternative Questions
When a questionnaire is highly structured, the categories may be precoded before the data
are collected (see Exhibit 19.5).
Users of web-based survey services receive a coded data file in the software of their
choice.
Precoding can be used if the researcher knows what answer categories exist before data
collection occurs.
In some cases, predetermined responses are based on standardized classification systems (e.g., occupation).
Computer-assisted telephone interviewing (CATI) requires precoding.
More on Coding Open-Ended Questions
The purpose of coding such questions is to reduce the large number of individual
responses to a few general categories of answers that can be assigned numerical codes.
Code construction reflects the judgment of the researcher.
A major objective in the code-building process is to accurately transfer the meanings
from written responses to numeric codes.
Experienced researchers recognize that the key idea in this process is that code building is
based on thoughts, not just words.
The end result of code building should be a list, in an abbreviated and orderly form, of all
the comments and thoughts given in answers to the questions.
Developing an appropriate code from the respondent’s exact comments is somewhat of an
art.
Test tabulation is the tallying of a small sample of the total number of replies to a
particular question, and the purpose is to preliminarily identify the stability and
distribution of answers that will determine a coding scheme.
During the coding procedure, the respondent’s opinions are divided into mutually
exclusive thought patterns.
After tabulating the basic responses, the researcher must determine how many answer categories are acceptable.
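A test tabulation can be as simple as a frequency count over a small sample of answers that have been reduced to draft themes, as in this Python sketch; the themes are invented.
```python
# Tally a small sample of draft-coded open-ended answers to see how
# responses distribute before fixing the final coding scheme.
from collections import Counter

draft_themes = [
    "price", "taste", "price", "convenience", "taste",
    "price", "packaging", "taste", "price", "convenience",
]

for theme, count in Counter(draft_themes).most_common():
    print(f"{theme}: {count}")
```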
Devising the Coding Scheme
A coding scheme should not be too elaborate.
The coder’s task is only to summarize the data.
A preliminary scheme having too many categories can always be collapsed or reduced
later in the analysis.
If initial coding is at too abstract a level and only a few categories are established,
revising the codes will be difficult.
Experienced coders group answers under generalized headings that are pertinent to the
research question.
Individual coders should give the same code to similar responses, so categories should be
sufficiently unambiguous.
Coding open-ended questions is a complex task, but with practice, and by using multiple coders so that consistency can be examined, one can become skilled at it.
Code Book
A code book gives each variable in the study and its location in the data matrix.
It provides a quick summary that is particularly useful when a data file becomes very
large.
Researchers commonly identify individual respondents by giving each an identification
number or questionnaire number, so that errors discovered in the tabulation process can
be checked on the questionnaire to verify the answer.
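In practice, a code book can be sketched as a simple mapping from each variable to its location and value labels, as in this hypothetical Python example.
```python
# A miniature code book: variable name -> location, description, labels.
codebook = {
    "id":  {"column": 1, "description": "questionnaire number"},
    "sex": {"column": 2, "description": "respondent sex",
            "values": {1: "Male", 2: "Female"}},
    "q1":  {"column": 3, "description": "overall satisfaction (1-5)",
            "values": {1: "Very dissatisfied", 5: "Very satisfied"}},
}

for variable, info in codebook.items():
    print(variable, "->", info["description"])
```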
Editing and Coding Combined
Frequently the person coding the questionnaire performs certain editing functions (e.g., translating an occupational title provided by the respondent into a code for socioeconomic status).
Computerized Survey Data Processing
In most studies with large sample sizes, a computer is used for data processing.
Data entry – the activity of transferring data from a research project to computers.
Several alternative means exist for entering data into a computer:
In studies involving highly structured paper-and-pencil questionnaires, an optical scanning system may be used to read material directly into the computer's memory from mark-sensed questionnaires.
When data are not optically scanned or directly entered into the computer the
moment they are collected, data processing begins with keyboarding.
A data entry process transfers coded data from the questionnaires or coding sheets
onto a hard drive.
Data entry workers may make errors, so the job should be verified by a second data
entry worker.
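A minimal sketch of that double-entry verification idea in Python follows; the keyed values are hard-coded stand-ins for two independently entered files.
```python
# Compare two independently keyed versions of the same data and flag
# disagreements for re-checking against the original questionnaire.
entry_1 = {1: {"q1": 5, "q2": 3}, 2: {"q1": 4, "q2": 2}}
entry_2 = {1: {"q1": 5, "q2": 3}, 2: {"q1": 4, "q2": 5}}

for resp_id, answers in entry_1.items():
    for question, value in answers.items():
        other = entry_2[resp_id][question]
        if other != value:
            print(f"Respondent {resp_id}, {question}: {value} vs {other} - verify")
```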
Error Checking
The final stage in the coding process is error checking and verification, or data cleaning,
to check for wild codes.
For example, coded values that lie outside the range of acceptable answers should be
identified.
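A short Python sketch of such a wild-code check follows; the variables and their acceptable ranges are hypothetical.
```python
# Flag coded values that fall outside each variable's acceptable range.
import pandas as pd

df = pd.DataFrame({"sex": [1, 2, 3], "satisfaction": [5, 9, 4]})
valid_ranges = {"sex": (1, 2), "satisfaction": (1, 5)}

for column, (low, high) in valid_ranges.items():
    wild = df[(df[column] < low) | (df[column] > high)]
    if not wild.empty:
        print(f"Wild codes in '{column}':")
        print(wild)
```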
