Chapter 19 Lecture Note

Authors: Barry J. Babin, Jon C. Carr, Mitch Griffin, William G. Zikmund

Part Six
Data Analysis and Presentation
Chapter 19
Editing and Coding: Transforming Raw Data into
Information
AT-A-GLANCE
I. Stages of Data Analysis
II. Editing
A. Field editing
B. In-house editing
Illustrating inconsistency – fact or fiction?
Take action when response is obviously an error
Editing technology
C. Editing for completeness
D. Editing questions answered out of order
E. Facilitating the coding process
Editing and tabulating “don’t know” answers
F. Pitfalls of editing
G. Pretesting edit
III. Coding
A. Coding qualitative responses
Unstructured qualitative responses (Long interview)
Structured qualitative responses
Data file terminology
B. The data file
C. Code construction
D. Precoding fixed-alternative questions
E. More on coding open-ended questions
F. Devising a coding scheme
G. Code book
H. Editing and coding combined
I. Computerized survey data processing
J. Error checking
LEARNING OUTCOMES
1. Know when a response is really an error and should be edited
2. Appreciate coding of pure qualitative research
3. Understand the way data are represented in a data file
4. Understand the coding of structured responses including a dummy variable approach
5. Appreciate the ways that technological advances have simplified the coding process
CHAPTER VIGNETTE: Coding What a Person’s Face “Says”
Technological advances now allow business researchers to collect and code data based not on what people say, but on what their faces "say." Sensory Logic uses the Facial Action Coding System (FACS). Eye movement and facial coding have advanced to the point where respondents' physical data can be captured in real time for research purposes. Facial coding reveals a person's engagement, positive and negative emotional states given a particular stimulus, and the impact or appeal of what the person is responding to. Eye tracking can tell researchers exactly what a person is looking at and, based on almost imperceptible muscle changes in facial expressions, code the person's emotional state. FACS is used in a number of consumer and market research environments.
SURVEY THIS!
How are data entry, editing, and coding made easier by a Qualtrics-type data approach relative to a paper-and-pencil survey approach? Do any of the questions in the survey present particular coding problems? Can any be coded using dummy coding? What type of coding would you suggest for the question about your boss and animals shown here?
RESEARCH SNAPSHOTS
Do You Have Integrity?
Data integrity is essential to successful research and decision making. Sometimes this is a question of ethics (e.g., an interviewer or coder simply makes up data), but data integrity can also suffer simply because the data are edited or coded poorly. Coding should be consistent, and this is particularly important for companies that share or sell secondary data. Occupations need a common coding scheme, just as product classes, industries, and numerous other data values do. Fortunately, standard codes exist (e.g., NAICS and SIC codes and the postal service guidelines). Without a standardized approach, analysts may never be quite sure what they are looking at from one data set to another.
Building a Multi-petabyte Data System
What is a petabyte? It is 1,000,000 gigabytes. Who would need such a large data system? The world's largest retailer, Walmart, with over 800 million transactions tied to over 30 million customers each day. The design of the data system is a critical need for Walmart and is key to its success. Walmart appears to have made the investments needed to grow its data warehouse into the future; there are even plans for data marts, which are smaller, subject-specific data systems that can handle the needs of a particular business area.
Coding Data “On-the-Go”
Data collection used to require workers to stop what they were doing to enter data into a system. Now data can be entered hands-free through voice commands using Vangard's AccuSpeech and Mobile Voice Platform (MVP), a mobile enterprise system that uses cellular phone technology and proprietary voice-recognition software to execute commands that store, code, or recode data.
OUTLINE
I. STAGES OF DATA ANALYSIS
Raw data are recorded just as the respondent indicated, and they may not be in a form that lends itself well to data analysis.
Raw data will often also contain errors both in the form of respondent errors and
nonrespondent errors (i.e., errors made by an interviewer or by a person creating an
electronic data file of responses).
Exhibit 19.1 provides an overview of data analysis.
The first two stages (editing and coding) result in an electronic file suitable for data analysis.
An important part of the editing, coding, and filing stages is checking for errors.
Data integrity refers to the notion that the data file actually contains the information that the
researcher promised the decision maker.
II. EDITING
Fieldwork often produces data containing mistakes.
Sometimes, responses may be contradictory.
Editing is the process of checking and adjusting the data for omissions, legibility, and
consistency.
At times, the editor may need to reconstruct data.
Field Editing
Field supervisors often are responsible for conducting preliminary field editing on the
same day as the interview.
Field editing is used to:
1. Identify technical omissions such as a blank page on an interview form.
2. Check legibility of handwriting for open-ended responses.
3. Clarify responses that are logically or conceptually inconsistent.
Particularly useful when personal interviews have been used to gather data.
May also be used to spot the need for further interviewer training or to correct faulty
procedures.
In-House Editing
Early reviewing of the data is not always possible.
In-house editing rigorously investigates the results of data collection.
The research supplier or the research department normally has a centralized office staff to
perform the editing and coding function.
Illustrating Inconsistency – Fact or Fiction?
Consider a situation in which a telephone interviewer has been instructed to interview
only registered voters in a state where voters must be at least 18 years old.
If the editor’s review indicates that one respondent was only 17 years old, the editor’s
task is to correct this mistake by deleting this response because this respondent
should never have been considered as a sampling unit.
The sampling units (respondents) should all be consistent with the defined
population.
The editor should also check for consistency within the data collection framework.
Take Action When Response Is Obviously an Error
In all but the most obvious situations, a change should be made only when multiple pieces of evidence indicate that a response is a mistake and when the likely true response is obvious.
A data record may sometimes contain data on variables that the respondent should
never have been asked.
The editor may check other responses to make sure that the screening question
was answered accurately.
Editing Technology
Computer routines can check for inconsistencies automatically.
For electronic questionnaires, rules can be entered that prevent inconsistent responses from ever being stored in the file used for data analysis.
In fact, the rules can even be preprogrammed to prevent many inconsistent
responses.
Electronic questionnaires can also prevent a respondent from being directed to
the wrong set of questions based on a screening question response.
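To make this concrete, below is a minimal sketch of the kind of automated consistency checks such routines perform. It is written in Python, and the record layout, field names, and edit rules are invented for the example.
```python
# A sketch of automated in-house edit checks over a small data file,
# here a list of dictionaries; fields and rules are hypothetical.

records = [
    {"id": 1, "age": 17, "registered_voter": "Y", "smokes": "N", "brand": "Brand A"},
    {"id": 2, "age": 34, "registered_voter": "Y", "smokes": "Y", "brand": "Brand B"},
]

def edit_checks(record):
    """Return a list of consistency problems found in one record."""
    problems = []
    # Screening rule: only registered voters aged 18+ belong in the sample.
    if record["registered_voter"] == "Y" and record["age"] < 18:
        problems.append("under-age respondent recorded as a registered voter")
    # Skip-pattern rule: nonsmokers should have no cigarette brand recorded.
    if record["smokes"] == "N" and record["brand"]:
        problems.append("brand recorded for a nonsmoker")
    return problems

for rec in records:
    for problem in edit_checks(rec):
        print(f"Record {rec['id']}: {problem}")
```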
Editing for Completeness
In some cases the respondent may have answered only one portion of a two-part question.
Item nonresponse is the technical term for unanswered questions on an otherwise
complete questionnaire.
Specific decision rules for handling this problem should be meticulously outlined in the
editor’s instructions.
In many situations the decision rule is to do nothing with the missing data and simply
leave the item blank.
However, when the relationship between two questions is important, the editor may insert
a plug value, which might be an average or neutral value.
Several choices are available (see the sketch after this subsection):
1. Leave the response blank; this is not a bad option unless a response for that particular respondent is crucial, which is rarely the case.
2. Plug in alternating choices for missing data (e.g., yes the first time, no the second time, yes the third time, and so forth).
3. Randomly select an answer.
4. Impute the missing value based on the respondent's choices to other questions; this is a good option if the response is important or if the effective sample size would be too small after deleting all respondents with missing responses.
Missing data were a bigger problem when many statistical software programs required complete data for an analysis to run.
Other routines may require that an entire sampling unit be eliminated from analysis if
even a single response is missing (list-wise deletion).
Today, most statistical programs can accommodate an occasional missing response
through the use of pairwise deletion, which means the data that the respondent did
provide can still be used in statistical analysis.
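The pandas sketch below walks through these options: leaving the item blank, list-wise deletion, pair-wise use of whatever each respondent provided, and imputing a plug value. The variables and values are hypothetical, and pandas is used only as one convenient tool, not as the textbook's own procedure.
```python
# A sketch of common item-nonresponse treatments; columns are invented.
import pandas as pd

df = pd.DataFrame({
    "satisfaction": [5, 4, None, 3],
    "loyalty":      [4, None, 2, 3],
})

# Option 1: leave the item blank (stored as NaN, a missing-value marker).
print(df)

# List-wise deletion: drop any respondent with a missing value anywhere.
print(df.dropna())

# Pair-wise use: each statistic uses the data each respondent did provide.
print(df["satisfaction"].mean(), df["loyalty"].mean())

# Imputation with a plug value, here the column mean.
print(df.fillna(df.mean()))
```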
Editing Questions Answered Out of Order
Another task an editor may face is rearranging the answers given to open-ended questions (e.g., in a focus group interview).
If the editor is asked to list answers to all questions in a specific order, the editor may
move certain answers to the section related to the skipped question.
Facilitating the Coding Process
While all the previously described editing activities will help the coders, several editing
procedures are specifically designed to simplify the coding process.
Editing and Tabulating “Don’t Know” Answers
In many situations the respondent will answer “don’t know.”
A legitimate “don’t know” response is the same as “no opinion.”
A reluctant “don’t know” is given when an individual simply does not want to answer
a question.
If the individual does not understand the question, he or she may give a confused
“don’t know” answer.
In some situations the editor can separate the legitimate “don’t knows” from the other
“don’t knows.”
The editor may try to identify the meaning of the “don’t know” answer from other
data provided on the questionnaire.
Pitfalls of Editing
Subjectivity can enter into the editing process.
Data editors should be intelligent, experienced, and objective.
A systematic procedure for assessing the questionnaires should be developed by the
research analyst so that the editor has clearly defined decision rules to follow.
Pretesting Edit
Editing questionnaires during the pretest stage can prove very valuable.
May identify poor instructions or inappropriate question wording.
III. CODING
Editing may be differentiated from coding, which is the assignment of numerical scores or
classifying symbols to previously edited data.
Careful editing can make coding easier.
Codes are meant to represent the meaning in the data.
Assigning numerical symbols permits the transfer of data from questionnaires or interview
forms to a computer.
Codes are often, but not always, numerical symbols; however, they are more broadly defined as rules for interpreting, classifying, and recording the data.
In qualitative research, numbers are seldom used for codes.
Coding Qualitative Responses
Unstructured Qualitative Responses (Long Interview)
Qualitative coding was introduced in Chapter 7 (e.g., hermeneutic units, networks, or grounded theory).
The codes are usually words or phrases that represent themes.
Structured Qualitative Responses
Qualitative responses to structured questions (e.g., yes/no) can be stored in a data file with letters (e.g., "Y" or "N") or as numbers; even when numbers are used, the variable is classificatory, simply separating the positive from the negative responses.
The researcher may consider adopting dummy coding for dichotomous responses (e.g., yes/no), assigning "0" to one category and "1" to the other.
Dummy coding provides the researcher with more flexibility in how structured,
qualitative responses are analyzed statistically.
Because a dummy variable can only represent two categories, multiple dummy
variables are needed to represent a single qualitative response that can take on more
than two categories.
The rule is that if k is the number of categories for a qualitative variable, k-1 dummy
variables are needed to represent the variable.
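The k - 1 rule is easy to see in code. Here is a minimal pandas illustration; the variable and its three categories are made up for the example.
```python
# Dummy coding a three-category (k = 3) qualitative variable.
import pandas as pd

responses = pd.DataFrame({"region": ["North", "South", "West", "North"]})

# drop_first=True leaves k - 1 = 2 dummy variables; the dropped
# category ("North") is the baseline, coded 0 on both dummies.
dummies = pd.get_dummies(responses["region"], drop_first=True)
print(dummies)
```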
Data File Terminology
Most terminology describing files goes back to the early days of computers, which
produced results that were stored on actual computer cards.
Researchers organize coded data into fields, records, and files.
A field is a collection of characters (a character is a single number, letter, or special
symbol such as a question mark) that represents a single type of data, usually a
variable.
Text variables are represented by string characters, which is computer terminology for a series of alphabetic (nonnumeric) characters that may form a word. String characters often fill long fields of eight or more characters.
In contrast, a dummy variable is a numeric variable that needs only 1 character to
form a field.
A record is a collection of related fields, and was the way a single, complete
computer card was represented.
Researchers may use the term to refer to one respondent’s data.
A data file is a collection of related records that make up a data set.
Value labels are extremely useful and allow a word or short phrase to be associated
with numeric coding.
The Data File
Data are generally stored in a matrix that resembles a common spreadsheet file.
The data file stores data from a research project and is typically represented as a rectangular arrangement (matrix) of data in rows and columns.
Typically, each row represents a respondent’s scores on each variable and each
column represents a variable for which there is a value for every respondent.
A spreadsheet like Excel is an acceptable way to store a data file, and increasingly, statistical programs (e.g., SPSS, SAS, and others) can work easily with an Excel spreadsheet.
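As a small sketch, the Python snippet below builds such a matrix with one row per respondent and one column per variable, and attaches a value label to a numeric code; all names and values are invented.
```python
# A respondent-by-variable data matrix with value labels; hypothetical data.
import pandas as pd

data = pd.DataFrame({
    "id":           [1, 2, 3],   # questionnaire number
    "sex":          [1, 2, 1],   # 1 = Male, 2 = Female
    "satisfaction": [5, 3, 4],   # 1-5 rating scale
})

# Value labels associate a word with each numeric code.
sex_labels = {1: "Male", 2: "Female"}
data["sex_label"] = data["sex"].map(sex_labels)
print(data)
```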
Code Construction
There are two basic rules for code construction:
1. Coding categories should be totally exhaustive, meaning that a coding category
should exist for all possible responses.
2. Coding categories should be mutually exclusive (independent), meaning that there
should be no overlap among the categories.
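Both rules can be illustrated with a simple bracketing example; the income brackets below are hypothetical. The bins are exhaustive (zero to infinity covers every possible income) and mutually exclusive (each value falls into exactly one bracket).
```python
# Coding income into categories that are exhaustive and mutually exclusive.
import pandas as pd

incomes = pd.Series([12_000, 48_000, 95_000, 250_000])

codes = pd.cut(
    incomes,
    bins=[0, 25_000, 50_000, 100_000, float("inf")],  # covers all incomes
    labels=[1, 2, 3, 4],  # one code per non-overlapping bracket
)
print(codes)
```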
Precoding Fixed-Alternative Questions
When a questionnaire is highly structured, the categories may be precoded before the data
are collected (see Exhibit 19.5).
Users of web-based survey services receive a coded data file in the software of their
choice.
Precoding can be used if the researcher knows what answer categories exist before data
collection occurs.
In some cases, predetermined responses are based on standardized classification systems (e.g., occupation).
Computer-assisted telephone interviewing (CATI) requires precoding.
More on Coding Open-Ended Questions
The purpose of coding such questions is to reduce the large number of individual
responses to a few general categories of answers that can be assigned numerical codes.
Code construction reflects the judgment of the researcher.
A major objective in the code-building process is to accurately transfer the meanings
from written responses to numeric codes.
Experienced researchers recognize that the key idea in this process is that code building is
based on thoughts, not just words.
The end result of code building should be a list, in an abbreviated and orderly form, of all
the comments and thoughts given in answers to the questions.
Developing an appropriate code from the respondent’s exact comments is somewhat of an
art.
Test tabulation is the tallying of a small sample of the total number of replies to a
particular question, and the purpose is to preliminarily identify the stability and
distribution of answers that will determine a coding scheme.
During the coding procedure, the respondent’s opinions are divided into mutually
exclusive thought patterns.
After tabulating the basic responses, the researcher must determine how many answer categories are acceptable.
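A test tabulation can be as simple as a frequency count over a small sample of answers that have been reduced to draft themes, as in this Python sketch; the themes are invented.
```python
# Tally a small sample of draft-coded open-ended answers to see how
# responses distribute before fixing the final coding scheme.
from collections import Counter

draft_themes = [
    "price", "taste", "price", "convenience", "taste",
    "price", "packaging", "taste", "price", "convenience",
]

for theme, count in Counter(draft_themes).most_common():
    print(f"{theme}: {count}")
```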
Devising the Coding Scheme
A coding scheme should not be too elaborate.
The coder’s task is only to summarize the data.
A preliminary scheme having too many categories can always be collapsed or reduced
later in the analysis.
If initial coding is at too abstract a level and only a few categories are established,
revising the codes will be difficult.
Experienced coders group answers under generalized headings that are pertinent to the
research question.
Individual coders should give the same code to similar responses, so categories should be
sufficiently unambiguous.
Coding open-ended questions is a complex task, but with practice, and by using multiple coders so that consistency can be examined, one can become skilled at it.
Code Book
A code book gives each variable in the study and its location in the data matrix.
It provides a quick summary that is particularly useful when a data file becomes very
large.
Researchers commonly identify individual respondents by giving each an identification
number or questionnaire number, so that errors discovered in the tabulation process can
be checked on the questionnaire to verify the answer.
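In practice, a code book can be sketched as a simple mapping from each variable to its location and value labels, as in this hypothetical Python example.
```python
# A miniature code book: variable name -> location, description, labels.
codebook = {
    "id":  {"column": 1, "description": "questionnaire number"},
    "sex": {"column": 2, "description": "respondent sex",
            "values": {1: "Male", 2: "Female"}},
    "q1":  {"column": 3, "description": "overall satisfaction (1-5)",
            "values": {1: "Very dissatisfied", 5: "Very satisfied"}},
}

for variable, info in codebook.items():
    print(variable, "->", info["description"])
```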
Editing and Coding Combined
Frequently the person coding the questionnaire performs certain editing functions (e.g., translating an occupational title provided by the respondent into a code for socioeconomic status).
Computerized Survey Data Processing
In most studies with large sample sizes, a computer is used for data processing.
Data entry – the activity of transferring data from a research project to computers.
Several alternative means exist for entering data into a computer:
In studies involving highly structured paper-and-pencil questionnaires, an optical scanning system may be used to read material directly into the computer's memory from mark-sensed questionnaires.
When data are not optically scanned or directly entered into the computer the
moment they are collected, data processing begins with keyboarding.
A data entry process transfers coded data from the questionnaires or coding sheets
onto a hard drive.
Data entry workers may make errors, so the job should be verified by a second data
entry worker.
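A minimal sketch of that double-entry verification idea in Python follows; the keyed values are hard-coded stand-ins for two independently entered files.
```python
# Compare two independently keyed versions of the same data and flag
# disagreements for re-checking against the original questionnaire.
entry_1 = {1: {"q1": 5, "q2": 3}, 2: {"q1": 4, "q2": 2}}
entry_2 = {1: {"q1": 5, "q2": 3}, 2: {"q1": 4, "q2": 5}}

for resp_id, answers in entry_1.items():
    for question, value in answers.items():
        other = entry_2[resp_id][question]
        if other != value:
            print(f"Respondent {resp_id}, {question}: {value} vs {other} - verify")
```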
Error Checking
The final stage in the coding process is error checking and verification, or data cleaning,
to check for wild codes.
For example, coded values that lie outside the range of acceptable answers should be
identified.
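A short Python sketch of such a wild-code check follows; the variables and their acceptable ranges are hypothetical.
```python
# Flag coded values that fall outside each variable's acceptable range.
import pandas as pd

df = pd.DataFrame({"sex": [1, 2, 3], "satisfaction": [5, 9, 4]})
valid_ranges = {"sex": (1, 2), "satisfaction": (1, 5)}

for column, (low, high) in valid_ranges.items():
    wild = df[(df[column] < low) | (df[column] > high)]
    if not wild.empty:
        print(f"Wild codes in '{column}':")
        print(wild)
```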
