Chapter 13 The Data Warehouse
13-1
Last Update: November 15, 2011 8PM
Chapter 13: The Data Warehouse
Multiple Choice
1. Of the following applications or application systems, the one that would be
considered a decision support system is ____.
a. a system that links companies in a supply chain
b. a hospital application that makes patient records available to all doctors and other
health care professionals who are treating patients
c. a university course and grade record keeping system
d. a system that provides the management of a retail chain with data helpful in
choosing sites for new stores
e. a payroll system
2. All of the following are reasons that having separate files for each DSS application is
wasteful, expensive and inefficient, except ____.
a. having separate files for each DSS application interferes with the transactional
database environment by compromising its performance
b. different DSS applications often need the same data, causing duplicate files to be
created for each application
c. while particular files support particular DSS applications, they tend to be
inflexible and do not support closely related applications that require slightly
different data
d. individual files tied to specific DSS applications do nothing to encourage other
people and groups in the company to use the company’s accumulated data to gain
a competitive advantage over the competition
e. even if someone in the company is aware of existing DSS application data that
they could use to their (and the company’s) own advantage they generally can’t
get access to it because it is “owned” by the application for which it was created
3. All of the following are characteristics of a data warehouse, except the data is ____.
a. subject oriented
b. always current
c. integrated
Chapter 13 The Data Warehouse
13-2
d. non-volatile
e. time variant
4. All of the following are characteristics of a data warehouse, except the data ____.
a. must be high quality
b. may be aggregated
c. is often denormalized
d. is not necessarily absolutely current
e. is volatile
5. Data in a typical data warehouse ____.
a. is frequently updated by changing existing attribute values as the values change in
the operational database
b. grows as new data is appended to the end of the existing tables
c. does not require timestamps
d. is always fully detailed, not summarized data
e. is organized according to the company’s TPS applications
6. Data in a typical data warehouse may be denormalized to improve performance
because ____.
a. the data does not have necessarily have to be absolutely current
b. data about each of the “subjects” in the data warehouse is typically collected from
several of the company’s transactional databases
c. data integrity problems are not a concern since the existing data is not updated
d. the data must be of high quality
e. the data may be aggregated
7. An enterprise data warehouse (EDW) ____.
a. must be the sum of all of a company’s data marts
b. is designed to support a department or a related group of departments
Chapter 13 The Data Warehouse
c. cannot exist without data marts being present
d. must be fully normalized
e. supports an entire company or a major part of one
8. A data mart (DM) ____.
a. must be the sum of all of a company’s enterprise data warehouses
b. is designed to support a department or a related group of departments
c. cannot exist without an enterprise data warehouse being present
d. must be fully normalized
e. supports an entire company or a major part of one
9. If the enterprise data warehouse was created first in a company and then later data
was extracted from it to create one or more data marts, this is known as ____
development.
a. minor-to-major
b. major-to-minor
c. bottom-up
d. top-down
e. sideways
10. Data warehouses are often referred to as multi_____ databases because each
occurrence of the subject is referenced by an occurrence of each of several
characteristics of the subject.
a. dimensional
b. faceted
c. partitioned
d. determined
e. normal
11. A way to store data warehouse data in a relational database structure is with a model
known as the ____ schema.
a. network
b. multi-pronged
c. pointer
d. star
e. ring
12. Generally, one of the dimensions of any data warehouse is ____.
a. distance
b. time
c. energy
d. information
e. pallets
13. A ____ design in a data warehouse data structure refers to the situation in which one
dimension table leads to another dimension table.
a. network
b. hierarchical
c. snowflake
d. raindrop
e. solar
14. There is a(n) _____ relationship between each dimension table entity and the fact
table entity in a star schema data warehouse structure.
a. standard
b. structured
c. unstructured
d. many-to-many
e. one-to-many
15. All of the following are steps in building a data warehouse, except ____.
a. data extraction
b. data manipulation
c. data cleaning
d. data transformation
e. data loading
16. The process of removing errors from data while building a data warehouse is known
as ____.
a. data extraction
b. data manipulation
c. data cleaning
d. data transformation
e. data loading
17. ____ is the process of copying data from the transactional databases in preparation for
loading it into the data warehouse.
a. Data extraction
b. Data manipulation
c. Data cleaning
d. Data transformation
e. Data loading
18. The process of adding externally acquired data to a data warehouse is known as ____.
a. data evolution
b. data embellishment
c. data enlarging
d. data enhancement
e. data enrichment
19. All of the following are categories or techniques of data transformation during the
building of a data warehouse, except ____.
a. coding scheme conversion
b. unit of measure conversion
c. aggregation
d. questionable data conversion
e. combining values from different attributes in transactional databases into a single
attribute in the data warehouse
20. ____ is a decision support methodology based on viewing data in multiple
dimensions.
a. Data mining
b. Market basket analysis
c. Neural networks
d. Online analytic processing
e. Data transformation
21. In the OLAP environment ____ refers to going back to the database and retrieving
finer levels of data detail than you have already retrieved.
a. data mining
b. drill-down
c. slice
d. pivot
e. rotation
22. All of the following are data mining techniques, except ____.
a. case-based learning
b. decision trees
c. data enrichment
d. neural networks
e. genetic algorithms
13-7
23. The data mining application that looks for one occurrence of a type of data being
associated with another in the stored data is called ____.
a. drill-down analysis
b. online analytic processing
c. market basket analysis
d. reversion analysis
e. data transformation
24. All of the following are desirable types of background knowledge for a data
warehouse administrator to have, except ____.
a. business expertise
b. data expertise
c. technical expertise
d. All of the above.
e. None of the above.
25. All of the following are desirable types of background knowledge for a data
warehouse administrator to have, except an understanding of ____.
a. the company’s business processes that underlies an understanding of the
company’s transactional data and databases
b. the company’s transactional data and databases for selection and integration into
the data warehouse
c. OLAP and data mining techniques so that the data warehouse design will properly
support these processes
d. how to handle very large databases, in general
e. how to handle very small databases, in general
True/False
1. Decision support systems are designed to make decisions for managers.
Chapter 13 The Data Warehouse
2. Legacy DSS applications were not oriented towards file sharing.
3. Data warehouse data is characterized as subject oriented.
4. Data warehouse data is characterized as volatile.
5. Data warehouse data is time variant or historic.
6. Data warehouse data cannot be aggregated.
7. Data warehouse data is less problematic if denormalized than operational data would
be.
8. Due to the nature of DSS applications, data warehouse data must be up-to-the-
moment current.
9. A data mart is a large-scale data warehouse that incorporates the data of an entire
company or of a major division, site, or activity of a company.
10. A data mart is a small-scale data warehouse that is designed to support a small part of
an organization, say a department or a related group of departments.
11. Bottom-up development implies that the EDW was created first and then later data
was extracted from the EDW to create one or more Data marts, initially and on an
ongoing basis.
12. Data warehouses are often referred to as multidimensional databases because each
occurrence of the subject is referenced by an occurrence of each of several
dimensions or characteristics of the subject.
13. The star schema approach to multidimensional data design is incompatible with
relational database structures.
14. In the star schema design, the dimensions are in the middle and the subjects radiate
outwards as the rays of a star.
15. A star schema has a “fact table,” which represents the data warehouse “subject”.
16. Usually, one of the dimensions of a star schema is time.
17. A snowflake design is a feature of a star schema in which one dimension table leads
to another.
18. Data enrichment is the process of copying the data from the transactional databases in
preparation for loading it into the data warehouse.
19. Adding data acquired from outside of the company to the data warehouse is known as
data extraction.
20. Data cleaning or cleansing or scrubbing refers to fixing errors in data as it is being
loaded into the operational database.
21. One type of data cleaning is to determine the correct value for missing data.
22. One type of data cleaning is to determine the correct value for impossible data.
23. Data transformation takes place after the data is loaded into the data warehouse.
24. Coding schemes used for attributes in different transactional databases must be
reconciled as they are being merged into common data warehouse tables.
25. Data loading takes place after data transformation and before data cleaning.
26. Two major data warehouse usage approaches are on-line analytic processing and data
mining.
27. Online analytic processing refers to the searching out of hidden knowledge in the
company’s data that can give the company a competitive advantage in its
marketplace.
28. Drill-down refers to going to the data warehouse and retrieving courser or broader
levels of data.
29. Interchanging the data dimensions is known as “taking a slice” in the OLAP
environment.
30. Data mining would be impossible for people to do manually because of the huge
amount of data present in the company’s data warehouse.
31. Neural networks and genetic algorithms are among the techniques for data mining.
32. Market basket analysis tries, for example, to match product purchase patterns with
customer demographics.
33. A data warehouse administrator should have a combination of business, data, and
technical expertise.
34. Data warehouses are not associated with the concept of metadata.
35. Data in the data warehouse may be a perpetually moving target because of increasing
and shifting user demands.
Chapter 13 The Data Warehouse
13-13
Problems
1. Consider the following relational database for the Central Zoo. Central Zoo wants to
maintain information about its animals, the enclosures in which they live, and its
zookeepers and the services they perform for the animals. In addition, Central Zoo has a
program by which people can be sponsor of animals. Central Zoo wants to track its
sponsors, their dependents, and associated data.
Each animal has a unique animal number and each enclosure has a unique enclosure
number. An animal can live in only one enclosure. An enclosure can have several
animals in it or it can be currently empty. A zookeeper has a unique employee number.
Every animal has been cared for by at least one and generally many zookeepers; each
zookeeper has cared for at least one and generally many animals. Each time a
zookeeper performs a specific, significant service for an animal the service type, date,
and time are recorded. A zookeeper may perform a particular service on a particular
animal more than once on a given day.
A sponsor, who has a unique sponsor number and a unique social security number,
sponsors at least one and possibly several animals. An animal may have several
sponsors or none. For each animal that a particular sponsor sponsors, the zoo wants to
track the annual sponsorship contribution and renewal date. In addition, Central Zoo
wants to keep track of each sponsor’s dependents. A sponsor may have several
dependents or none. A dependent is associated with exactly one sponsor.
Enclosure
Number
Type
Location
Size
Date
Built
ENCLOSURE Relation
Animal
Number
Species
Animal
Name
Gender
Country
Of Birth
Weight
ANIMAL Relation
Employee
Number
Employee
Name
Title
Year
Hired
ZOOKEEPER Relation
Animal
Number
Service
Type
Date
Time
CARES FOR Relation
Sponsor
Social Security
Sponsor
Chapter 13 The Data Warehouse
13-14
Number
Number
Name
Address
Telephone
SPONSOR Relation
Animal
Number
Sponsor
Number
Annual
Contribution
Renewal
Date
CONTRIBUTION Relation
Sponsor
Number
Dependent
Name
Relationship
Date of
Birth
DEPENDENT Relation
a. Design a multidimensional database using a star schema for a data warehouse for
the Central Zoo business environment. The subject will be “sponsorship” which
represents a particular sponsor sponsoring a particular animal. Be sure to keep
track of the annual contribution and the renewal date for the sponsorship. Assume
that the annual contribution can change with the renewal date, i.e. it can be
different in different years.
b. Describe three OLAP uses of this data warehouse.
c. Describe one data mining use of this data warehouse.
Answers
2. Consider the following relational database for Grand Travel Airlines.
Chapter 13 The Data Warehouse
13-15
Grand Travel Airlines has to keep track of its flight and airplane history. A flight is
uniquely identified by the combination of a flight number and a date. Every
passenger who has flown on Grand Travel has a unique passenger number. For a
particular passenger who has taken a particular flight, the company wants to keep
track of the fare that she paid for it and the date that she made the reservation for it.
Clearly, a passenger may have taken many flights (he must have taken at least one to
be in the database) and every flight has had many passengers on it.
A pilot is identified by a unique pilot (or employee) number. A flight on a particular
date has exactly one pilot. Each pilot has typically flown many flights but a pilot may
be new to the company, is in training, and has not flown any flights, yet. Each
airplane has a unique serial number. A flight on a particular date used one airplane.
Each airplane has flown on many flights and dates, but a new airplane may not have
been used at all, yet.
Pilot
Number
Pilot
Name
Date of
Birth
Date of
Hire
PILOT Relation
Flight
Number
Date
Departure
Time
Arrival
Time
Pilot
Number
Airplane
Number
FLIGHT Relation
Passenger
Number
Passenger
Name
Address
Telephone
Number
PASSENGER Relation
Flight
Number
Date
Fare
Reservation
Date
RESERVATION Relation
Airplane
Number
Model
Passenger
Capacity
Year
Built
Manufacturer
AIRPLANE Relation
a. Design a multidimensional database using a star schema for a data warehouse for
the Grand Travel Airlines business environment. The subject will be
“reservation” which represents a particular passenger reservation on a particular
flight. Be sure to keep track of the fare that the passenger paid for the flight and
the date of the reservation.
b. Describe three OLAP uses of this data warehouse.
c. Describe one data mining use of this data warehouse.
Chapter 13 The Data Warehouse
13-16
Answer