Chapter 8 Physical Database Design
28. In addition to its direct access capability, an index can be used to retrieve the records
of a file in logical sequence based on the indexed field.
29. A record is inserted into a location in a hashed file based on the result of a hashing
algorithm applied to the record’s key value.
30. In the division remainder hashing algorithm, the divisor (the number divided into the
key value) is the total number of record storage locations in the main storage area
plus the overflow area.
31. In a hashed file, the hashing algorithm may attempt to insert more than one record
into a given record storage location.
32. In a hashed file, a collision occurs when the key values of two records “hash” to the
same record storage location.
33. In a hashed file, a collision occurs when the key values of two records turn out to be
synonyms.
34. During an insertion operation into a hashed file, if the record’s key value “hashes” to
a location already occupied by another record, the record will not be inserted into the
file.
35. A hashed file cannot also have indexes built over it.
36. If a file is hashed on one field, direct access based on another field can be achieved by
building an index on the other field.
37. One of the main advantages of a hashed file is that it never has to be reorganized.
38. Physical database design is the process of modifying the logical database design to
improve performance.
39. One of the main potential problems in relational database performance is the speed
with which join operations take place.
40. The volume of data in the database does not enter into database performance.
8-22
41. Factors in the applications that use databases can affect database performance.
42. The dispersal of data on a disk is unrelated to database performance.
43. The main input to the logical database design process is the output of the physical
database design process.
44. Response time is the measure of how many queries from simultaneous users must be
satisfied in a given period of time by the application set and the database that supports
it.
45. Data volatility refers to how often stored data is updated.
46. Modifying the logical database design to improve performance may reintroduce data
redundancy.
47. Some physical database design techniques actually change the logical design while
others do not.
Chapter 8 Physical Database Design
8-23
48. Primary keys are good candidates for being indexed but foreign keys are not.
49. The more volatile the data in a table, the less will be the negative performance effect
of having a relatively large number of indexes built over it.
50. Building indexes over relational tables always has a positive effect on performance
and never has a negative effect.
51. Creating views actually changes the logical database design.
52. A view is an important device in protecting the security and privacy of data.
53. Using the clustered files technique involves reorganizing the way data is stored on the
disk without changing the logical design at all and thus without introducing data
redundancy.
54. Clustering files disperses logically related data from two tables to different parts of
the disk.
Chapter 8 Physical Database Design
55. The clustering files technique can result in slower performance when the records of
only one of the files involved must be retrieved.
56. In vertical partitioning, the rows of a table are divided up into groups and the groups
are stored separately, on different areas of a disk or on different disks.
57. In vertical partitioning, each partition must have a copy of the primary key of the
table.
58. Both horizontal and vertical partitioning can improve performance by reducing
bottlenecks that form when too many applications or users attempt to access the same
table.
59. The physical design technique known as “substituting foreign keys” can improve
performance by avoiding joins in certain situations.
60. In the physical design technique known as “substituting foreign keys”, the table with
the foreign key in it has its primary and foreign keys interchanged.
8-25
61. A new primary key attribute can be created for a table in which the primary key is
clumsy because it consists of several attributes.
62. Storing derived data improves performance in the case where a primary key consists
of several attributes and is found to be clumsy in practice.
63. Storing derived data improves performance in the case where calculated values have
to be repeatedly derived.
64. Storing repeating groups within a single table is possible and can improve
performance in some circumstances.
65. Denormalization is a physical design technique that does not change the logical
design structure.
66. Denormalization improves the performance of join operations at the expense of
reintroducing data redundancy.
67. Adding duplicate tables to a database reduces access bottlenecks and has no negative
effects.
Chapter 8 Physical Database Design
8-26
Problems
1. A fixed disk consists of five platters. Both surfaces of each of the five platters is used to
record data. There are 75 tracks on each recording surface. How many of each of the
following are there in the disk:
a. Recording surfaces?
b. Cylinders?
c. Tracks per cylinder?
2. A fixed disk has 100 cylinders. The tracks in each cylinder are numbered 0-7. Both
sides of each platter are used for recording data. How many of each of the following are
there in the disk:
a. Recording surfaces?
b. Platters?
c. Tracks per recording surface
3. A hashed file has space for 80 records. Relative record numbers of 0-79 label each of
the 80 record positions. In addition, there is space for several overflow (synonym)
records. Draw a picture of the file and using the division-remainder method, store
records with each of the following five digit keys, accounting for collisions as necessary:
a. 27043.
b. 36235.
c. 06003.
d. 40963.
Chapter 8 Physical Database Design
8-27
The following Animal File from the Central Zoo will be used in several of the following
questions. The left-hand column of relative record numbers is there to facilitate answering
the questions.
Animal
Number
Species
Animal
Name
Gender
Country of
Birth
Weight
1
13648
Elephant
M
Uganda
6,000
2
14990
Kangaroo
M
Australia
200
3
17543
Elephant
F
Nigeria
5,500
4
20165
Giraffe
F
USA
500
5
23743
Giraffe
F
USA
600
6
27579
Panda
M
China
250
7
32565
Elephant
M
India
7,000
8
34871
Grizzly Bear
F
Canada
1,500
9
35993
Lion
M
USA
580
10
38578
Tiger
F
India
470
Animal file
4. Create a simple linear index for the Animal file based on:
a. The Animal Name field.
b. The Species field.
c. The Animal Number field.
d. The combination of the Species and Animal Name fields.
Answer
Doris
Fluffy
Fred
High Top
Jumbo
King
Large Louie
Necky
Chapter 8 Physical Database Design
8-28
5. Construct a B+-tree index of the type discussed in the text for the Animal file, assuming
that now there are many more records than are shown above. The file and the index
have the following characteristics:
Chapter 8 Physical Database Design
8-29
The file is stored on eight cylinders of the disk. The highest key values on the
eight cylinders, in order, are:
Cylinder 1: 03583
Cylinder 2: 08437
Cylinder 3: 18965
Cylinder 4: 27028
Cylinder 5: 41603
Cylinder 6: 53781
Cylinder 7: 68337
Cylinder 8: 80174
Each index record can hold four key value/pointer pairs.
There are three index records at the lowest level of the tree index.
Answer
6. Construct a B+-tree index of the type discussed in the text for the Animal file, assuming
that now there are many more records than are shown above. The file and the index
have the following characteristics:
The file is stored on eight cylinders of the disk. The highest key values on the
eight cylinders, in order, are:
Cylinder 1: 03583
Cylinder 2: 08437
Cylinder 3: 18965
Cylinder 4: 27028
Cylinder 5: 41603
Cylinder 6: 53781
Cylinder 7: 68337
Cylinder 8: 80174
Each index record can hold two key value/pointer pairs.
There are four index records at the lowest level of the tree index.
Answer
8-30
7. Consider the following relational database for the Central Zoo. Central Zoo wants to
maintain information about its animals, the enclosures in which they live, and its
zookeepers and the services they perform for the animals. In addition, Central Zoo
has a program by which people can be sponsor of animals. Central Zoo wants to
track its sponsors, their dependents, and associated data.
Each animal has a unique animal number and each enclosure has a unique enclosure
number. An animal can live in only one enclosure. An enclosure can have several
animals in it or it can be currently empty. A zookeeper has a unique employee
number. Every animal has been cared for by at least one and generally many
zookeepers; each zookeeper has cared for at least one and generally many animals.
Each time a zookeeper performs a specific, significant service for an animal the
service type, date, and time are recorded. A zookeeper may perform a particular
service on a particular animal more than once on a given day.
A sponsor, who has a unique sponsor number and a unique social security number,
sponsors at least one and possibly several animals. An animal may have several
sponsors or none. For each animal that a particular sponsor sponsors, the zoo wants
to track the annual sponsorship contribution and renewal date. In addition, Central
Zoo wants to keep track of each sponsor’s dependents. A sponsor may have several
dependents or none. A dependent is associated with exactly one sponsor.
Enclosure
Number
Type
Location
Size
Date
Built
ENCLOSURE Relation
Animal
Number
Species
Animal
Name
Gender
Country
Of Birth
Weight
Enclosure
Number
ANIMAL Relation
Employee
Number
Employee
Name
Title
Year
Hired
ZOOKEEPER Relation
Chapter 8 Physical Database Design
8-31
Animal
Number
Employee
Number
Service
Type
Date
Time
CARES FOR Relation
Sponsor
Number
Social Security
Number
Sponsor
Name
Address
Telephone
SPONSOR Relation
Animal
Number
Sponsor
Number
Annual
Contribution
Renewal
Date
CONTRIBUTION Relation
Sponsor
Number
Dependent
Name
Date of
Birth
DEPENDENT Relation
Analyze each of the following situations and, using physical database design
techniques, state how you would modify the logical design shown to improve
performance or otherwise accommodate it.
a. There is a frequent need to find the total amount of annual contributions expected
for each animal from all of each animals’ sponsors.
b. There is a high priority need to quickly find the information about the
contributions for each animal together with the social security numbers of the
sponsors. The sponsor numbers are not required in this query.
c. There is a critical need to quickly list the animals that were born in a particular
country.
d. There is a frequent need to be able to retrieve detailed information about an
animal together with detiled information about the enclosure in which it is
housed.
e. There is a much more frequent and high priority need to access the records for the
monkeys than for the other animals.
f. Due to large numbers of access activity, the ANIMAL relation has become a
bottleneck.
Answer
Chapter 8 Physical Database Design
8-32
The following Airplane File from Grand Travel Airlines will be used in several of the
following questions. The left-hand column of relative record numbers is there to facilitate
answering the questions.
Airplane
Number
Manufacturer
Model
Passenger
Capacity
Year
Built
1
04653
Boeing
767
280
1988
2
06997
Boeing
747
325
1976
3
10582
Airbus
A320
256
1997
4
13160
Canadair
CRJ
54
2001
5
16420
Canadair
CRJ
54
2001
6
19521
Airbus
A300
220
1980
7
22663
Boeing
767
265
1999
8
23964
Airbus
A320
256
1998
9
28352
Boeing
767
280
1989
10
34801
Canadair
CRJ
58
2002
Airplane file
8. Create a simple linear index for the Animal file based on:
Chapter 8 Physical Database Design
8-33
a. The Year Built field.
b. The Manufacturer field.
c. The Airplane Number field.
d. The combination of the Manufacturer and Year Built fields.
Chapter 8 Physical Database Design
8-34
9. Construct a B+-tree index of the type discussed in the text for the Airplane file,
assuming that now there are many more records than are shown above. The file and the
index have the following characteristics:
The file is stored on six cylinders of the disk. The highest key values on the six
cylinders, in order, are:
Cylinder 1: 08574
Cylinder 2: 12740
Cylinder 3: 19025
Cylinder 4: 24859
Cylinder 5: 31745
Cylinder 6: 42853
Each index record can hold four key value/pointer pairs.
There are two index records at the lowest level of the tree index.
Answer
10. Construct a B+-tree index of the type discussed in the text for the Animal file, assuming
that now there are many more records than are shown above. The file and the index
have the following characteristics:
The file is stored on six cylinders of the disk. The highest key values on the six
cylinders, in order, are:
Cylinder 1: 08574
Cylinder 2: 12740
Cylinder 3: 19025
Cylinder 4: 24859
Chapter 8 Physical Database Design
8-35
Cylinder 5: 31745
Cylinder 6: 42853
Each index record can hold two key value/pointer pairs.
There are three index records at the lowest level of the tree index.
Answer
11. Consider the following relational database for Grand Travel Airlines.
Grand Travel Airlines has to keep track of its flight and airplane history. A flight
is uniquely identified by the combination of a flight number and a date. Every
passenger who has flown on Grand Travel has a unique passenger number. For a
particular passenger who has taken a particular flight, the company wants to keep
track of the fare that she paid for it and the date that she made the reservation for
it. Clearly, a passenger may have taken many flights (he must have taken at least
one to be in the database) and every flight has had many passengers on it.
A pilot is identified by a unique pilot (or employee) number. A flight on a
particular date has exactly one pilot. Each pilot has typically flown many flights
but a pilot may be new to the company, is in training, and has not flown any
flights, yet. Each airplane has a unique serial number. A flight on a particular
date used one airplane. Each airplane has flown on many flights and dates, but a
new airplane may not have been used at all, yet.
Pilot
Number
Pilot
Name
Date of
Birth
Date of
Hire
PILOT Relation
Flight
Number
Date
Departure
Time
Arrival
Time
Pilot
Number
Airplane
Number
Chapter 8 Physical Database Design
8-36
FLIGHT Relation
Passenger
Number
Passenger
Name
Address
Telephone
Number
PASSENGER Relation
Flight
Number
Date
Passenger
Number
Fare
Reservation
Date
RESERVATION Relation
Airplane
Number
Model
Passenger
Capacity
Year
Built
Manufacturer
AIRPLANE Relation
Analyze each of the following situations and, using physical database design
techniques, state how you would modify the logical design shown to improve
performance or otherwise accommodate it.
a. There is a frequent need to quickly get detailed information about a flight and
about the pilot who flew the plane.
b. It turns out that the Departure Time and Arrival Time data for flights is accessed
much more frequently and requires much faster response time than the rest of the
data in the FLIGHT relation.
c. The repeated use of the Flight Number and Date to access flight data is considered
clumsy both in the FLIGHT relation and elsewhere where the combination is
required as a foreign key.
d. When passengers phone in to get information about their flight records, the
records are located by asking for the passengers’ phone numbers.
e. There is a frequent (and obvious) need to be able to quickly find the number of
seats that are currently taken for a flight.
f. Some people in the airline have to be able to access the pilot data but are not
authorized to see the pilots’ dates of birth.
Answer
Chapter 8 Physical Database Design
8-37