INSTRUCTOR’S MANUAL
TO ACCOMPANY
40th Anniversary Edition
DATABASE PROCESSING
Fundamentals, Design, and Implementation
15th Edition
Appendix L
JSON and Document Databases
David M. Kroenke | David J. Auer | Scott L. Vandenberg | Robert C. Yoder
CHAPTER OBJECTIVES
To learn the basic concepts of NoSQL document databases
To learn the structuring facilities of JSON
ERRATA
There are no known errors at this time. Any errors that are discovered in the future will
TEACHING SUGGESTIONS
Coverage of this topic begins in Chapter 12 (Data Warehouses, Business
Intelligence Sytems, and Big Data), which is expanded upon on Appendix K (Big
Data). This appendix is ideally read after Chapter 12 (actually, the last section of
Nearly all of the content of this appendix is applicable to any other document
database, aside from the specifics of the ArangoDB Query Language (AQL), but
even there, many of the concepts will transfer. Thus it is possible to use this
appendix with other document databases, if one is careful to note the differences.
ANSWERS TO REVIEW QUESTIONS
L.1 What is a document?
A document is a set of (field name, value) pairs. It can have any number of such pairs. A value
L.2 What is a document database?
A document database typically consists of a number of named collections (sets or arrays) of
documents. These are somewhat analogous to tables in the relational model.
L.3 Why do we say that the data in a document database is semi-structured?
The documents have some structure (field names and array, for example), but the documents in a
L.4 What is different about the kinds of data and queries typically involved in a document
database compared to a relational database?
The data are both more complex (in structure) and more fluid (no fixed schema) in a document
L.5 What is JSON, and what are its main features?
JSON stands for JavaScript Object Notation. It is a very simple data model (the complete
L.6 What is ArangoDB? How is it different from many other document databases?
ArangoDB is a multimodel NoSQL DBMS. It supports document, graph, and key-value
L.7 Is there a standard query language for document databases, like SQL for relational
databases?
No.
L.8 What does CRUD stand for, and why is it important?
L.9 What are the restrictions in JSON on how a document can be constructed out of values,
arrays, and other documents?
L.10 What is a JSON array?
An array in JSON is an ordered collection of values, delimited by square brackets ([, ]). These
L.11 How does ArangoDB Enterprise Edition differ from ArangoDB Community Edition?
The Enterprise Edition is not free. It also comes with some useful enterprise-class features to
L.12 What are the names of the DBA user account and the associated administrative
database in ArangoDB?
The standard ArangoDB DBA account has the login name “root”. The default administrative
L.13 What is the name of the primary key field for the documents in an ArangoDB collection?
How is the value of that field assigned?
Every document in an ArangoDB collection has a field named “_key”. The user can assign a
L.14 What are the various ways to get document data into an ArangoDB collection?
One way is to use the ArangoDB GUI to create a document and type in data, as illustrated in this
L.15 What is an entry point?
An entry point is another term for a queryable collection in a document database. It serves as a
L.16 In the VRG2 database shown in Figure L-13, there is some nested data. Describe what
nesting means, and completely describe all the ways in which data is nested in the
VRG2 database.
Nesting of data occurs when one document is stored inside another document. In the VRG2
L.17 In the VRG3 database shown in Figure L-17, there is some nested data. Completely
describe all the ways in which data is nested in the VRG3 database.
In the VRG3 database, each work has its artist documented nested within the work document.
L.18 What is a workload?
A workload is the anticipated mix of common queries and updates for a database. Knowing
L.19 Under what workload scenario would the VRG3 database be preferred to the VRG2
database?
The VRG3 database structure will work best when most queries retrieve primarily information
L.20 What is the purpose of an AQL FOR loop?
The AQL FOR loop is similar to the FROM clause of an SQL query: it specifies a collection to
L.21 What is the purpose of an AQL RETURN clause?
The AQL RETURN clause is similar to the SELECT clause in SQL, but much more flexible. It
L.22 What is the purpose of the AQL FILTER operation?
The AQL FILTER operation is similar to an SQL WHERE clause: it places conditions on which
values/documents are to be retained for further processing by the query.
L.23 What are two ways to remove duplicates in AQL?
As in SQL, the RETURN clause has an optional DISTINCT keyword that will ensure no
L.24 What AQL operation can turn an array of arrays into an array?
The AQL FLATTEN operation will turn an array of arrays into an array. Note that this operation
L.25 What AQL syntax would be used to access the fourth customer in an array of customers
in the VRG3 database?
Array indexes in AQL begin with 0. Thus the fourth customer in any customer array would be
L.26 Describe how the “.” notation of SQL has been extended in AQL.
In SQL, the “.” notation is used to separate a table name from a column name, e.g.
L.27 What is the AQL syntax used to do grouping? What two things does the syntax allow us
to name, and why is that useful? How does it extend the similar capabilities of SQL?
The AQL keyword COLLECT is used to group in AQL, similar to how GROUP BY is used in
L.28 Translate SQL-Query-CH02-61 from SQL into AQL, assuming the relational structure
has been maintained in the document collections and that each table has become a
collection.
The SQL query in question is:
L.29 Under what circumstances should a secondary hash index be created?
Secondary hash indexes are most useful when we have a lot of queries that need to do an equality
L.30 Under what circumstances should a secondary skiplist index be created?
A skiplist index has similar performance characteristics to those of B-Trees (see Appendix G for
L.31 What is a sparse index?
A sparse index in ArangoDB is a secondary index in which documents with missing or null
L.32 What is a unique index?
A unique index in ArangoDB enforces uniqueness among the fields indexed. For example, a
L.33 What is an ArangoDB cluster?
L.34 What is data distribution?
Data distribution occurs when data from a single database are stored on more than one computer
in a cluster (network).
L.35 What is sharding, and how does ArangoDB decide how to assign documents to shards?
Sharding, which is a NoSQL term for horizontal partitioning or horizontal decomposition, occurs
L.36 What are the benefits of sharding in ArangoDB?
A collection that is too large for one computer can be stored using sharding. Also, if a query only
ANSWERS TO EXERCISES
L.37 Develop a document database for the data in the HSD SEMINAR,
SEMINAR_CUSTOMER, and CUSTOMER tables based on the HSD Database
described in Chapter 7. Your database should have two collections. Justify your design
decisions, and use the JSON syntax used in the examples in this appendix. You do not
need to include all data; at a minimum, include data for seminars 24 and customers 6
12.
Note that there are multiple ways to approach the data structuring. However, this question
The solutions below follow standard JSON syntax. The first seven documents belong in a
“customer” collection and the last three in a “seminar” collection.
{
_key: 6,
LastName: “Foxtrot”,
FirstName: “Kathy”,
EmailAddress: “Kathy.Foxtrot@somewhere.com”,
EncryptedPassword: “aa8tY4GL”,
Phone: “972-233-6234”,
StreetAddress: “11023 Elm Street”,
City: “Dallas”,
{
_key: 10,
LastName: “Ranger”,
FirstName: “Terry”,
EmailAddress: “Terry.Ranger@somewhere.com”,
EncryptedPassword: “bv3F9Qc4”,
Phone: “512-974-4455”,
StreetAddress: “56 East 18th Street”,
City: “Austin”,
State: “TX”,
ZIP: “78712”
}
{
_key: 12,
LastName: “Wayne”,
FirstName: “Joan”,
EmailAddress: “Joan.Wayne@elsewhere.com”,
{
{
_key: 3,
SeminarDate: “02-NOV-2017”,
SeminarTime: “08:30 AM”,
Location: “Austin Convention Center”,
SeminarTitle: “Kitchen on a Budget”,
Customers: [9, 10]
}
L.38 Based on your answer to Exercise L-37, describe four other design choices for the
SEMINAR_CUSTOMER many-to-many relationship, and a scenario in which each would
be the best choice.
Option 1: similar to the solution in Exercise L-37, except store a list of seminar _key values with
each customer instead of storing customer IDs with each seminar. This might be the best choice if
L.39 Download and install ArangoDB Community Edition on your own computer following the
instructions in this appendix (or, if you prefer, install it as a service on Microsoft
Windows).
L.40 Create an ArangoDB user named VRG-User.
A. Create a database named VRG1 associated with this user and insert the data
from Figures L-10 and L-12 into Artists and Customers collections, respectively.
Creating the database is self-explanatory. Follow the instructions on pages L-15 to L-19. To
[
{
“ArtistID”: 1,
“LastName”: “Miro”,
“FirstName”: “Joan”,
“Nationality”: “Spanish”,
“DateOfBirth”: 1893,
“DateDeceased”: 1983
{
“ArtistID”: 11,
“LastName”: “Sargent”,
“FirstName”: “John Singer”,
“Nationality”: “United States,
“DateOfBirth”: 1856,
“DateDeceased”: 1925
},
},
{
“ArtistID”: 18,
“LastName”: “Horiuchi”,
“FirstName”: “Paul”,
“Nationality”: “United States,
“DateOfBirth”: 1906,
“DateDeceased”: 1999
},
}
]
The Customers collection will look like this (this is the same data from Figure L-12(a)):
[ {
“CustomerID”: 1000,
“LastName”: “Janes”,
“FirstName”: “Jeffrey”,
“EmailAddress”: “Jeffrey.Janes@somewhere.com”,
“State”: “WA”,
},
{
“CustomerID”: 1001,
“LastName”: “Smith”,
“FirstName”: “David”,
“EmailAddress”: “David.Smith@somewhere.com”,
“State”: “CO”,
“ZIPorPostalCode”: 81201,
“Country”: “USA”,
“Interests”: [
1,
2,
4,
5,
11
]
]
B. Create a database named VRG2 associated with this user and insert the data
from Figure L-13 into an Artists collection.
Creating the database is self-explanatory. Follow the instructions on pages L-15 to L-19. To
“Nationality”: “Spanish”,
“DateOfBirth”: 1893,
“DateDeceased”: 1983,
“Customers”: [{
“CustomerID”: 1001,
“LastName”: “Smith”,
“FirstName”: “David,
“EmailAddress”: “David.Smith@somewhere.com”,
“State”: “CO”,
“ZIPorPostalCode”: 81201,
“Country”: “USA”
]
},
{
“ArtistID”: 2,
“LastName”: “Kandinsky”,
“FirstName”: “Wassily”,
“Nationality”: “Russian”,
“DateOfBirth”: 1866,
“DateDeceased”: 1944,
“Customers”: [{
“CustomerID”: 1001,
“LastName”: “Smith”,
“FirstName”: “David,
“EmailAddress”: “David.Smith@somewhere.com”,
“State”: “CO”,
“ZIPorPostalCode”: 81201,
“Country”: “USA”
{
“ArtistID”: 4,
“LastName”: “Matisse”,
“FirstName”: “Henri,
“Nationality”: “French”,
“DateOfBirth”: 1869,
“DateDeceased”: 1954,
“Customers”: [{
}],
“Works”: [{
“WorkID”: 524,
“Title”: “Woman With a Hat”,
“Medium”: “High Quality Limited Print”,
“Description”: “A very colorful Impressionist piece,
“Copy”: “596/750”
{
“ArtistID”: 5,
“LastName”: “Chagall”,
“FirstName”: “Marc”,
“Nationality”: “French”,
“DateOfBirth”: 1887,
“DateDeceased”: 1985,
“Customers”: [{
“CustomerID”: 1001,
“LastName”: “Smith”,
“FirstName”: “David,
“EmailAddress”: “David.Smith@somewhere.com”,
“State”: “CO”,
“ZIPorPostalCode”: 81201,
“Country”: “USA”
}],
]
},
{
“ArtistID”: 11,
“LastName”: “Sargent”,
“FirstName”: “John Singer”,
“Nationality”: “United States”,
“DateOfBirth”: 1856,
“DateDeceased”: 1925,
“Customers”: [{
“CustomerID”: 1001,
}
],
“Works”: [{
“WorkID”: 563,
“Title”: “Spanish Dancer”,
“Medium”: “High Quality Limited Print”,
“Description”: “American realist style – From work in Spain”,