Database Storage & Design Appendix L Homework It is important to convey to the students that the topic

INSTRUCTOR’S MANUAL

TO ACCOMPANY

40th Anniversary Edition

DATABASE PROCESSING

Fundamentals, Design, and Implementation

15th Edition

Appendix L

JSON and Document Databases

David M. Kroenke | David J. Auer | Scott L. Vandenberg | Robert C. Yoder

❖ CHAPTER OBJECTIVES

• To learn the basic concepts of NoSQL document databases

• To learn the structuring facilities of JSON

❖ ERRATA

There are no known errors at this time. Any errors that are discovered in the future will

❖ TEACHING SUGGESTIONS

• Coverage of this topic begins in Chapter 12 (Data Warehouses, Business

Intelligence Sytems, and Big Data), which is expanded upon on Appendix K (Big

Data). This appendix is ideally read after Chapter 12 (actually, the last section of

• Nearly all of the content of this appendix is applicable to any other document

database, aside from the specifics of the ArangoDB Query Language (AQL), but

even there, many of the concepts will transfer. Thus it is possible to use this

appendix with other document databases, if one is careful to note the differences.

❖ ANSWERS TO REVIEW QUESTIONS

L.1 What is a document?

A document is a set of (field name, value) pairs. It can have any number of such pairs. A value

L.2 What is a document database?

A document database typically consists of a number of named collections (sets or arrays) of

documents. These are somewhat analogous to tables in the relational model.

L.3 Why do we say that the data in a document database is semi-structured?

The documents have some structure (field names and array, for example), but the documents in a

L.4 What is different about the kinds of data and queries typically involved in a document

database compared to a relational database?

The data are both more complex (in structure) and more fluid (no fixed schema) in a document

L.5 What is JSON, and what are its main features?

JSON stands for JavaScript Object Notation. It is a very simple data model (the complete

L.6 What is ArangoDB? How is it different from many other document databases?

ArangoDB is a multimodel NoSQL DBMS. It supports document, graph, and key-value

L.7 Is there a standard query language for document databases, like SQL for relational

databases?

No.

L.8 What does CRUD stand for, and why is it important?

L.9 What are the restrictions in JSON on how a document can be constructed out of values,

arrays, and other documents?

L.10 What is a JSON array?

An array in JSON is an ordered collection of values, delimited by square brackets ([, ]). These

L.11 How does ArangoDB Enterprise Edition differ from ArangoDB Community Edition?

The Enterprise Edition is not free. It also comes with some useful enterprise-class features to

L.12 What are the names of the DBA user account and the associated administrative

database in ArangoDB?

The standard ArangoDB DBA account has the login name “root”. The default administrative

L.13 What is the name of the primary key field for the documents in an ArangoDB collection?

How is the value of that field assigned?

Every document in an ArangoDB collection has a field named “_key”. The user can assign a

L.14 What are the various ways to get document data into an ArangoDB collection?

One way is to use the ArangoDB GUI to create a document and type in data, as illustrated in this

L.15 What is an entry point?

An entry point is another term for a queryable collection in a document database. It serves as a

L.16 In the VRG2 database shown in Figure L-13, there is some nested data. Describe what

nesting means, and completely describe all the ways in which data is nested in the

VRG2 database.

Nesting of data occurs when one document is stored inside another document. In the VRG2

L.17 In the VRG3 database shown in Figure L-17, there is some nested data. Completely

describe all the ways in which data is nested in the VRG3 database.

In the VRG3 database, each work has its artist documented nested within the work document.

L.18 What is a workload?

A workload is the anticipated mix of common queries and updates for a database. Knowing

L.19 Under what workload scenario would the VRG3 database be preferred to the VRG2

database?

The VRG3 database structure will work best when most queries retrieve primarily information

L.20 What is the purpose of an AQL FOR loop?

The AQL FOR loop is similar to the FROM clause of an SQL query: it specifies a collection to

L.21 What is the purpose of an AQL RETURN clause?

The AQL RETURN clause is similar to the SELECT clause in SQL, but much more flexible. It

L.22 What is the purpose of the AQL FILTER operation?

The AQL FILTER operation is similar to an SQL WHERE clause: it places conditions on which

values/documents are to be retained for further processing by the query.

L.23 What are two ways to remove duplicates in AQL?

As in SQL, the RETURN clause has an optional DISTINCT keyword that will ensure no

L.24 What AQL operation can turn an array of arrays into an array?

The AQL FLATTEN operation will turn an array of arrays into an array. Note that this operation

L.25 What AQL syntax would be used to access the fourth customer in an array of customers

in the VRG3 database?

Array indexes in AQL begin with 0. Thus the fourth customer in any customer array would be

L.26 Describe how the “.” notation of SQL has been extended in AQL.

In SQL, the “.” notation is used to separate a table name from a column name, e.g.

L.27 What is the AQL syntax used to do grouping? What two things does the syntax allow us

to name, and why is that useful? How does it extend the similar capabilities of SQL?

The AQL keyword COLLECT is used to group in AQL, similar to how GROUP BY is used in

L.28 Translate SQL-Query-CH02-61 from SQL into AQL, assuming the relational structure

has been maintained in the document collections and that each table has become a

collection.

The SQL query in question is:

L.29 Under what circumstances should a secondary hash index be created?

Secondary hash indexes are most useful when we have a lot of queries that need to do an equality

L.30 Under what circumstances should a secondary skiplist index be created?

A skiplist index has similar performance characteristics to those of B-Trees (see Appendix G for

L.31 What is a sparse index?

A sparse index in ArangoDB is a secondary index in which documents with missing or null

L.32 What is a unique index?

A unique index in ArangoDB enforces uniqueness among the fields indexed. For example, a

L.33 What is an ArangoDB cluster?

L.34 What is data distribution?

Data distribution occurs when data from a single database are stored on more than one computer

in a cluster (network).

L.35 What is sharding, and how does ArangoDB decide how to assign documents to shards?

Sharding, which is a NoSQL term for horizontal partitioning or horizontal decomposition, occurs

L.36 What are the benefits of sharding in ArangoDB?

A collection that is too large for one computer can be stored using sharding. Also, if a query only

❖ ANSWERS TO EXERCISES

L.37 Develop a document database for the data in the HSD SEMINAR,

SEMINAR_CUSTOMER, and CUSTOMER tables based on the HSD Database

described in Chapter 7. Your database should have two collections. Justify your design

decisions, and use the JSON syntax used in the examples in this appendix. You do not

need to include all data; at a minimum, include data for seminars 2–4 and customers 6–

12.

Note that there are multiple ways to approach the data structuring. However, this question

The solutions below follow standard JSON syntax. The first seven documents belong in a

“customer” collection and the last three in a “seminar” collection.

{

_key: 6,

LastName: “Foxtrot”,

FirstName: “Kathy”,

EmailAddress: “Kathy.Foxtrot@somewhere.com”,

EncryptedPassword: “aa8tY4GL”,

Phone: “972-233-6234”,

StreetAddress: “11023 Elm Street”,

City: “Dallas”,

{

_key: 10,

LastName: “Ranger”,

FirstName: “Terry”,

EmailAddress: “Terry.Ranger@somewhere.com”,

EncryptedPassword: “bv3F9Qc4”,

Phone: “512-974-4455”,

StreetAddress: “56 East 18th Street”,

City: “Austin”,

State: “TX”,

ZIP: “78712”

}

{

_key: 12,

LastName: “Wayne”,

FirstName: “Joan”,

EmailAddress: “Joan.Wayne@elsewhere.com”,

{

_key: 3,

SeminarDate: “02-NOV-2017”,

SeminarTime: “08:30 AM”,

Location: “Austin Convention Center”,

SeminarTitle: “Kitchen on a Budget”,

Customers: [9, 10]

}

L.38 Based on your answer to Exercise L-37, describe four other design choices for the

SEMINAR_CUSTOMER many-to-many relationship, and a scenario in which each would

be the best choice.

Option 1: similar to the solution in Exercise L-37, except store a list of seminar _key values with

each customer instead of storing customer IDs with each seminar. This might be the best choice if

L.39 Download and install ArangoDB Community Edition on your own computer following the

instructions in this appendix (or, if you prefer, install it as a service on Microsoft

Windows).

L.40 Create an ArangoDB user named VRG-User.

A. Create a database named VRG1 associated with this user and insert the data

from Figures L-10 and L-12 into Artists and Customers collections, respectively.

Creating the database is self-explanatory. Follow the instructions on pages L-15 to L-19. To

[

{

“ArtistID”: 1,

“LastName”: “Miro”,

“FirstName”: “Joan”,

“Nationality”: “Spanish”,

“DateOfBirth”: 1893,

“DateDeceased”: 1983

{

“ArtistID”: 11,

“LastName”: “Sargent”,

“FirstName”: “John Singer”,

“Nationality”: “United States“,

“DateOfBirth”: 1856,

“DateDeceased”: 1925

},

{

“ArtistID”: 18,

“LastName”: “Horiuchi”,

“FirstName”: “Paul”,

“Nationality”: “United States“,

“DateOfBirth”: 1906,

“DateDeceased”: 1999

},

}

]

The Customers collection will look like this (this is the same data from Figure L-12(a)):

[ {

“CustomerID”: 1000,

“LastName”: “Janes”,

“FirstName”: “Jeffrey”,

“EmailAddress”: “Jeffrey.Janes@somewhere.com”,

“State”: “WA”,

},

{

“CustomerID”: 1001,

“LastName”: “Smith”,

“FirstName”: “David”,

“EmailAddress”: “David.Smith@somewhere.com”,

“State”: “CO”,

“ZIPorPostalCode”: 81201,

“Country”: “USA”,

“Interests”: [

1,

2,

4,

5,

11

]

B. Create a database named VRG2 associated with this user and insert the data

from Figure L-13 into an Artists collection.

Creating the database is self-explanatory. Follow the instructions on pages L-15 to L-19. To

“Nationality”: “Spanish”,

“DateOfBirth”: 1893,

“DateDeceased”: 1983,

“Customers”: [{

“CustomerID”: 1001,

“LastName”: “Smith”,

“FirstName”: “David“,

“EmailAddress”: “David.Smith@somewhere.com”,

“State”: “CO”,

“ZIPorPostalCode”: 81201,

“Country”: “USA”

]

},

{

“ArtistID”: 2,

“LastName”: “Kandinsky”,

“FirstName”: “Wassily”,

“Nationality”: “Russian”,

“DateOfBirth”: 1866,

“DateDeceased”: 1944,

“Customers”: [{

“CustomerID”: 1001,

“LastName”: “Smith”,

“FirstName”: “David“,

“EmailAddress”: “David.Smith@somewhere.com”,

“State”: “CO”,

“ZIPorPostalCode”: 81201,

“Country”: “USA”

{

“ArtistID”: 4,

“LastName”: “Matisse”,

“FirstName”: “Henri“,

“Nationality”: “French”,

“DateOfBirth”: 1869,

“DateDeceased”: 1954,

“Customers”: [{

}],

“Works”: [{

“WorkID”: 524,

“Title”: “Woman With a Hat”,

“Medium”: “High Quality Limited Print”,

“Description”: “A very colorful Impressionist piece“,

“Copy”: “596/750”

{

“ArtistID”: 5,

“LastName”: “Chagall”,

“FirstName”: “Marc”,

“Nationality”: “French”,

“DateOfBirth”: 1887,

“DateDeceased”: 1985,

“Customers”: [{

“CustomerID”: 1001,

“LastName”: “Smith”,

“FirstName”: “David“,

“EmailAddress”: “David.Smith@somewhere.com”,

“State”: “CO”,

“ZIPorPostalCode”: 81201,

“Country”: “USA”

}],

]

},

{

“ArtistID”: 11,

“LastName”: “Sargent”,

“FirstName”: “John Singer”,

“Nationality”: “United States”,

“DateOfBirth”: 1856,

“DateDeceased”: 1925,

“Customers”: [{

“CustomerID”: 1001,

}

],

“Works”: [{

“WorkID”: 563,

“Title”: “Spanish Dancer”,

“Medium”: “High Quality Limited Print”,

“Description”: “American realist style – From work in Spain”,