Algorithms Chapter 20 Algorithms Sequential Parallel And Distributed String Matching And Document Processing Glance

Type Homework Help

Pages 9

Words 1754

Textbook Algorithms: Sequential-- Parallel-- and Distributed 1st Edition

Authors Jerome L. Paul, Kenneth A. Berman

Unlock document.

This document is partially blurred.

Unlock all pages and 1 million more documents.

Get Access

Algorithms: Sequential, Parallel and Distributed 1-1

Chapter 20

String Matching and Document Processing

At a Glance

Table of Contents

• Overview

• Objectives

• Instructor Notes

• Solutions to Selected Exercises

Algorithms: Sequential, Parallel and Distributed 1-2

Lecture Notes

Overview

In Chapter 20 we discuss string matching algorithms including some classical algorithms

based on preprocessing the pattern string, an approximation string matching algorithm

and string matching algorithms based on preprocessing the text. We begin the chapter by

discussing the string matching problem and the naïve algorithm for solving the problem.

In the next three chapters we discuss three standard string-matching algorithms, the

Knuth, Morris, and Pratt (KMP) algorithm; the Boyer and Moore (BM) algorithm; and the

Karp and Rabin (KR) algorithm. The KMP and BM string matching algorithms preprocess

the pattern string so that information gained during a search for a match of the pattern

facilitate very fast searching of a pattern string.

Chapter Objectives

After completing the chapter the student will know:

• The KMP string-matching algorithm

• The BM string-matching algorithm

• The KR string-matching algorithm

• The concept of the edit distance between two strings

• A dynamic programming algorithm for computing the edit distance

• The concepts of trie, compressed trie (PATRICIA tree), and suffix tree

• Algorithms for searching tries, compressed tries and suffix trees for a given pattern

string

Algorithms: Sequential, Parallel and Distributed 1-3

Instructor Notes

The Karp-Rabin string matching algorithm can actually be considered as a Las Vegas

probabilistic algorithm (see Chapter 24). However, since the only probabilistic component of

the algorithm is to randomly choose the modulus q, it is appropriate to consider the algorithm

in this chapter before the notion of probabilistic algorithms is introduced.

NOTE: Ken needs to exand this. In section 20.5 we discuss an algorithm for determining

whether a pattern string P[0:m – 1]

is a k-approximation of a text string T[0:n – 1].

Solutions to Selected Exercises

Section 20.2 The Knuth-Morris-Pratt Algorithm

20.1 For this exercise we use the following version of KMPStringMatcher (the version given

in the first two printings of the text contains some redundant text comparisons). We show

that the strings P = 0m-11 and T = 0n that exhibit worst case (strongly asymptotic) mn

performance for the naïve string matcher also exhibit worst case 2n performance for

KMPStringMatcher. However, the improvement of KMPStringMatcher over the naïve

CreateNext(P[0:m – 1], Next[0:m – 1])

while i < n do

if P[j] = T[i] then

if j = m – 1 then //match found at position i – m + 1

return(i – m + 1)

could begin at position i – Next[j] in T

endif

endwhile

return(– 1)

20.3 For the string P = “cincinnati”, we have Next[0:9] = [0,0,0,0,1,2,3,0,0,0]. In part (a) of

the following figure, we show a trace of the values of i and j for each iteration of the while loop

in KMPStringMatcher. Positions marked with ! are where in the first statement of the while

(a)

(b)

20.4 We prove CreateNext is correct by induction on the loop variable i. (We remark that in

the first two printings of the text, the assignment j ← Next[j – 1] made within the while loop

should be replaced by j ← Next[j].)

Basis steps: i = 0,1. Trivial

Induction step: Assume that CreateNext correctly computes Next[i – 1], where i > 1. Now

consider the computation of the while loop for i. First notice that for the beginning of first

iteration of the while loop with the value i, the variable j has the value Next[i – 1]. To see this,

20.5 Following similar reasoning to that used to establish the Key Fact on p. 637 in the text,

20.7

function SimplifiedBMStringMatcher(P[0:m – 1],T[0:n – 1], A)

Input: P[0:m – 1] (a pattern string of length m)

T[0:n – 1] (a text string of length n)

A (finite alphabet from which characters for P and T are drawn)

20.8 Consider the pattern string P[0: m – 1] = 10m-1, and text string T[0: n – 1] = 0n. Note that

the shift value corresponding to the character 0 has the value of Shift[0] = 1. Since this is the

20.10 a. For the pattern P = “01212121” all entries of the array Shift[0: |A| – 1] not indexed

by 0,1,2 have value 8. Thus,

s1 = 8 – k, k = 1,2, …, 7, c



{0,1,2}.

For c



{0,1,2}, we have

20.11 For the pattern P = “amalgam” all entries of the array Shift[0: |A| – 1] not indexed

by a, g, l, m, have value 7. Thus,

s1 = 7 – k, k = 1,2, …, 6, c



{a, l, g, m}.

Algorithms: Sequential, Parallel and Distributed 1-7

s1 = 5 – k , c = m.

The good suffix shifts for P are 7,5,5,5,5,5, for k = 1,2, … ,6, respectively.

Section 20.4 The Karp-Rabin String-Matching Algorithm

20.15 a.

0 0 0 2 2

)11(

0 0 0 2 2

1 0 2 4 2

Algorithms: Sequential, Parallel and Distributed 1-8

20.16 Assuming that each entry in the m × m patterns P[0: m – 1, 0: m – 1] and T[s: s + m –

20.17 We have

]2[]3[][]1[ 21

1−++−++++−= −−

−msTkmsTksTksTT mm

s

20.19 a)

procedure EditDistance(T[0:n – 1],P[0:m – 1],D[0:n,0:m])

Input: T[0:n – 1],P[0:m – 1] (strings)

Output: D[0:n,0:m] (array such that D[i,j] is the edit distance between of T[0:i – 1] and

P[0:j – 1] )

for i ← 0 to n do //initialize for boundary conditions

D[i,0] ← i

b) Using text comparisons as the basic operation, it is clear that EditDistance has Θ(nm)

complexity.

20.21

Algorithms: Sequential, Parallel and Distributed 1-9

Section 20.5 Tries and Suffix Trees

20.22 Given a collection of strings C, where s is the total length over all strings in C, note that

the maximum number of nodes N(T) in the standard tree T for C occurs when each string has a

20.23 The algorithm proceeds by successively processing each string in C. When a new string

x is added to the tree, the algorithm checks whether x has a prefix that agrees with a prefix of a

string already stored in a child of the root. If not, a node containing x is added to the children

20.24 The algorithm checks to see if there is a child of the root whose string has a prefix that

agrees with a prefix of the given search string. If not, the string is not in the collection C. If so,

20.25 If ST is the compressed suffix (trie) tree for a text string T, and P[0:m-1] is a pattern

string, then we follow a path in ST in the manner described in the previous exercise. However,

Trusted by Thousands of
Students

Here are what students say about us.

Albert

University of Michigan

“I found almost every finance case study paper for my MBA courses.”.

Anna

University of Massachutsetts

“Wow! Solution manual for 3 out of 4 courses.”.

Collins

Jacksonville State University

“One-stop shop for college students. I passed all my exams thanks to Coursepaper”.

Jill

Boston University

“A helpful studying resources, a combination of all studying material in one place”.

Drake

Clark Atlanta University

“I graduated thanks to Coursepaper”.

Karen

College of Charleston

“I invested in Coursepaper, and it is paid off after the first semester. I got straight A”.

Hill

Concordia University Irvine

“Awesome awesome awesome site”.

Rachel

Coppin State University

“The one website that I recommend to every college students”.

Algorithms Chapter 20 Algorithms Sequential Parallel And Distributed String Matching And Document Processing Glance

Unlock document.

Trusted by Thousands of
Students

Albert

Anna

Collins

Jill

Drake

Karen

Hill

Rachel

Kristopher

Resources

Company

Legal

Algorithms Chapter 20 Algorithms Sequential Parallel And Distributed String Matching And Document Processing Glance

Unlock document.

Trusted by Thousands ofStudents

Albert

Anna

Collins

Jill

Drake

Karen

Hill

Rachel

Kristopher

Resources

Company

Legal

Trusted by Thousands of
Students