Solutions To Problems of Chapter 18
18.1. Prove that the perceptron algorithm, in its pattern-by-pattern mode of
operation, converges in a finite number of iteration steps. Assume that
θ(0) = 0.
Hint: Note that since the classes are assumed to be linearly separable, there
exists a normalized hyperplane, θ_*, and a γ > 0, so that
γ ≤ y_n θ_*^T x_n,  n = 1, 2, . . . , N,
where y_n is the respective label, being +1 for ω1 and −1 for ω2. By the
term normalized hyperplane, we mean that
θ_* = [θ̂_*^T, θ_{0*}]^T,  with ‖θ̂_*‖ = 1.
In this case, y_n θ_*^T x_n is the distance of x_n from the hyperplane θ_* ([164]).
Solution: Let us assume that we are currently at iteration i. This means
that so far i − 1 update corrections have been performed by the algorithm.
The ith update will be performed only if the current sample, x_(i), is
misclassified, i.e., only if
y_(i) x_(i)^T θ(i−1) ≤ 0,
and the update will be
θ(i) = θ(i−1) + μ y_(i) x_(i).    (1)
Also, from (1) we have that
‖θ(i)‖² = ‖θ(i−1)‖² + 2μ y_(i) x_(i)^T θ(i−1) + μ² y_(i)² ‖x_(i)‖²    (4)
≤ ‖θ(i−1)‖² + μ² R²,
since the update takes place only when y_(i) x_(i)^T θ(i−1) ≤ 0, y_(i)² = 1, and R := max_n ‖x_n‖. Starting from θ(0) = 0, this gives ‖θ(i)‖² ≤ i μ² R². On the other hand, taking the inner product of (1) with θ_* and using the margin bound from the hint, θ(i)^T θ_* ≥ θ(i−1)^T θ_* + μγ, so that θ(i)^T θ_* ≥ i μγ. Combining the two bounds via the Cauchy–Schwarz inequality,
i² μ² γ² ≤ (θ(i)^T θ_*)² ≤ ‖θ(i)‖² ‖θ_*‖² ≤ i μ² R² ‖θ_*‖²,
and hence i ≤ R² ‖θ_*‖² / γ²; that is, only a finite number of updates can be performed.
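The finiteness of the number of updates can also be observed numerically. The following sketch (plain NumPy; the data, margin threshold, and learning rate are illustrative choices, not taken from the text) runs the pattern-by-pattern perceptron with θ(0) = 0 on a linearly separable set and counts the updates until a full pass produces no correction.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-D points in extended coordinates x = [x1, x2, 1]^T, labeled by a known
# hyperplane (illustrative); points too close to it are discarded so that
# a strictly positive margin gamma exists, as the hint requires.
X = np.hstack([rng.uniform(-1, 1, size=(200, 2)), np.ones((200, 1))])
theta_true = np.array([1.0, -2.0, 0.3])
X = X[np.abs(X @ theta_true) > 0.2]
y = np.sign(X @ theta_true)

mu = 0.5                        # learning rate (illustrative)
theta = np.zeros(3)             # theta(0) = 0, as the problem assumes
updates = 0
converged = False
for epoch in range(1000):
    changed = False
    for xn, yn in zip(X, y):    # pattern-by-pattern mode
        if yn * (xn @ theta) <= 0:          # misclassified: apply update (1)
            theta = theta + mu * yn * xn
            updates += 1
            changed = True
    if not changed:             # a full pass without updates: converged
        converged = True
        break

print(updates, converged)
```

The bound i ≤ R²‖θ_*‖²/γ² guarantees that, for this margin, convergence occurs well within the epoch budget.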
18.2. The derivative of the sigmoid function has been computed in Problem
7.6. Compute the derivative of the hyperbolic tangent activation function
and show that it is equal to
f′(z) = (c/a)(a² − f²(z)).
Solution: We have that
f(z) = a tanh(cz) ⇒ f′(z) = ac cosh⁻²(cz),
or
f′(z) = ac(1 − tanh²(cz)) = ac(1 − f²(z)/a²) = (c/a)(a² − f²(z)).
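The closed form f′(z) = (c/a)(a² − f²(z)) can be verified against a central finite difference; the constants a, c below are illustrative choices.

```python
import numpy as np

a, c = 1.7159, 2.0 / 3.0            # illustrative constants
f = lambda z: a * np.tanh(c * z)    # the hyperbolic tangent activation

z = np.linspace(-3, 3, 61)
h = 1e-6
numeric = (f(z + h) - f(z - h)) / (2 * h)      # central difference
analytic = (c / a) * (a**2 - f(z)**2)          # claimed closed form

err = np.max(np.abs(numeric - analytic))
print(err)
```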
18.3. Show that the effect of the momentum term in the gradient descent back-
propagation scheme is to effectively increase the learning convergence rate
of the algorithm.
Hint: Assume that the gradient is approximately constant over I successive
iterations.
Solution: Let i denote the current iteration. Then,
Δθ_j^r(i) = a Δθ_j^r(i−1) − μ g,    (8)
where a is the momentum factor, 0 < a < 1, and g denotes the (approximately constant) gradient. Applying (8) recursively, with Δθ_j^r(0) = −μ g, we obtain
Δθ_j^r(i) = −μ g Σ_{t=0}^{i} a^t → −μ g / (1 − a),
as i grows. That is, the correction effectively applied is that of gradient descent with learning rate μ/(1 − a) > μ, which justifies the claim.
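Under the constant-gradient assumption of the hint, the recursion (8) can be iterated directly; the sketch below (values are illustrative) shows the momentum correction settling at the geometric-series limit −μg/(1 − a), i.e., an effective learning rate of μ/(1 − a).

```python
import numpy as np

mu, alpha, g = 0.01, 0.9, 2.0     # learning rate, momentum factor, constant gradient
delta = 0.0                       # Delta-theta(0) = 0
for _ in range(500):
    delta = alpha * delta - mu * g    # recursion (8) with constant g

effective = -mu * g / (1 - alpha)     # geometric-series limit: effective rate mu/(1-alpha)
print(delta, effective)
```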
18.4. Show that if a) the activation function is the hyperbolic tangent, b) the
input variables are normalized to zero mean and unit variance, then in
order to guarantee that all the outputs of the neurons are zero mean and
unit variance, the weights must be drawn from a distribution of zero mean
and standard deviation equal to
σ = m^{−1/2},
where m is the number of synaptic weights associated with the correspond-
ing neuron.
Hint: For simplicity, consider the bias to be zero, and also that the inputs
to each neuron are mutually uncorrelated.
Solution: Let us focus on the weights of a specific neuron, and denote
by y_i, i = 1, 2, . . . , m, the respective inputs. Then, the input to the
activation function will be given by
z = Σ_{i=1}^m θ_i y_i.
Assume that the inputs to the neuron are zero mean and unit variance
and are also uncorrelated; then
σ_i² = E[y_i²] = 1,    (12)
and
E[y_i y_j] = δ_ij.    (13)
Hence, if the weights are drawn independently of the inputs, with zero mean and variance σ², we get
E[z²] = Σ_{i=1}^m E[θ_i²] E[y_i²] = m σ².
Requiring E[z²] = 1, so that z (and hence the output, in the approximately linear regime of the tanh) is of unit variance, leads to σ = m^{−1/2}.
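A quick Monte-Carlo sketch (fan-in and sample sizes are illustrative) confirms that drawing the weights with standard deviation m^{−1/2} keeps z = Σ_i θ_i y_i at zero mean and unit variance.

```python
import numpy as np

rng = np.random.default_rng(1)
m, trials = 50, 50_000              # fan-in and Monte-Carlo size (illustrative)

# Zero-mean, unit-variance, mutually uncorrelated inputs.
y = rng.standard_normal((trials, m))
# Weights: zero mean, std m^{-1/2}, drawn independently of the inputs.
theta = rng.normal(0.0, m ** -0.5, size=(trials, m))

z = np.sum(theta * y, axis=1)       # neuron input z = sum_i theta_i y_i
print(z.mean(), z.var())
```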
18.5. Consider the sum of error squares cost function
J = (1/2) Σ_{n=1}^N Σ_{m=1}^{k_L} (ŷ_nm − y_nm)².    (15)
Compute the elements of the Hessian matrix
∂²J / (∂θ_kj^r ∂θ_{k′j′}^{r′}).    (16)
Near the optimum, show that the second order derivatives can be approx-
imated by
∂²J / (∂θ_kj^r ∂θ_{k′j′}^{r′}) = Σ_{n=1}^N Σ_{m=1}^{k_L} (∂ŷ_nm/∂θ_kj^r)(∂ŷ_nm/∂θ_{k′j′}^{r′}).    (17)
In other words, the second order derivatives can be approximated as prod-
ucts of the first order derivatives. The derivatives can be computed by fol-
lowing similar arguments as in the gradient descent backpropagation scheme,
[71].
Solution: Starting from
J = (1/2) Σ_{n=1}^N Σ_{m=1}^{k_L} (ŷ_nm − y_nm)²,
the first derivative is
∂J/∂θ_kj^r = Σ_{n=1}^N Σ_{m=1}^{k_L} (∂ŷ_nm/∂θ_kj^r)(ŷ_nm − y_nm).
Differentiating once more,
∂²J/(∂θ_kj^r ∂θ_{k′j′}^{r′}) = Σ_{n=1}^N Σ_{m=1}^{k_L} (∂²ŷ_nm/(∂θ_kj^r ∂θ_{k′j′}^{r′}))(ŷ_nm − y_nm)
+ Σ_{n=1}^N Σ_{m=1}^{k_L} (∂ŷ_nm/∂θ_kj^r)(∂ŷ_nm/∂θ_{k′j′}^{r′}).
Near the optimum, ŷ_nm ≈ y_nm, so the first term becomes negligible and the approximation (17) results.
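The Gauss–Newton-type approximation (17) can be checked numerically on a toy model; the sketch below uses a single tanh neuron with targets generated by the model itself, so that residuals vanish at the chosen point (model, data, and sizes are illustrative, not from the text).

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 200, 3
X = rng.standard_normal((N, d))
theta_opt = np.array([0.5, -1.0, 0.8])    # illustrative "true" weights
y = np.tanh(X @ theta_opt)                # targets from the model: zero residuals
                                          # at theta_opt, i.e., we sit at the optimum

def cost(theta):
    e = np.tanh(X @ theta) - y
    return 0.5 * np.sum(e ** 2)

# Exact Hessian at the optimum via central finite differences.
h = 1e-5
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
        H[i, j] = (cost(theta_opt + ei + ej) - cost(theta_opt + ei - ej)
                   - cost(theta_opt - ei + ej) + cost(theta_opt - ei - ej)) / (4 * h * h)

# Approximation (17): sum over samples of outer products of first derivatives.
yhat = np.tanh(X @ theta_opt)
G = (1 - yhat ** 2)[:, None] * X          # row n: d yhat_n / d theta
H_approx = G.T @ G

err = np.max(np.abs(H - H_approx))
print(err)
```

At the optimum the residual term vanishes exactly, so the two matrices agree up to finite-difference error.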
18.6. It is common, when computing the Hessian matrix, to assume that it is
diagonal. Show that, under this assumption, the quantities
∂²E/∂(θ_kj^r)²
can be computed by the following backpropagation-type recursions:
• ∂²E/∂(z_j^L)² = (f′(z_j^L))² + f″(z_j^L) e_j,
• ∂²E/∂(z_j^{r−1})² = (f′(z_j^{r−1}))² Σ_{k=1}^{k_r} (∂²E/∂(z_k^r)²)(θ_kj^r)² + f″(z_j^{r−1}) Σ_{k=1}^{k_r} θ_kj^r δ_k^r,
• ∂²E/∂(θ_kj^r)² = (∂²E/∂(z_k^r)²)(y_j^{r−1})².    (20)
Solution: Since z_k^r = Σ_j θ_kj^r y_j^{r−1}, the weight θ_kj^r affects E only through z_k^r, so that
∂²E/∂(θ_kj^r)² = (∂²E/∂(z_k^r)²)(y_j^{r−1})².    (20)
For the backpropagation of the involved second derivatives, we have:
I: r = L: With E = (1/2) Σ_{j=1}^{k_L} e_j² and e_j = f(z_j^L) − y_j,
δ_j^L = ∂E/∂z_j^L = f′(z_j^L) e_j,
and, differentiating once more,
∂²E/∂(z_j^L)² = (f′(z_j^L))² + f″(z_j^L) e_j.
II: r < L: We know from the text that
δ_j^{r−1} = Σ_{k=1}^{k_r} δ_k^r (∂z_k^r/∂z_j^{r−1}) = Σ_{k=1}^{k_r} δ_k^r θ_kj^r f′(z_j^{r−1}),    (21)
or, differentiating (21) with respect to z_j^{r−1},
∂²E/∂(z_j^{r−1})² = f″(z_j^{r−1}) Σ_{k=1}^{k_r} θ_kj^r δ_k^r + f′(z_j^{r−1}) Σ_{k=1}^{k_r} θ_kj^r (∂δ_k^r/∂z_j^{r−1}),    (22)
where, under the diagonal (cross-derivatives neglected) assumption,
∂δ_k^r/∂z_j^{r−1} = (∂²E/∂(z_k^r)²) θ_kj^r f′(z_j^{r−1}).
Thus, finally, we get
∂²E/∂(z_j^{r−1})² = (f′(z_j^{r−1}))² Σ_{k=1}^{k_r} (∂²E/∂(z_k^r)²)(θ_kj^r)² + f″(z_j^{r−1}) Σ_{k=1}^{k_r} θ_kj^r δ_k^r.
18.7. Show that if the activation function is the logistic sigmoid and the loss
function in (18.44) is used, then δ_nj^L in (18.21) becomes
δ_nj^L = a(ŷ_nj − y_nj).
If the cross-entropy in Eq. (18.41) is used, then it is equal to
δ_nj^L = a y_nj (ŷ_nj − 1).
Solution: The cost function is
J = Σ_{n=1}^N J_n,
and δ_nj^L = ∂J_n/∂z_nj^L = (∂J_n/∂ŷ_nj) f′(z_nj^L). For the logistic sigmoid, ŷ_nj = f(z_nj^L) with f(z) = 1/(1 + exp(−az)), so that
f′(z) = a f(z)(1 − f(z)).
Substituting this derivative, together with ∂J_n/∂ŷ_nj for the respective loss function, and simplifying, leads directly to the two stated expressions.
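Both expressions can be checked numerically. The sketch below assumes (an assumption, since the referenced equations are not quoted here) that the loss in (18.44) has the binary cross-entropy form −Σ[y ln ŷ + (1 − y) ln(1 − ŷ)] and that (18.41) is −Σ y ln ŷ; these are the forms for which the stated deltas hold with the logistic sigmoid.

```python
import numpy as np

a = 1.5                                   # illustrative sigmoid slope
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-a * z))

rng = np.random.default_rng(3)
K = 20
z = rng.uniform(-2, 2, size=K)
y = rng.uniform(0.05, 0.95, size=K)       # illustrative target values

# Assumed loss forms (not quoted from the text):
J_bce = lambda z: -np.sum(y * np.log(sigmoid(z)) + (1 - y) * np.log(1 - sigmoid(z)))
J_ce  = lambda z: -np.sum(y * np.log(sigmoid(z)))

# Central-difference gradients w.r.t. each z-component.
h = 1e-6
num_bce = np.array([(J_bce(z + h * e) - J_bce(z - h * e)) / (2 * h) for e in np.eye(K)])
num_ce  = np.array([(J_ce(z + h * e) - J_ce(z - h * e)) / (2 * h) for e in np.eye(K)])

yhat = sigmoid(z)
err1 = np.max(np.abs(num_bce - a * (yhat - y)))      # claim: delta = a(yhat - y)
err2 = np.max(np.abs(num_ce - a * y * (yhat - 1)))   # claim: delta = a y (yhat - 1)
print(err1, err2)
```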
18.8. Show that if the activation function is the logistic sigmoid and the relative
entropy loss function is used, then δ_nj^L in (18.21) becomes
δ_nj^L = a(ŷ_nj − 1) y_nj.
Solution: From the theory, we have that, for the relative entropy,
J_n = −Σ_j y_nj ln(ŷ_nj/y_nj),
so that ∂J_n/∂ŷ_nj = −y_nj/ŷ_nj; multiplying by f′(z_nj^L) = a ŷ_nj (1 − ŷ_nj) gives
δ_nj^L = −(y_nj/ŷ_nj) a ŷ_nj (1 − ŷ_nj) = a(ŷ_nj − 1) y_nj.
18.9. Show that the cross-entropy loss function depends on the relative output
errors.
Solution: We will prove it for one of the versions; the proof is similar
for the other. Taking into consideration the limiting value 0 ln 0 = 0, we
can write the cross-entropy cost as
J = −Σ_{n=1}^N Σ_{k=1}^{k_L} [ y_nk ln(ŷ_nk/y_nk) + (1 − y_nk) ln((1 − ŷ_nk)/(1 − y_nk)) ],
which depends on the outputs only through the ratios ŷ_nk/y_nk and (1 − ŷ_nk)/(1 − y_nk), i.e., on the relative output errors.
18.10. Show that if the activation function is the softmax and the loss function
is the cross-entropy (or the loss in (18.44)), then δ_nj^L in (18.21) does not
depend on the derivatives of the nonlinearity.
Solution: We will show it for Eq. (18.44); for the cross-entropy, the proof is similar. By definition,
δ_nj^L = ∂J_n/∂z_nj^L = (∂J_n/∂ŷ_nj)(∂ŷ_nj/∂z_nj^L).
Now,
∂ŷ_nj/∂z_nj^L = ∂/∂z_nj^L [ exp(z_nj^L) / Σ_{m=1}^{k_L} exp(z_nm^L) ]
= exp(z_nj^L)/Σ_{m=1}^{k_L} exp(z_nm^L) − (exp(z_nj^L))² / (Σ_{m=1}^{k_L} exp(z_nm^L))²
= ŷ_nj (1 − ŷ_nj).    (23)
Also,
∂J_n/∂ŷ_nj = −y_nj/ŷ_nj + (1 − y_nj)/(1 − ŷ_nj) = (ŷ_nj − y_nj) / (ŷ_nj(1 − ŷ_nj)).
Hence,
δ_nj^L = ŷ_nj − y_nj,
and the derivative of the nonlinearity has canceled out.
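The claim δ_nj^L = ŷ_nj − y_nj can be verified numerically for the softmax with the cross-entropy −Σ_k y_k ln ŷ_k (targets summing to one; sizes and values below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
K = 5
z = rng.uniform(-2, 2, size=K)
y = rng.dirichlet(np.ones(K))            # illustrative targets summing to one

softmax = lambda z: np.exp(z) / np.sum(np.exp(z))
Jn = lambda z: -np.sum(y * np.log(softmax(z)))    # cross-entropy

# Central-difference gradient of Jn w.r.t. each z-component.
h = 1e-6
num = np.array([(Jn(z + h * e) - Jn(z - h * e)) / (2 * h) for e in np.eye(K)])

delta = softmax(z) - y                   # claim: delta = yhat - y
err = np.max(np.abs(num - delta))
print(err)
```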
18.11. As in the previous problem, use the relative entropy as the cost function
and the softmax activation function. Then show that
δ_nj^L = ŷ_nj − y_nj.
Solution:
J_n = −Σ_{k=1}^{k_L} y_nk ln(ŷ_nk/y_nk),
with ŷ_nk = exp(z_nk^L)/Σ_m exp(z_nm^L). Taking into account the cross terms of the softmax, ∂ŷ_nk/∂z_nj^L = ŷ_nk(δ_kj − ŷ_nj), so that
δ_nj^L = Σ_{k=1}^{k_L} (−y_nk/ŷ_nk) ŷ_nk (δ_kj − ŷ_nj) = −y_nj + ŷ_nj Σ_{k=1}^{k_L} y_nk = ŷ_nj − y_nj,
since Σ_k y_nk = 1.
18.12. Derive the backpropagation through time algorithm for training recurrent
neural networks.
Solution: The starting set of equations are those defining an RNN, i.e.,
h_n = f(U x_n + W h_{n−1} + b),
z_n = V h_n + c,  ŷ_n = g(z_n),
where f is the tanh function, acting element-wise.
• For the gradient w.r. to W: each element w_ij affects the cost through the hidden states, so
∂J/∂w_ij = Σ_{n=1}^N (∂J/∂h_ni)(∂h_ni/∂w_ij).    (25)
Note, from the defining RNN equations, that all the w_ij, j = 1, 2, . . . , k_h, elements enter h_ni through the tanh nonlinearity; collecting terms in matrix form,
∂J/∂W = Σ_{n=1}^N diag(1 − h_ni², i = 1, . . . , k_h) (∂J/∂h_n) h_{n−1}^T.    (27)
To use (27), we need ∂J/∂h_N (where the recursion starts), ∂ŷ_n/∂h_n and ∂h_{n+1}/∂h_n. The gradient of the cost w.r. to the outputs is straightforward, and depends on the specific loss function; see, e.g.,
Problem 18.10.
Also, following similar arguments as before,
∂h_{n+1}/∂h_n = diag(1 − h_{(n+1)i}², i = 1, . . . , k_h) W,
so the backward recursion for ∂J/∂h_n is straightforward.
The above complete the calculation of the gradient of J w.r. to W.
• For the gradient w.r. to V, we get
∂J/∂V = Σ_{n=1}^N (∂J/∂z_n) h_n^T.
• For the gradient w.r. to c,
∂J/∂c = Σ_{n=1}^N (∂z_n/∂c^T)(∂J/∂z_n) = Σ_{n=1}^N ∂J/∂z_n.
• The gradient w.r. to b is equal to
∂J/∂b = Σ_{n=1}^N diag(1 − h_ni², i = 1, . . . , k_h) (∂J/∂h_n).
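The BPTT recursion can be validated on a tiny tanh RNN against finite differences. The sketch below (sizes, loss, and initialization are illustrative assumptions: a linear output layer z_n = V h_n + c and a squared-error loss) accumulates ∂J/∂W with the backward recursion and compares entry-wise with numerical gradients.

```python
import numpy as np

rng = np.random.default_rng(5)
kh, kx, ky, N = 4, 3, 2, 6        # sizes and sequence length (illustrative)
U = rng.standard_normal((kh, kx)) * 0.3
W = rng.standard_normal((kh, kh)) * 0.3
V = rng.standard_normal((ky, kh)) * 0.3
b, c = rng.standard_normal(kh) * 0.1, rng.standard_normal(ky) * 0.1
xs = rng.standard_normal((N, kx))
ts = rng.standard_normal((N, ky))   # targets, squared-error loss

def forward(W):
    h = np.zeros(kh)
    hs, J = [h], 0.0
    for n in range(N):
        h = np.tanh(U @ xs[n] + W @ h + b)
        hs.append(h)
        J += 0.5 * np.sum((V @ h + c - ts[n]) ** 2)
    return hs, J

# BPTT: backward recursion for dJ/dh_n, then the analogue of (27) for dJ/dW.
hs, _ = forward(W)
dW = np.zeros_like(W)
dh = np.zeros(kh)                   # holds the contribution from future steps
for n in range(N, 0, -1):
    h = hs[n]
    dh = V.T @ (V @ h + c - ts[n - 1]) + dh        # dJ/dh_n = local + future
    dW += np.outer((1 - h ** 2) * dh, hs[n - 1])   # diag(1-h^2) dJ/dh_n h_{n-1}^T
    dh = W.T @ ((1 - h ** 2) * dh)                 # propagate to h_{n-1}

# Finite-difference check of every entry of dJ/dW.
eps, err = 1e-6, 0.0
for i in range(kh):
    for j in range(kh):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num = (forward(Wp)[1] - forward(Wm)[1]) / (2 * eps)
        err = max(err, abs(num - dW[i, j]))
print(err)
```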
18.13. Derive the gradient of the log-likelihood in (18.74).
Solution: To simplify slightly the notation, we will consider only one
term of the sum. Thus, we have
∂ ln P(v; Θ)/∂θ_ij = ∂/∂θ_ij [ ln Σ_h exp(−E(v, h)) − ln Σ_v Σ_h exp(−E(v, h)) ].    (29)
18.14. Prove that for the case of RBMs, the conditional probabilities are given
by the following factorized form,
P(h|v) = Π_{i=1}^I [ exp((Σ_{j=1}^J θ_ij v_j + b_i) h_i) / (1 + exp(Σ_{j=1}^J θ_ij v_j + b_i)) ],
for binary hidden units, h_i ∈ {0, 1}.
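The factorization can be checked by brute force on a small binary RBM: enumerate all hidden configurations, normalize the joint exp(−E) directly, and compare with the product form. The energy convention below, −E = h^T Θ v + a^T v + b^T h, and all sizes are illustrative assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(6)
I, J = 3, 4                         # hidden and visible sizes (small, illustrative)
Theta = rng.standard_normal((I, J))
a = rng.standard_normal(J)          # visible biases
b = rng.standard_normal(I)          # hidden biases

v = rng.integers(0, 2, size=J).astype(float)    # a fixed visible vector

# Unnormalized joint: exp(-E(v, h)) with -E = h^T Theta v + a^T v + b^T h.
def unnorm(h):
    return np.exp(h @ Theta @ v + a @ v + b @ h)

# Exact conditional by enumerating all 2^I hidden configurations.
hs = [np.array(h, dtype=float) for h in product([0, 1], repeat=I)]
Z = sum(unnorm(h) for h in hs)
exact = {tuple(h): unnorm(h) / Z for h in hs}

# Claimed factorized form: product over i of
# exp((sum_j theta_ij v_j + b_i) h_i) / (1 + exp(sum_j theta_ij v_j + b_i)).
s = Theta @ v + b
factorized = {tuple(h): np.prod(np.exp(s * h) / (1 + np.exp(s))) for h in hs}

err = max(abs(exact[k] - factorized[k]) for k in exact)
print(err)
```

The agreement is exact up to floating-point error, since Σ_h Π_i exp(s_i h_i) = Π_i (1 + exp(s_i)).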
18.15. Derive Eq. (18.83).
Solution: The integrand is equal to