or
$$\|x - T_2T_1(x)\|^2 \le \frac{2\mu_1}{2-\mu_1}\Big(\|x-y\|^2 - \|T_1(x)-y\|^2\Big) + \frac{2\mu_2}{2-\mu_2}\Big(\|T_1(x)-y\|^2 - \|T_2T_1(x)-y\|^2\Big).$$
Let
23. Show the fundamental POCS theorem for the case of closed subspaces in
a Hilbert space, H.
Solution: Fact 1: The relaxed projection operator is self-adjoint, i.e.,
$$\langle x, T_{C_i}(y)\rangle = \langle T_{C_i}(x), y\rangle, \quad \forall x, y \in H.$$
$$\langle x, T(y)\rangle = \langle x, T_{C_K}\cdots T_{C_1}(y)\rangle = \langle T_{C_K}(x), T_{C_{K-1}}\cdots T_{C_1}(y)\rangle = \cdots = \langle T_{C_1}\cdots T_{C_K}(x), y\rangle = \langle T^*(x), y\rangle,$$
Hence,
$$(I - T^*)(x) = x - T^*(x) = 0,$$
or
$$T^*(x) = x,$$
and since $T^*$ and $T$ have the same fixed-point set (the proof is trivial),
$$S^\perp \subseteq C.$$
We are now ready to establish strong convergence. The repeated application of $T$ on any $x \in H$ leads to $T^n(x) = (T_{C_K}\cdots T_{C_1})^n(x)$. We know that $\forall x \in H$ there is a unique decomposition with respect to a pair of orthogonally complementary (closed) subspaces, i.e.,
$$x = y + z, \quad y \in C, \ z \in C^\perp, \quad \forall x \in H.$$
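As a numerical illustration of the theorem (an addition to the solution, not part of the original text), the following Python sketch repeatedly applies the composition of projections onto two closed subspaces of $\mathbb{R}^3$, chosen here as the illustrative hyperplanes $C_1 = \{z : z_1 = 0\}$ and $C_2 = \{z : z_1 + z_2 = 0\}$; the iterates converge to the projection of the starting point onto $C_1 \cap C_2$.

import numpy as np

def project(x, a):
    # Projection onto the closed subspace {z : a^T z = 0}.
    return x - (a @ x) / (a @ a) * a

a1 = np.array([1.0, 0.0, 0.0])   # C1 = {z : z[0] = 0}
a2 = np.array([1.0, 1.0, 0.0])   # C2 = {z : z[0] + z[1] = 0}

x = np.array([3.0, -1.0, 2.0])
for _ in range(50):              # repeated application of T = P_C2 P_C1
    x = project(project(x, a1), a2)

# The limit is the projection of the starting point onto C1 ∩ C2
# (here the z-axis), i.e., (0, 0, 2).
print(np.round(x, 6))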
24. Derive the subdifferential of the metric distance function $d_C(x)$, where $C \subseteq \mathbb{R}^l$ is a closed convex set and $x \in \mathbb{R}^l$.
Solution: By definition we have
$$\partial d_C(x) = \big\{g : g^T(y - x) + d_C(x) \le d_C(y), \ \forall y \in \mathbb{R}^l\big\}.$$
Thus, let $g$ be a subgradient. Since $d_C$ is $1$-Lipschitz,
$$g^T(y - x) \le d_C(y) - d_C(x) \le \|y - x\|, \quad \forall y \in \mathbb{R}^l.$$
Since this is true $\forall y$, let
$$y : y - x = g \ \Rightarrow \ \|g\|^2 \le \|g\| \ \Rightarrow \ \|g\| \le 1,$$
or
$$g \in B[0, 1].$$
a) Let $x \notin C$ and $g$ any subgradient. For any $y \in \mathbb{R}^l$,
$$g^T\big(y + P_C(x) - x\big) \le d_C\big(y + P_C(x)\big) - d_C(x).$$
Setting $y = 0$, and since $d_C(P_C(x)) = 0$ and $d_C(x) = \|x - P_C(x)\|$,
$$-g^T\big(x - P_C(x)\big) \le -\|x - P_C(x)\|,$$
or
$$g^T\big(x - P_C(x)\big) \ge \|x - P_C(x)\|.$$
However,
$$\|g\| \le 1,$$
and recalling the Cauchy-Schwarz inequality, we obtain
$$\|x - P_C(x)\| \le g^T\big(x - P_C(x)\big) \le \|g\|\,\|x - P_C(x)\| \le \|x - P_C(x)\|,$$
so that equality holds throughout and
$$g = \frac{x - P_C(x)}{\|x - P_C(x)\|}.$$
b) Let $x \in C$, so that $d_C(x) = 0$. Then, by the previous arguments, for any $y \in C$,
$$g^T(y - x) \le 0, \quad \|g\| \le 1. \tag{20}$$
If in addition $x$ is an interior point, there will be $\varepsilon > 0$ such that $\forall z \in \mathbb{R}^l$,
$$g^T\big(x - \varepsilon(z - x) - x\big) \le 0,$$
since $x - \varepsilon(z - x) \in C$ and condition (20) has been used. Since $z$ runs over all of $\mathbb{R}^l$, this forces $g^T(z - x) = 0$, $\forall z$; thus,
$$g = 0.$$
This completes the proof.
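As a numerical check (again an addition to the solution), the sketch below takes the illustrative choice of $C$ as the closed unit ball in $\mathbb{R}^2$, for which $P_C$ has a closed form, and verifies both the subgradient inequality and that $\|g\| = 1$ for $x \notin C$:

import numpy as np

def P_C(x):                       # projection onto the closed unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def d_C(x):
    return np.linalg.norm(x - P_C(x))

x = np.array([3.0, 4.0])          # a point outside C
g = (x - P_C(x)) / d_C(x)         # the subgradient derived above

rng = np.random.default_rng(0)
ok = all(g @ (y - x) + d_C(x) <= d_C(y) + 1e-12
         for y in rng.normal(size=(1000, 2)) * 5.0)
print(ok, np.linalg.norm(g))      # True 1.0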
25. Derive the bound in (8.55).
Solution: Subtracting $\theta_*$ from both sides of the recursion, squaring, and taking into account the definition of the subgradient, it is readily shown that
$$\|\theta^{(i)} - \theta_*\|^2 \le \|\theta^{(i-1)} - \theta_*\|^2 - 2\mu_i\big(J(\theta^{(i-1)}) - J(\theta_*)\big) + \mu_i^2\|g_i\|^2.$$
Applying the previous inequality recursively for $k = 1, \ldots, i$, and rearranging terms,
$$\min_{k=1,\ldots,i} J(\theta^{(k-1)}) - J(\theta_*) \le \frac{\|\theta^{(0)} - \theta_*\|^2 + \sum_{k=1}^{i}\mu_k^2\|g_k\|^2}{2\sum_{k=1}^{i}\mu_k},$$
which is the bound in (8.55).
26. Show that if a function is γ-Lipschitz, then any of its subgradients is
bounded.
Solution: By the definition of the subgradient we have that, $\forall u, v$,
$$g^T(u - v) \le f(u) - f(v) \le \gamma\|u - v\|, \quad g \in \partial f(v),$$
where the second inequality is the Lipschitz property. Setting $u = v + g$ results in $\|g\|^2 \le \gamma\|g\|$, or $\|g\| \le \gamma$.
27. Show the convergence of the generic projected subgradient algorithm in
(8.61).
Solution: Let us break the iteration into two steps,
$$z^{(i)} = \theta^{(i-1)} - \mu_i J'(\theta^{(i-1)}), \tag{22}$$
$$\theta^{(i)} = P_C(z^{(i)}). \tag{23}$$
Since $\theta_* \in C$, the nonexpansiveness of the projection gives $\|\theta^{(i)} - \theta_*\| = \|P_C(z^{(i)}) - P_C(\theta_*)\| \le \|z^{(i)} - \theta_*\|$, so the convergence analysis of the unprojected subgradient step carries over.
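To make the two-step structure concrete, here is a minimal Python sketch of (22)-(23) under illustrative assumptions not taken from the text: $J(\theta) = \|\theta\|_1$, $C$ the Euclidean ball of radius $r$, and a diminishing step-size $\mu_i = 1/\sqrt{i}$:

import numpy as np

def subgrad(theta):               # a subgradient of J(θ) = ||θ||_1
    return np.sign(theta)

def P_C(theta, r=1.0):            # projection onto C = {θ : ||θ|| <= r}
    n = np.linalg.norm(theta)
    return theta if n <= r else theta * (r / n)

theta = np.array([4.0, -3.0])
for i in range(1, 2001):
    mu_i = 1.0 / np.sqrt(i)
    z = theta - mu_i * subgrad(theta)   # step (22)
    theta = P_C(z)                      # step (23)
print(theta)                            # oscillates ever closer to θ* = 0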
28. Derive equation (8.100).
Solution: By the definition
$$J_n(\theta) = \frac{1}{n}\sum_{k=1}^{n} L(y_k, x_k, \theta).$$
Splitting off the last term of the sum,
$$J_n(\theta) = \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{k=1}^{n-1} L(y_k, x_k, \theta) + \frac{1}{n}L(y_n, x_n, \theta) = \frac{n-1}{n}J_{n-1}(\theta) + \frac{1}{n}L(y_n, x_n, \theta),$$
which yields the recursion in (8.100).
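The recursion is readily checked numerically; the stand-in loss values below are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
losses = rng.random(100)              # stand-ins for L(y_k, x_k, θ)

J = 0.0
for n, L_n in enumerate(losses, start=1):
    J = (n - 1) / n * J + L_n / n     # recursive update of J_n
print(np.isclose(J, losses.mean()))   # True: equals (1/n) Σ_k L_k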
29. Consider the online version of PDMb in (8.64), i.e.,
$$\theta_n = \begin{cases} P_C\Big(\theta_{n-1} - \mu_n\dfrac{J(\theta_{n-1})}{\|J'(\theta_{n-1})\|^2}J'(\theta_{n-1})\Big), & \text{if } J'(\theta_{n-1}) \ne 0,\\[4pt] P_C(\theta_{n-1}), & \text{if } J'(\theta_{n-1}) = 0,\end{cases} \tag{26}$$
where we have assumed that $J_* = 0$. If this is not the case, a shift can accommodate the difference. Thus we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear $\epsilon$-insensitive loss function, for bounded noise. Assume that
$$\mathcal{L}_n(\theta) = \sum_{k=n-q+1}^{n}\frac{\omega_k d_{C_k}(\theta_{n-1})}{\sum_{j=n-q+1}^{n}\omega_j d_{C_j}(\theta_{n-1})}\,d_{C_k}(\theta).$$
Then derive the APSM algorithm of (8.39).
Solution: Let the loss function be
$$\mathcal{L}_n(\theta) = \sum_{k=n-q+1}^{n}\frac{\omega_k d_{C_k}(\theta_{n-1})}{\sum_{j=n-q+1}^{n}\omega_j d_{C_j}(\theta_{n-1})}\,d_{C_k}(\theta) = \sum_{k=n-q+1}^{n}\beta_k d_{C_k}(\theta).$$
Recalling (from Problem 24) that, for $\theta_{n-1} \notin C_k$,
$$d_{C_k}'(\theta_{n-1}) = \frac{\theta_{n-1} - P_{C_k}(\theta_{n-1})}{d_{C_k}(\theta_{n-1})},$$
or
$$\mathcal{L}_n'(\theta_{n-1}) = \frac{1}{L}\sum_{k=n-q+1}^{n}\omega_k\big(\theta_{n-1} - P_{C_k}(\theta_{n-1})\big), \quad \text{with } L = \sum_{k=n-q+1}^{n}\omega_k d_{C_k}(\theta_{n-1}).$$
Moreover,
$$\mathcal{L}_n(\theta_{n-1}) = \frac{1}{L}\sum_{k=n-q+1}^{n}\omega_k\|\theta_{n-1} - P_{C_k}(\theta_{n-1})\|^2.$$
Substituting into (26), the factors involving $L$ cancel and we obtain
$$\theta_n = P_C\Big(\theta_{n-1} + \mu_n' M_n\sum_{k=n-q+1}^{n}\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big)\Big), \quad \mu_n' := \mu_n,$$
with the extrapolation parameter
$$M_n = \frac{\sum_{k=n-q+1}^{n}\omega_k\|P_{C_k}(\theta_{n-1}) - \theta_{n-1}\|^2}{\big\|\sum_{k=n-q+1}^{n}\omega_k\big(P_{C_k}(\theta_{n-1}) - \theta_{n-1}\big)\big\|^2},$$
which is the APSM recursion of (8.39).
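A minimal Python sketch of the resulting recursion, under illustrative assumptions not taken from the text: the property sets are the hyperplanes $C_k = \{\theta : x_k^T\theta = y_k\}$ built from noiseless linear measurements, uniform weights $\omega_k = 1/q$, $\mu_n' = 1$, and $C = \mathbb{R}^l$ (so the outer projection is the identity):

import numpy as np

def P_hyp(theta, x, y):           # projection onto C_k = {θ : x^T θ = y}
    return theta + (y - x @ theta) / (x @ x) * x

rng = np.random.default_rng(2)
theta_true = rng.normal(size=5)
X = rng.normal(size=(200, 5))
y = X @ theta_true                # noiseless linear measurements

q = 8                             # number of most recent property sets
w = np.full(q, 1.0 / q)           # uniform weights ω_k
theta = np.zeros(5)
for n in range(q - 1, len(y)):
    projs = np.array([P_hyp(theta, X[k], y[k])
                      for k in range(n - q + 1, n + 1)])
    diff = (w[:, None] * (projs - theta)).sum(axis=0)
    den = diff @ diff
    num = (w * ((projs - theta) ** 2).sum(axis=1)).sum()
    M_n = num / den if den > 0 else 1.0     # extrapolation parameter
    theta = theta + M_n * diff              # APSM update, μ'_n = 1
print(np.linalg.norm(theta - theta_true))   # small residual error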
30. Derive the regret bound for the subgradient algorithm in (8.83).
Solution: From the text, we have that
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(h) \le g_n^T(\theta_{n-1} - h) \le \frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \frac{\mu_n}{2}\|g_n\|^2.$$
Summing up both sides results in
$$\sum_{n=1}^{N}\mathcal{L}_n(\theta_{n-1}) - \sum_{n=1}^{N}\mathcal{L}_n(h) \le \sum_{n=1}^{N}\frac{1}{2\mu_n}\big(\|\theta_{n-1} - h\|^2 - \|\theta_n - h\|^2\big) + \sum_{n=1}^{N}\frac{\mu_n}{2}\|g_n\|^2.$$
Writing the first sum on the right-hand side term by term,
$$\frac{1}{2\mu_1}\|\theta_0 - h\|^2 - \frac{1}{2\mu_1}\|\theta_1 - h\|^2 +$$
$$\frac{1}{2\mu_2}\|\theta_1 - h\|^2 - \frac{1}{2\mu_2}\|\theta_2 - h\|^2 +$$
$$\cdots$$
$$\frac{1}{2\mu_N}\|\theta_{N-1} - h\|^2 - \frac{1}{2\mu_N}\|\theta_N - h\|^2,$$
and grouping adjacent terms, we obtain
$$\sum_{n=1}^{N}\mathcal{L}_n(\theta_{n-1}) - \sum_{n=1}^{N}\mathcal{L}_n(h) \le \frac{1}{2\mu_1}\|\theta_0 - h\|^2 + \sum_{n=2}^{N}\Big(\frac{1}{2\mu_n} - \frac{1}{2\mu_{n-1}}\Big)\|\theta_{n-1} - h\|^2 + \frac{G^2}{2}\sum_{n=1}^{N}\mu_n,$$
where $\|g_n\| \le G$. Taking into account the bound $\|\theta_n - h\|^2 \le F^2$, the first two terms add up to at most $\frac{F^2}{2\mu_N}$, and selecting the step-size $\mu_n = \frac{F}{G\sqrt{n}}$, the right-hand side becomes of order $FG\sqrt{N}$, which is the regret bound in (8.83).
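The $\sqrt{N}$ growth can be observed empirically; the sketch below runs the online subgradient method on the illustrative losses $\mathcal{L}_n(\theta) = |\theta - a_n|$ with $\mu_n = 1/\sqrt{n}$ and reports the regret normalized by $\sqrt{N}$:

import numpy as np

rng = np.random.default_rng(3)
N = 10000
a = rng.normal(size=N)
h = np.median(a)                  # best fixed θ in hindsight for Σ |θ - a_n|

theta, regret = 0.0, 0.0
for n in range(1, N + 1):
    regret += abs(theta - a[n - 1]) - abs(h - a[n - 1])
    g = np.sign(theta - a[n - 1])        # subgradient of L_n at θ_{n-1}
    theta -= g / np.sqrt(n)              # μ_n = 1/sqrt(n)
print(regret / np.sqrt(N))        # stays bounded as N grows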
31. Show that a function $f(x)$ is $\sigma$-strongly convex if and only if the function $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.
Solution:
a) Assume that
$$f(x) - \frac{\sigma}{2}\|x\|^2$$
is convex. Then, by the definition of the subgradient at $x$, for $\tilde{g} \in \partial\big(f - \frac{\sigma}{2}\|\cdot\|^2\big)(x)$ and any $y$,
$$f(y) - \frac{\sigma}{2}\|y\|^2 \ge f(x) - \frac{\sigma}{2}\|x\|^2 + \tilde{g}^T(y - x).$$
Setting $g = \tilde{g} + \sigma x$ and using the identity $\frac{\sigma}{2}\|y\|^2 - \frac{\sigma}{2}\|x\|^2 - \sigma x^T(y - x) = \frac{\sigma}{2}\|y - x\|^2$, we get
$$f(y) \ge f(x) + g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2, \tag{32}$$
from which the strong convexity of $f(x)$ is deduced.
b) Assume that $f(x)$ is strongly convex. Then by its definition we have, for some $g$,
$$f(y) \ge f(x) + g^T(y - x) + \frac{\sigma}{2}\|y - x\|^2,$$
and running the same identity backwards with $\tilde{g} = g - \sigma x$,
$$f(y) - \frac{\sigma}{2}\|y\|^2 \ge f(x) - \frac{\sigma}{2}\|x\|^2 + \tilde{g}^T(y - x),$$
i.e., $f(x) - \frac{\sigma}{2}\|x\|^2$ is convex.
32. Show that if the loss function is $\sigma$-strongly convex, then if $\mu_n = \frac{1}{\sigma n}$, the regret bound for the subgradient algorithm becomes
$$\frac{1}{N}\sum_{n=1}^{N}\mathcal{L}_n(\theta_{n-1}) \le \frac{1}{N}\sum_{n=1}^{N}\mathcal{L}_n(\theta_*) + \frac{G^2(1 + \ln N)}{2\sigma N}. \tag{35}$$
Solution: Taking into account the strong convexity we have that,
$$\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*) \le g_n^T(\theta_{n-1} - \theta_*) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2, \tag{36}$$
and following similar arguments as for Problem 30, with $\mu_n = \frac{1}{\sigma n}$, we get
$$\sum_{n=1}^{N}\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \sum_{n=1}^{N}\Big[\frac{\sigma n}{2}\big(\|\theta_{n-1} - \theta_*\|^2 - \|\theta_n - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_{n-1} - \theta_*\|^2\Big] + \frac{G^2}{2}\sum_{n=1}^{N}\frac{1}{\sigma n}.$$
Writing the first sum term by term,
$$\frac{\sigma}{2}\big(\|\theta_0 - \theta_*\|^2 - \|\theta_1 - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_0 - \theta_*\|^2 +$$
$$\sigma\big(\|\theta_1 - \theta_*\|^2 - \|\theta_2 - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_1 - \theta_*\|^2 +$$
$$\cdots$$
$$\frac{N\sigma}{2}\big(\|\theta_{N-1} - \theta_*\|^2 - \|\theta_N - \theta_*\|^2\big) - \frac{\sigma}{2}\|\theta_{N-1} - \theta_*\|^2,$$
the coefficient of each $\|\theta_{n-1} - \theta_*\|^2$ equals $\frac{\sigma n}{2} - \frac{\sigma}{2} - \frac{\sigma(n-1)}{2} = 0$, and the only surviving term, $-\frac{\sigma N}{2}\|\theta_N - \theta_*\|^2$, is non-positive. Hence,
$$\sum_{n=1}^{N}\big(\mathcal{L}_n(\theta_{n-1}) - \mathcal{L}_n(\theta_*)\big) \le \frac{G^2}{2\sigma}\sum_{n=1}^{N}\frac{1}{n} \le \frac{G^2(1 + \ln N)}{2\sigma},$$
and dividing by $N$ yields (35).
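The logarithmic growth can also be observed empirically; the sketch below uses the illustrative $\sigma$-strongly convex losses $\mathcal{L}_n(\theta) = \frac{\sigma}{2}(\theta - a_n)^2$ with $\mu_n = 1/(\sigma n)$:

import numpy as np

rng = np.random.default_rng(4)
sigma, N = 1.0, 10000
a = rng.normal(size=N)
h = a.mean()                      # minimizer of Σ L_n in hindsight

theta, regret = 0.0, 0.0
for n in range(1, N + 1):
    regret += 0.5 * sigma * ((theta - a[n - 1]) ** 2 - (h - a[n - 1]) ** 2)
    g = sigma * (theta - a[n - 1])       # gradient of L_n at θ_{n-1}
    theta -= g / (sigma * n)             # μ_n = 1/(σ n)
print(regret / np.log(N))         # stays bounded as N grows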
33. Consider a batch algorithm that computes the minimum of the empirical loss function, $\theta_*(N)$, having a quadratic convergence rate, i.e.,
$$\ln\ln\frac{1}{\|\theta^{(i)} - \theta_*(N)\|^2} \sim i.$$
Show that an online algorithm, running for $n$ time instants so as to spend the same computational processing resources as the batch one, achieves for large values of $N$ better performance than the batch algorithm, i.e., [12],
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2.$$
Hint: Use the fact that
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{n}, \quad \text{and} \quad \|\theta_*(N) - \theta_*\|^2 \sim \frac{1}{N}.$$
Solution: Let $K$ be the number of operations per iteration for the online algorithm. This amounts to a total of $Kn$ operations. The batch algorithm, in order to make sense, should perform $O(\ln\ln N)$ iterations, so that $\|\theta^{(i)} - \theta_*(N)\|^2$ reaches the statistical accuracy $\sim 1/N$; since each batch iteration runs over all $N$ training points, this costs $O(KN\ln\ln N)$ operations in total. Spending the same resources, the online algorithm can run for $n \sim N\ln\ln N$ time instants, and the hint then gives
$$\|\theta_n - \theta_*\|^2 \sim \frac{1}{n} = \frac{1}{N\ln\ln N} \ll \frac{1}{N} \sim \|\theta_*(N) - \theta_*\|^2.$$
34. Show property (8.111) for the proximal operator.
Solution: Assume first that $p = \mathrm{Prox}_{\lambda f}(x)$. By definition,
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f(v) + \frac{1}{2\lambda}\|x - v\|^2, \quad \forall v \in \mathbb{R}^l.$$
Since the previous inequality holds true for any $v \in \mathbb{R}^l$, it also holds true for $p + \alpha(v - p)$, $\alpha \in (0, 1)$:
$$f(p) + \frac{1}{2\lambda}\|x - p\|^2 \le f\big(p + \alpha(v - p)\big) + \frac{1}{2\lambda}\|x - p\|^2 + \frac{\alpha^2}{2\lambda}\|v - p\|^2 - \frac{\alpha}{\lambda}\langle x - p, v - p\rangle.$$
By the convexity of $f$, $f(p + \alpha(v - p)) \le (1 - \alpha)f(p) + \alpha f(v)$. After re-arranging terms in the previous relation and dividing by $\alpha$,
$$\lambda f(p) \le \lambda f(v) + \frac{\alpha}{2}\|v - p\|^2 - \langle x - p, v - p\rangle, \quad \forall\alpha \in (0, 1).$$
Application of $\lim_{\alpha\to 0}$ on both sides of the previous inequality results in
$$\langle x - p, v - p\rangle \le \lambda\big(f(v) - f(p)\big), \quad \forall v \in \mathbb{R}^l.$$
Conversely, if the previous characterization holds, then for any $v$,
$$f(v) + \frac{1}{2\lambda}\|x - v\|^2 = f(v) + \frac{1}{2\lambda}\|x - p\|^2 + \frac{1}{2\lambda}\|v - p\|^2 - \frac{1}{\lambda}\langle x - p, v - p\rangle$$
$$\ge f(p) + \frac{1}{2\lambda}\|x - p\|^2 + \frac{1}{2\lambda}\|v - p\|^2 \ge f(p) + \frac{1}{2\lambda}\|x - p\|^2,$$
so that $p = \mathrm{Prox}_{\lambda f}(x)$, which establishes (8.111).
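The characterization is easy to test numerically. The sketch below uses the illustrative choice $f = \|\cdot\|_1$, whose proximal operator is the soft-thresholding rule, and checks the inequality $\langle x - p, v - p\rangle \le \lambda(f(v) - f(p))$ on random points:

import numpy as np

lam = 0.7
f = lambda v: np.abs(v).sum()     # f = ||.||_1
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

rng = np.random.default_rng(5)
x = rng.normal(size=4)
p = prox(x)                       # p = Prox_{λf}(x)
ok = all((x - p) @ (v - p) <= lam * (f(v) - f(p)) + 1e-10
         for v in rng.normal(size=(1000, 4)) * 3.0)
print(ok)                         # True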
35. Show property (8.112) for the proximal operator.
Solution: For compact notations, define $p_j := \mathrm{Prox}_{\lambda f}(x_j)$, $j = 1, 2$. Then, by the characterization (8.111),
$$\langle x_1 - p_1, p_2 - p_1\rangle \le \lambda\big(f(p_2) - f(p_1)\big), \quad \langle x_2 - p_2, p_1 - p_2\rangle \le \lambda\big(f(p_1) - f(p_2)\big).$$
Adding the two inequalities,
$$\langle x_1 - x_2 - (p_1 - p_2), p_2 - p_1\rangle \le 0,$$
or
$$\|p_1 - p_2\|^2 \le \langle p_1 - p_2, x_1 - x_2\rangle \le \|p_1 - p_2\|\,\|x_1 - x_2\|,$$
where the last step is Cauchy-Schwarz; hence
$$\|\mathrm{Prox}_{\lambda f}(x_1) - \mathrm{Prox}_{\lambda f}(x_2)\| \le \|x_1 - x_2\|,$$
i.e., the proximal operator is (firmly) nonexpansive.
36. Prove that the recursion in (8.118) converges to a minimizer of f.
Solution: Define the mapping $R := 2\,\mathrm{Prox}_{\lambda f} - I$. Then, (8.118) takes the following form:
$$x_{k+1} = x_k + \frac{\mu_k}{2}\big(R(x_k) - x_k\big).$$
Notice that $R$ is non-expansive: $\forall x_1, x_2 \in \mathbb{R}^l$,
$$\|R(x_1) - R(x_2)\|^2 = \big\|2\big(\mathrm{Prox}_{\lambda f}(x_1) - \mathrm{Prox}_{\lambda f}(x_2)\big) - (x_1 - x_2)\big\|^2$$
$$= 4\|\mathrm{Prox}_{\lambda f}(x_1) - \mathrm{Prox}_{\lambda f}(x_2)\|^2 + \|x_1 - x_2\|^2 - 4\langle \mathrm{Prox}_{\lambda f}(x_1) - \mathrm{Prox}_{\lambda f}(x_2), x_1 - x_2\rangle$$
$$\le \|x_1 - x_2\|^2,$$
where the last step follows from the firm nonexpansiveness established in Problem 35.
In turn, let $z$ be a fixed point of $R$; then
$$\|x_{k+1} - z\|^2 = \Big\|\Big(1 - \frac{\mu_k}{2}\Big)(x_k - z) + \frac{\mu_k}{2}\big(R(x_k) - z\big)\Big\|^2$$
$$= \Big(1 - \frac{\mu_k}{2}\Big)\|x_k - z\|^2 + \frac{\mu_k}{2}\|R(x_k) - z\|^2 - \frac{\mu_k}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\|^2,$$
where the convexity identity $\|(1 - t)a + tb\|^2 = (1 - t)\|a\|^2 + t\|b\|^2 - t(1 - t)\|a - b\|^2$ was used.
Hence, using $\|R(x_k) - z\| = \|R(x_k) - R(z)\| \le \|x_k - z\|$ and $\|R(x_k) - x_k\| = 2\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|$, we have, $\forall k$,
$$\|x_{k+1} - z\|^2 \le \|x_k - z\|^2 - \mu_k(2 - \mu_k)\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|^2.$$
Given any non-negative integer $k_0$, the previous telescoping inequality is utilized for all $k \in \{0, \ldots, k_0\}$ to produce
$$\sum_{k=0}^{k_0}\mu_k(2 - \mu_k)\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_0 - z\|^2 - \|x_{k_0+1} - z\|^2 \le \|x_0 - z\|^2.$$
Since the previous relation holds for any $k_0$, applying $\lim_{k_0\to\infty}$ on both sides of the inequality results into
$$\sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|^2 \le \|x_0 - z\|^2 < +\infty. \tag{39}$$
Moreover, notice that
$$\|\mathrm{Prox}_{\lambda f}(x_{k+1}) - x_{k+1}\| = \frac{1}{2}\|R(x_{k+1}) - x_{k+1}\|$$
$$= \frac{1}{2}\Big\|R(x_{k+1}) - R(x_k) + \Big(1 - \frac{\mu_k}{2}\Big)\big(R(x_k) - x_k\big)\Big\|$$
$$\le \frac{1}{2}\|x_{k+1} - x_k\| + \frac{1}{2}\Big(1 - \frac{\mu_k}{2}\Big)\|R(x_k) - x_k\| = \frac{1}{2}\|R(x_k) - x_k\| = \|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|,$$
where $\|x_{k+1} - x_k\| = \frac{\mu_k}{2}\|R(x_k) - x_k\|$ was used.
Since $\big(\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|\big)_{k\in\mathbb{N}}$ is monotonically non-increasing and bounded from below, it converges. Necessarily, $\lim_{k\to\infty}\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|^2 = 0$. Otherwise, there exists an $\epsilon > 0$ and a subsequence $(k_m)_{m\in\mathbb{N}}$ such that $\|\mathrm{Prox}_{\lambda f}(x_{k_m}) - x_{k_m}\|^2 \ge \epsilon$, $\forall m$; by monotonicity, $\|\mathrm{Prox}_{\lambda f}(x_i) - x_i\|^2 \ge \epsilon$ for all $i \le k_m$. This together with the fact that $\lim_{m\to\infty}\sum_{i=0}^{k_m}\mu_i(2 - \mu_i) = +\infty$, and (39) imply that
$$+\infty > \sum_{k=0}^{+\infty}\mu_k(2 - \mu_k)\|\mathrm{Prox}_{\lambda f}(x_k) - x_k\|^2 \ge \epsilon\lim_{m\to\infty}\sum_{i=0}^{k_m}\mu_i(2 - \mu_i) = +\infty,$$
which is a contradiction. Moreover, by the monotonicity of $(\|x_k - z\|)_{k\in\mathbb{N}}$, the sequence $(x_k)_{k\in\mathbb{N}}$ is bounded, so it possesses a cluster point $x_*$, i.e., a subsequence $x_{k_m}\to x_*$. Then,
$$\|x_* - \mathrm{Prox}_{\lambda f}(x_*)\|^2 \le \|x_* - x_{k_m}\|^2 + \|x_{k_m} - \mathrm{Prox}_{\lambda f}(x_{k_m})\|^2 + \|x_{k_m} - x_*\|^2$$
$$+ 2\langle x_{k_m} - \mathrm{Prox}_{\lambda f}(x_{k_m}), \mathrm{Prox}_{\lambda f}(x_{k_m}) - \mathrm{Prox}_{\lambda f}(x_*)\rangle - 2\|x_* - x_{k_m}\|^2 + 2\langle x_* - x_{k_m}, x_* - \mathrm{Prox}_{\lambda f}(x_*)\rangle$$
$$\le \|x_{k_m} - \mathrm{Prox}_{\lambda f}(x_{k_m})\|^2 + 2\|x_{k_m} - \mathrm{Prox}_{\lambda f}(x_{k_m})\|\,\|\mathrm{Prox}_{\lambda f}(x_{k_m}) - \mathrm{Prox}_{\lambda f}(x_*)\| + 2\|x_* - x_{k_m}\|\,\|x_* - \mathrm{Prox}_{\lambda f}(x_*)\|,$$
where the nonexpansiveness of $\mathrm{Prox}_{\lambda f}$ and Cauchy-Schwarz were used. Letting $m\to\infty$, the right-hand side vanishes; hence $x_* = \mathrm{Prox}_{\lambda f}(x_*)$, i.e., every cluster point is a minimizer of $f$.
It remains to show that the sequence has a unique cluster point. To this end, assume two cluster points $x, y$ of $(x_k)_{k\in\mathbb{N}}$. This means that there exist subsequences $(x_{k_m})_{m\in\mathbb{N}}$ and $(x_{l_m})_{m\in\mathbb{N}}$ which converge to $x$ and $y$, respectively. Moreover, notice that
$$\langle x_k, x - y\rangle = \frac{1}{2}\big(\|x_k - y\|^2 - \|x_k - x\|^2 + \|x\|^2 - \|y\|^2\big).$$
Since $x$ and $y$ are fixed points, the sequences $(\|x_k - x\|)_{k\in\mathbb{N}}$ and $(\|x_k - y\|)_{k\in\mathbb{N}}$ are non-increasing, hence convergent, and so $\lim_{k\to\infty}\langle x_k, x - y\rangle$ exists. Evaluating the limit along the two subsequences gives $\langle x, x - y\rangle = \langle y, x - y\rangle$, i.e., $\|x - y\|^2 = 0$, or $x = y$. Hence, $(x_k)_{k\in\mathbb{N}}$ converges to a minimizer of $f$.
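A minimal numerical sketch of the iteration, for the illustrative choice $f = \|\cdot\|_1$ (so that $\mathrm{Prox}_{\lambda f}$ is soft thresholding) and the constant step $\mu_k = 1 \in (0, 2)$:

import numpy as np

lam = 0.5
prox = lambda x: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
R = lambda x: 2.0 * prox(x) - x   # R = 2 Prox_{λf} - I

mu_k = 1.0                        # constant step μ_k in (0, 2)
x = np.array([5.0, -2.0, 0.3])
for _ in range(200):
    x = x + (mu_k / 2.0) * (R(x) - x)
print(x)                          # converges to the minimizer of ||.||_1, i.e., 0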
37. Derive (8.122) from (8.121).
Solution: Use the matrix inversion lemma
$$\big(A + BD^{-1}C\big)^{-1} = A^{-1} - A^{-1}B\big(D + CA^{-1}B\big)^{-1}CA^{-1}.$$
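The lemma itself is easy to verify numerically; the dimensions below are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)   # kept well conditioned
B = rng.normal(size=(5, 2))
C = rng.normal(size=(2, 5))
D = rng.normal(size=(2, 2)) + 5.0 * np.eye(2)

lhs = np.linalg.inv(A + B @ np.linalg.inv(D) @ C)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(D + C @ Ai @ B) @ C @ Ai
print(np.allclose(lhs, rhs))      # True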