Machine Learning (Sergios Theodoridis), Chapter 8: Solved Problems


or
‖x − T_2T_1(x)‖² ≤ (2µ_1/(2 − µ_1)) ( ‖x − y‖² − ‖T_1(x) − y‖² ) + (2µ_2/(2 − µ_2)) ( ‖T_1(x) − y‖² − ‖T_2T_1(x) − y‖² ).
23. Show the fundamental POCS theorem for the case of closed subspaces in
a Hilbert space, H.
Solution: Fact 1: The relaxed projection operator is self-adjoint, i.e.,
⟨x, T_{C_i}(y)⟩ = ⟨T_{C_i}(x), y⟩,  ∀x, y ∈ H.
⟨x, T(y)⟩ = ⟨x, T_{C_K} · · · T_{C_1}(y)⟩
= ⟨T_{C_K}(x), T_{C_{K−1}} · · · T_{C_1}(y)⟩
= . . .
= ⟨T_{C_1} · · · T_{C_K}(x), y⟩
= ⟨T*(x), y⟩,
Hence,
(I − T)(x) = x − T(x) = 0,
or
T(x) = x,
and since T and T* have the same fixed point set (the proof is trivial),
S ⊆ C.
We are now ready to establish strong convergence.
Repeated application of T on any x ∈ H leads to T^n(x) = (T T · · · T)(x) (n times).
We know that for every x ∈ H there is a unique decomposition in terms of two orthogonally complementary (closed) subspaces, i.e.,
x = y + z,  y ∈ C and z ∈ C^⊥,  ∀x ∈ H.
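To make the POCS mechanism concrete, here is a minimal numerical sketch (not from the text; the two subspaces and all parameter values are illustrative assumptions): it composes relaxed projections onto two closed subspaces of R^3 (two hyperplanes through the origin) and shows that the iterates land in their intersection.

```python
import numpy as np

# Hypothetical illustration: POCS with relaxed projections onto two closed
# subspaces of R^3, here the hyperplanes a1^T x = 0 and a2^T x = 0.

a1 = np.array([1.0, 0.0, 1.0])
a2 = np.array([0.0, 1.0, -1.0])

def proj_hyperplane(x, a):
    """Metric projection onto the subspace {x : a^T x = 0}."""
    return x - (a @ x) / (a @ a) * a

def relaxed(x, a, mu):
    """Relaxed projection T = I + mu*(P_C - I), with mu in (0, 2)."""
    return x + mu * (proj_hyperplane(x, a) - x)

x = np.array([3.0, -2.0, 5.0])
for _ in range(200):
    x = relaxed(relaxed(x, a1, 1.5), a2, 0.8)   # T = T_{C2} T_{C1}

print(x, a1 @ x, a2 @ x)   # both inner products ~ 0: x lies in C1 ∩ C2
```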
24. Derive the subdifferential of the metric distance function d_C(x), where C
is a closed convex set, C ⊆ R^l, and x ∈ R^l.
Solution: By definition we have
∂d_C(x) = { g : g^T(y − x) + d_C(x) ≤ d_C(y),  ∀y ∈ R^l }.
Thus, let g be a subgradient; then, since the distance function is 1-Lipschitz, d_C(y) − d_C(x) ≤ ‖y − x‖, and therefore
g^T(y − x) ≤ ‖y − x‖,  ∀y ∈ R^l.
Since this is true ∀y, let
y : y − x = g ⇒ ‖g‖² ≤ ‖g‖ ⇒ ‖g‖ ≤ 1,
or
g ∈ B[0, 1].
a) Let x ∉ C and g any subgradient. For any y ∈ R^l,
g^T( y + P_C(x) − x ) ≤ d_C( y + P_C(x) ) − d_C(x).
Setting y = 0, and since d_C(P_C(x)) = 0,
g^T( P_C(x) − x ) ≤ −‖x − P_C(x)‖,
or
g^T( x − P_C(x) ) ≥ ‖x − P_C(x)‖.
However,
‖g‖ ≤ 1,
and recalling the Cauchy-Schwarz inequality, we obtain
and for any y ∈ C,
g^T(y − x) ≤ 0,  ‖g‖ ≤ 1.   (20)
If, in addition, x is an interior point of C, then for every z ∈ R^l there exists ε > 0 such that x + ε(z − x) ∈ C, and condition (20) gives
g^T( x + ε(z − x) − x ) = ε g^T(z − x) ≤ 0.
Since this holds for every z ∈ R^l, necessarily
g = 0.
This completes the proof.
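As a quick numerical sanity check of this result (a hypothetical example, not part of the text), take C to be the closed unit ball: for an exterior point x, the subgradient g = (x − P_C(x))/d_C(x) implied by the argument above has unit norm and satisfies the subgradient inequality at randomly drawn points y.

```python
import numpy as np

# Hypothetical check: C is the closed unit ball, x is an exterior point, and
# g = (x - P_C(x)) / d_C(x) should satisfy d_C(y) >= d_C(x) + g^T (y - x).

rng = np.random.default_rng(0)

def proj_ball(z):
    n = np.linalg.norm(z)
    return z if n <= 1.0 else z / n

def d_C(z):
    return np.linalg.norm(z - proj_ball(z))

x = np.array([2.0, -1.0, 0.5])                  # exterior point, d_C(x) > 0
g = (x - proj_ball(x)) / d_C(x)                 # candidate subgradient, ||g|| = 1

worst = min(d_C(y) - d_C(x) - g @ (y - x)       # should never be negative
            for y in rng.normal(size=(1000, 3)) * 3)
print(np.linalg.norm(g), worst)                 # ~1.0 and >= 0 (up to rounding)
```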
25. Derive the bound in (8.55).
Solution: Subtracting θ_* from both sides of the recursion, squaring, and taking into account the definition of the subgradient, it is readily shown that
‖θ^(i) − θ_*‖² ≤ ‖θ^(0) − θ_*‖² − 2 Σ_{k=1}^{i} µ_k ( J(θ^(k−1)) − J(θ_*) ) + Σ_{k=1}^{i} µ_k² ‖g_k‖²,
from which, using ‖g_k‖ ≤ G and the fact that the left-hand side is nonnegative,
min_{1≤k≤i} J(θ^(k−1)) − J(θ_*) ≤ ( ‖θ^(0) − θ_*‖² + G² Σ_{k=1}^{i} µ_k² ) / ( 2 Σ_{k=1}^{i} µ_k ).
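The following sketch (hypothetical, not from the text) runs the plain subgradient iteration on J(θ) = ‖θ‖_1, whose minimum value is zero, and checks numerically that the best objective value attained so far stays below a bound of the form reconstructed above.

```python
import numpy as np

# Hypothetical illustration: subgradient iteration on J(theta) = ||theta||_1
# (minimizer theta_* = 0, J(theta_*) = 0), checked against a bound of the form
# (||theta_0 - theta_*||^2 + G^2 * sum mu_k^2) / (2 * sum mu_k).

theta = np.array([4.0, -3.0, 2.0])
theta0 = theta.copy()
G = np.sqrt(theta.size)            # subgradients of ||.||_1 satisfy ||g|| <= sqrt(l)
best, mus = np.inf, []

for k in range(1, 501):
    best = min(best, np.abs(theta).sum())        # J(theta^{(k-1)})
    g = np.sign(theta)                           # a subgradient at theta^{(k-1)}
    mu = 1.0 / np.sqrt(k)                        # diminishing step size
    theta = theta - mu * g
    mus.append(mu)
    bound = (theta0 @ theta0 + G**2 * np.sum(np.square(mus))) / (2.0 * np.sum(mus))
    assert best <= bound + 1e-12

print(best, bound)
```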
26. Show that if a function is γ-Lipschitz, then any of its subgradients is
bounded.
Solution: By the definition of the subgradient we have that, ∀u, v,
g^T(v − u) ≤ f(v) − f(u) ≤ γ ‖v − u‖,
where g is any subgradient of f at u and the second inequality is the γ-Lipschitz property. Choosing v = u + g yields ‖g‖² ≤ γ‖g‖, i.e., ‖g‖ ≤ γ.
27. Show the convergence of the generic projected subgradient algorithm in
(8.61).
Solution: Let us break the iteration into two steps,
z^(i) = θ^(i−1) − µ_i J′(θ^(i−1)),   (22)
θ^(i) = P_C( z^(i) ),   (23)
where J′(θ^(i−1)) denotes a subgradient of J at θ^(i−1).
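A minimal sketch of the two-step iteration (22)-(23) follows (hypothetical setup, not from the text): the objective, the constraint set (the unit l2-ball) and the step sizes are illustrative choices only.

```python
import numpy as np

# Hypothetical sketch of the projected subgradient iteration (22)-(23):
# subgradient step, then projection onto a convex set C (the unit ball).
# Example objective: J(theta) = ||theta - a||_1 with a outside C, so the
# constrained minimizer lies on the boundary of C.

a = np.array([2.0, 2.0])

def subgrad(theta):
    return np.sign(theta - a)          # a subgradient of ||theta - a||_1

def proj_C(z):
    n = np.linalg.norm(z)
    return z if n <= 1.0 else z / n    # projection onto the unit ball

theta = np.zeros(2)
for i in range(1, 2001):
    z = theta - (0.5 / np.sqrt(i)) * subgrad(theta)   # step (22)
    theta = proj_C(z)                                 # step (23)

print(theta)   # close to (1, 1)/sqrt(2), the minimizer of J over C
```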
28. Derive equation (8.100).
Solution: By the definition
J_n(θ) = (1/n) Σ_{k=1}^{n} L(y_k, x_k, θ).
29. Consider the online version of PDMb in (8.64), i.e.,
θ_n = P_C( θ_{n−1} − µ_n [ J(θ_{n−1}) / ‖J′(θ_{n−1})‖² ] J′(θ_{n−1}) ),   if J′(θ_{n−1}) ≠ 0,
θ_n = P_C(θ_{n−1}),   if J′(θ_{n−1}) = 0,   (26)
where we have assumed that J_* = 0. If this is not the case, a shift can accommodate the difference. Thus, we assume that we know the minimum. For example, this is the case for a number of tasks, such as the hinge loss function, assuming linearly separable classes, or the linear ε-insensitive loss function, for bounded noise. Assume that
L_n(θ) = Σ_{k=n−q+1}^{n} [ ω_k d_{C_k}(θ_{n−1}) / ( Σ_{j=n−q+1}^{n} ω_j d_{C_j}(θ_{n−1}) ) ] d_{C_k}(θ).
Then derive the APSM algorithm of (8.39).
Solution: Let the loss function be
L_n(θ) = Σ_{k=n−q+1}^{n} [ ω_k d_{C_k}(θ_{n−1}) / ( Σ_{j=n−q+1}^{n} ω_j d_{C_j}(θ_{n−1}) ) ] d_{C_k}(θ) = Σ_{k=n−q+1}^{n} β_k d_{C_k}(θ),
where β_k := ω_k d_{C_k}(θ_{n−1}) / Σ_{j=n−q+1}^{n} ω_j d_{C_j}(θ_{n−1}). From Problem 24, a subgradient of d_{C_k} at θ_{n−1} ∉ C_k is ( θ_{n−1} − P_{C_k}(θ_{n−1}) ) / d_{C_k}(θ_{n−1}); hence, a subgradient of L_n at θ_{n−1} is
L_n′(θ_{n−1}) = Σ_{k=n−q+1}^{n} β_k ( θ_{n−1} − P_{C_k}(θ_{n−1}) ) / d_{C_k}(θ_{n−1}),
or
L_n′(θ_{n−1}) = (1/L) Σ_{k=n−q+1}^{n} ω_k ( θ_{n−1} − P_{C_k}(θ_{n−1}) ),   with   L = Σ_{k=n−q+1}^{n} ω_k d_{C_k}(θ_{n−1}).
Noting also that L_n(θ_{n−1}) = (1/L) Σ_{k=n−q+1}^{n} ω_k d²_{C_k}(θ_{n−1}), substitution into (26), with J replaced by L_n, yields
θ_n = θ_{n−1} − µ_n [ L_n(θ_{n−1}) / ‖L_n′(θ_{n−1})‖² ] L_n′(θ_{n−1})
= θ_{n−1} + µ_n M_n Σ_{k=n−q+1}^{n} ω_k ( P_{C_k}(θ_{n−1}) − θ_{n−1} ),
with
M_n = Σ_{k=n−q+1}^{n} ω_k d²_{C_k}(θ_{n−1}) / ‖ Σ_{k=n−q+1}^{n} ω_k ( P_{C_k}(θ_{n−1}) − θ_{n−1} ) ‖²,
which is the APSM recursion of (8.39).
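The recursion just derived can be sketched in code. The following illustration is hypothetical (not from the text): the hyperslab sets C_k = {θ : |y_k − x_k^T θ| ≤ ε}, the uniform weights, and all parameter values are ad hoc choices used only to show the projection, combination, and extrapolation structure.

```python
import numpy as np

# Hypothetical APSM-style sketch: at each time n, project theta_{n-1} onto the
# q most recent hyperslabs C_k = {theta : |y_k - x_k^T theta| <= eps}, combine
# the projections with weights w_k, and extrapolate with M_n.

rng = np.random.default_rng(1)
theta_true = np.array([1.0, -2.0, 0.5])
q, eps, mu = 8, 0.1, 1.0

def proj_slab(theta, x, y):
    e = y - x @ theta
    if abs(e) <= eps:
        return theta
    return theta + (e - np.sign(e) * eps) / (x @ x) * x

theta = np.zeros(3)
X, Y = [], []
for n in range(500):
    x = rng.normal(size=3)
    y = theta_true @ x + rng.uniform(-eps, eps)     # bounded noise
    X.append(x)
    Y.append(y)
    xs, ys = X[-q:], Y[-q:]
    w = np.full(len(xs), 1.0 / len(xs))
    P = np.array([proj_slab(theta, xk, yk) for xk, yk in zip(xs, ys)])
    v = (w[:, None] * (P - theta)).sum(axis=0)
    num = (w * ((P - theta) ** 2).sum(axis=1)).sum()
    M = num / (v @ v) if v @ v > 0 else 1.0         # extrapolation parameter M_n
    theta = theta + mu * M * v

print(theta)   # approaches theta_true
```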
30. Derive the regret bound for the subgradient algorithm in (8.83).
Solution: From the text, we have that
L_n(θ_{n−1}) − L_n(h) ≤ g_n^T ( θ_{n−1} − h ),
where g_n is a subgradient of L_n at θ_{n−1}. Summing up both sides results in
Σ_{n=1}^{N} L_n(θ_{n−1}) − Σ_{n=1}^{N} L_n(h)
≤ Σ_{n=1}^{N} (1/(2µ_n)) ( ‖θ_{n−1} − h‖² − ‖θ_n − h‖² ) + Σ_{n=1}^{N} (µ_n/2) ‖g_n‖²
= (1/(2µ_1)) ( ‖θ_0 − h‖² − ‖θ_1 − h‖² )
+ (1/(2µ_2)) ( ‖θ_1 − h‖² − ‖θ_2 − h‖² )
+ . . . . . . . . .
+ (1/(2µ_N)) ( ‖θ_{N−1} − h‖² − ‖θ_N − h‖² ) + Σ_{n=1}^{N} (µ_n/2) ‖g_n‖²
≤ (1/(2µ_1)) ‖θ_0 − h‖² + Σ_{n=2}^{N} ( 1/(2µ_n) − 1/(2µ_{n−1}) ) ‖θ_{n−1} − h‖² + (G²/2) Σ_{n=1}^{N} µ_n.
Taking into account the bound ‖θ_n − h‖² ≤ F², and selecting the step-size
31. Show that a function f(x) is σ-strongly convex if and only if the function
f(x) − (σ/2)‖x‖² is convex.
Solution:
a) Assume that
h(x) := f(x) − (σ/2)‖x‖²
is convex. Then, by the definition of the subgradient of h at x, h(y) ≥ h(x) + g^T(y − x), ∀y, and using ‖y‖² − ‖x‖² = ‖y − x‖² + 2x^T(y − x) we obtain
f(y) ≥ f(x) + ( g + σx )^T (y − x) + (σ/2)‖y − x‖²,   (32)
from which the strong convexity of f(x) is deduced.
b) Assume that f(x) is strongly convex. Then, by its definition, f(y) ≥ f(x) + g′^T(y − x) + (σ/2)‖y − x‖², ∀y, where g′ is a subgradient of f at x; reversing the previous algebraic steps shows that f(x) − (σ/2)‖x‖² is convex.
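A small numerical illustration (hypothetical, not from the text): for the σ-strongly convex function f(x) = (σ/2)‖x‖² + ‖x‖_1, the strong-convexity inequality, as in (32), is verified at randomly drawn pairs of points.

```python
import numpy as np

# Hypothetical check: f(x) = (sigma/2)||x||^2 + ||x||_1 is sigma-strongly convex,
# so f(y) >= f(x) + g^T (y - x) + (sigma/2)||y - x||^2 with g = sigma*x + sign(x).

rng = np.random.default_rng(2)
sigma = 0.7

def f(x):
    return 0.5 * sigma * x @ x + np.abs(x).sum()

worst = np.inf
for _ in range(10000):
    x, y = rng.normal(size=(2, 4))
    g = sigma * x + np.sign(x)                  # a subgradient of f at x
    gap = f(y) - f(x) - g @ (y - x) - 0.5 * sigma * np.sum((y - x) ** 2)
    worst = min(worst, gap)

print(worst)   # >= 0 (up to rounding): the strong-convexity inequality holds
```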
32. Show that if the loss function is σ-strongly convex, then if µ_n = 1/(σn), the regret bound for the subgradient algorithm becomes
(1/N) Σ_{n=1}^{N} L_n(θ_{n−1}) ≤ (1/N) Σ_{n=1}^{N} L_n(θ_*) + G²(1 + ln N) / (2σN).   (35)
Solution: Taking into account the strong convexity, we have that
L_n(θ_{n−1}) − L_n(θ_*) ≤ g_n^T ( θ_{n−1} − θ_* ) − (σ/2)‖θ_{n−1} − θ_*‖²,   (36)
and following similar arguments as for Problem 30, we get
Σ_{n=1}^{N} L_n(θ_{n−1}) − Σ_{n=1}^{N} L_n(θ_*)
≤ (σ/2) ( ‖θ_0 − θ_*‖² − ‖θ_1 − θ_*‖² ) − (σ/2)‖θ_0 − θ_*‖²
+ (2σ/2) ( ‖θ_1 − θ_*‖² − ‖θ_2 − θ_*‖² ) − (σ/2)‖θ_1 − θ_*‖²
+ . . . . . . . . .
+ (Nσ/2) ( ‖θ_{N−1} − θ_*‖² − ‖θ_N − θ_*‖² ) − (σ/2)‖θ_{N−1} − θ_*‖²
+ Σ_{n=1}^{N} (1/(2σn)) ‖g_n‖².
All terms involving ‖θ_n − θ_*‖², n = 0, 1, . . . , N − 1, cancel out, and the term −(Nσ/2)‖θ_N − θ_*‖² is nonpositive, so that
Σ_{n=1}^{N} L_n(θ_{n−1}) − Σ_{n=1}^{N} L_n(θ_*) ≤ (G²/(2σ)) Σ_{n=1}^{N} (1/n) ≤ (G²/(2σ)) (1 + ln N).
Dividing both sides by N yields (35).
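A hypothetical simulation (not from the text) of the step-size rule µ_n = 1/(σn) on a stream of σ-strongly convex losses; the quadratic losses and all constants are illustrative only. The printed average regret is small and decays on the order of ln N / N, in line with (35).

```python
import numpy as np

# Hypothetical illustration: online subgradient steps with mu_n = 1/(sigma*n)
# on the sigma-strongly convex losses L_n(theta) = (sigma/2)*(theta - a_n)^2.

rng = np.random.default_rng(3)
sigma, N = 1.0, 5000
a = rng.normal(size=N)                    # stream of targets
theta, cum_loss = 0.0, 0.0

for n in range(1, N + 1):
    cum_loss += 0.5 * sigma * (theta - a[n - 1]) ** 2
    g = sigma * (theta - a[n - 1])        # gradient of L_n at theta_{n-1}
    theta -= g / (sigma * n)              # mu_n = 1/(sigma*n)

best = a.mean()                           # minimizer of sum_n L_n
cum_best = 0.5 * sigma * np.sum((best - a) ** 2)
print((cum_loss - cum_best) / N)          # average regret, roughly O(log N / N)
```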
33. Consider a batch algorithm that computes the minimum of the empirical loss function, θ^(N), having a quadratic convergence rate, i.e.,
ln ln ( 1 / ‖θ^(i) − θ^(N)‖² ) ∼ i.
Show that an online algorithm, running for n time instants so as to spend the same computational processing resources as the batch one, achieves for large values of N better performance than the batch algorithm, i.e., [12],
‖θ_n − θ_*‖² ∼ 1 / (N ln ln N) ≪ 1/N ∼ ‖θ^(N) − θ_*‖².
Hint: Use the fact that
‖θ_n − θ_*‖² ∼ 1/n,  and  ‖θ^(N) − θ_*‖² ∼ 1/N.
Solution: Let K be the number of operations per iteration for the online algorithm. This amounts to a total of Kn operations. The batch algorithm, in order to make sense, should perform O(ln ln N) operations,
34. Show property (8.111) for the proximal operator.
Solution: Assume first that p = Prox_λf(x). By definition,
f(p) + (1/(2λ))‖x − p‖² ≤ f(v) + (1/(2λ))‖x − v‖²,  ∀v ∈ R^l.
Since the previous inequality holds true for any v ∈ R^l, it also holds true for v_α := p + α(v − p), α ∈ (0, 1). Using the convexity of f,
f(p) + (1/(2λ))‖x − p‖² ≤ (1 − α) f(p) + α f(v) + (1/(2λ)) ( ‖x − p‖² + α²‖v − p‖² − 2α ⟨x − p, v − p⟩ ).
After re-arranging terms in the previous relation,
λ f(p) ≤ λ f(v) + (α/2)‖v − p‖² − ⟨x − p, v − p⟩,  ∀α ∈ (0, 1).
Application of lim α→0 on both sides of the previous inequality results in
⟨x − p, v − p⟩ ≤ λ ( f(v) − f(p) ),  ∀v ∈ R^l.
Conversely, if the previous inequality holds for all v ∈ R^l, then
f(v) + (1/(2λ))‖x − v‖² = f(v) + (1/(2λ))‖x − p‖² + (1/(2λ))‖v − p‖² − (1/λ)⟨x − p, v − p⟩
≥ f(p) + (1/(2λ))‖x − p‖² + (1/(2λ))‖v − p‖²
≥ f(p) + (1/(2λ))‖x − p‖²,  ∀v ∈ R^l,
so that p = Prox_λf(x).
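To illustrate the characterization just derived, here is a hypothetical sketch (not from the text): for f = ‖·‖_1 the proximal operator Prox_λf is componentwise soft thresholding, and the inequality ⟨x − p, v − p⟩ ≤ λ(f(v) − f(p)) is checked at random points v.

```python
import numpy as np

# Hypothetical check: the prox of lam*||.||_1 is soft thresholding; verify the
# variational characterization <x - p, v - p> <= lam*(f(v) - f(p)) for all v.

rng = np.random.default_rng(4)
lam = 0.5

def f(v):
    return np.abs(v).sum()

def prox_l1(x, lam):
    """Prox of lam*||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = rng.normal(size=5)
p = prox_l1(x, lam)

worst = min(lam * (f(v) - f(p)) - (x - p) @ (v - p)
            for v in rng.normal(size=(2000, 5)) * 2)
print(worst)   # >= 0 (up to rounding), as the characterization requires
```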
35. Show property (8.112) for the proximal operator.
Solution: For compact notation, define p_j := Prox_λf(x_j), j = 1, 2. Then,
36. Prove that the recursion in (8.118) converges to a minimizer of f.
Solution: Define the mapping R := 2 Prox_λf − I. Then, (8.118) takes the following form:
x_{k+1} = x_k + (µ_k/2) ( R(x_k) − x_k ).
Notice that R is non-expansive: ∀x_1, x_2 ∈ R^l,
‖R(x_1) − R(x_2)‖² = ‖ 2 ( Prox_λf(x_1) − Prox_λf(x_2) ) − (x_1 − x_2) ‖²
= 4 ‖Prox_λf(x_1) − Prox_λf(x_2)‖² + ‖x_1 − x_2‖² − 4 ⟨Prox_λf(x_1) − Prox_λf(x_2), x_1 − x_2⟩
≤ ‖x_1 − x_2‖²,
where the last inequality follows from the firm non-expansiveness of the proximal operator.
In turn, let z be a fixed point of R; then
‖x_{k+1} − z‖² = ‖ (1 − µ_k/2)(x_k − z) + (µ_k/2)( R(x_k) − z ) ‖²
= (1 − µ_k/2) ‖x_k − z‖² + (µ_k/2) ‖R(x_k) − z‖² − (µ_k/2)(1 − µ_k/2) ‖R(x_k) − x_k‖².
Hence, since ‖R(x_k) − z‖ = ‖R(x_k) − R(z)‖ ≤ ‖x_k − z‖ and R(x_k) − x_k = 2 ( Prox_λf(x_k) − x_k ), ∀k,
‖x_{k+1} − z‖² ≤ ‖x_k − z‖² − µ_k (2 − µ_k) ‖Prox_λf(x_k) − x_k‖².
Given any non-negative integer k_0, the previous telescoping inequality is utilized for all k ∈ {0, . . . , k_0} to produce
Σ_{k=0}^{k_0} µ_k (2 − µ_k) ‖Prox_λf(x_k) − x_k‖² ≤ ‖x_0 − z‖² − ‖x_{k_0+1} − z‖² ≤ ‖x_0 − z‖².
Since the previous relation holds for any k_0, applying lim k_0→∞ on both sides of the inequality results in
Σ_{k=0}^{∞} µ_k (2 − µ_k) ‖Prox_λf(x_k) − x_k‖² ≤ ‖x_0 − z‖² < +∞.   (39)
Moreover, notice that
‖Prox_λf(x_{k+1}) − x_{k+1}‖ = (1/2) ‖R(x_{k+1}) − x_{k+1}‖
= (1/2) ‖ R(x_{k+1}) − R(x_k) + (1 − µ_k/2)( R(x_k) − x_k ) ‖
≤ (1/2) ‖x_{k+1} − x_k‖ + (1/2)(1 − µ_k/2) ‖R(x_k) − x_k‖
= (1/2) ( µ_k/2 + 1 − µ_k/2 ) ‖R(x_k) − x_k‖ = ‖Prox_λf(x_k) − x_k‖,
where the non-expansiveness of R and x_{k+1} − x_k = (µ_k/2)( R(x_k) − x_k ) have been used.
Since ( ‖Prox_λf(x_k) − x_k‖ )_{k∈N} is monotonically non-increasing, and bounded from below, it converges. Necessarily, lim_{k→∞} ‖Prox_λf(x_k) − x_k‖² = 0.
Otherwise, there exists an ε > 0 and a subsequence (k_m)_{m∈N} such that ‖Prox_λf(x_{k_m}) − x_{k_m}‖² ≥ ε, ∀m ∈ N. This, together with the fact that lim_{m→∞} Σ_{i=0}^{k_m} µ_i (2 − µ_i) = +∞, and (39), imply that
+∞ > Σ_{k=0}^{+∞} µ_k (2 − µ_k) ‖Prox_λf(x_k) − x_k‖²
≥ Σ_{i=0}^{k_m} µ_i (2 − µ_i) ‖Prox_λf(x_i) − x_i‖²
≥ ε Σ_{i=0}^{k_m} µ_i (2 − µ_i) → +∞,  as m → ∞,
where the monotonicity of ( ‖Prox_λf(x_k) − x_k‖ )_{k∈N} has been used; this is a contradiction. Hence, lim_{k→∞} ‖Prox_λf(x_k) − x_k‖ = 0.
Let x̂ be a cluster point of (x_k)_{k∈N}, i.e., the limit of a subsequence (x_{k_m})_{m∈N}. Then, by the triangle inequality and the non-expansiveness of Prox_λf,
‖x̂ − Prox_λf(x̂)‖ ≤ ‖x̂ − x_{k_m}‖ + ‖x_{k_m} − Prox_λf(x_{k_m})‖ + ‖Prox_λf(x_{k_m}) − Prox_λf(x̂)‖
≤ 2 ‖x̂ − x_{k_m}‖ + ‖x_{k_m} − Prox_λf(x_{k_m})‖ → 0,  as m → ∞,
so that Prox_λf(x̂) = x̂, i.e., every cluster point is a minimizer of f. It remains to show that (x_k)_{k∈N} possesses a unique cluster point. To this end, assume two cluster points x̂, ŷ of (x_k)_{k∈N}. This means that there exist subsequences (x_{k_m})_{m∈N} and (x_{l_m})_{m∈N} which converge to x̂ and ŷ, respectively. Moreover, notice that
⟨x_k, x̂ − ŷ⟩ = (1/2) ( ‖x_k − ŷ‖² − ‖x_k − x̂‖² + ‖x̂‖² − ‖ŷ‖² ).
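A minimal numerical sketch of the recursion analyzed above (hypothetical, not from the text): with f = ‖·‖_1, Prox_λf is soft thresholding, and the relaxed iteration x_{k+1} = x_k + (µ_k/2)(R(x_k) − x_k) drives the iterates to the minimizer, the zero vector. The choice of λ and µ_k is illustrative only.

```python
import numpy as np

# Hypothetical sketch of the relaxed proximal point recursion with
# R = 2*Prox_{lam f} - I and f = ||.||_1 (prox = soft thresholding).

lam = 0.2

def prox_l1(x):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def R(x):
    return 2.0 * prox_l1(x) - x

x = np.array([5.0, -3.0, 0.7])
for k in range(200):
    mu_k = 1.5                       # any mu_k in (0, 2); sum mu_k(2 - mu_k) = +inf
    x = x + (mu_k / 2.0) * (R(x) - x)

print(x)   # converges to the minimizer of ||.||_1, i.e., the zero vector
```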
37. Derive (8.122) from (8.121).
Solution: Use the matrix inversion lemma
( A + B D^{−1} C )^{−1} = A^{−1} − A^{−1} B ( D + C A^{−1} B )^{−1} C A^{−1},
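A quick numerical check of the lemma (hypothetical example, not from the text), using random well-conditioned matrices:

```python
import numpy as np

# Hypothetical verification of the matrix inversion lemma quoted above.

rng = np.random.default_rng(5)
n, m = 5, 3
A = np.eye(n) + 0.1 * rng.normal(size=(n, n))
D = np.eye(m) + 0.1 * rng.normal(size=(m, m))
B = rng.normal(size=(n, m))
C = rng.normal(size=(m, n))

lhs = np.linalg.inv(A + B @ np.linalg.inv(D) @ C)
Ainv = np.linalg.inv(A)
rhs = Ainv - Ainv @ B @ np.linalg.inv(D + C @ Ainv @ B) @ C @ Ainv

print(np.max(np.abs(lhs - rhs)))   # ~1e-14: both sides agree
```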