Solutions To Problems of Chapter 5
5.1. Show that the gradient vector is perpendicular to the tangent at a point
of an isovalue curve.
Solution: The differential of the cost function, $J(\theta)$, at a point $\theta^{(i)}$, is given by
$$dJ = \nabla^T J(\theta^{(i)})\, d\theta.$$
Along an isovalue curve, $J$ remains constant; hence $dJ = 0$ for any infinitesimal displacement $d\theta$ along the tangent of the curve at $\theta^{(i)}$, which means that $\nabla J(\theta^{(i)})$ is perpendicular to the tangent.
5.2. Prove that if
$$\sum_{i=1}^{\infty}\mu_i^2 < \infty, \qquad \sum_{i=1}^{\infty}\mu_i = \infty,$$
the steepest descent scheme, for the MSE loss function and for the iteration-dependent step size case, converges to the optimal solution.
Solution: The steepest descent recursion with iteration-dependent step size is $\theta^{(k)} = \theta^{(k-1)} - \mu_k\nabla J(\theta^{(k-1)})$. Subtracting $\theta_*$, taking squared norms and summing over $k = 1, \ldots, i$, the accumulated quadratic and cross terms are
$$\sum_{k=1}^{i}\mu_k^2\,\nabla^T J(\theta^{(k-1)})\,\nabla J(\theta^{(k-1)}) - 2\sum_{k=1}^{i}\mu_k\,(\theta^{(k-1)} - \theta_*)^T\nabla J(\theta^{(k-1)}).$$
Bounding the squared norm of the gradient by a constant $A$, we obtain
$$(\theta^{(i)} - \theta_*)^T(\theta^{(i)} - \theta_*) \le (\theta^{(0)} - \theta_*)^T(\theta^{(0)} - \theta_*) + A\sum_{k=1}^{i}\mu_k^2 - 2\sum_{k=1}^{i}\mu_k\,(\theta^{(k-1)} - \theta_*)^T\nabla J(\theta^{(k-1)}),$$
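As an illustration of the role of the two step-size conditions, the following minimal Python sketch runs steepest descent with a diminishing step size on a simple quadratic MSE-type cost; the particular matrix, the optimum and the schedule $\mu_i = 1/i$ are assumptions made only for the example.

```python
import numpy as np

# Quadratic MSE-type cost J(theta) = 0.5 * (theta - theta_star)^T Sigma (theta - theta_star)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # assumed input covariance
theta_star = np.array([1.0, -2.0])      # optimal solution

def grad_J(theta):
    return Sigma @ (theta - theta_star)

theta = np.zeros(2)
for i in range(1, 10001):
    mu_i = 1.0 / i                      # sum(mu_i) = inf, sum(mu_i^2) < inf
    theta = theta - mu_i * grad_J(theta)

print(np.linalg.norm(theta - theta_star))  # distance to the optimum keeps shrinking
```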
5.3. Derive the steepest gradient descent direction for the complex-valued case.
Solution: We know from the text that
$$J(\theta + \Delta\theta) = J(\theta_r, \theta_i) + \Delta\theta_r^T\nabla_r J(\theta_r, \theta_i) + \Delta\theta_i^T\nabla_i J(\theta_r, \theta_i) + \cdots$$
5.4. Let $\theta, x$ be two jointly distributed random variables. Let also the regression function
$$f(\theta) = E[x|\theta]$$
be assumed to be increasing. Show that under the conditions in (5.29) the recursion
$$\theta_n = \theta_{n-1} - \mu_n x_n$$
converges to a root $\theta_*$ of $f(\theta)$.
Solution: Write $x_n = f(\theta_{n-1}) + \eta_n$, where $\eta_n$ has zero mean and variance $\sigma^2$, and also we assume that $\eta_n$ and $\theta_n$ are independent. Let $\theta_*$ be a root. Then, we form
$$\theta_n - \theta_* = \theta_{n-1} - \theta_* - \mu_n f(\theta_{n-1}) - \mu_n\eta_n.$$
Taking the expectation of the square and iterating over $n$, we get
$$E[(\theta_n - \theta_*)^2] = E[(\theta_0 - \theta_*)^2] - 2\sum_{i=1}^{n}\mu_i E[(\theta_{i-1} - \theta_*)f(\theta_{i-1})] + \sum_{i=1}^{n}\mu_i^2\left(E[f^2(\theta_{i-1})] + \sigma^2\right).$$
Assume now that
$$E[f^2(\theta_{n-1})] \le b < \infty.$$
Then we obtain the following,
$$E[(\theta_n - \theta_*)^2] \le E[(\theta_0 - \theta_*)^2] + (b + \sigma^2)\sum_{i=1}^{n}\mu_i^2 - 2\sum_{i=1}^{n}\mu_i E[(\theta_{i-1} - \theta_*)f(\theta_{i-1})].$$
Since $f$ is increasing, $f(\theta) > 0$ for $\theta > \theta_*$ and $f(\theta) < 0$ for $\theta < \theta_*$, provided a single root exists. Thus,
$$f(\theta)(\theta - \theta_*) \ge 0, \qquad (2)$$
and hence,
$$E[(\theta_i - \theta_*)f(\theta_i)] \ge 0.$$
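The behavior of this Robbins–Monro type recursion can be illustrated with a short Python simulation; the specific increasing regression function, its root and the noise level below are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed increasing regression function f(theta) = E[x | theta], with root theta_* = 2
def f(theta):
    return 0.5 * (theta - 2.0)

theta = -5.0
sigma = 1.0                 # std of the zero-mean fluctuation eta_n of x_n around f(theta)
for n in range(1, 200001):
    mu_n = 1.0 / n          # step sizes satisfying sum(mu_n) = inf, sum(mu_n^2) < inf
    x_n = f(theta) + sigma * rng.standard_normal()   # noisy observation with mean f(theta)
    theta = theta - mu_n * x_n                       # theta_n = theta_{n-1} - mu_n x_n

print(theta)                # ends up close to the root theta_* = 2
```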
5.5. Show that the LMS algorithm is a nonlinear estimator.
Solution: Consider the basic LMS recursion
$$\theta_n = \theta_{n-1} + \mu\, x_n e_n, \qquad e_n = y_n - x_n^T\theta_{n-1}.$$
Substituting $e_n$ and unfolding the recursion in time, $\theta_n$ involves products of the input vectors among themselves and with the outputs; hence the estimate is not a linear function of the training data, and the LMS is a nonlinear estimator.
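A minimal Python sketch of the recursion is given below; the data model $y_n = x_n^T\theta_o + $ noise, the filter length and the step size are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

l = 5
theta_o = rng.standard_normal(l)     # unknown parameter vector to be identified
mu = 0.01                            # LMS step size (assumed small enough for stability)

theta = np.zeros(l)
for n in range(5000):
    x_n = rng.standard_normal(l)                        # input vector
    y_n = x_n @ theta_o + 0.1 * rng.standard_normal()   # noisy desired response
    e_n = y_n - x_n @ theta                             # a-priori error
    theta = theta + mu * e_n * x_n                      # LMS update

print(np.linalg.norm(theta - theta_o))   # small residual error (misadjustment) remains
```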
5.6. Show equation (5.42).
Solution: Our starting point is
$$\Sigma_{c,n} = \Sigma_{c,n-1} - \mu\Sigma_x\Sigma_{c,n-1} - \mu\Sigma_{c,n-1}\Sigma_x + \cdots$$
5.7. Derive the bound in (5.45).
Hint: Use the well-known property from linear algebra that the eigenvalues of a matrix $A \in \mathbb{R}^{l\times l}$ satisfy the bound
$$\max_{1\le i\le l}|\lambda_i| \le \max_{1\le i\le l}\sum_{j=1}^{l}|a_{ij}| := \|A\|_1.$$
This is a sufficient condition for stability and guarantees that all eigenvalues of $A$ have magnitude less than one, or
$$\sum_{j=1}^{l}|a_{ij}| < 1, \quad i = 1, 2, \ldots, l.$$
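The row-sum bound in the hint can be checked numerically; in the sketch below the random test matrix is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.uniform(-0.5, 0.5, size=(4, 4))             # arbitrary test matrix

max_abs_eig = np.max(np.abs(np.linalg.eigvals(A)))
max_row_sum = np.max(np.sum(np.abs(A), axis=1))     # max_i sum_j |a_ij|

print(max_abs_eig, max_row_sum)
assert max_abs_eig <= max_row_sum + 1e-12           # max |lambda_i| <= max row sum
```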
5.8. Gershgorin circle theorem. Let $A$ be an $l\times l$ matrix, with entries $a_{ij}$, $i, j = 1, 2, \ldots, l$. Let $R_i := \sum_{j=1,\, j\ne i}^{l}|a_{ij}|$ be the sum of absolute values of the non-diagonal entries in row $i$. Then show that if $\lambda$ is an eigenvalue of $A$, there exists a row $i$ such that $|\lambda - a_{ii}| \le R_i$.
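The statement can also be verified numerically; the sketch below (with an arbitrary random test matrix) checks that every eigenvalue lies in at least one Gershgorin disc.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))

centers = np.diag(A)                                  # disc centers a_ii
radii = np.sum(np.abs(A), axis=1) - np.abs(centers)   # R_i = sum_{j != i} |a_ij|

for lam in np.linalg.eigvals(A):
    # each eigenvalue must lie in at least one disc |lambda - a_ii| <= R_i
    assert np.any(np.abs(lam - centers) <= radii + 1e-9)
print("all eigenvalues lie in the union of the Gershgorin discs")
```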
5.9. Apply the Gershgorin circle theorem to prove the bound in (5.45).
Solution: First note that the matrix
$$A = (I - \mu\Lambda)^2 + \mu^2\Lambda^2 + \mu^2\lambda\lambda^T$$
is non-negative definite. Indeed, $\forall x \in \mathbb{R}^l$, we have that
5.10. Derive the misadjustment formula given in (5.52).
Solution: Taking into account that in steady state, $s_n = s_{n-1} := s$ in (5.42), for the $i$th element we can write
$$s_i = (1 - 2\mu\lambda_i + 2\mu^2\lambda_i^2)s_i + \mu^2\lambda_i\sum_{j}\lambda_j s_j + \mu^2\sigma_\eta^2\lambda_i.$$
It is common to neglect the third term in the parenthesis on the right-hand side, since it is of order $\mu^2$.
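As a numerical illustration, the sketch below solves the steady-state equation above for $s$ and computes the resulting misadjustment (excess MSE divided by $\sigma_\eta^2$), comparing it with the small-step approximation $\mu\,\mathrm{trace}\{\Sigma_x\}/2$; the eigenvalues, step size and noise variance are assumptions, and the quoted approximation is the form the misadjustment usually takes for small $\mu$.

```python
import numpy as np

lam = np.array([1.0, 0.5, 0.2, 0.1])     # eigenvalues of the input covariance (assumed)
mu = 0.01                                # step size
sigma_eta2 = 0.1                         # noise variance

# Steady state: s_i = (1 - 2 mu lam_i + 2 mu^2 lam_i^2) s_i + mu^2 lam_i sum_j lam_j s_j + mu^2 sigma^2 lam_i
# Rearranged as a linear system M s = b:
M = np.diag(2 * mu * lam - 2 * mu**2 * lam**2) - mu**2 * np.outer(lam, lam)
b = mu**2 * sigma_eta2 * lam
s = np.linalg.solve(M, b)

excess_mse = lam @ s                     # excess MSE at steady state
misadjustment = excess_mse / sigma_eta2
print(misadjustment, mu * lam.sum() / 2) # close to the small-step approximation
```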
5.11. Derive the APA iteration scheme.
Solution: The optimization task is
$$\theta_n = \arg\min_{\theta}\|\theta - \theta_{n-1}\|_2^2,$$
subject to
$$y_{n-i} = x_{n-i}^T\theta, \quad i = 0, 1, \ldots, q-1.$$
Introducing a vector of Lagrange multipliers, $\lambda$, and setting the gradient of the Lagrangian with respect to $\theta$ equal to zero, we obtain
$$\theta_n = \theta_{n-1} + \frac{1}{2}X_n^T\lambda, \qquad (5)$$
where
$$X_n := \begin{bmatrix} x_n^T \\ \vdots \\ x_{n-q+1}^T \end{bmatrix}.$$
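Eliminating $\lambda$ through the constraints leads to the APA update; its commonly used regularized form, $\theta_n = \theta_{n-1} + \mu X_n^T(X_nX_n^T + \delta I)^{-1}(y_n - X_n\theta_{n-1})$, is sketched below in Python, where the step size $\mu$, the regularization $\delta$ and the data model are assumptions added for the illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

l, q = 8, 4                        # filter length and number of reused constraints
theta_o = rng.standard_normal(l)   # unknown system to identify
mu, delta = 0.5, 1e-3              # step size and regularization (assumed)

X_buf, y_buf = [], []
theta = np.zeros(l)
for n in range(3000):
    x_n = rng.standard_normal(l)
    y_n = x_n @ theta_o + 0.01 * rng.standard_normal()
    X_buf, y_buf = ([x_n] + X_buf)[:q], ([y_n] + y_buf)[:q]   # q most recent pairs
    X_n, y_vec = np.array(X_buf), np.array(y_buf)
    e_n = y_vec - X_n @ theta                                 # vector of a-priori errors
    # APA update: move toward satisfying the q most recent constraints
    theta = theta + mu * X_n.T @ np.linalg.solve(X_n @ X_n.T + delta * np.eye(len(X_buf)), e_n)

print(np.linalg.norm(theta - theta_o))
```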
5.12. Given a value $x$, define the hyperplane comprising all values of $\theta$ such that
$$x^T\theta - y = 0.$$
Derive the formula for projecting a point onto it.
5.13. Derive the recursions for the widely-linear APA.
Solution: Define
$$e_n = y_n - \varphi_{n-1}^H\tilde{x}_n,$$
with
$$y_n = [y_n, \ldots, y_{n-q+1}]^T.$$
The Lagrangian becomes a sum over $i = 0, 1, \ldots, q-1$ of the corresponding constraint terms.
5.14. Show that a similarity transformation of a square matrix, via a unitary
matrix, does not affect the eigenvalues.
Solution: Let $\Sigma$ be a matrix and let
$$\Sigma' = T^H\Sigma T,$$
where $T$ is unitary, i.e., $T^HT = TT^H = I$. Then
$$\det(\Sigma' - \lambda I) = \det\!\left(T^H(\Sigma - \lambda I)T\right) = \det(T^H)\det(\Sigma - \lambda I)\det(T) = \det(\Sigma - \lambda I),$$
so $\Sigma'$ and $\Sigma$ share the same characteristic polynomial and, hence, the same eigenvalues.
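A quick numerical confirmation (the random complex test matrix and the unitary factor from a QR decomposition are assumptions used only for the check):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
Sigma = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))   # arbitrary square matrix
T, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))  # unitary T

Sigma_prime = T.conj().T @ Sigma @ T
eig_orig = np.sort_complex(np.linalg.eigvals(Sigma))
eig_trans = np.sort_complex(np.linalg.eigvals(Sigma_prime))
print(np.allclose(eig_orig, eig_trans))   # True: the eigenvalues are unchanged
```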
5.15. Show that if $x \in \mathbb{R}^l$ is a Gaussian random vector, then
$$F := E[xx^TSxx^T] = \Sigma_x\,\mathrm{trace}\{S\Sigma_x\} + 2\Sigma_xS\Sigma_x,$$
and if $x \in \mathbb{C}^l$,
$$F := E[xx^HSxx^H] = \Sigma_x\,\mathrm{trace}\{S\Sigma_x\} + \Sigma_xS\Sigma_x.$$
Solution: Let $\Sigma_x = Q\Lambda Q^H$, where $Q$ is unitary and
$$\Lambda = \begin{bmatrix}\lambda_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \lambda_l\end{bmatrix}.$$
Then,
$$F' = Q^HFQ = E[Q^Hxx^HQQ^HSQQ^Hxx^HQ] = E[x'x'^HS'x'x'^H],$$
where
$$S' = Q^HSQ, \qquad x' = Q^Hx.$$
For real data, the matrix under the expectation will comprise entries that are fourth-order Gaussian moments; expressing them in terms of second-order moments leads to
$$F = \Sigma_x\,\mathrm{trace}\{S\Sigma_x\} + 2\Sigma_xS\Sigma_x.$$
For complex-valued data, we assume a circularly symmetric variable, thus $E[x_k^2] = E[x_k'^2] = 0$, which results in
$$F = \Sigma_x\,\mathrm{trace}\{S\Sigma_x\} + \Sigma_xS\Sigma_x.$$
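A Monte Carlo check of the real-valued identity is sketched below; the covariance $\Sigma_x$, the symmetric weighting matrix $S$ and the sample size are assumptions chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(6)
l = 3
A = rng.standard_normal((l, l))
Sigma_x = A @ A.T                      # a valid covariance matrix
S = rng.standard_normal((l, l))
S = S + S.T                            # symmetric weighting matrix

X = rng.multivariate_normal(np.zeros(l), Sigma_x, size=1_000_000)
quad = np.einsum('nj,jk,nk->n', X, S, X)            # x^T S x for each sample
F_emp = (X * quad[:, None]).T @ X / X.shape[0]      # empirical E[(x^T S x) x x^T]
F_theory = Sigma_x * np.trace(S @ Sigma_x) + 2 * Sigma_x @ S @ Sigma_x
print(np.max(np.abs(F_emp - F_theory)))             # small Monte Carlo error
```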
5.16. Show that if an $l\times l$ matrix $C$ is right stochastic, then all its eigenvalues satisfy
$$|\lambda_i| \le 1, \quad i = 1, 2, \ldots, l.$$
The same holds true for left and doubly stochastic matrices.
Solution: By definition, a right stochastic matrix has non-negative entries and rows that sum to one, that is,
$$C\mathbf{1} = \mathbf{1}.$$
5.17. Prove Theorem 5.2
Solution: Collect all estimates from all nodes at iteration $i$ together in a common vector,
$$\theta^{(i)} = \left[\theta_1^{(i)T}, \theta_2^{(i)T}, \ldots, \theta_K^{(i)T}\right]^T, \quad \theta_k \in \mathbb{R}^l,\ k = 1, 2, \ldots, K.$$
Define the average of the initial estimates,
$$x := \frac{1}{K}\sum_{k=1}^{K}\theta_k^{(0)}.$$
Then, by the definition of the Kronecker product, it can be written as
$$x = \frac{1}{K}\left(\mathbf{1}^T\otimes I_l\right)\theta^{(0)}.$$
Define
$$c_k^{(i)} := \theta_k^{(i)} - x, \quad k = 1, 2, \ldots, K,$$
and let $c^{(i)}$ be the corresponding stacked vector.
Using the identity
$$(B\otimes C)(D\otimes E) = (BD)\otimes(CE),$$
we obtain
$$c^{(i)} = \left((A^T)^i\otimes I_l - \frac{1}{K}(\mathbf{1}\mathbf{1}^T)\otimes I_l\right)\theta^{(0)} = \left(\left((A^T)^i - \frac{1}{K}\mathbf{1}\mathbf{1}^T\right)\otimes I_l\right)\theta^{(0)}.$$
Note that
$$\left(A^T - \frac{1}{K}\mathbf{1}\mathbf{1}^T\right)^i = (A^T)^i - \frac{1}{K}\mathbf{1}\mathbf{1}^T. \qquad (11)$$
This is because of the doubly stochastic assumption on $A$. Indeed, $A^T\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^TA^T = \mathbf{1}^T$, so that
$$\left(A^T - \frac{1}{K}\mathbf{1}\mathbf{1}^T\right)^2 = (A^T)^2 - \frac{2}{K}\mathbf{1}\mathbf{1}^T + \frac{1}{K^2}\mathbf{1}\mathbf{1}^T\mathbf{1}\mathbf{1}^T = (A^T)^2 - \frac{1}{K}\mathbf{1}\mathbf{1}^T,$$
which generalizes to higher powers by induction. Hence,
$$c^{(i)} = \left(\left(A^T - \frac{1}{K}\mathbf{1}\mathbf{1}^T\right)^i\otimes I_l\right)\theta^{(0)} = \left(\left(A^T - \frac{1}{K}\mathbf{1}\mathbf{1}^T\right)\otimes I_l\right)^i\theta^{(0)},$$
Assume now that
$$\lim_{i\to\infty}\left((A^T)^i\otimes I_l\right)\theta^{(0)} = (\mathbf{1}\otimes I_l)x = \frac{1}{K}(\mathbf{1}\otimes I_l)(\mathbf{1}^T\otimes I_l)\theta^{(0)}$$
for any $\theta^{(0)}$. Hence, this implies that
$$\lim_{i\to\infty}(A^T)^i\otimes I_l = \frac{1}{K}(\mathbf{1}\otimes I_l)(\mathbf{1}^T\otimes I_l) = \left(\frac{1}{K}\mathbf{1}\mathbf{1}^T\right)\otimes I_l,$$
that is, $\lim_{i\to\infty}(A^T)^i = \frac{1}{K}\mathbf{1}\mathbf{1}^T$.
The same limit is obtained for $\lim_{i\to\infty}(A^T)^{i+1} = A^T\lim_{i\to\infty}(A^T)^i$. Thus,
$$\frac{1}{K}A^T\mathbf{1}\mathbf{1}^T = \frac{1}{K}\mathbf{1}\mathbf{1}^T,$$
or
$$A^T\mathbf{1} = \mathbf{1},$$
that is, it is left stochastic. By considering $\lim_{i\to\infty}(A^T)^{i+1} = \left(\lim_{i\to\infty}(A^T)^i\right)A^T$ instead, the same argument yields
$$A\mathbf{1} = \mathbf{1}.$$
Since we have proved that $A$ is doubly stochastic, then (11) holds
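A numerical illustration of the theorem is sketched below: for an (assumed) doubly stochastic combination matrix and random initial estimates, all nodes converge to the average of the initial values, and $(A^T)^i$ approaches $\frac{1}{K}\mathbf{1}\mathbf{1}^T$.

```python
import numpy as np

rng = np.random.default_rng(7)
K, l = 4, 3

# A doubly stochastic combination matrix (assumed): mix of identity and uniform averaging
A = 0.5 * np.eye(K) + 0.5 * np.ones((K, K)) / K

theta0 = rng.standard_normal((K, l))   # initial estimates theta_k^(0), one row per node
x_bar = theta0.mean(axis=0)            # average of the initial estimates

theta = theta0.copy()
for _ in range(100):
    theta = A.T @ theta                # consensus step theta^(i) = (A^T kron I_l) theta^(i-1)

print(np.max(np.abs(theta - x_bar)))   # every node sits (numerically) at the average
print(np.max(np.abs(np.linalg.matrix_power(A.T, 100) - np.ones((K, K)) / K)))  # -> 0
```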