Machine Learning (Sergios Theodoridis)

Solutions To Problems of Chapter 6
6.1. Show that if $A \in \mathbb{C}^{m\times m}$ is nonnegative definite, its trace is nonnegative.

Solution: By the definition of a positive semidefinite matrix, for every $x \in \mathbb{C}^m$, $x^H A x \ge 0$. Choosing $x = e_i$, the $i$th standard basis vector, gives $a_{ii} = e_i^H A e_i \ge 0$; hence $\operatorname{trace}(A) = \sum_{i=1}^{m} a_{ii} \ge 0$.
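A minimal numerical sketch of this fact (assuming NumPy; the matrix below is an arbitrary example built as $B^H B$, which is nonnegative definite by construction):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B.conj().T @ B                              # nonnegative definite by construction
x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
print(np.real(x.conj() @ A @ x) >= 0)           # the quadratic form is nonnegative
print(np.real(np.trace(A)) >= 0)                # and so is the trace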
6.2. Show that under a) the independence assumption of successive observation vectors and b) the presence of white noise independent of the input, the LS estimator is asymptotically distributed according to the normal distribution, i.e.,
$$\sqrt{N}\,(\hat{\theta} - \theta_0) \longrightarrow \mathcal{N}\!\left(0, \sigma^2 \Sigma_x^{-1}\right),$$
where $\sigma^2$ is the noise variance and $\Sigma_x$ the covariance matrix of the input observation vectors, assuming that it is invertible.
Solution: Recall that, according to the law of large numbers,
$$\frac{1}{N}\sum_{n=1}^{N} x_n x_n^T \longrightarrow \Sigma_x .$$
Moreover, since the noise is white, with variance $\sigma^2$, and independent of the input, the covariance matrix of the sum of independent terms in
$$\frac{1}{\sqrt{N}}\sum_{n=1}^{N} x_n \eta_n \qquad (1)$$
converges to $\sigma^2\Sigma_x$, so that, by the central limit theorem, (1) is asymptotically Gaussian. Writing $y_n = x_n^T\theta_0 + \eta_n$ and substituting into the LS estimate, we get
$$\sqrt{N}\,(\hat{\theta} - \theta_0) = \left(\frac{1}{N}\sum_{n=1}^{N} x_n x_n^T\right)^{-1}\left(\frac{1}{\sqrt{N}}\sum_{n=1}^{N} x_n \eta_n\right).$$
Combining the two limits, the product converges in distribution to $\mathcal{N}\!\left(0, \Sigma_x^{-1}(\sigma^2\Sigma_x)\Sigma_x^{-1}\right) = \mathcal{N}\!\left(0, \sigma^2\Sigma_x^{-1}\right)$.
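A small Monte Carlo sketch of this asymptotic behavior (assuming NumPy; the data model, dimensions, and sample sizes are illustrative choices, not from the original):

import numpy as np

rng = np.random.default_rng(1)
l, N, trials = 3, 2000, 500
theta0 = np.array([1.0, -2.0, 0.5])
sigma = 0.7
Sigma_x = np.diag([1.0, 2.0, 0.5])                         # true input covariance

errs = []
for _ in range(trials):
    X = rng.multivariate_normal(np.zeros(l), Sigma_x, size=N)
    y = X @ theta0 + sigma * rng.standard_normal(N)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    errs.append(np.sqrt(N) * (theta_hat - theta0))

emp_cov = np.cov(np.array(errs).T)                          # covariance of sqrt(N)*(theta_hat - theta0)
print(np.round(emp_cov, 2))
print(np.round(sigma**2 * np.linalg.inv(Sigma_x), 2))       # the two should be close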
6.3. Let $X \in \mathbb{C}^{m\times l}$. Then show that the two matrices,
$$XX^H \quad \text{and} \quad X^HX,$$
have the same nonzero eigenvalues.

Solution: Let $\lambda_i$ be an eigenvalue of $X^HX$, with eigenvector $v_i$. Then
$$(X^HX)v_i = \lambda_i v_i \implies X(X^HX)v_i = \lambda_i Xv_i \implies (XX^H)(Xv_i) = \lambda_i (Xv_i).$$
If $\lambda_i \ne 0$, then $Xv_i \ne 0$, so $\lambda_i$ is also an eigenvalue of $XX^H$, with eigenvector $Xv_i$. The argument in the reverse direction is identical.
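A quick numerical check of this (assuming NumPy; the matrix size is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
ev_small = np.linalg.eigvalsh(X @ X.conj().T)       # 4 eigenvalues
ev_large = np.linalg.eigvalsh(X.conj().T @ X)       # 6 eigenvalues, two of them ~0
print(np.round(ev_small, 6))
print(np.round(ev_large[2:], 6))                    # the nonzero ones match the 4 above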
6.4. Show that if $X \in \mathbb{C}^{m\times l}$, then the eigenvalues of $XX^H$ ($X^HX$) are real and nonnegative. Moreover, show that if $\lambda_i \ne \lambda_j$, then $v_i \perp v_j$.

Solution: By the definition we have
$$XX^Hv_i = \lambda_i v_i \implies v_i^H XX^H v_i = \lambda_i \|v_i\|^2 ,$$
and the left-hand side equals $\|X^Hv_i\|^2 \ge 0$, so $\lambda_i$ is real and nonnegative. Moreover, $\lambda_i v_j^H v_i = v_j^H XX^H v_i = (XX^H v_j)^H v_i = \lambda_j v_j^H v_i$, i.e., $(\lambda_i - \lambda_j)\,v_j^H v_i = 0$; hence $\lambda_i \ne \lambda_j$ implies $v_i \perp v_j$.
6.5. Let $X \in \mathbb{C}^{m\times l}$. Then show that if $v_i$ is the normalized eigenvector of $X^HX$, corresponding to $\lambda_i \ne 0$, then the corresponding normalized eigenvector $u_i$ of $XX^H$ is given by
$$u_i = \frac{1}{\sqrt{\lambda_i}}\, Xv_i .$$

Solution: From Problem 6.3, we know that the eigenvectors of $XX^H$ and $X^HX$ corresponding to $\lambda_i$ are related as
$$q_i = Xv_i .$$
By the respective definition we have
$$X^HXv_i = \lambda_i v_i \implies \|Xv_i\|^2 = v_i^H X^HX v_i = \lambda_i \|v_i\|^2 = \lambda_i ,$$
so normalizing $q_i$ gives $u_i = Xv_i/\sqrt{\lambda_i}$.
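A short numerical sketch of the relation (assuming NumPy; real-valued data for simplicity):

import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 3))
lam, V = np.linalg.eigh(X.T @ X)                    # eigenpairs of X^T X
i = np.argmax(lam)                                  # the largest eigenvalue, certainly nonzero
u_i = X @ V[:, i] / np.sqrt(lam[i])                 # u_i = X v_i / sqrt(lambda_i)
print(np.allclose(X @ X.T @ u_i, lam[i] * u_i))     # u_i is an eigenvector of X X^T
print(np.isclose(np.linalg.norm(u_i), 1.0))         # and it has unit norm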
6.6. Show Eq. (6.19).
Solution: From the respective definitions, we get
6.7. Show that the right singular vectors, $v_1,\ldots,v_r$, corresponding to the $r$ singular values of a rank-$r$ matrix, $X$, solve the following iterative optimization task: compute $v_k$, $k = 1, 2, \ldots, r$, such that
$$\text{minimize } \ \frac{1}{2}\|Xv\|^2,$$
$$\text{subject to } \ \|v\|^2 = 1, \quad v \perp \{v_1,\ldots,v_{k-1}\}, \ k \ne 1,$$
where $\|\cdot\|$ denotes the Euclidean norm.
Solution: We start with $k = 1$, to solve the (Rayleigh ratio) task
$$\min_{v}\ \frac{1}{2}\|Xv\|^2, \quad \text{s.t. } \|v\|^2 = 1.$$
The corresponding Lagrangian becomes
$$L(v,\lambda) = \frac{1}{2}\|Xv\|^2 + \lambda\left(1 - \|v\|^2\right),$$
and setting its gradient to zero gives $X^TXv = 2\lambda v$, i.e., the constrained optimum is a unit-norm eigenvector of $X^TX$, that is, a right singular vector of $X$. For $k = 2, \ldots, r$, the task becomes
$$\min_{v}\ \frac{1}{2}\|Xv\|^2, \quad \text{s.t. } \|v\|^2 = 1, \quad v \perp v_1, \ldots, v_{k-1}.$$
The Lagrangian is now given by
$$L(v,\lambda,\mu) = \frac{1}{2}\|Xv\|^2 + \lambda\left(1 - \|v\|^2\right) + \sum_{i=1}^{k-1}\mu_i v_i^T v,$$
and working as before shows that $v_k$ is again a unit-norm right singular vector of $X$, orthogonal to the previously obtained ones.
6.8. Show that projecting the rows of $X$ onto the $k$-rank subspace, $V_k = \operatorname{span}\{v_1,\ldots,v_k\}$, results in the largest variance, compared to any other $k$-dimensional subspace, $Z_k$.

Solution: The projection of a row $x_n$ of $X$ onto $V_k$ is given by
$$\hat{x}_n = \sum_{i=1}^{k} (x_n^T v_i)\, v_i .$$
Then, projecting all the rows of the matrix onto $V_k$ results in a total variance,
$$\sum_{n=1}^{N} \|\hat{x}_n\|^2 = \sum_{n=1}^{N}\sum_{i=1}^{k} (x_n^T v_i)^2 = \sum_{i=1}^{k} \|Xv_i\|^2 . \qquad (5)$$
We will show that the variance associated with any other $k$-dimensional subspace, $Z_k$, is less than or equal to (5).
Assume that $V_{k-1}$ is the optimal $(k-1)$-dimensional subspace. Build an orthonormal basis in any $Z_k$ around a $z_k$ such that $z_k \perp V_{k-1}$.
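A numerical sketch of the claim (assuming NumPy; the dimensions and the random competing subspace are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(4)
N, l, k = 200, 6, 2
X = rng.standard_normal((N, l)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.3, 0.1])
Vk = np.linalg.svd(X, full_matrices=False)[2][:k].T        # top-k right singular vectors
Zk = np.linalg.qr(rng.standard_normal((l, k)))[0]          # a random k-dimensional orthonormal basis

var_Vk = np.sum((X @ Vk) ** 2)                             # total variance retained by V_k
var_Zk = np.sum((X @ Zk) ** 2)                             # total variance retained by Z_k
print(var_Vk >= var_Zk)                                    # True: V_k retains the largest variance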
6.9. Show that the squared Frobenius norm is equal to the sum of the squared singular values.

Solution: By the definition of the Frobenius norm,
$$\|X\|_F^2 := \sum_{i}\sum_{j} |x_{ij}|^2 = \operatorname{trace}\!\left(X^HX\right) = \sum_{i=1}^{r}\lambda_i\!\left(X^HX\right) = \sum_{i=1}^{r}\sigma_i^2 ,$$
since the trace of a matrix equals the sum of its eigenvalues, and the nonzero eigenvalues of $X^HX$ are the squared singular values of $X$.
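A one-line numerical check of this identity (assuming NumPy):

import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((7, 4))
s = np.linalg.svd(X, compute_uv=False)
print(np.isclose(np.linalg.norm(X, 'fro') ** 2, np.sum(s ** 2)))   # True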
6.10. Show that the best $k$-rank approximation of a matrix $X$ of rank $r > k$, in the Frobenius norm sense, is given by:
$$\hat{X} = \sum_{i=1}^{k} \sigma_i u_i v_i^T ,$$
where $\sigma_i$ are the singular values and $v_i, u_i,\ i = 1, 2, \ldots, r$, are the right and left singular vectors of $X$, respectively. Then show that the approximation error is given by:
$$\|X - \hat{X}\|_F = \sqrt{\sum_{i=k+1}^{r} \sigma_i^2}\, .$$
Solution: Let
$$\hat{X} := \sum_{i=1}^{k} \sigma_i u_i v_i^T = \sum_{i=1}^{k} (Xv_i)\, v_i^T . \qquad (6)$$
From (6), it is readily deduced that the rows of $\hat{X}$, $\hat{x}_n^T,\ n = 1, 2, \ldots, N$, are projections of the respective rows of $X$ onto the subspace spanned by $v_1,\ldots,v_k$; by Problem 6.8, this is the $k$-dimensional subspace retaining the largest variance, which is equivalent with the smallest error norm $\|e_n\|$, where $e_n := x_n - \hat{x}_n$. Hence, the corresponding Frobenius norm,
$$\|X - \hat{X}\|_F^2 = \sum_{n=1}^{N} \|e_n\|^2 ,$$
is the smallest one. We have proved that, from all possible approximations whose rows lie in a $k$-dimensional subspace, i.e., from all rank-$k$ matrices, $\hat{X}$ results in the minimum Frobenius norm error.
For the error matrix we have that
$$E = X - \hat{X} = \sum_{i=k+1}^{r} \sigma_i u_i v_i^T .$$
Thus, by Problem 6.9 applied to $E$, whose nonzero singular values are $\sigma_{k+1},\ldots,\sigma_r$,
$$\|X - \hat{X}\|_F^2 = \sum_{i=k+1}^{r} \sigma_i^2 ,$$
which is the claimed error.
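A numerical sketch of both claims (assuming NumPy; the comparison against an arbitrary alternative rank-k matrix is an illustrative assumption):

import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]                 # truncated SVD approximation

err = np.linalg.norm(X - X_hat, 'fro')
print(np.isclose(err, np.sqrt(np.sum(s[k:] ** 2))))        # error = sqrt(sum of discarded sigma^2)

B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))   # an arbitrary rank-k competitor
print(np.linalg.norm(X - B, 'fro') >= err)                 # True: no rank-k matrix does better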
6.11. Show that $\hat{X}$, as given in Problem 6.10, also minimizes the spectral norm and that
$$\|X - \hat{X}\|_2 = \sigma_{k+1} .$$

Solution: We first show that
$$\|X - \hat{X}\|_2 = \sigma_{k+1} .$$
Expand an arbitrary unit-norm vector as $v = \sum_{j=1}^{l} a_j v_j$, with $\sum_{j=1}^{l} a_j^2 = 1$. Then, we have that
$$\|(X - \hat{X})v\|^2 = \Big\|\sum_{i=k+1}^{r} \left(\sigma_i u_i v_i^T\right)\sum_{j=1}^{l} a_j v_j\Big\|^2 = \Big\|\sum_{i=k+1}^{r} a_i\sigma_i u_i v_i^Tv_i\Big\|^2 = \sum_{i=k+1}^{r} a_i^2\sigma_i^2 \le \sigma_{k+1}^2 ,$$
with equality for $v = v_{k+1}$; hence $\|X - \hat{X}\|_2 = \sigma_{k+1}$.
Next, let $B$ be any rank-$k$ matrix. Since $B$ is a rank-$k$ matrix, its null space will be of dimension $l - k$. Thus, by basic dimension arguments,
$$S := \mathcal{N}(B) \cap \operatorname{span}\{v_1,\ldots,v_k,v_{k+1}\} \ne \{0\}.$$
Hence, $\exists\, z \ne 0 \in S$, $\|z\| = 1$. Then, by the definition of the spectral norm, and taking into account that $Bz = 0$, we get
$$\|X - B\|_2^2 \ge \|(X - B)z\|^2 = \|Xz\|^2 = \Big\|\sum_{i=1}^{k+1} \sigma_i u_i (v_i^Tz)\Big\|^2 = \sum_{i=1}^{k+1} \sigma_i^2 (v_i^Tz)^2 \ge \sigma_{k+1}^2\sum_{i=1}^{k+1} (v_i^Tz)^2 = \sigma_{k+1}^2 ,$$
since $z \in \operatorname{span}\{v_1,\ldots,v_{k+1}\}$ and $\|z\| = 1$. Hence, $\|X - B\|_2 \ge \sigma_{k+1} = \|X - \hat{X}\|_2$, which proves the claim.
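A corresponding numerical sketch for the spectral norm (assuming NumPy):

import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

print(np.isclose(np.linalg.norm(X - X_hat, 2), s[k]))              # spectral error equals sigma_{k+1}
B = rng.standard_normal((8, k)) @ rng.standard_normal((k, 5))      # an arbitrary rank-k competitor
print(np.linalg.norm(X - B, 2) >= s[k] - 1e-12)                    # True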
6.12. Show that the Frobenius and spectral norms are unaffected by multiplication with unitary matrices, i.e.,
$$\|X\|_F = \|QXU\|_F$$
and
$$\|X\|_2 = \|QXU\|_2 ,$$
if $QQ^H = UU^H = I$.

Solution: The proof for the Frobenius norm was given in Problem 6.9. From the definition of the spectral norm, we have that
$$\|X\|_2 = \sigma_1 = \max_{\|v\|=1}\|Xv\| .$$
Since $Q$ is unitary, $\|QXUv\| = \|XUv\|$, and as $v$ runs over all unit-norm vectors, so does $w := Uv$; hence $\max_{\|v\|=1}\|QXUv\| = \max_{\|w\|=1}\|Xw\|$, i.e., $\|QXU\|_2 = \|X\|_2$.
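A quick numerical sketch (assuming NumPy; random orthogonal factors are drawn via QR):

import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((6, 4))
Q = np.linalg.qr(rng.standard_normal((6, 6)))[0]   # orthogonal (unitary, real case)
U = np.linalg.qr(rng.standard_normal((4, 4)))[0]

print(np.isclose(np.linalg.norm(X, 'fro'), np.linalg.norm(Q @ X @ U, 'fro')))   # True
print(np.isclose(np.linalg.norm(X, 2), np.linalg.norm(Q @ X @ U, 2)))           # True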
6.13. Show that the null and range spaces of an $m\times l$ matrix, $X$, of rank $r$ are given by
$$\mathcal{N}(X) = \operatorname{span}\{v_{r+1},\ldots,v_l\},$$
$$\mathcal{R}(X) = \operatorname{span}\{u_1,\ldots,u_r\},$$
where
$$X = [u_1,\ldots,u_m]\begin{bmatrix} D & O \\ O & O \end{bmatrix}\begin{bmatrix} v_1^T \\ \vdots \\ v_l^T \end{bmatrix}.$$
Solution: Recall that
$$X = \sum_{i=1}^{r}\sigma_i u_i v_i^T .$$
Hence, $\forall a \in \mathbb{R}^l$,
$$Xa = \sum_{i=1}^{r}\sigma_i (v_i^Ta)\, u_i \in \operatorname{span}\{u_1,\ldots,u_r\},$$
and $Xa = 0$ if and only if $v_i^Ta = 0$ for $i = 1,\ldots,r$, i.e., if and only if $a \in \operatorname{span}\{v_{r+1},\ldots,v_l\}$.
6.14. Show that for the ridge regression,
$$\hat{y} = \sum_{i=1}^{l}\frac{\sigma_i^2}{\lambda + \sigma_i^2}\,(u_i^Ty)\, u_i .$$
Solution: We have that
$$\hat{y} = U_lDV_l^T\left(V_lDU_l^TU_lDV_l^T + \lambda I\right)^{-1}V_lDU_l^Ty = U_lDV_l^T\left(V_lD^2V_l^T + \lambda V_lV_l^T\right)^{-1}V_lDU_l^Ty = U_lD\left(D^2 + \lambda I\right)^{-1}DU_l^Ty = \sum_{i=1}^{l}\frac{\sigma_i^2}{\lambda + \sigma_i^2}\,(u_i^Ty)\, u_i ,$$
where the thin SVD, $X = U_lDV_l^T$, and the orthonormality relations $U_l^TU_l = I$ and $V_lV_l^T = V_l^TV_l = I$ have been used.
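A numerical sketch of the identity (assuming NumPy; problem sizes and the value of lambda are illustrative):

import numpy as np

rng = np.random.default_rng(9)
N, l, lam = 30, 4, 0.5
X = rng.standard_normal((N, l))
y = rng.standard_normal(N)

theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(l), X.T @ y)
y_hat_direct = X @ theta_ridge

U, s, Vt = np.linalg.svd(X, full_matrices=False)            # thin SVD: X = U_l D V_l^T
y_hat_svd = sum((s[i]**2 / (lam + s[i]**2)) * (U[:, i] @ y) * U[:, i] for i in range(l))
print(np.allclose(y_hat_direct, y_hat_svd))                 # True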
6.15. Show that the normalized steepest descent direction of $J(\theta)$ at a point $\theta_0$, for the quadratic norm $\|v\|_P$, is given by
$$v = -\frac{1}{\|P^{-1}\nabla J(\theta_0)\|_P}\,P^{-1}\nabla J(\theta_0) .$$

Solution: The task is
$$\text{minimize}_v\ \ v^T\nabla J, \quad \text{s.t. } v^TPv = 1.$$
The corresponding Lagrangian is $L(v,\lambda) = v^T\nabla J + \lambda\left(v^TPv - 1\right)$, and setting its gradient to zero gives $v = -\frac{1}{2\lambda}P^{-1}\nabla J$. Substituting into the constraint yields
$$2\lambda = \sqrt{(\nabla J)^TP^{-1}\nabla J} = \|P^{-1}\nabla J\|_P ,$$
which leads to the claimed direction.
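A small numerical sketch (assuming NumPy; the matrix P, the gradient value, and the random comparison directions are illustrative assumptions): the normalized direction above should attain the smallest value of $v^T\nabla J$ among unit $P$-norm directions.

import numpy as np

rng = np.random.default_rng(10)
P = np.array([[2.0, 0.3], [0.3, 1.0]])             # a positive definite metric (assumed)
grad = np.array([1.0, -2.0])                       # gradient of J at theta_0 (assumed)

v_star = -np.linalg.solve(P, grad)
v_star /= np.sqrt(v_star @ P @ v_star)             # normalize in the P-norm
best = v_star @ grad

ok = True
for _ in range(1000):
    v = rng.standard_normal(2)
    v /= np.sqrt(v @ P @ v)                        # a random unit P-norm direction
    ok &= (best <= v @ grad + 1e-12)
print(ok)                                          # True: v_star is the steepest descent direction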
6.16. Justify why the convergence of Newton's iterative minimization method is relatively insensitive to the Hessian matrix.
Hint: Let $P$ be a positive definite matrix. Define a change of variables,
$$\tilde{\theta} = P^{\frac{1}{2}}\theta ,$$
and carry out gradient descent minimization based on the new variable.

Solution: Let $\tilde{\theta} = P^{\frac{1}{2}}\theta$. Note that $\|\theta\|_P^2 := \theta^TP\theta$. Also, define
$$\theta(\tilde{\theta}) = P^{-\frac{1}{2}}\tilde{\theta}, \qquad \tilde{J}(\tilde{\theta}) := J\!\left(\theta(\tilde{\theta})\right).$$
Then the gradient descent step for $\tilde{J}(\tilde{\theta})$ will be
$$\tilde{\theta}^{(i)} = \tilde{\theta}^{(i-1)} - \mu\nabla\tilde{J}(\tilde{\theta}^{(i-1)}) = \tilde{\theta}^{(i-1)} - \mu P^{-\frac{1}{2}}\nabla J(\theta^{(i-1)}),$$
which, transformed back to the original variable, turns out to be equal to
$$\theta^{(i)} = \theta^{(i-1)} - \mu P^{-1}\nabla J(\theta^{(i-1)}) .$$
Choosing $P$ to be the Hessian recovers Newton's step; in other words, Newton's method is plain gradient descent carried out in a transformed coordinate system in which the Hessian has been whitened, and its convergence is therefore relatively insensitive to the conditioning of the Hessian.
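An illustrative sketch of the point (assuming NumPy; the quadratic objective and step size are arbitrary choices): on a badly conditioned quadratic, plain gradient descent crawls along the flat direction, while the Newton step, which uses the Hessian as P, converges at once.

import numpy as np

H = np.diag([100.0, 1.0])                  # Hessian of J(theta) = 0.5 * theta^T H theta
def grad(theta):
    return H @ theta                       # gradient of the quadratic

theta_gd = np.array([1.0, 1.0])
for _ in range(100):
    theta_gd = theta_gd - 0.01 * grad(theta_gd)           # step size limited by the largest eigenvalue

theta_nt = np.array([1.0, 1.0])
theta_nt = theta_nt - np.linalg.solve(H, grad(theta_nt))  # a single Newton step

print(np.linalg.norm(theta_gd))            # still noticeably away from the minimizer at 0
print(np.linalg.norm(theta_nt))            # essentially zero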
6.17. Show that the steepest descent direction, $v$, of $J(\theta)$ at a point, $\theta_0$, constrained to
$$\|v\|_1 = 1,$$
is given by $\pm e_k$, where $e_k$ is the standard basis vector in the direction, $k$, along which the gradient component has the largest absolute value, the sign being opposite to that of this component.

Solution: a) Let $a := \nabla J(\theta_0)$. By Hölder's inequality, for any $v$,
$$|v^Ta| \le \|v\|_1\|a\|_\infty .$$
Hence, the minimum of $v^Ta$ over $\|v\|_1 = 1$ equals $-\|a\|_\infty$ and is attained at $v = -\operatorname{sgn}(a_i)\,e_i$, where $e_i$ is the direction corresponding to the component associated with $\|a\|_\infty$.
b) Since the $\ell_1$ norm is not differentiable, the notion of the subgradient will be mobilized. We have
$$0 \in \nabla J(\theta_0) + \lambda\frac{\partial\|v\|_1}{\partial v}, \qquad (10)$$
where $\frac{\partial\|v\|_1}{\partial v}$ is the subdifferential set for $\|v\|_1$. Then, taking into account the subgradient of the absolute value, (10) can be written component-wise as
$$-\frac{a_j}{\lambda} \in \begin{cases}\{1\}, & v_j > 0,\\ \{-1\}, & v_j < 0,\\ [-1,\,1], & v_j = 0,\end{cases}$$
which guarantees that the nonzero components of $v$ can only be those for which $|a_j|$ attains $\|a\|_\infty$; the resulting direction coincides with that of part a).
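A small numerical check of the result (assuming NumPy; the gradient vector is an arbitrary example): among unit $\ell_1$-norm directions, the signed basis vector along the largest-magnitude gradient component minimizes $v^T\nabla J$.

import numpy as np

rng = np.random.default_rng(11)
a = np.array([0.3, -1.7, 0.9])                     # gradient of J at theta_0 (assumed)
k = np.argmax(np.abs(a))
v_star = np.zeros(3)
v_star[k] = -np.sign(a[k])                         # -sgn(a_k) e_k, a unit l1-norm vector

ok = True
for _ in range(2000):
    v = rng.standard_normal(3)
    v /= np.abs(v).sum()                           # a random unit l1-norm direction
    ok &= (v_star @ a <= v @ a + 1e-12)
print(ok, np.isclose(v_star @ a, -np.abs(a).max()))   # True True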
6.18. Show that the TLS solution is given by
$$\hat{\theta} = \left(X^TX - \bar{\sigma}_{l+1}^2 I\right)^{-1}X^Ty,$$
where $\bar{\sigma}_{l+1}$ is the smallest singular value of $[X \;\vdots\; y]$.

Solution: By the respective definition we have that
$$\begin{bmatrix} X^TX & X^Ty \\ y^TX & y^Ty \end{bmatrix}\begin{bmatrix}\hat{\theta}_{TLS}\\ -1\end{bmatrix} = \bar{\sigma}_{l+1}^2\begin{bmatrix}\hat{\theta}_{TLS}\\ -1\end{bmatrix}.$$
The first (block) row of this equation gives
$$X^TX\hat{\theta}_{TLS} - X^Ty = \bar{\sigma}_{l+1}^2\,\hat{\theta}_{TLS} \implies \left(X^TX - \bar{\sigma}_{l+1}^2I\right)\hat{\theta}_{TLS} = X^Ty ,$$
which proves the claim.
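A numerical sketch of the formula (assuming NumPy; the synthetic data are illustrative): the TLS estimate is read off the right singular vector of the augmented matrix that corresponds to its smallest singular value and compared with the closed form above.

import numpy as np

rng = np.random.default_rng(12)
N, l = 50, 3
theta_true = np.array([1.0, -0.5, 2.0])
X = rng.standard_normal((N, l))
y = X @ theta_true + 0.1 * rng.standard_normal(N)

Z = np.column_stack([X, y])                        # the augmented matrix [X  y]
_, s, Vt = np.linalg.svd(Z, full_matrices=False)
theta_tls = -Vt[-1, :l] / Vt[-1, l]                # scale the smallest singular vector so its last entry is -1

theta_formula = np.linalg.solve(X.T @ X - s[-1]**2 * np.eye(l), X.T @ y)
print(np.allclose(theta_tls, theta_formula))       # True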
6.19. Given a set of centered data points, $(y_n, x_n) \in \mathbb{R}^{l+1}$, derive a hyperplane
$$a^Tx + y = 0,$$
which crosses the origin, such that the total square distance of all the points from it is minimum.

Solution: Let
$$w := [a^T, 1]^T .$$
Then we will search for the $w$ given by
$$\text{minimize}_w\ \sum_{n=1}^{N}\frac{|w^Tz_n|^2}{\|w\|^2} ,$$
since the squared distance of a point $z_n$ from the hyperplane $w^Tz = 0$ is $|w^Tz_n|^2/\|w\|^2$. Let
$$z_n^T := [x_n^T, y_n], \quad n = 1, 2, \ldots, N,$$
and let $Z$ be the $N\times(l+1)$ matrix with rows $z_n^T$. Then,
$$\sum_{n=1}^{N}|w^Tz_n|^2 = w^TZ^TZw ,$$
so the task amounts to minimizing the Rayleigh ratio $w^TZ^TZw/\|w\|^2$; its minimizer is the eigenvector of $Z^TZ$ corresponding to its smallest eigenvalue, i.e., the right singular vector of $Z$ associated with its smallest singular value, rescaled so that its last component equals one.
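A short numerical sketch (assuming NumPy; the synthetic, approximately centered data are illustrative): the hyperplane parameters are read off the smallest eigenvector of Z^T Z.

import numpy as np

rng = np.random.default_rng(13)
N, l = 300, 2
a_true = np.array([0.8, -1.5])                     # the points (approximately) satisfy a^T x + y = 0
X = rng.standard_normal((N, l))
y = -X @ a_true + 0.05 * rng.standard_normal(N)

Z = np.column_stack([X, y])
Z = Z - Z.mean(axis=0)                             # center the data, as assumed in the problem
eigval, eigvec = np.linalg.eigh(Z.T @ Z)
w = eigvec[:, 0]                                   # eigenvector of the smallest eigenvalue
a_hat = w[:l] / w[l]                               # rescale so that the last component equals 1
print(np.round(a_hat, 2), a_true)                  # a_hat is close to a_true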