Machine Learning (Sergios Theodoridis) — Chapter 11 Solutions
Solutions to Problems of Chapter 11
11.1. Derive the formula for the number of groupings O(N, l) in Cover's theorem.
Hint: Show first the following recursion:

    O(N + 1, l) = O(N, l) + O(N, l − 1).

To this end, start with N points and add an extra one. Show that the extra number of linear dichotomies is solely due to those dichotomies, of the N-data-point case, which could be drawn via the new point.
Solution: Let us assume that we start with N points in the l-dimensional space, and we add an extra point P. Then the O(N, l) old dichotomies fall into one of two categories: (a) those realized by hyperplanes that cannot be drawn through P; each such dichotomy yields exactly one dichotomy of the N + 1 points, since P falls on a definite side of the hyperplane; (b) those realized by hyperplanes that can be drawn through P; each such dichotomy yields two dichotomies of the N + 1 points, since the hyperplane can be perturbed slightly so as to place P on either side. Thus, the total number of dichotomies with the N + 1 points will be

    O(N + 1, l) = O(N, l) + O(N, l − 1).    (1)

The second term is the number of dichotomies of N points in the l-dimensional space realized by hyperplanes constrained to pass through a specific point; such hyperplanes have one degree of freedom less, hence they behave as hyperplanes in an (l − 1)-dimensional space.
Then, assume that the formula

    O(N, l) = 2 Σ_{i=0}^{l} C(N − 1, i)    (2)

is true for some N = k. We will show that it is also true for k + 1. We have that

    O(k + 1, l) = O(k, l) + O(k, l − 1)
                = 2 Σ_{i=0}^{l} C(k − 1, i) + 2 Σ_{i=0}^{l−1} C(k − 1, i)
                = 2 [ C(k − 1, 0) + Σ_{i=1}^{l} ( C(k − 1, i) + C(k − 1, i − 1) ) ]
                = 2 Σ_{i=0}^{l} C(k, i).

Thus, the last step follows from Pascal's identity: since

    C(k, r) = k! / ( r! (k − r)! ),    C(k, r + 1) = k! / ( (r + 1)! (k − r − 1)! ),

bringing the two fractions to the common denominator (r + 1)! (k − r)! gives

    C(k, r) + C(k, r + 1) = (k + 1)! / ( (r + 1)! (k − r)! ) = C(k + 1, r + 1).

Since the base case O(1, l) = 2, l ≥ 0, also satisfies (2), the induction is complete and we obtain

    O(N, l) = 2 Σ_{i=0}^{l} C(N − 1, i).
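The recursion and the closed formula above can be cross-checked numerically. A minimal sketch (the function names are illustrative, not from the text):

```python
from math import comb
from functools import lru_cache

def O_closed(N, l):
    """Closed form from Cover's theorem: O(N, l) = 2 * sum_{i=0}^{l} C(N-1, i)."""
    return 2 * sum(comb(N - 1, i) for i in range(l + 1))

@lru_cache(maxsize=None)
def O_rec(N, l):
    """Recursion O(N, l) = O(N-1, l) + O(N-1, l-1), with O(1, l) = 2, l >= 0."""
    if l < 0:
        return 0          # hyperplanes in "negative" dimension realize nothing
    if N == 1:
        return 2          # a single point can be labeled in two ways
    return O_rec(N - 1, l) + O_rec(N - 1, l - 1)

# the two expressions agree over a range of N and l
for N in range(1, 12):
    for l in range(0, 6):
        assert O_rec(N, l) == O_closed(N, l)
```

For example, O(4, 2) = 14: of the 16 possible labelings of 4 points in general position in the plane, 14 are linearly separable.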
11.2. Show that if N = 2(l + 1), the number of linear dichotomies in Cover's theorem is equal to 2^{2l+1}.
Hint: Use the identity

    Σ_{i=0}^{j} C(j, i) = 2^j,

and recall that C(2n + 1, n − i + 1) = C(2n + 1, n + i).

Solution: From the theory, we have that for N = 2l + 2,

    O(N, l) = 2 Σ_{i=0}^{l} C(2l + 1, i).

By the symmetry C(2l + 1, i) = C(2l + 1, 2l + 1 − i), the terms i = 0, …, l are in one-to-one correspondence with the terms i = l + 1, …, 2l + 1, so the partial sum is half of the full one:

    Σ_{i=0}^{l} C(2l + 1, i) = (1/2) Σ_{i=0}^{2l+1} C(2l + 1, i) = (1/2) 2^{2l+1} = 2^{2l}.

Hence O(N, l) = 2 · 2^{2l} = 2^{2l+1}.
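A quick numeric confirmation of this special case, using the counting formula O(N, l) = 2 Σ_{i=0}^{l} C(N−1, i) from Problem 11.1:

```python
from math import comb

def O(N, l):
    # Cover's count of linear dichotomies: O(N, l) = 2 * sum_{i=0}^{l} C(N-1, i)
    return 2 * sum(comb(N - 1, i) for i in range(l + 1))

# at N = 2(l + 1) the count equals 2^(2l+1), i.e., half of all 2^N labelings
for l in range(0, 10):
    N = 2 * (l + 1)
    assert O(N, l) == 2 ** (2 * l + 1)
```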
11.3. Show that the reproducing kernel is a positive definite one.

Solution: Consider any N > 0, real numbers a₁, …, a_N, and elements x₁, …, x_N ∈ X. Then, using the reproducing property κ(x_n, x_m) = ⟨κ(·, x_n), κ(·, x_m)⟩,

    Σ_{n=1}^{N} Σ_{m=1}^{N} a_n a_m κ(x_n, x_m) = Σ_{n=1}^{N} Σ_{m=1}^{N} a_n a_m ⟨κ(·, x_n), κ(·, x_m)⟩
        = ⟨ Σ_{n=1}^{N} a_n κ(·, x_n), Σ_{m=1}^{N} a_m κ(·, x_m) ⟩
        = ‖ Σ_{n=1}^{N} a_n κ(·, x_n) ‖² ≥ 0.
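The quadratic form above can be evaluated numerically for a concrete kernel. A sketch, assuming the Gaussian (RBF) kernel with unit width and random data (both are illustrative choices, not from the text); the sum is nonnegative up to floating-point round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # 8 points in R^3 (arbitrary)
a = rng.normal(size=8)        # arbitrary real coefficients

def kappa(x, y, sigma=1.0):
    # Gaussian (RBF) kernel, a standard reproducing kernel
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Gram matrix K(n, m) = kappa(x_n, x_m)
K = np.array([[kappa(xn, xm) for xm in X] for xn in X])

# sum_n sum_m a_n a_m kappa(x_n, x_m) = || sum_n a_n kappa(., x_n) ||^2 >= 0
quad = a @ K @ a
assert quad >= -1e-9
# equivalently, the Gram matrix is positive semidefinite
assert np.min(np.linalg.eigvalsh(K)) >= -1e-10
```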
11.4. Show that if κ(·, ·) is the reproducing kernel in an RKHS, H, then

    H = span{κ(·, x), x ∈ X},

where the closure of the span is understood.

Solution: We will show that the only function in H which is orthogonal to A = span{κ(·, x), x ∈ X} is the zero function; since the orthogonal complement of A is then {0}, the closure of A must be the whole space H. Let f ∈ H be a function orthogonal to A. Then, in particular, f ⟂ κ(·, x) for every x ∈ X, and the reproducing property gives

    f(x) = ⟨f, κ(·, x)⟩ = 0, for all x ∈ X,

that is, f is the zero function.
11.5. Show the Cauchy–Schwarz inequality for kernels, that is,

    |κ(x, y)|² ≤ κ(x, x) κ(y, y).

Solution: Let

    φ(x) := κ(·, x) and φ(y) := κ(·, y).

By the reproducing property, κ(x, y) = ⟨φ(x), φ(y)⟩, κ(x, x) = ‖φ(x)‖², and κ(y, y) = ‖φ(y)‖². The claim is therefore the Cauchy–Schwarz inequality in H:

    |⟨φ(x), φ(y)⟩|² ≤ ‖φ(x)‖² ‖φ(y)‖².
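The inequality can be spot-checked numerically for any valid kernel. A sketch, using a polynomial kernel as an illustrative (assumed) choice:

```python
import numpy as np

rng = np.random.default_rng(1)

def kappa(x, y):
    # polynomial kernel (illustrative choice): a valid Mercer kernel
    return (1.0 + x @ y) ** 2

# |kappa(x, y)|^2 <= kappa(x, x) * kappa(y, y) for random pairs
for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    assert kappa(x, y) ** 2 <= kappa(x, x) * kappa(y, y) + 1e-9
```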
11.6. Show that if

    κᵢ(·, ·) : X × X → R, i = 1, 2,

are kernels, then:

    κ(x, y) = κ₁(x, y) + κ₂(x, y) is also a kernel;
    κ(x, y) = aκ₁(x, y), a > 0, is also a kernel;
    κ(x, y) = κ₁(x, y)κ₂(x, y) is also a kernel.

Solution: Recall that it suffices to show that the respective kernel matrices of the newly constructed functions are positive semidefinite. For the addition, the (i, j) element of K is given by K₁(i, j) + K₂(i, j), i.e., K = K₁ + K₂, and for any vector v, vᵀKv = vᵀK₁v + vᵀK₂v ≥ 0. For the scaling, K = aK₁ and vᵀKv = a vᵀK₁v ≥ 0, since a > 0. For the product, K = K₁ ∘ K₂, the elementwise (Hadamard) product of two positive semidefinite matrices, which is positive semidefinite by the Schur product theorem.
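The three closure properties can be verified numerically on sample Gram matrices. A sketch, with RBF and polynomial kernels as illustrative (assumed) choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))   # 6 sample points in R^3 (arbitrary)

def gram(kfun):
    # Gram matrix of a kernel function over the sample points
    return np.array([[kfun(a, b) for b in X] for a in X])

K1 = gram(lambda a, b: np.exp(-np.sum((a - b) ** 2)))   # RBF kernel
K2 = gram(lambda a, b: (1.0 + a @ b) ** 3)              # polynomial kernel

def is_psd(K):
    # positive semidefinite up to round-off
    return np.min(np.linalg.eigvalsh(K)) >= -1e-9

assert is_psd(K1 + K2)   # sum of kernels
assert is_psd(3.0 * K1)  # positive scaling
assert is_psd(K1 * K2)   # elementwise (Hadamard) product = product kernel
```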
11.7. Derive Equation (11.25).

Solution: The starting point is the cost

    J(θ) = Σ_{n=1}^{N} ( y_n − Σ_{m=1}^{N} θ_m κ(x_n, x_m) )² + C⟨f, f⟩,

with f = Σ_{n=1}^{N} θ_n κ(·, x_n). Substitute in the regularizer the form of f to get

    ⟨f, f⟩ = ⟨ Σ_{n=1}^{N} θ_n κ(·, x_n), Σ_{m=1}^{N} θ_m κ(·, x_m) ⟩
           = Σ_{n=1}^{N} θ_n ⟨ κ(·, x_n), Σ_{m=1}^{N} θ_m κ(·, x_m) ⟩
           = Σ_{n=1}^{N} Σ_{m=1}^{N} θ_n θ_m κ(x_n, x_m) = θᵀKθ,

where K is the kernel (Gram) matrix with elements K(n, m) = κ(x_n, x_m). The first term can be written as (y − Kθ)ᵀ(y − Kθ), so that

    J(θ) = (y − Kθ)ᵀ(y − Kθ) + CθᵀKθ.
11.8. Show that the solution for the parameters, θ̂, for the kernel ridge regression, if a bias term, b, is present, is given by

    [ K + CI   1 ] [ θ̂ ]   [  y  ]
    [  1ᵀK     N ] [ b  ] = [ yᵀ1 ],

where 1 is the vector with all its elements being equal to one. Invertibility of the kernel matrix has been assumed.

Solution: In this case, the unknown coefficients are estimated by minimizing J(θ, b), where

    J(θ, b) = Σ_{n=1}^{N} ( y_n − Σ_{m=1}^{N} θ_m κ(x_n, x_m) − b )² + C⟨f, f⟩.    (3)

In matrix form, J(θ, b) = (y − Kθ − b1)ᵀ(y − Kθ − b1) + CθᵀKθ. Setting the gradients of (3) with respect to θ and b equal to zero yields, respectively,

    −K(y − Kθ − b1) + CKθ = 0,    −1ᵀ(y − Kθ − b1) = 0.

Hence, we obtain the system of equations: since K is invertible, the first equation gives (K + CI)θ + b1 = y, and the second gives 1ᵀKθ + Nb = 1ᵀy = yᵀ1, which together form the stated linear system.
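The block system can be formed and solved numerically, and the result checked to be a stationary point of J(θ, b). A minimal sketch; the RBF kernel, the data, and C = 0.5 are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
N, C = 20, 0.5
X = rng.normal(size=(N, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# RBF Gram matrix (assumption: unit width)
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-D2)

# Block system  [K + C I   1] [theta]   [  y  ]
#               [1^T K     N] [  b  ] = [y^T 1]
A = np.block([[K + C * np.eye(N), np.ones((N, 1))],
              [np.ones((1, N)) @ K, np.array([[float(N)]])]])
sol = np.linalg.solve(A, np.append(y, y.sum()))
theta, b = sol[:N], sol[N]

# Verify stationarity of J(theta, b) = ||y - K theta - b 1||^2 + C theta^T K theta
r = y - K @ theta - b
grad_theta = -2 * K @ r + 2 * C * K @ theta   # gradient w.r.t. theta
grad_b = -2 * r.sum()                          # gradient w.r.t. b
assert np.allclose(grad_theta, 0, atol=1e-6)
assert abs(grad_b) < 1e-6
```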
11.9. Derive Equation (11.56).

Solution: The dual representation of the Lagrangian in (11.55) can be written as

    L(λ) = yᵀλ − (1/(4C)) λᵀKλ − (1/4) λᵀλ.
11.10. Derive the dual cost function associated with the linear ε-insensitive loss function.

Solution: From the text, the Lagrangian is given by

    L(θ, θ₀, ξ, ξ̃, λ, λ̃, μ, μ̃) = (1/2)‖θ‖² + C ( Σ_{n=1}^{N} ξ_n + Σ_{n=1}^{N} ξ̃_n )
        − Σ_{n=1}^{N} λ_n ( ε + ξ_n − θᵀx_n − θ₀ + y_n )
        − Σ_{n=1}^{N} λ̃_n ( ε + ξ̃_n − y_n + θᵀx_n + θ₀ )
        − Σ_{n=1}^{N} μ_n ξ_n − Σ_{n=1}^{N} μ̃_n ξ̃_n.

Setting the gradient with respect to θ equal to zero gives θ = Σ_{n=1}^{N} (λ̃_n − λ_n) x_n. Substituting this back, we obtain

    L = −(1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (λ̃_n − λ_n)(λ̃_m − λ_m) x_nᵀx_m
        + Σ_{n=1}^{N} (C − λ_n − μ_n) ξ_n + Σ_{n=1}^{N} (C − λ̃_n − μ̃_n) ξ̃_n
        + θ₀ Σ_{n=1}^{N} (λ_n − λ̃_n) − ε Σ_{n=1}^{N} (λ_n + λ̃_n) + Σ_{n=1}^{N} y_n (λ̃_n − λ_n).

Taking into account, from the KKT conditions given in the text, that

    C − λ̃_n − μ̃_n = C − λ_n − μ_n = 0, n = 1, 2, . . . , N,

and that

    Σ_{n=1}^{N} λ̃_n = Σ_{n=1}^{N} λ_n,

the ξ, ξ̃, and θ₀ terms vanish, and the dual cost function results:

    L(λ, λ̃) = −(1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (λ̃_n − λ_n)(λ̃_m − λ_m) x_nᵀx_m
               − ε Σ_{n=1}^{N} (λ_n + λ̃_n) + Σ_{n=1}^{N} y_n (λ̃_n − λ_n).
11.11. Derive the dual cost function for the separable class SVM formulation.

Solution: The Lagrangian is given by

    L(θ, θ₀, λ) = (1/2)‖θ‖² − Σ_{n=1}^{N} λ_n [ y_n (θᵀx_n + θ₀) − 1 ].

Setting the gradients with respect to θ and θ₀ equal to zero gives, respectively,

    θ = Σ_{n=1}^{N} λ_n y_n x_n,    Σ_{n=1}^{N} λ_n y_n = 0.

Substituting back into the Lagrangian, the dual cost function results:

    L(λ) = Σ_{n=1}^{N} λ_n − (1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} λ_n λ_m y_n y_m x_nᵀx_m.
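A minimal numeric illustration of this dual, on a hypothetical two-point separable problem in one dimension (x₁ = +1 with y₁ = +1, x₂ = −1 with y₂ = −1): maximizing the dual over a grid recovers λ₁ = λ₂ = 1/2, and the dual optimum equals the primal value (1/2)‖θ‖², as strong duality predicts.

```python
import numpy as np

# two separable 1-D points: x1 = +1 (y1 = +1), x2 = -1 (y2 = -1)
x = np.array([1.0, -1.0])
y = np.array([1.0, -1.0])

def dual(lam):
    # L(lambda) = sum_n lam_n - (1/2) sum_n sum_m lam_n lam_m y_n y_m x_n x_m,
    # with the constraint sum_n lam_n y_n = 0 enforced by lam_1 = lam_2 = lam
    l = np.array([lam, lam])
    G = np.outer(y * x, y * x)
    return l.sum() - 0.5 * l @ G @ l

# maximize over a fine grid of feasible lambda >= 0
grid = np.linspace(0, 2, 20001)
best = grid[np.argmax([dual(l) for l in grid])]

theta = np.sum(best * y * x)   # theta = sum_n lam_n y_n x_n
assert abs(best - 0.5) < 1e-3                    # optimal lambda = 1/2
assert abs(dual(best) - 0.5 * theta ** 2) < 1e-6 # dual value = (1/2)||theta||^2
```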
11.12. Derive the kernel approximation in Eq. (11.91).
11.13. Derive the subgradient for the Huber loss function.

Solution: The Huber loss function is given by

    L(y, z) = (1/2)|y − z|²,        if |y − z| ≤ ε,
    L(y, z) = ε|y − z| − ε²/2,      if |y − z| > ε,

where z = f(x); the constant −ε²/2 makes the two branches join continuously at |y − z| = ε. Hence, for |y − z| > ε, the gradient with respect to z is −ε sgn(y − z), while for |y − z| < ε it equals −(y − z). At |y − z| = ε the two expressions coincide, so the subdifferential is a singleton everywhere and the loss is in fact differentiable, with

    ∂L/∂z = −(y − z) if |y − z| ≤ ε,    ∂L/∂z = −ε sgn(y − z) if |y − z| > ε.