Machine Learning (Sergios Theodoridis) — Chapter 11 Solutions
Solutions to Problems of Chapter 11
11.1. Derive the formula for the number of groupings O(N, l) in Cover's theorem.
Hint: Show first the following recursion:

    O(N + 1, l) = O(N, l) + O(N, l − 1).

To this end, start with N points and add an extra one. Show that the extra number of linear dichotomies is solely due to those dichotomies, of the N-data-point case, which could be drawn via the new point.
Solution: Let us assume that we start with N points in the l-dimensional space, and we add an extra point P. Then the O(N, l) old dichotomies fall into one of two categories: (a) those realized by hyperplanes that cannot be drawn through P; each such dichotomy yields exactly one dichotomy of the N + 1 points, since P falls on a definite side of the hyperplane; (b) those realized by hyperplanes that can be drawn through P; each such dichotomy yields two dichotomies of the N + 1 points, since the hyperplane can be perturbed slightly so as to place P on either side. Thus, the total number of dichotomies with the N + 1 points will be

    O(N + 1, l) = O(N, l) + O(N, l − 1).    (1)

The second term is the number of dichotomies of N points in the l-dimensional space realized by hyperplanes constrained to pass through a specific point; such hyperplanes have one degree of freedom less, hence they behave as hyperplanes in an (l − 1)-dimensional space.
Then, assume that the formula

    O(N, l) = 2 Σ_{i=0}^{l} C(N − 1, i)    (2)

is true for some N = k. We will show that it is also true for k + 1. We have that

    O(k + 1, l) = O(k, l) + O(k, l − 1)
                = 2 Σ_{i=0}^{l} C(k − 1, i) + 2 Σ_{i=0}^{l−1} C(k − 1, i)
                = 2 [ C(k − 1, 0) + Σ_{i=1}^{l} ( C(k − 1, i) + C(k − 1, i − 1) ) ]
                = 2 Σ_{i=0}^{l} C(k, i).

Thus, the last step follows from Pascal's identity: since

    C(k, r) = k! / ( r! (k − r)! ),    C(k, r + 1) = k! / ( (r + 1)! (k − r − 1)! ),

bringing the two fractions to the common denominator (r + 1)! (k − r)! gives

    C(k, r) + C(k, r + 1) = (k + 1)! / ( (r + 1)! (k − r)! ) = C(k + 1, r + 1).

Since the base case O(1, l) = 2, l ≥ 0, also satisfies (2), the induction is complete and we obtain

    O(N, l) = 2 Σ_{i=0}^{l} C(N − 1, i).
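The recursion and the closed formula above can be cross-checked numerically. A minimal sketch (the function names are illustrative, not from the text):

```python
from math import comb
from functools import lru_cache

def O_closed(N, l):
    """Closed form from Cover's theorem: O(N, l) = 2 * sum_{i=0}^{l} C(N-1, i)."""
    return 2 * sum(comb(N - 1, i) for i in range(l + 1))

@lru_cache(maxsize=None)
def O_rec(N, l):
    """Recursion O(N, l) = O(N-1, l) + O(N-1, l-1), with O(1, l) = 2, l >= 0."""
    if l < 0:
        return 0          # hyperplanes in "negative" dimension realize nothing
    if N == 1:
        return 2          # a single point can be labeled in two ways
    return O_rec(N - 1, l) + O_rec(N - 1, l - 1)

# the two expressions agree over a range of N and l
for N in range(1, 12):
    for l in range(0, 6):
        assert O_rec(N, l) == O_closed(N, l)
```

For example, O(4, 2) = 14: of the 16 possible labelings of 4 points in general position in the plane, 14 are linearly separable.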
11.2. Show that if N = 2(l + 1), the number of linear dichotomies in Cover's theorem is equal to 2^{2l+1}.
Hint: Use the identity

    Σ_{i=0}^{j} C(j, i) = 2^j,

and recall that C(2n + 1, n − i + 1) = C(2n + 1, n + i).

Solution: From the theory, we have that for N = 2l + 2,

    O(N, l) = 2 Σ_{i=0}^{l} C(2l + 1, i).

By the symmetry C(2l + 1, i) = C(2l + 1, 2l + 1 − i), the terms i = 0, …, l are in one-to-one correspondence with the terms i = l + 1, …, 2l + 1, so the partial sum is half of the full one:

    Σ_{i=0}^{l} C(2l + 1, i) = (1/2) Σ_{i=0}^{2l+1} C(2l + 1, i) = (1/2) 2^{2l+1} = 2^{2l}.

Hence O(N, l) = 2 · 2^{2l} = 2^{2l+1}.
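A quick numeric confirmation of this special case, using the counting formula O(N, l) = 2 Σ_{i=0}^{l} C(N−1, i) from Problem 11.1:

```python
from math import comb

def O(N, l):
    # Cover's count of linear dichotomies: O(N, l) = 2 * sum_{i=0}^{l} C(N-1, i)
    return 2 * sum(comb(N - 1, i) for i in range(l + 1))

# at N = 2(l + 1) the count equals 2^(2l+1), i.e., half of all 2^N labelings
for l in range(0, 10):
    N = 2 * (l + 1)
    assert O(N, l) == 2 ** (2 * l + 1)
```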
11.3. Show that the reproducing kernel is a positive definite one.

Solution: Consider any N > 0, real numbers a₁, …, a_N, and elements x₁, …, x_N ∈ X. Then, using the reproducing property κ(x_n, x_m) = ⟨κ(·, x_n), κ(·, x_m)⟩,

    Σ_{n=1}^{N} Σ_{m=1}^{N} a_n a_m κ(x_n, x_m) = Σ_{n=1}^{N} Σ_{m=1}^{N} a_n a_m ⟨κ(·, x_n), κ(·, x_m)⟩
        = ⟨ Σ_{n=1}^{N} a_n κ(·, x_n), Σ_{m=1}^{N} a_m κ(·, x_m) ⟩
        = ‖ Σ_{n=1}^{N} a_n κ(·, x_n) ‖² ≥ 0.
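The quadratic form above can be evaluated numerically for a concrete kernel. A sketch, assuming the Gaussian (RBF) kernel with unit width and random data (both are illustrative choices, not from the text); the sum is nonnegative up to floating-point round-off:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # 8 points in R^3 (arbitrary)
a = rng.normal(size=8)        # arbitrary real coefficients

def kappa(x, y, sigma=1.0):
    # Gaussian (RBF) kernel, a standard reproducing kernel
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# Gram matrix K(n, m) = kappa(x_n, x_m)
K = np.array([[kappa(xn, xm) for xm in X] for xn in X])

# sum_n sum_m a_n a_m kappa(x_n, x_m) = || sum_n a_n kappa(., x_n) ||^2 >= 0
quad = a @ K @ a
assert quad >= -1e-9
# equivalently, the Gram matrix is positive semidefinite
assert np.min(np.linalg.eigvalsh(K)) >= -1e-10
```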
11.4. Show that if κ(·, ·) is the reproducing kernel in an RKHS, H, then

    H = span{κ(·, x), x ∈ X},

where the closure of the span is understood.

Solution: We will show that the only function in H which is orthogonal to A = span{κ(·, x), x ∈ X} is the zero function; since the orthogonal complement of A is then {0}, the closure of A must be the whole space H. Let f ∈ H be a function orthogonal to A. Then, in particular, f ⟂ κ(·, x) for every x ∈ X, and the reproducing property gives

    f(x) = ⟨f, κ(·, x)⟩ = 0, for all x ∈ X,

that is, f is the zero function.
11.5. Show the Cauchy–Schwarz inequality for kernels, that is,

    |κ(x, y)|² ≤ κ(x, x) κ(y, y).

Solution: Let

    φ(x) := κ(·, x) and φ(y) := κ(·, y).

By the reproducing property, κ(x, y) = ⟨φ(x), φ(y)⟩, κ(x, x) = ‖φ(x)‖², and κ(y, y) = ‖φ(y)‖². The claim is therefore the Cauchy–Schwarz inequality in H:

    |⟨φ(x), φ(y)⟩|² ≤ ‖φ(x)‖² ‖φ(y)‖².
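The inequality can be spot-checked numerically for any valid kernel. A sketch, using a polynomial kernel as an illustrative (assumed) choice:

```python
import numpy as np

rng = np.random.default_rng(1)

def kappa(x, y):
    # polynomial kernel (illustrative choice): a valid Mercer kernel
    return (1.0 + x @ y) ** 2

# |kappa(x, y)|^2 <= kappa(x, x) * kappa(y, y) for random pairs
for _ in range(1000):
    x, y = rng.normal(size=(2, 4))
    assert kappa(x, y) ** 2 <= kappa(x, x) * kappa(y, y) + 1e-9
```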
11.6. Show that if

    κᵢ(·, ·) : X × X → R, i = 1, 2,

are kernels, then:

    κ(x, y) = κ₁(x, y) + κ₂(x, y) is also a kernel;
    κ(x, y) = aκ₁(x, y), a > 0, is also a kernel;
    κ(x, y) = κ₁(x, y)κ₂(x, y) is also a kernel.

Solution: Recall that it suffices to show that the respective kernel matrices of the newly constructed functions are positive semidefinite. For the addition, the (i, j) element of K is given by K₁(i, j) + K₂(i, j), i.e., K = K₁ + K₂, and for any vector v, vᵀKv = vᵀK₁v + vᵀK₂v ≥ 0. For the scaling, K = aK₁ and vᵀKv = a vᵀK₁v ≥ 0, since a > 0. For the product, K = K₁ ∘ K₂, the elementwise (Hadamard) product of two positive semidefinite matrices, which is positive semidefinite by the Schur product theorem.
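The three closure properties can be verified numerically on sample Gram matrices. A sketch, with RBF and polynomial kernels as illustrative (assumed) choices:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))   # 6 sample points in R^3 (arbitrary)

def gram(kfun):
    # Gram matrix of a kernel function over the sample points
    return np.array([[kfun(a, b) for b in X] for a in X])

K1 = gram(lambda a, b: np.exp(-np.sum((a - b) ** 2)))   # RBF kernel
K2 = gram(lambda a, b: (1.0 + a @ b) ** 3)              # polynomial kernel

def is_psd(K):
    # positive semidefinite up to round-off
    return np.min(np.linalg.eigvalsh(K)) >= -1e-9

assert is_psd(K1 + K2)   # sum of kernels
assert is_psd(3.0 * K1)  # positive scaling
assert is_psd(K1 * K2)   # elementwise (Hadamard) product = product kernel
```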
11.7. Derive Equation (11.25).

Solution: The starting point is the cost

    J(θ) = Σ_{n=1}^{N} ( y_n − Σ_{m=1}^{N} θ_m κ(x_n, x_m) )² + C⟨f, f⟩,

with f = Σ_{n=1}^{N} θ_n κ(·, x_n). Substitute in the regularizer the form of f to get

    ⟨f, f⟩ = ⟨ Σ_{n=1}^{N} θ_n κ(·, x_n), Σ_{m=1}^{N} θ_m κ(·, x_m) ⟩
           = Σ_{n=1}^{N} θ_n ⟨ κ(·, x_n), Σ_{m=1}^{N} θ_m κ(·, x_m) ⟩
           = Σ_{n=1}^{N} Σ_{m=1}^{N} θ_n θ_m κ(x_n, x_m) = θᵀKθ,

where K is the kernel (Gram) matrix with elements K(n, m) = κ(x_n, x_m). The first term can be written as (y − Kθ)ᵀ(y − Kθ), so that

    J(θ) = (y − Kθ)ᵀ(y − Kθ) + CθᵀKθ.
11.8. Show that the solution for the parameters, θ̂, for the kernel ridge regression, if a bias term, b, is present, is given by

    [ K + CI   1 ] [ θ̂ ]   [  y  ]
    [  1ᵀK     N ] [ b  ] = [ yᵀ1 ],

where 1 is the vector with all its elements being equal to one. Invertibility of the kernel matrix has been assumed.

Solution: In this case, the unknown coefficients are estimated by minimizing J(θ, b), where

    J(θ, b) = Σ_{n=1}^{N} ( y_n − Σ_{m=1}^{N} θ_m κ(x_n, x_m) − b )² + C⟨f, f⟩.    (3)

In matrix form, J(θ, b) = (y − Kθ − b1)ᵀ(y − Kθ − b1) + CθᵀKθ. Setting the gradients of (3) with respect to θ and b equal to zero yields, respectively,

    −K(y − Kθ − b1) + CKθ = 0,    −1ᵀ(y − Kθ − b1) = 0.

Hence, we obtain the system of equations: since K is invertible, the first equation gives (K + CI)θ + b1 = y, and the second gives 1ᵀKθ + Nb = 1ᵀy = yᵀ1, which together form the stated linear system.
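The block system can be formed and solved numerically, and the result checked to be a stationary point of J(θ, b). A minimal sketch; the RBF kernel, the data, and C = 0.5 are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(3)
N, C = 20, 0.5
X = rng.normal(size=(N, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=N)

# RBF Gram matrix (assumption: unit width)
D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-D2)

# Block system  [K + C I   1] [theta]   [  y  ]
#               [1^T K     N] [  b  ] = [y^T 1]
A = np.block([[K + C * np.eye(N), np.ones((N, 1))],
              [np.ones((1, N)) @ K, np.array([[float(N)]])]])
sol = np.linalg.solve(A, np.append(y, y.sum()))
theta, b = sol[:N], sol[N]

# Verify stationarity of J(theta, b) = ||y - K theta - b 1||^2 + C theta^T K theta
r = y - K @ theta - b
grad_theta = -2 * K @ r + 2 * C * K @ theta   # gradient w.r.t. theta
grad_b = -2 * r.sum()                          # gradient w.r.t. b
assert np.allclose(grad_theta, 0, atol=1e-6)
assert abs(grad_b) < 1e-6
```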
11.9. Derive Equation (11.56).

Solution: The dual representation of the Lagrangian in (11.55) can be written as

    L(λ) = yᵀλ − (1/(4C)) λᵀKλ − (1/4) λᵀλ.
11.10. Derive the dual cost function associated with the linear ε-insensitive loss function.

Solution: From the text, the Lagrangian is given by

    L(θ, θ₀, ξ, ξ̃, λ, λ̃, μ, μ̃) = (1/2)‖θ‖² + C ( Σ_{n=1}^{N} ξ_n + Σ_{n=1}^{N} ξ̃_n )
        − Σ_{n=1}^{N} λ_n ( ε + ξ_n − θᵀx_n − θ₀ + y_n )
        − Σ_{n=1}^{N} λ̃_n ( ε + ξ̃_n − y_n + θᵀx_n + θ₀ )
        − Σ_{n=1}^{N} μ_n ξ_n − Σ_{n=1}^{N} μ̃_n ξ̃_n.

Setting the gradient with respect to θ equal to zero gives θ = Σ_{n=1}^{N} (λ̃_n − λ_n) x_n. Substituting this back, we obtain

    L = −(1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (λ̃_n − λ_n)(λ̃_m − λ_m) x_nᵀx_m
        + Σ_{n=1}^{N} (C − λ_n − μ_n) ξ_n + Σ_{n=1}^{N} (C − λ̃_n − μ̃_n) ξ̃_n
        + θ₀ Σ_{n=1}^{N} (λ_n − λ̃_n) − ε Σ_{n=1}^{N} (λ_n + λ̃_n) + Σ_{n=1}^{N} y_n (λ̃_n − λ_n).

Taking into account, from the KKT conditions given in the text, that

    C − λ̃_n − μ̃_n = C − λ_n − μ_n = 0, n = 1, 2, . . . , N,

and that

    Σ_{n=1}^{N} λ̃_n = Σ_{n=1}^{N} λ_n,

the ξ, ξ̃, and θ₀ terms vanish, and the dual cost function results:

    L(λ, λ̃) = −(1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} (λ̃_n − λ_n)(λ̃_m − λ_m) x_nᵀx_m
               − ε Σ_{n=1}^{N} (λ_n + λ̃_n) + Σ_{n=1}^{N} y_n (λ̃_n − λ_n).
11.11. Derive the dual cost function for the separable class SVM formulation.

Solution: The Lagrangian is given by

    L(θ, θ₀, λ) = (1/2)‖θ‖² − Σ_{n=1}^{N} λ_n [ y_n (θᵀx_n + θ₀) − 1 ].

Setting the gradients with respect to θ and θ₀ equal to zero gives, respectively,

    θ = Σ_{n=1}^{N} λ_n y_n x_n,    Σ_{n=1}^{N} λ_n y_n = 0.

Substituting back into the Lagrangian, the dual cost function results:

    L(λ) = Σ_{n=1}^{N} λ_n − (1/2) Σ_{n=1}^{N} Σ_{m=1}^{N} λ_n λ_m y_n y_m x_nᵀx_m.
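A minimal numeric illustration of this dual, on a hypothetical two-point separable problem in one dimension (x₁ = +1 with y₁ = +1, x₂ = −1 with y₂ = −1): maximizing the dual over a grid recovers λ₁ = λ₂ = 1/2, and the dual optimum equals the primal value (1/2)‖θ‖², as strong duality predicts.

```python
import numpy as np

# two separable 1-D points: x1 = +1 (y1 = +1), x2 = -1 (y2 = -1)
x = np.array([1.0, -1.0])
y = np.array([1.0, -1.0])

def dual(lam):
    # L(lambda) = sum_n lam_n - (1/2) sum_n sum_m lam_n lam_m y_n y_m x_n x_m,
    # with the constraint sum_n lam_n y_n = 0 enforced by lam_1 = lam_2 = lam
    l = np.array([lam, lam])
    G = np.outer(y * x, y * x)
    return l.sum() - 0.5 * l @ G @ l

# maximize over a fine grid of feasible lambda >= 0
grid = np.linspace(0, 2, 20001)
best = grid[np.argmax([dual(l) for l in grid])]

theta = np.sum(best * y * x)   # theta = sum_n lam_n y_n x_n
assert abs(best - 0.5) < 1e-3                    # optimal lambda = 1/2
assert abs(dual(best) - 0.5 * theta ** 2) < 1e-6 # dual value = (1/2)||theta||^2
```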
11.12. Derive the kernel approximation in Eq. (11.91).
11.13. Derive the subgradient for the Huber loss function.

Solution: The Huber loss function is given by

    L(y, z) = (1/2)|y − z|²,        if |y − z| ≤ ε,
    L(y, z) = ε|y − z| − ε²/2,      if |y − z| > ε,

where z = f(x); the constant −ε²/2 makes the two branches join continuously at |y − z| = ε. Hence, for |y − z| > ε, the gradient with respect to z is −ε sgn(y − z), while for |y − z| < ε it equals −(y − z). At |y − z| = ε the two expressions coincide, so the subdifferential is a singleton everywhere and the loss is in fact differentiable, with

    ∂L/∂z = −(y − z) if |y − z| ≤ ε,    ∂L/∂z = −ε sgn(y − z) if |y − z| > ε.