we get

β y^T (y − Φµ) = β ||y − Φµ||^2 + β µ^T Φ^T y − µ^T (Σ^{-1} − A) µ = β ||y − Φµ||^2 + µ^T A µ,

where we used the definitions Σ^{-1} = A + β Φ^T Φ and µ = β Σ Φ^T y, so that β µ^T Φ^T y = µ^T Σ^{-1} µ. Using the standard rule for the differentiation of log-determinants, we obtain

∂ ln |Σ^{-1}| / ∂α_k = (1/|Σ^{-1}|) ∂|Σ^{-1}| / ∂α_k = Σ_kk,

and also

∂(µ^T A µ) / ∂α_k = µ_k^2. (35)

Hence, the derivative of the log-evidence with respect to α_k becomes

−(1/2) Σ_kk + 1/(2α_k) − (1/2) µ_k^2.
Equating to zero and setting

γ_k := 1 − α_k Σ_kk,

we obtain the fixed-point update α_k = γ_k / µ_k^2.
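The resulting fixed-point iteration can be sketched numerically. The setup below (synthetic data, a known noise precision β, and the cap on α_k) is purely illustrative and not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse linear model: y = Phi @ theta + noise (illustrative setup).
N, K = 50, 5
Phi = rng.normal(size=(N, K))
theta_true = np.array([2.0, 0.0, -3.0, 0.0, 1.0])
beta = 25.0                                  # noise precision, assumed known here
y = Phi @ theta_true + rng.normal(scale=beta ** -0.5, size=N)

alpha = np.ones(K)
for _ in range(100):
    # Posterior of theta: Sigma^{-1} = A + beta Phi^T Phi, mu = beta Sigma Phi^T y
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    mu = beta * Sigma @ Phi.T @ y
    gamma = 1.0 - alpha * np.diag(Sigma)      # gamma_k = 1 - alpha_k Sigma_kk
    alpha = np.minimum(gamma / mu ** 2, 1e8)  # alpha_k = gamma_k / mu_k^2 (capped)

# Components with large true weights keep small alpha_k; near-zero components
# tend to be driven toward large alpha_k (i.e., pruned).
print(alpha, mu)
```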
13.12. Consider a two-class classification task and assume that the feature vectors in each one of the two classes, ω1, ω2, are distributed according to the Gaussian pdf. Both classes share the same covariance matrix Σ, and the mean values are µ1 and µ2, respectively. Prove that, given an observed feature vector, x ∈ R^l, the posterior probability for deciding in favor of one of the classes is given by the logistic function, i.e.,

P(ω2|x) = 1 / (1 + exp(−θ^T x + θ0)),

where

θ := Σ^{-1}(µ2 − µ1),

and

θ0 = (1/2)(µ2 − µ1)^T Σ^{-1}(µ2 + µ1) + ln(P(ω1)/P(ω2)).
Solution: We have that

p(x|ωi) = (1/((2π)^{l/2} |Σ|^{1/2})) exp( −(1/2)(x − µi)^T Σ^{-1}(x − µi) ), i = 1, 2.

By the Bayes rule,

P(ω2|x) = P(ω2)p(x|ω2) / ( P(ω1)p(x|ω1) + P(ω2)p(x|ω2) ) = 1 / ( 1 + (P(ω1)p(x|ω1)) / (P(ω2)p(x|ω2)) ).

In the likelihood ratio the quadratic terms x^T Σ^{-1} x cancel out, so that

(P(ω1)p(x|ω1)) / (P(ω2)p(x|ω2)) = exp( −(µ2 − µ1)^T Σ^{-1} x + (1/2)(µ2 − µ1)^T Σ^{-1}(µ2 + µ1) + ln(P(ω1)/P(ω2)) ) = exp(−θ^T x + θ0),

which proves the claim.
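The claimed identity is easy to verify numerically; all parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: any SPD shared covariance and any priors work.
l = 3
mu1 = np.array([0.0, 1.0, -1.0])
mu2 = np.array([2.0, -0.5, 0.5])
M = rng.normal(size=(l, l))
Sigma = M @ M.T + l * np.eye(l)
P1, P2 = 0.3, 0.7
Sinv = np.linalg.inv(Sigma)

def gauss(x, mu):
    # Gaussian pdf with the shared covariance Sigma
    d = x - mu
    norm = (2 * np.pi) ** (l / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ Sinv @ d) / norm

theta = Sinv @ (mu2 - mu1)
theta0 = 0.5 * (mu2 - mu1) @ Sinv @ (mu2 + mu1) + np.log(P1 / P2)

x = rng.normal(size=l)
post_bayes = P2 * gauss(x, mu2) / (P1 * gauss(x, mu1) + P2 * gauss(x, mu2))
post_logistic = 1.0 / (1.0 + np.exp(-theta @ x + theta0))
print(post_bayes, post_logistic)   # the two values coincide
```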
13.13. Derive equation (13.74).
Solution: Our starting point is the cost
J := Σ_{n=1}^{N} [ y_n ln σ(θ^T φ_n) + (1 − y_n) ln( 1 − σ(θ^T φ_n) ) ] − (1/2) θ^T A θ, (40)
where φ_n := φ(x_n). Taking the gradient with respect to θ, we get

∇J = Σ_{n=1}^{N} [ (y_n / σ(θ^T φ_n)) σ′(θ^T φ_n) φ_n − ((1 − y_n)/(1 − σ(θ^T φ_n))) σ′(θ^T φ_n) φ_n ] − Aθ,

and, since σ′(t) = σ(t)(1 − σ(t)),

∇J = Σ_{n=1}^{N} ( y_n − σ(θ^T φ_n) ) φ_n − Aθ.
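The closed-form gradient of (40) can be checked against finite differences; Phi, y, and A below are made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# Made-up data; A is a positive-definite (diagonal) regularization matrix.
N, K = 20, 4
Phi = rng.normal(size=(N, K))                # rows are phi_n^T
y = rng.integers(0, 2, size=N).astype(float)
A = np.diag(rng.uniform(0.5, 2.0, size=K))
theta = rng.normal(size=K)

def J(th):
    s = sigma(Phi @ th)
    return np.sum(y * np.log(s) + (1 - y) * np.log(1 - s)) - 0.5 * th @ A @ th

# Closed-form gradient: sum_n (y_n - sigma(theta^T phi_n)) phi_n - A theta
grad = Phi.T @ (y - sigma(Phi @ theta)) - A @ theta

# Central finite differences agree up to discretization error
eps = 1e-6
fd = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
               for e in np.eye(K)])
print(np.max(np.abs(grad - fd)))
```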
13.14. Show Equation (13.75).
Solution: By the respective definition we have that

σ(t) = 1 / (1 + exp(−t)),

hence, setting s_n := σ(θ^T φ(x_n)),

a_n := y_n ln s_n + (1 − y_n) ln(1 − s_n).
Then, after some simple algebra and taking into account well-known differentiation rules concerning the logarithm, we readily obtain that

∂a_n/∂θ = y_n φ(x_n) − s_n φ(x_n) = (y_n − s_n) φ(x_n),
as well as that

ln( P(y|θ) p(θ|α) ) = Σ_{n=1}^{N} [ y_n ln σ(θ^T φ(x_n)) + (1 − y_n) ln( 1 − σ(θ^T φ(x_n)) ) ] − (1/2) θ^T A θ + const.
13.15. Derive the recursion (13.77).
Solution: Taking the logarithm of P(y|α) in (13.75) and keeping only the terms which depend on α, we have,

ln P(y|α) = ln p(θ̂_MAP|α) − (1/2) ln |Σ^{-1}| + const.
13.16. Show that if f is a convex function f : R^l → R, then it is equal to the conjugate of its conjugate, i.e., (f*)* = f.
Solution: Recall from the theory that

f*(ξ) = ξ^T x* − f(x*), (45)

where x* is the point attaining the supremum in f*(ξ) = sup_x ( ξ^T x − f(x) ), so that

ξ = ∇f(x*). (46)

Combining (45) and (46), we get

x^T ξ − f*(ξ) = f(x*) + (x − x*)^T ∇f(x*). (47)

By the convexity of f, the right-hand side of (47) defines a supporting hyperplane and is therefore upper bounded by f(x), with equality attained for ξ = ∇f(x); hence (f*)*(x) = sup_ξ ( x^T ξ − f*(ξ) ) = f(x).
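The relation (f*)* = f can be illustrated by computing the conjugate twice by brute force over a grid; f(x) = x² below is just an example convex function:

```python
import numpy as np

# Brute-force conjugation over a grid; f(x) = x^2 is just an example.
xs = np.linspace(-5, 5, 2001)
f = xs ** 2

def conjugate(vals):
    # g*(xi) = sup_x (xi * x - g(x)), the sup taken over the grid
    return np.max(np.outer(xs, xs) - vals[None, :], axis=1)

f_star = conjugate(f)          # ~ xi^2 / 4
f_bistar = conjugate(f_star)   # recovers f where the sup is attained inside the grid

interior = np.abs(xs) < 2
err = np.max(np.abs(f_bistar[interior] - f[interior]))
print(err)   # small (grid-resolution error only)
```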
13.17. Prove that

f(x) = ln(λ/2) − λ√x, x ≥ 0,

is a convex function.

Solution: It is known from the theory of convex functions that if d²f(x)/dx² ≥ 0, then f(x) is convex ([Boyd 04]). Hence, since

df/dx = −(λ/2) x^{−1/2}, d²f/dx² = (λ/4) x^{−3/2} ≥ 0, x > 0,

f(x) is convex.
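A quick numerical sanity check of convexity via the midpoint inequality (the value of λ is arbitrary):

```python
import math
import random

lam = 1.5                                   # any lambda > 0
f = lambda x: math.log(lam / 2) - lam * math.sqrt(x)

# Midpoint convexity: f((a + b) / 2) <= (f(a) + f(b)) / 2 on the domain x > 0
random.seed(0)
pairs = [(random.uniform(0.01, 10), random.uniform(0.01, 10)) for _ in range(1000)]
ok = all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-12 for a, b in pairs)
print(ok)   # True
```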
13.18. Derive variational bounds for the logistic regression function

σ(x) = 1 / (1 + e^{−x}),

one of them in terms of a Gaussian function. For the latter case, use the transformation t = √x.
Solution: Since ln σ(x) is concave, a conjugate (upper) bound is obtained from g*(ξ) = inf_x ( ξx − ln σ(x) ); taking the derivative we obtain

x*: ξ − e^{−x}/(1 + e^{−x}) = 0 ⇒ e^{−x*} = ξ/(1 − ξ) ⇒ x* = ln((1 − ξ)/ξ), 0 < ξ < 1.

Then σ(x*) = 1 − ξ, and g*(ξ) = ξ ln((1 − ξ)/ξ) − ln(1 − ξ) = −ξ ln ξ − (1 − ξ) ln(1 − ξ), i.e., the binary entropy H(ξ); hence the first bound reads

σ(x) ≤ exp( ξx − H(ξ) ), 0 < ξ < 1.

For the Gaussian-type bound, note that

ln σ(x) = ln( e^{x/2} / (e^{x/2} + e^{−x/2}) ) = x/2 − ln( exp(x/2) + exp(−x/2) ). (48)
Let us now define

f(x) := −ln( exp(√x/2) + exp(−√x/2) ), x ≥ 0. (49)

We will first show that f(x) is a convex function. To this end, we will prove that the second derivative is nonnegative in the respective domain. Differentiating,

df/dx = −tanh(√x/2) / (4√x),

and

d²f/dx² = (1/(8x)) [ tanh(√x/2)/√x − (1/2)( 1 − tanh²(√x/2) ) ],
where we have used the chain differentiation rule as well as the property dtanh(y)/dy = 1 − tanh²(y). Setting y := √x/2 and multiplying the term in brackets by cosh²(y), nonnegativity of d²f/dx² is seen to be equivalent to sinh(y)cosh(y) ≥ y; that is, we have to show that

sinh(2y)/(2y) ≥ 1.

However, this is always true for y ≥ 0, as it follows from the known from analysis expansion

sinh(2y) = 2y + (2y)³/3! + (2y)⁵/5! + ··· .
Thus, we can now write that

f*(ξ) ≥ ξx − f(x),

or

f(x) ≥ ξx − f*(ξ).

Hence, applying the bound at x = t² and recalling (48), (49), we obtain

ln σ(t) ≥ t/2 + ξt² − f*(ξ),

that is, a lower bound whose exponential is of a Gaussian (quadratic-exponent) form in t.
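The Gaussian-type lower bound implied by the convexity of f in (49) can be written, with a tangent point ζ > 0, in the classical form σ(t) ≥ σ(ζ) exp( (t − ζ)/2 − λ(ζ)(t² − ζ²) ), where λ(ζ) := tanh(ζ/2)/(4ζ). A quick numerical check (the grid and the value of ζ are illustrative):

```python
import math

sigma = lambda t: 1.0 / (1.0 + math.exp(-t))
lam = lambda z: math.tanh(z / 2) / (4 * z)

def bound(t, z):
    # Gaussian-type variational lower bound on sigma(t), tight at t = +/- z
    return sigma(z) * math.exp((t - z) / 2 - lam(z) * (t * t - z * z))

z = 1.7                                     # illustrative tangent point
ts = [i / 10 - 5 for i in range(101)]       # grid on [-5, 5]
ok = all(sigma(t) >= bound(t, z) - 1e-12 for t in ts)
tight = abs(sigma(z) - bound(z, z)) + abs(sigma(-z) - bound(-z, z))
print(ok, tight)   # True, ~0
```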
13.19. Prove equation (13.100).
Solution: We know that

Q(ξ, β; ξ^{(j)}, β^{(j)}) = (N/2) ln β − (N/2) ln(2π) − (β/2) E[ ||y − Φθ||² ] + Σ_{k=0}^{K−1} ln φ(ξ_k) − (K/2) ln(2π) − (1/2) ln |Ξ| − (1/2) Σ_{k=0}^{K−1} E[θ_k²]/ξ_k,

where Ξ := diag(ξ_0, ..., ξ_{K−1}). Taking the derivative with respect to ξ_k term by term, we have

• ∂/∂ξ_k ( Σ_{k=0}^{K−1} ln φ(ξ_k) ) = φ′(ξ_k)/φ(ξ_k) = −λ²/2, since φ(ξ_k) = (λ²/2) exp(−λ² ξ_k / 2);

• ∂/∂ξ_k ( −(1/2) ln |Ξ| ) = −(1/2) ξ_k^{−1};

• ∂/∂ξ_k ( −(1/2) Σ_{k=0}^{K−1} E[θ_k²]/ξ_k ) = (1/2) E[θ_k²]/ξ_k².

Equating the sum of the three derivatives to zero and multiplying by −2ξ_k² results in the quadratic λ²ξ_k² + ξ_k − E[θ_k²] = 0, whose positive root provides the update in (13.100).
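Assuming a prior of the form φ(ξ_k) ∝ exp(−λ²ξ_k/2), the stationarity of the ξ_k-dependent part of Q can be verified numerically; the values of λ² and E[θ_k²] below are made up:

```python
import math

# Assumed prior: phi(xi) = (lam2 / 2) * exp(-lam2 * xi / 2); illustrative values.
lam2 = 2.0        # lambda^2
E2 = 0.8          # E[theta_k^2] from the current E-step

# xi_k-dependent part of Q
g = lambda xi: -lam2 * xi / 2 - 0.5 * math.log(xi) - E2 / (2 * xi)

# Positive root of lam2 * xi^2 + xi - E2 = 0
xi_star = (-1 + math.sqrt(1 + 4 * lam2 * E2)) / (2 * lam2)

eps = 1e-6
dg = (g(xi_star + eps) - g(xi_star - eps)) / (2 * eps)   # ~0 at the stationary point
print(xi_star, dg)
```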
13.20. Derive the mean and variance of G(Tk) for a DP process.
Solution: Recall the mean and variance values of a Dirichlet distribution from Chapter 2. Also, for the case of DPs, the parameters of the associated Dirichlet distribution are a_k = αG_0(T_k), with Σ_k a_k = α. Then we get,

E[G(T_k)] = a_k/α = G_0(T_k),

and

Var[G(T_k)] = a_k(α − a_k) / ( α²(α + 1) ) = G_0(T_k)( 1 − G_0(T_k) ) / (α + 1).
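The mean and variance of G(T_k), namely G_0(T_k) and G_0(T_k)(1 − G_0(T_k))/(α + 1), can be checked by Monte Carlo, sampling G(T_k) from its marginal Beta distribution; the values of α and G_0(T_k) below are illustrative:

```python
import random
import statistics

random.seed(0)
alpha, g0 = 4.0, 0.3    # DP concentration and base-measure mass G0(T_k); illustrative

# For the partition {T_k, T_k^c}, G(T_k) ~ Beta(alpha*G0(T_k), alpha*(1 - G0(T_k)))
samples = [random.betavariate(alpha * g0, alpha * (1 - g0)) for _ in range(200_000)]

mean = statistics.fmean(samples)
var = statistics.pvariance(samples, mu=mean)
# Theory: mean = G0(T_k) = 0.3, var = G0(1 - G0)/(alpha + 1) = 0.042
print(mean, var)
```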
13.21. Show that the posterior DP, after having obtained n observations from the set Θ, is given by,

G | θ_1, ..., θ_n ∼ DP( α + n, (1/(α + n)) ( αG_0 + Σ_{i=1}^{n} δ_{θ_i} ) ).
Solution: By definition we have that

α′G′_0(T_k) = αG_0(T_k) + n_k,

where n_k is the number of observations falling in T_k. Moreover, the above is true for all finite (measurable) partitions, e.g., for disjoint T_k's. Adding over k and taking into account that probabilities add to one, we obtain α′ = α + n; hence,

G′_0 = (1/(α + n)) ( αG_0 + Σ_{i=1}^{n} δ_{θ_i} ).
13.22. The stick-breaking construction of a DP is built around the following rule: P_1 = β_1 ∼ Beta(β|1, α) and,

β_i ∼ Beta(β|1, α), P_i = β_i ∏_{j=1}^{i−1} (1 − β_j), i ≥ 2.

Show that if the number of steps is finite, i.e., we assume that P_i = 0, i > T, for some T, then β_T = 1.
Solution: If we stop at a step T, then we should have

Σ_{i=1}^{T} P_i = 1, i.e., ∏_{j=1}^{T} (1 − β_j) = 0,

since

1 − Σ_{i=1}^{T} P_i = ∏_{j=1}^{T} (1 − β_j).

The last equality can be shown recursively. First, it holds true for T = 1, because 1 − P_1 = 1 − β_1. In the sequel, assume that it is true for T − 1, i.e.,

1 − Σ_{i=1}^{T−1} P_i = ∏_{j=1}^{T−1} (1 − β_j).

Then

1 − Σ_{i=1}^{T} P_i = ∏_{j=1}^{T−1} (1 − β_j) − β_T ∏_{j=1}^{T−1} (1 − β_j) = (1 − β_T) ∏_{j=1}^{T−1} (1 − β_j) = ∏_{j=1}^{T} (1 − β_j).

Since β_j < 1 with probability one for j < T, ∏_{j=1}^{T} (1 − β_j) = 0 forces β_T = 1.
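The truncation argument can be illustrated numerically: forcing β_T = 1 makes the weights sum to one exactly (the values of α and T below are arbitrary):

```python
import random

random.seed(0)
alpha, T = 2.0, 25

# DP stick-breaking truncated at T, with the last break forced to beta_T = 1
betas = [random.betavariate(1, alpha) for _ in range(T - 1)] + [1.0]
P, stick = [], 1.0
for b in betas:
    P.append(b * stick)     # P_i = beta_i * prod_{j<i} (1 - beta_j)
    stick *= 1 - b          # remaining stick length

print(sum(P), stick)   # weights sum to 1, nothing of the stick remains
```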
13.23. Show that in CRP, the cluster assignments are exchangeable and do not depend on the sequence in which customers arrive, up to a permutation of the labels of the tables.
Solution: Let us assume that n customers have arrived and K_n tables (clusters) have been formed. Associate with each customer a label, y_i, (50). For the denominator, recall that each customer arrives only once. Hence, each one of the terms n(k, j), j = 1, 2, ..., n_k, and k = 1, 2, ..., K_n, has a unique and distinct value, from all the others, in the set {1, 2, ..., n}.
13.24. Show that in an IBP, the probabilities for P(Z) and the equivalence classes, P([Z]), are given by the formulae,

P(Z) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

and

P([Z]) = ( K! / ∏_{h=0}^{2^N − 1} K_h! ) ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

respectively. Note that K_h, h = 1, 2, ..., 2^N − 1, is the number of times the row vector associated with the hth nonzero binary number appears in Z.
Solution: From the text and the definition of P(Z), taking into account that the probabilities are beta distributed, we get

P(Z) = ∏_{k=1}^{K} (1/B(α/K, 1)) ∫_0^1 P_k^{m_k} (1 − P_k)^{N − m_k} P_k^{α/K − 1} dP_k.

The above integral is the normalizing constant of a Beta(P_k | m_k + α/K, N − m_k + 1) density; hence,

P(Z) = ∏_{k=1}^{K} (1/B(α/K, 1)) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

and, since 1/B(α/K, 1) = Γ(α/K + 1)/(Γ(α/K) Γ(1)) = α/K,

P(Z) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K).
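Each factor in P(Z) reduces to a Beta normalizing constant; the underlying integral identity can be checked by simple numerical quadrature (m, N, and a := α/K below are illustrative values):

```python
import math

def beta_integral(m, N, a, steps=200_000):
    # Midpoint-rule quadrature of p^(m + a - 1) * (1 - p)^(N - m) over [0, 1]
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (m + a - 1) * (1 - (i + 0.5) * h) ** (N - m)
                   for i in range(steps))

m, N, a = 3, 10, 0.5    # a plays the role of alpha/K; values are illustrative
numeric = beta_integral(m, N, a)
closed = math.gamma(m + a) * math.gamma(N - m + 1) / math.gamma(N + 1 + a)
print(numeric, closed)   # the two agree
```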
For P([Z]), we must sum P(Z) over all matrices Z ∈ [Z]; since all of them share the same values m_k, they are equiprobable. The number of distinct permutations of K objects, grouped in K_0, K_1, ..., K_{2^N−1} groups of identical objects, is known from combinatorics to be equal to

( K choose K_0, K_1, ..., K_{2^N−1} ) = K! / ∏_{h=0}^{2^N−1} K_h!.

This can also be verified via the following arguments. Out of the K rows, K_0 are the zero ones. Then the total number of placements of these zero rows is

( K choose K_0 ) = K! / ( K_0! (K − K_0)! ).

Now, for each one of the above permuted matrices, we make all possible placements of the next group of K_1 identical rows among the remaining K − K_0 positions, and so on; multiplying the resulting binomial coefficients gives the multinomial coefficient above, and multiplying P(Z) by it yields P([Z]).
13.25. Show that the discarded pieces, πk, in the stick-breaking construction of
an IBP are equal to the sequence of probabilities produced in a DP stick-
breaking construction.
Solution: The sequence of the discarded segments is equal to

π_1 = 1 − β_1, π_2 = β_1(1 − β_2), ..., π_k = (1 − β_k) ∏_{j=1}^{k−1} β_j, k ≥ 2.

Setting β′_j := 1 − β_j, we can write

π_k = β′_k ∏_{j=1}^{k−1} (1 − β′_j),

and, since β_j ∼ Beta(β|α, 1) implies β′_j ∼ Beta(β|1, α), the π_k's follow exactly the DP stick-breaking construction of Problem 13.22.
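The identification can be illustrated numerically: the discarded IBP pieces and the DP stick-breaking weights built from β′_j = 1 − β_j coincide term by term (α and T are arbitrary; β_j ∼ Beta(α, 1) is the IBP stick-breaking assumption):

```python
import random

random.seed(0)
alpha, T = 1.5, 12
betas = [random.betavariate(alpha, 1) for _ in range(T)]   # IBP breaks: Beta(alpha, 1)

# Discarded IBP pieces: pi_k = (1 - beta_k) * prod_{j<k} beta_j
pis, kept = [], 1.0
for b in betas:
    pis.append((1 - b) * kept)
    kept *= b

# DP stick-breaking driven by beta'_j = 1 - beta_j ~ Beta(1, alpha)
dp, stick = [], 1.0
for b in betas:
    bp = 1 - b
    dp.append(bp * stick)
    stick *= 1 - bp        # note 1 - bp recovers beta_j

diff = max(abs(p - q) for p, q in zip(pis, dp))
print(diff)   # essentially zero: the sequences coincide term by term
```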