Solutions To Problems of Chapter 13
13.1. Show Eq. (13.5).
Solution: The functional F(q) is defined as
F(q) = ∫ q(X_l, θ) ln [ p(X, X_l, θ) / q(X_l, θ) ] dX_l dθ,
where X_l denotes the set of latent variables.
13.2. Show equation (13.38).
Solution: From Eq. (13.37) in the text we have
ln q_α^(j+1)(α) = E_{q_θ^(j+1)}[ln p(θ|α) + ln p(α)] + constant
= Σ_{k=0}^{K−1} [ (1/2) ln α_k − (1/2) α_k E_{q_θ^(j+1)}[θ_k²] + (a−1) ln α_k − b α_k ] + constant.
Exponentiating,
q_α^(j+1)(α) ∝ ∏_{k=0}^{K−1} α_k^{a−1+1/2} exp{ −( b + (1/2) E_{q_θ^(j+1)}[θ_k²] ) α_k },
that is, a product of gamma pdfs with parameters ã = a + 1/2 and b̃_k = b + (1/2) E_{q_θ^(j+1)}[θ_k²], which is Eq. (13.38).
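As an illustration, the update above amounts to one line of code per parameter; the function below is a sketch with names of my own choosing, fed with hypothetical second moments E[θ_k²]:

```python
import numpy as np

# Sketch of the alpha-update of Eq. (13.38): each factor q(alpha_k) is a
# gamma pdf with a_tilde = a + 1/2 and b_tilde_k = b + 0.5*E[theta_k^2].
# Function name and the example second moments are hypothetical.
def update_q_alpha(a, b, e_theta_sq):
    """Return the gamma parameters (a_tilde, b_tilde) of q(alpha)."""
    e_theta_sq = np.asarray(e_theta_sq, dtype=float)
    return a + 0.5, b + 0.5 * e_theta_sq

a_t, b_t = update_q_alpha(a=1e-3, b=1e-3, e_theta_sq=[0.5, 2.0, 10.0])
print(a_t, b_t)          # gamma parameters per coefficient
print(a_t / b_t)         # E[alpha_k] = a_tilde / b_tilde_k
```

Note that the posterior mean E[α_k] = ã/b̃_k shrinks as the second moment of θ_k grows, which is the mechanism behind the sparsity-promoting behavior of this prior.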
13.3. Show equations (13.43)-(13.45).
Solution: From the text we have
ln q_β^(j+1)(β) = E_{q_θ^(j+1)}[ln p(y|θ, β) + ln p(β)] + constant.
Working as in Problem 13.2, collecting the coefficients of ln β and of β identifies a gamma pdf, which leads to Eqs. (13.43)-(13.45).
13.4. Show that if
p(x) ∝ 1/x,
then the random variable z = ln x follows a uniform distribution.
Solution: We know that for a monotone transformation z = g(x), the pdf transforms as p_z(z) = p_x(x) |dx/dz|. Here x = e^z, so dx/dz = e^z and, with p_x(x) = c/x over the support of x,
p_z(z) = (c/e^z) e^z = c,
which is constant; hence z = ln x follows a uniform distribution over the image of the support of x.
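This can also be checked by simulation; the snippet below is my own illustration, truncating the support to [1, M] so that the density is normalizable (the inverse CDF is then F^{−1}(u) = M^u):

```python
import numpy as np

# Monte Carlo check (illustration only): draw x with density proportional
# to 1/x on [1, M] via inverse-CDF sampling (F(x) = ln x / ln M, so
# x = M**u for u ~ U(0,1)), then verify z = ln x looks uniform on [0, ln M].
rng = np.random.default_rng(0)
M = 100.0
u = rng.uniform(size=200_000)
x = M ** u                 # x ~ p(x) proportional to 1/x on [1, M]
z = np.log(x)              # should be uniform on [0, ln M]

# Uniformity check: histogram counts should be roughly equal across bins.
counts, _ = np.histogram(z, bins=10, range=(0.0, np.log(M)))
print(counts.min() / counts.max())   # close to 1 for a uniform sample
```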
13.5. Derive the lower bound after convergence of the variational Bayesian EM
for the linear regression task which is modeled as in Section 13.3.
Solution: The lower bound after convergence to q̃_θ(θ), q̃_α(α), q̃_β(β), which are defined by µ̃_θ, Σ̃_θ for the Gaussian q̃_θ, by (ã, b̃_k), k = 0, 1, ..., K−1, for the gamma q̃_α(α) and by (c̃, d̃) for the gamma q̃_β(β), will be the sum of the expectations (under the converged factors) of the log-joint terms minus the corresponding entropy terms. The required log-pdfs include:
(c) ln p(α) = −K ln Γ(a) + K a ln b + (a−1) Σ_{k=0}^{K−1} ln α_k − b Σ_{k=0}^{K−1} α_k.
(d) ln p(β) = −ln Γ(c) + c ln d + (c−1) ln β − d β.
Using identities from the Appendix of the chapter and the independence among q̃_θ, q̃_α, q̃_β, we get:
(a)
A1 := E[ln p(θ|α)] = (1/2) Σ_{k=0}^{K−1} E_α[ln α_k] − (K/2) ln(2π) − (1/2) Σ_{k=0}^{K−1} E_α[α_k] E_θ[θ_k²]
= (1/2) Σ_{k=0}^{K−1} [ψ(ã) − ln b̃_k] − (K/2) ln(2π) − (1/2) Σ_{k=0}^{K−1} (ã/b̃_k) [Σ̃_θ + µ̃_θ µ̃_θ^T]_{kk}.
(d)
A4 := E_β[ln p(β)] = −ln Γ(c) + c ln d + (c−1) E_β[ln β] − d E_β[β]
= −ln Γ(c) + c ln d + (c−1)(ψ(c̃) − ln d̃) − d c̃/d̃.
In the sequel the respective entropies have to be computed:
(a)
E_θ[ln q̃_θ(θ)] = −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) − (1/2) E_θ[(θ − µ̃_θ)^T Σ̃_θ^{−1} (θ − µ̃_θ)]
= −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) + (1/2) µ̃_θ^T Σ̃_θ^{−1} µ̃_θ − (1/2) Trace{I + Σ̃_θ^{−1} µ̃_θ µ̃_θ^T},
where Eq. 12.48 from the text has been used.
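Since Trace{I + Σ̃_θ^{−1} µ̃_θ µ̃_θ^T} = K + µ̃_θ^T Σ̃_θ^{−1} µ̃_θ, the expression simplifies to −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) − K/2, i.e. minus the differential entropy of the Gaussian. A quick numeric check, with arbitrary parameters of my choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Check of the Gaussian term: E[ln q(theta)] for q = N(mu, Sigma) in K
# dimensions equals -0.5*ln|Sigma| - (K/2)*ln(2*pi) - K/2, i.e. minus the
# differential entropy of the Gaussian.
K = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)      # arbitrary SPD covariance
mu = rng.standard_normal(K)

_, logdet = np.linalg.slogdet(Sigma)
e_ln_q = -0.5 * logdet - 0.5 * K * np.log(2 * np.pi) - 0.5 * K
print(e_ln_q, -multivariate_normal(mu, Sigma).entropy())
```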
(b)
E_α[ln q̃_α(α)] = Σ_{k=0}^{K−1} [ ã ln b̃_k − ln Γ(ã) + (ã−1)(ψ(ã) − ln b̃_k) − ã ].
13.6. Consider the Gaussian mixture model
p(x) = Σ_{k=1}^{K} P_k N(x|µ_k, Q_k^{−1}),
with priors
p(µ_k) = N(µ_k|0, β^{−1}I), (10)
and
p(Q_k) = W(Q_k|ν_0, W_0).
Given the set of observations X = {x_1, ..., x_N}, x ∈ R^l, derive the respective variational Bayesian EM algorithm, using the mean field approximation for the involved posterior pdfs. Consider P_k, k = 1, 2, ..., K, as deterministic parameters and optimize the respective lower bound of the evidence with respect to the P_k's.
Solution: Consider the mean field factorization
q(Z, µ_{1:K}, Q_{1:K}) = q(Z) q(µ_{1:K}) q(Q_{1:K}),
where the notation has been introduced in Section 13.4. From the theory we have:
Step 1a:
ln q_z^(j+1)(Z) = E_{q_µ^(j) q_Q^(j)}[ln p(X, Z, µ_{1:K}, Q_{1:K})] + constant
= E_{q_µ^(j) q_Q^(j)}[ln p(X|Z, µ_{1:K}, Q_{1:K})] + Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln P_k^(j) + constant.
Hence, we have that
ln q_z^(j+1)(Z) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk { ln P_k^(j) + (1/2) E_{q_Q^(j)}[ln |Q_k|] − (1/2) E_{q_µ^(j) q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] } + constant,
or, if we set
π_nk = P_k^(j) exp{ (1/2) E_{q_Q^(j)}[ln |Q_k|] − (1/2) E_{q_µ^(j) q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] },
then q_z^(j+1)(Z) ∝ ∏_{n=1}^{N} ∏_{k=1}^{K} π_nk^{z_nk}. Normalizing the π_nk over k turns them into probabilities, hence
ρ_nk = π_nk / Σ_{k=1}^{K} π_nk.
Also note that E_{q_z^(j+1)}[z_nk] = ρ_nk, by the binary nature of z_nk.
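In code, the responsibility computation above is a normalization over k, best done in the log domain; the function below is a sketch with names of my own choosing:

```python
import numpy as np

# Sketch of the E-step responsibility computation: given
# log pi_nk = ln P_k + 0.5*E[ln|Q_k|] - 0.5*E[(x_n-mu_k)^T Q_k (x_n-mu_k)],
# normalize over k in the log domain for numerical stability.
def responsibilities(log_pi):
    """log_pi: (N, K) array of ln pi_nk; returns rho with rows summing to 1."""
    log_pi = np.asarray(log_pi, dtype=float)
    m = log_pi.max(axis=1, keepdims=True)       # log-sum-exp trick
    p = np.exp(log_pi - m)
    return p / p.sum(axis=1, keepdims=True)

rho = responsibilities([[ -1.0, -2.0, -3.0],
                        [-10.0, -1.0, -1.0]])
print(rho.sum(axis=1))    # each row sums to 1
```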
Step 1b:
ln q_µ^(j+1)(µ_{1:K}) = E_{q_z^(j+1) q_Q^(j)}[ln p(X|Z, µ_{1:K}, Q_{1:K}) + ln p(µ_{1:K})] + constants
= Σ_{k=1}^{K} { −(1/2) Σ_{n=1}^{N} ρ_nk E_{q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] − (1/2) β µ_k^T µ_k } + constants,
or, keeping only the terms that depend on µ_k,
ln q_µ^(j+1)(µ_{1:K}) = Σ_{k=1}^{K} { −(1/2) µ_k^T Q̃_k µ_k + µ_k^T E_{q_Q^(j)}[Q_k] Σ_{n=1}^{N} ρ_nk x_n } + constants,
where
Q̃_k = βI + ( Σ_{n=1}^{N} ρ_nk ) E_{q_Q^(j)}[Q_k],
and hence q_µ^(j+1)(µ_{1:K}) = ∏_{k=1}^{K} N(µ_k | µ̃_k, Q̃_k^{−1}), with
µ̃_k = Q̃_k^{−1} E_{q_Q^(j)}[Q_k] Σ_{n=1}^{N} ρ_nk x_n.
Step 1c:
ln q_Q^(j+1)(Q_{1:K}) = E_{q_z^(j+1) q_µ^(j+1)}[ln p(X|Z, µ_{1:K}, Q_{1:K}) + ln p(Q_{1:K})] + constants
= Σ_{k=1}^{K} { (1/2) ( Σ_{n=1}^{N} ρ_nk ) ln |Q_k| − (1/2) trace{ Q_k Σ_{n=1}^{N} ρ_nk ( x_n x_n^T − µ̃_k x_n^T − x_n µ̃_k^T + E_{q_µ^(j+1)}[µ_k µ_k^T] ) } + ((ν_0 − l − 1)/2) ln |Q_k| − (1/2) trace{W_0^{−1} Q_k} } + constants,
which is recognized as a product of Wishart pdfs,
q_Q^(j+1)(Q_{1:K}) = ∏_{k=1}^{K} W(Q_k | ν̃_k, W̃_k),
where
ν̃_k = ν_0 + Σ_{n=1}^{N} ρ_nk,
W̃_k^{−1} = W_0^{−1} + Σ_{n=1}^{N} ρ_nk ( x_n x_n^T − µ̃_k x_n^T − x_n µ̃_k^T + E_{q_µ^(j+1)}[µ_k µ_k^T] ).
Moreover,
E_{q_Q^(j+1)}[Q_k] = ν̃_k W̃_k,
E_{q_Q^(j+1)}[ln |Q_k|] = Σ_{i=1}^{l} ψ( (ν̃_k + 1 − i)/2 ) + l ln 2 + ln |W̃_k|,
where ψ(·) is the digamma function defined in the text.
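The quoted identity for E[ln |Q_k|] can be checked by simulation, with made-up values of l, ν and W:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import digamma

# Monte Carlo check of the Wishart identity:
# E[ln|Q|] = sum_{i=1..l} psi((nu + 1 - i)/2) + l*ln 2 + ln|W|
# for Q ~ W(nu, W). Parameters below are illustrative only.
l, nu = 3, 7.0
rng = np.random.default_rng(2)
A = rng.standard_normal((l, l))
W = A @ A.T + l * np.eye(l)

i = np.arange(1, l + 1)
analytic = digamma((nu + 1 - i) / 2).sum() + l * np.log(2) \
           + np.linalg.slogdet(W)[1]

samples = wishart.rvs(df=nu, scale=W, size=50_000, random_state=3)
mc = np.mean([np.linalg.slogdet(S)[1] for S in samples])
print(analytic, mc)   # should agree to a couple of decimals
```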
The lower bound depends on the P_k's only through Σ_{k=1}^{K} ( Σ_{n=1}^{N} ρ_nk ) ln P_k, which is to be maximized subject to the constraint
Σ_{k=1}^{K} P_k = 1. (12)
Thus, introducing a Lagrange multiplier λ,
∂/∂P_k [ Σ_{k=1}^{K} ln P_k ( Σ_{n=1}^{N} ρ_nk ) − λ Σ_{k=1}^{K} P_k ] = 0,
which gives P_k = (1/λ) Σ_{n=1}^{N} ρ_nk. Imposing the constraint yields λ = N, hence
P_k = (1/N) Σ_{n=1}^{N} ρ_nk.
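The resulting M-step for the weights is a one-liner; the sketch below (names mine) also confirms that the updated P_k's sum to one:

```python
import numpy as np

# M-step for the deterministic mixing weights derived above:
# P_k = (1/N) * sum_n rho_nk, which automatically satisfies sum_k P_k = 1
# because each row of rho sums to 1.
def update_mixing_weights(rho):
    """rho: (N, K) responsibilities with rows summing to 1."""
    return np.asarray(rho, dtype=float).mean(axis=0)

rho = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.5, 0.3],
                [0.1, 0.1, 0.8],
                [0.6, 0.2, 0.2]])
P = update_mixing_weights(rho)
print(P, P.sum())   # the P_k sum to 1
```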
13.7. Consider the Gaussian mixture model of Problem 13.6, with the following priors imposed on µ, Q, and P:
p(µ, Q) = p(µ|Q) p(Q) = ∏_{k=1}^{K} N(µ_k|0, (λQ_k)^{−1}) W(Q_k|ν_0, W_0),
that is, a Gaussian-Wishart product, and
p(P) = Dir(P|a) ∝ ∏_{k=1}^{K} P_k^{a−1},
i.e., a Dirichlet prior. That is, P is treated as a random vector. Derive the E algorithmic steps of the variational Bayesian approximation adopting the mean field approximation for the involved posterior pdfs. We have adopted the notation µ in place of µ_{1:K} and Q in place of Q_{1:K}, for notational simplicity.
Solution: If Z is the set of latent variables associated with the mixture indices, we have
q(Z, P, µ, Q) = q(Z) q(P) q(µ, Q).
Step 1a: We have that
ln q_z^(j+1)(Z) = E_{q_P^(j) q_{µ,Q}^(j)}[ln p(X, Z, P, µ, Q)] + constant
= Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk { E_{q_P^(j)}[ln P_k] + (1/2) E_{q_{µ,Q}^(j)}[ln |Q_k|] − (1/2) E_{q_{µ,Q}^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] } + constant, (14)
so that, setting π_nk equal to the exponential of the term in braces, we obtain as before E_{q_z^(j+1)}[z_nk] = ρ_nk, with
ρ_nk = π_nk / Σ_{k=1}^{K} π_nk. (16)
Step 1b:
ln q_P^(j+1)(P) = E_{q_z^(j+1)}[ Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln P_k ] + Σ_{k=1}^{K} (a − 1) ln P_k + constants
= Σ_{k=1}^{K} ( Σ_{n=1}^{N} ρ_nk + a − 1 ) ln P_k + constants,
from which we obtain
q_P^(j+1)(P) ∝ ∏_{k=1}^{K} P_k^{a_k − 1},
that is, a Dirichlet distribution with parameters a_k = a + Σ_{n=1}^{N} ρ_nk.
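The q_P update above, together with the Dirichlet expectation E[ln P_k] = ψ(a_k) − ψ(Σ_j a_j) that the next Step 1a iteration needs, can be sketched as follows (names mine):

```python
import numpy as np
from scipy.special import digamma

# Sketch of the q(P) update: Dirichlet parameters a_k = a + sum_n rho_nk,
# followed by E[ln P_k] = psi(a_k) - psi(sum_j a_j) under the Dirichlet.
def update_q_P(a, rho):
    """a: scalar prior parameter; rho: (N, K) responsibilities."""
    return a + np.asarray(rho, dtype=float).sum(axis=0)

rho = np.array([[0.7, 0.3],
                [0.4, 0.6],
                [0.9, 0.1]])
a_k = update_q_P(a=1.0, rho=rho)
e_ln_P = digamma(a_k) - digamma(a_k.sum())   # E[ln P_k], fed back to Step 1a
print(a_k, e_ln_P)
```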
Step 1c:
ln q_{µ,Q}^(j+1)(µ, Q) = Σ_{n=1}^{N} Σ_{k=1}^{K} E_{q_z^(j+1)}[z_nk] { (1/2) ln |Q_k| − (1/2)(x_n − µ_k)^T Q_k (x_n − µ_k) }
+ (1/2) Σ_{k=1}^{K} ln(|λQ_k|) − (1/2) Σ_{k=1}^{K} µ_k^T (λQ_k) µ_k + ((ν_0 − l − 1)/2) Σ_{k=1}^{K} ln |Q_k| − (1/2) Σ_{k=1}^{K} trace{W_0^{−1} Q_k} + constants.
Expanding the quadratics, three groups of terms appear for each k: (A) the terms in ln |Q_k|, namely (1/2)(Σ_{n=1}^{N} ρ_nk) ln |Q_k|, (1/2) ln |Q_k| and ((ν_0 − l − 1)/2) ln |Q_k|; (B) the terms quadratic and linear in µ_k, namely −(1/2) µ_k^T ( λ + Σ_{n=1}^{N} ρ_nk ) Q_k µ_k + µ_k^T Q_k Σ_{n=1}^{N} ρ_nk x_n; and (C) the trace terms, −(1/2) trace{ (W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T) Q_k }.
Combining all the terms A, B, C respectively together we obtain
A: ( (ν_0 + 1 + Σ_{n=1}^{N} ρ_nk − l − 1)/2 ) ln |Q_k|,
B: −(1/2) µ_k^T ( λ̃_k Q_k ) µ_k + µ_k^T ( λ̃_k Q_k ) µ̂_k, where
λ̃_k = λ + Σ_{n=1}^{N} ρ_nk, µ̂_k = (1/λ̃_k) Σ_{n=1}^{N} ρ_nk x_n,
C: −(1/2) trace{ (W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T) Q_k }.
Completing the square in B and absorbing the resulting term (λ̃_k/2) µ̂_k^T Q_k µ̂_k into C, we recognize a Gaussian-Wishart product,
q_{µ,Q}^(j+1)(µ, Q) = ∏_{k=1}^{K} N(µ_k | µ̂_k, (λ̃_k Q_k)^{−1}) W(Q_k | ν̃_k, W̃_k),
where
ν̃_k = ν_0 + Σ_{n=1}^{N} ρ_nk + 1,
and
W̃_k^{−1} = W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T − λ̃_k µ̂_k µ̂_k^T.
13.8. If µ and Q are distributed according to a Gaussian-Wishart product
p(µ, Q) = N(µ|µ̂, (λQ)^{−1}) W(Q|ν, W),
then compute the expectation
E[µ^T Q µ].
Solution: We have that
E[µ^T Q µ] = E[trace{Q µ µ^T}] = E_Q E_{µ|Q}[trace{Q µ µ^T}]
= E_Q[ trace{ Q ( µ̂ µ̂^T + (λQ)^{−1} ) } ] = µ̂^T E_Q[Q] µ̂ + l/λ = ν µ̂^T W µ̂ + l/λ,
where l is the dimensionality of µ and we used E_Q[Q] = νW for the Wishart pdf.
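A Monte Carlo check of this expectation, with illustrative parameter values of my choosing:

```python
import numpy as np
from scipy.stats import wishart

# Monte Carlo check of E[mu^T Q mu] = nu * mu_hat^T W mu_hat + l/lambda
# for (mu, Q) ~ N(mu | mu_hat, (lam*Q)^-1) W(Q | nu, W).
l, nu, lam = 2, 5.0, 2.0
rng = np.random.default_rng(4)
A = rng.standard_normal((l, l))
W = A @ A.T + l * np.eye(l)
mu_hat = np.array([1.0, -0.5])

analytic = nu * mu_hat @ W @ mu_hat + l / lam

vals = []
for Q in wishart.rvs(df=nu, scale=W, size=20_000, random_state=5):
    mu = rng.multivariate_normal(mu_hat, np.linalg.inv(lam * Q))
    vals.append(mu @ Q @ mu)
mc = np.mean(vals)
print(analytic, mc)
```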
13.9. Derive the Hessian matrix w.r.t. θ of the cost function
J(θ) = Σ_{n=1}^{N} [ y_n ln σ(φ^T(x_n)θ) + (1 − y_n) ln(1 − σ(φ^T(x_n)θ)) ] − (1/2) θ^T A θ,
where
σ(z) = 1/(1 + exp(−z)).
Solution: Define
t = σ(z).
Then we have
∂t/∂z = ∂/∂z [ 1/(1 + exp(−z)) ] = σ(z)(1 − σ(z)) = t(1 − t).
Now let
J_n(t_n) = y_n ln t_n + (1 − y_n) ln(1 − t_n), with t_n = σ(φ^T(x_n)θ). (18)
We have ∂J_n/∂t_n = y_n/t_n − (1 − y_n)/(1 − t_n) = (y_n − t_n)/(t_n(1 − t_n)), so by the chain rule,
∇J(θ) = Σ_{n=1}^{N} (y_n − t_n) φ(x_n) − Aθ
= Σ_{n=1}^{N} ( y_n − σ(φ^T(x_n)θ) ) φ(x_n) − Aθ.
Differentiating once more, and since ∇_θ t_n = t_n(1 − t_n) φ(x_n), the Hessian is
∇²J(θ) = −Σ_{n=1}^{N} σ(φ^T(x_n)θ)( 1 − σ(φ^T(x_n)θ) ) φ(x_n) φ^T(x_n) − A.
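The gradient and Hessian of this cost can be validated against finite differences of J(θ) on synthetic data (all names below are mine):

```python
import numpy as np

# Finite-difference check of the regularized logistic-regression gradient
# and Hessian, on random synthetic data.
rng = np.random.default_rng(6)
N, d = 20, 3
Phi = rng.standard_normal((N, d))            # rows are phi(x_n)^T
y = rng.integers(0, 2, size=N).astype(float)
A = np.eye(d)
theta = rng.standard_normal(d)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def J(th):
    t = sigma(Phi @ th)
    return np.sum(y * np.log(t) + (1 - y) * np.log(1 - t)) - 0.5 * th @ A @ th

t = sigma(Phi @ theta)
grad = Phi.T @ (y - t) - A @ theta                    # sum_n (y_n - t_n) phi_n - A theta
hess = -(Phi.T * (t * (1 - t))) @ Phi - A             # -sum_n t_n(1-t_n) phi_n phi_n^T - A

eps = 1e-5
grad_fd = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
print(np.max(np.abs(grad - grad_fd)))                 # small
```

Note that the Hessian is negative definite for any θ (each weight t_n(1 − t_n) > 0 and A is positive definite), so J is strictly concave.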
13.10. Show that the marginal of a Gaussian pdf with a gamma prior on the precision (inverse variance), after integrating out the precision, is the student's-t pdf, given by
st(x|µ, λ, ν) = [ Γ((ν+1)/2) / Γ(ν/2) ] ( λ/(πν) )^{1/2} [ 1 + λ(x−µ)²/ν ]^{−(ν+1)/2}. (20)
Solution: From the text in the chapter, for the one-dimensional case the marginal is
p(x) = ∫_0^∞ N(x|µ, α^{−1}) Gamma(α|a, b) dα, (21)
or
p(x) = ( b^a / Γ(a) ) (1/√(2π)) ∫_0^∞ α^{a−1+1/2} exp{ −( b + (x−µ)²/2 ) α } dα. (22)
Note that the quantity under the integral is an (unnormalized) gamma distribution with parameters a + 1/2 and b + (x−µ)²/2, so the integral equals Γ(a + 1/2) / ( b + (x−µ)²/2 )^{a+1/2}. Setting ν = 2a and λ = a/b and rearranging yields Eq. (20).
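The marginalization can also be verified numerically: integrate the Gaussian-gamma product and compare with scipy's student's-t, noting that st(x|µ, λ, ν) corresponds to scipy's t with df = ν, loc = µ, scale = 1/√λ (parameter values below are mine):

```python
import numpy as np
from scipy import integrate, stats

# Numeric check of Eq. (21): the marginal of N(x | mu, 1/alpha) under
# alpha ~ Gamma(a, b) equals student's-t with nu = 2a, lam = a/b.
mu, a, b = 0.3, 2.0, 1.5
nu, lam = 2 * a, a / b

def marginal(x):
    f = lambda al: (stats.norm.pdf(x, mu, np.sqrt(1 / al))
                    * stats.gamma.pdf(al, a, scale=1 / b))
    val, _ = integrate.quad(f, 0, np.inf)
    return val

xs = np.array([-2.0, 0.0, 1.0, 3.0])
num = np.array([marginal(x) for x in xs])
ref = stats.t.pdf(xs, df=nu, loc=mu, scale=np.sqrt(1 / lam))
print(np.max(np.abs(num - ref)))   # essentially zero
```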
13.11. Derive the pair of recursions (13.62)-(13.63).
Solution: Our starting point is Eq. (13.60),
L(α, β) = −(N/2) ln(2π) − (1/2) ln |β^{−1}I + ΦA^{−1}Φ^T| − (1/2) y^T ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} y.
By the matrix inversion lemma,
D := ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} = βI − βΦ( A + βΦ^TΦ )^{−1}Φ^Tβ. (27)
Thus, with Σ = ( A + βΦ^TΦ )^{−1} and µ = βΣΦ^Ty,
E := y^T ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} y = βy^Ty − βy^TΦΣΦ^Tβy = βy^T(y − Φµ),
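Both Eq. (27) and the identity for E can be confirmed numerically; here A is taken diagonal, as suggested by the parameterization L(α, β), with values of my choosing:

```python
import numpy as np

# Numeric check of the matrix inversion lemma step, Eq. (27), and of
# E = beta * y^T (y - Phi mu) with Sigma = (A + beta Phi^T Phi)^-1 and
# mu = beta * Sigma Phi^T y.
rng = np.random.default_rng(7)
N, K = 8, 4
Phi = rng.standard_normal((N, K))
A = np.diag(rng.uniform(0.5, 2.0, size=K))   # diagonal A = diag(alpha_k)
beta = 1.7
y = rng.standard_normal(N)

direct = np.linalg.inv(np.eye(N) / beta + Phi @ np.linalg.inv(A) @ Phi.T)
Sigma = np.linalg.inv(A + beta * Phi.T @ Phi)
woodbury = beta * np.eye(N) - beta * Phi @ Sigma @ Phi.T * beta

mu = beta * Sigma @ Phi.T @ y
E_direct = y @ direct @ y
E_short = beta * y @ (y - Phi @ mu)
print(np.max(np.abs(direct - woodbury)), E_direct - E_short)
```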