Solutions To Problems of Chapter 13
13.1. Show Eq. (13.5).
Solution: The functional F(q) is defined as
F(q) = ∫ q(X_l, θ) ln [ p(X, X_l, θ) / q(X_l, θ) ] dX_l dθ,
where X_l denotes the set of latent variables.
13.2. Show equation (13.38).
Solution: From Eq. (13.37) in the text we have
ln q_α^(j+1)(α) = E_{q_θ^(j+1)}[ln p(θ|α) + ln p(α)] + constant
= Σ_{k=0}^{K−1} [ (1/2) ln α_k − (1/2) α_k E_{q_θ^(j+1)}[θ_k²] + (a−1) ln α_k − b α_k ] + constant.
Exponentiating,
q_α^(j+1)(α) ∝ ∏_{k=0}^{K−1} α_k^{a−1+1/2} exp{ −( b + (1/2) E_{q_θ^(j+1)}[θ_k²] ) α_k },
that is, a product of gamma pdfs with parameters ã = a + 1/2 and b̃_k = b + (1/2) E_{q_θ^(j+1)}[θ_k²], which is Eq. (13.38).
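As an illustration, the update above amounts to one line of code per parameter; the function below is a sketch with names of my own choosing, fed with hypothetical second moments E[θ_k²]:

```python
import numpy as np

# Sketch of the alpha-update of Eq. (13.38): each factor q(alpha_k) is a
# gamma pdf with a_tilde = a + 1/2 and b_tilde_k = b + 0.5*E[theta_k^2].
# Function name and the example second moments are hypothetical.
def update_q_alpha(a, b, e_theta_sq):
    """Return the gamma parameters (a_tilde, b_tilde) of q(alpha)."""
    e_theta_sq = np.asarray(e_theta_sq, dtype=float)
    return a + 0.5, b + 0.5 * e_theta_sq

a_t, b_t = update_q_alpha(a=1e-3, b=1e-3, e_theta_sq=[0.5, 2.0, 10.0])
print(a_t, b_t)          # gamma parameters per coefficient
print(a_t / b_t)         # E[alpha_k] = a_tilde / b_tilde_k
```

Note that the posterior mean E[α_k] = ã/b̃_k shrinks as the second moment of θ_k grows, which is the mechanism behind the sparsity-promoting behavior of this prior.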
13.3. Show equations (13.43)-(13.45).
Solution: From the text we have
ln q_β^(j+1)(β) = E_{q_θ^(j+1)}[ln p(y|θ, β) + ln p(β)] + constant.
Working as in Problem 13.2, collecting the coefficients of ln β and of β identifies a gamma pdf, which leads to Eqs. (13.43)-(13.45).
13.4. Show that if
p(x) ∝ 1/x,
then the random variable z = ln x follows a uniform distribution.
Solution: We know that for a monotone transformation z = g(x), the pdf transforms as p_z(z) = p_x(x) |dx/dz|. Here x = e^z, so dx/dz = e^z and, with p_x(x) = c/x over the support of x,
p_z(z) = (c/e^z) e^z = c,
which is constant; hence z = ln x follows a uniform distribution over the image of the support of x.
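This can also be checked by simulation; the snippet below is my own illustration, truncating the support to [1, M] so that the density is normalizable (the inverse CDF is then F^{−1}(u) = M^u):

```python
import numpy as np

# Monte Carlo check (illustration only): draw x with density proportional
# to 1/x on [1, M] via inverse-CDF sampling (F(x) = ln x / ln M, so
# x = M**u for u ~ U(0,1)), then verify z = ln x looks uniform on [0, ln M].
rng = np.random.default_rng(0)
M = 100.0
u = rng.uniform(size=200_000)
x = M ** u                 # x ~ p(x) proportional to 1/x on [1, M]
z = np.log(x)              # should be uniform on [0, ln M]

# Uniformity check: histogram counts should be roughly equal across bins.
counts, _ = np.histogram(z, bins=10, range=(0.0, np.log(M)))
print(counts.min() / counts.max())   # close to 1 for a uniform sample
```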
13.5. Derive the lower bound after convergence of the variational Bayesian EM
for the linear regression task which is modeled as in Section 13.3.
Solution: The lower bound after convergence to q̃_θ(θ), q̃_α(α), q̃_β(β), which are defined by µ̃_θ, Σ̃_θ for the Gaussian q̃_θ, by (ã, b̃_k), k = 0, 1, ..., K−1, for the gamma q̃_α(α) and by (c̃, d̃) for the gamma q̃_β(β), will be the sum of the expectations (under the converged factors) of the log-joint terms minus the corresponding entropy terms. The required log-pdfs include:
(c) ln p(α) = −K ln Γ(a) + K a ln b + (a−1) Σ_{k=0}^{K−1} ln α_k − b Σ_{k=0}^{K−1} α_k.
(d) ln p(β) = −ln Γ(c) + c ln d + (c−1) ln β − d β.
Using identities from the Appendix of the chapter and the independence among q̃_θ, q̃_α, q̃_β, we get:
(a)
A1 := E[ln p(θ|α)] = (1/2) Σ_{k=0}^{K−1} E_α[ln α_k] − (K/2) ln(2π) − (1/2) Σ_{k=0}^{K−1} E_α[α_k] E_θ[θ_k²]
= (1/2) Σ_{k=0}^{K−1} [ψ(ã) − ln b̃_k] − (K/2) ln(2π) − (1/2) Σ_{k=0}^{K−1} (ã/b̃_k) [Σ̃_θ + µ̃_θ µ̃_θ^T]_{kk}.
(d)
A4 := E_β[ln p(β)] = −ln Γ(c) + c ln d + (c−1) E_β[ln β] − d E_β[β]
= −ln Γ(c) + c ln d + (c−1)(ψ(c̃) − ln d̃) − d c̃/d̃.
In the sequel the respective entropies have to be computed:
(a)
E_θ[ln q̃_θ(θ)] = −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) − (1/2) E_θ[(θ − µ̃_θ)^T Σ̃_θ^{−1} (θ − µ̃_θ)]
= −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) + (1/2) µ̃_θ^T Σ̃_θ^{−1} µ̃_θ − (1/2) Trace{I + Σ̃_θ^{−1} µ̃_θ µ̃_θ^T},
where Eq. 12.48 from the text has been used.
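Since Trace{I + Σ̃_θ^{−1} µ̃_θ µ̃_θ^T} = K + µ̃_θ^T Σ̃_θ^{−1} µ̃_θ, the expression simplifies to −(1/2) ln |Σ̃_θ| − (K/2) ln(2π) − K/2, i.e. minus the differential entropy of the Gaussian. A quick numeric check, with arbitrary parameters of my choosing:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Check of the Gaussian term: E[ln q(theta)] for q = N(mu, Sigma) in K
# dimensions equals -0.5*ln|Sigma| - (K/2)*ln(2*pi) - K/2, i.e. minus the
# differential entropy of the Gaussian.
K = 3
rng = np.random.default_rng(1)
A = rng.standard_normal((K, K))
Sigma = A @ A.T + K * np.eye(K)      # arbitrary SPD covariance
mu = rng.standard_normal(K)

_, logdet = np.linalg.slogdet(Sigma)
e_ln_q = -0.5 * logdet - 0.5 * K * np.log(2 * np.pi) - 0.5 * K
print(e_ln_q, -multivariate_normal(mu, Sigma).entropy())
```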
(b)
E_α[ln q̃_α(α)] = Σ_{k=0}^{K−1} [ ã ln b̃_k − ln Γ(ã) + (ã−1)(ψ(ã) − ln b̃_k) − ã ].
13.6. Consider the Gaussian mixture model
p(x) = Σ_{k=1}^{K} P_k N(x|µ_k, Q_k^{−1}),
with priors
p(µ_k) = N(µ_k|0, β^{−1}I), (10)
and
p(Q_k) = W(Q_k|ν_0, W_0).
Given the set of observations X = {x_1, ..., x_N}, x ∈ R^l, derive the respective variational Bayesian EM algorithm, using the mean field approximation for the involved posterior pdfs. Consider P_k, k = 1, 2, ..., K, as deterministic parameters and optimize the respective lower bound of the evidence with respect to the P_k's.
Solution: Consider the mean field factorization
q(Z, µ_{1:K}, Q_{1:K}) = q(Z) q(µ_{1:K}) q(Q_{1:K}),
where the notation has been introduced in Section 13.4. From the theory we have:
Step 1a:
ln q_z^(j+1)(Z) = E_{q_µ^(j) q_Q^(j)}[ln p(X, Z, µ_{1:K}, Q_{1:K})] + constant
= E_{q_µ^(j) q_Q^(j)}[ln p(X|Z, µ_{1:K}, Q_{1:K})] + Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln P_k^(j) + constant.
Hence, we have that
ln q_z^(j+1)(Z) = Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk { ln P_k^(j) + (1/2) E_{q_Q^(j)}[ln |Q_k|] − (1/2) E_{q_µ^(j) q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] } + constant,
or, if we set
π_nk = P_k^(j) exp{ (1/2) E_{q_Q^(j)}[ln |Q_k|] − (1/2) E_{q_µ^(j) q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] },
then q_z^(j+1)(Z) ∝ ∏_{n=1}^{N} ∏_{k=1}^{K} π_nk^{z_nk}. Normalizing the π_nk over k turns them into probabilities, hence
ρ_nk = π_nk / Σ_{k=1}^{K} π_nk.
Also note that E_{q_z^(j+1)}[z_nk] = ρ_nk, by the binary nature of z_nk.
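In code, the responsibility computation above is a normalization over k, best done in the log domain; the function below is a sketch with names of my own choosing:

```python
import numpy as np

# Sketch of the E-step responsibility computation: given
# log pi_nk = ln P_k + 0.5*E[ln|Q_k|] - 0.5*E[(x_n-mu_k)^T Q_k (x_n-mu_k)],
# normalize over k in the log domain for numerical stability.
def responsibilities(log_pi):
    """log_pi: (N, K) array of ln pi_nk; returns rho with rows summing to 1."""
    log_pi = np.asarray(log_pi, dtype=float)
    m = log_pi.max(axis=1, keepdims=True)       # log-sum-exp trick
    p = np.exp(log_pi - m)
    return p / p.sum(axis=1, keepdims=True)

rho = responsibilities([[ -1.0, -2.0, -3.0],
                        [-10.0, -1.0, -1.0]])
print(rho.sum(axis=1))    # each row sums to 1
```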
Step 1b:
ln q_µ^(j+1)(µ_{1:K}) = E_{q_z^(j+1) q_Q^(j)}[ln p(X|Z, µ_{1:K}, Q_{1:K}) + ln p(µ_{1:K})] + constants
= Σ_{k=1}^{K} { −(1/2) Σ_{n=1}^{N} ρ_nk E_{q_Q^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] − (1/2) β µ_k^T µ_k } + constants,
or, keeping only the terms that depend on µ_k,
ln q_µ^(j+1)(µ_{1:K}) = Σ_{k=1}^{K} { −(1/2) µ_k^T Q̃_k µ_k + µ_k^T E_{q_Q^(j)}[Q_k] Σ_{n=1}^{N} ρ_nk x_n } + constants,
where
Q̃_k = βI + ( Σ_{n=1}^{N} ρ_nk ) E_{q_Q^(j)}[Q_k],
and hence q_µ^(j+1)(µ_{1:K}) = ∏_{k=1}^{K} N(µ_k | µ̃_k, Q̃_k^{−1}), with
µ̃_k = Q̃_k^{−1} E_{q_Q^(j)}[Q_k] Σ_{n=1}^{N} ρ_nk x_n.
Step 1c:
ln q_Q^(j+1)(Q_{1:K}) = E_{q_z^(j+1) q_µ^(j+1)}[ln p(X|Z, µ_{1:K}, Q_{1:K}) + ln p(Q_{1:K})] + constants
= Σ_{k=1}^{K} { (1/2) ( Σ_{n=1}^{N} ρ_nk ) ln |Q_k| − (1/2) trace{ Q_k Σ_{n=1}^{N} ρ_nk ( x_n x_n^T − µ̃_k x_n^T − x_n µ̃_k^T + E_{q_µ^(j+1)}[µ_k µ_k^T] ) } + ((ν_0 − l − 1)/2) ln |Q_k| − (1/2) trace{W_0^{−1} Q_k} } + constants,
which is recognized as a product of Wishart pdfs,
q_Q^(j+1)(Q_{1:K}) = ∏_{k=1}^{K} W(Q_k | ν̃_k, W̃_k),
where
ν̃_k = ν_0 + Σ_{n=1}^{N} ρ_nk,
W̃_k^{−1} = W_0^{−1} + Σ_{n=1}^{N} ρ_nk ( x_n x_n^T − µ̃_k x_n^T − x_n µ̃_k^T + E_{q_µ^(j+1)}[µ_k µ_k^T] ).
Moreover,
E_{q_Q^(j+1)}[Q_k] = ν̃_k W̃_k,
E_{q_Q^(j+1)}[ln |Q_k|] = Σ_{i=1}^{l} ψ( (ν̃_k + 1 − i)/2 ) + l ln 2 + ln |W̃_k|,
where ψ(·) is the digamma function defined in the text.
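The quoted identity for E[ln |Q_k|] can be checked by simulation, with made-up values of l, ν and W:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import digamma

# Monte Carlo check of the Wishart identity:
# E[ln|Q|] = sum_{i=1..l} psi((nu + 1 - i)/2) + l*ln 2 + ln|W|
# for Q ~ W(nu, W). Parameters below are illustrative only.
l, nu = 3, 7.0
rng = np.random.default_rng(2)
A = rng.standard_normal((l, l))
W = A @ A.T + l * np.eye(l)

i = np.arange(1, l + 1)
analytic = digamma((nu + 1 - i) / 2).sum() + l * np.log(2) \
           + np.linalg.slogdet(W)[1]

samples = wishart.rvs(df=nu, scale=W, size=50_000, random_state=3)
mc = np.mean([np.linalg.slogdet(S)[1] for S in samples])
print(analytic, mc)   # should agree to a couple of decimals
```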
The lower bound depends on the P_k's only through Σ_{k=1}^{K} ( Σ_{n=1}^{N} ρ_nk ) ln P_k, which is to be maximized subject to the constraint
Σ_{k=1}^{K} P_k = 1. (12)
Thus, introducing a Lagrange multiplier λ,
∂/∂P_k [ Σ_{k=1}^{K} ln P_k ( Σ_{n=1}^{N} ρ_nk ) − λ Σ_{k=1}^{K} P_k ] = 0,
which gives P_k = (1/λ) Σ_{n=1}^{N} ρ_nk. Imposing the constraint yields λ = N, hence
P_k = (1/N) Σ_{n=1}^{N} ρ_nk.
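The resulting M-step for the weights is a one-liner; the sketch below (names mine) also confirms that the updated P_k's sum to one:

```python
import numpy as np

# M-step for the deterministic mixing weights derived above:
# P_k = (1/N) * sum_n rho_nk, which automatically satisfies sum_k P_k = 1
# because each row of rho sums to 1.
def update_mixing_weights(rho):
    """rho: (N, K) responsibilities with rows summing to 1."""
    return np.asarray(rho, dtype=float).mean(axis=0)

rho = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.5, 0.3],
                [0.1, 0.1, 0.8],
                [0.6, 0.2, 0.2]])
P = update_mixing_weights(rho)
print(P, P.sum())   # the P_k sum to 1
```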
13.7. Consider the Gaussian mixture model of Problem 13.6, with the following priors imposed on µ, Q, and P:
p(µ, Q) = p(µ|Q) p(Q) = ∏_{k=1}^{K} N(µ_k|0, (λQ_k)^{−1}) W(Q_k|ν_0, W_0),
that is, a Gaussian-Wishart product, and
p(P) = Dir(P|a) ∝ ∏_{k=1}^{K} P_k^{a−1},
i.e., a Dirichlet prior. That is, P is treated as a random vector. Derive the E algorithmic steps of the variational Bayesian approximation adopting the mean field approximation for the involved posterior pdfs. We have adopted the notation µ in place of µ_{1:K} and Q in place of Q_{1:K}, for notational simplicity.
Solution: If Z is the set of latent variables associated with the mixture indices, we have
q(Z, P, µ, Q) = q(Z) q(P) q(µ, Q).
Step 1a: We have that
ln q_z^(j+1)(Z) = E_{q_P^(j) q_{µ,Q}^(j)}[ln p(X, Z, P, µ, Q)] + constant
= Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk { E_{q_P^(j)}[ln P_k] + (1/2) E_{q_{µ,Q}^(j)}[ln |Q_k|] − (1/2) E_{q_{µ,Q}^(j)}[(x_n − µ_k)^T Q_k (x_n − µ_k)] } + constant, (14)
so that, setting π_nk equal to the exponential of the term in braces, we obtain as before E_{q_z^(j+1)}[z_nk] = ρ_nk, with
ρ_nk = π_nk / Σ_{k=1}^{K} π_nk. (16)
Step 1b:
ln q_P^(j+1)(P) = E_{q_z^(j+1)}[ Σ_{n=1}^{N} Σ_{k=1}^{K} z_nk ln P_k ] + Σ_{k=1}^{K} (a − 1) ln P_k + constants
= Σ_{k=1}^{K} ( Σ_{n=1}^{N} ρ_nk + a − 1 ) ln P_k + constants,
from which we obtain
q_P^(j+1)(P) ∝ ∏_{k=1}^{K} P_k^{a_k − 1},
that is, a Dirichlet distribution with parameters a_k = a + Σ_{n=1}^{N} ρ_nk.
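The q_P update above, together with the Dirichlet expectation E[ln P_k] = ψ(a_k) − ψ(Σ_j a_j) that the next Step 1a iteration needs, can be sketched as follows (names mine):

```python
import numpy as np
from scipy.special import digamma

# Sketch of the q(P) update: Dirichlet parameters a_k = a + sum_n rho_nk,
# followed by E[ln P_k] = psi(a_k) - psi(sum_j a_j) under the Dirichlet.
def update_q_P(a, rho):
    """a: scalar prior parameter; rho: (N, K) responsibilities."""
    return a + np.asarray(rho, dtype=float).sum(axis=0)

rho = np.array([[0.7, 0.3],
                [0.4, 0.6],
                [0.9, 0.1]])
a_k = update_q_P(a=1.0, rho=rho)
e_ln_P = digamma(a_k) - digamma(a_k.sum())   # E[ln P_k], fed back to Step 1a
print(a_k, e_ln_P)
```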
Step 1c:
ln q_{µ,Q}^(j+1)(µ, Q) = Σ_{n=1}^{N} Σ_{k=1}^{K} E_{q_z^(j+1)}[z_nk] { (1/2) ln |Q_k| − (1/2)(x_n − µ_k)^T Q_k (x_n − µ_k) }
+ (1/2) Σ_{k=1}^{K} ln(|λQ_k|) − (1/2) Σ_{k=1}^{K} µ_k^T (λQ_k) µ_k + ((ν_0 − l − 1)/2) Σ_{k=1}^{K} ln |Q_k| − (1/2) Σ_{k=1}^{K} trace{W_0^{−1} Q_k} + constants.
Expanding the quadratics, three groups of terms appear for each k: (A) the terms in ln |Q_k|, namely (1/2)(Σ_{n=1}^{N} ρ_nk) ln |Q_k|, (1/2) ln |Q_k| and ((ν_0 − l − 1)/2) ln |Q_k|; (B) the terms quadratic and linear in µ_k, namely −(1/2) µ_k^T ( λ + Σ_{n=1}^{N} ρ_nk ) Q_k µ_k + µ_k^T Q_k Σ_{n=1}^{N} ρ_nk x_n; and (C) the trace terms, −(1/2) trace{ (W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T) Q_k }.
Combining all the terms A, B, C respectively together we obtain
A: ( (ν_0 + 1 + Σ_{n=1}^{N} ρ_nk − l − 1)/2 ) ln |Q_k|,
B: −(1/2) µ_k^T ( λ̃_k Q_k ) µ_k + µ_k^T ( λ̃_k Q_k ) µ̂_k, where
λ̃_k = λ + Σ_{n=1}^{N} ρ_nk, µ̂_k = (1/λ̃_k) Σ_{n=1}^{N} ρ_nk x_n,
C: −(1/2) trace{ (W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T) Q_k }.
Completing the square in B and absorbing the resulting term (λ̃_k/2) µ̂_k^T Q_k µ̂_k into C, we recognize a Gaussian-Wishart product,
q_{µ,Q}^(j+1)(µ, Q) = ∏_{k=1}^{K} N(µ_k | µ̂_k, (λ̃_k Q_k)^{−1}) W(Q_k | ν̃_k, W̃_k),
where
ν̃_k = ν_0 + Σ_{n=1}^{N} ρ_nk + 1,
and
W̃_k^{−1} = W_0^{−1} + Σ_{n=1}^{N} ρ_nk x_n x_n^T − λ̃_k µ̂_k µ̂_k^T.
13.8. If µ and Q are distributed according to a Gaussian-Wishart product
p(µ, Q) = N(µ|µ̂, (λQ)^{−1}) W(Q|ν, W),
then compute the expectation
E[µ^T Q µ].
Solution: We have that
E[µ^T Q µ] = E[trace{Q µ µ^T}] = E_Q E_{µ|Q}[trace{Q µ µ^T}]
= E_Q[ trace{ Q ( µ̂ µ̂^T + (λQ)^{−1} ) } ] = µ̂^T E_Q[Q] µ̂ + l/λ = ν µ̂^T W µ̂ + l/λ,
where l is the dimensionality of µ and we used E_Q[Q] = νW for the Wishart pdf.
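A Monte Carlo check of this expectation, with illustrative parameter values of my choosing:

```python
import numpy as np
from scipy.stats import wishart

# Monte Carlo check of E[mu^T Q mu] = nu * mu_hat^T W mu_hat + l/lambda
# for (mu, Q) ~ N(mu | mu_hat, (lam*Q)^-1) W(Q | nu, W).
l, nu, lam = 2, 5.0, 2.0
rng = np.random.default_rng(4)
A = rng.standard_normal((l, l))
W = A @ A.T + l * np.eye(l)
mu_hat = np.array([1.0, -0.5])

analytic = nu * mu_hat @ W @ mu_hat + l / lam

vals = []
for Q in wishart.rvs(df=nu, scale=W, size=20_000, random_state=5):
    mu = rng.multivariate_normal(mu_hat, np.linalg.inv(lam * Q))
    vals.append(mu @ Q @ mu)
mc = np.mean(vals)
print(analytic, mc)
```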
13.9. Derive the Hessian matrix w.r.t. θ of the cost function
J(θ) = Σ_{n=1}^{N} [ y_n ln σ(φ^T(x_n)θ) + (1 − y_n) ln(1 − σ(φ^T(x_n)θ)) ] − (1/2) θ^T A θ,
where
σ(z) = 1/(1 + exp(−z)).
Solution: Define
t = σ(z).
Then we have
∂t/∂z = ∂/∂z [ 1/(1 + exp(−z)) ] = σ(z)(1 − σ(z)) = t(1 − t).
Now let
J_n(t_n) = y_n ln t_n + (1 − y_n) ln(1 − t_n), with t_n = σ(φ^T(x_n)θ). (18)
We have ∂J_n/∂t_n = y_n/t_n − (1 − y_n)/(1 − t_n) = (y_n − t_n)/(t_n(1 − t_n)), so by the chain rule,
∇J(θ) = Σ_{n=1}^{N} (y_n − t_n) φ(x_n) − Aθ
= Σ_{n=1}^{N} ( y_n − σ(φ^T(x_n)θ) ) φ(x_n) − Aθ.
Differentiating once more, and since ∇_θ t_n = t_n(1 − t_n) φ(x_n), the Hessian is
∇²J(θ) = −Σ_{n=1}^{N} σ(φ^T(x_n)θ)( 1 − σ(φ^T(x_n)θ) ) φ(x_n) φ^T(x_n) − A.
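The gradient and Hessian of this cost can be validated against finite differences of J(θ) on synthetic data (all names below are mine):

```python
import numpy as np

# Finite-difference check of the regularized logistic-regression gradient
# and Hessian, on random synthetic data.
rng = np.random.default_rng(6)
N, d = 20, 3
Phi = rng.standard_normal((N, d))            # rows are phi(x_n)^T
y = rng.integers(0, 2, size=N).astype(float)
A = np.eye(d)
theta = rng.standard_normal(d)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))

def J(th):
    t = sigma(Phi @ th)
    return np.sum(y * np.log(t) + (1 - y) * np.log(1 - t)) - 0.5 * th @ A @ th

t = sigma(Phi @ theta)
grad = Phi.T @ (y - t) - A @ theta                    # sum_n (y_n - t_n) phi_n - A theta
hess = -(Phi.T * (t * (1 - t))) @ Phi - A             # -sum_n t_n(1-t_n) phi_n phi_n^T - A

eps = 1e-5
grad_fd = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(d)])
print(np.max(np.abs(grad - grad_fd)))                 # small
```

Note that the Hessian is negative definite for any θ (each weight t_n(1 − t_n) > 0 and A is positive definite), so J is strictly concave.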
13.10. Show that the marginal of a Gaussian pdf with a gamma prior on the precision (inverse variance), after integrating out the precision, is the student's-t pdf, given by
st(x|µ, λ, ν) = [ Γ((ν+1)/2) / Γ(ν/2) ] ( λ/(πν) )^{1/2} [ 1 + λ(x−µ)²/ν ]^{−(ν+1)/2}. (20)
Solution: From the text in the chapter, for the one-dimensional case the marginal is
p(x) = ∫_0^∞ N(x|µ, α^{−1}) Gamma(α|a, b) dα, (21)
or
p(x) = ( b^a / Γ(a) ) (1/√(2π)) ∫_0^∞ α^{a−1+1/2} exp{ −( b + (x−µ)²/2 ) α } dα. (22)
Note that the quantity under the integral is an (unnormalized) gamma distribution with parameters a + 1/2 and b + (x−µ)²/2, so the integral equals Γ(a + 1/2) / ( b + (x−µ)²/2 )^{a+1/2}. Setting ν = 2a and λ = a/b and rearranging yields Eq. (20).
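The marginalization can also be verified numerically: integrate the Gaussian-gamma product and compare with scipy's student's-t, noting that st(x|µ, λ, ν) corresponds to scipy's t with df = ν, loc = µ, scale = 1/√λ (parameter values below are mine):

```python
import numpy as np
from scipy import integrate, stats

# Numeric check of Eq. (21): the marginal of N(x | mu, 1/alpha) under
# alpha ~ Gamma(a, b) equals student's-t with nu = 2a, lam = a/b.
mu, a, b = 0.3, 2.0, 1.5
nu, lam = 2 * a, a / b

def marginal(x):
    f = lambda al: (stats.norm.pdf(x, mu, np.sqrt(1 / al))
                    * stats.gamma.pdf(al, a, scale=1 / b))
    val, _ = integrate.quad(f, 0, np.inf)
    return val

xs = np.array([-2.0, 0.0, 1.0, 3.0])
num = np.array([marginal(x) for x in xs])
ref = stats.t.pdf(xs, df=nu, loc=mu, scale=np.sqrt(1 / lam))
print(np.max(np.abs(num - ref)))   # essentially zero
```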
13.11. Derive the pair of recursions (13.62)-(13.63).
Solution: Our starting point is Eq. (13.60),
L(α, β) = −(N/2) ln(2π) − (1/2) ln |β^{−1}I + ΦA^{−1}Φ^T| − (1/2) y^T ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} y.
By the matrix inversion lemma,
D := ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} = βI − βΦ( A + βΦ^TΦ )^{−1}Φ^Tβ. (27)
Thus, with Σ = ( A + βΦ^TΦ )^{−1} and µ = βΣΦ^Ty,
E := y^T ( β^{−1}I + ΦA^{−1}Φ^T )^{−1} y = βy^Ty − βy^TΦΣΦ^Tβy = βy^T(y − Φµ),
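Both Eq. (27) and the identity for E can be confirmed numerically; here A is taken diagonal, as suggested by the parameterization L(α, β), with values of my choosing:

```python
import numpy as np

# Numeric check of the matrix inversion lemma step, Eq. (27), and of
# E = beta * y^T (y - Phi mu) with Sigma = (A + beta Phi^T Phi)^-1 and
# mu = beta * Sigma Phi^T y.
rng = np.random.default_rng(7)
N, K = 8, 4
Phi = rng.standard_normal((N, K))
A = np.diag(rng.uniform(0.5, 2.0, size=K))   # diagonal A = diag(alpha_k)
beta = 1.7
y = rng.standard_normal(N)

direct = np.linalg.inv(np.eye(N) / beta + Phi @ np.linalg.inv(A) @ Phi.T)
Sigma = np.linalg.inv(A + beta * Phi.T @ Phi)
woodbury = beta * np.eye(N) - beta * Phi @ Sigma @ Phi.T * beta

mu = beta * Sigma @ Phi.T @ y
E_direct = y @ direct @ y
E_short = beta * y @ (y - Phi @ mu)
print(np.max(np.abs(direct - woodbury)), E_direct - E_short)
```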