we get

β y^T (y − Φµ) = β ||y − Φµ||^2 + β µ^T Φ^T y − µ^T (Σ^{-1} − A) µ = β ||y − Φµ||^2 + µ^T A µ,

where we used the definitions Σ^{-1} = A + β Φ^T Φ and µ = β Σ Φ^T y, so that β µ^T Φ^T y = µ^T Σ^{-1} µ. Using the standard rule for the differentiation of log-determinants, we obtain

∂ ln |Σ^{-1}| / ∂α_k = (1/|Σ^{-1}|) ∂|Σ^{-1}| / ∂α_k = Σ_kk,

and also

∂(µ^T A µ) / ∂α_k = µ_k^2. (35)

Hence, the derivative of the log-evidence with respect to α_k becomes

−(1/2) Σ_kk + 1/(2α_k) − (1/2) µ_k^2.
Equating to zero and setting

γ_k := 1 − α_k Σ_kk,

we obtain the fixed-point update α_k = γ_k / µ_k^2.
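The resulting fixed-point iteration can be sketched numerically. The setup below (synthetic data, a known noise precision β, and the cap on α_k) is purely illustrative and not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sparse linear model: y = Phi @ theta + noise (illustrative setup).
N, K = 50, 5
Phi = rng.normal(size=(N, K))
theta_true = np.array([2.0, 0.0, -3.0, 0.0, 1.0])
beta = 25.0                                  # noise precision, assumed known here
y = Phi @ theta_true + rng.normal(scale=beta ** -0.5, size=N)

alpha = np.ones(K)
for _ in range(100):
    # Posterior of theta: Sigma^{-1} = A + beta Phi^T Phi, mu = beta Sigma Phi^T y
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    mu = beta * Sigma @ Phi.T @ y
    gamma = 1.0 - alpha * np.diag(Sigma)      # gamma_k = 1 - alpha_k Sigma_kk
    alpha = np.minimum(gamma / mu ** 2, 1e8)  # alpha_k = gamma_k / mu_k^2 (capped)

# Components with large true weights keep small alpha_k; near-zero components
# tend to be driven toward large alpha_k (i.e., pruned).
print(alpha, mu)
```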
13.12. Consider a two-class classification task and assume that the feature vectors in each one of the two classes, ω1, ω2, are distributed according to the Gaussian pdf. Both classes share the same covariance matrix Σ, and the mean values are µ1 and µ2, respectively. Prove that, given an observed feature vector, x ∈ R^l, the posterior probability for deciding in favor of one of the classes is given by the logistic function, i.e.,

P(ω2|x) = 1 / (1 + exp(−θ^T x + θ0)),

where

θ := Σ^{-1}(µ2 − µ1),

and

θ0 = (1/2)(µ2 − µ1)^T Σ^{-1}(µ2 + µ1) + ln(P(ω1)/P(ω2)).
Solution: We have that

p(x|ωi) = (1/((2π)^{l/2} |Σ|^{1/2})) exp( −(1/2)(x − µi)^T Σ^{-1}(x − µi) ), i = 1, 2.

By the Bayes rule,

P(ω2|x) = P(ω2)p(x|ω2) / ( P(ω1)p(x|ω1) + P(ω2)p(x|ω2) ) = 1 / ( 1 + (P(ω1)p(x|ω1)) / (P(ω2)p(x|ω2)) ).

In the likelihood ratio the quadratic terms x^T Σ^{-1} x cancel out, so that

(P(ω1)p(x|ω1)) / (P(ω2)p(x|ω2)) = exp( −(µ2 − µ1)^T Σ^{-1} x + (1/2)(µ2 − µ1)^T Σ^{-1}(µ2 + µ1) + ln(P(ω1)/P(ω2)) ) = exp(−θ^T x + θ0),

which proves the claim.
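The claimed identity is easy to verify numerically; all parameter values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters: any SPD shared covariance and any priors work.
l = 3
mu1 = np.array([0.0, 1.0, -1.0])
mu2 = np.array([2.0, -0.5, 0.5])
M = rng.normal(size=(l, l))
Sigma = M @ M.T + l * np.eye(l)
P1, P2 = 0.3, 0.7
Sinv = np.linalg.inv(Sigma)

def gauss(x, mu):
    # Gaussian pdf with the shared covariance Sigma
    d = x - mu
    norm = (2 * np.pi) ** (l / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * d @ Sinv @ d) / norm

theta = Sinv @ (mu2 - mu1)
theta0 = 0.5 * (mu2 - mu1) @ Sinv @ (mu2 + mu1) + np.log(P1 / P2)

x = rng.normal(size=l)
post_bayes = P2 * gauss(x, mu2) / (P1 * gauss(x, mu1) + P2 * gauss(x, mu2))
post_logistic = 1.0 / (1.0 + np.exp(-theta @ x + theta0))
print(post_bayes, post_logistic)   # the two values coincide
```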
13.13. Derive equation (13.74).
Solution: Our starting point is the cost
J := Σ_{n=1}^{N} [ y_n ln σ(θ^T φ_n) + (1 − y_n) ln( 1 − σ(θ^T φ_n) ) ] − (1/2) θ^T A θ, (40)
where φ_n := φ(x_n). Taking the gradient with respect to θ, we get

∇J = Σ_{n=1}^{N} [ (y_n / σ(θ^T φ_n)) σ′(θ^T φ_n) φ_n − ((1 − y_n)/(1 − σ(θ^T φ_n))) σ′(θ^T φ_n) φ_n ] − Aθ,

and, since σ′(t) = σ(t)(1 − σ(t)),

∇J = Σ_{n=1}^{N} ( y_n − σ(θ^T φ_n) ) φ_n − Aθ.
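The closed-form gradient of (40) can be checked against finite differences; Phi, y, and A below are made-up illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# Made-up data; A is a positive-definite (diagonal) regularization matrix.
N, K = 20, 4
Phi = rng.normal(size=(N, K))                # rows are phi_n^T
y = rng.integers(0, 2, size=N).astype(float)
A = np.diag(rng.uniform(0.5, 2.0, size=K))
theta = rng.normal(size=K)

def J(th):
    s = sigma(Phi @ th)
    return np.sum(y * np.log(s) + (1 - y) * np.log(1 - s)) - 0.5 * th @ A @ th

# Closed-form gradient: sum_n (y_n - sigma(theta^T phi_n)) phi_n - A theta
grad = Phi.T @ (y - sigma(Phi @ theta)) - A @ theta

# Central finite differences agree up to discretization error
eps = 1e-6
fd = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
               for e in np.eye(K)])
print(np.max(np.abs(grad - fd)))
```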
13.14. Show Equation (13.75).
Solution: By the respective definition we have that

σ(t) = 1 / (1 + exp(−t)),

hence, setting s_n := σ(θ^T φ(x_n)),

a_n := y_n ln s_n + (1 − y_n) ln(1 − s_n).
Then, after some simple algebra and taking into account well-known differentiation rules concerning the logarithm, we readily obtain that

∂a_n/∂θ = y_n φ(x_n) − s_n φ(x_n) = (y_n − s_n) φ(x_n),
as well as that

ln( P(y|θ) p(θ|α) ) = Σ_{n=1}^{N} [ y_n ln σ(θ^T φ(x_n)) + (1 − y_n) ln( 1 − σ(θ^T φ(x_n)) ) ] − (1/2) θ^T A θ + const.
13.15. Derive the recursion (13.77).
Solution: Taking the logarithm of P(y|α) in (13.75) and keeping only the terms which depend on α, we have,

ln P(y|α) = ln p(θ̂_MAP|α) − (1/2) ln |Σ^{-1}| + const.
13.16. Show that if f is a convex function f : R^l → R, then it is equal to the conjugate of its conjugate, i.e., (f*)* = f.
Solution: Recall from the theory that

f*(ξ) = ξ^T x* − f(x*), (45)

where x* is the point attaining the supremum in f*(ξ) = sup_x ( ξ^T x − f(x) ), so that

ξ = ∇f(x*). (46)

Combining (45) and (46), we get

x^T ξ − f*(ξ) = f(x*) + (x − x*)^T ∇f(x*). (47)

By the convexity of f, the right-hand side of (47) defines a supporting hyperplane and is therefore upper bounded by f(x), with equality attained for ξ = ∇f(x); hence (f*)*(x) = sup_ξ ( x^T ξ − f*(ξ) ) = f(x).
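The relation (f*)* = f can be illustrated by computing the conjugate twice by brute force over a grid; f(x) = x² below is just an example convex function:

```python
import numpy as np

# Brute-force conjugation over a grid; f(x) = x^2 is just an example.
xs = np.linspace(-5, 5, 2001)
f = xs ** 2

def conjugate(vals):
    # g*(xi) = sup_x (xi * x - g(x)), the sup taken over the grid
    return np.max(np.outer(xs, xs) - vals[None, :], axis=1)

f_star = conjugate(f)          # ~ xi^2 / 4
f_bistar = conjugate(f_star)   # recovers f where the sup is attained inside the grid

interior = np.abs(xs) < 2
err = np.max(np.abs(f_bistar[interior] - f[interior]))
print(err)   # small (grid-resolution error only)
```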
13.17. Prove that

f(x) = ln(λ/2) − λ√x, x ≥ 0,

is a convex function.

Solution: It is known from the theory of convex functions that if d²f(x)/dx² ≥ 0, then f(x) is convex ([Boyd 04]). Hence, since

df/dx = −(λ/2) x^{−1/2}, d²f/dx² = (λ/4) x^{−3/2} ≥ 0, x > 0,

f(x) is convex.
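A quick numerical sanity check of convexity via the midpoint inequality (the value of λ is arbitrary):

```python
import math
import random

lam = 1.5                                   # any lambda > 0
f = lambda x: math.log(lam / 2) - lam * math.sqrt(x)

# Midpoint convexity: f((a + b) / 2) <= (f(a) + f(b)) / 2 on the domain x > 0
random.seed(0)
pairs = [(random.uniform(0.01, 10), random.uniform(0.01, 10)) for _ in range(1000)]
ok = all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-12 for a, b in pairs)
print(ok)   # True
```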
13.18. Derive variational bounds for the logistic regression function

σ(x) = 1 / (1 + e^{−x}),

one of them in terms of a Gaussian function. For the latter case, use the transformation t = √x.
Solution: Since ln σ(x) is concave, a conjugate (upper) bound is obtained from g*(ξ) = inf_x ( ξx − ln σ(x) ); taking the derivative we obtain

x*: ξ − e^{−x}/(1 + e^{−x}) = 0 ⇒ e^{−x*} = ξ/(1 − ξ) ⇒ x* = ln((1 − ξ)/ξ), 0 < ξ < 1.

Then σ(x*) = 1 − ξ, and g*(ξ) = ξ ln((1 − ξ)/ξ) − ln(1 − ξ) = −ξ ln ξ − (1 − ξ) ln(1 − ξ), i.e., the binary entropy H(ξ); hence the first bound reads

σ(x) ≤ exp( ξx − H(ξ) ), 0 < ξ < 1.

For the Gaussian-type bound, note that

ln σ(x) = ln( e^{x/2} / (e^{x/2} + e^{−x/2}) ) = x/2 − ln( exp(x/2) + exp(−x/2) ). (48)
Let us now define

f(x) := −ln( exp(√x/2) + exp(−√x/2) ), x ≥ 0. (49)

We will first show that f(x) is a convex function. To this end, we will prove that the second derivative is nonnegative in the respective domain. Differentiating,

df/dx = −tanh(√x/2) / (4√x),

and

d²f/dx² = (1/(8x)) [ tanh(√x/2)/√x − (1/2)( 1 − tanh²(√x/2) ) ],
where we have used the chain differentiation rule as well as the property dtanh(y)/dy = 1 − tanh²(y). Setting y := √x/2 and multiplying the term in brackets by cosh²(y), nonnegativity of d²f/dx² is seen to be equivalent to sinh(y)cosh(y) ≥ y; that is, we have to show that

sinh(2y)/(2y) ≥ 1.

However, this is always true for y ≥ 0, as it follows from the known from analysis expansion

sinh(2y) = 2y + (2y)³/3! + (2y)⁵/5! + ··· .
Thus, we can now write that

f*(ξ) ≥ ξx − f(x),

or

f(x) ≥ ξx − f*(ξ).

Hence, applying the bound at x = t² and recalling (48), (49), we obtain

ln σ(t) ≥ t/2 + ξt² − f*(ξ),

that is, a lower bound whose exponential is of a Gaussian (quadratic-exponent) form in t.
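The Gaussian-type lower bound implied by the convexity of f in (49) can be written, with a tangent point ζ > 0, in the classical form σ(t) ≥ σ(ζ) exp( (t − ζ)/2 − λ(ζ)(t² − ζ²) ), where λ(ζ) := tanh(ζ/2)/(4ζ). A quick numerical check (the grid and the value of ζ are illustrative):

```python
import math

sigma = lambda t: 1.0 / (1.0 + math.exp(-t))
lam = lambda z: math.tanh(z / 2) / (4 * z)

def bound(t, z):
    # Gaussian-type variational lower bound on sigma(t), tight at t = +/- z
    return sigma(z) * math.exp((t - z) / 2 - lam(z) * (t * t - z * z))

z = 1.7                                     # illustrative tangent point
ts = [i / 10 - 5 for i in range(101)]       # grid on [-5, 5]
ok = all(sigma(t) >= bound(t, z) - 1e-12 for t in ts)
tight = abs(sigma(z) - bound(z, z)) + abs(sigma(-z) - bound(-z, z))
print(ok, tight)   # True, ~0
```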
13.19. Prove equation (13.100).
Solution: We know that

Q(ξ, β; ξ^{(j)}, β^{(j)}) = (N/2) ln β − (N/2) ln(2π) − (β/2) E[ ||y − Φθ||² ] + Σ_{k=0}^{K−1} ln φ(ξ_k) − (K/2) ln(2π) − (1/2) ln |Ξ| − (1/2) Σ_{k=0}^{K−1} E[θ_k²]/ξ_k,

where Ξ := diag(ξ_0, ..., ξ_{K−1}). Taking the derivative with respect to ξ_k term by term, we have

• ∂/∂ξ_k ( Σ_{k=0}^{K−1} ln φ(ξ_k) ) = φ′(ξ_k)/φ(ξ_k) = −λ²/2, since φ(ξ_k) = (λ²/2) exp(−λ² ξ_k / 2);

• ∂/∂ξ_k ( −(1/2) ln |Ξ| ) = −(1/2) ξ_k^{−1};

• ∂/∂ξ_k ( −(1/2) Σ_{k=0}^{K−1} E[θ_k²]/ξ_k ) = (1/2) E[θ_k²]/ξ_k².

Equating the sum of the three derivatives to zero and multiplying by −2ξ_k² results in the quadratic λ²ξ_k² + ξ_k − E[θ_k²] = 0, whose positive root provides the update in (13.100).
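Assuming a prior of the form φ(ξ_k) ∝ exp(−λ²ξ_k/2), the stationarity of the ξ_k-dependent part of Q can be verified numerically; the values of λ² and E[θ_k²] below are made up:

```python
import math

# Assumed prior: phi(xi) = (lam2 / 2) * exp(-lam2 * xi / 2); illustrative values.
lam2 = 2.0        # lambda^2
E2 = 0.8          # E[theta_k^2] from the current E-step

# xi_k-dependent part of Q
g = lambda xi: -lam2 * xi / 2 - 0.5 * math.log(xi) - E2 / (2 * xi)

# Positive root of lam2 * xi^2 + xi - E2 = 0
xi_star = (-1 + math.sqrt(1 + 4 * lam2 * E2)) / (2 * lam2)

eps = 1e-6
dg = (g(xi_star + eps) - g(xi_star - eps)) / (2 * eps)   # ~0 at the stationary point
print(xi_star, dg)
```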
13.20. Derive the mean and variance of G(Tk) for a DP process.
Solution: Recall the mean and variance values of a Dirichlet distribution from Chapter 2. Also, for the case of DPs, the parameters of the associated Dirichlet distribution are a_k = αG_0(T_k), with Σ_k a_k = α. Then we get,

E[G(T_k)] = a_k/α = G_0(T_k),

and

Var[G(T_k)] = a_k(α − a_k) / ( α²(α + 1) ) = G_0(T_k)( 1 − G_0(T_k) ) / (α + 1).
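The mean and variance of G(T_k), namely G_0(T_k) and G_0(T_k)(1 − G_0(T_k))/(α + 1), can be checked by Monte Carlo, sampling G(T_k) from its marginal Beta distribution; the values of α and G_0(T_k) below are illustrative:

```python
import random
import statistics

random.seed(0)
alpha, g0 = 4.0, 0.3    # DP concentration and base-measure mass G0(T_k); illustrative

# For the partition {T_k, T_k^c}, G(T_k) ~ Beta(alpha*G0(T_k), alpha*(1 - G0(T_k)))
samples = [random.betavariate(alpha * g0, alpha * (1 - g0)) for _ in range(200_000)]

mean = statistics.fmean(samples)
var = statistics.pvariance(samples, mu=mean)
# Theory: mean = G0(T_k) = 0.3, var = G0(1 - G0)/(alpha + 1) = 0.042
print(mean, var)
```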
13.21. Show that the posterior DP, after having obtained n observations from the set Θ, is given by,

G | θ_1, ..., θ_n ∼ DP( α + n, (1/(α + n)) ( αG_0 + Σ_{i=1}^{n} δ_{θ_i} ) ).
Solution: By definition we have that

α′G′_0(T_k) = αG_0(T_k) + n_k,

where n_k is the number of observations falling in T_k. Moreover, the above is true for all finite (measurable) partitions, e.g., for disjoint T_k's. Adding over k and taking into account that probabilities add to one, we obtain α′ = α + n; hence,

G′_0 = (1/(α + n)) ( αG_0 + Σ_{i=1}^{n} δ_{θ_i} ).
13.22. The stick-breaking construction of a DP is built around the following rule: P_1 = β_1 ∼ Beta(β|1, α) and,

β_i ∼ Beta(β|1, α), P_i = β_i ∏_{j=1}^{i−1} (1 − β_j), i ≥ 2.

Show that if the number of steps is finite, i.e., we assume that P_i = 0, i > T, for some T, then β_T = 1.
Solution: If we stop at a step T, then we should have

Σ_{i=1}^{T} P_i = 1, i.e., ∏_{j=1}^{T} (1 − β_j) = 0,

since

1 − Σ_{i=1}^{T} P_i = ∏_{j=1}^{T} (1 − β_j).

The last equality can be shown recursively. First, it holds true for T = 1, because 1 − P_1 = 1 − β_1. In the sequel, assume that it is true for T − 1, i.e.,

1 − Σ_{i=1}^{T−1} P_i = ∏_{j=1}^{T−1} (1 − β_j).

Then

1 − Σ_{i=1}^{T} P_i = ∏_{j=1}^{T−1} (1 − β_j) − β_T ∏_{j=1}^{T−1} (1 − β_j) = (1 − β_T) ∏_{j=1}^{T−1} (1 − β_j) = ∏_{j=1}^{T} (1 − β_j).

Since β_j < 1 with probability one for j < T, ∏_{j=1}^{T} (1 − β_j) = 0 forces β_T = 1.
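The truncation argument can be illustrated numerically: forcing β_T = 1 makes the weights sum to one exactly (the values of α and T below are arbitrary):

```python
import random

random.seed(0)
alpha, T = 2.0, 25

# DP stick-breaking truncated at T, with the last break forced to beta_T = 1
betas = [random.betavariate(1, alpha) for _ in range(T - 1)] + [1.0]
P, stick = [], 1.0
for b in betas:
    P.append(b * stick)     # P_i = beta_i * prod_{j<i} (1 - beta_j)
    stick *= 1 - b          # remaining stick length

print(sum(P), stick)   # weights sum to 1, nothing of the stick remains
```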
13.23. Show that in CRP, the cluster assignments are exchangeable and do not depend on the sequence in which customers arrive, up to a permutation of the labels of the tables.
Solution: Let us assume that n customers have arrived and K_n tables (clusters) have been formed. Associate with each customer a label, y_i, (50). For the denominator, recall that each customer arrives only once. Hence, each one of the terms n(k, j), j = 1, 2, ..., n_k, and k = 1, 2, ..., K_n, has a unique and distinct value, from all the others, in the set {1, 2, ..., n}.
13.24. Show that in an IBP, the probabilities for P(Z) and the equivalence classes, P([Z]), are given by the formulae,

P(Z) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

and

P([Z]) = ( K! / ∏_{h=0}^{2^N − 1} K_h! ) ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

respectively. Note that K_h, h = 1, 2, ..., 2^N − 1, is the number of times the row vector associated with the hth nonzero binary number appears in Z.
Solution: From the text and the definition of P(Z), taking into account that the probabilities are beta distributed, we get

P(Z) = ∏_{k=1}^{K} (1/B(α/K, 1)) ∫_0^1 P_k^{m_k} (1 − P_k)^{N − m_k} P_k^{α/K − 1} dP_k.

The above integral is the normalizing constant of a Beta(P_k | m_k + α/K, N − m_k + 1) density; hence,

P(Z) = ∏_{k=1}^{K} (1/B(α/K, 1)) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

and, since 1/B(α/K, 1) = Γ(α/K + 1)/(Γ(α/K) Γ(1)) = α/K,

P(Z) = ∏_{k=1}^{K} (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K).
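Each factor in P(Z) reduces to a Beta normalizing constant; the underlying integral identity can be checked by simple numerical quadrature (m, N, and a := α/K below are illustrative values):

```python
import math

def beta_integral(m, N, a, steps=200_000):
    # Midpoint-rule quadrature of p^(m + a - 1) * (1 - p)^(N - m) over [0, 1]
    h = 1.0 / steps
    return h * sum(((i + 0.5) * h) ** (m + a - 1) * (1 - (i + 0.5) * h) ** (N - m)
                   for i in range(steps))

m, N, a = 3, 10, 0.5    # a plays the role of alpha/K; values are illustrative
numeric = beta_integral(m, N, a)
closed = math.gamma(m + a) * math.gamma(N - m + 1) / math.gamma(N + 1 + a)
print(numeric, closed)   # the two agree
```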
For P([Z]), we must sum P(Z) over all matrices Z ∈ [Z]; since all of them share the same values m_k, they are equiprobable. The number of distinct permutations of K objects, grouped in K_0, K_1, ..., K_{2^N−1} groups of identical objects, is known from combinatorics to be equal to

( K choose K_0, K_1, ..., K_{2^N−1} ) = K! / ∏_{h=0}^{2^N−1} K_h!.

This can also be verified via the following arguments. Out of the K rows, K_0 are the zero ones. Then the total number of placements of these zero rows is

( K choose K_0 ) = K! / ( K_0! (K − K_0)! ).

Now, for each one of the above permuted matrices, we make all possible placements of the next group of K_1 identical rows among the remaining K − K_0 positions, and so on; multiplying the resulting binomial coefficients gives the multinomial coefficient above, and multiplying P(Z) by it yields P([Z]).
13.25. Show that the discarded pieces, πk, in the stick-breaking construction of
an IBP are equal to the sequence of probabilities produced in a DP stick-
breaking construction.
Solution: The sequence of the discarded segments is equal to

π_1 = 1 − β_1, π_2 = β_1(1 − β_2), ..., π_k = (1 − β_k) ∏_{j=1}^{k−1} β_j, k ≥ 2.

Setting β′_j := 1 − β_j, we can write

π_k = β′_k ∏_{j=1}^{k−1} (1 − β′_j),

and, since β_j ∼ Beta(β|α, 1) implies β′_j ∼ Beta(β|1, α), the π_k's follow exactly the DP stick-breaking construction of Problem 13.22.
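The identification can be illustrated numerically: the discarded IBP pieces and the DP stick-breaking weights built from β′_j = 1 − β_j coincide term by term (α and T are arbitrary; β_j ∼ Beta(α, 1) is the IBP stick-breaking assumption):

```python
import random

random.seed(0)
alpha, T = 1.5, 12
betas = [random.betavariate(alpha, 1) for _ in range(T)]   # IBP breaks: Beta(alpha, 1)

# Discarded IBP pieces: pi_k = (1 - beta_k) * prod_{j<k} beta_j
pis, kept = [], 1.0
for b in betas:
    pis.append((1 - b) * kept)
    kept *= b

# DP stick-breaking driven by beta'_j = 1 - beta_j ~ Beta(1, alpha)
dp, stick = [], 1.0
for b in betas:
    bp = 1 - b
    dp.append(bp * stick)
    stick *= 1 - bp        # note 1 - bp recovers beta_j

diff = max(abs(p - q) for p, q in zip(pis, dp))
print(diff)   # essentially zero: the sequences coincide term by term
```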