Machine Learning, Chapter 13: Solutions

Type: Homework Help
Pages: 9
Words: 2776
Author: Sergios Theodoridis

we get

  βy^T(y − Φµ) = β‖y − Φµ‖² + βµ^TΦ^Ty − µ^TΣ^{-1}µ + µ^TAµ
               = β‖y − Φµ‖² + µ^TAµ,

since Σ^{-1}µ = βΦ^Ty. Concerning the determinants, we obtain

  ∂ln|Σ^{-1}|/∂α_k = (1/|Σ^{-1}|) ∂|Σ^{-1}|/∂α_k = Σ_kk,

and, since A = diag{α_1, ..., α_K},

  ∂(µ^TAµ)/∂α_k = µ_k².   (35)

Combining these results, the derivative of the log-evidence with respect to α_k becomes

  −(1/2)Σ_kk + 1/(2α_k) − (1/2)µ_k².
Equating to zero and setting

  γ_k := 1 − α_kΣ_kk,

we obtain the update α_k = γ_k/µ_k².
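The α-update above can be sketched in a few lines of code; the function name `update_alphas` and the toy numbers are illustrative, not from the book:

```python
import numpy as np

def update_alphas(alphas, mu, Sigma):
    """One evidence-maximization step: gamma_k = 1 - alpha_k * Sigma_kk,
    then alpha_k_new = gamma_k / mu_k**2 (the update derived above)."""
    gammas = 1.0 - alphas * np.diag(Sigma)
    return gammas / mu**2

# tiny illustrative numbers (hypothetical)
alphas = np.array([1.0, 2.0])
mu = np.array([0.5, 0.1])
Sigma = np.array([[0.2, 0.0], [0.0, 0.1]])
new_alphas = update_alphas(alphas, mu, Sigma)
```

In practice this update is iterated together with the re-computation of µ and Σ until the α_k's converge.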
13.12. Consider a two-class classification task and assume that the feature vectors in each of the two classes, ω1, ω2, are distributed according to a Gaussian pdf. Both classes share the same covariance matrix Σ, and the mean values are µ1 and µ2, respectively. Prove that, given an observed feature vector, x ∈ R^l, the posterior probability for deciding in favor of one of the classes is given by the logistic function, i.e.,

  P(ω2|x) = 1 / (1 + exp(θ^Tx + θ0)),

where

  θ := Σ^{-1}(µ1 − µ2),

and

  θ0 := −(1/2)(µ1 − µ2)^TΣ^{-1}(µ1 + µ2) + ln(P(ω1)/P(ω2)).
Solution: We have that

  p(x|ω_i) = (1/((2π)^{l/2}|Σ|^{1/2})) exp(−(1/2)(x − µ_i)^TΣ^{-1}(x − µ_i)),  i = 1, 2.

By Bayes' theorem,

  P(ω2|x) = p(x|ω2)P(ω2) / (p(x|ω1)P(ω1) + p(x|ω2)P(ω2)) = 1 / (1 + exp(a)),

where a := ln(p(x|ω1)P(ω1)/(p(x|ω2)P(ω2))). Because the two classes share the same Σ, the quadratic terms x^TΣ^{-1}x cancel in a, and expanding the remaining exponents yields

  a = x^TΣ^{-1}(µ1 − µ2) − (1/2)(µ1 − µ2)^TΣ^{-1}(µ1 + µ2) + ln(P(ω1)/P(ω2)) = θ^Tx + θ0,

which proves the claim.
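The identity can also be checked numerically; the following sketch, with hypothetical class parameters, compares the Bayes posterior against the logistic form:

```python
import numpy as np

rng = np.random.default_rng(0)
l = 3
# hypothetical shared covariance (well conditioned) and class parameters
A = rng.standard_normal((l, l))
Sigma = A @ A.T + l * np.eye(l)
mu1, mu2 = rng.standard_normal(l), rng.standard_normal(l)
P1, P2 = 0.3, 0.7
Si = np.linalg.inv(Sigma)

def gauss(x, mu):
    # Gaussian pdf with mean mu and covariance Sigma
    d = x - mu
    return np.exp(-0.5 * d @ Si @ d) / np.sqrt((2 * np.pi)**l * np.linalg.det(Sigma))

theta = Si @ (mu1 - mu2)
theta0 = -0.5 * (mu1 - mu2) @ Si @ (mu1 + mu2) + np.log(P1 / P2)

x = rng.standard_normal(l)
post2 = gauss(x, mu2) * P2 / (gauss(x, mu1) * P1 + gauss(x, mu2) * P2)
logistic = 1.0 / (1.0 + np.exp(theta @ x + theta0))
# post2 and logistic coincide up to floating-point error
```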
13.13. Derive equation (13.74).

Solution: Our starting point is the cost

  J := Σ_{n=1}^N [ y_n ln σ(θ^Tφ_n) + (1 − y_n) ln(1 − σ(θ^Tφ_n)) ] − (1/2)θ^TAθ,   (40)

where φ_n := φ(x_n). Taking the gradient with respect to θ and using σ′(t) = σ(t)(1 − σ(t)), we get

  ∇J = Σ_{n=1}^N [ y_n/σ(θ^Tφ_n) − (1 − y_n)/(1 − σ(θ^Tφ_n)) ] σ(θ^Tφ_n)(1 − σ(θ^Tφ_n)) φ_n − Aθ
     = Σ_{n=1}^N ( y_n − σ(θ^Tφ_n) ) φ_n − Aθ.
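The gradient just derived can be validated against a central-difference approximation; the data below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 20, 4
Phi = rng.standard_normal((N, K))          # rows are phi_n^T
y = rng.integers(0, 2, N).astype(float)
A = np.diag(rng.uniform(0.5, 2.0, K))      # hypothetical prior precision matrix
theta = rng.standard_normal(K)
sig = lambda t: 1.0 / (1.0 + np.exp(-t))

def J(th):
    # regularized cross-entropy cost (40)
    s = sig(Phi @ th)
    return np.sum(y * np.log(s) + (1 - y) * np.log(1 - s)) - 0.5 * th @ A @ th

# analytic gradient from the derivation: sum_n (y_n - s_n) phi_n - A theta
grad = Phi.T @ (y - sig(Phi @ theta)) - A @ theta

# central-difference check
eps = 1e-6
num = np.array([(J(theta + eps * np.eye(K)[k]) - J(theta - eps * np.eye(K)[k])) / (2 * eps)
                for k in range(K)])
```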
13.14. Show Equation (13.75).

Solution: By the respective definition we have that

  σ(t) = 1 / (1 + exp(−t)),

hence, setting s_n := σ(θ^Tφ(x_n)),

  a_n := y_n ln s_n + (1 − y_n) ln(1 − s_n).

Then, after some simple algebra and taking into account the well-known differentiation rules concerning the logarithm, we readily obtain that

  ∂a_n/∂θ = y_n φ(x_n) − s_n φ(x_n) = (y_n − s_n) φ(x_n).

Stacking the vectors φ^T(x_1), ..., φ^T(x_N) as the rows of Φ, the gradient over all samples can be written compactly as Φ^T(y − s).
as well as that

  ln (P(y|θ)p(θ|α)) = Σ_{n=1}^N [ y_n ln σ(θ^Tφ(x_n)) + (1 − y_n) ln(1 − σ(θ^Tφ(x_n))) ] − (1/2)θ^TAθ + constant,

where the last two terms come from the Gaussian prior p(θ|α).
13.15. Derive the recursion (13.77).

Solution: Taking the logarithm of P(y|α) in (13.75) and keeping only the terms which depend on α, we have

  ln P(y|α) = ln p(θ̂_MAP|α) − (1/2) ln|Σ^{-1}| + constant,

and maximizing with respect to each α_k, along the same lines as in the evidence-maximization steps before, leads to the recursion (13.77).
13.16. Show that if f is a convex function, f: R^l → R, then it is equal to the conjugate of its conjugate, i.e., (f*)* = f.

Solution: Recall from the theory that the conjugate is defined as

  f*(ξ) = sup_x ( ξ^Tx − f(x) ).   (45)

Hence, for every x and ξ, f*(ξ) ≥ ξ^Tx − f(x), i.e., x^Tξ − f*(ξ) ≤ f(x), which gives (f*)*(x) = sup_ξ (x^Tξ − f*(ξ)) ≤ f(x). For the reverse inequality, fix a point x̄ and choose ξ = ∇f(x̄); by convexity, the supremum in (45) is then attained at x̄ (equation (46)), so that combining (45) and (46) we get

  x^Tξ − f*(ξ) = f(x̄) + (x − x̄)^T∇f(x̄).   (47)

Setting x = x̄ in (47) yields (f*)*(x̄) ≥ x̄^Tξ − f*(ξ) = f(x̄), and the claim follows.
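The biconjugation property can be illustrated numerically on a grid; the grid-based `conj` helper below is a sketch of ours, not from the text, and is only accurate away from the grid boundary:

```python
def conj(f, xs):
    """Numeric conjugate on a grid: f*(xi) = max_x (xi*x - f(x))."""
    return lambda xi: max(xi * x - f(x) for x in xs)

xs = [i / 100 for i in range(-500, 501)]   # grid on [-5, 5]
f = lambda x: x * x                         # a convex function
fstar = conj(f, xs)                         # its conjugate, xi**2 / 4 on this range
fss = lambda x: max(x * xi - fstar(xi) for xi in xs)   # (f*)*

# (f*)* recovers f in the interior of the grid
vals = [(x, fss(x)) for x in [-1.0, 0.0, 0.7, 2.0]]
```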
13.17. Prove that

  f(x) = ln(λ/2) − λ√x,  x ≥ 0,

is a convex function.

Solution: It is known from the theory of convex functions that if d²f(x)/dx² ≥ 0, then f(x) is convex ([Boyd 04]). Hence,

  df/dx = −(λ/2) x^{−1/2},   d²f/dx² = (λ/4) x^{−3/2} ≥ 0,  x > 0,

which proves the claim.
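The second derivative above is easy to cross-check with a finite-difference approximation; λ = 1.7 below is an arbitrary placeholder value:

```python
import math

lam = 1.7  # hypothetical rate parameter
f = lambda x: math.log(lam / 2) - lam * math.sqrt(x)

def second_derivative(x, h):
    # central second difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# analytic second derivative from the solution: (lam/4) * x**(-3/2)
checks = []
for x in [0.1, 1.0, 5.0, 40.0]:
    analytic = (lam / 4) * x ** (-1.5)
    numeric = second_derivative(x, 1e-4 * x)
    checks.append((analytic, numeric))
```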
13.18. Derive variational bounds for the logistic regression function

  σ(x) = 1 / (1 + e^{−x}),

one of them in terms of a Gaussian function. For the latter case, use the transformation t = x².

Solution: For the first bound, consider the convex function g(x) := ln(1 + e^x) and compute its conjugate, g*(ξ) = sup_x (ξx − g(x)). Taking the derivative, we obtain

  ξ − e^x/(1 + e^x) = 0  ⟹  e^x = ξ/(1 − ξ)  ⟹  x = ln(ξ/(1 − ξ)),  0 < ξ < 1.

Substituting back gives g*(ξ) = ξ ln ξ + (1 − ξ) ln(1 − ξ). Since ln σ(x) = −g(−x) and g(−x) ≥ −ξx − g*(ξ), this yields the variational upper bound σ(x) ≤ exp(ξx + g*(ξ)), 0 < ξ < 1.

For the Gaussian bound, write

  ln σ(x) = −ln(1 + e^{−x}) = x/2 − ln(exp(x/2) + exp(−x/2)).   (48)

Let us now define

  f(x) := −ln(exp(√x/2) + exp(−√x/2)),  x ≥ 0.   (49)

We will first show that f(x) is a convex function. To this end, we will prove that the second derivative is nonnegative in the respective domain:

  d²f/dx² = (1/(8x^{3/2})) [ tanh(√x/2) − (√x/2)(1 − tanh²(√x/2)) ],

where we have used the chain differentiation rule as well as the property

  d tanh(y)/dy = 1 − tanh²(y).

Setting y := √x/2, to prove nonnegativity we have to show that

  sinh(2y)/(2y) ≥ 1.

However, this is always true for y ≥ 0, as it follows from the known from the analysis expansion

  sinh(y) = y + y³/3! + y⁵/5! + ··· .

Thus, since f is convex, we can now write that

  f*(ξ) ≥ ξx − f(x),  or  f(x) ≥ ξx − f*(ξ).

Hence, applying the first-order (tangent) bound at the point ξ², i.e., f(x²) ≥ f(ξ²) + f′(ξ²)(x² − ξ²) with f′(ξ²) = −tanh(ξ/2)/(4ξ) =: −λ(ξ), and using (48) together with f(ξ²) = ln σ(ξ) − ξ/2, we obtain the bound

  σ(x) ≥ σ(ξ) exp( (x − ξ)/2 − λ(ξ)(x² − ξ²) ),  λ(ξ) = (1/(4ξ)) tanh(ξ/2),

which is quadratic in x in the exponent, i.e., of a Gaussian form.
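The Gaussian-form lower bound derived above can be verified numerically; the test points below are arbitrary:

```python
import math

sig = lambda t: 1.0 / (1.0 + math.exp(-t))
lam = lambda xi: math.tanh(xi / 2) / (4 * xi)   # lambda(xi), xi != 0

def lower_bound(x, xi):
    # sigma(xi) * exp((x - xi)/2 - lambda(xi) * (x^2 - xi^2))
    return sig(xi) * math.exp((x - xi) / 2 - lam(xi) * (x**2 - xi**2))

# the bound holds for every x and every xi > 0 ...
ok = all(lower_bound(x, xi) <= sig(x) + 1e-12
         for x in [-4.0, -1.0, 0.3, 2.0, 5.0]
         for xi in [0.1, 1.0, 2.5])
# ... and is tight at x = xi
tight = abs(lower_bound(2.5, 2.5) - sig(2.5)) < 1e-12
```

By construction the bound also touches σ at x = −ξ, which is what makes it useful for variational Bayesian logistic regression.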
13.19. Prove equation (13.100).

Solution: We know that

  Q(ξ, β; ξ^{(j)}, β^{(j)}) = (N/2) ln β − (N/2) ln(2π) − (β/2) E[‖y − Φθ‖²]
      − (K/2) ln(2π) − (1/2) ln|Ξ| − (1/2) Σ_{k=0}^{K−1} E[θ_k²]/ξ_k + Σ_{k=0}^{K−1} ln φ(ξ_k),

where φ(ξ_k) ∝ exp(−(λ²/2)ξ_k). Taking the derivative with respect to ξ_k, we have

  ∂/∂ξ_k Σ_{k=0}^{K−1} ln φ(ξ_k) = φ′(ξ_k)/φ(ξ_k) = −λ²/2,

  ∂/∂ξ_k ln|Ξ| = ξ_k^{−1},   since Ξ = diag{ξ_0, ..., ξ_{K−1}},

  ∂/∂ξ_k Σ_{k=0}^{K−1} E[θ_k²]/ξ_k = −E[θ_k²]/ξ_k².

Equating the derivative of Q to zero,

  −λ²/2 − 1/(2ξ_k) + E[θ_k²]/(2ξ_k²) = 0,

which, solved for ξ_k, yields the recursion (13.100).
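The stationarity condition above is a quadratic in ξ_k and can be sanity-checked numerically; this is a sketch with placeholder values for λ² and E[θ_k²]:

```python
import math

lam2, E = 2.0, 0.7        # hypothetical lambda^2 and E[theta_k^2]
# the xi_k-dependent part of Q
q = lambda xi: -0.5 * math.log(xi) - E / (2 * xi) - lam2 * xi / 2

# stationary point, from  lam2*xi^2 + xi - E = 0  (positive root)
xi_star = (-1 + math.sqrt(1 + 4 * lam2 * E)) / (2 * lam2)

# central-difference derivative of q at xi_star should vanish
h = 1e-6
dq = (q(xi_star + h) - q(xi_star - h)) / (2 * h)
```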
13.20. Derive the mean and variance of G(T_k) for a DP process.

Solution: Recall the mean and variance values of a Dirichlet distribution from Chapter 2. Also, for the case of DPs, the parameters of the associated Dirichlet distribution are a_k = αG_0(T_k). Then, since Σ_j a_j = α Σ_j G_0(T_j) = α, we get

  E[G(T_k)] = a_k/α = G_0(T_k),
  Var[G(T_k)] = a_k(α − a_k)/(α²(α + 1)) = G_0(T_k)(1 − G_0(T_k))/(α + 1).
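These two moments are easy to confirm by Monte Carlo on a finite partition; the three-set base measure below is a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 5.0
G0 = np.array([0.2, 0.5, 0.3])      # base measure over a 3-set partition (hypothetical)
draws = rng.dirichlet(alpha * G0, size=200_000)

emp_mean = draws.mean(axis=0)       # should approach G0
emp_var = draws.var(axis=0)         # should approach G0*(1-G0)/(alpha+1)
```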
13.21. Show that the posterior DP, after having obtained n observations θ1, ..., θn from the set Θ, is given by

  G | θ1, ..., θn ∼ DP( α + n, (1/(α + n)) ( αG_0 + Σ_{i=1}^n δ_{θ_i}(θ) ) ).

Solution: By the conjugacy of the Dirichlet distribution, for every set T_k of a partition the posterior parameters satisfy

  α′G′_0(T_k) = αG_0(T_k) + n_k,

where n_k is the number of observations falling in T_k. Moreover, the above is true for all finite (measurable) partitions, e.g., for disjoint T_k's. Adding over k and taking into account that probabilities add to one, we obtain α′ = α + n, and hence

  G′_0 = (1/(α + n)) ( αG_0 + Σ_{i=1}^n δ_{θ_i} ),

which proves the claim.
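The conjugacy step can be illustrated on the simplest partition (two sets), where the Dirichlet reduces to a beta; the counts and parameters below are hypothetical:

```python
from math import gamma

# Two-set partition: G(T1)=p, G(T2)=1-p; prior p ~ Beta(a1, a2) with a_k = alpha*G0(T_k)
alpha, g01 = 4.0, 0.3
a1, a2 = alpha * g01, alpha * (1 - g01)
n1, n2 = 7, 3                          # hypothetical counts of observations in T1, T2

beta_pdf = lambda p, a, b: gamma(a + b) / (gamma(a) * gamma(b)) * p**(a - 1) * (1 - p)**(b - 1)

# numerically normalize prior(p) * p^n1 * (1-p)^n2 on a midpoint grid
steps = 50_000
h = 1.0 / steps
grid = [(i + 0.5) * h for i in range(steps)]
unnorm = [beta_pdf(p, a1, a2) * p**n1 * (1 - p)**n2 for p in grid]
Zc = sum(unnorm) * h
post_at = lambda p: beta_pdf(p, a1, a2) * p**n1 * (1 - p)**n2 / Zc

# the numeric posterior should match Beta(a1 + n1, a2 + n2),
# i.e., updated parameters alpha*G0(T_k) + n_k, as claimed
```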
13.22. The stick-breaking construction of a DP is built around the following rule:

  P_1 = β_1,  β_1 ∼ Beta(β|1, α),  and

  β_i ∼ Beta(β|1, α),  P_i = β_i Π_{j=1}^{i−1} (1 − β_j),  i ≥ 2.

Show that if the number of steps is finite, i.e., if we assume that P_i = 0, i > T, for some T, then β_T = 1.

Solution: If we stop at a step T, then we should have

  Σ_{i=1}^T P_i = 1,  while  1 − Σ_{i=1}^T P_i = Π_{j=1}^T (1 − β_j).

The last equality can be shown recursively. First, it holds true for T = 1, because 1 − P_1 = 1 − β_1. In the sequel, assume that it is true up to T − 1, i.e.,

  1 − Σ_{i=1}^{T−1} P_i = Π_{j=1}^{T−1} (1 − β_j).

Then

  1 − Σ_{i=1}^{T} P_i = Π_{j=1}^{T−1} (1 − β_j) − β_T Π_{j=1}^{T−1} (1 − β_j)
                      = Π_{j=1}^{T} (1 − β_j).

Hence, Σ_{i=1}^T P_i = 1 forces Π_{j=1}^T (1 − β_j) = 0; since the β_j, j < T, lie in (0, 1) with probability one, this requires β_T = 1.
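A truncated stick-breaking sampler makes the result concrete: forcing β_T = 1, the weights sum to one exactly. The function below is an illustrative sketch, not code from the book:

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 2.0

def stick_breaking(T):
    """DP stick-breaking weights P_i = beta_i * prod_{j<i}(1 - beta_j),
    truncated at T by forcing beta_T = 1, as shown above."""
    betas = rng.beta(1.0, alpha, size=T)
    betas[-1] = 1.0                    # truncation: beta_T = 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

P = stick_breaking(50)
```

Without the β_T = 1 correction, a leftover mass Π_{j=1}^T (1 − β_j) would be unassigned.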
13.23. Show that in the CRP, the cluster assignments are exchangeable and do not depend on the sequence in which customers arrive, up to a permutation of the labels of the tables.

Solution: Let us assume that n customers have arrived and K_n tables (clusters) have been formed. Associate with each customer a label, y_i, denoting the table at which he/she sits, so that the joint probability of the assignments takes the form of (50). For the denominator, recall that each customer arrives only once. Hence, each one of the terms n(k, j), j = 1, 2, ..., n_k, and k = 1, 2, ..., K_n, has a unique and distinct, from all the others, value in the set {1, 2, ..., n}. The denominator is therefore independent of the arrival order, and the numerator depends only on the table sizes; the joint probability is thus invariant under permutations of the customers, up to a relabeling of the tables.
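Exchangeability is easy to check numerically: the probability of a label sequence under the CRP predictive rule depends only on the table sizes. The helper `crp_seq_prob` and the seating sequences below are our own illustrations, not from the text:

```python
from math import isclose

def crp_seq_prob(labels, alpha):
    """Probability of a cluster-label sequence under the CRP:
    customer i+1 joins table k with prob n_k/(alpha + i),
    or opens a new table with prob alpha/(alpha + i)."""
    counts, p = {}, 1.0
    for i, z in enumerate(labels):
        p *= (counts[z] if z in counts else alpha) / (alpha + i)
        counts[z] = counts.get(z, 0) + 1
    return p

alpha = 1.5
seq = [0, 0, 1, 0, 2, 1]                        # hypothetical seating
perm = [seq[i] for i in [3, 5, 0, 2, 4, 1]]     # same table sizes, new arrival order
p1 = crp_seq_prob(seq, alpha)
p2 = crp_seq_prob(perm, alpha)
# p1 == p2: both sequences have tables of sizes {3, 2, 1}
```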
13.24. Show that in an IBP, the probabilities P(Z) and the probabilities of the equivalence classes, P([Z]), are given by the formulae

  P(Z) = Π_{k=1}^K [ (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K) ],

and

  P([Z]) = ( K! / Π_{h=0}^{2^N−1} K_h! ) Π_{k=1}^K [ (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K) ],

respectively. Note that K_h, h = 1, 2, ..., 2^N − 1, is the number of times the row vector associated with the h-th nonzero binary number appears in Z.
Solution: From the text and the definition of P(Z), taking into account that the probabilities are beta distributed, we get

  P(Z) = Π_{k=1}^K (1/B(α/K, 1)) ∫_0^1 P_k^{m_k} (1 − P_k)^{N − m_k} P_k^{α/K − 1} dP_k.

The above integral is the normalizing constant of a Beta(P_k | m_k + α/K, N − m_k + 1) density, hence

  P(Z) = Π_{k=1}^K (1/B(α/K, 1)) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K)
       = Π_{k=1}^K (α/K) Γ(m_k + α/K) Γ(N − m_k + 1) / Γ(N + 1 + α/K),

since B(α/K, 1) = Γ(α/K)Γ(1)/Γ(α/K + 1) = K/α. For the equivalence classes,

  P([Z]) = Σ_{Z ∈ [Z]} P(Z),

and since P(Z) takes the same value for every member of the class [Z], it suffices to count the members. The number of
permutations of K objects, grouped in K_0, K_1, ..., K_{2^N−1} groups, is known from combinatorics to be equal to the multinomial coefficient

  ( K; K_0, K_1, ..., K_{2^N−1} ) = K! / Π_{h=0}^{2^N−1} K_h!.

This can also be verified by the following argument. Out of the K rows, K_0 are the zero ones. The total number of placements of these zero rows is

  ( K; K_0 ) = K! / ( K_0!(K − K_0)! ).

Now, for each one of the above permuted matrices, we make all possible placements of the K_1 rows of the next type among the remaining K − K_0 positions, and so on; multiplying the successive binomial coefficients and cancelling the intermediate factorials yields the multinomial coefficient, and the formula for P([Z]) follows.
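The per-column factor of P(Z) can be cross-checked by comparing the gamma-function expression against direct numerical integration of the beta integral; the parameter values are arbitrary:

```python
from math import gamma

def col_prob_formula(m, N, alpha, K):
    # (alpha/K) * Gamma(m + alpha/K) * Gamma(N - m + 1) / Gamma(N + 1 + alpha/K)
    a = alpha / K
    return a * gamma(m + a) * gamma(N - m + 1) / gamma(N + 1 + a)

def col_prob_numeric(m, N, alpha, K, steps=100_000):
    # midpoint-rule integral of P^m (1-P)^(N-m) * (alpha/K) P^(alpha/K - 1) on (0, 1)
    a = alpha / K
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += p**m * (1 - p)**(N - m) * a * p**(a - 1) * h
    return total

# hypothetical values: m = 3 active entries out of N = 10, alpha = 2, K = 4
p_formula = col_prob_formula(3, 10, 2.0, 4)
p_numeric = col_prob_numeric(3, 10, 2.0, 4)
```

(With m ≥ 1 the integrand is bounded, so the simple midpoint rule is accurate.)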
13.25. Show that the discarded pieces, π_k, in the stick-breaking construction of an IBP are equal to the sequence of probabilities produced in a DP stick-breaking construction.

Solution: In the IBP construction, β_j ∼ Beta(α, 1) and the surviving stick length after k steps is Π_{j=1}^k β_j. The sequence of the discarded segments is therefore equal to

  π_1 = 1 − β_1,
  π_k = (1 − β_k) Π_{j=1}^{k−1} β_j,  k ≥ 2.

Setting β′_j := 1 − β_j, so that β′_j ∼ Beta(1, α), we can write

  π_k = β′_k Π_{j=1}^{k−1} (1 − β′_j),

which is exactly the sequence of probabilities P_k generated by the DP stick-breaking construction of Problem 13.22.
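As a numerical check of this correspondence, the mean of the k-th discarded piece should match the mean DP stick-breaking weight, E[P_k] = (1/(1+α))(α/(1+α))^{k−1}; the sketch below uses arbitrary α and Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, T, n = 3.0, 5, 200_000
betas = rng.beta(alpha, 1.0, size=(n, T))    # IBP stick variables, Beta(alpha, 1)

# discarded pieces: pi_k = (1 - beta_k) * prod_{j<k} beta_j
kept = np.concatenate([np.ones((n, 1)), np.cumprod(betas[:, :-1], axis=1)], axis=1)
pi = (1.0 - betas) * kept

# DP stick-breaking mean with Beta(1, alpha) proportions:
# E[P_k] = (1/(1+alpha)) * (alpha/(1+alpha))**(k-1)
k = np.arange(T)
expected = (1.0 / (1.0 + alpha)) * (alpha / (1.0 + alpha)) ** k
emp = pi.mean(axis=0)
```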