Machine Learning, Sergios Theodoridis

Solutions To Problems of Chapter 3
3.1. Prove the least squares optimal solution for the linear regression case given
in Eq. (3.13).
Solution: The cost function is
\[
J(\theta) = \sum_{n=1}^{N} \big(y_n - \theta^T x_n\big)^2.
\]
Setting the gradient with respect to $\theta$ equal to zero,
\[
\nabla_\theta J(\theta) = -2\sum_{n=1}^{N} \big(y_n - \theta^T x_n\big)x_n = 0,
\]
we obtain the normal equations $\sum_{n=1}^{N} x_n x_n^T\,\theta = \sum_{n=1}^{N} x_n y_n$, and hence the least squares solution
\[
\hat\theta = \Big(\sum_{n=1}^{N} x_n x_n^T\Big)^{-1}\sum_{n=1}^{N} x_n y_n.
\]
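A minimal numerical sketch of the closed-form solution above (the data, dimensions and noise level are arbitrary assumptions for illustration); it checks the formula against NumPy's least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
N, l = 200, 3
X = rng.normal(size=(N, l))                 # rows are the input vectors x_n
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=N)

# Closed-form LS solution: (sum_n x_n x_n^T)^{-1} sum_n x_n y_n
theta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Should agree with the library solver up to numerical precision
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```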
3.2. Let $\hat\theta_i$, $i = 1, 2, \ldots, m$, be unbiased estimators of a parameter vector $\theta$, i.e., $E[\hat\theta_i] = \theta$, $i = 1, \ldots, m$. Moreover, assume that the respective estimators are uncorrelated to each other and that all have the same (total) variance, $\sigma^2 = E[(\hat\theta_i - \theta)^T(\hat\theta_i - \theta)]$. Show that by averaging the estimates, e.g.,
\[
\hat\theta = \frac{1}{m}\sum_{i=1}^{m}\hat\theta_i,
\]
the new estimator has total variance
\[
\sigma_c^2 := E[(\hat\theta - \theta)^T(\hat\theta - \theta)] = \frac{1}{m}\sigma^2.
\]
Solution: First, it is easily checked that the new estimator is also unbiased. By the definition of the total variance (which is the trace of the respective covariance matrix), we have
\[
\sigma_c^2 = E[(\hat\theta - \theta)^T(\hat\theta - \theta)]
= \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} E[(\hat\theta_i - \theta)^T(\hat\theta_j - \theta)].
\]
Since the estimators are mutually uncorrelated, the cross terms ($i \neq j$) vanish, and
\[
\sigma_c^2 = \frac{1}{m^2}\sum_{i=1}^{m} E[(\hat\theta_i - \theta)^T(\hat\theta_i - \theta)] = \frac{1}{m^2}\, m\sigma^2 = \frac{\sigma^2}{m}.
\]
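A short simulation that illustrates the $\sigma^2/m$ reduction (the parameter vector, noise level and number of estimators are assumptions made only for this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
m, dim, trials = 10, 4, 20000
theta = np.ones(dim)
sigma2 = 2.0                                  # total variance of each individual estimator

# m uncorrelated unbiased estimators per trial: theta plus zero-mean noise
est = theta + rng.normal(scale=np.sqrt(sigma2 / dim), size=(trials, m, dim))
avg = est.mean(axis=1)                        # the averaged estimator

total_var_single = ((est[:, 0, :] - theta) ** 2).sum(axis=1).mean()
total_var_avg = ((avg - theta) ** 2).sum(axis=1).mean()
print(total_var_single, total_var_avg, sigma2 / m)   # approx. 2.0, 0.2, 0.2
```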
3.3. Let a random variable $x$ be described by a uniform pdf in the interval $[0, 1/\theta]$, $\theta > 0$. Assume a function $g$ (to avoid any confusion, let $g$ be Lebesgue integrable on intervals of $\mathbb{R}$), which defines an estimator $\hat\theta := g(x)$ of $\theta$. Then, for such an estimator to be unbiased, the following must hold:
\[
\int_0^{1/\theta} g(x)\,dx = 1.
\]
However, such a function $g$ does not exist.
Solution: Necessarily, the pdf of $x$ must be
\[
p(x) = \begin{cases} \theta, & x \in [0, 1/\theta],\\ 0, & \text{otherwise.}\end{cases}
\]
Unbiasedness then requires, for every $\theta > 0$,
\[
E[g(x)] = \int_0^{1/\theta} g(x)\,\theta\,dx = \theta
\quad\Longleftrightarrow\quad
\int_0^{1/\theta} g(x)\,dx = 1.
\]
Differentiating the last relation with respect to $\theta$ (the Lebesgue differentiation theorem justifies this for almost every $\theta$) gives $-\frac{1}{\theta^2}\,g(1/\theta) = 0$, hence $g = 0$ almost everywhere on $(0,\infty)$; but then $\int_0^{1/\theta} g(x)\,dx = 0 \neq 1$, a contradiction. Therefore, no such $g$ exists.
3.4. A family $\{p(\mathcal{D};\theta) : \theta \in \mathcal{A}\}$ is called complete if, for any vector function $h(\mathcal{D})$ such that $E_{\mathcal{D}}[h(\mathcal{D})] = 0$, $\forall\theta$, then $h = 0$.
Show that if $\{p(\mathcal{D};\theta) : \theta \in \mathcal{A}\}$ is complete, and there exists an MVU estimator, then this estimator is unique.
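A brief sketch of one standard argument, offered as a hedged outline under the definition just stated (it only uses unbiasedness, so it in fact shows that any unbiased estimator of $\theta$ is unique):
\[
\text{Let } \hat\theta_1(\mathcal{D}) \text{ and } \hat\theta_2(\mathcal{D}) \text{ both be MVU estimators, and set } h(\mathcal{D}) := \hat\theta_1(\mathcal{D}) - \hat\theta_2(\mathcal{D}).
\]
\[
E_{\mathcal{D}}[h(\mathcal{D})] = E[\hat\theta_1(\mathcal{D})] - E[\hat\theta_2(\mathcal{D})] = \theta - \theta = 0, \qquad \forall\theta \in \mathcal{A},
\]
\[
\text{so completeness of } \{p(\mathcal{D};\theta)\} \text{ forces } h = 0, \text{ i.e., } \hat\theta_1 = \hat\theta_2.
\]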
3.5. Let $\hat\theta_u$ be an unbiased estimator, i.e., $E[\hat\theta_u] = \theta_0$. Define a biased one by $\hat\theta_b = (1+\alpha)\hat\theta_u$. Show that the range of $\alpha$ where the MSE of $\hat\theta_b$ is smaller than that of $\hat\theta_u$ is
\[
-\frac{2\,\mathrm{MSE}(\hat\theta_u)}{\mathrm{MSE}(\hat\theta_u) + \theta_0^2} < \alpha < 0.
\]
Solution: The MSE for the new estimator is
\[
E\big[(\hat\theta_b - \theta_0)^2\big] = E\Big[\big((1+\alpha)\hat\theta_u - \theta_0\big)^2\Big]
= (1+\alpha)^2\,\mathrm{var}(\hat\theta_u) + \alpha^2\theta_0^2.
\]
Requiring this to be smaller than $\mathrm{MSE}(\hat\theta_u) = \mathrm{var}(\hat\theta_u)$, or, after using elementary algebra,
\[
\alpha\left[\alpha + \frac{2\,\mathrm{var}(\hat\theta_u)}{\theta_0^2 + \mathrm{var}(\hat\theta_u)}\right] < 0,
\]
which holds if and only if
\[
-\frac{2\,\mathrm{var}(\hat\theta_u)}{\theta_0^2 + \mathrm{var}(\hat\theta_u)} < \alpha < 0.
\]
Since $\hat\theta_u$ is unbiased, $\mathrm{var}(\hat\theta_u) = \mathrm{MSE}(\hat\theta_u)$, and the claimed range follows.
3.6. Show that for the setting of Problem 3.5, the optimal value of $\alpha$ is equal to
\[
\alpha_* = -\frac{1}{1 + \dfrac{\theta_0^2}{\mathrm{var}(\hat\theta_u)}},
\]
where, of course, the variance of the unbiased estimator is equal to the corresponding MSE.
Solution: The minimum value of
\[
\mathrm{MSE}(\hat\theta_b) = E\big[(\hat\theta_b - \theta_0)^2\big] = (1+\alpha)^2\,\mathrm{MSE}(\hat\theta_u) + \alpha^2\theta_0^2
\]
is obtained by setting the derivative with respect to $\alpha$ equal to zero,
\[
2(1+\alpha)\,\mathrm{MSE}(\hat\theta_u) + 2\alpha\theta_0^2 = 0,
\]
which gives
\[
\alpha_* = -\frac{\mathrm{var}(\hat\theta_u)}{\theta_0^2 + \mathrm{var}(\hat\theta_u)} = -\frac{1}{1 + \dfrac{\theta_0^2}{\mathrm{var}(\hat\theta_u)}}.
\]
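A small numerical check of Problems 3.5 and 3.6; the values of $\theta_0$ and $\mathrm{var}(\hat\theta_u)$ below are arbitrary assumptions for the sketch:

```python
import numpy as np

theta0, var_u = 2.0, 0.5                       # assumed true parameter and estimator variance

def mse_b(alpha):
    # MSE of the scaled estimator (1 + alpha) * theta_hat_u
    return (1 + alpha) ** 2 * var_u + alpha ** 2 * theta0 ** 2

alphas = np.linspace(-1.0, 0.5, 300001)
improve = alphas[mse_b(alphas) < var_u]        # where the biased MSE beats the unbiased one
print(improve.min(), -2 * var_u / (var_u + theta0 ** 2))                  # both approx. -0.222
print(alphas[np.argmin(mse_b(alphas))], -1 / (1 + theta0 ** 2 / var_u))   # both approx. -0.111
```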
3.7. Show that the regularity condition for the Cramér-Rao bound holds true if the order of integration and differentiation can be interchanged.
Solution: By the definition of the expectation we have
\[
E\left[\frac{\partial \ln p(X;\theta)}{\partial\theta}\right]
= \int \frac{\partial \ln p(X;\theta)}{\partial\theta}\,p(X;\theta)\,dX
= \int \frac{\partial p(X;\theta)}{\partial\theta}\,dX
= \frac{\partial}{\partial\theta}\int p(X;\theta)\,dX
= \frac{\partial}{\partial\theta}(1) = 0,
\]
where the order of integration and differentiation was interchanged in the second-to-last step; this is exactly the regularity condition, $E\big[\partial\ln p(X;\theta)/\partial\theta\big] = 0$, required by the Cramér-Rao bound.
3.8. Derive the Cramér-Rao bound for the LS estimator, when the training data result from the linear model
\[
y_n = \theta x_n + \eta_n, \qquad n = 1, 2, \ldots,
\]
where $x_n$ and $\eta_n$ are observations of i.i.d. random variables, drawn from a zero mean random process with variance $\sigma_x^2$, and a Gaussian white noise process with zero mean and variance $\sigma_\eta^2$, respectively. Assume, also, that $x$ and $\eta$ are independent. Then, show that the LS estimator achieves the CR bound only asymptotically.
Solution: First, notice that in this case $X = \{(x_n, y_n)\}_{n=1}^{N}$. That is, both $y_n$ as well as $x_n$ change as we change the training set. Define here the quantities $\mathbf{x}_N := [x_1, x_2, \ldots, x_N]^T$, $\mathbf{y}_N := [y_1, y_2, \ldots, y_N]^T$, and recall, also, the elementary relations $E[x_n^2] = \sigma_x^2$, $E[\eta_n^2] = \sigma_\eta^2$ and $E[x_n\eta_n] = 0$. Since $p(X;\theta) = p(\mathbf{y}_N\,|\,\mathbf{x}_N;\theta)\,p(\mathbf{x}_N)$, where $p(\mathbf{x}_N)$ does not depend on $\theta$, the log-likelihood is
\[
\ln p(X;\theta) = \ln p(\mathbf{x}_N) - \frac{N}{2}\ln\big(2\pi\sigma_\eta^2\big) - \frac{1}{2\sigma_\eta^2}\sum_{n=1}^{N}\big(y_n - \theta x_n\big)^2. \tag{2}
\]
Thus, by (2),
\[
\frac{\partial \ln p(X;\theta)}{\partial\theta}
= \frac{1}{\sigma_\eta^2}\sum_{n=1}^{N}\big(y_n - \theta x_n\big)x_n
= \frac{1}{\sigma_\eta^2}\sum_{n=1}^{N}\eta_n x_n. \tag{3}
\]
Differentiating once more and taking expectations, the Fisher information is
\[
I(\theta) = -E\left[\frac{\partial^2 \ln p(X;\theta)}{\partial\theta^2}\right]
= \frac{1}{\sigma_\eta^2}\sum_{n=1}^{N} E[x_n^2]
= \frac{N\sigma_x^2}{\sigma_\eta^2},
\]
so the Cramér-Rao bound equals $\sigma_\eta^2/(N\sigma_x^2)$. For an unbiased estimator $g(X)$ to attain the bound, the score function must factorize as
\[
\frac{\partial \ln p(X;\theta)}{\partial\theta} = I(\theta)\big(g(X) - \theta\big) = \frac{N\sigma_x^2}{\sigma_\eta^2}\big(g(X) - \theta\big).
\]
However, looking at (3), it becomes apparent that such a factorization is not possible.
Let us now rewrite (3) as
\[
\frac{\partial \ln p(X;\theta)}{\partial\theta}
= \frac{N\sigma_x^2}{\sigma_\eta^2}\cdot\frac{1}{N\sigma_x^2}\sum_{n=1}^{N}\big(y_n x_n - \theta x_n^2\big)
= \frac{N\sigma_x^2}{\sigma_\eta^2}\cdot\frac{\sum_{n=1}^{N}x_n^2}{N\sigma_x^2}
\left(\frac{\sum_{n=1}^{N}x_n y_n}{\sum_{n=1}^{N}x_n^2} - \theta\right).
\]
By the law of large numbers, $\frac{1}{N}\sum_{n=1}^{N}x_n^2 \to \sigma_x^2$ as $N \to \infty$, so that asymptotically the score takes the form
\[
\frac{\partial \ln p(X;\theta)}{\partial\theta} \approx \frac{N\sigma_x^2}{\sigma_\eta^2}
\left(\frac{\sum_{n=1}^{N}x_n y_n}{\sum_{n=1}^{N}x_n^2} - \theta\right),
\]
a form that allows for an unbiased estimator to attain the Cramér-Rao bound, and the corresponding estimate is given by
\[
\hat\theta = \frac{\sum_{n=1}^{N}x_n y_n}{\sum_{n=1}^{N}x_n^2},
\]
which is exactly the LS estimator.
Let us do it for the sake of an exercise. First of all, let us examine if the LS estimator for this more general case is unbiased. We have
\[
E[\hat\theta] = E\left[\frac{1}{\sum_n x_n^2}\sum_n x_n y_n\right]
= E\left[\frac{1}{\sum_n x_n^2}\sum_n x_n\big(\theta x_n + \eta_n\big)\right]
= \theta + E\left[\frac{1}{\sum_n x_n^2}\sum_n x_n \eta_n\right] = \theta,
\]
since the noise is independent of the input and has zero mean. In other words, the LS estimator is unbiased even for this case, where both output as well as input samples change in the training set, and this is true independently of the number of measurements. The corresponding variance is given by
\[
E\big[(\hat\theta - \theta)^2\big]
= E\left[\frac{1}{\big(\sum_n x_n^2\big)^2}\Big(\sum_n x_n\eta_n\Big)^2\right]
= \sigma_\eta^2\, E\left[\frac{1}{\sum_n x_n^2}\right],
\]
where the whiteness of the noise and its independence from the input were used. By the law of large numbers, $\frac{1}{N}\sum_n x_n^2 \to \sigma_x^2$, so for large $N$ the variance tends to $\sigma_\eta^2/(N\sigma_x^2)$, which is the Cramér-Rao bound; hence the LS estimator attains the bound only asymptotically.
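A simulation sketch of the asymptotic claim (the numerical values are assumptions for illustration): the empirical variance of the LS estimate approaches the Cramér-Rao bound $\sigma_\eta^2/(N\sigma_x^2)$ as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma_x, sigma_eta = 1.5, 1.0, 0.8      # assumed model parameters
trials = 5000

for N in (5, 50, 500):
    x = rng.normal(scale=sigma_x, size=(trials, N))
    y = theta * x + rng.normal(scale=sigma_eta, size=(trials, N))
    theta_hat = (x * y).sum(axis=1) / (x ** 2).sum(axis=1)   # LS estimate per trial
    emp_var = theta_hat.var()
    cr_bound = sigma_eta ** 2 / (N * sigma_x ** 2)
    print(N, emp_var, cr_bound, emp_var / cr_bound)          # ratio tends to 1 as N grows
```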
3.9. Let us consider the regression model
\[
y_n = \theta^T x_n + \eta_n, \qquad n = 1, 2, \ldots, N,
\]
where the noise vector $\eta := [\eta_1, \ldots, \eta_N]^T$ comprises samples from a zero mean Gaussian random variable, with covariance matrix $\Sigma_\eta$. If $X := [x_1, \ldots, x_N]^T$ stands for the input matrix, and $y = [y_1, \ldots, y_N]^T$ for the vector of the observations, then show that the corresponding estimator,
\[
\hat\theta = \big(X^T\Sigma_\eta^{-1}X\big)^{-1}X^T\Sigma_\eta^{-1}y,
\]
is an efficient one.
Notice, here, that the previous estimator coincides with the Maximum Likelihood (ML) one. Moreover, bear in mind that in the case where the noise process is considered to be white, i.e., $\Sigma_\eta = \sigma^2 I_N$, then the ML estimate becomes equal to the LS one.
Solution: In the case where the parameter $\theta$ becomes a $k$-dimensional vector, the Cramér-Rao bound takes a more general form than the one we have met previously, i.e., the case where the parameter $\theta$ is a scalar. For any unbiased estimator $g(X)$ of the unknown parameter vector $\theta$, the Cramér-Rao bound becomes as follows:
\[
E\big[(g(X) - \theta)(g(X) - \theta)^T\big] \succeq I^{-1}(\theta), \qquad \forall\theta,
\]
where $I(\theta)$ is the Fisher information matrix defined as
\[
I(\theta) = E\left[\frac{\partial \ln p(X;\theta)}{\partial\theta}\,\frac{\partial \ln p(X;\theta)}{\partial\theta^T}\right]
= -E\left[\frac{\partial^2 \ln p(X;\theta)}{\partial\theta\,\partial\theta^T}\right].
\]
For the present model, we have that $X = y$ and
\[
p(y;\theta) = \frac{1}{(2\pi)^{N/2}(\det\Sigma_\eta)^{1/2}}\exp\left(-\frac{1}{2}(y - X\theta)^T\Sigma_\eta^{-1}(y - X\theta)\right).
\]
Differentiating the log-likelihood,
\[
\frac{\partial \ln p(y;\theta)}{\partial\theta} = X^T\Sigma_\eta^{-1}(y - X\theta),
\qquad
-\frac{\partial^2 \ln p(y;\theta)}{\partial\theta\,\partial\theta^T} = X^T\Sigma_\eta^{-1}X,
\]
so that $I(\theta) = X^T\Sigma_\eta^{-1}X$. Moreover, the score factorizes as
\[
\frac{\partial \ln p(y;\theta)}{\partial\theta}
= X^T\Sigma_\eta^{-1}X\Big(\big(X^T\Sigma_\eta^{-1}X\big)^{-1}X^T\Sigma_\eta^{-1}y - \theta\Big)
= I(\theta)\big(\hat\theta - \theta\big),
\]
which is exactly the condition for an unbiased estimator to attain the bound with equality; hence the estimator is efficient.
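A numerical sketch of the efficiency claim (the noise covariance, dimensions and parameter values are assumptions for the illustration): the empirical covariance of the estimator over many noise realizations is compared with $(X^T\Sigma_\eta^{-1}X)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, trials = 40, 3, 4000
X = rng.normal(size=(N, k))
theta = np.array([0.5, -1.0, 2.0])

# Assumed correlated noise covariance, kept well conditioned
A = rng.normal(size=(N, N))
Sigma = A @ A.T / N + 0.5 * np.eye(N)
Sigma_inv = np.linalg.inv(Sigma)
L = np.linalg.cholesky(Sigma)

I_theta = X.T @ Sigma_inv @ X                  # Fisher information matrix
crlb = np.linalg.inv(I_theta)                  # Cramér-Rao lower bound

est = np.empty((trials, k))
for t in range(trials):
    y = X @ theta + L @ rng.normal(size=N)     # noise with covariance Sigma
    est[t] = np.linalg.solve(I_theta, X.T @ Sigma_inv @ y)

print(np.diag(np.cov(est.T)))                  # empirical variances ...
print(np.diag(crlb))                           # ... should be close to the CRLB diagonal
```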
3.10. Assume a set $X := \{x_1, x_2, \ldots, x_N\}$ of i.i.d. Gaussian random variables, with mean $\mu$ and variance $\sigma^2$. Define also the quantities
\[
S_\mu := \frac{1}{N}\sum_{n=1}^{N}x_n, \qquad
S_{\sigma^2} := \frac{1}{N}\sum_{n=1}^{N}\big(x_n - S_\mu\big)^2, \qquad
\bar S_{\sigma^2} := \frac{1}{N}\sum_{n=1}^{N}\big(x_n - \mu\big)^2.
\]
Show that if $\mu$ is considered to be known, a sufficient statistic for $\sigma^2$ is $\bar S_{\sigma^2}$. Moreover, in the case where both $(\mu, \sigma^2)$ are unknown, then a sufficient statistic is the pair $(S_\mu, S_{\sigma^2})$.
Solution: The joint pdf of $X$ is obviously
\[
p(X) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{n}\big(x_n - \mu\big)^2\right)
= \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\frac{N\bar S_{\sigma^2}}{2\sigma^2}\right).
\]
When $\mu$ is known, the pdf depends on the data only through $\bar S_{\sigma^2}$; by the factorization theorem, $\bar S_{\sigma^2}$ is a sufficient statistic for $\sigma^2$. When both $(\mu, \sigma^2)$ are unknown, use the identity
\[
\sum_{n}\big(x_n - \mu\big)^2 = \sum_{n}\big(x_n - S_\mu\big)^2 + N\big(S_\mu - \mu\big)^2 = N S_{\sigma^2} + N\big(S_\mu - \mu\big)^2,
\]
so that
\[
p(X) = \frac{1}{(2\pi\sigma^2)^{N/2}}\exp\left(-\frac{N\big(S_{\sigma^2} + (S_\mu - \mu)^2\big)}{2\sigma^2}\right),
\]
which depends on the data only through the pair $(S_\mu, S_{\sigma^2})$; hence, by the factorization theorem, this pair is sufficient for $(\mu, \sigma^2)$.
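A small check of the factorization, under assumed values for the data and the parameters: the log-likelihood computed from the raw samples coincides with the one computed from the sufficient statistics alone.

```python
import numpy as np

def loglik(x, mu, sigma2):
    # Gaussian i.i.d. log-likelihood from the raw samples
    N = len(x)
    return -N / 2 * np.log(2 * np.pi * sigma2) - ((x - mu) ** 2).sum() / (2 * sigma2)

def loglik_from_stats(S_mu, S_sigma2, N, mu, sigma2):
    # The same quantity computed from (S_mu, S_sigma2) only
    return -N / 2 * np.log(2 * np.pi * sigma2) - N * (S_sigma2 + (S_mu - mu) ** 2) / (2 * sigma2)

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=2.0, size=100)
S_mu, S_sigma2 = x.mean(), x.var()             # var() uses the 1/N convention, as in S_sigma2
print(np.isclose(loglik(x, 0.3, 1.7),
                 loglik_from_stats(S_mu, S_sigma2, len(x), 0.3, 1.7)))   # True
```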
3.11. Show that solving the task
\[
\underset{\theta}{\text{minimize}}\;\; L(\theta, \lambda) = \sum_{n=1}^{N}\Big(y_n - \theta_0 - \sum_{i=1}^{l}\theta_i x_{ni}\Big)^2 + \lambda\sum_{i=1}^{l}|\theta_i|^2
\]
is equivalent to minimizing
\[
L(\theta, \lambda) = \sum_{n=1}^{N}\Big((y_n - \bar y) - \sum_{i=1}^{l}\theta_i(x_{ni} - \bar x_i)\Big)^2 + \lambda\sum_{i=1}^{l}|\theta_i|^2,
\]
and that the estimate of $\theta_0$ is given by
\[
\hat\theta_0 = \bar y - \sum_{i=1}^{l}\hat\theta_i \bar x_i.
\]
Solution: We have that
\[
L(\theta_0, \theta_{1:l}) = \sum_{n=1}^{N}\Big(y_n - \theta_0 - \sum_{i=1}^{l}\theta_i x_{ni}\Big)^2 + \lambda\sum_{i=1}^{l}\theta_i^2.
\]
Since $\theta_0$ does not enter the regularization term, setting $\partial L/\partial\theta_0 = 0$ gives
\[
\sum_{n=1}^{N}\Big(y_n - \theta_0 - \sum_{i=1}^{l}\theta_i x_{ni}\Big) = 0
\;\Longrightarrow\;
\hat\theta_0 = \bar y - \sum_{i=1}^{l}\theta_i \bar x_i,
\]
where $\bar y := \frac{1}{N}\sum_n y_n$ and $\bar x_i := \frac{1}{N}\sum_n x_{ni}$. Substituting this value of $\theta_0$ back into $L$, the term inside the parentheses becomes $(y_n - \bar y) - \sum_{i=1}^{l}\theta_i(x_{ni} - \bar x_i)$, which is exactly the centered formulation; hence the two tasks are equivalent, and once the minimizers $\hat\theta_i$ of the centered task are available, $\hat\theta_0 = \bar y - \sum_{i=1}^{l}\hat\theta_i\bar x_i$.
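A short numerical check of the equivalence (data, dimensions and the value of $\lambda$ are assumptions for the sketch): the centered ridge solution plus the recovered intercept matches the direct solution of the uncentered problem with an unpenalized intercept.

```python
import numpy as np

rng = np.random.default_rng(5)
N, l, lam = 60, 4, 3.0
X = rng.normal(size=(N, l)) + 2.0              # inputs, deliberately not centered
y = 1.5 + X @ np.array([1.0, -0.5, 0.0, 2.0]) + 0.3 * rng.normal(size=N)

# Centered ridge: penalize only theta_1..theta_l, recover theta_0 afterwards
Xc, yc = X - X.mean(axis=0), y - y.mean()
theta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(l), Xc.T @ yc)
theta0 = y.mean() - X.mean(axis=0) @ theta

# Direct minimization of the original objective over (theta_0, theta_1..theta_l)
A = np.hstack([np.ones((N, 1)), X])
P = np.diag([0.0] + [1.0] * l)                 # no penalty on the intercept
direct = np.linalg.solve(A.T @ A + lam * P, A.T @ y)

print(np.allclose(np.r_[theta0, theta], direct))   # True
```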
3.12. This problem refers to Example 3.4, where a linear regression task with a real-valued unknown parameter $\theta$ is considered. Show that $\mathrm{MSE}(\hat\theta_b(\lambda)) < \mathrm{MSE}(\hat\theta_{\text{MVU}})$, i.e., the ridge regression estimate shows a lower MSE performance than the one for the MVU estimate, if
\[
\lambda \in (0, \infty), \quad \text{when } \theta^2 \le \frac{\sigma_\eta^2}{N},
\qquad\qquad
\lambda \in \left(0,\; \frac{2\sigma_\eta^2}{\theta^2 - \frac{\sigma_\eta^2}{N}}\right), \quad \text{when } \theta^2 > \frac{\sigma_\eta^2}{N}.
\]
Moreover, the minimum MSE performance for the ridge regression estimate is attained at $\lambda_* = \sigma_\eta^2/\theta^2$.
Solution: Theory suggests that our estimate $\hat\theta_b$ is the solution of the task of minimizing the following loss function with respect to $\theta \in \mathbb{R}$:
\[
L(\theta, \lambda) = \sum_{n=1}^{N}\big(y_n - \theta\big)^2 + \lambda\theta^2, \qquad \lambda \ge 0.
\]
The minimizer $\hat\theta_b$ is obtained by setting the derivative $dL(\theta, \lambda)/d\theta$ equal to zero, or equivalently,
\[
-2\sum_{n=1}^{N}\big(y_n - \theta\big) + 2\lambda\theta = 0
\;\Longrightarrow\;
\hat\theta_b(\lambda) = \frac{1}{N+\lambda}\sum_{n=1}^{N}y_n = \frac{N}{N+\lambda}\,\hat\theta_{\text{MVU}}.
\]
Elementary calculus helps us to express $\mathrm{MSE}\big(\hat\theta_b(\lambda)\big)$ as
\[
\mathrm{MSE}\big(\hat\theta_b(\lambda)\big)
= E\Big[\big(\hat\theta_b(\lambda) - E[\hat\theta_b(\lambda)]\big)^2\Big] + \big(E[\hat\theta_b(\lambda)] - \theta_0\big)^2
= \frac{N^2\,\mathrm{MSE}(\hat\theta_{\text{MVU}}) + \lambda^2\theta_0^2}{(N+\lambda)^2},
\]
where $\mathrm{MSE}(\hat\theta_{\text{MVU}}) = \sigma_\eta^2/N$. Requiring $\mathrm{MSE}(\hat\theta_b(\lambda)) < \mathrm{MSE}(\hat\theta_{\text{MVU}})$ and rearranging (for $\lambda > 0$), the condition becomes
\[
\lambda\left(\theta_0^2 - \frac{\sigma_\eta^2}{N}\right) < 2\sigma_\eta^2,
\]
which holds for every $\lambda \in (0, \infty)$ when $\theta_0^2 \le \sigma_\eta^2/N$, and for $\lambda \in \big(0,\; 2\sigma_\eta^2/(\theta_0^2 - \sigma_\eta^2/N)\big)$ when $\theta_0^2 > \sigma_\eta^2/N$. Finally, setting the derivative of $\mathrm{MSE}\big(\hat\theta_b(\lambda)\big)$ with respect to $\lambda$ equal to zero,
\[
\frac{d}{d\lambda}\,\mathrm{MSE}\big(\hat\theta_b(\lambda)\big)
= \frac{2N\big(\lambda\theta_0^2 - \sigma_\eta^2\big)}{(N+\lambda)^3} = 0
\;\Longrightarrow\;
\lambda_* = \frac{\sigma_\eta^2}{\theta_0^2}.
\]
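A quick numerical verification of both claims (the values of $N$, $\theta_0$ and $\sigma_\eta^2$ are assumptions chosen so that $\theta_0^2 > \sigma_\eta^2/N$):

```python
import numpy as np

N, theta0, sigma_eta2 = 10, 1.0, 2.0

lam = np.linspace(1e-6, 50, 500001)
mse_mvu = sigma_eta2 / N
mse_b = (N * sigma_eta2 + lam ** 2 * theta0 ** 2) / (N + lam) ** 2

better = lam[mse_b < mse_mvu]                  # lambdas where ridge beats the MVU estimate
print(better.max(), 2 * sigma_eta2 / (theta0 ** 2 - sigma_eta2 / N))   # both approx. 5.0
print(lam[np.argmin(mse_b)], sigma_eta2 / theta0 ** 2)                 # both approx. 2.0
```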
3.13. Consider, once more, the same regression model as that of Problem 3.9, but with $\Sigma_\eta = \sigma_\eta^2 I_N$. Compute the MSE of the predictions, $E[(y - \hat y)^2]$, where $y$ is the true response and $\hat y$ is the predicted value, given a test point $x$ and using the LS estimator, $\hat y = \hat\theta^T x$.
Solution: Indeed, we have that
\[
\hat\theta = (X^TX)^{-1}X^Ty = \theta + (X^TX)^{-1}X^T\eta,
\]
so the estimator is unbiased. Hence, its covariance matrix is
\[
\Sigma_{\hat\theta} = E_D\big[(X^TX)^{-1}X^T\eta\eta^TX(X^TX)^{-1}\big]
= \sigma_\eta^2\, E_D\big[(X^TX)^{-1}\big]
\approx \frac{\sigma_\eta^2}{N}\,\Sigma^{-1},
\]
for large $N$, where $\Sigma$ is the covariance matrix of the (zero mean) input vectors, since $\frac{1}{N}X^TX \to \Sigma$. Moreover, for a test point $x$ independent of the training data,
\[
\mathrm{MSE}(\hat y) = E\big[(y - \hat y)^2\big] = E\Big[\big(\theta^Tx + \eta - \hat\theta^Tx\big)^2\Big]
= \sigma_\eta^2 + E\big[x^T\Sigma_{\hat\theta}\,x\big].
\]
Then, we have
\[
\mathrm{MSE}(\hat y) \approx \sigma_\eta^2 + \frac{\sigma_\eta^2}{N}\,E\big[x^T\Sigma^{-1}x\big]
= \sigma_\eta^2\left(1 + \frac{l}{N}\right),
\]
since $E[x^T\Sigma^{-1}x] = \mathrm{trace}\big(\Sigma^{-1}E[xx^T]\big) = \mathrm{trace}(I_l) = l$, where $l$ is the dimensionality of $x$.
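A simulation sketch of the result (dimensions, noise level and number of trials are assumptions): the average squared prediction error over independent training sets and test points is compared with $\sigma_\eta^2(1 + l/N)$.

```python
import numpy as np

rng = np.random.default_rng(6)
N, l, sigma_eta, trials = 50, 5, 1.0, 2000
theta = rng.normal(size=l)

err2 = []
for _ in range(trials):
    X = rng.normal(size=(N, l))                     # training inputs, Sigma = I
    y = X @ theta + sigma_eta * rng.normal(size=N)
    theta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # LS estimate
    x = rng.normal(size=l)                          # independent test point
    y_test = theta @ x + sigma_eta * rng.normal()
    err2.append((y_test - theta_hat @ x) ** 2)

print(np.mean(err2), sigma_eta ** 2 * (1 + l / N))  # both approx. 1.1
```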
3.14. Assume that the model that generates the data is
\[
y_n = A\sin\left(\frac{2\pi}{N}kn + \phi\right) + \eta_n, \tag{14}
\]
where $A > 0$, and $k \in \{1, 2, \ldots, N-1\}$. Assume that $\eta_n$ are samples from a Gaussian white noise, of variance $\sigma_\eta^2$. Show that there is no unbiased estimator for the phase, $\phi$, based on $N$ measurement points, $y_n$, $n = 0, 1, \ldots, N-1$, that attains the Cramér-Rao bound.
Solution: The joint pdf of the measurements $y := [y_0, y_1, \ldots, y_{N-1}]^T$ is given by
\[
p(y;\phi) = \frac{1}{(2\pi\sigma_\eta^2)^{N/2}}
\exp\left(-\frac{1}{2\sigma_\eta^2}\sum_{n=0}^{N-1}\Big(y_n - A\sin\Big(\frac{2\pi}{N}kn + \phi\Big)\Big)^2\right),
\]
so that
\[
\frac{\partial \ln p(y;\phi)}{\partial\phi}
= \frac{A}{\sigma_\eta^2}\sum_{n=0}^{N-1}\Big(y_n - A\sin\Big(\frac{2\pi}{N}kn + \phi\Big)\Big)\cos\Big(\frac{2\pi}{N}kn + \phi\Big).
\]
For some unbiased estimator $g(y)$ to attain the Cramér-Rao bound, the score would have to factorize as $\partial\ln p(y;\phi)/\partial\phi = I(\phi)\big(g(y) - \phi\big)$; because of the nonlinear (trigonometric) way in which $\phi$ enters the expression above, no such factorization is possible, and hence no unbiased estimator of $\phi$ attains the bound.