Machine Learning, Chapter 3: Solutions to Selected Problems

Type: Homework Help
Pages: 9
Words: 2417
Author: Sergios Theodoridis

… $(1+\cos(2\alpha))/2$, and also the fact that
$$\sum_{n=0}^{N-1}\cos\!\left(\frac{4\pi}{N}kn + 2\varphi\right) = \frac{1}{2}\sum_{n=0}^{N-1}\left(e^{j\left(\frac{4\pi}{N}kn+2\varphi\right)} + e^{-j\left(\frac{4\pi}{N}kn+2\varphi\right)}\right).$$
3.15. Show that if (y, x) are two jointly distributed random vectors, with values in $\mathbb{R}^k \times \mathbb{R}^l$, then the MSE optimal estimator of y given the value x = x is the regression of y conditioned on x, i.e., E[y|x].
Solution: The proof follows a similar line as the scalar case. Let
$$f(\mathbf{x}) := [f_1(\mathbf{x}), \ldots, f_k(\mathbf{x})]^T$$
be the vector estimator. Then the MSE optimal one should minimize the sum of square errors per component, i.e.,
$$\sum_{i=1}^{k} E\big[(\mathrm{y}_i - f_i(\mathbf{x}))^2\big],$$
and each term of the sum is minimized separately, exactly as in the scalar case, by $f_i(\mathbf{x}) = E[\mathrm{y}_i|\mathbf{x}]$.
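To see the claim at work numerically, the following minimal Python/NumPy sketch uses a toy model chosen purely for illustration, in which $E[y|x] = x^2$ by construction, and compares the MSE of the conditional-mean estimator with that of the best affine estimator; the former attains the noise variance, the latter does not.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100_000
    x = rng.normal(size=N)
    y = x**2 + rng.normal(scale=0.5, size=N)     # toy model: E[y|x] = x^2

    mse_cond = np.mean((y - x**2)**2)            # MSE of the regression E[y|x]
    a, b = np.polyfit(x, y, 1)                   # best affine fit (least squares)
    mse_affine = np.mean((y - (a * x + b))**2)

    print(mse_cond)     # about 0.25 (the noise variance)
    print(mse_affine)   # noticeably larger, about 2.25 for this model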
3.16. Assume that x, y are jointly Gaussian random vectors, with covariance matrix
$$\Sigma := E\left[\begin{pmatrix}\mathbf{x}-\boldsymbol{\mu}_x\\ \mathbf{y}-\boldsymbol{\mu}_y\end{pmatrix}\big[(\mathbf{x}-\boldsymbol{\mu}_x)^T, (\mathbf{y}-\boldsymbol{\mu}_y)^T\big]\right] = \begin{pmatrix}\Sigma_x & \Sigma_{xy}\\ \Sigma_{yx} & \Sigma_y\end{pmatrix}.$$
Assuming also that the matrices $\Sigma_x$ and $\bar{\Sigma} := \Sigma_y - \Sigma_{yx}\Sigma_x^{-1}\Sigma_{xy}$ are non-singular, show that the optimal MSE estimator E[y|x] takes the following form,
$$E[\mathbf{y}|\mathbf{x}] = E[\mathbf{y}] + \Sigma_{yx}\Sigma_x^{-1}\big(\mathbf{x} - E[\mathbf{x}]\big).$$
Notice that E[y|x] is an affine function of x. In other words, for the case where x and y are jointly Gaussian, the optimal estimator of y, in the MSE sense, which in general is a non-linear function, becomes an affine function of x.
In the special case where x, y are scalar random variables,
$$E[\mathrm{y}|\mathrm{x}] = \mu_y + \alpha\frac{\sigma_y}{\sigma_x}(x - \mu_x),$$
where α stands for the correlation coefficient, defined as
$$\alpha := \frac{E[(\mathrm{x} - \mu_x)(\mathrm{y} - \mu_y)]}{\sigma_x\sigma_y},$$
with |α| ≤ 1. Notice, also, that the previous assumption on the non-singularity of $\Sigma_x$ and $\bar{\Sigma}$ translates, in this special case, to $\sigma_x \neq 0 \neq \sigma_y$ and |α| < 1.
Solution: First, it is easy to verify that $\Sigma_{yx} = \Sigma_{xy}^T$. Moreover, since $\Sigma_x$ and $\bar{\Sigma}$ are assumed to be non-singular, it can be verified, e.g., [Magn 99], that $\det\Sigma = \det\Sigma_x\,\det\bar{\Sigma}$, and that
$$\Sigma^{-1} = \begin{pmatrix}\Sigma_x^{-1} + \Sigma_x^{-1}\Sigma_{xy}\bar{\Sigma}^{-1}\Sigma_{yx}\Sigma_x^{-1} & -\Sigma_x^{-1}\Sigma_{xy}\bar{\Sigma}^{-1}\\ -\bar{\Sigma}^{-1}\Sigma_{yx}\Sigma_x^{-1} & \bar{\Sigma}^{-1}\end{pmatrix}.$$
Define $\bar{\mathbf{x}} := \mathbf{x} - \boldsymbol{\mu}_x$
and $\bar{\mathbf{y}} := \mathbf{y} - \boldsymbol{\mu}_y$. Then, the joint pdf of x and y becomes
$$p(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{\frac{k+l}{2}}(\det\Sigma)^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\big[\bar{\mathbf{x}}^T, \bar{\mathbf{y}}^T\big]\,\Sigma^{-1}\begin{pmatrix}\bar{\mathbf{x}}\\ \bar{\mathbf{y}}\end{pmatrix}\right).$$
As a result, the marginal pdf p(x) becomes
$$p(\mathbf{x}) = \int p(\mathbf{x},\mathbf{y})\,d\mathbf{y} = \frac{1}{(2\pi)^{\frac{l}{2}}(\det\Sigma_x)^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\bar{\mathbf{x}}^T\Sigma_x^{-1}\bar{\mathbf{x}}\right).$$
Using the previous relations, we can easily see that
$$p(\mathbf{y}|\mathbf{x}) = \frac{p(\mathbf{x},\mathbf{y})}{p(\mathbf{x})} = \frac{1}{(2\pi)^{\frac{k}{2}}(\det\bar{\Sigma})^{\frac{1}{2}}}\exp\left(-\frac{1}{2}\big(\bar{\mathbf{y}} - \Sigma_{yx}\Sigma_x^{-1}\bar{\mathbf{x}}\big)^T\bar{\Sigma}^{-1}\big(\bar{\mathbf{y}} - \Sigma_{yx}\Sigma_x^{-1}\bar{\mathbf{x}}\big)\right),$$
which is a Gaussian with mean $\boldsymbol{\mu}_y + \Sigma_{yx}\Sigma_x^{-1}(\mathbf{x} - \boldsymbol{\mu}_x)$; hence $E[\mathbf{y}|\mathbf{x}] = E[\mathbf{y}] + \Sigma_{yx}\Sigma_x^{-1}(\mathbf{x} - E[\mathbf{x}])$.
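As an illustrative sanity check of this result (the mean vector and covariance matrix below are arbitrary example values, assumed only for the sketch), one can sample a jointly Gaussian pair, form the affine estimator $\hat{\mathbf{y}} = \boldsymbol{\mu}_y + \Sigma_{yx}\Sigma_x^{-1}(\mathbf{x}-\boldsymbol{\mu}_x)$, and verify that the residual has variance $\bar{\Sigma}$ and is uncorrelated with x.

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0, 0.5])              # first two entries: mu_x, last one: mu_y
    Sigma = np.array([[2.0, 0.3, 0.8],
                      [0.3, 1.0, -0.4],
                      [0.8, -0.4, 1.5]])         # assumed joint covariance of (x, y)
    z = rng.multivariate_normal(mu, Sigma, size=200_000)
    x, y = z[:, :2], z[:, 2:]                    # x in R^2, y in R^1

    Sx, Sxy = Sigma[:2, :2], Sigma[:2, 2:]
    Syx, Sy = Sigma[2:, :2], Sigma[2:, 2:]

    y_hat = mu[2:] + (x - mu[:2]) @ np.linalg.solve(Sx, Sxy)   # E[y|x] = mu_y + Syx Sx^{-1} (x - mu_x)
    resid = y - y_hat

    print(resid.var(ddof=1))                               # empirical variance of the residual
    print((Sy - Syx @ np.linalg.solve(Sx, Sxy)).item())    # theoretical Sigma_bar
    print(np.corrcoef(resid[:, 0], x[:, 0])[0, 1])         # residual is uncorrelated with x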
3.17. Assume a number l of jointly Gaussian random variables $\{x_1, x_2, \ldots, x_l\}$, and a non-singular matrix $A \in \mathbb{R}^{l\times l}$. If $\mathbf{x} := [x_1, x_2, \ldots, x_l]^T$, then show that the components of the vector y, obtained by $\mathbf{y} = A\mathbf{x}$, are also jointly Gaussian random variables.
A direct consequence of this result is that any linear combination of jointly Gaussian variables is also Gaussian.
Solution: The Jacobian matrix of the linear transform $\mathbf{y} = A\mathbf{x}$ is easily shown to be
$$J := J(\mathbf{y}; \mathbf{x}) = A.$$
Clearly, $\det\Sigma_y = (\det A)^2\det\Sigma_x$, since $\Sigma_y = A\Sigma_x A^T$. Then, by the theorem of transformation for random variables, e.g., [Papo 02], we have the following:
$$p(\mathbf{y}) = \frac{p(\mathbf{x})}{|\det J|}\bigg|_{\mathbf{x}=A^{-1}\mathbf{y}} = \frac{1}{(2\pi)^{\frac{l}{2}}(\det\Sigma_y)^{\frac{1}{2}}}\exp\left(-\frac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_y)^T\Sigma_y^{-1}(\mathbf{y}-\boldsymbol{\mu}_y)\right),$$
with $\boldsymbol{\mu}_y = A\boldsymbol{\mu}_x$, which establishes the first claim.
For the second claim, assume a non-zero vector $\mathbf{a} \in \mathbb{R}^l$, and define the linear combination of $\{x_1, x_2, \ldots, x_l\}$ as $y = \mathbf{a}^T\mathbf{x}$. Elementary linear algebra guarantees that $\mathbf{a}^T$ can be complemented with l − 1 more rows so as to form a non-singular matrix A; then y is the first component of $A\mathbf{x}$, which, by the first claim, is Gaussian.
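A quick numerical illustration of the transformation result, using arbitrary example values for $\boldsymbol{\mu}_x$, $\Sigma_x$ and A: the empirical covariance of $\mathbf{y}=A\mathbf{x}$ should match $A\Sigma_xA^T$, and $\det\Sigma_y$ should match $(\det A)^2\det\Sigma_x$.

    import numpy as np

    rng = np.random.default_rng(1)
    mu_x = np.array([0.0, 1.0, -1.0])              # arbitrary example values
    Sigma_x = np.array([[1.0, 0.2, 0.0],
                        [0.2, 2.0, 0.5],
                        [0.0, 0.5, 1.5]])
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, -1.0],
                  [3.0, 0.0, 1.0]])                # non-singular (det = -5)

    x = rng.multivariate_normal(mu_x, Sigma_x, size=500_000)
    y = x @ A.T                                    # y = A x, sample by sample

    print(np.cov(y.T))                             # empirical covariance of y
    print(A @ Sigma_x @ A.T)                       # theoretical A Sigma_x A^T
    print(np.linalg.det(A)**2 * np.linalg.det(Sigma_x),
          np.linalg.det(A @ Sigma_x @ A.T))        # det Sigma_y = (det A)^2 det Sigma_x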
3.18. Let $\mathbf{x} \in \mathbb{R}^l$ be a vector of jointly Gaussian random variables, of covariance matrix $\Sigma_x$. Consider the general linear regression model
$$\mathbf{y} = \Theta\mathbf{x} + \boldsymbol{\eta},$$
where $\Theta \in \mathbb{R}^{k\times l}$ is a parameter matrix and η is the vector of noise samples, which are considered to be Gaussian, with zero mean, and with covariance matrix $\Sigma_\eta$, independent of x. Then show that y and x are jointly Gaussian, with covariance matrix given by
$$\Sigma = \begin{pmatrix}\Sigma_x & \Sigma_x\Theta^T\\ \Theta\Sigma_x & \Theta\Sigma_x\Theta^T + \Sigma_\eta\end{pmatrix}.$$
Solution: We can write
$$\begin{pmatrix}\mathbf{x}\\ \mathbf{y}\end{pmatrix} = \underbrace{\begin{pmatrix}I & O\\ \Theta & I\end{pmatrix}}_{A}\begin{pmatrix}\mathbf{x}\\ \boldsymbol{\eta}\end{pmatrix}.$$
However, since x and η are both Gaussian vector variables, and mutually independent, they are also jointly Gaussian. Notice also that the matrix A is non-singular, so, by Problem 3.17, x and y are jointly Gaussian; the stated covariance follows from $\Sigma = A\,\mathrm{diag}(\Sigma_x,\Sigma_\eta)\,A^T$.
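The block structure of the joint covariance can also be checked by simulation; the following sketch (with arbitrary choices for $\Sigma_x$, $\Theta$ and $\Sigma_\eta$) builds the theoretical matrix and compares it with the sample covariance of the stacked vector $[\mathbf{x}^T, \mathbf{y}^T]^T$.

    import numpy as np

    rng = np.random.default_rng(2)
    l, k, N = 3, 2, 400_000                        # arbitrary dimensions and sample size
    Sigma_x = np.array([[1.0, 0.3, 0.0],
                        [0.3, 2.0, 0.4],
                        [0.0, 0.4, 1.0]])
    Theta = rng.normal(size=(k, l))                # arbitrary parameter matrix
    Sigma_eta = 0.5 * np.eye(k)                    # assumed noise covariance

    x = rng.multivariate_normal(np.zeros(l), Sigma_x, size=N)
    eta = rng.multivariate_normal(np.zeros(k), Sigma_eta, size=N)
    y = x @ Theta.T + eta                          # y = Theta x + eta

    Sigma_joint = np.block([[Sigma_x,         Sigma_x @ Theta.T],
                            [Theta @ Sigma_x, Theta @ Sigma_x @ Theta.T + Sigma_eta]])
    print(np.round(np.cov(np.hstack([x, y]).T), 2))   # empirical joint covariance
    print(np.round(Sigma_joint, 2))                   # theoretical joint covariance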
3.19. Show that a linear combination of independent Gaussian variables is also Gaussian.
Solution: This is a direct consequence of Problem 3.17, since independent Gaussian variables are also jointly Gaussian.
3.20. Show that if a sufficient statistic T(X) for a parameter estimation problem exists, then T(X) suffices to express the respective ML estimate.
Solution: This is a direct consequence of the Fisher-Neyman factorization theorem: since $p(X;\theta) = h(X)\,g(T(X);\theta)$, the likelihood depends on θ only through T(X), so the maximizing value of θ is a function of T(X).
3.21. Show that if an efficient estimator exists, then it is also optimal in the ML sense.
Solution: Assume the existence of an efficient estimator, i.e., a function g which achieves the Cramér-Rao bound. A necessary and sufficient condition for achieving the bound is that
$$\frac{\partial \ln p(X;\theta)}{\partial\theta} = I(\theta)\big(g(X) - \theta\big),$$
where I(θ) is the Fisher information. Setting the derivative of the log-likelihood to zero at the ML estimate then gives $\hat{\theta}_{\mathrm{ML}} = g(X)$.
3.22. Let the observations resulting from an experiment be $x_n$, n = 1, 2, ..., N. Assume that they are independent and that they originate from a Gaussian PDF $\mathcal{N}(\mu, \sigma^2)$. Both the mean and the variance are unknown. Prove that the ML estimates of these quantities are given by
$$\hat{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}x_n, \qquad \hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(x_n - \hat{\mu}_{\mathrm{ML}})^2.$$
Solution: The log-likelihood function is given by
$$L(\mu,\sigma^2) = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2.$$
Taking the gradient with respect to µ and σ², and equating it to zero, we obtain the following system of equations,
$$\frac{1}{\sigma^2}\sum_{n=1}^{N}(x_n - \mu) = 0, \qquad -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{n=1}^{N}(x_n - \mu)^2 = 0,$$
whose solution yields the two estimates given above.
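A short numerical check of the two ML formulas, on synthetic data with arbitrarily chosen true parameters:

    import numpy as np

    rng = np.random.default_rng(3)
    mu_true, sigma_true, N = 2.0, 1.5, 10_000      # arbitrary example values
    x = rng.normal(mu_true, sigma_true, size=N)

    mu_ml = x.sum() / N                            # (1/N) sum_n x_n
    sigma2_ml = ((x - mu_ml)**2).sum() / N         # (1/N) sum_n (x_n - mu_ml)^2
    print(mu_ml, sigma2_ml)
    print(np.mean(x), np.var(x))                   # np.var (ddof=0) is the ML estimate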
3.23. Let the observations $x_n$, n = 1, 2, ..., N, come from the uniform distribution
$$p(x;\theta) = \begin{cases}\dfrac{1}{\theta}, & 0 < x \le \theta,\\[4pt] 0, & \text{otherwise}.\end{cases}$$
Obtain the ML estimate of θ.
Solution: The likelihood function is given by
$$L(x;\theta) = \prod_{n=1}^{N}\frac{1}{\theta} = \frac{1}{\theta^N},$$
provided that $\theta \ge x_n$ for all n (and zero otherwise). Since $1/\theta^N$ is decreasing in θ, the likelihood is maximized by the smallest admissible value, i.e., $\hat{\theta}_{\mathrm{ML}} = \max_n x_n$.
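A minimal numerical illustration (the true θ is chosen arbitrarily): the ML estimate is simply the largest observation, which never exceeds θ and underestimates it on average.

    import numpy as np

    rng = np.random.default_rng(4)
    theta_true, N = 3.0, 1_000                     # arbitrary example values
    x = rng.uniform(0.0, theta_true, size=N)

    theta_ml = x.max()    # smallest theta consistent with 0 < x_n <= theta
    print(theta_ml)       # at most theta_true, slightly below it here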
3.24. Obtain the ML estimate of the parameter λ > 0 of the exponential distribution
$$p(x) = \begin{cases}\lambda\exp(-\lambda x), & x \ge 0,\\[2pt] 0, & x < 0,\end{cases}$$
based on a set of measurements $x_n$, n = 1, 2, ..., N.
Solution: The log-likelihood function is
$$L(x;\lambda) = N\ln\lambda - \lambda\sum_{n=1}^{N}x_n.$$
Setting its derivative with respect to λ equal to zero gives
$$\hat{\lambda}_{\mathrm{ML}} = \frac{N}{\sum_{n=1}^{N}x_n}.$$
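The closed-form estimate $\hat{\lambda}_{\mathrm{ML}} = 1/\bar{x}$ can be checked on synthetic data (the true λ below is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(5)
    lam_true, N = 2.0, 10_000                      # arbitrary example values
    x = rng.exponential(scale=1.0 / lam_true, size=N)

    lam_ml = N / x.sum()                           # equivalently 1 / x.mean()
    print(lam_ml)                                  # close to lam_true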
3.25. Assume that $\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$, and a stochastic process $\{x_n\}_{n=-\infty}^{\infty}$, consisting of i.i.d. random variables, such that $p(x_n|\mu) = \mathcal{N}(\mu, \sigma^2)$. Consider N observations, so that $X \equiv \{x_1, x_2, \ldots, x_N\}$, and prove that the posterior $p(x|X)$, of any $x = x_{n_0}$ conditioned on X, turns out to be Gaussian with mean $\mu_N$ and variance $\sigma^2 + \sigma_N^2$, where
$$\mu_N = \frac{N\sigma_0^2\bar{x} + \sigma^2\mu_0}{N\sigma_0^2 + \sigma^2}, \qquad \sigma_N^2 = \frac{\sigma^2\sigma_0^2}{N\sigma_0^2 + \sigma^2}.$$
Solution: From basic theory we have that
$$p(\mu|X) = \frac{p(X|\mu)p(\mu)}{\int p(X|\mu)p(\mu)\,d\mu} = \alpha\, p(\mu)\prod_{k=1}^{N}p(x_k|\mu),$$
or
$$p(\mu|X) = \alpha'\exp\left(-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right)\prod_{k=1}^{N}\exp\left(-\frac{(x_k-\mu)^2}{2\sigma^2}\right),$$
which, after collecting the terms that are quadratic and linear in µ, is seen to be a Gaussian, $\mathcal{N}(\mu_N, \sigma_N^2)$, with $\mu_N$ and $\sigma_N^2$ as given above.
Hence, $\lim_{N\to\infty}\sigma_N^2 = 0$, and for large N, $p(\mu|X)$ behaves like a δ function centered around $\mu_N$. Thus, since
$$p(x|X) = \int p(x|\mu)\,p(\mu|X)\,d\mu,$$
i.e., the integral of the Gaussian $\mathcal{N}(\mu, \sigma^2)$ against the Gaussian posterior $\mathcal{N}(\mu_N, \sigma_N^2)$, the predictive $p(x|X)$ is Gaussian with mean $\mu_N$ and variance $\sigma^2 + \sigma_N^2$.
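The closed-form posterior can be verified numerically; the sketch below (prior and noise parameters are arbitrary choices) evaluates the unnormalized posterior $p(\mu|X)$ on a grid and compares its mean and variance with $\mu_N$ and $\sigma_N^2$.

    import numpy as np

    rng = np.random.default_rng(6)
    mu0, sigma0, sigma, N = 1.0, 2.0, 1.0, 20      # arbitrary prior/noise parameters
    x = rng.normal(1.8, sigma, size=N)             # synthetic observations
    x_bar = x.mean()

    # Closed-form posterior parameters of mu given X
    mu_N = (N * sigma0**2 * x_bar + sigma**2 * mu0) / (N * sigma0**2 + sigma**2)
    sigma2_N = (sigma**2 * sigma0**2) / (N * sigma0**2 + sigma**2)

    # Grid-based check: moments of the normalized posterior on a fine grid
    mu_grid = np.linspace(mu_N - 5.0, mu_N + 5.0, 20_001)
    log_post = (-(mu_grid - mu0)**2 / (2 * sigma0**2)
                - ((x[:, None] - mu_grid)**2).sum(axis=0) / (2 * sigma**2))
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    m = (w * mu_grid).sum()
    v = (w * (mu_grid - m)**2).sum()
    print(mu_N, m)            # the two means agree
    print(sigma2_N, v)        # the two variances agree
    # The predictive p(x|X) then has mean mu_N and variance sigma**2 + sigma2_N.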
3.26. Show that, for the linear regression model
$$\mathbf{y} = X\boldsymbol{\theta} + \boldsymbol{\eta},$$
the a-posteriori probability $p(\boldsymbol{\theta}|\mathbf{y})$ is a Gaussian one, if the prior distribution is given by $p(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta}_0, \Sigma_0)$ and the noise samples follow the multivariate Gaussian distribution $p(\boldsymbol{\eta}) = \mathcal{N}(\mathbf{0}, \Sigma_\eta)$. Compute the mean vector and the covariance matrix of the posterior distribution.
Solution: It can easily be checked that $p(\boldsymbol{\theta}|\mathbf{y}) = \mathrm{const}\times\exp\left(-\frac{1}{2}\Psi\right)$, where
$$\Psi = (\mathbf{y} - X\boldsymbol{\theta})^T\Sigma_\eta^{-1}(\mathbf{y} - X\boldsymbol{\theta}) + (\boldsymbol{\theta} - \boldsymbol{\theta}_0)^T\Sigma_0^{-1}(\boldsymbol{\theta} - \boldsymbol{\theta}_0).$$
From now on, all terms that are independent of θ will be collected in constant terms. Hence,
$$\Psi = \alpha_1 - 2\mathbf{y}^T\Sigma_\eta^{-1}X\boldsymbol{\theta} + \boldsymbol{\theta}^TX^T\Sigma_\eta^{-1}X\boldsymbol{\theta} + (\boldsymbol{\theta} - \boldsymbol{\theta}_0)^T\Sigma_0^{-1}(\boldsymbol{\theta} - \boldsymbol{\theta}_0).$$
In the sequel, we follow a standard trick for situations like this. We introduce an auxiliary variable $\bar{\boldsymbol{\theta}}$, whose value is to be determined so as to make the following true,
$$\Psi = \alpha_4 + (\boldsymbol{\theta} - \boldsymbol{\theta}_0 - \bar{\boldsymbol{\theta}})^T\big(\Sigma_0^{-1} + X^T\Sigma_\eta^{-1}X\big)(\boldsymbol{\theta} - \boldsymbol{\theta}_0 - \bar{\boldsymbol{\theta}}).$$
Inspection of (17) and (18) indicates that this can happen if we choose
$$\bar{\boldsymbol{\theta}} = \big(\Sigma_0^{-1} + X^T\Sigma_\eta^{-1}X\big)^{-1}X^T\Sigma_\eta^{-1}(\mathbf{y} - X\boldsymbol{\theta}_0),$$
so that $p(\boldsymbol{\theta}|\mathbf{y})$ is Gaussian with mean $\boldsymbol{\theta}_0 + \bar{\boldsymbol{\theta}}$ and covariance matrix $\big(\Sigma_0^{-1} + X^T\Sigma_\eta^{-1}X\big)^{-1}$.
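The posterior mean and covariance derived above can be computed directly; the sketch below (with an arbitrary synthetic design matrix, prior and noise covariance) implements the two expressions.

    import numpy as np

    rng = np.random.default_rng(7)
    N, l = 50, 3                                   # arbitrary problem size
    X = rng.normal(size=(N, l))                    # synthetic design matrix
    theta0 = np.zeros(l)                           # prior mean
    Sigma0 = 4.0 * np.eye(l)                       # assumed prior covariance
    Sigma_eta = 0.25 * np.eye(N)                   # assumed noise covariance
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + rng.multivariate_normal(np.zeros(N), Sigma_eta)

    # Posterior of theta given y: Gaussian with
    #   covariance  S = (Sigma0^{-1} + X^T Sigma_eta^{-1} X)^{-1}
    #   mean        m = theta0 + S X^T Sigma_eta^{-1} (y - X theta0)
    Se_inv = np.linalg.inv(Sigma_eta)
    S = np.linalg.inv(np.linalg.inv(Sigma0) + X.T @ Se_inv @ X)
    m = theta0 + S @ X.T @ Se_inv @ (y - X @ theta0)
    print(m)     # posterior mean, close to theta_true here
    print(S)     # posterior covariance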
3.27. Assume that $x_n$, n = 1, 2, ..., N, are i.i.d. observations from a Gaussian $\mathcal{N}(\mu, \sigma^2)$. Obtain the MAP estimate of µ, if the prior follows the exponential distribution
$$p(\mu) = \lambda\exp(-\lambda\mu), \qquad \lambda > 0,\ \mu \ge 0.$$
Solution: Upon defining $X := \{x_1, x_2, \ldots, x_N\}$, the posterior distribution is given by
$$p(\mu|X) \propto p(X|\mu)p(\mu) = \frac{\lambda\exp(-\lambda\mu)}{(2\pi)^{N/2}\sigma^N}\prod_{n=1}^{N}\exp\left(-\frac{(x_n-\mu)^2}{2\sigma^2}\right).$$
Taking the ln, differentiating with respect to µ, and equating to zero, we obtain
$$-\lambda + \frac{1}{\sigma^2}\sum_{n=1}^{N}(x_n - \mu) = 0 \;\Longrightarrow\; \hat{\mu}_{\mathrm{MAP}} = \frac{1}{N}\sum_{n=1}^{N}x_n - \frac{\lambda\sigma^2}{N},$$
provided the right-hand side is non-negative; otherwise $\hat{\mu}_{\mathrm{MAP}} = 0$.
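A small numerical illustration of the resulting estimate (all parameter values below are arbitrary): the MAP estimate equals the sample mean shrunk towards zero by $\lambda\sigma^2/N$, clipped at the boundary µ = 0 of the prior's support.

    import numpy as np

    rng = np.random.default_rng(8)
    mu_true, sigma, lam, N = 1.0, 1.0, 2.0, 50     # arbitrary example values
    x = rng.normal(mu_true, sigma, size=N)

    # d/dmu ln p(mu|X) = -lam + sum(x_n - mu)/sigma^2 = 0 gives
    # x_bar - lam*sigma^2/N, clipped to the support mu >= 0 of the prior.
    mu_map = max(x.mean() - lam * sigma**2 / N, 0.0)
    print(mu_map, x.mean())   # MAP is the sample mean shrunk towards zero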
Bibliography
[Magn 99] Magnus, J. R., and Neudecker, H. Matrix Differential Calculus with
Applications in Statistics and Econometrics. John Wiley & Sons,
revised Ed., 1999.