## Data Analysis report

2/19/2015

Data Analysis Methods

Mid-Term Project

Govind Ramchander

Vinay Gupta

Utkarsh Srivastava

MS-IS

Report:

Goal: The goal of this report is to study the factors that affect the landing distance of a commercial flight, so that the risk of landing overrun can be reduced.

Approach: We have landing data for 800 commercial flights. We analyse this data and model landing distance as a function of the other variables in the data set. We follow the steps below to arrive at the final model.

1. Import data from the CSV file.

2. Clean the data based on the requirements below:

a. Duration should always be greater than 40 minutes.

b. Ground speed should be between 30 mph and 140 mph.

c. Air speed should be between 30 mph and 140 mph.

d. Height should be at least 6 m.

e. Distance should be less than 6000 feet.

3. Examine correlations between the variables in the data set.

4. Fit a multiple linear regression model.

5. Re-explore and re-model the data to find the parameters that most strongly affect the landing distance.
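The cleaning rules in step 2 can be sketched in R roughly as follows. This is a minimal illustration, not the report's own code: the toy data frame is made up, the range bounds are assumed to be inclusive, and rows with a missing speed_air are assumed to be kept rather than dropped. The helper name `in_range_or_na` is hypothetical.

```r
# Toy data standing in for the real file; every value here is made up.
landing <- data.frame(
  duration     = c(120, 35, 95, 150, 80),
  speed_ground = c(80, 75, 145, 90, 60),
  speed_air    = c(NA, 100, 141, 101, NA),
  height       = c(30, 25, 40, 5, 20),
  distance     = c(1500, 1200, 6500, 2000, 900)
)

# TRUE when x lies in [lo, hi]; a missing value is treated as passing
# (assumption: missing speed_air should not remove a flight).
in_range_or_na <- function(x, lo, hi) is.na(x) | (x >= lo & x <= hi)

# One logical flag per row, combining rules a to e.
keep <- with(landing,
  duration > 40 &                              # a. duration over 40 minutes
  speed_ground >= 30 & speed_ground <= 140 &   # b. ground speed range
  in_range_or_na(speed_air, 30, 140) &         # c. air speed range
  height >= 6 &                                # d. minimum height
  distance < 6000                              # e. maximum distance
)

clean <- landing[keep, ]
nrow(clean)
```

Building a single logical vector keeps each rule visible on its own line, which makes it easy to count how many flights each rule removes.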

Result: We found that speed_ground and speed_air are strongly correlated. We therefore retained only speed_ground in our model, both because it is complete (i.e. it has no missing values) and to prevent multicollinearity. We then fitted a multiple linear regression model, assuming that distance is affected by all the other variables in the data set. In our first iteration, firstmodel, three factors (duration, no_pasg, and pitch) did not significantly affect the response, so we eliminated them. We fitted the next model without these variables and performed residual analysis on it to check its correctness. The residuals followed a trend with respect to speed_ground, so we revised the model to include the squared value of speed_ground. Our final model, revisedmodel, showed randomly scattered residuals and had an R-squared value of nearly 98%, so we retained it. Our final model is of the form:

distance = 2177.22 - 402.75 (aircraft) - 68.82 (speed_ground) + 0.69 (speed_ground^2) + 13.71 (height)
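To illustrate why the squared term matters, the sketch below simulates data from the reported equation and compares a fit without the squared term against one with it. Several things here are assumptions rather than part of the report: the data is simulated, aircraft is treated as a 0/1 indicator (this section does not say how it is coded), the noise level is arbitrary, and the names `reported_distance`, `linear_fit`, and `revised_fit` are hypothetical.

```r
# Reported final equation, written as a function.
# Assumption: aircraft is a 0/1 indicator variable.
reported_distance <- function(aircraft, speed_ground, height) {
  2177.22 - 402.75 * aircraft - 68.82 * speed_ground +
    0.69 * speed_ground^2 + 13.71 * height
}

set.seed(1)
n <- 200

# Simulated predictors within the cleaning ranges (illustrative only).
sim <- data.frame(
  aircraft     = rbinom(n, 1, 0.5),
  speed_ground = runif(n, 30, 140),
  height       = runif(n, 6, 60)
)

# Response generated from the reported equation plus arbitrary noise.
sim$distance <- reported_distance(sim$aircraft, sim$speed_ground, sim$height) +
  rnorm(n, sd = 100)

# Fit without and with the squared speed_ground term.
linear_fit  <- lm(distance ~ aircraft + speed_ground + height, data = sim)
revised_fit <- lm(distance ~ aircraft + speed_ground + I(speed_ground^2) + height,
                  data = sim)

# R-squared of each fit.
r2_linear  <- summary(linear_fit)$r.squared
r2_revised <- summary(revised_fit)$r.squared

# Numeric stand-in for a residual plot: residuals of the linear fit
# should still track the centred squared speed if curvature was missed.
centred_sq <- (sim$speed_ground - mean(sim$speed_ground))^2
curvature_left <- cor(resid(linear_fit), centred_sq)

c(r2_linear = r2_linear, r2_revised = r2_revised, curvature_left = curvature_left)
```

In practice the same check is usually made visually, for example by plotting `resid(linear_fit)` against `sim$speed_ground` and looking for a curved pattern that disappears in the revised fit.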

Steps:

1. Import data from the Landing.csv file into R and analyse summary statistics

R Code:

> # Set working directory to the location containing CSV file