Data Analysis report

2/19/2015
Data Analysis Methods
Mid-Term Project
Govind Ramchander
Vinay Gupta
Utkarsh Srivastava
MS-IS
Report:
Goal: The goal of this report is to study which factors affect the landing distance of a
commercial flight, and how, so that the risk of landing overrun can be reduced.
Approach: We have landing data for 800 commercial flights, which we analyse to model
landing distance as a function of the other parameters in the supplied data.
We follow the steps below to arrive at the final model.
1. Import data from the CSV file.
2. Clean the data based on the requirements below:
a. Duration should always be greater than 40 minutes.
b. Ground speed should be between 30 mph and 140 mph.
c. Air speed should be between 30 mph and 140 mph.
d. Height should be at least 6 m.
e. Distance should be less than 6000 feet.
3. Examine correlations between the different variables in the data set.
4. Fit a multiple linear regression model.
5. Re-explore and re-model data to find the most important parameters that impact
the landing distance.
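The cleaning rules in step 2 can be sketched as a single logical filter in R. This is only an illustration under stated assumptions: the rows below are made-up stand-ins for Landing.csv, the helper in_range is hypothetical, and we assume that a missing speed_air value should not cause a row to be dropped.

```r
# Hypothetical toy rows standing in for Landing.csv (values invented for illustration)
landing <- data.frame(
  duration     = c(120, 35, 150, 95, 110),
  speed_ground = c(95, 100, 150, 80, 70),
  speed_air    = c(NA, 101, 152, NA, NA),
  height       = c(30, 25, 40, 4, 20),
  distance     = c(2500, 2600, 6500, 1500, 1200)
)

# Hypothetical helper: TRUE when x lies in [lo, hi].
# Assumption: a missing value (NA) is allowed to pass the check.
in_range <- function(x, lo, hi) is.na(x) | (x >= lo & x <= hi)

# One condition per cleaning rule (a to e)
keep <- landing$duration > 40 &
  in_range(landing$speed_ground, 30, 140) &
  in_range(landing$speed_air, 30, 140) &
  landing$height >= 6 &
  landing$distance < 6000

cleaned <- landing[keep, ]
nrow(cleaned)
```

With these toy rows, only the first and last observations satisfy every rule; the others fail on duration, ground speed, and height respectively.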
Result: We found that speed_ground and speed_air are strongly correlated. Hence we
chose to retain only speed_ground in our model, since it was complete (i.e. no missing
values) and dropping speed_air avoids multicollinearity. We then fitted a multiple linear
regression model, assuming that distance is affected by all other variables of the dataset.
From our first iteration, firstmodel, we identified three factors (duration, no_pasg, and
pitch) that did not significantly affect the response. In the next model we left these
variables out and performed residual analysis on it to check its correctness. We
found that the residuals followed a trend with respect to the speed_ground variable, and
hence we revised our model to include the squared value of speed_ground. Our final model,
revisedmodel, showed randomly scattered residuals and also had an R-squared
value of nearly 98%, so we retained it. Our final model is of the form
distance = 2177.22 - 402.75 (aircraft) - 68.82 (speed_ground) + 0.69
(speed_ground ^ 2) + 13.71 (height)
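As a quick check on how the fitted equation is used, it can be written as a small R function. This is a sketch only: the function name predict_distance is hypothetical, and we assume aircraft enters as a 0/1 indicator variable (which make corresponds to 1 is not stated here) and that the inputs use the same units as the data.

```r
# Hypothetical helper that evaluates the final model equation quoted above.
# Assumption: aircraft is a 0/1 indicator for the aircraft make.
predict_distance <- function(aircraft, speed_ground, height) {
  2177.22 - 402.75 * aircraft - 68.82 * speed_ground +
    0.69 * speed_ground^2 + 13.71 * height
}

# Example inputs chosen for illustration, not taken from the data set
predict_distance(aircraft = 0, speed_ground = 100, height = 30)
```

For these example inputs the equation gives 2177.22 - 6882 + 6900 + 411.3 = 2606.52; setting aircraft to 1 lowers the prediction by 402.75.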
Steps:
1. Import data from the Landing.csv file into R and analyse summary statistics
R Code:
> # Set working directory to the location containing CSV file