Introduction
In this tutorial, we are going to implement Flight Fare Prediction. The model predicts the price of a flight from parameters such as total stops, journey day, journey month, airline (Air India, Indigo, etc.), source and destination. I trained the model using a random forest regressor and then fine-tuned it, which is also known as hyperparameter tuning. Finally, I saved the model and deployed it as a Flask application on localhost.
About the dataset:
We have a dataset of flight fares in two Excel files: a training file and a testing file. The training dataset contains 10683 rows and 11 columns. The test dataset contains 2671 rows and 10 columns.
Let’s start:
First, import the basic required libraries such as seaborn, matplotlib and pandas. Then load the training dataset, which is stored as an Excel file, and display the top 5 rows. You can change the location of the Excel file to suit your setup.
Then check the shape of the dataset. All the code and output are given in the screenshot below.
Now create 4 new features from the existing ones. Create Journey Day and Journey Month from the Date of Journey feature, then create Departure Hour and Departure Minute from the Departure Time feature. These numeric features help us build a better model with better accuracy.
Similarly, create two more features, Arrival Hour and Arrival Minute, from the Arrival Time feature.
Then drop the 3 original features that are no longer needed: Arrival Time, Departure Time and Date of Journey.
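The feature creation described above could look roughly like the sketch below. The column names (Date_of_Journey, Dep_Time, Arrival_Time), the date format and the two sample rows are assumptions made for illustration, so adjust them to match your Excel file.

```python
import pandas as pd

# Two made-up rows; column names and string formats are assumptions.
df = pd.DataFrame({
    "Date_of_Journey": ["24/03/2019", "01/05/2019"],
    "Dep_Time": ["22:20", "05:50"],
    "Arrival_Time": ["01:10 22 Mar", "13:15"],
})

# Journey Day and Journey Month from the date string.
journey = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
df["Journey_Day"] = journey.dt.day
df["Journey_Month"] = journey.dt.month

# Hour and minute from the leading HH:MM part of each time string.
for col, prefix in [("Dep_Time", "Dep"), ("Arrival_Time", "Arrival")]:
    parts = df[col].str.extract(r"^(\d{1,2}):(\d{2})").astype(int)
    df[f"{prefix}_Hour"] = parts[0]
    df[f"{prefix}_Min"] = parts[1]

# Drop the original columns once the new features exist.
df = df.drop(columns=["Date_of_Journey", "Dep_Time", "Arrival_Time"])
print(df)
```

The regular expression only reads the leading HH:MM part, so an arrival value that also carries a date after the time is still handled.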
Now write a script to preprocess the Duration feature, because its values look like “4h 45m”, “19h” or “45m”. The script extracts the hour and the minute from each value into two separate lists, and these lists are then stored as two new features, Duration Hour and Duration Minute. After that, drop the Duration feature, which is no longer needed.
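One possible way to write such a script is sketched below. The helper name parse_duration is hypothetical, and treating a missing hour or minute part as 0 is my assumption about how values like “19h” and “45m” should be handled.

```python
import re

def parse_duration(text):
    """Return (hours, minutes) for strings like '4h 45m', '19h' or '45m'."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    # A missing part is assumed to mean 0.
    return (int(hours.group(1)) if hours else 0,
            int(minutes.group(1)) if minutes else 0)

durations = ["4h 45m", "19h", "45m"]
duration_hours = [parse_duration(d)[0] for d in durations]
duration_mins = [parse_duration(d)[1] for d in durations]

print(duration_hours)  # [4, 19, 0]
print(duration_mins)   # [45, 0, 45]
```

The two lists can then be assigned to the new Duration Hour and Duration Minute columns of the DataFrame.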
Now we have the Airline feature, which contains categorical data. First convert it into numeric data using the pandas function get_dummies(), which turns each category into its own column (one-hot encoding). Airline has a total of 12 categories, but because we use drop_first, only 11 columns are created.
The same applies to the Source feature, which also contains categorical data. Convert it with get_dummies() in the same way. Source has a total of 5 categories, so with drop_first 4 columns are created.
Do the same for the Destination feature. It has a total of 6 categories, so with drop_first 5 columns are created.
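Here is a small sketch of one-hot encoding with drop_first, using a made-up Source column with only 3 categories. The city values are illustrative; the same call applies to Airline and Destination.

```python
import pandas as pd

# Illustrative data: 3 categories, so drop_first leaves 2 columns.
df = pd.DataFrame({"Source": ["Delhi", "Kolkata", "Mumbai", "Delhi"]})

source = pd.get_dummies(df[["Source"]], drop_first=True)
print(list(source.columns))  # ['Source_Kolkata', 'Source_Mumbai']

# Replace the original column with its encoded version.
df = pd.concat([df.drop(columns=["Source"]), source], axis=1)
```

The dropped first category (Delhi here) is represented by a row where all the remaining columns are 0, which is why no information is lost.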
Now drop the Route and Additional Info features.
Now apply ordinal encoding to the Total Stops feature, which contains categorical data. We assign the categories in order of level or priority, as you can see in the image below.
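A simple way to do ordinal encoding is a dictionary that maps each category to its rank. The category labels and the column name Total_Stops below are assumptions, so check the actual values in your dataset first.

```python
import pandas as pd

# Assumed category labels, ordered from fewest to most stops.
stops_order = {"non-stop": 0, "1 stop": 1, "2 stops": 2, "3 stops": 3, "4 stops": 4}

df = pd.DataFrame({"Total_Stops": ["non-stop", "2 stops", "1 stop"]})
df["Total_Stops"] = df["Total_Stops"].map(stops_order)

print(df["Total_Stops"].tolist())  # [0, 2, 1]
```

Unlike one-hot encoding, this keeps the natural order of the categories in a single column.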
Then perform the same EDA and data preprocessing on the test dataset that we applied to the training dataset: check the shape of the dataset, create new features from the existing ones, apply one-hot encoding to the categorical features, drop the features that are not useful and keep the useful ones, etc.
These are our final training and testing datasets, as you can see in the screenshot below.
Now split the training dataset into independent and dependent features. X contains the independent features and Y contains the dependent feature, the fare.
Split the dataset into training and testing sets using the train_test_split method of scikit-learn, which returns four datasets, as you can see in the image below.
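As a rough sketch, the two splits could look like this. The tiny DataFrame, the target column name Price, the 80/20 ratio and the random_state are all assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up preprocessed data; "Price" is the assumed target column.
df = pd.DataFrame({
    "Total_Stops": [0, 1, 2, 1, 0, 2, 1, 0, 1, 2],
    "Journey_Day": [1, 3, 6, 9, 12, 15, 18, 21, 24, 27],
    "Price": [3900, 7600, 13800, 6200, 4100, 12500, 8000, 3600, 7000, 14100],
})

X = df.drop(columns=["Price"])  # independent features
Y = df["Price"]                 # dependent feature

# Returns four datasets: X_train, X_test, Y_train, Y_test.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```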
Then define the Random Forest Regressor model and train it with the X_train and Y_train datasets. Then test the model with the X_test and Y_test datasets.
Then evaluate the model. Our training accuracy (for a regressor, this is the R² score) is 95.35% and our testing accuracy is 82.69%.
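The training and evaluation step can be sketched as follows. It uses synthetic data from make_regression instead of the flight dataset, and n_estimators and random_state are arbitrary choices, so the scores will not match the numbers above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed flight data.
X, Y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, Y_train)

# For regressors, score() returns the R² value.
train_r2 = model.score(X_train, Y_train)
test_r2 = model.score(X_test, Y_test)
print(f"train R2: {train_r2:.4f}, test R2: {test_r2:.4f}")
```

A training score that is much higher than the testing score, as in our case, is a sign of overfitting and a good reason to tune the model.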
Then fine-tune the Random Forest Regressor (RFR) model using hyperparameter tuning. I chose some RFR parameters such as n_estimators, max_depth, min_samples_split, min_samples_leaf and max_features, each with several candidate values, and created a dictionary of those parameters.
After fine-tuning the RFR model, check the best parameters, that is, the ones that gave the best accuracy. Then test the model using the X_test dataset, which contains all the independent features.
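A minimal sketch of such a search is shown below. It assumes scikit-learn's RandomizedSearchCV as the search method, and the candidate values, n_iter, cv and scoring settings are illustrative rather than the ones used for the flight fare model. The data is again synthetic.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, Y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)

# Dictionary of parameters with candidate values (illustrative only).
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [5, 10, None],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", None],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_grid,
    n_iter=4,                          # number of random combinations to try
    cv=3,                              # 3-fold cross-validation
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X_train, Y_train)

print(search.best_params_)             # best combination found
predictions = search.predict(X_test)   # uses the refitted best model
```

A randomized search only tries some of the combinations, which keeps the run time short. GridSearchCV would try all of them.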
After testing the RFR model on the test dataset, evaluate it again using Mean Absolute Error, Mean Squared Error, Root Mean Squared Error and r2_score, as you can see in the output below. We now get an r2_score of 85.73%, which is better than the previous 82.69%.
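To show how these four metrics are computed, here is a small example with made-up true and predicted fares, not values from the flight dataset.

```python
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values for illustration.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 11.0]

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
rmse = math.sqrt(mse)                      # square root of MSE
r2 = r2_score(y_true, y_pred)              # 1 - SS_res / SS_tot

print(mae, mse, round(rmse, 4), round(r2, 4))  # 1.0 1.5 1.2247 0.7
```

MAE, MSE and RMSE are errors, so lower is better, while an r2_score closer to 1 is better.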
Now save the Flight Fare Prediction model, then load it again and test the loaded model with the X_test dataset, as you can see in the image below.
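Saving and loading can be done, for example, with Python's pickle module. The use of pickle and the file name flight_fare_model.pkl are assumptions here, and the model is a small one trained on synthetic data.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, Y = make_regression(n_samples=100, n_features=4, random_state=0)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, Y)

# Hypothetical file name, written to a temporary folder for this demo.
model_path = os.path.join(tempfile.mkdtemp(), "flight_fare_model.pkl")

with open(model_path, "wb") as f:
    pickle.dump(model, f)           # save the trained model

with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)   # load it back

# The loaded model should give the same predictions as the original.
same_predictions = bool((loaded_model.predict(X) == model.predict(X)).all())
print(same_predictions)
```

Only load pickle files that you created yourself or that come from a source you trust.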
I have deployed the flight fare prediction model on localhost using Flask, a Python web framework that provides libraries for building lightweight web applications. The application takes some information about the flight, such as departure date, arrival date, source station, destination station, total stops and airline. Our model then predicts the fare of the flight from this information. The image below shows the deployed model, and you can also watch the video tutorial for a better understanding.
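To give an idea of how a Flask app can serve predictions, here is a very small sketch. It is not the code from my repository: the /predict route, the JSON field names and the StubModel class are all hypothetical, and the stub stands in for the trained model you would normally load from disk.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

class StubModel:
    """Hypothetical stand-in for the trained regressor loaded from disk."""
    def predict(self, rows):
        # Fake rule: base fare plus a fixed amount per stop.
        return [1000.0 + 500.0 * row[0] for row in rows]

model = StubModel()

@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # Build one feature row in the order the model expects (assumed order).
    features = [[data["total_stops"], data["journey_day"], data["journey_month"]]]
    fare = float(model.predict(features)[0])
    return jsonify({"predicted_fare": round(fare, 2)})

# To serve locally, call: app.run(host="127.0.0.1", port=5000)
```

In the real application the inputs come from an HTML form instead of JSON, but the flow is the same: read the inputs, build the feature row, call predict and return the fare.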
Source Code :-
- Go to my GitHub and fork or download the source code: Flight Fare Prediction
- If you downloaded the code as a zip file, extract it.
- Go into the extracted folder.
- Download the trained model and keep it in the project folder: Flight Fare Prediction Model
- Open the command prompt and go to the flight fare prediction project folder with the cd command.
- Run python prediction_marks.py in the command prompt.
- You'll get a link like http://127.0.0.1:5000/.
- Copy this link and paste it into Chrome or any other browser.
- Now you can use the model.