Introduction
In this tutorial, we will implement a Bangalore house price prediction model using a machine learning algorithm. The model predicts the price of a house in Bangalore from a few parameters such as availability, size, total square feet, number of baths, location, etc.
In this Bengaluru house price prediction tutorial you will learn several things:
- Exploratory data analysis
- Dealing with missing values or noisy data
- Data preprocessing
- Creating new features from existing features
- Removing outliers
- Data visualisation
- Splitting data into training and testing sets
- Training and testing a linear regression model
I trained a Bengaluru house price prediction model using the linear regression algorithm and got a score of about 86% on the testing data (for a regression model this score is the R² value, not a classification accuracy).
About the dataset
What are the things that a potential home buyer considers before purchasing a house? The location, the size of the property, vicinity to offices, schools, parks, restaurants, hospitals or the stereotypical white picket fence? What about the most important factor — the price?
For example, for a potential homeowner, over 9,000 apartment projects and flats for sale are available in the range of ₹42-52 lakh, followed by over 7,100 apartments that are in the ₹52-62 lakh budget segment, says a report by property website Makaan. According to the study, there are over 5,000 projects in the ₹15-25 lakh budget segment followed by those in the ₹34-43 lakh budget category.
Buying a home, especially in a city like Bengaluru, is a tricky choice. While the major factors are usually the same for all metros, there are others to be considered for the Silicon Valley of India. With its millennial crowd, vibrant culture, great climate and a slew of job opportunities, it is difficult to ascertain the price of a house in Bengaluru.
Let’s start
The first step is to load the required libraries, read the Bengaluru house data set with the Pandas function read_csv(), and display its top five rows with the head() method.
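The loading step can be sketched as follows. This is a minimal example, not the notebook's actual code: the rows are made up and read from an in-memory string so the snippet runs on its own, and the file name in the comment is an assumption.

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV so the sketch is self-contained.
# With the real file you would call something like
# pd.read_csv("Bengaluru_House_Data.csv")  (file name is an assumption).
sample_csv = io.StringIO(
    "area_type,availability,location,size,society,total_sqft,bath,balcony,price\n"
    "Super built-up Area,19-Dec,Location A,2 BHK,Soc1,1056,2,1,39.07\n"
    "Plot Area,Ready To Move,Location B,4 Bedroom,,2600,5,3,120.0\n"
    "Built-up Area,Ready To Move,Location A,3 BHK,,1440,2,3,62.0\n"
    "Super built-up Area,Ready To Move,Location C,3 BHK,Soc2,1521,3,1,95.0\n"
    "Super built-up Area,Ready To Move,Location B,2 BHK,,1200,2,1,51.0\n"
    "Carpet Area,18-May,Location C,2 BHK,Soc3,1170,2,1,38.0\n"
)

df = pd.read_csv(sample_csv)

# head() returns the first five rows by default.
print(df.head())
```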
Now perform exploratory data analysis (EDA). Check the shape of the data set using the shape attribute, which gives the number of rows and columns. Then display the percentage of null values in each column and check the value counts of the area_type column. Then drop the features (columns) that are of no use for training our model: availability, area_type, society and balcony. Finally, display the data set again.
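A rough sketch of these EDA steps, using a tiny made-up DataFrame in place of the real data (the values are illustrative only; the column names are the ones mentioned above, plus an assumed price column):

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with the same columns as the data set described above.
df = pd.DataFrame({
    "area_type": ["Plot Area", "Built-up Area", "Plot Area", "Carpet Area"],
    "availability": ["Ready To Move", "19-Dec", "Ready To Move", "Ready To Move"],
    "location": ["Location A", "Location B", "Location A", None],
    "size": ["2 BHK", "4 Bedroom", "3 BHK", "2 BHK"],
    "society": [None, "Soc1", None, None],
    "total_sqft": ["1056", "2600", "1440", "1200"],
    "bath": [2.0, 5.0, 2.0, np.nan],
    "balcony": [1.0, 3.0, np.nan, 1.0],
    "price": [39.07, 120.0, 62.0, 51.0],
})

# Number of rows and columns.
print(df.shape)

# Percentage of missing values per column.
null_percent = df.isnull().mean() * 100
print(null_percent)

# How often each area type occurs.
print(df["area_type"].value_counts())

# Drop the columns that will not be used for training.
df = df.drop(columns=["availability", "area_type", "society", "balcony"])
print(df.head())
```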
Now check the unique values of the size feature. You will see values in different forms, such as BHK, Bedroom, etc. So we write a function that extracts only the leading integer from the size feature and stores it in a new bhk feature. Once the bhk feature is in place, drop the size feature, which is no longer needed.
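One possible way to write that extraction is shown below. The helper name extract_bhk is hypothetical, and the sketch assumes rows with a missing size have already been dropped or filled.

```python
import pandas as pd


def extract_bhk(size):
    """Return the leading integer of a size string such as '2 BHK'."""
    return int(size.split(" ")[0])


# Made-up values in the formats described above.
df = pd.DataFrame({"size": ["2 BHK", "4 Bedroom", "1 RK"]})

# Store the extracted number in a new bhk column, then drop size.
df["bhk"] = df["size"].apply(extract_bhk)
df = df.drop(columns=["size"])
print(df)
```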
Now it's time to remove the outliers from bhk. First, check for rows with bhk greater than 22; any such row is an outlier. Next, check the unique values of total_sqft, which contains plain numbers (like 2000), ranges (2000-3000) and values with units mixed in (2000Sq Meter).
Now create a user-defined function is_float() that takes a total_sqft value as its argument and reports whether it can be converted to a float (integer values convert too). Then apply this function to the total_sqft feature, negating the result with the tilde (~) operator so that only the values that are not floats are returned, that is, the range and mixed-type values.
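A minimal sketch of is_float() and the tilde filter, assuming the function returns True or False and total_sqft is stored as strings (the sample values are made up):

```python
import pandas as pd


def is_float(value):
    """Return True if value can be converted to a float, else False."""
    try:
        float(value)
    except ValueError:
        return False
    return True


df = pd.DataFrame(
    {"total_sqft": ["2000", "2000 - 3000", "2000Sq. Meter", "1350.5"]}
)

# apply() builds a boolean mask; ~ inverts it, so only the rows whose
# total_sqft is NOT a plain number are kept.
not_numeric = df[~df["total_sqft"].apply(is_float)]
print(not_numeric)
```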
Now describe the price_per_sqft feature, and you can see the outlier: a house price of 176470 lakh, which is not possible given the location and total square feet. So create a function remove_outlier_from_price_per_sqft(). It takes a data set and uses a standard deviation technique to remove outliers. After applying this function, describe the feature again to check the result.
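The text does not spell out how the function is written, so the following is only one plausible sketch. It assumes the filter is applied per location and keeps rows within one standard deviation of that location's mean; both choices are assumptions, and the data is made up.

```python
import pandas as pd


def remove_outlier_from_price_per_sqft(df):
    """Keep rows whose price_per_sqft lies within one standard deviation
    of the mean for their location (grouping and threshold are assumptions)."""
    kept = []
    for _, group in df.groupby("location"):
        mean = group["price_per_sqft"].mean()
        std = group["price_per_sqft"].std()
        # A location with a single row has an undefined std (NaN),
        # so that row is dropped by this mask.
        mask = group["price_per_sqft"].between(mean - std, mean + std)
        kept.append(group[mask])
    return pd.concat(kept, ignore_index=True)


df = pd.DataFrame({
    "location": ["Location A"] * 5,
    "price_per_sqft": [5000, 5200, 5100, 4900, 176470],
})

cleaned = remove_outlier_from_price_per_sqft(df)
print(cleaned["price_per_sqft"].describe())
```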
Now again use the standard deviation technique to remove the outliers from price_per_sqft.
Now visualize the number of baths using a histogram.
Keep only those houses that have no more baths than bhk-1. For example, a 4 bhk house would have at most 3 baths (bhk-1). Now check the shape of the data; the data set now contains 7325 rows and 6 columns.
Now drop price_per_sqft, which is no longer of use, and display the final data. It still contains one categorical feature (location).
Now concatenate the dummies data set (the location feature encoded as 0/1 columns) with our final data set and remove the “other” column from the dummies data set. The “other” location can still be identified: if all location columns are “0”, then “other” is automatically “1”.
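A small sketch of the encoding step, assuming the dummies were built with pd.get_dummies() and that the original location column is dropped after concatenation (both are assumptions; the rows are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["Location A", "other", "Location B", "Location A"],
    "total_sqft": [1056.0, 1200.0, 1440.0, 1521.0],
    "bath": [2, 2, 2, 3],
    "price": [39.07, 51.0, 62.0, 95.0],
    "bhk": [2, 2, 3, 3],
})

# One 0/1 column per location.
dummies = pd.get_dummies(df["location"], dtype=int)

# Drop "other": a row with 0 in every remaining location column is "other".
dummies = dummies.drop(columns=["other"])

# Join the encoded columns and remove the original text column.
final_df = pd.concat([df.drop(columns=["location"]), dummies], axis=1)
print(final_df)
```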
Now it's time to prepare the data set. Split it into independent and dependent features, stored in “x” and “y” respectively, and check the shape of both.
Then split the data into training and testing sets using the train_test_split() function, which returns four data sets. Then check the shape of all four.
Now define our linear regression model, train it on the training data set and check its score on the testing data set.
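The preparation, split and training steps can be sketched like this. The data is synthetic (generated with a fixed seed), and the test_size and random_state values are assumptions rather than the notebook's settings, so the score printed here says nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the final data set.
rng = np.random.default_rng(0)
n = 50
final_df = pd.DataFrame({
    "total_sqft": rng.uniform(500, 3000, n),
    "bath": rng.integers(1, 5, n),
    "bhk": rng.integers(1, 5, n),
})
final_df["price"] = (
    0.05 * final_df["total_sqft"] + 5 * final_df["bhk"] + rng.normal(0, 1, n)
)

# Independent features (x) and dependent feature (y).
x = final_df.drop(columns=["price"])
y = final_df["price"]
print(x.shape, y.shape)

# train_test_split returns four data sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=51
)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)

# Train the model and check its score (R²) on the testing data.
model = LinearRegression()
model.fit(x_train, y_train)
score = model.score(x_test, y_test)
print(score)
```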
Create a function to test the model on custom input; it takes the location, sqft, bath, bhk, etc. I tested the model on 3 custom inputs.
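Such a function might look like the sketch below. The name predict_price, the toy training data and the way an unknown location is handled are all assumptions made for illustration; the idea is to build one input row, set the numeric features, and switch on the matching location column if it exists.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy training data with two encoded location columns (made up).
x = pd.DataFrame({
    "total_sqft": [1000.0, 1500.0, 2000.0, 1200.0, 1800.0, 900.0, 1600.0, 1100.0],
    "bath": [2, 3, 3, 2, 3, 1, 2, 2],
    "bhk": [2, 3, 4, 2, 3, 1, 3, 2],
    "Location A": [1, 0, 0, 1, 0, 0, 1, 0],
    "Location B": [0, 1, 0, 0, 1, 0, 0, 1],
})
y = 0.05 * x["total_sqft"] + 10 * x["Location A"] + 20 * x["Location B"]
model = LinearRegression().fit(x, y)


def predict_price(location, sqft, bath, bhk):
    """Predict a price for one house (hypothetical helper)."""
    # Start from a row of zeros with the same columns as the training data.
    row = pd.DataFrame(np.zeros((1, len(x.columns))), columns=x.columns)
    row.loc[0, ["total_sqft", "bath", "bhk"]] = [sqft, bath, bhk]
    # A location without its own column is treated as "other" (all zeros).
    if location in x.columns:
        row.loc[0, location] = 1
    return float(model.predict(row)[0])


print(predict_price("Location A", 1000, 2, 2))
print(predict_price("Some Unknown Place", 1000, 2, 2))
```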
Now save the model using the joblib library with the name “banglore house price prediction model.pkl”.
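Saving and reloading with joblib can be sketched as follows. The tiny model and the temporary directory are only there to make the snippet self-contained; in the notebook you would pass the trained model and whatever path you prefer.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# A tiny model standing in for the trained house price model.
model = LinearRegression().fit(
    np.array([[1.0], [2.0], [3.0]]), np.array([2.0, 4.0, 6.0])
)

# Save into a temporary directory (the directory is an assumption;
# the file name is the one used in this tutorial).
path = os.path.join(
    tempfile.mkdtemp(), "banglore house price prediction model.pkl"
)
joblib.dump(model, path)

# Load it back and confirm it predicts the same values.
loaded = joblib.load(path)
print(loaded.predict([[4.0]]))
```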
Source Code
- Go to my GitHub and fork or download the repo: Bangaluru House Price Prediction
- Open the .ipynb file in Jupyter Notebook.
- Now you can use it.
Thank you!