To view the code, see the project's GitHub repository.
PROBLEM STATEMENT:
In a competitive job market, candidates hold high expectations for their dream jobs. For an employer, determining a fair salary for each employee is a challenging task.
Employers cannot simply offer every candidate the salary that candidate expects.
Now, the question arises: How can the right salary be determined for an employee?
SOLUTION:
Design a system/website that takes inputs such as employee details (experience, qualifications, skills, test score, interview score, etc.) and automatically determines/predicts the employee's salary.
OBJECTIVE:
To build a machine learning model that predicts the salary of an employee based on interview score, test score, and experience.
TASKS PERFORMED:
DATA UNDERSTANDING AND CLEANING:
Understanding the data is the first step in any data science project. Once the data is properly understood, the next step is data cleaning.
Proper cleaning is necessary to prepare the data for modeling. In our case, one column contains some missing values.
There are two ways to handle these missing values: deletion and imputation. We opt for imputation because our dataset is small, and deleting rows would discard data we cannot afford to lose.
Each missing value is replaced with the mean of the column's observed values.
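As a minimal sketch of mean imputation with pandas: the column name `test_score` and the values below are assumptions made for illustration, not taken from the project's dataset.

```python
import pandas as pd

# Illustrative data only; "test_score" is a hypothetical column name.
df = pd.DataFrame({"test_score": [8.0, None, 6.0, 10.0]})

# Mean of the observed (non-missing) values; pandas skips NaN by default.
mean_score = df["test_score"].mean()

# Replace every missing entry with that mean.
df["test_score"] = df["test_score"].fillna(mean_score)
```

Mean imputation keeps every row, which matters for a small dataset, at the cost of slightly shrinking the column's variance.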
FEATURE TRANSFORMATION:
Feature transformation is a critical step in machine learning since, ultimately, building a regression model requires numeric data.
In our dataset, the 'experience' column is a categorical variable, and we will transform it into a numerical format.
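One possible way to do this is sketched below. It assumes the 'experience' values are stored as spelled-out number words (for example "five"); the source does not state the actual encoding, so the mapping `WORD_TO_YEARS` and the sample values are hypothetical.

```python
import pandas as pd

# Illustrative data only; the spelled-out encoding is an assumption.
df = pd.DataFrame({"experience": ["two", "five", "seven", "ten"]})

# Hypothetical lookup table from number words to years of experience.
WORD_TO_YEARS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Series.map replaces each word with its numeric value.
# Any word missing from the table would become NaN and need handling.
df["experience"] = df["experience"].map(WORD_TO_YEARS)
```

An explicit mapping preserves the natural ordering of the values, which a regression model can then use directly.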
DATA SPLITTING:
Now that we have cleaned the data, it's time to split it into training and testing datasets.
When splitting the dataset, we use a training ratio of 0.8: 80% of the rows are used to train the model and the remaining 20% are held out for testing, so that the model is trained on a substantial portion of the data.
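A minimal sketch of such a split using scikit-learn's `train_test_split` is shown here. The feature names, the sample values, and the `random_state` seed are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative data only; column names and values are hypothetical.
df = pd.DataFrame({
    "experience": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "test_score": [8, 7, 6, 9, 7, 8, 10, 6, 9, 7],
    "interview_score": [9, 6, 7, 8, 10, 7, 6, 9, 8, 7],
    "salary": [50, 52, 55, 60, 62, 66, 70, 72, 78, 80],
})

X = df[["experience", "test_score", "interview_score"]]  # features
y = df["salary"]                                         # target

# test_size=0.2 holds out 20% of the rows, leaving 80% for training.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```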
MODEL BUILDING:
I used the linear regression algorithm to build the model because the target variable 'salary' is a continuous numerical value, which makes this a regression problem.
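The following sketch shows how a linear regression model could be fitted with scikit-learn. The data is synthetic, generated from a made-up linear formula so the fit is easy to check; it is not the project's dataset, and the feature order (experience, test score, interview score) is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic features: [experience, test_score, interview_score].
X = np.array([
    [0, 8, 9],
    [2, 6, 7],
    [5, 9, 6],
    [7, 7, 10],
    [10, 8, 8],
    [3, 10, 7],
], dtype=float)

# Made-up target: salary = 30000 + 2000*exp + 1000*test + 500*interview.
y = 30000 + 2000 * X[:, 0] + 1000 * X[:, 1] + 500 * X[:, 2]

# Fit ordinary least squares; the model learns one weight per feature
# plus an intercept.
model = LinearRegression()
model.fit(X, y)

# Predict the salary for a new, hypothetical candidate.
new_candidate = np.array([[4.0, 9.0, 8.0]])
predicted_salary = model.predict(new_candidate)[0]
```

Because the synthetic target is exactly linear, the fitted coefficients recover the formula; on real data the fit would only approximate the relationship.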
MODEL EVALUATION:
To evaluate the performance of the model, I used mean absolute error and mean squared error, which measure how far the predicted values deviate from the actual values.
Additionally, I used the r2_score metric, which measures the proportion of the variance in the target variable that is explained by the independent variables.
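These three metrics can be computed with scikit-learn as sketched below. The actual and predicted values are invented for illustration and are not results from this project.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative values only; not the project's actual predictions.
y_true = [50000, 60000, 70000, 80000]
y_pred = [52000, 58000, 71000, 79000]

# Average size of the errors, in the same units as salary.
mae = mean_absolute_error(y_true, y_pred)

# Average of the squared errors; penalizes large misses more heavily.
mse = mean_squared_error(y_true, y_pred)

# Share of the target's variance explained by the model (1.0 is perfect).
r2 = r2_score(y_true, y_pred)
```

For this toy data the errors are 2000, 2000, 1000 and 1000, giving an MAE of 1500, an MSE of 2,500,000 and an r2_score of 0.98.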
RESULTS:
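Successfully built a machine learning model that predicts employee salaries with a score of 83%. Since accuracy is a classification metric, this figure presumably refers to the r2_score of the regression model.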