Scaling: Apply feature scaling techniques such as StandardScaler or MinMaxScaler, especially for regression algorithms that are sensitive to feature scale, such as regularized linear or distance-based models (tree-based models such as Random Forest are largely unaffected).
6. Encoding Categorical Variables: Use label encoding or one-hot encoding for categorical features, if present.
7. Outlier Removal: Identify and remove abnormal values in fare amounts or distances, which may skew the model.
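
The scaling, encoding, and outlier-removal steps above can be sketched as follows. This is only a minimal illustration on a small made-up DataFrame: the column names (distance_km, passenger_count, payment_type), the new column names, and the outlier thresholds are assumptions chosen for the example, not values taken from the Uber dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical toy data; column names and values are illustrative only.
rides = pd.DataFrame({
    "distance_km": [1.2, 3.5, 8.0, 2.1, 500.0],   # last row: implausible distance
    "passenger_count": [1, 2, 1, 4, 1],
    "payment_type": ["card", "cash", "card", "card", "cash"],
    "fare_amount": [5.0, 9.5, 21.0, 7.0, -3.0],    # last row: negative fare
})

# Outlier removal: keep rows with a positive fare and a plausible distance.
# The bounds (fare > 0, distance within 0 to 100 km) are assumed thresholds.
mask = (rides["fare_amount"] > 0) & rides["distance_km"].between(0, 100)
clean = rides[mask].copy()

# Encoding: one-hot encode the categorical column.
clean = pd.get_dummies(clean, columns=["payment_type"])

# Scaling: standardize one feature (mean 0, unit variance) and
# min-max scale another (range 0 to 1), stored in new columns.
clean["distance_scaled"] = StandardScaler().fit_transform(clean[["distance_km"]]).ravel()
clean["passengers_scaled"] = MinMaxScaler().fit_transform(clean[["passenger_count"]]).ravel()

print(clean)
```

In a real pipeline the scalers would normally be fitted on the training split only and then applied to the test split, so that no information from the test data leaks into training.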

How do Linear Regression and Random Forest Regression models compare in predicting Uber ride prices, and how are they evaluated?
Linear Regression and Random Forest Regression are both supervised learning models but differ in complexity and performance:

Linear Regression:
Assumes a linear relationship between input features and target (ride price).
Simple to interpret and fast to train.
May underperform in the presence of non-linear relationships or outliers.

Random Forest Regression:
Ensemble learning method using multiple decision trees.
Handles non-linear relationships well and is robust to outliers.
Typically provides higher accuracy at the cost of interpretability.
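
The difference between the two models can be illustrated with a small experiment. The sketch below uses synthetic data (an assumption made purely for illustration, not the Uber dataset) in which the target follows a sine curve, so a straight line cannot capture the pattern while an ensemble of trees can.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, non-linear data: the target depends on the feature through a sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) * 5 + rng.normal(0, 0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both models on the same training split.
linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Compare how much of the test-set variance each model explains.
r2_linear = r2_score(y_test, linear.predict(X_test))
r2_forest = r2_score(y_test, forest.predict(X_test))

print(f"Linear Regression R2: {r2_linear:.3f}")
print(f"Random Forest R2:     {r2_forest:.3f}")
```

On data that really is linear the ranking can reverse, so the comparison should always be run on the dataset at hand.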

Evaluation Metrics:
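R² Score (Coefficient of Determination): The proportion of variance in the target variable that the model explains. A value of 1 is a perfect fit, 0 is no better than always predicting the mean, and negative values are worse than that.
RMSE (Root Mean Squared Error): The square root of the mean squared prediction error, expressed in the same units as the target (here, the fare amount). Larger errors are penalized more heavily than small ones.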
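
As a small worked example, both metrics can be computed by hand with NumPy. The fare values below are made up for illustration and do not come from the Uber dataset.

```python
import numpy as np

# Made-up actual and predicted fares (illustrative values only).
y_true = np.array([10.0, 12.0, 8.0, 15.0])
y_pred = np.array([11.0, 11.0, 9.0, 13.0])

# RMSE: square root of the mean squared error, in the same units as the target.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# R2: 1 minus the ratio of the residual sum of squares to the total sum of squares.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"RMSE: {rmse:.4f}")
print(f"R2:   {r2:.4f}")
```

Because the residual sum of squares is never negative, R² can never exceed 1 (100%), but it can drop below 0 when the model predicts worse than the mean of the target.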

Typical Result:
Random Forest generally achieves a higher R² and lower RMSE compared to Linear Regression, making it better suited for ride price prediction with complex patterns in the data.

Programming Code:

Write the following code in: ML_P01.py

# ML Project Program 01 

# import libraries
import numpy as np
import pandas as pd

# import dataset
data = pd.read_csv("uber_dataset/uber.csv")
# print the first few rows of the Uber dataset
print(data.head())
# print information of Uber dataset
data.info()
# dtypes is nothing but the data types
# object is string data
# converting object to date & time
data["pickup_datetime"] = pd.to_datetime(data["pickup_datetime"])
data.info()
# successfully converted object to date & time by using to_datetime() method
# find missing values
data.isnull()
# find the total number of missing values in each column
print(data.isnull().sum())
# isnull() returns False for a present value and True for a missing one,
# so sum() counts the missing values per column
# drop the rows that have missing values

data.dropna(inplace = True)
# after dropping rows with missing values, confirm that none remain

print(data.isnull().sum())
# Now create a Machine Learning Model

# import lib

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# x is predictor variable
x = data.drop("fare_amount", axis = 1)

# y is target variable
y = data["fare_amount"]
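# prepare the features so the model can be applied

# convert the datetime column to a numeric value
x['pickup_datetime'] = pd.to_numeric(pd.to_datetime(x['pickup_datetime']))
# drop the leftover 'Unnamed' index column (the ~ negates the mask;
# without it, only the 'Unnamed' column would be kept)
x = x.loc[:, ~x.columns.str.contains('^Unnamed')]
# keep only numeric columns, because the models cannot be fitted on string columns
x = x.select_dtypes(include="number")
# random_state is fixed here so that the split is reproducible
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=101)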

# testing data is 20%
# training data is 80%, allocated to model
# creating Linear Regression model

lrmodel = LinearRegression()
lrmodel.fit(x_train, y_train)
# model is created
# prediction

pred = lrmodel.predict(x_test)
# Calculating RMSE (arguments in scikit-learn order: y_true, y_pred)
lrmodelrmse = np.sqrt(mean_squared_error(y_test, pred))
print("Linear Regression RMSE is: ", lrmodelrmse)
# Random Forest Regression

from sklearn.ensemble import RandomForestRegressor

# create RFR Model
rfrmodel = RandomForestRegressor(n_estimators = 100, random_state = 101)
# fit the forest

rfrmodel.fit(x_train, y_train)
rfrmodel_pred = rfrmodel.predict(x_test)
# Calculate RMSE for RFR (arguments in scikit-learn order: y_true, y_pred)

rfrmodel_rmse = np.sqrt(mean_squared_error(y_test, rfrmodel_pred))
print("RFR RMSE is: ", rfrmodel_rmse)

from sklearn import metrics

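# R2 score

# R2 score Linear Regression
print("Linear Regression R2:", metrics.r2_score(y_test, pred))
# R2 score RF Model
print("Random Forest R2:", metrics.r2_score(y_test, rfrmodel_pred))
# Reported result for Linear Regression: "894%". R2 cannot exceed 100%,
# so this figure is not a valid R2; it indicates that the model does not fit.
# Reported R2 for the RF Model: 52%

# Random Forest fits this dataset better than Linear Regression, although
# an R2 of 52% still leaves about half of the variance unexplained.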

# Thanks For Reading.

Output:

Machine Learning Program / Project - 01

Posted by go2collage
