Housing Price Prediction (Regression)

Goal

Given housing data, you will:

  • Explore price drivers
  • Prepare features (missing values, encoding)
  • Build a baseline regression pipeline

Step 1: Load

Load housing
import pandas as pd
 
df = pd.read_csv("data/housing.csv")
print(df.shape)
print(df.head())

Step 2: EDA (price distribution)

Price distribution
import seaborn as sns
import matplotlib.pyplot as plt
 
plt.figure(figsize=(7, 4))
sns.histplot(df["price"], bins=30, kde=True)
plt.title("House price distribution")
plt.tight_layout()
plt.show()

Step 3: Baseline preprocessing

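  • Identify the numeric and categorical columns
  • Impute missing values (e.g., median for numeric, most frequent value for categorical)
  • One-hot encode the categorical columns
  • Scale the numeric columns (optional)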

Use the Phase 4 pipeline approach.
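
The preprocessing steps listed above could be wired together roughly as follows. This is a minimal sketch, not the Phase 4 code itself: the small DataFrame stands in for data/housing.csv, and every column name except "price" (area, bedrooms, neighborhood) is a hypothetical placeholder. The imputation strategies are one reasonable default, not the only choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows standing in for data/housing.csv.
# Column names other than "price" are assumptions for illustration.
df = pd.DataFrame({
    "area": [1200.0, 1500.0, np.nan, 900.0, 2000.0, 1750.0],
    "bedrooms": [2, 3, 3, 1, 4, 3],
    "neighborhood": ["north", "south", np.nan, "north", "east", "south"],
    "price": [200_000, 260_000, 240_000, 150_000, 380_000, 310_000],
})

# Separate the features from the target.
X = df.drop(columns=["price"])

# Identify numeric and categorical columns by dtype.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

# Numeric columns: fill gaps with the median, then standardize (optional).
numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Route each group of columns to its own sub-pipeline.
preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

X_prepared = preprocessor.fit_transform(X)
print(numeric_cols, categorical_cols)
print(X_prepared.shape)
```

Setting handle_unknown="ignore" lets the encoder cope with categories that appear only in the test data, which matters once the preprocessor is fitted on a training split.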

Step 4: Train baseline model

Baseline regression (concept)
# Use scikit-learn Pipeline + ColumnTransformer
# Choose a baseline model like LinearRegression or RandomForestRegressor
# Evaluate using MAE/RMSE on a held-out test set
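
One way the concept above might look in code is sketched below. It is an illustration under stated assumptions: the data is synthetic (generated with NumPy so the example runs without data/housing.csv), the feature names area and neighborhood are hypothetical, and LinearRegression is just one of the two baseline options mentioned.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for data/housing.csv (feature names are assumptions).
rng = np.random.default_rng(0)
n = 200
area = rng.uniform(600, 2500, n)
neighborhood = rng.choice(["north", "south", "east"], n)
price = 150 * area + 20_000 * (neighborhood == "north") + rng.normal(0, 10_000, n)
df = pd.DataFrame({"area": area, "neighborhood": neighborhood, "price": price})

X = df.drop(columns=["price"])
y = df["price"]

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Preprocessing: impute + scale numeric columns, impute + one-hot encode categorical ones.
preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), ["area"]),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), ["neighborhood"]),
])

# Chain preprocessing and the model so both are fitted on training data only.
model = Pipeline([("prep", preprocessor), ("reg", LinearRegression())])
model.fit(X_train, y_train)

# Evaluate on the held-out test set.
pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))
print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
```

Swapping in RandomForestRegressor only requires replacing the "reg" step; the preprocessing and evaluation code stay the same.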

Deliverable

  • Key drivers (the features most correlated with price)
  • Data quality issues (e.g., missing values, duplicate rows)
  • Baseline model performance (MAE/RMSE on the held-out test set)
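
The first two deliverables could be produced with a few lines of pandas. The following is a rough sketch on made-up rows; the columns area, age, and neighborhood are hypothetical, and correlation only captures linear relationships between numeric features and price.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for data/housing.csv.
# Column names other than "price" are assumptions for illustration.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "age": [10, np.nan, 20, 15, 8],
    "neighborhood": ["north", "south", np.nan, "north", "east"],
    "price": [200_000, 300_000, 400_000, 500_000, 600_000],
})

# Key drivers: correlation of each numeric feature with price,
# ordered by absolute strength.
numeric = df.select_dtypes(include="number")
drivers = (
    numeric.corr()["price"]
    .drop("price")
    .sort_values(key=abs, ascending=False)
)
print(drivers)

# Data quality: missing values per column and fully duplicated rows.
missing = df.isna().sum()
n_duplicates = int(df.duplicated().sum())
print(missing[missing > 0])
print(f"Duplicate rows: {n_duplicates}")
```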
