Feature Engineering Basics
What is feature engineering?
Feature engineering is the process of converting raw data into useful inputs (features).
Examples:
- Date → day of week
- Amount → log(amount)
- Text → length, presence of keywords
- Customer transactions → total spend, avg order value
Example: derive features from datetime
Datetime features
import pandas as pd
df = pd.DataFrame({
"order_time": pd.to_datetime([
"2025-01-01 10:15:00",
"2025-01-02 18:30:00",
"2025-01-03 09:10:00",
]),
"amount": [250, 180, 90],
})
df["hour"] = df["order_time"].dt.hour
df["weekday"] = df["order_time"].dt.day_name()
df["is_weekend"] = df["order_time"].dt.weekday >= 5
print(df)Datetime features
import pandas as pd
df = pd.DataFrame({
"order_time": pd.to_datetime([
"2025-01-01 10:15:00",
"2025-01-02 18:30:00",
"2025-01-03 09:10:00",
]),
"amount": [250, 180, 90],
})
df["hour"] = df["order_time"].dt.hour
df["weekday"] = df["order_time"].dt.day_name()
df["is_weekend"] = df["order_time"].dt.weekday >= 5
print(df)Example: ratios and flags
Ratios
import pandas as pd
df = pd.DataFrame({"revenue": [1000, 500], "users": [100, 20]})
df["revenue_per_user"] = df["revenue"] / df["users"]
df["is_high_value"] = df["revenue_per_user"] > 20
print(df)Ratios
import pandas as pd
df = pd.DataFrame({"revenue": [1000, 500], "users": [100, 20]})
df["revenue_per_user"] = df["revenue"] / df["users"]
df["is_high_value"] = df["revenue_per_user"] > 20
print(df)Example: aggregated features (groupby)
Aggregations
import pandas as pd
orders = pd.DataFrame({
"customer_id": [1, 1, 2, 2, 2],
"amount": [100, 250, 90, 180, 300],
})
cust = orders.groupby("customer_id").agg(
total_spend=("amount", "sum"),
avg_order=("amount", "mean"),
orders=("amount", "count"),
).reset_index()
print(cust)Aggregations
import pandas as pd
orders = pd.DataFrame({
"customer_id": [1, 1, 2, 2, 2],
"amount": [100, 250, 90, 180, 300],
})
cust = orders.groupby("customer_id").agg(
total_spend=("amount", "sum"),
avg_order=("amount", "mean"),
orders=("amount", "count"),
).reset_index()
print(cust)Guiding principles
- Don’t leak target/future information.
- Prefer simple features first.
- Validate with charts and summary stats.
- Document each feature and its meaning.
If this helped you, consider buying me a coffee ☕
Buy me a coffeeWas this page helpful?
Let us know how we did
