Feature Scaling (MinMax vs Standard)
Why scale features?
Scaling is mainly important for ML algorithms that use distances or gradients:
- k-NN
- k-means
- SVM
- linear/logistic regression (often)
- neural networks
Tree-based models (like Random Forest) usually don't need scaling.
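A quick numeric illustration of why distance-based models care. With two features on very different scales, Euclidean distance is dominated by the larger-scale one (the feature names here are invented for the example):

```python
import numpy as np

# Two hypothetical samples with features on very different scales:
# height in metres, salary in euros.
a = np.array([1.70, 30_000.0])
b = np.array([1.90, 30_500.0])

# Unscaled: the salary gap (500) completely swamps the height gap (0.2),
# so k-NN or k-means would effectively ignore height.
dist = np.linalg.norm(a - b)
print(dist)  # ~500.0
```

After scaling both features to comparable ranges, both differences contribute to the distance.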
Two common scalers
Min-Max scaling
Maps values to a fixed range (usually 0 to 1):
- x_scaled = (x - min) / (max - min)
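The formula is easy to apply directly with NumPy; a minimal sketch on a toy feature:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 100.0])

# Min-max: (x - min) / (max - min); the minimum maps to 0, the maximum to 1
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # 0, 1/9, 2/9, 1
```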
Standardization (z-score scaling)
Centers and scales to mean=0 and std=1:
- x_scaled = (x - mean) / std
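The same idea by hand, as a sketch. Note that scikit-learn's StandardScaler uses the population standard deviation (ddof=0), which is also NumPy's default:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 100.0])

# z-score: (x - mean) / std
x_scaled = (x - x.mean()) / x.std()

# The result has mean 0 and standard deviation 1
print(x_scaled.mean())  # ~0.0
print(x_scaled.std())   # ~1.0
```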
Example (using scikit-learn)
MinMaxScaler vs StandardScaler

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# One feature with a mild outlier (100)
x = np.array([[10], [20], [30], [100]])

mm = MinMaxScaler()
ss = StandardScaler()

print("MinMax:")
print(mm.fit_transform(x))
print("Standard:")
print(ss.fit_transform(x))
```

Important rule: fit on train only
When doing ML:
- Fit scaler on training data
- Transform both train and test using that scaler
This avoids data leakage.
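A sketch of that pattern with StandardScaler (the synthetic X and the split parameters here are placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.arange(20, dtype=float).reshape(-1, 1)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

The test set is transformed with the mean and std learned from the training set; calling fit (or fit_transform) on the test set would leak information about its distribution into preprocessing.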
Practical guidance
- If the data has strong outliers: StandardScaler is affected, because the outlier drags the mean and inflates the std; consider RobustScaler, which uses the median and IQR instead.
- If you need bounded values (0–1): use MinMaxScaler.
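A small sketch of the outlier point, comparing the two scalers on a feature with one extreme value:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

x = np.array([[10.0], [20.0], [30.0], [1000.0]])  # 1000 is an outlier

# StandardScaler: the outlier drags the mean up, so the three
# typical values all end up pushed well below zero.
print(StandardScaler().fit_transform(x).ravel())

# RobustScaler: centres on the median and scales by the IQR,
# so the typical values stay near zero and only the outlier is extreme.
print(RobustScaler().fit_transform(x).ravel())
```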
