
Descriptive Statistics (mean, median, variance)

Central tendency

Mean

  • Sensitive to outliers: a single extreme value can pull it far from the typical value
  • Good for roughly symmetric distributions
Mean
import numpy as np
 
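x = np.array([10, 12, 12, 13, 12, 11, 100])  # six values near 12 plus one outlier
print(np.mean(x))  # ≈ 24.29 (170 / 7), pulled well above the typical value by 100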
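To put a number on that outlier sensitivity, here is a minimal sketch that computes the mean from its definition (sum divided by count) and then again with the outlier removed. Filtering with `x != 100` is only an illustration for this particular sample, not a general outlier rule.

```python
import numpy as np

# Same sample as above: six values near 12 plus one outlier (100)
x = np.array([10, 12, 12, 13, 12, 11, 100])

# The mean is just sum / count
manual_mean = x.sum() / len(x)  # 170 / 7
print("mean (manual):", manual_mean)
print("mean (NumPy): ", np.mean(x))

# Drop the single outlier to see how far it moved the mean
without_outlier = x[x != 100]
print("mean without 100:", np.mean(without_outlier))  # 70 / 6 ≈ 11.67
```

One value out of seven shifts the mean from about 11.67 to about 24.29.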

Median

  • Robust to outliers: it depends only on the middle of the sorted data, so extreme values barely move it
Median
import numpy as np
 
x = np.array([10, 12, 12, 13, 12, 11, 100])
print(np.median(x))  # 12.0: the middle of the sorted values, unaffected by 100
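The median's definition differs slightly between odd and even sample sizes. A minimal sketch (`median_manual` is a hypothetical helper written for illustration) that sorts the values and takes either the middle one or the average of the two middle ones, checked against `np.median`:

```python
import numpy as np

def median_manual(values):
    """Median from its definition: sort, then take the middle value
    (odd length) or the average of the two middle values (even length)."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("median of empty sequence")
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])
    return (s[mid - 1] + s[mid]) / 2

odd = [10, 12, 12, 13, 12, 11, 100]  # 7 values -> the 4th sorted value
even = [10, 12, 13, 100]             # 4 values -> mean of 12 and 13

print(median_manual(odd), np.median(odd))    # 12.0 12.0
print(median_manual(even), np.median(even))  # 12.5 12.5
```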

Mode

The most frequent value. Useful for categorical data, where a mean is not meaningful.

Mode (SciPy)
import numpy as np
from scipy import stats
 
x = np.array([1, 1, 2, 2, 2, 3])
print(stats.mode(x, keepdims=True))  # mode 2, which appears 3 times
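The SciPy example uses numbers. For string labels, a simple alternative is the standard library's `collections.Counter`. A minimal sketch, where `all_modes` and the colour labels are hypothetical; unlike `stats.mode`, which reports a single value, it returns every value tied for the top count:

```python
from collections import Counter

def all_modes(values):
    """Return every most-frequent value (a list, since ties are possible) and its count."""
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of empty sequence")
    top = max(counts.values())
    # Keep all values that reach the top count so ties are not hidden
    return [v for v, c in counts.items() if c == top], top

labels = ["red", "blue", "red", "green", "blue", "red"]
print(all_modes(labels))                        # (['red'], 3)
print(all_modes(["cat", "dog", "cat", "dog"]))  # (['cat', 'dog'], 2)
```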

Spread (variability)

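  • Range: max - min (very sensitive to outliers, since it uses only the two most extreme values)
  • Variance: average squared distance from the mean (the sample variance, ddof=1 in NumPy, divides by n - 1 instead of n)
  • Standard deviation (std): sqrt(variance), in the same units as the data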
Variance / Std
import numpy as np
 
x = np.array([10, 12, 12, 13, 12, 11, 100])
# ddof=1: sample variance/std (divide by n - 1); NumPy's default ddof=0 divides by n
print("var:", np.var(x, ddof=1))
print("std:", np.std(x, ddof=1))
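To connect the bullet-point definitions to the NumPy calls, here is a minimal sketch that computes the range and the variance directly from squared deviations, once dividing by n (population, ddof=0) and once by n - 1 (sample, ddof=1):

```python
import numpy as np

x = np.array([10, 12, 12, 13, 12, 11, 100])
n = len(x)

# Range: only the two extremes matter
value_range = x.max() - x.min()  # 100 - 10

# Squared distance of each value from the mean
sq_dev = (x - x.mean()) ** 2

pop_var = sq_dev.sum() / n           # divide by n     -> matches np.var(x)
sample_var = sq_dev.sum() / (n - 1)  # divide by n - 1 -> matches np.var(x, ddof=1)
sample_std = np.sqrt(sample_var)     # back in the units of x

print("range:", value_range)
print("var, ddof=0:", pop_var, np.var(x))
print("var, ddof=1:", sample_var, np.var(x, ddof=1))
print("std, ddof=1:", sample_std, np.std(x, ddof=1))
```

Both variances share the same sum of squared deviations and differ only in the divisor, so the gap between them shrinks as n grows.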

IQR (interquartile range)

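IQR = Q3 - Q1 (75th percentile minus 25th percentile), the spread of the middle 50% of the data. Like the median, it is robust to outliers.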

IQR
import numpy as np
 
x = np.array([10, 12, 12, 13, 12, 11, 100])
q1 = np.percentile(x, 25)
q3 = np.percentile(x, 75)
print("IQR:", q3 - q1)  # IQR: 1.0 (Q1 = 11.5, Q3 = 12.5 with NumPy's default linear interpolation)
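To see what "robust" means in practice, a minimal sketch that compares the sample std and the IQR with and without the outlier. `iqr` is a small hypothetical helper wrapping the same `np.percentile` calls used above.

```python
import numpy as np

def iqr(values):
    """Interquartile range using NumPy's default (linear) percentile method."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1

with_outlier = np.array([10, 12, 12, 13, 12, 11, 100])
without_outlier = np.array([10, 12, 12, 13, 12, 11])

for name, data in [("with 100", with_outlier), ("without 100", without_outlier)]:
    print(f"{name:>12}: std = {np.std(data, ddof=1):6.2f}, IQR = {iqr(data):5.2f}")
```

Adding the single value 100 raises the std from about 1.03 to about 33.40, while the IQR only moves from 0.75 to 1.00.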

Quick checklist

  • Use median/IQR when outliers or strong skew are present
  • Use mean/std when the distribution is roughly symmetric
  • Always visualize (histogram/boxplot) before trusting summary stats
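Part of this checklist can be scripted as a first pass before plotting. A minimal sketch, where `summarize` is a hypothetical helper: it prints both summaries side by side and lists values outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, the default whisker rule in common boxplot implementations such as Matplotlib's. It supplements a histogram or boxplot rather than replacing one.

```python
import numpy as np

def summarize(values, k=1.5):
    """Print mean/std and median/IQR, plus values outside the boxplot-style fences."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    flagged = x[(x < low) | (x > high)]
    print(f"mean={x.mean():.2f}  std={x.std(ddof=1):.2f}")
    print(f"median={median:.2f}  IQR={iqr:.2f}")
    print(f"fences=[{low:.2f}, {high:.2f}]  outside: {flagged.tolist()}")
    return flagged

flagged = summarize([10, 12, 12, 13, 12, 11, 100])
```

For this sample the fences are [10.00, 14.00], so only 100 is flagged, and the large gap between mean/std and median/IQR points toward the robust pair.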
