By Hemanta Sundaray on 2021-08-22
Measures of dispersion (also known as measures of variability) help us understand how spread out values are.
Variance describes how far observations are spread out from their mean. It is calculated as the average of the squared distances between each observation and the mean.
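Written as a formula, using the conventional notation of N observations x_1, ..., x_N with mean μ (these symbols are introduced here for illustration), the variance described above is:

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```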
The differences are squared so that negative differences (values below the mean) don't cancel out positive differences (values above the mean).
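A minimal sketch of the calculation by hand, reusing the same scores as the statistics example further down, shows why squaring matters. The raw deviations sum to zero (up to floating-point rounding), so averaging them says nothing about spread, while the squared deviations are all non-negative and give a meaningful average.

```python
import statistics

scores = [34, 45, 67, 38, 89, 45, 98, 12, 24]
mean = sum(scores) / len(scores)

# Raw (unsquared) deviations: values below and above the mean cancel out
deviations = [x - mean for x in scores]
print(sum(deviations))  # effectively 0, apart from floating-point rounding

# Squaring makes every term non-negative, so nothing cancels
squared = [d ** 2 for d in deviations]
variance = sum(squared) / len(scores)

# The hand-computed value should agree with the library function
print(variance, statistics.pvariance(scores))
```

The hand-computed variance should match statistics.pvariance() apart from tiny floating-point differences.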
Because of the squaring, the variance is expressed in squared units. For example, if we calculated the variance of the heights of students in a class, measured in feet, the result would be in square feet.
Standard deviation is simply the square root of the variance. Taking the square root brings the statistic back to the original units of the data (feet rather than square feet in the example above), so it can be read as the typical distance of data points from the mean.
A small standard deviation means that values are close to the mean, while a large standard deviation means that values are dispersed more widely.
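To illustrate the difference between a small and a large standard deviation, here is a sketch with two made-up datasets that share the same mean of 50 but differ in spread. The values are hypothetical and chosen only for the comparison.

```python
import math
import statistics

# Hypothetical datasets: same mean (50), very different spread
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

results = {}
for name, data in [("tight", tight), ("wide", wide)]:
    variance = statistics.pvariance(data)
    std_dev = statistics.pstdev(data)
    results[name] = std_dev
    # The standard deviation is the square root of the variance
    print(f"{name}: mean={statistics.mean(data)}, variance={variance}, "
          f"std dev={std_dev:.2f}, sqrt(variance)={math.sqrt(variance):.2f}")
```

The wide dataset's standard deviation (about 28.28) is 20 times that of the tight one (about 1.41), even though both means are identical.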
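We can calculate the variance and standard deviation of a set of data points with the pvariance() and pstdev() functions from the statistics module in Python's standard library. These compute the population variance and standard deviation, which divide by the number of data points.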
```python
import statistics

scores = [34, 45, 67, 38, 89, 45, 98, 12, 24]

# Population variance: average squared distance from the mean
statistics.pvariance(scores)
# 742.6172839506172

# Population standard deviation: square root of the variance
statistics.pstdev(scores)
# 27.251005191563433
```
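If the scores are treated as a sample drawn from a larger population rather than the whole population, the statistics module also provides variance() and stdev(), which divide by n - 1 instead of n (Bessel's correction). Which version fits depends on how the data was collected; the sketch below simply compares the two on the same scores.

```python
import statistics

scores = [34, 45, 67, 38, 89, 45, 98, 12, 24]
n = len(scores)

# Population versions: divide the sum of squared deviations by n
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Sample versions: divide the sum of squared deviations by n - 1
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# The two variances differ by a factor of n / (n - 1)
print(pop_var, sample_var, pop_var * n / (n - 1))
print(pop_sd, sample_sd)
```

For comparison, pandas' Series.var() and Series.std() default to the sample versions (ddof=1), so they will not match pvariance() and pstdev() unless ddof=0 is passed.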