import numpy as np
import matplotlib.pyplot as plt
Exploring probability distributions
What happens when we add two sets of data with rectangular probability distributions?
n_samples = 100000
hist_bins = 200

data_1 = np.random.rand(n_samples)
plt.hist(data_1, bins=hist_bins)
plt.show()

data_2 = np.random.rand(n_samples)
data_2 = data_1 + data_2
plt.hist(data_2, bins=hist_bins)
plt.show()
We can see here that the two rectangular probability distributions combine into a new set of data with a triangular distribution.
This can be understood by considering rolls of two six-sided dice. If you’ve played Catan or other two-dice games, you’ll know that the probability of rolling a 7 is higher than that of any other number, and that 2 and 12 are the least likely. This is because there are several ways to roll a 7 (1&6, 2&5, 3&4, and the same pairs with the dice swapped), but only one way to roll a 2 (1&1) or a 12 (6&6).
Starting with the sum of the lowest number on each die, the number of dice combinations that produce each successive sum increases by one. Because every combination is equally likely, the probability of each sum rises linearly with the number of ways of rolling it. This holds up to the middle sum (7), after which the probability decreases linearly.
Both the dice on their own have a rectangular probability distribution, but together their probability distribution is triangular.
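The dice argument can be checked by brute force. The sketch below is not part of the original notebook; it counts how many of the 36 equally likely outcomes produce each sum, and the names `faces` and `ways` are chosen here purely for illustration.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 equally likely (ordered) outcomes of two six-sided dice
faces = range(1, 7)
ways = Counter(a + b for a, b in product(faces, repeat=2))

# Print the number of ways to roll each sum and the matching probability
for total in sorted(ways):
    print(f"{total:2d}: {ways[total]} way(s), p = {ways[total]}/36")
```

The counts rise by one per step from 2 up to 7 and then fall by one per step down to 12, which is the triangular shape seen in the histogram of data_2.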
Adding a third rectangular set of data
data_3 = np.random.rand(n_samples)
data_3 = data_3 + data_2
plt.hist(data_3, bins=hist_bins)
plt.show()
Now we see a bell curve.
The shape of this curve can be understood by looking at a table of the odds of each roll combination of three six-sided dice:
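A table of this kind can be generated by enumerating all 216 outcomes. This is a minimal sketch and not necessarily the layout the original table used; `ways_3` and `total_outcomes` are illustrative names.

```python
from collections import Counter
from itertools import product

# Enumerate all 6**3 = 216 equally likely outcomes of three six-sided dice
faces = range(1, 7)
ways_3 = Counter(sum(roll) for roll in product(faces, repeat=3))
total_outcomes = 6 ** 3

# Tabulate each possible sum, the number of ways to roll it, and its probability
print("sum  ways  probability")
for total in sorted(ways_3):
    print(f"{total:3d}  {ways_3[total]:4d}  {ways_3[total] / total_outcomes:.4f}")
```

Unlike the two-dice case, the counts no longer change by a constant amount from one sum to the next: they climb slowly at the extremes and flatten near the middle sums of 10 and 11, which gives the rounded, bell-like profile.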
Adding a fourth rectangular distribution changes the shape and the spread of the bell curve.
data_4 = np.random.rand(n_samples)
data_4 = data_4 + data_3
plt.hist(data_4, bins=hist_bins)
plt.show()
Normalizing each data set by its maximum value, so that every set spans roughly 0 to 1, lets us see how the spread of probabilities changes as we add more rectangular distributions together.
def normalize(data):
    return data / data.max()

plt.hist(normalize(data_2), bins=hist_bins)
plt.subplot()
plt.hist(normalize(data_3), bins=hist_bins)
plt.subplot()
plt.hist(normalize(data_4), bins=hist_bins)
plt.subplot()
plt.legend(['n=2', 'n=3', 'n=4'])
plt.title('Effect of summing multiple rectangular probability distributions.\n Data is normalized to allow overlapping bell curves.')
plt.show()
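The narrowing can also be checked numerically rather than by eye. The sketch below regenerates the sums independently of the cells above, using a seeded generator (an assumption made here for reproducibility), and reports the standard deviation of each data set after dividing by its maximum. The name `spread` is illustrative.

```python
import numpy as np

# Seeded generator: an assumption for reproducibility, not used in the notebook above
rng = np.random.default_rng(0)
n_samples = 100_000

spread = {}
for n in (2, 3, 4):
    # Sum n independent uniform(0, 1) samples, element by element
    total = rng.random((n, n_samples)).sum(axis=0)
    # Scale by the maximum, as in the normalization step above
    scaled = total / total.max()
    spread[n] = scaled.std()
    print(f"n={n}: std of scaled data = {spread[n]:.3f}")
```

The standard deviation of the scaled data should shrink as n grows, which matches the narrowing curves in the overlaid histogram.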
Conclusion
We can see here that adding together data sets with rectangular probability distributions creates a triangular distribution for the first addition, where n = 2, and creates data with increasingly tight bell-curve-shaped probability distributions as n increases beyond 2.