Understand the Relationship Between Probability and Statistics
Statistics is akin to reverse engineering probability to find the truth
For a long time, I assumed that the expected value of a random variable was just a fancy name for the sample mean. I was completely wrong.
Both the sample mean and the expected value are averages, but they are not the same thing.
Let me explain with a simple example that Gilbert Strang gives in one of his recent books, Linear Algebra and Learning from Data.
Imagine a class of fishermen in which 20% are 17 years old, 50% are 18, and 30% are 19. Now pick a random sample of five fishermen, aged 18, 17, 18, 19, and 17. The sample mean is (18 + 17 + 18 + 19 + 17) / 5 = 17.8.
However, the expected age of a randomly selected fisherman is 0.2 × 17 + 0.5 × 18 + 0.3 × 19 = 18.1.
Both 17.8 and 18.1, albeit different, are correct averages.
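To make the two calculations concrete, here is a small Python sketch of the fishermen example. The variable names are my own; the numbers are the ones from the example. The sample mean averages the five observed ages, while the expected value weights each possible age by its probability.

```python
# Age distribution of the class, as given in the example:
# 20% are 17, 50% are 18, 30% are 19.
age_probabilities = {17: 0.2, 18: 0.5, 19: 0.3}

# The five fishermen who happened to be picked.
sample = [18, 17, 18, 19, 17]

# Sample mean: the average of data that has already been collected.
sample_mean = sum(sample) / len(sample)

# Expected value: each possible age weighted by its probability.
expected_age = sum(age * p for age, p in age_probabilities.items())

print(f"Sample mean:  {sample_mean:.1f}")   # Sample mean:  17.8
print(f"Expected age: {expected_age:.1f}")  # Expected age: 18.1
```

Notice that the sample mean only needs the data, while the expected value only needs the probabilities.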
The key to the difference lies in the names. The sample mean comes from a completed trial: it is based on data that has already been collected.
The expected value comes from probabilities: it is what we expect to get, on average, if we undertake the trial.
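One way to see how the two averages relate is to run the trial many times. The sketch below assumes each fisherman is drawn independently with the class's age probabilities (an idealization of sampling from a large class), and uses Python's `random.Random.choices` for the draws. The seed and sample sizes are arbitrary choices for illustration. Small samples can land far from 18.1, as our sample of five did, but by the law of large numbers the sample mean drifts toward the expected value as the sample grows.

```python
import random

# Same age distribution as in the example.
age_probabilities = {17: 0.2, 18: 0.5, 19: 0.3}
ages = list(age_probabilities)
weights = list(age_probabilities.values())

# Expected value, computed from the probabilities alone.
expected_age = sum(a * p for a, p in age_probabilities.items())

rng = random.Random(0)  # fixed seed so the run is reproducible
means = {}

for n in (5, 100, 10_000, 100_000):
    # Draw n fishermen independently, each age chosen with its probability.
    sample = rng.choices(ages, weights=weights, k=n)
    means[n] = sum(sample) / n
    print(f"n={n:>7,}: sample mean = {means[n]:.3f} "
          f"(expected value = {expected_age:.3f})")
```

The exact sample means depend on the seed, but the larger samples should sit close to 18.1.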
So what has all this to do with the relationship between statistics and probability?
Professor Philippe Rigollet explains this beautifully in his Fundamentals of Statistics course. He calls it the central dogma of probability and statistics.
Let’s dissect this diagram, starting from the truth. At the end of the day, we want to understand the truth to make better decisions.
Probability equips us with the tools to model the truth and account for any “randomness”. This randomness could be an inherent part of the process that generates the truth. For example, rolling a die could be…