Until now, what we have been doing is simple: to evaluate the probability of any event $E$ in a sample space $S$, we find the total number of outcomes and the number of outcomes favorable to $E$, and we then have
$$P(E) = \frac{\text{number of outcomes favorable to } E}{\text{total number of outcomes}}$$
Do not forget that this holds only if all the outcomes are equally likely, that is, if we have no reason to suspect that any particular outcome is more or less likely than another. For example, we saw that the sample space of tossing a fair coin or rolling a fair die consists of equally likely outcomes. (Note that two outcomes cannot be proved mathematically to be equally likely. We either assume the equal likelihood of outcomes beforehand, or we repeat the experiment an indefinitely large number of times and thus show empirically, rather than mathematically, that the relative frequencies of the various outcomes approach the same value.)
Now, coming back to the formula above, we said that it does not hold if the various outcomes are not equally likely. For example, suppose that a die is constructed (using careful loading) such that
For such a die, the probability of rolling an odd number will be
rather than $\frac{1}{2}$, which you would have got by computing (number of odd outcomes) / (total number of outcomes). This point is easy to understand, yet mistakes are still made!
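The difference between counting outcomes and summing their probabilities can be sketched in a few lines of Python. The loading used below is hypothetical and chosen only for illustration; it is not the die described in the text, and the names `loaded` and `probability` are likewise assumptions of this sketch.

```python
from fractions import Fraction

# Hypothetical loading, for illustration only: unequal probabilities
# for faces 1..6 that add up to 1.
loaded = {
    1: Fraction(1, 8),
    2: Fraction(1, 4),
    3: Fraction(1, 8),
    4: Fraction(1, 4),
    5: Fraction(1, 8),
    6: Fraction(1, 8),
}

def probability(event, distribution):
    """Sum the probabilities of the outcomes that make up the event."""
    return sum((distribution[outcome] for outcome in event), Fraction(0))

odd = {1, 3, 5}

# Correct for any die: add the probabilities of the odd faces.
p_odd = probability(odd, loaded)

# Valid only for equally likely outcomes: favorable / total.
naive = Fraction(len(odd), len(loaded))

print("sum of probabilities:", p_odd)
print("naive counting      :", naive)
```

For this assumed loading the two numbers disagree, which is exactly the mistake the counting formula invites when the outcomes are not equally likely.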
A curious reader might have a further question. She might say, “You just talked about making a die with outcomes of unequal probabilities. For example, you said that a certain face comes up with probability $\frac{1}{8}$. What is the basis for saying so? I understood the case of equally likely outcomes, where all probabilities are the same, but how did this figure of $\frac{1}{8}$ come about?” Well, this number comes from a relative frequency approach to probability. When the die-maker says that the probability of a particular face coming up is $\frac{1}{8}$, what he must have done (either actually, or through a sophisticated computer simulation) is roll the die a very large number of times and observe that this face comes up (about) one-eighth of the time. Hence the assertion.
To summarize, we have discussed two ways to evaluate probabilities:
(1) Counting equally likely outcomes: this ‘works’ when all the outcomes are equally likely. If our event $E$ can happen in $m$ ways out of a total of $n$ possible outcomes, the required probability is $P(E) = \frac{m}{n}$.
(2) Relative frequency: this ‘works’ in general. To find the probability of an event $E$, we repeat the experiment a very large number of times, say $N$, and observe how many times that particular event occurred, say $k$. The ratio $\frac{k}{N}$ then gives us the empirical probability of the event. In fact, we should be using this relation:
$$P(E) = \lim_{N \to \infty} \frac{k}{N}$$
that is, we should be using the value of empirical probability only if the experiment is repeated an indefinitely large number of times.
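The relative frequency idea is easy to try on a computer. The following is a minimal sketch, assuming a fair six-sided die simulated with Python's standard `random` module; the function name `relative_frequency` and the fixed seed are choices made for this illustration only.

```python
import random

def relative_frequency(event, trials, seed=0):
    """Roll a simulated fair die `trials` times and return the fraction
    of rolls whose outcome lies in `event`."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

odd = {1, 3, 5}

# The estimate should settle near 1/2 as the number of trials grows.
for trials in (100, 10_000, 200_000):
    print(trials, relative_frequency(odd, trials))
```

A finite run only ever gives an estimate. The simulation can suggest the limiting value, but it cannot establish it.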
Finally, it must be said that both approaches fail to stand up to the rigors of mathematics: the former uses the vague phrase “equally likely”, for which we can give no mathematical justification, while in the latter we have no way to prove that the limit will actually converge to some value, because no experiment can be repeated an infinite number of times.
Mathematicians, being very finicky about rigor, therefore define probability as a function $P(E)$ that is associated with any event $E$ and satisfies three axioms:
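Axiom 1: For any event $E$, $0 \le P(E) \le 1$.
Axiom 2: For the entire sample space $S$ (that is, for the sure event), $P(S) = 1$.
Axiom 3: For mutually exclusive events $E_1, E_2, \ldots$, $P(E_1 \cup E_2 \cup \cdots) = P(E_1) + P(E_2) + \cdots$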
Thus, what we have here is three axioms that the probability of any event must satisfy, but these axioms in no way tell us how to actually measure the probability associated with an event. Those interested in learning more about these axioms and the interpretation of probability should find plenty of resources on the World Wide Web. For the present, this much background should suffice.
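For a finite sample space, the axioms can be checked mechanically. The sketch below assumes the standard form of the axioms (probabilities are non-negative, the whole sample space has probability 1, and probabilities of mutually exclusive events add); the helper names `prob` and `satisfies_axioms` are hypothetical.

```python
from fractions import Fraction

def prob(event, weights):
    """Probability of an event as the sum of its outcomes' weights."""
    return sum((weights[o] for o in event), Fraction(0))

def satisfies_axioms(weights):
    """Check non-negativity and total probability 1 for a finite space.
    Additivity then follows, because `prob` sums over outcomes."""
    nonnegative = all(w >= 0 for w in weights.values())
    total_is_one = sum(weights.values(), Fraction(0)) == 1
    return nonnegative and total_is_one

# A fair die: every face has weight 1/6.
fair = {face: Fraction(1, 6) for face in range(1, 7)}

# Two mutually exclusive events.
A, B = {1, 2}, {5}
assert A.isdisjoint(B)
additive = prob(A | B, fair) == prob(A, fair) + prob(B, fair)

print(satisfies_axioms(fair), additive)
```

Note that the check says nothing about where the weights come from. Any assignment that passes is a valid probability function, which is the point made above: the axioms constrain probabilities but do not measure them.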
Before closing this section, let us see some more examples of how events are treated as subsets of a universal set of outcomes, the sample space. Events are denoted by $A$, $B$, etc., and the sample space by $S$. The complementary event of any event $E$ is denoted by $\bar{E}$.
Generalising this gives
Try proving this relation for three events using a Venn diagram.
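Before drawing the Venn diagram, it can help to check a three-event case numerically. The sketch below assumes the relation in question is the usual inclusion-exclusion formula, $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$, and uses a fair die with three events picked arbitrarily for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space of a fair die

def P(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

# Three overlapping events, chosen arbitrarily.
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6}

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(B & C) - P(A & C)
       + P(A & B & C))

print(lhs, rhs)
```

One numerical agreement is of course not a proof; the Venn diagram argument is what shows that each region is counted exactly once.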
This should be obvious. On the right side of the inequality, there is an extra contribution to the sum from $P(A \cap B)$. The equality holds only for mutually exclusive events.
This also generalises in the obvious way to $n$ events.
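The inequality can be explored with a small sketch, assuming it is the usual bound $P(A \cup B) \le P(A) + P(B)$ and its extension to several events. The function `union_bound_gap` is a hypothetical helper that returns how much the sum of the individual probabilities exceeds the probability of the union, again on a fair die.

```python
from fractions import Fraction

S = set(range(1, 7))  # sample space of a fair die

def P(event):
    """Classical probability on the fair die."""
    return Fraction(len(event), len(S))

def union_bound_gap(*events):
    """Sum of the individual probabilities minus the probability of the union."""
    union = set().union(*events)
    return sum((P(e) for e in events), Fraction(0)) - P(union)

overlapping = union_bound_gap({1, 2, 3}, {3, 4})   # the events share the outcome 3
disjoint = union_bound_gap({1, 2}, {5, 6})          # mutually exclusive events
three = union_bound_gap({1, 2}, {2, 3}, {3, 4})     # works for any number of events

print(overlapping, disjoint, three)
```

For two events the gap is exactly the probability of the overlap, and it vanishes when the events are mutually exclusive.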
Try to figure this out on your own. Using a Venn diagram would be a good idea.