Thursday 7 August 2014

Chapter 20: Terminology Associated with Probability

It is often very useful to talk about the mean of a random variable X. For example, in the experiment above of four tosses of a coin, someone might want to know the average number of Heads obtained. Now the reader may wonder what meaning to attach to the phrase “average number of Heads”. After all, we are doing the experiment only once, and we’ll obtain a particular value for the number of Heads, say 0 or 1 or 2 or 3 or 4; what, then, is this “average number of Heads”?
By the average number of Heads we mean this: repeat the experiment an indefinitely large number of times. Each time you’ll get a certain number of Heads. Take the average of the number of Heads obtained across the repetitions. For example, if in 5 repetitions of this experiment you obtain 2, 2, 3, 1 and 1 Heads respectively, the average number of Heads would be (2 + 2 + 3 + 1 + 1)/5 = 1.8. This is not a natural number, which shouldn’t worry you, since it is only an average over the 5 repetitions. To calculate the true average, you would have to repeat the experiment an indefinitely large number of times.
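To make this concrete, here is a minimal Python sketch (the function name and the choice of N are purely illustrative) that repeats the experiment many times and averages the number of Heads obtained:

```python
import random

def heads_in_four_tosses():
    """One repetition of the experiment: toss a fair coin 4 times, count Heads."""
    return sum(random.random() < 0.5 for _ in range(4))

# Repeat the experiment many times; the larger N is, the closer the
# computed average gets to the true average.
N = 100_000
average = sum(heads_in_four_tosses() for _ in range(N)) / N
print(average)  # close to 2.0, the true average for 4 tosses of a fair coin
```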
An alert reader might have realized that the average value of an RV is easily calculable from its PD. For example, let us calculate the true average number of Heads in the experiment of Example – 18. The PD is reproduced below:
X (no. of Heads):  0, 1, 2, 3, 4
P(X):  \dfrac{1}{16},\; \dfrac{1}{4},\; \dfrac{3}{8},\; \dfrac{1}{4},\; \dfrac{1}{16}
Thus, for example, P(X = 1) = \dfrac{1}{4}, which means that if the experiment is repeated an indefinitely large number of times, we’ll obtain Heads exactly once (about) {\dfrac{1}{4}^{th}} of the time. Similarly, (about) {\dfrac{3}{8}^{th}} of the time, Heads will be obtained exactly twice, and so on. Let us denote the number of repetitions of the experiment by N, where N \to \infty. Thus, the average number of Heads per repetition would be (\left\langle {} \right\rangle denotes the average)
\left\langle {\rm{Heads}} \right\rangle \; = \dfrac{\text{Total no. of Heads in }N\text{ repetitions}}{N}
 = \dfrac{{0 \times \dfrac{N}{{16}} + 1 \times \dfrac{N}{4} + 2 \times \dfrac{{3N}}{8} + 3 \times \dfrac{N}{4} + 4 \times \dfrac{N}{{16}}}}{N}
 = 0 \times \dfrac{1}{{16}} + 1 \times \dfrac{1}{4} + 2 \times \dfrac{3}{8} + 3 \times \dfrac{1}{4} + 4 \times \dfrac{1}{{16}}
 = \sum {\left( {\text{Value of the RV}} \right) \times \left( {\text{Probability of that value}} \right)}
The numerical value of this sum is 2, which matches intuition: in 4 tosses of a fair coin, we expect 2 Heads on average. Thus, we see that if an RV X has possible values {x_1},{x_2},\ldots ,{x_n} with respective probabilities {p_1},{p_2},\ldots ,{p_n}, the mean of X, denoted by \left\langle X \right\rangle , is simply given by
\left\langle X \right\rangle  = \sum\limits_{i = 1}^n {{x_i}\,{p_i}} \ldots(1)
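As a quick check of formula (1), here is a small sketch (the names pd_heads and expected_value are illustrative) that computes \left\langle X \right\rangle for the coin-toss PD above:

```python
# PD of X = number of Heads in 4 tosses, taken from the table above
pd_heads = {0: 1/16, 1: 1/4, 2: 3/8, 3: 1/4, 4: 1/16}

def expected_value(pd):
    """Formula (1): <X> = sum of (value x probability) over the PD."""
    return sum(x * p for x, p in pd.items())

print(expected_value(pd_heads))  # 2.0
```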
As another example, recall the experiment of rolling two dice, where the RV X was the sum of the numbers on the two dice. The PD of X is given in the table on Page – 42, and the average value of X is
\left\langle X \right\rangle \; = 2 \times \dfrac{1}{{36}} + 3 \times \dfrac{1}{{18}} + 4 \times \dfrac{1}{{12}} + 5 \times \dfrac{1}{9} + 6 \times \dfrac{5}{{36}} + 7 \times \dfrac{1}{6}  + 8 \times \dfrac{5}{{36}} + 9 \times \dfrac{1}{9} + 10 \times \dfrac{1}{{12}} + 11 \times \dfrac{1}{{18}} + 12 \times \dfrac{1}{{36}}
=7
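If the Page – 42 table is not at hand, the PD and the mean can be rebuilt from scratch; a sketch using exact fractions to avoid any rounding:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of the two dice and build
# the PD of X = sum of the two numbers.
pd_sum = {}
for a, b in product(range(1, 7), repeat=2):
    pd_sum[a + b] = pd_sum.get(a + b, Fraction(0)) + Fraction(1, 36)

print(pd_sum[7])                              # 1/6, as in the table
print(sum(x * p for x, p in pd_sum.items()))  # 7
```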
The average value is also called the expected value, which signifies that it is what we can expect to obtain by averaging the RV’s values over a large number of repetitions of the experiment. Note, however, that the “expected value” may itself be unlikely or even impossible as an outcome. For example, in the rolling of a fair die, the expected value of the number that shows up is 3.5 (verify), which can never be an actual outcome. Thus, you must take care while interpreting the expected value: see it as the average of the RV’s values when the experiment is repeated indefinitely.
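The 3.5 figure asked for above is a one-line verification:

```python
from fractions import Fraction

# A fair die: each of the faces 1..6 has probability 1/6.
print(sum(x * Fraction(1, 6) for x in range(1, 7)))  # 7/2, i.e. 3.5 -- never an actual outcome
```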
Another quantity of great significance associated with any RV X is its variance, denoted by Var(X). To understand this properly, consider two RVs {X_1} and {X_2} and their PDs, shown in graphical form in Fig-16 below.
[Fig-16: the PDs of {X_1} and {X_2}. {X_1} takes the values 1, 2, 3, 4, 5 with probabilities \dfrac{1}{10},\dfrac{1}{5},\dfrac{2}{5},\dfrac{1}{5},\dfrac{1}{10}, while {X_2} takes the values 2, 3, 4 with probabilities \dfrac{1}{4},\dfrac{1}{2},\dfrac{1}{4}.]
Both the RVs have an expected value of 3 (verify), but it is obvious that there is a significant difference between the two distributions. What is this difference? Can you put it into words? And more importantly, can you quantify it?
It turns out that we can, in a way that is very simple to understand. The ‘data’ or the PD of {X_1} is more widely spread than that of {X_2}. This is what is obvious visually, but we must now assign a numerical value to this spread. So what we’ll do is measure the spread of the PD about the mean of the RV. For both {X_1} and {X_2} the mean is 3, but the PD of {X_1} is spread more about 3 than that of {X_2}. We now quantify this spread.
Observe that the various values of X - \left\langle X \right\rangle tell us how far the corresponding values of X are from the mean (which is fixed). One way that may come to your mind to measure the spread is to sum all these deviations, i.e.
{\rm{Spread}} = \sum\limits_{{\rm{all\ values\ of\ }}X} {\left( {X - \left\langle X \right\rangle } \right)}
However, a little thinking shows that this is of no use: the positive contributions to the sum from those X values greater than \left\langle X \right\rangle and the negative contributions from those X values smaller than \left\langle X \right\rangle cancel out. Indeed, once each deviation is counted as often as it actually occurs, i.e., weighted by its probability, the sum is always exactly 0:
\sum\limits_{i = 1}^n {{p_i}\left( {{x_i} - \left\langle X \right\rangle } \right)} \; = \;\left\langle X \right\rangle - \left\langle X \right\rangle \; = \;0
Work it out yourself.
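A quick numerical check of this cancellation, using the coin-toss PD from earlier (a sketch; the names are illustrative):

```python
pd_heads = {0: 1/16, 1: 1/4, 2: 3/8, 3: 1/4, 4: 1/16}
mean = sum(x * p for x, p in pd_heads.items())  # <X> = 2.0

# Probability-weighted sum of the signed deviations: always 0.
print(sum((x - mean) * p for x, p in pd_heads.items()))  # 0.0
```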
So what we do is use the squares of these deviations instead, which are never negative and hence cannot cancel:
{\rm{Spread}} = \sum\limits_{{\rm{all\ values\ of\ }}X} {{{\left( {X - \left\langle X \right\rangle } \right)}^2}}
However, there is still something missing. To understand what, consider a PD that looks visually widespread, but in which the values of X far from the mean occur with extremely low probabilities. Such values should contribute very little to the spread, so our measure must take into account how probable each value of X is. This is accomplished, as before, by weighting: multiply each {\left( {X - \left\langle X \right\rangle } \right)^2} by the probability of the corresponding value of X.
Thus, if X can take the values {x_1},{x_2},\ldots ,{x_n} with probabilities {p_1},{p_2},\ldots ,{p_n}, the spread in the PD of X can be appropriately represented by
‘Spread’ \; = \;\sum\limits_{i = 1}^n {{{\left( {{x_i} - \left\langle X \right\rangle } \right)}^2}{p_i}}
This definition of spread is termed the variance of X, and is denoted by Var(X). Statisticians define another quantity for spread, called the standard deviation, denoted by {\sigma _X}, and related to the variance by
Var(X) = \sigma _X^2
Recall that the expected value of X is
\left\langle X \right\rangle  = \;\sum\limits_{i = 1}^n {{x_i}{p_i}}
Similarly, the variance is nothing but the expected value of {\left( {X - \left\langle X \right\rangle } \right)^2}:
Var\left( X \right) = \left\langle {{{\left( {X - \left\langle X \right\rangle } \right)}^2}} \right\rangle \; = \;\sum\limits_{i = 1}^n {{{\left( {{x_i} - \left\langle X \right\rangle } \right)}^2}{p_i}}
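In code, the variance (and the standard deviation along with it) is a direct transcription of these formulas; a minimal sketch, with illustrative names:

```python
import math

def expected_value(pd):
    """<X> = sum of (value x probability) over the PD."""
    return sum(x * p for x, p in pd.items())

def variance(pd):
    """Var(X) = sum of (x - <X>)^2 * p over the PD."""
    mean = expected_value(pd)
    return sum((x - mean) ** 2 * p for x, p in pd.items())

def std_dev(pd):
    """sigma_X = sqrt(Var(X))."""
    return math.sqrt(variance(pd))
```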
Coming back to Fig-16, the variance in {X_1} is
Var({X_1}) = \;{(1 - 3)^2} \cdot \dfrac{1}{{10}} + {(2 - 3)^2} \cdot \dfrac{1}{5} + {(3 - 3)^2} \cdot \dfrac{2}{5} + {(4 - 3)^2} \cdot \dfrac{1}{5} + {(5 - 3)^2} \cdot \dfrac{1}{{10}}
 = \dfrac{4}{{10}} + \dfrac{1}{5} + 0 + \dfrac{1}{5} + \dfrac{4}{{10}}
=1.2
Similarly, the variance in {X_2} is
Var({X_2}) = {(2 - 3)^2} \cdot \dfrac{1}{4} + {(3 - 3)^2} \cdot \dfrac{1}{2} + {(4 - 3)^2} \cdot \dfrac{1}{4}
 = \dfrac{1}{4} + 0 + \dfrac{1}{4}
 = 0.5
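Reusing the variance() sketch from above on the two PDs of Fig-16 reproduces both numbers:

```python
pd_x1 = {1: 1/10, 2: 1/5, 3: 2/5, 4: 1/5, 5: 1/10}
pd_x2 = {2: 1/4, 3: 1/2, 4: 1/4}

print(variance(pd_x1))  # 1.2 (up to floating-point rounding)
print(variance(pd_x2))  # 0.5
```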
which confirms our visual observation that the PD of {X_1} is more widely spread than that of {X_2}, because Var({X_1}) > Var({X_2}).

