Saturday, 4 May 2013

Elementary Statistics chapter 6


Basic Concepts of Probability




Terminology

    Usually, probabilities are descriptions of the likelihood of some event occurring (ranging from 0 to 1)
    Two events are termed independent if knowledge of the occurrence or non-occurrence of the first event does not affect estimates of the probability that the second event will occur
    Two events are termed mutually exclusive if the occurrence of one event precludes occurrence of the other event
    A set of events is termed exhaustive if it embodies all of the possible outcomes in some situation

Basic Laws of Probability

    The additive law of probabilities: given a set of mutually exclusive events, the probability of occurrence of one event or another event is equal to the sum of their separate probabilities
    Example:
    Place 100 marbles in a bag; 35 blue, 45 red, and 20 yellow.
P(blue)=.35    P(red)=.45    P(yellow)=.20 
    What is the probability of choosing either a red or a yellow marble from the bag?
P(red or yellow) = P(red)+ P(yellow) 
= .45+.20
= .65
    The multiplicative law of probabilities: The probability of the joint occurrence of two or more independent events is the product of their individual probabilities
Example:
    Say that the probability that I am in my office at any given moment of the typical school day is .65
    Also, say that the probability that someone is looking for me in my office at any given moment of the school day is .15
    What is the probability that during some particular moment, I am in my office and someone looks for me there?
P(in office and someone looks)
= P(in office) x P(someone looks)
= .65 x .15
= .0975
Question:
    Say that Fred takes the car into work with a probability of .50, walks with a probability of .20, and takes public transit with a probability of .30
    Barney, on the other hand, drives into work with a probability of .20, walks with a probability of .65, and takes public transit with a probability of .15
    What is the probability that Fred walked or drove to work and Barney walked or took public transit to work, assuming Fred and Barney's behaviour to be independent?
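
The two laws amount to a few lines of arithmetic. The sketch below is only an illustration in Python (the variable names are made up here and are not part of the notes). It reproduces the marble and office examples, then applies both laws to the Fred and Barney question, assuming, as the question states, that their behaviour is independent.

```python
# Additive law: mutually exclusive events, so the probabilities add.
marbles = {"blue": 0.35, "red": 0.45, "yellow": 0.20}
p_red_or_yellow = marbles["red"] + marbles["yellow"]

# Multiplicative law: independent events, so the probabilities multiply.
p_in_office = 0.65
p_someone_looks = 0.15
p_both = p_in_office * p_someone_looks

# Fred and Barney: add within each person (mutually exclusive ways of
# travelling), then multiply across the two people (assumed independent).
fred = {"drive": 0.50, "walk": 0.20, "transit": 0.30}
barney = {"drive": 0.20, "walk": 0.65, "transit": 0.15}
p_fred = fred["walk"] + fred["drive"]
p_barney = barney["walk"] + barney["transit"]
p_question = p_fred * p_barney

print(round(p_red_or_yellow, 4))  # 0.65
print(round(p_both, 4))           # 0.0975
print(round(p_question, 4))       # 0.56
```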

Joint and Conditional Probabilities

    The joint probability of two events A & B is the likelihood that both events will occur and is denoted as P(A,B)
    When the two events are independent, the joint probability simply follows the multiplicative rule
    thus, P(A,B) = P(A) x P(B)
    When they are not independent, it gets a little trickier ... but we won't worry about that for now
    Conditional probability is the probability that some event (A) will occur, given that some other event (B) has occurred
    denoted as P(A|B)
An Example: Drinking & Driving
                    Accident    No Accident    Total
    Drinking            7           23           30
    Not Drinking        6           64           70
    Total              13           87          100
    P(Drinking) = 30/100 = 0.3000
    P(Accident) = 13/100 = 0.1300
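    P(Drinking, Accident), if the two events were independent
= P(Drinking) x P(Accident)
= 0.30 x 0.13
= 0.0390
    Observed P(Drinking, Accident) = 7/100 = 0.0700, so the two events are not independent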
    P(Drinking | Accident)
= 7/13
= 0.5385
    P(Accident | Drinking)
= 7/30
= 0.2333
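
As a rough sketch, the same quantities can be read straight off the table of counts. The dictionary layout and the helper name `marginal` below are illustrative choices, not anything from the notes. The code prints the observed joint proportion (7 out of 100) next to the product P(Drinking) x P(Accident), which is what the joint probability would be if the events were independent, followed by the two conditional probabilities.

```python
# Counts from the drinking-and-driving table (100 cases in total).
counts = {
    ("drinking", "accident"): 7,
    ("drinking", "no accident"): 23,
    ("not drinking", "accident"): 6,
    ("not drinking", "no accident"): 64,
}
total = sum(counts.values())

def marginal(value, position):
    """Sum the counts whose key has `value` at `position` (0 = row, 1 = column)."""
    return sum(n for key, n in counts.items() if key[position] == value)

p_drinking = marginal("drinking", 0) / total
p_accident = marginal("accident", 1) / total
p_joint_observed = counts[("drinking", "accident")] / total
p_joint_if_independent = p_drinking * p_accident

# Conditional probability: divide the joint count by the count of the
# event we are conditioning on.
p_drinking_given_accident = counts[("drinking", "accident")] / marginal("accident", 1)
p_accident_given_drinking = counts[("drinking", "accident")] / marginal("drinking", 0)

print(round(p_joint_observed, 4))           # 0.07
print(round(p_joint_if_independent, 4))     # 0.039
print(round(p_drinking_given_accident, 4))  # 0.5385
print(round(p_accident_given_drinking, 4))  # 0.2333
```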

Factorials!

    We will soon discuss the concepts of permutations and combinations
    Prior to that, it is necessary to understand another mathematical symbol, the symbol ! (read "factorial")
      N! = (N) x (N-1) x (N-2) x ... x (1)
      5! = 5 x 4 x 3 x 2 x 1 = 120
      3! = 3 x 2 x 1 = 6
      Note: 0! = 1

Permutations

    If 3 people (p1, p2, & p3) entered a race, how many different finishing orders are possible?
p1, p2, p3 p1, p3, p2 
p2, p1, p3 p2, p3, p1 
p3, p1, p2 p3, p2, p1 
    each of these is called a permutation of the three people, taken three at a time
    In permutation notation, this problem would be represented as 3P3 (3 things taken 3 at a time), and the answer can be found using the following formula:
NPr = N! / (N-r)!
3P3 = 3! / (3-3)! = 3! / 0!
= 6 / 1 = 6
    Another Permutation Example:
    Say that 5 people entered the previous race (p1 thru p5), but only the first two get prizes. How many different orderings of those first two positions are possible?
5P2 = 5! / (5-2)!
= 5! / 3! = 120 / 6
= 20
p1, p2 p1, p3 p1, p4 p1, p5 
p2, p1 p2, p3 p2, p4 p2, p5 
p3, p1 p3, p2 p3, p4 p3, p5 
p4, p1 p4, p2 p4, p3 p4, p5 
p5, p1 p5, p2 p5, p3 p5, p4 
    Note: When doing permutations, order is important! Think of the word "permutations" as "orderings"
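
One way to check the two race examples is to list the orderings by brute force and compare the count with the usual permutation formula N! / (N-r)!. This is a small sketch using Python's standard library. The helper name `n_permutations` is made up for the illustration.

```python
from itertools import permutations
from math import factorial

def n_permutations(n, r):
    """Number of orderings of n things taken r at a time: n! / (n - r)!"""
    return factorial(n) // factorial(n - r)

racers = ["p1", "p2", "p3", "p4", "p5"]

# Three racers, all three finishing positions matter.
all_three = list(permutations(racers[:3], 3))
# Five racers, only the first two positions matter.
top_two = list(permutations(racers, 2))

print(len(all_three), n_permutations(3, 3))  # 6 6
print(len(top_two), n_permutations(5, 2))    # 20 20
print(top_two[:4])  # [('p1', 'p2'), ('p1', 'p3'), ('p1', 'p4'), ('p1', 'p5')]
```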

Combinations

    Sometimes, we are not concerned with ordering but only with how many ways certain things can be combined into groups
    For example, let's say we again have five people and we want to form a team of two people. How many different teams of two people can we form from our original five?
NCr = N! / (r! x (N-r)!)
5C2 = 5! / (2! x (5-2)!)
= 120 / (2 x 6)
= 10
1&2  1&3  1&4  1&5  2&3 
2&4  2&5  3&4  3&5  4&5 
    Other Examples:
    Say you had seven people and you wanted to form a committee of four people who would work together (on equal footing) to solve some problem. How many different committees are possible?
    What if the above committees were set up such that one person was to be president, the next vice president, the third treasurer, and the fourth secretary. Now how many committees are possible?
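
The difference between the two committee questions is whether order matters. As a sketch (assuming Python 3.8 or later, where `math.comb` and `math.perm` are available), the team example can be enumerated directly and the two committee counts computed side by side.

```python
from itertools import combinations
from math import comb, perm  # both added in Python 3.8

# Teams of two from five people: order does not matter.
people = [1, 2, 3, 4, 5]
teams = list(combinations(people, 2))
print(len(teams), comb(5, 2))  # 10 10

# Seven people, committee of four.
equal_footing = comb(7, 4)   # who is on the committee is all that matters
with_offices = perm(7, 4)    # president, vice president, treasurer, secretary

print(equal_footing)  # 35
print(with_offices)   # 840
```

Each unordered committee of four can be assigned its four offices in 4! = 24 ways, which is why the second count is 24 times the first.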

The Binomial Distribution

    The binomial distribution occurs in situations in which each of a number of independent trials (termed Bernoulli trials) results in one of two mutually exclusive outcomes
    e.g., coin tosses
    The mathematical description of the binomial distribution is the following:
p(X) = NCX x p^X x q^(N-X)
= [N! / (X! x (N-X)!)] x p^X x q^(N-X)
    where:
      p(X) = The probability of X successes
      N = The number of trials
      p = The probability of success on any given trial
      q = The probability of failure on any given trial (i.e., 1-p)
      NCX = The number of combinations of N things taken X at a time
Examples:
    Suppose a batter (in baseball) gets a hit with a probability of 0.3, and gets out the rest of the time. What is the probability of that batter getting 0 hits in 10 at bats?
p(0) = 10C0 x (.3)^0 x (.7)^10
= 1 x 1 x .0282
= .0282
    What is the probability of flipping a fair coin eight times and getting only two heads?
p(2) = 8C2 x (.5)^2 x (.5)^6
= 28 x .25 x .015625
= .1094
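
The two examples can be checked with a short function. This is a minimal sketch of the binomial formula as described above (combinations of N things taken X at a time, times p to the X, times q to the N-X). The function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p  # probability of failure on any given trial
    return comb(n, x) * p**x * q**(n - x)

# Batter with a .3 chance of a hit: 0 hits in 10 at bats.
print(round(binomial_pmf(0, 10, 0.3), 4))  # 0.0282

# Fair coin: exactly 2 heads in 8 flips.
print(round(binomial_pmf(2, 8, 0.5), 4))   # 0.1094
```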

Plotting Binomial Distributions

    Given that we can obtain probabilities for any value of X associated with some level of p, we can also use these probabilities to create a probability distribution
    For example, if we toss a fair coin ten times, the following table represents the probabilities associated with the indicated outcomes:
    Number Heads    Probability
          0            .001
          1            .010
          2            .044
          3            .117
          4            .205
          5            .246
          6            .205
          7            .117
          8            .044
          9            .010
         10            .001
    If these values were plotted as a distribution, they would look like:
    [Figure: probability plotted against number of heads]
    Note 3 things: 1) these probabilities are discrete, 2) we are plotting probabilities, not frequencies, 3) the data are mathematically derived, not empirically acquired
      Mean of a binomial = Np
      Variance of a binomial = Npq
      Standard Deviation = sqrt(Npq)
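
The ten-toss table can be regenerated rather than looked up. The sketch below builds the distribution for N = 10 and p = .5, then computes the mean and variance directly from the probabilities so they can be compared with the standard shortcut results Np and Npq. The variable names are illustrative.

```python
from math import comb, sqrt

n, p = 10, 0.5
q = 1 - p

# Probability of each possible number of heads, 0 through 10.
probs = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
for heads, prob in enumerate(probs):
    print(f"{heads:>2}   {prob:.3f}")

# Mean and variance computed from the distribution itself.
mean = sum(x * pr for x, pr in enumerate(probs))
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))

print(mean, n * p)          # both 5.0
print(variance, n * p * q)  # both 2.5
print(round(sqrt(variance), 3))
```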
Testing Hypotheses
    Given all this, we can now ask questions like the following ...
    Let's say a person is performing a true/false exam. How many questions out of 10 does a person have to get correct for us to reject the notion that they are just guessing?
    Number Correct    Probability
           0             .001
           1             .010
           2             .044
           3             .117
           4             .205
           5             .246
           6             .205
           7             .117
           8             .044
           9             .010
          10             .001
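
One way to approach the true/false question is to ask, for each score, how likely a pure guesser would be to do that well or better. The sketch below is a hedged illustration: the cut-off `alpha = 0.05` is an assumed value chosen for the example (the notes have not fixed one at this point), and `upper_tail` is a made-up helper name.

```python
from math import comb

def upper_tail(k, n=10, p=0.5):
    """P(X >= k): chance of k or more correct out of n when guessing."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))

alpha = 0.05  # assumed cut-off for "too unlikely to be guessing"

for k in range(11):
    print(f"{k:>2} or more correct: {upper_tail(k):.4f}")

# Smallest score whose upper-tail probability is at or below the cut-off.
threshold = min(k for k in range(11) if upper_tail(k) <= alpha)
print(threshold)  # 9
```

With this cut-off, 8 or more correct still happens by guessing a little over 5% of the time, so 9 is the first score that clears it. A different cut-off would give a different threshold.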

Elementary Statistics chapter 5


Sampling Distributions and Hypothesis Testing



Statistics is arguing

    Typically, we are arguing either 1) that some value (or mean) is different from some other mean, or 2) that there is a relation between the values of one variable, and the values of another.
    Thus, following Steve's in-class example, we typically first produce some null hypothesis (i.e., no difference or relation) and then attempt to show how improbable something is given the null hypothesis.

Sampling Distributions

    Just as we can plot distributions of observations, we can also plot distributions of statistics (e.g., means)
    These distributions of sample statistics are called sampling distributions
    For example, if we consider the 48 students in my class who estimated my age as a population, their guesses have a mean of 30.77 and a standard deviation of 4.43 (variance = 19.58)
    If we repeatedly sampled groups of 6 people, found the mean of their estimates, and then plotted the means, the distribution might look like:
    [Figure: distribution of the sample means]
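
The repeated-sampling idea can be imitated by simulation. The 48 actual guesses are not available here, so the sketch below makes a loud assumption: it stands in a normal population with the quoted mean (30.77) and standard deviation (4.43) for the real class data. The seed, the number of samples, and all names are illustrative choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 30.77, 4.43      # values quoted in the notes
SAMPLE_SIZE, N_SAMPLES = 6, 10_000  # groups of 6, sampled many times

# Assumption: a normal population replaces the 48 real guesses.
sample_means = [
    statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

# The sample means centre on the population mean but vary less than
# individual guesses do.
print(round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would give the sampling distribution of the mean for groups of 6 under these assumptions.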

Hypothesis Testing

    What I have previously called "arguing" is more appropriately called hypothesis testing
    Hypothesis testing normally consists of the following steps:
    1. some research hypothesis is proposed (or alternate hypothesis) - H1
    2. the null hypothesis is also proposed - H0
    3. the relevant sampling distribution is obtained under the assumption that H0 is correct
    4. I obtain a sample representative of H1 and calculate the relevant statistic (or observation)
    5. Given the sampling distribution, I calculate the probability of observing the statistic (or observation) noted in step 4, by chance
    6. On the basis of this probability, I make a decision

The beginnings of an example

    One of the students in our class guessed my age to be 55. I think that said student was fooling around. That is, I think that guess represents something different than the rest of the guesses do
    H0 - the guess is not really different
    H1 - the guess is different
    • obtain a sampling distribution under H0
    • calculate the probability of guessing 55, given this distribution
    • Use that probability to decide whether this difference is just chance, or something more

A Touch of Philosophy

    Some students new to this idea of hypothesis testing find this whole business of creating a null hypothesis and then shooting it down a tad on the weird side. Why do it that way?
    This dates back to a philosopher guy named Karl Popper who claimed that it is very difficult to prove something to be true, but not so difficult to prove it to be untrue
    So, it is easier to prove H0 to be wrong, than to prove H1 to be right
    In fact, we never really prove H1 to be right. That is just something we imply (similarly H0)

Using the Normal Distribution to Test Hypotheses

    The "Steve's Age" example begun earlier is an example of a situation where we want to compare one observation to a distribution of observations
    This represents the simplest hypothesis-testing situation because the sampling distribution is simply the distribution of the individual observations
    Thus, in this case we can use the stuff we learned about z-scores to test hypotheses that some individual observation is either abnormally high (or abnormally low)
    That is, we use our mean and standard deviation to calculate a z-score for the critical value, then go to the tables to find the probability of observing a value as high or higher than (or as low or lower than) the one we wish to test
Finishing the example
Mean = 30.77     Critical = 55 
SD = 4.43     (Variance = 19.58) 
z = (Critical - Mean) / SD
    From the z-table, the area of the portion of the curve above a z of 3.21 (i.e., the smaller portion) is approximately .0006
    Thus, the probability of observing a score as high or higher than 55 is .0006.
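
The table lookup can also be done in code. As a sketch, Python's standard-library `statistics.NormalDist` (available from Python 3.8) gives the area below any z, so the area above is one minus that. The helper name `upper_tail_area` is made up for this illustration, and the z of 3.21 is taken directly from the text above.

```python
from statistics import NormalDist

def upper_tail_area(z):
    """Area of the standard normal curve above z (the 'smaller portion' when z > 0)."""
    return 1 - NormalDist().cdf(z)

z = 3.21  # the z-score quoted in the notes
area = upper_tail_area(z)
print(f"{area:.5f}")
```

The result falls between .0006 and .0007, in line with the approximate table value used in the notes.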

Making decisions given probabilities

    It is important to realize that all our test really tells us is the probability of some event given some null hypothesis
    It does not tell us whether that probability is sufficiently small to reject H0; that decision is left to the experimenter
    In our example, the probability is so low that the decision is relatively easy. If H0 were true, there would be only a .06% chance of observing a guess as high as 55. Thus, we can reject H0 without much worry
    But what if the probability was 10% or 5%? What probability is small enough to reject H0?
    It turns out there are two answers to that:
    • the real answer
    • the "conventional" answer

The Real Answer -

or Type I and Type II errors

    First some terminology...
    The probability level we pick as our cut-off for rejecting H0 is referred to as our rejection level or our significance level
    Any level below our rejection or significance level is called our rejection region
    OK, so the problem is choosing an appropriate rejection level
    In doing so, we should consider the four possible situations that could occur when we're hypothesis testing
                             H0 true          H0 false
      Reject H0              Type I error     Correct
      Fail to Reject H0      Correct          Type II error
Type I Error
    Type I error is the probability of rejecting the null hypothesis when it is really true
    e.g., saying that the person who guessed I was 55 was just screwing around when, in fact, it was an honest guess just like the others
    We can specify exactly what the probability of making that error was; in our example it was .06%
    Usually we specify some "acceptable" level of error before running the study
    • then call something significant if it is below this level
    This acceptable level of error is typically denoted as α (alpha)
    Before setting some level of α it is important to realize that levels of α are also linked to type II errors
Type II Error
    Type II error is the probability of failing to reject a null hypothesis that is really false
    e.g., judging OJ as not guilty when he is actually guilty
    The probability of making a type II error is denoted as β (beta)
    Unfortunately, it is impossible to precisely calculate β because we do not know the shape of the sampling distribution under H1
    It is possible to "approximately" measure β, and we will talk a bit about that in Chapter 8
    For now, it is critical to know that there is a trade-off between α and β: as one goes down, the other goes up
    Thus, it is important to consider the situation prior to setting a significance level
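
The trade-off can be made concrete only by pretending we know the sampling distribution under H1, which in practice we do not. The sketch below does exactly that, purely for illustration: it assumes H0 scores follow a standard normal distribution and invents an H1 distribution of the same shape centred 2 standard deviations higher. Both the shift of 2 and the name `beta_for` are hypothetical.

```python
from statistics import NormalDist

h0 = NormalDist(mu=0.0, sigma=1.0)  # sampling distribution under H0, in z units
# Hypothetical H1: same shape, centred 2 SDs higher. The shift of 2 is an
# invented value for illustration only; in practice it is unknown.
h1 = NormalDist(mu=2.0, sigma=1.0)

def beta_for(alpha):
    """Type II error rate for a one-tailed (upper) test at the given alpha."""
    cutoff = h0.inv_cdf(1 - alpha)  # reject H0 for values above this cut-off
    return h1.cdf(cutoff)           # chance an H1 value falls short of it

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}   beta = {beta_for(alpha):.3f}")
```

Whatever shift is assumed, making the rejection level stricter pushes the cut-off further out, so more of the H1 distribution falls short of it.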
The "Conventional" Answer
    While issues of type I versus type II error are critical in certain situations, psychology experiments are not typically among them (although they sometimes are)
    As a result, psychology has adopted the standard of accepting α = .05 as a conventional level of significance
    It is important to note, however, that there is nothing magical about this value (although you wouldn't know it by looking at published articles)

One versus Two Tailed Tests

    Often, we are interested in determining if some critical difference (or relation) exists and we are not so concerned about the direction of the effect
    That situation is termed two-tailed, meaning we are interested in extreme scores at either tail of the distribution
    Note that when performing a two-tailed test we must only consider something significant if it falls in the bottom 2.5% or the top 2.5% of the distribution (to keep α at 5%)
    If we are interested in only a high or low extreme, then we are doing a one-tailed or directional test and look only to see if the difference is in the specific critical region encompassing all 5% in the appropriate tail
    Two-tailed tests are more common usually because either outcome would be interesting, even if only one was expected
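
To see what the 5% versus 2.5% split means in practice, here is a small sketch that finds the z cut-offs for each kind of test. It assumes the sampling distribution is normal (as in the example in this chapter) and uses the conventional rejection level of .05. The variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-tailed: all 5% sits in one tail (here, the upper one).
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)
# Two-tailed: 2.5% in each tail, so the cut-off is further out.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_cutoff, 3))  # 1.645
print(round(two_tailed_cutoff, 3))  # 1.96
```

A z of 1.8, for example, would be significant in a one-tailed test in the predicted direction but not in a two-tailed test.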

Other Sampling Distributions

    The basics of hypothesis testing described in this chapter do not change
    All that changes across chapters is the specific sampling distribution (and its associated table of values)
    The critical issue will be to realize which sampling distribution is the one to use in which situation

Elementary Statistics chapter 4


The Normal Distribution



In Chapter 2, we spent a lot of time plotting distributions and calculating numbers to represent the distributions
This raises the obvious question:

Why Bother?

    Answer: because once we know (or assume) the shape of the distribution and have calculated the relevant statistics, we are then able to make certain inferences about values of the variable
    In the current chapter, we will show how this works using the Normal Distribution

Why the Normal Distribution?

    As shown by Galton (19th century guy), just about anything you measure turns out to be normally distributed, at least approximately so
    That is, usually most of the observations cluster around the mean, with progressively fewer observations out towards the extremes
    Thus, if we don't know how some variable is distributed, our best guess is normality

A Cautionary Note

    Although most variables are normally distributed, it is not the case that all variables are normally distributed
    As examples, consider the following:
      Values of a dice roll
      Flipping a coin
    We will encounter some of these critters (i.e. distributions) later in the course

The Relation Between Histograms and Line Graphs

    Any Histogram:
    Can be represented as a line graph:
    Example: Pop Quiz #1

Why line graphs?

    Line graphs make it easier to talk of the "area under the curve" between two points where:
      area = proportion (or percent) = probability
    That is, we could ask what proportion of our class scored between 7 & 9 on the quiz
    If we assume that the total area under the curve equals one, then the area between 7 & 9 equals the proportion of our class that scored between 7 & 9 and also indicates our best guess concerning the probability that some new data point would fall between 7 & 9
    The problem is that in order to calculate the area under a curve, you must either:
    • use calculus (find the integral), or
    • use a table that specifies the areas associated with given values of your variable
    The good news is that a table does exist, thereby allowing you to avoid calculus. The bad news is that in order to use it you must:
    • assume that your variable is normally distributed
    • use your mean and standard deviation to convert your data into z-scores such that the new distribution has a mean of 0 and a standard deviation of 1 (the standard normal distribution, or N(0,1))

The Standard Normal Distribution

              Mean to  Larger       Smaller
    z         z        Portion      Portion 
    .98       .3365    .8365        .1635 
    .99       .3389    .8389        .1611 
    1.00      .3413    .8413        .1587 
    1.01      .3438    .8438        .1562 
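
The three columns are related in a simple way, which a short sketch can show. Here `NormalDist().cdf(z)` from Python's standard library gives the area below z. For a positive z that is the larger portion, the area from the mean to z is that value minus .5, and the smaller portion is one minus it. The name `phi` is an illustrative shorthand.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # cumulative area below z for N(0, 1)

print("z      mean to z   larger   smaller")
for z in (0.98, 0.99, 1.00, 1.01):
    larger = phi(z)            # area below z
    mean_to_z = larger - 0.5   # area between the mean (z = 0) and z
    smaller = 1 - larger       # area above z
    print(f"{z:.2f}   {mean_to_z:.4f}      {larger:.4f}   {smaller:.4f}")
```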

Converting data into Z-scores

    It would be too much work to provide a table of area values for every possible mean and standard deviation
    Instead, a table was created for the standard normal distribution, and the dataset of interest is converted to a standard normal before using the table
    How do we get our mean equal to zero? Simple, subtract the mean from each data point
    What about the standard deviation? Well, if we divide all values by a constant, we divide the standard deviation by a constant. Thus, to make the standard deviation 1, we just divide each new value by the standard deviation
    In computational form then,
    z = (X - mean) / (standard deviation)
    where z is the z-score for the value of X we enter into the above equation
    Once we have calculated a z-score, we can then look at the z table in Appendix Z to find the area we are interested in relevant to that value
  • as we'll see, the z table actually provides a number of areas relevant to any specific z-score
    Example: What percent of students scored better than 9.2 out of 10 on the quiz, given that the mean was 7.6 and the standard deviation was 1.6?
    z = (9.2 - 7.6) / 1.6 = 1.00
    Because we are interested in the area greater than z=1, we look at the "smaller portion" part of the z table and find the value .1587
    Thus, 15.87% of the students scored better than 9.2 on the quiz
    What percent of students scored between 7 & 9 on the quiz?
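
Both quiz questions follow the same recipe: convert each score to a z-score, then look up (or compute) the relevant area. The sketch below uses the quiz mean and standard deviation given above and assumes, as the z table requires, that the scores are normally distributed. The function name `z_score` and the other names are illustrative.

```python
from statistics import NormalDist

MEAN, SD = 7.6, 1.6  # quiz mean and standard deviation from the notes

def z_score(x, mean=MEAN, sd=SD):
    """Subtract the mean, then divide by the standard deviation."""
    return (x - mean) / sd

std_normal = NormalDist()  # N(0, 1)

# Proportion scoring better than 9.2: area above z = 1.
z_92 = z_score(9.2)
above_92 = 1 - std_normal.cdf(z_92)
print(round(z_92, 2), round(above_92, 4))  # 1.0 0.1587

# Proportion scoring between 7 and 9: area below z(9) minus area below z(7).
between = std_normal.cdf(z_score(9)) - std_normal.cdf(z_score(7))
print(round(z_score(7), 3), round(z_score(9), 3))
print(round(between, 4))
```

With the printed table, the same answer comes from adding the two "mean to z" areas, since 7 lies below the mean and 9 lies above it.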

https://www.youtube.com/TarunGehlot