Week Ten:  Inferential Statistics - The Statistics of Sampling

Class on Friday, April 4, will be in the SAKAI chat room.  We will work statistical problems similar to those on Quiz Ten.

Quiz Ten, the Sampling Statistics Quiz, will be due in SAKAI at 5 p.m. on Monday, April 7.  It will include some statistical calculations and some multiple choice items from Babbie, Chapter Seven.  You may take this quiz as often as you like and the highest score will count.  If you wait until the last day to try this quiz, you do so at your own risk of computer problems.

Reading Assignments:
  1. Babbie, Chapter 7, pages 212-221 - Probability, Sampling Distributions and Estimates of Sampling Error.  An overview PowerPoint on sampling is in Sakai/Resources/Week Nine.
  2. Guide to Computing Margins of Error for Percentages and Means
  3. Babbie, Chapter 9 on Survey Research.  This is not on Quiz Ten, but we will discuss it in class on April 2.  There is a PowerPoint covering this chapter called SurveyResearch.ppt in Sakai/Resources/Week Ten.  It will be on Quiz Eleven.
Notes:

How Polls Are Conducted:  Gallup
NY Times, How the Poll Was Conducted

The "margin of error" is a measure of how much our sample statistic is likely to vary from the population parameter.  This is based on probability theory.  The larger your sample, the more certain your results and the smaller the margin of error.  The size of the population doesn't matter unless you are dealing with very small populations.  By convention, the "margin of error" is two standard errors (twice the standard deviation of the sampling distribution).  This is explained in Babbie.

The size of the sample in practice depends on how many strata you want to sample.  The margin of error for each stratum depends on the number of individuals sampled from that stratum.

To compute margins of error, follow the instructions in the  Guide to Computing Margins of Error on the WEB site.

Some examples:
1. In a college class with 85 students, 32 of whom are black, the mean on the midterm was 75. The standard deviation was 6.21. What is the margin of error for this mean? This is a mean score question, so I use the formula
M = 2 * sd / SQRT(N);   M = 2 * 6.21 / SQRT(85) = 1.35 points, NOT percent.
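
The arithmetic for a mean can be checked with a short Python sketch. The function name margin_of_error_mean is made up for illustration; it only applies the formula M = 2 * sd / SQRT(N) from the example above.

```python
import math

def margin_of_error_mean(sd, n):
    """Margin of error for a mean: two standard errors, 2 * sd / sqrt(n)."""
    return 2 * sd / math.sqrt(n)

# Midterm example: standard deviation 6.21, N = 85 students
m = margin_of_error_mean(6.21, 85)
print(round(m, 2))  # 1.35 (points on the test, not percent)
```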

An example using formula one.  "A survey was conducted with 1200 respondents in a state with 35,000,000 residents, of whom 25% are urban and 75% are rural."  What is the margin of error?
N = 1200.   M = 1/sqrt(N).    M = 1/34.64.   M = .02887.  Give the result as a percent with one number after the decimal point:  2.9%
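
As a rough check, formula one can be written as a one-line Python function. The name margin_of_error_simple is an assumption for illustration, not something from the Guide.

```python
import math

def margin_of_error_simple(n):
    """Formula one: M = 1 / sqrt(n), returned as a proportion."""
    return 1 / math.sqrt(n)

m = margin_of_error_simple(1200)
print(f"{m:.5f}")          # the proportion
print(f"{m * 100:.1f}%")   # the same value as a percent with one decimal
```

Note that the population figures (35,000,000 residents, 25% urban) never enter the calculation; only the number of respondents does.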

A survey was conducted with 1200 respondents in a state with 35,000,000 residents, of whom 25% are urban and 75% are rural.  Of these respondents, 55% preferred Hillary.  We need formula two because we have a percentage result for the sample, a sample statistic.  M = 2 * sqrt((p*(1-p))/N).  To solve this formula we need N = 1200.  We need p, which is 55%, but we make it .55 because we work with proportions.  p = .55, and q or 1-p = .45.  p*(1-p) = .55 * .45 = .2475.  Divide it by N and get .00020625.  Take the square root and get 0.01436.  Multiply by 2 and get .0287 as a proportion, which rounds off to 2.9%.

A survey was conducted with 1200 respondents in a state with 35,000,000 residents, of whom 25% are urban and 75% are rural.  Of these respondents, 85% preferred Hillary.  We need formula two because we have a percentage result for the sample, a sample statistic.  M = 2 * sqrt((.85 * .15)/1200) = .0206 or 2.1%
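
Here is a minimal Python sketch of formula two, with an invented function name. It also runs p = .5 to show that formula two then gives the same answer as formula one, since 2 * sqrt(.25/N) = 1/sqrt(N). The margin shrinks as p moves away from .5.

```python
import math

def margin_of_error_proportion(p, n):
    """Formula two: M = 2 * sqrt(p * (1 - p) / n), with p as a proportion."""
    return 2 * math.sqrt(p * (1 - p) / n)

# Same sample size (1200), three different sample percentages
for p in (0.50, 0.55, 0.85):
    m = margin_of_error_proportion(p, 1200)
    print(f"p = {p:.2f}: M = {m:.4f} ({m * 100:.1f}%)")
# p = 0.50: M = 0.0289 (2.9%)
# p = 0.55: M = 0.0287 (2.9%)
# p = 0.85: M = 0.0206 (2.1%)
```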

A sample has 400 respondents.  What is the margin of error?  M = 1/sqrt(400) = 1/20 = .05 or 5%

My county has 450,000 residents.  How big a sample do I need to survey it?  Not a good question.  I have to ask: how big a margin of error can you accept?  Say I need a 5% margin of error.  Use N = 1/(M*M).  However, remember that M is the margin of error expressed as a proportion.  M = .05.   N = 1/(.05*.05) = 400.

I need to study the South Jersey area, consisting of five counties (with various populations).  I need a margin of error of 5% for each of the counties.  How big a sample do I need?  We need 400 for a 5% margin of error, but we need it for each of the counties, so the sample must be 5 * 400 = 2000.  This has to be drawn as 400 from each county.


9. A survey of the tri-county area has 356 respondents, of whom 82 are black and 55 Hispanic. What is the margin of error for statistics about the opinion of the Hispanic residents?   This is a percentage question, but I am not given a statistic, a percentage result.  Use formula one, M = 1/SQRT(N).  What is N?  It is 55, the number of Hispanic respondents, not the 356 in the whole sample.
M = 1/sqrt(55) = .1348.  This formula gives us a proportion, not a percent; as a percent it is 13.5%.  Suppose I said 61% of the Hispanic respondents are voting for McGreevy.  That is the statistic for the sample.  The population parameter might vary by as much as 13.5 percentage points.  We could say that our "confidence interval" is between 61 - 13.5 and 61 + 13.5, or between 47.5% and 74.5%.  Because this interval includes 50%, the election among Hispanics is "too close to call."
 Suppose we had 400 Hispanics; the margin of error would be 5%.  For a sample of 1000, M = 1/SQRT(1000) or 1/31.62, or 3.2%.
   Suppose we wanted a 5% margin of error, how large a sample do we need?  400.  Suppose we want a 5% margin of error for each of five electoral districts, how large a sample do we need?  5 * 400, or 2000.
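
To see how quickly the margin of error grows for subgroups, here is a small Python sketch using the respondent counts from the tri-county example. The dictionary and function names are made up for illustration.

```python
import math

# Respondent counts from the tri-county example; "all" is the full sample.
groups = {"all": 356, "black": 82, "hispanic": 55}

def margin_of_error_simple(n):
    """Formula one: M = 1 / sqrt(n), as a proportion."""
    return 1 / math.sqrt(n)

margins = {name: margin_of_error_simple(n) for name, n in groups.items()}
for name, m in margins.items():
    print(f"{name:>8}: n = {groups[name]:3d}, M = {m * 100:.1f}%")
```

Each subgroup gets its margin of error from its own count, so a statistic about a small subgroup is much less precise than one about the whole sample.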
 

Representative or random sample:  Chosen at random from either the total population (simple random sample) or from subgroups of the population (stratified random sample).

In choosing a sample size, all that matters is the amount of error you can tolerate.  The population size is not relevant.

A researcher wants to obtain a margin of error of no more than 2% in a survey of a county with a population of 3,000,000. How large a sample is needed?    N = 1/(M*M).  M is the margin of error, expressed as a proportion.   M = .02 because it says 2%.  N = 1/(.02*.02).    N = 1/.0004.    N = 2500.  Simple random sample.

Suppose we were going to do this for five counties, and we wanted a 2% margin of error for each?  How large a sample would we need?  A 2% margin of error requires 2500, but we need it for each county, so we need 5 * 2500 or 12500.  Stratified random sample, consisting of a simple random sample of each of the subgroups.
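
The sample size calculation can be sketched in Python as follows. The function name required_sample_size is an assumption for illustration, and the rounding step is only there to keep floating-point arithmetic from turning a whole number like 400 into 401.

```python
import math

def required_sample_size(margin):
    """Invert formula one: N = 1 / (M * M), with M as a proportion.

    Round to 6 decimal places first to remove floating-point noise,
    then round up to a whole number of respondents.
    """
    return math.ceil(round(1 / (margin * margin), 6))

per_county = required_sample_size(0.02)   # one simple random sample
total = 5 * per_county                    # one such sample in each of five counties
print(per_county, total)                  # 2500 12500
```

The population of the county never appears as an input; only the margin of error you can tolerate and the number of strata matter.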

3. 59% of the respondents in a survey of a state with seven million Republican voters voted for Bush,
41% for Gore. There were 625 respondents. What is the margin of error for the percent voting for Bush?
 M = 2 * SQRT((p * (1-p))/N).  What is p, the proportion of respondents giving a certain response (the sample statistic)?  In this case, p = .59.  What is N?  625.   M = 2 * SQRT((.59 * (1-.59))/625) = .0393 as a proportion, or 3.93% expressed as a percentage.

What does that mean?  We can be "95% sure" that the population parameter (the true value for the population) is within 3.93 percentage points of the sample statistic.  One way to express this is as a "confidence interval".
The lower bound of the confidence interval is the sample statistic minus the margin of error, in this case 59 - 3.93 = 55.07%.
The upper bound of the confidence interval is the sample statistic plus the margin of error, in this case 59 + 3.93 = 62.93%.
We are confident that the true figure, the "population parameter", is between 55.07% and 62.93%.
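
A confidence interval like this one can be computed with a short Python sketch. The function name is invented for illustration; it applies formula two and then subtracts and adds the margin of error.

```python
import math

def confidence_interval(p, n):
    """Return (lower, upper) as proportions: p minus and plus 2 * sqrt(p * (1 - p) / n)."""
    m = 2 * math.sqrt(p * (1 - p) / n)
    return p - m, p + m

# Bush example: 59% of 625 respondents
low, high = confidence_interval(0.59, 625)
print(f"{low * 100:.1f}% to {high * 100:.1f}%")  # 55.1% to 62.9%
```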

Suppose 47% vote for Bush, 49% for Gore and 4% for Nader in a sample of 1200.  What is the margin of error for the Nader vote?
p = .04,  What is 1-p?  .96        M =  2 * SQRT((.04 * (1-.04))/1200) = .0113 or 1.13%
What is the margin of error for the Gore vote?  M =  2 * SQRT((.49 * (1-.49))/1200) = .0289 or 2.89%.