Class Notes for Methods of Research, Spring 2007
Note:  This is a "web-enhanced" class, not an online class.  The notes are not intended as a substitute for attending class, nor as an alternative to taking your own notes.  They will include examples shown on the screen during class, and they will be helpful in reviewing for exams.  The most recent notes will appear at the top of the page; just scroll down to find earlier notes.

February 20

  Experimental Designs.  See the graphs in the book or on Trochim's WEB site:  Types of Designs

Essential characteristics:

  1. Two or more groups are matched, usually by random assignment, sometimes by a kind of stratified random selection, e.g., an equal number of men and women or blacks and whites in each group.  But the key is random assignment so that the groups can be assumed to be the same on all variables.  "Quasi-experiments" are when we use groups that are pretty much the same but we didn't assign people at random.
  2. The Independent Variable is "manipulated," i.e., it is applied to one group and not to the other
  3. Change in the Dependent Variable is measured
Experiments can be done:
  1. In laboratory settings with volunteers, e.g., student volunteers
    1. The Milgram Experiment on Obedience to Authority (Stanley Milgram's Obedience Experiment, from http://www.new-life.net/milgram.htm)
  2. In institutional settings such as prisons, hospitals, rehabilitation centers, etc., where people are assigned to treatment groups
    1. New drugs and medical treatments generally must be shown to work in experiments before they are approved for use.  Often, the treatment is compared to a placebo.  These experiments are usually "double-blind," to control for the psychological effects of knowing one is getting treatment.  This is a way of controlling subject bias and experimenter bias.
    2. In criminal justice, one might do an experiment comparing a "halfway house," a drug treatment program, and a prison term for offenders.  To do this, you would have to get the judge to assign offenders to different programs at random.  Ethical issues are raised here and there are likely to be objections.
  3. Occasionally in natural settings, for example
    1. a welfare reform experiment: assign some recipients to the old program and some to the new.  This didn't work very well; there were errors in the group assignments, and the women often forgot which group they were in anyway
    2. vaccination experiments
    3. guaranteed annual income experiments
    4. Kansas City Patrol Experiment
Although logically experiments are the most rigorous way to test causal hypotheses, there are practical problems.

Another example we can look at is an experimental study of internet downloads.  This was published in Science magazine because it demonstrates a sociological principle with rigorous experimental data.    Several documents from this study are in WEBCT; the most accessible summary is in a file called "Experimental Macrosociology".


Feb 18     The Art and Science of Cause and Effect. (powerpoint)

Probabilistic cause, not an absolute cause, not a cause that is sufficient or necessary.   "Cigarette smoking causes
cancer."  What we mean is, smoking cigarettes increases the likelihood of getting cancer.  How much?

There are multiple causes for everything.  What we want to find out is how much each thing contributes.  There are also
causal linkages, or indirect causes.  A causes B and then B causes C.

Diagramming causal models.  We put the dependent variable at the right.  We draw arrows going into it for each causal
variable that affects it directly.  Then we can have arrows that go into the arrows, steps into the causal analysis, as in
this sample file:
http://crab.rutgers.edu/~goertzel/homomale.htm

Criteria of Causation - how do we know that something is a cause of something else.

1.  Time Order.  The cause comes before the effect.  Sometimes we sort out the time order theoretically, e.g., we assume that
education precedes employment.  Or we can use a research design that involves gathering data at two points in time.  If
you don't have measurements at two points in time, this is shaky.

2.  Correlation.  The two variables vary together.  When one is high, the other is high OR when one is low the other is
high.  This gets at the degree of causation: the higher the correlation, the stronger the causal relationship.

3.  Non-spuriousness.  We want to know that the correlation is not caused by something else.  We can test this with an
experimental design, if feasible.  Or we can use statistical controls, which are not quite as convincing but are all you can do
in many cases.

We test for non-spuriousness by introducing controls.

Causal Models:  representations of the complex causal relationships between variables.  Variables have different causal roles, but this is determined by our causal model; it is not inherent in the variables.   One person's cause can be another's effect.

The Elaboration Model is one way of thinking about this kind of analysis, using cross-tabulation as a technique.

We first look at the relationship between the Independent and Dependent Variables, then we see what happens when we introduce Antecedent or Intervening Control Variables.

Dependent Variable - that is what we want to explain.  Often these are opinions or behaviors

Independent Variable - what we use to explain it.  Often these are traits or physical characteristics, e.g., sex or race,
which are almost always independent.

If you study the effect of race on voting, for example, race would be independent and voting dependent.

Antecedent variables are things that come before the independent variable.  This helps us to deal with a causal chain.
The antecedent variable causes the IV, which causes the DV.
If the antecedent variable "explains" the relationship, we have an "explanation," and we say the original relationship is "spurious."

Intervening Variables are things that come between the independent and dependent variables, e.g., race determines ideology, which determines the vote.
This is an "interpretation"; it tells WHY the causal relationship exists.
Path Models:  a way of graphically expressing complex causal models.

                     Marital Status and Frequency of Sex by Age

                              Under 50           50 and Older          Total

                            Divorced  Never       Divorced    Never    Divorced  Never
                            Widowed   Married     Widowed     Married  widowed   Married

Less than Monthly            29.7%     30.8%       77.9%      70.2%    54.7%      34.0%   

Monthly or More              70.3%     69.2%       22.1%      29.8%    45.3%      66.0%

TOTAL                        100%      100%        100%       100%     100%       100%

                              p=.75                p=.24                 p=.000

There is a statistically significant difference between the divorced or widowed respondents and the never married respondents in their frequency of sex.  However, when we control for age, this relationship is no longer significant.  Age is an antecedent variable, so the relationship between marital status and frequency of sex is spurious. 

Spurious means that it is not causal, the correlation is due to a third variable which is antecedent.

We compare the strength of the correlation in the total sample table with the correlations or percentage differences in the partial sample tables.  

If the correlations or percentage differences are about the same, we would say that the relationship was confirmed or supported.

If the correlation disappears, we would  have to ask whether the control variable was Antecedent or Intervening.  If it is antecedent, the relationship is spurious.  If it is intervening, we have a causal interpretation.

If it disappears in one case but not in the other, we would say that we have specified the relationship.

SEX AND 1980 PRESIDENTIAL VOTE BY INCOME GROUPS

Presidential Vote                                                       $17,000 to          $25,000
(major parties only)    Total Sample        Under $17,000       $24,999             and over

                        Female  Male        Female  Male        Female  Male        Female  Male
Carter                   47%    40%          57%    43%          40%    36%          36%    34%
Reagan                   53%    61%          43%    57%          60%    64%          64%    65%
Total                   100%   101%         100%   100%         100%   100%         100%    99%
N of cases              (482)  (395)        (108)  (86)         (78)   (92)         (102)  (140)

                        X2 = 4.8            X2 = 3.96           X2 = .08            X2 = .01
                        p = .03             p = .05             p = .77             p = .89

There is a statistically significant relationship between sex and presidential vote for the sample as a whole. 61% of the men voted for Reagan as compared to 53% of the women. When controlling for family income group, however, we found that the relationship between sex and the vote was significant only for the under $17,000 family income group. Among this group, 57% of the men voted for Reagan, as compared to 43% of the women. Among the higher income groups, there was no significant difference with 60% or more of both sexes voting for Reagan.
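The chi-square figure for the under $17,000 group can be checked by hand.  The sketch below is only an illustration.  It assumes the cell counts can be recovered by applying the rounded percentages to the N of cases (62 and 46 women, 37 and 49 men; these counts are reconstructed, not taken from the original data file), and it assumes a plain Pearson chi-square with no continuity correction.  The function name is made up for this example.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]], with its p-value for 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Under $17,000 group; counts reconstructed from rounded percentages (an assumption).
# Rows: female, male.  Columns: Carter, Reagan.
chi2, p = chi_square_2x2(62, 46, 37, 49)
print(round(chi2, 2), p)
```

Under these assumptions the chi-square comes out at about 3.96, with a p value just under .05, which is consistent with the table.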

In this example, sex is the independent variable, presidential vote is the dependent variable, and income is the test variable. Income is an intervening variable between sex and vote. When compared with the original relationship, the partial relationships are split. In the terminology of the Elaboration Paradigm, this is an example of specification.
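The decision rules of the Elaboration Model can be written out as a small function.  This is a simplified sketch, not a standard routine: it looks only at significance levels (the notes also compare the size of the correlations or percentage differences), it treats p = .05 as significant, as the table above does, and the function name and labels are made up for this example.

```python
def elaboration_outcome(total_p, partial_ps, control_role, alpha=0.05):
    """Classify what happens to a relationship when a control variable is introduced.

    total_p      -- p value in the total sample table
    partial_ps   -- p values in the partial tables (one per control category)
    control_role -- "antecedent" or "intervening"
    """
    if total_p > alpha:
        return "no original relationship"
    significant = [p <= alpha for p in partial_ps]
    if all(significant):
        return "confirmed"          # relationship holds in every partial table
    if not any(significant):
        # The relationship disappears; the label depends on the causal role of the control.
        return "spurious" if control_role == "antecedent" else "interpretation"
    return "specification"          # holds in some partial tables but not others

# Marital status and frequency of sex, controlling for age (antecedent)
print(elaboration_outcome(0.000, [0.75, 0.24], "antecedent"))          # spurious
# Sex and 1980 vote, controlling for income (intervening)
print(elaboration_outcome(0.03, [0.05, 0.77, 0.89], "intervening"))    # specification
```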



Feb 16 - We will go through the "Selecting Cases" exercise in the workbook.  Quiz Five in WEBCT covers both exercises 3 and 4.

Review of Levels of Measurement:
     If there are two and only two choices, we have a dichotomy.  It can also be called a dummy variable if one of the choices is the absence of a trait.  Thus, gender (male or female) is a dichotomy.  "Are you a Roman Catholic?" is a dummy variable if it is coded 0 = no;  1 = yes.  We can use regression and correlation statistics with dummy or dichotomous variables.
    If there are more than two categories, but the categories are in no logical order, it is a nominal variable.  We cannot compute means, correlations or regressions.  We can do percentages and chi-square.  E.g., Religious Affiliation (1 = Protestant  2 = Catholic  3 = LDS  4 =  Jewish 5 =  Buddhist  6 = other)
   If there are more than two categories and they are in order, it is an ordinal variable.   E.g.,  short, medium, tall.  There are statistics intended for ordinal variables but we will not make much use of them except for the median and the rank or percentile.   We can use correlation and regression with ordinal variables if they are reasonable approximations to interval, although statistical purists would disagree.
  If we know the distance between cases on a scale, it is an interval variable, e.g., height in inches, temperature in degrees, income in dollars.  These are continuous variables, while dichotomous, nominal and ordinal variables are categorical.  Most interval variables have a zero point which is a logical absence of the trait, in which case they are ratio variables and one can make statements about ratios, e.g., Jane is three times as rich as Joe.  We cannot do this with temperatures in Fahrenheit or centigrade because the zero point is not an absence of temperature.  The absence has to be only logical, not empirically present, e.g., no one has zero height but logically zero inches is an absence of height.
   Many of the measures we use are a bit ambiguous, e.g., test scores.  This may just mean they are poor measures, or the trait may be inherently difficult to measure.

February 14 -  We are scheduled to have class despite the weather.  We will work through pages 67 to 79 in the workbook;  those of you who miss class can go through this on your own.  It raises some important practical points:

February 12 - Sampling

SAMPLING is used when we are interested in studying a population that is too large for us to study each individual.  The first step is to define the population we wish to make statements about, e.g. adults in New Jersey, probable voters, people convicted of felonies, graduates of our department.  We might want to study the entire population of the USA.  If we try to collect data from everyone, this is a census.  The Census Bureau does this once every decade, and misses a lot of people.  Everyone else does sampling: we select a cross-section to represent the population.  If you try to study the whole population, you often fail to do a good job.   Gallup:  How Polls are Conducted.

Size of the sample.  How big of a sample do I need? Size of the sample does not depend on the size of the population.

How do we select the sample size?  Decide on the margin of error you will tolerate.  Margin of error is equal to one divided by the square root of the sample size.  With a sample of 400, the square root is 20, and 1/20 = .05 or 5%.  Suppose you interviewed 400 people, of whom 300 were white, 50 were black and 50 were others.  For the blacks, with a sample of 50, we would have a 14% margin of error.  For the whites, with a sample of 300, we would have a 5.8% margin of error.

Take 300, the square root of 300 is = 17.32     1 /17.32 = .0577  * 100 = 5.8%

Sample statistic - what the sample says
population parameter - what the real figure is
Even if the sampling is done well, the response rate is less than 100%.
Weighting is done to make the sample more like the population.

This formula is for  proportions or percents (if you move the decimal over two)
  m = 1/sqrt(n)  
  Solve for n:      m^2 = 1/n      n * m^2 = 1      n = 1/m^2.    If we need a margin of error of 3%, or .03, then n = 1/.03^2 = 1/.0009, or about 1,111.

  If you have a sample size and need to know the margin of error, use    m = 1/sqrt(n)

   If you are given a margin of error and asked how large a sample you need, use  n = 1/m^2

          In these formulas n = the size of the sample (not the population).    m = the margin of error expressed as a proportion, not as a percent.  Thus, if the question says "we need a margin of error of 5%," then m = .05.

If our sample is stratified, this means we really have several sub-samples and we need the same size sample for each of them, regardless of their size in the population.  For example, if we want to sample white, black and Hispanic respondents and make statements about each group, we need the same size sample of each.  Thus, if we need a margin of error of 5% for each of the three groups, then the answer is  3 * (1/m^2) = 3 * 400 = 1,200.
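The two formulas for proportions can be tried out with a short script.  This is a minimal sketch using the rule of thumb from these notes (m = 1/sqrt(n), 95% confidence level).  The function names and the "groups" argument are made up for this example, and rounding the sample size up to a whole respondent is an added assumption.

```python
import math

def margin_of_error(n):
    """Margin of error (as a proportion) for a sample of size n."""
    return 1 / math.sqrt(n)

def sample_size(m, groups=1):
    """Sample size needed for margin of error m (a proportion, e.g. .05).

    With a stratified sample, each of the `groups` needs its own full sample.
    """
    # round() guards against tiny floating-point error before rounding up.
    per_group = math.ceil(round(1 / m ** 2, 6))
    return groups * per_group

print(margin_of_error(400))          # 0.05
print(sample_size(0.03))             # 1/.0009 is about 1111.1, rounded up to 1112
print(sample_size(0.05, groups=3))   # 3 * 400 = 1200
```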

If you need a margin of error for a mean score (an average such as income in dollars or scores on a test), you need to know the standard deviation (sd) and the sample size (N). Ignore any other information you are given, including the size of the population.
Use the following formula: M = 2 * sd / SQRT(N)

Suppose I sample 457 Camden residents and the mean income is $27,541  and the standard deviation is $3452

M = (2 * 3452 )/sqrt(457).  This result will be in dollars, not percentages. 

M =      6904        /21.378  =  $322.95.  

Confidence Interval:   I am 95% sure that the population figure is between:  $27,218.05 and $27,863.95
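The same calculation for a mean can be scripted.  This is a sketch of the formula M = 2 * sd / SQRT(N) from these notes; the function names are made up for this example.

```python
import math

def mean_margin_of_error(sd, n):
    """Margin of error for a mean, in the same units as sd (95% confidence level)."""
    return 2 * sd / math.sqrt(n)

def confidence_interval(mean, sd, n):
    """Lower and upper limits of the 95% confidence interval for a mean."""
    m = mean_margin_of_error(sd, n)
    return mean - m, mean + m

# Camden example: 457 residents, mean income $27,541, standard deviation $3452
low, high = confidence_interval(27541, 3452, 457)
print(round(low, 2), round(high, 2))
```

The limits should agree with the figures above to within a cent or two; any small difference comes from rounding the square root to 21.378 in the hand calculation.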

Terms:

Margin of Error:  How much a sample statistic is likely to vary from the population parameter.  We say that we are 95% sure that the sample is not off by more than the margin of error.  How this is presented in NY Times.  "19 out of 20" is another way of saying 95%. 

 Confidence level:  we always use a 95% confidence level.

Confidence interval:  the range within which we think a statistic would fall, e.g., if the margin of error is 3% and the sample statistic is 67%, the confidence interval is from 64% to 70%.  We are 95% sure that the true figure is within this limit.

All of this assumes a simple random sample, which means that each person (or other sampling unit) in the population has the same chance of appearing in the sample.  In practice, however, we often do not use simple random samples, for several reasons:
  1. We may not have a list of the population.  If we do not, we first divide the population into sub-groups of some kind (census tracts, blocks, classrooms, organizations, depending on the nature of the study).  We then sample the subgroups and list the populations in them.  This is called cluster sampling.  It is often used for interviewing people at home instead of over the telephone.
  2. We may be interested in differences between sub-groups of the sample and need to make sure we have enough of them.  In this case we select random samples of each of the relevant sub-groups, and weight the results appropriately.   This is called stratified sampling, e.g., the NY Times  explains that its surveys oversample black respondents.  Most surveys today are on the telephone.
  3. Sometimes we just go down a list, which is called systematic sampling.  This gives the same results as simple random sampling, unless there is some systematic ordering to the list that causes a distortion
  4. Sometimes we use non-random or "quota" sampling.  This is done for convenience, or because we just want to know what the range of differences is without putting numbers on them.  Internet surveys tend to be of this sort, although there are some randomly selected Internet panels.
Review the terms on pages 84-85 of the textbook, I will not retype them all in these notes.


Some margin of error problems:

Suppose I did a sample of 400, selected from the 7,357,218 people living in New Jersey.  What is the margin of error?

M  = 1 /SQRT(N).   N is the sample size, not the population size.

N = 400.   Sqrt of N =  20.   1/20  =  .05  or 5%.  If I find that 42% agree, that is my sample "statistic."    The population parameter is the true value, and I would say that I am 95% sure (my confidence level) that the parameter is between 42% - 5% and 42% + 5%.   The true value should be between 37% and 47%.

Suppose I go to 1000, what is my margin of error? 
M = 1/SQRT(1000).  =   1/ 31.62  =  .0316 or  3.2%.  The confidence interval is between 38.8% and 45.2%. 

This applies to statements made about the whole sample.  42% of the respondents said yes, the margin of error is 3.2%. 

For statements about a subgroup, the N is the number of people in that sub group (genders, races, sports fans). 

We have a sample of 1200, of whom 800 are white, 300 are black and 100 are Hispanic.  57% of the Hispanics said yes to the item.  What is the margin of error for this percent?  Since it says "of the Hispanics" our N is the number of Hispanics, or 100.  M = 1/SQRT(100)  = .10 or 10%.
For the black respondents, our margin of error is M=1/SQRT(300).  = 1 / 17.32  =  .0577 =  5.8%

For the white respondents  M =  1/SQRT(800)  =  .03535 or 3.5%. 
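The subgroup rule can be checked in a few lines.  This sketch uses the sample of 1200 described above; the variable names are made up for this example.

```python
import math

# Number of respondents in each subgroup of the sample of 1200
subgroup_sizes = {"white": 800, "black": 300, "Hispanic": 100}

# Margin of error in percent: N is the size of the subgroup, not the whole sample.
margins = {group: round(100 / math.sqrt(n), 1) for group, n in subgroup_sizes.items()}

# For statements about the whole sample, N is the total.
overall = round(100 / math.sqrt(sum(subgroup_sizes.values())), 1)

print(margins)   # {'white': 3.5, 'black': 5.8, 'Hispanic': 10.0}
print(overall)   # 2.9
```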

How large a sample do I need to get a 5% margin of error, with a population of 485,321?  N = 1/M^2.     M must be expressed as a proportion, not a percent.  M = .05.    .05 * .05   = .0025.
Sample size = 1/.0025  =  400

Suppose I wish to study the black, white and Hispanic population and I need a margin of error of 5% for each group.  How large a sample do I need?  Each group needs its own sample of 400, so I need 3 * 400 = 1,200.

The other thing we need to deal with is margins of error for mean scores.  In a survey of 300 county residents, the mean income is $45,321.  We need to have the standard deviation, which is a measure of variation.  Here the standard deviation is $3521.  M = 2 * sd/sqrt(n).  N = 300.   M = 2 * 3521/sqrt(300)  =  7042/17.3205  =  $406.57.




February 9 -  We went over the results of the exercise done in class on Wednesday and reviewed how to compute percents from the observed frequencies.  We reviewed the examples for Exercise 2b in the workbook.  We constructed the multivariate crosstabulation below:

Political Party Affiliation and Opinion on Government Paying for Medical Care

By Political Ideology 

                Liberals                 Moderates                  Conservatives

                 N = 459                 N = 654                     N = 610

           Dem   Ind   Rep             Dem  Ind  Rep               Dem Ind Rep 

Gov Pay    69%    60%   46%            51%   49%  39%              54%  38% 33% 

Middle     23%    28%   37%            35%   37%  40%              31%  35% 33% 

Help Self   8%    12%   17%            14%   14%  22%              15%  27% 33% 

Total      100%    100%  100%         100%   100%  100%           100%  100% 100% 

Chisq = 11   p = .03                   chisq = 7.8 p = .10         Chisq = 21.4 p = .000

The table shows that the relationship between political party affiliation and support for government funded medical care is significant for liberals and conservatives.  The difference is smaller and not statistically significant for moderates.
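Computing percents from observed frequencies, as reviewed in class, can be sketched as follows.  The frequencies below are hypothetical (the notes give only the percents); they were chosen so that the results resemble two of the Liberal columns.  The function name is made up for this example.

```python
def column_percents(counts):
    """Convert observed frequencies to column percents.

    counts -- dict mapping each column (e.g. party) to a dict of row -> frequency
    """
    result = {}
    for column, cells in counts.items():
        total = sum(cells.values())   # each column is percentaged on its own total
        result[column] = {row: round(100 * freq / total) for row, freq in cells.items()}
    return result

# Hypothetical observed frequencies, for illustration only
observed = {
    "Dem": {"Gov Pay": 138, "Middle": 46, "Help Self": 16},
    "Rep": {"Gov Pay": 46, "Middle": 37, "Help Self": 17},
}

print(column_percents(observed))
```

Percentaging down the columns (within each party) is what lets us compare Democrats with Republicans across a row.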



February 7

Scaling or index construction is when we use a number of items, such as questionnaire items, to measure a more general concept.  This can be an attitude, e.g., "conservatism" or "authoritarianism."  Or it can be a category of behavior, e.g., "violent crime" which is measured by averaging together the frequency of a number of crimes.  Magazines like to do this to come up with rankings, e.g., of the best community to live in or the worst one, the best colleges, etc.  Usually this is done just by adding up a number of characteristics (in which case your text would call it an "index", although many people still use the term scale).  This gives us a rank order of sorts, and some idea of how high or low cases are, but we really don't know how big the differences are or what the scores really mean.  What does it mean that Camden is the "most dangerous city"?  Is it really that much more dangerous than the next most dangerous?  Or than an average city?

Most of the measures we call "scales" are really what your textbook calls indexes, composed of what Trochim calls "response format" items.  Most tests in college classes are examples.  We just add up the points to measure the general variable "knowledge of research methods as covered in the first part of the course."  Another approach would be to rank the items from easy to hard and see which you could do.  This is tricky, because some people can do the hard ones and not the easy ones.  When we make an index or scale, we get measures that can be treated as interval, even if they are not strictly interval.  Scaling methods can be more precise, but these are not used as often in sociology or CJ because they are more difficult and the added information is not always needed.

Attitude scaling methods include Thurstone and Guttman scaling.  Likert or summative scaling is actually a method of "index" construction as defined in our book.  Thurstone scaling tries to get true interval measurement.  Guttman scaling gets ordinal measurement.  Likert or summative scaling or index construction approximates interval scaling in some ways, but it is hard to know how well.  The key thing is knowing how well the items intercorrelate.  If they intercorrelate well, it works well.  It is usually quick and effective, so it is widely used despite the lack of a rigorous logic.

  For an example of true scaling in criminal justice, we could scale the seriousness of crimes.  There are various methods of measuring this.  Paired comparisons means asking a sample of people to rate crimes based on their perceived seriousness.
New Zealand Study on Attitudes to Crime.    Crime Victims United (Oregon)

A very popular test is the Myers-Briggs Type Indicator, based on Jungian personality theory.  You can take several free versions of this and related tests online (Wikipedia article).  Several are available from similarminds.  A problem with this is that it sorts people into categories although the measure is really continuous.  This makes it understandable and it is very widely used.  We did an experiment in class with a short questionnaire and a choice between four organizational choices.  The tabulation of the results, and a chi square test, are here.  The relationship between the questionnaire results and the organizational choices was not statistically significant.


February 5 -

Quality of Measurement   -   Reliability and Validity. 
 
Reliability -  you get the same thing over and over.  Consistency.

         inter-rater - two different raters get the same answer.
         test-retest, if you take it twice the answers are the same.
           internal consistency - are the items on a test consistent?  This can be calculated by looking at the inter-item correlations.  Cronbach's alpha is a statistic that measures inter-item reliability.  Example, correlate the ABORT variables in the GSS data file.  We see that all the correlations are positive and significant.  We can then make an index of them by adding up scores on the six variables.

Correlation Coefficients
N: 1603     Missing: 1229
Cronbach's alpha: 0.874
LISTWISE deletion (1-tailed test)     Significance Levels: ** =.01, * =.05
              ABORT DEF   ABORT WANT  ABORT POOR  ABORT RAPE  ABORT SING  ABORT HLTH
ABORT DEF     1.000       0.447 **    0.439 **    0.618 **    0.443 **    0.641 **
ABORT WANT    0.447 **    1.000       0.816 **    0.435 **    0.840 **    0.332 **
ABORT POOR    0.439 **    0.816 **    1.000       0.442 **    0.827 **    0.337 **
ABORT RAPE    0.618 **    0.435 **    0.442 **    1.000       0.437 **    0.636 **
ABORT SING    0.443 **    0.840 **    0.827 **    0.437 **    1.000       0.330 **
ABORT HLTH    0.641 **    0.332 **    0.337 **    0.636 **    0.330 **    1.000  
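To see how the inter-item correlations feed into alpha, here is a rough sketch using the standardized form of Cronbach's alpha, which needs only the number of items and the average inter-item correlation.  This is an assumption about method: the software that produced the table may have used the variance-based formula, so the result is only expected to be close to the reported value.  The function name is made up for this example.

```python
# The 15 correlations above the diagonal of the table, read row by row
correlations = [
    0.447, 0.439, 0.618, 0.443, 0.641,   # ABORT DEF with the other five
    0.816, 0.435, 0.840, 0.332,          # ABORT WANT with the remaining four
    0.442, 0.827, 0.337,                 # ABORT POOR
    0.437, 0.636,                        # ABORT RAPE
    0.330,                               # ABORT SING with ABORT HLTH
]

def standardized_alpha(k, inter_item_correlations):
    """Standardized Cronbach's alpha for k items, from the mean inter-item correlation."""
    r_bar = sum(inter_item_correlations) / len(inter_item_correlations)
    return k * r_bar / (1 + (k - 1) * r_bar)

alpha = standardized_alpha(6, correlations)
print(round(alpha, 3))   # 0.873
```

The result, about 0.873, is very close to the 0.874 shown in the output above.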


Correlation Coefficients
N: 1603     Missing: 1229
Cronbach's alpha: 0.796
LISTWISE deletion (1-tailed test)     Significance Levels: ** =.01, * =.05
            ABORT INDX
ABORT DEF     0.734 **
ABORT WANT    0.858 **
ABORT POOR    0.855 **
ABORT RAPE    0.728 **
ABORT SING    0.859 **
ABORT HLTH    0.645 **
  


 
    Validity  is it "really" measuring what it is supposed to measure.
          Face Validity - does it look right?   This is often related to fairness, people will object to the use of measures that do not have face validity even though they may have predictive validity, e.g., using the frequency of moving as a criterion for loaning money.
          Predictive or criterion validity - does it predict what we want to predict, some "true" measure.  SAT test predicts college or law or medical school grades.
          Convergent validity -  do several measures give the same result.
             
          Construct validity - does the measure perform as our theory says it should.  We use this when we have no criterion.
  
This is the most difficult; it is used when things are inherently difficult to measure.  Essentially, it asks whether the results are consistent with what we would expect based on theory and past experience.    Camden schools report.  Brim school report, see pdf page 14 for tables.  Story on Brim with graph.

                          An example:  the measurement of romantic love.

                  An example:  a study of UFO Abduction Status.

                

February 2

Descriptive vs. inferential statistics.  Descriptive statistics summarize a sample or a set of data.  Inferential statistics tell you whether a sample is representative of a population.
With inferential statistics we generally look for p<.05 - the smaller the better.

Frequency distribution.   Take the numbers, and put them into ordered categories. 

The categories should be the same width, with the possible exception of the first and the last.

under 18          2    XX
18-25             6    XXXXXX
26-34             0
35-44             9    XXXXXXXXX
45-54            12    XXXXXXXXXXXX
55-64             5    XXXXX
65 and older      2    XX

A bar chart or histogram gives you a graphic picture of the frequency distribution.  18, 63, 21 42, 12, 55,

What is the average age?   Mean  -  add them all up and divide by N where N is the number of cases.
                                         Median -  put them in order and take the case in the middle.

N = 36    Median is in the 45-54 category. 
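Finding the category that holds the median can be done by running a cumulative count, as in this sketch based on the frequency distribution above.  The function name is made up for this example; when N is even and the two middle cases fall in different categories, this version reports the higher one, which is an arbitrary choice.

```python
def median_category(frequencies):
    """Return N and the label of the category containing the middle case.

    frequencies -- list of (label, count) pairs in order from lowest to highest
    """
    n = sum(count for _, count in frequencies)
    middle = (n + 1) / 2          # position of the middle case (18.5 when N = 36)
    cumulative = 0
    for label, count in frequencies:
        cumulative += count
        if cumulative >= middle:
            return n, label

ages = [
    ("under 18", 2), ("18-25", 6), ("26-34", 0), ("35-44", 9),
    ("45-54", 12), ("55-64", 5), ("65 and older", 2),
]

print(median_category(ages))   # (36, '45-54')
```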

January 31 -

Discussion of designing research projects.  How do we decide what to study?  Supplementary reading in Trochim on the structure of research.  You may prefer his "hourglass" metaphor to the circular one on page 14 of our textbook.
  1. Selecting a topic.  Typical motives include:
    1. Finding out something we don't know.  This may include something local, e.g., what do people in Camden think about the new Governor's actions, something that has been unresolved in earlier research, something that hasn't been studied because it is new, etc.  This is what the authors of your book mean when they say "research always starts with wondering."
    2. Another purpose that motivates research is proving to other people that what we "know" is true really is true.  This is "advocacy" research, and it can be very one-sided and lead to sloppy work.  Often this involves causal arguments, proving "why" something happens.  This kind of research may not start with "wondering" but with "arguing."
    3. Answering a question posed to us by our employer or by a client, applied research.  Here someone else really chooses the topic.
  2. Formulating a Research Question.  This means formulating a "statement" which will involve variables.  We have an argument or story in mind at this point.
  3. Defining the Concepts.  Usually not a lot of time goes into this stage of empirical research, but some people do write articles focusing on this, e.g., what does "race" or "poverty" mean, what is the difference between "sex" and "gender"?  An example:  Police Crackdowns and Slowdowns
  4. Operationalizing the Concepts.  A lot of effort goes into this.  Quantitative  research means you have to measure your variables and a lot depends on having good measurement.  Sometimes this is difficult, e.g., measuring "intelligence" or "liberalism-conservatism" or "mental illness" or "crime rates (various kinds)".  Often we use standard measures created by the government agencies that collect statistics.
  5. Formulating Hypotheses.  This is usually pretty easy.  There is a distinction between "null hypotheses" and regular hypotheses, which is explained on page 13.  It means testing the hypothesis that your hypothesis is not true.  Thus, you hope to "reject the null hypothesis" rather than "accept the (regular, not-null) hypothesis".  So far as I know, there is no word for the opposite of Null, it might be Substantive?  Type One Error:  accepting that a relationship exists when it doesn't.  Type two:  rejecting a relationship when it really does exist.
  6. Making observations.  This is a major step unless we just get the observations from someone who already did the work.
  7. Analyzing the Data.  This is "number crunching"  running data through the computer.  Of course, one can also analyze qualitative data from interviews or observations, but today even that tends to get quantified (content analysis).
  8. Assessing the results.  This is really part of the analysis.  If the hypothesis doesn't work out, often researchers go back and change the hypotheses and pretend they knew all along what was going to happen
  9. Publishing the findings. This assumes that you are doing "scientific" or "pure" research; much applied research is actually distributed only within the organization that paid for it.  This may be done in person, with a "power point" presentation.  Refereed publications:  your paper is sent to other specialists for review to decide if it should be published.  "Refereed journal."  Press release.   Publication can be online as well as on paper.  You publish the research so you can get credit, see your name in print, get promoted, and also so that you can inform others, and perhaps most important, so that other people can criticize or attempt to replicate it.    Usually people replicate research in the hope of overthrowing it; if you just find the same thing as before, there is less interest.  This cancels out a lot of the bias in social research, since there is usually someone with the opposite bias to correct it.
Measurement means putting observations into categories.  Usually these categories are given numbers, although not always.  Sometimes we do this just to keep track of things, e.g., each American has a social security number, and we have a library number, a student number, etc.  But often the numbers give us more information than that, e.g., the NJ driver's license gives height in feet and inches.  It also gives sex and eye color, which are described in words but could be given arbitrary numbers.  But the numbers given for height are not arbitrary.  In some sciences, e.g., astronomy, numerical measurement has led to important insights, e.g., to understanding the motion of the planets.  This is because our observations can be summarized with mathematical equations that enable us to predict events.

 When we measure something, we need to be clear exactly what the measure means.  Especially when we use a number, we want to know what it means.  What is a number?  It is not so obvious as one might think.  Bertrand Russell said "A number is the class of all classes similar to a given class."  I.e., all sets of three have something in common, which we could call "threeness."

 Levels of Measurement

The first and most important question is:   is the measure continuous or categorical?   This is important because continuous variables are required for the use of statistics such as the mean, standard deviation, correlation and regression.  With continuous measurement we have precise distances between the items measured, with categorical we just have them sorted into discrete categories.

If a variable is continuous, we can ask whether it is "interval" or "ratio".    Both of these have precise distance measurement between points.  In addition, ratio measures have a logically meaningful zero point.  With ratio measures, we can talk about ratios between values, e.g., say that $50 is twice as much money as $25.   With interval variables, such as Fahrenheit temperatures, we cannot make such statements.

If a variable is categorical, we can ask whether it is "dichotomous,"  "nominal" or "ordinal"

These terms are summarized on page 52 of the book.

Dichotomous variables have only two categories.  These can be two natural categories such as "male" and "female" or they can be artificial "dummy" variables, such as:   are you a Catholic or not?  With dichotomies you can use regression and correlation.
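
A "dummy" variable of this kind can be sketched in a few lines of Python.  This is only an illustration:  the list of responses is made up, and the function name dummy_code is hypothetical, not something from the book or from Microcase.

```python
# Hypothetical answers to "What is your religion?" (made up for illustration).
religions = ["Catholic", "Protestant", "Jewish", "Catholic", "Baptist"]

def dummy_code(values, target):
    """Return 1 for each value equal to target and 0 for everything else."""
    return [1 if value == target else 0 for value in values]

# "Are you a Catholic or not?" becomes a two-category (0/1) variable.
catholic = dummy_code(religions, "Catholic")
print(catholic)

# The mean of a 0/1 variable is the proportion of cases coded 1.
proportion_catholic = sum(catholic) / len(catholic)
print(proportion_catholic)
```

Because the two categories are coded 0 and 1, the mean has a sensible meaning (the proportion of Catholics), which is part of why correlation and regression can be used with dichotomies.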

Nominal variables have more than two categories, but not in any order or with a measured distance between them.

Ordinal variables have the categories in a logical order  (from "lower" to "higher"). 

In answering questions about measurement, give the highest or best level of measurement that is justified.  Any variable that meets the criteria for a ratio variable also meets the criteria for an interval variable, but the criteria for a ratio variable are more stringent so we would say that it is ratio measurement.  Any ordinal variable also meets the criteria for a nominal variable, but if it meets the criteria for ordinal we say it is ordinal.

It is important to understand that many variables can be measured at different levels.  Thus I could take height and put it into categories such as short, medium, tall in which case I would be using ordinal measurement because they are in order.  I could also measure it in inches or centimeters, which would be ratio measurement.  It is also important to understand that each of the statistics is appropriate for variables measured in some ways but not others.  Doing percentages and cross-tabulations makes sense for nominal or ordinal data. Chisquare is for nominal or ordinal data. Doing correlation or regression or means and standard deviations requires interval or ratio data.  We can make a broad distinction between categorical (nominal or ordinal) or continuous (ratio or interval) data.  The dichotomy is a special case because we can use correlation and regression with dichotomies, but we can also do percentages, cross tabulations and chisquares.
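
The rules in the paragraph above can be written down as a small lookup table.  This is just a study aid sketched in Python under the simplifications used in these notes; the names APPROPRIATE and is_appropriate are made up for the example.

```python
# Statistics these notes treat as appropriate for categorical data.
CATEGORICAL = {"percentages", "cross-tabulation", "chi-square"}
# Statistics these notes treat as requiring continuous data.
CONTINUOUS = {"mean", "standard deviation", "correlation", "regression"}

APPROPRIATE = {
    "nominal": CATEGORICAL,
    "ordinal": CATEGORICAL,
    "interval": CONTINUOUS,
    "ratio": CONTINUOUS,
    # The dichotomy is the special case: the categorical statistics
    # plus correlation and regression.
    "dichotomous": CATEGORICAL | {"correlation", "regression"},
}

def is_appropriate(level, statistic):
    """True if the notes list the statistic as suitable for that level."""
    return statistic in APPROPRIATE[level]

print(is_appropriate("ordinal", "chi-square"))
print(is_appropriate("nominal", "mean"))
print(is_appropriate("dichotomous", "regression"))
```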


Nominal Measurement.  Categories that could be put in any order.
      Religion:  Catholic, Protestant, Jewish, Moslem, LDS, Buddhist, Episcopalian, Baptist
                       This list actually mixes two variables:  variable one is category of religion, variable two is denomination.
            Mental illnesses (DSM-IV), e.g., adjustment disorder, borderline personality disorder, paranoid schizophrenia
               Crimes:   burglary, assault, murder.  What do these terms mean?  Look at the US Criminal Code.

  Each individual should go into one and only one category on a variable, i.e., have one value on a variable.   For example:  What is your favorite food?  We have a long list, but each person is allowed only one.

 Sorting people into categories must be as reliable and accurate or valid as possible.  One of the things we do is evaluate how accurate our measurement is. 

Ordinal Measurement.   Here we have categories in a logical order:   very short, short, medium, tall, very tall.  Often we take continuous variables and make them ordinal.    Income:   Under $20,000   $20,000 to $40,000   $40,000 to $60,000   $60,000 plus.
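
Turning a continuous variable into an ordinal one is just a matter of sorting each value into a bracket.  Here is a minimal Python sketch using the income brackets above.  One assumption is labeled in the code:  the notes do not say which bracket an income of exactly $20,000, $40,000 or $60,000 falls into, so the sketch puts boundary values in the higher bracket.

```python
def income_category(income):
    """Sort a dollar income into one of the four ordered brackets.

    Assumption: a boundary value (exactly 20,000, 40,000 or 60,000)
    goes into the higher bracket, so that the categories do not overlap.
    """
    if income < 20_000:
        return "Under $20,000"
    if income < 40_000:
        return "$20,000 to $40,000"
    if income < 60_000:
        return "$40,000 to $60,000"
    return "$60,000 plus"

# Hypothetical incomes, for illustration only.
for dollars in (12_500, 20_000, 47_300, 95_000):
    print(dollars, "->", income_category(dollars))
```

Note what is lost:  once incomes are in brackets we know the order of the categories but no longer the exact distance between two people.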

Interval Measurement:   Temperature in Fahrenheit or Centigrade.  0 degrees is not the absence of heat.  How about the day that the "temperature doubled" in New York City?
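
To see why the "temperature doubled" claim does not hold up, convert the same two readings into other scales.  The sketch below is in Python; the two readings (40 and 80 degrees Fahrenheit) are hypothetical numbers chosen for the example, and the function names are made up.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    # Kelvin has a true zero (absence of heat), so it is a ratio scale.
    return fahrenheit_to_celsius(f) + 273.15

# Hypothetical morning and afternoon readings in Fahrenheit.
morning_f, afternoon_f = 40.0, 80.0

ratio_f = afternoon_f / morning_f
ratio_c = fahrenheit_to_celsius(afternoon_f) / fahrenheit_to_celsius(morning_f)
ratio_k = fahrenheit_to_kelvin(afternoon_f) / fahrenheit_to_kelvin(morning_f)

print("Fahrenheit ratio:", round(ratio_f, 2))
print("Celsius ratio:   ", round(ratio_c, 2))
print("Kelvin ratio:    ", round(ratio_k, 2))
```

The "ratio" changes depending on the scale, because the zero points of Fahrenheit and Centigrade are arbitrary.  With a ratio measure such as dollars, $50 is twice $25 no matter what units we count in.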

Ratio Measurement:    Income in dollars:  a continuous numerical value PLUS a meaningful zero point.  Height in inches.

There is a "Levels of Measurement Review Quiz" available on WEBCT.  This quiz is not required and does not count towards the grade.  The correct answers are explained once you take the test.


January 29: 
Concepts and Theories: 
By "science" we mean a field of study that attempts to establish generalizations based on empirical observation. Establishing generalizations means we need abstract concepts.  This is different from establishing facts about particular cases as we may do in history or in criminal investigation.  In a criminal investigation, we may ask "who committed a specific murder" and we work very hard to find that person.  In scientific research, we would say, what factors determine the frequency of murder in different communities or in different years.  The first helps to solve a case, the second helps us to formulate policies that may lessen crime in the future.  We may also use the generalizations as guidelines in solving a particular crime, e.g, usually murders are committed by men with certain characteristics...  But this is risky, and may get us into legal trouble, particularly if we use racial or ethnic characteristics, e.g., racial profiling.  It may be that cocaine smugglers are largely Hispanic, for example, but this is of little use in catching them and may lead us to hassle a lot of innocent people since the vast majority of Hispanic people are not smugglers.
Establishing general patterns can help us to change policies.  
An example is the work of Florence Nightingale, who used social research to advocate for better nursing care in the British armed forces during the Crimean War.  She pioneered the use of statistical graphics, including the polar area diagram, a relative of the pie chart.

Other fields of knowledge also use concepts; concepts are a part of how the human mind and perhaps all intelligences work.  Philosophy is largely about analyzing the implications of different concepts.  Mathematics also deals with concepts because numbers are concepts.  The small integers are especially important, especially Zero and One (or nothing and something).  Religion uses concepts.  The Bible says "In the beginning was the Word, and the Word was with God, and the Word was God."  What does that mean?  The original Greek text uses the word "logos" which means unit of thought or idea or concept, which is where we begin also, with concepts.  How do we decide if this is a good concept or not?  We may find it fulfilling, spiritually meaningful.  We may find it beautiful.  Social science, however, is not much concerned with that.  We are much more mundane; we want useful, pragmatic concepts.  Religious concepts are good if they provoke spiritual reflection, as in reciting a Mantra in Buddhism.  Literary concepts are good if they are beautiful, which social science concepts seldom are.   W.H. Auden's poem "Under Which Lyre" is an aesthetic attack on social science and other applied sciences.

 Social science may not appeal to poets, but it is more useful.  At least there are more jobs using social science than writing poetry.  In the social sciences we want concepts that are parsimonious, useful and clearly defined.  We avoid ambiguity and subtlety, traits which literature and religion may value.  We are not, however, looking for concepts that are logically correct in the way that philosophy does.  We want concepts that help us to make useful discoveries about the observable world.  We like concepts that lead to falsifiable statements, which is a key difference between social science and theology or mathematics.   This is an issue now in the debate about "intelligent design" theory, a doctrine that claims to be a scientific theory but that many say is theology in disguise.  Is there any evidence that would disprove this theory?  Is the human body intelligently designed or did it evolve?  Why do we have an appendix?  Why do men have non-functional breasts?  Why are our backs weak like the backs of quadrupeds?  Why do whales have finger bones in their fins?  Why does the laryngeal nerve in the giraffe go all the way down the neck and then back up to the larynx? 

In social science we have general ideas or theories, which are statements of relationships between concepts.  From these, we make hypotheses about what we are likely to observe in empirical reality.  We gather data to test our hypotheses, and we change our theories if the tests do not work out.  At least that is how it is supposed to work!  An excellent example is the work of Felton Earls and his colleagues, who used a combination of research methods to study the causes of urban crime.  Their organizing concept was "collective efficacy".

 In real life, many social scientists act more like lawyers, selecting facts that support their preconceptions.  We are more successful in being objective in our descriptions than in our explanations or in our predictions.  We know that the crime rate has been going down for the last fifteen years or so, but we are not agreed about why.

The book distinguished "pure" from "applied" and "evaluation" research.  Pure research is motivated entirely by scientific curiosity, applied research seeks to further a goal.  Evaluation research seeks to determine whether a particular program works or not. 

In testing hypotheses, we can make Type One or Type Two errors.  Type One:  accepting a correlation that does not exist.  Type two:  Not accepting a correlation that does in fact exist.  There is a trade-off between the two, to the extent that we avoid making Type One error we increase the risk of Type Two error. 
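
The trade-off can be seen with a little arithmetic.  The Python sketch below is not from the book.  It assumes a simple one-sided z-test in which a real relationship shifts the test statistic by 2 standard errors (a made-up value, called DELTA in the code), and it uses the NormalDist class from Python's standard statistics module.

```python
from statistics import NormalDist

# Assumption for illustration: when the relationship is real, the test
# statistic is shifted upward by DELTA standard errors (hypothetical value).
DELTA = 2.0
STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def type_two_risk(alpha, delta=DELTA):
    """Chance of missing a real relationship when the Type One risk is alpha."""
    # Cutoff the statistic must exceed before we reject the null hypothesis.
    critical = STANDARD_NORMAL.inv_cdf(1 - alpha)
    # Probability the shifted statistic still falls below that cutoff.
    return STANDARD_NORMAL.cdf(critical - delta)

for alpha in (0.10, 0.05, 0.01):
    print(f"Type One risk {alpha:.2f} -> Type Two risk {type_two_risk(alpha):.3f}")
```

As the Type One risk is made smaller, the cutoff moves up and the Type Two risk grows, which is the trade-off described above.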

The null hypothesis is a statement of how things would be if our theory were not true, generally if there was no relationship between our variables.  Some philosophers believe it is more correct to say "we reject our null hypothesis" than to say "we accept our hypothesis as true". 

January 26:  We went through Exercise 2a in the Workbook.

Regression equation.  The equation used to predict the Dependent variable (Y)  with the Independent variable  (X).   Y  =  a + b X  where:
X is the independent variable
Y is the dependent variable
b is the "unstandardized regression coefficient"
a is the "intercept"

In the example of Hunting and Field & Stream.   X = Hunting   Y =  Field & Stream   b = 4.659   a = 418.983    (a and b are called parameters)  X and Y are variables. 

An example:
New Jersey:      X =  11.71    actual Y = 416.     To predict Y:    Y' =  418.93 + 4.659 * 11.71   =  473.5
North Dakota:    X = 137.93    actual Y = 1357.    To predict Y:    Y' =  418.93 + 4.659 * 137.93  =  1061.5
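
The same predictions can be checked in Python.  This is only a sketch:  it uses the intercept as it is written in the worked example (418.93) and the coefficient b = 4.659, and the function name predict is made up for the illustration.

```python
A = 418.93  # intercept, as written in the worked example above
B = 4.659   # unstandardized regression coefficient

def predict(x, a=A, b=B):
    """Predicted value of the dependent variable: Y' = a + b * X."""
    return a + b * x

# (state, X = Hunting, actual Y = Field & Stream), taken from the notes.
states = [("New Jersey", 11.71, 416), ("North Dakota", 137.93, 1357)]

for name, x, actual_y in states:
    predicted_y = predict(x)
    # The gap between actual and predicted shows how far off the line is.
    print(name, "predicted:", round(predicted_y, 1),
          "actual minus predicted:", round(actual_y - predicted_y, 1))
```

New Jersey falls below its predicted value and North Dakota well above it; the equation gives the best straight-line guess, not the exact value for each state.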

January 24:

The Null Hypothesis:  the hypothesis that our variables are not related.  Suppose we are interested in the relationship between party affiliation (Democrat or Republican) and opinion of George Bush (approve or disapprove).  What would our "null hypothesis" be?  That the percent of Democrats approving of Bush is the same as the percent of Republicans approving.  (This is NOT to say that it is 50%; it could be any percent, but the null hypothesis is that it is the same in both groups.)

Type One and Type Two Error.   Type One means you say a relationship exists when it does not.   Type Two means you fail to say a relationship exists when it does. 

Suppose we have 410 respondents in a survey and 160 approve of Bush while 250 do not.  What percent approve of Bush?  Suppose our sample has 300 Democrats and 110 Republicans.  What percent are Democrats? 

Suppose we put these in a table, what would we have: 

                              Democrats           Republicans    Total

     Approve                                                             160

     Disapprove                                                          250

    Total                   300                         110             410

What would our null hypothesis percentage table look like?


                              Democrats           Republicans     Total

     Approve            39.0%                39.0%               39.0%

     Disapprove       61.0%                61.0%                61.0% 

  N =                      300                         110              410

What would our (null hypothesis) expected frequency table look like?


                              Democrats           Republicans    Total

     Approve             117.1                     42.9              160                     

     Disapprove        182.9                      67.1              250

    Total                   300                         110              410
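
Both null hypothesis tables come from the row and column totals alone.  A minimal Python sketch of the arithmetic follows (the variable names are made up).  It works from the unrounded totals, so a cell can differ by a tenth from a figure worked out from the rounded 39.0%.

```python
# Row and column totals from the table above.
col_totals = {"Democrats": 300, "Republicans": 110}
row_totals = {"Approve": 160, "Disapprove": 250}
n = sum(col_totals.values())  # grand total, 410

# Under the null hypothesis every column has the same percentages
# as the Total column.
null_percent = {row: 100 * total / n for row, total in row_totals.items()}

# Expected frequency for a cell = row total * column total / grand total.
expected = {
    (row, col): row_totals[row] * col_totals[col] / n
    for row in row_totals
    for col in col_totals
}

for row, pct in null_percent.items():
    print(row, f"{pct:.1f}%")
for (row, col), freq in expected.items():
    print(row, col, round(freq, 1))
```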

Observed Frequencies
                            Democrats           Republicans    Total

     Approve          110                            50                   160

     Disapprove      190                             60                  250

    Total                   300                         110             410

Question:  Assume this is a random sample.  Is the difference big enough to exceed random chance?  Is it "statistically significant"? 

Chisquare is a test of "statistical significance" -  an inferential statistic.  It tests whether we can generalize from our sample.  It is dependent on the sample size.

Cramer's V is a measure of the "strength of the relationship"  -  a descriptive statistic.  It describes the sample data.  It is independent of sample size.
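
Both statistics can be computed by hand from the observed frequencies.  The Python sketch below uses the standard textbook formulas, which are not spelled out in these notes:  chisquare is the sum over all cells of (observed - expected) squared divided by expected, and Cramer's V is the square root of chisquare divided by N times (the smaller of the number of rows or columns, minus 1).  No continuity correction is applied, so a statistics package may report a slightly different figure.

```python
import math

# Observed frequencies from the table above.
observed = [
    [110, 50],  # Approve:    Democrats, Republicans
    [190, 60],  # Disapprove: Democrats, Republicans
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi_square += (obs - exp) ** 2 / exp

# Cramer's V rescales chisquare so that it does not grow with sample size.
smaller_dimension = min(len(observed), len(observed[0]))
cramers_v = math.sqrt(chi_square / (n * (smaller_dimension - 1)))

# Critical value of chisquare for 1 degree of freedom at the .05 level.
CRITICAL_05_DF1 = 3.841

print("chisquare =", round(chi_square, 3))
print("Cramer's V =", round(cramers_v, 3))
print("significant at .05?", chi_square > CRITICAL_05_DF1)
```

With these numbers chisquare falls below the critical value, so the difference between Democrats and Republicans in this sample would not count as statistically significant at the .05 level, and Cramer's V shows a weak relationship.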

The correlation coefficient is a measure of the strength of relationship (usually between continuous variables).  It varies from -1 through 0 to +1.  Zero means no relationship.  An absolute value of one means a perfect relationship, either positive or negative.   When comparing two correlations, the one with the largest absolute value is strongest.

You can see examples of these with the Percents, Expected Frequencies and Chi-Square Calculator (an Excel spreadsheet). 

January 22 -  We will begin our discussion of the textbook with Chapter 6 on Basic Research Design because it gives a good introduction to the kinds of research social scientists actually do:  how research is organized or structured to accomplish different ends.  The book discusses four "basic" types of designs.  The "Review Glossary" on page 124 is a good place to find a brief description of each.
  1. The experiment  -  subjects are recruited to be exposed to a hypothesized causal factor, called the "independent variable".  They are assigned at random to experimental and control groups.  The effect of the independent variable on a hypothesized effect or "dependent variable" is measured.  This is the best method for establishing causal relationships, so long as you can set up an experimental situation that is sufficiently close to real life.
  2. Survey Research -  A standardized set of questions is asked to a representative sample of people.  Very widely used because it is quick and efficient and gets good information about attitudes and behaviors that people are aware of and are willing to tell us about. 
  3. Field Research - We go out into the world and observe what actually goes on.  This gets at real behavior in its real setting, with the only difference being the presence of the researcher. 
  4. Aggregate or Comparative Research - We analyze statistics collected by government or other organizations.  This depends on the quality of the data.  Very widely used in criminal justice because the CJ system collects a great deal of data.  It is often referred to as the COMPSTAT method.
Field research often generates only descriptive data in words, although some field studies count things as well.  The other three methods generate quantitative data.  Sometimes this data is categorical:  we count the number of cases in discrete categories, e.g., the number of men and women, the number of Republicans and Democrats, the number of burglaries, assaults, etc.  When we have this kind of data, we usually compute percentages.  A frequency distribution gives the frequencies for one variable, e.g., the number of men and women.  A cross-tabulation gives the frequencies for two or more variables at once, e.g., the number of men who are Democrats, the number of men who are Republicans, etc.  The simplest cross-tabulation is a 2 by 2 table, with two values for each variable.   For this example, the variables are gender and opinion on an issue, each of which has two values.  The data are as follows:

25 men agreed
17 men disagreed
65 women agreed
30 women disagreed
 
  The first thing we do is put them in a two-dimensional table and compute the row totals, the column totals and the grand total.  We generally put the "independent" or "causal" variable in the columns and the "dependent" variable in the rows, as follows: 

Observed or Obtained Frequencies

                         Men          Women        Total

     Agree                25             65           90

     Disagree             17             30           47

     Total                42             95          137

For each cell, we can compute three kinds of percentages.  We must learn to use each of them correctly in a sentence.  The key to this is the "of the" clause that occurs after the percentage.     The "of the" clause gives the base of the percent.

Column percent:  To get the column percents, we divide the cell frequencies by the column total, then multiply by 100 to get a per cent.  Thus, if I ask, "what percent of the men agree" the answer is 25/42 *100  =  59.5%.  The base of this percent is the number of men.  This is a column percent because the men are in a column.

Row percent: If I ask,  "What percent of those who agree are men," the answer is 25/90 * 100 =   27.8%.  The base of this percent is the number of people who agree.  This is a row percent because the people who agree are all in a row.

Total percent: If I ask, "What percent of the respondents are men who agree," the answer is 25/137*100 =  18.2%.  The base of this percent is the total number of respondents.  This is called a total percent because the base is the total number of people.
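
The three kinds of percent differ only in the base we divide by.  A short Python sketch for the "men who agree" cell, using the frequencies from the table above (the variable names are made up for the example):

```python
# Observed frequencies: (opinion, gender) -> count.
table = {
    ("Agree", "Men"): 25,
    ("Agree", "Women"): 65,
    ("Disagree", "Men"): 17,
    ("Disagree", "Women"): 30,
}

cell = table[("Agree", "Men")]
men_total = sum(count for (_, gender), count in table.items() if gender == "Men")
agree_total = sum(count for (opinion, _), count in table.items() if opinion == "Agree")
grand_total = sum(table.values())

column_percent = 100 * cell / men_total    # percent OF THE men who agree
row_percent = 100 * cell / agree_total     # percent OF THOSE WHO AGREE who are men
total_percent = 100 * cell / grand_total   # percent OF THE respondents who are men who agree

print("column percent:", round(column_percent, 1))
print("row percent:   ", round(row_percent, 1))
print("total percent: ", round(total_percent, 1))
```

The same cell frequency (25) gives three different percentages, which is why the "of the" clause has to be stated every time.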




January 19 -  We explored the Microcase software.  Many of the examples were from the Introducing Microcase exercise in the workbook.
January 17 -  Discussion of the syllabus, the use of WEBCT and Microcase.  If you are joining the class late and need help, check with the teaching assistant Robyn DuFrain  byn@camden.rutgers.edu