Note: This is a
"web-enhanced" class, not an online class. The notes are not
intended as a substitute for attending class, nor as an alternative to
taking your own notes. They will include examples shown on the
screen during class, and they will be helpful in reviewing for
exams.
The most recent notes will appear at the top of the page; just scroll
down to find earlier notes.
February 20
Experimental
Designs. See the graphs in the book
or on Trochim's WEB site:
Types of
Designs.
Essential
characteristics:
- Two or more groups are matched, usually by random assignment,
sometimes by a kind of stratified random selection, e.g., an equal
number of men and women or blacks and whites in each group. But the
key is random assignment, so that the groups can be assumed to be the
same on all variables. "Quasi-experiments" are when we use groups that
are pretty much the same but we didn't assign people at random
- The Independent Variable is "manipulated," i.e.,
it is applied
to one group and not to the other
- Change in the Dependent Variable is measured
Experiments can be done:
- In laboratory settings with volunteers, e.g.,
student volunteers
- The Milgram
Experiment on Obedience to Authority
- In institutional settings such as prisons,
hospitals, rehabilitation
centers, etc., where people are assigned to treatment groups
- New drugs and medical treatments generally must
be shown
to work in experiments before they are approved for use. Often,
treatment
is compared to a placebo. These experiments are usually
"double-blind,"
to control for the psychological effects of knowing one is getting
treatment.
This is a way of controlling subject bias and experimenter bias.
- In criminal justice, one might do an experiment comparing
a "halfway house" to a drug treatment program to a prison term for
offenders. To do this, you would have to get the judge to assign
offenders to different programs at random. Ethical issues are raised
here and there are likely to be objections
- Occasionally in natural settings, for example:
- welfare reform experiment: assign some recipients to the
old program, some to the new. This didn't work very well; there
were errors in the group assignments, and the women often forgot
which group they were in anyway
- vaccination experiments
- guaranteed annual income experiments
- Kansas
City Patrol Experiment.
Although logically experiments are the most rigorous
way
to test causal hypotheses, there are practical problems:
- It may be hard to manipulate the independent
variable effectively,
it may not have enough importance to people that they notice it
- Experimental conditions may not be realistic enough, e.g.,
the Milgram experiments having people apply electric shock to people,
or experiments that simulate being in prison. An experiment is not the
real world and people know it. This is a question of external
validity: does the experiment match real-world conditions?
- There may be problems of internal validity, difficulties
in carrying out the experiment:
- "History" effects - the world changes during the experiment,
and people are affected by things in the real world
- Maturation - people get older, more mature, learn more
- Testing effects - taking the pretest measure affects people,
causes them to change. Sometimes we have a matched but untested
control group that is measured only after the experiment.
- Instrument effects - the testing instrument may change.
Sometimes you can't use exactly the same test because people will
remember it, so the items change
- Regression to the mean, just by chance the
people who got
extremely high or low scores on a pretest are likely to get more
average
scores on the second test.
- Subject "mortality" - we may lose people. This is especially
a problem in testing things like drug rehabilitation: it works for the
people who stick with it, and the failures drop out
- Ethical concerns: people may not be willing
to be experimented
on, or it may be harmful to subject them to experimental conditions,
e.g.,
- Tuskegee syphilis experiment denied some men
penicillin.
You can only deny an experimental drug if you are not "certain" that it
works or if the condition is not serious, e.g., common cold research
- A big strength of experiments is resolving
questions that
involve different recollections of events, e.g., children's reports of
abuse. You don't know what "really" happened and people disagree
on how well they accept the recollections of different people. In
an experiment, you know what really happened, so you can check the
accuracy
of perception. We find that children often remember things that
didn't
really happen. The 20/20 report on child abuse experiments
(video shown in class from an ABC News 20/20 show aired October 22,
1993, hosted by Hugh Downs; transcript available at
www.transcriptstv.com) demonstrates false memory: we know what
really happened because it happened in a controlled experimental
setting. This is much more difficult to establish in real-life
case histories: Loftus:
Who Abused
Jane
Doe? There is other information
online on the Kelly
Michaels case and other cases.
Another example we can look at is an experimental study of internet
downloads. This was published in Science magazine because it
demonstrates a sociological principle with rigorous experimental
data. Several documents from this study are in WEBCT,
the most accessible summary is in a file called "Experimental
Macrosociology".
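Regression to the mean, listed above among the internal validity problems, can be seen in a small simulation. This is only a sketch under an assumed model (each score is a stable "true ability" plus random luck on the day); the population size, means and spreads are invented for illustration and do not come from any study mentioned in these notes.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_PEOPLE = 10_000

# Assumed model: score = stable true ability + random luck on the day.
true_ability = [random.gauss(100, 10) for _ in range(N_PEOPLE)]
pretest = [t + random.gauss(0, 10) for t in true_ability]
posttest = [t + random.gauss(0, 10) for t in true_ability]

# Select the people with the highest 10% of pretest scores.
cutoff = sorted(pretest)[int(0.9 * N_PEOPLE)]
top = [i for i in range(N_PEOPLE) if pretest[i] >= cutoff]


def mean(values):
    return sum(values) / len(values)


top_pre_mean = mean([pretest[i] for i in top])
top_post_mean = mean([posttest[i] for i in top])
overall_mean = mean(pretest)

print(f"Top group, pretest mean:  {top_pre_mean:.1f}")
print(f"Top group, posttest mean: {top_post_mean:.1f}")
print(f"Everyone, pretest mean:   {overall_mean:.1f}")
```

Nothing was done to the top group between the two tests, yet their second scores fall back toward the overall average. An experiment that selects people for extreme pretest scores could mistake this drift for a treatment effect.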
Feb 18
The
Art and Science of Cause and Effect. (powerpoint)
Probabilistic cause: not an absolute cause, not a cause
that is sufficient or necessary. "Cigarette smoking causes
cancer." What we mean is, smoking cigarettes increases
the likelihood of getting cancer. How much?
There are multiple causes for everything. What
we
want to find out is how much each thing contributes. There are
also
causal linkages, or indirect causes. A causes B
and then B causes C.
Diagramming causal models. We put the dependent variable
at the right. We draw arrows going into it for each causal
variable that affects it directly. Then we can
have arrows that go into the arrows, steps into the causal analysis, as
in this sample file:
http://crab.rutgers.edu/~goertzel/homomale.htm
Criteria of Causation - how do we know that
something
is a cause of something else.
1. Time Order. The cause comes before the effect. Sometimes we sort
out the time order theoretically; we assume that education precedes
employment. Or we can use a research design that involves gathering
data at two points in time. If you don't have measurements at two
points in time, this is shaky.
2. Correlation. The two variables vary together. When one is high,
the other is high, OR when one is low the other is high. This gets at
the degree of causation: the higher the correlation, the stronger the
causal relationship.
3. Non-spuriousness. We want to know that the correlation is not
caused by something else. We can test this with an experimental
design, if feasible. Or we can use statistical controls, which are not
quite as convincing, but it's all you can do in many cases.
We test for non-spuriousness by introducing controls.
Causal Models: representations of the complex causal
relationships between variables. Variables have different causal
roles, but this is determined by our causal model; it is not
inherent in the variables. One person's cause can be another's
effect.
The Elaboration
Model is one way of thinking about this kind of analysis, using
cross-tabulation as a technique.
We first look at the relationship between the
Independent and Dependent Variables, then we see what happens when we
introduce Antecedent or Intervening Control Variables.
Dependent Variable - that is what we want to
explain.
Often these are opinions or behaviors
Independent Variable - what we use to explain it.
Often these are traits or physical characteristics, e.g., sex or race,
which are almost always independent.
If you study the effect of race on voting, for example,
race would be independent and voting dependent.
Antecedent variables - things that come before the independent
variable. This helps us to deal with a causal chain.
The antecedent variable causes the IV, which causes the DV.
If the antecedent variable "explains" the relationship,
we have an "explanation"; we say the original relationship is
"spurious".
Intervening variables - things that come between the IV and the DV,
e.g., race determines ideology, which determines the vote.
This is an "interpretation"; it tells WHY the causal
relationship exists.
Path
Models: a way of graphically expressing complex causal models.
Marital Status and Frequency of Sex by Age

                    Under 50               50 and Older           Total
                    Divorced/  Never       Divorced/  Never       Divorced/  Never
                    Widowed    Married     Widowed    Married     Widowed    Married
Less than Monthly   29.7%      30.8%       77.9%      70.2%       54.7%      34.0%
Monthly or More     70.3%      69.2%       22.1%      29.8%       45.3%      66.0%
TOTAL               100%       100%        100%       100%        100%       100%
                    p = .75                p = .24                p = .000
There is a statistically significant difference between the divorced or
widowed respondents and the never married respondents in their
frequency of sex. However, when we control for age, this
relationship is no longer significant. Age is an antecedent
variable, so the relationship between marital status and frequency of
sex is spurious.
Spurious means that it is not causal, the correlation is due to a third
variable which is antecedent.
We compare the strength of the correlation in the total sample table
with the correlations or percentage differences in the partial sample
tables.
If the correlations or percentage differences are about the same, we
would say that the relationship was confirmed or supported.
If the correlation disappears, we would have to ask whether the
control variable was Antecedent or Intervening. If it is
antecedent, the relationship is spurious. If it is intervening,
we have a causal interpretation.
If it disappears in one case but not in the other, we would say that we
have specified the relationship.
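The comparison of total and partial tables can be sketched in a few lines. The percentages below are the "less than monthly" figures from the marital status table; the function name and the layout of the data are only illustrative choices.

```python
# Percent reporting sex "less than monthly", from the marital status table.
# Each entry is (divorced or widowed, never married).
less_than_monthly = {
    "Total": (54.7, 34.0),
    "Under 50": (29.7, 30.8),
    "50 and older": (77.9, 70.2),
}


def percentage_difference(pair):
    """Percentage-point gap between the two marital status groups."""
    divorced_widowed, never_married = pair
    return divorced_widowed - never_married


differences = {
    group: round(percentage_difference(pair), 1)
    for group, pair in less_than_monthly.items()
}

for group, diff in differences.items():
    print(f"{group:<13} difference = {diff:+.1f} points")
```

The gap of about 21 points in the total sample shrinks to about 1 point among those under 50 and about 8 points among those 50 and older, and the notes report that neither partial difference is statistically significant. Since age is antecedent, that pattern is what the notes call a spurious relationship.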
SEX AND 1980 PRESIDENTIAL VOTE BY INCOME GROUPS

Presidential Vote      Total Sample    Under $17,000   $17,000 to $24,999  $25,000 and over
(major parties only)   Female  Male    Female  Male    Female  Male        Female  Male
Carter                 47%     40%     57%     43%     40%     36%         36%     34%
Reagan                 53%     61%     43%     57%     60%     64%         64%     65%
Total                  100%    101%    100%    100%    100%    100%        100%    99%
N of cases             (482)   (395)   (108)   (86)    (78)    (92)        (102)   (140)
Chi-square             4.8             3.96            .08                 .01
p                      .03             .05             .77                 .89
There is a
statistically
significant relationship between sex and presidential vote for the
sample as a
whole. 61% of the men voted for Reagan as compared to 53% of the women.
When
controlling for family income group, however, we found that the
relationship
between sex and the vote was significant only for the under $17,000
family
income group. Among this group, 57% of the men voted for Reagan, as
compared to
43% of the women. Among the higher income groups, there was no
significant
difference with 60% or more of both sexes voting for Reagan.
In this example, sex is the independent variable, presidential vote is
the dependent variable, and income is the test variable. Income is an
intervening variable between sex and vote. When compared with the
original relationship, the partial relationships are split. In the
terminology of the Elaboration Paradigm, this is an example of
specification.
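To see where a chi-square figure like the ones in the table comes from, here is a minimal sketch for the under $17,000 group. The counts are an assumption: the notes give only percentages and Ns, so the cells are rebuilt by rounding 57% of 108 women and 43% of 86 men to whole people. The function is a plain Pearson chi-square without any continuity correction, which may not be exactly what the original software did.

```python
import math

# Counts rebuilt from the "Under $17,000" columns (an assumption, see above).
observed = {
    "Female": {"Carter": 62, "Reagan": 46},
    "Male": {"Carter": 37, "Reagan": 49},
}


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts (no continuity correction)."""
    rows = list(table)
    cols = list(table[rows[0]])
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_totals.values())
    chi2 = 0.0
    for r in rows:
        for c in cols:
            # Expected count if sex and vote were unrelated.
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


chi2 = chi_square_2x2(observed)
# With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p = {p_value:.2f}")
```

With these rebuilt counts the result should land close to the 3.96 and p = .05 shown for that group in the table. Small differences are to be expected because the cell counts were recovered from rounded percentages.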
Feb 16 - We will go through the "Selecting Cases" exercise in the
workbook. Quiz Five in WEBCT covers both exercises 3 and 4.
Review of Levels of Measurement:
If there are two and only two choices, we have a dichotomy. It can
also be called a dummy variable if one of the choices is the absence
of a trait. Thus, gender (male or female) is a dichotomy. "Are you a
Roman Catholic?" is a dummy variable if it is coded 0 = no; 1 = yes.
We can use regression and correlation statistics with dummy or
dichotomous variables.
If there are more than two categories, but the
categories are in no logical order, it is a
nominal variable. We cannot
compute means, correlations or regressions. We can do percentages
and chi-square. E.g., Religious Affiliation (1 = Protestant
2 = Catholic 3 = LDS 4 = Jewish 5 =
Buddhist 6 = other)
If there are more than two categories and they are in
order, it is an
ordinal variable.
E.g., short, medium, tall. There are statistics intended
for ordinal variables but we will not make much use of them except for
the
median and the
rank or
percentile. We can use
correlation and regression with ordinal variables if they are
reasonable approximations to interval, although statistical purists
would disagree.
If we know the distance between cases on a scale, it is an
interval variable, e.g., height in
inches, temperature in degrees, income in dollars. These are
continuous variables, while
dichotomous, nominal and ordinal variables are
categorical. Most interval
variables have a zero point which is a logical absence of the trait, in
which case they are
ratio
variables and one can make statements about ratios, e.g., Jane is three
times as rich as Joe. We cannot do this with temperatures in
Fahrenheit or centigrade because the zero point is not an absence of
temperature. The absence has to be only logical, not empirically
present, e.g., no one has zero height, but logically zero inches is an
absence of height.
Many of the measures we use are a bit ambiguous, e.g., test
scores. This may just mean they are poor measures, or the trait
may be inherently difficult to measure.
February 14 - We are scheduled to have class despite the
weather. We will work through pages 67 to 79 in the
workbook; those of you who miss class can go through this on your
own. It raises some important practical points:
- the difference between using aggregate numbers and rates.
Rates are usually more informative; using aggregates can lead to
completely wrong conclusions, and often does in student papers, e.g.,
more white people get married than black people, but that is only
because there are more of them, not because the marriage rate is
necessarily higher
- Levels of measurement: you should be able to look at a
variable in MicroCase and know what the level of measurement is.
Variables with a ! after them tend to be continuous, not recoded into
categories.
- The statistics we can use depend on the level of measurement,
e.g., Pearson's r requires continuous variables; chi-square and
Cramer's V require only nominal variables. Dichotomous variables are
tricky: they can actually be used with statistics that require
continuous measurement. This is summarized on the Statistics
Overview page.
- Cronbach's Alpha is a statistic used for measuring internal
consistency reliability. Correlation coefficients can also be
used for this; Cronbach's Alpha is a sort of summary of a
correlation matrix.
- It is important to always look carefully at how a variable is
measured, including the wording of questionnaire items. This has
a strong effect on frequencies and percentages, less of an effect on
measures of relationships between variables.
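The first point in the list above, aggregate numbers versus rates, is easy to show with a few lines of arithmetic. The figures below are hypothetical, invented purely for illustration; they are not from the workbook or from any real data set.

```python
# Hypothetical figures, invented for illustration only.
groups = {
    "Group A": {"population": 1_000_000, "marriages": 8_000},
    "Group B": {"population": 150_000, "marriages": 1_500},
}


def rate_per_1000(events, population):
    """Events per 1,000 people, so groups of different sizes can be compared."""
    return 1000 * events / population


rates = {
    name: rate_per_1000(g["marriages"], g["population"])
    for name, g in groups.items()
}

for name, g in groups.items():
    print(f"{name}: {g['marriages']:>5} marriages, {rates[name]:.1f} per 1,000")
```

In this made-up case Group A has far more marriages in total, but Group B has the higher marriage rate. A conclusion drawn from the aggregate numbers alone would point the wrong way.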
February 12 - Sampling
SAMPLING
is
used when we are
interested in studying a population that is too large for us to study
each individual. The first step is to define the
population
we wish to make statements about, e.g. adults in New Jersey, probable
voters, people convicted of felonies, graduates of our
department. We might want to study the entire population of the
USA. If we try to collect data from everyone, this is a
census. The Census Bureau does this once every decade, and misses
a lot of people. Everyone else does sampling, we select a
cross-section to represent the population. If you
try to study the whole population, you often fail to do a good job.
Gallup:
How Polls are Conducted.
Size of the sample. How big of a sample do I
need?
Size
of the sample does not depend on the size of the population.
How do we select the sample size? Decide on the margin of error you
will tolerate. Margin of error is equal to one divided by the square
root of the sample size. With a sample of 400, the square root is 20,
and 1/20 = .05 or 5%. Suppose you interviewed 400 people: 300 were
white, 50 were black and 50 were others. For the blacks, with a sample
of 50, we would have a 14% margin of error. For the whites, with a
sample of 300, we would have a 5.8% margin of error.
Take 300: the square root of 300 is 17.32, and
1/17.32 = .0577, or 5.8%.
Sample statistic - what the sample says
population parameter - what the real figure is
Even if the sampling is done well, the response rate is less than 100%.
Weighting is done to make the sample more like the population.
This formula is for proportions, or percents if you move the decimal
over two places:
m = 1/sqrt(n)
Solve for n: m^2 = 1/n, so n * m^2 = 1 and n = 1/m^2
If we need a margin of error of 3%, or .03: n = 1/.03^2 = 1/.0009,
or about 1,111.
If you have a sample size and need to know the margin of
error, use m = 1/sqrt(n)
If you are given a margin of error and asked how large a sample
you need, use n = 1/m^2
In these formulas n = the size of the sample (not the population).
m = the margin of error expressed as a proportion, not as a percent.
Thus, if the question says "we need a margin of error of 5%," then
m = .05.
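The two formulas can be written as a pair of small functions. This is a sketch of the rule of thumb used in these notes, not a general-purpose calculator; the function names are made up for the example. The rule m = 1/sqrt(n) is a rounded, worst-case version of the usual 95% formula for a proportion, so it will be slightly more cautious than what a statistics package reports.

```python
import math


def margin_of_error(n):
    """Approximate 95% margin of error for a proportion: m = 1/sqrt(n)."""
    return 1 / math.sqrt(n)


def sample_size(m):
    """Sample size needed for margin m (a proportion such as 0.05): n = 1/m^2."""
    # Round first to absorb floating-point noise, then round up to whole people.
    return math.ceil(round(1 / m ** 2, 6))


for n in (400, 300, 50):
    print(f"n = {n:>3}: margin of error = {margin_of_error(n):.3f}")

for m in (0.05, 0.03):
    print(f"m = {m}: need n = {sample_size(m)}")
```

The sample size is rounded up because you cannot interview a fraction of a person, so a 3% margin of error comes out as 1,112 rather than 1,111.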
If our sample is stratified, this means we really have several
sub-samples and we need the same size sample for each of them,
regardless of the size of each group in the population. For example,
if we want to sample white, black and Hispanic respondents and make
statements about each group, we need the same size sample of each.
Thus, if we need a margin of error of 5% for each of the three groups,
then the answer is 3 * (1/m^2) = 3 * 400 = 1,200.
If
you need a margin of error for a mean score (an average such as income
in dollars or scores on a test), you need to know the standard
deviation
(sd) and the sample size (N). Ignore any other
information
you are given, including the size of the population.
Use the following
formula:
M
= 2 * sd / SQRT(N)
Suppose
I sample 457 Camden residents and the mean income is $27,541 and
the standard deviation is $3452
M
= (2 * 3452 )/sqrt(457). This result will be in dollars, not
percentages.
M
= 6904
/21.378 =
$322.95.
Confidence
Interval: I am 95% sure that the population figure is
between: $27,218.05 and $27,863.95
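The Camden income example can be reproduced with a short function. As with the formula in the notes, the multiplier 2 stands in for the 95% confidence level; the function name is just an illustrative choice.

```python
import math


def margin_of_error_mean(sd, n):
    """Approximate 95% margin of error for a mean: M = 2 * sd / sqrt(n)."""
    return 2 * sd / math.sqrt(n)


mean_income = 27541
m = margin_of_error_mean(sd=3452, n=457)
low, high = mean_income - m, mean_income + m

print(f"M = ${m:.2f}")
print(f"95% confidence interval: ${low:,.2f} to ${high:,.2f}")
```

The computer carries more decimal places for the square root than the hand calculation does, so the result may differ from the figures above by a cent or so. Note that the population size never enters the calculation.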
Terms:
Margin of Error: How much a sample statistic is likely to vary
from the population parameter. We say that we are 95% sure that
the sample is not off by more than the margin of error. How this
is presented in
NY Times. "19 out of 20" is another way of saying 95%.
Confidence level: we always use a 95% confidence level.
Confidence interval: the range within which we think a
statistic would fall, e.g., if the margin of error is 3% and the sample
statistic is 67%, the confidence interval is from 64% to 70%. We
are 95% sure that the true figure is within this limit.
All of this assumes a simple random sample, which means that each
person (or other sampling unit) in the population has the same chance
of appearing in the sample. In practice, however, we often do not
use simple random samples, for several reasons:
- we may not have a list of the population. If we do not, we
first divide the population into sub-groups of some kind (census
tracts, blocks, classrooms, organizations, depending on the nature of
the study). We then sample the subgroups and list the populations in
them. This is called cluster sampling. It is often used
for interviewing people at home instead of over the telephone.
- We may be interested in differences between sub-groups of the
population and need to make sure we have enough of each. In this case
we select random samples of each of the relevant sub-groups, and weight
the results appropriately. This is called stratified
sampling, e.g., the NY
Times explains that its surveys oversample black
respondents. Most surveys today are on the telephone.
- Sometimes we just go down a list, which is called systematic
sampling. This gives the same results as simple random sampling,
unless there is some systematic ordering to the list that causes a
distortion
- Sometimes we use non-random or "quota" sampling. This is
done for convenience, or because we just want to know what the range of
differences is without putting numbers on them. Internet surveys
tend to be of this sort, although there are some randomly selected
Internet panels.
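Three of the sampling methods in the list above can be sketched with Python's standard random module. The "population" here is a hypothetical list of 1,000 numbered people, 200 of them in a smaller group A; it is invented for the example and does not represent any real sampling frame.

```python
import random

random.seed(1)  # fixed seed so the example is repeatable

# Hypothetical sampling frame: 1,000 people, the first 200 in group "A".
population = [{"id": i, "group": "A" if i < 200 else "B"} for i in range(1000)]

# Simple random sample: every person has the same chance of selection.
simple = random.sample(population, 100)

# Systematic sample: random starting point, then every k-th person on the list.
k = len(population) // 100
start = random.randrange(k)
systematic = population[start::k]

# Stratified sample: draw the same number from each sub-group.
stratified = []
for group in ("A", "B"):
    members = [p for p in population if p["group"] == group]
    stratified.extend(random.sample(members, 50))

for name, sample in (("simple", simple), ("systematic", systematic),
                     ("stratified", stratified)):
    in_a = sum(1 for p in sample if p["group"] == "A")
    print(f"{name:<11} n = {len(sample)}, group A = {in_a}")
```

Group A is 20% of this population but half of the stratified sample. That oversampling is why the stratified results have to be weighted before making statements about the whole population.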
Review the terms on pages 84-85 of the textbook, I will not retype them
all in these notes.
Some margin of error problems:
Suppose I did a sample of 400, selected from the 7,357,218 people
living in New Jersey. What is the margin of error?
M = 1/SQRT(N). N is the sample size, not the population size.
N = 400. Sqrt of N = 20. 1/20 = .05 or 5%. If I find that 42%
agree, that is my sample "statistic." The population parameter
is the true value, and I would say that I am 95% sure (my confidence
level) that the parameter is between 42% - 5% and 42% + 5%.
The true value should be between 37% and 47%.
Suppose I go to 1000, what is my margin of error?
M = 1/SQRT(1000). = 1/ 31.62 = .0316
or 3.2%. The confidence interval is between 38.8% and
45.2%.
This applies to statements made about the whole sample. 42% of
the respondents said yes, the margin of error is 3.2%.
For statements about a subgroup, the N is the number of people in that
sub group (genders, races, sports fans).
We have a sample of 1200, of whom 800 are white, 300 are black and 100
are Hispanic. 57% of the Hispanics said yes to the item.
What is the
margin of error for this percent? Since it says "of the
Hispanics" our
N is the number of Hispanics, or 100. M = 1/SQRT(100) = .10
or 10%.
For the black respondents, our margin of error is M=1/SQRT(300).
= 1 / 17.32 = .0577 = 5.8%
For the white respondents M = 1/SQRT(800) =
.03535 or 3.5%.
How large a sample do I need to get a 5% margin of error, with a
population of 485,321? N = 1/M^2. M must be expressed as a
proportion, not a percent. M = .05. .05 * .05 = .0025.
Sample size = 1/.0025 = 400
Suppose I wish to study the black, white and Hispanic population and I
need a margin of error of 5% for each group. How large a sample do I
need? I need 400 for each of the three groups, or 1,200 in all.
The other thing we need to deal with is margins of error for mean
scores. In a survey of 300 county residents, the mean income is
$45,321. We need to have the standard deviation. The Standard
Deviation is a measure of variation. The standard deviation is
$3521.
M = 2 * sd/sqrt(n). N = 300. 2 * 3521/17.32 = 7042/17.32 =
$406.58.
February 9 - We went over the
results
of the exercise done in class on Wednesday and reviewed how to compute
percents from the observed frequencies. We reviewed the examples
for Exercise 2b in the workbook. We constructed the multivariate
crosstabulation below:
Political Party Affiliation and Opinion on Government Paying for Medical Care
By Political Ideology

            Liberals (N = 459)     Moderates (N = 654)    Conservatives (N = 610)
            Dem    Ind    Rep      Dem    Ind    Rep      Dem    Ind    Rep
Gov Pay     69%    60%    46%      51%    49%    39%      54%    38%    33%
Middle      23%    28%    37%      35%    37%    40%      31%    35%    33%
Help Self   8%     12%    17%      14%    14%    22%      15%    27%    33%
Total       100%   100%   100%     100%   100%   100%     100%   100%   100%
            Chisq = 11, p = .03    Chisq = 7.8, p = .10   Chisq = 21.4, p = .000
The table shows that the relationship
between political party affiliation and support for government funded
medical care is significant for liberals and conservatives.
The difference is smaller and not statistically significant for
moderates.
February 7
Scaling or
index construction is when we use a number of items, such as
questionnaire items, to measure a more general
concept. This can be an attitude, e.g., "conservatism" or
"authoritarianism." Or it can be a category of behavior, e.g.,
"violent crime," which is measured by averaging together the frequency
of a number of crimes. Magazines like to do this to come up with
rankings, e.g., of the best community to live in or the worst one, the
best colleges, etc. Usually this is done just by adding up a
number of characteristics (in which case your
text would call it an "index", although many people still use the term
scale). This gives us a rank order of sorts, and some idea of how
high or low cases are, but we really don't know how big the differences
are or what the scores really mean. What does it mean that Camden
is the "most dangerous city"? Is it really that much more
dangerous than the next most dangerous? Or than an average city?
Most of the measures we call "scales" are really what your textbook
calls indexes, composed of what Trochim calls "response format"
items.
Most tests in college classes are examples. We just add up the
points to measure the general
variable "knowledge of research methods as covered in the first part of
the course." Another approach would be to rank the items from
easy to hard and see which you could do. This is tricky, because
some people can do the hard ones and not the easy ones. When we
make an index or scale, we get measures that can be treated as
interval, even if they are not strictly interval. Scaling methods
can be more precise, but these are not used as often in sociology or
CJ because they are more difficult and the added information is not
always needed.
Attitude scaling methods include Thurstone
and Guttman
Scaling. Likert or
summative scaling is actually a method of "index" construction as
defined in our book. Thurstone scaling tries to get true interval
measurement. Guttman scaling gets ordinal measurement.
Likert or summative scaling or index construction approximates interval
scaling in some ways, but it is hard to know how well. The key
thing is knowing how well the items intercorrelate. If they
intercorrelate well, it works well. It is usually quick and
effective, so it is widely used despite the lack of a rigorous logic.
For an example of true scaling in criminal justice, we could
scale the seriousness
of crimes. There are various methods of
measuring this. Paired comparisons means asking a sample of
people to judge pairs of crimes on their perceived seriousness.
New
Zealand Study on Attitudes to Crime. Crime
Victims United (Oregon).
A very popular
test is the Myers-Briggs
Type Indicator, based on Jungian personality theory. You can
take several free versions of this and related tests online (Wikipedia article).
Several are available from similarminds.
A problem with this is that it sorts people into categories although
the measure is really continuous. This makes it understandable
and it is very widely used. We did an experiment in class with a
short questionnaire and a choice between four organizational
choices. The tabulation of the results, and a chi square test,
are here.
The relationship between the questionnaire results and the
organizational choices was not statistically significant.
February 5 -
Quality of
Measurement - Reliability and Validity.
Reliability - you get the
same thing
over and over. Consistency.
inter-rater
- two different raters get the same answer.
test-retest, if you take it twice the answers are the
same.
internal consistency - are the items on a test
consistent? This can be calculated by looking at the inter-item
correlations. Cronbach's alpha is a statistic that measures
inter-item reliability. Example: correlate the ABORT variables in
the GSS data file. We see that all the correlations are positive
and significant. We can then make an index of them by adding up
scores on the six variables.
Correlation Coefficients
N: 1603   Missing: 1229
Cronbach's alpha: 0.874
LISTWISE deletion (1-tailed test)   Significance Levels: ** =.01, * =.05

             ABORT DEF  ABORT WANT  ABORT POOR  ABORT RAPE  ABORT SING  ABORT HLTH
ABORT DEF    1.000      0.447 **    0.439 **    0.618 **    0.443 **    0.641 **
ABORT WANT   0.447 **   1.000       0.816 **    0.435 **    0.840 **    0.332 **
ABORT POOR   0.439 **   0.816 **    1.000       0.442 **    0.827 **    0.337 **
ABORT RAPE   0.618 **   0.435 **    0.442 **    1.000       0.437 **    0.636 **
ABORT SING   0.443 **   0.840 **    0.827 **    0.437 **    1.000       0.330 **
ABORT HLTH   0.641 **   0.332 **    0.337 **    0.636 **    0.330 **    1.000
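One way to see how Cronbach's alpha summarizes a correlation matrix is to compute the "standardized" version of alpha from the average inter-item correlation. This is a sketch under an assumption: the software that produced the figure in the notes may have used the variance-based formula on the raw items instead, so the two numbers need not agree exactly. The correlations are copied from the six ABORT items above.

```python
# Inter-item correlations for the six ABORT items (upper triangle only).
items = ["DEF", "WANT", "POOR", "RAPE", "SING", "HLTH"]
correlations = {
    ("DEF", "WANT"): 0.447, ("DEF", "POOR"): 0.439, ("DEF", "RAPE"): 0.618,
    ("DEF", "SING"): 0.443, ("DEF", "HLTH"): 0.641,
    ("WANT", "POOR"): 0.816, ("WANT", "RAPE"): 0.435, ("WANT", "SING"): 0.840,
    ("WANT", "HLTH"): 0.332,
    ("POOR", "RAPE"): 0.442, ("POOR", "SING"): 0.827, ("POOR", "HLTH"): 0.337,
    ("RAPE", "SING"): 0.437, ("RAPE", "HLTH"): 0.636,
    ("SING", "HLTH"): 0.330,
}


def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and their mean correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


k = len(items)
mean_r = sum(correlations.values()) / len(correlations)
alpha = standardized_alpha(k, mean_r)

print(f"mean inter-item r = {mean_r:.3f}, standardized alpha = {alpha:.3f}")
```

The result should come out very close to the 0.874 reported with the matrix. The formula also shows why alpha rises when items intercorrelate more strongly or when more items are added to the index.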
Correlation Coefficients
N: 1603   Missing: 1229
Cronbach's alpha: 0.796
LISTWISE deletion (1-tailed test)   Significance Levels: ** =.01, * =.05

             ABORT INDX
ABORT DEF    0.734 **
ABORT WANT   0.858 **
ABORT POOR   0.855 **
ABORT RAPE   0.728 **
ABORT SING   0.859 **
ABORT HLTH   0.645 **
Validity - is it "really" measuring
what it is supposed to measure?
Face Validity - does it look right?
This is often related to fairness, people will object to the use of
measures that do not have face validity even though they may have
predictive validity, e.g., using the frequency of moving as a criterion
for loaning money.
Predictive or criterion validity - does it predict what we want to
predict,
some "true" measure. SAT test predicts college or law or medical
school grades.
Convergent
validity - do several measures give the same result.
Construct
validity - does the measure perform as our theory says it
should.
We use this when we have no criterion.
This is the most difficult, it is used when things are inherently
difficult to measure.
Essentially, it asks whether the
results are consistent with what we would expect based on theory and
past experience.
Camden
schools report. Brim
school report, see pdf page 14 for tables. Story
on Brim with graph.
An example: the
measurement of romantic love.
An example: a study of UFO Abduction
Status.
February 2
Descriptive vs. inferential statistics. Descriptive statistics
summarize a sample or a set of data. Inferential statistics tell you
whether what you find in a sample is likely to hold for the population.
With inferential statistics we generally look for p<.05 - the
smaller the better.
Frequency distribution. Take the numbers, and put them into
ordered categories.
The categories should be the same width, with the possible exception
of the first and the last.
under 18       2  XX
18-25          6  XXXXXX
26-34          0
35-44          9  XXXXXXXXX
45-54         12  XXXXXXXXXXXX
55-64          5  XXXXX
65 and older   2  XX
A bar chart or histogram gives you a graphic picture of the frequency
distribution. 18, 63, 21, 42, 12, 55, ...
What is the average age? Mean - add them all up
and divide by N, where N is the number of cases.
Median - put them in order and take the case in the
middle.
N = 36. Median is in the 45-54 category.
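The frequency distribution above and its median category can be worked out in a few lines. This is a minimal sketch using the counts from the table; the function name is made up for the example, and it simply walks down the ordered categories until the running total reaches the middle case.

```python
# Frequency distribution from the table above: (category, count).
distribution = [
    ("under 18", 2), ("18-25", 6), ("26-34", 0), ("35-44", 9),
    ("45-54", 12), ("55-64", 5), ("65 and older", 2),
]

n = sum(count for _, count in distribution)


def median_category(dist):
    """Return the category containing the middle case of an ordered distribution."""
    total = sum(count for _, count in dist)
    middle = (total + 1) / 2  # position of the middle case (18.5 when N = 36)
    running = 0
    for category, count in dist:
        running += count
        if running >= middle:
            return category


# Text version of the bar chart: one X per case.
for category, count in distribution:
    print(f"{category:<13}{count:>3} {'X' * count}")

print(f"N = {n}, median is in the {median_category(distribution)} category")
```

With 36 cases the middle falls between the 18th and 19th person. The first four categories hold only 17 people, so both middle cases land in the 45-54 category.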
January 31 -
Discussion of designing research
projects. How do we decide what to study? Supplementary
reading
in Trochim on the
structure of research. You may prefer his "hourglass"
metaphor
to the circular one on page 14 of our textbook.
- Selecting a topic. Typical
motives include:
- Finding out something we don't
know. This may include
something local, e.g., what do people in Camden think about the new
Governor's
actions, something that has been unresolved in earlier research,
something
that hasn't been studied because it is new, etc. This is what the
authors of your book mean when they say "research always starts with
wondering."
- Another purpose that motivates
research is proving to other
people that what we "know" is true really is true. This is
"advocacy"
research, and it can be very one-sided and lead to sloppy work.
Often
this involves causal arguments, proving "why" something happens.
This kind of research may not start with "wondering" but with "arguing."
- Answering a question posed to us by
our employer or by a
client, applied research. Here someone else really chooses the
topic.
- Formulating a Research Question.
This means formulating
a "statement" which will involve variables. We have an argument
or
story in mind at this point.
- Defining the Concepts. Usually
not a lot of time goes
into this stage of empirical research, but some people do write
articles
focusing on this, e.g., what does "race" or "poverty" mean, what is the
difference between "sex" and "gender"? An example: Police Crackdowns
and Slowdowns.
- Operationalizing the Concepts. A
lot of effort goes
into this. Quantitative research means you have to measure
your variables and a lot depends on having good measurement.
Sometimes
this is difficult, e.g., measuring "intelligence" or
"liberalism-conservatism"
or "mental illness" or "crime rates (various kinds)". Often we
use
standard measures created by the government agencies that collect
statistics.
- Formulating Hypotheses. This is
usually pretty easy.
There is a distinction between "null hypotheses" and regular
hypotheses,
which is explained on page 13. It means testing the hypothesis
that
your hypothesis is not true. Thus, you hope to "reject the null
hypothesis"
rather than "accept the (regular, not-null) hypothesis". So far
as
I know, there is no word for the opposite of Null, it might be
Substantive?
Type One Error: accepting that a relationship exists when it
doesn't.
Type two: rejecting a relationship when it really does exist.
- Making observations. This is a
major step unless we
just get the observations from someone who already did the work.
- Analyzing the Data. This is
"number crunching,"
running data through the computer. Of course, one can also
analyze
qualitative data from interviews or observations, but today even that
tends
to get quantified (content analysis).
- Assessing the results. This is
really part of the analysis.
If the hypothesis doesn't work out, often researchers go back and
change
the hypotheses and pretend they knew all along what was going to happen.
- Publishing the findings.
This assumes that you are doing
"scientific" or "pure" research; much applied research is actually
distributed
only within the organization that paid for it. This may be done
in
person, with a "PowerPoint" presentation. Refereed
publications:
your paper is sent to other specialists for review to decide if it
should
be published. "Refereed journal." Press
release.
Publication can be online as well as on paper. You publish the
research
so you can get credit, see your name in print, get promoted, and also
so
that you can inform others, and perhaps most important, so that other
people
can criticize or attempt to replicate it. Usually
people replicate
research in the hope of overthrowing it, if you just find the same
thing
as before, there is less interest. This cancels out a lot of the
bias in social research, since there is usually someone with the
opposite
bias to correct it.
Measurement means putting observations into
categories. Usually these categories are given numbers, although
not always.
Sometimes we
do this just to keep track of things, e.g., each American has a social
security number, we have a library number, a student number,
etc. But
often the numbers give us more information than that, e.g., the NJ
driver's license gives height in feet and inches. It also gives
sex
and eye color, which are described in words but could be given
arbitrary numbers. But the numbers given for height are not
arbitrary.
In some sciences, e.g., astronomy, numerical measurement has led to
important insights, e.g., to understanding the motion of the
planets.
This is because our observations can be summarized with mathematical
equations that enable us to predict events.
When we measure something, we need to be clear exactly what the
measure means. Especially when we use a number, we want to know
what
it means. What is a number? It is not so obvious as one
might think.
Bertrand Russell said "A number is the class of all classes similar to
a given class." I.e., all sets of three have something in common,
which we could call "threeness."
Levels of Measurement
The first and most important question is: is the measure
continuous or
categorical? This is
important because continuous variables are required for the use of
statistics such as the mean, standard deviation, correlation and
regression. With continuous measurement we have precise distances
between the items measured, with categorical we just have them sorted
into discrete categories.
If a variable is
continuous,
we can ask whether it is "interval" or "ratio". Both
of these have precise distance measurement between points. In
addition, ratio measures have a logically meaningful zero point.
With ratio measures, we can talk about ratios between variables, e.g.,
say that $50 is twice as much money as $25. With interval
variables, such as Fahrenheit temperatures, we cannot make such a
statement.
If a variable is
categorical,
we can ask whether it is "dichotomous," "nominal" or "ordinal".
These terms are summarized on page 52 of the book.
Dichotomous variables have only two categories. These can be two
natural categories such as "male" and "female" or they can be
artificial "dummy" variables, such as: are you a Catholic
or not. With dichotomies you can use regression and correlation.
Nominal variables have more than two categories, but not in any order
or with a measured distance between them.
Ordinal variables have the categories in a logical order (from
"lower" to "higher").
In answering questions about measurement, give the highest or best
level of measurement that is justified. Any variable that meets
the criteria for a ratio variable also meets the criteria for an
interval variable, but the criteria for a ratio variable are more
stringent so we would say that it is ratio measurement. Any
ordinal variable also meets the criteria for a nominal variable, but if
it meets the criteria for ordinal we say it is ordinal.
It is
important to understand that many variables can be measured at
different levels. Thus I could take height and put it into
categories such as short, medium, tall in which case I would be using
ordinal measurement because they are in order. I could also
measure it in inches or centimeters, which would be ratio
measurement. It is also important to understand that each of the
statistics is appropriate for variables measured in some ways but not
others. Doing percentages and cross-tabulations makes sense for
nominal or ordinal data. Chisquare is for nominal or ordinal data.
Doing correlation or regression or means and standard deviations
requires interval or ratio data. We can make a broad distinction
between categorical (nominal or ordinal) or continuous (ratio or
interval) data. The dichotomy is a special case because we can
use correlation and regression with dichotomies, but we can also do
percentages, cross tabulations and chisquares.
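The point that one variable can be measured at different levels, and that the level determines which statistics make sense, can be sketched in a few lines of Python. The heights and the cut points for "short," "medium" and "tall" are made up for illustration; they are not data from the class.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical heights in inches (ratio measurement); not data from the class.
heights_in = [60, 62, 65, 67, 68, 70, 72, 75]

def height_category(inches):
    """Recode a ratio measure into an ordinal one (cut points are arbitrary)."""
    if inches < 64:
        return "short"
    if inches < 71:
        return "medium"
    return "tall"

# Ratio data: means and standard deviations are meaningful.
print("mean height:", mean(heights_in))
print("std deviation:", round(stdev(heights_in), 2))

# Ordinal data: count cases per category and report percentages instead.
categories = [height_category(h) for h in heights_in]
counts = Counter(categories)
for label in ("short", "medium", "tall"):
    pct = 100 * counts[label] / len(categories)
    print(f"{label}: {counts[label]} ({pct:.1f}%)")
```

Once the heights are recoded, the exact distances between people are lost, which is why the categories get counts and percentages rather than a mean.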
Nominal Measurement. Categories that could be put in any order.
Catholic, Protestant, Jewish, Moslem,
LDS, Buddhist, Episcopalian, Baptist.
Variable one: category of religion. Variable two: denomination.
Mental illnesses (DSM-IV), e.g., adjustment disorder, borderline
personality disorder, paranoid schizophrenic.
Crimes: burglary, assault, murder. What do these
terms mean? Look at the US Criminal Code.
Each individual should go into one and only one category on a
variable, one value on a variable. For example: What
is your
favorite food, we have a long list, but each person is allowed only one.
Sorting people into categories
must be as reliable and accurate or valid as possible. One of the
things we do is evaluate how accurate our measurement is.
Ordinal Measurement. Here we have categories in a logical
order. Very short, short, medium,
tall, very tall. Often we
take continuous variables and make them ordinal.
Income: Under $20,000; $20,000 to $40,000; $40,000 to $60,000;
$60,000 plus.
Interval Measurement: Temperature in Fahrenheit or
Centigrade. 0 degrees is not the absence of heat. How about the day that the
"temperature doubled" in New York City?
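The "temperature doubled" puzzle can be checked with a little arithmetic. This is a minimal sketch with made-up readings of 40 and 80 degrees Fahrenheit. Kelvin is used for comparison because its zero really is the absence of heat, which makes it a ratio scale.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Hypothetical readings: a morning at 40 F and an afternoon at 80 F.
morning_f, afternoon_f = 40.0, 80.0

naive_ratio = afternoon_f / morning_f
kelvin_ratio = fahrenheit_to_kelvin(afternoon_f) / fahrenheit_to_kelvin(morning_f)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```

The Fahrenheit numbers double, but on a scale with a true zero the afternoon is only about 8 percent warmer, so "doubled" is not a meaningful statement for an interval measure.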
Ratio Measurement: Income in dollars: a
continuous numerical value PLUS a meaningful zero point. Height in
inches.
There is a "Levels of Measurement Review Quiz" available on
WEBCT. This quiz is not required and does not count towards the
grade. The correct answers are explained once you take the test.
January 29:
Concepts and
Theories:
By "science" we mean a
field of study that attempts to establish generalizations based on
empirical observation. Establishing generalizations means we need
abstract concepts. This is different from establishing facts
about particular cases as we may do in history or in criminal
investigation. In a criminal investigation, we may ask "who
committed a specific murder" and we work very hard to find that
person. In scientific research, we would say, what factors
determine the frequency of murder in different communities or in
different years. The first helps to solve a case, the second
helps us to formulate policies that may lessen crime in the
future. We may also use the generalizations as guidelines in
solving a particular crime, e.g., usually murders are committed by men
with certain characteristics... But this is risky, and may get us
into legal trouble, particularly if we use racial or ethnic
characteristics, e.g., racial profiling. It may be that cocaine
smugglers are largely Hispanic, for example, but this is of little use
in catching them and may lead us to hassle a lot of innocent people
since the vast majority of Hispanic people are not smugglers.
Establishing general patterns can help us to change policies. An
example is the work of
Florence
Nightingale who used social research to advocate for better
nursing care in the British armed forces during the Crimean War. She
popularized statistical graphics such as the polar area diagram, a relative of the pie chart.
Other fields of knowledge also use concepts; concepts are a part of how
the human mind and perhaps all intelligences work. Philosophy is
largely about analyzing the implications of different concepts.
Mathematics also deals with concepts
because numbers are concepts.
The small
integers are especially important, especially Zero and One (or nothing
and
something). Religion
uses concepts. The Bible says "In the beginning
was
the Word, and the Word was with God, and the Word was God." What does that
mean? The original Greek text uses the word "logos" which means
unit of thought or idea or concept, which is where we begin also, with
concepts. How do we decide if this is a good concept or
not? We may find it fulfilling, spiritually meaningful. We
may find it beautiful. Social science, however, is not much
concerned with that. We are much more mundane, we want useful,
pragmatic concepts. Religious concepts are good if they provoke
spiritual
reflection, as in reciting a Mantra in Buddhism. Literary
concepts are
good if they are beautiful, which social sciences seldom
are. W.H.
Auden's poem
Under Which Lyre
is
an aesthetic attack on social
science and other
applied sciences.
Social science may not appeal to poets, but it is more useful. At
least there are more jobs using social science than writing poetry. In
the social sciences we want concepts that are
parsimonious and
useful and
clearly defined. We
avoid ambiguity and subtlety, traits which literature and religion
may value. We are not, however, looking for concepts that are
logically correct in the way that philosophy is. We
want concepts that help us to make useful discoveries about the
observable world. We like concepts that lead to
falsifiable statements,
which is a key
difference between social science and theology or mathematics.
This is an issue now in the debate about "intelligent design" theory, a
doctrine that claims to be a scientific theory but many say is a
theology in disguise. Is there any evidence that would disprove
this theory? Is the human body intelligently designed or did it
evolve? Why do we have an appendix? Why do men have
non-functional breasts? Why are our backs weak like the backs of
quadrupeds? Why do whales have finger bones in their fins?
Why does the laryngeal nerve in the giraffe go all the way down the
neck and then back up to the larynx?
In social science we have general ideas or theories, which are
statements of relationships between concepts. From these, we make
hypotheses about what we are likely to observe in empirical
reality. We gather data to test our hypotheses, and we change our
theories if the tests do not work out. At least that is how it is
supposed to work!
An excellent example is the work of
Felton
Earls and his colleagues who used a combination of research methods
to
study the causes of urban crime. Their organizing concept was
"collective efficacy".
In real life, many social scientists act more
like lawyers, selecting facts that support their preconceptions.
We are more successful in being objective in our
descriptions than in our
explanations or in our
predictions. We know that the crime rate has been going down for
the last fifteen years or so, but we are not agreed about
why.
The book distinguished "pure" from "applied" and "evaluation"
research. Pure research is motivated entirely by scientific
curiosity, applied research seeks to further a goal. Evaluation
research seeks to determine whether a particular program works or
not.
In testing hypotheses, we can make Type One or Type Two errors.
Type One: accepting a correlation that does not exist. Type
two: Not accepting a correlation that does in fact exist.
There is a trade-off between the two, to the extent that we avoid
making Type One error we increase the risk of Type Two error.
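The trade-off can be made concrete with a small worked example. This is only a sketch under invented assumptions: a coin is flipped 20 times, the null hypothesis is that it is fair (p = 0.5), and the "true" rate when the null is false is assumed to be 0.7. Neither the setup nor the function names come from the textbook.

```python
from math import comb

def binom_tail(n, p, k):
    """Probability of k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical setup: 20 coin flips, null hypothesis p = 0.5,
# and an assumed true rate of p = 0.7 when the null is false.
N_FLIPS, P_NULL, P_TRUE = 20, 0.5, 0.7

def error_rates(threshold):
    """Reject the null when heads >= threshold; return (Type I, Type II) risks."""
    type_one = binom_tail(N_FLIPS, P_NULL, threshold)      # false alarm
    type_two = 1 - binom_tail(N_FLIPS, P_TRUE, threshold)  # missed effect
    return type_one, type_two

for threshold in (12, 14, 16):
    alpha, beta = error_rates(threshold)
    print(f"reject at {threshold}+ heads: Type I = {alpha:.3f}, Type II = {beta:.3f}")
```

Raising the threshold for rejecting the null makes a Type One error less likely and a Type Two error more likely, which is the trade-off described above.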
The null hypothesis is a statement of how things would be if our theory
were not true, generally if there was no relationship between our
variables. Some philosophers believe it is more correct to say
"we reject our null hypothesis" than to say "we accept our hypothesis
as true".
Jan 26: we went through Exercise 2a in the Workbook.
Regression equation. The equation used to predict the Dependent
variable (Y) with the Independent variable (X).
Y = a + b X where:
X is the independent variable
Y is the dependent variable
b is the "unstandardized regression coefficient"
a is the "intercept"
In the example of Hunting and Field & Stream: X = Hunting, Y = Field & Stream, b = 4.659, a = 418.983. (a and b are called parameters; X and Y are variables.)
An example: New Jersey, X = 11.71, actual Y = 416. To predict Y:
Y' = 418.93 + 4.659 * 11.71 = 473.5 for New Jersey
North Dakota, X = 137.93, Y = 1357. To predict Y:
Y' = 418.93 + 4.659 * 137.93 = 1061.5 for North Dakota
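The two predictions can be reproduced with a short Python sketch. One assumption to flag: the notes list the intercept as 418.983 in the parameter line but use 418.93 in the worked predictions. The code uses 418.93 because that value reproduces the stated results. The function and variable names are illustrative only.

```python
# Parameters as used in the worked predictions in the notes.
# Assumption: the intercept 418.93 (not 418.983) is used because it
# reproduces the predicted values the notes report.
A_INTERCEPT = 418.93
B_SLOPE = 4.659

def predict(x):
    """Predicted Y' = a + b * X."""
    return A_INTERCEPT + B_SLOPE * x

# State: (X = Hunting, actual Y = Field & Stream), values from the notes.
cases = {"New Jersey": (11.71, 416), "North Dakota": (137.93, 1357)}
for state, (x, actual_y) in cases.items():
    predicted = predict(x)
    error = actual_y - predicted  # how far the actual value is from the line
    print(f"{state}: predicted {predicted:.1f}, actual {actual_y}, error {error:.1f}")
```

In both states the actual value differs from the predicted one; the equation gives the value on the regression line, not the observed value.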
January 24:
The Null Hypothesis: the hypothesis that our variables are not
related. Suppose we are interested in the relationship between
party affiliation (Democrat or Republican) and opinion of George Bush
(approve or disapprove). What would our "null hypothesis"
be? That the percent of Democrats approving Bush is the same as
the percent of Republicans approving. (This is NOT to say that it is 50%; it
could be any percent, but the null hypothesis is that it is the same
in both groups.)
Type One and Type Two Error. Type One, means you say
something that is wrong. Type Two, you fail to say
something that is true.
Suppose we have 410 respondents in a survey and 160 approve of Bush
while 250 do not. What percent approve of Bush? Suppose our
sample has 300 Democrats and 110 Republicans. What percent are
Democrats?
Suppose we put these in a table, what would we have:
| | Democrats | Republicans | Total |
| Approve | | | 160 |
| Disapprove | | | 250 |
| Total | 300 | 110 | 410 |
What would our null hypothesis percentage table look like?
| | Democrats | Republicans | Total |
| Approve | 39.0% | 39.0% | 39.0% |
| Disapprove | 61.0% | 61.0% | 61.0% |
| N = | 300 | 110 | 410 |
What would our (null hypothesis) expected frequency table look like?
| | Democrats | Republicans | Total |
| Approve | 117.1 | 42.9 | 160 |
| Disapprove | 182.9 | 67.1 | 250 |
| Total | 300 | 110 | 410 |
Observed Frequencies
| | Democrats | Republicans | Total |
| Approve | 110 | 50 | 160 |
| Disapprove | 190 | 60 | 250 |
| Total | 300 | 110 | 410 |
Question: Assume this is a random sample. Is the difference
big enough to exceed random chance? Is it "statistically
significant"?
Chisquare is a test of "statistical significance" - an
inferential statistic. It tests whether we can generalize from
our sample. It is dependent on the sample size.
Cramer's V is a measure of the "strength of the relationship"
- a descriptive statistic. It describes the sample
data. It is independent of sample size.
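As a rough check on the party and approval example, the expected frequencies, chi-square and Cramer's V can be computed directly from the observed table. This is a plain-Python sketch using the standard textbook formulas; it is not the class spreadsheet, and the variable names are illustrative.

```python
from math import sqrt

# Observed frequencies from the notes: rows are Approve / Disapprove,
# columns are Democrats / Republicans.
observed = [[110, 50],
            [190, 60]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency under the null hypothesis: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Cramer's V rescales chi-square by N and the smaller table dimension minus one.
k = min(len(observed), len(observed[0]))
cramers_v = sqrt(chi_square / (n * (k - 1)))

print(f"chi-square = {chi_square:.2f}")
print(f"Cramer's V = {cramers_v:.3f}")
```

For a 2 by 2 table there is one degree of freedom, and the conventional .05 critical value is 3.841. A chi-square of about 2.61 falls short of that, so the difference would not be called statistically significant. Cramer's V of about 0.08 indicates a weak relationship in the sample.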
The correlation coefficient is a measure of the strength of
relationship (usually between continuous variables). It varies
from -1 to 0 to +1. Zero means no relationship. One means a
perfect relationship, either positive or negative. When
comparing two correlations, the one with the largest absolute value is
strongest.
You can see examples of these with the
Percents,
Expected Frequencies and Chi-Square Calculator (an Excel
spreadsheet).
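The rule about comparing correlations by absolute value can be illustrated with a small sketch. The data are invented for the purpose, and `pearson_r` is a hand-written helper using the usual formula for the correlation coefficient, not a function from Microcase.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, invented only for illustration.
hours = [1, 2, 3, 4, 5]
score_up = [2, 4, 5, 4, 5]      # tends to rise with hours
score_down = [9, 7, 6, 4, 1]    # falls steadily with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(f"r_up = {r_up:.3f}, r_down = {r_down:.3f}")

# Strength is judged by absolute value, so the negative one is stronger here.
stronger = "r_down" if abs(r_down) > abs(r_up) else "r_up"
print("stronger relationship:", stronger)
```

The negative correlation is the stronger of the two because its absolute value is closer to one; the sign only tells us the direction of the relationship.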
January 22 - We will begin our discussion of the textbook with
Chapter 6 on Basic Research Design because it gives
a good introduction to the kinds of research social scientists actually
do. Research design is how research is organized or
structured to
accomplish
different ends. The book discusses four "basic" types of
designs. The "Review Glossary" on page 124 is a good place to
find a brief description of each.
- The experiment - subjects are recruited to be exposed
to a hypothesized causal factor, called the "independent
variable". They are assigned at random to experimental and
control groups. The effect of the independent variable on a
hypothesized effect or "dependent variable" is measured. This is
the best method for establishing causal relationships, so long as you
can set up an experimental situation that is sufficiently close to real
life.
- Survey Research - A standardized set of questions is asked
to a representative sample of people. Very widely used because it
is quick and efficient and gets good information about attitudes and
behaviors that people are aware of and are willing to tell us
about.
- Field Research - We go out into the world and observe what
actually
goes on. This gets at real behavior in its real setting, with the
only difference being the presence of the researcher.
- Aggregate or Comparative Research - We analyze statistics
collected
by government or other organizations. This depends on the quality
of the data. Very widely used in criminal justice because the CJ
system collects a great deal of data. It is often referred to as
the COMPSTAT method.
Field research often generates only descriptive data in words, although
some field studies count things as well. The other three methods
generate quantitative data. Sometimes this data is
categorical: we count the number of cases in discrete categories,
e.g., the number of men and women, the number of Republicans and
Democrats, the number of burglaries, assaults, etc. When we have
this kind of data, we usually compute percentages. A frequency
distribution gives the frequencies for one variable, e.g., the number of
men and women. A cross-tabulation gives the frequencies for two
or more variables at once, e.g., the number of men who are Democrats,
the number of men who are Republicans, etc. The simplest
cross-tabulation is a 2 by 2 table, with two values for each
variable. For this example, the variables are gender and opinion on an issue,
each of which has two values. The data are as follows:
25 men agreed
17 men disagreed
65 women agreed
30 women disagreed
The first thing we do is put them in a two-dimensional table
and compute the row totals, the column totals and the grand
total. We generally put the "independent" or "causal" variable
in the column, the "dependent" variable in the row, as follows:
| Observed or Obtained Frequencies | Men | Women | total |
| Agree | 25 | 65 | 90 |
| disagree | 17 | 30 | 47 |
| total | 42 | 95 | 137 |
For each cell, we can compute three kinds of percentages. We
must learn to use each of them correctly in a sentence. The key
to this is the "of the" clause that occurs after the
percentage. The "of the" clause gives the base
of the percent.
Column percent: To
get the column percents, we divide the cell frequencies by the
column total, then multiply by 100 to get a percent. Thus, if I
ask, "What percent of the men
agree?" the answer is 25/42 * 100
= 59.5%. The base of this percent is the number of
men. This is a column percent because the men are in a column.
Row percent: If I
ask, "What percent of
those who agree are men?" the answer
is 25/90 * 100 = 27.8%. The base of this percent is
the number of people who agree. This is a row percent because the
people who agree are all in a row.
Total percent: If I ask,
"What percent of
the respondents are men who agree?" the
answer is 25/137 * 100 = 18.2%. The base of this percent is
the total number of respondents. This is called a total percent
because the base is the total number of people.
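The three kinds of percent for the "men who agree" cell can be reproduced with a short sketch. The frequencies are the ones given in the notes; the dictionary layout and the `pct` helper are just one convenient way to set it up.

```python
# Observed frequencies from the notes: columns are gender, rows are opinion.
table = {
    "agree":    {"men": 25, "women": 65},
    "disagree": {"men": 17, "women": 30},
}

row_totals = {opinion: sum(cells.values()) for opinion, cells in table.items()}
col_totals = {
    gender: sum(table[opinion][gender] for opinion in table)
    for gender in ("men", "women")
}
grand_total = sum(row_totals.values())

def pct(count, base):
    """Percentage of `count` out of `base`, rounded to one decimal."""
    return round(100 * count / base, 1)

cell = table["agree"]["men"]
print("column percent (of the men):", pct(cell, col_totals["men"]))
print("row percent (of those who agree):", pct(cell, row_totals["agree"]))
print("total percent (of all respondents):", pct(cell, grand_total))
```

The same cell count of 25 gives three different percentages because only the base changes: 42 men, 90 people who agree, or 137 respondents.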
January 19 - We explored the Microcase software. Many of
the examples were from the Introducing Microcase exercise in the
workbook.
January 17 - Discussion of the syllabus, the use of WEBCT and
Microcase. If you are joining the class late and need help, check
with the teaching assistant Robyn DuFrain
byn@camden.rutgers.edu