Sweet/Swett DNA project
A whole new field of genealogical research is now opening up, thanks
to modern science. It relies on the simple biological fact that the
Y chromosome, unlike all others, is passed intact and virtually
unchanged from father to son down through the generations. It is
therefore possible to do a simple DNA analysis on two men and learn
immediately (well, actually the analysis takes a few weeks) whether the
two are related on the paternal side or not. In particular, by testing
direct male-line descendants of the early Sweet
immigrants to the New World, we can learn whether the immigrants were
related or not. Obviously, it is necessary to test several such
descendants for each immigrant in order to be really sure of the
results, but that's all it takes. The testing so far indicates that the
various
Sweet immigrants were from different families, and so it is now possible to
do two things that Sweet genealogists previously only dreamed of:
(1) any living male Sweet who is having trouble connecting
back to the era of the immigrants can take the DNA test and find out
which group is his, and (2) testing Sweets in England or elsewhere can provide
direct proof of which Sweet immigrants are related to which Sweets
in the old country. Needless to say, it may take some time to build
a catalog of all the distinct Sweet families, but every new test brings
us closer to the goal.
For an introduction to the field of DNA-assisted genealogy, view
Thomas Roderick's write-up. In case you want just the shortest
possible description, here it is: the test measures the lengths of
a small number of specific sequences (normally called loci or markers)
on the Y chromosome. These sequences, like most of the Y chromosome,
don't have any known genetic function, but comparing the lengths from
different test subjects can reveal how closely they are related.
Note: the test is not designed to reveal any physical
characteristics or innate tendencies. The reason it works for our
(genealogical) purposes is that the observed changes in sequence
length are neither harmful nor helpful; they simply happen now and
then, and they persist because the body doesn't notice the difference.
These persistent-yet-changeable lengths allow us to tell families
apart.
Since this test applies to the Y chromosome, the test subjects have
to be male and, in particular, have to have the surname Sweet or similar
(with a few exceptions due to adoptions, name changes, and such).
If you are interested in helping the study, but are
not a potential testee yourself, here's what you can do. Basically, there is
a list of prospects, and you just have to work your way down the list
until you find one that works...
Lots of researchers focus on their own ancestors, so that the
"and so on" may require research you haven't done yet, but it's still
something that should be within reach if you start working on it.
Note: even if you can't find a male-line cousin to take the test,
you can help the project in another way -- by making a contribution
to our General Fund that is used to help pay for the testing of
those who are not so well-off. (See below.)
The goal in all this is to come up with (collectively) at least two
male-line descendants of each identifiable Sweet "founder," preferably via
at least two different sons of the founder. Assuming the
DNA test results agree for the documented descendants of the
progenitor, we can "reconstruct" the haplotype
(DNA pattern) for that progenitor
and then compare against the haplotypes of other progenitors to see
if they were related. It's really that simple. Consider, for example,
the two John Sweets who came to Massachusetts in the early days and
both started out in what became Essex County. There is evidence
that the one who settled in Newbury spelled his name both Sweet and Swett, but
always pronounced it Swett, and it seems likely that many modern Swetts
in the US are descended from him. The other one settled in Salem but
soon relocated to Rhode Island (or, at least, his family did -- the
founder's death is not recorded, and so it's not clear whether he actually
moved or not). At any rate, many people have assumed these two were cousins,
but nobody has any proof. A DNA study can settle, once and for all,
whether the two were indeed related, thereby moving the
whole question from the realm of speculation to the realm of fact.
(Read on for the answer to this old conundrum.)
Another example is the line of Sweets in Attleborough, Massachusetts.
The earliest confirmed ancestor of this line is Henry Sweet, who
first appeared in the public record at the time of his marriage in
1687. Although his descendants lived near the Rhode Island Sweets,
there is no known connection other than geographic. A DNA study can
demonstrate that a family connection does or does not exist. Similarly,
there are many other early Sweets in or near Rhode Island who are
assumed to belong to the Rhode Island branch, often on the basis of
incomplete or circumstantial evidence. Of course, a finding that all
such groups share the same Y chromosome would not, by itself, prove
that all are descended from the immigrant John Sweet.
Other variants of the name include SWITT, SWEATT, and SWEAT. It remains
to be seen how and whether these are related.
We have arranged with FamilyTree
DNA (FTDNA) to offer a reduced, group rate of
$99 plus shipping per 12-locus DNA test to members of our project
or $124 plus shipping per 25-locus test. (There is also a group
rate for the 37- and 67-locus tests, but there is no evidence yet that we
need those tests, except in rare cases.)
The test kit is very simple and comes in the mail with
complete instructions: basically, it contains three swabs to
be rubbed on the inside of the mouth to collect loose cells. The
swabs are then popped into a preservative and mailed back to the lab.
The kit comes with an optional release form that requests FTDNA to
give your email address to any present or
future project participant who matches you exactly on the DNA test. If you
decide not to sign the release form, or simply forget to return it,
your privacy will be absolutely
protected, and FTDNA will not notify you or anyone else about matches
with your DNA. There is also a space where you can write down the country
of origin of your ultimate ancestor -- this is optional
and has no bearing on the present study.
For more information contact our project coordinator:
John Chandler.
FTDNA has established a fund to be used within our project to help
defray the costs of DNA testing. The company matched the first
$200 of contributions to this fund, but we are on our own now.
The intent of this fund is to secure the participation of potential
testees who seem likely to contribute to the success of the project
as a whole and who otherwise could not (or would not) join.
If you would like to make a donation, please visit this site:
In Table 1, 4-, 5-, or 6-digit ID's
refer to FTDNA results; a prefix of "N" before the ID means the testing
was done originally as part of the Genographic project
(see below);
a row beginning
with a name is the inferred haplotype of the family patriarch.
ID's with a mixture of lower-case letters and digits refer
to test results from outside sources.
Results discovered in the SMGF database are
designated as "sm" followed by a number, for example, "sm10".
Other results, located in
the Ysearch database, are
designated by the Ysearch ID. Results from other labs, contributed
by the participants, are uploaded to Ysearch
to make them publicly available outside of the circle of this project.
At present, we are still "exploring the territory." Until recently,
all of the samples tested belonged to the same general group (known as
HAPLOGROUP R1b, the most common such group in western Europe). Even
now, there are only a few exceptions, and most of these are
from haplogroup I1, the second most common in that area. However,
we also have one from haplogroup E3a, which is more common in Africa.
Within the project, we can distinguish five subgroups or patterns
confirmed by multiple samples and
corresponding to known 17th-century ancestry, but there are also a
considerable number of samples with no close matches. For the time
being, the unmatched "R1b" samples are displayed along with Pattern 1,
since that is the largest group, and the non-matching, non-R1b samples
are placed as "Other."
Within each of the first two subgroups,
there is an evident "majority" pattern, and markers that
differ from the majority are colored gray in the table. Note that
DYS389ii is tabulated as reported by the testing lab, but that length
actually includes two pieces, one of which is already reported as
DYS389i. We therefore use the differences between "ii" and "i" for
the purpose of comparison in the column marked "389ii". Another
complexity comes into play with DYS464, which appears four times in
the genome. Since we cannot tell the four instances apart, they are
conventionally reported in order of increasing size. In principle,
when comparing two haplotypes, one could check off any matching
instances within DYS464 as true matches, even if they appear in
different columns, but that practice leads to a trade-off between the
number of matching markers and the sizes of the mismatches. Note that
this same ambiguity also appears with DYS385, DYS459, and other multi-copy
markers, but to a lesser extent.
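The two adjustments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the marker values are invented rather than taken from Table 1, and both DYS464 lists are assumed to have the same number of copies.

```python
from collections import Counter


def dys389_second_segment(dys389i, dys389ii):
    """Return the part of DYS389ii that is not already counted in DYS389i."""
    return dys389ii - dys389i


def compare_dys464(copies_a, copies_b):
    """Compare two sets of DYS464 copies in the two ways described in the text.

    Column by column: sort both sets and sum the step differences.
    Checked off: pair equal values regardless of column and report
    whatever copies are left over on each side.
    """
    a, b = sorted(copies_a), sorted(copies_b)
    column_steps = sum(abs(x - y) for x, y in zip(a, b))
    shared = Counter(a) & Counter(b)  # multiset intersection
    left_a = sorted((Counter(a) - shared).elements())
    left_b = sorted((Counter(b) - shared).elements())
    return {
        "column_steps": column_steps,
        "matched_copies": sum(shared.values()),
        "leftover": (left_a, left_b),
    }


# Invented values, chosen to show the trade-off.
segment = dys389_second_segment(13, 29)
result = compare_dys464([14, 15, 16, 17], [15, 16, 17, 18])
print(segment)  # 16
print(result)
```

In this invented case the column-by-column view shows four mismatches of one step each, while the checked-off view shows three matching copies and a single mismatch of four steps (14 against 18). That is the trade-off between the number of matching markers and the sizes of the mismatches.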
When comparing haplotypes, there are two very different possible
contexts. In the general context, when the two persons are not known
to be related, and they may have nothing in common besides their surname,
the only relevant information may simply be the two haplotypes.
The degree of the relationship, if any, is the count of generations
from person "A" back to the most recent common ancestor and from there
down to person "B". When the two persons are in the same generation,
their degree of relationship is just twice the time (measured in
generations) from their common ancestor. This time is generally
abbreviated as TMRCA.
How do we get from the haplotypes to the relationship or TMRCA?
It is important to remember that we can never deduce the
relationship exactly from just the DNA testing.
However, there is a fairly simple procedure
for getting an approximate answer. By counting up all of the differences,
we get what is called the "genetic distance" and we can estimate the
TMRCA, given the number of markers compared and the average rate of
mutation for those markers.
Note that the
genetic distance as described here is an over-simplification, since
two discrepancies of one step each do not really count the same as
one discrepancy of two steps. Even so, it gives a useful guide. When the
genetic distance is zero, the most likely TMRCA is also zero. (To
put it another way, when two samples match exactly, we would suspect
they were taken from the same person if we didn't know better.) When
the genetic distance is one step out of 25 markers with an average
mutation rate of 0.0023 per generation, the most likely relationship is 17
generations counted up and back. Of course, this is only an estimate!
There is an additional conversion from generations to years which depends
on the average generation length. We use a round number
of 30 years per generation in this discussion.
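As a rough illustration of the arithmetic, the sketch below counts the genetic distance and converts it to a most likely relationship. It assumes a simple Poisson model in which the most likely number of generations is the distance divided by (markers times mutation rate); that assumption reproduces the figure of 17 generations quoted above, but it is not necessarily the method behind the project's estimates. The function names and the sample haplotypes are hypothetical.

```python
YEARS_PER_GENERATION = 30  # round number used in the text


def genetic_distance(hap_a, hap_b):
    """Sum the absolute step differences over markers present in both haplotypes.

    Returns (distance, number_of_markers_compared).
    """
    shared = sorted(hap_a.keys() & hap_b.keys())
    distance = sum(abs(hap_a[m] - hap_b[m]) for m in shared)
    return distance, len(shared)


def most_likely_degree(distance, n_markers, rate_per_generation):
    """Most likely count of generations 'up and back' between two test subjects.

    Assumption: mutations follow a Poisson model, so the likelihood peaks
    where the expected number of mutations equals the observed distance.
    """
    return distance / (n_markers * rate_per_generation)


# Invented haplotypes on three markers, differing by one step on DYS439.
a = {"DYS439": 12, "DYS449": 29, "DYS389i": 13}
b = {"DYS439": 13, "DYS449": 29, "DYS389i": 13}
example_distance, example_markers = genetic_distance(a, b)
print(example_distance, example_markers)  # 1 3

# The case from the text: one step in 25 markers at 0.0023 per generation.
degree = most_likely_degree(1, 25, 0.0023)
tmrca_generations = degree / 2  # both subjects in the same generation
tmrca_years = tmrca_generations * YEARS_PER_GENERATION
print(round(degree), round(tmrca_years))  # 17 261
```

With a distance of zero the same formula gives zero generations, in line with the remark above about exact matches.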
The other context for comparing haplotypes occurs when
there is a known relationship between the
two persons being compared. Then, the question under consideration is
whether the DNA results are consistent with the conventional
genealogy. This question is not as simple as it may seem, since
the expected answer is "yes" or "no," but the real answer is a
statistical one.
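One way to picture the statistical answer is to ask how often a given number of differences would turn up if the paper genealogy were correct. The sketch below assumes the number of mutations follows a Poisson distribution with mean equal to (degree of relationship) times (markers) times (mutation rate). It ignores back-mutations and differences in rate between markers, and the function name is hypothetical.

```python
import math


def prob_at_least(steps, degree, n_markers, rate_per_generation):
    """P(observing >= steps mutations) for a known degree of relationship.

    Assumption: the number of mutations is Poisson with mean
    degree * n_markers * rate_per_generation.
    """
    mean = degree * n_markers * rate_per_generation
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(steps))
    return 1.0 - below


# Invented case: a common ancestor five generations back for both subjects,
# so the degree of relationship is 10, compared on 25 markers.
probabilities = [
    prob_at_least(steps, degree=10, n_markers=25, rate_per_generation=0.0023)
    for steps in range(4)
]
for steps, p in enumerate(probabilities):
    print(f"{steps} or more steps: {p:.3f}")
```

Under these assumptions, a small number of differences between documented cousins is unremarkable, while a large number becomes increasingly hard to reconcile with the documented relationship. Where to draw the line is a matter of judgment.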
The haplotypes now in hand do include four 12-marker exact
matches. The largest group of these matches forms the majority of
Pattern 1, as shown in Table 1, and includes John SWEET of Salem.
Nearly all of these now have 25-marker results, and
a clear majority
match 25/25 as well. We discuss the 25-marker results in the next section.
Four other results are only one step away
from this bunch of 12-marker matches.
Three of these, 13407, 23714, and 60265,
all have the same one-step mutation and
therefore constitute a second group of exact matches within Pattern 1.
Although
that result could be a mere coincidence, it seems more likely that these
test subjects share a common ancestor who passed that mutation on
to all three of them. Since 23714 is the one with the nearest "brick wall,"
this shared mutation may be a vital clue in tracing his ancestry. On
the other hand, all three have now extended their tests to 37 markers and
found two discrepancies at that level (both discrepancies on a pair of markers
with a high mutation rate). This result leaves open the question of
whether their shared mutation on DYS439 is a true link or just a
coincidence.
Besides these matches, we have four more, one making up the
majority of Pattern 2 and one each in Patterns 3, 4, and 5.
As the project grows, we
can expect more matches, since there are now many unique haplotypes,
and each one may find kin at any time.
Within Pattern 1, there is quite a variety of haplotypes, but most
of them, when considered in pairs on the first 12 markers, are not so very
different as to positively preclude them from sharing a common Sweet ancestor.
For example,
based on the statistics of other DNA studies, and counting only the
first 12 loci, the estimated date of
the most recent common ancestor for the cluster and 9678 is about five
centuries ago, give or take four centuries (and the same for 9678 and 15624
or for 9678 and 16328);
for 9678 and 9958, it is about eleven centuries ago,
give or take seven; for 16328 and the cluster of matches,
it is about nine centuries, give or take five.
Unfortunately, as the previous paragraph shows, testing only 12 loci
leaves a rather broad uncertainty in the estimated relationships among
members of a populous haplogroup. Even the distinctively different
Pattern 2 is only three steps away from Pattern 1 on the first 12. We
get a much clearer picture with 25 loci, as discussed below, and it
becomes clear that some of the Sweets in Pattern 1 are unrelated to
others within a genealogical time frame.
Because of the technology and economics of DNA testing, it is customary to
measure several loci in one procedure, called a multiplex. A combination
of one or more multiplexes, offered as a "package" test, is often called
a "panel." Thus, the first 12 loci constitute FTDNA's first panel, as
well as the entry-level test. The next 13 constitute the second panel,
and these two panels together make up the next-higher-level test. There
is no option to test the second panel without the first. (It may be
possible to test the second panel "alone" at some combination of other
labs, but inter-lab standardization is one of the weak points in the
DNA testing industry.) We recommend that participants in this project
take the 25-marker test.
The tight bunch of exact matches opens up a little when we consider
25 markers, though a majority of these results still agree exactly at 25/25.
For simplicity, we refer to the shared 25-marker haplotype of 9866
and the other exact matches as "the cluster."
12256, 33402, 47939, 60265, and 105761 are only one step away from the cluster,
while 18443, 28741, 30688, and 37161 are two steps.
9579, on the other hand, is seven steps from 9866 and eight from
12256. (However, 9579 may be a special case due to a rare type of
mutation that affects several markers at once.)
The three samples mentioned above (13407, 23714, and 60265)
as being close in terms of 12
markers are also very close to the cluster (only one
step in 25 markers) and almost as close to 12256 (two steps). In
addition, these three are an exact match to each other at 25 markers.
Our conclusion is that the cluster members have inherited the
ancestral haplotype of this group, while 12256 and others have
inherited one mutation each; 18443 and others have inherited two; and so on.
In general, the most common haplotype
found in a lineage group is indeed the haplotype of their common
ancestor.
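The reasoning above amounts to taking the most common complete haplotype in a lineage group as the presumed ancestral pattern and counting each member's steps away from it. A minimal sketch follows, using invented kit names and only four markers. The function names are hypothetical, and a tie for most common is not handled specially.

```python
from collections import Counter


def majority_haplotype(haplotypes):
    """Return the most common complete haplotype among the samples."""
    counts = Counter(tuple(h) for h in haplotypes.values())
    return counts.most_common(1)[0][0]


def steps_from(haplotype, reference):
    """Sum of absolute step differences, marker by marker."""
    return sum(abs(a - b) for a, b in zip(haplotype, reference))


# Invented kits: three exact matches, one kit one step away, one two steps away.
samples = {
    "kit_a": (13, 24, 14, 11),
    "kit_b": (13, 24, 14, 11),
    "kit_c": (13, 24, 14, 11),
    "kit_d": (13, 25, 14, 11),
    "kit_e": (13, 24, 15, 12),
}
modal = majority_haplotype(samples)
steps = {kit: steps_from(hap, modal) for kit, hap in samples.items()}
print(modal)  # (13, 24, 14, 11)
print(steps)
```

Here kit_a, kit_b, and kit_c play the part of the cluster, kit_d has inherited one mutation, and kit_e has inherited two.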
The first two participants to join
this project, 9579 and 9678, are both within one step
of the cluster as viewed with 12 loci, but
25-marker testing has set them apart (seven steps each from the cluster
pattern and six steps from each other).
These results show them to be very likely unrelated to each other and
to the cluster as well (although six of the seven steps of difference
for 9579 could be explained as a single recombinant event).
Note that neither one has been traced back
by conventional genealogy to the common ancestor of the cluster.
These 25-marker results illustrate one of the pitfalls of DNA testing:
apparent matches based on 12-marker results can be deceptive. The
match on 12 markers, in combination with the fact of
a shared surname, would normally indicate that those testees are
rather closely
related (i.e., on a time scale of hundreds of years or less).
However, the 25-marker results show a much wider variety of haplotypes
within Pattern 1,
with differences indicating a separation of perhaps a
thousand years or more. How can such a discrepancy occur? The
answer is in the relatively small number of loci tested and the
relatively slow rate of mutation. All the time estimates carry a
large uncertainty. It is still possible that these testees are
a bit less distant than they now seem, but it does appear that we have
different groups of Sweets with the same or similar 12-marker haplotypes.
We are thus unsure of exactly how they are related. All we can say
is that they are probably not
descended from the same immigrant Sweet. When we have
consistent results from
documented descendants, or a clear consensus of testees on a regional
basis, the whole picture will be clearer.
The TMRCA estimates change in two ways when we consider 25 loci: first,
some of the estimated times become much longer, and, second, the
uncertainties in the estimates are smaller (but not tight enough
to be considered "precise").
Counting all 25 loci, the estimated time has jumped to about 14
centuries for 9579 and the cluster, give or take five centuries.
(If the differences are interpreted as due to recombination, however,
the estimate becomes six centuries, give or take four.)
Depending on whether the true date falls near
the beginning or end of these still rather broad ranges, the similarity
of the DNA could be either a coincidence having nothing to do with the
shared SWEET surname or the direct result of descent from a common
SWEET ancestor.
As is clear from the shading, the samples shown at the end of Pattern 1
are all very different from each other and from the cluster. For these,
taken against each other in pairs, or taken one at a time against
the cluster, the estimated dates of common ancestor
are all more than a millennium ago, with only two exceptions.
First, the estimate for 9678 and 15624 is only nine centuries back,
give or take four. These two might therefore be related within
a genealogically useful time frame, but it is by no means certain.
The other exception is 22528, who matches 9958 11/12 on the first panel
of markers and matches 9579 13/13 on the second panel. The estimate
for 22528 and 9579 based on all markers is also nine centuries.
All the other pairs are probably not related within genealogical time.
Because of the striking coincidence that 9579 matches the cluster
perfectly on the first panel and matches 22528 perfectly on the second,
we considered the possibility of a lab mix-up. However, FTDNA
obligingly reran several tests, including the second panel for 9579
and the first panel for 22528, and confirmed the original findings.
This leaves us with the striking coincidence intact. Because of the
first-panel agreement between 9579 and the cluster, we might argue
that his second-panel results are due to a recombinant event that
simultaneously equalized his multi-copy DYS459 and DYS464 markers, but
the same argument cannot be made for 22528, since his first panel
differs sharply from the cluster. This combination of results is one
of the cases where 37-marker testing could be very useful in
illuminating the relationships.
22528 is interesting for another reason. As shown on the lineage
page, he and 18443 claim a common ancestor (Eber Sweet) five
generations ago. However, their DNA does not match. Therefore, we
must conclude that there is an error somewhere in the line for at
least one of them. Additional testing has added to the picture,
however. 30688 traces back to a supposed brother of Eber,
and, although the initial report of the results for 30688
showed a substantial difference between him and 18443, a subsequent
correction reduced the difference to two steps. What's more, they
share a mutation at DYS449 that sets them apart from the rest of the
Pattern 1 cluster. The
genealogical evidence linking the supposed brothers is only
circumstantial, but this shared mutation goes a long way toward
confirming the close tie.
In the meantime, another descendant of Eber Sweet has now taken
the DNA test, and his results are illuminating. He is only one step
away from the ancestral Pattern 1 haplotype, and that one step of
difference is the same shared mutation in 18443 and 30688. This comparison
supports the theory that all three are closely related.
If another descendant of
Elijah could be recruited, the additional comparisons could help to
confirm this theory. That still leaves the question of 22528, whose
DNA does not match the others. Perhaps more testing of Eber
descendants would pin down the point of departure.
Since then, another member of the project has turned up with
the same mutation at DYS449 shared by the three men discussed above.
This result suggests that the new member, N44777, may be closely related
to the other three.
Pattern 5 is at present only that -- a DNA pattern. There is no known
genealogical connection between the two members of this cluster, and,
in fact, there is good reason to believe that the nearest possible
connection is four centuries ago. 21864 has traced his lineage back
to Henry SWEET of Attleborough,
probably a first-generation immigrant to New England, although
there is a widespread tendency to confuse this Henry with a
grandson of John SWEET of Salem. Before we can pin down Henry's
haplotype, we must locate another
descendant (preferably from Henry's son John) to confirm the DNA reading.
In the meantime, however, this result is a 24/25 match with that of
134688, whose ancestor immigrated in 1853.
One member has received a very unusual DNA report: instead of
25 markers, he apparently has 29, including four extra copies of
DYS464. Indeed, this situation is so unusual that FTDNA has no
procedure for reporting more than three extra copies, and so the
testee's results were reported with only 28 markers. However, inquiries
sent to the lab about the difficulties of assessing the test for
DYS464 elicited the response that they were very sure and, yes, there
were eight copies, not just the reported seven.
Many participants have now extended their tests to 37 or 67
loci at FTDNA or have obtained results from other labs for loci
beyond the first 25. The cluster of descendants of John
Sweet of Salem, in particular, now has a clear consensus through 37
and the beginnings of a consensus at 67. Indeed, one member of the
cluster agrees exactly with the consensus to 37 (and thus with the presumed
ancestral pattern). Of course, this agreement does not make him in any
sense a closer relative of John Sweet than the others, and in fact he is
currently the most distant by count of generations.
Pattern 2 and Pattern 3 both also have the beginnings of a consensus
to 37 markers, but the consensus in Table 2 for Pattern 2 is rather spotty
because the two bona fide members have tested only 4 markers in common
in Table 2. Between them, they cover all but one of the loci in tables
2 and 3 (50 in all), but we have no way to be sure that all of these
results are ancestral. In contrast, Pattern 3 has only two members, and
both have tested 37 and agree on 36 of the 37.
The shading of Table 2 and Table 3 is done according to the same
rules as in Table 1.
We now have a number of test results for subjects named SWETT, mostly
descendants of an early Massachusetts immigrant,
John Swett of Newbury, who was long thought to
be related to John Sweet of Salem. These Swetts are quite different from
the cluster of Sweets, apparently separated by over three
millennia and in some cases much longer. Among these, we have another
cluster as well as some that stand apart (more about them in a moment).
Within this Swett cluster, 20321 and two others have the central haplotype,
and the others differ by one step each, indicating that
they are all probably related through a common ancestor between 2 and 43
generations ago. Indeed, their documented mutual common ancestor is 9-11
generations back from all of them.
We have, in addition, one member who exactly matches the apparent
ancestral haplotype, but whose surname is not Swett. As it happens,
this person had long been searching for evidence of his biological
grandfather. One (and only one) of the candidates was a Swett.
Therefore, this DNA test, along with the previous documentary
research, almost certainly
confirms the Swett as the true grandfather and provides a line
back to the immigrant John of Newbury.
sm10 is a special case. This test result was discovered in the
Sorenson Molecular Genealogy Foundation (SMGF)
on-line database,
along with a pedigree extending back to John Swett of Newbury. Although
the name of the test subject was suppressed for privacy, it is evident
that this family has spelled the name as SWEAT since the 1700's.
Unfortunately, despite the detailed pedigree, the DNA test
results do not match the others. One possible explanation can be seen
in the pedigree itself, where John SWEAT's birth in 1789 is listed as
being only seven months after his parents' marriage. There is no
reason to focus specifically on this timing irregularity, but it could
be a break in the lineage, and only one break is needed to explain the
DNA discrepancy. It would be helpful to test others in this line,
such as descendants of the younger brothers of John Sweat, to determine
where the break might be. One such test has already been partly done
at SMGF, in the person of sm25, whose pedigree indicates he is a second
cousin once removed to sm10, and whose incomplete DNA results
show him to be close to the ancestral haplotype, matching the consensus
on 16 of the 17 comparable loci. Unfortunately, it
appears that the test of sm25 failed to produce usable results for
many loci, and we may never learn exactly how close he is to the
ancestral pattern. If we assume that the 16/17 match is indicative
of the closeness of the missing results, we must conclude that
the break in the lineage for sm10
is quite recent.
(See the lineage page.)
We have three more special cases. 60470 is from another Massachusetts
SWETT family which has been said to be related to the Newbury line.
However, the DNA testing shows no similarity at all (with a separation
perhaps in the tens of thousands of years). 62552 and 63636 are from a family
that has spelled the name SWEAT and SWETT. All three share the distinction
of belonging to a different haplogroup (see below) from Pattern 1 and
Pattern 2, but 62552 and 63636 are still very different from 60470.
Although the names SWEET and SWETT have long been
considered variants of each other, we must conclude that the
particular instances now in the project are not related. More
precisely, we have found clusters of project members
who are related to each other, but no cluster contains both surname variants.
Another variant of the name is SWEAT. As described above, some families
have switched back and forth between the SWETT and SWEAT spellings, but
it's not yet clear whether the two spellings have different origins.
We have one participant with the SWEAT
surname who ordered a test kit, but he has not sent the kit back to the lab.
Despite that setback, we have one Sweat result anyhow for comparison,
even though the test subject is not a member of our project. FTDNA graciously did
a 12-marker comparison for us between their Sweat customer
and all of our members and reported that no haplotype in our project
(as of 2004 Nov 2) came within four steps of the Sweat's haplotype.
In September of 2005, we gained another SWEAT participant via the
Genographic Project of the National Geographic Society.
Because the pronunciation of SWEAT matches that of SWETT, we might
expect him to match the descendants of John SWETT or Pattern 3, if anybody,
but the fact is that his closest DNA neighbors (all those within four
steps) are SWEETs. We therefore list him with the first group
in Table 1. Based on the 2004 report from FTDNA, we know that he
cannot be within two steps of the original SWEAT customer.
We also have two members named SWEATT who are apparently not related
to any of the SWEAT members, nor to each other.
To show these results in the context of ordinary
genealogy, we have a page with the lineages
supplied by the participants. For the most part, the two types of
evidence agree -- most of the participants whose DNA matches also
share a known common ancestor, and most of those who claim common
ancestry also match. The exceptions are marked with footnotes.
Although the project accepts test results from any lab, most of
the testing so far has been done through FTDNA. Therefore, the
automatically updated web site
provided by FTDNA includes most of the project results. Indeed, it shows them
immediately, as soon as they are returned from the lab, and so you may
wish to visit that site to see the comparison of new results with
those already posted here. The only problems are that the FTDNA site
doesn't indicate the discrepant loci by shading and is
sorted by allele lengths from left to right within each pattern
grouping, sometimes making it awkward to pick out and compare specific kits.
In population genetics, individual haplotypes are classified into
broad categories called haplogroups. A large majority of Europeans
fall into three of these, designated Haplogroup R1b, Haplogroup I,
and Haplogroup R1a
(known as hg1, hg2, and hg3, respectively, in an older nomenclature).
We anticipate that the same will be true for the Sweets. Indeed,
nearly all so far are R1b, with only a few others.
One consequence of this clustering is that some
individuals are seemingly very similar to others, to the extent that
the majority of loci agree. However, to establish relatedness (on a
genealogically interesting time scale) requires that the
results be virtually identical. In any case, since "R1b" and "I" are
both widespread in Europe, the DNA results cannot
generally pin down the ethnic origins
of the Sweet lines in terms of Anglo-Saxon vs. Norman vs.
French vs. "other." R1a, when found in England, is usually considered
to be an indication of Scandinavian origin, but it may also come from
Germany or eastern Europe. (To date, we have not found any R1a in
the project.)
The one exception to this uniformity is the
member whose haplogroup is E, which is rare in Europe, but common
in Africa.
One way to get an idea of where the samples fit into the global scene
is to search for matches in the databases maintained by forensic DNA
researchers. The largest such database,
then known as YSTR,
contained 13,223 anonymous haplotypes from all over Europe as of
November of 2003.
This database tabulates nine (or in some cases ten)
of the twelve basic markers used
by FTDNA. It is thus much less specific than the commercial test,
but it still serves to indicate whether a given haplotype is relatively
common or rare.
(A small portion of the database also includes two extra markers,
one of which belongs to the FTDNA basic set.)
Thus, for example, the haplotype of the cluster (and others) appears in 37
of the 13,223 samples. These matches are spread across Europe from
Spain to Lithuania, with several in England as well.
9678 has even more: 92 matches similarly spread all across western
and central Europe. As it happens, the one-step difference between
the cluster and 9678 places the latter one step closer to the modal haplotype
of "R1b." Also, 9958 falls even closer
to the mode of "R1b," and it has an even higher frequency:
375 matches, again scattered all over Europe.
These high frequencies help explain why
unrelated men, Sweets included, can appear to
be near-matches, or even exact matches, on a limited set of loci.
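To put the match counts quoted above on a common scale, the following short sketch converts them into frequencies within the 13,223-sample database. The counts come from the text; the variable and function names are ours, chosen only for illustration.

```python
DATABASE_SIZE = 13_223  # anonymous European haplotypes as of November 2003

# Match counts quoted in the text, keyed by the haplotype they describe.
match_counts = {
    "cluster haplotype": 37,
    "9678": 92,
    "9958": 375,
}


def frequency(matches, total=DATABASE_SIZE):
    """Fraction of the database that matches a given haplotype."""
    return matches / total


for label, count in match_counts.items():
    f = frequency(count)
    print(f"{label}: {count} of {DATABASE_SIZE} = {f:.2%} (about 1 in {round(1 / f)})")
```

The three frequencies work out to about 0.28%, 0.70%, and 2.84%, so even the rarest of the three turns up in roughly one European sample out of every few hundred.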
In 2004, the YSTR database was reorganized to include samples from
other continents (previously stored separately). It is now possible
to search the global database with a geographic or ethnic focus.
FTDNA maintains three databases of Y DNA results, all searchable
by customers only. One database gives
the haplogroups found for test subjects whose
haplotypes come close to the customer's own haplotype.
Another database gives the claimed
ethnic origins of near matches.
Both of these are anonymous databases containing a
mixture of customers and academic test subjects. The third database is
just customers and includes names and email addresses, but it
reveals these only under certain circumstances. First of all,
for 12-marker testees, it looks only for exact 12/12 matches or one-step-off
near-matches within the same surname group. For
25-marker testees, it looks first for 12-marker-type matches and
then for 24/25 or 25/25 overall and finally for 23/25 (provided that
the two discrepancies are no more than one step each). The scope of
these searches is limited to those who have sent in a signed release
form, but there is an optional limitation on top of that. By default,
each customer is set up for "private" searches, i.e., limited only to
members of the same surname study project. However, that setting can
be changed to "public" to include matches or near-matches among all the
customers who have similarly opted for "public" searches.
Instructions for changing this setting can be found at the
FTDNA web site.
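The matching thresholds described above can be sketched in a few lines. This is only our reading of the rules as stated, not FTDNA's actual code. We assume that a haplotype is a list of integer repeat counts in a fixed marker order, that a "one-step-off" 12-marker near-match means exactly one marker differing by one repeat, and that a single discrepancy at 24/25 may be of any size, since the one-step condition is stated only for 23/25. The surname-group and release-form restrictions are left out, and all function names and sample values are hypothetical.

```python
def step_differences(a, b):
    """Sizes (in repeats) of the differences at each mismatching marker."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same markers")
    return [abs(x - y) for x, y in zip(a, b) if x != y]


def matches_12(a, b):
    """Exact 12/12 match, or 11/12 with the single difference being one step."""
    diffs = step_differences(a[:12], b[:12])
    return diffs == [] or diffs == [1]


def matches_25(a, b):
    """25/25 or 24/25, or 23/25 when both discrepancies are one step each."""
    diffs = step_differences(a[:25], b[:25])
    if len(diffs) <= 1:
        return True
    return len(diffs) == 2 and all(d == 1 for d in diffs)


# Invented 25-marker haplotype and variants (not real results).
base = [12 + (i % 5) for i in range(25)]

one_off = base.copy()
one_off[20] += 1            # 24/25

two_single_steps = base.copy()
two_single_steps[3] += 1    # 23/25, each discrepancy one step
two_single_steps[20] -= 1

one_double_step = base.copy()
one_double_step[20] += 2    # 23/25, but one discrepancy is two steps
one_double_step[22] += 1

for name, other in [("one_off", one_off),
                    ("two_single_steps", two_single_steps),
                    ("one_double_step", one_double_step)]:
    print(name, matches_12(base, other), matches_25(base, other))
```

Under these assumptions the first two variants qualify as 25-marker matches and the third does not, even though its first twelve markers are identical to the base haplotype.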
FTDNA also operates a public database called
YSEARCH, where anyone can upload
a haplotype and related genealogical data, and anyone can search for
matches by surname or by haplotype. A similar database, called
YBASE, is operated by DNA Heritage.
Another public database exists for seeking genetic matches. The
SMGF is conducting a
world-wide research project that involves collecting DNA samples
and trying to correlate the patterns with past homelands.
The Y DNA data, along with the associated pedigrees, have been
made available on-line. Originally, these data could not be
searched by surname, but that capability was added in August of 2005.
Even so, the user interface is designed primarily to search for matches
to a haplotype that the user must supply, and the non-matching markers
are reported only as "not a match" instead of actual repeat counts.
In any case, the surname of each matching test subject is displayed
on the search results page. Note: to save
the effort of manually entering the haplotype for searching, we
have an index of the haplotypes
currently known for members of our project. Within this index,
you can click on a kit number (or the generic haplotype of a
lineage founder) to search the SMGF database for all matches and
near-matches.