Manuals for EurasianDNA 1.0 and EuropeanDNA 2.0 (see below)
EurasianDNA 1.0, the first test of its kind ever developed,
is a pan-genome test that reads your DNA across all 23 of your chromosomes
to report your Sub-European population (i.e. “ethnic”) affiliations.
The EuropeanDNA 2.0 test takes the two categories from the 1.0 test that include Northern and Southeastern European ancestry and divides them further into Continental European categories:
- Southeastern European (SEE): from eastern Spain across Italy, Greece and Turkey, to Bulgaria and Armenia. This includes the Jewish populations of that area (as a population, not as a religion).
- Iberian (IB): the Iberian Peninsula of Spain and Portugal.
- Basque (BAS): the region of the Pyrenees Mountains between Spain and France.
- Continental European (CE): the central portion of the European continent, such as Germany, France, Switzerland and the Netherlands, plus Great Britain and Ireland.
- Northeastern European (NEE): from Poland, the Baltic countries, western Russia, Ukraine and Belarus into the Scandinavian countries and northern Finland, including the Sami populations of Lapland.
Information available from the EurasianDNA 1.0 test includes Middle Eastern and South Asian ancestral heritages. These are not specifically included in the data from the EuropeanDNA 2.0 test, although South Asians were the ancestors of the Roma (Gypsies) of Europe. The Roma settled mainly in Iberia and in Eastern European countries such as Romania and Bulgaria.
The development of the EuroDNA™ tests (EurasianDNA 1.0 and EuropeanDNA 2.0) was made possible by the Human Genome Project and innovative
research at DNAPrint® Genomics, Inc. The test is suitable for any person who
has taken AncestryByDNA™ 2.5 (ABD 2.5) and obtained a European score
of at least 50%, with less than 40% East Asian, less than 15% sub-Saharan African and less than 15% Native American ancestry.
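The suitability thresholds above can be written as a simple check. This is only an illustrative sketch: the function name and argument names are made up for the example, and the inputs are assumed to be ABD 2.5 percentages on a 0-100 scale.

```python
def eligible_for_eurasian_test(european, east_asian, sub_saharan_african, native_american):
    """Return True if ABD 2.5 percentages (0-100) meet the thresholds stated in the manual."""
    return (
        european >= 50                # at least 50% European
        and east_asian < 40           # less than 40% East Asian
        and sub_saharan_african < 15  # less than 15% sub-Saharan African
        and native_american < 15      # less than 15% Native American
    )


# Hypothetical ABD 2.5 results for two customers.
print(eligible_for_eurasian_test(85, 10, 3, 2))
print(eligible_for_eurasian_test(45, 40, 10, 5))
```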
EurasianDNA 1.0 - The test
The “European” human population group, as we define it, corresponds to the monophyletic lineages that contributed predominantly to populations in Europe, the Middle East, Central and South Asia beginning approximately 40,000 years ago. Most “Europeans” speak languages derived from the Indo-European family and the systematic distribution of lighter pigmentation traits (skin tone, hair color, iris color) is exclusive to this group. However, it is not true that all “Europeans” speak an Indo-European language and/or are characterized by light pigmentation.
EurasianDNA 1.0 reports your European ancestry admixture much like ABD™ 2.5 does, but looks deeper within European lineages for individuals who are predominantly European:
- Northern European subgroup (NOR)
- Southeastern European (Mediterranean) subgroup (MED)
- Middle Eastern subgroup (MIDEAS)
- South Asian subgroup (SA)
These groups were defined empirically – that is to say, research at DNAPrint® with DNAPrint® genetic markers reveals that the assumption of a 4-group sub-structure model within Europeans “fits” the best (though admixture within parental samples clustered as a function of geopolitical identity, and made geographic sense whatever the model).
We use the same methods and algorithms as ABD 2.5™, except the Ancestry Informative Markers (AIMs) are different. The EurasianDNA 1.0 test comprises 320 European AIMs, obtained from screening tens of thousands of candidates from human genome databases and DNA microchips. A fraction of these 320 are also used in the ABD™ 2.5 test, but most of the markers that power the test are brand new discoveries from recent DNA chip and Ultra High Throughput genotyping research.
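This manual does not spell out the estimation algorithm itself. As a rough illustration of how admixture proportions can be inferred from AIMs, the sketch below applies a generic expectation-maximization update with fixed reference allele frequencies. It is not DNAPrint's method; the function name, the two-group setup, the marker frequencies and the genotype are all invented for the example.

```python
def estimate_admixture(genotype, ref_freqs, iterations=200):
    """Estimate one individual's admixture proportions by EM.

    genotype  -- allele counts per marker (0, 1 or 2 copies of the reference allele)
    ref_freqs -- ref_freqs[k][m] is the reference-allele frequency of marker m in group k
    """
    n_groups = len(ref_freqs)
    q = [1.0 / n_groups] * n_groups  # start from equal proportions
    for _ in range(iterations):
        totals = [0.0] * n_groups
        for m, count in enumerate(genotype):
            # The individual carries `count` reference alleles and 2 - count alternate alleles.
            for copies, is_ref in ((count, True), (2 - count, False)):
                if copies == 0:
                    continue
                # Weight of each group as the source of this allele, given the current q.
                weights = [
                    q[k] * (ref_freqs[k][m] if is_ref else 1.0 - ref_freqs[k][m])
                    for k in range(n_groups)
                ]
                norm = sum(weights)
                if norm == 0:
                    continue  # allele impossible under every group; skip it
                for k in range(n_groups):
                    totals[k] += copies * weights[k] / norm
        total = sum(totals)
        q = [t / total for t in totals]  # renormalize so proportions sum to 1
    return q


# Hypothetical frequencies for four markers in two groups.
ref_freqs = [
    [0.90, 0.80, 0.10, 0.85],  # group 0
    [0.20, 0.10, 0.90, 0.15],  # group 1
]
genotype = [2, 2, 0, 2]  # this individual matches group 0 at every marker
proportions = estimate_admixture(genotype, ref_freqs)
print(proportions)
```

With only four markers the estimate is driven to one group. A panel of several hundred AIMs is what gives graded percentages such as those reported in this manual.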
The primary parental samples used to develop EurasianDNA 1.0 were collected in Europe, the Middle East and India. Each sample was anthropologically qualified; each individual and all 4 of their grandparents lived in the appropriate geopolitical region, spoke the appropriate language for the region, reported full ethnic affiliation with the parental group corresponding to the region and reported no known admixture for any of his/her grandparents.
When we add parental samples of unknown ethnicity from the United States to this mix, and use admixture determination methods that do not require prior information or definitions, we see a natural coalescence of individuals to genetic subgroups that make sense in geographical and historical terms. In other words, we obtain the same divisions as when we use samples with prior information (our “parental” samples), which illustrates that geopolitical and sociocultural notions of “ethnicity” correspond well to anthropological and genetic partitions within this region of the globe. Variation within subgroups is the rule, not the exception. However, the genetic distances between European ethnicities are lower than between the world's continental populations (which means they are derived from more recent common ancestors, and share more in common genetically). Due to the complexity of European demography and history, self-held notions of ethnicity do not correspond perfectly with anthropology, as you will appreciate if you read further.

Figure 4GROUPBAR
Figure 4GROUPBAR shows EurasianDNA 1.0 results sorted by geopolitically defined ethnic identity; that is, the category along the x-axis (e.g. Irish, Italian) is based on political borders and socio-cultural identity rather than genetics or anthropological history. Each colored column is an individual. The percentage of affiliation with each of the 4 genetically defined subgroups is shown along the y-axis, with each genetic group shown in a different color. DNAPrint uses its own algorithms for the test, but this view of the data was produced using a third-party clustering program for visual presentation purposes. Inspection of the European samples in this figure shows:
a) Almost all of the individuals showing predominant affiliation with the genetically defined NOR subgroup designated by the yellow color (Genetic subgroup “NOR”) are Northern European or Irish.
b) Almost all of the individuals showing predominant affiliation with the genetically defined subgroup designated by the red color (Genetic subgroup “MED”) are Southeastern Europeans (Greeks or Turks).
c) Almost all of the individuals showing predominant affiliation with the genetically defined subgroup designated by the green color (Genetic subgroup “MIDEAS”) are Middle Eastern.
d) Almost all of the individuals showing predominant affiliation with the genetically defined subgroup designated by the blue color (Genetic subgroup “SA”) are South Asians from the Indian sub-continent (i.e. Indian).
Table 4GROUP shows these data in a different, tabular format – listing the average admixture percentages for the 4 genetically defined within-European subgroups. The average sample size for the 10 ethnic populations shown in Table 4GROUP is 41. Admixture percentages greater than 25% are highlighted.
Table 4GROUP
| Sample | NOR | MED | MIDEAS | SA |
|---|---|---|---|---|
| Northern Euro | 82 | 5.5 | 11 | 1.5 |
| Irish | 77.8 | 7.8 | 8.5 | 6 |
| Iberian | 48 | 29.5 | 7 | 15.5 |
| Italians | 35 | 46.1 | 8.9 | 10 |
| Greek | 15 | 78.1 | 1.9 | 5 |
| Turkish | 22.9 | 54.6 | 8.4 | 14.1 |
| Middle East I | 3.9 | 41.7 | 52.2 | 2.2 |
| Middle East II | 4 | 7 | 84 | 5 |
| South Asian | 2.6 | 4.7 | 3.7 | 89 |
| US Caucasians | 54.5 | 22.4 | 12.6 | 8.3 |
What do the groups NOR, MED, MIDEAS and SA mean?
The arrangement NOR, MED, MIDEAS, SA is in a northwest-to-southeast orientation, reminiscent of the subgroups defined by clines in synthetic maps of Europe and Western Asia obtained from classical blood group markers and Y-chromosome haplogroups (reviewed by Jobling et al., 2004). Like anthropometric traits such as hair and eye color, and the chronological and geographical distribution of archaeological evidence, the distribution of genetic variation with EurasianDNA 1.0 shows the same principal component of variation in a northwest-to-southeast orientation. For example, Northern Europeans and Irish have the highest affiliation with Group 1 (TABLE 4GROUP and yellow color, FIGURE 4GROUPPIEMAP), which suggests that this genetically defined subgroup corresponds to the most northerly subsets of Cavalli-Sforza et al.’s clines (1994; also see Jobling et al., 2004). Group 1 thus seems to correspond to “Nordic” anthropo-genetic identity, hence the name “NOR”. As we might expect, the percentage of NOR affiliation appears to increase gradually from the Middle East to Scandinavia and the United Kingdom, a trend that is most easily visualized in FIGURE 4GROUPPIEMAP (yellow color). Also as we might expect, the percentage of MED affiliation increases gradually moving from Northern Europe to the Mediterranean Southeast of Europe. Geopolitical ethnic groups that were not used in developing the EurasianDNA 1.0 test, such as Iberians (Spanish, Portuguese) and Italians, type with EurasianDNA 1.0 as of anthropo-genetic identity intermediate between Northern Europeans and Southeastern Mediterranean individuals (notice that the yellow and red colors for Iberians in Spain and for Italians are about halfway between the levels in Ireland/Scandinavia and Greece/Turkey).
Similarly, a Middle Eastern population (Middle East I in TABLE 4GROUP) taken from a different region of the Middle East than the Middle Eastern population used as the “parental” or “reference” group shows predominantly Mediterranean and Middle Eastern genetic identity (this group is represented by the pie chart in FIGURE 4GROUPPIEMAP over Saudi Arabia, with half red and half green color). These types of results constitute a very important validation of EurasianDNA 1.0, because they show that, as we might expect, genetic relatedness between subpopulations declines as geographical distance increases. In other words, a test that did not work would probably show “confusing” results for Iberians and Italians relative to the results in Northern and Southeastern Europe, or for Middle Eastern sample I versus Middle Eastern sample II. US Caucasians, as one might expect based on the history of European migration to the New World, show an affiliation pattern that seems due to an unequal contribution from Northern Europeans and Mediterranean peoples.
Blind Challenge with ethnically admixed “Caucasian” samples:
In addition to these samples of formally characterized ethnicity, we have
typed a variety of samples of self-proclaimed ethnic identity. These were
admixed samples with self-reported ethnic information. Of course this
is not as reliable a test reference as anthropologically qualified samples,
but they give us an idea of what to expect in mixed populations such as those
we find in the United States. Each subject reported the ethnicity of all
4 grandparents and was binned into one of five arbitrarily defined ethnic
groups (I, II, III, IV, V) if at least 2 of the grandparents came from that
group. The average sample size for these groups was 70:
TABLE 4GROUPTEST
| Group | Definition |
|---|---|
| I | At least half of the grandparents from Ireland, Western Scandinavia, Great Britain |
| II | At least half of the grandparents from France, Germany, Denmark, Finland, Holland, Poland, Eastern Europe or Russia |
| III | At least half of the grandparents from Spain or Italy |
| IV | At least half of the grandparents from Greece or Turkey |
| V | A random assortment of US Caucasians regardless of where their grandparents originated (same sample as shown in TABLE 4GROUP) |

| Group | NOR | MED | MIDEAS | SA |
|---|---|---|---|---|
| I | 56.4 | 20.8 | 16.4 | 6.4 |
| II | 51.2 | 24.5 | 15.2 | 9 |
| III | 46.9 | 33.8 | 10.6 | 8.8 |
| IV | 45 | 41 | 9 | 5 |
| V | 54.5 | 22.4 | 12.6 | 8.3 |
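The grandparent binning rule used for TABLE 4GROUPTEST can be sketched as follows. The region lists are transcribed from the group definitions. The function name is made up, and returning every qualifying group for a subject with a 2/2 split is an assumption, since the manual does not say how such ties were handled. Group V is a random sample and is therefore not derived from grandparent origins.

```python
from collections import Counter

# Regions per group, transcribed from the TABLE 4GROUPTEST definitions.
GROUP_REGIONS = {
    "I": {"Ireland", "Western Scandinavia", "Great Britain"},
    "II": {"France", "Germany", "Denmark", "Finland", "Holland",
           "Poland", "Eastern Europe", "Russia"},
    "III": {"Spain", "Italy"},
    "IV": {"Greece", "Turkey"},
}


def bin_subject(grandparent_origins):
    """Return every group for which at least 2 of the 4 grandparents qualify."""
    counts = Counter()
    for origin in grandparent_origins:
        for group, regions in GROUP_REGIONS.items():
            if origin in regions:
                counts[group] += 1
    return sorted(group for group, n in counts.items() if n >= 2)


# Hypothetical subjects.
print(bin_subject(["Greece", "Turkey", "Greece", "France"]))
print(bin_subject(["Ireland", "Great Britain", "Italy", "Spain"]))
```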
Although most of these individuals were of admixed ethnicity, the pattern in
the results agrees with the self-held notions of ethnic affiliation. As can
be seen from these results, the average individual claiming at least 50%
Northern European ancestry (Group I) showed predominant affiliation with
the NOR group, as did individuals claiming at least 50% French, German, Danish,
Dutch, Eastern European or Russian ancestry (Group II). In contrast, individuals
claiming at least 50% Greek or Turkish ancestry (Group IV) showed significantly
greater MED ancestry. As expected, individuals claiming at least 50% Spanish
or Italian ancestry (Group III) typed in between Group I and Group IV individuals.
Looking at the data in a slightly different way, we tabulate the data in
terms of the frequency of observations of >25% affiliation with the genetic
subgroupings in TABLE 4GROUPEVAL. The same basic trend obtains.
TABLE 4GROUPEVAL
Fraction of samples with >25%
affiliation with each of the 4 European genetic groups (columns) for the
average sample of each type of self-reported ethnicity (rows). Fractions
above 0.50 are shown in yellow.
| Group | NOR | MED | MIDEAS | SA |
|---|---|---|---|---|
| I | 1 | 0.28 | 0.17 | 0.06 |
| II | 0.98 | 0.32 | 0.17 | 0 |
| III | 0.88 | 0.75 | 0.13 | 0 |
| IV | 0.8 | 1 | 0 | 0 |
| V | 0.95 | 0.39 | 0.14 | 0.07 |
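Each cell of TABLE 4GROUPEVAL is the share of individuals in a group whose affiliation with a genetic subgroup exceeds 25%. A minimal sketch of that tabulation is given below; the four individuals are invented for illustration and are not taken from our samples.

```python
def fraction_above(samples, subgroup, threshold=25.0):
    """Fraction of individuals whose affiliation with `subgroup` exceeds `threshold` percent."""
    hits = sum(1 for sample in samples if sample[subgroup] > threshold)
    return hits / len(samples)


# Hypothetical individual results (percent affiliation per subgroup).
samples = [
    {"NOR": 60, "MED": 30, "MIDEAS": 8, "SA": 2},
    {"NOR": 45, "MED": 40, "MIDEAS": 10, "SA": 5},
    {"NOR": 20, "MED": 70, "MIDEAS": 5, "SA": 5},
    {"NOR": 80, "MED": 10, "MIDEAS": 5, "SA": 5},
]

# One table row: the fraction above 25% for each of the four subgroups.
row = {group: fraction_above(samples, group) for group in ("NOR", "MED", "MIDEAS", "SA")}
print(row)
```

Because one individual can exceed 25% for more than one subgroup, the values in a row need not sum to 1.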

All of the individuals in this particular sample reporting at least 50%
Irish/British/Scandinavian ancestry type of >25% NOR ancestry with EurasianDNA 1.0, and all of the individuals in this particular sample reporting at least
50% Greek or Turkish ancestry type of >25% MED ancestry with EurasianDNA 1.0. Most individuals claiming at least 50% Spanish or Italian ancestry type
of both >25% NOR and >25% MED ancestry with EurasianDNA 1.0. Notice
how the MED number rises as we proceed from Northern Europe (Group I) to
Spain and Italy (III) peaking in Greece and Turkey. This illustrates that
most Europeans are of admixed NOR and MED ancestry, (whether or not they
call themselves Germans, French, Italians etc.). US “Caucasians” seem
to be a mélange of NOR and MED ancestry, but in unequal proportions
as one might expect given the unequal northern versus Mediterranean founder
contribution to the Americas.
To summarize the results of the “blind” trial:
The more southeasterly the self-reported geopolitical origin, the more similar
the average profile is to our original parental samples from the southeast
regions of Europe. This constitutes an independent validation of the anthropological
relevance of EurasianDNA 1.0.
This would seem to support, or at least not refute, the anthropological
relevance of the 4 group genetic model of Europe identified by, and used
by, EurasianDNA 1.0. As we saw with the anthropologically defined samples
in Table 4GROUP, the MED genetic group affiliation levels are inversely proportional
to the magnitude of distance from the self-reported origin to the Mediterranean
Sea – French/Germans show more than British, and Spanish/Italians show
more than French/Germans. However, simply knowing the recent (on an anthropological
time scale) geographical origin of a person is not necessarily a very accurate
predictor of genetic group affiliation because as we will discuss below the
political history of Europe, the Middle East and South Asia is complex, and
admixture between ethnic subpopulations not uncommon in recent history.
Correlations with anthropometric traits.
If the NOR, MED, MIDEAS and SA genetic groupings are of legitimate anthropological
base, and if there is an association between anthropology and physical appearance
(which we know there is), then there should be a correlation between admixture
for certain genetic groupings and the expression of physical traits. When
speaking of European populations, two obvious physical traits come to mind – skin
melanin content and iris color. Elsewhere on this website we have discussed
the correlation between skin melanin index and sub-Saharan African and European
admixture – with higher levels of European admixture correlated with
lower melanin index values in admixed African Americans and Puerto Ricans.
So too is skin melanin index differentially distributed within the European
group (mainly along a northwest to southeast gradient). We have also discussed
the association between higher levels of sub-Saharan African and lower levels
of European admixture in individuals of predominant European ancestry and
darker iris colors. Here we ask whether the same types of associations can
be seen within the European diaspora sorted by genetically defined sub-population
admixture proportions rather than continental ancestry admixture. Since blue
iris colors are far more frequently found in Northern European populations
than in Middle Eastern and South Asian populations, the degree of Northern
European admixture should be associated with lighter iris colors and Middle
Eastern and South Asian admixture should be associated with darker iris colors.
In other words, if the test works, we should see that individuals typing
of high NOR ancestry with EurasianDNA 1.0 tend to have lighter colored
eyes than individuals typing with low NOR ancestry, and individuals typing
with high MIDEAS or SA ancestry should show darker eyes on average than those
with low levels.
We digitally scored iris colors for some of the subjects shown in FIGURE
4GROUPBAR and Tables 4GROUP and 4GROUPTEST and, as shown in Table 4ETHNIRIS,
significant genetic group admixture levels are associated with the expected
iris color shades. NOR admixture above thresholds of 70% to 80% (rows
1-3) is strongly associated with iris color scores greater than 2.2, which
are scores corresponding to “Light” colors - scores of 2.2 or
greater correspond almost perfectly to perceived colors lighter than the
mean for individuals of majority European ancestry and includes the light
hazels, greens, grays and blues. In contrast, SA admixture levels greater
than 25% were associated with color scores below 2.2 (“Dark”),
corresponding to color scores below the mean (dark hazels, browns, blacks).
MIDEAS admixture was not associated with iris color score, though lighter
iris colors are not infrequently found in the Middle East. (Indeed, the genes
imparting lighter iris colors may have very well originated in the Fertile
Crescent prior to amplification by genetic drift in Europe starting 45,000
years ago).
Table 4ETHNIRIS
| | Genetic Groups | Admixture Levels | Color | Exact-p | Total European sample (N) | Sample Meeting Threshold (N) |
|---|---|---|---|---|---|---|
| 1 | NOR (N. Euro + Irish) | >70% | Light (color score > 2.2) | 0.003 | 184 | 27 |
| 2 | NOR (N. Euro + Irish) | >75% | Light (color score > 2.2) | 0.004 | 184 | 21 |
| 3 | NOR (N. Euro + Irish) | >80% | Light (color score > 2.2) | 0.033 | 184 | 12 |
| 4 | SA (South Asian) | >25% | Dark (color score <2.2) | 0.060 | 184 | 15 |
| 5 | SA (South Asian) | >30% | Dark (color score <2.2) | 0.035 | 184 | 8 |
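The “Exact-p” column reports an exact test of association between meeting an admixture threshold and iris color class. The manual does not name the test; the sketch below assumes a two-sided Fisher exact test on a 2x2 table, which is one common choice. The row totals (27 and 157 out of 184) follow from row 1 of the table, but the light/dark split within each row is hypothetical, so the printed value is not expected to reproduce the table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Rows: NOR > 70% (27 subjects) and NOR <= 70% (157 subjects).
# Columns: light iris, dark iris. The split within each row is hypothetical.
p_value = fisher_exact_two_sided(20, 7, 60, 97)
print(p_value)
```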
It turns out that iris color can be predicted accurately for individuals
only with a direct knowledge of pigmentation gene genotypes – a knowledge
of genetic ancestry enables only an indirect inference. However, this exercise
shows that there is a correlation between Northern European
ancestry, as EurasianDNA 1.0 determines it, and lighter iris colors,
and between South Asian ancestry (as EurasianDNA 1.0 determines it) and darker
iris colors. Note that these are correlations between lower levels of admixture
and iris color in individuals that describe themselves as Caucasians, not
obvious correlations between very high levels of admixture and iris color
in a more eclectic sample of polarized ancestry (which would be trivial to
show). The results presented in TABLE 4ETHNIRIS suggest that iris color shades
are distributed throughout Europe as a function of Northern European ancestry
admixture, and that EurasianDNA 1.0 accurately and precisely measures
this ancestry admixture.
For example, if the test did not work, there is no reason why we would see
a statistically meaningful association between higher levels of NOR admixture
and lighter iris color shades in a “Caucasian” sample completely
unrelated to the reference samples used in developing the test!
Variation within groups.
Variation within groups seems to be the rule rather than the exception,
which you can appreciate by studying FIGURE 4GROUPBAR. Even in our original
parental groups we observed considerable within-group variation. This of
course is the whole point of assessing ancestry molecularly from the DNA
rather than based on self-held notions, modern-day geographical origin or
even historical records which are based on political boundaries.
CAUTION: Your results may or may not fit your geopolitical expectations
As a result of the reality that geopolitical and anthropological/genetic heritage
do not correspond perfectly, you may obtain results that you do not expect.
Indeed, with EurasianDNA 1.0, you are looking much further
back in time (many tens of thousands of years) than you are accustomed to
from geopolitical records.
Although the EurasianDNA 1.0 test provides good genetic resolution between
the dominant 4 European subgroups – resolution that we have shown is
anthropologically and historically relevant – you should understand
before you buy the test that your results may not fit your expectations.
Most of us are used to thinking about ancestry in terms of recent geopolitical
boundaries and events rather than on anthropological terms. Let us take an
example to illustrate the potential discordance between geopolitical and
anthropological identity. A blond-haired, blue-eyed person for whom all great-
grandparents were born in Northern Italy or Greece may very likely be descended
from individuals who lived in more northerly climes 10,000 years ago (hence
the blond hair, possibly). Nevertheless, they would most likely describe
themselves as Italian, Greek or Mediterranean even though they may be genetically
more close to Nordic peoples of Scandinavia. Because of the age of the anthropological
origins we read from the DNA, as opposed to from an archive which may have
existed only for the past 1,000 years or so, such a person would not be able
to use political records in a library or archive to trace their heritage
as far back as with an anthropological test such as EurasianDNA 1.0.
EurasianDNA 1.0 does not care where a person lives, where their great
grandparents lived, what language is spoken or with whom affiliation is felt.
It looks back to the mixture of your ancestors that lived in pre-historic
Europe, the Middle East, possibly South Asia many thousands of years ago.
For the Northern Italian with blond hair and blue eyes we can clearly see
the difference in physical characteristics suggesting Nordic heritage, and
so a report of predominant NOR ancestry with EurasianDNA 1.0 may not
be all that surprising, but for many people there are no clues to be derived
from physical appearance and genetic ancestry may not be as easy to understand
with the eye. As an objective, independent reporter of anthropological heritage,
this is in fact the power of EurasianDNA 1.0.
You may not want to purchase this test if:
- You are uncomfortable with the fact that a test such as this has never
before been introduced, and so there is nothing with which to compare it.
- You cannot accept that there is a difference between anthropological
and geopolitical ancestry. In other words, if you feel that every
person whose grandparents were born in Italy should type with substantial
MED ancestry regardless of where their ancestors were derived from
5,000 to 10,000 years ago or longer, or else the test isn’t
working properly, then this test is not for you.
- It bothers you that we do not exactly understand the precise genetic
origins of NOR, MED, MIDEAS or SA identity.
- It bothers you that there is no genetic group measured by our test that
precisely matches any geopolitical boundaries.
- Any changes in the way you view the heritage of the deeper roots of your
family tree would possibly bother you in any way.
- You scored less than 50% “European” or “Indo-European” with
AncestryByDNA™ 2.5.
- You have not yet taken AncestryByDNA™ 2.5.
About the last two points (6 and 7) - What if you don’t meet the criteria? You may still
take the test if you haven’t taken ABD 2.5, or have taken it but did
not receive a score of at least 50% European, but interpreting the results
would be extraordinarily difficult, and they may not mean what you think
they mean. Why is this? A person with 100% Sub-Saharan African ancestry could
take the test, and by forcing his or her ancestry to fit a 4-group European model
we would obtain an answer that has only an abstract significance based on
genetic distances rather than actual ethnic affiliations. You would likely
misinterpret the results, and so we do not offer it if the qualifications are not met.
More on variation within groups and unexpected results
If you look at the detail of FIGURE 4GROUPBAR you can appreciate the frequency
with which customers may obtain results that do not comport with their geopolitical
expectations. Note that there is substantial variation in percentage composition
within each of the ethnic populations.
South Asian Indians type predominantly of SA genetic affiliation, with little
apparent admixture, but Middle Eastern samples show a bit more admixture,
and each person is unique. Even more admixture is observed for Mediterraneans
and Northern Europeans and it is almost certain that not all of these latter
individuals would have expected to see such affiliation. Some Greeks and
Italians show more NOR ancestry than MED, and more NOR ancestry than some
individuals from Ireland or Scandinavia. Again, these individuals might not
have expected to see such results. THIS DOES NOT MEAN they are not Greek
or Italian, only that they are likely to be of different anthropological
heritage than most Greeks and Italians. In other words, on an anthropological
time scale their ancestors were relatively new Greeks or Italians – they
were more likely relatively recent immigrants to the Mediterranean than a
person that scored of high MED ancestry and unless the movement of the family
into the Mediterranean was within the past few generations, genealogical
research would probably not have identified the non-MED ancestry. Similarly,
but in reverse, some (about 8%) of the Irish/British/Scandinavian samples
were characterized by extensive MED admixture although they would likely
not expect central European or Mediterranean ancestry (note the infrequent
incidence of large amounts of red color for certain North European and Irish
individuals of FIGURE 4GROUPBAR). Again, EurasianDNA 1.0 reports anthropological
heritage rather than geopolitical or socio-cultural identity and depending
on the date for the apparent admixture, these people may not have expected
these results.
Variation such as this translates into a certain amount of discordance between
ethnic expectations and genetic affiliation for some individuals in some
groups, a discordance that varies from individual to individual and from
ethnic group to ethnic group. When interpreting results it is important to
understand the difference between biology/anthropology and socio-politics/geography.
Variance in ancestry affiliation within the ethnic groups is likely a by-product
of several things working together, but some of these things are more significant
than others. The most significant reason is listed first:
- Between-ethnicity admixture within the European continental population group,
which is likely much more extensive than between-continent admixture. From
this we would expect lower Fst values for randomly selected SNPs, relatively
low genetic distance between groups, a harder time finding SNPs with good
Fst values when screening the genome for good AIMs and a greater error estimating
admixture for ethnic relative to continental admixture.
- The allele frequency differential (d) and Fst values for our marker set
are reasonably high (average Fst = 0.10), at least adequate for standard
errors in the range of 10% - 20% or so. The cumulative d value for our 330
markers, among each of the population pairs of our model is shown below in
TABLE DELTA, and the values are exceptionally high for a test as affordable
as this (meaning we have excellent power to infer ancestry for such an economical
test).
TABLE DELTA
Cumulative Delta Values, 326 AIMs
| | Northern European | Greek/Turkish | Middle Eastern | South Asian |
|---|---|---|---|---|
| Northern European | 0 | 36.7 | 46.8 | 44.7 |
| Greek/Turkish | | 0 | 47.9 | 43.5 |
| Middle Eastern | | | 0 | 49.7 |
| South Asian | | | | 0 |
- Sampling error in allele frequency estimation. Our average sample size
for parental (reference) samples is on the order of 40 samples. A larger
number would ensure greater accuracy, but the results discussed above suggest
that even so, the accuracy of ABDEURO 2.5 is good. Nevertheless, we intend
to gradually increase our sample sizes for future version releases (e.g. ABDEURO
3.0), and if supreme precision is crucial for your application you may wish
to wait and be tested with a future version of the test. It bears noting,
however, that error caused by inadequate parental sample size would most
likely manifest itself as a haphazard error, which is not what we see (for
example, the type of unexpected admixture commonly seen for Irish is not
Middle Eastern-MIDEAS or South Asian-SA, but Mediterranean (MED), which makes
geographical and historical sense).
- Defects with the simple model of admixture assumed for this analysis,
namely, no genetic drift between modern-day descendants of parental populations
and the parental populations themselves, no linkage between markers in admixed
individuals, the assumption that offspring are derived from admixed parents
and no selection over the past 50,000 years at any of our AIM loci. Defects
in these assumptions would tend to create imprecision, but could also introduce
systematic bias and influence the results on a population scale as they did
in the classical gene studies of African/European admixture in the 1960’s
and 1970’s (see Chakraborty, 1986 for a review; these studies used
polymorphisms in genes that are subject to natural selection, which is unwise).
It is unlikely that this source of error is substantial however, because
if it were, we would likely see greater within-group variation within all
groups due to “artificial” admixture than we have observed and
the trending in FIGURE 4GROUPPIEMAP would probably make much less geographic
sense. Another way to say this is that the admixture observed due to this
type of error/bias would tend to produce “noise” that is not
necessarily geographically sensible as most of our results seem to be. For
example, the red color in FIGURE 4GROUPPIEMAP would not necessarily be restricted
to Central/Mediterranean Europe and Middle East – Northern Europeans
and South Asians would show it at these high levels too. However, this is
not to say that incorrect assumptions about the admixture process do not
cause error in our results – they do, and we do not know how much,
just that it is unlikely to be substantial. Other authors have shown relatively
modest improvements in statistical accuracy when incorporating mathematically
complex models that account for uncertainty in the population model although
only in certain circumstances – namely on a population level (Wang
2004). We are working with Dr. Paul McKeigue of the University College Dublin
to implement such models in future versions of EuropeanDNA 1.0, and we
plan to make improvements in algorithm design available to customers as they
are (and if they are) developed.
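The quantities named in the list above (the allele frequency differential d, its cumulative value over a marker panel, Fst, and sampling error in allele frequency estimation) can be illustrated with textbook definitions. This is a sketch under stated assumptions: the two-population Fst below is the simple heterozygosity-based form with equal weighting, the standard error is the plain binomial one, and none of the function names or example frequencies come from our marker set.

```python
from math import sqrt


def delta(p1, p2):
    """Allele frequency differential between two populations at one marker."""
    return abs(p1 - p2)


def cumulative_delta(freqs_a, freqs_b):
    """Sum of per-marker differentials over a marker panel."""
    return sum(delta(a, b) for a, b in zip(freqs_a, freqs_b))


def fst_two_populations(p1, p2):
    """Fst for one biallelic marker, two equally weighted populations: (H_T - H_S) / H_T."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                          # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2     # mean within-population value
    return 0.0 if h_total == 0 else (h_total - h_within) / h_total


def allele_freq_standard_error(p, n_individuals):
    """Binomial standard error of an allele frequency from n diploid individuals."""
    return sqrt(p * (1 - p) / (2 * n_individuals))


# Hypothetical frequencies for two markers in two populations.
print(cumulative_delta([0.8, 0.3], [0.2, 0.5]))
print(fst_two_populations(0.8, 0.2))
# Sampling error for a frequency of 0.5 estimated from 40 reference individuals.
print(allele_freq_standard_error(0.5, 40))
```

For scale, the Northern European versus Greek/Turkish value of 36.7 in TABLE DELTA, spread over 326 AIMs, corresponds to a mean per-marker differential of about 0.11.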
The most likely explanation for the bulk of the variability of genetic affiliations
within geopolitical groups in Europe is simply that Europe is an admixed
continent (on an ethnic scale) with a relatively complex interactive history.
Inspection of the individual results shows that most samples are characterized
by extensive admixture, of a type that, given the complex political and anthropological
history of Europe, is perhaps not unexpected. In other words, the geographic
pattern in the admixture results suggests sensibility and anthropological
meaning. The geographical sensibility of the admixture observed for this
data is similar to that observed by Rosenberg et al., 2001, who used STRs
with the program “STRUCTURE” to partition European subpopulations.
In both this work and Rosenberg’s study, ethnic populations of origin
geographically intermediate to Mediterranean and Northern European populations
(such as Iberians in our work, and French in Rosenberg’s) showed intermediate
Mediterranean and Northern European genetic admixture (Rosenberg et al.,
2001). This hardly seems to be a coincidence.
However, even if it is true that most of the variation within geopolitically
defined ethnic groups reflects the complex interactive history
among European populations over the recent past, this does not alleviate the
difficulty of explaining to a person who is Greek why they type as significantly
Northern European – even if they do have blond hair and blue irises.
Indeed, variation in results within the Greeks is particularly interesting;
extensive NOR admixture was observed for many of the samples from Greece,
Italy and Turkey; though most of these individuals showed predominant MED
affiliation, several showed less than 25% MED affiliation. Rosenberg et al.
2001 showed a similar phenomenon with Mediterranean samples – only a
fraction of Italians in their analysis showed significant Mediterranean-specific
genetic affiliations (see red and tan color of the Europe part of Fig 2,
k=4, Rosenberg et al., 2001). It is doubtful that any of these Greek or Italian
individuals would have expected such a result.
We know that there has been extensive gene flow within Europe, and that
there is an accentuated uncoupling between geopolitical and genetic heritage.
We know that there is great anthropometric trait variation within ethnic
groups of Europe – the blond-haired, blue-eyed individuals from Northern
Italy and Greece, for example, are characteristically different from the typical
brown-eyed, brown-haired Italian or Greek found in the more southern parts of both
countries, but such a phenomenon is not unique to Greece and Italy; some Swedes
and Norwegians have atypical dark complexions too. Where did their ancestors come from?
Are the “Swedes” and “Norwegians” with darker complexions
and hair/iris colors the ones showing greater MED, MIDEAS and SA ancestry
(the latter of which could have come from Roma genetic contributions)? Are
Europeans in general better described in genetic terms than geopolitical
due to a natural and expected uncoupling between anthropometric trait value
and ancestry? These questions cannot be answered with the analysis
available to date, but will undoubtedly serve as fodder for future work.
Again, this promises to be the power of EurasianDNA 1.0.
A Historical Perspective
Perhaps it is not unexpected that Mediterranean regions of
Europe are more genetically diverse than Nordic regions. Both the
Byzantine and Roman empires were known as effective melting pots for
peoples throughout the Middle East and Europe, whose citizens
might have been united more through political and socio-cultural
than genetic ties. The Mediterranean represented the center of
the civilized world for a long time, and it is plausible that due to
this, gene flow from Northern to Mediterranean Europe was greater than
in the reverse direction. Between 1900 and 1500 BC, the Mycenaeans or
Achaeans moved southwest from southwestern Russia and invaded/settled,
wave after wave, in modern-day Greece. These civilizations spread to
Southern Italy, Libya, Cyrenaica and the Near East, and they included
the fighting groups that attacked the Trojans in Asia Minor around 1200 BC,
exploits that were retold centuries later (circa 750 BC) in the Iliad
and the Odyssey. Later, the classical city-states of Greece emerged,
reaching their cosmopolitan peak around the 5th century BC. Additionally,
migration into Europe from the Middle East and most of pre-history Asia
is known to have passed through southeastern Europe, hence the
northwest to southeast classical blood group marker clines,
archaeological record clines and Y-chromosome clines (all of which are
reviewed elegantly by Jobling et al., 2004). This might suggest that
gene flow into the Mediterranean has had a relatively strong influence
in shaping genetic structure in this part of Europe relative to others.
Greece was a Roman province from the 2nd century BC until
Constantinople fell to the Crusaders in 1204. In 1453, the Turks took
Constantinople and made Greece a Turkish province, perhaps providing
greater opportunity for East to West migration. In all, the complexity
of Mediterranean history over the past few thousand years may explain
why we detect more variation in admixture for people from this part of
Europe – some individuals with substantial Middle Eastern genetic
affiliation, many others with Northern European affiliation. Of course
modern European history is expected to have accentuated the admixture
in this part of Europe, but admixture is certainly not unique to
Mediterranean Europe. Certainly larger samples and other markers are
needed to fully address the myriad possibilities for how/whether the
extant Northern European and Middle Eastern admixture in Greece is
related to recent historical events. Notwithstanding the mechanism
creating this apparent admixture, at least with this panel as well as
with Rosenberg’s STR panel, it would seem that a fair number of
individuals who identify with Greek or Italian ethnicity will show more
Northern European and less Mediterranean admixture than they expect.
Test Error
As with AncestryByDNA™ 2.5, statistical
error is caused by the imperfect information about ancestry provided by the
DNA. In scientific terms, the markers we use are continuously distributed
among the ancestral groups, not strictly private to any one, and so admixture
estimation is determined based on probability. This is largely why the MLEs
(Maximum Likelihood Estimates of ancestry) are only correctly communicated
in terms of their confidence intervals. We can quantify the amount of error
caused by this imperfection for the average customer using mathematical simulations.
To do this, we use our knowledge of the allele frequencies in each group
and the relationships between the alleles to create a large number of “multilocus genotypes” or
simulated individuals. Ancestry admixture is determined for each. If we create
1,000 simulated individuals of 100% Northern European ancestry, and observe
that the average sample showed a level of 100% Northern European ancestry
then we would conclude there would be no bias or error in the test caused
by continuous allele frequencies. If the average sample showed 90% Northern
European ancestry, there would be a 10% error caused by continuous allele
frequencies, and any Northern European would be best advised to consider their
Northern European result to be accurate to within +/- 10%.
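The simulation procedure just described can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: two hypothetical groups A and B, 300 unlinked biallelic markers that all share the same made-up allele frequencies (0.8 in A, 0.2 in B), and a simple grid-search maximum likelihood estimate. It is not the EurasianDNA 1.0 algorithm or marker panel, only a toy showing how the average estimate for simulated 100% group A individuals can fall short of 100%.

```python
import math
import random

# Hypothetical allele frequencies: NOT the real panel, just an illustration.
N_MARKERS = 300
FREQ_A = [0.8] * N_MARKERS  # frequency of allele "1" in group A at each marker
FREQ_B = [0.2] * N_MARKERS  # frequency of allele "1" in group B at each marker


def simulate_individual(freqs, rng):
    """Draw two alleles per unlinked marker; returns the count of allele "1"."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def log_likelihood(genotype, m):
    """Log-likelihood (up to a constant) if a fraction m of ancestry is from A."""
    total = 0.0
    for g, pa, pb in zip(genotype, FREQ_A, FREQ_B):
        q = m * pa + (1.0 - m) * pb  # allele frequency expected under admixture m
        total += g * math.log(q) + (2 - g) * math.log(1.0 - q)
    return total


def estimate_admixture(genotype, steps=50):
    """Grid-search maximum likelihood estimate of the group A fraction."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(genotype, m))


def average_bias(n_individuals=100, seed=1):
    """Average shortfall (in percent) from 100% A over simulated 100% A people."""
    rng = random.Random(seed)
    estimates = [estimate_admixture(simulate_individual(FREQ_A, rng))
                 for _ in range(n_individuals)]
    return 100.0 - 100.0 * sum(estimates) / len(estimates)


if __name__ == "__main__":
    print(f"average bias: {average_bias():.2f}%")
```

Because an estimate can never exceed 100%, sampling noise can only pull the average for 100% individuals downward, which is one way a bias of the kind described here arises.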
The results from the EurasianDNA 1.0 simulations are shown below in TABLE SIM3.0.
TABLE SIM3.0
| | NE | G | M | SA | Bias |
| NE | 91.35 | 3.8 | 2.14 | 2.71 | 8.65 |
| G | 2.72 | 93.72 | 1.54 | 2.02 | 6.28 |
| M | 0.65 | 0.41 | 98.92 | 0.02 | 1.08 |
| SA | 1.6 | 2.62 | 1.64 | 94.14 | 5.86 |
| Average Bias | | | | | 5.45 |
The ancestry for the average simulated sample showed 5.45% cumulative error
across all 4 of the groups in its score. For example, a
person with true values of 55% NOR, 45% MED, 0% MIDEAS and 0% SA, typing
with the same error as the average simulated parental sample, could receive
a EurasianDNA 1.0 score of 52% NOR, 47% MED, 1% MIDEAS and 0% SA, or 53% NOR,
46% MED, 0% MIDEAS and 1% SA, or any other percentage combination where the differences
between the score and the true values add up across all 4 groups to about
5-6%. Of course, any one rare individual could show 20% error, or 0% error,
but the average individual shows 5.45% error.
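As a rough check on how these figures fit together, the short Python sketch below recomputes the per-group bias from the TABLE SIM3.0 values and the cumulative difference for the first example score given above. The function names are hypothetical, and reading the bias as 100% minus the share assigned to the simulated group is an assumption inferred from the table.

```python
# Average estimated ancestry (%) for simulated 100% single-group samples,
# copied from TABLE SIM3.0. Keys are the simulated group; values follow GROUPS.
GROUPS = ["NE", "G", "M", "SA"]
SIM = {
    "NE": [91.35, 3.80, 2.14, 2.71],
    "G": [2.72, 93.72, 1.54, 2.02],
    "M": [0.65, 0.41, 98.92, 0.02],
    "SA": [1.60, 2.62, 1.64, 94.14],
}


def bias_per_group(sim):
    """Bias = 100% minus the share assigned to the group actually simulated."""
    return {g: round(100.0 - row[GROUPS.index(g)], 2) for g, row in sim.items()}


def cumulative_difference(true_values, score):
    """Sum of absolute differences between a score and the true values."""
    return sum(abs(t - s) for t, s in zip(true_values, score))


if __name__ == "__main__":
    print(bias_per_group(SIM))
    # True 55/45/0/0 versus a reported 52/47/1/0 (NOR, MED, MIDEAS, SA).
    print(cumulative_difference([55, 45, 0, 0], [52, 47, 1, 0]))
```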
The simulations just presented are from individuals simulated to be homogeneously
affiliated with one group. What do we see when we simulate admixed offspring?
The results are shown below in TABLE MIXEURO
TABLE MIXEURO
| | NE | G | M | SA | Bias |
| 50NOR-50MED (n=40) | 50.47 | 44.77 | 2.67 | 2.10 | 4.77 |
| 75NOR-25MED (n=10) | 79.80 | 13.60 | 1.40 | 5.20 | 6.60 |
| 25NOR-75MED (n=10) | 20.10 | 75.60 | 2.00 | 2.30 | 4.30 |
| 50NOR-50MIDEAS (n=39) | 47.00 | 5.40 | 45.58 | 2.03 | 4.30 |
| 75NOR-25MIDEAS (n=10) | 70.50 | 2.70 | 22.40 | 4.40 | 7.10 |
| 25NOR-75MIDEAS (n=10) | 16.90 | 2.30 | 78.20 | 2.60 | 4.90 |
| 50NOR-50SA (n=40) | 40.15 | 10.10 | 3.05 | 46.70 | 13.15 |
| 75NOR-25SA (n=10) | 70.80 | 3.90 | 2.00 | 23.30 | 5.90 |
| 25NOR-75SA (n=10) | 25.70 | 0.50 | 3.40 | 70.40 | 3.90 |
| 50MED-50MIDEAS (n=40) | 5.25 | 44.40 | 47.30 | 3.05 | 8.30 |
| 75MED-25MIDEAS (n=10) | 1.60 | 70.70 | 21.41 | 6.30 | 7.90 |
| 25MED-75MIDEAS (n=10) | 11.50 | 11.20 | 76.10 | 1.20 | 12.70 |
| 50MED-50SA (n=40) | 2.53 | 45.00 | 1.35 | 51.13 | 3.88 |
| 75MED-25SA (n=10) | 6.60 | 67.70 | 3.10 | 22.60 | 9.70 |
| 25MED-75SA (n=10) | 4.00 | 23.20 | 2.60 | 70.20 | 6.60 |
| 50MIDEAS-50SA (n=38) | 3.42 | 2.24 | 48.42 | 45.92 | 5.66 |
| 75MIDEAS-25SA (n=10) | 2.40 | 7.40 | 71.90 | 18.30 | 9.80 |
| 25MIDEAS-75SA (n=10) | 0.00 | 0.60 | 20.70 | 78.70 | 0.60 |
| Average Bias | | | | | 6.84 |
Error is shown in the Bias column. The table is read this way: 50NOR-50MED
(n=40) means 40 simulated individuals of 50% Northern European (NOR) / 50% Mediterranean
(MED) ancestry. The average 50/50 NOR/MED mix showed 50.47% NOR ancestry
and 44.77% MED ancestry, which differed from the expected levels of 50% for
each by the level of bias, which equals 4.77%.
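For the 50NOR-50MED row, the 4.77% figure equals the ancestry assigned to the two groups that were not part of the simulated mix (2.67% + 2.10%). The Python sketch below makes that reading explicit. It assumes that the NE, G, M and SA columns correspond to NOR, MED, MIDEAS and SA respectively, and the function name is hypothetical.

```python
COLUMNS = ["NE", "G", "M", "SA"]  # column order used in TABLE MIXEURO


def off_target_bias(estimates, parental_columns):
    """Sum the estimated ancestry (%) assigned to groups NOT in the simulated mix."""
    return round(sum(value for column, value in zip(COLUMNS, estimates)
                     if column not in parental_columns), 2)


if __name__ == "__main__":
    # 50NOR-50MED row: the simulated mix involves only the NE and G columns.
    print(off_target_bias([50.47, 44.77, 2.67, 2.10], {"NE", "G"}))
    # 50NOR-50SA row: the simulated mix involves only the NE and SA columns.
    print(off_target_bias([40.15, 10.10, 3.05, 46.70], {"NE", "SA"}))
```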
The average simulated admixed sample showed 6.84% error caused by continuous
allele frequencies.
Combining the various simulation results discussed we conclude that:
The error caused by continuous allele frequencies is about 6-7%, depending
on the type of admixture you show, and the type of majority ancestry you
show.
You can match your results to one of the categories above that most closely
fits to determine the expected levels of this type of error for you.
Other Error
As discussed already, there
are other mechanisms that can cause error besides continuous allele frequencies
which come from our inability to go back in time and measure precisely how
and when admixture occurred in various parts of Europe. In scientific terms,
there may be imperfections in the admixture model we use to estimate admixture,
or substantial and directional genetic drift may have taken place between
modern day populations and the populations that admixed thousands of years
ago. Scientists debate these issues all the time, and there is no one answer
that is guaranteed to be correct. It would seem that the only way to estimate
these errors is to compare expected and observed results for people with
carefully documented genealogy, but even this is not possible. We cannot
use genealogy information from the past few generations to evaluate results
from an anthropological test that is looking back (potentially) thousands
of years. Since most genealogists do not have reliable information going
back that far, we simply do not have access to the reference data with which
we would need to compare performance against expectations and measure this
error, or modify the test to eliminate it. When we run this test in particular,
we are doing what meteorologists do when they calculate a hurricane
track projection cone. The meteorologist cannot know for certain exactly
where the storm will go, but he/she understands how storms respond to
major weather features (like fronts) well enough to form probability statements
predicting the storm track. Historically, these predictions are usually quite
impressive – they are fairly close
to where the storms actually go. The same is true with EurasianDNA 1.0 – the
results suggest that the estimates are fairly close to true values, but almost
never exactly correct. Another way to look at it: when we run our test for
you, consider that we are a lot like pilots flying a jumbo jet through low
hanging clouds at dusk – we can see the runway and most of the lights,
but not all of them and certainly not the details of the terrain around them.
We are good enough pilots, and can do a good enough job understanding our
surroundings to land the plane safely, but not to draw a detailed map of
the area.
EurasianDNA 1.0 Pedigrees as an aid to interpreting
results
Pedigrees are useful for understanding the strengths and limitations of
the test, and instructive on how best to interpret results. One pedigree
is shown below in Table PED1:
Table PED1
| | NOR | MED | MIDEA | SA |
| Mother | 44 | 26 | 30 | 0 |
| Father | 74 | 18 | 8 | 0 |
| Child 1 | 70 | 15 | 15 | 0 |
| Child 2 | 58 | 19 | 23 | 0 |
In the pedigree above, it is clear from the relatively high levels observed
that there is significant MED and MIDEA ancestry in the children,
but looking at the results in the entire pedigree it appears
that while the MED ancestry in the children was contributed by both parents,
the MIDEA ancestry came mainly from the mother. Both mother and father here
are real people who describe their heritage as continental European, but
the MIDEA result may provide a basis for a new genealogical line of
investigation – one that is focused in time farther back than most
genealogists consider with geopolitical records and surnames. This is true
particularly for the mother's side of the family tree (i.e. from where might
this MIDEA ancestry have come?).
This pedigree was obtained from a real family, and we have obtained similarly
satisfying results with 6 other family pedigrees. We have simulated over
120 pedigrees, and obtained satisfying results for all but two of them.
The pedigree shown in Table PED3 (real people, not simulated) is the most discordant
pedigree we have yet observed, and it provides an opportunity to discuss
how EurasianDNA 1.0 results should and should not be interpreted. In
particular, it illustrates why it is better to draw conclusions about your
anthropological heritage from results of small pedigrees, even if incomplete
(such as a child and a mother), rather than individual people (such as only
the mother).
A mother, father and two children (STR paternity test positive) are shown
below in Table PED3. None of the people had conducted amateur genealogical
study, but as with most people the two parents had a good idea of their predominant
ancestry. Via self-reporting, most (not necessarily all) of the mother's recent
ancestors 3 generations ago were known to have been English and German, and
the father reported himself as a little over half Greek (exact percentage
unknown).
Table PED3.
| | NOR | MED | MIDEA | SA |
| Mother | 50 | 10 | 15 | 25 |
| Father | 55 | 40 | 5 | 0 |
| Child 1 | 66 | 0 | 34 | 0 |
| Child 2 | 35 | 45 | 0 | 20 |
The father's results were more or less consistent with his expectations, particularly
considering that most Greeks, Turks and Italians show less MED ancestry (which
is a measure of anthropological identity) than they expect from blood (which
is derived from geopolitical, socio-cultural and very recent geographical
identity; we have discussed the difference between anthropological and geopolitical
ancestry elsewhere on this site). As expected from self-reported ancestry
of “a little more than half Greek”, the father shows relatively
high MED ancestry with EurasianDNA 1.0 compared to the mother, who reported
no Mediterranean or Southeastern European ancestry. Not expected from self-reporting
in geopolitical terms, however, was the 15% MIDEA and 25% SA ancestry for
the mother. With an error of 7 percentage points, it seems unlikely that
this result is due to statistical error, and from this result we have good
evidence for non-NOR, non-MED ancestry within the mother's family tree – perhaps
extending back a few hundred to a few thousand years. However, looking
at the children, the evidence is stronger. Child 1 is an example of an individual
who obtained results that are somewhat discordant from those of his parents;
we would have expected him to have scored somewhere between 5-15% MIDEA,
and 0-24% SA, but instead he scored 34% MIDEA and 0% SA. Some of the
discrepancy is likely due to the random nature of chromosome inheritance
(genetic assortment) but some is undoubtedly due to test error. Child 2 showed
less NOR and MIDEA and more MED than expected, for the same reasons.
Why is this pedigree so discordant? Recall that the average results
are accurate to within 6-7% for any individual, meaning some people will
exhibit 0% error, some 10% error, and a few 15-20% due to continuous allele
frequencies. Say the mother has a 15% error in the opposite direction from
one of her offspring: her 15% MIDEA is really 30% and her son's 34% is
really 19%. With the father at 5%, the expected range for the son is 5-30%,
and the true value of 19% now makes sense.
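The comparison of a child's scores against the range spanned by the two parents can be sketched in a few lines of Python. This is only an illustration of the reasoning used here for Table PED3: the function name and the optional `tolerance` parameter (a crude allowance for test error, in percentage points) are hypothetical, and the sketch ignores the randomness of chromosome inheritance.

```python
GROUPS = ["NOR", "MED", "MIDEA", "SA"]

# Scores (%) copied from Table PED3.
MOTHER = [50, 10, 15, 25]
FATHER = [55, 40, 5, 0]
CHILD_1 = [66, 0, 34, 0]
CHILD_2 = [35, 45, 0, 20]


def outside_parental_range(child, mother, father, tolerance=0):
    """Return {group: (low, high, child_score)} for every group where the child
    falls outside the range spanned by the parents, widened by `tolerance`."""
    flagged = {}
    for group, c, m, f in zip(GROUPS, child, mother, father):
        low, high = min(m, f) - tolerance, max(m, f) + tolerance
        if not low <= c <= high:
            flagged[group] = (low, high, c)
    return flagged


if __name__ == "__main__":
    print(outside_parental_range(CHILD_1, MOTHER, FATHER))
    print(outside_parental_range(CHILD_2, MOTHER, FATHER))
    # The hypothetical "true" values from the text: mother MIDEA 30, son 19.
    print(outside_parental_range([66, 0, 19, 0], [50, 10, 30, 25], FATHER))
```

With the reported scores, both children fall outside the parental range for NOR, MED and MIDEA. With the hypothetical corrected values, MIDEA is no longer flagged.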
Of over 126 pedigrees studied, it is unusual for a single child to show
such divergence from expectations, much less two in the same pedigree as in
this, our most discordant pedigree, but even in this case we can extract
useful information. When we look at the results in the context of the pedigree,
it is clear that the mother's non-NOR and non-MED EurasianDNA 1.0 results
are confirmed in her children (just not in the exact percentages we would
expect, given the error associated with the test). The MIDEA and SA ancestry
observed in the children appears clearly to have come from the mother rather
than the father. So, concluding that there is significant non-NOR and non-MED
ancestry in this pedigree, contributed by the mother's ancestors, is relatively
safe. Note that concluding such from just one of the children, or even the
mother, is less secure – it is in confirming the result and its inheritance
in the pedigree as a whole that we obtain the confidence to hypothesize that
the mother's and father's anthropological heritages are different. Given our
confidence from the pedigree, we can now ask from where might this MIDEA
and SA ancestry have come? Most likely, the mother had a significant number
of Western/Central Asian, Middle Eastern and/or South Asian ancestors sometime
within the past thousand (or more or less) years. For example, perhaps some
of her ancestors were Roma gypsy, or perhaps some were Bedouin Arabs who
settled in Europe within the past 1,000 years. The mother in this case feels
such non-NOR, non-MED ancestry likely came from her mother rather than her
father, since much more is known about the latter individual's ancestors.
To form hypotheses like these from a single test result, one would need to
see a relatively high level of MIDEA and/or SA ancestry, but lower levels
are useful for forming such a hypothesis when results are available from
multiple individuals within a pedigree. For this reason, to form the most
sound hypotheses of your distant anthropological heritage, we recommend testing
your mother and father if possible, your spouse and/or some of your brothers,
sisters and children (i.e. as many individuals in your immediate family
as possible).
Here is a sampling of the typical pedigree results
we obtain with EurasianDNA 1.0:
| | NOR | MED | MIDEA | SA |
| Gr5-1-P1 | 96 | 0 | 4 | 0 |
| Gr5-1-P2 | 0 | 96 | 0 | 4 |
| Gr5-1-S1 | 35 | 58 | 0 | 7 |
| Gr5-1-S2 | 39 | 52 | 0 | 9 |
| | NOR | MED | MIDEA | SA |
| Gr8-0-P1 | 40 | 46 | 6 | 8 |
| Gr8-0-P2 | 0 | 92 | 0 | 8 |
| Gr8-0-S1 | 12 | 71 | 10 | 7 |
| Gr8-0-S2 | 18 | 76 | 0 | 6 |
| | NOR | MED | MIDEA | SA |
| Gr9-0-P1 | 97 | 0 | 3 | 0 |
| Gr9-0-P2 | 0 | 0 | 100 | 0 |
| Gr9-0-S1 | 41 | 1 | 58 | 0 |
| Gr9-0-S2 | 40 | 0 | 59 | 1 |
| | NOR | MED | MIDEA | SA |
| Gr13-0-P1 | 69 | 31 | 0 | 0 |
| Gr13-0-P2 | 0 | 0 | 0 | 100 |
| Gr13-0-S1 | 19 | 21 | 0 | 60 |
| Gr13-0-S2 | 32 | 22 | 0 | 46 |
| | NOR | MED | MIDEA | SA |
| Gr18-1-P1 | 0 | 40 | 43 | 17 |
| Gr18-1-P2 | 0 | 38 | 50 | 12 |
| Gr18-1-S1 | 0 | 44 | 46 | 10 |
| Gr18-1-S2 | 0 | 37 | 38 | 25 |
Interesting facts about the test
- NOR scores above 80% are unusual, and when they are obtained it is usually
for people with light eye color shades, hair tones and skin complexions.
- More NOR ancestry is seen in Greeks and Turks than MED in Northern Europeans
such as Irish.
- About half of the Greeks tested show less MED ancestry than they expected,
but almost all Greeks tested so far have shown substantial MED ancestry.
- Middle Eastern subtypes all type with substantial MIDEAS ancestry.
- The amount of South Asian ancestry a South Asian individual exhibits
depends on where in South Asia they are from – high levels
of non-South Asian admixture are commonly seen in Northern India, but
not Southern India, for instance.
- The average North African has too much non-European ancestry for EurasianDNA 1.0 to be used within the accuracy specifications discussed
on this website.
EuropeanDNA 2.0 - The test
This test is a direct result of the work published from Dr. Mark Shriver's laboratory at the Pennsylvania State University in 2007 (Bauchet et al., 2007). This study genotyped 11,071 autosomal SNPs (Single Nucleotide Polymorphisms, found on Chromosomes 1-22) in a population of continental Europeans and Eurasians. Up until March of 2007 this represented the largest study of its kind yet performed.

Figure 1. Results from Bauchet et al., 2007 which studied 12 European populations with 11,071 SNPs. Results were obtained using the STRUCTURE program. Bars represent individuals and the color mix of each bar represents the proportions of each of 5 possible ancestries in this analysis. Black lines separate sample sets derived from different regions of Europe as indicated in the legend below.
The Bauchet Paper
The Bauchet study involved Armenian, Jewish, Greek, Spanish, Basque, French, Italian, German, English, Irish, Polish and Finnish samples. The study identified predominant axes of population structure along a North-South axis and also along a West-East axis. This result is similar to other less-detailed autosomal studies that preceded it, including that executed at DNAPrint®'s laboratory which underlies our EurasianDNA 1.0 product. Figure 1 shows the result obtained with the 11,071 SNPs when dividing the European continent into 5 sub-populations. You will note that each element of European ancestry is represented by a color, and each sample by a bar. Each sample is characterized by its own unique ancestry mix, represented by the different colors in each bar. As may be seen, there are clear patterns for each population. For example, there are 7 individuals of Greek heritage, and each is characterized by mostly “red” ancestry, with a small amount of “light blue” and less of the other elements. The “red” ancestry is shared predominantly by individuals of Southeastern European, Armenian and Jewish (cultural, not religious) ancestry. The “light blue” ancestry is shared among individuals of Spanish heritage, and the “brown” ancestry predominantly among individuals of Basque heritage – though it is interesting that some Italians and Spanish show extensive “brown” admixture. The “green” ancestry is shared among individuals of continental European ancestry, such as Germans, English, French, Irish and Polish. The “dark blue” ancestry is found predominantly in individuals of Northeastern European ancestry, such as Polish, Baltic and Finnish. We can ascribe geographical names to each of these elements of ancestry as follows, but note that these names are arbitrary – each element of ancestry corresponds to relatively isolated sub-populations that lived long ago in locations not precisely known, and here we choose names reflective of modern-day distributions:
Red – SOUTHEASTERN EUROPEAN/EURASIAN (SEE)
Light Blue – IBERIAN (IB)
Brown – BASQUE (BAS)
Light Green – CONTINENTAL EUROPEAN (CE)
Blue – NORTHEASTERN EUROPEAN (NEE)
EuropeanDNA 2.0 Accuracy and Precision
To create EuropeanDNA 2.0, we re-analyzed the same genetic data and samples used in Bauchet et al., 2007. We harvested the most informative European AIMs from the 11,071 SNPs typed by Bauchet et al., 2007, using a measure called the Fst, and we found that a specially selected set of 1,349 of these provided almost all of the information of the larger set of 11,071. The ability of the smaller marker set to resolve the same 5 elements of European ancestry shown in Figure 1 can be seen in Figure 2 (the same individuals were used for both figures). The main difference, aside from the meaningless difference in the order of colors from top to bottom in each bar, is a slightly higher background of red ancestry for continental Europeans. This type of subtle difference between a very large and a smaller marker set is to be expected, and we can fortunately quantify the reliability of the estimates, as we will discuss later.
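To give a feel for how markers might be ranked by Fst, here is a minimal Python sketch. The text does not say which Fst estimator was used, so the sketch assumes a textbook heterozygosity-based form, (H_T - H_S) / H_T, with equally weighted populations. The marker names and allele frequencies are made up for illustration and are not part of the EuropeanDNA 2.0 panel.

```python
def fst(freqs):
    """Fst for one biallelic marker from per-population allele frequencies,
    computed as (H_T - H_S) / H_T with equally weighted populations."""
    mean_p = sum(freqs) / len(freqs)
    h_total = 2.0 * mean_p * (1.0 - mean_p)  # heterozygosity of the pooled population
    h_within = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)  # mean within populations
    if h_total == 0.0:
        return 0.0  # a monomorphic marker carries no ancestry information
    return (h_total - h_within) / h_total


def most_informative(markers, k):
    """Return the names of the k markers with the highest Fst."""
    ranked = sorted(markers, key=lambda name: fst(markers[name]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical markers with allele frequencies in two populations.
    markers = {"snpA": [0.1, 0.9], "snpB": [0.5, 0.5], "snpC": [0.3, 0.6]}
    print(most_informative(markers, 2))
```

A marker whose allele frequency is identical in every population scores 0, and one that is fixed for different alleles in different populations scores 1, so keeping the top-ranked markers favors those that differ most between groups.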

Figure 2. Results from DNAPrint’s laboratory, which studied the same 12 European populations as Bauchet et al., 2007, but using only 1,349 specially selected SNPs from Bauchet's set of 11,071 SNPs. Results were obtained using the STRUCTURE program. Bars represent individuals and the color mix of each bar represents the proportions of each of 5 possible ancestries in this analysis. Black lines separate sample sets derived from different regions of Europe as indicated in the legend below. Because it provides results similar to those obtained using all 11,071 SNPs in Bauchet et al., 2007, this marker set was chosen to constitute the EuropeanDNA 2.0 marker panel.
Individual Results
For primarily European individuals (no significant African, East Asian or Native American admixture), sub-European ancestry mix can be determined by typing the individual with the 1,349 EuropeanDNA 2.0 markers and comparison to the reference European “parental” samples. This comparison enables us to determine an individual’s percentage of “Southeastern European”, “Iberian”, “Basque”, “Continental European” and “Northeastern European” ancestry and infer from where the individual's European ancestors most likely derived. For example, a person with half “red” (“Southeastern European”) and half “light blue” (“Iberian”), with no “green”, “dark blue” or “brown” ancestry would most likely be someone of mixed European ancestry, with ancestors who were from or contributed to Armenian, Jewish, Greek and/or other Southeastern European populations (e.g. - Italy) as well as ancestors who were from or contributed to the Spanish population. Such a person would most likely be of mixed Armenian, Jewish, Greek and/or Spanish heritage and less likely to be of German, English, Irish, French, Polish or Finnish heritage.
Figure 3 shows a specific example. The results for a test sample (“Customer”) are shown to the right of the bar plot for the reference parental samples. The “Customer” exhibits primarily blue, or Northeastern European (NEE) ancestry. The only reference samples that exhibit this type of pattern are Finnish, though of course we are referring to only a sampling of Europe and individuals from other Northeastern European populations are likely to also exhibit this type of admixture pattern. This “Customer” can conclude that their European ancestry is primarily Northeastern European (NEE), and such a “Customer” would most likely have recent ancestors from Norway, Finland, Sweden, Russia, and/or possibly the Baltic countries.
Figure 3. EuropeanDNA 2.0 results for the reference, parental population samples compared to that for a single unknown “Customer”. The unknown sample was an individual created in a computer to be of 100% Northeastern European ancestry through the process of genetic simulation.
Comparing EurasianDNA 1.0 and EuropeanDNA 2.0
There are important differences between the EuropeanDNA 2.0 and EurasianDNA 1.0 tests. EurasianDNA 1.0 assumes a Eurasian population model – that is, that your ancestors came from Continental Europe, the Middle East and/or South Asia. EuropeanDNA 2.0 assumes a primarily continental European model – that is, that your ancestors mostly came from Europe (Southeast, Continental, Northeast etc.). You will note that EuropeanDNA 2.0 considers a larger set of more closely related populations. This is a more difficult problem to solve, and for this reason, EuropeanDNA 2.0 uses a much larger number of markers in your DNA (1,349 versus 333 for EurasianDNA 1.0).
How to Use EuropeanDNA 2.0
Since EuropeanDNA 2.0 assumes a continental population model, it is useful only for individuals who are primarily continental European. How might a person know this? EuropeanDNA 2.0 is the latest release of a continuum of other products that are useful for making this type of decision. A customer without any knowledge of their ancestry might take these tests in a logical order, such as:
1. Take the AncestryByDNA™ 2.5 test and learn that they are primarily European, with little sub-Saharan African, East Asian or Native American admixture.
2. Then take the EurasianDNA 1.0 test and learn that most of their grandparents were likely to be of continental European origin.
3. And then, take the new EuropeanDNA 2.0 test to learn from where in Europe these ancestors derived.
Step 2 may be considered optional for most customers - some customers might skip this step based on a written genealogical record, or some other evidence. However, step 1 is generally required, since we need to make sure the results we disseminate meet basic quality control criteria. That is, we would not want to test an East Asian, Native American or African individual with EuropeanDNA 2.0: since the 2.0 test assumes primarily continental European ancestry, the semantic meaning of the results would be lost, and a proper interpretation of the results would be very difficult, and meaningful only in terms of genetic distance and shared ancestry rather than derived ancestry.
EuropeanDNA 2.0 can be used as a second opinion for EurasianDNA 1.0 results
Since the power and resolution with which we can infer genetic ancestry is proportional to the number of populations and markers studied, the Bauchet paper provided us an opportunity to develop a more advanced test for those who desire to pinpoint their European ancestry with more precision and resolution. For example, a typical EurasianDNA 1.0 customer may have obtained results of equally mixed NOR1 and NOR2 ancestry. Both types are found throughout Europe, and the ratios differ from population to population but not by a lot, so this customer most likely would only be able to conclude that their European ancestry was predominantly Continental European, as opposed to Southeastern European, Middle Eastern or South Asian. However, with the enhanced resolution and power of EuropeanDNA 2.0, this customer would likely be able to pinpoint their ancestry with more precision. EurasianDNA 1.0 and EuropeanDNA 2.0 use different sets of genetic markers. Since EuropeanDNA 2.0 is designed for a more detailed look at sub-European ancestry, it uses many more markers (1,349 versus 333 for EurasianDNA 1.0).
What is the Difference between EuropeanDNA 2.0 and other “Ethnogeographic” autosomal tests?
The difference is profound. The main difference is power – due to the large number of markers used and the work that went into selecting these markers from the human genome. Autosomal genetic tests are relatively new and a great advance over the Y chromosome and mtDNA tests (which only report on a small fraction of your ancestors). However, among the new autosomal tests, quality differs dramatically from that of EuropeanDNA 2.0. Discerning sub-European ancestry and admixture is very ambitious, and the research needed to find and develop the markers required was not inexpensive. Some groups have taken a “quick-and-dirty” approach and launched so-called “Ethnogeographical” DNA tests that use forensic markers called STRs for reporting sub-European ancestry from the autosomes. Usually, autosomal STRs are not selected from human DNA based on their ancestry information content, and so they provide less power. Because they lack this power, many of these tests are incapable of reporting admixture (such as 50% Iberian, 50% Basque), as opposed to primary affiliation (such as, “the sample belongs to group X”). Worse, the error associated with these other tests is not only dramatically higher, but primarily undocumented and undisclosed to customers. The power of using Ancestry Informative SNPs is that a very large collection can be screened, and large panels of highly informative markers can be assembled to provide the EuropeanDNA 2.0 test with unprecedented power. This power translates into accuracy and meaning, which other tests attempting to resolve European sub-ancestry lack. Further, we can precisely quantify the error so that customers know how to interpret their results. That the bases for most of these other tests for inferring ancestry have not been published – in papers detailing their performance, or as part of other papers where they have been used for this purpose – speaks volumes.
The basis for the EuropeanDNA 2.0 panel of markers was published in the American Journal of Human Genetics earlier in 2007 (Bauchet et al., 2007) and, of course, the publication was peer reviewed.
EuropeanDNA 2.0 Accuracy and Precision
Since our AIMs (Ancestry Informative Markers) are not linked to one another, we can easily create simulated samples in a computer and measure the mathematical error encountered in assessing admixture with EuropeanDNA 2.0 by comparing the results for these simulated samples with their expected results. Inspection of the results for simulated 100% Continental European (CE) samples, for instance, shows that the average simulated 100% CE sample registers about 91.3% CE ancestry and about 2.5% NEE error, 1.5% IB error, 2.0% BA error and 2.8% SEE error (total error = 8.7%) (Table I).

Table I. Average results for simulated samples (column 1) with respect to each element of European sub-Ancestry with the EuropeanDNA 2.0 panel (subsequent columns).
Thus, for an individual who is primarily of CE ancestry, readings in the low single-digit percentages of BA, SEE, IB and NEE are not meaningful. As can be seen in Table I, the average level of error for any specific type of admixture in an individual of any particular type of primary ancestry varies from about 1% to about 3%, depending on the type of ancestry and/or ancestry mix exhibited by the sample. Over all types of admixture and primary ancestry backgrounds, using 100% and all possible 50%/50% simulated samples, we calculate that the average bias or error is 7.3%. This figure is impressive for a within-continent assay and rivals what we obtain for the mathematically and genetically easier problem of between-continent admixture (AncestryByDNA™ 2.5). This result is made possible by the large number of optimally informative markers used by EuropeanDNA 2.0.
The error is caused by the fact that the difference in sequence for the AIMs between groups is not absolute, but continuous. For example, the frequency of the minor allele of one AIM may be 0.35 in the SEE group but only about 0.10 in the other groups, so it provides some “power” for distinguishing SEE ancestry from the other groups, but not absolute power. Over 1,349 AIMs, the SEE vs. “other” power would be much better than for one AIM, but still not absolute, so we can consider this to be a form of test deficiency. Although EuropeanDNA 2.0 has more power to resolve between European groups than any commercially available high-throughput panel before it, it is not perfect, and this imperfection is what we measure with the simulations as “bias” in Table I.
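The effect of continuous (rather than absolute) allele frequency differences can be illustrated with a small simulation. The following Python sketch is not the EuropeanDNA 2.0 algorithm. It assumes only two groups, assumes that every AIM has the 0.35 versus 0.10 minor-allele frequencies from the example above (real panels have a different difference at each marker), and uses a simple grid-search maximum-likelihood estimate. All function names are hypothetical.

```python
import math
import random

def simulate_genotype(p_a, p_b, m, n_markers, rng):
    """Simulate minor-allele counts (0, 1 or 2) at unlinked markers for an
    individual with fraction m of ancestry from group A and 1 - m from group B."""
    p = m * p_a + (1 - m) * p_b  # expected minor-allele frequency per marker
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n_markers)]

def estimate_admixture(genotype, p_a, p_b, steps=100):
    """Grid-search maximum-likelihood estimate of the group A fraction.
    Assumption: all markers share the same pair of frequencies, so only the
    total allele counts matter."""
    minor = sum(genotype)              # minor alleles observed
    major = 2 * len(genotype) - minor  # major alleles observed

    def log_likelihood(m):
        p = m * p_a + (1 - m) * p_b
        return minor * math.log(p) + major * math.log(1 - p)

    return max((i / steps for i in range(steps + 1)), key=log_likelihood)

def mean_spurious_admixture(n_markers, n_samples, seed, p_a=0.35, p_b=0.10):
    """Average group A reading for simulated samples that are 100% group B."""
    rng = random.Random(seed)
    readings = [
        estimate_admixture(simulate_genotype(p_a, p_b, 0.0, n_markers, rng), p_a, p_b)
        for _ in range(n_samples)
    ]
    return sum(readings) / n_samples

if __name__ == "__main__":
    for n in (1, 1349):
        print(n, "markers:", round(mean_spurious_admixture(n, 200, seed=1), 3))
```

Under these assumptions, a single marker produces a large average spurious reading, while 1,349 markers reduce it substantially without eliminating it, which is the kind of residual bias the simulations are meant to measure.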
For interpreting results, it is more useful to understand what levels of admixture are required in order to conclude with 95% certainty that the admixture is real. Table II shows these values, which were obtained from the simulated samples used for Table I. From Table II we can see that an individual of primarily NEE ancestry needs to see greater than 6.1% CE admixture in order to safely conclude that the CE admixture is bona fide, as opposed to statistical noise. As another example from Table II, an individual of primarily IB ancestry would need to see greater than 7.3% BA admixture to conclude that the admixture was real, as opposed to merely statistical noise. Overall, taking the average of all of these values, we can see that admixture levels of over 6% are generally required in order to safely conclude that the admixture is real. Values as low as these have never before been reported for a commercially available (or academic) autosomal assay, and they indicate that EuropeanDNA 2.0 is a remarkably sensitive assay given the ambitious nature of its goal of measuring within-continental admixture. The resultant values show that, generally speaking, EuropeanDNA 2.0 can easily detect the ancestry contributed by a single grandparent and, in most cases, a single great-grandparent (and in certain cases, even a single great-great-grandparent).

Table II. Admixture level that a reading must exceed in order to conclude with 95% certainty that it is the result of bona fide ancestry rather than mere statistical noise. For individuals of most ancestral backgrounds, and for most types of ancestry, readings in the mid to high single-digit percentages are significant indications of that type of ancestry. However, the value differs depending on the primary ancestry background and the type of admixture.
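As a rough illustration of how such a threshold can be derived and used, the Python sketch below takes the 95th percentile of the readings obtained from simulated samples with no true admixture, and compares the overall 6% figure quoted above with the expected contribution of a single ancestor. The nearest-rank percentile method and the function names are assumptions made for illustration; the manual does not state how the Table II values were computed. The expected contributions ignore the variation introduced by independent assortment.

```python
def threshold_95(null_readings):
    """Nearest-rank 95th percentile of readings from simulated samples
    that carry none of the ancestry in question."""
    if not null_readings:
        raise ValueError("at least one reading is required")
    ordered = sorted(null_readings)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]

def expected_contribution(generations_back):
    """Expected share of ancestry from one ancestor (2 = grandparent,
    3 = great-grandparent, 4 = great-great-grandparent)."""
    return 0.5 ** generations_back

if __name__ == "__main__":
    threshold = 0.06  # overall average threshold quoted in the text
    ancestors = (("grandparent", 2), ("great-grandparent", 3),
                 ("great-great-grandparent", 4))
    for name, generations in ancestors:
        share = expected_contribution(generations)
        print(f"{name}: expected {share:.2%}, above threshold: {share > threshold}")
```

A single great-great-grandparent is expected to contribute 6.25%, only slightly above the 6% average threshold, which is consistent with detection being possible only in certain cases.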
Sensitivity for Detecting Single European Relatives
Based on our simulation results, we know the levels we need to observe in order to conclude with 95% certainty that a reading is a significant indication of ancestry. With these values, we can assess how well we are able to detect admixture from a single relative. We start with the contribution from a single great-grandparent (GGP). Assume an individual has 7 GGPs from one ancestry group and 1 GGP from another. How well EuropeanDNA 2.0 is suited to detecting the ancestry contributed by this single GGP can be determined with additional simulations. We simulate individuals for each possible 1 GGP/7 GGP scenario, obtain EuropeanDNA 2.0 results for these simulated individuals and determine how frequently the ancestry contributed by the single GGP exceeds the 95% threshold. A perfect European assay would detect the ancestry contributed by the single GGP 100% of the time (we don’t have to deal with independent assortment issues here, since we have simulated each sample to be 12% ancestry from one group and 88% ancestry from another, approximating the expected 1/8 and 7/8 contributions). Table III shows the results. On average, EuropeanDNA 2.0 was able to detect the ancestry contributed by the single GGP 53.3% of the time, although the success rate depends upon the groups from which the 7 GGPs and 1 GGP come. For example, for an individual with 7 SEE great-grandparents and 1 NEE great-grandparent, and who inherited 88% and 12% admixture respectively as expected, EuropeanDNA 2.0 will detect the NEE ancestry (show a level over the 95% threshold) 82.0% of the time. In contrast, for an individual with 7 NEE great-grandparents and 1 CE great-grandparent, and who inherited 88% and 12% admixture respectively as expected, EuropeanDNA 2.0 will detect the CE ancestry (show a level over the 95% threshold) 26.0% of the time. In short, there is about a 50/50 chance that a person could detect the ancestry contribution from a single GGP using EuropeanDNA 2.0.

Table III. Percent of simulated samples showing expected ancestry percentages above the 95% threshold required for concluding that the ancestry is “real”, where the expected percentage was contributed by a single great-grandparent (GGP) on a background of 7 other GGPs of another ancestry. 100 samples were simulated for each 1:7 GGP combination. For example, when 7 GGPs come from the SEE group and 1 GGP from the NEE group, EuropeanDNA 2.0 reliably detects the NEE admixture 82.0% of the time; that is, the percentage observed is above the threshold required to conclude that it exists 82% of the time.
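The detection rate described here is simply the fraction of simulated readings that exceed the relevant threshold. Because only 100 samples were simulated per combination, each rate also carries some sampling uncertainty. The Python sketch below shows both calculations. The function names are hypothetical, and the binomial standard error is a standard statistical approximation added here for illustration, not a figure reported in the manual.

```python
import math

def detection_rate(readings, threshold):
    """Fraction of simulated readings that exceed the 95% threshold."""
    hits = sum(1 for reading in readings if reading > threshold)
    return hits / len(readings)

def binomial_standard_error(rate, n_samples):
    """Approximate sampling uncertainty of a rate estimated from n_samples."""
    return math.sqrt(rate * (1 - rate) / n_samples)

if __name__ == "__main__":
    # Rates quoted in the text, each based on 100 simulated samples.
    for label, rate in (("7 SEE + 1 NEE", 0.82), ("7 NEE + 1 CE", 0.26)):
        se = binomial_standard_error(rate, 100)
        print(f"{label}: {rate:.1%} detected, standard error about {se:.1%}")
```

With 100 samples per combination, the standard error works out to roughly four percentage points for both quoted rates, so small differences between cells of the table should not be over-interpreted.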
Bibliography
Bauchet M, McEvoy B, Pearson L, Quillen E, Sarkisian T, Hovhannesyan K, Deka R, Bradley D, Shriver M. 2007. Measuring European Population Stratification with Microarray Genotype Data. American Journal of Human Genetics 80(5):948-956.
Cavalli-Sforza LL, Menozzi P, Piazza A. 1994. The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.
Chakraborty R. 1986. Gene admixture in human populations: models and predictions. Yearbook of Physical Anthropology 29:1-43.
Jobling M, Hurles M, Tyler-Smith C. 2004. Human Evolutionary Genetics: Origins, Peoples and Disease. Garland Science, New York, NY.
Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155(2):945-959.
Wang J. 2003. Maximum-likelihood estimation of admixture proportions from genetic data. Genetics 164:747-765.