
Manuals for EurasianDNA 1.0 and EuropeanDNA 2.0 (see below)

EurasianDNA 1.0, the first test of its kind ever developed, is a pan-genome test that reads your DNA across all 23 of your chromosome pairs to report your sub-European population (i.e., “ethnic”) affiliations.

The EuropeanDNA 2.0 test takes the two categories from the 1.0 test that cover Northern and Southeastern European ancestry and divides them further into five finer European categories:

  1. Southeastern European (SEE): from eastern Spain across Italy, Greece and Turkey to Bulgaria and Armenia. This includes the Jewish populations of that area (as a population, not as a religion).
  2. Iberian (IB): the Iberian Peninsula of Spain and Portugal.
  3. Basque (BAS): the region of the Pyrenees Mountains between Spain and France.
  4. Continental European (CE): the central portion of the European continent, such as Germany, France, Switzerland and the Netherlands, plus Great Britain and Ireland.
  5. Northeastern European (NEE): from Poland, the Baltic countries, western Russia, Ukraine and Belarus into the Scandinavian countries and northern Finland, including the Sami populations of Lapland.

Information available from the EurasianDNA 1.0 test includes Middle Eastern and South Asian ancestral heritage. These are not specifically included in the EuropeanDNA 2.0 test, except insofar as South Asian ancestry is carried by the Roma (Gypsies) of Europe, whose ancestors migrated from South Asia. The Roma settled mainly in Iberia and in Eastern European countries such as Romania and Bulgaria.

The development of the EuroDNA™ tests (EurasianDNA 1.0 and EuropeanDNA 2.0) was made possible by the Human Genome Project and innovative research at DNAPrint® Genomics, Inc. The test is suitable for any person who has taken AncestryByDNA™ 2.5 (ABD2.5) and obtained a score of at least 50% European, less than 40% East Asian, less than 15% sub-Saharan African and less than 15% Native American ancestry.

EurasianDNA™ 1.0

EuropeanDNA 2.0

EurasianDNA 1.0 - The test

The “European” human population group, as we define it, corresponds to the monophyletic lineages that contributed predominantly to populations in Europe, the Middle East, and Central and South Asia beginning approximately 40,000 years ago. Most “Europeans” speak languages derived from the Indo-European family, and the systematic distribution of lighter pigmentation traits (skin tone, hair color, iris color) is exclusive to this group. However, not all “Europeans” speak an Indo-European language or are characterized by light pigmentation.

EurasianDNA 1.0 reports your European ancestry admixture much like ABD™ 2.5 does, but looks deeper within European lineages for individuals who are predominantly European:

  1. Northern European subgroup (NOR)
  2. Southeastern European (Mediterranean) subgroup (MED)
  3. Middle Eastern subgroup (MIDEAS)
  4. South Asian subgroup (SA)

These groups were defined empirically: research at DNAPrint® with DNAPrint® genetic markers shows that a model assuming 4 subgroups within Europeans “fits” the data best (although, whatever number of subgroups was assumed, admixture within the parental samples clustered as a function of geopolitical identity and made geographic sense).

We use the same methods and algorithms as ABD 2.5™, except that the Ancestry Informative Markers (AIMs) are different. The EurasianDNA 1.0 test comprises 320 European AIMs, obtained by screening tens of thousands of candidates from human genome databases and DNA microchips. A fraction of these 320 are also used in the ABD™ 2.5 test, but most of the markers that power the test are new discoveries from recent DNA chip and ultra-high-throughput genotyping research.
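The page says only that the same methods and algorithms as ABD 2.5 are used; it does not spell them out. To illustrate how admixture proportions can be estimated from AIM genotypes when parental allele frequencies are known, here is a minimal sketch of a generic maximum-likelihood estimator fitted by expectation-maximization (EM). It assumes each allele copy is drawn independently from parental population k with probability q_k (the admixture model used by programs such as STRUCTURE). It is not DNAPrint's proprietary implementation, and the function name `estimate_admixture` and its inputs are hypothetical.

```python
from typing import List, Sequence

EPS = 1e-6  # keeps allele frequencies strictly between 0 and 1


def estimate_admixture(genotypes: Sequence[int],
                       freqs: Sequence[Sequence[float]],
                       n_iter: int = 1000,
                       tol: float = 1e-10) -> List[float]:
    """Maximum-likelihood admixture proportions for one individual (EM sketch).

    genotypes[j]: copies (0, 1 or 2) of the reference allele at AIM j.
    freqs[k][j]:  reference-allele frequency at AIM j in parental population k.
    Returns K proportions that sum to 1.
    """
    if not genotypes:
        raise ValueError("at least one genotyped AIM is required")
    K = len(freqs)
    q = [1.0 / K] * K  # start from equal ancestry proportions
    for _ in range(n_iter):
        expected = [0.0] * K  # expected allele copies inherited from each population
        copies = 0
        for j, g in enumerate(genotypes):
            # g copies of the reference allele and 2 - g copies of the alternative.
            for is_ref, count in ((True, g), (False, 2 - g)):
                if count == 0:
                    continue
                weights = []
                for k in range(K):
                    f = min(max(freqs[k][j], EPS), 1.0 - EPS)
                    weights.append(q[k] * (f if is_ref else 1.0 - f))
                total = sum(weights)
                # E-step: posterior probability that this allele copy came from population k.
                for k in range(K):
                    expected[k] += count * weights[k] / total
                copies += count
        # M-step: new proportions are the average posterior over all allele copies.
        new_q = [e / copies for e in expected]
        done = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if done:
            break
    return q


# Hypothetical two-population example: 20 AIMs with reference-allele
# frequency 0.9 in population A and 0.1 in population B.
freqs = [[0.9] * 20, [0.1] * 20]
print(estimate_admixture([1] * 20, freqs))  # heterozygous everywhere: close to [0.5, 0.5]
```

Because the proportions are constrained to sum to 1, an estimator of this kind always returns a point within the K-group model, even for someone whose ancestry lies outside it.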

The primary parental samples used to develop EurasianDNA 1.0 were collected in Europe, the Middle East and India. Each sample was anthropologically qualified; each individual and all 4 of their grandparents lived in the appropriate geopolitical region, spoke the appropriate language for the region, reported full ethnic affiliation with the parental group corresponding to the region and reported no known admixture for any of his/her grandparents.
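The qualification rule above can be restated as a simple filter. This is a minimal sketch, assuming a hypothetical questionnaire record (`Questionnaire` and `is_qualified_parental` are illustrative names, not part of any DNAPrint software) that stores region and language for the subject and each grandparent, plus the subject's self-reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Questionnaire:
    """Hypothetical record of what a candidate parental sample reported."""
    regions: List[str]      # subject first, then the four grandparents
    languages: List[str]    # same order as `regions` (assumed to be recorded for all five)
    full_affiliation: bool  # subject reports full affiliation with the parental group
    known_admixture: bool   # any known admixture among the grandparents


def is_qualified_parental(q: Questionnaire, region: str, language: str) -> bool:
    """Apply the qualification criteria described in the text."""
    if len(q.regions) != 5 or len(q.languages) != 5:
        return False  # need the subject plus all 4 grandparents
    lived_in_region = all(r == region for r in q.regions)
    spoke_language = all(lang == language for lang in q.languages)
    return lived_in_region and spoke_language and q.full_affiliation and not q.known_admixture


greek = Questionnaire(["Greece"] * 5, ["Greek"] * 5, True, False)
print(is_qualified_parental(greek, "Greece", "Greek"))
```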

When we add samples of unknown ethnicity from the United States to this mix, and use admixture determination methods that do not require prior information or definitions, we see a natural coalescence of individuals into genetic subgroups that make sense in geographical and historical terms. In other words, we obtain the same divisions as when we use samples with prior information (our “parental” samples), which illustrates that geopolitical and sociocultural notions of “ethnicity” correspond well to anthropological and genetic partitions within this region of the globe. Variation within subgroups is the rule, not the exception. However, the genetic distances between European ethnicities are smaller than those between the world's continental populations (which means they are derived from more recent common ancestors and share more in common genetically). Due to the complexity of European demography and history, self-held notions of ethnicity do not correspond perfectly with anthropology, as you will appreciate if you read further.
Figure 4GROUPBAR
Figure 4GROUPBAR shows EurasianDNA 1.0 results sorted by geopolitically defined ethnic identity; that is, the category along the x-axis (e.g., Irish, Italian) is based on political borders and socio-cultural identity rather than on genetics or anthropological history. Each colored column is an individual. The percentage of affiliation with each of the 4 genetically defined subgroups is shown along the y-axis, with each genetic group shown in a different color. DNAPrint uses its own algorithms for the test, but this view of the data was produced using a third-party clustering program for visual presentation purposes. Inspection of the European samples in FIGURE 4GROUPBAR shows that:
a) Almost all of the individuals with predominant affiliation to the NOR subgroup (yellow) are Northern European or Irish.
b) Almost all of the individuals with predominant affiliation to the MED subgroup (red) are Southeastern Europeans (Greeks or Turks).
c) Almost all of the individuals with predominant affiliation to the MIDEAS subgroup (green) are Middle Eastern.
d) Almost all of the individuals with predominant affiliation to the SA subgroup (blue) are South Asians from the Indian subcontinent (i.e., Indians).
Table 4GROUP shows these data in tabular format, listing the average admixture percentages for the 4 genetically defined within-European subgroups. The average sample size for the 10 ethnic populations shown in Table 4GROUP is 41. Admixture percentages greater than 25% are marked with an asterisk (*).

Table 4GROUP

Sample           NOR     MED     MIDEAS  SA
Northern Euro    82*     5.5     11      1.5
Irish            77.8*   7.8     8.5     6
Iberian          48*     29.5*   7       15.5
Italians         35*     46.1*   8.9     10
Greek            15      78.1*   1.9     5
Turkish          22.9    54.6*   8.4     14.1
Middle East I    3.9     41.7*   52.2*   2.2
Middle East II   4       7       84*     5
South Asian      2.6     4.7     3.7     89*
US Caucasians    54.5*   22.4    12.6    8.3
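The two summaries used in the text (which subgroups exceed 25% on average, and which subgroup predominates) can be recomputed directly from Table 4GROUP. This short sketch uses only the averages printed in the table; the helper names are illustrative. It also checks the claim, discussed in the next section, that Iberians and Italians type as intermediate between the northern and southeastern samples for both NOR and MED.

```python
# Average admixture percentages from Table 4GROUP, in the order NOR, MED, MIDEAS, SA.
GROUPS = ("NOR", "MED", "MIDEAS", "SA")
TABLE_4GROUP = {
    "Northern Euro":  (82.0, 5.5, 11.0, 1.5),
    "Irish":          (77.8, 7.8, 8.5, 6.0),
    "Iberian":        (48.0, 29.5, 7.0, 15.5),
    "Italians":       (35.0, 46.1, 8.9, 10.0),
    "Greek":          (15.0, 78.1, 1.9, 5.0),
    "Turkish":        (22.9, 54.6, 8.4, 14.1),
    "Middle East I":  (3.9, 41.7, 52.2, 2.2),
    "Middle East II": (4.0, 7.0, 84.0, 5.0),
    "South Asian":    (2.6, 4.7, 3.7, 89.0),
    "US Caucasians":  (54.5, 22.4, 12.6, 8.3),
}
NORTH = ("Northern Euro", "Irish")
SOUTHEAST = ("Greek", "Turkish")


def above_threshold(row, threshold=25.0):
    """Genetic groups whose average affiliation exceeds `threshold` percent."""
    return [g for g, v in zip(GROUPS, row) if v > threshold]


def predominant(row):
    """Genetic group with the highest average affiliation."""
    return max(zip(GROUPS, row), key=lambda gv: gv[1])[0]


def column(pops, i):
    """Values of column i (0 = NOR, 1 = MED, ...) for the given populations."""
    return [TABLE_4GROUP[p][i] for p in pops]


def is_intermediate(pop):
    """True if `pop` lies strictly between the northern and southeastern samples for NOR and MED."""
    nor, med = TABLE_4GROUP[pop][0], TABLE_4GROUP[pop][1]
    return (max(column(SOUTHEAST, 0)) < nor < min(column(NORTH, 0))
            and max(column(NORTH, 1)) < med < min(column(SOUTHEAST, 1)))


for pop, row in TABLE_4GROUP.items():
    print(f"{pop:<15} predominant={predominant(row):<7} >25%: {above_threshold(row)}")
for pop in ("Iberian", "Italians"):
    print(pop, "intermediate:", is_intermediate(pop))
```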

What do the groups NOR, MED, MIDEAS and SA mean?
The arrangement NOR, MED, MIDEAS, SA follows a northwest-to-southeast orientation, reminiscent of the subgroups defined by clines in synthetic maps of Europe and Western Asia obtained from classical blood group markers and Y-chromosome haplogroups (reviewed by Jobling et al., 2004). Like anthropometric traits such as hair and eye color, and like the chronological and geographical distribution of archaeological evidence, the genetic variation measured by EurasianDNA 1.0 shows the same principal component of variation in a northwest-to-southeast orientation. For example, Northern Europeans and Irish have the highest affiliation with NOR (TABLE 4GROUP and yellow color, FIGURE 4GROUPPIEMAP), which suggests that this genetically defined subgroup corresponds to the most northerly subsets of Cavalli-Sforza et al.'s clines (1994; also see Jobling et al., 2004). NOR thus seems to correspond to a “Nordic” anthropo-genetic identity, hence the name “NOR”. As we might expect, the percentage of NOR affiliation appears to increase gradually from the Middle East to Scandinavia and the United Kingdom, a trend that is most easily visualized in FIGURE 4GROUPPIEMAP (yellow color). Likewise, the percentage of MED affiliation increases gradually moving from Northern Europe to the Mediterranean Southeast of Europe. Geopolitical ethnic groups that were not used in developing the EurasianDNA 1.0 test, such as Iberians (Spanish, Portuguese) and Italians, type with EurasianDNA 1.0™ as intermediate in anthropo-genetic identity between Northern Europeans and Southeastern Mediterranean individuals (notice that the yellow and red levels for Iberians in Spain and for Italians are about halfway between the levels in Ireland/Scandinavia and Greece/Turkey).
Similarly, a Middle Eastern population (Middle East I in TABLE 4GROUP) taken from a different region of the Middle East than the population used as the “parental” or “reference” group shows predominantly Mediterranean and Middle Eastern genetic identity (this group is represented by the pie chart over Saudi Arabia in FIGURE 4GROUPPIEMAP, with half red and half green color). Results of this type constitute a very important validation of EurasianDNA 1.0, because they show that, as we might expect, gene flow and genetic relatedness between subpopulations decrease with geographical distance. In other words, a test that did not work would probably show “confusing” results for Iberians and Italians relative to the results in Northern and Southeastern Europe, or for Middle Eastern sample I versus Middle Eastern sample II. US Caucasians, as one might expect from the history of European migration to the New World, show an affiliation pattern that seems due to an unequal contribution from Northern European and Mediterranean peoples.

Blind Challenge with ethnically admixed “Caucasian” samples:

In addition to these samples of formally characterized ethnicity, we have typed a variety of admixed samples whose ethnicity was self-reported. Self-reported ethnicity is not as reliable a reference as anthropologically qualified samples, but these samples give us an idea of what to expect in mixed populations such as those found in the United States. Each subject reported the ethnicity of all 4 grandparents and was binned into one of four arbitrarily defined ethnic groups (I, II, III, IV) if at least 2 of the grandparents came from that group; group V is a random assortment of US Caucasians. The average sample size for these groups was 70:

TABLE 4GROUPTEST

I    At least half of the grandparents from Ireland, Western Scandinavia or Great Britain
II   At least half of the grandparents from France, Germany, Denmark, Finland, Holland, Poland, Eastern Europe or Russia
III  At least half of the grandparents from Spain or Italy
IV   At least half of the grandparents from Greece or Turkey
V    A random assortment of US Caucasians regardless of where their grandparents originated (same sample as shown in TABLE 4GROUP)

Group  NOR   MED   MIDEAS  SA
I      56.4  20.8  16.4    6.4
II     51.2  24.5  15.2    9
III    46.9  33.8  10.6    8.8
IV     45    41    9       5
V      54.5  22.4  12.6    8.3
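The binning rule (at least 2 of the 4 grandparents from a group's regions) can be written as a small function. This is a sketch under stated assumptions: the region names are taken from the group definitions in TABLE 4GROUPTEST, the function name `qualifying_groups` is hypothetical, and because the text does not say how a subject qualifying for two groups (2 + 2 grandparents) was handled, the function returns every qualifying group rather than picking one.

```python
from typing import List

# Regions listed for groups I-IV in TABLE 4GROUPTEST. Group V (random US
# Caucasians) is not assigned by this rule.
GROUP_REGIONS = {
    "I": {"Ireland", "Western Scandinavia", "Great Britain"},
    "II": {"France", "Germany", "Denmark", "Finland", "Holland",
           "Poland", "Eastern Europe", "Russia"},
    "III": {"Spain", "Italy"},
    "IV": {"Greece", "Turkey"},
}


def qualifying_groups(grandparent_origins: List[str], min_count: int = 2) -> List[str]:
    """Return every group I-IV supplied by at least `min_count` of the four grandparents."""
    if len(grandparent_origins) != 4:
        raise ValueError("expected the origins of all four grandparents")
    return [group for group, regions in GROUP_REGIONS.items()
            if sum(origin in regions for origin in grandparent_origins) >= min_count]


print(qualifying_groups(["Ireland", "Ireland", "Italy", "Germany"]))
print(qualifying_groups(["Spain", "Italy", "Greece", "Turkey"]))  # ambiguous case
```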

Although most of these individuals were of admixed ethnicity, the pattern in the results agrees with their self-held notions of ethnic affiliation. As these results show, the average individual claiming at least 50% Northern European ancestry (Group I) showed predominant affiliation with the NOR group, as did individuals claiming at least 50% French, German, Danish, Dutch, Eastern European or Russian ancestry (Group II). In contrast, individuals claiming at least 50% Greek or Turkish ancestry (Group IV) showed considerably greater MED ancestry (41% versus 20.8% for Group I). As expected, individuals claiming at least 50% Spanish or Italian ancestry (Group III) typed in between Group I and Group IV individuals. Looking at the data in a slightly different way, TABLE 4GROUPEVAL tabulates the frequency of observations of >25% affiliation with each genetic subgroup. The same basic trend holds.

TABLE 4GROUPEVAL
Fraction of samples in each self-reported ethnicity group (rows) with >25% affiliation with each of the 4 European genetic groups (columns). Values above 0.50 are marked with an asterisk (*).

Group  NOR    MED    MIDEAS  SA
I      1*     0.28   0.17    0.06
II     0.98*  0.32   0.17    0
III    0.88*  0.75*  0.13    0
IV     0.8*   1*     0       0
V      0.95*  0.39   0.14    0.07
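TABLE 4GROUPEVAL counts, for each self-reported group, the fraction of individuals whose estimated affiliation with a genetic group exceeds 25%. The per-individual results are not published here, so the example below uses hypothetical admixture vectors, and `fraction_above` is an illustrative name. Because one person can exceed 25% in up to three groups, the fractions in a row need not sum to 1, which is why the rows of TABLE 4GROUPEVAL add up to more than 1.

```python
from typing import List, Sequence


def fraction_above(admixture_rows: Sequence[Sequence[float]], threshold: float = 0.25) -> List[float]:
    """Fraction of individuals whose affiliation with each group exceeds `threshold`.

    admixture_rows: one (NOR, MED, MIDEAS, SA) proportion vector per individual.
    """
    if not admixture_rows:
        raise ValueError("no individuals supplied")
    n = len(admixture_rows)
    k = len(admixture_rows[0])
    return [sum(row[i] > threshold for row in admixture_rows) / n for i in range(k)]


# Hypothetical individuals (NOR, MED, MIDEAS, SA), not the study's data.
hypothetical_group = [
    (0.60, 0.30, 0.05, 0.05),
    (0.45, 0.20, 0.25, 0.10),  # MIDEAS exactly 0.25 does not count: the test is strictly ">"
    (0.30, 0.50, 0.10, 0.10),
    (0.80, 0.10, 0.05, 0.05),
]
print(fraction_above(hypothetical_group))
```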

 


All of the individuals in this particular sample reporting at least 50% Irish/British/Scandinavian ancestry show >25% NOR ancestry with EurasianDNA 1.0, and all of the individuals reporting at least 50% Greek or Turkish ancestry show >25% MED ancestry. Most individuals claiming at least 50% Spanish or Italian ancestry show both >25% NOR and >25% MED ancestry. Notice how the MED fraction rises as we proceed from Northern Europe (Group I) to Spain and Italy (Group III), peaking in Greece and Turkey (Group IV). This illustrates that most Europeans are of admixed NOR and MED ancestry, whether they call themselves Germans, French, Italians or something else. US “Caucasians” seem to be a mélange of NOR and MED ancestry, but in unequal proportions, as one might expect given the unequal northern versus Mediterranean founder contribution to the Americas.

To summarize the results of the “blind” trial:

The more southeasterly the self-reported geopolitical origin, the more similar the average profile is to our original parental samples from the southeast regions of Europe. This constitutes an independent validation of the anthropological relevance of EurasianDNA 1.0.

This would seem to support, or at least not refute, the anthropological relevance of the 4-group genetic model of Europe identified and used by EurasianDNA 1.0. As we saw with the anthropologically defined samples in Table 4GROUP, MED affiliation levels decrease as the distance from the self-reported origin to the Mediterranean Sea increases: French/Germans show more than British, and Spanish/Italians show more than French/Germans. However, simply knowing the recent (on an anthropological time scale) geographical origin of a person is not necessarily an accurate predictor of genetic group affiliation because, as we discuss below, the political history of Europe, the Middle East and South Asia is complex, and admixture between ethnic subpopulations has not been uncommon in recent history.

Correlations with anthropometric traits.

If the NOR, MED, MIDEAS and SA genetic groupings have a legitimate anthropological basis, and if there is an association between anthropology and physical appearance (which we know there is), then there should be a correlation between admixture for certain genetic groupings and the expression of physical traits. When speaking of European populations, two obvious physical traits come to mind: skin melanin content and iris color. Elsewhere on this website we have discussed the correlation between skin melanin index and sub-Saharan African and European admixture, with higher levels of European admixture correlated with lower melanin index values in admixed African Americans and Puerto Ricans. Skin melanin index is likewise differentially distributed within the European group (mainly along a northwest-to-southeast gradient). We have also discussed the association, in individuals of predominantly European ancestry, between darker iris colors and higher levels of sub-Saharan African (and lower levels of European) admixture. Here we ask whether the same types of associations can be seen within the European diaspora sorted by genetically defined subpopulation admixture proportions rather than by continental ancestry admixture. Since blue iris colors are far more frequent in Northern European populations than in Middle Eastern and South Asian populations, the degree of Northern European admixture should be associated with lighter iris colors, and Middle Eastern and South Asian admixture should be associated with darker iris colors. In other words, if the test works, individuals whom EurasianDNA 1.0 types as high in NOR ancestry should tend to have lighter colored eyes than individuals typed as low in NOR ancestry, and individuals typed as high in MIDEAS or SA ancestry should show darker eyes on average than those with low levels.

We digitally scored iris colors for some of the subjects shown in FIGURE 4GROUPBAR and Tables 4GROUP and 4GROUPTEST, and as shown in Table 4ETHNIRIS, significant genetic group admixture levels are associated with the expected iris color shades. NOR admixture above 70%, 75% or 80% (rows 1-3) is strongly associated with iris color scores greater than 2.2, which correspond to “Light” colors: scores of 2.2 or greater correspond almost perfectly to perceived colors lighter than the mean for individuals of majority European ancestry, and include the light hazels, greens, grays and blues. In contrast, SA admixture levels greater than 25% were associated with color scores below 2.2 (“Dark”), corresponding to colors darker than the mean (dark hazels, browns, blacks); this association reached p = 0.035 at the >30% threshold (p = 0.060 at >25%). MIDEAS admixture was not associated with iris color score, possibly because lighter iris colors are not infrequently found in the Middle East. (Indeed, the genes imparting lighter iris colors may very well have originated in the Fertile Crescent prior to amplification by genetic drift in Europe starting 45,000 years ago.)

Table 4ETHNIRIS

Row  Genetic group           Admixture  Iris color                  Exact p  Total European sample (N)  Samples meeting threshold (N)
1    NOR (N. Euro + Irish)   >70%       Light (color score > 2.2)   0.003    184                        27
2    NOR (N. Euro + Irish)   >75%       Light (color score > 2.2)   0.004    184                        21
3    NOR (N. Euro + Irish)   >80%       Light (color score > 2.2)   0.033    184                        12
4    SA (South Asian)        >25%       Dark (color score < 2.2)    0.060    184                        15
5    SA (South Asian)        >30%       Dark (color score < 2.2)    0.035    184                        8
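Table 4ETHNIRIS reports an “Exact p” for each threshold but not the full counts behind it. A common exact test for this kind of 2x2 comparison (above/below an admixture threshold versus light/dark iris score) is Fisher's exact test; the page does not say which exact test, or which sidedness, was used, so the sketch below is an assumption. It builds the 2x2 table from hypothetical (admixture, color score) pairs, treating a score above 2.2 as light, and computes the p-value from the hypergeometric distribution using only the standard library. The helper names `contingency` and `fisher_exact` are illustrative; for real analyses, SciPy's `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb
from typing import Iterable, Tuple


def contingency(samples: Iterable[Tuple[float, float]],
                admix_threshold: float, light_cutoff: float = 2.2) -> Tuple[int, int, int, int]:
    """Count (above & light, above & dark, below & light, below & dark).

    samples: (admixture proportion, iris color score) pairs. A score above
    `light_cutoff` counts as light, anything else as dark (an assumption about
    how scores of exactly 2.2 are handled).
    """
    a = b = c = d = 0
    for admix, score in samples:
        above, light = admix > admix_threshold, score > light_cutoff
        if above and light:
            a += 1
        elif above:
            b += 1
        elif light:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact(a: int, b: int, c: int, d: int, alternative: str = "two-sided") -> float:
    """Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    if alternative == "greater":  # excess of light irises above the threshold
        return sum(prob(x) for x in range(a, hi + 1))
    if alternative == "less":
        return sum(prob(x) for x in range(lo, a + 1))
    p_obs = prob(a)
    # Two-sided: total probability of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)))


# Hypothetical data, not the study's: (NOR proportion, iris color score).
samples = [(0.82, 2.9), (0.75, 2.5), (0.71, 1.8), (0.40, 1.6), (0.55, 2.4), (0.30, 1.9)]
table = contingency(samples, admix_threshold=0.70)
print(table, fisher_exact(*table, alternative="greater"))
```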

Iris color can be predicted accurately for individuals only with direct knowledge of pigmentation gene genotypes; knowledge of genetic ancestry enables only an indirect inference. However, this exercise shows that there is a correlation between Northern European (NOR) ancestry, as EurasianDNA 1.0 determines it, and lighter iris colors, and between South Asian (SA) ancestry, as EurasianDNA 1.0 determines it, and darker iris colors. Note that these are correlations between lower levels of admixture and iris color in individuals who describe themselves as Caucasians, not obvious correlations between very high levels of admixture and iris color in a more eclectic sample of polarized ancestry (which would be trivial to show). The results presented in Table 4ETHNIRIS suggest that iris color shades are distributed throughout Europe as a function of Northern European ancestry admixture, and that EurasianDNA 1.0 accurately and precisely measures this ancestry admixture.

For example, if the test did not work, there is no reason why we would see a statistically meaningful association between higher levels of NOR admixture and lighter iris color shades in a “Caucasian” sample completely unrelated to the reference samples used in developing the test!

Variation within groups.

Variation within groups seems to be the rule rather than the exception, as you can appreciate by studying FIGURE 4GROUPBAR. Even in our original parental groups we observed considerable within-group variation. This, of course, is the whole point of assessing ancestry molecularly from the DNA rather than from self-held notions, modern-day geographical origin or even historical records, which are based on political boundaries.

CAUTION: Your results may or may not fit your geopolitical expectations
Because geopolitical and anthropological/genetic heritage do not correspond perfectly, you may obtain results that you do not expect. Indeed, with EurasianDNA 1.0 you are looking much further back in time (many tens of thousands of years) than you are accustomed to from geopolitical records.

Although the EurasianDNA 1.0 test provides good genetic resolution between the 4 dominant European subgroups (resolution that we have shown is anthropologically and historically relevant), you should understand before you buy the test that your results may not fit your expectations. Most of us are used to thinking about ancestry in terms of recent geopolitical boundaries and events rather than in anthropological terms. Let us take an example to illustrate the potential discordance between geopolitical and anthropological identity. A blond-haired, blue-eyed person whose great-grandparents were all born in Northern Italy or Greece may very likely be descended from individuals who lived in more northerly climes 10,000 years ago (possibly explaining the blond hair). Nevertheless, this person would most likely self-identify as Italian, Greek or Mediterranean even though he or she may be genetically closer to the Nordic peoples of Scandinavia. Because the anthropological origins we read from the DNA are far older than an archive, which may exist only for the past 1,000 years or so, such a person could not use political records in a library or archive to trace his or her heritage as far back as an anthropological test such as EurasianDNA 1.0 can. EurasianDNA 1.0 does not care where a person lives, where their great-grandparents lived, what language is spoken or with whom affiliation is felt. It looks back to the mixture of your ancestors who lived in prehistoric Europe, the Middle East and possibly South Asia many thousands of years ago. For the Northern Italian with blond hair and blue eyes, the physical characteristics clearly suggest Nordic heritage, so a report of predominant NOR ancestry with EurasianDNA 1.0 may not be all that surprising; for many people, however, physical appearance offers no clues, and genetic ancestry may not be as easy to understand by eye.
As an objective, independent reporter of anthropological heritage, this is in fact the power of EurasianDNA 1.0.

You may not want to purchase this test if:

  1. You are uncomfortable with the fact that a test such as this has never before been introduced, and so there is nothing with which to compare it.
  2. You cannot accept that there is a difference between anthropological and geopolitical ancestry. In other words, if you feel that every person whose grandparents were born in Italy should type with substantial MED ancestry regardless of where their ancestors were derived from 5,000 to 10,000 years ago or longer, or else the test isn’t working properly, then this test is not for you.
  3. It bothers you that we do not exactly understand the precise genetic origins of NOR, MED, MIDEAS or SA identity.
  4. It bothers you that there is no genetic group measured by our test that precisely matches any geopolitical boundaries.
  5. Any change in the way you view the heritage of the deeper roots of your family tree would bother you.
  6. You scored less than 50% “European” or “Indo-European” with AncestryByDNA 2.5.
  7. You have not yet taken AncestryByDNA 2.5.

About 6) and 7): what if you don’t meet the criteria? You may still take the test if you haven’t taken ABD2.5, or have taken it but did not receive a score of at least 50% European, but interpreting the results would be extraordinarily difficult, and they may not mean what you think they mean. Why is this? A person with 100% sub-Saharan African ancestry could take the test, but forcing his or her ancestry to fit a 4-group European model would yield an answer with only an abstract significance, based on genetic distances rather than actual ethnic affiliations. You would likely misinterpret the results, and so we do not recommend the test if these qualifications are not met.

More on variation within groups and unexpected results
If you look at the detail of FIGURE 4GROUPBAR you can appreciate the frequency with which customers may obtain results that do not comport with their geopolitical expectations. Note that there is substantial variation in percentage composition within each of the ethnic populations.

South Asian Indians type predominantly with SA genetic affiliation, with little apparent admixture, but Middle Eastern samples show a bit more admixture, and each person is unique. Even more admixture is observed for Mediterraneans and Northern Europeans, and it is almost certain that not all of these latter individuals would have expected to see such affiliation. Some Greeks and Italians show more NOR ancestry than MED, and more NOR ancestry than some individuals from Ireland or Scandinavia. Again, these individuals might not have expected to see such results. THIS DOES NOT MEAN they are not Greek or Italian, only that they are likely to be of different anthropological heritage than most Greeks and Italians. In other words, on an anthropological time scale their ancestors were relatively new Greeks or Italians: they were more likely to have been relatively recent immigrants to the Mediterranean than a person who typed with high MED ancestry, and unless the family moved into the Mediterranean within the past few generations, genealogical research would probably not have identified the non-MED ancestry. Similarly, but in reverse, some (about 8%) of the Irish/British/Scandinavian samples were characterized by extensive MED admixture, although they would likely not expect Central European or Mediterranean ancestry (note the infrequent incidence of large amounts of red color for certain North European and Irish individuals in FIGURE 4GROUPBAR). Again, EurasianDNA 1.0 reports anthropological heritage rather than geopolitical or socio-cultural identity, and depending on the date of the apparent admixture, these people may not have expected these results.

Variation such as this translates into a certain amount of discordance between ethnic expectations and genetic affiliation for some individuals in some groups, a discordance that varies from individual to individual and from ethnic group to ethnic group. When interpreting results it is important to understand the difference between biology/anthropology and socio-politics/geography. Variance in ancestry affiliation within the ethnic groups is likely a by-product of several factors working together, some more significant than others. The factors are listed below; the first is the most significant:

  1. Admixture between ethnic groups within the European continental population group, which is likely much more extensive than admixture between continents. Because of this we would expect lower Fst values for randomly selected SNPs, relatively low genetic distances between groups, a harder time finding SNPs with good Fst values when screening the genome for good AIMs, and greater error when estimating ethnic admixture than when estimating continental admixture.
  2. The allele frequency differential (d) and Fst values for our marker set are reasonably high (average Fst = 0.10), at least adequate for standard errors in the range of 10%-20% or so. The cumulative d values for our 330 markers, for each population pair in our model, are shown below in TABLE DELTA. These values are exceptionally high for a test as affordable as this (meaning we have excellent power to infer ancestry for such an economical test).

    TABLE DELTA

    Cumulative delta values, 326 AIMs
                         Northern European  Greek/Turkish  Middle Eastern  South Asian
    Northern European    0                  36.7           46.8            44.7
    Greek/Turkish                           0              47.9            43.5
    Middle Eastern                                         0               49.7
    South Asian                                                            0
  3. Sampling error in allele frequency estimation. Our average sample size for parental (reference) samples is on the order of 40 samples. A larger number would ensure greater accuracy, but the results discussed above suggest that, even so, the accuracy of ABDEURO 2.5 is good. Nevertheless, we intend to gradually increase our sample sizes for future version releases (e.g., ABDEURO 3.0), and if supreme precision is crucial for your application you may wish to wait and be tested with a future version of the test. It bears noting, however, that error caused by inadequate parental sample size would most likely manifest itself as haphazard error, which is not what we see (for example, the unexpected admixture commonly seen for Irish samples is not Middle Eastern (MIDEAS) or South Asian (SA) but Mediterranean (MED), which makes geographical and historical sense).
  4. Departures from the simple model of admixture assumed for this analysis, namely: no genetic drift between modern-day descendants of parental populations and the parental populations themselves, no linkage between markers in admixed individuals, the assumption that offspring are derived from admixed parents, and no selection over the past 50,000 years at any of our AIM loci. Departures from these assumptions would tend to create imprecision, but could also introduce systematic bias and influence the results on a population scale, as they did in the classical gene studies of African/European admixture in the 1960s and 1970s (see Chakraborty, 1986 for a review; these studies used polymorphisms in genes that are subject to natural selection, which is unwise). It is unlikely that this source of error is substantial, however, because if it were, we would likely see greater within-group variation within all groups due to “artificial” admixture than we have observed, and the trending in FIGURE 4GROUPPIEMAP would probably make much less geographic sense. Another way to say this is that the admixture observed due to this type of error/bias would tend to produce “noise” that is not necessarily geographically sensible, whereas most of our results are. For example, the red color in FIGURE 4GROUPPIEMAP would not necessarily be restricted to Central/Mediterranean Europe and the Middle East; Northern Europeans and South Asians would show it at these high levels too. This is not to say that incorrect assumptions about the admixture process do not cause error in our results: they do, and we do not know how much, only that it is unlikely to be substantial. Other authors have shown relatively modest improvements in statistical accuracy when incorporating mathematically complex models that account for uncertainty in the population model, although only in certain circumstances, namely on a population level (Wang 2004). We are working with Dr. Paul McKeigue of University College Dublin to implement such models in future versions of EuropeanDNA 1.0, and we plan to make improvements in algorithm design available to customers as (and if) they are developed.
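Items 2 and 3 above rest on three simple quantities: the per-marker allele frequency differential summed over markers (the cumulative delta of TABLE DELTA), Fst, and the sampling error of an allele frequency estimated from about 40 parental individuals. The page does not define its exact estimators, so this is a sketch under common textbook definitions: delta as the absolute frequency difference |pA - pB| for a biallelic marker, Wright's Fst computed from expected heterozygosities for two equally weighted populations, and the binomial standard error sqrt(p(1 - p)/2n) for n diploid individuals. The reported average Fst of 0.10 may have been computed with a different estimator, and all function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def delta(p_a: float, p_b: float) -> float:
    """Allele frequency differential for one biallelic marker."""
    return abs(p_a - p_b)


def cumulative_delta(freqs_a: Sequence[float], freqs_b: Sequence[float]) -> float:
    """Sum of per-marker deltas over a marker panel (cf. TABLE DELTA)."""
    return sum(delta(pa, pb) for pa, pb in zip(freqs_a, freqs_b))


def fst_two_pops(p_a: float, p_b: float) -> float:
    """Wright's Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # total expected heterozygosity
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def allele_freq_se(p: float, n_individuals: int) -> float:
    """Binomial standard error of an allele frequency estimated from n diploid individuals."""
    return sqrt(p * (1 - p) / (2 * n_individuals))


print(fst_two_pops(0.9, 0.1))   # a highly informative marker, about 0.64
print(allele_freq_se(0.5, 40))  # about 0.056 with 40 parental samples
```

Dividing the cumulative values in TABLE DELTA by the 326 markers gives the average per-marker delta: 36.7 / 326 ≈ 0.11 for Northern European versus Greek/Turkish, and 49.7 / 326 ≈ 0.15 for Middle Eastern versus South Asian. A standard error of roughly 0.056 per allele frequency (at p = 0.5) with 40 reference samples illustrates why larger parental samples are expected to improve accuracy.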

The most likely explanation for the bulk of the variability of genetic affiliations within geopolitical groups in Europe is simply that Europe is an admixed continent (on an ethnic scale), with a relatively complex interactive history. Inspection of the individual results shows that most samples are characterized by extensive admixture of a type that, given the complex political and anthropological history of Europe, is perhaps not unexpected. In other words, the geographic pattern in the admixture results suggests sensibility and anthropological meaning. The geographical sensibility of the admixture observed in these data is similar to that observed by Rosenberg et al., 2001, who used STRs with the program STRUCTURE to partition European subpopulations. In both this work and Rosenberg’s study, ethnic populations geographically intermediate between Mediterranean and Northern European populations (such as Iberians in our work, and French in Rosenberg’s) showed intermediate levels of Mediterranean and Northern European genetic admixture (Rosenberg et al., 2001). This would hardly seem to be a coincidence.

However, even if it is true that most of the variation within geopolitically defined ethnic groups reflects the complex interactive history among European populations over the recent past, that does not ease the difficulty of explaining to a Greek person why they type as significantly Northern European, even if they do have blond hair and blue irises.

Indeed, the variation in results among the Greeks is particularly interesting: extensive NOR admixture was observed for many of the samples from Greece, Italy and Turkey, and although most of these individuals showed predominant MED affiliation, several showed less than 25% MED affiliation. Rosenberg et al. 2001 showed a similar phenomenon with Mediterranean samples: only a fraction of Italians in their analysis showed significant Mediterranean-specific genetic affiliation (see the red and tan colors in the European part of Fig 2, k=4, Rosenberg et al., 2001). It is doubtful that any of these Greek or Italian individuals would have expected such a result.

We know that there has been extensive gene flow within Europe, and that there is a marked uncoupling between geopolitical and genetic heritage. We know that there is great anthropometric trait variation within the ethnic groups of Europe: the blond-haired, blue-eyed individuals of Northern Italy and Greece, for example, are characteristically different from the typical brown-eyed, brown-haired Italian or Greek found in the more southern parts of both countries. Such a phenomenon is not unique to Greece and Italy; some Swedes and Norwegians have atypically dark complexions too. Where did their ancestors come from? Are the “Swedes” and “Norwegians” with darker complexions and hair/iris colors the ones showing greater MED, MIDEAS and SA ancestry (the last of which could have come from Roma genetic contributions)? Are Europeans in general better described in genetic than in geopolitical terms, due to a natural and expected uncoupling between anthropometric trait value and ancestry? These questions cannot be answered with the analysis available to date, but they will undoubtedly serve as fodder for future work. Again, this is where the power of EurasianDNA 1.0 promises to lie.

A Historical Perspective
Perhaps it is not unexpected that Mediterranean regions of Europe are more genetically diverse than Nordic regions. Both the Roman and Byzantine empires were effective melting pots for peoples throughout the Middle East and Europe, whose citizens may have been united more through political and socio-cultural than genetic ties. The Mediterranean was the center of the civilized world for a long time, and it is plausible that, because of this, gene flow from Northern to Mediterranean Europe was greater than in the reverse direction. Between 1900 and 1500 BC, the Mycenaeans or Achaeans moved southwest from southwestern Russia and invaded and settled modern-day Greece in wave after wave. These civilizations spread to Southern Italy, Libya, Cyrenaica and the Near East, and they included the fighting groups that attacked the Trojans in Asia Minor around 1200 BC, exploits retold centuries later (circa 750 BC) in the Iliad and the Odyssey. Later, the classical city-states of Greece emerged, reaching their cosmopolitan peak around the 5th century BC. Additionally, migration into Europe from the Middle East and from most of prehistoric Asia is known to have passed through southeastern Europe, hence the northwest-to-southeast clines in classical blood group markers, the archaeological record and the Y chromosome (all reviewed elegantly by Jobling et al., 2004). This might suggest that gene flow into the Mediterranean has had a relatively strong influence in shaping genetic structure in this part of Europe relative to others. Greece became a Roman province in the 2nd century BC and remained under Roman, and later Byzantine, rule until Constantinople fell to the Crusaders in 1204. In 1453, the Turks took Constantinople and made Greece a Turkish province, perhaps providing greater opportunity for East-to-West migration.
In all, the complexity of Mediterranean history over the past few thousand years may explain why we detect more variation in admixture among people from this part of Europe: some individuals with substantial Middle Eastern genetic affiliation, many others with Northern European affiliation. Modern European history is of course expected to have accentuated the admixture in this part of Europe, but admixture is certainly not unique to Mediterranean Europe. Larger samples and other markers are needed to fully address the myriad possibilities for how, or whether, the extant Northern European and Middle Eastern admixture in Greece is related to recent historical events. Whatever the mechanism creating this apparent admixture, at least with this panel, as well as with Rosenberg’s STR panel, it would seem that a fair number of individuals who identify with Greek or Italian ethnicity will show more Northern European and less Mediterranean admixture than they expect.

Test Error
As with AncestryByDNA 2.5, statistical error is caused by the imperfect information about ancestry provided by the DNA. In scientific terms, the allele frequencies of the markers we use vary continuously among the ancestral groups rather than being private to any one group, so admixture estimation is probabilistic. This is largely why the MLEs (Maximum Likelihood Estimates of ancestry) are correctly communicated only together with their confidence intervals. We can quantify the error caused by this imperfection for the average customer using mathematical simulations. To do this, we use our knowledge of the allele frequencies in each group and the relationships between the alleles to create a large number of “multilocus genotypes”, or simulated individuals, and determine the ancestry admixture for each. If we create 1,000 simulated individuals of 100% Northern European ancestry and the average sample shows 100% Northern European ancestry, we would conclude that continuous allele frequencies cause no bias or error in the test. If the average sample showed 90% Northern European ancestry, continuous allele frequencies would cause a 10% error, and a Northern European would do best to consider their Northern European result accurate to within +/- 10%.
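The simulation procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DNAPrint's implementation: the allele frequencies are randomly generated hypothetical values (the helper `make_hypothetical_freqs` is invented for this example), markers are assumed to be unlinked, and ancestry is estimated by maximum likelihood with an EM algorithm that holds the reference allele frequencies fixed. The bias reported is the share of ancestry that the average simulated 100% sample assigns to groups other than its own, which is how the 90% example above translates into a 10% error.

```python
import random


def make_hypothetical_freqs(n_markers, n_pops, seed=0):
    """Hypothetical allele-A frequencies per marker and population (illustration only)."""
    rng = random.Random(seed)
    return [[rng.uniform(0.05, 0.95) for _ in range(n_pops)] for _ in range(n_markers)]


def simulate_genotype(q, freqs, rng):
    """Simulate one individual with ancestry proportions q at unlinked markers.

    Each of the two allele copies at a marker comes from a population chosen with
    probability q[k] and carries allele A with that population's frequency.
    Returns the number of A copies (0, 1 or 2) at each marker.
    """
    pops = range(len(q))
    genotype = []
    for marker_freqs in freqs:
        count = 0
        for _ in range(2):
            k = rng.choices(pops, weights=q)[0]  # ancestral population of this copy
            if rng.random() < marker_freqs[k]:
                count += 1
        genotype.append(count)
    return genotype


def estimate_admixture(genotype, freqs, n_iter=100):
    """Maximum likelihood ancestry proportions via EM, with allele frequencies held fixed."""
    n_pops = len(freqs[0])
    q = [1.0 / n_pops] * n_pops  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * n_pops
        for g, f in zip(genotype, freqs):
            p = sum(qk * fk for qk, fk in zip(q, f))  # expected frequency of allele A
            for k in range(n_pops):
                # posterior share of population k among the g A-copies and (2 - g) other copies
                totals[k] += g * q[k] * f[k] / p + (2 - g) * q[k] * (1 - f[k]) / (1 - p)
        q = [t / (2 * len(genotype)) for t in totals]  # shares sum to 1 by construction
    return q


def average_bias(true_pop, freqs, n_sim=50, seed=1):
    """Percent ancestry that the average simulated 100% `true_pop` sample assigns elsewhere."""
    rng = random.Random(seed)
    n_pops = len(freqs[0])
    q_true = [1.0 if k == true_pop else 0.0 for k in range(n_pops)]
    correct = [estimate_admixture(simulate_genotype(q_true, freqs, rng), freqs)[true_pop]
               for _ in range(n_sim)]
    return 100.0 * (1.0 - sum(correct) / n_sim)


if __name__ == "__main__":
    freqs = make_hypothetical_freqs(n_markers=333, n_pops=4)
    for pop in range(4):
        print(f"population {pop}: average bias {average_bias(pop, freqs, n_sim=5):.2f}%")
```

Raising `n_sim` gives a more stable average; with real reference frequencies in place of the hypothetical ones, the same loop would yield per-group averages comparable to those in TABLE SIM3.0.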

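The results from EurasianDNA 1.0 simulations are shown below in TABLE SIM3.0. Each row is a set of samples simulated as 100% of one group; the columns give the average estimated ancestry (%).

TABLE SIM3.0

Simulated as   NOR     MED     MIDEAS   SA      Bias
NOR            91.35    3.80    2.14     2.71    8.65
MED             2.72   93.72    1.54     2.02    6.28
MIDEAS          0.65    0.41   98.92     0.02    1.08
SA              1.60    2.62    1.64    94.14    5.86
Average bias                                     5.45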

Across all 4 groups, the average simulated sample showed a cumulative error of 5.45% in its score. For example, a person with true values of 55% NOR, 45% MED, 0% MIDEAS and 0% SA who types with the same error as the average simulated parental sample could receive a EurasianDNA 1.0 score of 52% NOR, 47% MED, 1% MIDEAS and 0% SA, or 53% NOR, 46% MED, 0% MIDEAS and 1% SA, or any other combination where the differences between the score and the true values, added up across all 4 groups, come to about 5-6%. Of course, any one rare individual could show 20% error, or 0% error, but the average individual shows 5.45% error.

The simulations just presented are from individuals simulated to be homogeneously affiliated with one group. What do we see when we simulate admixed offspring? The results are shown below in TABLE MIXEURO

TABLE MIXEURO

Simulated mix (n)        NOR     MED     MIDEAS   SA      Bias
50NOR-50MED (n=40)       50.47   44.77    2.67     2.10    4.77
75NOR-25MED (n=10)       79.80   13.60    1.40     5.20    6.60
25NOR-75MED (n=10)       20.10   75.60    2.00     2.30    4.30
50NOR-50MIDEAS (n=39)    47.00    5.40   45.58     2.03    7.43
75NOR-25MIDEAS (n=10)    70.50    2.70   22.40     4.40    7.10
25NOR-75MIDEAS (n=10)    16.90    2.30   78.20     2.60    4.90
50NOR-50SA (n=40)        40.15   10.10    3.05    46.70   13.15
75NOR-25SA (n=10)        70.80    3.90    2.00    23.30    5.90
25NOR-75SA (n=10)        25.70    0.50    3.40    70.40    3.90
50MED-50MIDEAS (n=40)     5.25   44.40   47.30     3.05    8.30
75MED-25MIDEAS (n=10)     1.60   70.70   21.41     6.30    7.90
25MED-75MIDEAS (n=10)    11.50   11.20   76.10     1.20   12.70
50MED-50SA (n=40)         2.53   45.00    1.35    51.13    3.88
75MED-25SA (n=10)         6.60   67.70    3.10    22.60    9.70
25MED-75SA (n=10)         4.00   23.20    2.60    70.20    6.60
50MIDEAS-50SA (n=38)      3.42    2.24   48.42    45.92    5.66
75MIDEAS-25SA (n=10)      2.40    7.40   71.90    18.30    9.80
25MIDEAS-75SA (n=10)      0.00    0.60   20.70    78.70    0.60
Average bias                                               6.84

 

Error is given in the Bias column. The table is read this way: 50NOR-50MED (n=40) means 40 simulated individuals of 50% Northern European (NOR) and 50% Mediterranean (MED) ancestry. The average 50/50 NOR/MED mix showed 50.47% NOR ancestry and 44.77% MED ancestry. The bias is the ancestry assigned to groups that were not part of the simulated mix, here 2.67% MIDEAS plus 2.10% SA, which equals 4.77%.
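The Bias column can be recomputed from the four estimates in each row. The Python sketch below assumes, as the 50NOR-50MED example indicates, that bias is the ancestry assigned to groups absent from the simulated mix. It reads the last two rows as 75MIDEAS-25SA and 25MIDEAS-75SA mixes, since their values only make sense that way. Computed like this, the 50NOR-50MIDEAS row gives 7.43 (5.40 + 2.03), and with that value the 18 rows average to the stated 6.84.

```python
import re

# Column order of TABLE MIXEURO: average estimated ancestry (%).
GROUPS = ("NOR", "MED", "MIDEAS", "SA")

# Average estimates for each simulated mix, copied from TABLE MIXEURO.
MIXEURO = {
    "50NOR-50MED": (50.47, 44.77, 2.67, 2.10),
    "75NOR-25MED": (79.80, 13.60, 1.40, 5.20),
    "25NOR-75MED": (20.10, 75.60, 2.00, 2.30),
    "50NOR-50MIDEAS": (47.00, 5.40, 45.58, 2.03),
    "75NOR-25MIDEAS": (70.50, 2.70, 22.40, 4.40),
    "25NOR-75MIDEAS": (16.90, 2.30, 78.20, 2.60),
    "50NOR-50SA": (40.15, 10.10, 3.05, 46.70),
    "75NOR-25SA": (70.80, 3.90, 2.00, 23.30),
    "25NOR-75SA": (25.70, 0.50, 3.40, 70.40),
    "50MED-50MIDEAS": (5.25, 44.40, 47.30, 3.05),
    "75MED-25MIDEAS": (1.60, 70.70, 21.41, 6.30),
    "25MED-75MIDEAS": (11.50, 11.20, 76.10, 1.20),
    "50MED-50SA": (2.53, 45.00, 1.35, 51.13),
    "75MED-25SA": (6.60, 67.70, 3.10, 22.60),
    "25MED-75SA": (4.00, 23.20, 2.60, 70.20),
    "50MIDEAS-50SA": (3.42, 2.24, 48.42, 45.92),
    "75MIDEAS-25SA": (2.40, 7.40, 71.90, 18.30),
    "25MIDEAS-75SA": (0.00, 0.60, 20.70, 78.70),
}


def simulated_groups(label):
    """Parse a row label such as '75NOR-25MED' into {'NOR': 75, 'MED': 25}."""
    mix = {}
    for part in label.split("-"):
        match = re.fullmatch(r"(\d+)([A-Z]+)", part)
        mix[match.group(2)] = int(match.group(1))
    return mix


def bias(label, estimate):
    """Ancestry (%) assigned to groups that were not part of the simulated mix."""
    present = simulated_groups(label)
    return sum(value for group, value in zip(GROUPS, estimate) if group not in present)


if __name__ == "__main__":
    biases = {label: bias(label, est) for label, est in MIXEURO.items()}
    for label, b in biases.items():
        print(f"{label:>15}: {b:5.2f}")
    print(f"Average bias: {sum(biases.values()) / len(biases):.2f}")
```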

The average simulated admixed sample showed 6.84% error caused by continuous allele frequencies.

Combining the various simulation results discussed, we conclude that:

The error caused by continuous allele frequencies is about 6-7%, depending on the type of admixture you show and the type of majority ancestry you show.

To estimate the expected level of this type of error in your own results, find the category above that most closely matches them.

Other Error
As discussed already, there are mechanisms besides continuous allele frequencies that can cause error, and they stem from our inability to go back in time and measure precisely how and when admixture occurred in various parts of Europe. In scientific terms, there may be imperfections in the admixture model we use to estimate admixture, or substantial and directional genetic drift may have taken place between modern-day populations and the populations that admixed thousands of years ago. Scientists debate these issues constantly, and there is no one answer guaranteed to be correct. It would seem that the only way to estimate these errors is to compare expected and observed results for people with carefully documented genealogy, but even this is not possible. We cannot use genealogical information from the past few generations to evaluate results from an anthropological test that looks back (potentially) thousands of years. Since most genealogists do not have reliable information going back that far, we simply do not have access to the reference data we would need to compare performance against expectations and measure this error, or to modify the test to eliminate it. When we run this test, we are doing what meteorologists do when they calculate a hurricane track projection cone. The meteorologist cannot know for certain exactly where the storm will go, but he/she understands how storms respond to major weather features (like fronts) well enough to make probability statements predicting the storm track. Historically, these predictions have usually been quite impressive, falling fairly close to where the storms actually go. The same is true of EurasianDNA 1.0: the results suggest that the estimates are fairly close to the true values, but almost never exactly correct.
Another way to look at it: when we run our test for you, we are a lot like pilots flying a jumbo jet through low-hanging clouds at dusk. We can see the runway and most of the lights, but not all of them, and certainly not the details of the terrain around them. We are good enough pilots, and understand our surroundings well enough, to land the plane safely, but not to draw a detailed map of the area.

EurasianDNA 1.0 Pedigrees as an aid to interpreting results


Pedigrees are useful for understanding the strengths and limitations of the test, and instructive on how best to interpret results. One pedigree is shown below in Table PED1:

Table PED1

           NOR   MED   MIDEA   SA
Mother      44    26      30    0
Father      74    18       8    0
Child 1     70    15      15    0
Child 2     58    19      23    0

In the pedigree above, there is clearly significant MED and MIDEA ancestry in the children, as the relatively high levels observed indicate. Looking at the results for the entire pedigree, however, it appears that while the MED ancestry in the children was contributed by both parents, the MIDEA ancestry came mainly from the mother. Both mother and father here are real people who describe their heritage as continental European, but the MIDEA result may provide a basis for a new line of genealogical investigation, one focused farther back in time than most genealogists reach with geopolitical records and surnames. This is particularly true for the mother’s side of the family tree (i.e., where might this MIDEA ancestry have come from?).
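The reasoning above can be made explicit with a small calculation. The Python sketch below assumes that a child's expected score for each group is the average of the parents' scores (each parent transmits half of the genome, so this holds on average, although any one child deviates through random assortment and test error). The `maternal_share` helper, a name invented for this example, expresses how much of that expectation comes from the mother: about 0.59 for MED but about 0.79 for MIDEA, in line with the conclusion that the MIDEA ancestry came mainly from the mother.

```python
# Table PED1 scores (%), in the order NOR, MED, MIDEA, SA.
GROUPS = ("NOR", "MED", "MIDEA", "SA")
MOTHER = (44, 26, 30, 0)
FATHER = (74, 18, 8, 0)
CHILDREN = {"Child 1": (70, 15, 15, 0), "Child 2": (58, 19, 23, 0)}


def midparent(mother, father):
    """Expected child score per group, assuming each parent transmits half of the genome."""
    return tuple((m + f) / 2 for m, f in zip(mother, father))


def maternal_share(mother, father):
    """Fraction of the expected child score per group contributed by the mother (None if both are 0)."""
    return tuple(m / (m + f) if m + f else None for m, f in zip(mother, father))


if __name__ == "__main__":
    expected = midparent(MOTHER, FATHER)
    print("expected child:", dict(zip(GROUPS, expected)))
    print("maternal share:", dict(zip(GROUPS, maternal_share(MOTHER, FATHER))))
    for name, observed in CHILDREN.items():
        deviation = {g: o - e for g, o, e in zip(GROUPS, observed, expected)}
        print(name, "observed - expected:", deviation)
```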

This pedigree was obtained from a real family, and we have obtained similarly satisfying results with 6 other family pedigrees. We have simulated over 120 pedigrees and obtained satisfying results for all but two of them. The pedigree shown in PED3 (real people, not simulated) is the most discordant pedigree we have yet observed, and it provides an opportunity to discuss how EurasianDNA 1.0 results should and should not be interpreted. In particular, it illustrates why it is better to draw conclusions about your anthropological heritage from the results of small pedigrees, even incomplete ones (such as a child and a mother), than from individual people (such as only the mother).

A mother, father and two children (STR paternity test positive) are shown below in Table PED3. None of them had conducted amateur genealogical study, but, as with most people, the two parents had a good idea of their predominant ancestry. By self-report, most (not necessarily all) of the mother’s ancestors three generations back were English and German, and the father described himself as a little over half Greek (exact percentage unknown).

Table PED3.

           NOR   MED   MIDEA   SA
Mother      50    10      15   25
Father      55    40       5    0
Child 1     66     0      34    0
Child 2     35    45       0   20

The father’s results were more or less consistent with his expectations, particularly considering that most Greeks, Turks and Italians show less MED ancestry (a measure of anthropological identity) than they expect from blood (which is derived from geopolitical, socio-cultural and very recent geographical identity; we have discussed the difference between anthropological and geopolitical ancestry elsewhere on this site). As expected from his self-reported ancestry of “a little more than half Greek”, the father shows relatively high MED ancestry with EurasianDNA 1.0 compared to the mother, who reported no Mediterranean or Southeastern European ancestry. Not expected from self-reporting in geopolitical terms, however, were the mother’s 15% MIDEA and 25% SA ancestry. With an average error of about 7 percentage points, it seems unlikely that this result is due to statistical error, and it gives good evidence for non-NOR, non-MED ancestry within the mother’s family tree, perhaps extending back a few hundred to a few thousand years. Looking at the children, however, the evidence is stronger. Child 1 obtained results that are somewhat discordant with those of his parents: we would have expected him to score between 5% and 15% MIDEA and between 0% and 25% SA, but instead he scored 34% MIDEA and 0% SA. Some of the discrepancy is likely due to the random nature of chromosome inheritance (genetic assortment), but some is undoubtedly due to test error. Child 2 showed less NOR and MIDEA and more MED than expected, for the same reasons.

Why is this pedigree so discordant? Recall that results are accurate to within 6-7% on average, meaning that some people will exhibit 0% error, some 10% error, and a few 15-20% error due to continuous allele frequencies. Say the mother has a 15% error in the opposite direction from one of her offspring: her 15% MIDEA is really 30%, and her son’s 34% is really 19%. With the father at 5%, the expected range for the son is 5-30%, and the true value of 19% now makes sense.
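The argument above amounts to an interval check: a child's score is plausible if it can fall between the parents' scores once every score is allowed to be off by some tolerance. The Python sketch below is a hypothetical helper, not part of the EurasianDNA 1.0 analysis. It treats the parental scores as bounds on the child's true value, which is a simplification (random assortment can push a child somewhat outside that range), and applies the check to Table PED3 with tolerances of 7 and 15 percentage points.

```python
def consistent(child, parents, tolerance):
    """True if every score may move by at most `tolerance` points so that the
    child's value lies between the two parental values (clamped to 0-100%)."""
    low = max(0.0, min(parents) - tolerance)
    high = min(100.0, max(parents) + tolerance)
    return child + tolerance >= low and child - tolerance <= high


# Table PED3 scores (%).
PED3 = {
    "Mother": {"NOR": 50, "MED": 10, "MIDEA": 15, "SA": 25},
    "Father": {"NOR": 55, "MED": 40, "MIDEA": 5, "SA": 0},
    "Child 1": {"NOR": 66, "MED": 0, "MIDEA": 34, "SA": 0},
    "Child 2": {"NOR": 35, "MED": 45, "MIDEA": 0, "SA": 20},
}

if __name__ == "__main__":
    for tol in (7, 15):
        for child in ("Child 1", "Child 2"):
            flagged = [g for g, score in PED3[child].items()
                       if not consistent(score, (PED3["Mother"][g], PED3["Father"][g]), tol)]
            print(f"tolerance {tol}: {child} inconsistent in {flagged or 'none'}")
```

With a 7-point tolerance, Child 1's MIDEA score and Child 2's NOR score are flagged; with 15 points, as in the scenario above, every score in the pedigree passes.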

Of the more than 126 pedigrees studied, it is unusual for a single child to diverge this much from expectations, let alone two in the same pedigree as in this one, our most discordant. Even so, we can extract useful information. When we look at the results in the context of the pedigree, it is clear that the mother’s non-NOR and non-MED EurasianDNA 1.0 results are confirmed in her children (just not in the exact percentages we would expect, given the error associated with the test). The MIDEA and SA ancestry observed in the children clearly appears to have come from the mother rather than the father. So it is relatively safe to conclude that there is significant non-NOR and non-MED ancestry in this pedigree, contributed by the mother’s ancestors. Note that drawing this conclusion from just one of the children, or even from the mother alone, is less secure; it is by confirming the result and its inheritance across the pedigree as a whole that we gain the confidence to hypothesize that the mother’s and father’s anthropological heritage differ. Given our confidence from the pedigree, we can now ask where this MIDEA and SA ancestry might have come from. Most likely, the mother had a significant number of Western/Central Asian, Middle Eastern and/or South Asian ancestors sometime within the past thousand years or so. For example, perhaps some of her ancestors were Roma Gypsies, or perhaps some were Bedouin Arabs who settled in Europe within the past 1,000 years. The mother in this case feels that such non-NOR, non-MED ancestry likely came from her mother rather than her father, since much more is known about her father’s ancestors. To form hypotheses like these from a single test result, one would need to see a relatively high level of MIDEA and/or SA ancestry, but lower levels are useful for forming such a hypothesis when results are available from multiple individuals within a pedigree.
For this reason, to form the soundest hypotheses about your distant anthropological heritage, we recommend testing your mother and father if possible, your spouse, and/or some of your brothers, sisters and children (i.e., as many individuals in your immediate family as possible).

Here is a sampling of the typical pedigree results we obtain with EurasianDNA 1.0:

  NOR MED MIDEA SA
Gr5-1-P1 96 0 4 0
Gr5-1-P2 0 96 0 4
Gr5-1-S1 35 58 0 7
Gr5-1-S2 39 52 0 9

  NOR MED MIDEA SA
Gr8-0-P1 40 46 6 8
Gr8-0-P2 0 92 0 8
Gr8-0-S1 12 71 10 7
Gr8-0-S2 18 76 0 6

  NOR MED MIDEA SA
Gr9-0-P1 97 0 3 0
Gr9-0-P2 0 0 100 0
Gr9-0-S1 41 1 58 0
Gr9-0-S2 40 0 59 1

  NOR MED MIDEA SA
Gr13-0-P1 69 31 0 0
Gr13-0-P2 0 0 0 100
Gr13-0-S1 19 21 0 60
Gr13-0-S2 32 22 0 46

  NOR MED MIDEA SA
Gr18-1-P1 0 40 43 17
Gr18-1-P2 0 38 50 12
Gr18-1-S1 0 44 46 10
Gr18-1-S2 0 37 38 25

Interesting facts about the test

  1. NOR scores above 80% are unusual, and when they are obtained are usually obtained for people with light eye color shades, hair tones and skin complexions.
  2. Greeks and Turks show more NOR ancestry than Northern Europeans, such as the Irish, show MED ancestry.
  3. About half of the Greeks tested show less MED ancestry than they expected, but almost all Greeks tested so far have shown substantial MED ancestry.
  4. Middle Eastern subtypes all type with substantial MIDEAS ancestry.
  5. The amount of South Asian ancestry a South Asian individual exhibits depends on where in South Asia they come from: high levels of non-South Asian admixture are commonly seen in Northern India but not in Southern India, for instance.
  6. The average North African has too much non-European ancestry for EurasianDNA 1.0 to be used within the accuracy specifications discussed on this website.


EuropeanDNA 2.0 - The test

This test is a direct result of work published by Dr. Mark Shriver's laboratory at the Pennsylvania State University in 2007 (Bauchet et al., 2007). That study genotyped 11,071 autosomal SNPs (Single Nucleotide Polymorphisms, found on chromosomes 1-22) in samples of continental Europeans and Eurasians. As of March 2007, it was the largest study of its kind yet performed.


Figure 1.  Results from Bauchet et al., 2007 which studied 12 European populations with 11,071 SNPs.  Results were obtained using the STRUCTURE program.  Bars represent individuals and the color mix of each bar represents the proportions of each of 5 possible ancestries in this analysis.  Black lines separate sample sets derived from different regions of Europe as indicated in the legend below.

The Bauchet Paper

The Bauchet study involved Armenian, Jewish, Greek, Spanish, Basque, French, Italian, German, English, Irish, Polish and Finnish samples. The study identified predominant axes of population structure along a North-South axis and also along a West-East axis. This result is similar to those of other, less detailed autosomal studies that preceded it, including the one carried out at DNAPrint®'s laboratory that underlies our EurasianDNA 1.0 product. Figure 1 shows the result obtained with the 11,071 SNPs when dividing the European continent into 5 sub-populations. Each element of European ancestry is represented by a color, and each sample by a bar. Each sample has its own unique ancestry mix, represented by the different colors in its bar. As may be seen, there are clear patterns for each population. For example, there are 7 individuals of Greek heritage, and each is characterized by mostly “red” ancestry, with a small amount of “light blue” and less of the other elements. The “red” ancestry is shared predominantly by individuals of Southeastern European, Armenian and Jewish (cultural, not religious) ancestry. The “light blue” ancestry is shared among individuals of Spanish heritage, and the “brown” ancestry predominantly among individuals of Basque heritage, though it is interesting that some Italians and Spaniards show extensive “brown” admixture. The “green” ancestry is shared among individuals of continental European ancestry, such as Germans, English, French, Irish and Polish. The “dark blue” ancestry is found predominantly in individuals of Northeastern European ancestry, such as Polish, Baltic and Finnish.
We can ascribe geographical names for each of these elements of ancestry as follows, but note that these names are arbitrary – each element of ancestry corresponds to relatively isolated sub-populations that lived long ago in locations not precisely known and here we choose names reflective of modern-day distributions:

Red – SOUTHEASTERN EUROPEAN/EURASIAN (SEE)
Light Blue – IBERIAN (IB)
Brown – BASQUE (BAS)
Light Green – CONTINENTAL EUROPEAN (CE)
Blue – NORTHEASTERN EUROPEAN (NEE)

EuropeanDNA 2.0 Accuracy and Precision

To create EuropeanDNA 2.0, we re-analyzed the same genetic data and samples used in Bauchet et al., 2007. We harvested the most informative European AIMs from the 11,071 SNPs typed by Bauchet et al., 2007, using a measure called Fst, and found that a specially selected set of 1,349 of these provided almost all of the information in the larger set. The ability of the smaller marker set to resolve the same 5 elements of European ancestry shown in Figure 1 can be seen in Figure 2 (the same individuals were used for both figures). The main difference, aside from the meaningless difference in the order of colors from top to bottom in each bar, is a slightly higher background of red ancestry in continental Europeans. This type of subtle difference between a very large and a smaller marker set is to be expected, and fortunately we can quantify the reliability of the estimates, as we discuss later.
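Ranking SNPs by Fst can be sketched as follows. The text does not say which Fst estimator was used, so this Python example assumes a simple, equally weighted form of Wright's Fst for a biallelic marker, Var(p) / (p̄(1 - p̄)), computed from one allele's frequency in each reference group, with no sample-size correction. The marker names and frequencies are hypothetical, and the sketch ranks each marker independently, ignoring any other selection criteria.

```python
def fst(freqs):
    """Wright's Fst for one biallelic marker from per-population allele frequencies,
    weighting populations equally: Var(p) / (mean(p) * (1 - mean(p)))."""
    n = len(freqs)
    p_bar = sum(freqs) / n
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # monomorphic in every population: no ancestry information
    var_p = sum((p - p_bar) ** 2 for p in freqs) / n
    return var_p / (p_bar * (1.0 - p_bar))


def select_aims(marker_freqs, n_keep):
    """Return the names of the n_keep markers with the highest Fst."""
    ranked = sorted(marker_freqs, key=lambda m: fst(marker_freqs[m]), reverse=True)
    return ranked[:n_keep]


# Hypothetical allele frequencies in five reference groups (SEE, IB, BAS, CE, NEE).
EXAMPLE = {
    "snp1": (0.35, 0.10, 0.10, 0.10, 0.10),
    "snp2": (0.50, 0.50, 0.50, 0.50, 0.50),
    "snp3": (0.90, 0.10, 0.10, 0.10, 0.10),
}

if __name__ == "__main__":
    for name, freqs in EXAMPLE.items():
        print(name, round(fst(freqs), 4))
    print("kept:", select_aims(EXAMPLE, 2))
```

A marker with identical frequencies everywhere (`snp2`) scores zero and is dropped first, while a strongly differentiated one (`snp3`) ranks highest.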




Figure 2. Results from DNAPrint’s laboratory, which studied the same 12 European populations as Bauchet et al., 2007, but using only 1,349 specially selected SNPs from Bauchet's set of 11,071 SNPs. Results were obtained using the STRUCTURE program. Bars represent individuals, and the color mix of each bar represents the proportions of each of 5 possible ancestries in this analysis. Black lines separate sample sets derived from different regions of Europe, as indicated in the legend below. Because it provides results similar to those obtained using all 11,071 SNPs in Bauchet et al., 2007, this marker set was chosen to constitute the EuropeanDNA 2.0 marker panel.

Individual Results

For primarily European individuals (no significant African, East Asian or Native American admixture), sub-European ancestry mix can be determined by typing the individual with the 1,349 EuropeanDNA 2.0 markers and comparing the results with the reference European “parental” samples. This comparison enables us to determine an individual’s percentage of “Southeastern European”, “Iberian”, “Basque”, “Continental European” and “Northeastern European” ancestry and infer where the individual's European ancestors most likely came from. For example, a person with half “red” (“Southeastern European”) and half “light blue” (“Iberian”) ancestry, with no “green”, “dark blue” or “brown” ancestry, would most likely be someone of mixed European ancestry, with ancestors who were from or contributed to Armenian, Jewish, Greek and/or other Southeastern European populations (e.g., Italy) as well as ancestors who were from or contributed to the Spanish population. Such a person would most likely be of mixed Armenian, Jewish, Greek and/or Spanish heritage and less likely to be of German, English, Irish, French, Polish or Finnish heritage.

Figure 3 shows a specific example.  The results for a test sample (“Customer”) are shown to the right of the bar plot for the reference parental samples.  The “Customer” exhibits primarily blue, or Northeastern European (NEE) ancestry.  The only reference samples that exhibit this type of pattern are Finnish, though of course we are referring to only a sampling of Europe and individuals from other Northeastern European populations are likely to also exhibit this type of admixture pattern.  This “Customer” can conclude that their European ancestry is primarily Northeastern European (NEE), and such a “Customer” would most likely have recent ancestors from Norway, Finland, Sweden, Russia, and/or possibly the Baltic countries.


Figure 3. EuropeanDNA 2.0 results for the reference parental population samples compared with those for a single unknown “Customer”. The unknown sample was an individual created in a computer to be of 100% Northeastern European ancestry through the process of genetic simulation.

Comparing EurasianDNA 1.0 and EuropeanDNA 2.0

There are important differences between the EuropeanDNA 2.0 and EurasianDNA 1.0 tests. EurasianDNA 1.0 assumes a Eurasian population model, that is, that your ancestors came from Continental Europe, the Middle East and/or South Asia. EuropeanDNA 2.0 assumes a primarily continental European model, that is, that your ancestors mostly came from Europe (Southeast, Continental, Northeast, etc.). Note that EuropeanDNA 2.0 considers a larger set of more closely related populations. This is a more difficult problem to solve, and for this reason EuropeanDNA 2.0 uses a much larger number of markers in your DNA (1,349 versus 333 for EurasianDNA 1.0).

How to Use EuropeanDNA 2.0

Since EuropeanDNA 2.0 assumes a continental population model, it is useful only for individuals who are primarily continental European.  How might a person know this?   EuropeanDNA 2.0 is the latest release of a continuum of other products that are useful for making this type of decision.  A customer without any knowledge of their ancestry might take these tests in a logical order, such as:

  1. Take the AncestryByDNA™ 2.5 test and learn that they are primarily European, with little sub-Saharan African, East Asian or Native American admixture.
  2. Then take the EurasianDNA 1.0 test and learn that most of their grandparents were likely to be of continental European origin.
  3. And then, take the new EuropeanDNA 2.0 test to learn from where in Europe these ancestors derived.

Step 2 may be considered optional for most customers; some might skip it based on a written genealogical record or other evidence. However, step 1 is generally required, since we need to make sure that the results we disseminate meet basic quality control criteria. That is, we would not want to test an East Asian, Native American or African individual with EuropeanDNA 2.0: since the 2.0 test assumes primarily continental European ancestry, the semantic meaning of the results would be lost, and a proper interpretation would be very difficult and meaningful only in terms of genetic distance and shared ancestry rather than derived ancestry.

EuropeanDNA 2.0 can be used as a second opinion for EurasianDNA 1.0 results

Since the power and resolution with which we can infer genetic ancestry increase with the number of populations and markers studied, the Bauchet paper gave us an opportunity to develop a more advanced test for those who want to pinpoint their European ancestry with more precision and resolution. For example, a typical EurasianDNA 1.0 customer may have obtained results of equally mixed NOR1 and NOR2 ancestry. Both types are found throughout Europe, and their ratios differ from population to population but not by much, so this customer would most likely be able to conclude only that their European ancestry was predominantly Continental European, as opposed to Southeastern European, Middle Eastern or South Asian. With the enhanced resolution and power of EuropeanDNA 2.0, however, this customer would likely be able to pinpoint their ancestry more precisely. EurasianDNA 1.0 and EuropeanDNA 2.0 use different sets of genetic markers; because EuropeanDNA 2.0 is designed for a more detailed look at sub-European ancestry, it uses many more markers (1,349 versus 333 for EurasianDNA 1.0).

What is the Difference between EuropeanDNA 2.0 and other “Ethnogeographic” autosomal tests?

The difference is profound. The main difference is power, due to the large number of markers used and the work that went into selecting these markers from the human genome. Autosomal genetic tests are relatively new and a great advance over Y-chromosome and mtDNA tests (which report on only a small fraction of your ancestors). Among the new autosomal tests, however, quality varies dramatically compared with EuropeanDNA 2.0. Discerning sub-European ancestry and admixture is very ambitious, and the research needed to find and develop the required markers was not inexpensive. Some groups have taken a “quick-and-dirty” approach and launched so-called "Ethnogeographical" DNA tests that use forensic markers called STRs to report sub-European ancestry from the autosomes. Autosomal STRs are usually not selected from human DNA on the basis of their ancestry information content, so they provide less power. Because they lack this power, many of these tests cannot report admixture (such as 50% Iberian, 50% Basque), only primary affiliation (such as “the sample belongs to group X”). Worse, the error associated with these other tests is not only dramatically higher but mostly undocumented and undisclosed to customers. The power of using ancestry-informative SNPs is that a very large collection can be screened, and large panels of highly informative markers can be assembled to give the EuropeanDNA 2.0 test unprecedented power. This power translates into accuracy and meaning, which other tests attempting to resolve European sub-ancestry lack. Further, we can precisely quantify the error so that customers know how to interpret their results. That the bases for most of these other ancestry tests have not been published, either in papers detailing their performance or as part of other papers where they have been used for this purpose, speaks volumes.
The basis for the EuropeanDNA 2.0 panel of markers was published in the American Journal of Human Genetics earlier in 2007 (Bauchet et al., 2007) and, of course, the publication was peer reviewed.

EuropeanDNA 2.0 Accuracy and Precision

Because our AIMs (Ancestry Informative Markers) are not linked to one another, we can easily create simulated samples in a computer and measure the mathematical error in assessing admixture with EuropeanDNA 2.0 by comparing the results for these simulated samples with their expected results. For instance, the average simulated 100% Continental European (CE) sample registers about 91.3% CE ancestry, with about 2.5% NEE error, 1.5% IB error, 2.0% BA (Basque) error and 2.8% SEE error (total error = 8.7%) (Table I).
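The simulation idea can be sketched in a few lines. This is a simplified, hypothetical stand-in, not DNAPrint's software: it uses two reference populations instead of five, invents allele frequencies for 200 unlinked AIMs, and estimates ancestry with a simple maximum-likelihood grid search (a swapped-in technique, not necessarily the estimator EuropeanDNA 2.0 uses). It simulates people who are 100% population B and averages their population-A readings. Because an estimate can never fall below 0%, noise pushes the average above zero, which is the kind of error Table I reports.

```python
import math
import random


def simulate_genotypes(freqs, q, rng):
    """Simulate allele counts (0, 1 or 2) at unlinked AIMs for a person whose
    ancestry is a fraction q from population A and 1 - q from population B."""
    genotypes = []
    for p_a, p_b in freqs:
        f = q * p_a + (1 - q) * p_b  # expected allele frequency for this person
        genotypes.append(int(rng.random() < f) + int(rng.random() < f))
    return genotypes


def estimate_admixture(freqs, genotypes, steps=100):
    """Maximum-likelihood estimate of the population-A fraction, found by grid
    search over q in [0, 1]; each AIM is treated as an independent binomial draw."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        q = i / steps
        ll = 0.0
        for (p_a, p_b), g in zip(freqs, genotypes):
            f = q * p_a + (1 - q) * p_b
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q


rng = random.Random(2007)
# Hypothetical allele frequencies (population A, population B) for 200 AIMs.
freqs = [(rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)) for _ in range(200)]

# Simulate "100% population B" people and average their population-A reading.
# Any nonzero average is error of the kind a Table I-style analysis reports.
readings = [estimate_admixture(freqs, simulate_genotypes(freqs, 0.0, rng))
            for _ in range(40)]
bias = sum(readings) / len(readings)
print(f"average spurious population-A reading: {bias:.1%}")
```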


Table I. Average results for simulated samples (column 1) for each element of European sub-ancestry measured by the EuropeanDNA 2.0 panel (subsequent columns).

Thus, for an individual who is primarily of CE ancestry, readings of BA, SEE, IB and NEE in the low single-digit percentages are not meaningful. As Table I shows, the average error for any specific type of admixture, in an individual of any particular primary ancestry, ranges from about 1% to about 3%, depending on the type of ancestry and/or ancestry mix of the sample. Over all types of admixture and primary ancestry backgrounds, using 100% and all possible 50%/50% simulated samples, we calculate an average bias, or error, of 7.3%. This figure is impressive for a within-continent assay and rivals what we obtain for the mathematically and genetically easier problem of between-continent admixture (AncestryByDNA™ 2.5). This result is possible because of the large number of optimally informative markers used by EuropeanDNA 2.0.

The error arises because the sequence differences at the AIMs between groups are not absolute but continuous. For example, the minor allele of one AIM may have a frequency of 0.35 in the SEE group but only about 0.10 in the other groups, so it provides some “power” for distinguishing these two groups, but not absolute power. Across 1,349 AIMs, the power to distinguish SEE from “other” is much greater than for one AIM, but still not absolute, so we can consider this a form of test deficiency. Although EuropeanDNA 2.0 has more power to resolve European groups than any commercially available high-throughput panel before it, it is not perfect, and this imperfection is what our simulations measure as “bias” in Table I.
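To make the idea of partial “power” concrete, the sketch below takes the example AIM above (minor-allele frequency 0.35 in SEE, 0.10 elsewhere), assumes Hardy-Weinberg genotype proportions, and computes the likelihood ratio each genotype provides plus the expected evidence per marker (the Kullback-Leibler divergence, in natural-log units). Because the AIMs are unlinked, log-likelihood ratios from separate markers add. The 10- and 100-marker totals assume, hypothetically, that every marker is as informative as this one; this is an illustration, not the EuropeanDNA 2.0 calculation.

```python
import math


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 copies of an allele
    with frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def expected_log_lr(p_group, p_other):
    """Expected log-likelihood ratio (nats) favouring the true group for a
    person from that group: the KL divergence between the genotype distributions."""
    return sum(a * math.log(a / b)
               for a, b in zip(genotype_probs(p_group), genotype_probs(p_other)))


# The example AIM from the text: minor-allele frequency 0.35 in SEE, 0.10 elsewhere.
per_aim = expected_log_lr(0.35, 0.10)
for g, (a, b) in enumerate(zip(genotype_probs(0.35), genotype_probs(0.10))):
    print(f"{g} minor alleles: likelihood ratio SEE/other = {a / b:.2f}")
print(f"expected evidence per AIM: {per_aim:.3f} nats")

# For unlinked AIMs the log-likelihood ratios add, so expected evidence grows
# linearly with the number of (hypothetically equal-strength) markers.
for n in (1, 10, 100):
    print(f"{n:>3} AIMs: expected log-LR = {n * per_aim:.1f} nats")
```

A SEE individual carries no copies of the minor allele about 42% of the time, and that genotype actually favours “other” (likelihood ratio about 0.52), which is why a single AIM gives only partial power.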

For interpreting results, it is more useful to know what level of admixture is required to conclude with 95% certainty that the admixture is real. Table II shows these values, which were obtained from the simulated samples used for Table I. From Table II we can see that an individual of primarily NEE ancestry needs to show more than 6.1% CE admixture to safely conclude that the CE admixture is bona fide rather than statistical noise. As another example from Table II, an individual of primarily IB ancestry would need to show more than 7.3% BA admixture to conclude that the admixture is real rather than statistical noise. Overall, averaging all of these values, admixture levels above about 6% are generally required to safely conclude that the admixture is real. Values this low have never before been reported for a commercially available (or academic) autosomal assay, and they indicate that EuropeanDNA 2.0 is a remarkably sensitive assay given its ambitious goal of measuring within-continental admixture. Generally speaking, these values show that EuropeanDNA 2.0 can easily detect the ancestry contributed by a single grandparent, in most cases a single great-grandparent, and in certain cases even a single great-great-grandparent.
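The grandparent and great-grandparent statements rest on simple arithmetic: on average, a single ancestor n generations back contributes (1/2)^n of a person's autosomal ancestry. The sketch below compares those expected shares with the two example thresholds from Table II quoted above. It uses expected values only; the share a person actually inherits varies, and measurement noise means a share above the threshold is not always detected.

```python
# Example 95% thresholds quoted in the text from Table II.
THRESHOLDS = {
    "CE admixture, primarily NEE person": 0.061,
    "BA admixture, primarily IB person": 0.073,
}


def expected_share(generations_back):
    """Expected fraction of autosomal ancestry from a single ancestor
    (1 = parent, 2 = grandparent, 3 = great-grandparent, ...)."""
    return 0.5 ** generations_back


for name, n in [("grandparent", 2), ("great-grandparent", 3),
                ("great-great-grandparent", 4)]:
    share = expected_share(n)
    for case, limit in THRESHOLDS.items():
        verdict = "above" if share > limit else "below"
        print(f"one {name} ({share:.2%}) vs {case} ({limit:.1%}): {verdict}")
```

A great-great-grandparent's expected 6.25% clears the 6.1% threshold but not the 7.3% one, which matches the “in certain cases” qualification above.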

Table II. The level a reading must exceed to conclude with 95% certainty that it reflects bona fide ancestry rather than mere statistical noise. Note that for individuals of most ancestral backgrounds, and for most types of ancestry, readings in the mid to high single-digit percentages are significant indications of that type of ancestry. However, the threshold differs depending on the primary ancestry background and the type of admixture.

Sensitivity for Detecting Single European Relatives

Based on our simulation results, we know the level a reading must exceed to conclude with 95% certainty that it is a significant indication of ancestry. With these values, we can assess how well we detect admixture from a single relative. We start with the contribution from a single great-grandparent (GGP). Assume an individual has 7 GGPs from one ancestry group and 1 GGP from another. How well EuropeanDNA 2.0 detects the ancestry contributed by this single GGP can be determined with additional simulations: we simulate individuals for each possible 1 GGP/7 GGP scenario, obtain EuropeanDNA 2.0 results for these simulated individuals, and determine how often the ancestry contributed by the single GGP exceeds the 95% threshold. A perfect European assay would detect the ancestry contributed by the single GGP 100% of the time. (We do not have to deal with independent assortment issues here, that is, the random variation in how much DNA a person actually inherits from each ancestor, because each sample is simulated to have 12% ancestry from one group and 88% from the other.) Table III shows the results. On average, EuropeanDNA 2.0 detected the ancestry contributed by the single GGP 53.3% of the time, although the success rate depends on the groups from which the 7 GGPs and the 1 GGP come. For example, for an individual with 7 SEE great-grandparents and 1 NEE great-grandparent who inherited the expected 88% and 12% respectively, EuropeanDNA 2.0 will detect the NEE ancestry (show a level above the 95% threshold) 82.0% of the time. In contrast, for an individual with 7 NEE great-grandparents and 1 CE great-grandparent who inherited the expected 88% and 12% respectively, EuropeanDNA 2.0 will detect the CE ancestry (show a level above the 95% threshold) only 26.0% of the time. In short, there is about a 50/50 chance of detecting the ancestry contributed by a single GGP using EuropeanDNA 2.0.
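The two-step procedure described above (derive a 95% threshold from pure samples, then count how often 1-of-8 GGP samples exceed it) can be sketched as follows. As before, this is a hypothetical simplification, not the EuropeanDNA 2.0 pipeline: two populations instead of five, invented allele frequencies for 150 unlinked AIMs, and a maximum-likelihood grid search as the swapped-in estimator. The 12%/88% split and the 100 simulations per scenario follow the text.

```python
import math
import random

N_AIMS, N_SIMS, STEPS = 150, 100, 50
rng = random.Random(1349)

# Hypothetical allele frequencies at each unlinked AIM: (population A, population B).
# A is the single GGP's group; B is the group of the other seven GGPs.
freqs = [(rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)) for _ in range(N_AIMS)]

# Candidate A fractions and, for each, per-AIM log-probabilities of carrying
# and not carrying the counted allele (precomputed for speed).
GRID = [i / STEPS for i in range(STEPS + 1)]
LOG_TABLE = []
for q in GRID:
    fs = [q * a + (1 - q) * b for a, b in freqs]
    LOG_TABLE.append([(math.log(f), math.log(1 - f)) for f in fs])


def simulate(q):
    """Allele counts (0-2) at each AIM for a person with fraction q of A ancestry."""
    counts = []
    for a, b in freqs:
        f = q * a + (1 - q) * b
        counts.append(int(rng.random() < f) + int(rng.random() < f))
    return counts


def estimate(counts):
    """Grid-search maximum-likelihood estimate of the A fraction."""
    scores = [sum(g * lf + (2 - g) * lnf for g, (lf, lnf) in zip(counts, logs))
              for logs in LOG_TABLE]
    return GRID[scores.index(max(scores))]


# Step 1: the 95% threshold is the reading exceeded by at most 5% of pure-B people.
null_readings = sorted(estimate(simulate(0.0)) for _ in range(N_SIMS))
threshold = null_readings[N_SIMS * 95 // 100]

# Step 2: simulate 1-of-8 GGP people (12% A, 88% B) and count how often
# their A reading clears the threshold.
hits = sum(estimate(simulate(0.12)) > threshold for _ in range(N_SIMS))
detection_rate = hits / N_SIMS
print(f"95% threshold: {threshold:.0%}; single-GGP detection rate: {detection_rate:.0%}")
```

With more or more strongly differentiated markers, the pure-sample readings cluster closer to zero, the threshold drops, and the detection rate rises; that is the sense in which panel power drives the figures in Table III.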

Table III. Percentage of simulated samples whose expected ancestry percentage exceeds the 95% threshold required to conclude that the ancestry is “real”, where that percentage was contributed by a single great-grandparent (GGP) on a background of 7 other GGPs of another ancestry. 100 samples were simulated for each 1:7 GGP combination. For example, when 7 GGPs come from the SEE group and 1 GGP from the NEE group, EuropeanDNA 2.0 detects the NEE admixture 82.0% of the time; that is, the observed percentage exceeds the threshold required to conclude that the admixture exists in 82% of samples.

Bibliography
Bauchet M, McEvoy B, Pearson L, Quillen E, Sarkisian T, Hovhannesyan K, Deka R, Bradley D, Shriver M. 2007. Measuring European Population Stratification with Microarray Genotype Data. American Journal of Human Genetics 80(5):948-956.

Cavalli-Sforza LL, Menozzi P, Piazza A. 1994. The History and Geography of Human Genes. Princeton University Press, Princeton, NJ.

Chakraborty R. 1986. Gene admixture in human populations: models and predictions. Yearbook of Physical Anthropology 29:1-43.

Jobling M, Hurles M, Tyler-Smith C. Human Evolutionary Genetics: Origins, Peoples and Disease. New York, NY.

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155(2):945-959.

Wang J. 2003. Maximum-likelihood estimation of admixture proportions from genetic data. Genetics 164:747-765.