Post by djoser-xyyman on Feb 8, 2019 11:28:45 GMT -5
ADMIXTURE is really bad..pun intended - badMIXTURE software
A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots -
Daniel J. Lawson (2018)
While researching different genetic software I came across this. Yes, we tend to misinterpret what these ADMIXTURE charts mean, inferring that the population with the highest frequency/color is the “ancestral” population. I know that NOT to be true, and most experts know that also. ADMIXTURE only shows “shared” ancestry. Other tools are needed to show parental populations.
Here are some excerpts from the study.
Quote:
“
The history of the world told by STRUCTURE or ADMIXTURE is thus a tale that is skewed towards populations that are currently large and that have grown from small numbers of founders, with the bottlenecks that that implies.
Discussion
STRUCTURE and ADMIXTURE are popular because they give the user a broad-brush view of variation in genetic data, while allowing the possibility of zooming down on details about specific individuals or labelled groups. Unfortunately it is rarely the case that sampled data follows a simple history comprising a differentiation phase followed by a mixture phase, as assumed in an ADMIXTURE model and highlighted by case study 1. Naïve inferences [xyyman comment: not really naïve; it is intentional by the researchers. The lay person is naïve.] based on this model (the Protocol of Fig. 1) can be misleading if sampling strategy or the inferred value of the number of populations K is inappropriate, or if recent bottlenecks or unobserved ancient structure appear in the data. It is therefore useful when interpreting the results obtained from real data to think of STRUCTURE and ADMIXTURE as algorithms that parsimoniously explain variation between individuals rather than as parametric models of divergence and admixture.
It is not obvious that any single approach represents a direct replacement as a data summary tool. Here we build more directly on the results of STRUCTURE/ADMIXTURE by developing a new approach, badMIXTURE, to examine which features of the data are poorly fit by the model. Rather than intending to replace more specific or sophisticated analyses, we hope to encourage their use by making the limitations of the initial analysis clearer.
[...] assumptions. Here we use simulated and real data to illustrate how following this protocol can lead to inference of false histories, and how badMIXTURE can be used to examine model fit and avoid common pitfalls.
In the case of African Americans, the most important sources are West Africans, who were brought to the Americas as slaves, and European settlers. The two groups are thought to have been previously separated with minimal genetic contact for tens of thousands of years. This means that their history can be separated into two phases, a “divergence phase” lasting thousands of years of largely independent evolution and an “admixture phase”, in which large populations met and admixed within the last few hundred years. Specifically, most of the ancestors of African Americans that lived 500 years ago were either Africans or Europeans. The goal of the algorithm is to reconstruct the gene frequencies of these two distinct “ancestral” populations and to estimate what proportion of their genome each African American inherited from them.
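To make the two-source model in that excerpt concrete, here is a minimal Python sketch (mine, not code from the paper or from ADMIXTURE itself). It makes a big simplifying assumption: the allele frequencies of the two “ancestral” populations are already known, whereas the real software has to estimate them jointly with the proportions. All names (`freq_a`, `freq_b`, `estimate_proportion`) and the simulated data are hypothetical.

```python
import math
import random


def log_likelihood(q, genotypes, freq_a, freq_b):
    """Binomial log-likelihood of one individual's genotypes (0, 1 or 2
    copies of an allele per locus) if a fraction q of the genome comes
    from source A and 1 - q from source B."""
    total = 0.0
    for g, pa, pb in zip(genotypes, freq_a, freq_b):
        f = q * pa + (1.0 - q) * pb  # expected allele frequency at this locus
        total += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return total


def estimate_proportion(genotypes, freq_a, freq_b, steps=100):
    """Grid search for the q in [0, 1] that maximises the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freq_a, freq_b))


def simulate_individual(q, freq_a, freq_b, rng):
    """Draw two allele copies per locus from the mixed frequency."""
    genotypes = []
    for pa, pb in zip(freq_a, freq_b):
        f = q * pa + (1.0 - q) * pb
        genotypes.append(sum(rng.random() < f for _ in range(2)))
    return genotypes


# Toy data: frequencies are made up and kept away from 0 and 1
# so that the logarithms above are always defined.
rng = random.Random(0)
n_loci = 5000
freq_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
freq_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

true_q = 0.8
genotypes = simulate_individual(true_q, freq_a, freq_b, rng)
estimate = estimate_proportion(genotypes, freq_a, freq_b)
print("true proportion:", true_q, "estimated:", estimate)
```

Note that the estimate only means what the excerpt says it means if the two-phase history (long divergence, then recent mixing) actually holds. The same arithmetic will happily return a proportion for data with a completely different history.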
badMIXTURE results distinguish between scenarios.
badMIXTURE uses patterns of DNA sharing to assess the goodness of fit of a recent admixture model to the underlying genetic data. These sharing profiles are generated using CHROMOPAINTER [11], which calculates, for each individual, which of the other individual(s) in the sample are most closely related for each stretch of genome, using either haplotype or allele matching. This process is called “chromosome painting”, and can be thought of in terms of “palettes” (Fig. 2c), which can also be visualised as bar plots. The palette measures the proportion of the genome of each individual that is most closely related to the individuals sampled from each of the labelled populations. The painting palettes differ for the three simulated scenarios (Fig. 2c), showing that there should be information in the genetic data to distinguish between them, even though they give almost identical ADMIXTURE bar plots.
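The “palette” idea can be sketched in a few lines of Python. This is only my illustration of the concept, not the CHROMOPAINTER or badMIXTURE code, and every number below is invented. I assume each painted stretch of genome is given as a (length, closest labelled population) pair, and that under a recent-admixture model an individual's expected palette is the mix of the source palettes weighted by the admixture proportions. Large residuals between observed and expected would then flag a poor fit.

```python
def palette(segments):
    """Proportion of the genome most closely related to each labelled
    population. `segments` is a list of (length, population_label)."""
    totals = {}
    for length, label in segments:
        totals[label] = totals.get(label, 0.0) + length
    genome_length = sum(totals.values())
    return {label: t / genome_length for label, t in totals.items()}


def expected_palette(proportions, source_palettes):
    """Palette predicted by a simple mixture: each source palette is
    weighted by the individual's admixture proportion for that source."""
    expected = {}
    for source, q in proportions.items():
        for label, share in source_palettes[source].items():
            expected[label] = expected.get(label, 0.0) + q * share
    return expected


def residuals(observed, expected):
    """Observed minus expected share for every labelled population."""
    labels = set(observed) | set(expected)
    return {l: observed.get(l, 0.0) - expected.get(l, 0.0) for l in labels}


# Hypothetical painted genome: lengths in arbitrary units.
segments = [(40, "P1"), (30, "P2"), (20, "P1"), (10, "P3")]
observed = palette(segments)

# Hypothetical palettes of the two inferred sources and the
# admixture proportions reported for this individual.
source_palettes = {
    "A": {"P1": 0.9, "P2": 0.1},
    "B": {"P2": 0.5, "P3": 0.5},
}
proportions = {"A": 0.5, "B": 0.5}

expected = expected_palette(proportions, source_palettes)
fit = residuals(observed, expected)
for label in sorted(fit):
    print(label, round(observed.get(label, 0.0), 3),
          round(expected.get(label, 0.0), 3), round(fit[label], 3))
```

In this toy case the individual shares more with P1 and less with P3 than the 50/50 mixture predicts, which is the kind of mismatch a bar plot alone would not show.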