ADMIXTURE is really bad..pun intended - badMIXTURE software

djoser-xyyman
Vizier

Without data you are just another person with an opinion - Deming

Posts: 3,268

ADMIXTURE is really bad..pun intended - badMIXTURE software Feb 8, 2019 11:28:45 GMT -5

Post by djoser-xyyman on Feb 8, 2019 11:28:45 GMT -5

A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots - Daniel J. Lawson et al. (2018)

While researching different genetic software I came across this. Yes, we tend to misinterpret what these ADMIXTURE charts mean, inferring that the population with the highest frequency/color is the "ancestral" population. I know that is NOT true, and most experts know it too. ADMIXTURE only shows "shared" ancestry. Other tools are needed to show parental populations.

Here are some excerpts from the study.

Quote:
“
The history of the world told by STRUCTURE or ADMIXTURE is thus a tale that is skewed towards populations that are currently large and that have grown from small numbers of founders, with the bottlenecks that that implies.

Discussion
STRUCTURE and ADMIXTURE are popular because they give the user a broad-brush view of variation in genetic data, while allowing the possibility of zooming down on details about specific individuals or labelled groups. Unfortunately it is rarely the case that sampled data follows a simple history comprising a differentiation phase followed by a mixture phase, as assumed in an ADMIXTURE model and highlighted by case study 1. Naïve inferences (xyyman comment: not really naïve, it is intentional on the researchers' part; the lay person is the naïve one) based on this model (the Protocol of Fig. 1) can be misleading if sampling strategy or the inferred value of the number of populations K is inappropriate, or if recent bottlenecks or unobserved ancient structure appear in the data. It is therefore useful when interpreting the results obtained from real data to think of STRUCTURE and ADMIXTURE as algorithms that parsimoniously explain variation between individuals rather than as parametric models of divergence and admixture.

It is not obvious that any single approach represents a direct replacement as a data summary tool. Here we build more directly on the results of STRUCTURE/ADMIXTURE by developing a new approach, badMIXTURE, to examine which features of the data are poorly fit by the model. Rather than intending to replace more specific or sophisticated analyses, we hope to encourage their use by making the limitations of the initial analysis clearer.

[...] assumptions. Here we use simulated and real data to illustrate how following this protocol can lead to inference of false histories, and how badMIXTURE can be used to examine model fit and avoid common pitfalls.

In the case of African Americans, the most important sources are West Africans, who were brought to the Americas as slaves, and European settlers. The two groups are thought to have been previously separated with minimal genetic contact for tens of thousands of years. This means that their history can be separated into two phases, a "divergence phase" lasting thousands of years of largely independent evolution and an "admixture phase", in which large populations met and admixed within the last few hundred years. Specifically, most of the ancestors of African Americans that lived 500 years ago were either Africans or Europeans. The goal of the algorithm is to reconstruct the gene frequencies of these two distinct "ancestral" populations and to estimate what proportion of their genome each African American inherited from them.
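To make that last sentence concrete, here is a minimal Python sketch of the idea. It is not the ADMIXTURE program or its optimiser. It assumes the usual binomial admixture likelihood, in which the allele frequency expected for an individual is the mix of the ancestral frequencies weighted by that individual's ancestry proportions, and fits it with a plain EM update. The data are simulated, and every name and parameter here (`simulate`, `em_step`, `fit`, the sample sizes) is made up for illustration.

```python
import math
import random

def simulate(n_ind, n_loci, rng):
    """Simulate 0/1/2 genotypes for mixtures of two made-up ancestral populations."""
    p_true = [[rng.uniform(0.05, 0.95) for _ in range(n_loci)] for _ in range(2)]
    q_true = [rng.random() for _ in range(n_ind)]  # share inherited from population 0
    geno = []
    for q in q_true:
        row = []
        for j in range(n_loci):
            f = q * p_true[0][j] + (1 - q) * p_true[1][j]
            row.append(sum(rng.random() < f for _ in range(2)))  # two allele draws
        geno.append(row)
    return geno, q_true

def log_lik(geno, Q, P):
    """Binomial log-likelihood, dropping the constant binomial coefficient."""
    ll = 0.0
    for i, row in enumerate(geno):
        for j, g in enumerate(row):
            f = sum(Q[i][k] * P[k][j] for k in range(len(P)))
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
    return ll

def em_step(geno, Q, P, eps=1e-6):
    """One EM update of ancestry proportions Q and ancestral frequencies P."""
    n, m, K = len(geno), len(geno[0]), len(P)
    num = [[0.0] * m for _ in range(K)]
    den = [[0.0] * m for _ in range(K)]
    new_Q = [[0.0] * K for _ in range(n)]
    for i in range(n):
        for j in range(m):
            g = geno[i][j]
            f1 = sum(Q[i][k] * P[k][j] for k in range(K))
            f0 = 1.0 - f1
            for k in range(K):
                a = g * Q[i][k] * P[k][j] / f1              # counted alleles credited to k
                b = (2 - g) * Q[i][k] * (1 - P[k][j]) / f0  # other alleles credited to k
                num[k][j] += a
                den[k][j] += a + b
                new_Q[i][k] += (a + b) / (2 * m)
    # Clip frequencies away from 0 and 1 so the logarithms stay finite.
    new_P = [[min(1 - eps, max(eps, num[k][j] / den[k][j])) for j in range(m)]
             for k in range(K)]
    return new_Q, new_P

def fit(geno, K=2, iters=30, seed=0):
    rng = random.Random(seed)
    n, m = len(geno), len(geno[0])
    Q = []
    for _ in range(n):
        w = [rng.uniform(0.2, 0.8) for _ in range(K)]
        Q.append([x / sum(w) for x in w])
    P = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(K)]
    start = log_lik(geno, Q, P)
    for _ in range(iters):
        Q, P = em_step(geno, Q, P)
    return Q, P, start, log_lik(geno, Q, P)

rng = random.Random(1)
geno, q_true = simulate(n_ind=30, n_loci=100, rng=rng)
Q, P, ll_start, ll_end = fit(geno)
print("log-likelihood before and after:", round(ll_start, 1), round(ll_end, 1))
```

Each row of `Q` is what one bar in an ADMIXTURE plot shows. Note that the fit only says how well K mixing components explain the genotypes. Nothing in it checks whether the "divergence then admixture" history actually happened, which is the point the paper is making.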

badMIXTURE results distinguish between scenarios. badMIXTURE uses patterns of DNA sharing to assess the goodness of fit of a recent admixture model to the underlying genetic data. These sharing profiles are generated using CHROMOPAINTER [11], which calculates, for each individual, which of the other individual(s) in the sample are most closely related for each stretch of genome, using either haplotype or allele matching. This process is called "chromosome painting", and can be thought of in terms of "palettes" (Fig. 2c), which can also be visualised as bar plots. The palette measures the proportion of the genome of each individual that is most closely related to the individuals sampled from each of the labelled populations. The painting palettes differ for the three simulated scenarios (Fig. 2c), showing that there should be information in the genetic data to distinguish between them, even though they give almost identical ADMIXTURE bar plots.
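A palette as described above is simple bookkeeping once the painting exists. The Python sketch below is only an illustration and is not CHROMOPAINTER's input or output format. It assumes the painting for one individual is already available as a list of (segment length, closest donor) pairs; the function name, donor IDs and population labels are all hypothetical.

```python
from collections import defaultdict

def palette(segments, donor_population):
    """Share of one individual's painted genome assigned to each labelled population.

    segments: list of (length, donor_id) pairs, one per painted stretch of genome.
    donor_population: dict mapping donor_id to a population label.
    """
    totals = defaultdict(float)
    for length, donor in segments:
        totals[donor_population[donor]] += length
    genome = sum(totals.values())
    return {pop: amount / genome for pop, amount in totals.items()}

# Hypothetical donors and segment lengths, for illustration only.
donor_population = {"d1": "PopA", "d2": "PopA", "d3": "PopB"}
segments = [(30.0, "d1"), (50.0, "d3"), (20.0, "d2")]
pal = palette(segments, donor_population)
print(pal)  # {'PopA': 0.5, 'PopB': 0.5}
```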


Post by djoser-xyyman on Feb 8, 2019 11:29:10 GMT -5

badMIXTURE distinguishes the Recent Admixture scenario from alternatives because the Recent Admixture model makes the distinct prediction that admixed individuals are not particularly related to each other, as shown by the small amount of black in their palettes in Fig. 2c.
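One way to picture that check is the Python sketch below. It is not the badMIXTURE code. It assumes, as a simplification, that under recent admixture an individual's palette should be the mix of the source populations' palettes weighted by that individual's admixture proportions, and it then looks at the leftover (observed minus predicted). The labels, numbers and function names are all invented for illustration, with "Admixed" standing in for the black part of the palette.

```python
def predicted_palette(q, source_palettes):
    """Palette expected if the individual is a recent mix of the sources.

    q: admixture proportions, one per source population.
    source_palettes: one palette (dict label -> share) per source population.
    """
    labels = source_palettes[0].keys()
    return {lab: sum(qk * pal[lab] for qk, pal in zip(q, source_palettes))
            for lab in labels}

def residuals(observed, predicted):
    """Observed minus predicted share for every palette label."""
    return {lab: observed[lab] - predicted[lab] for lab in observed}

# Invented numbers: two source palettes and one 50/50 admixed individual.
source_palettes = [
    {"SourceA": 0.8, "SourceB": 0.1, "Admixed": 0.1},
    {"SourceA": 0.1, "SourceB": 0.8, "Admixed": 0.1},
]
q = [0.5, 0.5]
pred = predicted_palette(q, source_palettes)

# Case 1: little sharing with other admixed individuals, as recent admixture predicts.
observed_recent = {"SourceA": 0.44, "SourceB": 0.46, "Admixed": 0.10}
# Case 2: a lot of sharing within the admixed group, which the model does not predict.
observed_shared = {"SourceA": 0.25, "SourceB": 0.25, "Admixed": 0.50}

res_recent = residuals(observed_recent, pred)
res_shared = residuals(observed_shared, pred)
print(res_recent)
print(res_shared)
```

In the second case the large positive residual on "Admixed" is the kind of misfit that would suggest the group shares its own history, even though both cases would get the same 50/50 bar in an ADMIXTURE plot.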

badMIXTURE is still informative without linkage information. STRUCTURE/ADMIXTURE has been applied to thousands of different species, most of which do not have the linkage maps (either physical or genetic) usually required for chromosome painting. The algorithm can also be applied to data sets with relatively small numbers of markers. It would therefore be advantageous to be able to apply a similar approach to these data sets.

Case Study 3: Worldwide human data.
An important consideration in any STRUCTURE analysis is sample size. This is vividly illustrated by the analyses of Friedlaender et al. [18], who augmented a pre-existing microsatellite data set from a worldwide collection by a similar number of samples from Melanesia [...]

[...] the fine-scale relationships amongst the Melanesians whilst accounting for admixture. Our purpose here is to ask what the results imply, when interpreted literally, about the relationships between Melanesians, East Asians and Europeans. For all values from K = 2 to K = 9, the French population is inferred to be a mixture between an East Asian population and a Melanesian one (Fig. 5d, e). Only for K = 10 do the French form their own cluster and still have variable levels of admixture from East Asians (Fig. 5c). Throughout, interpretation of the ancestral populations based on where individuals are geographically today [...]

Non-African humans have a few percent Neanderthal ancestry, but this is invisible to STRUCTURE or ADMIXTURE since it does not result in differences in ancestry profiles between individuals. The [...]
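A short worked equation shows why ancestry shared equally by everyone cannot show up in the bars. This is my own illustration, not a derivation from the paper. It assumes the standard admixture model, in which the expected allele frequency for individual $i$ at locus $j$ is a mix of ancestral frequencies $p_{kj}$ weighted by proportions $q_{ik}$ that sum to 1. The symbols $c$ (the share everyone carries from the extra source) and $n_j$ (that source's allele frequency) are introduced here for the example.

```latex
% Standard admixture model with K ancestral populations:
f_{ij} = \sum_{k=1}^{K} q_{ik}\, p_{kj}, \qquad \sum_{k=1}^{K} q_{ik} = 1 .

% Now give every individual the same share c from an extra source with frequency n_j:
f_{ij} = c\, n_j + (1 - c) \sum_{k=1}^{K} q_{ik}\, p_{kj}
       = \sum_{k=1}^{K} q_{ik} \bigl[ (1 - c)\, p_{kj} + c\, n_j \bigr]
       = \sum_{k=1}^{K} q_{ik}\, \tilde{p}_{kj},
\qquad \tilde{p}_{kj} = (1 - c)\, p_{kj} + c\, n_j .
```

The second line uses the fact that the $q_{ik}$ sum to 1. The same proportions $q_{ik}$ fit the data exactly, with the shared ancestry folded into the ancestral frequencies $\tilde{p}_{kj}$, so the bar plot looks identical with or without it.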

Several methods have been developed to estimate K [1, 2, 22], but for real data, the assumption that there is a true value is always incorrect; the question rather being whether the model is a good enough approximation to be practically useful. First, there may be close relatives in the sample, which violates model assumptions [23]. Second, there might be "isolation by distance", meaning that there are no discrete populations at all [...]

The algorithm uses variation in admixture proportions between individuals to approximately mimic the effect of more than K distinct drift events without estimating ancestral populations corresponding to each one. In other words, an admixture model is almost always "wrong" (Assumption 2 of the Core protocol, Fig. 1) and should not be interpreted without examining whether this lack of fit matters for a given question.
Post by djoser-xyyman on Feb 8, 2019 11:29:43 GMT -5

The popularity of STRUCTURE and its descendants as unsupervised clustering methods means that they will be applied and interpreted, for which badMIXTURE provides important assistance. However, these analyses should always be followed up with tests of specific hypotheses, using other approaches. Running STRUCTURE or ADMIXTURE is the beginning of a detailed demographic and historical analysis, not the end.

Post by djoser-xyyman on Feb 8, 2019 11:30:26 GMT -5

All models show Europeans are a subset of Africans. Genes mimic geography! Always have!
