|
Post by djoser-xyyman on Mar 27, 2018 10:39:32 GMT -5
Man, I wish I had finished my MSBA in Computer Science. All of these are coded in languages I don’t know. Apparently this runs in the Python environment. Why is it so important? Because this is yet another piece of software that may be used to determine the ACTUAL direction of migration. Damn! I just don’t have the time. It should debunk all the hypotheses about “Eurasians back-migrating”. Such a stupid idea, and it is even being repeated on here.
---------------------------- Quote from: “Efficiently inferring the demographic history of many populations with allele count data”, John A. Kamm (2018)
2 Background
Suppose a sample of $n = (n_1, \ldots, n_D)$ genomes have been sampled from $D$ “demes” or populations. The positions in the genome where the samples are not all identical are called segregating sites. In most organisms mutations are rare; most sites are not segregating. It is therefore reasonable to assume, as we do from now on, that each position in the genome has experienced at most a single mutation in its history, and that each individual can be labeled as having the “ancestral” or “derived” (mutant) allele at each segregating site. In population genetics, this simplifying assumption is known as the infinite sites model. The sample frequency spectrum (SFS) is a $D$-dimensional array $[f_x] \in \mathbb{Z}^{(n_1+1) \times \cdots \times (n_D+1)}$ whose entry $f_x$ counts the number of segregating sites with exactly $x$ copies of the derived allele and $n - x$ copies of the ancestral allele, where $x = (x_1, \ldots, x_D) \in \mathbb{N}_0^D$ with $0 \le x_d \le n_d$ for each $d = 1, \ldots, D$. Note we only consider segregating sites with 2 alleles, so $f_0 = f_n = 0$ by definition. Compared to the full data set (i.e., the complete genetic sequences of all $n = n_1 + \cdots + n_D$ genomes), the SFS $[f_x]$ is a compressed, low dimensional summary which nevertheless preserves much of the signal about the various population size changes, divergence times, and admixture events that occurred over the course of the populations’ history.
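For anyone who wants to see what the SFS in the quoted definition looks like in practice, here is a minimal Python sketch. It is not taken from momi2; the function name, the input layout (one tuple of derived-allele counts per site, one entry per deme) and the toy data are all assumptions made for illustration. It stores the spectrum sparsely as a counter keyed by the configuration x, and skips the two non-segregating configurations (all ancestral, all derived), as the definition requires.

```python
from collections import Counter

def sample_frequency_spectrum(derived_counts, sample_sizes):
    """Count segregating sites by derived-allele configuration x.

    derived_counts: iterable of tuples x = (x_1, ..., x_D), one per site
    sample_sizes:   tuple n = (n_1, ..., n_D) of genomes sampled per deme
    (Both names are hypothetical, chosen for this sketch.)
    """
    n = tuple(sample_sizes)
    zero = tuple(0 for _ in n)
    sfs = Counter()
    for x in derived_counts:
        x = tuple(x)
        # Each entry must satisfy 0 <= x_d <= n_d.
        if len(x) != len(n) or any(not 0 <= xd <= nd for xd, nd in zip(x, n)):
            raise ValueError(f"invalid configuration {x} for sample sizes {n}")
        # x = 0 and x = n are not segregating, so f_0 = f_n = 0.
        if x == zero or x == n:
            continue
        sfs[x] += 1
    return sfs

# Toy data (made up): D = 2 demes with n = (2, 3) sampled genomes.
sites = [(1, 0), (1, 0), (2, 3), (0, 0), (0, 2), (1, 0)]
sfs = sample_frequency_spectrum(sites, (2, 3))
print(dict(sfs))
```

A dense D-dimensional array would match the definition more literally, but a sparse counter is easier to read here and avoids allocating entries that are mostly zero.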
|
|
|
Post by djoser-xyyman on Mar 27, 2018 10:46:06 GMT -5
user-friendly software implementation that will enable practitioners to deploy our method. The software uses automatic differentiation (Corliss et al., 2002; Bhaskar et al., 2015; Maclaurin et al., 2015) to compute derivatives of the SFS, leading to efficient optimization and parameter inference. Our package, called momi2, is available for download at github.com/popgenmethods/momi2.
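For readers who have not met automatic differentiation before, the idea can be sketched in a few lines of plain Python. This is only a toy forward-mode version using dual numbers, not the way momi2 does it internally; the `Dual` class, the `derivative` helper and the `toy` function are all hypothetical names invented for this illustration. The point is that the derivative is computed exactly alongside the value, with no finite-difference step size to tune.

```python
import math

class Dual:
    """Number a + b*eps with eps**2 == 0; `deriv` carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

def exp(x):
    """Exponential of a Dual, using the chain rule."""
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)

def derivative(f, t):
    """Evaluate df/dt at t by seeding the dual part with 1."""
    return f(Dual(t, 1.0)).deriv

def toy(t):
    # Hypothetical smooth function standing in for a model quantity.
    return 3 * t * t + exp(-1 * t)

print(derivative(toy, 2.0))  # analytically 6*t - exp(-t) at t = 2
```

An optimizer can then use such exact gradients to climb a likelihood surface, which is the role the quoted passage gives to the derivatives of the SFS.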
|
|
|
Post by djoser-xyyman on Mar 27, 2018 10:48:30 GMT -5
We applied momi2 to estimate the strength and timing of basal Eurasian admixture into early European farmers, and the split time of the basal Eurasian lineage. To do this, we built a demographic model relating 12 samples from 8 populations, shown in Figure 2. These samples consisted of the Altai Neanderthal (Prüfer et al., 2014); the 45,000 year old Ust’Ishim man from Siberia (Fu et al., 2014); 3 present-day populations (Mbuti, Sardinian, Han) with 3, 2, and 2 samples respectively; and 3 ancient samples representing the European ancestry components identified by Lazaridis et al. (2014): a 7,500 year old sample from the Linearbandkeramik (LBK) culture (representing EEF), an 8,000 year old sample from the Loschbour rock shelter in Luxembourg (representing WHG), and the 24,000 year old Mal’ta boy (“MA1”) from Siberia (representing ANE). After data cleaning, our dataset consisted of $2.4 \times 10^6$ autosomal transversion SNPs. See Appendix A.3 for more details about the data.
|
|
|
Post by djoser-xyyman on Mar 27, 2018 10:51:13 GMT -5
Our inferred demography, along with nonparametric bootstrap re-estimates, are shown in Figure 2 and Table 2. Our parametric bootstrap estimates are shown in Figure 3. We inferred a pulse of 0.094 (95% CI of 0.049-0.174) from the ghost Basal Eurasian population to EEF ancestry (LBK), substantially less than the 0.44 inferred by Lazaridis et al. (2014). This admixture was inferred to occur 33.7 kya (95% CI of 10.8-41.1 kya), shortly after the Loschbour-LBK split at 37.7 kya (95% CI of 32.2-42.3 kya). The split time of the ghost Basal Eurasian lineage from other Eurasians was inferred at 79.8 kya (95% CI of 67.4-101 kya). Other parameters were broadly in line with previous estimates, such as a Mbuti-Eurasian split of 96 kya, a Han-European split of 50 kya, a Neanderthal split of 696 kya, and Eurasians deriving 0.03 of their ancestry from Neanderthal (Terhorst et al., 2017; Green et al., 2010; Meyer et al., 2016).
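To give a feel for how a nonparametric bootstrap turns re-estimates into a 95% interval like the ones quoted above, here is a rough Python sketch of a percentile bootstrap. The excerpt does not say exactly how the authors built their intervals, so treat this as one common recipe and an assumption on my part: resample blocks of data with replacement, recompute the estimate each time, and read off the 2.5th and 97.5th percentiles. The function name, the block values and the use of the mean as the estimator are all made up for illustration and have nothing to do with the paper's numbers.

```python
import random
import statistics

def bootstrap_ci(blocks, estimator, n_boot=300, level=0.95, seed=0):
    """Percentile bootstrap interval for estimator(blocks).

    blocks:    list of data blocks (hypothetical; e.g. per-region summaries)
    estimator: function mapping a list of blocks to a single number
    """
    rng = random.Random(seed)
    # Resample the blocks with replacement and re-estimate each time.
    reps = sorted(
        estimator([rng.choice(blocks) for _ in blocks])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = reps[int(tail * (n_boot - 1))]
    hi = reps[int(round((1.0 - tail) * (n_boot - 1)))]
    return lo, hi

# Toy values, not from the paper.
toy_blocks = [1.0, 2.0, 3.0, 4.0, 5.0]
lo, hi = bootstrap_ci(toy_blocks, statistics.fmean)
print(lo, hi)
```

In a real analysis the estimator would be a full model refit on each resampled data set, which is why running hundreds of replicates is expensive.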
|
|
|
Post by djoser-xyyman on Mar 27, 2018 10:53:59 GMT -5
Inferring the optimal demography from start to finish took 2.5 hours on a laptop with 4 CPU cores, and used 2 GB RAM. The 300 bootstraps were run separately on a high-performance compute cluster. To our knowledge, no other method can infer this demographic model using the full SFS. The moments software package (Jouganous et al., 2017) is capable of computing the SFS for up to 5 populations, less than the 8 populations here, though it can scale to more individuals per population than momi2.
Estimating the mutation rate was possible here because we did not use a prespecified mutation rate to estimate the model in Figure 2, instead using the known ages of the Ust’Ishim, LBK, Loschbour, and MA1 samples to calibrate dates.
|
|
|
Post by djoser-xyyman on Mar 27, 2018 10:56:12 GMT -5
To avoid biases in ancient DNA caused by deamination (Dabney et al., 2013), we removed all transitions (i.e. A↔G and C↔T mutations), keeping only the transversions. We used Chimp as a proxy for the ancestral allele, removing all sites where the Chimp allele was missing. After data cleaning, we were left with 2,444,888 autosomal transversion SNPs that were segregating among the samples excluding MA1 and Neanderthal.
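The filtering described above is simple enough to sketch in Python. This is not the authors' pipeline; the site layout (a dict with `ref`, `alt` and `chimp` keys), the use of `None` or `"N"` for a missing Chimp allele, and the function names are assumptions made for this example. It drops transitions (A↔G, C↔T) and any site without a usable Chimp allele.

```python
BASES = set("ACGT")
# The two transition classes: purine<->purine and pyrimidine<->pyrimidine.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}

def is_transversion(a, b):
    """True if alleles a and b differ and do not form a transition."""
    a, b = a.upper(), b.upper()
    if a not in BASES or b not in BASES or a == b:
        return False
    return frozenset((a, b)) not in TRANSITIONS

def keep_site(site):
    """Keep transversion sites whose Chimp (ancestral proxy) allele is known."""
    chimp = site.get("chimp")
    if chimp is None or chimp.upper() not in BASES:
        return False  # Chimp allele missing (assumed encoding: None or "N")
    return is_transversion(site["ref"], site["alt"])

# Toy sites, made up for illustration.
sites = [
    {"ref": "A", "alt": "G", "chimp": "A"},   # transition, dropped
    {"ref": "A", "alt": "C", "chimp": "A"},   # transversion, kept
    {"ref": "C", "alt": "T", "chimp": "C"},   # transition, dropped
    {"ref": "G", "alt": "T", "chimp": None},  # Chimp missing, dropped
    {"ref": "G", "alt": "C", "chimp": "N"},   # Chimp missing, dropped
    {"ref": "T", "alt": "A", "chimp": "T"},   # transversion, kept
]
kept = [s for s in sites if keep_site(s)]
print(len(kept))
```

Whether the Chimp allele must also match one of the two observed alleles is not stated in the excerpt, so this sketch only checks that it is present.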
|
|
|
Post by clydewin98 on Mar 28, 2018 15:34:58 GMT -5
You're getting it twisted.
It's not the program, it's the data you put into the program. They are using Bayesian statistics; as a result, the program supports whatever opinion you already have concerning a data set.
|
|
|
Post by djoser-xyyman on Aug 19, 2018 20:19:22 GMT -5
bump
|
|