|
Post by djoser-xyyman on Jun 19, 2018 13:51:11 GMT -5
lobstr.teamerlich.org/install.html#osx
darwin.informatics.indiana.edu/str/index.html
github.com/Arkarachai/STR-FM
Using STR-FM to assess error profiles in the context of STRs for Illumina data
Detecting STRs in short reads
Abstract
Background: Short tandem repeats (STRs) are found in many prokaryotic and eukaryotic genomes, and are commonly used as genetic markers, in particular for identity and parental testing in DNA forensics. The unstable expansion of some STRs was associated with various genetic disorders (e.g., the Huntington disease), and thus was used in genetic testing for screening individuals at high risk. Traditional STR analyses were based on the PCR amplification of STR loci followed by gel electrophoresis. With the availability of massive whole genome sequencing data, it becomes practical to mine STR profiles in silico from genome sequences. Software tools such as lobSTR and STR-FM have been developed to address these demands, which are, however, built upon whole genome reads mapping tools, and thus may not be sensitive enough.
Results: In this paper, we present a standalone software tool STRScan that uses a greedy algorithm for targeted STR profiling in next-generation sequencing (NGS) data. STRScan was tested on the whole genome sequencing data from Venter genome sequencing and 1000 Genomes Project. The results showed that STRScan can profile 20% more STRs in the target set that are missed by lobSTR.
Conclusion: STRScan is particularly useful for the NGS-based targeted STR profiling, e.g., in genetic and human identity testing. STRScan is available as open-source software at darwin.informatics.indiana.edu/str/.
|
|
|
Post by djoser-xyyman on Jun 19, 2018 14:29:13 GMT -5
omictools.com/microsatellite-detection-category
You will need
Download the lobSTR package.
Download the lobSTR resource bundle for the appropriate reference genome (available on the downloads page).
If you'd like to use your own custom set of STR loci, see Building a custom reference before proceeding.
Announcements
As of v4+, allelotype can now take input BAM files generated by aligners other than lobSTR. The only aligner that has been tested and shown to work well is BWA-MEM. For more details on running allelotype with BWA-MEM-generated BAMs as input, see Best practices for using lobSTR with BWA-MEM alignments.
Using noise models generated by previous allelotype versions (before v3.0.0) with versions 3.0.0 and above will result in erroneous genotype calls. If using allelotype 3.0.0+, you must use the new noise model (illumina.v3.pcrfree) or a custom noise model built with the
Basic usage (as of v4.0.0)
To run lobSTR using default parameters:
Single-end reads:
lobSTR -f FILE1,FILE2,.. \
  --index-prefix PATH_TO_INDEX/lobSTR_ \
  -o OUTPUT_PREFIX \
  --rg-sample SAMPLE \
  --rg-lib LIB
Paired-end reads:
lobSTR --p1 FILE1,FILE2,.. --p2 FILE1,FILE2,... \
  --index-prefix PATH_TO_INDEX/lobSTR_ \
  -o OUTPUT_PREFIX
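For anyone who wants to script the lobSTR calls quoted above instead of typing them by hand, here is a minimal Python sketch. It only assembles the argument list from the flags that appear in the quoted usage text (-f, --p1, --p2, --index-prefix, -o, --rg-sample, --rg-lib). The function name build_lobstr_command and its parameter names are my own invention for illustration, not part of lobSTR, and the sketch does not run or validate anything.

```python
def build_lobstr_command(index_prefix, output_prefix, reads=None,
                         mate1=None, mate2=None, sample=None, library=None):
    """Assemble a lobSTR argument list (hypothetical helper, not a lobSTR API).

    Only flags quoted in the post are used. Pass `reads` for single-end
    input, or `mate1` and `mate2` for paired-end input.
    """
    if reads and (mate1 or mate2):
        raise ValueError("give single-end reads or paired-end mates, not both")

    cmd = ["lobSTR"]
    if reads:
        # Single-end: comma-separated file list after -f
        cmd += ["-f", ",".join(reads)]
    elif mate1 and mate2:
        # Paired-end: comma-separated file lists after --p1 and --p2
        cmd += ["--p1", ",".join(mate1), "--p2", ",".join(mate2)]
    else:
        raise ValueError("no input files given")

    cmd += ["--index-prefix", index_prefix, "-o", output_prefix]

    # Read-group flags are added only when a value is supplied
    if sample:
        cmd += ["--rg-sample", sample]
    if library:
        cmd += ["--rg-lib", library]
    return cmd


if __name__ == "__main__":
    # Placeholder file names and paths, same as in the quoted usage text
    print(" ".join(build_lobstr_command(
        "PATH_TO_INDEX/lobSTR_", "OUTPUT_PREFIX",
        reads=["FILE1", "FILE2"], sample="SAMPLE", library="LIB")))
```

The returned list can be handed to subprocess.run(cmd, check=True) once lobSTR is installed and the placeholders are replaced with real paths.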
|
|
|
Post by zarahan on Jun 19, 2018 22:07:05 GMT -5
OK, now break down some application please. In a nutshell summary: what do STR profiles tell us about population movement and settlement in: -- Egypt? -- Nubia? -- the Nile Valley?
And again in a brief nutshell: how do STRs compare with other genetic markers used in said population analysis?
|
|
|
Post by djoser-xyyman on Jun 20, 2018 6:35:42 GMT -5
STRs are short tandem repeats. Each geographic population, i.e. “race”, has an STR profile that is UNIQUE to it. The typical internationally used STRs, and the chromosome (chr) each one is found on, are:
CSF1PO chr5 - Amarna
D13S317 chr13 - Amarna
D16S539 chr16 - Amarna
D18S51 chr18 - Amarna
D21S11 chr21 - Amarna
D3S1358 chr3 - ??
D5S818 chr5
D7S820 chr7 - Amarna
D8S1179 chr8
FGA chr4 - Amarna
PentaD chr21
PentaE chr15
TH01 chr11
TPOX chr2
As you can see from the JAMA report (you don’t need DNATribes), the STR profile of the Amarnas was reported. STRs are what law enforcement uses to this day to ID criminals with absolute confidence. THEY DO NOT USE SNPs!!!!!! Because you can’t use SNPs. In other words, if they want an idea of where the criminal is from, they run the sample through the STR database to pinpoint the most likely geographic origin of the criminal. Each geographic population has its own UNIQUE profile for CODIS STRs. E.g. Northern Europeans are distinct from Sub-Saharan Africans, who are both different from Native Americans. That is why the STR profile of the Amarnas came back Sub-Saharan African. If the samples of the Amarnas were run through the FBI STR database they would come back Sub-Saharan African, it is that simple. But you don’t need the FBI database. There are many software tools and databases out there on the internet. They are free. You can also do the analysis manually on a spreadsheet. But the STR profile of the Abusir was NOT released. This is the debate ElMaestro and I are having. Instead the SNPs were released. The trick is to pull the STRs from the released SNP BAM files of the Abusir. This is the task at hand.
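If you want the locus list above in a form a script can use, here is a small Python sketch of it as a lookup table. The locus-to-chromosome pairs and the “Amarna” flags are copied from the list in the post and nothing else. The names STR_LOCI, AMARNA_REPORTED, loci_by_chromosome and missing_loci are hypothetical, made up for this example.

```python
# Locus -> chromosome, exactly as listed in the post
STR_LOCI = {
    "CSF1PO": "chr5",
    "D13S317": "chr13",
    "D16S539": "chr16",
    "D18S51": "chr18",
    "D21S11": "chr21",
    "D3S1358": "chr3",
    "D5S818": "chr5",
    "D7S820": "chr7",
    "D8S1179": "chr8",
    "FGA": "chr4",
    "PentaD": "chr21",
    "PentaE": "chr15",
    "TH01": "chr11",
    "TPOX": "chr2",
}

# Loci the post flags as "Amarna" (D3S1358 is marked "??" there, so it is left out)
AMARNA_REPORTED = {
    "CSF1PO", "D13S317", "D16S539", "D18S51", "D21S11", "D7S820", "FGA",
}


def loci_by_chromosome(loci):
    """Group locus names by chromosome, keeping the listed order."""
    grouped = {}
    for locus, chrom in loci.items():
        grouped.setdefault(chrom, []).append(locus)
    return grouped


def missing_loci(profile_loci, required=STR_LOCI):
    """Return the required loci that a given profile does not cover."""
    return sorted(set(required) - set(profile_loci))


if __name__ == "__main__":
    print(loci_by_chromosome(STR_LOCI))
    print(missing_loci(AMARNA_REPORTED))
```

A table like this is only bookkeeping: it tells you which loci a given profile covers, not what the alleles at those loci are.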
|
|
|
Post by djoser-xyyman on Jun 20, 2018 7:01:19 GMT -5
There is a reason why STRs are preferred. STRs are “linked” and in disequilibrium, i.e. LD (linkage disequilibrium), which means that during meiosis, or recombination, STRs are NOT broken apart; they are too close together. That is why populations that are geographically close will have similar STR profiles. That is why the Amarnas came back sub-Saharan, i.e. Great Lakes and South Africa (Malawi/Botswana/Mbuti?), and they were somewhat distant from West Africans.
If we can pull the STRs from the samples of the Abusir we will also know if they were sub-Saharan Africans. I am absolutely sure they are sub-Saharan African. The trick is getting the STRs from the released data. This time around the researchers did not release the STR profile; instead they released the SNP profile. Which is useless, because all sub-Saharans carry “Eurasian” SNPs. The trick is how the SNPs “come together” to form the STRs.
But there are other methods to process the SNPs and make them useful, e.g. TreeMix, which is what Sage and ElMaestro have been toying with. I haven’t done any serious computer work like coding in over a decade. But someone can and may crack the code with the Abusir.
I know enough to say it can be done. DNATribes cracked the code with the Amarnas; now anyone can do it. A similar thing can be done with the Abusir. We need at least 3-4 STRs, not all of them.
|
|
|
Post by djoser-xyyman on Jun 20, 2018 8:47:47 GMT -5
I am speculating that these two tools, and maybe others, can be used to crack the code for the Abusir.
1. lobSTR
2. STRScan
3. Others?
They should be able to crack the code for the Abusir. We already know the trick used by the researchers on the Abusir. The Abusir DID carry ancestry related to sub-Saharan Africans, which is found mostly in Horners and some South Africans, but while they had you focusing on one component found in Yorubans they misdirected people away from the sub-Saharan component found in SSA East Africans.
We never stated that AEians came from West Africa.
These two software tools are quoted as saying they will take a BAM file and spit out CODIS-type STRs from SNP data packets.
The Abusir data packet are in bam file format!!!
We can settle this once and for all.
|
|
|
Post by djoser-xyyman on Jun 20, 2018 8:52:44 GMT -5
I am sure these large companies like DNAConsultants and DNATribes etc. have already done it. They are sitting on the results. If the results were in their favor they would be out by now and they would be making money like they did with the Amarnas. It is impossible for the Abusir to be anything but Africans. PURE AFRICANS with no admixture. Forget about what you read about invasions, because the Levant was occupied by Bedouins at that time. That is why the Bedouins are the closest based upon the sample set they used. We all know who the Bedouins are....they are remnants of North AFRICANS in the Levant.
|
|
|
Post by djoser-xyyman on Jun 20, 2018 10:12:42 GMT -5
popSTR: population-scale detection of STR variants.
Kristmundsdóttir S, Sigurpálsdóttir BD, Kehr B, Halldórsson BV.
Abstract
MOTIVATION: Microsatellites, also known as short tandem repeats (STRs), are tracts of repetitive DNA sequences containing motifs ranging from two to six bases. Microsatellites are one of the most abundant type of variation in the human genome, after single nucleotide polymorphisms (SNPs) and Indels. Microsatellite analysis has a wide range of applications, including medical genetics, forensics and construction of genetic genealogy. However, microsatellite variations are rarely considered in whole-genome sequencing studies, in large due to a lack of tools capable of analyzing them.
RESULTS: Here we present a microsatellite genotyper, optimized for Illumina WGS data, which is both faster and more accurate than other methods previously presented. There are two main ingredients to our improvements. First we reduce the amount of sequencing data necessary for creating microsatellite profiles by using previously aligned sequencing data. Second, we use population information to train microsatellite and individual specific error profiles. By comparing our genotyping results to genotypes generated by capillary electrophoresis we show that our error rates are 50% lower than those of lobSTR, another program specifically developed to determine microsatellite genotypes.
AVAILABILITY AND IMPLEMENTATION: Source code is available on Github: github.com/DecodeGenetics/popSTR
CONTACT: snaedis.kristmundsdottir@decode.is or bjarni.halldorsson@decode.is.
popstr software
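To make the definition in the abstract concrete (a motif of two to six bases repeated back to back), here is a toy Python sketch that scans a single read for such runs with a regular expression. This is only an illustration of what a tandem repeat looks like in sequence data. It is NOT how popSTR, lobSTR or STRScan work, and the function name find_tandem_repeats, the min_copies threshold of 3 and the example sequence are all assumptions made up for this example.

```python
import re


def _is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_tandem_repeats(seq, min_unit=2, max_unit=6, min_copies=3):
    """Return sorted (start, motif, copies) tuples for exact tandem repeats.

    Only perfect, uninterrupted repeats are reported; real genotypers also
    have to deal with sequencing errors, flanking sequence and alignment.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # One motif of `unit` bases followed by at least min_copies - 1 more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not _is_primitive(motif):
                continue  # already covered by a shorter unit, or a homopolymer
            copies = len(match.group(0)) // unit
            hits.append((match.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up read: five copies of AATG between arbitrary flanks
    read = "CCT" + "AATG" * 5 + "CC"
    print(find_tandem_repeats(read))
```

On the made-up read in the example, the function reports one hit: motif AATG, 5 copies, starting at position 3 (0-based).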
|
|
|
Post by zarahan on Jun 20, 2018 20:57:31 GMT -5
When you say Abusir being pure Africans, can you define 'pure' a bit more? I would have thought there would be some mixing by the Late Period from all the foreign elements, including Romans, as in the recent Abusir study, which is one of its main limitations, and they admitted it. And on top of that they drew most sampling from further north, downplaying the south, as has been done many times in the past to skew results.
egyptsearchreloaded.proboards.com/post/17940
“According to the radiocarbon dates .. the samples can be grouped into three time periods: Pre-Ptolemaic (New Kingdom, Third Intermediate Period and Late Period), Ptolemaic and Roman Period."
"Written sources indicate that by the third century BCE Abusir el-Meleq was at the centre of a wider region that comprised the northern part of the Herakleopolites province, and had close ties with the Fayum.. We aim to study changes and continuities in the genetic makeup of the ancient inhabitants of the Abusir el-Meleq community .. since all sampled remains derive from this community in Middle Egypt and have been radiocarbon dated to the late New Kingdom to the Roman Period..” --Abusir study
|
|
|
Post by djoser-xyyman on Jun 20, 2018 21:11:23 GMT -5
Bedouins have occupied the Levant and Arabia for thousands of years. They are the original inhabitants of the Peninsula. Bedouins are essentially African Berbers.
So....what invasion?
|
|
|
Post by djoser-xyyman on Jun 21, 2018 7:50:35 GMT -5
We cannot and should not believe everything we read, especially when written by Europeans. Can you read hieroglyphics? I can’t. But I understand genetics enough to say with assurance there is no genetic evidence the Romans or Greeks are any different from the AEians. In fact the genetic evidence shows that there is no genetic proof that ancient Egypt was occupied by modern “European” types in early history. E.g. we now know the R1b in the Siwa is NOT R1b-M269 but the African version, R1b-V88. The only “foreign” genetic element found in modern Egyptians is from around 1300 AD, which most likely is the Ottoman Turks. It is NOT Islamic, 600-700 AD. So simple logic dictates: if there is no genetic material….where did they go? Were the Romans and Greeks and AEians very similar genetically? That is the only conclusion we can come to. We know the Bedouins are indeed very ancient and very close genetically to the Abusirs, but so too are the Somalis. What conclusion can you draw then?
First you get the scientific FACTS, then take it from there.
We now have to question everything we were taught or read. If you can’t read or understand the original language, like Greek or hieroglyphics, question the translation!
-------------------- Quote: “I would have thought there would be some mixing by the Late Period from all the foreign elements, including Romans, as in the recent Abusir study, which is one of its main limitations, and they admitted it.
And on top of that they drew most sampling from further north, downplaying the south, as has been done many times in the past to skew results”
|
|
|
Post by djoser-xyyman on Jun 21, 2018 7:55:32 GMT -5
Just as another example about questioning what we were taught as kids in our schools. Modern genetic research has also shown that the Islamic peoples of Iberia were expelled (late 1400s)! I should correct that to say they were never expelled to North African regions, as the “history books” teach us. Sources cited in my thread.
|
|
|
Post by djoser-xyyman on Jun 28, 2018 12:09:49 GMT -5
|
|
|
Post by djoser-xyyman on Jun 28, 2018 12:10:14 GMT -5
|
|
|
Post by djoser-xyyman on Jul 17, 2018 13:07:43 GMT -5
Simons Genome Diversity Project
www.simonsfoundation.org/simons-genome-diversity-project/
The primary dataset (Panel C in the first column of the metadata file) consists of data from 260 genomes from 127 populations: 39 Africans, 23 Native Americans, 27 Central Asians or Siberians, 49 East Asians, 27 Oceanians, 38 South Asians and 71 West Eurasians. For convenience, genotyping results for an additional 18 genome sequences published previously are also included.
The data include Variant Call Formats files (VCFs) with genotype calls at every position in the genome. The consortium also plans to release BAM files containing the raw sequencing reads. The researchers eliminated bias of alleles toward matching the human genome reference sequence, and determined genotypes on a single-sample basis to avoid preferential calling of genotypes from populations that had more individuals represented.
Please note that there are approximately 10 terabytes of data and because of the large dataset size, the data need to be downloaded using the gridFTP software after applying for and obtaining a certificate from the hosting site.
Use of the genome sequence data (Please respect Fort Lauderdale principles)
All data are made freely available. However, please observe the Fort Lauderdale principles, which entitle the data producers to make the first presentation and publish the first genome-wide analysis of the data. The data can be used freely for studies of individual genes or other individual features of the genome.
Simons Genome Diversity Project dataset is now available on the Seven Bridges Cancer Genomics Cloud
|
|