Is Reich Lab coming clean? - Extensive DNA database- WOWWW!
Post by djoser-xyyman on Apr 14, 2019 13:40:42 GMT -5
Downloadable genotypes of present-day and ancient DNA data (compiled from published papers)
reich.hms.harvard.edu/downloadable-genotypes-worlds-published-ancient-dna-data
genome (in hg19 coordinates).
All data released here:
(a) have already been published (some by our group and some by other groups - see full list of references below),
(b) have permissions appropriate for fully public data release,
(c) are typed at a set of 1,233,013 sites in the genome (or 597,573 sites for present-day individuals genotyped on the Affymetrix Human Origins array). Typing is typically pseudo-haploid for ancient samples, when coverage is too low for full genotyping.
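To make "pseudo-haploid" concrete, here is a minimal sketch of how such a call is commonly made: one read covering the site is drawn at random and the individual is recorded as if homozygous for that allele. This is an illustration under assumptions, not the Reich Lab pipeline. The function name, the read representation, and the use of the EIGENSTRAT codes 2 / 0 / 9 (two reference copies / zero reference copies / missing) are my own choices for the example.

```python
import random

MISSING = 9  # EIGENSTRAT-style code for "no data" (assumed here)


def pseudo_haploid_call(reads, ref_allele, alt_allele, rng=random):
    """Make a pseudo-haploid call at one site (hypothetical helper).

    reads: list of single-base strings observed at the site.
    Returns 2 if the randomly drawn read carries the reference allele,
    0 if it carries the variant allele, and 9 if no usable read exists.
    A heterozygous code (1) is never produced.
    """
    # Ignore bases that match neither expected allele (damage, errors).
    usable = [base for base in reads if base in (ref_allele, alt_allele)]
    if not usable:
        return MISSING
    pick = rng.choice(usable)
    return 2 if pick == ref_allele else 0
```

With only one or two reads per site a true heterozygote cannot be told apart from a homozygote, which is why low-coverage ancient samples are typed this way.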
There are two datasets:
"1240K" : Ancient and present-day individuals (from either shotgun sequencing data or in-solution target capture, with a range of coverages) at 1,233,013 sites,
"1240K+HO": Data from the above set merged with present-day individuals typed on the Human Origins array with 597,573 sites.
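As a rough sketch of what "merged" implies for the site count: a merge of two genotype sets can only keep the sites typed in both, which would explain why the combined set is described with the smaller number of sites. The function below is hypothetical and works on small in-memory dictionaries. Real merging also has to reconcile allele coding between the two sets, which is not shown.

```python
def merge_on_shared_sites(geno_a, geno_b):
    """Merge two genotype tables on the SNP ids they share (hypothetical).

    geno_a, geno_b: dicts mapping SNP id -> list of per-individual
    genotype codes. Sites present in only one table are dropped, and
    the individuals of geno_b are appended after those of geno_a.
    """
    shared = [snp_id for snp_id in geno_a if snp_id in geno_b]  # keeps geno_a order
    return {snp_id: geno_a[snp_id] + geno_b[snp_id] for snp_id in shared}


# Toy data: three sites in the first set, two in the second.
capture = {"s1": [2, 0], "s2": [9, 2], "s3": [0, 0]}
array = {"s3": [1], "s1": [2]}
merged = merge_on_shared_sites(capture, array)
```

Here `merged` keeps only `s1` and `s3`, each with three individuals.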
Each dataset consists of four files, in EIGENSTRAT format (for details, please see the EIGENSOFT documentation):
.anno: Rich meta-information for each individual.
.ind : Three columns: Individual ID, sex determination, and group label.
.snp : Information on each analyzed SNP position (SNP id, physical/genetic location and reference/variant alleles, where the reference allele matches hg19).
.geno: Genotypes.
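For anyone who wants to poke at the files, here is a minimal sketch of line parsers for the three core files. It assumes the plain-text EIGENSTRAT layout: whitespace-separated columns, a .snp column order of SNP id, chromosome, genetic position, physical position, reference allele, variant allele, and one .geno line per SNP with one character per individual (0, 1 or 2 reference copies, 9 for missing). If the downloaded .geno turns out to be in a packed binary form, it would first need converting to text with EIGENSOFT's convertf. All class and function names are made up for this example, and the sample lines are invented.

```python
from typing import List, NamedTuple


class Individual(NamedTuple):
    sample_id: str
    sex: str
    group: str


class Snp(NamedTuple):
    snp_id: str
    chrom: str
    genetic_pos: float
    physical_pos: int
    ref: str
    alt: str


def parse_ind_line(line: str) -> Individual:
    """Parse one .ind line: individual ID, sex, group label."""
    sample_id, sex, group = line.split(None, 2)
    return Individual(sample_id, sex, group.strip())


def parse_snp_line(line: str) -> Snp:
    """Parse one .snp line (column order is an assumption, see above)."""
    f = line.split()
    return Snp(f[0], f[1], float(f[2]), int(f[3]), f[4], f[5])


def parse_geno_line(line: str) -> List[int]:
    """Parse one text .geno line into per-individual genotype codes."""
    return [int(ch) for ch in line.strip()]


# Invented sample lines, for illustration only.
ind = parse_ind_line("  SAMPLE001   M   Example_Group\n")
snp = parse_snp_line("snp_example 1 0.020130 752566 G A\n")
geno = parse_geno_line("2092\n")
```

Line i of the .snp file describes line i of the .geno file, and character j of each .geno line belongs to the individual on line j of the .ind file.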
This is a data release as of Fri Feb 22 12:25:37 EST 2019. We aim to continue to update this dataset over time.
Version 37.2
Description | .anno | .ind | .snp | .geno | Tarball (all files) | Notes
1240K | link (1.1Mb) | link (189Kb) | link (75Mb) | link (1.5Gb) | link (1.6Gb) | 5081 individuals (2107 ancient, 2974 present-day) [1]
1240K+HO | link (987Kb) | link (281Kb) | link (36Mb) | link (1.1Gb) | link (1.2Gb) | 7744 individuals (2107 ancient, 5637 present-day) [1]
[1] Includes one ancestral reference, and three present-day references: human, chimp, gorilla.
We would be grateful if users of this dataset could alert us to any errors they detect and help us to fill in missing data. This could include: (1) errors or missing information for location, latitude, longitude, archaeological context, date, and group label, (2) concerns about Y chromosome or mitochondrial DNA haplogroup determinations, and (3) evidence for other problems in the data or annotations for individuals. Please write to Swapan 'Shop' Mallick and David Reich with any suggestions. We would also be grateful if members of the community could suggest additional content that would be helpful to add to this page to make it maximally useful. Finally, please let us know if there is any ancient DNA data we should be including that we have missed.