Post by djoser-xyyman on Jan 3, 2013 20:07:55 GMT -5
The utility of short tandem repeat (STR) loci beyond human identification: Implications for development of new DNA typing systems
Ranajit Chakraborty¹, Bruce Budowle², Yixi Zhong¹
Forensic Sciences Research and Training Center, FBI Academy, Quantico, VA, USA
Since the first characterization of the population genetic properties of repeat polymorphisms, the number of short tandem repeat (STR) loci validated for forensic use has grown to at least 13. Worldwide variation in allele frequencies at these loci has been studied, showing that interpopulation differences at these loci do not compromise their power to identify individuals.
However, data collected for validating these loci for forensic use have utility beyond human identification: the origin and past migration history of modern humans can be reconstructed from worldwide variation at these loci.
Here, we evaluate the absolute power of the validated set of 13 STR loci for addressing these issues, using multilocus genotype data on 1,401 individuals belonging to seven populations (US European-American, US African-American, Jamaican, Italian, Swiss, Chinese and Apache Native-American).
Genomic research is discovering new classes of polymorphic loci (such as the single nucleotide polymorphisms, SNPs) and lineage markers (such as the mitochondrial DNA and Y-chromosome markers);
We conclude that the current set of STR loci is adequate for addressing most problems of human identification. However, even if enough SNPs were used to match the power of the STR loci, SNPs alone could not resolve more complex cases unless supplemented by the validated STR loci.
2 STR loci: the present battery of 13 loci
13 polymorphic STR loci (CSF1PO, TH01, TPOX, FGA, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11 and vWA) and one gender-determining locus (Amelogenin)
AmpFlSTR Profiler PCR Amplification Kit, User's Manual, Part number 402945, Rev. A, Perkin-Elmer Applied Biosystems, Norwalk, CT, 1997
The genetic characteristics of these loci, such as chromosomal locations, repeat motifs, and the range of repeat sizes commonly found in different populations, are available in the product brochures of the commercial kits currently validated for such purposes [6, 7]. Several features are clear from the summary characteristics of these STR loci. First, all but two of the 13 STR loci are located on different chromosomes. The two that share a chromosome (CSF1PO and D5S818) are also far apart (CSF1PO is localized to 5q33.3-34 and D5S818 to 5q23.3-32). Correlated genotypes at these loci are therefore unlikely unless the population is highly inbred and isolated or very severely fragmented (i.e., substructured). Second, the genomic locations of these loci suggest that they are unlikely to have any major functional significance, so genotypes at these loci cannot be predicted from known physical and/or behavioral attributes of individuals. Third, the allele-size ranges observed at these loci suggest that they should be highly polymorphic even in isolated populations. Note that most of these loci have been found to be polymorphic even in other great apes [11-13], suggesting that these STRs are relatively ancient polymorphisms compared with the genetic differentiation of modern human populations.
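Because the loci are effectively unlinked, forensic practice commonly multiplies single-locus genotype frequencies across loci (the "product rule"), taking each genotype frequency from Hardy-Weinberg proportions. The sketch below illustrates that arithmetic for three of the 13 loci. The allele frequencies and the profile are hypothetical placeholders, not data from the paper, and no substructure correction is applied.

```python
import math

# Hypothetical allele frequencies for three loci (placeholders only, NOT
# population data from the paper). Each locus's frequencies sum to 1.
ALLELE_FREQS: dict[str, dict[str, float]] = {
    "CSF1PO": {"10": 0.25, "11": 0.30, "12": 0.35, "13": 0.10},
    "TH01": {"6": 0.20, "7": 0.15, "9": 0.35, "9.3": 0.30},
    "TPOX": {"8": 0.50, "9": 0.10, "11": 0.30, "12": 0.10},
}


def genotype_frequency(freqs: dict[str, float], a: str, b: str) -> float:
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q


def profile_frequency(profile: dict[str, tuple[str, str]]) -> float:
    """Product rule: multiply single-locus genotype frequencies.

    Assumes the loci are independent (no linkage, no substructure correction).
    """
    return math.prod(
        genotype_frequency(ALLELE_FREQS[locus], a, b)
        for locus, (a, b) in profile.items()
    )


# Hypothetical three-locus profile.
profile = {"CSF1PO": ("10", "12"), "TH01": ("9.3", "9.3"), "TPOX": ("8", "11")}
freq = profile_frequency(profile)
print(f"profile frequency = {freq:.6f} (about 1 in {1 / freq:,.0f})")
```

With these placeholder values the three-locus frequency is 0.175 × 0.09 × 0.3 = 0.004725 (about 1 in 212). Extending the profile to all 13 loci just adds more factors to the product, which is only legitimate because the loci are unlinked.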
3 Evolutionary genetic inference from 13 STR loci
A considerable amount of worldwide data that supports the above assertions is now available for these loci. In this presentation, we consider allele frequency data from seven major population groups (US European-American, Italian, Swiss, US African-American, Jamaican, Chinese, and Apache Native-Americans) for each of the 13 STR loci. These data show, for example, that most alleles are present in all populations and that the loci are extremely polymorphic in each population.
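Within-population polymorphism at a locus is usually summarized by the number of alleles and the expected heterozygosity (gene diversity), H = 1 - Σp². A minimal sketch follows, with Nei's small-sample correction 2n/(2n - 1) as an option. The two frequency vectors are hypothetical, chosen only to contrast a high-diversity and a low-diversity population.

```python
from typing import Optional


def allele_count(freqs: list[float]) -> int:
    """Number of alleles observed (non-zero frequency) at one locus."""
    return sum(1 for p in freqs if p > 0)


def expected_heterozygosity(freqs: list[float], n: Optional[int] = None) -> float:
    """Gene diversity H = 1 - sum(p_i^2).

    If n (number of individuals sampled) is given, apply Nei's
    small-sample correction factor 2n / (2n - 1).
    """
    h = 1.0 - sum(p * p for p in freqs)
    if n is not None:
        h *= 2 * n / (2 * n - 1)
    return h


# Hypothetical allele frequencies at one locus in two populations (placeholders).
pop_high = [0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.10]
pop_low = [0.60, 0.30, 0.10]

for name, f in [("high-diversity", pop_high), ("low-diversity", pop_low)]:
    print(name, allele_count(f), round(expected_heterozygosity(f), 3))
```

Here the seven-allele locus gives H = 0.845 and the three-allele locus gives H = 0.54. Averaging such values over the 13 loci is one way to rank populations by within-population variation.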
[...]
Neighbor-joining networks based on D_A (panel a) and (δμ)² (panel b) are virtually identical. This is consistent with current anthropological knowledge: the populations of African ancestry form a cluster, and this cluster is the most distant from the remaining populations. These network trees also confirm that the US European-Americans are closest to the Europeans (e.g., the Italians and the Swiss), and that a considerable proportion of the genes of US African-Americans are of African origin. The proximity of the Apache group to the Chinese indicates that the ancestry of this Native American tribe is of (north-)east Asian origin.
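A sketch of the two pairwise distances named above follows. It assumes that D_A and (δμ)² refer to the standard definitions: Nei and colleagues' D_A = 1 - (1/r) Σ_loci Σ_i √(x_i y_i), and Goldstein and colleagues' (δμ)², the squared difference in mean repeat number averaged over r loci. Alleles are coded as repeat counts and the two populations are hypothetical. Building the neighbor-joining tree from the resulting distance matrix is not shown.

```python
import math

# One dict per locus: allele (repeat count) -> frequency.
Locus = dict[float, float]


def _check(pop_x: list[Locus], pop_y: list[Locus]) -> int:
    if len(pop_x) != len(pop_y):
        raise ValueError("populations must be typed at the same loci")
    return len(pop_x)


def nei_da(pop_x: list[Locus], pop_y: list[Locus]) -> float:
    """D_A = 1 - (1/r) * sum over loci of sum_i sqrt(x_i * y_i)."""
    r = _check(pop_x, pop_y)
    total = 0.0
    for fx, fy in zip(pop_x, pop_y):
        shared = set(fx) & set(fy)  # alleles missing from either population add 0
        total += sum(math.sqrt(fx[a] * fy[a]) for a in shared)
    return 1.0 - total / r


def mean_repeat(f: Locus) -> float:
    """Mean allele size, in repeat units, at one locus."""
    return sum(a * p for a, p in f.items())


def delta_mu_squared(pop_x: list[Locus], pop_y: list[Locus]) -> float:
    """(delta mu)^2: mean over loci of the squared difference in mean repeat number."""
    r = _check(pop_x, pop_y)
    return sum((mean_repeat(fx) - mean_repeat(fy)) ** 2
               for fx, fy in zip(pop_x, pop_y)) / r


# Two hypothetical populations typed at two loci.
pop_a = [{10: 0.5, 11: 0.5}, {8: 1.0}]
pop_b = [{11: 0.5, 12: 0.5}, {8: 1.0}]
print(nei_da(pop_a, pop_b), delta_mu_squared(pop_a, pop_b))
```

D_A depends only on shared allele frequencies, while (δμ)² uses allele sizes and so is tied to a stepwise-mutation view of STR evolution. When both measures give virtually the same tree, the clustering is not an artifact of the choice of distance.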
[...] (with respect to the number of alleles and percent heterozygosity per locus), the populations of African ancestry have the highest level of within-population variation, and the Native American Apache population has the lowest. While this may be interpreted in terms of the antiquity of modern populations (i.e., modern humans evolved in Africa, and all other human populations result from a relatively recent Out-of-Africa migration), one may ask whether such differences imply a greater degree of genetic similarity between random individuals in populations with lower within-population variation. For example, the most common 13-locus homozygous profile is expected to recur in a full sibling of an Apache Native-American with a frequency of no more than 1 in 7000. In other words, the suspect would need at least 7000 full siblings before a recurrence of the same 13-locus homozygous profile could be expected among them. For human families, this is clearly a biological absurdity.
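One common way to compute a full-sibling recurrence figure like the one above is to weight the three identity-by-descent states for full sibs (both alleles shared with probability 1/4, one with 1/2, none with 1/4). For a homozygote AA this gives (1 + p)²/4, and for a heterozygote AB it gives (1 + p + q + 2pq)/4. The sketch below assumes that formula with no substructure correction, which may differ from the paper's exact computation. The 13 top-allele frequencies are hypothetical placeholders, not the Apache data.

```python
import math
from typing import Optional


def sib_match_prob(p: float, q: Optional[float] = None) -> float:
    """Probability that a full sibling has the same genotype at one locus.

    Weights the full-sib IBD states (k2 = 1/4, k1 = 1/2, k0 = 1/4) and
    ignores population substructure (no theta correction).
      homozygote AA (q is None): (1 + 2p + p^2) / 4
      heterozygote AB:           (1 + p + q + 2pq) / 4
    """
    if q is None:
        return (1 + p) ** 2 / 4
    return (1 + p + q + 2 * p * q) / 4


# Hypothetical frequency of the most common allele at each of 13 loci
# (placeholders only, NOT the Apache data analyzed in the paper).
TOP_ALLELE_FREQS = [0.35, 0.40, 0.45, 0.30, 0.35, 0.40, 0.30,
                    0.35, 0.35, 0.35, 0.30, 0.30, 0.30]

# "Most common homozygous profile": homozygous for the top allele at every locus.
p_sib = math.prod(sib_match_prob(p) for p in TOP_ALLELE_FREQS)
print(f"full-sib recurrence probability = {p_sib:.2e} (about 1 in {1 / p_sib:,.0f})")
```

Since (1 + p)²/4 increases with p, the homozygote for the most frequent allele at each locus gives the largest sibling recurrence probability. That makes it a conservative worst case for a low-diversity population.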
In summary, the above computations show that the present battery of 13 STR loci provides sufficient statistical strength for most applications of forensic identification (including DNA mixture interpretation) and parentage analysis. The illustrative worldwide data used in these computations show that, although the point estimates of the various statistical measures differ somewhat from one population to another, anthropologically affine population groups yield estimates that are within their respective sampling variation. Consequently, forensic databases organized by major anthropological population groupings should be sufficient to evaluate the statistical strength of DNA evidence based on this set of 13 STR loci.
5 SNP versus STR
In conclusion, the above numerical results of the comparative efficiency analysis of SNP and STR loci suggest that, without population data on SNP loci, no definite prescription can be given for the required number of SNP loci. However, to equal the power of the 13 STR loci with regard to genotypic match probability and/or paternity exclusion, somewhere in the range of 30-60 SNP loci would be needed, and they must be selected so that the assumption of independence across loci is met. Note that since SNP loci are biallelic (and hence less mutable than STR loci), the effect of population substructure can be more severe at SNP loci than at STR loci [37, 38]. Hence, more careful validation studies of SNP loci would be needed before implementing them for forensic and paternity analysis. In addition, SNP loci are far less efficient for interpreting DNA mixture evidence, necessitating a far [...]
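The order of magnitude of the 30-60 figure can be checked with simple arithmetic. A biallelic SNP has only three genotypes, so its single-locus match probability (the sum of squared Hardy-Weinberg genotype frequencies, p⁴ + (2pq)² + q⁴) cannot fall below 0.375, reached when both alleles have frequency 0.5. The sketch below counts how many independent SNPs would reach a target combined match probability. The target of 1e-15 is a hypothetical stand-in for the 13-locus STR value, not a figure from the paper, and substructure is ignored.

```python
import math


def snp_match_prob(p: float) -> float:
    """Random match probability at one biallelic SNP under Hardy-Weinberg
    (0 < p < 1): sum of squared genotype frequencies p^4 + (2pq)^2 + q^4."""
    q = 1.0 - p
    return p ** 4 + (2 * p * q) ** 2 + q ** 4


def snps_needed(target: float, p: float = 0.5) -> int:
    """Smallest number n of independent SNPs, each with allele frequency p,
    such that snp_match_prob(p) ** n <= target."""
    pm = snp_match_prob(p)
    return math.ceil(math.log(target) / math.log(pm))


TARGET = 1e-15  # hypothetical combined STR match probability (assumption)
for p in (0.5, 0.3, 0.2):
    print(f"minor-allele frequency {p}: {snps_needed(TARGET, p)} SNPs")
```

Under these assumptions about 36 perfectly balanced SNPs would be needed, and about 52 when the minor allele has frequency 0.2. This is in line with the 30-60 range quoted, although real panels would still need the independence and substructure checks described above.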
Beyond this, the population data collected in this context can address many of the broad questions of human genome diversity studies, such as the evolutionary relationships of populations, the implications of reduced genetic variation in specific populations, and inference of the past demographic history of populations [39]. The availability of commercial kits for genotyping the STR loci [6, 7] offers the opportunity to conduct population genetic analysis by pooling data through interlaboratory comparisons of results. Worldwide allele frequency data at these 13 STR loci also raise some questions that could yield information about the mechanism maintaining genetic variation at these tetranucleotide loci.