Abusir STRs Finally? Africans? | Egyptsearch Reloaded

djoser-xyyman
Vizier

Without data you are just another person with an opinion - Deming

Posts: 3,268

Abusir STRs Finally? Africans? Dec 13, 2018 15:33:34 GMT -5

Quote

Post by djoser-xyyman on Dec 13, 2018 15:33:34 GMT -5

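Human genome reference (hg19) FASTA

Download with wget:
wget hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/est.fa.gz

(Careful: est.fa.gz is the hg38 EST sequence set, not a genome assembly. The hg19 reference is built from chromFa.tar.gz later in this thread.)

In freebayes:
freebayes -f est.fa JK2911_bam_complete_sorted.bam > JK2911_ABUSIR-var.vcf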

then PLINK
Then merge?

hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/

ftp.ncbi.nlm.nih.gov/sra/reports/Assembly/GRCh37-HG19_Broad_variant/Homo_sapiens_assembly19.fasta

ftp.ensembl.org/pub/release-77/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz

www.biostars.org/p/1796/

ftp.ensembl.org/pub/current_fasta/homo_sapiens/dna/

cat file1.fa file2.fa ... > multifastafile.fa joins the per-chromosome FASTA files into a single file, which gives you the reference human genome

freebayes compile

Last Edit: Dec 13, 2018 16:39:34 GMT -5 by djoser-xyyman

Post by djoser-xyyman on Jan 9, 2019 15:17:18 GMT -5

SNPedia Report on Abusir jk2888

Oh! The CODIS STR feature was REMOVED in this version, and I can't get my hands on the prior version that had it. A veteran coder could reinstate that feature. I am a novice coder!

BAM Analysis Kit has pulled out these SNPs for Abusir JK2888.
BAM Analysis Kit is Windows-based and easy to install.

1. Download and install BAM Analysis Kit (Windows-based, but it runs in DOS mode)

www.y-str.org/2014/04/bam-analysis-kit.html

2. Download the BAM and FASTQ files of the 3 Abusir mummies, e.g. JK2888

3. Make sure all files are in the same folder!!!! Including the data files

4. Create a sub-folder named "out". THIS IS VERY IMPORTANT! All lower case: "out"

5. Run the first Abusir BAM file, e.g. JK2888. Accept all default settings if you want

6. The program will run for about 1-2 hrs, creating a VCF file for each chromosome, which you can copy and save as they are created. After about 2 hrs the "out" folder will hold many files. One of them is "genome_complete.txt.gz", a compressed file of the entire aligned genome for Abusir JK2888

Post by djoser-xyyman on Jan 9, 2019 15:17:53 GMT -5

SNP report for Abusir JK2888

rs11803731
Location: 1:152083325
Your Genotype: AA
Summary: straighter hair

rs763317
Location: 7:55095197
Your Genotype: AA
Summary: 3.5x increased lung cancer risk for never-smoking females

rs7027989
Location: 9:21817754
Your Genotype: AA
Summary: 1.26x increased myocardial infarction risk

rs10811661
Location: 9:22134094
Your Genotype: TT
Summary: 1.2x increased risk for type-2 diabetes

rs505922
Location: 9:136149229
Your Genotype: TT
Summary: blood type O

rs10993994
Location: 10:51549496
Your Genotype: TT
Summary: 1.2x prostate cancer risk

rs11636232
Location: 15:28386626
Your Genotype: CC
Summary: darker eye color

rs1426654
Location: 15:48426484
Your Genotype: AA
Summary: probably light-skinned, European ancestry
xyyman comment: KENYANS!! Henn et al

rs736839
Location: 18:46528065
Your Genotype: CC
Summary: common, increased risk for sickle cell leg ulcers

rs6036025
Location: 20:22039868
Your Genotype: GG
Summary: more likely to go bald

Post by djoser-xyyman on Jan 16, 2019 15:39:59 GMT -5

Ok. As I said, I update the tutorial when I have something new.

So, for newbies and others interested: I got PLINK2 to work! YES!! This is a major breakthrough. Why?

PLINK2 runs under Windows, and since most people on here use Windows they can start doing their own testing of aDNA, like the Abusir samples.

So we have two major pieces of software that run on Windows: BAM Analysis Kit and PLINK2.

Procedure?
1. Download the 3 Abusir BAM files.
2. Download and install BAM Analysis Kit.
3. Download PLINK2 and extract it. It is a command-line program, so no "installing" is necessary.
4. Both BAM Analysis Kit and PLINK2 run in the DOS/CMD environment under Windows, unlike many other programs that run under Unix, which most people are not familiar with.
5. Process the Abusir mummies with BAM Analysis Kit to get the VCF files. You can combine the VCF files or test each mummy separately.
6. Convert the Abusir VCF files to PLINK files with PLINK2 to get the bim, bed and fam files.
7. Merge/combine the PLINK files with HapMap or your comparison dataset. There should be only one large fileset with all your populations, including the Abusir samples. Use the merge/concatenate command (in PLINK 1.9 that is --bmerge).
8. Run these files (*.bim, *.bed, *.fam) through ADMIXTURE.
9. Run the ADMIXTURE result files through gplot in R to see the ADMIXTURE chart.

I am still working on tagging, labeling and titling.
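
On the labelling: ADMIXTURE writes one row per individual to its Q file (named prefix.K.Q), in the same order as the .fam file it was run on, so the sample IDs can be pasted in front of the ancestry fractions before the table goes to R. A minimal sketch, assuming plain-text .fam and .Q files; label_q is a made-up helper name, not part of PLINK or ADMIXTURE.

```shell
# Sketch: put the individual ID from the .fam file in front of each Q row.
# label_q is a hypothetical helper; it only uses awk and paste.
label_q() (
    fam=$1   # PLINK .fam file used as ADMIXTURE input
    q=$2     # matching ADMIXTURE output, e.g. merged.3.Q

    # Both files must describe the same individuals, row for row.
    n_fam=$(awk 'END { print NR }' "$fam")
    n_q=$(awk 'END { print NR }' "$q")
    if [ "$n_fam" -ne "$n_q" ]; then
        echo "row counts differ: $n_fam in $fam, $n_q in $q" >&2
        exit 1
    fi

    # Column 2 of .fam is the individual ID.
    awk '{ print $2 }' "$fam" | paste -d ' ' - "$q"
)

# Usage (file names are examples only):
#   label_q merged.fam merged.3.Q > merged.3.labelled.txt
```

The labelled file can then be read in R with the first column used as bar names.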

Working on other things also like IBD etc. KING

More to come.

Post by djoser-xyyman on Jan 16, 2019 15:40:18 GMT -5

Script used for PLINK2 eg

plink2 --make-bed --vcf bam_chr3.vcf --out binary_testB

The input file (bam_chr3.vcf) and the output name (binary_testB) are only examples and can be anything. Specify your VCF input file and give the output any name you wish, but remember it, especially if your working folder is full of files.

PLINK2 will spit out 4 files. Three of them are the bim, bed and fam files to be used in ADMIXTURE; the fourth is the .log file.
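
A quick way to see what the conversion produced is to count lines: the .bim file holds one line per variant and the .fam file one line per sample. A small sketch, assuming the three files share one prefix; plink_summary is a made-up helper name.

```shell
# Sketch: report what a PLINK binary fileset contains.
# plink_summary is a hypothetical helper, not a PLINK command.
plink_summary() (
    prefix=$1   # e.g. binary_testB

    # All three files must be present for ADMIXTURE to use the set.
    for ext in bed bim fam; do
        if [ ! -f "$prefix.$ext" ]; then
            echo "missing $prefix.$ext" >&2
            exit 1
        fi
    done

    # .bim: one line per variant; .fam: one line per sample.
    variants=$(awk 'END { print NR }' "$prefix.bim")
    samples=$(awk 'END { print NR }' "$prefix.fam")
    echo "$prefix: $samples samples, $variants variants"
)

# Usage:
#   plink_summary binary_testB
```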

Last Edit: Jan 16, 2019 15:40:57 GMT -5 by djoser-xyyman

Post by djoser-xyyman on Jan 22, 2019 9:22:50 GMT -5

FASTA/FASTQ/BAM files to VCF files, then index or sort/concat with VCFtools or bcftools.

Concatenate all vcf.gz files
(assumes that your files are tabix indexed).

vcf-concat *.vcf.gz | gzip -c > out.vcf.gz

or use bcftools

-----------------------

How to concatenate "by chromosome" VCFs?
Quote:
I have several VCFs which each contain information for one chromosome only. That is, there's a chromosome 1 VCF (with only chr1), a chromosome 2 VCF (with only chr2), etc.

I checked that these VCFs are valid with the VCFtools vcf-validator.

Quote:
I would recommend bcftools concat. You can't just cat them together because each file has a header section. The bcftools command will handle all that for you. Each VCF file must be sorted prior to calling concat.

bcftools concat -o total_chroms.vcf chr1.vcf chr2.vcf chr3.vcf ... chrX.vcf

-o is the output file name

Indexing (the input must be bgzip-compressed):
bcftools index -t in.vcf.gz
bcftools index -c in.vcf.gz

-t = tabix (.tbi) index, -c = CSI (.csi) index
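
Since the files have to be sorted before they are concatenated, it can help to test that first. The sketch below checks an uncompressed VCF with awk only; vcf_is_sorted is a made-up helper name, and the rule it applies (positions never go down inside a chromosome, and no chromosome comes back after another one has started) is my reading of "sorted", not a bcftools definition.

```shell
# Sketch: exit status 0 if the VCF records are in sorted order, 1 if not.
# vcf_is_sorted is a hypothetical helper; it reads a file or standard input.
vcf_is_sorted() {
    awk -F '\t' '
        /^#/ { next }                         # skip header lines
        $1 != chrom {                         # a new chromosome block starts
            if ($1 in seen) { bad = 1; exit } # block was already seen earlier
            seen[$1] = 1
            chrom = $1
            last = 0
        }
        ($2 + 0) < last { bad = 1; exit }     # position went backwards
        { last = $2 + 0 }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}

# Usage:
#   vcf_is_sorted chr1.vcf && echo "chr1.vcf is sorted"
#   gzip -dc chr1.vcf.gz | vcf_is_sorted
```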

Last Edit: Jan 22, 2019 10:42:57 GMT -5 by djoser-xyyman

Post by djoser-xyyman on Jan 22, 2019 11:07:18 GMT -5

vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE] [ --out OUTPUT PREFIX ] [ FILTERING OPTIONS ] [ OUTPUT OPTIONS ]

input
--vcf <input_filename>
--gzvcf <input_filename>

Filtering
--chr <chromosome>
--not-chr <chromosome>

--from-bp <integer>
--to-bp <integer>

--bed <filename>

Output other formats
--BEAGLE-GL
--BEAGLE-PL
--plink
--plink-tped
--chrom-map

Converting VCF files to PLINK format
vcftools --vcf input_data.vcf --plink --chr 1 --out output_in_plink

Post by djoser-xyyman on Jan 22, 2019 12:38:54 GMT -5

jk2134_chr1.vcf.gz

vcf-validator jk2134_chr1.vcf.gz

vcf-sort jk2134_chr1.vcf.gz | gzip -c > sort-1-jk2134_chr1.vcf.gz

vcf-sort jk2134_chr2.vcf.gz | gzip -c > sort-1-jk2134_chr2.vcf.gz

vcf-sort jk2134_chr1.vcf.gz | gzip > sort-2-jk2134_chr1.vcf.gz

vcf-sort jk2134_chr2.vcf.gz | gzip > sort-2-jk2134_chr2.vcf.gz

vcf-concat

vcf-concat sort-1-jk2134_chr1.vcf.gz sort-1-jk2134_chr2.vcf.gz | gzip -c > combined-1-2-jk2134-jan-22-out.vcf.gz

Post by djoser-xyyman on Jan 22, 2019 12:39:52 GMT -5

next...bcftools

Last Edit: Jan 22, 2019 12:43:28 GMT -5 by djoser-xyyman

Post by djoser-xyyman on Jan 23, 2019 16:21:26 GMT -5

test
freebayes -f hg19.fa chr1.bam > jk2888-chr1.vcf

Only about 14,000 autosomal SNPs were used when compared to the reference(hg19). Can someone verify?
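
One way to check a number like that is to count the SNP records in the VCF directly. This is only a sketch: count_autosomal_snps is a made-up helper name, it counts a record as a SNP when REF and ALT are each a single base, and it skips multi-allelic sites, so it may not match however the 14,000 figure was produced.

```shell
# Sketch: count single-base substitutions on chromosomes 1-22 in a VCF.
# count_autosomal_snps is a hypothetical helper; it reads a file or stdin.
count_autosomal_snps() {
    awk -F '\t' '
        /^#/ { next }                      # skip header lines
        {
            chrom = $1
            sub(/^chr/, "", chrom)         # accept "chr1" as well as "1"
            # keep autosomes only
            if (chrom !~ /^[0-9]+$/ || chrom + 0 < 1 || chrom + 0 > 22) next
            # single-base REF (column 4) and single-base ALT (column 5)
            if ($4 ~ /^[ACGT]$/ && $5 ~ /^[ACGT]$/) n++
        }
        END { print n + 0 }
    ' "$@"
}

# Usage:
#   count_autosomal_snps jk2888-chr1.vcf
```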

------------------------------------------
Hg19 Reference (run in the same folder to obtain reference and index)
Just download it from here:

Step1
hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz

Step2
tar -zxvf chromFa.tar.gz

step3
cat chr*.fa > hg19.fa

Done!

Takes about 20 min in total
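
One thing to watch with cat chr*.fa: the shell expands the pattern in alphabetical order (chr10 comes before chr2), and it also picks up the extra chrUn, random and haplotype files that ship in the hg19 chromFa archive. If you only want the main chromosomes, in the usual order, something like the sketch below does it. build_reference is a made-up helper name, and the chromosome list is my assumption about what you want in the reference.

```shell
# Sketch: join per-chromosome FASTA files in karyotype order.
# build_reference is a hypothetical helper; it only uses cat and grep.
build_reference() (
    src_dir=$1   # folder holding chr1.fa, chr2.fa, ...
    out=$2       # combined FASTA file to write

    : > "$out"   # start from an empty output file
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M; do
        f="$src_dir/chr$c.fa"
        if [ -f "$f" ]; then
            cat "$f" >> "$out"
        else
            echo "skipping missing $f" >&2
        fi
    done

    # Print the sequence names so they can be compared with the BAM header.
    grep '^>' "$out"
)

# Usage:
#   build_reference . hg19.fa
```

The names it prints should match the chromosome names used in the BAM file, otherwise the variant caller will not find the contigs.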

>>>>>>>>>>>

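Index the reference with BWA (needed for aligning reads; freebayes itself only needs the FASTA, which it reads through a .fai index)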

Script:
./bwa index hg19.fa

Takes about 90minutes depending on system configuration

>>>>>>>>>>>>>>>>>>>>

Running freebayes to obtain VCF file to be used with plink1.9/plink2

freebayes -f hg19.fa JK2888-bam_complete_sorted.bam > jk2888-var1.vcf

Took about 3 hrs

-------------------------------

Post by djoser-xyyman on Jan 24, 2019 21:45:54 GMT -5

freebayes -f hg19.fa JK2911udg.fixedRG.bam > jk2911-freebayes-var1.vcf

grep '^#' jk2888_chr1.vcf > jk2888_merge.vcf

grep -h -v '^#' jk2888_chr1.vcf jk2888_chr2.vcf jk2888_chr3.vcf jk2888_chr4.vcf jk2888_chr5.vcf jk2888_chr6.vcf jk2888_chr7.vcf jk2888_chr8.vcf jk2888_chr9.vcf jk2888_chr10.vcf jk2888_chr11.vcf jk2888_chr12.vcf jk2888_chr13.vcf jk2888_chr14.vcf jk2888_chr15.vcf jk2888_chr16.vcf jk2888_chr17.vcf jk2888_chr18.vcf jk2888_chr19.vcf jk2888_chr20.vcf jk2888_chr21.vcf >> jk2888_merge.vcf

plink --make-bed --vcf jk2134_merge.vcf --out jk2134-forplink-out

Error correction: PLINK rejects chromosome names it does not recognise unless --allow-extra-chr is added.

plink --make-bed --allow-extra-chr --vcf jk2134_merge.vcf --out jk2134-forplink-out

plink --make-bed --allow-extra-chr --vcf jk2911_merge.vcf --out jk2911-forplink-out

plink --make-bed --allow-extra-chr --vcf jk2888_merge.vcf --out jk2888-forplink-out
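
The header-plus-records merge can be wrapped in a loop so the file list does not have to be typed out for every sample. A sketch under a few assumptions: the per-chromosome files are uncompressed and named SAMPLE_chrN.vcf in the current folder, as in the commands in this post, and merge_sample_vcfs is a made-up helper name. It runs grep on one file at a time, because grep puts "filename:" in front of every line when it is given several files at once (unless -h is used).

```shell
# Sketch: build SAMPLE_merge.vcf from SAMPLE_chr1.vcf, SAMPLE_chr2.vcf, ...
# merge_sample_vcfs is a hypothetical helper; missing chromosomes are skipped.
merge_sample_vcfs() (
    sample=$1                    # e.g. jk2888
    out="${sample}_merge.vcf"
    first=""

    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y; do
        f="${sample}_chr$c.vcf"
        [ -f "$f" ] || continue
        if [ -z "$first" ]; then
            grep '^#' "$f" > "$out"   # header from the first file found
            first=$f
        fi
        grep -v '^#' "$f" >> "$out"   # records only, one file at a time
    done

    # Fail if no per-chromosome file was found at all.
    [ -n "$first" ]
)

# Usage:
#   merge_sample_vcfs jk2888 && head jk2888_merge.vcf
```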

Last Edit: Jan 24, 2019 22:07:00 GMT -5 by djoser-xyyman

Post by djoser-xyyman on Jan 25, 2019 15:51:44 GMT -5

BTW @ newbies. I came across some NEW tools to plot ADMIXTURE. It is really simple now. There are several ways to convert the Abusir BAM/FASTQ files to PLINK format for plotting in ADMIXTURE. We can do the same with this dataset on the ancient Iberians ....WITHOUT African autosomes but with African uniparental haplogroups. Wink! wink!

The easier process I just came across:
1. Get the BAM/FASTQ files of the ancient individuals, say the Iberians above..OP
2. Download and install the following software: freebayes, PLINK 1.9/PLINK2, BWA, ADMIXTURE, R, and the human reference hg19 in FASTA format. These are all run on a Linux/Unix/Ubuntu system.
3. Index hg19 with BWA to set up your reference for the aligner, using the hg19 FASTA file.
4. Verify that the Abusir sample set (x3) is in BAM format and in the same folder as the indexed hg19 reference.
5. Create your Abusir VCF files using freebayes. E.g. script:
freebayes -f hg19.fa JK2888udg.fixedRG.bam > jk2888-from-full-bam.vcf
6. This will kick out the Abusir file (jk2888-from-full-bam.vcf) to run through the next software ...PLINK 1.9
7. Use PLINK to create the PLINK files (bim, bed, fam) of the 3 Abusir samples, one at a time.
8. Merge the 3 Abusir samples from PLINK with your database files, also in PLINK format: HGDP or whichever you want to use.
9. Run the new merged fileset through ADMIXTURE to get your Q file.
10. Run your Q file through R to get your barplot.

I am still working on labelling. More to come.

BTW running freebayes can take up to 8 hours or more on the larger Abusir mummy file.

I will post pictures and a more detailed tutorial when I have the time. But it is all there above. There are other methods, but so far I have found this one, using freebayes and BWA, to be the cleanest and fastest.
Installing the programs is a bear.

Oh! No. 5 is simply telling freebayes to take the aligned file JK2888udg.fixedRG.bam, compare it with the hg19 reference and kick out the VCF version of the Abusir file ...which can be used in PLINK, which in turn can be used in ADMIXTURE. SIMPLE!!!!

Your file naming convention is up to you.
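
Before starting a run that takes hours, it is worth checking that every program from step 2 is actually on the PATH. A small sketch; check_tools is a made-up helper name, and the executable names in the usage line (freebayes, plink, bwa, admixture, Rscript) are my assumption about how the programs are installed on your system.

```shell
# Sketch: report which of the named programs can be found on the PATH.
# check_tools is a hypothetical helper; it returns 1 if anything is missing.
check_tools() {
    missing=0
    for t in "$@"; do
        if command -v "$t" >/dev/null 2>&1; then
            echo "found    $t -> $(command -v "$t")"
        else
            echo "MISSING  $t"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (adjust the names to your installation):
#   check_tools freebayes plink bwa admixture Rscript
```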

Last Edit: Jan 25, 2019 16:06:03 GMT -5 by djoser-xyyman

Tukuler al~Takruri
Administrator

WomenMusicRumCigars

Life is what you make it! You can dance a world.

Posts: 1,972

Post by Tukuler al~Takruri on Jan 27, 2019 21:01:10 GMT -5

One guy didn't take well to my reduxes.
Hoping you find something useful about it.

Sorted by
• 100% frequencies
• majority frequencies
• plurality frequencies
• tandem K's
• minority

I got 5 or 6, likely geographic, major populations out of the 10 allowed bins.
I think 21's what they use for global analysis now.

In a quick glance the redux indicates isolates, 'partners', infusions, and 'trickles'.

As seen, the algorithm determines the layout.
Independent minded blacks will have to design these kinds of programs but based on algorithms they will build.
Using TheSet's algorithms will give you TheSet's refined data.
Refined data influenced by unconscious bias, good bad indifferent, of the coder.

Dec 2, 2018 20:47:31 GMT -5 djoser-xyyman said:

Last Edit: Jan 27, 2019 21:06:27 GMT -5 by Tukuler al~Takruri

-----------------------------
© 2010 - 2021 YYT al~Takruri
-----------------------------
truth is prism refracted fact
i'm just another point of view

Post by djoser-xyyman on Jan 30, 2019 11:26:11 GMT -5

Ta! Still have to figure out how to keep pops together...
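
One possible way to keep pops together is to attach a population label to every Q row and sort on it before plotting. The sketch assumes a two-column text file of "individual population" pairs that you write yourself (popmap.txt below is a made-up name), plus the .fam and .Q files from the ADMIXTURE run; group_q_by_pop is a hypothetical helper, and individuals missing from the map are labelled "unknown".

```shell
# Sketch: prefix each Q row with population and individual ID, then sort
# so that individuals of the same population sit next to each other.
# group_q_by_pop is a hypothetical helper; it only uses awk, paste and sort.
group_q_by_pop() (
    popmap=$1   # two columns: individual ID, population label
    fam=$2      # PLINK .fam file used as ADMIXTURE input
    q=$3        # matching ADMIXTURE .Q file (same row order as .fam)

    awk '{ print $2 }' "$fam" | paste -d ' ' - "$q" |
    awk -v map="$popmap" '
        BEGIN {
            # load the individual -> population table
            while ((getline line < map) > 0) {
                split(line, f, " ")
                pop[f[1]] = f[2]
            }
        }
        {
            p = ($1 in pop) ? pop[$1] : "unknown"
            print p, $0
        }
    ' |
    LC_ALL=C sort -k1,1 -k2,2   # byte order keeps the result the same everywhere
)

# Usage:
#   group_q_by_pop popmap.txt merged.fam merged.3.Q > merged.3.grouped.txt
```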

Post by djoser-xyyman on Jan 30, 2019 11:27:07 GMT -5

the latest

placeholder - creating bim bed fam files for the Abusirs

After creating your VCF file for each Abusir sample above using freebayes or other methods/scripts, then...

e.g.
a. ---- Validate it!!!

vcf-validator jk2888-hg19.vcf (and likewise for the other samples)

b. ---- create a tabix index for each file to merge

bgzip -c jk2888-hg19.vcf > jk2888-hg19.gz
tabix -p vcf jk2888-hg19.gz

c. ---- merge using vcftools
vcf-merge jk2911-hg19.gz jk2888-hg19.gz jk2134-hg19.gz | bgzip -c > jk2888-jk-2134-jk-2911-out.vcf.gz

d. ---- unzip and run through plink

gunzip -c jk2888-jk-2134-jk-2911-out.vcf.gz > jk2888-jk-2134-jk-2911-out.vcf
plink --make-bed --vcf jk2888-jk-2134-jk-2911-out.vcf --out jk2888-jk-2134-jk-2911-out

result output:
jk2888-jk-2134-jk-2911-out.bim
jk2888-jk-2134-jk-2911-out.bed
jk2888-jk-2134-jk-2911-out.fam

These can now be run in ADMIXTURE and R
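
Before step c it may be worth checking that each VCF carries a different sample name, since the name in the VCF header comes from the BAM file and not from the file name; if two files share a name, the merged sample columns are hard to tell apart. A sketch for uncompressed VCFs; vcf_samples and duplicate_samples are made-up helper names.

```shell
# Sketch: list the sample names stored in a VCF header.
# vcf_samples is a hypothetical helper; it reads a file or standard input.
vcf_samples() {
    # Columns 10 and up of the "#CHROM" line are the sample names.
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$@"
}

# Sketch: print every sample name that appears in more than one file.
# duplicate_samples is also a hypothetical helper.
duplicate_samples() {
    for f in "$@"; do
        vcf_samples "$f"
    done | LC_ALL=C sort | uniq -d
}

# Usage:
#   duplicate_samples jk2911-hg19.vcf jk2888-hg19.vcf jk2134-hg19.vcf
#   gzip -dc jk2888-hg19.gz | vcf_samples
```

No output from duplicate_samples means all the names are distinct.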

Last Edit: Jan 30, 2019 12:00:01 GMT -5 by djoser-xyyman
