=======GATK多样本(群call)需要0/0基因型位点======= GATK升级到4之后,HaplotypeCaller只能处理单样本文件,当有多样本时,官方建议使用HaplotypeCaller对单bam文件分别进行变异检测,生成GVCF文件,GVCF会记录每一个位点到情况,包括有无突变,VCF只记录突变位点情况,之后在下一步对GVCF文件进行合并。 前处理与流程一致,测试脚本如下: /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar ReorderSam -I=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/QC/bam/RNA_AS_2.bam -O=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder.bam -R=/TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa --TMP_DIR=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/tmp --VALIDATION_STRINGENCY SILENT /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar AddOrReplaceReadGroups -I=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder.bam -O=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head.bam -SO=coordinate -LB=LB -PL=illumina -PU=PU -SM=RNA_AS_2 --TMP_DIR=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/tmp --VALIDATION_STRINGENCY SILENT /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar MarkDuplicates -I=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head.bam -O=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head_dedup.bam --VALIDATION_STRINGENCY=SILENT -M=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head_dedup_metrics.txt -MAX_FILE_HANDLES=4000 --CREATE_INDEX=true --REMOVE_DUPLICATES=true --TMP_DIR=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/tmp /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar SplitNCigarReads --skip-mapping-quality-transform -R=/TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa -I=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head_dedup.bam -O=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head_dedup_split.bam --tmp-dir=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/tmp call变异位点,加参数-ERC GVCF,输出GCVF文件 /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar HaplotypeCaller -I=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2_reorder_head_dedup_split.bam -O=/TJPROJ6/NC_BG_SH/shouhou/202303/X101SC22072170-Z02-J001/SNP/RNA_AS_2/RNA_AS_2.vcf -R=/TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa --dont-use-soft-clipped-bases=true -ploidy=2 -stand-call-conf=20 --output-mode=EMIT_ALL_SITES -ERC GVCF 合并多样本GVCF文件,并获取发生变异的位点 #合并(限制内存4G最快,可以按染色体拆分或者再将染色体按bin拆分,节约时间,拆分脚本/TJPROJ6/RNA_SH/personal_dir/zhangxin/jiaoben/snp/split_vcf.py) /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Xmx4g -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar CombineGVCFs -R /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa -V RNA_AS_1/RNA_AS_1.vcf.gz -V RNA_AS_2/RNA_AS_2.vcf.gz -V RNA_FS_1/RNA_FS_1.vcf.gz -V RNA_FS_2/RNA_FS_2.vcf.gz -V RNA_YS_1/RNA_YS_1.vcf.gz -V RNA_YS_2/RNA_YS_2.vcf.gz -O GATK_merge_all_sample.vcf #获取变异位点 /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Xmx25g -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar GenotypeGVCFs -R /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa --variant GATK_merge_all_sample.vcf -O GET_GATK_merge_all_sample.vcf 若果有dbsnp,可加参数--dbsnp /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/bin/java -Xmx25g -jar /TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/SNP/share/gatk4-4.1.1.0-0/gatk-package-4.1.1.0-local.jar GenotypeGVCFs -R /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Oryza_sativa/Oryza_sativa_UGA/genome/Sequence/WholeGenomeFasta/genome.fa --variant GATK_merge_all_sample.vcf -O GET_GATK_merge_all_sample.vcf --dbsnp 通过这种方法可获取,多样本的变异位点,保留基因型为0/0的位点,之后可对vcf文件继续筛选,注释等操作(未测试) 关于拆分脚本 /TJPROJ6/RNA_SH/personal_dir/zhangxin/jiaoben/snp/split_vcf.py -h usage: split_vcf.py [-h] [-A VCF] [-C] [-L CHR_LIST] [-B BIN] [-O OUT] [-F FAI] split vcf file optional arguments: -h, --help show this help message and exit -A VCF, --vcf VCF the vcf file -C, --chr split chr -L CHR_LIST, --chr_list CHR_LIST get chr list -B BIN, --bin BIN the bin size -O OUT, --out OUT the out dir -F FAI, --fai FAI the fai file 例如仅按染色体拆分 /TJPROJ6/RNA_SH/personal_dir/zhangxin/jiaoben/snp/split_vcf.py --vcf T4.vcf.gz --chr --out ./ --fai genome.fa.fai 按染色体bin拆分 /TJPROJ6/RNA_SH/personal_dir/zhangxin/jiaoben/snp/split_vcf.py --vcf T4.vcf.gz --bin 10000000 --out ./ --fai genome.fa.fai