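In 2016, the Babraham Bioinformatics group released SNPsplit, a tool dedicated to separating reads by parental origin. Using the known SNP positions covered by the reads in a SAM/BAM file, it assigns each read to one of the two alleles.
1. Software overview
SNPsplit only needs the SNP information that distinguishes the two parental origins (derived from the VCF files of both parents; see the hands-on part below, 4.1 and 4.2). With that, it can split by parental origin the BAM files produced by commonly used aligners, including Bowtie2, TopHat, STAR, HISAT2, HiCUP and Bismark.
The software consists of two main modules: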
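1) SNPsplit_genome_preparation
Builds the N-masked genome used for re-alignment: the positions of the SNPs that distinguish the two parents are replaced by N, so that reads from either allele map equally well (the principle is shown in the figure). An example workflow is given in part 4.2 below.
2) SNPsplit
Takes the BAM file aligned against the N-masked genome and separates the reads by parental origin. Example workflows are given in 4.3 and 4.4.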
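The N-masking idea can be illustrated on a toy sequence. The sketch below is not how SNPsplit_genome_preparation is implemented; the function name `mask_positions` and the example sequence are made up for illustration, and it only shows what "replace every SNP position with N" means for a single string.

```shell
#!/usr/bin/env bash
# Toy illustration of N-masking (hypothetical helper, not part of SNPsplit):
# replace the base at each given 1-based position with N.
mask_positions() {
    local seq="$1"; shift
    local pos
    for pos in "$@"; do
        # bases before pos + N + bases after pos
        seq="${seq:0:pos-1}N${seq:pos}"
    done
    printf '%s\n' "$seq"
}

# Example: SNPs at positions 3 and 6 of an 8-base sequence
mask_positions ACGTACGT 3 6
```

Because both parental alleles now mismatch the reference equally at these positions, neither one is favoured during alignment.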
step1
Merge the VCF files of the paternal sample Y1 and the maternal sample M1 with CombineVariants from GATK3.
/PUBLIC/software/public/System/jre1.7.0_25/bin/java -jar /PUBLIC/software/RNA/GATK/GenomeAnalysisTK.jar \
  -T CombineVariants \
  -R /TJPROJ13/GB_TR/reference_data/Animal/Homo_sapiens/Homo_sapiens_Ensemble_94/Sequence/WholeGenomeFasta/genome.fa \
  -V /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/03.Result_X101SC24045650-Z01-J002_Homo_Sapiens/Result_X101SC24045650-Z01-J002_Homo_Sapiens/7.SNP/1.snpsite/Y1_SNP.vcf \
  -V /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/03.Result_X101SC24045650-Z01-J002_Homo_Sapiens/Result_X101SC24045650-Z01-J002_Homo_Sapiens/7.SNP/1.snpsite/M1_SNP.vcf \
  -o Y1_M1_gatk3.vcf
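Before moving on, it can be worth confirming which sample columns ended up in the merged VCF, on the assumption that the names passed to --strain and --strain2 in the next step are looked up among these columns. The helper below is a small sketch (the name `vcf_samples` is made up here) that prints every sample name found on the `#CHROM` header line of an uncompressed VCF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the sample names of an uncompressed VCF.
# In the VCF format, sample columns start at column 10 of the #CHROM line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Usage (file name taken from step1):
#   vcf_samples Y1_M1_gatk3.vcf
```

For the merged file from this step, the expected names would be Y1 and M1, provided those are the sample names recorded in the two input VCFs.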
step2
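Run SNPsplit_genome_preparation to build the N-masked genome. The value given to --genome_build is only a label that is incorporated into the output names. GRCm38 is the name of a mouse assembly, whereas the reference used here is human (Ensembl 94); the label is left unchanged because the file and directory names in the following steps contain it.
/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit_genome_preparation \
  --vcf_file Y1_M1_gatk3.vcf \
  --strain Y1 --dual_hybrid --strain2 M1 \
  --reference_genome /TJPROJ13/GB_TR/reference_data/Animal/Homo_sapiens/Homo_sapiens_Ensemble_94/Sequence/WholeGenomeFasta/ \
  --genome_build GRCm38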
step3 Merge the per-chromosome files into a single genome file
cd /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/test/Y1_M1_dual_hybrid.based_on_GRCm38_N-masked
cat *fa > all_N-masked.GRCm38.N-masked.fa
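One thing to watch in this step: the merged file name also ends in `fa`, so if the command is run a second time in the same directory, the `*fa` pattern picks up the previous result as an input. A slightly more defensive variant, sketched below with a made-up function name `concat_fasta`, skips the output file and writes through a temporary file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate all *.fa files of a directory into one
# file, never reading the output file itself (safe to re-run).
concat_fasta() {
    local dir="$1" out="$2" f
    : > "$out.tmp"                          # start from an empty temp file
    for f in "$dir"/*.fa; do
        [ -e "$f" ] || continue             # no match: the pattern stays literal
        [ "$f" -ef "$out" ] && continue     # skip a previous merged result
        cat "$f" >> "$out.tmp"
    done
    mv "$out.tmp" "$out"
}

# Usage (names from step3):
#   concat_fasta . all_N-masked.GRCm38.N-masked.fa
```

The files are concatenated in the shell's glob order, which is the same order the plain `cat *fa` would use.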
step4 Build the index
/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/prepare_data/bin/hisat2-build all_N-masked.GRCm38.N-masked.fa all_N-masked.GRCm38.N-masked
step5 Alignment
Align the reads against the N-masked index with HISAT2 and write a sorted BAM. The SNPsplit user guide advises running HISAT2 without soft-clipping (for example with --no-softclip); the command below is shown as it was originally run.
/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2 \
  -x /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/test/all_N-masked.GRCm38.N-masked \
  -p 4 --dta -t --phred33 \
  -1 /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/01.RawData/T1_1.fq.gz \
  -2 /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/01.RawData/T1_2.fq.gz \
  --un-conc-gz T1.unmap.fq.gz 2> T1_align.log \
  | samtools sort -O BAM --threads 4 -o T1.bam -
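For paired-end data, SNPsplit needs the two mates of a pair to sit next to each other, so it matters whether a BAM is sorted by coordinate or by read name. `samtools sort` without `-n` sorts by coordinate. The sketch below (the function name `sam_sort_order` is made up) reads a SAM header on standard input and prints the sort order declared in the `@HD` line, if there is one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the SO: value of the @HD header line
# (for example "coordinate", "queryname" or "unsorted").
sam_sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}

# Usage:
#   samtools view -H T1.bam | sam_sort_order
```

If the header reports coordinate order, the file should not be passed to SNPsplit together with --no_sort, since that option is meant for input that is already sorted by read name.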
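step6 Split the reads originating from the paternal and maternal alleles into separate BAM files, which can then be used for allele-specific transcriptome or methylation analysis. The BAM from step5 is sorted by coordinate, while --no_sort is intended for input that is already sorted by read name, so the option is omitted here and SNPsplit performs the name sort itself.
/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit \
  --snp_file all_M1_SNPs_Y1_reference.based_on_GRCm38.txt \
  --paired -o test --singletons T1.bam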
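As a quick check of the result, the allele assignment can be tallied from the tags SNPsplit adds to each read. According to the user guide, the tagging step writes an `XX:Z:` tag with the values G1 (genome 1), G2 (genome 2), UA (unassigned) or CF (conflicting). The sketch below assumes that tag and counts its values in SAM text read from standard input; the function name `count_allele_tags` and the file name in the usage line are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads per XX:Z: tag value in SAM text.
# Reads without the tag are reported as "none".
count_allele_tags() {
    awk -F'\t' '
        /^@/ { next }                       # skip header lines
        {
            tag = "none"
            for (i = 12; i <= NF; i++)      # optional fields start at column 12
                if ($i ~ /^XX:Z:/) tag = substr($i, 6)
            n[tag]++
        }
        END { for (t in n) print t "\t" n[t] }' | sort
}

# Usage (replace flagged.bam with the allele-flagged BAM written by SNPsplit):
#   samtools view flagged.bam | count_allele_tags
```

A very small G1 or G2 count relative to UA would usually suggest checking the SNP file and the sample names from the earlier steps.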
References:
1. Krueger F, Andrews SR. SNPsplit: Allele-specific splitting of alignments between genomes with known SNP genotypes. F1000Res. 2016 Jun 23;5:1479. doi: 10.12688/f1000research.9037.1 http://europepmc.org/article/MED/27429743
2. https://github.com/FelixKrueger/SNPsplit
3. https://github.com/FelixKrueger/SNPsplit/blob/master/SNPsplit_User_Guide.md