snpsplit

Allele-specific expression with SNPsplit

https://mp.weixin.qq.com/s?__biz=MzA3MjM5NTE4Mg==&mid=2247483801&idx=1&sn=cb7d2b98effb4ffff360ea1f2e9e8997&chksm=9ef0cdef1c4d02b07df0a1e0af4b89656184bb49eabeb44786f01495a7b58a75ebca4f90afd5&mpshare=1&scene=1&srcid=0920mj53gskLctfFcEWAu3XM&sharer_shareinfo=d66f12922fa1b207ff45e7932f0d7db3&sharer_shareinfo_first=d66f12922fa1b207ff45e7932f0d7db3&version=4.1.9.99285&platform=mac&nwr_flag=1#wechat_redirect

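In 2016, the Babraham Bioinformatics group released SNPsplit, a tool dedicated to separating reads by parental origin. Using the known SNP positions covered by reads in a SAM/BAM file, it assigns each read to one of the two alleles.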

1. Software overview

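SNPsplit only needs the SNP information that distinguishes the two parental origins (derived from the VCF files of both parents; see sections 4.1 and 4.2 of the hands-on part below). With that information it can assign parental origin to reads in BAM files produced by commonly used aligners (Bowtie2, TopHat, STAR, HISAT2, HiCUP and Bismark).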

The software consists of two main modules:

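1) SNPsplit_genome_preparation

Builds the N-masked genome used for re-alignment: known SNP positions are replaced with N so that mapping does not favor either allele (the source article illustrates this with a figure). An example workflow is given in section 4.2 below.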
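
To make the N-masking idea concrete, here is a toy sketch. It is not SNPsplit's own code: the function name mask_snps, the sequence and the SNP positions are all made up for illustration. Each 1-based SNP position in a short sequence is replaced with N.

```shell
#!/usr/bin/env bash
# Toy illustration of N-masking; sequence and SNP positions are invented.

# mask_snps SEQ POS... -> prints SEQ with each 1-based position replaced by N
mask_snps() {
  local seq=$1
  shift
  local pos
  for pos in "$@"; do
    # bases before the SNP + N + bases after the SNP
    seq="${seq:0:pos-1}N${seq:pos}"
  done
  printf '%s\n' "$seq"
}

# Hypothetical 10 bp reference with SNPs at positions 3 and 8
masked=$(mask_snps "ACGTACGTAC" 3 8)
echo "$masked"
```

A read carrying either allele then sees the same mismatch (against N) at the SNP position, which is what removes the reference bias.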

2) SNPsplit

Takes the BAM file from the re-alignment and separates the reads by parental origin. Example workflows are given in sections 4.3 and 4.4.
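
The assignment rule can be sketched roughly as follows. This is a simplified illustration, not SNPsplit's implementation: the hypothetical function classify_read takes the number of SNP positions in a read that match genome 1 and genome 2, and returns one of the XX:Z tag values described in the SNPsplit user guide (UA unassigned, G1, G2, CF conflicting). How the two mates of a pair are combined is left out here.

```shell
#!/usr/bin/env bash
# Simplified single-read classification; classify_read is a hypothetical helper.

# classify_read G1_HITS G2_HITS -> prints UA, G1, G2 or CF
classify_read() {
  local g1=$1 g2=$2
  if   [ "$g1" -gt 0 ] && [ "$g2" -gt 0 ]; then echo CF   # evidence for both genomes
  elif [ "$g1" -gt 0 ]; then echo G1                      # only genome 1 alleles seen
  elif [ "$g2" -gt 0 ]; then echo G2                      # only genome 2 alleles seen
  else echo UA                                            # no informative SNP covered
  fi
}

for hits in "0 0" "2 0" "0 1" "1 1"; do
  # word splitting of $hits into two arguments is intentional
  echo "$hits -> $(classify_read $hits)"
done
```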

step1

Merge the VCF files of the paternal sample Y1 and the maternal sample M1 with CombineVariants from GATK3.

/PUBLIC/software/public/System/jre1.7.0_25//bin/java -jar /PUBLIC/software/RNA/GATK/GenomeAnalysisTK.jar -T CombineVariants -R /TJPROJ13/GB_TR/reference_data/Animal/Homo_sapiens/Homo_sapiens_Ensemble_94/Sequence/WholeGenomeFasta/genome.fa -V /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/03.Result_X101SC24045650-Z01-J002_Homo_Sapiens/Result_X101SC24045650-Z01-J002_Homo_Sapiens/7.SNP/1.snpsite/Y1_SNP.vcf -V /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/03.Result_X101SC24045650-Z01-J002_Homo_Sapiens/Result_X101SC24045650-Z01-J002_Homo_Sapiens/7.SNP/1.snpsite/M1_SNP.vcf -o Y1_M1_gatk3.vcf
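
Since step2 refers to the samples by name (--strain Y1, --strain2 M1), it may be worth confirming that both names appear as sample columns in the merged VCF before going on. A minimal sketch, assuming a standard tab-separated #CHROM header line; the helper vcf_has_sample and the toy VCF content are hypothetical, and in practice the file to check would be Y1_M1_gatk3.vcf.

```shell
#!/usr/bin/env bash
# Check that a sample name is present in the VCF header; toy data only.

# vcf_has_sample VCF NAME -> exit status 0 if NAME is a sample column
vcf_has_sample() {
  local vcf=$1 name=$2
  awk -v s="$name" -F'\t' '
    /^#CHROM/ { for (i = 10; i <= NF; i++) if ($i == s) found = 1; exit }
    END { exit !found }
  ' "$vcf"
}

# Build a tiny stand-in VCF (sample columns start at column 10)
tmp_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tY1\tM1\n' >> "$tmp_vcf"
printf '1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t0/0\n' >> "$tmp_vcf"

for s in Y1 M1; do
  if vcf_has_sample "$tmp_vcf" "$s"; then
    echo "$s: present"
  else
    echo "$s: MISSING"
  fi
done
```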

step2

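Run SNPsplit_genome_preparation in dual-hybrid mode to build the N-masked genome. The --genome_build value is only a label that is written into the output file and folder names; GRCm38 is used here even though the reference is human, and the following steps rely on the resulting names.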

/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit_genome_preparation --vcf_file Y1_M1_gatk3.vcf --strain Y1 --dual_hybrid --strain2 M1 --reference_genome /TJPROJ13/GB_TR/reference_data/Animal/Homo_sapiens/Homo_sapiens_Ensemble_94/Sequence/WholeGenomeFasta/ --genome_build GRCm38

step3 Merge the per-chromosome files into a single genome file

cd /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/test/Y1_M1_dual_hybrid.based_on_GRCm38_N-masked

cat *fa > all_N-masked.GRCm38.N-masked.fa
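
One thing to watch with cat *fa: if the command is re-run in the same directory, the merged output file itself matches the glob. A minimal sketch of a merge that skips its own output, demonstrated on throwaway files in a temporary directory; the function name merge_fasta and the toy file names are hypothetical.

```shell
#!/usr/bin/env bash
# Concatenate per-chromosome FASTA files without re-reading the output file.

# merge_fasta SRC_DIR OUT_FILE -> concatenates SRC_DIR/*.fa into OUT_FILE
merge_fasta() {
  local src=$1 out=$2 f
  : > "$out"                           # start from an empty output file
  for f in "$src"/*.fa; do
    [ -e "$f" ] || continue            # glob matched nothing
    [ "$f" -ef "$out" ] && continue    # never use the output as an input
    cat "$f" >> "$out"
  done
}

# Toy data standing in for the N-masked chromosome files
demo=$(mktemp -d)
printf '>chr1\nACNT\n' > "$demo/chr1.N-masked.fa"
printf '>chr2\nGGNA\n' > "$demo/chr2.N-masked.fa"

merge_fasta "$demo" "$demo/all.fa"
merge_fasta "$demo" "$demo/all.fa"     # re-run: all.fa is skipped as input
n_records=$(grep -c '^>' "$demo/all.fa")
echo "records in merged file: $n_records"
```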

step4 Build the index

/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/prepare_data/bin/hisat2-build all_N-masked.GRCm38.N-masked.fa all_N-masked.GRCm38.N-masked

step5 Align

/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2 -x /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/test/all_N-masked.GRCm38.N-masked -p 4 --dta -t --phred33 -1 /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/01.RawData/T1_1.fq.gz -2 /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/SNPsplit/OSS_DOWN/Back_up/X101SC24045650-Z01-J002/01.RawData/T1_2.fq.gz --un-conc-gz T1.unmap.fq.gz 2> T1_align.log | samtools sort -O BAM --threads 4 -o T1.bam -
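
Before splitting, it can help to look at the mapping rate that HISAT2 wrote to T1_align.log. A minimal sketch, assuming the log contains HISAT2's usual "overall alignment rate" summary line; the helper alignment_rate is hypothetical and the log text below is invented for the demonstration.

```shell
#!/usr/bin/env bash
# Extract the overall alignment rate from a HISAT2-style summary; toy log only.

# alignment_rate LOG -> prints the percentage (without the % sign)
alignment_rate() {
  awk '/overall alignment rate/ { sub(/%$/, "", $1); print $1 }' "$1"
}

# Invented log content in the layout of a HISAT2 paired-end summary
log=$(mktemp)
cat > "$log" <<'EOF'
1000 reads; of these:
  1000 (100.00%) were paired; of these:
    50 (5.00%) aligned concordantly 0 times
    900 (90.00%) aligned concordantly exactly 1 time
    50 (5.00%) aligned concordantly >1 times
96.50% overall alignment rate
EOF

rate=$(alignment_rate "$log")
echo "overall alignment rate: ${rate}%"
```

On real data, the argument would be T1_align.log from the command above.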

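step6 Split the reads by paternal and maternal origin into separate BAM files, which can then be used for allele-specific analysis of transcriptome and methylation data. Note that --no_sort makes SNPsplit skip its own sort by read name, so it is meant for paired-end input that is already name-sorted; the BAM from step5 is coordinate-sorted by samtools sort, so dropping --no_sort (or name-sorting first with samtools sort -n) is likely the safer choice.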

/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit --snp_file all_M1_SNPs_Y1_reference.based_on_GRCm38.txt --paired --no_sort -o test --singletons T1.bam
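
After the split, a quick tally of the XX:Z tags gives a feel for how many reads were informative. A minimal sketch that counts tags in SAM text read from stdin; the helper names and the toy records are hypothetical. On real output, one would pipe samtools view of the allele-flagged BAM written by SNPsplit into count_xx_tags (the exact file name depends on the input name and output directory).

```shell
#!/usr/bin/env bash
# Count XX:Z allele tags in SAM records; demonstrated on invented records.

# count_xx_tags: reads SAM on stdin, prints "TAG COUNT" per XX:Z value
count_xx_tags() {
  awk -F'\t' '
    /^@/ { next }                                # skip header lines
    {
      for (i = 12; i <= NF; i++)                 # optional fields start at column 12
        if ($i ~ /^XX:Z:/) { n[substr($i, 6)]++; break }
    }
    END { for (t in n) print t, n[t] }
  ' | sort
}

# Toy SAM stream: one header line and four tagged records
toy_sam() {
  printf '@HD\tVN:1.6\n'
  printf 'r1\t0\t1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tXX:Z:G1\n'
  printf 'r2\t0\t1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\tXX:Z:G2\n'
  printf 'r3\t0\t1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tXX:Z:G1\n'
  printf 'r4\t0\t1\t400\t60\t4M\t*\t0\t0\tACGT\tIIII\tXX:Z:UA\n'
}

counts=$(toy_sam | count_xx_tags)
echo "$counts"
```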

References:

1. Krueger F, Andrews SR. SNPsplit: Allele-specific splitting of alignments between genomes with known SNP genotypes. F1000Res. 2016 Jun 23;5:1479. doi: 10.12688/f1000research.9037.1 http://europepmc.org/article/MED/27429743

2. https://github.com/FelixKrueger/SNPsplit

3. https://github.com/FelixKrueger/SNPsplit/blob/master/SNPsplit_User_Guide.md

snpsplit.txt · Last modified: 2024/09/21 08:42 by fengjie