一、SNP_spilt(鉴定亲本的等位基因在子代中特异性表达情况) 1.简介 2016年,Babraham生物信息组织发布了一款专门用来区分亲本来源reads的软件SNPsplit,它通过SAM/BAM文件reads上覆盖的已知SNP位点信息, 能够将reads分配给其中一个等位基因。 SNPsplit软件只需要提供用来区分印记来源的SNP信息,就可以针对生信常用软件(包括Bowtie2, TopHat, STAR, HISAT2, HiCUP和Bismark)比对后的bam区分其亲本来源。 参考链接:https://felixkrueger.github.io/SNPsplit/ https://github.com/FelixKrueger/SNPsplit https://f1000research.com/articles/5-1479/v2?s3BucketUrl=https%3A%2F%2Ff1000research.s3.amazonaws.com>mKey=GTM-PCBS9JK&submissionUrl=%2Ffor-authors%2Fpublish-your-research&otid=1bc074d1-3db4-47ed-9f80-df1a4a3f2ab4&immUserUrl=https%3A%2F%2Ff1r-proxy.f1krdev.com%2Feditor%2Fmember%2Fshow%2F 2.其分析过程如下 2.1 vcf的合并以及参考基因组的构建 #生成索引等操作 ''samtools_path="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools" picard_jar="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/share/picard-2.23.3-0/picard.jar" "$samtools_path" faidx "$reference_genome_dir/reference_genome.fa" java -jar "$picard_jar" CreateSequenceDictionary \ R="$reference_genome_dir/reference_genome.fa" \ O="$reference_genome_dir/reference_genome.dict"'' #合并vcf ''gatk_path="/PUBLIC/software/RNA/GATK/GenomeAnalysisTK.jar" java -XX:ParallelGCThreads=4 -Xmx20g -jar "$gatk_path" \ -T CombineVariants \ -R "$reference_genome_dir/reference_genome.fa" \ -V "$parent_vcf_dir/${parent1}_SNP.vcf" \ -V "$parent_vcf_dir/${parent2}_SNP.vcf" \ -o "$result_dir/${parent1}_${parent2}_all.vcf"'' #生成N掩盖基因组 ''cd "$result_dir" snp_split_path="/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit_genome_preparation" $snp_split_path \ --vcf_file "$result_dir/${parent1}_${parent2}_all.vcf" \ --strain "$parent1" \ --dual_hybrid \ --strain2 "$parent2" \ --reference_genome "$reference_genome_dir" \ --genome_build Human # 这里需要根据实际情况更改'' #整理文件 ''dual_hybrid_dir="${parent1}_${parent2}_dual_hybrid.based_on_Human_N-masked" cd "$result_dir/$dual_hybrid_dir" cat *.fa > all_N-masked.Human.N-masked.fa'' # 构建索引新基因组 ''hisat2_build_path="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2-build" "$hisat2_build_path" \ "$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked.fa" \ "$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked"'' 2.2 比对排序 #比对 ''hisat2_path="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2" samtools_path="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools" cd "$bam_dir" "$hisat2_path" -x "$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked" \ -p 8 --dta -t --phred33 \ -1 "$fq_dir/$(basename "$fq1")" \ -2 "$fq_dir/$(basename "$fq2")" \ --no-softclip \ --un-conc-gz "${fq_name}.unmap.fq.gz" \ 2> "${fq_name}_align.log" | \ "$samtools_path" sort -O BAM --threads 8 -o "${fq_name}.bam" -'' #拆分BAM文件 ''snp_split_tool="/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit" "$snp_split_tool" \ --snp_file "$result_dir/${parent1}_${parent2}_all_SNPs_${parent1}_reference.based_on_Human.txt" \ -o "$sort_bam_dir" \ --paired --no_sort --singletons "${fq_name}.bam"'' 2.4 转录组的定量以及画图 ''export PYTHONPATH="" unset PERL5LIB export PATH="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/Quant/bin:$PATH" export PATH="/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/r_3.5.0/bin:$PATH" ln -s "$gtf_file" "$reference_genome_dir" ln -s "$gene_file" "$reference_genome_dir" cd "$quant_dir"'' # 生成feature_count ''featureCounts_path="/path/to/featureCounts" # 请替换为实际路径,例如 /usr/bin/featureCounts featureCounts -T 8 -F GTF -t exon -g gene_id -s 0 -Q 20 -C -B -p -a \ "$reference_genome_dir/genome.gtf" \ -o "$quant_dir/feature_count.xls" \ "$sort_bam_dir/${fq_name}.genome1.bam" \ "$sort_bam_dir/${fq_name}.genome2.bam"'' # 生成样本信息表 ''get_condition_info="/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_condition_info" "$get_condition_info" \ --s2g "${fq_name}.genome1,${fq_name}.genome2" \ --group "$parent1,$parent2" \ --outfile "$quant_dir/condition.xls"'' # 生成count ''get_gene_count="/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_count" "$get_gene_count" \ --sample "${fq_name}.genome1,${fq_name}.genome2" \ --quant featureCounts \ --count "$quant_dir/feature_count.xls" \ --gene "$reference_genome_dir/genome_gene.xls" \ --outfile "$quant_dir/gene_count.xls"'' # 生成FPKM ''get_gene_fpkm="/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_fpkm" "$get_gene_fpkm" \ --count "$quant_dir/gene_count.xls" \ --condition "$quant_dir/condition.xls" \ --outfile "$quant_dir/gene_fpkm.xls"'' # 绘图 ''Rscript_path="Rscript" # 请根据实际情况修改 Rscript_script="/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/shi_1.R" # 确保脚本路径正确 "$Rscript_path" "$Rscript_script" \ --inputfile "$quant_dir/gene_count.xls" \ --outfile "$quant_dir/retu.pdf"'' 参考脚本路径:/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/SNP_spilt_1.sh 脚本帮助信息: sh SNP_spilt_1.sh --help 用法: SNP_spilt_1.sh --genome_fa <基因组文件路径> --gtf --gene_file --vcf1 <父本的vcf文件路径> --vcf2 <母本的vcf文件路径> --parent1 <父本名称> --parent2 <母本名称> --fq1 --fq2 --fq_name --output_dir <输出文件夹路径> 脚本使用实例: ''sh SNP_spilt_1.sh --genome_fa /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/genome.fa\ --gtf /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome.gtf\ --gene_file /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome_gene.xls\ --vcf1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F1_SNP.vcf\ --vcf2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F2_SNP.vcf \ --parent1 F1 \ --parent2 F2 \ --fq1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_1.fq.gz \ --fq2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_2.fq.gz \ --fq_name T1 \ --output_dir /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/ceshi '' 输出结果: 1.根据双亲snp比对后生成的bam文件: T1.genome2.bam T1.genome2.bam 2.gene_count文件 gene_count.xls 3.特异性热图 retu.pdf retu.png