用户工具

站点工具


snpspilt

一、SNP_spilt(鉴定亲本的等位基因在子代中特异性表达情况)

1.简介

2016年,Babraham生物信息组织发布了一款专门用来区分亲本来源reads的软件SNPsplit,它通过SAM/BAM文件reads上覆盖的已知SNP位点信息, 能够将reads分配给其中一个等位基因。 SNPsplit软件只需要提供用来区分印记来源的SNP信息,就可以针对生信常用软件(包括Bowtie2, TopHat, STAR, HISAT2, HiCUP和Bismark)比对后的bam区分其亲本来源。 参考链接:https://felixkrueger.github.io/SNPsplit/ https://github.com/FelixKrueger/SNPsplit https://f1000research.com/articles/5-1479/v2?s3BucketUrl=https%3A%2F%2Ff1000research.s3.amazonaws.com&gtmKey=GTM-PCBS9JK&submissionUrl=%2Ffor-authors%2Fpublish-your-research&otid=1bc074d1-3db4-47ed-9f80-df1a4a3f2ab4&immUserUrl=https%3A%2F%2Ff1r-proxy.f1krdev.com%2Feditor%2Fmember%2Fshow%2F

2.其分析过程如下

2.1 vcf的合并以及参考基因组的构建

#生成索引等操作

samtools_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools” picard_jar=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/share/picard-2.23.3-0/picard.jar” “$samtools_path” faidx “$reference_genome_dir/reference_genome.fa” java -jar “$picard_jar” CreateSequenceDictionary \ R=“$reference_genome_dir/reference_genome.fa” \ O=“$reference_genome_dir/reference_genome.dict”

#合并vcf

gatk_path=“/PUBLIC/software/RNA/GATK/GenomeAnalysisTK.jar” java -XX:ParallelGCThreads=4 -Xmx20g -jar “$gatk_path” \ -T CombineVariants \ -R “$reference_genome_dir/reference_genome.fa” \ -V “$parent_vcf_dir/${parent1}_SNP.vcf” \ -V “$parent_vcf_dir/${parent2}_SNP.vcf” \ -o “$result_dir/${parent1}_${parent2}_all.vcf”

#生成N掩盖基因组 cd “$result_dir” snp_split_path=“/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit_genome_preparation” $snp_split_path \ –vcf_file “$result_dir/${parent1}_${parent2}_all.vcf” \ –strain “$parent1” \ –dual_hybrid \ –strain2 “$parent2” \ –reference_genome “$reference_genome_dir” \ –genome_build Human # 这里需要根据实际情况更改 #整理文件 dual_hybrid_dir=“${parent1}_${parent2}_dual_hybrid.based_on_Human_N-masked” cd “$result_dir/$dual_hybrid_dir” cat *.fa > all_N-masked.Human.N-masked.fa

# 构建索引新基因组 hisat2_build_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2-build” “$hisat2_build_path” \ “$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked.fa” \ “$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked”

2.2 比对排序

#比对

hisat2_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2” samtools_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools” cd “$bam_dir” “$hisat2_path” -x “$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked” \ -p 8 –dta -t –phred33 \ -1 “$fq_dir/$(basename “$fq1”)” \ -2 “$fq_dir/$(basename “$fq2”)” \ –no-softclip \ –un-conc-gz “${fq_name}.unmap.fq.gz” \ 2> “${fq_name}_align.log” | \ “$samtools_path” sort -O BAM –threads 8 -o “${fq_name}.bam” -

#拆分BAM文件 snp_split_tool=“/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit” “$snp_split_tool” \ –snp_file “$result_dir/${parent1}_${parent2}_all_SNPs_${parent1}_reference.based_on_Human.txt” \ -o “$sort_bam_dir” \ –paired –no_sort –singletons “${fq_name}.bam”

2.4 转录组的定量以及画图

export PYTHONPATH=“” unset PERL5LIB export PATH=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/Quant/bin:$PATH” export PATH=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/r_3.5.0/bin:$PATH” ln -s “$gtf_file” “$reference_genome_dir” ln -s “$gene_file” “$reference_genome_dir” cd “$quant_dir”

# 生成feature_count

featureCounts_path=“/path/to/featureCounts” # 请替换为实际路径,例如 /usr/bin/featureCounts featureCounts -T 8 -F GTF -t exon -g gene_id -s 0 -Q 20 -C -B -p -a \ “$reference_genome_dir/genome.gtf” \ -o “$quant_dir/feature_count.xls” \ “$sort_bam_dir/${fq_name}.genome1.bam” \ “$sort_bam_dir/${fq_name}.genome2.bam” # 生成样本信息表 get_condition_info=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_condition_info” “$get_condition_info” \ –s2g “${fq_name}.genome1,${fq_name}.genome2” \ –group “$parent1,$parent2” \ –outfile “$quant_dir/condition.xls”

# 生成count

get_gene_count=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_count” “$get_gene_count” \ –sample “${fq_name}.genome1,${fq_name}.genome2” \ –quant featureCounts \ –count “$quant_dir/feature_count.xls” \ –gene “$reference_genome_dir/genome_gene.xls” \ –outfile “$quant_dir/gene_count.xls” # 生成FPKM get_gene_fpkm=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_fpkm” “$get_gene_fpkm” \ –count “$quant_dir/gene_count.xls” \ –condition “$quant_dir/condition.xls” \ –outfile “$quant_dir/gene_fpkm.xls”

# 绘图

Rscript_path=“Rscript” # 请根据实际情况修改 Rscript_script=“/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/shi_1.R” # 确保脚本路径正确 “$Rscript_path” “$Rscript_script” \ –inputfile “$quant_dir/gene_count.xls” \ –outfile “$quant_dir/retu.pdf”

参考脚本路径:/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/SNP_spilt_1.sh

脚本帮助信息:

sh SNP_spilt_1.sh –help 用法: SNP_spilt_1.sh –genome_fa <基因组文件路径> –gtf <gtf文件路径> –gene_file <gene文件路径> –vcf1 <父本的vcf文件路径> –vcf2 <母本的vcf文件路径> –parent1 <父本名称> –parent2 <母本名称> –fq1 <fq文件1路径> –fq2 <fq文件2路径>

  1. -fq_name <fq文件名称> –output_dir <输出文件夹路径>

脚本使用实例:

sh SNP_spilt_1.sh –genome_fa /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/genome.fa\ –gtf /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome.gtf\ –gene_file /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome_gene.xls\ –vcf1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F1_SNP.vcf\ –vcf2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F2_SNP.vcf \ –parent1 F1 \ –parent2 F2 \ –fq1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_1.fq.gz \ –fq2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_2.fq.gz \ –fq_name T1 \ –output_dir /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/ceshi

输出结果: 1.根据双亲snp比对后生成的bam文件:

T1.genome2.bam T1.genome2.bam

2.gene_count文件 gene_count.xls

3.特异性热图 retu.pdf retu.png

snpspilt.txt · 最后更改: 2024/10/12 14:43 由 zhengliuchang