一、SNP_spilt(鉴定亲本的等位基因在子代中特异性表达情况)
1.简介
2016年,Babraham生物信息组织发布了一款专门用来区分亲本来源reads的软件SNPsplit,它通过SAM/BAM文件reads上覆盖的已知SNP位点信息, 能够将reads分配给其中一个等位基因。 SNPsplit软件只需要提供用来区分印记来源的SNP信息,就可以针对生信常用软件(包括Bowtie2, TopHat, STAR, HISAT2, HiCUP和Bismark)比对后的bam区分其亲本来源。 参考链接:https://felixkrueger.github.io/SNPsplit/ https://github.com/FelixKrueger/SNPsplit https://f1000research.com/articles/5-1479/v2?s3BucketUrl=https%3A%2F%2Ff1000research.s3.amazonaws.com>mKey=GTM-PCBS9JK&submissionUrl=%2Ffor-authors%2Fpublish-your-research&otid=1bc074d1-3db4-47ed-9f80-df1a4a3f2ab4&immUserUrl=https%3A%2F%2Ff1r-proxy.f1krdev.com%2Feditor%2Fmember%2Fshow%2F
2.其分析过程如下
2.1 vcf的合并以及参考基因组的构建
#生成索引等操作
samtools_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools”
picard_jar=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/share/picard-2.23.3-0/picard.jar”
“$samtools_path” faidx “$reference_genome_dir/reference_genome.fa”
java -jar “$picard_jar” CreateSequenceDictionary \
R=“$reference_genome_dir/reference_genome.fa” \
O=“$reference_genome_dir/reference_genome.dict”
#合并vcf
gatk_path=“/PUBLIC/software/RNA/GATK/GenomeAnalysisTK.jar”
java -XX:ParallelGCThreads=4 -Xmx20g -jar “$gatk_path” \
-T CombineVariants \
-R “$reference_genome_dir/reference_genome.fa” \
-V “$parent_vcf_dir/${parent1}_SNP.vcf” \
-V “$parent_vcf_dir/${parent2}_SNP.vcf” \
-o “$result_dir/${parent1}_${parent2}_all.vcf”
#生成N掩盖基因组
cd “$result_dir”
snp_split_path=“/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit_genome_preparation”
$snp_split_path \
–vcf_file “$result_dir/${parent1}_${parent2}_all.vcf” \
–strain “$parent1” \
–dual_hybrid \
–strain2 “$parent2” \
–reference_genome “$reference_genome_dir” \
–genome_build Human # 这里需要根据实际情况更改
#整理文件
dual_hybrid_dir=“${parent1}_${parent2}_dual_hybrid.based_on_Human_N-masked”
cd “$result_dir/$dual_hybrid_dir”
cat *.fa > all_N-masked.Human.N-masked.fa
# 构建索引新基因组
hisat2_build_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2-build”
“$hisat2_build_path” \
“$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked.fa” \
“$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked”
2.2 比对排序
#比对
hisat2_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/hisat2”
samtools_path=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/QC/bin/samtools”
cd “$bam_dir”
“$hisat2_path” -x “$result_dir/$dual_hybrid_dir/all_N-masked.Human.N-masked” \
-p 8 –dta -t –phred33 \
-1 “$fq_dir/$(basename “$fq1”)” \
-2 “$fq_dir/$(basename “$fq2”)” \
–no-softclip \
–un-conc-gz “${fq_name}.unmap.fq.gz” \
2> “${fq_name}_align.log” | \
“$samtools_path” sort -O BAM –threads 8 -o “${fq_name}.bam” -
#拆分BAM文件
snp_split_tool=“/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/SNPsplit/bin/SNPsplit”
“$snp_split_tool” \
–snp_file “$result_dir/${parent1}_${parent2}_all_SNPs_${parent1}_reference.based_on_Human.txt” \
-o “$sort_bam_dir” \
–paired –no_sort –singletons “${fq_name}.bam”
2.4 转录组的定量以及画图
export PYTHONPATH=“”
unset PERL5LIB
export PATH=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/Quant/bin:$PATH”
export PATH=“/TJPROJ2/GB/PUBLIC/software/GB_TR/mRNA/miniconda3/envs/r_3.5.0/bin:$PATH”
ln -s “$gtf_file” “$reference_genome_dir”
ln -s “$gene_file” “$reference_genome_dir”
cd “$quant_dir”
# 生成feature_count
featureCounts_path=“/path/to/featureCounts” # 请替换为实际路径,例如 /usr/bin/featureCounts
featureCounts -T 8 -F GTF -t exon -g gene_id -s 0 -Q 20 -C -B -p -a \
“$reference_genome_dir/genome.gtf” \
-o “$quant_dir/feature_count.xls” \
“$sort_bam_dir/${fq_name}.genome1.bam” \
“$sort_bam_dir/${fq_name}.genome2.bam”
# 生成样本信息表
get_condition_info=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_condition_info”
“$get_condition_info” \
–s2g “${fq_name}.genome1,${fq_name}.genome2” \
–group “$parent1,$parent2” \
–outfile “$quant_dir/condition.xls”
# 生成count
get_gene_count=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_count”
“$get_gene_count” \
–sample “${fq_name}.genome1,${fq_name}.genome2” \
–quant featureCounts \
–count “$quant_dir/feature_count.xls” \
–gene “$reference_genome_dir/genome_gene.xls” \
–outfile “$quant_dir/gene_count.xls”
# 生成FPKM
get_gene_fpkm=“/TJPROJ7/GB_TR/PUBLIC/source/ncRNA/gb_tr_man_pipline/bin/get_gene_fpkm”
“$get_gene_fpkm” \
–count “$quant_dir/gene_count.xls” \
–condition “$quant_dir/condition.xls” \
–outfile “$quant_dir/gene_fpkm.xls”
# 绘图
Rscript_path=“Rscript” # 请根据实际情况修改
Rscript_script=“/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/shi_1.R” # 确保脚本路径正确
“$Rscript_path” “$Rscript_script” \
–inputfile “$quant_dir/gene_count.xls” \
–outfile “$quant_dir/retu.pdf”
参考脚本路径:/TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/SNP_spilt_1.sh
脚本帮助信息:
sh SNP_spilt_1.sh –help 用法: SNP_spilt_1.sh –genome_fa <基因组文件路径> –gtf <gtf文件路径> –gene_file <gene文件路径> –vcf1 <父本的vcf文件路径> –vcf2 <母本的vcf文件路径> –parent1 <父本名称> –parent2 <母本名称> –fq1 <fq文件1路径> –fq2 <fq文件2路径>
脚本使用实例:
sh SNP_spilt_1.sh –genome_fa /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/genome.fa\
–gtf /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome.gtf\
–gene_file /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/reference_genome/genome_gene.xls\
–vcf1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F1_SNP.vcf\
–vcf2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/pepline_vcf/F2_SNP.vcf \
–parent1 F1 \
–parent2 F2 \
–fq1 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_1.fq.gz \
–fq2 /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/data/fq.gz/T1/T1_2.fq.gz \
–fq_name T1 \
–output_dir /TJPROJ6/RNA_SH/personal_dir/zhengliuchang/yanfa2/ceshi
输出结果: 1.根据双亲snp比对后生成的bam文件:
T1.genome2.bam T1.genome2.bam
2.gene_count文件 gene_count.xls
3.特异性热图 retu.pdf retu.png