===== RNA编辑 =====
脚本如下:
/TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py
python /TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py --input /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/bam --fa /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.fa --gtf /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.gtf --sample NC1,NC2,NC3,SI1,SI2,SI3 --outdir /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001
我们使用REDItools软件进行RNA editing的分析,其特点是可以不依赖DNA数据进行RNA editing分析,输出结果格式如下:
outTable_* 表头含义
Region:RNA editing位点所在染色体
Position:RNA editing位点坐标
Reference:参考序列在该位点的坐标
Strand:参考序列在该位点的链方向,0表示'+',1表示'-',2表示未知
Coverage-q25:该位点的碱基覆盖度(质量值>=25)
MeanQ:该位点上所有碱基的Qpred的均值
BaseCount[A,C,G,T]:依次为A,C,G,T类型的碱基在该位点的reads数,用逗号隔开
AllSubs:RNA editing类型,'-'表示未发生RNA editing
Frequency:发生RNA editing的频率
Pvalue:使用Fisher检验计算出的p-value,当Pvalue<0.05时,我们认为该位点是统计学意义上的RNA editing位点
===如果是加入了DNA数据,结果会多出几列===
gCoverage-q25: DNA数据,该位点的碱基覆盖度(质量值>=25)
gMeanQ: DNA数据,该位点上所有碱基的Qpred的均值
gBaseCount[A,C,G,T]: DNA数据,依次为A,C,G,T类型的碱基在该位点的reads数,用逗号隔开
gAllSubs: DNA数据,突变类型'-'表示未发生突变
gFrequency: DNA数据,发生突变的频率
对于REDItools的输出结果,我们进行后续分析,相关脚本路径如下:
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit
/BJPROJ/RNA_SH/script_dir/rna_edit
==RNA编辑类型鉴定及作图==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_type_dis.pl
==============================================================================
Description to get rna editing type distribution
writer: liuxunbiao@novogene.com
Options
-dir : dir of outtable
- outdir : pathway of outdir
-sample :sample name,split by ","
-h|?|help : Show this help
==============================================================================
==编辑位点信息汇总及各样品各位点编辑水平热图绘制==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/get_summary_information_new.pl
==============================================================================
Description to get summary edit sites information
writer: liuxunbiao@novogene.com
Options
-dir : dir of outtable
-outdir : pathway of outdir
-sample :sample name,split by ","
-h|?|help : Show this help
==============================================================================
==鉴定A->I位点==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_AtoI_site.pl
==============================================================================
Description to get A->G editing site 若至少在n个样本中出现A->G,则认为该位点为A->G位点
writer: liuxunbiao@novogene.com
Options
-dir : outtable
-outdir : pathway of outdir
-sample :sample name,split by ","
-n : min number of predicted editing sites were constitutively transcribed from all sample [3]
-S : yes or no split by strand? [yes]
-h|?|help : Show this help
==============================================================================
==查找生物学重复中相同编辑位点==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_same_site.pl
==============================================================================
Description to get same editing site 若至少在n个样本中出现A->G,则认为该位点为A->G位点
writer: liuxunbiao@novogene.com
Options
-dir : dir of outtable
-outdir : pathway of outdir
-sample :sample name,split by ","
-n : min number of predicted editing sites were constitutively transcribed from all sample [3]
-S : yes or no split by strand? [yes]
-h|?|help : Show this help
==============================================================================
==编辑位点在染色体上的分布==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_chr_dis.pl
==============================================================================
Description to get rna editing type distribution
writer :liuxuniao@novogene.com
Options
-dir : dir of outtable
-outdir : pathway of outdir
-bin :size of bin
-chr : split by ","
-sample :sample name,split by ","
-fa :sample name,split by ","
-h|?|help : Show this help
==============================================================================
==鉴定编辑位点是否为有义突变==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/non_synonymous.pl
==============================================================================
Description to get rna editing type distribution
writer: liuxunbiao@novogene.com
Options
-dir : dir of outtable
-outdir : pathway of outdir
-sample :sample name,split by ","
-gtf :should have CDS line
-fa :sample name,split by ","
-h|?|help : Show this help
==============================================================================
==鉴定RNA编辑簇==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/edit_box_detect.pl
==============================================================================
Description to get rna editing type distribution
writer: liuxunbiao@novogene.com
Options
-file : edit_site.xls
-n :min number in edit box
-len :min length of edit box
-dis : min distance of adjacent box
-outdir : pathway of outdir
-h|?|help : Show this help
==============================================================================
==编辑位点在基因功能域的分布==
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/genomic_region_dis.pl
==============================================================================
Description to get rna editing type distribution
writer: liuxunbiao@novogene.com
Options
-dir : dir of outtable
-outdir : pathway of outdir
-fa :fa file
-gtf :gtf file
-format :gtf file format:gtf or gff etc [gtf]
-sample :sample name,split by ","
-h|?|help : Show this help
==============================================================================
===参考文献===
Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences. BMC Genomics
Comprehensive analysis of RnA-seq data reveals extensive RnA editing in a human transcriptome. Nature Biotechnology
RNA Editome in Rhesus Macaque Shaped by Purifyin Selection. PLOS Genetics
result整理脚本:
python /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/RNAediting/get_result.py --help
usage: get_result.py [opthions]
generate result for RNAediting analysis
optional arguments:
-h, --help show this help message and exit
-S SAMPLES, --samples SAMPLES
samples, split by ,
-IN INPUT_DIR, --input_dir INPUT_DIR
samples, split by ,
-OUT OUTPUT_DIR, --output_dir OUTPUT_DIR
samples, split by ,
readme:
outTable RNA 编辑位点鉴定:
我们使用主流的 REDItools 软件进行 RNA 编辑位点的鉴定。
结果如下:
*outTable_result.xls
表头说明:
Region:RNA editing 位点所在染色体。
Position:RNA editing 位点坐标。
Reference:参考序列在该位点的坐标。
Strand:参考序列在该位点的链方向,0 表示'+',1 表示'-',2 表示未知 。
Coverage-q30:该位点的碱基覆盖度(质量值>=30) 。
MeanQ:该位点上所有碱基的 Qpred 的均值 。
BaseCount[A,C,G,T]:依次为 A,C,G,T 类型的碱基在该位点的 reads 数,用逗号隔开。
AllSubs:RNA editing 类型,'-'表示未发生 RNA editing 。
Frequency:发生 RNA editing 的频率。
Pvalue:发生RNA editing的Pvalue值
A_to_I A-I 编辑位点的鉴定:
一般认为在三个样品以上某个位点均发生了 A->I 的变化,认为该位点有 A->I 的 RNA编辑事件发生。我们将为客户提供所有 A->I 编辑的位点。
结果如下:
merge.edit_site.xls
表头说明:
第一列:染色体
第二列:编辑位点
第三列:参考基因组信息
第四列:正负链信息 0 表示'+',1 表示'-',2 表示未知 。
第五列:编辑类型,及出现的次数
editing_type_dis_hist RNA 编辑类型分布:
此图横坐标代表不同的编辑类型,纵坐标代表发生该编辑类型位点的个数,每个样品会提供三张图,分别是正链,负链,总的编辑类型分布。
结果如下:
*editing_type_dis_hist.png
*editing_type_dis_hist.pdf
*stand_summary_hist.png
*stand_summary_hist.pdf
non_synonymous 编辑位点引起的同义突变与非同义突变分析:
结果如下:
*info.txt
表头说明:
#chromosome 染色体
cordination 突变位点
raf>alt 参考基因组 》突变碱基
transcript_id 转录本 ID
codon_phase 密码子相位
codon_mutate 突变前后的密码子
aa_mutate 突变前后的蛋白
synonymous 同义突变
nonsynonymous 非同义突变
Genomic_regions_distribution RNA 编辑位点在基因不同功能域的分布:
结果如下:
*editing_in_different_region.png
*editing_in_different_region.pdf
edit_box编辑簇分析:
我们通过编辑簇分析可以查找到染色体编辑位点相对集中的区域。编辑簇鉴定方法:编辑簇内 RNA 编辑位点的最小数目为 5; 相邻编辑簇的最低距离为 50bp;编辑簇最小长度为20bp。
结果如下:
edit_box.xls
表头说明:
第一列:染色体。
第二列:编辑簇区间。
第三列:编辑簇内的编辑位点。