脚本如下: /TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py
python /TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py --input /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/bam --fa /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.fa --gtf /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.gtf --sample NC1,NC2,NC3,SI1,SI2,SI3 --outdir /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001
我们使用REDItools软件进行RNA editing的分析,其特点是可以不依赖DNA数据进行RNA editing分析,输出结果格式如下:
outTable_* 表头含义
Region:RNA editing位点所在染色体
Position:RNA editing位点坐标
Reference:参考序列在该位点的坐标
Strand:参考序列在该位点的链方向,0表示'+',1表示'-',2表示未知
Coverage-q25:该位点的碱基覆盖度(质量值>=25)
MeanQ:该位点上所有碱基的Qpred的均值
BaseCount[A,C,G,T]:依次为A,C,G,T类型的碱基在该位点的reads数,用逗号隔开
AllSubs:RNA editing类型,'-'表示未发生RNA editing
Frequency:发生RNA editing的频率
Pvalue:使用Fisher检验计算出的p-value,当Pvalue<0.05时,我们认为该位点是统计学意义上的RNA editing位点
gCoverage-q25: DNA数据,该位点的碱基覆盖度(质量值>=25)
gMeanQ: DNA数据,该位点上所有碱基的Qpred的均值
gBaseCount[A,C,G,T]: DNA数据,依次为A,C,G,T类型的碱基在该位点的reads数,用逗号隔开
gAllSubs: DNA数据,突变类型'-'表示未发生突变
gFrequency: DNA数据,发生突变的频率
对于REDItools的输出结果,我们进行后续分析,相关脚本路径如下:
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit /BJPROJ/RNA_SH/script_dir/rna_edit
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_type_dis.pl
============================================================================== Description to get rna editing type distribution writer: liuxunbiao@novogene.com Options -dir <s>: dir of outtable - outdir <s>: pathway of outdir -sample <s>:sample name,split by "," -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/get_summary_information_new.pl
============================================================================== Description to get summary edit sites information writer: liuxunbiao@novogene.com Options -dir <s>: dir of outtable -outdir <s>: pathway of outdir -sample <s>:sample name,split by "," -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_AtoI_site.pl
============================================================================== Description to get A->G editing site 若至少在n个样本中出现A->G,则认为该位点为A->G位点 writer: liuxunbiao@novogene.com Options -dir <s>: outtable -outdir <s>: pathway of outdir -sample <s>:sample name,split by "," -n <s>: min number of predicted editing sites were constitutively transcribed from all sample [3] -S <s>: yes or no split by strand? [yes] -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_same_site.pl
============================================================================== Description to get same editing site 若至少在n个样本中出现A->G,则认为该位点为A->G位点 writer: liuxunbiao@novogene.com Options -dir <s>: dir of outtable -outdir <s>: pathway of outdir -sample <s>:sample name,split by "," -n <s>: min number of predicted editing sites were constitutively transcribed from all sample [3] -S <s>: yes or no split by strand? [yes] -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_chr_dis.pl
============================================================================== Description to get rna editing type distribution writer :liuxuniao@novogene.com Options -dir <s>: dir of outtable -outdir <s>: pathway of outdir -bin <s>:size of bin -chr <s>: split by "," -sample <s>:sample name,split by "," -fa <s>:sample name,split by "," -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/non_synonymous.pl
============================================================================== Description to get rna editing type distribution writer: liuxunbiao@novogene.com Options -dir <s>: dir of outtable -outdir <s>: pathway of outdir -sample <s>:sample name,split by "," -gtf <s>:should have CDS line -fa <s>:sample name,split by "," -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/edit_box_detect.pl
============================================================================== Description to get rna editing type distribution writer: liuxunbiao@novogene.com Options -file <s>: edit_site.xls -n <s>:min number in edit box -len <s>:min length of edit box -dis <s>: min distance of adjacent box -outdir <s>: pathway of outdir -h|?|help : Show this help ==============================================================================
/TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/genomic_region_dis.pl
============================================================================== Description to get rna editing type distribution writer: liuxunbiao@novogene.com Options -dir <s>: dir of outtable -outdir <s>: pathway of outdir -fa <s>:fa file -gtf <s>:gtf file -format <s>:gtf file format:gtf or gff etc [gtf] -sample <s>:sample name,split by "," -h|?|help : Show this help ==============================================================================
Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences. BMC Genomics
Comprehensive analysis of RnA-seq data reveals extensive RnA editing in a human transcriptome. Nature Biotechnology
RNA Editome in Rhesus Macaque Shaped by Purifyin Selection. PLOS Genetics
result整理脚本:
python /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/RNAediting/get_result.py --help usage: get_result.py [opthions] <value> generate result for RNAediting analysis optional arguments: -h, --help show this help message and exit -S SAMPLES, --samples SAMPLES samples, split by , -IN INPUT_DIR, --input_dir INPUT_DIR samples, split by , -OUT OUTPUT_DIR, --output_dir OUTPUT_DIR samples, split by ,
readme:
outTable RNA 编辑位点鉴定: 我们使用主流的 REDItools 软件进行 RNA 编辑位点的鉴定。 结果如下: *outTable_result.xls 表头说明: Region:RNA editing 位点所在染色体。 Position:RNA editing 位点坐标。 Reference:参考序列在该位点的坐标。 Strand:参考序列在该位点的链方向,0 表示'+',1 表示'-',2 表示未知 。 Coverage-q30:该位点的碱基覆盖度(质量值>=30) 。 MeanQ:该位点上所有碱基的 Qpred 的均值 。 BaseCount[A,C,G,T]:依次为 A,C,G,T 类型的碱基在该位点的 reads 数,用逗号隔开。 AllSubs:RNA editing 类型,'-'表示未发生 RNA editing 。 Frequency:发生 RNA editing 的频率。 Pvalue:发生RNA editing的Pvalue值 A_to_I A-I 编辑位点的鉴定: 一般认为在三个样品以上某个位点均发生了 A->I 的变化,认为该位点有 A->I 的 RNA编辑事件发生。我们将为客户提供所有 A->I 编辑的位点。 结果如下: merge.edit_site.xls 表头说明: 第一列:染色体 第二列:编辑位点 第三列:参考基因组信息 第四列:正负链信息 0 表示'+',1 表示'-',2 表示未知 。 第五列:编辑类型,及出现的次数 editing_type_dis_hist RNA 编辑类型分布: 此图横坐标代表不同的编辑类型,纵坐标代表发生该编辑类型位点的个数,每个样品会提供三张图,分别是正链,负链,总的编辑类型分布。 结果如下: *editing_type_dis_hist.png *editing_type_dis_hist.pdf *stand_summary_hist.png *stand_summary_hist.pdf non_synonymous 编辑位点引起的同义突变与非同义突变分析: 结果如下: *info.txt 表头说明: #chromosome 染色体 cordination 突变位点 raf>alt 参考基因组 》突变碱基 transcript_id 转录本 ID codon_phase 密码子相位 codon_mutate 突变前后的密码子 aa_mutate 突变前后的蛋白 synonymous 同义突变 nonsynonymous 非同义突变 Genomic_regions_distribution RNA 编辑位点在基因不同功能域的分布: 结果如下: *editing_in_different_region.png *editing_in_different_region.pdf edit_box编辑簇分析: 我们通过编辑簇分析可以查找到染色体编辑位点相对集中的区域。编辑簇鉴定方法:编辑簇内 RNA 编辑位点的最小数目为 5; 相邻编辑簇的最低距离为 50bp;编辑簇最小长度为20bp。 结果如下: edit_box.xls 表头说明: 第一列:染色体。 第二列:编辑簇区间。 第三列:编辑簇内的编辑位点。