===== RNA Editing =====

The script and an example command are as follows:

  /TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py

  python /TJPROJ6/RNA_SH/script_dir/REDItools/REDItools_predict.py \
      --input /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/bam \
      --fa /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.fa \
      --gtf /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001/04.Ref/genome.gtf \
      --sample NC1,NC2,NC3,SI1,SI2,SI3 \
      --outdir /TJPROJ6/RNA_SH/shouhou/202303/X101SC22122814-Z01-J001

We use REDItools for RNA editing analysis. Its distinguishing feature is that it can call RNA editing sites without matched DNA data. The output format is as follows.

Column definitions for outTable_*:

  * Region: chromosome on which the RNA editing site is located
  * Position: coordinate of the RNA editing site
  * Reference: reference base at this site
  * Strand: strand of the reference sequence at this site; 0 means '+', 1 means '-', 2 means unknown
  * Coverage-q25: base coverage at this site (quality score >= 25)
  * MeanQ: mean Phred quality score of all bases at this site
  * BaseCount[A,C,G,T]: number of reads supporting A, C, G and T at this site, in that order, separated by commas
  * AllSubs: RNA editing type; '-' means no RNA editing was observed
  * Frequency: frequency of RNA editing
  * Pvalue: p-value computed with Fisher's test; when Pvalue < 0.05 the site is considered a statistically significant RNA editing site

===Additional columns when DNA data is supplied===

  * gCoverage-q25: DNA data, base coverage at this site (quality score >= 25)
  * gMeanQ: DNA data, mean Phred quality score of all bases at this site
  * gBaseCount[A,C,G,T]: DNA data, number of reads supporting A, C, G and T at this site, in that order, separated by commas
  * gAllSubs: DNA data, mutation type; '-' means no mutation was observed
  * gFrequency: DNA data, mutation frequency

We run downstream analyses on the REDItools output. The related scripts are located at:

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit
  /BJPROJ/RNA_SH/script_dir/rna_edit

==RNA editing type identification and plotting==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_type_dis.pl
  ==============================================================================
  Description     to get rna editing type distribution
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -sample   : sample name,split by ","
      -h|?|help : Show this help
  ==============================================================================

==Summary of editing site information and per-sample, per-site editing level heatmap==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/get_summary_information_new.pl
  ==============================================================================
  Description     to get summary edit sites information
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -sample   : sample name,split by ","
      -h|?|help : Show this help
  ==============================================================================

==Identifying A->I sites==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_AtoI_site.pl
  ==============================================================================
  Description     to get A->G editing site
                  a site is called an A->G site if A->G is observed in at least n samples
  writer: liuxunbiao@novogene.com
  Options
      -dir      : outtable
      -outdir   : pathway of outdir
      -sample   : sample name,split by ","
      -n        : min number of predicted editing sites were constitutively transcribed from all sample [3]
      -S        : yes or no split by strand? [yes]
      -h|?|help : Show this help
  ==============================================================================

==Finding editing sites shared among biological replicates==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/detect_same_site.pl
  ==============================================================================
  Description     to get same editing site
                  a site is called an A->G site if A->G is observed in at least n samples
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -sample   : sample name,split by ","
      -n        : min number of predicted editing sites were constitutively transcribed from all sample [3]
      -S        : yes or no split by strand? [yes]
      -h|?|help : Show this help
  ==============================================================================

==Distribution of editing sites along chromosomes==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/rna_edit_chr_dis.pl
  ==============================================================================
  Description     to get rna editing type distribution
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -bin      : size of bin
      -chr      : split by ","
      -sample   : sample name,split by ","
      -fa       : sample name,split by ","
      -h|?|help : Show this help
  ==============================================================================

==Determining whether editing sites cause synonymous or nonsynonymous changes==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/non_synonymous.pl
  ==============================================================================
  Description     to get rna editing type distribution
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -sample   : sample name,split by ","
      -gtf      : should have CDS line
      -fa       : sample name,split by ","
      -h|?|help : Show this help
  ==============================================================================

==Identifying RNA editing clusters==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/edit_box_detect.pl
  ==============================================================================
  Description     to get rna editing type distribution
  writer: liuxunbiao@novogene.com
  Options
      -file     : edit_site.xls
      -n        : min number in edit box
      -len      : min length of edit box
      -dis      : min distance of adjacent box
      -outdir   : pathway of outdir
      -h|?|help : Show this help
  ==============================================================================

==Distribution of editing sites across gene functional regions==

  /TJPROJ1/RNA/shouhou/script_dir/ref/rna_edit/genomic_region_dis.pl
  ==============================================================================
  Description     to get rna editing type distribution
  writer: liuxunbiao@novogene.com
  Options
      -dir      : dir of outtable
      -outdir   : pathway of outdir
      -fa       : fa file
      -gtf      : gtf file
      -format   : gtf file format:gtf or gff etc [gtf]
      -sample   : sample name,split by ","
      -h|?|help : Show this help
  ==============================================================================

===References===

  * Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences. BMC Genomics
  * Comprehensive analysis of RNA-seq data reveals extensive RNA editing in a human transcriptome. Nature Biotechnology
  * RNA Editome in Rhesus Macaque Shaped by Purifying Selection. PLOS Genetics

Script for organizing the results:

  python /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/RNAediting/get_result.py --help
  usage: get_result.py [opthions]
  generate result for RNAediting analysis
  optional arguments:
    -h, --help            show this help message and exit
    -S SAMPLES, --samples SAMPLES
                          samples, split by ,
    -IN INPUT_DIR, --input_dir INPUT_DIR
                          samples, split by ,
    -OUT OUTPUT_DIR, --output_dir OUTPUT_DIR
                          samples, split by ,

readme:

outTable: identification of RNA editing sites. We use the widely adopted REDItools software to identify RNA editing sites.

Result files:

  * *outTable_result.xls

Column definitions:

  * Region: chromosome on which the RNA editing site is located.
  * Position: coordinate of the RNA editing site.
  * Reference: reference base at this site.
  * Strand: strand of the reference sequence at this site; 0 means '+', 1 means '-', 2 means unknown.
  * Coverage-q30: base coverage at this site (quality score >= 30).
  * MeanQ: mean Phred quality score of all bases at this site.
  * BaseCount[A,C,G,T]: number of reads supporting A, C, G and T at this site, in that order, separated by commas.
  * AllSubs: RNA editing type; '-' means no RNA editing was observed.
  * Frequency: frequency of RNA editing.
  * Pvalue: p-value of the RNA editing event.

A_to_I: identification of A->I editing sites. A site is considered to carry an A->I RNA editing event when the A->I change is observed at that site in three or more samples. Because inosine is read as G during sequencing, these sites appear as A->G substitutions in the tables. We provide all A->I editing sites to the customer.

Result files:

  * merge.edit_site.xls

Column definitions:

  * Column 1: chromosome
  * Column 2: editing site position
  * Column 3: reference genome base
  * Column 4: strand; 0 means '+', 1 means '-', 2 means unknown.
  * Column 5: editing type and the number of times it was observed

editing_type_dis_hist: distribution of RNA editing types. In these plots the x-axis shows the editing types and the y-axis shows the number of sites of each type. Three plots are provided per sample: plus strand, minus strand, and both strands combined.

Result files:

  * *editing_type_dis_hist.png
  * *editing_type_dis_hist.pdf
  * *stand_summary_hist.png
  * *stand_summary_hist.pdf

non_synonymous: analysis of synonymous and nonsynonymous changes caused by editing sites.

Result files:

  * *info.txt

Column definitions:

  * #chromosome: chromosome
  * cordination: position of the mutated site
  * raf>alt: reference base > mutated base
  * transcript_id: transcript ID
  * codon_phase: codon phase
  * codon_mutate: codon before and after the mutation
  * aa_mutate: amino acid before and after the mutation
  * synonymous: synonymous mutation
  * nonsynonymous: nonsynonymous mutation

Genomic_regions_distribution: distribution of RNA editing sites across gene functional regions.

Result files:

  * *editing_in_different_region.png
  * *editing_in_different_region.pdf

edit_box: editing cluster analysis. Cluster analysis locates chromosomal regions where editing sites are relatively concentrated. Cluster criteria: each cluster contains at least 5 RNA editing sites; adjacent clusters are at least 50 bp apart; each cluster is at least 20 bp long.

Result files:

  * edit_box.xls

Column definitions:

  * Column 1: chromosome.
  * Column 2: cluster interval.
  * Column 3: editing sites within the cluster.
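The cluster criteria stated above (at least 5 sites, at least 50 bp between adjacent clusters, at least 20 bp long) can be illustrated with a minimal Python sketch. This is not the code of edit_box_detect.pl, whose implementation is not shown here. The function name `find_edit_clusters`, its parameters, and the interpretation that a gap of `min_gap` bp or more between consecutive sites starts a new cluster are all assumptions made for illustration.

```python
from collections import defaultdict


def find_edit_clusters(sites, min_sites=5, min_len=20, min_gap=50):
    """Group editing sites into clusters (hypothetical helper, not the pipeline's code).

    sites: iterable of (chromosome, position) tuples.
    Assumed rule: consecutive sites on a chromosome belong to the same cluster
    while the gap between them is below min_gap; a cluster is reported only if
    it has at least min_sites sites and spans at least min_len bp.
    Returns a list of (chromosome, start, end, positions).
    """
    by_chrom = defaultdict(set)
    for chrom, pos in sites:
        by_chrom[chrom].add(pos)

    clusters = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        groups = []
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] >= min_gap:
                # Gap is large enough: close the current group, start a new one.
                groups.append(current)
                current = [pos]
            else:
                current.append(pos)
        groups.append(current)

        for group in groups:
            span = group[-1] - group[0] + 1
            if len(group) >= min_sites and span >= min_len:
                clusters.append((chrom, group[0], group[-1], group))
    return clusters


# Made-up positions, for illustration only.
demo_sites = [("chr1", p) for p in (100, 105, 110, 118, 125, 300, 310)]
demo_sites += [("chr2", p) for p in (10, 11, 12, 13, 14)]
for chrom, start, end, positions in find_edit_clusters(demo_sites):
    print(chrom, f"{start}-{end}", ",".join(map(str, positions)))
```

In the demo, the two chr1 sites at 300 and 310 are too few to form a cluster, and the five chr2 sites span only 5 bp, so only the chr1 region 100-125 is reported. The three printed fields mirror the three columns described for edit_box.xls (chromosome, cluster interval, sites within the cluster).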