=====可变剪切asprofile分析===== ====背景==== 项目可变剪切软件是RMARTs,分析的是差异可变剪切,asprofile可以分析单个样品的可变剪切事件,直接将同一个基因的多个转录本进行比较,从而可以鉴定可变剪切事件。\\ ASprofile软件对Cufflinks预测出的基因模型按照单个实验组中所有重复样品的数据和所有实验样品的数据进行分组,并对每组的可变剪切事件分别进行分类和表达量统计。\\ asprofile是以前项目流程用的软件,对原脚本进行串写整理。\\ ====脚本用法==== /TJPROJ6/RNA_SH/personal_dir/zhangxin/jiaoben/AS/asprofile.py \ -R /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Brachypodium_distachyum_L/Brachypodium_distachyum_L_ensemble_53/Brachypodium_distachyum_L_ensemble_53.fa \ -G /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Brachypodium_distachyum_L/Brachypodium_distachyum_L_ensemble_53/Brachypodium_distachyum_L_ensemble_53.gtf \ -goann /TJPROJ6/GB_TR/reference_data/new_pip/Plant/Brachypodium_distachyum_L/Brachypodium_distachyum_L_ensemble_53/Brachypodium_distachyum_L_ensemble_53_go.txt \ -i /TJPROJ6/NC_BG_SH/shouhou/202302/X101SC22061958-Z01-F001/QC/bam \ -o /TJPROJ6/NC_BG_SH/shouhou/202302/X101SC22061958-Z01-F002_ASprofile/outdir/ \ -s CK12h_1,CK12h_2,CK12h_3,CK24h_1,CK24h_2,CK24h_4,T58012h_1,T58012h_2,T58012h_3,T58024h_1,T58024h_2,T58024h_3,T58412h_1,T58412h_2,T58412h_3,T58424h_1,T58424h_2,T58424h_3 \ -group CK12h_1:CK12h_2:CK12h_3,T58012h_1:T58012h_2:T58012h_3,T58412h_1:T58412h_2:T58412h_3,CK24h_1:CK24h_2:CK24h_4,T58024h_1:T58024h_2:T58024h_3,T58424h_1:T58424h_2:T58424h_3 \ -groupname CK12h,T58012h,T58412h,CK24h,T58024h,T58424h \ -lib fr-unstranded \ ====结果说明==== 1、可预测12类可变剪切事件,定义如下:\\ (1) TSS: Alternative 5' first exon (transcription start site) 第一个外显子可变剪切 \\ (2) TTS: Alternative 3' last exon (transcription terminal site) 最后一个外显子可变剪切 \\ (3) SKIP: Skipped exon (SKIP_ON,SKIP_OFF pair) 单外显子跳跃 \\ (4) XSKIP: Approximate SKIP (XSKIP_ON,XSKIP_OFF pair) 单外显子跳跃(模糊边界)\\ (5) MSKIP: Multi-exon SKIP (MSKIP_ON,MSKIP_OFF pair) 多外显子跳跃 \\ (6) XMSKIP: Approximate MSKIP (XMSKIP_ON,XMSKIP_OFF pair) 多外显子跳跃(模糊边界)\\ (7) IR: Intron retention (IR_ON, IR_OFF pair) 单内含子滞留 \\ (8) XIR: Approximate IR (XIR_ON, XIR_OFF pair) 单内含子滞留(模糊边界)\\ (9) MIR: Multi-IR (MIR_ON, MIR_OFF pair) 多内含子滞留 \\ (10) XMIR: Approximate MIR (XMIR_ON, XMIR_OFF pair) 多内含子滞留(模糊边界)\\ (11) AE: Alternative exon ends (5', 3', or both) 可变 5'或3'端剪切 \\ (12) XAE: Approximate AE 可变 5'或3'端剪切(模糊边界)\\ 2、可变剪切种类以及数量统计图\\ {{:个性化条目:as.png?400|}}\\ 3、 *.fpkm.xls为样本可变剪切结构及表达量统计表\\ (1) event_id: AS事件编号 \\ (2) event_type: AS事件类型 (TSS, TTS, SKIP_{ON,OFF}, XSKIP_{ON,OFF}, MSKIP_{ON,OFF}, XMSKIP_{ON,OFF}, IR_{ON ,OFF}, XIR_{ON,OFF}, AE, XAE)\\ (3) gene_id: cufflink组装结果中的基因编号\\ (4) chrom: 染色体编号\\ (5) event_start: AS事件起始位置\\ (6) event_end: AS事件结束位置\\ (7) event_pattern: AS事件特征 (for TSS, TTS - inside boundary of alternative marginal exon; for *SKIP_ON,the coordinates of the skipped exon(s); for *SKIP_OFF, the coordinates of the enclosing introns; for *IR_ON, the end coordinates of the long, intron-containing exon; for *IR_OFF, the listing of coordinates of all the exons along the path containing the retained intron; for *AE, the coordinates of the exon variant)\\ (8) strand: 基因正负链信息\\ (9) fpkm: 此AS类型该基因表达量\\ (10) ref_id 参考基因组id \\ ====脚本==== #!/usr/bin/env python #coding:utf-8 import os import sys import argparse parser = argparse.ArgumentParser(description = 'asprofile') parser.add_argument('-R', help = 'fa', required = True) parser.add_argument('-G', help = 'gtf', required = True) #parser.add_argument('--type', help = 'seq file', required = True) parser.add_argument('-goann',help="go,not need,can be null",default='100') parser.add_argument('-i',help='bam',required = True) parser.add_argument('-o',help='outdir',required = True) parser.add_argument('-s', help = 'sample', required = True) parser.add_argument('-group', help = 'group') parser.add_argument('-groupname', help = 'groupname') parser.add_argument('-lib',help = 'libtype') argv = vars(parser.parse_args()) R = argv['R'].strip() G = argv['G'].strip() goann=argv['goann'].strip() i=argv['i'].strip() o=argv['o'].strip() s=argv['s'].strip() if argv['lib'] == None: lib='fr-unstranded' else: lib=argv['lib'].strip() group=argv['group'].strip() groupname=argv['groupname'].strip() os.system("mkdir -p CAN_TR/CAN") oo = o+"/CAN_TR/CAN" #print ('python /PUBLIC/source/RNA/RefRNA/version4/TR_Modules/runCAN_sjm_nosp.py -R %s -G %s -goann %s -i %s -o %s -s %s -lib %s -group %s -groupname %s '%(R,G,goann,i,o,s,lib,group,groupname)) os.system('python %s/bin/runCAN_sjm_nosp.py -R %s -G %s -goann %s -i %s -o %s -s %s -lib %s -group %s -groupname %s '%(sys.path[0],R,G,goann,i,oo,s,lib,group,groupname)) os.system('mkdir -p %s/log'%(o)) os.system('mkdir -p %s/results'%(o)) cmd='cp %s/asprofile_readme.txt %s/results\n'%(sys.path[0],o) cmd+='cp %s/CAN_TR/CAN/ASprofile/AS.* %s/results\n'%(o,o) cmd+='cp %s/CAN_TR/CAN/ASprofile/*/*fpkm.xls %s/results\n'%(o,o) log_out_dir = o+"/"+"log/" log=o+"/"+"CAN_TR/" open(log_out_dir+"result.sh","w").write(cmd) f_sjm=open(log_out_dir +"analysis.JOB", "w") for i in s.split(","): f_sjm.write("job_begin\n") f_sjm.write(" name " + "cufflink_" + str(i) + "\n") f_sjm.write(" sched_options -cwd -V -l vf=6g,p=6\n") f_sjm.write(" cmd sh " +log + "runCufflinks_"+str(i) + ".sh\n") f_sjm.write("job_end\n") f_sjm.write("\n") f_sjm.write("job_begin\n") f_sjm.write(" name runCuffmerge_Cuffcompare\n") f_sjm.write(" sched_options -cwd -V -l vf=6g,p=6\n") f_sjm.write(" cmd sh " + log+"runCuffmerge_Cuffcompare.sh\n") f_sjm.write("job_end\n") f_sjm.write("\n") f_sjm.write("job_begin\n") f_sjm.write(" name result\n") f_sjm.write(" cmd sh " + log_out_dir+"result.sh\n") f_sjm.write("job_end\n") for i in s.split(","): f_sjm.write("order runCuffmerge_Cuffcompare after " + "cufflink_" + str(i)+"\n") f_sjm.write("order result after " + "runCuffmerge_Cuffcompare" "\n") f_sjm.write('\nlog_dir %s'%(log_out_dir)) f_sjm.close() open(o+'/sjm_Analysis.sh','w').write('/PUBLIC/software/public/System/sjm-1.2.0/bin/sjm %s \n' %(log_out_dir +'analysis.JOB')) =====可变剪切asprofile分析,出现core文件===== 该软件针对ensemble数据,有参分析时,可能会染色体或者gene_id中会出现特殊字符,分析会出core文件。\\ 可以使用:/TJPROJ6/RNA_SH/script_dir/AS/asprofile_genome_prepare.py 整理参考基因组,重新比对生成bam文件进行as分析;\\ 分析结果使用:/TJPROJ6/RNA_SH/script_dir/AS/asprofile_result.py 整理,将改变的id重新对应到原来的参考基因组中。