===== Alternative Polyadenylation (APA) Analysis Pipeline =====

==== Background ====

Alternative polyadenylation (APA) plays an important role in the regulation of gene expression. APA within the last exon produces mRNA isoforms with different 3'UTR lengths, which can affect the stability, translation efficiency, transport, and subcellular localization of messenger RNA (mRNA). Studies have linked APA to a range of key biological processes, including immune response, neuronal response, embryonic development, and tumorigenesis.

==== Analysis Method ====

Software: DaPars. For details, see the DaPars documentation: [[http://lilab.research.bcm.edu/dldcc-web/lilab/zheng/DaPars_Documentation/html/DaPars.html|DaPars]]

=== File Preparation ===

1. BED file (can be downloaded from the UCSC website)

BED file download link: [[http://genome.ucsc.edu/cgi-bin/hgTables|UCSC BED file download]]

{{:售后:ucsc_bed下载.png?1000|}}

BED file format (12 columns): [[http://genome.ucsc.edu/FAQ/FAQformat.html#format1|BED file format]]

  chr1 66999824 67210768 NM_032291 0 + 67000041 67208778 0 25 227,64,25,72,57,55,176,12,12,25,52,86,93,75,501,128,127,60,112,156,133,203,65,165,2013, 0,91705,98928,101802,105635,108668,109402,126371,133388,136853,137802,139139,142862,145536,147727,155006,156048,161292,185152,195122,199606,205193,206516,207130,208931,
  chr1 33546713 33585995 NM_052998 0 + 33547850 33585783 0 12 182,121,212,177,174,173,135,166,163,113,215,351, 0,275,488,1065,2841,10937,12169,13435,15594,16954,36789,38931,
  chr1 16767166 16786584 NM_001145278 0 + 16767256 16785385 0 8 104,101,105,82,109,178,76,1248, 0,2960,7198,7388,8421,11166,15146,18170,
  chr1 16767166 16786584 NM_001145277 0 + 16767256 16785491 0 7 182,101,105,82,109,178,1248, 0,2960,7198,7388,8421,11166,18170,
  chr1 8384389 8404227 NM_001080397 0 + 8384389 8404073 0 8 397,93,225,728,154,177,206,421, 0,968,1488,5879,11107,13486,15163,19417,
  chr1 16767166 16786584 NM_018090 0 + 16767256 16785385 0 8 182,101,105,82,109,178,76,1248, 0,2960,7198,7388,8421,11166,15146,18170,
  chr1 25071759 25170815 NM_013943 0 + 25072044 25167428 0 6 357,110,126,107,182,3552, 0,52473,68825,81741,94591,95504,
  chr1 48998526 50489626 NM_032785 0 - 48999844 50489468 0 14 1439,27,97,163,153,112,115,90,40,217,95,125,123,192, 0,2035,6787,54149,57978,101638,120482,130297,334336,512729,712915,1164458,1318541,1490908,

2. 
gene-name file (optional), which maps each transcript ID to a gene symbol

  #name name2
  NM_032291 SGIP1
  NM_052998 ADC
  NM_001145278 NECAP2
  NM_001145277 NECAP2
  NM_001080397 SLC45A1
  NM_018090 NECAP2
  NM_013943 CLIC4
  NM_032785 AGBL4
  NM_001195684 TGFBR3
  NM_001195683 TGFBR3
  NR_036634 TGFBR3
  NM_001918 DBT
  NM_003243 TGFBR3
  NM_030806 C1orf21
  NM_022457 RFWD2
  NM_001001740 RFWD2
  NM_021222 PRUNE

**3. Use the script to convert the two files above into the utr.bed file**

Script path: /TJPROJ1/RNA/shouhou/script_dir/other/Dapars/DaPars_Extract_Anno.py

usage:

  python /TJPROJ1/RNA/shouhou/script_dir/other/Dapars/DaPars_Extract_Anno.py -b hg19_refseq_whole_gene.bed -s hg19_4_19_2012_Refseq_id_from_UCSC.txt -o hg19_refseq_extracted_3UTR.bed

Format of the converted 3'UTR BED file:

  chr15 41795161 41795757 NM_002220|ITPKA|chr15|+ 0 +
  chr9 95473645 95477745 NM_001003800|BICD2|chr9|- 0 -
  chr19 50921099 50921275 NM_001308632|NA|chr19|+ 0 +
  chr6 44201154 44201888 NM_001304466|NA|chr6|+ 0 +
  chr10 126446400 126449072 NM_001304467|NA|chr10|- 0 -
  chr11 92623657 92629635 NM_001008781|FAT3|chr11|+ 0 +
  chr6 137245023 137246798 NM_001008783|SLC35D3|chr6|+ 0 +
  chr16 90061167 90063028 NR_003227|AFG3L1P|chr16|+ 0 +
  chr13 80910111 80911891 NM_001318537|NA|chr13|- 0 -
  chr14 101459573 101459646 NR_003224|SNORD114-31|chr14|+ 0 +
  chr14 101458256 101458326 NR_003223|SNORD114-30|chr14|+ 0 +
  chr14 101456428 101456496 NR_003222|SNORD114-29|chr14|+ 0 +
  chr14 101455467 101455537 NR_003221|SNORD114-28|chr14|+ 0 +
  chr14 101454498 101454566 NR_003220|SNORD114-27|chr14|+ 0 +
  chr1 146465878 146467744 NM_001278267|NA|chr1|+ 0 +
  chr16 8946799 8949183 NM_001278262|NA|chr16|- 0 -
  chr14 101391158 101391227 NR_003229|SNORD113-1|chr14|+ 0 +
  chr16 90066857 90067195 NR_003228|AFG3L1P|chr16|+ 0 +

4. Coverage files (converted from BAM files)

BAM to coverage. Note that genomeCoverageBed with -bg writes bedGraph output, not bigWig, even though the output files are named .wig here. The file genelength is the genome file passed to -g (chromosome names and lengths).

  genomeCoverageBed -bg -ibam CML_2.bam -g genelength -split > CML_2.wig
  genomeCoverageBed -bg -ibam APP_1.bam -g genelength -split > APP_1.wig

5. 
Run the second-step script.

Prepare the configuration file DaPars_test_data_configure.txt:

  Annotated_3UTR=utr.bed #utr.bed generated in the first step
  Group1_Tophat_aligned_Wig=CML_2.wig #wig file converted from the sample 1 BAM file
  Group2_Tophat_aligned_Wig=APP_1.wig #wig file converted from the sample 2 BAM file
  Output_directory=DaPars_Test_data/ #output directory
  Output_result_file=DaPars_Test_data
  #Parameters
  Num_least_in_group1=1
  Num_least_in_group2=1
  Coverage_cutoff=30
  FDR_cutoff=0.05
  PDUI_cutoff=0.5
  Fold_change_cutoff=0.59

Step 2 script path: /TJPROJ1/RNA/shouhou/script_dir/other/Dapars/DaPars_main.py

usage:

  python /TJPROJ1/RNA/shouhou/script_dir/other/Dapars/DaPars_main.py DaPars_test_data_configure.txt

Wrapper script that chains the steps above:

  python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/APAtrap/APA_prep.py \
    --bamdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/APA_analysis/bam \
    --samples C1,C2,C3,T1,T2,T3,T4,M1,M2,M3 \
    --groups T,C,M \
    --s2g T1:T2:T3:T4,C1:C2:C3,M1:M2:M3 \
    --compares TvsC,MvsC,TvsM \
    -software DaPars \
    --gtf /TJPROJ13/GB_TR/reference_data/Animal/Mus_musculus/Mus_musculus_Ensemble_94/Mus_musculus_Ensemble_94.gtf \
    --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/APA_analysis/DaPars

==== Result File Format ====

  Gene fit_value Predicted_Proximal_APA Loci A_1_long_exp A_1_short_exp A_1_PDUI B_1_long_exp B_1_short_exp B_1_PDUI Group_A_Mean_PDUI Group_B_Mean_PDUI PDUI_Group_diff P_val adjusted.P_val Pass_Filter
  MGG_16866|NA|chr3|- 194.0 4583410 chr3:4583310-4583667 111.08 22.44 0.83 201.32 42.75 0.82 0.83 0.82 0.01 0.886944721141 1.0 N
  MGG_01784|NA|chr2|- 145.7 4818859 chr2:4818744-4819128 64.91 0.11 1.00 39.25 7.17 0.85 1.0 0.85 0.15 0.0016820131859 0.00694232193925 N
  MGG_07736|NA|chr3|+ 68.1 5574813 chr3:5574612-5575100 NA NA NA 13.22 27.26 0.33 NA NA NA NA NA N
  MGG_04647|NA|chr2|- 1354.1 1128467 chr2:1128015-1128668 77.86 60.32 0.56 149.88 126.76 0.54 0.56 0.54 0.02 0.752815723012 0.955064601115 N

readme:

Alternative polyadenylation (APA) is an important pre-mRNA processing mechanism that is widespread in all eukaryotes. APA adds the poly(A) tail at different positions within the RNA
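3′UTR, which selectively changes the length of the 3′UTR. Because the 3′UTR contains many cis-regulatory elements, such as binding sites for miRNAs or RNA-binding proteins (RBPs), APA can, by controlling 3′UTR length, affect the stability and translation efficiency of the target mRNA as well as the cellular localization of the translated protein. In this way it fine-tunes gene expression and has a fundamental effect on a range of cellular processes such as proliferation, differentiation, and tumorigenesis. (Quoted from https://www.seqchina.cn/14138.html)

To assess APA events from RNA-seq data, researchers developed the DaPars algorithm. DaPars quantifies APA with the Percentage of Distal polyA site Usage Index (PDUI). PDUI ranges from 0 to 1: a value close to 1 means the gene is expressed mostly as long-3'UTR isoforms, and a value close to 0 means it is expressed mostly as short-3'UTR isoforms.

Columns of the result file:

  * Gene: gene name and location
  * fit_value: fit value of the regression model
  * Predicted_Proximal_APA: predicted proximal poly(A) site
  * Loci: 3'UTR region of the transcript
  * sample_long_exp (e.g. A_1_long_exp): expression at the distal poly(A) site
  * sample_short_exp (e.g. A_1_short_exp): expression at the proximal poly(A) site
  * sample_PDUI (e.g. A_1_PDUI): PDUI of the sample, in [0,1]; the closer to 1, the longer the 3'UTR
  * PDUI_Group_diff: difference between the mean PDUI values of the two groups being compared (Group_A_Mean_PDUI minus Group_B_Mean_PDUI)
  * P_val: P value of the significance test
  * adjusted.P_val: adjusted P value
  * Pass_Filter: whether the difference is significant (Y/N)

The parameters used in our analysis are:

  FDR_cutoff=0.05
  PDUI_cutoff=0.5
  Fold_change_cutoff=0.59

==== References ====

{{ :售后:可变ploya.pdf |Alternative polyadenylation paper}}

{{ :售后:cfim25_links_alternative_polyadenylation_to_glioblastoma_tumor_suppression.pdf |}}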
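
Supplementary sketch for the result file format described above. It shows how the PDUI column relates to the long and short expression columns, and how the three cutoffs could combine into Pass_Filter. The function names ''pdui'' and ''passes_filter'' are hypothetical and not part of DaPars. The filter logic (adjusted P value, absolute mean PDUI difference, absolute log2 ratio of mean PDUIs) is an assumption based on the parameter names, not a copy of the DaPars source.

```python
import math


def pdui(long_exp, short_exp):
    """Share of expression coming from the long (distal-site) isoform."""
    total = long_exp + short_exp
    if total == 0:
        return None  # no coverage, PDUI is undefined
    return long_exp / total


def passes_filter(mean_a, mean_b, adj_p,
                  fdr_cutoff=0.05, pdui_cutoff=0.5, fold_change_cutoff=0.59):
    """Assumed filter: all three cutoffs must be met (hypothetical helper)."""
    if mean_a is None or mean_b is None or adj_p is None:
        return False
    if adj_p > fdr_cutoff:
        return False
    if abs(mean_a - mean_b) < pdui_cutoff:
        return False
    if mean_a == 0 or mean_b == 0:
        # The difference already met the cutoff and the ratio is unbounded.
        return True
    return abs(math.log2(mean_a / mean_b)) >= fold_change_cutoff


# First row of the example table: A_1_long_exp=111.08, A_1_short_exp=22.44
print(round(pdui(111.08, 22.44), 2))  # 0.83, matching A_1_PDUI
# Second row: significant P value, but the PDUI difference (0.15) is too small
print(passes_filter(1.0, 0.85, 0.00694232193925))  # False, matching "N"
```

With the default Fold_change_cutoff of 0.59, the log2 ratio test corresponds to roughly a 1.5-fold change between the group mean PDUIs.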