====== TE Methylation Analysis: Approach and Collected Result-Figure Formats ======

===== Main TE methylation analysis pipeline script =====

<code>
/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/00.Pipline/TEs_Analysis_Pipeline_V1.pl
=========================================================================================
Description: TE Analysis Pipeline
Usage:
  --preparedir   prepare dir                                         Required
  --group        sample names, like A,B... or A1:A2:A3,B1:B2:B3...   Required
  --groupname    group names, like A,B...                            Required
  --compare      compare strategy, like 1:2,2:4                      Required
  --fpkm         fpkm file                                           Required
  --fa           fasta file                                          Required
  --gtf          gtf file                                            Required
  --species      KEGG species, like ath                              Required
  --goann        GO file                                             Required
=========================================================================================
</code>

===== TE methylation analysis module scripts =====

/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module

===== TE methylation analysis modules =====

  - Data preparation
  - TE methylation level display
  - Expression of genes with or without TE insertions
  - Differential TE analysis
  - Enrichment analysis of genes associated with differential TEs

===== Module details =====

==== <1> Data preparation ====

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/0.prepare/associate_prepare.pl
=====================================================================================
Description: prepare BED files and CX files for TE analysis
Writer: guoxueyu@novogene.com
Date: 20180821
Usage:
  -mapdir      the WGBS 2.Map_Methy dir                          Required
  -beddir      the WGBS */Genome_Reg/all dir                     Required
  -methydir    the WGBS project dir                              Required
  -outdir      the output dir                                    Required
  -fai         *.fai file                                        Required
  -Biorepeat   biological replicates or not, default="yes"       Optional (yes|no)
  -group       a1:a2:a3,b1:b2:b3 or A1,B1, separated by ','      Required
  -groupname   A,B, separated by ','                             Required
  -compare     1:2,2:4, separated by ','                         Required
======================================================================================
</code>

==== <2> TE methylation level display ====

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/region_level.plot.V2.pl
=====================================================================================
Description: show the methylation of TEs and its subclasses
Writer: guoxueyu@novogene.com
Date: 20180823
Usage:
  --prepare_dir   0.prepare dir                  Required
  --groupname     A,B, separated by ','          Required
  --compare       1:2,2:4, separated by ','      Required
  --outdir        output dir                     Required
=====================================================================================
</code>

==== <3> Expression of genes with or without TE insertions ====

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/gene.withorwithout.TEs.pl
=====================================================================================
Description: show the methylation of TEs and its subclasses
Writer: guoxueyu@novogene.com
Date: 20180825
Usage:
  --prepare_dir   0.prepare dir                  Required
  --groupname     A,B, separated by ','          Required
  --compare       1:2,2:4, separated by ','      Required
  --fpkm          ref rowmeans.fpkm.xls          Required
  --outdir        output dir                     Required

######################## fpkm ##############################
#geneID   compare1   compare2   compare3   ...#
#genea    fpkma1     fpkma2     fpkma3     ...#
#geneb    fpkmb1     fpkmb2     fpkmb3     ...#
=====================================================================================
</code>

==== <4> Differential TE analysis ====

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/2.DMTEs/DMTE.pl
=====================================================================================
Description: show the methylation of TEs and its subclasses
Writer: guoxueyu@novogene.com
Date: 20180829
Usage:
  --prepare_dir   0.prepare dir                  Required
  --groupname     A,B, separated by ','          Required
  --compare       1:2,2:4, separated by ','      Required
  --outdir        output dir                     Required
======================================================================================
</code>

==== <5> Enrichment analysis of genes associated with differential TEs ====

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/GO_Enrichment.pre.pl
Description:
Usage:
  -preparedir   prepare dir                  Required
  -groupname    group names a,b,c            Required
  -compare      compare strategy 1:2         Required
  -outdir       out dir                      Required
  -go           go.txt                       Required
  -gtf          *.gtf                        Required
</code>

<code>
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/KEGG_Enrichment.pre.pl
Description:
Usage:
  -preparedir   prepare dir                  Required
  -groupname    group names a,b,c            Required
  -compare      compare strategy 1:2         Required
  -outdir       out dir                      Required
  -species      kegg species: ath            Required
</code>

===== TE methylation analysis approach =====

==== 1. TE classification ====

Classify TEs and convert them to BED format.\\
Method: transposon classification\\
TE count distribution\\
Script path:

==== 2. TE methylation level analysis ====

Overall TE methylation level distribution, including upstream and downstream regions\\
Methylation level distribution of each TE subclass, including upstream and downstream regions

==== 3. Methylation and expression levels of genes with TE insertions ====

Methylation and expression level distributions of genes with or without TE insertions in the gene body or flanking regions\\
Methylation and expression level distributions of genes with TE insertions at different positions

==== 4. Differential TE analysis ====

The analysis method is adapted from DMP analysis.

==== 5. Enrichment analysis of genes anchored by differential TEs ====

GO enrichment of genes anchored by differential TEs\\
KEGG enrichment of genes anchored by differential TEs

===== TE methylation analysis result figures =====

==== TE methylation level distribution ====

1. Methylation level distribution over TE bodies and their upstream and downstream regions\\
{{:产品:png1.png?400|}}\\
Average methylation level distribution over genes and TEs. The flanking regions are the same lengths as the gene or TE bodies.

2. Methylation level distribution over the body and flanking regions of the two major TE classes\\
{{:产品:f3e.png?400|}}\\
DNA methylation of TEs from the whole genome, gene body, and flanking 4-kb regions. Green asterisks indicate a higher methylation level in autotetraploid rice.

3. Methylation level distribution over the body and upstream/downstream regions of the TE subclasses within the two major classes\\
{{:产品:png3.png?400|}}\\
Average methylation level distribution over class I TEs. “−”, “B” and “+” denote the upstream, body and downstream regions of genes and TEs, respectively. The upstream and downstream flanking regions are the same lengths as the gene or TE body regions.

==== TE count distribution ====

1. Bar chart of the numbers of TE elements of different types\\
{{:产品:png2.png?400|}}\\
Numbers of differentially methylated TEs in ‘Qinguan’ or ‘Honeycrisp’ under water deficit.

==== TE methylation levels and expression levels of TE-inserted genes ====

1. Expression level distribution of genes with or without TE insertions in the gene body or flanking regions\\
{{:产品:f3b.png?400|}}\\
Expression level of genes with or without TE insertions. “Flank” means the 4-kb flanking region of the gene; “+” means TEs are inserted in this region; *P value < 0.05; **P value < 0.01.

2. Expression level distribution of genes with TE insertions at different positions\\
{{:产品:f3c.png?400|}}\\
Gene-expression level related to the distance to the closest TE. “0” indicates genes overlapping TEs in their body regions. The total gene number is 24,092. Error bars indicate SEM.

3. Expression level distribution of genes with different numbers of TE insertions in their flanking regions\\
{{:产品:f3d.png?400|}}\\
Gene-expression level related to the number of neighboring TEs.
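The three gene categories in the figures above can be sketched as follows: a TE in the body or in the 4-kb flank, the distance to the closest TE, and the number of neighboring TEs. This is a minimal illustration under stated assumptions, not the logic of gene.withorwithout.TEs.pl. Coordinates are assumed to be 0-based half-open as in BED, strand is ignored, and the flank is fixed at 4 kb as in the captions. The names `Interval`, `annotate_gene` and `mean_expression_by_class` are hypothetical. The sketch compares every gene with every TE. For genome-scale data, an interval tool such as `bedtools window` or `bedtools closest` is the practical choice.

```python
from dataclasses import dataclass

FLANK = 4000  # 4-kb flanking region, as in the figure captions (assumption for this sketch)


@dataclass
class Interval:
    chrom: str
    start: int      # 0-based, inclusive (BED convention)
    end: int        # exclusive
    name: str = ""  # hypothetical gene/TE identifier


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def distance(gene, te):
    """0 if the TE overlaps the gene body, otherwise the gap in bp between them."""
    if overlaps(gene.start, gene.end, te.start, te.end):
        return 0
    if te.end <= gene.start:
        return gene.start - te.end
    return te.start - gene.end


def annotate_gene(gene, tes, flank=FLANK):
    """Classify one gene by TE presence in its body and flanks (strand is ignored)."""
    same_chrom = [t for t in tes if t.chrom == gene.chrom]
    in_body = any(overlaps(gene.start, gene.end, t.start, t.end) for t in same_chrom)
    up_start = max(0, gene.start - flank)
    # a TE counts as neighboring if it touches either flank window
    flank_tes = [
        t for t in same_chrom
        if overlaps(up_start, gene.start, t.start, t.end)
        or overlaps(gene.end, gene.end + flank, t.start, t.end)
    ]
    nearest = min((distance(gene, t) for t in same_chrom), default=None)
    return {
        "body": in_body,
        "flank": bool(flank_tes),
        "n_flank_tes": len(flank_tes),
        "nearest": nearest,
    }


def mean_expression_by_class(genes, tes, fpkm):
    """Mean FPKM per (body, flank) insertion class; fpkm maps gene name -> value."""
    groups = {}
    for g in genes:
        a = annotate_gene(g, tes)
        key = ("body+" if a["body"] else "body-", "flank+" if a["flank"] else "flank-")
        groups.setdefault(key, []).append(fpkm[g.name])
    return {k: sum(v) / len(v) for k, v in groups.items()}
```

Take a gene at chr1:10000-12000 that has one TE inside its body and another 2.5 kb upstream. The sketch reports it as body+ and flank+, with one neighboring TE and a nearest-TE distance of 0.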
==== Differential TE analysis ====

Differential TE analysis follows the DMP analysis procedure.\\
Figures:\\
1. Distribution of the numbers of differential TEs\\
{{:产品:diff_te_zhu.png?400|}}

2. Heatmap of differential TE methylation levels\\
{{:产品:te_diff_b.png?400|}}

References:\\
Single-Base Methylome Analysis Reveals Dynamic Epigenomic Differences Associated with Water Deficit in Apple.\\
Autotetraploid rice methylome analysis reveals methylation variation of transposable elements and their effects on gene expression.

==== TE frequency distribution ====

Compute the frequency of TEs along each chromosome or in the regions upstream and downstream of genes; see panels E and F of the figure below.

Reference: Single-base resolution methylomes of tomato fruit development reveal epigenome modifications associated with ripening. The figure comes from the paper's supplementary files.\\
{{:产品:te.png?400|}}

Analysis script:

<code bash>
perl /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequecy.pl \
  -chrlist /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency/chrlist.txt \
  -fai /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/GDDH13_1-1_formatted.fa.fai \
  -binsize 500000 \
  -TEbedfile /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/Genome_Reg/all/repeat.bed \
  -TE_class LINE,SINE,LTR,DNA,MULE-MuDR,PIF-Harbinger \
  -outdir /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency
</code>

Plotting:

<code bash>
/TJPROJ1/RNA/WORK/software/R-3.5.2/bin/Rscript /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequ.r Chr01.TE_frequency.xls Chr01
</code>

Format of Chr01.TE_frequency.xls (chromosome, bin number, TE frequency, TE class):

<code>
chr    bin_num  frequency           type
Chr01  1        0.0277777777777778  LINE
Chr01  1        0                   SINE
Chr01  1        0.770833333333333   LTR
Chr01  1        0.180555555555556   DNA
Chr01  1        0.0300925925925926  MULE-MuDR
Chr01  1        0.0324074074074074  PIF-Harbinger
Chr01  10       0.018348623853211   LINE
Chr01  10       0                   SINE
</code>

Plotting script:

<code r>
library(plyr)
library(ggpubr)      # also attaches ggplot2 (ylim, ylab, xlab, ggsave)
library(data.table)

# Usage: Rscript TE_frequ.r <Chr.TE_frequency.xls> <chromosome name>
args <- commandArgs(trailingOnly = TRUE)
# Read without a header (columns: chr, bin, frequency, class);
# use header = TRUE instead if the file starts with a column-name row.
data <- read.table(args[1], header = FALSE, stringsAsFactors = FALSE)
names(data) <- c("chr", "bin", "te", "Class")
# Sort by TE class, then by bin number, so each class runs along the chromosome
data <- arrange(data, Class, bin)

## Smooth each class curve with a sliding-window mean
smooth <- 4                # window width in bins
rw <- floor(smooth)        # number of bins in the window
hw <- floor(smooth / 2)    # bins taken before the current bin
xx <- split(data, data$Class)
for (i in seq_along(xx)) {
  # Window covers bins x-hw .. x+rw-hw-1, clipped at the first bin;
  # indices past the last bin give NA and are dropped by na.rm
  xx[[i]]$Y <- sapply(seq_len(nrow(xx[[i]])), function(x)
    mean(xx[[i]]$te[max(1, x - hw):(x + rw - hw - 1)], na.rm = TRUE))
}
data <- rbindlist(xx)

p1 <- ggline(data, x = "bin", y = "Y", color = "Class", palette = "npg",
             plot_type = "l", size = 0.8) + ylim(0, 1)
p1 <- p1 + ylab("TE frequency") + xlab("Chromosome Length")
ggsave(sprintf("TE_frequency_%s.pdf", args[2]), plot = p1, width = 7, height = 7, useDingbats = FALSE)
ggsave(sprintf("TE_frequency_%s.png", args[2]), plot = p1, width = 7, height = 7, type = "cairo-png")
</code>
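Below is a Python sketch of the per-bin TE frequency and the smoothing step. The smoothing mirrors the window used in the R script: for width 4 it averages bins x-2 to x+1, clipped at both ends. The frequency definition is an assumption. Each TE is assigned to the bin that contains its start coordinate, and a class's frequency is its TE count divided by all TEs in that bin. TE_frequecy.pl may define frequency differently; in the sample output above, the class frequencies of bin 1 do not sum exactly to 1. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def te_frequency(te_records, chrom, chrom_len, binsize, classes):
    """Per-bin class frequencies for one chromosome (hypothetical definition).

    te_records: iterable of (chrom, start, end, te_class) tuples, e.g. parsed from a repeat BED file.
    Assumption: a TE belongs to the bin containing its start, and frequency is the
    class count divided by the number of all TEs in that bin.
    Returns rows (chrom, bin_num, frequency, te_class) with 1-based bin numbers.
    """
    n_bins = (chrom_len + binsize - 1) // binsize
    counts = [Counter() for _ in range(n_bins)]
    for c, start, _end, te_class in te_records:
        b = start // binsize
        if c == chrom and b < n_bins:
            counts[b][te_class] += 1
    rows = []
    for bin_num, cnt in enumerate(counts, start=1):
        total = sum(cnt.values())
        for te_class in classes:
            freq = cnt[te_class] / total if total else 0.0
            rows.append((chrom, bin_num, freq, te_class))
    return rows


def smooth(values, width=4):
    """Sliding-window mean like the R script: indices i-hw .. i+width-hw-1, clipped to the data."""
    hw = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - hw):min(len(values), i + width - hw)]
        out.append(sum(window) / len(window))
    return out


def smooth_by_class(rows, width=4):
    """Group rows by TE class, order each group by bin, and smooth each class separately."""
    by_class = defaultdict(list)
    for _chrom, bin_num, freq, te_class in sorted(rows, key=lambda r: (r[3], r[1])):
        by_class[te_class].append((bin_num, freq))
    return {
        te_class: list(zip([b for b, _ in pts], smooth([f for _, f in pts], width)))
        for te_class, pts in by_class.items()
    }
```

Smoothing runs per class, in bin order, as in the R script's `split`/`sapply` loop. This keeps one class's curve from bleeding into another's at the class boundaries.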