/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/00.Pipline/TEs_Analysis_Pipeline_V1.pl
=========================================================================================
Description: TE Analysis Pipeline
Usage:
    --preparedir   prepare dir                                          Required
    --group        sample names, like A,B... or A1:A2:A3,B1:B2:B3...    Required
    --groupname    group names, like A,B...                             Required
    --compare      compare strategy, like 1:2,2:4                       Required
    --fpkm         fpkm file                                            Required
    --fa           fasta file                                           Required
    --gtf          gtf file                                             Required
    --species      KEGG species, like ath                               Required
    --goann        GO file                                              Required
=========================================================================================
/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/0.prepare/associate_prepare.pl
=====================================================================================
Description: prepare the BED file and CX file for TE analysis
Writer: guoxueyu@novogene.com
Date: 20180821
Usage:
    -mapdir      the WGBS 2.Map_Methy dir                     Required
    -beddir      the WGBS */Genome_Reg/all dir                Required
    -methydir    the WGBS project dir                         Required
    -outdir      the output dir                               Required
    -fai         *fai file                                    Required
    -Biorepeat   biological replicates or not, default="yes"  Optional (yes|no)
    -group       a1:a2:a3,b1:b2:b3 or A1,B1, sep by ','       Required
    -groupname   A,B, sep by ','                              Required
    -compare     1:2,2:4, sep by ','                          Required
=====================================================================================
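The -groupname and -compare options are easiest to read together. The sketch below assumes that each number in -compare is a 1-based index into the -groupname list, so that 1:2 compares the first group with the second. That reading is an assumption; the usage text does not state it. `resolve_compare` is a hypothetical helper for checking the options before a run and is not part of the pipeline.

```shell
#!/bin/sh
# resolve_compare GROUPNAMES COMPARE
# Hypothetical helper (not part of the pipeline). Assumes each number in
# COMPARE is a 1-based index into the comma-separated GROUPNAMES list.
resolve_compare() {
    awk -v names="$1" -v pairs="$2" 'BEGIN {
        n = split(names, name, ",")
        m = split(pairs, pair, ",")
        for (i = 1; i <= m; i++) {
            ok = (split(pair[i], idx, ":") == 2)
            a = idx[1] + 0
            b = idx[2] + 0
            if (!ok || a < 1 || a > n || b < 1 || b > n) {
                # Report an index that does not point at any group name
                printf "invalid comparison: %s\n", pair[i] > "/dev/stderr"
                exit 1
            }
            printf "%s vs %s\n", name[a], name[b]
        }
    }'
}

# Example: four groups, two comparisons
resolve_compare "A,B,C,D" "1:2,2:4"
```

Under this assumption the example prints `A vs B` and `B vs D`, and an index outside the group list makes the function return a non-zero status.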
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/region_level.plot.V2.pl
=====================================================================================
Description: show the methylation of TEs and their subclasses
Writer: guoxueyu@novogene.com
Date: 20180823
Usage:
    --prepare_dir   0.prepare dir         Required
    --groupname     A,B, sep by ','       Required
    --compare       1:2,2:4, sep by ','   Required
    --outdir        output dir            Required
=====================================================================================
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/gene.withorwithout.TEs.pl
=====================================================================================
Description: show the methylation of TEs and their subclasses
Writer: guoxueyu@novogene.com
Date: 20180825
Usage:
    --prepare_dir   0.prepare dir              Required
    --groupname     A,B, sep by ','            Required
    --compare       1:2,2:4, sep by ','        Required
    --fpkm          ref rowmeans.fpkm.xls      Required
    --outdir        output dir                 Required
######################## fpkm ##############################
#geneID   compare1   compare2   compare3   ...#
#genea    fpkma1     fpkma2     fpkma3     ...#
#geneb    fpkmb1     fpkmb2     fpkmb3     ...#
=====================================================================================
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/2.DMTEs/DMTE.pl
=====================================================================================
Description: show the methylation of TEs and their subclasses
Writer: guoxueyu@novogene.com
Date: 20180829
Usage:
    --prepare_dir   0.prepare dir         Required
    --groupname     A,B, sep by ','       Required
    --compare       1:2,2:4, sep by ','   Required
    --outdir        output dir            Required
=====================================================================================
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/GO_Enrichment.pre.pl
Description:
Usage:
    -preparedir   prepare dir             Required
    -groupname    group names a,b,c       Required
    -compare      compare strategy 1:2    Required
    -outdir       out dir                 Required
    -go           go.txt                  Required
    -gtf          *.gtf                   Required
perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/KEGG_Enrichment.pre.pl
Description:
Usage:
    -preparedir   prepare dir             Required
    -groupname    group names a,b,c       Required
    -compare      compare strategy 1:2    Required
    -outdir       out dir                 Required
    -species      kegg species: ath       Required
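Classify TEs and organize them into BED format. Method: transposon classification; TE count distribution. Script path:
Overall methylation level distribution of TEs, including upstream and downstream regions. Methylation level distribution of TE subclasses, including upstream and downstream regions.
Distribution of methylation and expression levels for genes with or without TE insertions in the gene body or flanking regions. Distribution of methylation and expression levels for genes with TE insertions at different positions.
The analysis method follows the DMP analysis.
GO enrichment of genes anchored by differential TEs. KEGG enrichment of genes anchored by differential TEs.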
Average methylation level distribution over genes and TEs. The flanking regions are the same lengths as the genes or TE bodies.
DNA methylation of TEs from the whole genome, gene body, and flanking 4-kb regions. Green asterisks mean higher methylation level in autotetraploid rice.
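3. Methylation level distribution over the body and the upstream and downstream regions of TE elements, by subclass within the two major classes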
Average methylation level distribution over class I TEs. “−”, “B” and “+” mean upstream, gene body and downstream regions of genes and TEs, respectively. The upstream and downstream flanking regions are the same lengths as the genes or TE body regions.
Numbers of differentially methylated TEs in ‘Qinguan’ or ‘Honeycrisp’ under water deficit
Expression level of genes inserted by TEs or not. “Flank” means 4-kb flanking region of the gene; “+” means TEs inserted in this region; *P value < 0.05; **P value < 0.01.
Gene-expression level related to the distance to the closest TE. “0” indicates genes overlapped with TEs in body regions. The total gene number is 24,092. Error bars indicate SEM.
Gene-expression level related to the number of neighboring TEs.
Differential TE analysis is performed following the DMP analysis.
Figures:
References:
Single-Base Methylome Analysis Reveals Dynamic Epigenomic Differences Associated with Water Deficit in Apple
Autotetraploid rice methylome analysis reveals methylation variation of transposable elements and their effects on gene expression
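Compute the frequency of TEs along chromosomes or in the regions upstream and downstream of genes; see panels E and F of the figure below.
Reference: Single-base resolution methylomes of tomato fruit development reveal epigenome modifications associated with ripening. The figure comes from the supplementary files of that paper.
Analysis script: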
perl /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequecy.pl \
    -chrlist /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency/chrlist.txt \
    -fai /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/GDDH13_1-1_formatted.fa.fai \
    -binsize 500000 \
    -TEbedfile /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/Genome_Reg/all/repeat.bed \
    -TE_class LINE,SINE,LTR,DNA,MULE-MuDR,PIF-Harbinger \
    -outdir /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency

## Plotting:
/TJPROJ1/RNA/WORK/software/R-3.5.2/bin/Rscript /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequ.r Chr01.TE_frequency.xls Chr01

# Format of Chr01.TE_frequency.xls:
chr     bin_num   frequency            type
Chr01   1         0.0277777777777778   LINE
Chr01   1         0                    SINE
Chr01   1         0.770833333333333    LTR
Chr01   1         0.180555555555556    DNA
Chr01   1         0.0300925925925926   MULE-MuDR
Chr01   1         0.0324074074074074   PIF-Harbinger
Chr01   10        0.018348623853211    LINE
Chr01   10        0                    SINE
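The per-bin frequency can be sketched with awk. In the sample rows for bin 1 every value is a multiple of 1/432 (for example 12/432 for LINE and 333/432 for LTR), which is consistent with a class count divided by the total number of TEs in the bin. The sketch below is built on that inference and is not the logic of TE_frequecy.pl itself. It further assumes a four-column BED input (chr, start, end, class), assigns each TE to a bin by its start coordinate, and matches class names exactly. `te_bin_frequency` is a hypothetical name.

```shell
#!/bin/sh
# te_bin_frequency BINSIZE CLASSES < repeats.bed
# Hypothetical sketch, not the pipeline script. Assumptions:
#   - input columns are chr, start, end, class
#   - a TE belongs to the bin that contains its start coordinate
#   - frequency = TEs of one class in the bin / all TEs in the bin
te_bin_frequency() {
    awk -v binsize="$1" -v classes="$2" '
    BEGIN { OFS = "\t"; n = split(classes, cls, ",") }
    {
        bin = int($2 / binsize) + 1          # bins are numbered from 1
        key = $1 SUBSEP bin
        if (!(key in total)) order[++nkeys] = key   # remember first-seen order
        total[key]++
        count[key, $4]++
    }
    END {
        for (k = 1; k <= nkeys; k++) {
            split(order[k], part, SUBSEP)
            for (c = 1; c <= n; c++)
                print part[1], part[2], count[order[k], cls[c]] / total[order[k]], cls[c]
        }
    }'
}

# Example: six TEs on Chr01, 1-kb bins, three classes reported
printf 'Chr01\t100\t200\tLTR\nChr01\t300\t400\tLTR\nChr01\t500\t650\tDNA\nChr01\t900\t950\tLINE\nChr01\t1200\t1300\tLTR\nChr01\t1500\t1600\tUnknown\n' |
    te_bin_frequency 1000 "LINE,LTR,DNA"
```

The output has the same four columns as Chr01.TE_frequency.xls (chr, bin number, frequency, class). Classes that are absent from a bin are reported with frequency 0, and TEs of unlisted classes still count toward the bin total.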
Plotting script:
library(plyr)
library(ggpubr)
library(data.table)

# args[1]: TE frequency table, read without a header row
# args[2]: label used in the output file names
args <- commandArgs(TRUE)
data <- read.table(args[1], header = FALSE, stringsAsFactors = FALSE)
names(data) <- c("chr", "bin", "te", "Class")
# Sort by class, then by bin number
data <- arrange(data, data[, 4], data[, 2], decreasing = FALSE)

## Smooth the curve: moving average computed separately for each class
smooth <- 4
rw <- floor(smooth)
hw <- floor(smooth / 2)
xx <- split(data, data$Class)
for (i in seq_along(xx)) {
  xx[[i]]$Y <- sapply(1:nrow(xx[[i]]), function(x)
    mean(xx[[i]]$te[(max(1, x - hw)):(x + rw - hw - 1)], na.rm = TRUE))
}
data <- rbindlist(xx)
# Alternative without splitting by class:
#data$Y <- sapply(1:nrow(data), function(x) mean(data$te[(max(1, x-hw)):(x+rw-hw-1)], na.rm=TRUE))

p1 <- ggline(data, x = "bin", y = "Y", color = "Class", palette = "npg", plot_type = "l", size = 0.8) + ylim(0, 1)
p1 <- p1 + ylab("TE frequency") + xlab("Chromosome Length")
ggsave(sprintf("TE_frequency_%s.pdf", args[2]), plot = p1, width = 7, height = 7, useDingbats = FALSE)
ggsave(sprintf("TE_frequency_%s.png", args[2]), plot = p1, width = 7, height = 7, type = "cairo-png")
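The smoothing step in the plotting script is a moving average whose window is not centered. With smooth set to 4, the value at position x is the mean of positions x-2 to x+1, truncated at both ends of the series. The awk sketch below reproduces that window for a single series so the edge behavior can be checked by hand. `smooth_window` is a hypothetical name, and the sketch assumes one numeric value per line with no missing values, whereas the R script smooths each class separately.

```shell
#!/bin/sh
# smooth_window WIDTH < values.txt
# Hypothetical sketch of the window used by the R script:
# indices max(1, x-hw) .. min(n, x+rw-hw-1), rw = floor(WIDTH), hw = floor(WIDTH/2).
smooth_window() {
    awk -v smooth="$1" '
    { v[NR] = $1 }
    END {
        rw = int(smooth)
        hw = int(smooth / 2)
        for (x = 1; x <= NR; x++) {
            lo = x - hw
            if (lo < 1) lo = 1               # truncate at the start of the series
            hi = x + rw - hw - 1
            if (hi > NR) hi = NR             # truncate at the end of the series
            sum = 0
            for (i = lo; i <= hi; i++) sum += v[i]
            print sum / (hi - lo + 1)
        }
    }'
}

# Example: five evenly spaced values, window width 4
printf '%s\n' 0 4 8 12 16 | smooth_window 4
```

For this input the smoothed values are 2, 4, 6, 10 and 12: the first positions average over fewer than four points, and the last position averages over the final three.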