TE Methylation Analysis: Approach and a Collection of Result Figure Styles

Path to the main TE methylation analysis script

/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/00.Pipline/TEs_Analysis_Pipeline_V1.pl

=========================================================================================
        Description: TE Analysis Pipeline
        Usage:
		--preparedir	prepare dir					Required
		--group		sample name,like A,B... or A1:A2:A3,B1:B2:B3...	Required
		--groupname	group names,like A,B...				Required
		--compare	compare strategy,like 1:2,2:4			Required
		--fpkm		fpkm file					Required
		--fa		fasta file					Required
		--gtf		gtf file					Required
		--species	KEGG species,like ath				Required
		--goann		GO file						Required
=========================================================================================

Path to the TE methylation analysis module scripts

/PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module

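TE methylation analysis modules

<1> Data preparation

<2> TE methylation level display

<3> Expression analysis of genes with and without TE insertions

<4> Differential TE analysis

<5> Enrichment analysis of genes associated with differential TEs

TE methylation analysis module details

<1> Data preparation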

perl  /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/0.prepare/associate_prepare.pl
=====================================================================================
	Description: prepare bedfile CXfile for TEs Analysis
	Writer:guoxueyu@novogene.com
	Date:20180821
	Usage:
		-mapdir		the WGBS 2.Map_Methy dir		Required
		-beddir		the WGBS */Genome_Reg/all dir		Required
                -methydir       the WGBS project dir			Required
		-outdir		the output dir				Required
		-fai		*fai file				Required
		-Biorepeat	Biology repeats or not ,default="yes"	Option(yes|no)
		-group		a1:a2:a3,b1:b2:b3 or A1,B1 sep by ','	Required
		-groupname      A,B sep by ','				Required
		-compare	1:2,2:4 sep by ','			Required
======================================================================================
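
To illustrate how the -group, -groupname and -compare values fit together, here is a minimal Python sketch that parses them. It assumes that the numbers in -compare are 1-based positions in -groupname, which the help text does not state explicitly. The function name parse_design is hypothetical and is not part of the pipeline.

```python
def parse_design(group, groupname, compare):
    """Parse -group / -groupname / -compare strings into Python structures."""
    names = groupname.split(",")
    # Each comma-separated entry is one group; ':' separates its replicates.
    samples = [entry.split(":") for entry in group.split(",")]
    if len(names) != len(samples):
        raise ValueError("-group and -groupname must list the same number of groups")
    design = dict(zip(names, samples))
    pairs = []
    for item in compare.split(","):
        first, second = (int(i) for i in item.split(":"))
        # Assumption: indices are 1-based positions in -groupname.
        pairs.append((names[first - 1], names[second - 1]))
    return design, pairs


design, pairs = parse_design("a1:a2:a3,b1:b2:b3", "A,B", "1:2")
print(design)  # {'A': ['a1', 'a2', 'a3'], 'B': ['b1', 'b2', 'b3']}
print(pairs)   # [('A', 'B')]
```

Without biological replicates, the same call works with one sample per group, for example parse_design("A1,B1", "A,B", "1:2").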

<2> TE methylation level display

perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/region_level.plot.V2.pl
=====================================================================================
	Description:show the methylation of TEs and its subclasses
	Writer:guoxueyu@novogene.com
	Date:20180823
	Usage:
		--prepare_dir	0.prepare dir				Required
		--groupname     A,B sep by ','				Required
		--compare       1:2,2:4 sep by ','			Required
		--outdir	output dir				Required
=====================================================================================

<3> Expression analysis of genes with and without TE insertions

perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/1.region_level/gene.withorwithout.TEs.pl
=====================================================================================
        Description:show the methylation of TEs and its subclasses
        Writer:guoxueyu@novogene.com
        Date:20180825
        Usage:
                --prepare_dir   0.prepare dir                           Required
                --groupname     A,B sep by ','                          Required
                --compare       1:2,2:4 sep by ','                      Required
		--fpkm		ref rowmeans.fpkm.xls			Required
                --outdir        output dir                              Required


######################## fpkm ##############################
#geneID	compare1	compare2	compare3	...#
#genea	fpkma1		fpkma2		fpkma3		...#
#geneb	fpkmb1		fpkmb2		fpkmb3		...#

=====================================================================================
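
As a rough illustration of how a table in the FPKM layout above could be consumed, the Python sketch below reads a tab-separated table and compares the median expression of two gene sets. The gene IDs, column names and the set of TE-inserted genes are made up, and the choice of the median as the summary statistic is an assumption rather than what gene.withorwithout.TEs.pl does.

```python
import csv
import io
from statistics import median


def read_fpkm(handle):
    """Read a tab-separated FPKM table: first column gene ID, other columns values."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = {col: float(val) for col, val in zip(header[1:], row[1:])}
    return table


def median_expression(table, genes, column):
    """Median FPKM of the given genes in one column; genes absent from the table are skipped."""
    values = [table[g][column] for g in genes if g in table]
    return median(values) if values else None


# Toy data standing in for rowmeans.fpkm.xls.
demo = io.StringIO("geneID\tA\tB\ngene1\t10.0\t12.0\ngene2\t2.0\t1.0\ngene3\t4.0\t8.0\n")
fpkm = read_fpkm(demo)
with_te = {"gene2", "gene3"}
without_te = set(fpkm) - with_te
print(median_expression(fpkm, with_te, "A"))     # 3.0
print(median_expression(fpkm, without_te, "A"))  # 10.0
```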

<4> Differential TE analysis

perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/2.DMTEs/DMTE.pl
=====================================================================================
	Description:show the methylation of TEs and its subclasses
	Writer:guoxueyu@novogene.com
	Date:20180829
	Usage:
		--prepare_dir	0.prepare dir				Required
		--groupname     A,B sep by ','				Required
		--compare       1:2,2:4 sep by ','			Required
		--outdir	output dir				Required
======================================================================================

<5> Enrichment analysis of genes associated with differential TEs

perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/GO_Enrichment.pre.pl
	Description:
	Usage:
		-preparedir	prepare dir		Required
		-groupname	group names a,b,c	Required
		-compare	compare strategy 1:2	Required
		-outdir		out dir			Required
		-go		go.txt			Required
		-gtf		*.gtf			Required

perl /PUBLIC/source/RNA/WGBS_V7/Tools/8.TE_analysis/Module/3.enrich/KEGG_Enrichment.pre.pl
	Description:
	Usage:
		-preparedir	prepare  dir		Required
		-groupname	group names a,b,c	Required
		-compare	compare strategy 1:2	Required
		-outdir		out dir			Required
		-species	kegg species: ath	Required

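TE methylation analysis approach

1. TE classification

 Classify the TEs and organize them in BED format. Method: transposon classification
 Distribution of TE counts
 Script path: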
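
The script path for this step is not recorded above. Independently of the pipeline, counting TEs per class from a BED-like file can be sketched in Python as follows. The sketch assumes that the class label sits in the fourth column and looks like "LTR/Gypsy"; the actual column layout of the pipeline's BED files is not documented here, and count_te_classes is a hypothetical name.

```python
from collections import Counter


def count_te_classes(bed_lines, class_column=3):
    """Count TEs per top-level class from BED-like lines.

    Assumption: the class label is in the 4th column (index 3) and looks
    like 'LTR/Gypsy'; only the part before '/' is used as the class.
    """
    counts = Counter()
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        counts[fields[class_column].split("/")[0]] += 1
    return counts


bed = [
    "Chr01\t100\t900\tLTR/Gypsy",
    "Chr01\t1500\t1800\tDNA/MULE-MuDR",
    "Chr01\t2500\t4000\tLTR/Copia",
]
for te_class, n in count_te_classes(bed).most_common():
    print(te_class, n)
```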

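2. TE methylation level analysis

 Overall methylation level distribution of TEs, including the upstream and downstream regions
 Methylation level distribution of TE subclasses, including the upstream and downstream regions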
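
One way to build such a profile is to cut the upstream flank, the body and the downstream flank into the same number of bins and compute a methylation level per bin. The Python sketch below does this for a single TE, with flanks as long as the body. It assumes a read-weighted level (methylated reads divided by total reads per bin) and ignores strand; the pipeline's own definition may differ, and region_profile is a hypothetical name.

```python
def region_profile(sites, start, end, n_bins=10):
    """Methylation level in n_bins bins each for upstream, body and downstream.

    sites: iterable of (position, methylated_reads, total_reads).
    The flanks are as long as the body [start, end); strand is ignored.
    Returns 3 * n_bins values; a bin without coverage is None.
    """
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    left = start - length          # start of the upstream flank
    width = length / n_bins        # width of one bin
    meth = [0] * (3 * n_bins)
    total = [0] * (3 * n_bins)
    for pos, m, t in sites:
        if left <= pos < end + length:
            idx = min(int((pos - left) / width), 3 * n_bins - 1)
            meth[idx] += m
            total[idx] += t
    return [m / t if t else None for m, t in zip(meth, total)]


# Toy cytosine sites around a TE at [100, 200).
sites = [(10, 1, 10), (120, 8, 10), (180, 9, 10), (250, 2, 10)]
print(region_profile(sites, 100, 200, n_bins=2))
# [0.1, None, 0.8, 0.9, None, 0.2]
```

Averaging the per-bin values over all TEs of a class would then give one curve per class.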

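3. Methylation and expression analysis of genes with TE insertions

 Methylation and expression level distributions of genes with and without TE insertions in the gene body or flanking regions
 Methylation and expression level distributions of genes with TE insertions at different positions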
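
Splitting genes by where a TE is inserted comes down to interval overlap tests. The Python sketch below labels a gene with "body" and/or "flank" depending on which parts overlap a TE. The 4-kb flank follows the figure captions collected in this document; the tuple layout and the function names are assumptions for illustration only.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def classify_gene(gene, tes, flank=4000):
    """Return the parts of a gene hit by TEs as a subset of {'body', 'flank'}.

    gene and each TE are (chrom, start, end) tuples; flank is the window
    size on each side of the gene.
    """
    chrom, g_start, g_end = gene
    hit = set()
    for t_chrom, t_start, t_end in tes:
        if t_chrom != chrom:
            continue
        if overlaps(g_start, g_end, t_start, t_end):
            hit.add("body")
        if (overlaps(max(0, g_start - flank), g_start, t_start, t_end)
                or overlaps(g_end, g_end + flank, t_start, t_end)):
            hit.add("flank")
    return hit


gene = ("Chr01", 10000, 12000)
tes = [("Chr01", 10500, 10800), ("Chr01", 7000, 7500), ("Chr02", 10000, 11000)]
print(sorted(classify_gene(gene, tes)))  # ['body', 'flank']
```

The linear scan over all TEs keeps the sketch short; at genome scale a sorted index or an interval tool would be the more practical choice.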

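4. Differential TE analysis

  The analysis method is adapted from DMP analysis

5. Enrichment analysis of genes anchored by differential TEs

   GO enrichment of genes anchored by differential TEs
   KEGG enrichment of genes anchored by differential TEs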
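
The statistical test behind the enrichment step is not described in these notes. A common choice for GO or KEGG over-representation is the upper tail of the hypergeometric distribution, sketched below in Python as an assumption and not as a description of GO_Enrichment.pre.pl or KEGG_Enrichment.pre.pl. The function name and the example numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: annotated background genes
    K: background genes annotated to the term
    n: genes anchored by differential TEs
    k: anchored genes annotated to the term
    """
    denom = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / denom


# Toy numbers: 10 background genes, 3 in the term, 4 anchored genes, 2 of them in the term.
print(enrichment_pvalue(2, 4, 3, 10))
```

In practice the p-values of all tested terms would also need a multiple-testing correction.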

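TE methylation analysis: result figures

TE methylation level distribution

1. Methylation level distribution over TE bodies and their upstream and downstream regions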

Average methylation level distribution over genes and TEs. The flanking regions are the same lengths as the genes or TE bodies.

2. Methylation level distribution over the body and flanking regions of TE elements in the two major classes

DNA methylation of TEs from the whole genome, gene body, and flanking 4-kb regions. Green asterisks mean higher methylation level in autotetraploid rice.

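3. Methylation level distribution over the body and the upstream and downstream regions of TE elements in the subclasses of the two major classes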

Average methylation level distribution over class I TEs. “−”, “B” and “+” mean upstream, gene body and downstream regions of genes and TEs, respectively. The upstream and downstream flanking regions are the same lengths as the genes or TE body regions.

TE count distribution

1. Bar chart of the number of TE elements of each type

Numbers of differentially methylated TEs in ‘Qinguan’ or ‘Honeycrisp’ under water deficit

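TE methylation level and expression level of TE-inserted genes

1. Expression level distribution of genes with and without TE insertions in the gene body or flanking regions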

Expression level of genes inserted by TEs or not. “Flank” means 4-kb flanking region of the gene; “+” means TEs inserted in this region; *P value < 0.05; **P value < 0.01.

2. Expression level distribution of genes with TE insertions at different positions

Gene-expression level related to the distance to the closest TE. “0” indicates genes overlapped with TEs in body regions. The total gene number is 24,092. Error bars indicate SEM.
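
The distance to the closest TE used in such a figure can be computed with a binary search over sorted TE coordinates. The Python sketch below assumes that the TE intervals of one chromosome are half-open, sorted by start and non-overlapping (nested or overlapping annotations would have to be merged first); distance_to_closest_te is a hypothetical helper.

```python
from bisect import bisect_left


def distance_to_closest_te(gene_start, gene_end, te_intervals):
    """Distance from a gene to its closest TE on the same chromosome.

    te_intervals: (start, end) tuples, half-open, sorted by start and
    non-overlapping. Returns 0 when a TE overlaps the gene body and
    None when there are no TEs.
    """
    if not te_intervals:
        return None
    starts = [s for s, _ in te_intervals]
    i = bisect_left(starts, gene_end)   # first TE starting at or after the gene end
    candidates = []
    if i > 0:
        prev_end = te_intervals[i - 1][1]
        if prev_end > gene_start:       # the previous TE reaches into the gene body
            return 0
        candidates.append(gene_start - prev_end)
    if i < len(te_intervals):
        candidates.append(starts[i] - gene_end)
    return min(candidates)


tes = [(100, 200), (1000, 1500), (5000, 5200)]
print(distance_to_closest_te(1400, 1800, tes))  # 0
print(distance_to_closest_te(2000, 3000, tes))  # 500
print(distance_to_closest_te(0, 50, tes))       # 50
```

Genes can then be grouped by distance class and their expression levels summarized per class.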

3. Expression level distribution of genes with different numbers of TEs inserted in the flanking regions

Gene-expression level related to the number of neighboring TEs.

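Differential TE analysis

Differential TE analysis follows the approach used for DMP analysis.

Figures:

1. Distribution of the number of differential TEs

2. Heatmap of the methylation levels of differential TEs

References: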
Single-Base Methylome Analysis Reveals Dynamic Epigenomic Differences Associated with Water Deficit in Apple
Autotetraploid rice methylome analysis reveals methylation variation of transposable elements and their effects on gene expression

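TE frequency distribution

Compute the frequency of TEs along each chromosome or in the regions upstream and downstream of genes; see panels E and F of the figure below.

Reference: Single-base resolution methylomes of tomato fruit development reveal epigenome modifications associated with ripening. The figure comes from the supplementary files of that paper.

Analysis script: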

perl /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequecy.pl \
	-chrlist /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency/chrlist.txt \
        -fai /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/GDDH13_1-1_formatted.fa.fai \
        -binsize  500000 \
	-TEbedfile /BJPROJ/RNA/reference_data/Plant/Malus_x_domestica/GDDH13_Version1.1/WGBS/Genome_Reg/all/repeat.bed \
	-TE_class LINE,SINE,LTR,DNA,MULE-MuDR,PIF-Harbinger \
        -outdir /TJPROJ1/RNA/SHOUHOU/P101SC17111092/prepare_TE_B1/fenxi2/TE_frequency

## Plotting:
/TJPROJ1/RNA/WORK/software/R-3.5.2/bin/Rscript /TJPROJ1/RNA/WORK/Pipline/aftersale/5.TE_frequency/TE_frequ.r Chr01.TE_frequency.xls Chr01


# Format of Chr01.TE_frequency.xls:
chr     bin_num   frequency           type
Chr01	1	0.0277777777777778	LINE
Chr01	1	0	SINE
Chr01	1	0.770833333333333	LTR
Chr01	1	0.180555555555556	DNA
Chr01	1	0.0300925925925926	MULE-MuDR
Chr01	1	0.0324074074074074	PIF-Harbinger
Chr01	10	0.018348623853211	LINE
Chr01	10	0	SINE

Plotting script:

library(plyr)        # arrange()
library(ggpubr)      # ggline(), ggsave()
library(data.table)  # rbindlist()

# Usage: Rscript TE_frequ.r <TE_frequency.xls> <label used in the output file names>
args <- commandArgs(TRUE)
# The input is read without a header row
data <- read.table(args[1], header = FALSE, stringsAsFactors = FALSE)
names(data) <- c("chr", "bin", "te", "Class")
# Sort by TE class, then by bin number
data <- arrange(data, data[, 4], data[, 2], decreasing = FALSE)

## Smooth the curve: moving average over `smooth` consecutive bins within each class
smooth <- 4
rw <- floor(smooth)
hw <- floor(smooth / 2)
xx <- split(data, data$Class)
for (i in seq_along(xx)) {
  xx[[i]]$Y <- sapply(seq_len(nrow(xx[[i]])), function(x) {
    # Indices past the last row yield NA and are dropped by na.rm = TRUE
    mean(xx[[i]]$te[max(1, x - hw):(x + rw - hw - 1)], na.rm = TRUE)
  })
}
data <- rbindlist(xx)

# One line per TE class; y axis fixed to [0, 1]
p1 <- ggline(data, x = "bin", y = "Y", color = "Class", palette = "npg", plot_type = "l", size = 0.8) + ylim(0, 1)
p1 <- p1 + ylab("TE frequency") + xlab("Chromosome Length")

ggsave(sprintf("TE_frequency_%s.pdf", args[2]), plot = p1, width = 7, height = 7, useDingbats = FALSE)
ggsave(sprintf("TE_frequency_%s.png", args[2]), plot = p1, width = 7, height = 7, type = "cairo-png")
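
The smoothing step in the plotting script is a moving average over a window that is truncated at both ends of the series. For readers who want to check the window arithmetic outside R, the Python sketch below mirrors it for a single class; smooth_series is a hypothetical name and the input values are made up.

```python
import math


def smooth_series(values, smooth=4):
    """Moving average matching the window used in the R plotting script.

    For the 0-based position i the window covers positions i - hw up to
    i + rw - hw - 1 (inclusive), clipped to the ends of the series.
    """
    rw = math.floor(smooth)
    hw = math.floor(smooth / 2)
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - hw)
        hi = min(n, i + rw - hw)   # exclusive upper bound
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


print(smooth_series([0, 4, 8, 4, 0]))  # [2.0, 4.0, 4.0, 4.0, 4.0]
```

With smooth set to 4 the window spans two bins before and one bin after the current bin, so the smoothed curve lags slightly behind the raw values.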
