===== Transposon, TE, LTR, and ERV Analysis =====

Transposable elements (TEs), also called "jumping genes" or transposons, are DNA segments that can move from one position in the genome to another. TEs and their recognizable remnants are found in both prokaryotes and eukaryotes, and their effects on the genome and transcriptome, at several levels and across species, continue to be reported. These findings have changed how researchers view TEs, and TE research is now an active topic in the post-genomic era.

TE quantification and differential analysis pipeline:

  /TJPROJ6/RNA_SH/script_dir/TEs/TE_pipline.py -h
  usage: TE_pipline.py [-h] {RepEnrich,RSEM,TEtranscripts} ...

  use RepeatMasker analysis TE and quant

  positional arguments:
    {RepEnrich,RSEM,TEtranscripts}
                          sub-command help
      RepEnrich           RepEnrich quant pipline
      RSEM                RSEM quant pipline
      TEtranscripts       TEtranscripts quant pipline

  optional arguments:
    -h, --help            show this help message and exit

The pipeline wraps three quantification tools: RepEnrich, RSEM, and TEtranscripts. RepEnrich and TEtranscripts are published TE quantification tools; the RSEM workflow was put together to meet a client's request.

==== RepEnrich ====

  /TJPROJ6/RNA_SH/script_dir/TEs/TE_pipline.py RepEnrich -h
  usage: TE_pipline.py RepEnrich [-h] [-R RAW] [-C CLEAN] [-A {yes,no}]
                                 [-RTC {fastp,ngqc}] -F FA [-SP SP] [-S SAMPLE]
                                 [-S2G S2G] [-G GROUP] [-CD CONDITION]
                                 [-CP COMPARE] [-PJ PADJ] [-PV PVALUE] [-FC FC]

  optional arguments:
    -h, --help            show this help message and exit
    -R RAW, --raw RAW     the raw dir
    -C CLEAN, --clean CLEAN
                          the clean dir
    -A {yes,no}, --adapter {yes,no}
                          get adapter
    -RTC {fastp,ngqc}, --rawtoclean_soft {fastp,ngqc}
                          get clean soft ware
    -F FA, --fa FA        the fa file
    -SP SP, --sp SP       the RepeatMasker species, find /PUBLIC/software/public
                          /Repeat/RepeatMasker/Libraries/Species.txt
    -S SAMPLE, --sample SAMPLE
                          the sample name
    -S2G S2G, --s2g S2G   the relation for sample and group
    -G GROUP, --group GROUP
                          the group name
    -CD CONDITION, --condition CONDITION
                          the condition file
    -CP COMPARE, --compare COMPARE
                          the compare group name
    -PJ PADJ, --padj PADJ
                          the padj
    -PV PVALUE, --pvalue PVALUE
                          the pvalue
    -FC FC, --fc FC       the foldchange

Test results for this tool are in /TJPROJ6/RNA_SH/script_dir/TEs/RepEnrich2-master/result-RepEnrich2

==== TEtranscripts ====

  /TJPROJ6/RNA_SH/script_dir/TEs/TE_pipline.py TEtranscripts -h
  usage: TE_pipline.py TEtranscripts [-h] [-R RAW] [-C CLEAN] [-A {yes,no}]
                                     [-RTC {fastp,ngqc}] [-F FA] [-SP SP]
                                     -GTF GTF [-S SAMPLE] [-S2G S2G] [-G GROUP]
                                     [-CD CONDITION] [-CP COMPARE] [-PJ PADJ]
                                     [-PV PVALUE] [-FC FC]
                                     [-SD {no,forward,reverse}]
                                     [-M {uniq,multi}] [-TE TE_GTF] [-B BAM]

  optional arguments:
    -h, --help            show this help message and exit
    -R RAW, --raw RAW     the raw dir
    -C CLEAN, --clean CLEAN
                          the clean dir
    -A {yes,no}, --adapter {yes,no}
                          get adapter
    -RTC {fastp,ngqc}, --rawtoclean_soft {fastp,ngqc}
                          get clean soft ware
    -F FA, --fa FA        the fa file
    -SP SP, --sp SP       the RepeatMasker species, find /PUBLIC/software/public
                          /Repeat/RepeatMasker/Libraries/Species.txt
    -GTF GTF, --gtf GTF   the gene gtf file
    -S SAMPLE, --sample SAMPLE
                          the sample name
    -S2G S2G, --s2g S2G   the relation for sample and group
    -G GROUP, --group GROUP
                          the group name
    -CD CONDITION, --condition CONDITION
                          the condition file
    -CP COMPARE, --compare COMPARE
                          the compare group name
    -PJ PADJ, --padj PADJ
                          the padj
    -PV PVALUE, --pvalue PVALUE
                          the pvalue
    -FC FC, --fc FC       the foldchange
    -SD {no,forward,reverse}, --strand {no,forward,reverse}
                          the strand
    -M {uniq,multi}, --mode {uniq,multi}
                          TE counting mode
    -TE TE_GTF, --TE_gtf TE_GTF
                          the TE gtf file
    -B BAM, --bam BAM     the bam dir

Test results for this tool are in /TJPROJ6/RNA_SH/script_dir/TEs/TEtranscripts-master/result-TEtranscripts.\\
Prebuilt TE GTF files are provided for some species; if the genome version matches, the TE prediction step can be skipped. The files are in /TJPROJ6/RNA_SH/script_dir/TEs/TEtranscripts-master/database\\

==== RSEM ====

  /TJPROJ6/RNA_SH/script_dir/TEs/TE_pipline.py RSEM -h
  usage: TE_pipline.py RSEM [-h] [-R RAW] [-C CLEAN] [-A {yes,no}]
                            [-RTC {fastp,ngqc}] -F FA [-SP SP] [-S SAMPLE]
                            [-S2G S2G] [-G GROUP] [-CD CONDITION] [-CP COMPARE]
                            [-PJ PADJ] [-PV PVALUE] [-FC FC] [-SS {0,0.5,1}]

  optional arguments:
    -h, --help            show this help message and exit
    -R RAW, --raw RAW     the raw dir
    -C CLEAN, --clean CLEAN
                          the clean dir
    -A {yes,no}, --adapter {yes,no}
                          get adapter
    -RTC {fastp,ngqc}, --rawtoclean_soft {fastp,ngqc}
                          get clean soft ware
    -F FA, --fa FA        the fa file
    -SP SP, --sp SP       the RepeatMasker species, find /PUBLIC/software/public
                          /Repeat/RepeatMasker/Libraries/Species.txt
    -S SAMPLE, --sample SAMPLE
                          the sample name
    -S2G S2G, --s2g S2G   the relation for sample and group
    -G GROUP, --group GROUP
                          the group name
    -CD CONDITION, --condition CONDITION
                          the condition file
    -CP COMPARE, --compare COMPARE
                          the compare group name
    -PJ PADJ, --padj PADJ
                          the padj
    -PV PVALUE, --pvalue PVALUE
                          the pvalue
    -FC FC, --fc FC       the foldchange
    -SS {0,0.5,1}, --ss {0,0.5,1}
                          for RSEM: fr-unstranded:0.5,fr-firststrand:1,fr-
                          secondstrand:0

Test results are in /TJPROJ6/RNA_SH/script_dir/TEs/RSEM-master/result-example. That directory contains only the TE prediction results; the quantification results are consistent with the count output of the reference-free workflow.
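
The sub-command layout in the help output can be mirrored with Python's ''argparse''. The following is only a minimal sketch and not the source of TE_pipline.py: the function names ''build_parser'' and ''add_shared_options'', the ''float'' types for the thresholds, the string choices for ''-SS'', and the file names in the demo are assumptions made for illustration. Only the option names, the choices, and which options are required are taken from the help text.

```python
import argparse


def add_shared_options(sub, fa_required):
    """Add the options that all three sub-commands list in their help output."""
    sub.add_argument("-R", "--raw", help="the raw dir")
    sub.add_argument("-C", "--clean", help="the clean dir")
    sub.add_argument("-A", "--adapter", choices=["yes", "no"], help="get adapter")
    sub.add_argument("-RTC", "--rawtoclean_soft", choices=["fastp", "ngqc"])
    # -F is required for RepEnrich and RSEM, optional for TEtranscripts.
    sub.add_argument("-F", "--fa", required=fa_required, help="the fa file")
    sub.add_argument("-SP", "--sp", help="the RepeatMasker species")
    sub.add_argument("-S", "--sample", help="the sample name")
    sub.add_argument("-S2G", "--s2g", help="the relation for sample and group")
    sub.add_argument("-G", "--group", help="the group name")
    sub.add_argument("-CD", "--condition", help="the condition file")
    sub.add_argument("-CP", "--compare", help="the compare group name")
    # Assumption: thresholds are parsed as floats; the real script may differ.
    sub.add_argument("-PJ", "--padj", type=float, help="the padj")
    sub.add_argument("-PV", "--pvalue", type=float, help="the pvalue")
    sub.add_argument("-FC", "--fc", type=float, help="the foldchange")


def build_parser():
    """Build a parser with the RepEnrich, RSEM and TEtranscripts sub-commands."""
    parser = argparse.ArgumentParser(
        prog="TE_pipline.py",
        description="use RepeatMasker analysis TE and quant",
    )
    subs = parser.add_subparsers(dest="command", required=True,
                                 help="sub-command help")

    rep = subs.add_parser("RepEnrich", help="RepEnrich quant pipline")
    add_shared_options(rep, fa_required=True)

    rsem = subs.add_parser("RSEM", help="RSEM quant pipline")
    add_shared_options(rsem, fa_required=True)
    # Assumption: choices are kept as strings exactly as printed in the help.
    rsem.add_argument("-SS", "--ss", choices=["0", "0.5", "1"],
                      help="fr-unstranded:0.5, fr-firststrand:1, fr-secondstrand:0")

    te = subs.add_parser("TEtranscripts", help="TEtranscripts quant pipline")
    add_shared_options(te, fa_required=False)
    te.add_argument("-GTF", "--gtf", required=True, help="the gene gtf file")
    te.add_argument("-SD", "--strand", choices=["no", "forward", "reverse"])
    te.add_argument("-M", "--mode", choices=["uniq", "multi"],
                    help="TE counting mode")
    te.add_argument("-TE", "--TE_gtf", help="the TE gtf file")
    te.add_argument("-B", "--bam", help="the bam dir")
    return parser


if __name__ == "__main__":
    # Hypothetical invocation; genes.gtf is a placeholder file name.
    demo = build_parser().parse_args(
        ["TEtranscripts", "-GTF", "genes.gtf", "-M", "multi", "-SD", "reverse"]
    )
    print(demo.command, demo.gtf, demo.mode, demo.strand)
```

Sharing the common options through one helper keeps the three sub-commands consistent, which matches the near-identical option lists in the help output.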