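MetaCHIP combines sequence alignment with phylogenetic analysis to detect horizontal gene transfer (HGT) at the community level without requiring reference genomes.
HGT is the transfer of genetic material between organisms and is an important driver of microbial evolution and adaptation (for example, through resistance genes and virulence genes). Earlier approaches to HGT detection fall into three broad groups: 1. methods based on genomic compositional features (such as GIST and IslandViewer); 2. best-match methods (such as DarkHorse and HGTector); 3. explicit phylogenetic methods (such as Ranger-DTL and AnGST), which look for inconsistencies between the gene tree and the species tree to find transferred genes. These earlier methods have two limitations: they are not suited to bacterial communities, and they need a reference genome. MetaCHIP addresses both.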
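Note: MetaCHIP depends on several third-party programs (prodigal/hmmsearch/hmmfetch/hmmalign/hmmstat/mafft/blastp/blastn/makeblastdb/fasttree). If any of them cannot be found on your PATH, set the corresponding absolute path in this config file: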
/TJPROJ1/META_ASS/soft/MetaCHIP/lib/python3.10/site-packages/MetaCHIP/MetaCHIP_config.py
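Before editing the config file, it can help to see which of these programs are actually missing from PATH. The following is a minimal sketch for a POSIX shell; the function name `check_tools` is hypothetical and not part of MetaCHIP, and the executable names are taken from the list above (on some systems the binary may be named differently, for example `FastTree`).

```shell
#!/bin/sh
# check_tools: print the name of every listed tool that is not on PATH.
# Returns 0 if all tools were found, 1 otherwise.
# (Hypothetical helper, not part of MetaCHIP.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tool names copied from the note above; adjust if your binaries differ.
check_tools prodigal hmmsearch hmmfetch hmmalign hmmstat mafft \
            blastp blastn makeblastdb fasttree \
    || echo "Set the absolute paths of the missing tools in MetaCHIP_config.py"
```

Any tool reported as missing is a candidate for an absolute path in `MetaCHIP_config.py`.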
Before the analysis, the input data must be prepared with the PI module; the BP module is then run to identify HGTs.
...::: MetaCHIP v1.10.13 :::...

Core modules:
    PI            ->  Prepare input files
    BP            ->  Run Best-match and Phylogenetic approaches

Supplementary modules:
    filter_HGT    ->  Get HGTs been found at no less than n taxonomic ranks
    update_hmms   ->  Update hmm profiles used for inferring SCG tree
    get_SCG_tree  ->  Get SCG protein tree
    rename_seqs   ->  Rename sequences in a file

# for command specific help info
MetaCHIP PI -h
MetaCHIP BP -h
Detect HGT at the class level:
source /TJPROJ1/META_ASS/soft/anaconda3/bin/activate /TJPROJ1/META_ASS/soft/MetaCHIP
MetaCHIP PI -p NorthSea -r c -t 6 -i input_file_examples/human_gut_bins -x fasta -taxon input_file_examples/human_gut_bins_GTDB.tsv
MetaCHIP BP -p NorthSea -r c -t 6
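Since the PI step takes both a folder of bins (`-i`, `-x`) and a taxonomy table (`-taxon`), a quick consistency check between the two can save a failed run. The sketch below rests on an assumption that is not stated in these notes: that the taxonomy file is tab-separated with the genome ID (the bin file name without its extension) in the first column, as in a GTDB-Tk summary file. The function name `bins_without_taxonomy` is hypothetical.

```shell
#!/bin/sh
# bins_without_taxonomy DIR EXT TAXON_TSV
# Print the genome IDs (file names minus ".EXT") found in DIR that do not
# appear in the first column of TAXON_TSV.
# Assumption: TAXON_TSV is tab-separated with the genome ID in column 1.
# (Hypothetical helper, not part of MetaCHIP.)
bins_without_taxonomy() {
    dir=$1
    ext=$2
    taxon=$3
    for f in "$dir"/*."$ext"; do
        # Skip the unexpanded glob when no file matches.
        [ -e "$f" ] || continue
        id=$(basename "$f" ".$ext")
        # Exact string match against column 1; awk exits 0 only if found.
        if ! awk -F '\t' -v id="$id" \
                '$1 == id { found = 1 } END { exit !found }' "$taxon"; then
            echo "$id"
        fi
    done
}

# Example call with the paths used in the commands above.
bins_without_taxonomy input_file_examples/human_gut_bins fasta \
    input_file_examples/human_gut_bins_GTDB.tsv
```

If the assumption holds, an empty result means every bin has a taxonomy entry; any printed ID is a bin to check before running PI.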
Detect HGT at multiple levels (among phyla, classes, orders, families and genera):
MetaCHIP PI -p NorthSea -r pcofg -t 6 -o Total_level_result -i input_file_examples/human_gut_bins -x fasta -taxon input_file_examples/human_gut_bins_GTDB.tsv
MetaCHIP BP -p NorthSea -r pcofg -t 6 -o Total_level_result -pfr
Detect HGT between customized groups:
MetaCHIP PI -p NorthSea -g customized_grouping.txt -t 6 -i NS_37bins -x fasta
MetaCHIP BP -p NorthSea -g customized_grouping.txt -t 6
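- A tab-delimited text file containing all identified HGTs. File name format: [prefix]_[taxon_ranks]_detected_HGTs.txt
Column | Description |
---|---|
Gene_1 | The first gene involved in the HGT event |
Gene_2 | The second gene involved in the HGT event |
Identity | Identity between Gene_1 and Gene_2 |
Occurence(taxon_ranks) | Only for multi-level detection. If HGT detection was run at the phylum, class and order levels, a value such as "011" means the HGT was identified at the class and order levels but not at the phylum level. |
End_match | Whether the match is a contig end match |
Full_length_match | Whether the match is a full-length contig match |
Direction | Direction of gene flow. The number in parentheses is the percentage of times this direction was observed, for HGTs detected at multiple ranks where Ranger-DTL gave different directions. |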
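The table above can be worked with directly from the shell. The following is a minimal sketch, assuming the file has a single header line and that the first three tab-separated columns are Gene_1, Gene_2 and Identity in the order listed; both function names are hypothetical and not part of MetaCHIP.

```shell
#!/bin/sh
# filter_by_identity FILE MIN
# Keep the header line and every HGT whose Identity (column 3) is >= MIN.
# Assumption: FILE is tab-separated with one header line and the column
# order Gene_1, Gene_2, Identity, ... as listed in the table above.
filter_by_identity() {
    awk -F '\t' -v min="$2" 'NR == 1 || $3 + 0 >= min + 0' "$1"
}

# decode_occurrence RANKS FLAGS
# Map an Occurence(taxon_ranks) string such as "011" onto the rank letters
# given to -r (such as "pco") and print the ranks where the HGT was found.
decode_occurrence() {
    ranks=$1
    flags=$2
    out=""
    i=1
    while [ "$i" -le "${#flags}" ]; do
        flag=$(printf '%s' "$flags" | cut -c "$i")
        rank=$(printf '%s' "$ranks" | cut -c "$i")
        if [ "$flag" = "1" ]; then
            out="$out$rank"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$out"
}
```

For example, `decode_occurrence pco 011` prints `co` (class and order, not phylum), and `filter_by_identity NorthSea_pcofg_detected_HGTs.txt 90` would keep only rows with an identity of at least 90; the file name here simply follows the [prefix]_[taxon_ranks]_detected_HGTs.txt pattern and is an example.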
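- Nucleotide and amino acid sequences of the identified donor and recipient genes.
- Flanking regions of the identified HGTs. Genes encoded on the forward strand are shown in light blue, and genes encoded on the reverse strand in light green. The names of genes predicted to be HGTs are highlighted in blue, in a large font, with the pairwise identity given in parentheses. Contig names appear at the bottom left of the sequence tracks; the number after a contig name is the distance between the gene subject to HGT and the left or right end of the contig. Red bars show the similarity of the matched regions between contigs, based on BLASTN results.
- Gene flow between groups. Bands connect donors and recipients; the width of a band reflects the number of HGTs, and its colour corresponds to the donor.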
- Example of a contig end match (End_match)
- Example of a full-length contig match
P.S. How a published paper describes its use of this software:
To identify potential HGT events in algal-bacterial symbiotic systems, MetaCHIP (version 1.9.0) was employed to analyze all MAGs (Song et al., 2019; Wang et al., 2022). The metagenomic data, along with the taxonomic information of MAGs at specified ranks (e.g., phyla, genus, species), were organized using the GTDB-Tk tool, which is based on phylogenetic databases. The identification of candidate HGT events was based on the most accurate matching method. An algorithm utilizing the DeBruijn graph was applied to exclude candidates with high similarity (>95 %). For further validation, BLASTN analysis was conducted on the flanking regions. To confirm the initial predictions, each gene pair suspected of HGT was analyzed to construct a protein tree using FastTree version 2.1.10. A species tree was also generated for comparison with the gene tree. High-confidence predictions were identified, allowing us to determine the direction of gene transfer. The process concluded with the removal of redundant HGT detections to finalize the results.