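====== 1. Software Overview ======
SeroBA is a k-mer based pipeline that identifies the serotype from Illumina NGS reads against a given set of references, in order to determine the capsular type of Streptococcus pneumoniae.
**Overall analysis workflow**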
{{:liucheng.png?400|}}
====== 2. Software Download ======
# Dependencies
Required dependencies:
1. Python3 version >= 3.3.2
2. KMC version >= 3.0
3. MUMmer version >= 3.1
4. Ariba
# Install SeroBA
Set up bioconda channel:
conda config --add channels bioconda
Install SeroBA:
conda install -c bioconda seroba
====== 3. Software Usage ======
usage: seroba createDBs <database dir> <kmer size>
Creates a database for kmc and ariba
positional arguments:
database dir output directory for the kmc and ariba database
kmer size k-mer size you want to use for kmc, recommended = 71
Example: seroba createDBs my_database/ 71
usage: seroba runSerotyping [options] <database dir> <read1> <read2> <prefix>
Identify serotype of your input data
positional arguments:
database dir path to database directory
read1 forward read file
read2 reverse read file
prefix unique prefix
optional arguments:
-h, --help show this help message and exit
Other options:
--noclean NOCLEAN Do not clean up intermediate files (assemblies, ariba report)
--coverage COVERAGE threshold for k-mer coverage of the reference sequence (default = 20)
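When many samples have to be typed, the `runSerotyping` call above can be generated per sample. The following is a minimal sketch, not part of SeroBA itself: the function name `build_seroba_cmds` is hypothetical, and it assumes that reads are named `<sample>_1.fq.gz` / `<sample>_2.fq.gz` and that the sample name serves as the output prefix. It only prints the commands so they can be inspected before anything is run.

```shell
# Sketch: print one `seroba runSerotyping` command per read pair in a directory.
# Assumptions (not from SeroBA): file naming <sample>_1.fq.gz / <sample>_2.fq.gz,
# sample name reused as the output prefix.

# build_seroba_cmds DB_DIR READS_DIR [COVERAGE]
build_seroba_cmds() {
    local db_dir="$1" reads_dir="$2" coverage="${3:-20}"
    local r1 r2 sample
    for r1 in "$reads_dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue              # glob matched nothing
        r2="${r1%_1.fq.gz}_2.fq.gz"           # expected mate file
        sample="$(basename "${r1%_1.fq.gz}")"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        echo "seroba runSerotyping --coverage $coverage $db_dir $r1 $r2 $sample"
    done
}
```

Once the printed commands look right, they could be executed with something like `build_seroba_cmds my_database/ reads/ | bash`, where `reads/` is a placeholder directory.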
usage: seroba summary
Summarises the output in one TSV file
====== 4. Running the Pipeline ======
Analysis script:
cd /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326
source /TJPROJ1/META_ASS/soft/anaconda3/bin/activate /TJPROJ1/META_ASS/soft/seroba
unset PERL5LIB
seroba runSerotyping <database dir> /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_1.fq.gz /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_2.fq.gz F4326.out
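====== 5. Analysis Results ======
1. Results
In the folder 'prefix' you will find pred.tsv, which contains the predicted serotype, and a file named detailed_serogroup_info.txt, which contains information about the SNPs, genes, and alleles found in your reads. Running "seroba summary" creates a TSV file named summary.tsv with three columns (sample ID, serotype, comment). Serotypes that do not match any reference are marked as "untypable" (v0.1.3).
The output looks like this: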
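To get a quick overview of a batch, the serotype column of summary.tsv can be tallied. This is a minimal sketch under stated assumptions: the file is tab-separated with the three columns described above, and it has no header row (if your SeroBA version writes one, skip the first line). The function name `count_serotypes` is hypothetical.

```shell
# Sketch: count how many samples were assigned each serotype in summary.tsv.
# Assumptions: tab-separated, no header row, columns = sample ID, serotype, comment.

# count_serotypes SUMMARY_TSV
count_serotypes() {
    awk -F'\t' 'NF >= 2 { n[$2]++ }
                END { for (s in n) printf "%s\t%d\n", s, n[s] }' "$1" \
        | LC_ALL=C sort
}
```

Calling `count_serotypes summary.tsv` prints one line per serotype (including "untypable") followed by the number of samples.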
Predicted Serotype: 23F
Serotype predicted by ariba: 23F
assembly from ariba has an identity of: 99.77 with this serotype
Serotype Genetic Variant
23F allele wchA
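In the detailed information you can see the final predicted serotype, as well as the serotype whose reference ARIBA found to be closest within that serogroup. You can also see the sequence identity between the assembly and the reference sequence.
2. Problems that may occur during analysis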
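The identity value can be pulled out of detailed_serogroup_info.txt for automated checks. The sketch below assumes the line format shown in the sample output above ("assembly from ariba has an identity of: 99.77 with this serotype"); the function names and any threshold passed to `identity_below` are hypothetical choices, not SeroBA settings.

```shell
# Sketch: extract the ARIBA assembly identity from detailed_serogroup_info.txt.
# Assumption: the file contains a line like
#   "assembly from ariba has an identity of: 99.77 with this serotype"

# ariba_identity FILE -> prints the first numeric field on the identity line
ariba_identity() {
    awk '/identity of:/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^[0-9]+(\.[0-9]+)?$/) { print $i; exit }
         }' "$1"
}

# identity_below FILE THRESHOLD -> succeeds (exit 0) if identity < THRESHOLD
identity_below() {
    local id
    id="$(ariba_identity "$1")"
    [ -n "$id" ] && awk -v id="$id" -v t="$2" 'BEGIN { exit !(id < t) }'
}
```

For example, `identity_below F4326.out/detailed_serogroup_info.txt 98 && echo "low identity"` would flag a sample for manual review; the cutoff of 98 is purely illustrative.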
Case 1:
SeroBA predicts 'untypable'. An 'untypable' prediction can either reflect a genuinely untypable strain or be caused by other problems. Possible problems are: poor quality of your input data, submission of the wrong species, or too low coverage of your sequenced reads. Please check your data again and run a quality control.
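As a first, rough check for the low-coverage cause, the number of reads and an approximate sequencing depth can be computed before rerunning. This is only a sketch: it assumes standard four-line FASTQ records, and the read length and genome size passed to `estimate_depth` are values you supply yourself (roughly 2.1 Mb is a common ballpark for S. pneumoniae, stated here as an assumption). Note that sequencing depth is not the same quantity as the k-mer coverage used by `--coverage`.

```shell
# Sketch: rough read count and sequencing depth for a paired-end sample.
# Assumptions: 4-line FASTQ records; read length and genome size are user-supplied.

# count_reads FASTQ_GZ -> number of reads in a gzipped FASTQ file
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# estimate_depth READ_PAIRS READ_LENGTH GENOME_SIZE -> approximate depth (x)
estimate_depth() {
    awk -v n="$1" -v l="$2" -v g="$3" 'BEGIN { printf "%.1f\n", 2 * n * l / g }'
}
```

For instance, `estimate_depth "$(count_reads F4326_1.fq.gz)" 150 2100000` gives an approximate depth for 150 bp paired-end reads; a dedicated QC tool remains the better choice for a full quality assessment.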
Case 2:
Low alignment identity in the 'detailed_serogroup_info' file. This can be a hint of a mosaic serotype.
Possible solution: perform a BLAST search on the whole genome assembly
Case 3:
The third column in the summary.tsv indicates "contamination". This means that at least one heterozygous SNP was detected in the read data with at least 10% of the mapped reads at the specific position supporting the SNP.
Possible solution: check the quality of your data and look for contamination in your reads
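Samples flagged this way can be listed directly from summary.tsv. The sketch below assumes the same layout as described earlier (tab-separated; sample ID, serotype, comment; no header row) and that the comment column contains the word "contamination"; the function name `flag_contamination` is hypothetical.

```shell
# Sketch: list sample IDs whose comment column mentions contamination.
# Assumptions: tab-separated summary.tsv, columns = sample ID, serotype, comment.

# flag_contamination SUMMARY_TSV
flag_contamination() {
    awk -F'\t' 'tolower($3) ~ /contamination/ { print $1 }' "$1"
}
```

The resulting IDs can then be fed into whatever read-level contamination check your workflow already uses.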