# Required software

Required dependencies:

1. Python 3, version >= 3.3.2
2. KMC, version >= 3.0
3. MUMmer, version >= 3.1
4. ARIBA

# Installing SeroBA

Set up the bioconda channel:

    conda config --add channels bioconda

Install SeroBA:

    conda install -c bioconda seroba
# Usage

Create the databases for KMC and ARIBA:

    usage: seroba createDBs <database dir> <kmer size>

    positional arguments:
      database dir    output directory for the KMC and ARIBA databases
      kmer size       k-mer size to use for KMC (recommended: 71)

    Example: seroba createDBs my_database/ 71

Identify the serotype of your input data:

    usage: seroba runSerotyping [options] <databases directory> <read1> <read2> <prefix>

    positional arguments:
      database dir    path to the database directory
      read1           forward read file
      read2           reverse read file
      prefix          unique prefix

    optional arguments:
      -h, --help      show this help message and exit

    Other options:
      --noclean NOCLEAN      do not clean up intermediate files (assemblies, ARIBA report)
      --coverage COVERAGE    threshold for k-mer coverage of the reference sequence (default = 20)

Summarise the output in one TSV file:

    usage: seroba summary <output folder>

    positional arguments:
      output folder   directory where the output directories from seroba runSerotyping are stored
Analysis script:

    cd /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326
    source /TJPROJ1/META_ASS/soft/anaconda3/bin/activate /TJPROJ1/META_ASS/soft/seroba
    unset PERL5LIB
    seroba runSerotyping \
        /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_1.fq.gz \
        /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_2.fq.gz \
        F4326.out

Note: the runSerotyping usage lists <databases directory> as the first positional argument. The command as recorded here passes only the two read files and the prefix, so the database directory created with seroba createDBs has to be supplied before the read files.
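When several samples sit under the same clean-data directory, the same command can be generated per sample instead of being typed by hand. The following is a minimal Python sketch, not part of SeroBA: the helper names find_read_pairs and build_serotyping_command are hypothetical, the read naming (<sample>_1.fq.gz / <sample>_2.fq.gz) is assumed from the script above, and the argument order follows the runSerotyping usage text. It only builds the command; it does not run SeroBA.

```python
from pathlib import Path


def find_read_pairs(clean_data_dir):
    """Return {sample: (read1, read2)} for files named <sample>_1.fq.gz / <sample>_2.fq.gz.

    Assumption: paired reads follow the naming seen in the analysis script.
    """
    pairs = {}
    for read1 in sorted(Path(clean_data_dir).rglob("*_1.fq.gz")):
        sample = read1.name[: -len("_1.fq.gz")]
        read2 = read1.with_name(sample + "_2.fq.gz")
        if read2.exists():  # skip samples without a reverse read file
            pairs[sample] = (read1, read2)
    return pairs


def build_serotyping_command(database_dir, read1, read2, prefix, coverage=None):
    """Build the argument list for 'seroba runSerotyping' in the documented order."""
    command = ["seroba", "runSerotyping"]
    if coverage is not None:
        # --coverage is the k-mer coverage threshold from the usage text
        command += ["--coverage", str(coverage)]
    command += [str(database_dir), str(read1), str(read2), str(prefix)]
    return command


if __name__ == "__main__":
    # "00.CleanData" and "my_database/" are placeholder paths for illustration.
    for sample, (read1, read2) in find_read_pairs("00.CleanData").items():
        print(" ".join(build_serotyping_command("my_database/", read1, read2, sample + ".out")))
```

Printing the commands first makes it easy to review them before passing each list to subprocess.run.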
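1. Analysis results
In the folder 'prefix' you will find a pred.tsv containing your predicted serotype, as well as a file called detailed_serogroup_info.txt containing information about the SNPs, genes, and alleles found in your reads. Running "seroba summary" creates a TSV file called summary.tsv, which consists of three columns (sample ID, serotype, comments). Serotypes that do not match any reference are marked as "untypable" (v0.1.3).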
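For downstream work it can help to load summary.tsv into a simple structure. The sketch below is one possible way to do that in Python. It assumes the file is tab-separated with the three columns described above, has no header row, and may omit the third column when there is no comment; these are assumptions about the layout, so check them against your own file. The function name read_summary and the sample ID "S2" are made up for illustration.

```python
import csv
import io


def read_summary(handle):
    """Parse summary.tsv rows into dicts with sample, serotype and comment keys."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or not fields[0].strip():
            continue  # skip blank lines
        # Pad short rows so a missing comment column becomes an empty string.
        sample, serotype, comment = (fields + ["", ""])[:3]
        rows.append(
            {
                "sample": sample.strip(),
                "serotype": serotype.strip(),
                "comment": comment.strip(),
            }
        )
    return rows


if __name__ == "__main__":
    # In-memory stand-in for a real summary.tsv (illustrative values only).
    example = io.StringIO("F4326\t23F\t\nS2\tuntypable\n")
    for row in read_summary(example):
        print(row)
```

For a real file, open it with open("summary.tsv", newline="") and pass the handle to read_summary.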
The detailed_serogroup_info.txt output looks like this:

    Predicted Serotype: 23F
    Serotype predicted by ariba: 23F
    assembly from ariba has an identity of: 99.77 with this serotype

    Serotype    Genetic Variant
    23F         allele wchA
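In the detailed information you can see the final predicted serotype, as well as the serotype that, according to ARIBA, has the closest reference within that specific serogroup. You can also see the sequence identity between the sequence assembly and the reference sequence.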
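If these three values are needed for many samples, they can be pulled out of the text with a few regular expressions. This is a rough Python sketch that assumes the file uses the same wording as the example output shown in this section; the function name parse_detailed_info is hypothetical, and the patterns may need adjusting if your SeroBA version words these lines differently.

```python
import re

# Patterns follow the wording of the example output; this is an assumption
# about the file layout, not a documented format.
_PATTERNS = {
    "predicted_serotype": re.compile(r"Predicted Serotype:\s*(\S+)"),
    "ariba_serotype": re.compile(r"Serotype predicted by ariba\s*:\s*(\S+)"),
    "identity": re.compile(r"identity of:\s*([0-9]+(?:\.[0-9]+)?)"),
}


def parse_detailed_info(text):
    """Extract the header fields; any field that is not found is None."""
    result = {}
    for key, pattern in _PATTERNS.items():
        match = pattern.search(text)
        result[key] = match.group(1) if match else None
    if result["identity"] is not None:
        result["identity"] = float(result["identity"])  # percent identity
    return result


if __name__ == "__main__":
    example = (
        "Predicted Serotype: 23F\n"
        "Serotype predicted by ariba: 23F\n"
        "assembly from ariba has an identity of: 99.77 with this serotype\n"
    )
    print(parse_detailed_info(example))
```

Returning None for missing fields, rather than raising, lets a batch run continue and report which samples had an unexpected layout.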
2. Problems that may occur during analysis
Case 1:
SeroBA predicts 'untypable'. An 'untypable' prediction can either reflect a genuinely untypable strain or be caused by a problem with the data. Possible problems are: poor quality of the input data, submission of the wrong species, or too low coverage of the sequenced reads. Please check your data again and run a quality control.
Case 2:
Low alignment identity in the 'detailed_serogroup_info' file. This can be a hint of a mosaic serotype. Possible solution: perform a BLAST search on the whole genome assembly.
Case 3:
The third column in the summary.tsv indicates “contamination”. This means that at least one heterozygous SNP was detected in the read data, with at least 10% of the mapped reads at that position supporting the SNP. Possible solution: check the quality of your data and look for contamination within your reads.
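Cases 1 and 3 can both be spotted from the serotype and comment columns of summary.tsv, so a small helper can list the suggested follow-up per sample. The Python sketch below is an illustration only: the function name follow_up_actions is hypothetical, and it assumes the serotype column contains the word "untypable" and the comment column contains the word "contamination" as described above. Case 2 (low alignment identity) is not visible in summary.tsv and is not covered here.

```python
def follow_up_actions(serotype, comment):
    """Map one summary.tsv row to the checks suggested in Cases 1 and 3."""
    actions = []
    if serotype.strip().lower() == "untypable":
        # Case 1: may be a real untypable strain or a data problem.
        actions.append("run quality control; check species and read coverage")
    if "contamination" in comment.lower():
        # Case 3: a heterozygous SNP was reported for this sample.
        actions.append("check read quality and look for contamination")
    return actions


if __name__ == "__main__":
    # Illustrative rows: (sample ID, serotype, comment); "S2" is made up.
    for sample, serotype, comment in [("F4326", "23F", ""), ("S2", "untypable", "")]:
        print(sample, follow_up_actions(serotype, comment))
```

An empty list means neither case applies to that row.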