I. Software Introduction

SeroBA is a k-mer based pipeline that identifies the serotype (capsular type) of Streptococcus pneumoniae from Illumina NGS reads against a given set of references.

Overall analysis workflow

II. Software Download

# Dependencies
Required dependencies:
1. Python3 version >= 3.3.2
2. KMC version >= 3.0
3. MUMmer version >= 3.1
4. Ariba

# Install SeroBA
Set up bioconda channel:

conda config --add channels bioconda

Install SeroBA:

conda install -c bioconda seroba
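
After installation it may help to confirm that the required tools are actually on PATH. The following is a minimal sketch, not part of SeroBA: the executable names kmc, nucmer (from MUMmer), ariba and seroba are assumptions about what the bioconda packages install, and missing_tools and report are hypothetical helpers.

```python
import shutil

# Assumed executable names: kmc (KMC), nucmer (MUMmer), ariba, seroba.
REQUIRED_TOOLS = ("kmc", "nucmer", "ariba", "seroba")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


def report(tools=REQUIRED_TOOLS, which=shutil.which):
    """Build a one-line, human-readable status message."""
    missing = missing_tools(tools, which)
    if missing:
        return "missing: " + ", ".join(missing)
    return "all required tools found"
```

Calling print(report()) inside the activated conda environment shows which executables, if any, still need to be installed.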

III. Software Usage

usage: seroba createDBs  <database dir> <kmer size>

Creates a Database for kmc and ariba

positional arguments:
    database dir     output directory for kmc and ariba Database
    kmer size   kmer_size you want to use for kmc , recommended = 71

    Example : seroba createDBs my_database/ 71

usage: seroba runSerotyping [options]  <databases directory> <read1> <read2> <prefix>

Identify serotype of your input data

    positional arguments:
      database dir         path to database directory
      read1              forward read file
      read2              reverse read file
      prefix             unique prefix

    optional arguments:
      -h, --help         show this help message and exit

    Other options:
      --noclean NOCLEAN  Do not clean up intermediate files (assemblies, ariba
                         report)
      --coverage COVERAGE  threshold for k-mer coverage of the reference sequence (default = 20)                         



Summarises the output in one TSV file

usage: seroba summary  <output folder>

positional arguments:
  output folder   directory where the output directories from seroba runSerotyping are stored

IV. Running the Pipeline

Analysis script:
cd /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326
source /TJPROJ1/META_ASS/soft/anaconda3/bin/activate /TJPROJ1/META_ASS/soft/seroba
unset PERL5LIB
# <database dir> is the database directory created by seroba createDBs (first positional argument of runSerotyping)
seroba runSerotyping <database dir> /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_1.fq.gz /TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326/F4326_2.fq.gz F4326.out
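
When several samples sit under the same clean-data directory, the command above can be generated per sample instead of being typed by hand. The sketch below only builds argument lists and does not run anything. It assumes the layout seen in the script (one folder per sample containing <sample>_1.fq.gz and <sample>_2.fq.gz) and the <sample>.out prefix convention; find_read_pairs and build_command are hypothetical helpers, and only the --coverage option documented in the usage text is used.

```python
from pathlib import Path


def find_read_pairs(clean_dir, r1_suffix="_1.fq.gz", r2_suffix="_2.fq.gz"):
    """Map sample name -> (read1, read2) for every complete pair under clean_dir."""
    pairs = {}
    for r1 in sorted(Path(clean_dir).rglob("*" + r1_suffix)):
        sample = r1.name[: -len(r1_suffix)]
        r2 = r1.with_name(sample + r2_suffix)
        if r2.exists():  # skip samples whose reverse reads are missing
            pairs[sample] = (r1, r2)
    return pairs


def build_command(database_dir, read1, read2, prefix, coverage=None):
    """Argument list for `seroba runSerotyping`; options precede positionals."""
    cmd = ["seroba", "runSerotyping"]
    if coverage is not None:
        cmd += ["--coverage", str(coverage)]
    cmd += [str(database_dir), str(read1), str(read2), str(prefix)]
    return cmd


def build_all(database_dir, clean_dir, coverage=None):
    """One command per sample, using '<sample>.out' as the output prefix."""
    return [
        build_command(database_dir, r1, r2, sample + ".out", coverage)
        for sample, (r1, r2) in find_read_pairs(clean_dir).items()
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the SeroBA environment is activated and the database directory exists.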

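V. Analysis Results

1. Results

In the folder 'prefix' you will find a pred.tsv containing your predicted serotype, as well as a file named detailed_serogroup_info.txt with information about the SNPs, genes, and alleles found in your reads. After running "seroba summary", a TSV file named summary.tsv is created, consisting of three columns (sample ID, serotype, comments). Serotypes that do not match any reference are marked as "untypable" (v0.1.3).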
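
The three-column summary can be triaged with a few lines of Python. This is a minimal sketch under stated assumptions: the file is tab-separated with no header row, the third column may be empty, and the flag texts are "untypable" and "contamination" as described in this document. read_summary and group_by_status are hypothetical helpers, not part of SeroBA.

```python
import csv


def read_summary(handle):
    """Yield (sample_id, serotype, comment) from a summary.tsv-like stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        cells = [cell.strip() for cell in row] + ["", ""]  # pad missing columns
        yield cells[0], cells[1], cells[2]


def group_by_status(records):
    """Split samples into typed, untypable, and contamination-flagged lists."""
    groups = {"typed": [], "untypable": [], "contamination": []}
    for sample, serotype, comment in records:
        if "contamination" in comment.lower():
            groups["contamination"].append(sample)
        if serotype.lower() == "untypable":
            groups["untypable"].append(sample)
        else:
            groups["typed"].append(sample)
    return groups
```

A typical call would be group_by_status(read_summary(open("summary.tsv", newline=""))). If a given SeroBA version writes a header row, it would need to be skipped first.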

Example output:

    Predicted Serotype: 23F
    Serotype predicted by ariba: 23F
    assembly from ariba has an identity of: 99.77 with this serotype

    Serotype    Genetic Variant
    23F         allele wchA

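In the detailed information you can see the final predicted serotype, as well as the serotype whose reference ARIBA found to be the closest within that serogroup. You can also see the sequence identity between the assembly and the reference sequence.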
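
To collect these values across many samples, the key fields can be pulled out with regular expressions. The sketch below assumes the label wording shown in the example output ("Predicted Serotype:", "Serotype predicted by ariba:", "identity of:"). The exact whitespace in real files may differ, so the patterns are deliberately tolerant. parse_detailed_info is a hypothetical helper.

```python
import re

# Labels copied from the example output; matching is case-sensitive so that
# "Predicted Serotype" is not confused with "Serotype predicted by ariba".
PATTERNS = {
    "predicted": re.compile(r"Predicted Serotype\s*:\s*(\S+)"),
    "ariba": re.compile(r"Serotype predicted by ariba\s*:\s*(\S+)"),
    "identity": re.compile(r"identity of\s*:\s*([0-9]+(?:\.[0-9]+)?)"),
}


def parse_detailed_info(text):
    """Extract predicted serotype, ARIBA serotype, and identity (as float)."""
    info = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        info[key] = match.group(1) if match else None
    if info["identity"] is not None:
        info["identity"] = float(info["identity"])
    return info
```

Fields that cannot be found are returned as None, so a caller can tell an incomplete file from a low identity value.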

2. Problems that may arise during analysis

Case 1:

SeroBA predicts 'untypable'. An 'untypable' prediction can either reflect a genuinely untypable strain or be caused by other problems: poor-quality input data, submission of the wrong species, or too low coverage of the sequenced reads. Please check your data again and run quality control.

Case 2:

Low alignment identity in the 'detailed_serogroup_info' file. This can hint at a mosaic serotype. Possible solution: perform a BLAST search on the whole-genome assembly.

Case 3:

The third column in the summary.tsv indicates “contamination”. This means that at least one heterozygous SNP was detected in the read data with at least 10% of the mapped reads at the specific position supporting the SNP. Possible solution: check the quality of your data and look for contamination within your reads.
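
The 10% rule in Case 3 is simple arithmetic on read counts. The sketch below only illustrates that threshold. It is not SeroBA's implementation, and the function names and the (alt_reads, total_reads) input format are hypothetical.

```python
def supports_het_snp(alt_reads, total_reads, min_fraction=0.10):
    """True if the SNP allele is backed by at least min_fraction of mapped reads."""
    if total_reads <= 0:
        return False  # no coverage at this position, nothing to evaluate
    return alt_reads / total_reads >= min_fraction


def flag_contamination(positions, min_fraction=0.10):
    """True if at least one (alt_reads, total_reads) position passes the threshold."""
    return any(
        supports_het_snp(alt, total, min_fraction) for alt, total in positions
    )
```

For example, a position with 12 of 100 mapped reads supporting the SNP passes the threshold, while one with 9 of 100 does not.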