====== 1. Introduction ======
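SeroBA is a k-mer-based pipeline that identifies the serotype of //Streptococcus pneumoniae// (i.e. its capsular type) from Illumina NGS reads by comparing them with a given set of reference sequences.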

**Overall analysis workflow**

{{:liucheng.png?400|}}


====== 2. Installation ======
<code>
# Required dependencies:
#   1. Python 3 >= 3.3.2
#   2. KMC >= 3.0
#   3. MUMmer >= 3.1
#   4. ARIBA

# Install SeroBA with conda
# Set up the bioconda channel (bioconda packages also depend on conda-forge):
conda config --add channels bioconda
conda config --add channels conda-forge

# Install SeroBA:
conda install -c bioconda seroba

</code>
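Before running SeroBA it can help to confirm that the dependencies listed above are actually on your PATH, for example right after activating the conda environment. The sketch below is a hypothetical helper, check_deps, which is not part of SeroBA. It assumes the executables are called python3, kmc, nucmer (MUMmer) and ariba, which may differ in your installation.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SeroBA): report which of the given
# commands can be found on PATH. The return status is the number of
# missing commands, so 0 means everything was found.
# Example (executable names are assumptions):
#   check_deps python3 kmc nucmer ariba seroba || echo "install the missing tools first"
check_deps() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "MISSING: $cmd"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

Returning the number of missing tools as the exit status lets the check be chained with || in a job script.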

====== 3. Usage ======
<code>
# Create the databases
usage: seroba createDBs <database dir> <kmer size>

Creates a database for KMC and ARIBA

positional arguments:
  database dir    output directory for the KMC and ARIBA databases
  kmer size       k-mer size to use for KMC (recommended: 71)

Example: seroba createDBs my_database/ 71

# Identify the serotype of your input data
usage: seroba runSerotyping [options] <databases directory> <read1> <read2> <prefix>

positional arguments:
  database dir    path to the database directory
  read1           forward read file
  read2           reverse read file
  prefix          unique prefix (also the name of the output folder)

optional arguments:
  -h, --help      show this help message and exit

Other options:
  --noclean NOCLEAN    do not clean up intermediate files (assemblies, ARIBA report)
  --coverage COVERAGE  threshold for k-mer coverage of the reference sequence (default = 20)

# Summarise the output in one TSV file
usage: seroba summary <output folder>

positional arguments:
  output folder   directory where the output directories from seroba runSerotyping are stored
</code>
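The three subcommands above are typically run in this order: build the database, serotype each read pair, then summarise. The sketch below strings them together for one sample, following the argument order shown in the usage text. run, DRY_RUN and seroba_workflow are hypothetical helpers, not SeroBA features; with DRY_RUN set, the commands are only printed so they can be reviewed first.

```bash
#!/usr/bin/env bash
# run() and DRY_RUN are hypothetical helpers, not SeroBA options:
# with DRY_RUN set to a non-empty value the command is only printed.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%s\n' "$*"    # print the command instead of executing it
    else
        "$@"                  # execute the command as given
    fi
}

# Hypothetical wrapper: build the KMC/ARIBA databases, serotype one read
# pair, then summarise all runSerotyping outputs stored in out_dir.
# Stops at the first step that fails.
seroba_workflow() {
    local db_dir=$1 kmer=$2 read1=$3 read2=$4 prefix=$5 out_dir=$6
    # Step 1 only needs repeating when the database or the k-mer size changes.
    run seroba createDBs "$db_dir" "$kmer" || return
    run seroba runSerotyping "$db_dir" "$read1" "$read2" "$prefix" || return
    run seroba summary "$out_dir"
}
```

For example, DRY_RUN=1 seroba_workflow my_database/ 71 F4326_1.fq.gz F4326_2.fq.gz F4326.out . prints the three commands without running them.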

====== 4. Running the Pipeline ======
<code>
# Analysis script (sample F4326)
# Note: the usage above lists <databases directory> as the first argument of runSerotyping;
# this call omits it, so check 'seroba runSerotyping -h' for the installed version.
DATA_DIR=/TJPROJ7/META_ASS/16s/yaoyuanyuan/X101SC24071531-Z01-gxh-seroba/X101SC24071531-Z01-F020/seroba-20241223/data/X101SC24071531-Z01-J025_20241211102138/00.CleanData/F4326
cd "$DATA_DIR"
source /TJPROJ1/META_ASS/soft/anaconda3/bin/activate /TJPROJ1/META_ASS/soft/seroba
unset PERL5LIB   # keep Perl modules from other installations out of this environment
seroba runSerotyping "$DATA_DIR"/F4326_1.fq.gz "$DATA_DIR"/F4326_2.fq.gz F4326.out

</code>
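When a project contains many samples, the single-sample script above can be looped over the CleanData folder. The sketch below assumes the layout seen for F4326, i.e. <sample>/<sample>_1.fq.gz and <sample>/<sample>_2.fq.gz, and uses <sample>.out as the prefix. serotype_all, SEROBA_DB and DRY_RUN are hypothetical names; when SEROBA_DB is set it is passed as the first positional argument, as in the usage text, otherwise the call has the same form as the script above.

```bash
#!/usr/bin/env bash
# Hypothetical batch sketch (not part of SeroBA): serotype every sample
# directory under clean_dir, laid out as <sample>/<sample>_1.fq.gz and
# <sample>/<sample>_2.fq.gz. Output folders are created in the current directory.
#   SEROBA_DB  optional database directory, passed as the first positional argument
#   DRY_RUN    when non-empty, print the commands instead of running them
serotype_all() {
    local clean_dir=$1 sample_dir sample r1 r2
    local -a db_arg=()
    if [[ -n "${SEROBA_DB:-}" ]]; then
        db_arg=("$SEROBA_DB")
    fi
    for sample_dir in "$clean_dir"/*/; do
        sample=$(basename "$sample_dir")
        r1="${sample_dir}${sample}_1.fq.gz"
        r2="${sample_dir}${sample}_2.fq.gz"
        if [[ ! -f "$r1" || ! -f "$r2" ]]; then
            echo "skipping $sample: read pair not found" >&2
            continue
        fi
        if [[ -n "${DRY_RUN:-}" ]]; then
            echo seroba runSerotyping "${db_arg[@]}" "$r1" "$r2" "${sample}.out"
        else
            seroba runSerotyping "${db_arg[@]}" "$r1" "$r2" "${sample}.out"
        fi
    done
}
```

Samples are processed one after another; once the loop has finished, running 'seroba summary .' from the same directory gathers the per-sample results into summary.tsv.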


====== 5. Results ======

1. Analysis results

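In the folder 'prefix' you will find a file pred.tsv containing the predicted serotype, and a file detailed_serogroup_info.txt containing information about the SNPs, genes and alleles found in your reads. Running 'seroba summary' creates a single TSV file, summary.tsv, with three columns (sample ID, serotype, comments). Serotypes that do not match any reference are labelled "untypable" (v0.1.3).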
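For a quick overview of a batch, the serotype column of summary.tsv can be tallied. The sketch below relies on the three tab-separated columns described above and additionally assumes that summary.tsv has no header line; serotype_counts is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many samples were assigned to each serotype
# in a summary.tsv from 'seroba summary'. Assumes tab-separated columns
# (sample ID, serotype, comments) and no header line.
# Output: "<serotype>\t<count>", most frequent first, ties sorted by name.
serotype_counts() {
    local summary=${1:-summary.tsv}
    awk -F'\t' 'NF >= 2 { n[$2]++ } END { for (s in n) print s "\t" n[s] }' "$summary" |
        sort -t $'\t' -k2,2nr -k1,1
}
```

"untypable" samples are counted like any other label, so they show up as their own line in the tally.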

Example of the detailed output:

<code>
Predicted Serotype:       23F
Serotype predicted by ariba:    23F
assembly from ariba has an identity of:   99.77    with this serotype

Serotype       Genetic Variant
23F            allele  wchA
</code>

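The detailed information shows the final predicted serotype and, according to ARIBA, the serotype with the closest reference within that serogroup. It also reports the sequence identity between the ARIBA assembly and the reference sequence.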
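To compare many samples, the predicted serotype and the assembly identity can be pulled out of each detailed_serogroup_info.txt. The sketch below assumes the file follows the layout of the example output above; parse_detailed_info and identity_at_least are hypothetical helpers, and any identity threshold you pass in is your own choice, not a SeroBA default.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<predicted serotype>\t<identity>" from a
# detailed_serogroup_info.txt laid out like the example output
# ("Predicted Serotype: ..." and "... has an identity of: <value> ...").
# Values that cannot be found are printed as "NA".
parse_detailed_info() {
    awk '
        /^Predicted Serotype:/ { sero = $3 }
        /has an identity of:/  { for (i = 1; i < NF; i++) if ($i == "of:") ident = $(i + 1) }
        END { print (sero == "" ? "NA" : sero) "\t" (ident == "" ? "NA" : ident) }
    ' "$1"
}

# Hypothetical check: exit status 0 if the identity is >= min_identity,
# 1 if it is lower or missing. The threshold is chosen by the user.
identity_at_least() {
    local file=$1 min_identity=$2 ident
    ident=$(parse_detailed_info "$file" | cut -f2)
    [[ "$ident" != "NA" ]] &&
        awk -v x="$ident" -v m="$min_identity" 'BEGIN { exit !(x + 0 >= m + 0) }'
}
```

Samples that fail such a check are candidates for the low-identity case described in the next subsection.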
    
2. Possible problems during analysis

Case 1:

SeroBA predicts 'untypable'. An 'untypable' prediction can reflect a genuinely untypable strain or be caused by other problems, such as poor-quality input data, submission of the wrong species, or too low coverage of the sequenced reads. Please check your data again and run quality control.

Case 2:

Low alignment identity in the 'detailed_serogroup_info' file. This can be a hint of a mosaic serotype.
Possible solution: perform a BLAST search on the whole-genome assembly.

Case 3:

The third column of summary.tsv reports "contamination". This means that at least one heterozygous SNP was detected in the reads, with at least 10% of the reads mapped at that position supporting the SNP.
Possible solution: check the quality of your data and look for contamination in your reads.
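Cases 1 and 3 can both be spotted directly in summary.tsv. The sketch below lists the samples that need a closer look; it assumes tab-separated columns (sample ID, serotype, comments) without a header line, and flag_samples is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list samples in summary.tsv that need a closer look:
# serotype "untypable" (case 1) or "contamination" in the third column (case 3).
# Assumes tab-separated columns and no header line; a sample matching both
# conditions is listed twice.
flag_samples() {
    awk -F'\t' '
        $2 == "untypable"    { print $1 "\tuntypable: check data quality, species and coverage" }
        $3 ~ /contamination/ { print $1 "\tcontamination: check read quality and purity" }
    ' "${1:-summary.tsv}"
}
```

Low-identity cases (case 2) are not visible in summary.tsv and still have to be checked in each detailed_serogroup_info.txt.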