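Yousong Peng's team at the College of Biology, Hunan University, has published a research paper online in Briefings in Bioinformatics (2021 impact factor: 14), a leading international bioinformatics journal. The paper is titled "vsRNAfinder: a novel method for identifying high-confidence viral small RNAs from small RNA-Seq data". The study developed vsRNAfinder, a bioinformatics tool that identifies viral small RNAs (sRNAs) from sRNA sequencing data and supports deeper exploration of viral sRNAs.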
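vsRNAfinder consists of four modules: Preprocessing, Identification, Filtering and annotation, and Quantification. It is available on GitHub (https://github.com/ZenaCai/vsRNAfinder). The tool smooths the read coverage along the viral reference genome, which reduces noise and strengthens the peak signal so that peaks are easier to detect. To obtain high-confidence viral sRNAs, it also uses a Poisson distribution model to compute the statistical significance of each candidate sRNA, which further increases confidence in the viral sRNAs it reports.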
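The two ideas in the paragraph above, coverage smoothing and a Poisson significance test, can be illustrated with a small sketch. This is not vsRNAfinder's implementation: the moving-average window, the crude background rate, and the function names `smooth_coverage` and `poisson_sf` are all assumptions chosen for illustration.

```python
import math

def smooth_coverage(cov, window=5):
    """Centered moving average; positions near the ends use a truncated window."""
    half = window // 2
    out = []
    for i in range(len(cov)):
        lo = max(0, i - half)
        hi = min(len(cov), i + half + 1)
        out.append(sum(cov[lo:hi]) / (hi - lo))
    return out

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0
    if k <= lam:
        # Tail is large here, so the complement of the lower sum is accurate.
        lower = sum(
            math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
            for i in range(k)
        )
        return max(0.0, 1.0 - lower)
    # For k > lam the terms shrink monotonically, so sum the upper tail directly.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i
    return total

# Toy per-base coverage with one obvious peak (made-up numbers).
coverage = [2, 3, 1, 2, 40, 55, 48, 3, 2, 1]
smoothed = smooth_coverage(coverage, window=3)
peak = max(range(len(smoothed)), key=smoothed.__getitem__)

# Assumption: the mean coverage stands in for the background read rate.
background = sum(coverage) / len(coverage)
p_value = poisson_sf(coverage[peak], background)
print(peak, p_value)
```

Smoothing first makes the peak position less sensitive to single-base spikes, and the Poisson tail probability then asks how surprising the read count at that peak is under the assumed background rate.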
https://github.com/ZenaCai/vsRNAfinder
http://bio.hnu.edu.cn/info/1286/7064.html
https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbac496
Step 1: Preprocessing
export PATH=/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/:$PATH
source /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/bin/activate vsRNAfinder
python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/Preprocessing.py \
  --cleanfq /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test/test.fa \
  --genome /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test/genome.fa \
  --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
  --prefix nc
Step 2: Identification
python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindSmallRNA.py \
  --data /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
  --genomecovfile nc.sort.positive.bga.txt \
  --chromosome AJ507799.2 \
  --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
  --strand positive \
  --readfile nc.sort.bed \
  --mapInfor nc.mapInfo.txt \
  --threads 10
python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindSmallRNA.py \
  --data /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
  --genomecovfile nc.sort.negative.bga.txt \
  --chromosome AJ507799.2 \
  --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
  --strand negative \
  --readfile nc.sort.bed \
  --mapInfor nc.mapInfo.txt \
  --threads 10
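The two identification commands differ only in the strand and the matching coverage file, so they can be generated rather than typed twice. The sketch below only builds and prints the commands. The helper name `find_srna_cmd` is hypothetical, the short paths `FindSmallRNA.py` and `test` are placeholders for the full paths used above, and the `<prefix>.sort.<strand>.bga.txt` naming pattern is inferred from this example run rather than from the tool's documentation.

```python
import shlex

def find_srna_cmd(script, data_dir, prefix, chromosome, strand, threads=10):
    """Build the FindSmallRNA.py argument list for one strand (hypothetical helper)."""
    if strand not in ("positive", "negative"):
        raise ValueError(f"unknown strand: {strand}")
    return [
        "python", script,
        "--data", data_dir,
        # Assumption: file names follow the pattern seen in the example run.
        "--genomecovfile", f"{prefix}.sort.{strand}.bga.txt",
        "--chromosome", chromosome,
        "--outdir", data_dir,
        "--strand", strand,
        "--readfile", f"{prefix}.sort.bed",
        "--mapInfor", f"{prefix}.mapInfo.txt",
        "--threads", str(threads),
    ]

commands = [
    find_srna_cmd("FindSmallRNA.py", "test", "nc", "AJ507799.2", strand)
    for strand in ("positive", "negative")
]
for cmd in commands:
    # Printed for inspection; pass cmd to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```

Building the arguments as a list avoids shell quoting problems if a path ever contains spaces.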
Step 3: Filtering and annotation
python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindMiRNA.py --genome genome.fa --speices virus --data AJ507799.2
Step 4: Quantification
python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/Quantification.py --threads 10 --data AJ507799.2 --bam nc.sort.bam --mapInfor nc.mapInfo.txt
The results are as follows. Identified miRNAs and other sRNAs:
$ head Result.txt
Site Chr Start End Strand Length Start_count End_count Start_rpm End_rpm Pvalue Type Sequence
6629-6654-positive-AJ507799.2 AJ507799.2 6629 6654 + 26 458 127 46.291364315665064 12.836251677051226 1.8781706134921779e-88 miRNA AGGACCUACGCUGCCCUAGAGGUUUU
6654-6679-positive-AJ507799.2 AJ507799.2 6654 6679 + 26 1869 1878 188.9051526331397 189.8148082637969 5.64269801271391e-122 miRNA UGCUAGGGAGGAGACGUGUGUGGCUG
6714-6732-positive-AJ507799.2 AJ507799.2 6714 6732 + 19 255 191 25.773576201953247 19.30491393950224 3.3573568838683606e-08 miRNA GAGGACGGUGUCUGUGGUU
7068-7089-positive-AJ507799.2 AJ507799.2 7068 7089 + 22 4616 3943 466.55226567927934 398.5302390756929 0.0 miRNA UUGCAAGUCAGGAUUCUCUAAU
7099-7122-positive-AJ507799.2 AJ507799.2 7099 7122 + 24 195 175 19.70920533090543 17.68774837388949 0.0037070871655429955 miRNA AGAAGGGUAUUCGGCUUGUCCGCU
42888-42909-positive-AJ507799.2 AJ507799.2 42888 42909 + 22 115472 81134 11671.083887027238 8200.44443752657 0.0 miRNA UAUCUUUUGCGGCAGAAAUUGA
42968-42991-positive-AJ507799.2 AJ507799.2 42968 42991 + 24 13227 4505 1336.890558522493 455.33317956784083 0.0 miRNA UAACGGGAAGUGUGUAAGCACACA
43008-43028-positive-AJ507799.2 AJ507799.2 43008 43028 + 21 1054 1316 106.53078163474012 133.01186777164892 1.4219795266618228e-129 miRNA UGCUUCACGCUCUUCGUUAAA
139087-139109-positive-AJ507799.2 AJ507799.2 139087 139109 + 23 4601 3622 465.0361729615174 366.085854915587 0.0 miRNA ACCUAGUGUUAGUGUUGUGCUGU
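A table like Result.txt is easy to post-process. The following sketch assumes the file is whitespace-delimited with a single header row, which matches the output shown here but is not confirmed by the tool's documentation. The three sample rows are copied from the listing above, while the helper name `read_table` and the cutoff of 1e-5 are arbitrary choices for illustration.

```python
import io

# Three rows copied from the Result.txt listing; a real run would open the file instead.
SAMPLE = """\
Site Chr Start End Strand Length Start_count End_count Start_rpm End_rpm Pvalue Type Sequence
6629-6654-positive-AJ507799.2 AJ507799.2 6629 6654 + 26 458 127 46.291364315665064 12.836251677051226 1.8781706134921779e-88 miRNA AGGACCUACGCUGCCCUAGAGGUUUU
6714-6732-positive-AJ507799.2 AJ507799.2 6714 6732 + 19 255 191 25.773576201953247 19.30491393950224 3.3573568838683606e-08 miRNA GAGGACGGUGUCUGUGGUU
7099-7122-positive-AJ507799.2 AJ507799.2 7099 7122 + 24 195 175 19.70920533090543 17.68774837388949 0.0037070871655429955 miRNA AGAAGGGUAUUCGGCUUGUCCGCU
"""

def read_table(handle):
    """Read a whitespace-delimited table into a list of dicts keyed by the header."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append(dict(zip(header, fields)))
    return rows

rows = read_table(io.StringIO(SAMPLE))

# Keep miRNAs below an illustrative P-value cutoff (the cutoff is an assumption).
CUTOFF = 1e-5
significant = [
    r for r in rows
    if r["Type"] == "miRNA" and float(r["Pvalue"]) < CUTOFF
]
for r in significant:
    print(r["Site"], r["Length"], r["Sequence"])
```

To run this on real output, replace `io.StringIO(SAMPLE)` with an open file handle for Result.txt.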
sRNA quantification table:
$ head sRNA.counts.txt
sRNA Chr Start End Strand Length Count RPM Pvalue
40365-40391-negative-AJ507799.2 AJ507799.2 40365 40391 - 27 225 22.74139076642934 3.891085995066767e-06
6629-6654-positive-AJ507799.2 AJ507799.2 6629 6654 + 26 481 48.61603981623339 1.8781706134921779e-88
6654-6679-positive-AJ507799.2 AJ507799.2 6654 6679 + 26 3441 347.7916694545927 5.64269801271391e-122
6714-6732-positive-AJ507799.2 AJ507799.2 6714 6732 + 19 341 34.4658411171218 3.3573568838683606e-08
6772-6794-positive-AJ507799.2 AJ507799.2 6772 6794 + 23 1250 126.34105981349634 1.5359341200538574e-58
7032-7052-positive-AJ507799.2 AJ507799.2 7032 7052 + 21 239 24.1564106363405 6.385248254578215e-14
7068-7089-positive-AJ507799.2 AJ507799.2 7068 7089 + 22 6848 692.1468620822584 0.0
7099-7122-positive-AJ507799.2 AJ507799.2 7099 7122 + 24 358 36.18407953058535 0.0037070871655429955
41474-41495-positive-AJ507799.2 AJ507799.2 41474 41495 + 22 136437 13790.076142219199 0.0
readme:
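outdir/chromosome/Result.txt: results for miRNAs and other sRNAs
  Site/sRNA    sRNA locus, in the format Start-End-Strand-Chr
  Chr          Chromosome
  Start        Start position
  End          End position
  Strand       Strand orientation
  Length       Length
  Start_count  Number of reads that start at the sRNA start position
  End_count    Number of reads that end at the sRNA end position
  Start_rpm    Abundance at the sRNA start position, normalized as RPM (Reads Per Million)
  End_rpm      Abundance at the sRNA end position, normalized as RPM (Reads Per Million)
  Pvalue       Statistical significance of the sRNA, based on the Poisson distribution
  Type         sRNA type (miRNA or sRNA)
  Sequence     Sequence of the sRNA

outdir/chromosome/sRNA.counts.txt: sRNA quantification table
  Site/sRNA    sRNA locus, in the format Start-End-Strand-Chr
  Chr          Chromosome
  Start        Start position
  End          End position
  Strand       Strand orientation
  Length       Length
  Count        Read count of the sRNA
  RPM          Read count of the sRNA, normalized as RPM (Reads Per Million)
  Pvalue       Statistical significance of the sRNA, based on the Poisson distribution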
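Two of the field definitions above lend themselves to a quick sanity check: the Start-End-Strand-Chr locus format and the RPM normalization. The sketch below assumes the usual definition RPM = count × 1,000,000 / total reads; which read total vsRNAfinder uses as the denominator is not stated here, so the back-calculated value is only the total implied by the table. The helper names `parse_site` and `implied_total_reads` are hypothetical.

```python
def parse_site(site):
    """Split 'Start-End-Strand-Chr' into typed fields.

    maxsplit=3 keeps the chromosome name intact even if it contains hyphens.
    """
    start, end, strand, chrom = site.split("-", 3)
    return int(start), int(end), strand, chrom

def implied_total_reads(count, rpm):
    """Invert RPM = count * 1e6 / total to recover the read total used for scaling."""
    return count * 1_000_000 / rpm

start, end, strand, chrom = parse_site("6629-6654-positive-AJ507799.2")
# In the rows shown, Length equals End - Start + 1 (inclusive coordinates).
length = end - start + 1

# Start_count and Start_rpm taken from two different rows of Result.txt.
total_a = implied_total_reads(458, 46.291364315665064)
total_b = implied_total_reads(1869, 188.9051526331397)
print(chrom, strand, length, round(total_a), round(total_b))
```

Both rows imply the same denominator of roughly 9.89 million reads, which suggests that a single library-wide total is used for normalization in this run.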
Reference:
Zena Cai, Ping Fu, Ye Qiu, Aiping Wu, Gaihua Zhang, Yirong Wang, Taijiao Jiang, Xing-Yi Ge, Haizhen Zhu, Yousong Peng, vsRNAfinder: a novel method for identifying high-confidence viral small RNAs from small RNA-Seq data, Briefings in Bioinformatics, bbac496, https://doi.org/10.1093/bib/bbac496