
vsrnafinder

Identifying viral sRNAs with vsRNAfinder

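Yousong Peng's team at the College of Biology, Hunan University, published a research paper online in Briefings in Bioinformatics (2021 impact factor: 14), a leading international bioinformatics journal, titled "vsRNAfinder: a novel method for identifying high-confidence viral small RNAs from small RNA-Seq data". The study developed vsRNAfinder, a bioinformatics tool that identifies viral sRNAs from sRNA sequencing data, to support deeper exploration of viral sRNAs.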

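vsRNAfinder consists of four modules: Preprocessing, Identification, Filtering and annotation, and Quantification. It is available on GitHub (https://github.com/ZenaCai/vsRNAfinder). The tool smooths the read coverage along the viral reference genome to reduce noise and strengthen the peak signal, which makes peak detection easier. To obtain high-confidence viral sRNAs, it also applies a Poisson distribution model to compute the statistical significance of each candidate sRNA, which further improves the confidence of the identified viral sRNAs.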
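The smoothing and Poisson ideas described above can be illustrated with a small, self-contained sketch. This is not vsRNAfinder's actual code: the moving-average window, the function names `smooth_coverage` and `poisson_sf`, and the background rate used as the Poisson mean are all assumptions chosen for illustration only.

```python
import math


def smooth_coverage(coverage, window=5):
    """Centered moving average over per-base coverage.

    The window size is a hypothetical choice; the real tool may use a
    different smoothing method and parameters.
    """
    half = window // 2
    smoothed = []
    for i in range(len(coverage)):
        lo = max(0, i - half)
        hi = min(len(coverage), i + half + 1)
        smoothed.append(sum(coverage[lo:hi]) / (hi - lo))
    return smoothed


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1).

    For very small tail probabilities this underflows to 0.0, which is
    also how extremely significant sites appear in the output tables.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Toy coverage track with one sharp pile-up of reads.
coverage = [2, 3, 2, 4, 3, 50, 48, 52, 3, 2, 4, 3]
smoothed = smooth_coverage(coverage, window=3)

# Assumed background rate: the mean coverage outside the pile-up.
background = [c for c in coverage if c < 10]
lam = sum(background) / len(background)

peak_pos = max(range(len(smoothed)), key=smoothed.__getitem__)
print("peak position:", peak_pos)
print("p-value of peak read count:", poisson_sf(coverage[peak_pos], lam))
```

A smaller p-value means the read count at the candidate site is less likely to arise from background coverage alone.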

https://github.com/ZenaCai/vsRNAfinder

http://bio.hnu.edu.cn/info/1286/7064.html

https://academic.oup.com/bib/article-lookup/doi/10.1093/bib/bbac496

Step 1: Preprocessing

export PATH=/TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/:$PATH

source /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/bin/activate vsRNAfinder

python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/Preprocessing.py \
	--cleanfq /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test/test.fa \
	--genome /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test/genome.fa \
	--outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test \
	--prefix nc

Step 2: Identification

python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindSmallRNA.py --data /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test --genomecovfile nc.sort.positive.bga.txt --chromosome AJ507799.2 --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test --strand positive --readfile nc.sort.bed --mapInfor nc.mapInfo.txt --threads 10

python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindSmallRNA.py --data /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test --genomecovfile nc.sort.negative.bga.txt --chromosome AJ507799.2 --outdir /TJPROJ6/RNA_SH/personal_dir/fengjie/Personal_analysis/vsRNAfinder/test --strand negative --readfile nc.sort.bed --mapInfor nc.mapInfo.txt --threads 10
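The two Identification commands differ only in the strand and the matching coverage file, so they can be generated in a loop. The sketch below only builds and prints the command lines; it does not run them. The flags are the ones used in the commands above, while the helper name `build_identification_commands`, the placeholder script path, and the assumed file naming pattern `<prefix>.sort.<strand>.bga.txt` (inferred from the commands above) are illustrative.

```python
import shlex


def build_identification_commands(script, workdir, prefix, chromosome, threads=10):
    """Return one FindSmallRNA.py argument list per strand."""
    commands = []
    for strand in ("positive", "negative"):
        commands.append([
            "python", script,
            "--data", workdir,
            # Assumed naming pattern, taken from the example commands.
            "--genomecovfile", f"{prefix}.sort.{strand}.bga.txt",
            "--chromosome", chromosome,
            "--outdir", workdir,
            "--strand", strand,
            "--readfile", f"{prefix}.sort.bed",
            "--mapInfor", f"{prefix}.mapInfo.txt",
            "--threads", str(threads),
        ])
    return commands


# Placeholder paths; replace them with the real script and working directory.
commands = build_identification_commands(
    script="FindSmallRNA.py",
    workdir="test",
    prefix="nc",
    chromosome="AJ507799.2",
)
for cmd in commands:
    print(shlex.join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the step, provided the vsRNAfinder environment is active.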

Step 3: Filtering and annotation

python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/FindMiRNA.py --genome genome.fa --speices virus --data AJ507799.2

Step 4: Quantification

python /TJPROJ6/RNA_SH/personal_dir/fengjie/SOFTWARE/CONDA/conda/envs/vsRNAfinder/bin/Quantification.py --threads 10 --data AJ507799.2 --bam nc.sort.bam --mapInfor nc.mapInfo.txt

The results are as follows. Identification results for miRNAs and other sRNAs:

$head Result.txt
Site	Chr	Start	End	Strand	Length	Start_count	End_count	Start_rpm	End_rpm	Pvalue	Type	Sequence
6629-6654-positive-AJ507799.2	AJ507799.2	6629	6654	+	26	458	127	46.291364315665064	12.836251677051226	1.8781706134921779e-88	miRNA	AGGACCUACGCUGCCCUAGAGGUUUU
6654-6679-positive-AJ507799.2	AJ507799.2	6654	6679	+	26	1869	1878	188.9051526331397	189.8148082637969	5.64269801271391e-122	miRNA	UGCUAGGGAGGAGACGUGUGUGGCUG
6714-6732-positive-AJ507799.2	AJ507799.2	6714	6732	+	19	255	191	25.773576201953247	19.30491393950224	3.3573568838683606e-08	miRNA	GAGGACGGUGUCUGUGGUU
7068-7089-positive-AJ507799.2	AJ507799.2	7068	7089	+	22	4616	3943	466.55226567927934	398.5302390756929	0.0	miRNA	UUGCAAGUCAGGAUUCUCUAAU
7099-7122-positive-AJ507799.2	AJ507799.2	7099	7122	+	24	195	175	19.70920533090543	17.68774837388949	0.0037070871655429955	miRNA	AGAAGGGUAUUCGGCUUGUCCGCU
42888-42909-positive-AJ507799.2	AJ507799.2	42888	42909	+	22	115472	81134	11671.083887027238	8200.44443752657	0.0	miRNA	UAUCUUUUGCGGCAGAAAUUGA
42968-42991-positive-AJ507799.2	AJ507799.2	42968	42991	+	24	13227	4505	1336.890558522493	455.33317956784083	0.0	miRNA	UAACGGGAAGUGUGUAAGCACACA
43008-43028-positive-AJ507799.2	AJ507799.2	43008	43028	+	21	1054	1316	106.53078163474012	133.01186777164892	1.4219795266618228e-129	miRNA	UGCUUCACGCUCUUCGUUAAA
139087-139109-positive-AJ507799.2	AJ507799.2	139087	139109	+	23	4601	3622	465.0361729615174	366.085854915587	0.0	miRNA	ACCUAGUGUUAGUGUUGUGCUGU

sRNA quantification table:

$head sRNA.counts.txt
sRNA	Chr	Start	End	Strand	Length	Count	RPM	Pvalue
40365-40391-negative-AJ507799.2	AJ507799.2	40365	40391	-	27	225	22.74139076642934	3.891085995066767e-06
6629-6654-positive-AJ507799.2	AJ507799.2	6629	6654	+	26	481	48.61603981623339	1.8781706134921779e-88
6654-6679-positive-AJ507799.2	AJ507799.2	6654	6679	+	26	3441	347.7916694545927	5.64269801271391e-122
6714-6732-positive-AJ507799.2	AJ507799.2	6714	6732	+	19	341	34.4658411171218	3.3573568838683606e-08
6772-6794-positive-AJ507799.2	AJ507799.2	6772	6794	+	23	1250	126.34105981349634	1.5359341200538574e-58
7032-7052-positive-AJ507799.2	AJ507799.2	7032	7052	+	21	239	24.1564106363405	6.385248254578215e-14
7068-7089-positive-AJ507799.2	AJ507799.2	7068	7089	+	22	6848	692.1468620822584	0.0
7099-7122-positive-AJ507799.2	AJ507799.2	7099	7122	+	24	358	36.18407953058535	0.0037070871655429955
41474-41495-positive-AJ507799.2	AJ507799.2	41474	41495	+	22	136437	13790.076142219199	0.0

readme:

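outdir/chromosome/Result.txt                 Results for miRNAs and other sRNAs

	Site/sRNA           sRNA locus, formatted as Start-End-Strand-Chr
	Chr                 Chromosome
	Start               Start position
	End                 End position
	Strand              Strand orientation
	Length              Length
	Start_count         Number of reads that start at the sRNA start position
	End_count           Number of reads that end at the sRNA end position
	Start_rpm           Abundance at the sRNA start position, normalized as RPM (Reads Per Million)
	End_rpm             Abundance at the sRNA end position, normalized as RPM (Reads Per Million)
	Pvalue              Statistical significance of the sRNA, based on the Poisson distribution
	Type                sRNA type (miRNA or sRNA)
	Sequence            sRNA sequence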
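Because Result.txt is a tab-separated table with the columns listed above, it can be loaded with Python's standard `csv` module. The sketch below parses three rows copied from the example output and keeps the records below a p-value cutoff. The cutoff of 1e-3 and the helper names `load_results` and `filter_significant` are illustrative assumptions, not part of vsRNAfinder.

```python
import csv
import io

COLUMNS = ["Site", "Chr", "Start", "End", "Strand", "Length", "Start_count",
           "End_count", "Start_rpm", "End_rpm", "Pvalue", "Type", "Sequence"]

# Three rows taken from the example Result.txt shown earlier.
ROWS = [
    ["6629-6654-positive-AJ507799.2", "AJ507799.2", "6629", "6654", "+", "26",
     "458", "127", "46.291364315665064", "12.836251677051226",
     "1.8781706134921779e-88", "miRNA", "AGGACCUACGCUGCCCUAGAGGUUUU"],
    ["7068-7089-positive-AJ507799.2", "AJ507799.2", "7068", "7089", "+", "22",
     "4616", "3943", "466.55226567927934", "398.5302390756929",
     "0.0", "miRNA", "UUGCAAGUCAGGAUUCUCUAAU"],
    ["7099-7122-positive-AJ507799.2", "AJ507799.2", "7099", "7122", "+", "24",
     "195", "175", "19.70920533090543", "17.68774837388949",
     "0.0037070871655429955", "miRNA", "AGAAGGGUAUUCGGCUUGUCCGCU"],
]
SAMPLE = "\n".join("\t".join(row) for row in [COLUMNS] + ROWS) + "\n"


def load_results(handle):
    """Read a Result.txt-style table and convert numeric columns."""
    records = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in ("Start", "End", "Length", "Start_count", "End_count"):
            row[key] = int(row[key])
        for key in ("Start_rpm", "End_rpm", "Pvalue"):
            row[key] = float(row[key])
        records.append(row)
    return records


def filter_significant(records, max_pvalue=1e-3, srna_type=None):
    """Keep records at or below the cutoff, optionally of a single Type."""
    kept = []
    for rec in records:
        if rec["Pvalue"] > max_pvalue:
            continue
        if srna_type is not None and rec["Type"] != srna_type:
            continue
        kept.append(rec)
    return kept


records = load_results(io.StringIO(SAMPLE))
significant = filter_significant(records, max_pvalue=1e-3, srna_type="miRNA")
for rec in significant:
    print(rec["Site"], rec["Length"], rec["Pvalue"])
```

To read a real file, replace `io.StringIO(SAMPLE)` with `open("Result.txt", newline="")`.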


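outdir/chromosome/sRNA.counts.txt            sRNA quantification table

	Site/sRNA           sRNA locus, formatted as Start-End-Strand-Chr
	Chr                 Chromosome
	Start               Start position
	End                 End position
	Strand              Strand orientation
	Length              Length
	Count               Read count of the sRNA
	RPM                 Read count of the sRNA, normalized as RPM (Reads Per Million)
	Pvalue              Statistical significance of the sRNA, based on the Poisson distribution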
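The RPM column can be read as a simple rescaling of Count. The sketch below assumes the usual definition, RPM = Count / (total reads) × 1,000,000; which read total vsRNAfinder uses as the denominator is not stated here, so treat that as an assumption. The function names are illustrative. Under this definition, every row of the table should imply the same library size, which the code checks on two rows from the example sRNA.counts.txt.

```python
import math


def rpm(count, total_reads):
    """Reads Per Million: scale a raw count to a library of one million reads."""
    return count / total_reads * 1_000_000


def implied_library_size(count, rpm_value):
    """Invert the RPM formula to recover the read total behind a table row."""
    return count / rpm_value * 1_000_000


# (Count, RPM) pairs copied from the example sRNA.counts.txt.
row_a = (225, 22.74139076642934)
row_b = (481, 48.61603981623339)

size_a = implied_library_size(*row_a)
size_b = implied_library_size(*row_b)
print("implied library sizes agree:", math.isclose(size_a, size_b, rel_tol=1e-6))

# Round trip: the recovered library size reproduces the reported RPM.
print("recomputed RPM for row_b:", rpm(row_b[0], size_a))
```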

References:

Zena Cai, Ping Fu, Ye Qiu, Aiping Wu, Gaihua Zhang, Yirong Wang, Taijiao Jiang, Xing-Yi Ge, Haizhen Zhu, Yousong Peng, vsRNAfinder: a novel method for identifying high-confidence viral small RNAs from small RNA-Seq data, Briefings in Bioinformatics, bbac496, https://doi.org/10.1093/bib/bbac496

vsrnafinder.txt · Last modified: 2024/09/20 01:40 by fengjie