arriba

Identifying gene fusions with Arriba

Reference links:

https://arriba.readthedocs.io/en/latest/

https://github.com/suhrig/arriba

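Arriba is a command-line tool for detecting gene fusions from RNA-Seq data. It was developed for clinical research and combines short run times with high sensitivity. Arriba runs on top of STAR alignments: it takes STAR's standard output (Chimeric.out.sam or Aligned.out.bam) as input.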

1. Build the STAR index

Note: STAR must be version 2.7.10a or later; otherwise the downstream analysis fails with errors.

export PATH="/TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin:$PATH"

source /TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin/activate Arriba

STAR --runMode genomeGenerate \
  --genomeFastaFiles /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_genome.fa \
  --genomeDir /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ \
  --sjdbGTFfile /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_annot.gtf \
  --runThreadN 18 --sjdbOverhang 150 --genomeSAsparseD 1
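
The version requirement noted above can be checked before the index is built. The following is a minimal sketch, assuming that `STAR --version` prints a string such as `2.7.10a` (older releases may add a `STAR_` prefix, which the parser tolerates). The function names are illustrative and are not part of STAR or Arriba.

```python
import re
import subprocess

MIN_STAR_VERSION = "2.7.10a"  # minimum version stated in the note above


def parse_star_version(text):
    """Turn a string such as '2.7.10a' (or 'STAR_2.5.2b') into a sortable tuple."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", text)
    if match is None:
        raise ValueError(f"unrecognized STAR version string: {text!r}")
    major, minor, patch, suffix = match.groups()
    return (int(major), int(minor), int(patch), suffix)


def star_is_new_enough(version_text, minimum=MIN_STAR_VERSION):
    """Return True if version_text is at least the required minimum."""
    return parse_star_version(version_text) >= parse_star_version(minimum)


def installed_star_version(executable="STAR"):
    """Ask the STAR binary for its version (assumes `STAR --version` is available)."""
    result = subprocess.run(
        [executable, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

With the Arriba conda environment activated, `star_is_new_enough(installed_star_version())` should return `True` before any of the commands in this page are run.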

2. Run STAR and Arriba

In a standard Arriba run, STAR's output is piped directly into Arriba, which saves run time. Example:

export PATH="/TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin:$PATH"

source /TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin/activate Arriba

STAR \
    --runThreadN 8 \
    --genomeDir /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref --genomeLoad NoSharedMemory \
    --outFileNamePrefix  /TJPROJ6/RNA_SH/shouhou/pip_example/Arriba/test/reuslt1/test. \
    --readFilesIn /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/rawdata/sh_3/sh_3_1.fq.gz /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/rawdata/sh_3/sh_3_2.fq.gz --readFilesCommand zcat \
    --outStd BAM_Unsorted --outSAMtype BAM Unsorted --outSAMunmapped Within --outBAMcompression 0 \
    --outFilterMultimapNmax 50 --peOverlapNbasesMin 10 --alignSplicedMateMapLminOverLmate 0.5 --alignSJstitchMismatchNmax 5 -1 5 5 \
    --chimSegmentMin 10 --chimOutType WithinBAM HardClip --chimJunctionOverhangMin 10 --chimScoreDropMax 30 \
    --chimScoreJunctionNonGTAG 0 --chimScoreSeparation 1 --chimSegmentReadGapMax 3 --chimMultimapNmax 50 |
arriba \
    -x /dev/stdin \
    -o /TJPROJ6/RNA_SH/shouhou/pip_example/Arriba/test/reuslt1/sh_3_fusions.tsv -O /TJPROJ6/RNA_SH/shouhou/pip_example/Arriba/test/reuslt1/sh_3_fusions.discarded.tsv \
    -a /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_genome.fa -g /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_annot.gtf \
    -f blacklist
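
The same pipe can be driven from Python, which is convenient when the command has to be repeated per sample. This is only a sketch: the STAR and Arriba arguments are copied from the command above, while the function names and parameters are made up for illustration.

```python
import subprocess


def build_star_cmd(genome_dir, fastq1, fastq2, out_prefix, threads=8):
    """STAR arguments copied from the command above; the BAM goes to stdout."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir, "--genomeLoad", "NoSharedMemory",
        "--outFileNamePrefix", out_prefix,
        "--readFilesIn", fastq1, fastq2, "--readFilesCommand", "zcat",
        "--outStd", "BAM_Unsorted", "--outSAMtype", "BAM", "Unsorted",
        "--outSAMunmapped", "Within", "--outBAMcompression", "0",
        "--outFilterMultimapNmax", "50", "--peOverlapNbasesMin", "10",
        "--alignSplicedMateMapLminOverLmate", "0.5",
        "--alignSJstitchMismatchNmax", "5", "-1", "5", "5",
        "--chimSegmentMin", "10", "--chimOutType", "WithinBAM", "HardClip",
        "--chimJunctionOverhangMin", "10", "--chimScoreDropMax", "30",
        "--chimScoreJunctionNonGTAG", "0", "--chimScoreSeparation", "1",
        "--chimSegmentReadGapMax", "3", "--chimMultimapNmax", "50",
    ]


def build_arriba_cmd(genome_fasta, annotation_gtf, fusions_tsv, discarded_tsv):
    """Arriba arguments copied from the command above; alignments come from stdin."""
    return [
        "arriba",
        "-x", "/dev/stdin",
        "-o", fusions_tsv, "-O", discarded_tsv,
        "-a", genome_fasta, "-g", annotation_gtf,
        "-f", "blacklist",
    ]


def run_pipeline(star_cmd, arriba_cmd):
    """Connect STAR's stdout to Arriba's stdin, like `STAR ... | arriba ...`."""
    star = subprocess.Popen(star_cmd, stdout=subprocess.PIPE)
    arriba = subprocess.Popen(arriba_cmd, stdin=star.stdout)
    star.stdout.close()  # lets STAR receive SIGPIPE if Arriba exits early
    arriba_status = arriba.wait()
    star_status = star.wait()
    if star_status != 0 or arriba_status != 0:
        raise RuntimeError(
            f"STAR exited with {star_status}, arriba with {arriba_status}"
        )
```

One reason to wrap the pipe this way is that the exit status of both programs is checked. A plain shell pipeline reports only the status of the last command unless `set -o pipefail` is enabled.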

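3. The blacklist option (-b)

The official documentation explains the purpose of the blacklist option (-b): running without a blacklist greatly increases the false-positive rate. In our tests, omitting the blacklist greatly increased the number of reported fusions, and the additional calls fell almost entirely into the medium or low confidence classes.

Arriba ships blacklist files for the GRCh37, GRCh38, GRCm38 and GRCm39 reference genomes. Each file is essentially a list of fusion events, compiled by the author from a large body of data, that are unrelated to cancer or other diseases. Arriba is therefore best used with one of these four assemblies. For any other reference genome, the blacklist filter has to be disabled with the -f option (-f blacklist).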
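
The choice between supplying a blacklist and disabling the filter can be written as a small helper. This is a sketch under stated assumptions: the list of assemblies comes from the paragraph above, `-b` and `-f blacklist` are Arriba's options, and the mapping from assembly to blacklist file is a hypothetical input supplied by the caller (no file names are assumed here).

```python
SUPPORTED_ASSEMBLIES = ("GRCh37", "GRCh38", "GRCm38", "GRCm39")


def blacklist_args(assembly, blacklist_files):
    """Return the Arriba arguments that control the blacklist filter.

    blacklist_files maps an assembly name to the path of its blacklist file;
    the paths are supplied by the caller, nothing is guessed here.
    """
    if assembly in SUPPORTED_ASSEMBLIES and assembly in blacklist_files:
        return ["-b", blacklist_files[assembly]]
    # No matching blacklist available: disable the filter instead.
    return ["-f", "blacklist"]
```

When the helper falls back to `-f blacklist`, expect more medium and low confidence calls, as described above.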

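4. Arriba output

Arriba's main output is the list of fusions detected in each sample, sample_fusions.tsv. It gives the names of the two fused genes, their positions, the breakpoints, and related information.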

#gene1	gene2	strand1(gene/fusion)	strand2(gene/fusion)	breakpoint1	breakpoint2	site1	site2	type	split_reads1	split_reads2	discordant_mates	coverage1	coverage2	confidence	reading_frame	tags	retained_protein_domains	closest_genomic_breakpoint1	closest_genomic_breakpoint2	gene_id1	gene_id2	transcript_id1	transcript_id2	direction1	direction2	filters
NCOR1P2(112691),UBBP4(19766)	UBBP4	./+	+/+	chr17:22183229	chr17:22204087	intergenic	5'UTR/splice-site	deletion/read-through	109	77	17	270	283	high	.	.	.	.	.	.	ENSG00000263563.4	.	ENST00000584755.1	downstream	upstream	duplicates(31),mismappers(20),mismatches(3),multimappers(2)
BCR	ABL1	+/+	+/+	chr22:23290413	chr9:130854064	CDS/splice-site	CDS/splice-site	translocation	75	66	10	187	216	high	in-frame	.	.	.	.	ENSG00000186716.18	ENSG00000097007.16	ENST00000305877.11	ENST00000372348.5	downstream	upstream	duplicates(9),mismatches(1)
BCR	ABL1	+/+	+/+	chr22:23290413	chr9:130854067	CDS/splice-site	CDS/splice-site	translocation	1	0	10	187	216	high	in-frame	.	.	.	.	ENSG00000186716.18	ENSG00000097007.16	ENST00000305877.11	ENST00000372348.5	downstream	upstream	mismatches(1)
LA16c-352F7.1(65532),RP11-118F19.1(23646)	GSE1	./+	+/+	chr16:85556363	chr16:85633914	intergenic	CDS/splice-site	deletion/read-through	31	38	12	194	250	high	.	.	.	.	.	.	ENSG00000131149.16	.	ENST00000253458.10	downstream	upstream	duplicates(5),low_entropy(1),mismatches(3)
BAG6	SLC44A4	-/-	-/-	chr6:31651656	chr6:31865784	CDS/splice-site	CDS/splice-site	duplication	38	35	6	972	79	high	out-of-frame	.	.	.	.	ENSG00000204463.11	ENSG00000204385.9	ENST00000211379.8	ENST00000375562.7	upstream	downstream	duplicates(7)
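
The table above is tab-separated with a header line that starts with `#`, so it can be read with Python's standard `csv` module. The sketch below uses only column names that appear in the header above; the function names are illustrative.

```python
import csv


def parse_fusions(lines):
    """Parse Arriba's fusions table from an iterable of text lines into dicts."""
    lines = iter(lines)
    # The header line starts with '#' (e.g. '#gene1'); strip it for clean names.
    header = next(lines).rstrip("\r\n").lstrip("#").split("\t")
    reader = csv.DictReader(
        lines, fieldnames=header, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def filter_by_confidence(rows, level="high"):
    """Keep only the calls whose 'confidence' column equals level."""
    return [row for row in rows if row["confidence"] == level]


def summarize(row):
    """One-line summary: gene pair, breakpoints, and total supporting reads."""
    support = (
        int(row["split_reads1"])
        + int(row["split_reads2"])
        + int(row["discordant_mates"])
    )
    return (
        f'{row["gene1"]}--{row["gene2"]} '
        f'{row["breakpoint1"]} {row["breakpoint2"]} reads={support}'
    )
```

An open file handle can be passed directly, for example `parse_fusions(open("sample_fusions.tsv"))`, followed by `filter_by_confidence(rows)` to keep the high-confidence calls.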

In addition, Arriba comes with a plotting script, draw_fusions.R, for visualizing the detected fusions. Example usage:

export PATH="/TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin:$PATH"

source /TJPROJ6/RNA_SH/personal_dir/xuyi/miniconda3/bin/activate Arriba

draw_fusions.R \
    --fusions=fusions.tsv \
    --output=fusions.pdf \
    --annotation=genome.gtf \
    --cytobands=database/cytobands.tsv \
    --proteinDomains=database/protein_domains.gff3

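Here, --cytobands and --proteinDomains point to files bundled with the software. As with the blacklist, these files are provided only for GRCh37, GRCh38, GRCm38 and GRCm39. The resulting plot is shown in the figure below:

[Figure: example plot produced by draw_fusions.R]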

5. Wrapper script

For convenience, the steps above have been chained into a single script. Script path: /TJPROJ6/RNA_SH/personal_dir/xuyi/scripts/Arriba/use_Arriba.py

python /TJPROJ6/RNA_SH/personal_dir/xuyi/scripts/Arriba/use_Arriba.py \
  -f /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_genome.fa \
  -g /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/ref/ref_annot.gtf \
  -i /TJPROJ6/RNA_SH/shouhou/pip_example/STAR-seqr/X101SC24074806-Z01-J087/rawdata \
  -sn nc_1,nc_2,nc_3,sh_1,sh_2,sh_3 \
  -o /TJPROJ6/RNA_SH/shouhou/pip_example/Arriba/test/reuslt2 \
  -t GRCh38

The -t option specifies the reference genome assembly. If it is not one of GRCh37, GRCh38, GRCm38 or GRCm39, the blacklist is disabled.
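
The internals of use_Arriba.py are not shown here, so the following is only a guess at how the `-i` and `-sn` arguments could be turned into per-sample FASTQ paths. It assumes the layout `<rawdata>/<sample>/<sample>_1.fq.gz` and `<sample>_2.fq.gz`, which matches the paths used in the manual command in section 2. The function name is hypothetical.

```python
import posixpath


def fastq_pairs(rawdata_dir, sample_names):
    """Map each sample in a comma-separated list to its paired FASTQ paths.

    Assumes <rawdata_dir>/<sample>/<sample>_1.fq.gz and <sample>_2.fq.gz.
    """
    pairs = {}
    for sample in sample_names.split(","):
        sample = sample.strip()
        if not sample:
            continue  # tolerate a stray trailing comma
        sample_dir = posixpath.join(rawdata_dir, sample)
        pairs[sample] = (
            posixpath.join(sample_dir, f"{sample}_1.fq.gz"),
            posixpath.join(sample_dir, f"{sample}_2.fq.gz"),
        )
    return pairs
```

Each resulting pair can then be passed to STAR's `--readFilesIn`, one STAR and Arriba run per sample.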

References:

Uhrig, Sebastian et al. "Accurate and efficient detection of gene fusions from RNA sequencing data." Genome Research vol. 31,3 (2021): 448-460. doi:10.1101/gr.257246.119 https://genome.cshlp.org/content/31/3/448.long

arriba.txt · Last modified: 2024/10/12 06:46 by xuyi