用户工具

站点工具


疾病显著性关联位点分析

简介

基于位点关联分析

功能

基于150对以上case/control 进行位点关联分析,根据筛选pvalue等统计学数据,分析对该疾病群体有效用的位点;

脚本使用

1. 配置文件:

合并后的snp注释文件


示例脚本step1:

建立split文件夹,刷work.sh,脚本内容如下:

infile=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/WGS.snp.merged.annovar.hg19_multianno.xls.gz
for i in {1..22}
	do 
	mkdir -p $i
	less $infile |awk -v chrid=$i '$1=="Priority"||$2==chrid' |gzip - > $i/snp.merged.annovar.hg19_multianno.chr$i.xls.gz
	echo -e "python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py \
	--infile ./snp.merged.annovar.hg19_multianno.chr$i.xls.gz \
	--calling multiple \
	--type allele \
	--db NOVO \
	--dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr$i/V10_WGS_all_v2_chr$i.snp.stat.xls \
	--out case49.novo.chr$i\nqsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr${i}_siteAS_Allele_NOVO.sh" > $i/work.chr$i.sh 
	done

示例脚本step2:

进入每条染色体的文件夹中,运行work.chr1.sh,脚本内容如下:
python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py 	--infile ./snp.merged.annovar.hg19_multianno.chr1.xls.gz 	--calling multiple 	--type allele 	--db NOVO 	--dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr1/V10_WGS_all_v2_chr1.snp.stat.xls 	--out case49.novo.chr1
qsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr1_siteAS_Allele_NOVO.sh


示例脚本step3:

合并每条染色体结果:
indir=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split
for i in {1..22}
	do
	cat $indir/$i/case49.novo.chr${i}_Allele.final.xls >> tmp.xls
	cat $indir/$i/case49.novo.chr${i}_Allele.final.4plot.xls  >> tmp.4plot.xls
	cat $indir/$i/case49.novo.chr${i}_count_combine.NOVO.xls >> tmp.count.NOVO.xls
	cat $indir/$i/case49.novo.chr${i}_count.xls >> tmp.count.xls
	done

cat tmp.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.xls
cat tmp.4plot.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.4plot.xls
cat tmp.count.NOVO.xls | sed -E "2,\${/CHR/d}" > case49.novo_count_combine.NOVO.xls
cat tmp.count.xls | sed -E "2,\${/CHR/d}" > case49.novo_count.xls
Rscript  plot.r --infile ./case49.novo_Allele.final.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele


示例脚本step4:
过滤
1)callrate  cutoff 默认case为0.9,control为0.6
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -o ./case49.novo_Allele.final.filterCallRate.4plot.xls

Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate

2)callrate及MAF,MAF cutoff > 0.05
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf -o ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls

Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF

3)callrate、MAF及repeat区域
grep -v "^#" /ifs/TJPROJ3/DISEASE/Database/ANNOVAR/humandb/hg19_rmsk.gff|cut -f 1,4,5 |sed 's/chr//g' > hg19.repeat.bed
python filter.py -i ./case144.novo_Allele.final.4plot.xls -cr case144.novo_count_combine.NOVO.xls -sn 193,2827 -fm maf,repeat -repeat ./hg19.repeat.bed -o ./case144.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf,repeat -repeat ../snp.merged.annovar.hg19_multianno.xls.gz -o ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls

Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat

4)callrate、MAF、repeat及 OR  OR cutoff
awk -F '\t' '$1=="Alt_case"||$6>=1||$6=="Inf"' ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls > ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls

Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat.OR



运行脚本:

/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work.sh
/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work_step1_step2.sh
sh 刷脚本;

交付结果

过滤后的表格; 曼哈顿图; QQ图;

示例路径: /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/filter_HW

疾病显著性关联位点分析.txt · 最后更改: 2023/03/22 06:04 由 hanyue