基于位点关联分析
基于150对以上case/control 进行位点关联分析,根据筛选pvalue等统计学数据,分析对该疾病群体有效用的位点;
1. 配置文件:
合并后的snp注释文件 示例脚本step1: 建立split文件夹,刷work.sh,脚本内容如下: infile=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/WGS.snp.merged.annovar.hg19_multianno.xls.gz for i in {1..22} do mkdir -p $i less $infile |awk -v chrid=$i '$1=="Priority"||$2==chrid' |gzip - > $i/snp.merged.annovar.hg19_multianno.chr$i.xls.gz echo -e "python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py \ --infile ./snp.merged.annovar.hg19_multianno.chr$i.xls.gz \ --calling multiple \ --type allele \ --db NOVO \ --dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr$i/V10_WGS_all_v2_chr$i.snp.stat.xls \ --out case49.novo.chr$i\nqsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr${i}_siteAS_Allele_NOVO.sh" > $i/work.chr$i.sh done 示例脚本step2: 进入每条染色体的文件夹中,运行work.chr1.sh,脚本内容如下: python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py --infile ./snp.merged.annovar.hg19_multianno.chr1.xls.gz --calling multiple --type allele --db NOVO --dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr1/V10_WGS_all_v2_chr1.snp.stat.xls --out case49.novo.chr1 qsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr1_siteAS_Allele_NOVO.sh 示例脚本step3: 合并每条染色体结果: indir=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split for i in {1..22} do cat $indir/$i/case49.novo.chr${i}_Allele.final.xls >> tmp.xls cat $indir/$i/case49.novo.chr${i}_Allele.final.4plot.xls >> tmp.4plot.xls cat $indir/$i/case49.novo.chr${i}_count_combine.NOVO.xls >> tmp.count.NOVO.xls cat $indir/$i/case49.novo.chr${i}_count.xls >> tmp.count.xls done cat tmp.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.xls cat tmp.4plot.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.4plot.xls cat tmp.count.NOVO.xls | sed -E "2,\${/CHR/d}" > case49.novo_count_combine.NOVO.xls cat tmp.count.xls | sed -E "2,\${/CHR/d}" > case49.novo_count.xls Rscript plot.r --infile ./case49.novo_Allele.final.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele 示例脚本step4: 过滤 1)callrate cutoff 默认case为0.9,control为0.6 python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -o ./case49.novo_Allele.final.filterCallRate.4plot.xls Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate 2)callrate及MAF,MAF cutoff > 0.05 python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf -o ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF 3)callrate、MAF及repeat区域 grep -v "^#" /ifs/TJPROJ3/DISEASE/Database/ANNOVAR/humandb/hg19_rmsk.gff|cut -f 1,4,5 |sed 's/chr//g' > hg19.repeat.bed python filter.py -i ./case144.novo_Allele.final.4plot.xls -cr case144.novo_count_combine.NOVO.xls -sn 193,2827 -fm maf,repeat -repeat ./hg19.repeat.bed -o ./case144.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf,repeat -repeat ../snp.merged.annovar.hg19_multianno.xls.gz -o ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat 4)callrate、MAF、repeat及 OR OR cutoff awk -F '\t' '$1=="Alt_case"||$6>=1||$6=="Inf"' ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls > ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat.OR
运行脚本:
/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work.sh /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work_step1_step2.sh sh 刷脚本;
过滤后的表格; 曼哈顿图; QQ图;
示例路径: /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/filter_HW