======简介======
基于位点关联分析
======功能======
基于150对以上case/control 进行位点关联分析,根据筛选pvalue等统计学数据,分析对该疾病群体有效用的位点;
======脚本使用======
**1. 配置文件:**
合并后的snp注释文件
示例脚本step1:
建立split文件夹,刷work.sh,脚本内容如下:
infile=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/WGS.snp.merged.annovar.hg19_multianno.xls.gz
for i in {1..22}
do
mkdir -p $i
less $infile |awk -v chrid=$i '$1=="Priority"||$2==chrid' |gzip - > $i/snp.merged.annovar.hg19_multianno.chr$i.xls.gz
echo -e "python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py \
--infile ./snp.merged.annovar.hg19_multianno.chr$i.xls.gz \
--calling multiple \
--type allele \
--db NOVO \
--dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr$i/V10_WGS_all_v2_chr$i.snp.stat.xls \
--out case49.novo.chr$i\nqsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr${i}_siteAS_Allele_NOVO.sh" > $i/work.chr$i.sh
done
示例脚本step2:
进入每条染色体的文件夹中,运行work.chr1.sh,脚本内容如下:
python /TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WES.X101SC20120961-Z01.144case.20220609/split/SiteAS.v4.0.py --infile ./snp.merged.annovar.hg19_multianno.chr1.xls.gz --calling multiple --type allele --db NOVO --dbfile /ifs/TJPROJ3/DISEASE/share/Disease/Association/site_AS/DB/NovoZhongHua/WGS/v1_WGS/chr1/V10_WGS_all_v2_chr1.snp.stat.xls --out case49.novo.chr1
qsub -cwd -V -l vf=2G,p=1 -q afsreseq.q case49.novo.chr1_siteAS_Allele_NOVO.sh
示例脚本step3:
合并每条染色体结果:
indir=/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split
for i in {1..22}
do
cat $indir/$i/case49.novo.chr${i}_Allele.final.xls >> tmp.xls
cat $indir/$i/case49.novo.chr${i}_Allele.final.4plot.xls >> tmp.4plot.xls
cat $indir/$i/case49.novo.chr${i}_count_combine.NOVO.xls >> tmp.count.NOVO.xls
cat $indir/$i/case49.novo.chr${i}_count.xls >> tmp.count.xls
done
cat tmp.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.xls
cat tmp.4plot.xls | sed -E "2,\${/Alt_case/d}" > case49.novo_Allele.final.4plot.xls
cat tmp.count.NOVO.xls | sed -E "2,\${/CHR/d}" > case49.novo_count_combine.NOVO.xls
cat tmp.count.xls | sed -E "2,\${/CHR/d}" > case49.novo_count.xls
Rscript plot.r --infile ./case49.novo_Allele.final.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele
示例脚本step4:
过滤
1)callrate cutoff 默认case为0.9,control为0.6
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -o ./case49.novo_Allele.final.filterCallRate.4plot.xls
Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate
2)callrate及MAF,MAF cutoff > 0.05
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf -o ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls
Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF
3)callrate、MAF及repeat区域
grep -v "^#" /ifs/TJPROJ3/DISEASE/Database/ANNOVAR/humandb/hg19_rmsk.gff|cut -f 1,4,5 |sed 's/chr//g' > hg19.repeat.bed
python filter.py -i ./case144.novo_Allele.final.4plot.xls -cr case144.novo_count_combine.NOVO.xls -sn 193,2827 -fm maf,repeat -repeat ./hg19.repeat.bed -o ./case144.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls
python filter.py -i ./case49.novo_Allele.final.4plot.xls -cr case49.novo_count_combine.NOVO.xls -sn 49,542 -fm maf,repeat -repeat ../snp.merged.annovar.hg19_multianno.xls.gz -o ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls
Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat
4)callrate、MAF、repeat及 OR OR cutoff
awk -F '\t' '$1=="Alt_case"||$6>=1||$6=="Inf"' ./case49.novo_Allele.final.filterCallRate.MAF.repeat.4plot.xls > ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls
Rscript plot.r --infile ./case49.novo_Allele.final.filterCallRate.MAF.repeat.OR.4plot.xls --pname Pvalue_Allele --out case49.novo_Allele.filterCallRate.MAF.repeat.OR
运行脚本:
/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work.sh
/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/work_step1_step2.sh
sh 刷脚本;
======交付结果======
过滤后的表格;
曼哈顿图;
QQ图;
示例路径:
/TJPROJ6/AFS_RESEQ/Proj/hanyue/06.gaoji/WGS.X101SC20120961-Z01.49case.20220609/split/filter_HW