===== Genetic structure and genetic diversity analyses =====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/SNP/zhangxin

==== 1. Nucleotide diversity ====

Software: Vcftools

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/population

Reference commands:

  # final.vcf contains all samples; this step splits the VCF by population (ZZJ is the list of samples to keep)
  /NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf final.vcf --keep ZZJ --recode --recode-INFO-all --out ZZJ
  # compute per-population nucleotide diversity in 1000 bp windows
  /NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf ZZJ.recode.vcf --window-pi 1000 --out ZZJ_nucleotide_diversity

==== 2. Genetic structure with Arlequin ====

Reference path: /TJPROJ1/RNA/shouhou/personal_dir/wangbaojian/xuexi/arlequin-3.5.2.2/setup/baidoushan_shenfen

Bizi (篦子) project paths:

  /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/Arlequin
  /TJPROJ6/RNA_SH/shouhou/gexinghua/X101SC19110580/Arlequin

1) Convert the VCF file to an .arp file, the input format for Arlequin.

  perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf out_dir
  perl GenoToArp_2.pl -i baidoushan.geno -idgroup group -size 13545 -misrate 0.02 -maf 0.05 -o shenfen.arp

2) Prepare the .ars file, which stores the Arlequin analysis parameters. It can be exported from the Windows version of Arlequin, or you can reuse the .ars file I have already configured.

  /TJPROJ6/RNA_SH/software/arlequin/arlecore3522_64bit shenfen.arp shenfen.ars

This step creates the shenfen.res folder. All results of the analysis are in the shenfen.xml file inside it.

Expected heterozygosity (He) and observed heterozygosity (Ho). The mean values are given at the end of the table.

  -------------------------------------------------
           Num. gene    Num.      Obs.      Exp.
  Locus#     copies    alleles    Het.      Het.
  -------------------------------------------------
     1         46         2     0.34783   0.46377
     2         64         2     0.34375   0.28919
     3         64         2     0.34375   0.28919

Pairwise Fst between populations:

  Distance method: Pairwise differences
             1         2         3         4
     1    0.00000
     2    0.17423   0.00000
     3    0.25449   0.26326   0.00000
     4    0.29934   0.28930   0.33585   0.00000

AMOVA results:

  ----------------------------------------------------------------------
  Source of                     Sum of       Variance       Percentage
  variation            d.f.     squares      components     of variation
  ----------------------------------------------------------------------
  Among populations      3      4836.469     31.26725 Va       23.66
  Within populations   212     21392.045    100.90587 Vb       76.34
  ----------------------------------------------------------------------
  Total                215     26228.514    132.17312
  ----------------------------------------------------------------------
  Fixation Index      FST :      0.23656
  ----------------------------------------------------------------------

==== 3. Structure analysis ====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE

Bizi (篦子) project path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/structure

Latest structure results: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/structure

Input used for the latest results: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/snp-dp10-miss0.5-maf0.05_DPmin10.vcf.all

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/Structure

==== 4. PCA analysis ====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/PCA

Bizi (篦子) project path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/pca

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/pca/PCA.sh

==== 5. Tree analysis ====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/02.TREE

  perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan
  zcat /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno.gz > /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno
  perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/getInfo_treebest.pl /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE/samplelist
  /NJPROJ1/PAG/Crop/share/software/PopEvolution/software/treebest-1.9.2 nj -b 1000 rmRef.fa > /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out
  perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/tree4plot.pl /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out.result

==== 6. Selective sweep analysis ====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/pi_fst

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/script

==== 7. IBD Mantel test ====

{{ :售后:ibdmanual.pdf |Reference manual for the IBD software}}

Backup of the software and its input and output files (Windows): /NJPROJ3/RNA_SH/software/IBD_1.53_for_Windows.zip

Input file format: the GENETIC_DISTANCE block lists Fst/(1-Fst) for each pair of populations, and the GEOGRAPHIC_DISTANCE block lists the geographic distance for each pair of populations.

  GENETIC_DISTANCE
  1 2 0.5763
  1 3 0.6375
  1 4 0.4285
  1 5 0.413
  1 6 0.4442
  1 7 0.4322
  1 8 0.5276
  1 9 0.0058
  1 10 0.7234
  2 3 0.7275
  2 4 0.1845
  2 5 0.1587
  2 6 0.1925
  2 7 0.2168
  2 8 0.5977
  2 9 0.5324
  2 10 0.6797
  3 4 0.548
  3 5 0.5139
  3 6 0.5454
  3 7 0.5354
  3 8 0.2468
  3 9 0.6207
  3 10 0.7234
  4 5 0.0921
  4 6 0.0036
  4 7 0.1237
  4 8 0.4423
  4 9 0.398
  4 10 0.5348
  5 6 0.097
  5 7 0.1278
  5 8 0.3915
  5 9 0.3811
  5 10 0.4718
  6 7 0.1157
  6 8 0.4286
  6 9 0.4099
  6 10 0.5107
  7 8 0.4508
  7 9 0.3975
  7 10 0.4736
  8 9 0.5072
  8 10 0.5857
  9 10 0.659
  GEOGRAPHIC_DISTANCE
  1 2 561.82
  1 3 461.04
  1 4 544.51
  1 5 541.46
  1 6 538.84
  1 7 450.42
  1 8 447.03
  1 9 5.72
  1 10 427.19
  2 3 1022.35
  2 4 51.91
  2 5 25.04
  2 6 54.76
  2 7 134.19
  2 8 1008.23
  2 9 567.54
  2 10 969.77
  3 4 1005.47
  3 5 1002.33
  3 6 999.79
  3 7 910.63
  3 8 14.33
  3 9 455.33
  3 10 205.92
  4 5 33.85
  4 6 5.76
  4 7 98.72
  4 8 991.5
  4 9 550.22
  4 10 943.39
  5 6 34.63
  5 7 109.26
  5 8 988.25
  5 9 547.18
  5 10 947.13
  6 7 92.99
  6 8 985.82
  6 9 544.54
  6 10 937.63
  7 8 896.75
  7 9 456.09
  7 10 844.9
  8 9 441.32
  8 10 203.79
  9 10 421.95

===== Expression diversity analyses =====

==== 1. Ed and Ep ====

Following the formula given in the literature, Ep is the mean FPKM of a population, and Ed is the sum over the population's samples of |sample FPKM - Ep|, divided by (n-1)·Ep, where n is the number of samples in the population.

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/population

==== 2. ANOVA ====

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/Anova

==== 3. KS test ====

Method: https://www.cnblogs.com/arkenstone/p/5496761.html

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/KS

==== 4. Correlation analysis ====

This step computes the correlation between the expression diversity measures (Ed, Ep, etc.) and genetic diversity. The methods used are again Pearson correlation and the Mantel test.

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/EpvsGeneticvsGeographic
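
The Ed/Ep formula and the Pearson/Mantel correlation used on this page can be sketched in a few lines of standard-library Python. This is only an illustration, not the scripts under the reference paths. It assumes that the Ed denominator reads as (n-1) multiplied by Ep, and that the Mantel test is a one-sided permutation test on the Pearson correlation of the two distance matrices. The function names (ep_ed, pearson, to_matrix, mantel), the FPKM values and the number of permutations are all hypothetical. The distance values are the population 1 to 4 pairs taken from the IBD input shown earlier on this page.

```python
import math
import random


def ep_ed(fpkm):
    """Return (Ep, Ed) for one gene in one population.

    Ep = mean FPKM; Ed = sum(|x_i - Ep|) / ((n - 1) * Ep).
    The (n - 1) * Ep denominator is an assumed reading of the formula.
    """
    n = len(fpkm)
    ep = sum(fpkm) / n
    if n < 2 or ep == 0:
        return ep, float("nan")
    ed = sum(abs(x - ep) for x in fpkm) / ((n - 1) * ep)
    return ep, ed


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_matrix(pairs, n):
    """Build a symmetric n x n matrix from (i, j, value) triples, 1-based."""
    m = [[0.0] * n for _ in range(n)]
    for i, j, v in pairs:
        m[i - 1][j - 1] = m[j - 1][i - 1] = v
    return m


def upper(m, order):
    """Upper triangle of m after relabelling populations by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, n_perm=999, seed=0):
    """One-sided Mantel test: permute the population labels of matrix a."""
    ident = list(range(len(a)))
    b_vec = upper(b, ident)
    r_obs = pearson(upper(a, ident), b_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = ident[:]
        rng.shuffle(order)
        if pearson(upper(a, order), b_vec) >= r_obs:
            hits += 1
    # Add one to both counts so the observed value is part of the null set.
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical FPKM values for one gene in three samples of one population.
ep, ed = ep_ed([10.0, 12.0, 8.0])
print("Ep =", ep, "Ed =", ed)

# Populations 1-4 from the IBD input on this page: Fst/(1-Fst) and distance.
genetic = to_matrix([(1, 2, 0.5763), (1, 3, 0.6375), (1, 4, 0.4285),
                     (2, 3, 0.7275), (2, 4, 0.1845), (3, 4, 0.548)], 4)
geographic = to_matrix([(1, 2, 561.82), (1, 3, 461.04), (1, 4, 544.51),
                        (2, 3, 1022.35), (2, 4, 51.91), (3, 4, 1005.47)], 4)
r, p = mantel(genetic, geographic)
print("Mantel r =", r, "p =", p)
```

With only four populations there are just 24 distinct relabellings, so the p-value from this subset is coarse. The full ten-population matrices give a more meaningful test.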