===== Genetic Structure and Genetic Diversity Analysis: Notes =====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/SNP/zhangxin
==== 1. Nucleotide diversity ====
Software: VCFtools. Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/population
Reference commands:
/NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf final.vcf --keep ZZJ --recode --recode-INFO-all --out ZZJ # final.vcf is the VCF containing all samples; ZZJ is the file listing the sample IDs of one population. This step splits the VCF by population.
/NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf ZZJ.recode.vcf --window-pi 1000 --out ZZJ_nucleotide_diversity # compute the population's nucleotide diversity in 1000 bp windows
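The --window-pi run writes a tab-separated file (here ZZJ_nucleotide_diversity.windowed.pi) with the columns CHROM, BIN_START, BIN_END, N_VARIANTS and PI. A minimal Python sketch for turning that file into one summary value per population is given below. The function name and the three-window excerpt are made up for illustration. As far as I know, windows without any variant are not written to the file, so the mean over the reported windows is higher than a genome-wide mean.

```python
import csv
import io


def mean_window_pi(text):
    """Return (number of windows, mean PI) from the contents of a .windowed.pi file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    values = [float(row["PI"]) for row in reader]
    if not values:
        raise ValueError("no windows found")
    return len(values), sum(values) / len(values)


# Hypothetical three-window excerpt, for illustration only (not real results).
example = (
    "CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tPI\n"
    "chr1\t1\t1000\t3\t0.0012\n"
    "chr1\t1001\t2000\t1\t0.0004\n"
    "chr1\t3001\t4000\t2\t0.0008\n"
)
n_windows, mean_pi = mean_window_pi(example)
print(n_windows, mean_pi)
```

For a real run, read the file with open("ZZJ_nucleotide_diversity.windowed.pi").read() and pass the text to mean_window_pi.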
==== 2. Genetic structure with Arlequin ====
1) Convert the VCF file into an .arp file, the input format of Arlequin.
Reference path: /TJPROJ1/RNA/shouhou/personal_dir/wangbaojian/xuexi/arlequin-3.5.2.2/setup/baidoushan_shenfen
Bizi (篦子) project paths:
/NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/Arlequin
/TJPROJ6/RNA_SH/shouhou/gexinghua/X101SC19110580/Arlequin
perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf out_dir
perl GenoToArp_2.pl -i baidoushan.geno -idgroup group -size 13545 -misrate 0.02 -maf 0.05 -o shenfen.arp
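2) Prepare the .ars file (the settings file that stores the Arlequin analysis parameters; it can be exported from the Windows version of Arlequin, or you can reuse the .ars file I have already set up).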
/TJPROJ6/RNA_SH/software/arlequin/arlecore3522_64bit shenfen.arp shenfen.ars
This step creates the shenfen.res folder. Open shenfen.xml inside it; all results of this analysis can be found there.
Expected heterozygosity (He) and observed heterozygosity (Ho) are reported per locus; the mean values are given at the end of the table.
^ Locus# ^ Num. gene copies ^ Num. alleles ^ Obs. Het. ^ Exp. Het. ^
| 1 | 46 | 2 | 0.34783 | 0.46377 |
| 2 | 64 | 2 | 0.34375 | 0.28919 |
| 3 | 64 | 2 | 0.34375 | 0.28919 |
Pairwise population Fst values:
Distance method: Pairwise differences
^    ^ 1 ^ 2 ^ 3 ^ 4 ^
^ 1 | 0.00000 |  |  |  |
^ 2 | 0.17423 | 0.00000 |  |  |
^ 3 | 0.25449 | 0.26326 | 0.00000 |  |
^ 4 | 0.29934 | 0.28930 | 0.33585 | 0.00000 |
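The IBD Mantel test in section 7 takes Fst/(1-Fst) rather than raw Fst as the genetic distance. A small Python sketch of that conversion, using the lower-triangular matrix above as input, could look like this. The function name is my own, and the "i j value" line layout simply follows the GENETIC_DISTANCE example in section 7.

```python
def linearize_fst(lower_triangle):
    """Turn a lower-triangular Fst matrix into (i, j, Fst/(1-Fst)) tuples with i < j."""
    rows = []
    for j, row in enumerate(lower_triangle, start=1):
        for i, fst in enumerate(row[:-1], start=1):  # row[:-1] skips the diagonal zero
            rows.append((i, j, fst / (1.0 - fst)))
    return sorted(rows)


# Pairwise Fst values of populations 1-4, copied from the matrix above.
fst_matrix = [
    [0.00000],
    [0.17423, 0.00000],
    [0.25449, 0.26326, 0.00000],
    [0.29934, 0.28930, 0.33585, 0.00000],
]
pairs = linearize_fst(fst_matrix)
print("GENETIC_DISTANCE")
for i, j, value in pairs:
    print(i, j, round(value, 4))
```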
AMOVA results:
^ Source of variation ^ d.f. ^ Sum of squares ^ Variance components ^ Percentage of variation ^
| Among populations | 3 | 4836.469 | 31.26725 Va | 23.66 |
| Within populations | 212 | 21392.045 | 100.90587 Vb | 76.34 |
| Total | 215 | 26228.514 | 132.17312 |  |
Fixation Index FST: 0.23656
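As a sanity check on the AMOVA output, the fixation index for this two-level design is the among-population variance component divided by the total, FST = Va / (Va + Vb), and the percentage column is the same ratio times 100. The short Python sketch below recomputes both from the Va and Vb values reported above; the function name is only for illustration.

```python
def amova_fst(va, vb):
    """Return (FST, % among populations, % within populations) from variance components."""
    total = va + vb
    return va / total, 100.0 * va / total, 100.0 * vb / total


# Va and Vb as reported by Arlequin in the AMOVA table above.
fst, pct_among, pct_within = amova_fst(31.26725, 100.90587)
print(round(fst, 5), round(pct_among, 2), round(pct_within, 2))  # 0.23656 23.66 76.34
```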
==== 3. STRUCTURE analysis ====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE
Bizi (篦子) project path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/structure
Location of the latest STRUCTURE results:
/NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/structure
Input used: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/snp-dp10-miss0.5-maf0.05_DPmin10.vcf.all
Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/Structure
==== 4. PCA analysis ====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/PCA
Bizi (篦子) project path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/pca
Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/pca/PCA.sh
==== 5. Tree analysis ====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree
Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/02.TREE
perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan
zcat /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno.gz > /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno
perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/getInfo_treebest.pl /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE/samplelist
/NJPROJ1/PAG/Crop/share/software/PopEvolution/software/treebest-1.9.2 nj -b 1000 rmRef.fa >/NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out
perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/tree4plot.pl /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out.result
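==== 6. Selective sweep analysis ====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/pi_fst
Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/script
==== 7. IBD Mantel test ====
{{ :售后:ibdmanual.pdf |Reference method for analysis with the IBD software}}
Backup of the software and its input/output files (Windows): /NJPROJ3/RNA_SH/software/IBD_1.53_for_Windows.zip
Input file format: the GENETIC_DISTANCE block gives Fst/(1-Fst) for every pair of populations; the GEOGRAPHIC_DISTANCE block gives the geographic distance between every pair of populations.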
GENETIC_DISTANCE
1 2 0.5763
1 3 0.6375
1 4 0.4285
1 5 0.413
1 6 0.4442
1 7 0.4322
1 8 0.5276
1 9 0.0058
1 10 0.7234
2 3 0.7275
2 4 0.1845
2 5 0.1587
2 6 0.1925
2 7 0.2168
2 8 0.5977
2 9 0.5324
2 10 0.6797
3 4 0.548
3 5 0.5139
3 6 0.5454
3 7 0.5354
3 8 0.2468
3 9 0.6207
3 10 0.7234
4 5 0.0921
4 6 0.0036
4 7 0.1237
4 8 0.4423
4 9 0.398
4 10 0.5348
5 6 0.097
5 7 0.1278
5 8 0.3915
5 9 0.3811
5 10 0.4718
6 7 0.1157
6 8 0.4286
6 9 0.4099
6 10 0.5107
7 8 0.4508
7 9 0.3975
7 10 0.4736
8 9 0.5072
8 10 0.5857
9 10 0.659
GEOGRAPHIC_DISTANCE
1 2 561.82
1 3 461.04
1 4 544.51
1 5 541.46
1 6 538.84
1 7 450.42
1 8 447.03
1 9 5.72
1 10 427.19
2 3 1022.35
2 4 51.91
2 5 25.04
2 6 54.76
2 7 134.19
2 8 1008.23
2 9 567.54
2 10 969.77
3 4 1005.47
3 5 1002.33
3 6 999.79
3 7 910.63
3 8 14.33
3 9 455.33
3 10 205.92
4 5 33.85
4 6 5.76
4 7 98.72
4 8 991.5
4 9 550.22
4 10 943.39
5 6 34.63
5 7 109.26
5 8 988.25
5 9 547.18
5 10 947.13
6 7 92.99
6 8 985.82
6 9 544.54
6 10 937.63
7 8 896.75
7 9 456.09
7 10 844.9
8 9 441.32
8 10 203.79
9 10 421.95
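The Mantel test itself is run in the Windows IBD program. For a quick cross-check outside that program, the idea can be sketched in plain Python: correlate the two pairwise distance lists, then permute the population labels of one matrix to get a one-sided p-value. This is my own minimal version under those assumptions, not the IBD software's implementation. The function names are hypothetical, and only populations 1-4 of the listing above are used to keep the example short, so the p-value is very coarse (4 populations give just 24 distinct relabelings).

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(gen, geo, n_perm=999, seed=0):
    """One-sided Mantel test; gen and geo map (i, j) with i < j to a distance."""
    pops = sorted({p for pair in gen for p in pair})
    pairs = [(a, b) for k, a in enumerate(pops) for b in pops[k + 1:]]
    x = [geo[p] for p in pairs]
    r_obs = pearson(x, [gen[p] for p in pairs])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = pops[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(pops, shuffled))  # permute rows and columns together
        y_perm = [gen[tuple(sorted((relabel[a], relabel[b])))] for a, b in pairs]
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Populations 1-4 only, copied from the GENETIC_DISTANCE / GEOGRAPHIC_DISTANCE listing above.
gen = {(1, 2): 0.5763, (1, 3): 0.6375, (1, 4): 0.4285,
       (2, 3): 0.7275, (2, 4): 0.1845, (3, 4): 0.548}
geo = {(1, 2): 561.82, (1, 3): 461.04, (1, 4): 544.51,
       (2, 3): 1022.35, (2, 4): 51.91, (3, 4): 1005.47}
r, p = mantel(gen, geo)
print(round(r, 3), round(p, 3))
```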
===== Expression Diversity Analysis =====
==== 1. Ed and Ep calculation ====
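Following the formula given in the literature, Ep is the mean FPKM across the samples of a population, and Ed = Σ|FPKM_i - Ep| / ((n - 1) · Ep), where FPKM_i is the value in sample i and n is the number of samples in the population.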
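To make the definition concrete, here is a minimal Python sketch for one gene in one population. It reads the formula above as Ed = Σ|FPKM_i - Ep| / ((n - 1) · Ep), which is my interpretation of the notation; the function name and the three FPKM values are invented for the example.

```python
def ep_ed(fpkm):
    """Return (Ep, Ed) for one gene, given the FPKM values of one population's samples."""
    n = len(fpkm)
    if n < 2:
        raise ValueError("need at least two samples")
    ep = sum(fpkm) / n  # Ep: population mean expression
    if ep == 0:
        raise ValueError("Ed is undefined when the mean FPKM is zero")
    ed = sum(abs(x - ep) for x in fpkm) / ((n - 1) * ep)  # Ed: scaled mean absolute deviation
    return ep, ed


# Hypothetical FPKM values of one gene in three samples.
ep, ed = ep_ed([10.0, 20.0, 30.0])
print(ep, ed)  # 20.0 0.5
```

Genes with zero mean expression have to be filtered out or handled separately before this step, since the denominator vanishes.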
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/population
==== 2. ANOVA (analysis of variance) ====
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/Anova
==== 3. KS test ====
Method: see https://www.cnblogs.com/arkenstone/p/5496761.html
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/KS
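These notes do not record which distributions were compared, so the following is only a generic sketch of the two-sample KS statistic in plain Python, assuming two lists of per-gene values (for example Ed in two populations). The function name and the numbers are illustrative. It returns only the statistic D; for a p-value, scipy.stats.ks_2samp can be used instead.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)  # fraction of a that is <= v
        cdf_b = bisect_right(b, v) / len(b)  # fraction of b that is <= v
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical per-gene values for two populations.
d_separated = ks_statistic([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
d_identical = ks_statistic([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
print(d_separated, d_identical)  # 1.0 0.0
```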
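==== 4. Correlation analysis ====
This step computes the correlation between expression diversity measures (Ed, Ep, etc.) and genetic diversity, again using Pearson correlation and the Mantel test.
Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/EpvsGeneticvsGeographic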