Genetic Structure and Genetic Diversity Analysis

Notes on genetic structure and genetic diversity analysis

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/SNP/zhangxin

1. Nucleotide diversity

Software used: VCFtools. Directory: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/population

Reference commands:

/NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf final.vcf --keep ZZJ --recode --recode-INFO-all --out ZZJ   # final.vcf is the VCF file containing all samples; this step splits the VCF by population (ZZJ is a file listing the sample IDs to keep)
/NJPROJ3/RNA_SH/software/vcftools_0.1.13/cpp/vcftools --vcf ZZJ.recode.vcf --window-pi 1000 --out ZZJ_nucleotide_diversity  # compute nucleotide diversity for the population in 1000 bp windows
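
To get a single per-population figure from the windowed output, the per-window values can be averaged. The sketch below assumes the usual VCFtools layout for the .windowed.pi file (tab-separated, columns CHROM, BIN_START, BIN_END, N_VARIANTS, PI). The function name mean_window_pi and the demo rows are made up for illustration. The result is the mean over the windows listed in the file only.

```python
import csv
import io

def mean_window_pi(handle):
    """Average the PI column of a vcftools .windowed.pi table."""
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row["PI"]) for row in reader]
    if not values:
        raise ValueError("no windows found in input")
    return sum(values) / len(values)

# Made-up rows for illustration only. For real data, pass
# open("ZZJ_nucleotide_diversity.windowed.pi") instead.
demo = io.StringIO(
    "CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tPI\n"
    "chr1\t1\t1000\t3\t0.0012\n"
    "chr1\t1001\t2000\t5\t0.0020\n"
)
print(mean_window_pi(demo))
```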

2. Genetic structure with Arlequin

1) Convert the VCF file to an .arp file, the input format of Arlequin.

Reference path: /TJPROJ1/RNA/shouhou/personal_dir/wangbaojian/xuexi/arlequin-3.5.2.2/setup/baidoushan_shenfen

Paths for the 篦子 (Bizi) project:

/NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/Arlequin

/TJPROJ6/RNA_SH/shouhou/gexinghua/X101SC19110580/Arlequin

perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf out_dir
perl GenoToArp_2.pl -i baidoushan.geno -idgroup group -size 13545 -misrate 0.02 -maf 0.05 -o shenfen.arp   

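2) Prepare the .ars file, which stores the parameter settings for the Arlequin analysis. It can be exported from the Windows version of Arlequin, or you can use the .ars file I have already configured.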

/TJPROJ6/RNA_SH/software/arlequin/arlecore3522_64bit shenfen.arp shenfen.ars

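This step creates the shenfen.res folder. The shenfen.xml file inside it contains all results of this analysis.

Expected heterozygosity (He) and observed heterozygosity (Ho) statistics; the mean values are given at the end of the table.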

-------------------------------------------------
           Num.
           gene     Num.        Obs.      Exp.
Locus#    copies   alleles      Het.       Het
-------------------------------------------------
     1        46         2   0.34783   0.46377
     2        64         2   0.34375   0.28919
     3        64         2   0.34375   0.28919
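
As a sanity check on the Exp. Het. column, the values above appear consistent with the unbiased gene diversity He = n/(n-1) * (1 - sum(p_i^2)), where n is the number of gene copies. The following is a minimal sketch, not Arlequin's own code. The allele counts (11, 53) and (16, 30) are not in the output; they are hypothetical counts chosen because they reproduce the tabulated values for loci 2 and 1.

```python
def gene_diversity(allele_counts):
    """Unbiased expected heterozygosity from allele counts at one locus."""
    n = sum(allele_counts)  # number of gene copies
    if n < 2:
        raise ValueError("need at least two gene copies")
    sum_sq = sum((c / n) ** 2 for c in allele_counts)
    return n / (n - 1) * (1.0 - sum_sq)

# Hypothetical allele counts (assumed, not taken from the Arlequin output).
print(round(gene_diversity([11, 53]), 5))  # 0.28919, as for locus 2 (64 copies)
print(round(gene_diversity([16, 30]), 5))  # 0.46377, as for locus 1 (46 copies)
```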

Pairwise population Fst values:

<PairFstMat time="08/04/19 at 15:10:49">


Distance method: Pairwise differences
                     1         2         3         4
           1   0.00000
           2   0.17423   0.00000
           3   0.25449   0.26326   0.00000
           4   0.29934   0.28930   0.33585   0.00000
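
The IBD Mantel test in section 7 takes Fst/(1-Fst) per population pair, so the lower-triangular matrix above has to be linearized first. A minimal sketch of that conversion follows; the function name linearize_fst is made up for illustration, and the code assumes every Fst is below 1.

```python
def linearize_fst(lower_triangle):
    """Return sorted (pop_i, pop_j, Fst/(1-Fst)) tuples for all pairs i < j."""
    out = []
    for j, row in enumerate(lower_triangle, start=1):
        # row holds Fst against populations 1..j; the last entry is the diagonal.
        for i, fst in enumerate(row[:-1], start=1):
            out.append((i, j, fst / (1.0 - fst)))
    return sorted(out)

# Values copied from the PairFstMat block above.
matrix = [
    [0.00000],
    [0.17423, 0.00000],
    [0.25449, 0.26326, 0.00000],
    [0.29934, 0.28930, 0.33585, 0.00000],
]

print("GENETIC_DISTANCE")
for i, j, d in linearize_fst(matrix):
    print(f"{i}\t{j}\t{d:.4f}")
```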

AMOVA results

----------------------------------------------------------------------
 Source of                  Sum of      Variance         Percentage
 variation      d.f.        squares     components       of variation
----------------------------------------------------------------------
 Among
 populations      3       4836.469       31.26725 Va            23.66

 Within
 populations    212      21392.045      100.90587 Vb            76.34
----------------------------------------------------------------------
 Total          215      26228.514      132.17312
----------------------------------------------------------------------
 Fixation Index      FST :      0.23656
----------------------------------------------------------------------
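
The fixation index in this table is the among-population variance component divided by the total, FST = Va / (Va + Vb), and the percentage column is the same ratio times 100. The short sketch below recomputes both from the Va and Vb values shown above. The function name amova_summary is made up for illustration.

```python
def amova_summary(va, vb):
    """Derive FST and percentages of variation from two variance components."""
    total = va + vb
    return {
        "fst": va / total,
        "pct_among": 100.0 * va / total,
        "pct_within": 100.0 * vb / total,
    }

# Va and Vb copied from the AMOVA table above.
result = amova_summary(31.26725, 100.90587)
print(round(result["fst"], 5))         # 0.23656
print(round(result["pct_among"], 2))   # 23.66
print(round(result["pct_within"], 2))  # 76.34
```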

3. STRUCTURE analysis

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE

Path for the 篦子 (Bizi) project: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/structure

       Location of the latest STRUCTURE results:
           /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/structure
       Input used: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/legacy/gatk_dp10_structure/snp-dp10-miss0.5-maf0.05_DPmin10.vcf.all

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/Structure

4. PCA analysis

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/PCA

Path for the 篦子 (Bizi) project: /NJPROJ3/RNA_SH/shouhou/gexinghua/X101SC19110580/gexinghua/pca

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/pca/PCA.sh

5. Tree analysis

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/script/02.TREE

perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/vcf2popsnp.v3.pl final.vcf  /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan
zcat /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno.gz > /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno
perl /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/getInfo_treebest.pl /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.geno /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/STRUCTURE/samplelist
/NJPROJ1/PAG/Crop/share/software/PopEvolution/software/treebest-1.9.2 nj -b 1000 rmRef.fa >/NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out
perl  /NJPROJ1/PAG/Crop/share/pipeline/GWAS/pipeline/Population/bin/02.TREE/tree4plot.pl  /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out  /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/Tree/baidoushan.nj_tree.out.result

6. Selective sweep analysis

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/pi_fst

Script path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_genetic_diversity/select/script

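7. IBD Mantel test

Reference method using the IBD software

Backup of the software and related input/output files (Windows): /NJPROJ3/RNA_SH/software/IBD_1.53_for_Windows.zip

Input file format: the GENETIC_DISTANCE block lists Fst/(1-Fst) for each pair of populations; the GEOGRAPHIC_DISTANCE block lists the geographic distance between each pair of populations.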

GENETIC_DISTANCE
1	2	0.5763
1	3	0.6375
1	4	0.4285
1	5	0.413
1	6	0.4442
1	7	0.4322
1	8	0.5276
1	9	0.0058
1	10	0.7234
2	3	0.7275
2	4	0.1845
2	5	0.1587
2	6	0.1925
2	7	0.2168
2	8	0.5977
2	9	0.5324
2	10	0.6797
3	4	0.548
3	5	0.5139
3	6	0.5454
3	7	0.5354
3	8	0.2468
3	9	0.6207
3	10	0.7234
4	5	0.0921
4	6	0.0036
4	7	0.1237
4	8	0.4423
4	9	0.398
4	10	0.5348
5	6	0.097
5	7	0.1278
5	8	0.3915
5	9	0.3811
5	10	0.4718
6	7	0.1157
6	8	0.4286
6	9	0.4099
6	10	0.5107
7	8	0.4508
7	9	0.3975
7	10	0.4736
8	9	0.5072
8	10	0.5857
9	10	0.659
GEOGRAPHIC_DISTANCE
1	2	561.82
1	3	461.04
1	4	544.51
1	5	541.46
1	6	538.84
1	7	450.42
1	8	447.03
1	9	5.72
1	10	427.19
2	3	1022.35
2	4	51.91
2	5	25.04
2	6	54.76
2	7	134.19
2	8	1008.23
2	9	567.54
2	10	969.77
3	4	1005.47
3	5	1002.33
3	6	999.79
3	7	910.63
3	8	14.33
3	9	455.33
3	10	205.92
4	5	33.85
4	6	5.76
4	7	98.72
4	8	991.5
4	9	550.22
4	10	943.39
5	6	34.63
5	7	109.26
5	8	988.25
5	9	547.18
5	10	947.13
6	7	92.99
6	8	985.82
6	9	544.54
6	10	937.63
7	8	896.75
7	9	456.09
7	10	844.9
8	9	441.32
8	10	203.79
9	10	421.95
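
For reference, the idea behind the Mantel test can be sketched in a few lines: correlate the two sets of pairwise distances, then permute the population labels of one matrix to build a null distribution. This is a generic one-sided permutation test written for illustration, not the IBD program's implementation. The function names, the number of permutations and the seed are assumptions. The demo uses only the pairs among populations 1 to 4 from the blocks above.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(gen, geo, n_perm=999, seed=0):
    """Return (r, one-sided p) for two dicts keyed by population pairs."""
    pops = sorted({p for pair in gen for p in pair})
    pairs = [(a, b) for i, a in enumerate(pops) for b in pops[i + 1:]]

    def lookup(d, a, b):
        return d[(a, b)] if (a, b) in d else d[(b, a)]

    x = [lookup(geo, a, b) for a, b in pairs]
    r_obs = pearson(x, [lookup(gen, a, b) for a, b in pairs])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = pops[:]
        rng.shuffle(shuffled)  # permute population labels, not single pairs
        relabel = dict(zip(pops, shuffled))
        y_perm = [lookup(gen, relabel[a], relabel[b]) for a, b in pairs]
        if pearson(x, y_perm) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Pairs among populations 1-4, copied from the blocks above.
gen = {(1, 2): 0.5763, (1, 3): 0.6375, (1, 4): 0.4285,
       (2, 3): 0.7275, (2, 4): 0.1845, (3, 4): 0.548}
geo = {(1, 2): 561.82, (1, 3): 461.04, (1, 4): 544.51,
       (2, 3): 1022.35, (2, 4): 51.91, (3, 4): 1005.47}
r, p = mantel(gen, geo)
print(f"r = {r:.3f}, p = {p:.3f}")
```

With only four populations there are just 24 distinct relabelings, so the p-value from this subset is coarse; the full ten-population input gives a much finer null distribution.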

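Expression diversity analysis

1. Ed and Ep calculation

According to the formula given in the literature, Ep is the mean FPKM of a population, and Ed = Σ|FPKM_i - Ep| / ((n-1)·Ep), where the sum runs over the n samples of the population.

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/population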
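
A minimal sketch of this calculation for one gene in one population is given below. It reads the denominator as (n-1) multiplied by Ep, which is an interpretation of the note's formula. The function name ep_ed and the FPKM values are hypothetical.

```python
def ep_ed(fpkm):
    """Return (Ep, Ed) for one gene, given per-sample FPKM in one population."""
    n = len(fpkm)
    if n < 2:
        raise ValueError("need at least two samples")
    ep = sum(fpkm) / n  # Ep: population mean FPKM
    if ep == 0:
        raise ValueError("Ed is undefined when the mean FPKM is zero")
    # Ed: summed absolute deviation from Ep, scaled by (n-1)*Ep (assumed reading)
    ed = sum(abs(x - ep) for x in fpkm) / ((n - 1) * ep)
    return ep, ed

# Hypothetical FPKM values for four samples of one population.
ep, ed = ep_ed([10.0, 12.0, 8.0, 14.0])
print(ep, round(ed, 4))
```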

2. ANOVA (analysis of variance)

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/Anova

3. KS test

Method: see https://www.cnblogs.com/arkenstone/p/5496761.html

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/KS

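4. Correlation analysis

This step mainly computes the correlations between expression diversity measures (Ed, Ep, etc.) and genetic diversity. The methods used are again Pearson correlation and the Mantel test.

Reference path: /NJPROJ3/RNA_SH/shouhou/gexinghua/beifen/tmp/P101SC18061879/gexinghua_expression_diversity/EpvsGeneticvsGeographic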

遗传结构和遗传多样性分析.txt · Last modified: 2022/08/16 02:45 by zhangxin