输入数据准备
本教程要求eMAGMA 文件, 软件 (MAGMA) 和辅助文件都在同一个目录下如果你的文件在不同的目录上,请在命令行加入路径信息
cd /path/yourworking folder/eMAGMA
解压缩软件包及辅助文件:magma_v1.07b.zip
, NCBI37.3.zip
和MDD2018_excluding23andMe
(下载自 PGC website).
unzip [filename].zip
由于我下载的MDD2018_excluding23andMe
是一个.gz的文件,解压使用如下代码:
gzip -d MDD2018_ex23andMe.gz
解压完后查看前几行
(base) bogon:eMAGMA chelsea$ head MDD2018_ex23andMe
CHR SNP BP A1 A2 FRQ_A_59851 FRQ_U_113154 INFO OR SE P ngt Direction HetISqt HetDf HetPVa Nca Nco Neff
8 rs62513865 101592213 T C 0.0747 0.0733 0.957 1.01461 0.0153 0.3438 0 ---+++ 0.0 5 0.7906 59851 113154 69115.85
8 rs79643588 106973048 A G 0.092 0.092 0.999 1.02122 0.0136 0.1231 0 ++-+++ 0.0 5 0.6847 59851 113154 69115.85
8 rs17396518 108690829 T G 0.562 0.565 0.98 1.00331 0.008 0.6821 6 --+--+ 34.9 5 0.05637 59851 113154 69115.85
8 rs6994300 102569817 A G 0.00609 0.00556 0.466 0.88126 0.4243 0.7658 0 ?-???? 0.0 0 1 16823 25632 20313.61
8 rs138449472 108580746 A G 0.00965 0.00852 0.734 0.97181 0.0598 0.632 0 -+??-? 0.0 2 0.6663 41253 79756 47868.51
8 rs983166 108681675 A C 0.565 0.568 0.991 0.99144 0.008 0.2784 0 --+--+ 10.1 5 0.1685 59851 113154 69115.85
8 rs28842593 103044620 T C 0.841 0.843 0.934 0.98926 0.0112 0.3381 1 -++--+ 0.0 5 0.51 59851 113154 69115.85
8 rs377046245 105176418 I D 0.265 0.265 0.994 1.03004 0.0196 0.1311 - ?????- 0.0 0 1 14260 15480 14844.98
8 rs7014597 104152280 C G 0.166 0.164 0.996 1.01501 0.0106 0.1591 0 +-+-++ 0.0 5 0.4818 59851 113154 69115.85
运行程序需要P-values
来自 MDD GWAS summary statistics
. 使用下面的代码从 GWAS summary statistics来提取: SNP-ID, chromosome, BP(base pair position), and P-value
相关信息并写入一个新的 txt file.
awk '{print $2,$1,$3,$11,$19}' MDD2018_ex23andMe > MDD2018_ex23andMe_emagma.txt
完了之后 前3列的信息排列应该是:SNP
, CHR
和 BP
。
数据分析Analysis
分析需要来自适当参考样本的原始基因型数据来模拟LD结构。为此,我们使用欧洲人口的基因组参考文件[g1000_eur]
。由于我们基于eQTL信息将SNP与基因相关联,因此我们使用注释文件,我们先前已将SNP分配给目标eGenes。这些注释文件是基于GTEx中显著的(FDR<0.05)SNP-基因关联生成的(更多细节见Gering 2009b)。注释文件位于文件夹Batch1、Batch2、Batch3与Batch4、Batch5与Batch6中。
出于实际原因,我们将仅使用位于目录Batch1中的杏仁核的注释文件。在Batch1中,有13个注释文件,前缀为组织名称,后缀为genes.annot。要将Batch1下载到名为Brain的目录中,请使用以下命令:
mkdir Brain
mv Batch1_annot.zip Brain
cd Brain
unzip Batch1.zip
cd ..
因为我之前已经下载了Batch1_annot.zip,新建Brain文件后将其移动到该目录下直接解压缩Batch1.zip
确保返回到eMAGMA文件夹。要运行关联,请使用以下命令:
./magma --bfile g1000_eur
--gene-annot Brain/Brain_Amygdala.genes.annot
--pval MDD2018_ex23andMe_emagma2.txt ncol=Neff
--gene-settings adap-permp=10000
--out Amygdala_emagma
运行结果如下所示:
(base) bogon:eMAGMA chelsea$ ./magma --bfile g1000_eur --gene-annot Brain/Batch1/Brain_Amygdala.genes.annot --pval MDD2018_ex23andMe_emagma.txt ncol=Neff --gene-settings adap-permp=10000 --out Amygdala_emagma
Welcome to MAGMA v1.07b (mac)
Using flags:
--bfile g1000_eur
--gene-annot Brain/Batch1/Brain_Amygdala.genes.annot
--pval MDD2018_ex23andMe_emagma.txt ncol=Neff
--gene-settings adap-permp=10000
--out Amygdala_emagma
Start time is 20:55:53, Tuesday 28 Apr 2020
Loading PLINK-format data...
Reading file g1000_eur.fam... 503 individuals read
Reading file g1000_eur.bim... 22665064 SNPs read
Preparing file g1000_eur.bed...
Reading SNP synonyms from file g1000_eur.synonyms (auto-detected)
read 6016767 mapped synonyms from file, mapping to 3921040 SNPs in the data
WARNING: detected 133 synonymous SNP pairs in the data
skipped all synonym entries involved, synonymous SNPs are kept in analysis
writing list of detected synonyms in data to supplementary log file
Reading SNP p-values from file MDD2018_ex23andMe_emagma.txt...
detected 5 variables in file
using variable: SNP (SNP id)
using variable: P (p-value)
using variable: Neff (sample size; discarding SNPs with N < 50)
ERROR - reading p-value file: too few values on line 9533410 (found 4, expecting at least 5):
line: rs190562351 21 9502243 0.83543
Terminating program.
由于rs190562351这行有个缺失值,运行终止了,那么我就删除这行
## 先用grep找到这行,没有重复项
$ grep 'rs190562351' MDD2018_ex23andMe_emagma.txt
rs190562351 21 9502243 0.83543
## 再去查找了grep的用法,-v参数是输出反选的数据,再赋值到一个新的文件,这样就删掉了这行。
$ grep -v 'rs190562351' MDD2018_ex23andMe_emagma.txt > MDD2018_ex23andMe_emagma1.txt
再用MDD2018_ex23andMe_emagma1.txt
重新运行,但是万万没想到,这种有一个值缺失的行不只这一行,我要是这样一步步删除不知道要到什么时候,那就需要写循环了,尝试了好几次都没能成功
(base) bogon:eMAGMA chelsea$ grep '{cat id2.txt | while read id ; do echo ${id}; done}' MDD2018_ex23andMe_emagma2.txt
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do echo ${id}; done
rs71269378
rs76790510
rs77252136
rs140659530
rs181553715
rs145993931
rs76661226
rs76722126
rs76196084
rs79016859
rs79149161
rs71259347
rs71258884
rs4104602
rs77072381
rs77056430
rs140202858
rs190271206
rs71229564
rs71277812
rs76384163
rs71244393
rs71247672
rs79178122
rs145133116
rs151229293
rs77863943
rs143923139
rs79633665
rs79047427
rs77756107
rs146840680
chr21_9862783_D
rs77110876
rs140066961
rs75310970
rs77482226
rs144043309
rs77690252
rs7378139
rs79407983
rs138722934
rs77061955
rs75645220
rs139338962
rs193153830
rs80297221
rs75702448
rs142407330
rs146496652
rs148968985
rs71259094
rs80035564
rs75780639
rs79300020
rs75625582
rs189257336
rs189188955
rs191838859
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do echo ${id}; done |grep -v
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do echo ${id}; done |grep -v ${id}
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do echo ${id}; done |grep ${id} MDD2018_ex23andMe_emagma2.txt
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do echo ${id}; done |grep MDD2018_ex23andMe_emagma2.txt
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do grep '${id}' MDD2018_ex23andMe_emagma2.txt; done
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do grep '${id}' MDD2018_ex23andMe_emagma2.txt > MDD2018_ex23andMe_emagmao.txt; done
^C
(base) bogon:eMAGMA chelsea$ cat id2.txt | while read id ; do grep -v '${id}' MDD2018_ex23andMe_emagma2.txt > MDD2018_ex23andMe_emagmap.txt; done
以上是我的探索过程,也记录下来,最终没有成功删除目标项目,看到缺失项还挺集中的,后面没有办法就简单粗爆的打开txt文档通过搜索rs71269378
找到位置,手动把这些都删除了,然后再运行,就成功了。
(base) bogon:eMAGMA chelsea$ ./magma --bfile g1000_eur --gene-annot Brain/Batch1/Brain_Amygdala.genes.annot --pval MDD2018_ex23andMe_emagma2.txt ncol=Neff --gene-settings adap-permp=10000 --out Amygdala_emagma
Welcome to MAGMA v1.07b (mac)
Using flags:
--bfile g1000_eur
--gene-annot Brain/Batch1/Brain_Amygdala.genes.annot
--pval MDD2018_ex23andMe_emagma2.txt ncol=Neff
--gene-settings adap-permp=10000
--out Amygdala_emagma
Start time is 09:02:45, Wednesday 29 Apr 2020
Loading PLINK-format data...
Reading file g1000_eur.fam... 503 individuals read
Reading file g1000_eur.bim... 22665064 SNPs read
Preparing file g1000_eur.bed...
Reading SNP synonyms from file g1000_eur.synonyms (auto-detected)
read 6016767 mapped synonyms from file, mapping to 3921040 SNPs in the data
WARNING: detected 133 synonymous SNP pairs in the data
skipped all synonym entries involved, synonymous SNPs are kept in analysis
writing list of detected synonyms in data to supplementary log file
Reading SNP p-values from file MDD2018_ex23andMe_emagma2.txt...
detected 5 variables in file
using variable: SNP (SNP id)
using variable: P (p-value)
using variable: Neff (sample size; discarding SNPs with N < 50)
read 13554490 lines from file