人类疾病相关数据库比对：scDRS软件使用教程

阅读数 2029

0. scDRS有一个局限性就是，基因和疾病之间的相关性是基于基因组变异来统计的，统计数据也是遗传相关的疾病，数据库相对局限；即使使用自己构建参考数据的方式也不能避免这个问题。单细胞测序多为表达数据，因此个人感觉还是基于表达量差异得到的疾病-基因相关性更有意义。

1. Seurat对象到Anndata对象的转换。

这里使用的是之前经过初步分析的Seurat对象，包括了scale.data的结果。如果直接用SeuratDisk转换的话，Anndata中的X读入的是scale.data数据，不适用于scDRS的分析。

转换流程：

library(Seurat)

library(SeuratDisk)

tumor<-readRDS("tumor.rds")

rna<-tumor@assays$RNA@counts

tumor_1<-CreateSeuratObject(counts=rna)

SaveH5Seurat(tumor_1,filename="tumor_h5Seurat",overwrite = T)

Convert("tumor_h5Seurat.h5seurat",dest="h5ad",assay="RNA",overwrite = T)

这一步得到的tumor_h5Seurat.h5ad文件可以用于后续分析。

2. python平台上scDRS分析

首先需要安装各种单细胞分析的程序包，包括pandas，numpy，seaborn，scipy，anndata，scanpy以及scdrs。使用pip安装或者参考各个软件的安装教程即可。

1) 数据读入，这一步需要在python上运行：

adata_own=sc.read_h5ad('tumor_h5Seurat.h5ad')

df_gs=pd.read_csv('magma_10kb_top1000_zscore.74_traits.rv1.gs',sep='\t',index_col=0) #gs文件下载于scDRs的官方数据库。示例：

2) 计算单个细胞的scDRS-score，这一步需要在终端运行。

scdrs compute-score \

--h5ad-file tumor_h5Seurat.h5ad \

--h5ad-species mouse \

--gs-file magma_10kb_top1000_zscore.74_traits.rv1.gs \

--gs-species human \

--cov-file cov.tsv \

--flag-filter-data True \

--flag-raw-count False \

--flag-return-ctrl-raw-score False \

--flag-return-ctrl-norm-score True \

--out-folder results/

可以得到根据每个疾病相关基因计算出的单个细胞表达分数。其中，traits数据来源于文章中的数据存储网站，共有74个traits，主要基于UK Biobank构建。

3) 最终生成的文件中，.score.gz文件可以用于后续的分析。可以通过barcode信息把score数据添加到Seurat对象的metadata中。示例代码：

library(data.table)

library(hash)

brcancer<-fread("/storage/work/wangrj/second_YWS_pan02/scDRS_result/own_data/results_all/UKB_460K.cancer_BREAST.score.gz",header=T,sep="\t")

scorehash<-hash()

for(i in 1:length(brcancer$V1)){

scorehash[brcancer$V1[i]]=brcancer$norm_score[i]

}

scDRS_BC_norm_score<-c()

barall<-colnames(mdata)

bar_br<-brcancer$V1

for(i in 1:length(colnames(mdata))){

bar<-barall[i]

if(bar %in% bar_br){

temp<-scorehash[[bar]]

}

else{

temp<-"NA"

}

scDRS_BC_norm_score<-c(scDRS_BC_norm_score,temp)

}

mdata$scDRS_BC_norm_score<-scDRS_BC_norm_score

msub<-subset(mdata,scDRS_BC_norm_score!="NA")

library(ggraph)

library(ggplot2)

msub$scDRS_BC_norm_score<-as.numeric(msub$scDRS_BC_norm_score)

ggplot(data.frame(msub@meta.data, msub@reductions$umap@cell.embeddings), aes(UMAP_1, UMAP_2, color=scDRS_BC_norm_score)) +

geom_point(size=1.5) + scale_color_viridis(option="H")+

theme_light(base_size = 15)+labs(title = "Breast cancer score")+

theme(panel.border = element_rect(fill=NA,color="black", size=1, linetype="solid"))+theme(plot.title = element_text(hjust = 0.5))

ggsave("scDRS_result/umap_of_breast_cancer_score.pdf",height=6,width=8)

武汉格发信息技术有限公司，格发许可优化管理系统可以帮你评估贵公司软件许可的真实需求，再低成本合规性管理软件许可,帮助贵司提高软件投资回报率，为软件采购、使用提供科学决策依据。支持的软件有: CAD,CAE,PDM,PLM,Catia,Ugnx, AutoCAD, Pro/E, Solidworks 等。

返回上级列表