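compareGroups is a widely used R package for building clinical baseline tables.
Its developers designed it mainly for descriptive tables: it can report the mean, standard deviation, quantiles, or frequencies of many variables, and compute p-values for between-group comparisons with appropriate statistical tests.
Today we walk through the package, following the material on GitHub and tutorials from various experts online. Links to the references are at the end of the post.
According to the developers' diagram of the package structure, the three steps that matter most to users are: compute, build, and export.
These three steps correspond to three key functions: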
compareGroups()
createTable()
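export2word() (the export function has many variants).
In addition, the developers caution that the package is not meant for data quality control. They recommend that the data you import contain only the variables to be analysed (or that you clean the data in R beforehand), and that you know how each variable should be categorised, because the analysis requires converting variables to factors and naming them (setting the label attribute).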
1. Install and load the package
# Either method works
#install.packages("compareGroups")
#library(devtools); devtools::install_github(repo = "isubirana/compareGroups")
# Load the package
library(compareGroups)
2. Import the data (the example dataset is cardiovascular-related)
data("regicor", package = "compareGroups") # load the example data
str(regicor)
#'data.frame': 2294 obs. of 25 variables:
# $ id : num 2.26e+03 1.88e+03 3.00e+09 3.00e+09 3.00e+09 ...
# ..- attr(*, "label")= Named chr "Individual id"
# .. ..- attr(*, "names")= chr "id"
# $ year : Factor w/ 3 levels "1995","2000",..: 3 3 2 2 2 2 2 1 3 1 ...
# ..- attr(*, "label")= Named chr "Recruitment year"
# .. ..- attr(*, "names")= chr "year"
# $ age : int 70 56 37 69 70 40 66 53 43 70 ...
# ..- attr(*, "label")= Named chr "Age"
# .. ..- attr(*, "names")= chr "age"
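# Only part of the str() output is shown
Each $ introduces one variable, and each variable takes up three lines.
The first line shows the variable's type (num, Factor, int, and so on) and its first values.
The second line is the variable's label, the name that will appear in the final table.
The third line is the names attribute of that label, which here is the variable's name in the dataset.
3. Create the cardiovascular event (CV) and death time-to-event variables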
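The label attribute described above can be set by hand on your own data. The following is a minimal sketch in base R, assuming a made-up data frame (the column names grp and sbp and all values are hypothetical, not from regicor): it converts a coded column to a factor and attaches a label to each column.

```r
# Toy data frame; the column names and values are made up for illustration.
df <- data.frame(
  grp = c(1, 2, 1, 2),
  sbp = c(120, 135, 128, 142)
)

# Turn the coded grouping column into a factor with readable levels.
df$grp <- factor(df$grp, levels = c(1, 2), labels = c("Control", "Treated"))

# Attach the "label" attribute that str() prints on the second line.
attr(df$grp, "label") <- "Treatment group"
attr(df$sbp, "label") <- "Systolic blood pressure"

str(df)

# Collect the labels of all columns into one named character vector.
labels_found <- sapply(df, attr, "label")
print(labels_found)
```

Setting the factor levels and labels before calling compareGroups() keeps the table-building step free of data cleaning, which is what the developers recommend.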
library(survival)
regicor$tcv <- with(regicor, Surv(tocv, as.integer(cv=='Yes')))
attr(regicor$tcv, "label") <- "Cardiovascular"
regicor$tdeath <- with(regicor, Surv(todeath, as.integer(death=='Yes')))
attr(regicor$tdeath, "label") <- "Mortality"
[Screenshots omitted: the data before and after wrapping with Surv()]
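To see what wrapping with Surv() does, here is a small sketch on invented follow-up data (the data frame toy and its columns days and event are hypothetical). It relies only on the survival package, which the code above already loads.

```r
library(survival)

# Hypothetical follow-up data: time in days and a Yes/No event indicator.
toy <- data.frame(
  days  = c(100, 250, 30, 400),
  event = c("No", "Yes", "Yes", "No")
)

# Surv() pairs each time with its status; censored times print with a "+".
toy$tevent <- with(toy, Surv(days, as.integer(event == "Yes")))
attr(toy$tevent, "label") <- "Toy outcome"

print(toy$tevent)

# The status column of the Surv object holds 1 for an event, 0 for censoring.
n_events <- sum(toy$tevent[, "status"])
print(n_events)
```

The single Surv column carries both the time and the event status, which is why it can later be used as one grouping variable on the left of "~".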
4. Overall description
compareGroups( ~ ., data = regicor)
#-------- Summary of results ---------
# var N method selection
#1 Individual id 2294 continuous normal ALL
#2 Recruitment year 2294 categorical ALL
#3 Age 2294 continuous normal ALL
#4 Sex 2294 categorical ALL
#5 Smoking status 2233 categorical ALL
#6 Systolic blood pressure 2280 continuous normal ALL
#7 Diastolic blood pressure 2280 continuous normal ALL
#8 History of hypertension 2286 categorical ALL
#9 Hypertension treatment 2251 categorical ALL
#10 Total cholesterol 2193 continuous normal ALL
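# Partial output
About the formula: the term to the left of "~" is the grouping variable by which the data are split; none is set here. The terms to the right of "~" are the variables to summarise, and "." stands for all variables.
About the output: var is the variable, N is the number of non-missing observations, method is how the variable is treated (its type), and selection is the subset condition applied to that variable (ALL means no restriction).
5. Split into three groups by year and drop id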
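The formula syntax follows ordinary R conventions. As an illustration only, the sketch below uses base R's terms() on a made-up data frame to show how "." and "-" expand; compareGroups parses its formula internally, so treat this as an analogy rather than the package's own code.

```r
# Toy data frame standing in for regicor; the columns are illustrative only.
df <- data.frame(
  id   = 1:4,
  year = factor(c(1995, 2000, 2005, 1995)),
  age  = c(70, 56, 37, 69),
  sex  = factor(c("Male", "Female", "Female", "Male"))
)

# One-sided formula: no grouping variable, "." expands to every column.
all_vars <- attr(terms(~ ., data = df), "term.labels")

# Two-sided formula: year is the grouping variable, "- id" drops id.
by_year <- attr(terms(year ~ . - id, data = df), "term.labels")

print(all_vars)
print(by_year)
```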
compareGroups(year~ . -id, data = regicor)
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 Age 2294 0.078* continuous normal ALL
#2 Sex 2294 0.506 categorical ALL
#3 Smoking status 2233 <0.001** categorical ALL
# Partial output
chisq.test(regicor$sex, regicor$year)
# Pearson's Chi-squared test
#data: regicor$sex and regicor$year
#X-squared = 1.364, df = 2, p-value = 0.5056
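In the same way, several "-" terms remove several variables.
The output now includes p-values, computed with a test chosen according to each variable's type. For sex the chi-squared test is used, which the chisq.test() call above reproduces; the p-value tests whether the sex distribution differs across recruitment years. Interpreting these results takes some statistical background.
6. Analysing part of the data
# subset restricts all the analysed variables to one subgroup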
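To make "a test chosen by variable type" concrete, here is a sketch of the base R tests that usually play these roles, run on the built-in iris and mtcars datasets. The mapping (ANOVA for normal continuous variables, Kruskal-Wallis for non-normal ones, chi-squared or Fisher's exact test for categorical ones) is my reading of the package documentation, not code taken from compareGroups.

```r
# Built-in datasets are used so the example runs without compareGroups.

# Continuous variable treated as normal, three groups: one-way ANOVA.
p_anova <- summary(aov(Sepal.Length ~ Species, data = iris))[[1]][["Pr(>F)"]][1]

# Continuous variable treated as non-normal: Kruskal-Wallis test.
p_kw <- kruskal.test(Sepal.Length ~ Species, data = iris)$p.value

# Categorical variable: chi-squared test on the cross-tabulation.
# Cell counts in mtcars are small, so the approximation warning is silenced.
tab <- table(mtcars$cyl, mtcars$am)
p_chisq <- suppressWarnings(chisq.test(tab)$p.value)

# Fisher's exact test is the usual fallback for small expected counts.
p_fisher <- fisher.test(tab)$p.value

print(round(c(anova = p_anova, kruskal = p_kw,
              chisq = p_chisq, fisher = p_fisher), 4))
```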
compareGroups(year ~ age + smoker + chol, data = regicor,
subset = sex == "Female")
#-------- Summary of results by groups of 'year'---------
# var N p.value method selection
#1 Age 1193 0.351 continuous normal sex == "Female"
#2 Smoking status 1162 <0.001** categorical sex == "Female"
#3 Total cholesterol 1139 0.004** continuous normal sex == "Female"
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
# selec restricts a single variable to the rows that meet a condition
compareGroups(year ~ age + smoker + chol, data = regicor,
selec = list(smoker = chol > 100))
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 Age 2294 0.078* continuous normal ALL
#2 Smoking status 2143 <0.001** categorical chol > 100
#3 Total cholesterol 2193 <0.001** continuous normal ALL
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
# Try to predict the result first, then run it yourself
compareGroups(year ~ age + smoker + chol, data = regicor,
subset = sex == "Female",
selec = list(smoker = chol > 100))
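The difference between subset and selec shows up in the N column. The sketch below mimics the row counting with base R on invented data (the values are hypothetical; only the column names echo regicor): subset filters the rows for every variable, while selec filters them for one variable only.

```r
# Toy data; the column names mirror regicor but the values are invented.
df <- data.frame(
  sex    = c("Female", "Male", "Female", "Female", "Male", "Female"),
  smoker = c("Never", "Current", NA, "Former", "Never", "Current"),
  chol   = c(90, 210, 180, 250, 95, 300),
  age    = c(50, 61, 47, 70, 55, 66)
)

# subset = sex == "Female": every variable is analysed on the female rows only.
females <- df[df$sex == "Female", ]
n_subset <- colSums(!is.na(females[, c("age", "smoker", "chol")]))

# selec = list(smoker = chol > 100): only smoker is restricted, to the rows
# with chol > 100; age and chol still use all rows.
n_selec <- c(
  age    = sum(!is.na(df$age)),
  smoker = sum(!is.na(df$smoker[df$chol > 100])),
  chol   = sum(!is.na(df$chol))
)

print(n_subset)
print(n_selec)
```

When both arguments are given, as in the exercise above, the natural expectation is that the selec condition applies within the subset rows; running the exercise is the way to confirm it.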
7. For continuous variables, method selects the statistical approach
compareGroups(year ~ age + smoker + chol, data=regicor,
method = c(chol=NA),
alpha= 0.01)
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 Age 2294 0.078* continuous normal ALL
#2 Smoking status 2233 <0.001** categorical ALL
#3 Total cholesterol 2193 <0.001** continuous non-normal ALL
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
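method: how each variable is treated. 1: numeric, normally distributed. 2: numeric, not normally distributed. 3: categorical.
NA: shapiro.test() decides whether the variable is normally distributed, and compareGroups() picks the appropriate method automatically.
alpha: the significance threshold for the normality test.
8. Setting the distinct-value threshold and the maximum number of groups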
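The decision made under method = NA can be pictured with a small helper built on shapiro.test(). This is a sketch under assumptions: classify_normality() is a hypothetical function written for this post, and the package's internal logic may differ in detail.

```r
# Hypothetical helper (not part of compareGroups): label a numeric vector as
# "normal" or "non-normal" from the Shapiro-Wilk p-value and a threshold alpha.
classify_normality <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p < alpha) "non-normal" else "normal"
}

# Deterministic inputs: evenly spaced quantiles of a normal and of an
# exponential distribution, so the result does not depend on a random seed.
bell_shaped  <- qnorm(ppoints(200), mean = 220, sd = 45)
right_skewed <- qexp(ppoints(200), rate = 1 / 220)

res_bell   <- classify_normality(bell_shaped, alpha = 0.01)
res_skewed <- classify_normality(right_skewed, alpha = 0.01)

print(res_bell)
print(res_skewed)
```

A smaller alpha makes the test harder to reject, so fewer variables end up labelled non-normal.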
regicor$age7gr <- as.integer(cut(regicor$age, breaks = c(-Inf,
55, 60, 65, 70, 75, 80, Inf), right = TRUE))
compareGroups(year ~ age7gr, data = regicor, method = c(age7gr = NA))
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 age7gr 2294 0.422 continuous non-normal ALL
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
# min.dis: with method = NA, a non-factor variable is treated as continuous unless it has fewer than 5 distinct values; min.dis changes that threshold.
compareGroups(year ~ age7gr, data = regicor, method = c(age7gr = NA),
min.dis = 8)
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 age7gr 2294 0.163 categorical ALL
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
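# max.ylev: the maximum number of levels allowed in the grouping variable (5 by default, per the package documentation); year has only 3 levels, so it makes no difference here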
compareGroups(year ~ sex + age7gr + chol, data = regicor, max.ylev = 7)
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
#1 Sex 2294 0.506 categorical ALL
#2 age7gr 2294 0.552 continuous normal ALL
#3 Total cholesterol 2193 <0.001** continuous normal ALL
#-----
#Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
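The min.dis rule from this section can be written out as a one-line check. The helper looks_categorical() below is hypothetical and only mirrors the rule as described here (it applies when method = NA); the ages are made up.

```r
# Hypothetical helper (not part of compareGroups) mirroring the rule described
# above: a non-factor variable counts as categorical when it has fewer than
# min.dis distinct non-missing values.
looks_categorical <- function(x, min.dis = 5) {
  !is.factor(x) && length(unique(x[!is.na(x)])) < min.dis
}

# Seven age bands coded 1..7, built the same way as age7gr on made-up ages.
ages <- c(50, 58, 63, 68, 73, 78, 85, 52, 61)
age7gr <- as.integer(cut(ages, breaks = c(-Inf, 55, 60, 65, 70, 75, 80, Inf),
                         right = TRUE))

default_rule <- looks_categorical(age7gr)              # 7 distinct values vs 5
raised_rule  <- looks_categorical(age7gr, min.dis = 8) # 7 distinct values vs 8

print(default_rule)
print(raised_rule)
```

This matches the two outputs above: age7gr is reported as continuous with the default threshold and as categorical once min.dis = 8.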
9. When method is set to 2 for a variable, its default format becomes "median [25th percentile; 75th percentile]"
resu1 <- compareGroups(year ~ age + chol, data = regicor,
method = c(chol = 2))
createTable(resu1) # build the table
#--------Summary descriptives table by 'Recruitment year'---------
#_____________________________________________________________________
# 1995 2000 2005 p.overall
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078
#Total cholesterol 225 [196;254] 222 [193;250] 209 [184;238] <0.001
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# The quantiles can be set with Q1 and Q3; Q1 = 0 gives the minimum and Q3 = 1 the maximum
resu2 <- compareGroups(year ~ age + chol, data = regicor,
method = c(chol = 2),
Q1 = 0.025, Q3 = 0.975) #2.5%, 97.5%
createTable(resu2)
#--------Summary descriptives table by 'Recruitment year'---------
#_____________________________________________________________________
# 1995 2000 2005 p.overall
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078
#Total cholesterol 225 [148;311] 222 [150;315] 209 [133;318] <0.001
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
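The "median [Q1;Q3]" cells can be reproduced in spirit with base R's quantile(). The helper median_iqr() below is hypothetical, and compareGroups may round or compute quantiles slightly differently; the point is only how the two probabilities change the bracketed values.

```r
# Hypothetical helper: format a numeric vector as "median [lower;upper]",
# with the lower and upper quantile probabilities given by q1 and q3.
median_iqr <- function(x, q1 = 0.25, q3 = 0.75) {
  q <- quantile(x, probs = c(0.5, q1, q3), na.rm = TRUE, names = FALSE)
  sprintf("%s [%s;%s]", signif(q[1], 3), signif(q[2], 3), signif(q[3], 3))
}

x <- 1:101  # simple data whose quantiles are easy to check by hand

quartiles <- median_iqr(x)                          # 25% and 75%
wide      <- median_iqr(x, q1 = 0.025, q3 = 0.975)  # 2.5% and 97.5%
extremes  <- median_iqr(x, q1 = 0, q3 = 1)          # minimum and maximum

print(c(quartiles, wide, extremes))
```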
10. simplify: dropping levels with no observations
# The developers create a new column
regicor$smk <- regicor$smoker
levels(regicor$smk) <- c("Never smoker", "Current or former < 1y", "Former >= 1y", "Unknown")
attr(regicor$smk, "label") <- "Smoking 4 cat."
cbind(table(regicor$smk))
# Without simplify = FALSE, the call gives a warning:
compareGroups(year ~ age + smk + bmi, data = regicor)
#Warning message:
#In compare.i(X[, i], y = y, selec.i = selec[i], method.i = method[i], :
## Some levels of 'smk' are removed since no observation in that/those levels
# With it, the warning is gone
compareGroups(year ~ age + smk + bmi, data = regicor, simplify = FALSE)
#-------- Summary of results by groups of 'Recruitment year'---------
# var N p.value method selection
# 1 Age 2294 0.078* continuous normal ALL
# 2 Smoking 4 cat. 2233 . categorical ALL
# 3 Body mass index 2259 <0.001** continuous normal ALL
# -----
# Signif. codes: 0 '**' 0.05 '*' 0.1 ' ' 1
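What "levels with no observation" means is easy to see with a plain factor. The sketch below uses invented data and base R's droplevels(), which is roughly the effect of the default simplification; it is an analogy, not the package's own code.

```r
# Toy factor with four declared levels, one of which ("Unknown") never occurs.
smk <- factor(
  c("Never smoker", "Former >= 1y", "Never smoker", "Current or former < 1y"),
  levels = c("Never smoker", "Current or former < 1y", "Former >= 1y", "Unknown")
)

# The empty level still appears in the frequency table with a count of 0.
counts <- table(smk)
print(counts)

# droplevels() removes unused levels, leaving only the observed categories.
smk_simplified <- droplevels(smk)
print(nlevels(smk))
print(nlevels(smk_simplified))
```

Keeping the empty level (simplify = FALSE) preserves the row in the table, but, as the output above shows, no p-value can be computed for that variable.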
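# Objects can be updated
# Not demonstrated in detail; update() adds or removes variables and options
# Example code below; adapt it to your own data
data(predimed) # example dataset shipped with compareGroups
res <- compareGroups(group ~ age + sex + smoke + waist + hormo,
data = predimed)
res
res <- update(res, . ~ . - sex + bmi + toevent, subset = sex ==
"Female", method = c(waist = 2, toevent = 2), selec = list(bmi = !is.na(hormo)))
res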
11. summary() prints more detailed results
res <- compareGroups(year ~ age + sex + smoker + chol,
method = c(chol = 2), data = regicor)
summary(res[c(1, 2, 4)])
# --- Descriptives of each row-variable by groups of 'Recruitment year' ---
#-------------------
#row-variable: Age
# N mean sd lower upper p.overall p.trend p.1995 vs 2000 p.1995 vs 2005 p.2000 vs 2005
#[ALL] 2294 54.73627 11.04926 54.28388 55.18866
#1995 431 54.09745 11.7172 52.98813 55.20677 0.077837 0.031665 0.930249 0.143499 0.161195
#2000 786 54.33715 11.21814 53.55168 55.12262
#2005 1077 55.28319 10.62606 54.64786 55.91853
#-------------------
#row-variable: Sex
# Male Female Male% Female% p.overall p.trend p.1995 vs 2000 p.1995 vs 2005 p.2000 vs 2005
#[ALL] 1101 1193 47.99477 52.00523
#1995 206 225 47.79582 52.20418 0.505601 0.543829 0.793746 0.793746 0.791583
#2000 390 396 49.61832 50.38168
#2005 505 572 46.88951 53.11049
#-------------------
#row-variable: Total cholesterol
# N med Q1 Q3 lower upper p.overall p.trend p.1995 vs 2000 p.1995 vs 2005 p.2000 vs 2005
#[ALL] 2193 215 189 245 213 218
#1995 403 225 196 254 220 230 0 0 0.330934 0 0
#2000 715 222 193 250 217 227
#2005 1075 209 184 238 206 211
12. Plotting with plot()
plot(res[c(1)], file = "~/Desktop/", type = "png") # Age
plot(res[c(2)], file = "~/Desktop/", type = "png") # Sex
plot(res[c(3,4)], file = "~/Desktop/", type = "png") # smoker + chol
13. Extracting information such as p-values, means, odds ratios, and hazard ratios
# The developers provide example data (SNPs) and the code
library(SNPassoc)
data(SNPs)
tab <- createTable(compareGroups(casco ~ snp10001 + snp10002 + snp10005 + snp10008 + snp10009, SNPs))
pvals <- getResults(tab, "p.overall")
p.adjust(pvals, method = "BH")
# snp10001 snp10002 snp10005 snp10008 snp10009
#0.7051300 0.7072158 0.7583432 0.7583432 0.7072158
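Several adjusted p-values in the output above are identical, which is typical of the Benjamini-Hochberg method. The sketch below, on made-up p-values (the names snpA to snpE are hypothetical), computes the adjustment by hand and compares it with p.adjust().

```r
# Made-up raw p-values for five tests.
pvals <- c(snpA = 0.01, snpB = 0.02, snpC = 0.03, snpD = 0.04, snpE = 0.05)
n <- length(pvals)

# Benjamini-Hochberg by hand: sort descending, scale the i-th smallest p-value
# by n / i, then take the running minimum so adjusted values stay monotone.
ord <- order(pvals, decreasing = TRUE)
ranks <- n:1
manual <- pmin(1, cummin(pvals[ord] * n / ranks))[order(ord)]

# The built-in implementation for comparison.
builtin <- p.adjust(pvals, method = "BH")

print(manual)
print(builtin)
```

The running minimum is what ties the values together: a larger raw p-value can never receive a smaller adjusted value than a smaller one.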
# OR and HR: set the show.ratio argument to TRUE
res1 <- compareGroups(tdeath ~ age + sex + bmi + smoker,
data = regicor,
# ref = c(smoker = 1, sex = 2) sets the reference level of each variable by number
ref = 1)
createTable(res1, show.ratio = TRUE)
#--------Summary descriptives table by 'Mortality'---------
#______________________________________________________________________________________
# No event Event HR p.ratio p.overall
# N=1975 N=173
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.3 (11.0) 60.5 (10.3) 1.05 [1.04;1.07] <0.001 <0.001
#Sex: 0.349
# Male 943 (47.7%) 87 (50.3%) Ref. Ref.
# Female 1032 (52.3%) 86 (49.7%) 0.87 [0.64;1.17] 0.349
#Body mass index 27.4 (4.45) 30.2 (5.01) 1.11 [1.08;1.14] <0.001 <0.001
#Smoking status: <0.001
# Never smoker 1052 (54.3%) 86 (49.7%) Ref. Ref.
# Current or former < 1y 488 (25.2%) 69 (39.9%) 1.74 [1.27;2.39] 0.001
# Former >= 1y 397 (20.5%) 18 (10.4%) 0.56 [0.34;0.94] 0.027
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# It can also draw (not very pretty) survival curves
plot(compareGroups(tdeath ~ sex, data = regicor), bivar = TRUE,
file = "~/Desktop/", type = "png")
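For a survival response, an HR with its confidence interval is what a univariable Cox model gives. As a rough sketch of that idea (my understanding of where the HR column comes from, not the package's code), the example below fits coxph() to the lung dataset that ships with the survival package.

```r
library(survival)

# lung ships with the survival package; status is coded 1 = censored, 2 = dead.
fit <- coxph(Surv(time, status) ~ sex, data = lung)

# exp(coef) is the hazard ratio; conf.int also holds its 95% interval.
ci <- summary(fit)$conf.int
hr    <- ci[1, "exp(coef)"]
lower <- ci[1, "lower .95"]
upper <- ci[1, "upper .95"]

# Format in the same "HR [lower;upper]" style as the table above.
hr_cell <- sprintf("%.2f [%.2f;%.2f]", hr, lower, upper)
print(hr_cell)
```

An HR below 1 with an interval that excludes 1 reads the same way as the "Former >= 1y" row above: a lower hazard than the reference level.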
14. Building the descriptive table with the key function createTable
res <- compareGroups(year ~ age + sex + smoker + chol,
data = regicor,
selec = list(chol = sex == "Female"))
restab <- createTable(res)
# print() shows the descriptive table
print(restab, which.table = "descr")
#--------Summary descriptives table by 'Recruitment year'---------
#________________________________________________________________________
# 1995 2000 2005 p.overall
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078
#Sex: 0.506
# Male 206 (47.8%) 390 (49.6%) 505 (46.9%)
# Female 225 (52.2%) 396 (50.4%) 572 (53.1%)
#Smoking status: <0.001
# Never smoker 234 (56.4%) 414 (54.6%) 553 (52.2%)
# Current or former < 1y 109 (26.3%) 267 (35.2%) 217 (20.5%)
# Former >= 1y 72 (17.3%) 77 (10.2%) 290 (27.4%)
#Total cholesterol 226 (42.4) 224 (44.9) 216 (50.3) 0.004
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# Table of available data
print(restab, which.table = "avail")
#---Available data----
#________________________________________________________________________
# [ALL] 1995 2000 2005 method select
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 2294 431 786 1077 continuous-normal ALL
#Sex 2294 431 786 1077 categorical ALL
#Smoking status 2233 415 758 1060 categorical ALL
#Total cholesterol 1139 207 362 570 continuous-normal sex == "Female"
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
15. Key arguments of createTable
# Continuing with the data above
res <- compareGroups(year ~ age + sex + smoker + chol +death,
data = regicor,
selec = list(chol = sex == "Female"))
res
# hide.no = "no": for variables that have a "no" category, that category is hidden
createTable(res, hide.no = "no") # for example, death above has a "No" level
#--------Summary descriptives table by 'Recruitment year'---------
#________________________________________________________________________
# 1995 2000 2005 p.overall
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078
#Sex: 0.506
# Male 206 (47.8%) 390 (49.6%) 505 (46.9%)
# Female 225 (52.2%) 396 (50.4%) 572 (53.1%)
#Smoking status: <0.001
# Never smoker 234 (56.4%) 414 (54.6%) 553 (52.2%)
# Current or former < 1y 109 (26.3%) 267 (35.2%) 217 (20.5%)
# Former >= 1y 72 (17.3%) 77 (10.2%) 290 (27.4%)
#Total cholesterol 226 (42.4) 224 (44.9) 216 (50.3) 0.004
#Overall death 18 (4.65%) 81 (11.0%) 74 (7.23%) <0.001
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# digits controls the number of decimal places in the table
createTable(res, digits = c(age = 2, sex = 3))
#Age 54.10 (11.72) 54.34 (11.22) 55.28 (10.63) 0.078
#Sex: 0.506
# Male 206 (47.796%) 390 (49.618%) 505 (46.890%)
# Female 225 (52.204%) 396 (50.382%) 572 (53.110%)
# show.n = TRUE shows the number of available observations for each variable
createTable(res, show.n = TRUE)
#--------Summary descriptives table by 'Recruitment year'---------
#_____________________________________________________________________________
# 1995 2000 2005 p.overall N
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078 2294
#Sex: 0.506 2294
# Male 206 (47.8%) 390 (49.6%) 505 (46.9%)
# Female 225 (52.2%) 396 (50.4%) 572 (53.1%)
#Smoking status: <0.001 2233
# Never smoker 234 (56.4%) 414 (54.6%) 553 (52.2%)
# Current or former < 1y 109 (26.3%) 267 (35.2%) 217 (20.5%)
# Former >= 1y 72 (17.3%) 77 (10.2%) 290 (27.4%)
#Total cholesterol 226 (42.4) 224 (44.9) 216 (50.3) 0.004 1139
#Overall death: <0.001 2148
# No 369 (95.3%) 657 (89.0%) 949 (92.8%)
# Yes 18 (4.65%) 81 (11.0%) 74 (7.23%)
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# show.descr = FALSE hides the descriptive statistics and shows only the p-values
createTable(res, show.descr = FALSE)
#--------Summary descriptives table by 'Recruitment year'---------
#____________________________________
# p.overall
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 0.078
#Sex:
# Male 0.506
# Female
#Smoking status:
# Never smoker <0.001
# Current or former < 1y
# Former >= 1y
#Total cholesterol 0.004
#Overall death:
# No <0.001
# Yes
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
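# If the grouping variable has more than 2 categories, show.p.trend = TRUE shows the p-value for trend: Pearson for normally distributed variables, Spearman otherwise
createTable(res, show.p.trend = TRUE)
# show.p.mul: with more than two groups, pairwise comparisons are available: Tukey for normally distributed variables, Benjamini & Hochberg correction otherwise
createTable(res, show.p.mul = TRUE)
# If the grouping variable is binary or survival data, show.ratio = TRUE shows the OR or HR (mentioned above)
createTable(update(res, subset = year != 1995), show.ratio = TRUE)
# digits.ratio controls the decimal places of the OR and HR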
createTable(compareGroups(tdeath ~ year + age + sex, data = regicor),
show.ratio = TRUE,
digits.ratio = 3)
#--------Summary descriptives table by 'Mortality'---------
#________________________________________________________________________________
# No event Event HR p.ratio p.overall
# N=1975 N=173
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Recruitment year: <0.001
# 1995 369 (18.7%) 18 (10.4%) Ref. Ref.
# 2000 657 (33.3%) 81 (46.8%) 2.416 [1.450;4.027] 0.001
# 2005 949 (48.1%) 74 (42.8%) 1.509 [0.901;2.526] 0.118
#Age 54.3 (11.0) 60.5 (10.3) 1.052 [1.037;1.067] <0.001 <0.001
#Sex: 0.349
# Male 943 (47.7%) 87 (50.3%) Ref. Ref.
# Female 1032 (52.3%) 86 (49.7%) 0.867 [0.644;1.169] 0.349
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
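16. Adjusting the table before export: renaming columns, combining by rows, combining by columns, quick stratification with strataTable, quick tables with descrTable
# When printing or exporting, header.labels renames the columns: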
final <- createTable(compareGroups(tdeath ~ year + age + sex, data = regicor),
show.all = TRUE)
print(final, header.labels = c(p.overall = "p-value", all = "ALL"))
#--------Summary descriptives table by 'Mortality'---------
#_______________________________________________________________
# ALL No event Event p-value
# N=2148 N=1975 N=173
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Recruitment year: <0.001
# 1995 387 (18.0%) 369 (18.7%) 18 (10.4%)
# 2000 738 (34.4%) 657 (33.3%) 81 (46.8%)
# 2005 1023 (47.6%) 949 (48.1%) 74 (42.8%)
#Age 54.8 (11.0) 54.3 (11.0) 60.5 (10.3) <0.001
#Sex: 0.349
# Male 1030 (48.0%) 943 (47.7%) 87 (50.3%)
# Female 1118 (52.0%) 1032 (52.3%) 86 (49.7%)
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# Tables can be stacked by rows with rbind
restab1 <- createTable(compareGroups(year ~ age + sex, data = regicor))
restab2 <- createTable(compareGroups(year ~ chol + smoker, data = regicor))
rbind(`Non-modifiable risk factors` = restab1, `Modifiable risk factors` = restab2)
#--------Summary descriptives table by 'Recruitment year'---------
#____________________________________________________________________________
# 1995 2000 2005 p.overall
# N=431 N=786 N=1077
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Non-modifiable risk factors:
# Age 54.1 (11.7) 54.3 (11.2) 55.3 (10.6) 0.078
# Sex: 0.506
# Male 206 (47.8%) 390 (49.6%) 505 (46.9%)
# Female 225 (52.2%) 396 (50.4%) 572 (53.1%)
#Modifiable risk factors:
# Total cholesterol 225 (43.1) 224 (44.4) 213 (45.9) <0.001
# Smoking status: <0.001
# Never smoker 234 (56.4%) 414 (54.6%) 553 (52.2%)
# Current or former < 1y 109 (26.3%) 267 (35.2%) 217 (20.5%)
# Former >= 1y 72 (17.3%) 77 (10.2%) 290 (27.4%)
x <- rbind(`Non-modifiable` = restab1, Modifiable = restab2)
rbind(`Non-modifiable` = restab1, Modifiable = restab2)[c(1,4)] # select the variables you want
# Combine by columns with cbind
res <- compareGroups(sex ~ age + chol, data = regicor)
alltab <- createTable(res, show.p.overall = FALSE)
femaletab <- createTable(update(res, subset = sex == "Female"),
show.p.overall = FALSE)
maletab <- createTable(update(res, subset = sex == "Male"), show.p.overall = FALSE)
cbind(ALL = alltab, FEMALE = femaletab, MALE = maletab)
#--------Summary descriptives table ---------
#___________________________________________________________________
# ALL FEMALE MALE
# _______________________ ___________ ___________
# Male Female Female Male
# N=1101 N=1193 N=1193 N=1101
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.8 (11.1) 54.7 (11.0) 54.7 (11.0) 54.8 (11.1)
#Total cholesterol 217 (42.7) 220 (47.4) 220 (47.4) 217 (42.7)
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# strataTable: quick stratification
res <- compareGroups(sex ~ age + chol, data = regicor)
restab <- createTable(res, hide.no = "no")
strataTable(restab, "cv")
#--------Summary descriptives table ---------
#______________________________________________________________________________________
# No Yes
# _________________________________ _________________________________
# Male Female p.overall Male Female p.overall
# N=996 N=1075 N=46 N=46
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
#Age 54.7 (11.1) 54.6 (11.0) 0.785 58.2 (11.5) 56.7 (10.6) 0.511
#Total cholesterol 216 (42.4) 219 (46.5) 0.124 223 (44.4) 225 (56.2) 0.827
#¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
# descrTable combines the compareGroups and createTable steps
# Try it yourself; I suggest using it only after practising the steps above
descrTable(sex ~ age + chol, data = regicor)
17. Exporting the table
export2csv(restab, file='table1.csv') # export as CSV
export2html(restab, file='table1.html') # export as HTML
export2latex(restab, file='table1.tex') # export as LaTeX
export2pdf(restab, file='table1.pdf') # export as PDF
export2md(restab) # render as Markdown inside an R Markdown chunk (it has no file argument, as far as I can tell)
export2word(restab, file='table1.docx') # export as Word
export2xls(restab, file='table1.xlsx') # export as Excel
# strip shades the rows by variable with a background colour
export2md(restab, strip = TRUE, first.strip = TRUE)
# size changes the font size
export2md(restab, size = 6)
# width changes the width of the variable-name column
export2md(restab, width = "400px")
References:
1. https://htmlpreview.github.io/?https://github.com/isubirana/compareGroups/blob/master/compareGroups_vignette.html (package developers)
2. https://www.jstatsoft.org/article/view/v057i12 (package developers)
3. https://ayueme.github.io/R_medical_stat/comparegroups.html (Ayue's tutorial)