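→ carl090105: Used a tricky approach that builds the result by concatenating strings, for your reference 03/19 11:10
→ celestialgod: You're already using data.table... just make good use of by and .N 03/19 11:11
→ carl090105: The idea is that it also works when the number and names of the columns are unknown... 03/19 11:20
→ celestialgod: See my reply post (shrug) 03/19 11:29
→ allen1985: Thanks to both of the C masters 03/19 11:53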
> -------------------------------------------------------------------------- <
Author: celestialgod (天) Board: R_Language
Title: Re: [Question] Reshaping data
Time: Sun Mar 19 11:05:30 2017
※ Quoting allen1985 (我要低調 拯救形象):
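: [Question type]:
: Performance consultation (I want R to run faster)
: [Software familiarity]:
: User (I've already built quite a few things with R)
: [Problem description]:
: Reshape the data without using a for loop
: [Code example]:
: The data are as follows: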
: data <- matrix(c("S11","R1","O11",
: "S11","R2","O12",
: "O11","R3","O12",
: "S21","R1","O21",
: "S21","R2","O22",
: "O21","R3","O22",
: "S11","R1","O11",
: "S11","R2","O12",
: "O11","R3","O12"), ncol = 3, byrow = T)
: I want to reshape the data into
: r.data <- matrix(c("S11","O11","O12", "2",
: "S21","O21","O22", "1"), ncol = 4, byrow = T)
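: where the fourth column holds how many times that group appears.
: Put simply, the original data come in groups of three rows; I want to pull out
: each unique group and count how many times it appears.
: I got it working with two clumsy for loops, but wanted to ask whether there is a better way.
: Basically, the first for loop reshapes the data into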
: r.data <- matrix(c("S11","O11","O12",
: "S21","O21","O22"), ncol = 3, byrow = T)
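: i.e., it first extracts the unique groups;
: the second for loop then counts how many times each unique group appears, giving the data.frame I want.
: Thanks
: In short, every three rows form one group
Here are four solutions: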
dataMat <- matrix(c("S11","R1","O11",
"S11","R2","O12",
"O11","R3","O12",
"S21","R1","O21",
"S21","R2","O22",
"O21","R3","O22",
"S11","R1","O11",
"S11","R2","O12",
"O11","R3","O12"), ncol = 3, byrow = TRUE)
# aggregate
colSplit <- split(dataMat, rep(1L:ncol(dataMat), each = nrow(dataMat)))
aggregate(rep(1, nrow(dataMat)), colSplit, sum)
# paste0
rowCollapse <- do.call(function(...) paste(..., sep = "_"),
split(dataMat, rep(1L:ncol(dataMat), each = nrow(dataMat))))
countRows <- table(rowCollapse)
# split the keys back into columns; as.vector() keeps Freq a single column
cbind(data.frame(do.call(rbind, strsplit(names(countRows), "_")),
stringsAsFactors = FALSE), Freq = as.vector(countRows))
# data.table
library(data.table)
DT <- data.table(dataMat)
DT[ , .N, by = .(V1, V2, V3)]
## note: with many columns, the following also works
# DT[ , .N, by = eval(paste0("V", 1:ncol(DT)))]
## or pass by a character vector of the column names you want to count by
## e.g.:
# colsCoun <- c("V1", "V2", "V3")
# DT[ , .N, by = colsCoun]
# dplyr
library(dplyr)
DF <- as.data.frame(dataMat, stringsAsFactors = FALSE)
DF %>% group_by(V1, V2, V3) %>% summarise(count = n())
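## note: with many columns, the following also works
## (group_by_() with .dots is deprecated in current dplyr; across(all_of()) replaces it)
# DF %>% group_by(across(all_of(paste0("V", 1:ncol(DF))))) %>%
# summarise(count = n())
## or
# colsCoun <- c("V1", "V2", "V3")
# DF %>% group_by(across(all_of(colsCoun))) %>%
# summarise(count = n())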
Efficiency should rank roughly as follows (fastest first): data.table > dplyr > aggregate > paste0
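The ranking above is an expectation, not a measurement. A minimal base-R sketch for checking part of it: it times the aggregate approach against the paste-plus-table approach on a larger, randomly generated matrix. The input size, the labels, and the helper names (colList, countAggregate, countPaste) are made up for illustration; data.table and dplyr versions could be dropped into the same harness.

```r
set.seed(1)
# Hypothetical test input: 1e5 rows, a handful of labels per column
n <- 1e5
bigMat <- cbind(sample(sprintf("S%02d", 1:20), n, replace = TRUE),
                sample(sprintf("R%d", 1:3), n, replace = TRUE),
                sample(sprintf("O%02d", 1:20), n, replace = TRUE))

# Split a character matrix into a list of its columns
colList <- function(m) split(m, rep(seq_len(ncol(m)), each = nrow(m)))

# aggregate approach: sum a vector of 1s within each row combination
countAggregate <- function(m) aggregate(rep(1, nrow(m)), colList(m), sum)

# paste approach: collapse each row into one key, then tabulate the keys
countPaste <- function(m) table(do.call(paste, c(colList(m), sep = "_")))

tA <- system.time(resA <- countAggregate(bigMat))
tP <- system.time(resP <- countPaste(bigMat))
print(c(aggregate = tA[["elapsed"]], paste = tP[["elapsed"]]))

# Both should find the same number of distinct rows and account for all n rows
stopifnot(nrow(resA) == length(resP), sum(resA$x) == n, sum(resP) == n)
```

Timings depend on the machine and R version, so treat a single run as a rough indication only.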
--
Series of posts on R data-wrangling packages:
magrittr #1LhSWhpH (R_Language) https://goo.gl/72l1m9
data.table #1LhW7Tvj (R_Language) https://goo.gl/PZa6Ue
dplyr(上.下) #1LhpJCfB,#1Lhw8b-s (R_Language) https://goo.gl/I5xX9b
tidyr #1Liqls1R (R_Language) https://goo.gl/i7yzAz
pipeR #1NXESRm5 (R_Language) https://goo.gl/zRUISx
--
※ Posted from: PTT (批踢踢實業坊, ptt.cc), from: 36.235.90.162
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1489892734.A.C86.html
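Push allen1985: Thanks, learned another lesson! 03/19 11:49
You're welcome, feel free to ask more questions XDD
→ allen1985: Although this only solves the second problem, it's written much more elegantly 03/19 11:51
The unique part is already taken care of when the counts are computed~~~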
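In other words, counting per group already yields each distinct row exactly once. A minimal base-R sketch of the same idea on the question's dataMat; the names rowKey, uniqueRows and result are illustrative, and it assumes no cell contains a tab character (used as the key separator).

```r
dataMat <- matrix(c("S11","R1","O11",
                    "S11","R2","O12",
                    "O11","R3","O12",
                    "S21","R1","O21",
                    "S21","R2","O22",
                    "O21","R3","O22",
                    "S11","R1","O11",
                    "S11","R2","O12",
                    "O11","R3","O12"), ncol = 3, byrow = TRUE)

# One key per row; assumes no cell contains a tab character
rowKey <- apply(dataMat, 1, paste, collapse = "\t")
rowCounts <- table(rowKey)

# unique() on a matrix keeps the first occurrence of each distinct row
uniqueRows <- unique(dataMat)
uniqueKey <- apply(uniqueRows, 1, paste, collapse = "\t")

# Look up each distinct row's count by its key
result <- data.frame(uniqueRows, N = as.vector(rowCounts[uniqueKey]),
                     stringsAsFactors = FALSE)
print(result)
```

The six distinct rows come back with counts 2, 2, 2, 1, 1, 1, so no separate "unique" pass is needed.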
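→ allen1985: My problem is that the original data come in units of three rows 03/19 13:05
→ allen1985: I'll think about it myself 03/19 13:05
I didn't read carefully, sorry QQ
This isn't hard to solve either... let me write it up, give me a moment
→ allen1985: Thanks, on behalf of my boss as well... 03/19 13:08
Done, see below: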
# aggregate
colSplit <- split(dataMat, rep(1L:ncol(dataMat), each = nrow(dataMat)))
# block label for every three consecutive rows: 1,1,1,2,2,2,3,3,3
idx <- rep(1:ceiling(nrow(dataMat)/3), each = 3L, length.out = nrow(dataMat))
aggregate(rep(1, nrow(dataMat)), c(colSplit, list(idx = idx)), sum)
# data.table
library(data.table)
DT <- data.table(dataMat)
DT[ , idx := rep(1:ceiling(nrow(DT)/3), each = 3L, length.out = nrow(DT))]
print(DT)
# V1 V2 V3 idx
# 1: S11 R1 O11 1
# 2: S11 R2 O12 1
# 3: O11 R3 O12 1
# 4: S21 R1 O21 2
# 5: S21 R2 O22 2
# 6: O21 R3 O22 2
# 7: S11 R1 O11 3
# 8: S11 R2 O12 3
# 9: O11 R3 O12 3
DT[ , .N, by = .(idx, V1, V2, V3)]
# dplyr
library(dplyr)
DF <- as.data.frame(dataMat, stringsAsFactors = FALSE)
DF %>% mutate(idx = rep(1:ceiling(nrow(DF)/3), each = 3L, length.out = nrow(DF))) %>%
group_by(idx, V1, V2, V3) %>% summarise(count = n())
# idx V1 V2 V3 count
# <int> <chr> <chr> <chr> <int>
# 1 1 O11 R3 O12 1
# 2 1 S11 R1 O11 1
# 3 1 S11 R2 O12 1
# 4 2 O21 R3 O22 1
# 5 2 S21 R1 O21 1
# 6 2 S21 R2 O22 1
# 7 3 O11 R3 O12 1
# 8 3 S11 R1 O11 1
# 9 3 S11 R2 O12 1
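The idx column labels each three-row block, but every row is still counted on its own, so all counts come out as 1. To reach the r.data layout from the question (one row per distinct block plus how often it occurs), whole blocks have to be compared. A minimal base-R sketch, assuming nrow(dataMat) is a multiple of 3 and no cell contains a tab character. The output cells 1, 3 and 6 of each block (V1 and V3 of the block's first row, V3 of its second row) are chosen only because they reproduce the question's example; keepCells is a hypothetical name to adjust for the real data.

```r
dataMat <- matrix(c("S11","R1","O11",
                    "S11","R2","O12",
                    "O11","R3","O12",
                    "S21","R1","O21",
                    "S21","R2","O22",
                    "O21","R3","O22",
                    "S11","R1","O11",
                    "S11","R2","O12",
                    "O11","R3","O12"), ncol = 3, byrow = TRUE)

groupSize <- 3L
stopifnot(nrow(dataMat) %% groupSize == 0)  # assumes complete blocks
nGroups <- nrow(dataMat) %/% groupSize

# t() followed by byrow = TRUE lays each block's three rows side by side
blocks <- matrix(t(dataMat), nrow = nGroups, byrow = TRUE)

# One key per block; assumes no cell contains a tab character
blockKey <- apply(blocks, 1, paste, collapse = "\t")
uKey <- unique(blockKey)
freq <- tabulate(match(blockKey, uKey), nbins = length(uKey))

# Hypothetical choice of output cells: V1, V3 of row 1 and V3 of row 2
keepCells <- c(1L, 3L, 6L)
uBlocks <- blocks[match(uKey, blockKey), , drop = FALSE]
r.data <- cbind(uBlocks[, keepCells, drop = FALSE], as.character(freq))
print(r.data)
```

With the sample data this gives S11/O11/O12 with a count of 2 and S21/O21/O22 with a count of 1, the same rows as the asker's r.data. Because the key uses all nine cells, two blocks only count as the same group when all three of their rows match.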
→ allen1985: Thanks again, let me study this and add it to my code 03/19 13:17
You're welcome. I didn't understand your question at first, sorry Orz
※ Edited: celestialgod (36.235.90.162), 03/19/2017 13:20:23