→ andrew43: When places under the same id have equal counts, how should top_place be assigned? In what order? 08/13 03:38
> -------------------------------------------------------------------------- <
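Author: celestialgod (天) Board: R_Language
Title: Re: [Question] How to turn one column, sorted in descending order, into another column
Time: Sat Aug 12 18:45:53 2017
※ Quoting henry48124 (= =):
: [Question type]:
:
: Programming help (I want to do something in R but don't know how to write it)
:
: [Familiarity with the software]:
: Beginner (I have programmed in other languages but am unfamiliar with the syntax)
: [Problem description]:
: Hi everyone, I have a dataset that looks like this: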
: head(df)
: id place count
: 1 A 1
: 1 B 1
: 2 B 1
: 2 C 3
: 3 D 2
: 4 A 1
: 4 C 2
: 4 D 5
: 5 B 1
: I would like to turn it into:
: id count top_place1 top_place2
: 1 2 A B
: 2 4 C B
: 3 2 D
: 4 8 D C
: 5 1 B
: [Code example]:
: This is my current approach. It feels awkward, and it would not scale if I later needed the top 100.
: Thanks, everyone. Orz
: library(dplyr)
: answer <- NULL
: for(x in unique(df$id)) {
: df_id <- df %>%
: filter(id == x) %>%
: arrange(-count)
: count <- sum(df_id$count)  # total for this id only, not the whole table
: top_place1 <- NA
: top_place2 <- NA
: col <- c(x, count, top_place1, top_place2)
: for(y in seq_len(nrow(df_id))) {
: if(y <= 2) {
: col[y+2] <- df_id[y,]$place
: }
: }
: answer <- rbind(answer, col)  # append one row per id
: }
: [Environment]:
: [Keywords]:
Method 1 is the brute-force way; feel free to skip straight to method 2.
library(data.table)
library(stringr)
library(pipeR)
DT <- data.table(id = rep(1:5, c(2,2,1,3,1)),
place = c("A","B","B","C","D","A","C","D","B"),
count = c(1,1,1,3:1,2,5,1))
## method 1:
# Sort by id, then count (descending); break ties alphabetically by place
setorder(DT, id, -count, place)
numRank <- 3
DT[ , .(lapply(1:numRank, function(i){
ifelse(length(place) >= i, place[i], "")
}) %>>% transpose %>>% sapply(str_c, collapse = ",")), by = .(id)] %>>%
`[`(j = str_c("top_place", 1:numRank) := transpose(str_split(V1, ",")),
by = .(id)) %>>%
`[`(j = V1 := NULL) %>>%
merge(DT[ , .(count = sum(count)), by = .(id)], by = "id")
# id top_place1 top_place2 top_place3 count
# 1: 1 A B 2
# 2: 2 C B 4
# 3: 3 D 2
# 4: 4 D C A 8
# 5: 5 B 1
## method 2:
setorder(DT, id, -count, -place)
numRank <- 3
# rr = descending rank of count within each id (1 = largest count)
DT[ , rr := length(count) - frank(count, ties.method = "first")+1, by = .(id)]
# Keep the top numRank rows per id, then spread rr into one column per rank
DT[rr %in% 1:numRank] %>>%
dcast(id ~ rr, value.var = "place") %>>%
setnames(as.character(1:numRank), str_c("top_place", 1:numRank)) %>>%
merge(DT[ , .(count = sum(count)), by = .(id)], by = "id")
# id top_place1 top_place2 top_place3 count
# 1: 1 A B NA 2
# 2: 2 C B NA 4
# 3: 3 D NA NA 2
# 4: 4 D C A 8
# 5: 5 B NA NA 1
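For readers without data.table, the same rank-then-widen idea can be sketched in base R. This is only an illustration under stated assumptions, not either poster's method: the helper name top_places and its argument n are made up here, ties are broken alphabetically by place, and id is assumed to be integer-like.

```r
# Same toy data as in the thread, as a plain data.frame
df <- data.frame(id = rep(1:5, c(2, 2, 1, 3, 1)),
                 place = c("A", "B", "B", "C", "D", "A", "C", "D", "B"),
                 count = c(1, 1, 1, 3, 2, 1, 2, 5, 1),
                 stringsAsFactors = FALSE)

# Hypothetical helper: top n places per id plus the per-id total count
top_places <- function(df, n) {
  # Sort by id, then count (largest first), then place to break ties
  df <- df[order(df$id, -df$count, df$place), ]
  # One character vector of places per id, already in rank order
  per_id <- split(df$place, df$id)
  # Indexing past the end pads with NA, so every row has exactly n entries
  mat <- do.call(rbind, lapply(per_id, function(p) p[seq_len(n)]))
  colnames(mat) <- paste0("top_place", seq_len(n))
  rownames(mat) <- NULL
  totals <- tapply(df$count, df$id, sum)
  data.frame(id = as.integer(names(per_id)),  # assumes integer-like ids
             count = as.vector(totals),
             mat, stringsAsFactors = FALSE)
}

res <- top_places(df, 2)
res
```

With the sample data and n = 2, ids 1 to 5 get A/B, C/B, D/NA, D/C and B/NA; raising n to 100 needs no other change.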
--
Article series on R data-wrangling packages:
magrittr #1LhSWhpH (R_Language) https://goo.gl/72l1m9
data.table #1LhW7Tvj (R_Language) https://goo.gl/PZa6Ue
dplyr (parts 1 and 2) #1LhpJCfB,#1Lhw8b-s (R_Language) https://goo.gl/I5xX9b
tidyr #1Liqls1R (R_Language) https://goo.gl/i7yzAz
pipeR #1NXESRm5 (R_Language) https://goo.gl/zRUISx
--
※ Origin: 批踢踢實業坊(ptt.cc), From: 114.38.134.165
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1502534757.A.2B2.html
※ Edited: celestialgod (114.38.134.165), 08/12/2017 19:24:02
Push henry48124: Thanks, C! Sorry, I have one more question about method 2 08/13 18:26
→ henry48124: What does rr mean in the first line, DT[rr %in% 1:numRank]? 08/13 18:27
→ henry48124: When I ran it I got "object 'rr' not found". Am I missing a package? 08/13 18:28
Added the missing line that defines rr.
※ Edited: celestialgod (114.38.134.165), 08/13/2017 19:10:38
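On the rr question: rr does not come from any package. It is a column holding each row's descending rank of count within its id. A rough base R equivalent is sketched below, with ave() and rank() swapped in for data.table's by and frank() purely for illustration. It also touches on the tie question: with ties.method = "first", rows with equal counts get distinct ranks in their current row order.

```r
# Same toy data as in the thread, as a plain data.frame
df <- data.frame(id = rep(1:5, c(2, 2, 1, 3, 1)),
                 place = c("A", "B", "B", "C", "D", "A", "C", "D", "B"),
                 count = c(1, 1, 1, 3, 2, 1, 2, 5, 1),
                 stringsAsFactors = FALSE)

# Rank -count within each id so that the largest count gets rank 1;
# "first" resolves ties by row order (A before B for id 1)
df$rr <- ave(-df$count, df$id,
             FUN = function(x) rank(x, ties.method = "first"))

# Rows with rr <= 2 are the top two places of each id
top2 <- df[df$rr <= 2, ]
top2
```

If a different tie rule is wanted, sorting the rows first (for example by place) changes which tied row comes "first".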
> -------------------------------------------------------------------------- <
Author: ddtwu (<囧>真夭壽) Board: R_Language
Title: Re: [Question] How to turn one column, sorted in descending order, into another column
Time: Sun Aug 13 21:24:06 2017
Here is my approach:
library(dplyr)
library(magrittr)
library(tidyr)
df <- data.frame(id = rep(1:5, c(2,2,1,3,1)),
place = c("A","B","B","C","D","A","C","D","B"),
count = c(1,1,1,3:1,2,5,1), stringsAsFactors = FALSE) %>% as_tibble()
df %>% group_by(id) %>%
# rank rows within each id by count, largest first (ties keep row order)
mutate(seq = row_number(desc(count)),
sumCount = sum(count)) %>%
filter(seq <= 2) %>%
ungroup() %>%
mutate(seqName = sprintf('top_place%s', seq)) %>%
select(-count, -seq) %>%
spread(key = seqName, value = place, fill = NA)
# A tibble: 5 x 4
id sumCount top_place1 top_place2
* <int> <dbl> <chr> <chr>
1 1 2 A B
2 2 4 C B
3 3 2 D <NA>
4 4 8 D C
5 5 1 B <NA>
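A note on building a per-group position column such as seq: it needs each row's rank, and order() returns a permutation rather than ranks. The two happen to agree on this thread's sample data but not in general. A small base R check, using a made-up count vector:

```r
count <- c(1, 3, 2)  # hypothetical counts for one id

# order(): which positions hold the largest, 2nd largest, 3rd largest value
ord <- order(count, decreasing = TRUE)

# rank(): the descending rank of each element, in the original row order
rnk <- rank(-count, ties.method = "first")

ord  # 2 3 1
rnk  # 3 1 2
```

Here the first row (count 1) is the smallest, so its rank is 3, while order() puts 2 in that slot because row 2 holds the largest value.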
--
※ Origin: 批踢踢實業坊(ptt.cc), From: 1.160.116.220
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1502630653.A.3AD.html