※ Quoting playaround (打滾):
: [Question type]:
: Reshaping N*1 data into M*16
: [Software proficiency]:
: R beginner
: [Problem description]:
: The raw data (a csv file) looks roughly like this:
: time1
: a = 5
: b = 70
: c = "rest"
: ...
: ...
: time2
: a = 8
: b = 15
: c = "rest_2"
: ...
: ...
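: I'd like to take it in blocks of 16 lines and arrange it into an M*16 matrix.
: The first row should be the column headers,
: taken from the a, b, c, ... labels at the start of each line.
: Something like this:
: time a b c ...
: time1 5 70 "rest"
: time2 8 15 "rest_2"
: The functions I found all seem to group by identical values within the same column,
: so I'm not sure how to do what I need here.
: Posting from my phone, sorry about the formatting.
: Thanks, everyone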
Here's another approach for reference, and I'll also show you how to get automatic type conversion XD
dataStr <- 'time1
a = 5
b = 70
c = "rest"
time2
a = 8
b = 15
c = "rest_2"
time3
a = 1
b = 45
c = "rest_3"'
# Equivalent to the txt variable the two previous replies got by reading the file with readLines()
txt <- strsplit(dataStr, "\n")[[1]]
# Rewrite the time lines into the same "name = value" format as the other lines
txt[grepl("time", txt)] <- paste0("time = ", txt[grepl("time", txt)])
# Split each line into a column name and a value, then cbind all the pieces into a 2 x n matrix (row 1 = names, row 2 = values)
out <- do.call(cbind, strsplit(txt, "\\s+=\\s+"))
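# Get the column names
columnNames <- unique(out[1, ])
# Collect the values belonging to each column into a list
columnList <- lapply(columnNames, function(colname){
  # Take the values with this name and convert their type automatically
  # (as.is = FALSE turns strings into factors, as in the str() output below)
  type.convert(out[2, out[1, ] == colname], as.is = FALSE)
})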
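# Make sure every column has the same length
# (check columnList: sapply() over the matrix out would only see length-1 elements)
if (length(unique(sapply(columnList, length))) != 1)
  stop("Columns have different lengths; please check the data")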
# Assign the names
names(columnList) <- columnNames
# Convert to a data.frame
resultDf <- as.data.frame(columnList)
# time a b c
# 1 time1 5 70 "rest"
# 2 time2 8 15 "rest_2"
# 3 time3 1 45 "rest_3"
> str(resultDf)
'data.frame': 3 obs. of 4 variables:
$ time: Factor w/ 3 levels "time1","time2",..: 1 2 3
$ a : int 5 8 1
$ b : int 70 15 45
$ c : Factor w/ 3 levels "\"rest\"","\"rest_2\"",..: 1 2 3
A rare post that doesn't use a single package XD
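The "automatic type conversion" above is done by base R's type.convert(). Here is a small standalone demo of what it does with different inputs. The example vectors are made up for illustration, and the quote-stripping step at the end is an optional extra, not part of the code above:

```r
# Standalone demo of base R's type.convert() (example vectors are made up).
# as.is = TRUE keeps non-numeric strings as character,
# as.is = FALSE turns them into factors (what str() showed above).
nums  <- type.convert(c("5", "8", "1"), as.is = TRUE)
mixed <- type.convert(c("1.5", "2", "NA"), as.is = TRUE)
strs  <- type.convert(c("time1", "time2"), as.is = TRUE)
facs  <- type.convert(c("time1", "time2"), as.is = FALSE)

str(nums)   # int [1:3] 5 8 1
str(mixed)  # num [1:3] 1.5 2 NA
str(strs)   # chr [1:2] "time1" "time2"
str(facs)   # Factor w/ 2 levels "time1","time2": 1 2

# type.convert() does not strip the double quotes around "rest", which is why
# the c column above has levels like "\"rest\"". Removing them first:
unquoted <- type.convert(gsub('^"|"$', "", c('"rest"', '"rest_2"')), as.is = TRUE)
str(unquoted)  # chr [1:2] "rest" "rest_2"
```

Recent versions of R warn when as.is is not given, so passing it explicitly is the safer habit.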
Package version:
library(data.table)
library(stringr)
library(pipeR)
txt <- strsplit(dataStr, "\n")[[1]]
txt[str_detect(txt, "time")] <- str_c("time = ", txt[str_detect(txt, "time")])
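# cumsum() over the "time" matches numbers the records 1, 2, 3, ...;
# dcast() then uses that id to go from long (id, var, value) to wide
outDf <- txt %>>% str_detect("time") %>>% cumsum %>>%
  cbind(do.call(rbind, str_split(txt, "\\s+=\\s+"))) %>>%
  data.table %>>% setnames(c("id", "var", "value")) %>>%
  dcast(id ~ var, value.var = "value") %>>%
  `[`(j = id := NULL) %>>%
  `[`(j = eval(names(.)) := lapply(.SD, type.convert, as.is = FALSE))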
# a b c time
# 1: 5 70 "rest" time1
# 2: 8 15 "rest_2" time2
# 3: 1 45 "rest_3" time3
> str(outDf)
Classes ‘data.table’ and 'data.frame': 3 obs. of 4 variables:
$ a : int 5 8 1
$ b : int 70 15 45
$ c : Factor w/ 3 levels "\"rest\"","\"rest_2\"",..: 1 2 3
$ time: Factor w/ 3 levels "time1","time2",..: 1 2 3
- attr(*, ".internal.selfref")=<externalptr>
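For comparison, the core idea of the package version can also be done with base R's stats::reshape(). It numbers the records with cumsum() on the "time" lines, then goes from long to wide. Below is a minimal sketch on a shortened copy of the sample data. It assumes each record starts with a line beginning with "time", and the intermediate names (isTime, long, wide) are just illustrative:

```r
# Sketch: record id via cumsum() + long-to-wide with base R's reshape().
# Assumes every record starts with a line beginning with "time".
dataStr <- 'time1
a = 5
b = 70
c = "rest"
time2
a = 8
b = 15
c = "rest_2"'

txt <- strsplit(dataStr, "\n")[[1]]
isTime <- grepl("^time", txt)
txt[isTime] <- paste0("time = ", txt[isTime])

# cumsum() over the logical vector: 1 for the first record, 2 for the second, ...
id <- cumsum(isTime)
parts <- do.call(rbind, strsplit(txt, "\\s+=\\s+"))
long <- data.frame(id = id, var = parts[, 1], value = parts[, 2],
                   stringsAsFactors = FALSE)

# One row per id, one column per var; reshape() names them "value.<var>"
wide <- reshape(long, idvar = "id", timevar = "var", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
wide$id <- NULL
wide[] <- lapply(wide, type.convert, as.is = TRUE)
rownames(wide) <- NULL
wide
```

One difference worth knowing: dcast() sorts the new columns alphabetically (hence a b c time above), while reshape() keeps them in order of first appearance (time a b c).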
--
R data-wrangling package article series:
magrittr #1LhSWhpH (R_Language) https://goo.gl/72l1m9
data.table #1LhW7Tvj (R_Language) https://goo.gl/PZa6Ue
dplyr (parts 1 and 2) #1LhpJCfB,#1Lhw8b-s (R_Language) https://goo.gl/I5xX9b
tidyr #1Liqls1R (R_Language) https://goo.gl/i7yzAz
pipeR #1NXESRm5 (R_Language) https://goo.gl/zRUISx
--
※ Posted on: PTT (批踢踢實業坊, ptt.cc), from: 111.253.88.5
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1503414254.A.448.html
※ Edited by: celestialgod (111.253.88.5), 08/23/2017 01:04:19