[Question type]:
Performance (I want R to run faster)
I think I once saw a simpler idiom or function for this somewhere, but I can't recall it and couldn't find it, so I wrote some fairly complicated code. Is there a faster or simpler way to do this?

[Software familiarity]:
Beginner (I have written programs in other languages and am just unfamiliar with the syntax)

[Problem description]:
Merge several data tables by the same key. Where the tables share variables, those columns should be summed during the merge, ignoring NA. The data tables differ from each other in both row and column counts.

[Code example]:
library(data.table)
library(dplyr)

# testing data, assuming merge by key = "SP"
set.seed(NULL)
x <- matrix(sample(1e6), 1e5) %>% data.table() %>%
  setnames(1:10, sample(LETTERS, 10)) %>% .[, SP := seq_len(nrow(.))]
y <- matrix(sample(1e5), 1e4) %>% data.table() %>%
  setnames(1:10, sample(LETTERS, 10)) %>% .[, SP := seq_len(nrow(.))]
z <- matrix(sample(4e5), 2e4) %>% data.table() %>%
  setnames(1:20, sample(LETTERS, 20)) %>% .[, SP := seq_len(nrow(.))]

# function.. try to write Rcpp function..
library(Rcpp)
cppFunction('
NumericVector addv(NumericVector x, NumericVector y) {
  // Element-wise sum of two equal-length vectors, ignoring NA:
  // if one side is NA, the other side is returned unchanged.
  NumericVector out(x.size());
  NumericVector::iterator x_it, y_it, out_it;
  for (x_it = x.begin(), y_it = y.begin(), out_it = out.begin();
       x_it != x.end(); ++x_it, ++y_it, ++out_it) {
    if (ISNA(*x_it)) {
      *out_it = *y_it;
    } else if (ISNA(*y_it)) {
      *out_it = *x_it;
    } else {
      *out_it = *x_it + *y_it;
    }
  }
  return out;
}')

### merge two data.table with different columns/rows,
### and summing identical column names
outer_join2 <- function(df1, df2, byNames) {
  # non-key columns that appear in both tables
  tt <- intersect(colnames(df1)[-match(byNames, colnames(df1))],
                  colnames(df2)[-match(byNames, colnames(df2))])
  # full outer join: all of df2 plus the columns unique to df1
  df <- merge(df2, df1[, -tt, with = FALSE], by = byNames, all = TRUE)
  # the shared columns of df1, joined on the same keys so rows line up with df
  dt <- merge(df2[, -tt, with = FALSE], df1[, c(byNames, tt), with = FALSE],
              by = byNames, all = TRUE) %>% .[, tt, with = FALSE]
  # add the df1 values onto the shared columns of df in place
  for (j in colnames(dt)) set(df, j = j, value = addv(df[[j]], dt[[j]]))
  return(df)
}

# get results, based on user c's post #1LaHm_aH (R_Language)
system.time(Reduce(function(x, y) outer_join2(x, y, byNames = "SP"), list(x, y, z)))

This takes quite a few lines of code. The speed seems acceptable, but I'm not sure whether there is a better way to write it. Thanks!

--
※ Origin: 批踢踢實業坊 (ptt.cc), from: 140.112.65.48
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1444640089.A.EE0.html
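For the merge-and-sum task described in the post, one possibly simpler formulation is to stack the tables with `rbindlist(fill = TRUE)` and then sum every non-key column per key. The sketch below is only an illustration under assumptions: `sum_merge` and `na_sum` are hypothetical helper names that do not appear in the original post, the key is assumed to be unique within each input table, and the approach has not been benchmarked against `outer_join2`. The small `na_sum` wrapper is there because `sum(..., na.rm = TRUE)` would return 0 where every input is NA, whereas the outer-join version leaves NA.

```r
library(data.table)

# Hypothetical helper (not from the original post): stack all tables,
# filling columns a table lacks with NA, then sum each column per key.
sum_merge <- function(tables, by_names = "SP") {
  stacked <- rbindlist(tables, use.names = TRUE, fill = TRUE)
  # Sum ignoring NA, but keep NA when a key has no value at all in a column.
  # as.numeric() keeps the result type consistent (double) across groups.
  na_sum <- function(v) {
    if (all(is.na(v))) NA_real_ else sum(as.numeric(v), na.rm = TRUE)
  }
  stacked[, lapply(.SD, na_sum), keyby = by_names]
}

# Tiny example: tables with different rows and columns, sharing column B.
a <- data.table(SP = 1:3, A = c(1, 2, 3), B = c(10, 20, 30))
b <- data.table(SP = 2:4, B = c(1, 1, 1), C = c(5, 6, 7))
res <- sum_merge(list(a, b), by_names = "SP")
print(res)  # for SP = 2, B is 20 + 1 = 21; for SP = 4, A stays NA
```

For the data in the post the call would be `sum_merge(list(x, y, z), by_names = "SP")`. If a 0 in place of NA is acceptable, calling `sum` with `na.rm = TRUE` directly inside `lapply(.SD, ...)` may be worth trying, since data.table can optimise plain `sum` calls in grouped queries.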