My goal is to use grplasso to select the important variables, i.e. to fit a group lasso.
The response is continuous (house price), and the regressors are categorical, such as building type (e.g. apartment building, high-rise, townhouse).
There are many such variables, which is why I want to use group lasso. The data look roughly like this:

price   building type        floor area (ping)
100     apartment building   20

However, I don't understand how the index argument of grplasso works.
Could someone briefly demonstrate how the code should be written?
The example code is attached below.

## Use the Logistic Group Lasso on the splice data set
data(splice)

## Define a list with the contrasts of the factors
contr <- rep(list("contr.sum"), ncol(splice) - 1)
names(contr) <- names(splice)[-1]

## Fit a logistic model
fit.splice <- grplasso(y ~ ., data = splice, model = LogReg(), lambda = 20,
                       contrasts = contr, standardize = TRUE)

## Perform the Logistic Group Lasso on a random dataset
set.seed(79)
n <- 50 ## observations
p <- 4 ## variables

## First variable (intercept) not penalized, two groups having 2 degrees
## of freedom each
index <- c(NA, 2, 2, 3, 3)

This is the line I don't understand. It looks as if X1, X2, X3, X4 have 2, 2, 3, 3 levels respectively, but X1, X2, X3 are clearly all continuous.

## Create a random design matrix, including the intercept (first column)
x <- cbind(1, matrix(rnorm(p * n), nrow = n))
colnames(x) <- c("Intercept", paste("X", 1:4, sep = ""))

par <- c(0, 2.1, -1.8, 0, 0)
prob <- 1 / (1 + exp(-x %*% par))
mean(pmin(prob, 1 - prob)) ## Bayes risk
y <- rbinom(n, size = 1, prob = prob) ## binary response vector

## Use a multiplicative grid for the penalty parameter lambda, starting
## at the maximal lambda value
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LogReg()) * 0.9^(0:30)

## Fit the solution path on the lambda grid
fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LogReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))

## Plot coefficient paths
plot(fit)

--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 140.114.237.189
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1528395196.A.196.html
VIATOR: Columns 2 and 3 of x (X1, X2) form group 2; columns 4 and 5 (X3, X4) form group 3. 09/13 23:58
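For the house-price case in the original question, one possible way to get an index vector for categorical regressors is to expand the factors with model.matrix() and reuse its "assign" attribute, which records which term each dummy column came from. The sketch below is only an illustration under assumptions: the data are simulated, the column names price, type and area are hypothetical, and LinReg() is used because the response is continuous. The fit itself runs only if grplasso is installed.

```r
# Toy data standing in for the real data set (hypothetical names and values).
set.seed(1)
n <- 60
dat <- data.frame(
  type = factor(rep(c("apartment", "highrise", "townhouse"), length.out = n)),
  area = runif(n, 15, 60)
)
dat$price <- 50 + 2 * dat$area + 20 * (dat$type == "townhouse") + rnorm(n, sd = 5)

# Expand the factor into dummy columns; the first column is the intercept.
x <- model.matrix(price ~ type + area, data = dat)

# "assign" maps each column to its term: 0 = intercept, 1 = type, 2 = area.
index <- attr(x, "assign")
index[index == 0] <- NA  # NA marks an unpenalized column, as in the help-page example

print(colnames(x))  # "(Intercept)" "typehighrise" "typetownhouse" "area"
print(index)        # NA 1 1 2

# Fit a short lambda path, only when the grplasso package is available.
if (requireNamespace("grplasso", quietly = TRUE)) {
  library(grplasso)
  lambda <- lambdamax(x, y = dat$price, index = index, penscale = sqrt,
                      model = LinReg()) * 0.5^(0:5)
  fit <- grplasso(x, y = dat$price, index = index, lambda = lambda,
                  model = LinReg(), penscale = sqrt,
                  control = grpl.control(trace = 0))
  print(round(fit$coefficients, 3))
}
```

Here the two dummy columns of type share the group number 1, so they enter or leave the model together, while area is a group of its own. The numbers in index are group labels, not level counts. The splice example quoted in the post uses the formula interface instead, where no index is passed by hand.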