[Question] Extracting 2- to 3-digit numbers with regex

Author: luenchang (luen)

Board: R_Language

Title: [Question] Extracting 2- to 3-digit numbers with regex

Time: Mon Dec 21 12:17:10 2020

Hello everyone, I want to extract the numeric part of some strings. The strings follow a regular structure: they start with a 2- to 3-digit number, then a space, then a variable number of letters, or letters and digits. What I want is the leading number. I tried two methods: Method 1 returns only the last digit of the number, while Method 2 returns the complete number. I don't know what is wrong with the regex in Method 1. Here are my strings and code:

# Strings to extract
strings <- c("130 UDINE", "162 BF02", "163 AS04", "164 AL08", "165 BR12",
             "166 S A13", "167 MA14", "167 MA14", "168 OC15", "85 BERGAMO")

# Method 1 to extract the beginning part of the strings (not working)
gsub(pattern = "^(\\d){2,3}(\\s).*", replacement = "\\1", x = strings)
# [1] "0" "2" "3" "4" "5" "6" "7" "7" "8" "5"

# Method 2 to extract the beginning part of the strings (working)
gsub(pattern = "^(\\d+)(\\s).*", replacement = "\\1", x = strings)
# [1] "130" "162" "163" "164" "165" "166" "167" "167" "168" "85"

Thanks

--
※ Origin: PTT (批踢踢實業坊, ptt.cc), from: 110.174.219.126 (Australia)
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1608524232.A.14D.html

Push celestialgod: library(stringr); str_extract_all(strings, "\\d{2,3}") 12/21 13:11

→ andrew43: tstrsplit is quite intuitive for this 12/21 13:23

→ andrew43: data.table::tstrsplit(strings, " ", keep = 1)[[1]] 12/21 13:24
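
For readers without data.table, the same split-and-keep-first idea can be sketched in base R. This is only an illustration of the approach andrew43 describes, not a replacement for `tstrsplit`; the shortened `strings` vector and the names `pieces` and `first_field` are made up for the example.

```r
strings <- c("130 UDINE", "166 S A13", "85 BERGAMO")

# Split each string on a literal space; strsplit returns a list of pieces.
pieces <- strsplit(strings, " ", fixed = TRUE)

# Keep only the first piece of every string.
first_field <- vapply(pieces, function(p) p[1], character(1))

print(first_field)  # "130" "166" "85"
```

Splitting works here because the number is always followed by a space. Unlike the regex approaches, it does not check that the first field is actually 2 to 3 digits.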

→ resentis: The quantifier for \\d probably has to come right after \\d 12/21 20:30

→ resentis: "^(\\d{2,3})(\\s).*" 12/21 20:30
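
The difference resentis points out can be checked side by side. In `(\\d){2,3}` the quantifier repeats the capture group itself, so `\\1` ends up holding only the last repetition (one digit); in `(\\d{2,3})` the quantifier sits inside the group, so the group spans the whole number. A minimal sketch, using a shortened copy of the original vector and `sub` instead of `gsub` (equivalent here, since the anchored pattern matches the whole string once):

```r
strings <- c("130 UDINE", "162 BF02", "85 BERGAMO")

# Quantifier outside the group: the group matches one digit per repetition,
# and the backreference keeps only the final one.
last_digit <- sub("^(\\d){2,3}(\\s).*", "\\1", strings)

# Quantifier inside the group: the group covers all 2 to 3 digits.
all_digits <- sub("^(\\d{2,3})(\\s).*", "\\1", strings)

print(last_digit)  # "0" "2" "5"
print(all_digits)  # "130" "162" "85"
```

This also explains why Method 2 in the post works: `(\\d+)` has its quantifier inside the parentheses as well.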

Push JuanMaestrow: str_extract(string, regex("^\\d+")) will do it 12/21 21:45
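
A small comparison of the two stringr suggestions in this thread, assuming the stringr package is installed. It is a sketch on a shortened copy of the vector; the names `all_runs` and `leading` are only for illustration. The point it shows: an unanchored `\\d{2,3}` with `str_extract_all` also picks up digits later in the string, while an anchored pattern with `str_extract` returns just the leading number.

```r
library(stringr)  # assumed to be installed

strings <- c("130 UDINE", "162 BF02", "85 BERGAMO")

# Unanchored: every run of 2 to 3 digits is returned, as a list with
# one character vector per input string.
all_runs <- str_extract_all(strings, "\\d{2,3}")

# Anchored at the start: one value per string, as a character vector.
leading <- str_extract(strings, "^\\d{2,3}")

print(all_runs[[2]])  # "162" "02"  (the "02" comes from "BF02")
print(leading)        # "130" "162" "85"
```

If integers are needed downstream, `as.integer(leading)` converts the result.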