[Question] How can I speed up conditional checks on a DataFrame?

Author: liquidbox (樹枝擺擺)

Board: Python

Title: [Question] How can I speed up conditional checks on a DataFrame?

Time: Sun May 14 18:13:58 2023

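I have a list of nearly ten thousand unique strings called kw_list, and a DataFrame called df. As an example, kw_list is ['book','money','future','file'] and df looks like this:

Index  sentence
1      This is a book
2      back to the future
3      replace the file
4      come on
5      have a nice weekend

I want to take each string in the list and compare it against the sentence column. If the sentence contains that string, mark True, otherwise False. All of the nearly ten thousand strings have to be checked one by one.

I created a new DataFrame with nearly ten thousand columns, one per entry in kw_list, and merged it with the original df. Then I wrote a conditional: if a row's sentence contains the string, that column gets True, otherwise False. The result should look like this:

Index  sentence             book   money  future  file
1      This is a book       TRUE   FALSE  FALSE   FALSE
2      back to the future   FALSE  FALSE  TRUE    FALSE
3      replace the file     FALSE  FALSE  FALSE   TRUE
4      come on              FALSE  FALSE  FALSE   FALSE
5      have a nice weekend  FALSE  FALSE  FALSE   FALSE

Unsurprisingly, since I do the check in a loop, it runs for hours without producing a result. The loop is:

    for kw in kw_list:
        df.loc[df['sentence'].str.contains(kw),df[kw]]=True

I suspect that dumping the same data into Excel and using formulas might even be faster. Is there a way to rewrite this so the computation on this df runs faster?

--
※ Origin: 批踢踢實業坊(ptt.cc), From: 36.225.78.65 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/Python/M.1684059240.A.3FF.html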

→ celestialgod: https://i.imgur.com/PkCVaTq.png takes less than 1 second 05/14 18:32

→ celestialgod: https://pastecode.io/s/3nuedb9a 05/14 18:35

Push poototo: Take the words in each sentence and check whether they exist in kw_list 05/14 19:40
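poototo's suggestion above might be sketched roughly as follows. This is only one possible reading of the comment, not the commenter's own code. It assumes keywords are whole, whitespace-separated words and that matching is case-sensitive, so unlike a substring check it would not flag "books" for the keyword "book". The names kw_pos, mask, flags and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example data taken from the question
kw_list = ["book", "money", "future", "file"]
df = pd.DataFrame(
    {"sentence": ["This is a book", "back to the future",
                  "replace the file", "come on", "have a nice weekend"]},
    index=[1, 2, 3, 4, 5],
)

# Map each keyword to its column position for O(1) lookups
kw_pos = {kw: i for i, kw in enumerate(kw_list)}

# One boolean cell per (row, keyword), all False to start
mask = np.zeros((len(df), len(kw_list)), dtype=bool)

# Walk the words of each sentence instead of walking the keyword list,
# so the work per row depends on sentence length, not on len(kw_list)
for row, sentence in enumerate(df["sentence"]):
    for word in sentence.split():
        col = kw_pos.get(word)
        if col is not None:
            mask[row, col] = True

flags = pd.DataFrame(mask, index=df.index, columns=kw_list)
result = df.join(flags)
print(result)
```

The design choice here is to replace roughly ten thousand scans of the sentence column with a single pass over the sentences plus dictionary lookups. If substring matches are actually required, this sketch would not be equivalent to the original loop.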

→ lycantrope: pandas: df[kw]=df["sentence"].str.contains(kw) 05/14 21:31

→ lycantrope: for kw in df: 05/14 21:31
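Putting lycantrope's two comments together, a minimal sketch could look like the one below. It assumes the loop is meant to run over kw_list, and that keywords are literal substrings rather than regular expressions (hence regex=False, which is an addition not in the comments). One thing that may be worth checking in the original loop is that it passes df[kw], a column of values, as the column selector to df.loc, where the column label kw was probably intended. The names flags and result are made up for illustration.

```python
import pandas as pd

# Example data taken from the question
kw_list = ["book", "money", "future", "file"]
df = pd.DataFrame(
    {"sentence": ["This is a book", "back to the future",
                  "replace the file", "come on", "have a nice weekend"]},
    index=[1, 2, 3, 4, 5],
)

# str.contains already returns a full True/False column, so no
# pre-built empty columns and no df.loc assignment are needed.
# regex=False treats each keyword as a plain substring (assumption).
flags = pd.concat(
    {kw: df["sentence"].str.contains(kw, regex=False) for kw in kw_list},
    axis=1,
)

# Attach all keyword columns in one step rather than inserting
# thousands of columns into df one at a time
result = pd.concat([df, flags], axis=1)
print(result)
```

Collecting the columns first and concatenating once is a judgment call: assigning df[kw] inside the loop, as written in the comment, should give the same values, but pandas may warn about a fragmented frame when many columns are added one by one.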