[Question] pandas conditional sum

Author a5170040 (Andy)

Board Python

Title [Question] pandas conditional sum

Time Fri Jan 11 21:09:46 2019

I have a dataset like this:

      Vendor Document Date Clearing Date Invoice_Amount
    0      A    09/13/2016    11/04/2016   2,007,324.85
    1      A    04/18/2016    07/11/2016     631,714.68
    2      A    09/13/2016    09/16/2016   4,000,000.00
    3      A    07/11/2017    09/23/2017   5,000,000.00
    4      A    05/03/2016    06/17/2016   2,000,000.00
    ---------------------------------------------------------------
         Vendor Document Date Clearing Date Invoice_Amount
    1158      H    2017-04-21    2017-06-28   3,000,000.00
    1159      H    2017-04-25    2017-05-19   1,000,000.00
    1160      H    2017-11-03    2017-12-11   4,500,000.00
    1161      H    2018-03-15    2018-05-27   3,500,000.00
    1162      H    2018-02-21    2018-05-03   1,500,000.00

I want to add a new column in which each row adds up the number of invoices already paid to the same Vendor within the past 6 months.

For each row i:
1. Compare Document Date[i] against the 'Clearing Date' of every row in the data, keeping the rows whose Clearing Date is earlier than Document Date[i].
2. Keep the rows whose Document Date falls within the six months (180 days) before Document Date[i].
3. Keep the rows whose Vendor is the same as Vendor[i].

My current code is below. It computes the correct answer, but the real data has more than 100,000 rows and the computation takes a very long time. Is there a faster way? I was thinking of using df.apply(lambda ...), but I haven't managed to write it.

    import pandas as pd

    # raw string so the backslash in the Windows path is not read as an escape
    df = pd.read_csv(r'E:\data.csv')
    df['Document Date'] = pd.to_datetime(df['Document Date'], format="%m/%d/%Y")
    df['Clearing Date'] = pd.to_datetime(df['Clearing Date'], format="%m/%d/%Y")
    df["Sum_Paid"] = 0

    # For every row, filter the whole DataFrame again (this full scan per row is what makes it slow)
    for i in df.index:
        Vendor = df.loc[i, "Vendor"]
        Doc_Date = df.loc[i, "Document Date"]
        Six_Month = Doc_Date - pd.Timedelta(days=180)
        df.loc[i, "Sum_Paid"] = df.loc[(df["Vendor"] == Vendor)
                                       & (df["Clearing Date"] < Doc_Date)
                                       & (df["Document Date"] >= Six_Month),
                                       "Invoice_Amount"].count()

--
※ Posting site: PTT (ptt.cc), from: 36.226.91.237
※ Article URL: https://www.ptt.cc/bbs/Python/M.1547212189.A.464.html
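One way to avoid re-filtering the whole DataFrame once per row in the loop above is sketched below. It is a minimal sketch, assuming both date columns are already parsed with pd.to_datetime, the index is unique, and Invoice_Amount has no missing values (the original .count() skips NaN, whereas this counts rows). It groups by Vendor once and compares every row of a group with every other row via NumPy broadcasting. The name count_paid_within_window is hypothetical; the sample rows are the ones from the tables above, with dates rewritten in ISO format.

```python
import numpy as np
import pandas as pd


# count_paid_within_window is a hypothetical helper, not a pandas API
def count_paid_within_window(df: pd.DataFrame, days: int = 180) -> pd.Series:
    """For each row i, count rows j with the same Vendor where
    Clearing Date[j] < Document Date[i] and
    Document Date[j] >= Document Date[i] - `days` days.
    Assumes Invoice_Amount has no missing values (rows are counted directly)."""
    result = pd.Series(0, index=df.index, dtype="int64")
    window = np.timedelta64(days, "D")
    for _, group in df.groupby("Vendor", sort=False):
        doc = group["Document Date"].to_numpy()    # datetime64 array, shape (n,)
        clear = group["Clearing Date"].to_numpy()  # datetime64 array, shape (n,)
        # mask[i, j] is True when row j satisfies all three conditions for row i
        mask = (clear[None, :] < doc[:, None]) & (doc[None, :] >= (doc - window)[:, None])
        result.loc[group.index] = mask.sum(axis=1)
    return result


# Sample rows taken from the two tables in the question
sample = pd.DataFrame(
    {
        "Vendor": ["A"] * 5 + ["H"] * 5,
        "Document Date": ["2016-09-13", "2016-04-18", "2016-09-13", "2017-07-11", "2016-05-03",
                          "2017-04-21", "2017-04-25", "2017-11-03", "2018-03-15", "2018-02-21"],
        "Clearing Date": ["2016-11-04", "2016-07-11", "2016-09-16", "2017-09-23", "2016-06-17",
                          "2017-06-28", "2017-05-19", "2017-12-11", "2018-05-27", "2018-05-03"],
    },
    index=[0, 1, 2, 3, 4, 1158, 1159, 1160, 1161, 1162],
)
for col in ["Document Date", "Clearing Date"]:
    sample[col] = pd.to_datetime(sample[col])

sample["Sum_Paid"] = count_paid_within_window(sample)
print(sample)
```

Each vendor group builds an n×n boolean matrix, so memory grows with the square of the largest group; a vendor with tens of thousands of rows would likely need to be processed in chunks of rows. If the goal is to add up amounts rather than count invoices, Invoice_Amount would first have to be parsed as numbers (for example with pd.read_csv(..., thousands=",")), after which mask.sum(axis=1) could be replaced by mask @ amounts.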

Push Luluemiko: def test(row): six_month = Document Date - .... 01/11 22:08

→ Luluemiko: return df.loc[(df["Vendor"] == Vendor).... 01/11 22:09

→ Luluemiko: df['Sum_Paid'] = df.apply(test, axis = 1) 01/11 22:10
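For reference, a completed version of the apply hint might look like the sketch below; count_paid and its extra df argument are hypothetical names, passed in through apply's args parameter. It reproduces the loop's result, but each call still filters the whole DataFrame, so the total work stays proportional to the square of the row count and apply on its own is unlikely to be much faster than the for loop.

```python
import pandas as pd


# count_paid is a hypothetical helper name; df is passed in explicitly via args=
def count_paid(row: pd.Series, df: pd.DataFrame, days: int = 180) -> int:
    """Row-wise version of the question's loop: filter the whole df for one row."""
    six_month = row["Document Date"] - pd.Timedelta(days=days)
    mask = (
        (df["Vendor"] == row["Vendor"])
        & (df["Clearing Date"] < row["Document Date"])
        & (df["Document Date"] >= six_month)
    )
    # .count() skips missing Invoice_Amount values, exactly like the original loop
    return int(df.loc[mask, "Invoice_Amount"].count())


# Usage, assuming df has been loaded and its date columns parsed as in the question:
# df["Sum_Paid"] = df.apply(count_paid, axis=1, args=(df,))
```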

→ a5170040: That doesn't seem to work... could you give a more complete hint? 01/13 22:46

→ a5170040: Thanks a lot for the answer, L. 01/13 22:47

→ Luluemiko: I misunderstood. Since a search is involved for every row, my approach is wrong. 01/13 23:57
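Both the original loop and the apply hint search the whole DataFrame once per row. A sketch that avoids this, assuming no missing dates and no missing Invoice_Amount values: within each vendor, visit rows in increasing Document Date, insert rows into a Fenwick (binary indexed) tree keyed by Document Date rank once their Clearing Date is earlier than the current Document Date, then count how many inserted rows have a Document Date on or after the 180-day cutoff. This is a standard offline sweep for two-condition range counting, swapped in here rather than anything proposed in the thread; Fenwick and count_paid_sweep are hypothetical names. The work should be roughly O(n log n) per vendor instead of O(n²).

```python
import numpy as np
import pandas as pd


class Fenwick:
    """Hypothetical helper: binary indexed tree storing counts at positions 0..n-1."""

    def __init__(self, n: int) -> None:
        self.tree = [0] * (n + 1)

    def add(self, pos: int) -> None:
        """Add 1 at position pos."""
        i = pos + 1
        while i < len(self.tree):
            self.tree[i] += 1
            i += i & -i

    def prefix(self, end: int) -> int:
        """Total count at positions 0..end-1."""
        total = 0
        i = end
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def count_paid_sweep(df: pd.DataFrame, days: int = 180) -> pd.Series:
    """Hypothetical helper with the same conditions as the question's loop.
    Assumes no missing dates and no missing Invoice_Amount values."""
    result = pd.Series(0, index=df.index, dtype="int64")
    window = np.timedelta64(days, "D")
    for _, group in df.groupby("Vendor", sort=False):
        doc = group["Document Date"].to_numpy()
        clear = group["Clearing Date"].to_numpy()
        n = len(group)
        sorted_doc = np.sort(doc)
        # rank[j] = number of Document Dates in the group strictly earlier than doc[j]
        rank = np.searchsorted(sorted_doc, doc, side="left")
        by_clear = np.argsort(clear, kind="stable")
        tree = Fenwick(n)
        counts = np.zeros(n, dtype="int64")
        k = 0  # number of rows (taken in Clearing Date order) already in the tree
        for i in np.argsort(doc, kind="stable"):  # rows in increasing Document Date
            # insert every row whose Clearing Date is strictly before Document Date[i]
            while k < n and clear[by_clear[k]] < doc[i]:
                tree.add(int(rank[by_clear[k]]))
                k += 1
            # inserted rows with Document Date >= cutoff are exactly those with rank >= lo
            lo = int(np.searchsorted(sorted_doc, doc[i] - window, side="left"))
            counts[i] = k - tree.prefix(lo)
        result.loc[group.index] = counts
    return result


# Usage, assuming df is prepared as in the question:
# df["Sum_Paid"] = count_paid_sweep(df)
```

The inner loop is still plain Python, so it is slower per step than a NumPy operation, but unlike a broadcast comparison matrix it needs only linear memory per vendor, which matters if one vendor accounts for a large share of the 100,000 rows.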