[Question] Performance problem when inserting data into a list

Author: icetofux ()

Board: Python

Title: [Question] Performance problem when inserting data into a list

Time: Tue Feb 21 12:31:54 2017

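Suppose a list holds several million items, and I want to insert 32 values of 0xFF after every 1024 items. I wrote the following:

    def AppendData(raw_list):
        # Create an empty list to hold the result.
        new_list = list()
        # Build a list of 32 0xFF values to use as the inserted data.
        insert_data = 32*[0xFF]
        for addr in range(0, len(raw_list), 1024):
            # Print progress; normally disabled to speed things up.
            print(addr)
            # Copy the data into the new list 1024 items at a time,
            # then add the 32 0xFF values.
            new_list = new_list + raw_list[addr:(addr+1024)] + insert_data
        return (new_list)

Judging by the result, it does what I want, but the performance is terrible. Right now 4 million items of test data take nearly 1 minute, and in real use the data may be tens or even hundreds of times larger.

Is there a better way to write this? Thanks.

--
※ Origin: 批踢踢實業坊(ptt.cc), From: 211.72.212.239
※ Article URL: https://www.ptt.cc/bbs/Python/M.1487651517.A.CBD.html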

→ os653: Try the string join method? Using + probably keeps requesting more memory. 02/21 13:07
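To illustrate os653's point about +: the statement new_list = new_list + ... builds a brand-new list on every pass and copies everything accumulated so far, so the total work grows roughly with the square of the input size. A minimal sketch of the same loop that grows the list in place with list.extend follows. The function name and the chunk/pad parameters are hypothetical, introduced only for this example.

```python
def append_data_extend(raw_list, chunk=1024, pad=32):
    """Hypothetical variant of AppendData that grows the result in place."""
    new_list = []
    insert_data = pad * [0xFF]
    for addr in range(0, len(raw_list), chunk):
        # extend() appends to the existing list instead of creating a new
        # list (and copying everything collected so far) on each pass.
        new_list.extend(raw_list[addr:addr + chunk])
        new_list.extend(insert_data)
    return new_list


if __name__ == "__main__":
    # Small parameters so the layout is easy to see.
    print(append_data_extend(list(range(5)), chunk=2, pad=1))
```

The output layout is the same as AppendData: every chunk, including a final partial one, is followed by the padding values.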

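I modified the program a bit to see which step is eating the time:

    import time

    def AppendData(raw_list):
        # Create an empty list to hold the result.
        new_list = list()
        # Build a list of 32 0xFF values to use as the inserted data.
        insert_data = 32*[0xFF]
        for addr in range(0, len(raw_list), 1024):
            # Print progress; normally disabled to speed things up.
            t1 = time.time()
            print(addr)
            # Copy the data into the new list 1024 items at a time,
            # then add the 32 0xFF values.
            t2 = time.time()
            temp = raw_list[addr:(addr+1024)]
            t3 = time.time()
            new_list = new_list + temp + insert_data
            t4 = time.time()
            print("print:", (t2-t1), ", slice:", (t3-t2), ", list add:", (t4-t3))
        return (new_list)

output:

    print: 0.0 , slice: 0.0 , list add: 0.015630722045898438

Surprisingly, the slice takes hardly any time, so the time must be going into adding the data to the new list.

Perhaps it is better to work out how much space the new list needs and create it up front, then overwrite it with the data from raw_list, because that avoids requesting memory over and over:

    def AppendDataV2(raw_list):
        # Preallocate the result, pre-filled with 0xFF. Each chunk of up to
        # 1024 items (including a final partial chunk) gets 32 extra slots.
        chunk_count = (len(raw_list) + 1023) // 1024
        new_list_size = len(raw_list) + chunk_count * 32
        new_list = new_list_size * [0xFF]
        for addr in range(0, len(raw_list), 1024):
            # Print progress; normally disabled to speed things up.
            print(addr)
            # Overwrite new_list with the raw_list data, 1024 items at a time.
            chunk = raw_list[addr:(addr+1024)]
            new_start = (addr // 1024) * (1024+32)
            new_list[new_start:new_start+len(chunk)] = chunk
        return (new_list)

The time to process 4 million items dropped from 95.8 seconds to 1.6 seconds, which is a very significant improvement.

Thanks to both of you for the help. I did not try the string and join approach, but your suggestions pointed me in the right direction. Many thanks.

Push Yshuan: If you want it fast, convert to a string builder, process it there, then convert back
※ Edited: icetofux (211.72.212.239), 02/21/2017 14:09:06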
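For reference, the join idea raised by os653 and Yshuan could look roughly like the sketch below. Python has no string builder class, so this uses bytes and b"".join as the closest equivalent. It assumes every item in raw_list is an integer in the range 0..255 (the thread only shows 0xFF, so this is a guess about the data). The function name and the chunk/pad parameters are hypothetical.

```python
def append_data_join(raw_list, chunk=1024, pad=32):
    """Hypothetical bytes/join variant; assumes all items are ints in 0..255."""
    raw = bytes(raw_list)        # raises ValueError if an item is outside 0..255
    padding = b"\xff" * pad
    parts = []
    for addr in range(0, len(raw), chunk):
        # Collect the pieces first; join() allocates the result only once.
        parts.append(raw[addr:addr + chunk])
        parts.append(padding)
    # Convert back to a list of ints to match the original return type.
    return list(b"".join(parts))


if __name__ == "__main__":
    print(append_data_join([0, 1, 2, 3, 4], chunk=2, pad=1))
```

If the caller can work with bytes directly, the final list() conversion can be dropped, which avoids creating one Python int reference per item.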

→ huei820504: Why not just use insert? 03/06 03:05
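On the insert question: inserting into the middle of a Python list shifts every element after the insertion point, so each insertion costs time proportional to the length of the tail, and doing it once per 1024 items again adds up to roughly quadratic work on large inputs. A minimal sketch of what an insert-based version might look like is below. It is correct but is not expected to scale as well as preallocating. The function name and the chunk/pad parameters are hypothetical.

```python
def append_data_insert(raw_list, chunk=1024, pad=32):
    """Hypothetical insert-based variant, shown for comparison only."""
    out = list(raw_list)
    padding = pad * [0xFF]
    n = len(raw_list)
    # Walk the chunk starts from back to front so that positions of the
    # chunks not yet processed are not shifted by earlier insertions.
    for addr in reversed(range(0, n, chunk)):
        end = min(addr + chunk, n)
        # Slice assignment on an empty slice inserts all padding items at
        # once; every element after 'end' still has to be moved.
        out[end:end] = padding
    return out


if __name__ == "__main__":
    print(append_data_insert(list(range(5)), chunk=2, pad=1))
```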