Re: [Question] HashMap performance (300 MB file)

Author: cyberwizard (Gavin)

Board: java

Time: Mon Aug 27 15:13:04 2012

Suppose there are two files, a.txt and b.txt, with these columns:

    a.txt       b.txt
    c1 c2 c3    c1 c2 c3 c4

Turn a.txt into a map:

    int first;
    while ((line = br.readLine()) != null) {
        first = line.indexOf(" ");
        map.put(line.substring(first + 1).hashCode(),  // key: store only the hashCode of "c2 c3"
                line.substring(0, first));             // value: c1
    }

Because the c2 and c3 strings are not stored, this saves about 2/3 of the memory.

Now suppose we want to replace c4 in b.txt:

    int last;
    while ((line = br.readLine()) != null) {
        first = line.indexOf(" ");
        last = line.lastIndexOf(" ");
        bw.write(line.substring(0, last) + " "                            // c1 c2 c3
                + map.get(line.substring(first + 1, last).hashCode())     // c4
                + "\n");
    }

In my test this finished within about 30 seconds, depending on the machine.

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 140.123.85.140
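For anyone who wants to try the idea without the 300 MB files, here is a minimal self-contained sketch of the two loops above. The class name `HashCodeJoinSketch`, the method names, and the sample lines are all made up for illustration, and in-memory strings stand in for a.txt and b.txt. Like the snippets in the post, it assumes no two different "c2 c3" keys share a hashCode.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

class HashCodeJoinSketch {

    // Reads lines of the form "c1 c2 c3" and maps hashCode("c2 c3") -> c1.
    // Only the int hash is kept as the key, not the "c2 c3" string itself.
    static Map<Integer, String> buildMap(BufferedReader br) throws IOException {
        Map<Integer, String> map = new HashMap<>();
        String line;
        while ((line = br.readLine()) != null) {
            int first = line.indexOf(' ');
            if (first < 0) {
                continue; // assumption: silently skip lines without a space
            }
            map.put(line.substring(first + 1).hashCode(), line.substring(0, first));
        }
        return map;
    }

    // Reads lines of the form "c1 c2 c3 c4" and writes "c1 c2 c3 <looked-up value>".
    // If the key is missing, the literal text "null" is written, as in the original snippet.
    static void rewrite(BufferedReader br, Writer bw, Map<Integer, String> map)
            throws IOException {
        String line;
        while ((line = br.readLine()) != null) {
            int first = line.indexOf(' ');
            int last = line.lastIndexOf(' ');
            if (first < 0 || first == last) {
                continue; // assumption: skip lines with fewer than three fields
            }
            String value = map.get(line.substring(first + 1, last).hashCode());
            bw.write(line.substring(0, last) + " " + value + "\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data standing in for a.txt and b.txt.
        BufferedReader a = new BufferedReader(new StringReader("x1 k1 k2\nx2 k3 k4\n"));
        BufferedReader b = new BufferedReader(new StringReader("y1 k1 k2 old\ny2 k3 k4 old\n"));

        Map<Integer, String> map = buildMap(a);
        StringWriter out = new StringWriter();
        rewrite(b, out, map);
        System.out.print(out);
    }
}
```

For real files you would presumably wrap a `FileReader` and `FileWriter` in `BufferedReader` and `BufferedWriter` instead of the string-backed streams used here.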

→ lovdkkkk: This works if different strings are guaranteed to have different hashCodes 08/27 16:10
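To make the caveat in the comment above concrete: `String.hashCode()` does not guarantee distinct values for distinct strings. The small check below uses the well-known colliding pair "Aa" and "BB"; the class name `HashCollisionDemo` and the helper `collides` are hypothetical, only there for illustration. With a hash-only key, two such rows would overwrite each other in the map.

```java
class HashCollisionDemo {

    // True when a and b are different strings that share the same hashCode.
    static boolean collides(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 (65*31 + 97 and 66*31 + 66).
        System.out.println(collides("Aa", "BB"));
        // A shared prefix keeps the collision, so whole "c2 c3" keys can collide too.
        System.out.println(collides("c2 Aa", "c2 BB"));
    }
}
```

Whether this matters in practice depends on the data; one way to check is to count how many `map.put` calls return a non-null previous value while loading a.txt.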

Push PsMonkey: [butting in] Would using Scanner.hasNext() save even more? 08/27 16:57

Push luoqr: Basically, with this much data... going with a database solution would be a lot simpler and less hassle XD 08/27 20:06

→ lovdkkkk: You drop ten million key-value pairs and get ten million db queries instead :x 08/27 20:19

→ luoqr: Just join on the column? :$ 08/27 20:23

→ lovdkkkk: 5F just spotted what I was missing! 08/27 20:34

Push love112302: Thank you!!! I never thought of using hashCode this way QQ 08/30 10:25