Re: [Info] string hash functions performance

Author: reader (讀者)

Board: CSSE

Title: Re: [Info] string hash functions performance

Time: Sun Dec 14 20:51:21 2008

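※ Quoting reader (讀者):
: Since we're on the subject of empirical data on algorithms, this article came to mind:
: http://www.fantasy-coders.de/projects/gh/html/x435.html
: It's quite good data on string hash function performance, though it's in
: German. Luckily it has plenty of charts, so you can mostly follow it from the
: figures, and it lists the source code as well.
: I used to use DJB2 and kept agonizing over whether to switch to FNV; after
: reading that article I decided to go with FNV.
: String hash functions matter a great deal for high-load network services that
: have a large number of members and need an efficient login feature.

Now there is newer empirical research:

http://smallcode.weblogs.us/2008/01/22/hash-functions-an-empirical-comparison/
http://smallcode.weblogs.us/2008/02/04/hash-functions-additional-tests/
http://smallcode.weblogs.us/2008/02/12/hash-functions-part-3/
http://smallcode.weblogs.us/2008/06/17/murmur-hash/

Based on these studies, I tried out an x273 variant (multiplier 273, in the same vein as x17 and x65599 in the table below):

    #include <windows.h>   /* UINT, CHAR, SIZE_T */

    /* hash = hash * 273 + byte, unrolled four bytes per iteration */
    UINT Hash273(const CHAR *key, SIZE_T len)
    {
        UINT hash = 0;
        UINT i = 0;
        UINT n = (UINT)len & -4;   /* length rounded down to a multiple of 4 */
        UINT e = (UINT)len - n;    /* 0 to 3 leftover bytes */

        for(; i < n; i += 4) {
            hash = 273 * hash + key[i + 0];
            hash = 273 * hash + key[i + 1];
            hash = 273 * hash + key[i + 2];
            hash = 273 * hash + key[i + 3];
        }

        /* tail: process the remaining 0 to 3 bytes */
        if(e == 0) return hash;
        hash = 273 * hash + key[i + 0];
        if(e == 1) return hash;
        hash = 273 * hash + key[i + 1];
        if(e == 2) return hash;
        hash = 273 * hash + key[i + 2];
        return hash;
    }

(The code is written in this style so that it can be dropped into the test code from this series of articles. It has already been optimized for speed.)

The results (lower is better):

    Function       Words  Win32  Numbers  Prefix  Postfix  Variables  Shakespeare
    Bernstein        146    879      426     326      315        651          875
    K&R              143    890      867     329      320        657          886
    x17              137    848       81     317      299        639          831
    x17 unrolled     132    826       84     307      292        622          806
    x65599           139    846      207     320      317        639          836
    FNV-1a           151    961       88     368      357        693          907
    universal        155    981       91     376      366        705          923
    Weinberger       168   1205      272     483      472        831         1068
    Paul Hsieh       156    840      110     292      275        660          951
    One At Time      161   1024      103     393      377        741          961
    lookup3          153    846       92     290      278        665          948
    Arash Partow     152    978     1046     384      362        717          928
    CRC-32           158   1010       79     386      366        719          950
    Ramakrishna      152    955      211     370      351        704          925
    Fletcher         139    677     1178     261      229        593         1254
    Murmur2          135    771       85     265      251        607          831
    x273             129    748       70     248      243        591          802

(Adler-32's results were so bad that I just cut them.)

It seems I've accidentally come up with a new string hash function that, in this table at least, looks like the fastest one XD (it has the lowest time in five of the seven columns; only Fletcher beats it, on Win32 and Postfix).

So the formula I recommend here is:

    hash(n+1) = hash(n) * 273 + char(n)

with hash(0) = 0 and 32-bit unsigned arithmetic (wrapping modulo 2^32), as in the code above.

--
※ Posted from: PTT (批踢踢實業坊, ptt.cc) ◆ From: 82.103.134.5
※ Edited: reader from 82.103.134.5 (12/14 21:06)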
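The recommended recurrence can be checked against the unrolled loop with a small portable sketch. It assumes C99 `<stdint.h>` types in place of the Windows UINT/CHAR/SIZE_T. It also reads bytes as `unsigned char`, so for bytes 0x80 and above it gives different values from Hash273 as posted, which adds signed CHAR values. The names `hash273_simple`, `hash273_unrolled` and `bucket273` are hypothetical. `bucket273` masks with a power-of-two table size, which is one common way to map a hash to a bucket; the benchmarks above do not specify this.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference form of hash(n+1) = hash(n) * 273 + char(n), hash(0) = 0.
 * uint32_t arithmetic wraps modulo 2^32, like a 32-bit UINT.
 * Assumption: bytes are read as unsigned char, unlike the signed CHAR
 * in Hash273, so results differ for bytes >= 0x80. */
static uint32_t hash273_simple(const unsigned char *key, size_t len)
{
    uint32_t hash = 0;
    for (size_t i = 0; i < len; i++)
        hash = 273u * hash + key[i];
    return hash;
}

/* The same recurrence unrolled four bytes per iteration, mirroring the
 * structure of Hash273; it must return exactly what hash273_simple does. */
static uint32_t hash273_unrolled(const unsigned char *key, size_t len)
{
    uint32_t hash = 0;
    size_t i = 0;
    size_t n = len & ~(size_t)3;    /* len rounded down to a multiple of 4 */

    for (; i < n; i += 4) {
        hash = 273u * hash + key[i + 0];
        hash = 273u * hash + key[i + 1];
        hash = 273u * hash + key[i + 2];
        hash = 273u * hash + key[i + 3];
    }
    for (; i < len; i++)            /* 0 to 3 leftover bytes */
        hash = 273u * hash + key[i];
    return hash;
}

/* Hypothetical helper: map a NUL-terminated key (e.g. a login name) to a
 * bucket index. nbuckets must be a power of two so the mask acts as mod. */
static size_t bucket273(const char *s, size_t nbuckets)
{
    uint32_t h = hash273_simple((const unsigned char *)s, strlen(s));
    return (size_t)h & (nbuckets - 1);
}
```

For example, `bucket273("ab", 1024)` returns 979, since hash("ab") = 273 × 97 + 98 = 26579 and 26579 mod 1024 = 979. Unrolling only changes how the loop is scheduled; it does not change the value, which is why the two functions must always agree.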

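Push AlanSung: (Y) 12/14 22:59

Push tinlans: Is there a comparison with a ternary search tree? 12/15 02:05

→ reader: This can't really be compared with a ternary search tree... they're different things 12/15 04:54

→ reader: Optimizing a tree alone is a huge hassle... 12/15 05:22

→ reader: In theory a TST should be faster, since it doesn't have to visit every character 12/15 05:29

→ reader: but in practice the performance of tree structures is always questionable, and that's hard to deal with 12/15 05:31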
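The remark that a TST need not visit every character applies mainly to lookups that miss. There, the search can stop at the first character that has no matching branch, whereas a hash function like Hash273 reads the whole key before any bucket is touched. A successful TST lookup still examines every character of the key. Below is a minimal TST sketch in C, assuming NUL-terminated keys. The names `TstNode`, `tst_insert`, `tst_contains` and `tst_free` are hypothetical, and the `chars_read` counter exists only to make the early exit visible. The sketch says nothing about the memory and pointer-chasing costs behind the "questionable in practice" comment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative TST node: one character plus three child links. */
typedef struct TstNode {
    char ch;                    /* character stored at this node */
    int is_end;                 /* nonzero if a key ends at this node */
    struct TstNode *lo, *eq, *hi;
} TstNode;

/* Insert a non-empty key; returns the subtree root. On malloc failure the
 * key is silently not inserted (acceptable for a sketch). */
static TstNode *tst_insert(TstNode *node, const char *key)
{
    if (node == NULL) {
        node = calloc(1, sizeof *node);
        if (node == NULL)
            return NULL;
        node->ch = *key;
    }
    if (*key < node->ch)
        node->lo = tst_insert(node->lo, key);
    else if (*key > node->ch)
        node->hi = tst_insert(node->hi, key);
    else if (key[1] != '\0')
        node->eq = tst_insert(node->eq, key + 1);
    else
        node->is_end = 1;
    return node;
}

/* Returns 1 if key is stored. *chars_read is set to how many leading
 * characters of key were examined before the search ended. */
static int tst_contains(const TstNode *node, const char *key, size_t *chars_read)
{
    const char *p = key;
    *chars_read = 0;
    while (node != NULL && *p != '\0') {
        *chars_read = (size_t)(p - key) + 1;   /* p[0] is being examined */
        if (*p < node->ch)
            node = node->lo;
        else if (*p > node->ch)
            node = node->hi;
        else if (p[1] == '\0')
            return node->is_end;               /* last character matched */
        else {
            p++;
            node = node->eq;
        }
    }
    return 0;                                  /* fell off the tree: a miss */
}

static void tst_free(TstNode *node)
{
    if (node == NULL)
        return;
    tst_free(node->lo);
    tst_free(node->eq);
    tst_free(node->hi);
    free(node);
}
```

For example, after inserting "alice", "alicia" and "bob", looking up the 7-character "alibaba" gives up after examining 4 characters. A hit on "alice" examines all 5.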