[Discussion] file BOM

Author: superbear (bear)

Board: Python

Title: [Discussion] file BOM

Time: Fri Jan 21 02:58:07 2011

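A while ago I needed to parse a file and lost two hours to file encoding problems. This is what I ended up doing:

    import os, sys, codecs

    def test_file_encoding(file_path):
        # Fallback encoding when the file has no BOM
        file_encoding = sys.getfilesystemencoding()
        bom_len = 0
        # Open in binary mode so the first bytes can be compared with the BOM constants
        with open(file_path, 'rb') as f:
            head = f.read(5)
            if head[:len(codecs.BOM_UTF16_LE)] == codecs.BOM_UTF16_LE:
                file_encoding = 'utf-16-le'
                bom_len = 1
            elif head[:len(codecs.BOM_UTF16_BE)] == codecs.BOM_UTF16_BE:
                file_encoding = 'utf-16-be'
                bom_len = 1
            elif head[:len(codecs.BOM_UTF8)] == codecs.BOM_UTF8:
                file_encoding = 'utf-8'
                bom_len = 1
        return (file_encoding, bom_len)

    def parse_file(file_path):
        (file_encoding, bom_len) = test_file_encoding(file_path)
        with codecs.open(file_path, mode='r', encoding=file_encoding) as f:
            f.read(bom_len)  # skip the decoded BOM (a single U+FEFF character)
            for line in f:
                pass  # do my job

This works, but I keep thinking there must be a ready-made solution for this problem....
Is there a way to do this without going to so much trouble myself? XD

--
O evil spirits, born of old and lost between heaven and earth: with the lightning of the garment that wraps a holy maiden's soft skin, I shatter stain, murk, stagnation, and stiffness to dust and return them to heaven and earth! Repent!
--
※ Origin: PTT (ptt.cc)
◆ From: 114.32.118.49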

→ holio:chardet.feedparser.org Universal Encoding Detector 01/21 17:20

推 cobrasgo: Yeah, just call the ready-made chardet. Its accuracy does depend on the 01/21 22:35

→ cobrasgo: length and content of the input, though 01/21 22:36
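
For files that do carry a BOM, the standard library may already be enough: the 'utf-8-sig', 'utf-16', and 'utf-32' codecs consume the BOM while decoding, so there is no need to skip it by hand. Below is a minimal sketch under that assumption. The names sniff_bom_encoding and iter_lines are made up for illustration, and the fallback to locale.getpreferredencoding() is an assumption, not something from the original post. Files without a BOM still need a guess, which is where a detector such as chardet comes in.

```python
import codecs
import io
import locale

# (BOM bytes, codec that strips that BOM while decoding).
# UTF-32 is checked first because BOM_UTF32_LE begins with the bytes of BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(file_path, default=None):
    """Return a codec name chosen from the file's BOM, or a fallback if there is none."""
    with open(file_path, "rb") as f:
        head = f.read(4)  # the longest BOM (UTF-32) is 4 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    # Assumption: with no BOM, fall back to the caller's default or the locale encoding.
    return default or locale.getpreferredencoding(False)


def iter_lines(file_path, default=None):
    """Yield decoded lines; the BOM-aware codecs remove the BOM themselves."""
    encoding = sniff_bom_encoding(file_path, default)
    with io.open(file_path, "r", encoding=encoding) as f:
        for line in f:
            yield line
```

Usage would look like `for line in iter_lines(path): ...`. Picking 'utf-16' instead of 'utf-16-le' is deliberate here: the endian-specific codecs leave U+FEFF in the decoded text, while the generic one reads the BOM to choose the byte order and then drops it.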