Author: left ()
Board: Python
Title: Can't read an HTML file
Time: Mon Dec 3 02:19:57 2012
Hi everyone, I could use some help.
I've been stuck on this file and can't get it to read:
http://www.cmlab.csie.ntu.edu.tw/~left/index813.html
with open('index813.html','r') as page:
    for each_line in page:
        print(each_line.strip())
It raises the error below, and I haven't been able to get rid of it:
UnicodeDecodeError: 'cp950' codec can't decode bytes in position 1256-1257: illegal multibyte sequence
args = ('cp950', b'50"> 6/19</td>\n<td width="120">toughroleX</td...</div>\n</div>\n</div>\n</div>\n</body>\n</html>', 1256, 1258, 'illegal multibyte sequence')
encoding = 'cp950'
end = 1258
object = b'50"> 6/19</td>\n<td width="120">toughroleX</td...</div>\n</div>\n</div>\n</div>\n</body>\n</html>'
reason = 'illegal multibyte sequence'
start = 1256
with_traceback = <built-in method with_traceback of UnicodeDecodeError object>
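The traceback shows Python decoding with 'cp950', which is what open() picks on a Traditional Chinese Windows setup when no encoding= is passed. The page's real encoding is not stated in this thread, so the following is only a sketch: it reads the raw bytes and tries a list of candidate encodings in order. The helper name read_text_with_fallback and the candidate list ("utf-8", then "cp950") are assumptions for illustration, and the demo uses a throwaway temp file rather than the actual index813.html.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "cp950")):
    """Decode the file at `path` with the first candidate encoding that works.

    The candidate list is an assumption for illustration; adjust it to
    whatever encodings the page might plausibly use.
    """
    with open(path, "rb") as f:  # binary mode: no implicit decoding happens
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings!r} could decode {path!r}")


# Demo: a temp file holding cp950 (Big5) bytes stands in for the real page.
# "\u4e2d\u6587" is the two-character string meaning "Chinese".
with tempfile.NamedTemporaryFile(delete=False, suffix=".html") as tmp:
    tmp.write("<td>\u4e2d\u6587</td>\n".encode("cp950"))
text, used = read_text_with_fallback(tmp.name)
os.remove(tmp.name)
print(used)
```

UTF-8 goes first because it is strict: bytes that are not UTF-8 almost always fail fast, whereas a multibyte codec like cp950 can sometimes accept wrong input and return garbage. If every candidate fails, the file may be in some other encoding entirely, or it may contain a few corrupt bytes.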
--
※ Posted from: PTT (ptt.cc)
◆ From: 140.112.217.224
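→ os653: It's probably not that it can't be read; it's that it can't be printed 12/03 11:48
→ os653: Sorry for the misleading comment, it really can't be read XD 12/03 12:02
Push swpoker: Strange, I can read it just fine, without any special encoding handling 12/05 10:40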
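One possible reason the same file reads fine on one machine and fails on another: when encoding= is omitted, open() falls back to a platform-dependent default, so two setups can decode the same bytes differently. That explanation is a guess, not something confirmed in this thread. The sketch below prints that default and reads a file with an explicit encoding, using errors="replace" so a stray bad byte turns into U+FFFD instead of raising. The helper name read_lines and the "utf-8" default are assumptions for illustration; swap in whatever encoding the page actually uses.

```python
import locale
import os
import tempfile


def read_lines(path, encoding="utf-8", errors="replace"):
    """Read `path` with an explicit encoding instead of the platform default."""
    with open(path, "r", encoding=encoding, errors=errors) as page:
        return [each_line.strip() for each_line in page]


# The encoding open() uses for text files when encoding= is not given.
print(locale.getpreferredencoding(False))

# Demo: a temp file with one byte (0xFF) that is invalid in UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ok\n\xff bad\n")
lines = read_lines(tmp.name)
os.remove(tmp.name)
print(ascii(lines))  # ascii() keeps the output printable on any console
```

Note that errors="replace" hides the problem rather than fixing it: the replaced characters are lost, so it is best used for inspecting a file, not for producing data you intend to keep.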