[Question] A question about web scraping

Author: darklimit ()

Board: Python

Title: [Question] A question about web scraping

Time: Tue Aug 20 19:04:00 2013

I previously wrote a scraper for the Yahoo dictionary and confirmed that it worked. When I reran it today I hit a strange problem. The code is as follows:

import urllib2
from bs4 import BeautifulSoup

req = urllib2.Request("http://tw.dictionary.yahoo.com/dictionary?p=good")
html = urllib2.urlopen(req)
htmls = html.read()
html.close()
soup = BeautifulSoup(htmls)  # this line raises the error; the message is below

Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    soup = BeautifulSoup(html)
  File "C:\Python26\lib\site-packages\bs4\__init__.py", line 168, in __init__
    self._feed()
  File "C:\Python26\lib\site-packages\bs4\__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "C:\Python26\lib\site-packages\bs4\builder\_lxml.py", line 72, in feed
    self.parser.close()
  File "parser.pxi", line 1110, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:73063)
XMLSyntaxError: no element found

What went wrong here? Thanks.

--
※ Posted from: PTT (批踢踢實業坊, ptt.cc)
◆ From: 140.135.114.19

→ grapherd: The HTML tags may be malformed. Under lxml, fromstring raises an error, but HTML works fine. 08/20 20:02

→ qwertmn: It runs fine for me on 2.7. 08/21 00:07

→ darklimit: OK, I found it. Yahoo changed the tags on their side, 08/21 08:29

→ darklimit: which is why the tags further down caused problems. ^^ Thanks 08/21 08:30
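
grapherd's observation (a strict fromstring call fails while the lenient HTML path works) can be illustrated with a minimal sketch. It is not the code from the thread: it assumes Python 3 and uses only the standard library, with xml.etree.ElementTree standing in for a strict parser and html.parser standing in for a lenient one. SAMPLE, TagCollector, parse_strict, and parse_lenient are hypothetical names made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient parser (hypothetical helper) that records every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_strict(markup):
    """Return the root tag name, or None if strict XML parsing fails."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        return None


def parse_lenient(markup):
    """Return the list of start tags seen by the tolerant HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Hypothetical malformed markup: <p> and <br> are never closed.
SAMPLE = "<html><body><p>good<br></body></html>"

if __name__ == "__main__":
    print("strict :", parse_strict(SAMPLE))
    print("lenient:", parse_lenient(SAMPLE))
```

If the strict parse fails while the lenient one still finds tags, the markup itself is probably malformed or has changed on the server side, which is consistent with what darklimit reported.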