Author: darklimit ()
Board: Python
Title: [Question] A question about web scraping
Time: Tue Aug 20 19:04:00 2013
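I previously wrote a scraper for the Yahoo dictionary and confirmed that it worked.
When I ran it again today, I hit a strange problem.
The code is as follows: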
import urllib2
from bs4 import BeautifulSoup
req = urllib2.Request("http://tw.dictionary.yahoo.com/dictionary?p=good")
html = urllib2.urlopen(req)
htmls = html.read()
html.close()  # release the connection
soup = BeautifulSoup(htmls)  # the error is raised on this line
The error message is below:
Traceback (most recent call last):
  File "<pyshell#29>", line 1, in <module>
    soup = BeautifulSoup(html)
  File "C:\Python26\lib\site-packages\bs4\__init__.py", line 168, in __init__
    self._feed()
  File "C:\Python26\lib\site-packages\bs4\__init__.py", line 181, in _feed
    self.builder.feed(self.markup)
  File "C:\Python26\lib\site-packages\bs4\builder\_lxml.py", line 72, in feed
    self.parser.close()
  File "parser.pxi", line 1110, in lxml.etree._FeedParser.close (src/lxml/lxml.etree.c:73063)
XMLSyntaxError: no element found
What is going wrong here?
Thanks.
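Not necessarily the cause here, but worth ruling out: the traceback shows `BeautifulSoup(html)` (the response object), while the listing passes `htmls` (the string returned by `read()`). BeautifulSoup accepts file-like objects and calls `read()` on them, and a body that has already been read once comes back empty the second time. A minimal sketch of that behaviour, written for Python 3 and using `io.BytesIO` as a hypothetical stand-in for the `urllib2` response:

```python
import io

# Hypothetical stand-in for the response object; any file-like body behaves this way.
response = io.BytesIO(b"<html><body>good</body></html>")

first_read = response.read()   # consumes the whole body
second_read = response.read()  # nothing is left, so this is an empty bytes object

print(len(first_read), len(second_read))
```

If the parser ends up with an empty document, it has no root element to work with, so printing the length of the fetched body before parsing is a cheap first check.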
--
※ Posted from: PTT (ptt.cc)
◆ From: 140.135.114.19
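→ grapherd: The HTML tags may be malformed; under lxml, fromstring raises an error, but HTML works fine 08/20 20:02
→ qwertmn: It runs fine for me on 2.7 08/21 00:07
→ darklimit: Yeah, I found it. Yahoo changed a tag on their side 08/21 08:29
→ darklimit: which broke the tags that follow it. ^^ Thanks 08/21 08:30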
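The difference grapherd describes, a strict parse failing on malformed tags while a forgiving HTML parse gets through, can be sketched with the standard library alone. This is only an illustration under stated assumptions: it is Python 3, it uses `xml.etree.ElementTree` and `html.parser` as stand-ins rather than the lxml calls from the thread, and `BROKEN_MARKUP`, `TagCollector`, `strict_parse_ok`, and `lenient_tags` are hypothetical names, not the actual Yahoo page or anyone's real code.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample with an unclosed <p>, standing in for the changed page.
BROKEN_MARKUP = "<html><body><div><p>good</div></body></html>"


class TagCollector(HTMLParser):
    """Forgiving HTML parser that records every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def strict_parse_ok(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def lenient_tags(markup):
    """Return the start tags found by the forgiving HTML parser."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


print(strict_parse_ok(BROKEN_MARKUP))
print(lenient_tags(BROKEN_MARKUP))
```

Running both checks on a saved copy of the page is a quick way to tell whether the site's markup changed or the scraper itself broke.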