[Question] Web crawler problem..

Author: qwertmn (抽筋)

Board: Python

Title: [Question] Web crawler problem..

Time: Mon Nov 12 10:12:39 2012

I want to scrape data from the Tainan County tourism bureau. The URL is:
http://tour.tainan.gov.tw/action.aspx?season=spring

But when I parse the page with lxml, the tag structure comes out wrong.. Here is the code:

from lxml import html
import urllib2

file = urllib2.urlopen('http://tour.tainan.gov.tw/action.aspx?season=spring')
root = html.parse(file).getroot()
# No table is found here... but when I inspect the document tree in Chrome, it shows more than 100..
print root.cssselect('table')

I don't know what I did wrong.. please help..

--
※ Origin: 批踢踢實業坊(ptt.cc)
◆ From: 59.120.142.214

Push CMJ0121: file.read() ?? 11/12 11:09

Push swpoker: Have you tried writing it to a file first~ The usual cause is the encoding or a problem with the HTML DOM 11/12 13:08
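The two suggestions above (read the response explicitly, then write it to a file and look at it) could be sketched roughly as follows. This is only a sketch under assumptions: it uses Python 3's urllib.request in place of the urllib2 module from the post, and fetch_raw, count_open_tags and dump are hypothetical helper names, not part of lxml or of the original code. The idea is to count table tags in the raw bytes before any parser touches them, so the number can be compared with what lxml and Chrome report.

```python
import re
import urllib.request  # Python 3 counterpart of urllib2 (assumption: Python 3)

URL = 'http://tour.tainan.gov.tw/action.aspx?season=spring'


def fetch_raw(url):
    """Return the response body as bytes, without parsing it."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def count_open_tags(raw, tag):
    """Count opening <tag ...> occurrences directly in the raw bytes."""
    pattern = re.compile(
        rb'<' + re.escape(tag.encode('ascii')) + rb'\b', re.IGNORECASE)
    return len(pattern.findall(raw))


def dump(raw, path):
    """Write the bytes exactly as received so they can be inspected by hand."""
    with open(path, 'wb') as fh:
        fh.write(raw)


# Typical use (makes a network request, so it is left commented out):
# raw = fetch_raw(URL)
# dump(raw, 'spring.html')
# print(len(raw), count_open_tags(raw, 'table'))
```

If the raw count is already far below what Chrome shows, the missing tables may be added by scripts after the page loads, which no HTML parser will see. If the raw count is high but lxml still finds nothing, the encoding or malformed markup is the more likely suspect.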

→ qwertmn: I tried downloading it with wget.. same result.. 11/12 20:27

→ qwertmn: The HTML DOM should be fine >"< 11/12 20:28

→ qwertmn: I can get body & html.. but a big chunk of the data is missing.. 11/12 20:29
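When html and body are found but a large part of the content is missing, one way to narrow it down is to run the same markup through a second parser and compare tag counts. The sketch below uses the standard library's html.parser for that. TagCounter and count_tags are hypothetical names introduced for illustration, and this is a cross-check only, not a replacement for lxml.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count the start tags seen by the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.counts[tag] += 1


def count_tags(text):
    """Return a Counter mapping tag name to number of start tags in text."""
    parser = TagCounter()
    parser.feed(text)
    parser.close()
    return parser.counts


# Typical use on a page saved with wget (the file name and the utf-8
# encoding are assumptions; adjust to whatever the server actually sends):
# with open('spring.html', 'rb') as fh:
#     text = fh.read().decode('utf-8', errors='replace')
# print(count_tags(text)['table'])
```

If this count is close to what Chrome shows while lxml reports none, the difference probably comes from how lxml recovers from something in the markup or the encoding. If both parsers agree on a low number, the tables are likely not in the downloaded HTML at all.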