[Question] Web scraper can't read the complete data

Author: ckcy

Board: Python

Title: [Question] Web scraper can't read the complete data

Time: Wed Jan 25 22:33:09 2017

Hi everyone, I'd like to read the table on the page below:

http://pchome.megatime.com.tw/stock/sto3/ock1/sid6505.html

I can see the data I need in the document sid6505.html, but when my program fetches the page the data isn't there; it only gets the following lines. How can I solve this? Thanks a lot.

<html>
<head>
</head>
<body>
<form id='submit_form' name='submit_form' action='http://pchome.megatime.com.tw/stock/sto3/ock1/sid6505.html' method='post'>
<input type='hidden' name='is_check' value='1' />
</form>
<script type="text/javascript">
document.getElementById('submit_form').submit();
</script>
</body>
</html>

Code:

import requests
res = requests.get("http://pchome.megatime.com.tw/stock/sto3/ock1/sid6505.html")
print(res.text)

--
※ Sent from: 批踢踢實業坊(ptt.cc), from: 123.192.239.185
※ Article URL: https://www.ptt.cc/bbs/Python/M.1485354796.A.810.html
※ Edited: ckcy (123.192.239.185), 01/25/2017 22:57:26

→ Neisseria: That site's content is generated with JS, so requests alone can't scrape it 01/25 23:38

→ Neisseria: You need Selenium or a similar tool to scrape it 01/25 23:39

→ s860134: Not quite. Your headers aren't right, so the server is blocking you~ 01/26 00:40

→ s860134: I tested it; what the server checks is the 'Referer' header 01/26 00:46

→ s860134: https://goo.gl/2NlaF6 01/26 00:48

→ Neisseria: Sorry, my mistake @@ 01/26 04:27

→ ckcy: Thanks, s860134!! Problem solved 01/26 21:00
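
For readers who can't open the short link, here is a minimal sketch of what s860134's diagnosis suggests: send a 'Referer' header along with the request. The thread does not say which Referer value the server accepts, so using the page's own URL is an assumption (it imitates what a browser would send when the auto-submitting form posts back to the same page). The helper names build_headers, looks_like_stub_page, and fetch are hypothetical, and whether a plain GET is enough (the stub form uses POST with is_check=1) has not been verified here.

```python
URL = "http://pchome.megatime.com.tw/stock/sto3/ock1/sid6505.html"


def build_headers(url):
    """Return request headers with Referer set to the page itself.

    Assumption: the server accepts the page's own URL as the Referer.
    """
    return {"Referer": url}


def looks_like_stub_page(html):
    """Return True if the body is the small auto-submitting form from the post."""
    return "submit_form" in html and "is_check" in html


def fetch(url, timeout=10):
    """GET the page with the Referer header and return the response text."""
    # Imported here so the two helpers above work without requests installed.
    import requests

    res = requests.get(url, headers=build_headers(url), timeout=timeout)
    res.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return res.text


if __name__ == "__main__":
    html = fetch(URL)
    if looks_like_stub_page(html):
        print("Still got the stub page; the server may expect a different Referer.")
    else:
        print(html)
```

If the stub page still comes back, the Referer value (or other headers) probably needs to match more closely what a real browser sends; the Network tab in the browser's developer tools shows the exact request headers.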