[Question] Web scraping: POST & headers

Author: unhumanWu (阿文)

Board: Python

Title: [Question] Web scraping: POST & headers

Time: Sat Sep 23 17:10:36 2017

Hi everyone, I'm new to web scraping and need to pull store location information from https://www.taiwanmobile.com/mobile/storelbs/lbs.html#

From reading earlier posts about scraping, I gathered that the headers are probably the problem, so I put every header into the request, but it doesn't seem to help... How can I get past this? Thanks!

The code is as follows:

import requests
from bs4 import BeautifulSoup

# Form fields sent in the POST body
form_data = {"city": "台北市",
             "district": "松山區",
             "lat": "25.0464207",
             "lng": "121.5555859",
             "searchDistance": "-1"}

# Headers copied as-is from the browser request
headers = {"Accept": "application/json, text/javascript, */*; q=0.01",
           "Accept-Encoding": "gzip, deflate, br",
           "Accept-Language": "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4",
           "Connection": "keep-alive",
           "Content-Length": "127",
           "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
           "Cookie": ("_msuuid_558dza12683=D58CE660-772F-4345-A7C6-B1B732FB85F0; "
                      "JSESSIONID=nt+dIYPkeJGODFLtAALzUKsu; "
                      "_ga=GA1.2.142560397.1498299011; "
                      "_gid=GA1.2.594796640.1506128988"),
           "Host": "www.taiwanmobile.com",
           "Origin": "https://www.taiwanmobile.com",
           "Referer": "https://www.taiwanmobile.com/mobile/storelbs/lbs.html",
           "User-Agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/60.0.3112.113 Safari/537.36"),
           "X-Requested-With": "XMLHttpRequest"}

response_post = requests.post(
    "https://www.taiwanmobile.com/mobile/storelbs/lbs.html#",
    data=form_data, headers=headers)
response_post.encoding = 'utf-8'
soup_post = BeautifulSoup(response_post.text, "lxml")

--
※ Origin: PTT (ptt.cc), From: 111.241.210.128
※ Article URL: https://www.ptt.cc/bbs/Python/M.1506157839.A.F0D.html

Push vi000246: Some header values that look like gibberish are sometimes only generated at the moment the request is made 09/23 23:16

→ vi000246: If you hard-code them, some sites will block you because the header wasn't generated for the current request 09/23 23:16
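
A minimal sketch of what vi000246 describes, assuming the value that goes stale is the session cookie (JSESSIONID) rather than a token computed by JavaScript: load the store page first so the server issues a fresh cookie, then send the POST through the same cookie jar instead of pasting a Cookie header copied from the browser. It uses only the standard library; with requests, a requests.Session() keeps cookies between calls in the same way. The names make_opener, post_with_fresh_cookies and API_URL are made up for this example, and API_URL is only a placeholder: the address the page's XHR actually posts to is not shown in the thread and would have to be read from the browser's network panel.

```python
import http.cookiejar
import urllib.parse
import urllib.request

PAGE_URL = "https://www.taiwanmobile.com/mobile/storelbs/lbs.html"
# Placeholder (assumption): replace with the XHR address seen in the
# browser's network panel. The thread does not say what it is.
API_URL = PAGE_URL


def make_opener():
    """Return (opener, jar); the jar stores every cookie the server sets."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def post_with_fresh_cookies(opener, page_url, api_url, form_data, timeout=10):
    """GET page_url to pick up fresh cookies, then POST form_data to api_url."""
    # Step 1: visit the page so the server hands out a current session cookie.
    opener.open(page_url, timeout=timeout).close()

    # Step 2: POST the form. Content-Length and the form Content-Type are
    # filled in by urllib, and the Cookie header comes from the jar, so
    # none of them is hard-coded here.
    body = urllib.parse.urlencode(form_data).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=body,
        headers={
            "Referer": page_url,
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0",
        },
        method="POST",
    )
    with opener.open(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling post_with_fresh_cookies(opener, PAGE_URL, API_URL, form_data) with the form_data from the post would then send whatever cookie the site issued a moment earlier. If the site still rejects the request, the changing value is probably produced by JavaScript in the page, which this sketch does not cover.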