Author: okeyla (小寶)
Board: Python
Title: [Question] Python web scraping: the POST payload seems to contain a hidden token
Time: Tue Jul 25 21:17:50 2017
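I want to extract the HiLife store listings.
The page loads its results through a POST request,
but the payload seems to contain some kind of hidden token, and I can't get it to work.
All I ever get back is the default store data for Zhongzheng District, Taipei City; I can't switch to any other area.
[Code follows]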
import requests
from bs4 import BeautifulSoup
head = {
'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
}
payload = {
'__EVENTTARGET':'AREA',
'__EVENTARGUMENT':'',
'__LASTFOCUS':'',
'__VIEWSTATE':'/wEPDwULLTE0NjI2MjI3MjMPZBYCAgcPZBYMAgEPZBYCAgEPFgIeBFRleHQFLiQoJyNzdG9yZUlucXVpcnlfc3RyZWV0JykuYXR0cignY2xhc3MnLCdzZWwnKTtkAgMPEA8WBh4NRGF0YVRleHRGaWVsZAUJY2l0eV9uYW1lHg5EYXRhVmFsdWVGaWVsZAUJY2l0eV9uYW1lHgtfIURhdGFCb3VuZGdkEBUSCeWPsOWMl+W4ggnln7rpmobluIIJ5paw5YyX5biCCeWunOiYree4ownmlrDnq7nnuKMJ5qGD5ZyS5biCCeiLl+agl+e4ownlj7DkuK3luIIJ5b2w5YyW57ijCeWNl+aKlee4ownlmInnvqnnuKMJ6Zuy5p6X57ijCeWPsOWNl+W4ggnpq5jpm4TluIIJ5bGP5p2x57ijCemHkemWgOe4ownmlrDnq7nluIIJ5ZiJ576p5biCFRIJ5Y+w5YyX5biCCeWfuumahuW
4ggnmlrDljJfluIIJ5a6c6Jit57ijCeaWsOeruee4ownmoYPlnJLluIIJ6IuX5qCX57ijCeWPsOS4reW4ggnlvbDljJbnuKMJ5Y2X5oqV57ijCeWYiee+qee4ownpm7LmnpfnuKMJ5Y+w5Y2X5biCCemrmOmbhOW4ggnlsY/mnbHnuKMJ6YeR6ZaA57ijCeaWsOerueW4ggnlmInnvqnluIIUKwMSZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnFgECB2QCBQ8QDxYGHwEFCXRvd25fbmFtZR8CBQl0b3duX25hbWUfA2dkEBUWBuS4reWNgAbmnbHljYAG5Y2X5Y2ABuilv+WNgAbljJfljYAJ5YyX5bGv5Y2ACeilv+Wxr+WNgAnljZflsa/ljYAJ5aSq5bmz5Y2ACeWkp+mHjOWNgAnpnKfls7DljYAJ54OP5pel5Y2ACeixkOWOn+WNgAnlkI7ph4zljYAJ5r2t5a2Q5Y2ACeWkp+mbheWNgAnnpZ7lsqHlj
YAJ5aSn6IKa5Y2ACeaymem5v+WNgAnmoqfmo7LljYAJ5riF5rC05Y2ACeWkp+eUsuWNgBUWBuS4reWNgAbmnbHljYAG5Y2X5Y2ABuilv+WNgAbljJfljYAJ5YyX5bGv5Y2ACeilv+Wxr+WNgAnljZflsa/ljYAJ5aSq5bmz5Y2ACeWkp+mHjOWNgAnpnKfls7DljYAJ54OP5pel5Y2ACeixkOWOn+WNgAnlkI7ph4zljYAJ5r2t5a2Q5Y2ACeWkp+mbheWNgAnnpZ7lsqHljYAJ5aSn6IKa5Y2ACeaymem5v+WNgAnmoqfmo7LljYAJ5riF5rC05Y2ACeWkp+eUsuWNgBQrAxZnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnZ2dnFgECBGQCBw8PFgIfAAUJ5Y+w5Lit5biCZGQCCQ8PFgIfAAUG5YyX5Y2AZGQCCw8WAh4LXyFJdGVtQ291bnQCAhYEZg9kFgJmDxUFBEg2NDYP5Y+w5Lit5aSq5bmz5bqXIOW
PsOS4reW4guWMl+WNgDQwNOWkquW5s+i3rzcy6JmfCzA0LTIyMjkwOTI4Azg3MGQCAQ9kFgJmDxUFBDQzMTgP5Y+w5Lit5rC45aSq5bqXM+WPsOS4reW4guWMl+WNgDQwNOWkquWOn+i3r+S6jOautTI0MOiZn+S4gOaok+WFqOmDqAswNC0yMzY5MDA1NwM4NzFkZFHxmtQaBu2Yr9cvskfEZMWn57JLRfjPYBFYDy+tHr6X',
'__VIEWSTATEGENERATOR':'B77476FC',
'__EVENTVALIDATION':'/wEdACtWrrgS52/ojbuYEYvRDXHZ2ryV+Ed5kWYedGp5zjHs3Neeeo/9TTvNTdElW+hiVA25mZnLEQUYPOZFLnuVu9jOT+Zq1/xceVgC7GxWRM+A8tOS3xZBjlhgzlx5UN3H3D0UrdtoyeScvRqxFL8L3gGKRyCJu029oItLX7X6c7SW7C7IVzuAeZ6t9kFMeOQus7MtrV7YeOXrlOP8inI96UkaJEU7Ro3FtK29+B+NamR2j4qInKVwJ4+JD3cjWm5buZdnOhT/ISzrljaf+F9GnVjm4dGchVglf1PxMMHl7EEoLjs20TZ856RDCGXvzK/6J+tEFp7zDvFTYGoeHtuHy+YF/IoR/CRFBAaEkys48FIAUCSUKnxACPyW6Ar2guIADjOqYue7v4fhV1jIq65P/lwanoaJpIsboCbjakbTYnqK8BLngMayrRehyT58dmj3SbzY1mOtzSNnakdpUxaC0EpOJ7rhB52A2FKsx
y5EbP0PwHHuHNMa9dit0AxPMfYUP1/LWuYPWMX0W8tyEMKxoUcYsCb+qJLF9yXPgM6c8sIQTRxcBokm1PGzFN4M6vnSF8OfFSC+c0frLZ4GH6l497B/5oDIjq7Bz4/cPeGCavvh9NUqPcmzJIr8Abx9vjtMGpZSwBdVY3bR/ARswIDrmWLt1qMD4jcRvGPxBa8nsRR8HNdVINbR+iOSFLwVhBCg+s+mV5NeTdOKvAeggfOsJHmJKL0ApQSCyjY5kEiOvo2JAI07C08ENIFF7HpDTaGCi93i2WnmdDrYoaoLZi96dRTlk4xoWV9tc7rd9X/wE6QoKHxFtADSz9WkgtbUn88lAhY2++OiqWCaQZobh7K26ndH1z34JXVB7C/AiOEV+CCb97oVyooxWullV44iFQ0isVBjYC1XWS3eGf1PwMS++A+EjQTkl9VJhIRDoS6sg2mD7mikimBjQGvZX/lcYtKSrjY=',
'CITY':'台中市',
'AREA':'北區'
}
res = requests.post('http://www.hilife.com.tw/storeInquiry_street.aspx',
                    data=payload, headers=head)
res.encoding = 'utf-8'
print(res.text)
Could anyone offer a suggestion?
Thanks!
--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 118.160.98.32
※ Article URL: https://www.ptt.cc/bbs/Python/M.1500988677.A.E90.html
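→ a0919610611: The server may generate a random token on every request; see whether 07/25 22:42
→ a0919610611: you can scrape that token 07/25 22:42
→ leo850611: Watch several requests to spot the values that change, then work out how to obtain them 07/26 00:15
→ leo850611: They are usually in the current page's HTML 07/26 00:26
Push sky800507: You need to use a session 07/26 00:30
→ coeric: The three underscore-prefixed values are all randomly generated; find out where they come from 07/26 02:54
→ HenryLiKing: Do you mean viewstate and the like? 07/26 08:58
→ HenryLiKing: Once you use a session, you can find them in the first page you fetch 07/26 08:58
→ HenryLiKing: After that, you have to look them up again every time you fetch a new page! 07/26 08:58
→ coeric: The comment above is correct 07/26 11:41
→ HenryLiKing: Excited to be validated xd 07/26 12:22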
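A minimal sketch of what the comments above suggest: read the hidden fields (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) from the page just fetched instead of hard-coding them, and repeat that before every POST. It has not been run against the HiLife site. The helper names (`HiddenFieldParser`, `extract_hidden_fields`, `build_postback`, `fetch_area`) are made up for illustration, and the two-step flow (post back `CITY` first, then `AREA`) is an assumption about how the form behaves. The standard-library `html.parser` is used here only to keep the sketch dependency-free; BeautifulSoup would do the same job.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in the given HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_postback(html, event_target, form_values):
    """Combine the page's current hidden fields with our own selections."""
    payload = extract_hidden_fields(html)
    payload["__EVENTTARGET"] = event_target
    payload.update(form_values)
    return payload


def fetch_area(session, url, city, area):
    """Fetch the store list for one city/area.

    `session` is expected to be a requests.Session so cookies persist.
    Assumption: choosing a city triggers its own postback, so the tokens
    are re-read from each response before the next POST.
    """
    page = session.get(url)
    page.encoding = "utf-8"

    # Step 1 (assumed): select the city using tokens from the first page.
    payload = build_postback(page.text, "CITY", {"CITY": city})
    page = session.post(url, data=payload)
    page.encoding = "utf-8"

    # Step 2: select the area using the fresh tokens from step 1's response.
    payload = build_postback(page.text, "AREA", {"CITY": city, "AREA": area})
    page = session.post(url, data=payload)
    page.encoding = "utf-8"
    return page.text
```

To try it, something like `fetch_area(requests.Session(), 'http://www.hilife.com.tw/storeInquiry_street.aspx', '台中市', '北區')` should work if the assumptions hold; if the site also checks headers, the `User-Agent` from the original post can be set on the session.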