Hi everyone,

I've just started learning web scraping and want to use BeautifulSoup to pull content from a site like this one:
http://propaccess.trueautomation.com/clientdb/?cid=81

After I send the POST, though, I get no results back. It looks like the server never receives the form data. Could someone take a look and tell me what in the code needs fixing? For now I want to search by Owner name, combined with the display options under Advanced.

import requests
from bs4 import BeautifulSoup
import re
from decimal import Decimal
import pandas as pd
import urllib

index_url = 'http://propaccess.trueautomation.com/clientdb/?cid=81'
session = requests.Session()

# Get session cookies (session ID)
index_request = session.get(index_url)

r = urllib.request.urlopen(index_url)
soup = BeautifulSoup(r, 'lxml')

viewstate = soup.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
viewstategenerator = soup.findAll(
    "input", {"type": "hidden", "name": "__VIEWSTATEGENERATOR"})
eventvalidation = soup.findAll(
    "input", {"type": "hidden", "name": "__EVENTVALIDATION"})

formdata = {
    "propertySearchOptions%3AsearchType:": "Owner Name",
    "propertySearchOptions%3AownerName": 'smith',
    "propertySearchOptions%3Ataxyear": "2016",
    "propertySearchOptions%3ApropertyType": 'Mineral',
    "propertySearchOptions%253AorderResultsBy": "Owner Name",
    "propertySearchOptions%253ArecordsPerPage": "250",
    "__EVENTVALIDATION": eventvalidation[0]['value'],
    "__VIEWSTATE": viewstate[0]['value'],
    "__VIEWSTATEGENERATOR": viewstategenerator[0]['value'],
    "propertySearchOptions%253Asearch": "Search"}

response_post = session.post(index_url, data=formdata)
soup_post = BeautifulSoup(response_post.text, 'lxml')

Thanks, everyone!

--
※ Origin: 批踢踢實業坊 (ptt.cc), From: 72.182.18.161
※ Article URL: https://www.ptt.cc/bbs/Python/M.1506364435.A.3FE.html
※ Edited: getterb (72.182.18.161), 09/26/2017 02:49:13
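A few things in the posted code stand out and may be worth checking, though none of them is confirmed as the cause of the empty response. The form keys are already percent-encoded ("%3A", and in some cases "%253A"), one key ends in a stray colon, and the hidden fields are read from a separate `urllib` request rather than from the page the `session` fetched. Since `requests` form-encodes a dict passed as `data`, a key that already contains "%3A" goes out as "%253A". Below is a minimal sketch of one way to build the form instead. It uses only the standard library so it runs anywhere; the field names are assumptions (the poster's keys with "%3A" decoded back to ":"), and `search` expects any object with `get`/`post` methods such as a `requests.Session()`.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return all hidden form fields (e.g. __VIEWSTATE) found in the HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


def build_form(html, search_fields):
    """Merge the page's hidden fields with the search fields we want to send."""
    form = hidden_fields(html)
    form.update(search_fields)
    return form


# Assumed field names: the keys from the post with "%3A" decoded to ":".
# The real names should be checked against the page source.
SEARCH_FIELDS = {
    "propertySearchOptions:ownerName": "smith",
    "propertySearchOptions:taxyear": "2016",
}


def search(session, url, search_fields):
    """GET the page and POST the form through the same session."""
    page = session.get(url)
    return session.post(url, data=build_form(page.text, search_fields))


# Pre-encoded keys are encoded a second time when the form is serialized:
print(urlencode({"propertySearchOptions%3AownerName": "smith"}))
# Plain keys are encoded exactly once:
print(urlencode({"propertySearchOptions:ownerName": "smith"}))
```

Reading the hidden fields from the same response that set the session cookie keeps `__VIEWSTATE` and the cookie in step, which a second, cookie-less request does not guarantee.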
coeric: I think your last step is pointed at the wrong target 09/26 08:18
getterb: Answering my own question: I later used selenium to get the cookies and that worked 09/26 11:31
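The follow-up does not say how the cookies were handed over, so the snippet below is only one plausible shape for it, not the poster's actual code. It assumes the cookies come as a list of dicts with "name", "value", "domain" and "path" keys (the format Selenium's `driver.get_cookies()` returns) and that the target jar has a `set(name, value, domain=..., path=...)` method, as `requests.Session().cookies` does. `copy_cookies` is a hypothetical helper name.

```python
def copy_cookies(selenium_cookies, jar):
    """Copy Selenium-style cookie dicts into a requests-style cookie jar."""
    for cookie in selenium_cookies:
        jar.set(
            cookie["name"],
            cookie["value"],
            # Fall back to neutral values if the browser omitted them.
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return jar
```

With both libraries installed, the call would look like `copy_cookies(driver.get_cookies(), session.cookies)` after the browser has loaded the search page, and later `session.post(...)` calls would then carry the browser's session ID.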