[Question] Web scraping question

Author: iftrush (Waiting for a reply from someone, and yes, that means you)

Board: Python

Title: [Question] Web scraping question

Time: Tue Jul 17 10:11:56 2018

I'm new to web scraping and am currently scraping a dictionary (I have already managed to get the definitions through the site's web API).

Suppose I want to scrape "apple" without using the API. From the page source I know the definition sits in the content attribute of this tag:

    <meta name="twitter:description" content=" "/>

How can I use findAll or find to locate this tag and then print the definition inside the quotes of content?

My own code:

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    def DictRequest(word):
        html = urlopen("https://www.merriam-webster.com/dictionary/" + word)
        bsobj = BeautifulSoup(html.read(), 'html')
        meaning = bsobj.findAll('meta', name = 'twitter:description')

It fails with:

    TypeError: find_all() got multiple values for argument 'name'

--
※ Origin: PTT (ptt.cc), From: 73.223.41.252
※ Article URL: https://www.ptt.cc/bbs/Python/M.1531793524.A.411.html

→ bibo9901: .select('meta[name="twitter:description"]')[0] 07/17 10:17
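A minimal sketch of the CSS-selector approach suggested above. It runs against a made-up SAMPLE_HTML string rather than the real Merriam-Webster page, so the markup and the placeholder definition text are assumptions. Note that select() returns a list, so [0] raises IndexError when nothing matches, while select_one() returns None in that case.

```python
from bs4 import BeautifulSoup

# Stand-in markup (an assumption); on the real page the content
# attribute would hold the dictionary definition.
SAMPLE_HTML = """
<html><head>
<meta name="description" content="unrelated text">
<meta name="twitter:description" content="placeholder definition text">
</head><body></body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list of every element matching the CSS selector.
matches = soup.select('meta[name="twitter:description"]')
first = matches[0]

# select_one() returns the first match, or None when nothing matches.
tag = soup.select_one('meta[name="twitter:description"]')
missing = soup.select_one('meta[name="no-such-name"]')
```

Checking the result of select_one() against None is usually easier to handle than catching the IndexError from [0] when a page lacks the tag.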

→ iftrush: Is there a way to get only what is inside the " "? 07/17 10:28

→ iftrush: On my own I can do meaning = str(meaning) 07/17 10:35

→ iftrush: return meaning[15:-53] 07/17 10:36

→ iftrush: Or is there some other way I could use? 07/17 10:36
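One alternative to slicing the string form of the tag: a BeautifulSoup Tag can be indexed like a dict to read an attribute directly, so the fixed offsets in meaning[15:-53] are not needed and the result does not break when the definition's length or the tag's layout changes. This is a sketch only; extract_meaning and SAMPLE_HTML are hypothetical names, and the sample markup is made up.

```python
from bs4 import BeautifulSoup

# Stand-in markup (an assumption), not the real dictionary page.
SAMPLE_HTML = '<meta name="twitter:description" content="placeholder definition text"/>'


def extract_meaning(html):
    """Return the content attribute of the twitter:description meta tag, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("meta", attrs={"name": "twitter:description"})
    if tag is None:
        return None
    # tag["content"] would raise KeyError if the attribute were absent;
    # tag.get("content") returns None instead.
    return tag.get("content")


meaning = extract_meaning(SAMPLE_HTML)
nothing = extract_meaning("<p>no meta tag here</p>")
```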

→ TitanEric: I'd suggest using a with statement around urlopen 07/17 10:55
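A rough sketch of what the with-statement suggestion could look like, using only the standard library. The response returned by urlopen works as a context manager, so it is closed when the block exits, even if reading raises. The function name fetch_html, the timeout value, and the UTF-8 fallback are assumptions made for illustration.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the decoded body of url; the response is closed when the with block exits."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the Content-Type header, falling back to
        # UTF-8 (an assumption) when none is given.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

In the original code this would replace the bare urlopen call, for example fetch_html("https://www.merriam-webster.com/dictionary/" + word), with the returned string passed to BeautifulSoup.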

push coeric: findAll('meta', attrs={'name':'twitter:description'}) 07/17 10:57
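A short sketch of why the attrs form works where the original call failed. The first parameter of find_all (and its alias findAll) is itself called name and holds the tag name, so passing name='twitter:description' as a keyword collides with the positional 'meta'. Filtering on the HTML attribute called name therefore has to go through attrs. SAMPLE_HTML is made-up markup, not the real page.

```python
from bs4 import BeautifulSoup

# Stand-in markup (an assumption), not the real dictionary page.
SAMPLE_HTML = """
<meta name="description" content="unrelated text">
<meta name="twitter:description" content="placeholder definition text">
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Reproduces the error from the question: 'meta' already fills the
# parameter called name, so name=... supplies it a second time.
try:
    soup.find_all("meta", name="twitter:description")
    collided = False
except TypeError:
    collided = True

# Passing the attribute filter as a dict avoids the clash.
tags = soup.find_all("meta", attrs={"name": "twitter:description"})
content = tags[0]["content"]
```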

→ coeric: I'm used to using attrs myself; things like # and . are harder for me to remember 07/17 11:00