[Question] How can a crawler determine the file name for certain links?

Author: martinqqq321 (蓋棉被開冷氣)

Board: Python

Title: [Question] How can a crawler determine the file name for certain links?

Time: Mon Mar 1 21:17:25 2021

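My program first uses googlesearch.search to find the files I want to download, then saves each one like this:

    r = requests.get(url)
    with open(name, 'wb') as f:
        f.write(r.content)

The most troublesome part is name. Right now I derive the saved file name and extension directly from the URL, but sometimes Google search returns results like this:

http://www......./index.php?Action=downloadfile&file=............

The problem is that everything after downloadfile&file is unrecognizable gibberish. How can I detect the file name and file type for this kind of URL? If I open these URLs directly in Chrome, it jumps straight to the file download prompt.

--
※ Posted from: PTT (ptt.cc), from: 223.140.154.176 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/Python/M.1614604647.A.044.html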

→ zerof: https://mdn.io/Content-Disposition 03/02 02:01

→ zerof: or just search MDN for “Content-Disposition” 03/02 02:05

Push cloudandfree: Regular expression 03/05 16:33

Push mychiux413: There are clues in your r.headers 03/17 01:29
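
To make the replies above concrete, here is a minimal sketch of reading the file name from the response headers. It assumes the server sends a Content-Disposition header (which is probably why Chrome goes straight to the download prompt), and falls back to the URL path plus Content-Type when it does not. The function name guess_filename and the default name "download" are made up for illustration. The header is parsed with the standard library's email.message.Message instead of a hand-written regular expression.

```python
import mimetypes
import os
from email.message import Message
from urllib.parse import unquote, urlsplit


def guess_filename(url, headers, default="download"):
    """Pick a file name from response headers, falling back to the URL.

    `headers` is any mapping with a .get() method, e.g. r.headers from requests.
    (Hypothetical helper, not part of requests.)
    """
    # 1. Prefer Content-Disposition, e.g. 'attachment; filename="report.pdf"'.
    disposition = headers.get("Content-Disposition")
    if disposition:
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()  # also decodes the RFC 2231 filename*= form
        if name:
            # Drop any directory part the server may have included.
            return os.path.basename(name)

    # 2. Otherwise use the last segment of the URL path (query string ignored).
    name = os.path.basename(unquote(urlsplit(url).path)) or default

    # 3. Let Content-Type decide the extension, unless it is the generic
    #    "application/octet-stream", which says nothing about the real type.
    content_type = headers.get("Content-Type", "").split(";")[0].strip()
    if content_type and content_type != "application/octet-stream":
        ext = mimetypes.guess_extension(content_type)
        if ext:
            stem, _ = os.path.splitext(name)
            name = stem + ext
    return name


# Possible use with the code from the question (requires the requests package):
#   r = requests.get(url)
#   name = guess_filename(url, r.headers)
#   with open(name, "wb") as f:
#       f.write(r.content)
```

Note that a plain dict is case-sensitive, while r.headers from requests is not, so with r.headers the lookup works regardless of how the server capitalizes the header names. If neither header is present, the result is only as good as the URL itself.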