[Question] Can urllib2 open multiple files? - Python board - 批踢踢實業坊

Author: darklimit ()

Board: Python

Title: [Question] Can urllib2 open multiple files?

Time: Thu Apr 11 19:10:15 2013

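I have HTML files in a folder. Reading a single file like this works without any problem:

data = urllib2.urlopen("file:///D:/路徑/xxx.html")

But how should I write it if I want to read every HTML file in that folder? Thanks.

--
※ Sent from: 批踢踢實業坊(ptt.cc) ◆ From: 140.135.114.123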

→ grapherd:import glob; html_list = glob.glob("*.htm*") 04/11 19:36

→ grapherd:html_list = map(lambda x: open(x), html_list) 04/11 19:37

→ grapherd:To read local HTML files you shouldn't need urllib2; open is enough 04/11 19:37
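A minimal Python 3 sketch of the glob + open suggestion, assuming the files are UTF-8 encoded (pages from that era may well be Big5, in which case the `encoding` argument must change). `load_html_list` is a hypothetical helper name. It reads each file's text immediately, so the list holds strings rather than open file objects, and a `with` block closes every file.

```python
import glob
import os


def load_html_list(folder="."):
    """Read every *.htm / *.html file in `folder` and return their contents as strings."""
    # glob expands the wildcard itself; "*.htm*" matches both .htm and .html
    paths = sorted(glob.glob(os.path.join(folder, "*.htm*")))
    html_list = []
    for path in paths:
        # the with-block closes each file, unlike map(lambda x: open(x), ...)
        with open(path, encoding="utf-8") as f:  # assumption: files are UTF-8
            html_list.append(f.read())
    return html_list
```

Usage: `html_list = load_html_list(r"D:\some\folder")`, after which each element is one page's HTML as a string.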

→ darklimit:I need to analyze them with html_parser, so I'm using urllib2 04/11 21:17

→ darklimit:If I use data = urllib2.urlopen 04/11 21:18

→ darklimit:("file:///D:/路徑/*.html") like this, I get an error 04/11 21:19
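`urlopen` does not expand shell-style wildcards, so `*.html` in a `file://` URL is looked up as a literal file name and fails. If the data must come through `urlopen`, one approach is to let `glob` expand the pattern and then open each match as its own `file://` URL. This is a Python 3 sketch, where urllib2's `urlopen` lives in `urllib.request`. The helper name and the UTF-8 decoding are assumptions. For local files the result is the same as using `open`.

```python
import glob
import os
from pathlib import Path
from urllib.request import urlopen


def read_html_via_urlopen(folder):
    """Expand the wildcard locally, then open each match as a file:// URL."""
    pages = []
    for path in sorted(glob.glob(os.path.join(folder, "*.html"))):
        # as_uri() needs an absolute path; on Windows this gives file:///D:/...
        url = Path(path).resolve().as_uri()
        with urlopen(url) as resp:
            # urlopen returns bytes; assumption: the pages are UTF-8
            pages.append(resp.read().decode("utf-8"))
    return pages
```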

→ qwertmn:html_parser and urllib shouldn't have anything to do with each other = = 04/11 22:06

→ grapherd:Switching to urllib.urlopen will work, but the effect is the same as open 04/11 22:26

→ grapherd:Which HTML parser are you using? 04/11 22:28

→ darklimit:Reading with *.html gives an error 04/11 22:58

→ darklimit:First I define the function, then 04/11 22:59

→ darklimit:f = formatter.NullFormatter() 04/11 23:00

→ darklimit:html_parser = SearchLinks(f) 04/11 23:01

→ darklimit:data = urllib2.urlopen("file:///d:/路徑/xxx.html") 04/11 23:03

→ darklimit:html_parser.feed(data.read()) 04/11 23:03

→ darklimit:html_parser.close() 04/11 23:03

→ darklimit:links = html_parser.get_links() 04/11 23:04

→ grapherd:Of course that fails: there is no file literally named "*.html" in the folder 04/11 23:04

→ darklimit:Since there is more than one file, I then process them with a for loop 04/11 23:04

→ darklimit:to extract the data I want 04/11 23:05
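The poster's `SearchLinks` class is not shown. Its formatter argument suggests it was built on Python 2's `htmllib.HTMLParser`, but that is an assumption. Below is a hypothetical Python 3 stand-in built on `html.parser.HTMLParser`, which takes no formatter. It keeps the same `feed` / `close` / `get_links` sequence and runs it once per HTML string, for example the strings produced by the glob + `open(x).read()` approach. Collecting `<a href>` values is a guess at what "the data I want" means.

```python
from html.parser import HTMLParser


class SearchLinks(HTMLParser):
    """Hypothetical stand-in for the poster's SearchLinks: collects <a href> values."""

    def __init__(self):
        super().__init__()
        self._links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lower-cased; attrs is a list of (name, value)
        if tag == "a":
            self._links.extend(v for k, v in attrs if k == "href" and v)

    def get_links(self):
        return list(self._links)


def links_from_documents(html_list):
    """Run feed/close/get_links once per HTML string; one list of links per file."""
    results = []
    for text in html_list:
        parser = SearchLinks()  # a fresh parser per file keeps results separate
        parser.feed(text)       # feed() expects str, so decode urlopen bytes first
        parser.close()
        results.append(parser.get_links())
    return results
```

Creating a new parser inside the loop avoids links from one file leaking into the next file's results.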

→ grapherd:Use the method at the top, then change open(x) to open(x).read() 04/11 23:05

→ darklimit:Then how should I write it to read all the files? 04/11 23:05

→ grapherd:Then each item in html_list will be the contents of one HTML file as a string 04/11 23:05

→ darklimit:I'll give it a try, thanks 04/11 23:06

→ qwertmn:Personally I'd also go with the glob that 1F recommended for finding the files.. 04/11 23:14