[Question] Listing strings with a regular expression after reading a file

Author: schedule6666 (schedule)

Board: Python

Title: [Question] Listing strings with a regular expression after reading a file

Time: Tue Aug 8 20:03:54 2017

I'm a complete Python beginner, so please bear with me if this is a strange question. I've been practicing reading a file from my computer in PyCharm.

The file contents are:

    Joe's email is joe@gmail.com
    Tom's email is tom@gmail.com

File name: email.txt

The script in PyCharm is:

    import sys
    import os
    import re
    fp= open("C:\\Users\\haha\\Desktop\\email.txt","r")
    text =fp.readlines()
    print(text)
    for w in text:
        sent= re.match(r'([\w.-]+@[\w.-])+',w)
        print(sent)
    fp.close()

What I expected it to print:

    joe@gmail.com
    tom@gmail.com

What it actually printed when I ran it:

    ["Joe's email is joe@gmail.com\n", "Tom's email is tom@gmail.com"]
    None
    None
    <_sre.SRE_Match object at 0x02174960>

Could anyone on the board tell me where I went wrong?

--

Um... I just found the cause... The regular expression functions accept a string, but what I read from the file above is a list. That is why my script would not run. Once I joined the list into a string, it worked. The script is now:

    import string
    import sys
    import os
    import re
    fp= open("C:\\Users\\haha\\Desktop\\email.txt","r")
    text =fp.readlines()
    text1=''.join(text)
    print(text1)
    sent = re.findall(r"[\w.-]+@[\w_.-]+", text1)
    print(sent)
    fp.close()

Thanks for the help, everyone ^^

※ Origin: PTT (ptt.cc), from: 36.231.24.145
※ Article URL: https://www.ptt.cc/bbs/Python/M.1502193837.A.3C2.html

Push APM99: You need to use re.search 08/08 20:31

→ APM99: re.match only tries at the start of the string, so it gives up when that fails 08/08 20:32

→ ntumath: sent= re.findall(r'[\w.-]+@[\w.-]+',w) 08/08 20:32
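To make the two suggestions above concrete, here is a minimal sketch comparing re.match, re.search and re.findall on one line of the sample file. It also shows what the pattern from the original post does when the `+` sits outside the group. The variable names are only for illustration and are not from the thread.

```python
import re

# Pattern suggested by ntumath: the "+" applies to the character class after "@".
EMAIL = re.compile(r"[\w.-]+@[\w.-]+")

line = "Joe's email is joe@gmail.com"

# re.match only tries at position 0, where the line starts with "Joe's".
at_start = EMAIL.match(line)

# re.search scans forward and returns the first match anywhere in the string.
anywhere = EMAIL.search(line)

# re.findall returns every non-overlapping match as a list of strings.
all_hits = EMAIL.findall(line)

# The original pattern puts "+" outside the group, so only one character
# after "@" is consumed per repetition of the group.
truncated = re.search(r"([\w.-]+@[\w.-])+", line).group(0)

print(at_start)           # None
print(anywhere.group(0))  # joe@gmail.com
print(all_hits)           # ['joe@gmail.com']
print(truncated)          # joe@g
```

Note that match and search return a match object (or None), so printing the address itself takes `.group(0)`; findall returns the strings directly.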

I changed re.match to re.search and re.findall as the two posters above suggested, but I hit the same error with both... The error is as follows (using findall as the example):

    ["Joe's email is joe@gmail.com\n", "Tom's email is tom@gmail.com\n", 'scheule@gmail.com']
    Traceback (most recent call last):
      File "C:/Users/mlchen/PycharmProjects/untitled/Regular_expression.py", line 21, in <module>
        sent = re.findall(r"[\w.-]+@[\w_.-]+", text)
      File "C:\Python27\lib\re.py", line 181, in findall
        return _compile(pattern, flags).findall(string)
    TypeError: expected string or buffer

    Process finished with exit code 1

※ Edited: schedule6666 (36.231.24.145), 08/08/2017 21:08:24
※ Edited: schedule6666 (36.231.24.145), 08/08/2017 21:27:16
※ Edited: schedule6666 (36.231.24.145), 08/08/2017 21:28:11
※ Edited: schedule6666 (36.231.24.145), 08/08/2017 21:30:50
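The traceback points at `re.findall(..., text)`, where `text` is the list returned by readlines(). A small sketch of what probably happened, assuming the three lines shown in the printed list: passing the whole list raises TypeError, while passing each line, or the joined string, works. The same call fails on Python 3 as well (only the wording of the message differs), so this is about list versus str rather than the interpreter version.

```python
import re

EMAIL = re.compile(r"[\w.-]+@[\w_.-]+")

# Same shape as the list printed in the traceback: one str per line.
lines = [
    "Joe's email is joe@gmail.com\n",
    "Tom's email is tom@gmail.com\n",
    "scheule@gmail.com",
]

# Passing the list itself is what triggers the TypeError.
try:
    EMAIL.findall(lines)
    raised = False
except TypeError:
    raised = True

# Option 1: call findall on each line and flatten the results.
per_line = [hit for line in lines for hit in EMAIL.findall(line)]

# Option 2: join the list into one string and search it once.
joined = EMAIL.findall("".join(lines))

print(raised)    # True
print(per_line)  # ['joe@gmail.com', 'tom@gmail.com', 'scheule@gmail.com']
print(joined)    # ['joe@gmail.com', 'tom@gmail.com', 'scheule@gmail.com']
```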

Push APM99: On my Python 3.6 it works without doing that QQ 08/08 22:16

→ schedule6666: Oh right, I'm on Python 2.7... looks like it's time to update 08/08 22:28

→ schedule6666: Anyway, thank you very much for the help, APM ^^ 08/08 22:30

※ Edited: schedule6666 (36.231.24.145), 08/09/2017 03:08:19

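→ coeric: What you read from the txt is one whole string; turn it into a list first..... 08/09 10:19

→ coeric: If you turn it straight into a string, searching it with re is OK too 08/09 10:20

→ coeric: text=text.split() # becomes a list 08/09 10:22

→ coeric: If you only want to grab the emails, convert to a string and use findall 08/09 10:27

→ coeric: If you also need to do something with each email, split it into a list first 08/09 10:28

→ coeric: That makes later steps easier.... otherwise, after findall you need another for loop 08/09 10:28

Hello, Coeric. Sorry, I have only just started writing Python so I don't quite follow... Going by what you said, when I first read the file, does Python read the contents of the txt file as a string by default, and then I have to turn it into a list myself? My second question: since I am only practicing using a regular expression to grab e-mail addresses, I don't understand why I would read it into a list. Does "easier for later steps" mean storing the data in MySQL?

※ Edited: schedule6666 (36.231.24.145), 08/09/2017 16:07:16
※ Edited: schedule6666 (36.231.24.145), 08/09/2017 16:09:13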

Push ntumath: fp.read() --> str | fp.readlines() --> list 08/09 16:50

→ ntumath: If you use read, you don't need the extra text1 08/09 16:51
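A quick way to check the read() versus readlines() distinction for yourself. This sketch writes the sample contents to a temporary file first so it runs anywhere; the temporary path and variable names are assumptions for illustration, not the poster's setup.

```python
import os
import re
import tempfile

content = "Joe's email is joe@gmail.com\nTom's email is tom@gmail.com\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "email.txt")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(content)

    # read() returns the whole file as a single str.
    with open(path, "r", encoding="utf-8") as fp:
        whole = fp.read()

    # readlines() returns a list with one str per line, newlines included.
    with open(path, "r", encoding="utf-8") as fp:
        lines = fp.readlines()

# With read(), the result can go straight into findall; no join step needed.
found = re.findall(r"[\w.-]+@[\w.-]+", whole)

print(type(whole).__name__)  # str
print(type(lines).__name__)  # list
print(found)                 # ['joe@gmail.com', 'tom@gmail.com']
```

Using `with open(...)` also closes the file automatically, so the explicit fp.close() is no longer needed.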

→ ntumath: In this kind of case I would go with a dict though, name mapped to email, whatever is convenient 08/09 16:53
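One possible reading of the dict idea, as a sketch only: it assumes every line has the form "Name's email is address", as in the sample file, and the pattern and names here are made up for illustration rather than taken from ntumath.

```python
import re

# Hypothetical pattern: group 1 is the name, group 2 is the address.
LINE = re.compile(r"(\w+)'s email is ([\w.-]+@[\w.-]+)")

text = "Joe's email is joe@gmail.com\nTom's email is tom@gmail.com\n"

# With two groups, findall returns a list of (name, address) tuples,
# which a dict comprehension turns into a name -> address mapping.
emails = {name: addr for name, addr in LINE.findall(text)}

print(emails)         # {'Joe': 'joe@gmail.com', 'Tom': 'tom@gmail.com'}
print(emails["Tom"])  # tom@gmail.com
```

A line without a leading name, such as a bare address, would simply be skipped by this pattern.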

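→ coeric: fp.readlines() leaves an annoying extra \n on each line when you process the data 08/09 23:11

→ coeric: I really dislike having a bunch of unnecessary stuff in the middle, e.g. \n, \t and the like 08/09 23:11

→ coeric: Correcting what I said above, I would go with text=text.split('\n') 08/09 23:12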
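For comparison, a short sketch of how the splitting options mentioned in the thread behave on the sample text. str.splitlines() is not mentioned in the thread; it is included here only as a related standard method.

```python
text = "Joe's email is joe@gmail.com\nTom's email is tom@gmail.com\n"

# split() with no argument splits on any whitespace, so it yields words.
words = text.split()

# split('\n') yields lines without "\n", plus an empty string at the end
# when the text finishes with a newline.
by_newline = text.split("\n")

# splitlines() yields lines without "\n" and no trailing empty string.
by_lines = text.splitlines()

print(len(words))  # 8
print(by_newline)  # ["Joe's email is joe@gmail.com", "Tom's email is tom@gmail.com", '']
print(by_lines)    # ["Joe's email is joe@gmail.com", "Tom's email is tom@gmail.com"]
```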