[Question] Global matching on a web page

Author adu (^_^)

Board Perl

Title [Question] Global matching on a web page

Time Mon Nov 2 14:50:53 2009

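In a file, global matching works fine with /g, but when I apply the same approach to a web page it has no effect. Does WWW::Mechanize need to be handled differently?

This follows up on my earlier question: I submit the items I want to look up to a website and then scrape part of the result. When more than one part of the page should match, only the first one is matched before the output is written. Is there a way to scan the whole page?

A concrete example: enter 2v69 (the ID) in PDBsum, http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl, and grab the IDs that follow "Uniprot".
Here the input is "2v69".
The ideal output is "2v69, P00877, P00873",
but the current code only gets "2v69, P00877"; the later one is missed.

The current script:

#!/usr/bin/perl
use WWW::Mechanize;

my $file     = "input.txt";
my $ofile    = ">output.txt";
my $checkURL = "http://www.ebi.ac.uk/pdbsum/";

open FILE,  $file  or die "File open error!!";
open FILE2, $ofile or die "File open error!!";

my $mech = WWW::Mechanize->new();
my $result;

while (<FILE>) {
    chomp;
    $_ =~ s/ //g;
    $mech->get($checkURL);
    $mech->submit_form(
        form_number => 1,
        fields      => {
            template => "main.html",
            EBI      => "TRUE",
            pdbcode  => $_,
        },
    );
    if ($mech->content =~ /http:\/\/www.uniprot.org\/uniprot\/(\D\S\S\S\S\S)/msg) {
        print FILE2 "$_ , $1\n";
    }
    else {
        print FILE2 "$_ \n";
    }
}
close FILE;
close FILE2;

The matching is done mainly by the if ($mech->content =~ ...) part at the end, which matches against the page source.

--
Thanks again to everyone on the board who has helped me m(__ __)m
--
※ Origin: 批踢踢實業坊 (ptt.cc) ◆ From: 140.114.88.228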
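Why only the first ID shows up can be reproduced offline. This is a minimal sketch, assuming a made-up HTML snippet in place of $mech->content (the real PDBsum markup is not reproduced). Inside if (...) the match runs in scalar context, so even with /g it stops at the first hit and $1 holds that one value; assigning the match to an array puts it in list context, which returns every capture. The helpers first_only and all_matches are hypothetical names for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->content: two UniProt links on one page.
my $html = <<'HTML';
<a href="http://www.uniprot.org/uniprot/P00877">P00877</a>
<a href="http://www.uniprot.org/uniprot/P00873">P00873</a>
HTML

# Same pattern as the original script, with the dots escaped.
my $re = qr{http://www\.uniprot\.org/uniprot/(\D\S{5})};

# Scalar (boolean) context, as in if ($mech->content =~ /.../g):
# the match stops at the first hit, so $1 holds only one accession.
sub first_only {
    my ($text) = @_;
    return $text =~ /$re/g ? $1 : undef;
}

# List context: m//g returns every captured group on the page.
sub all_matches {
    my ($text) = @_;
    my @ids = $text =~ /$re/g;
    return @ids;
}

print first_only($html), "\n";                 # P00877
print join(', ', all_matches($html)), "\n";    # P00877, P00873
```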

Push freshroger:@arr = ( $mech->content =~ ...... ); 11/02 23:10

→ freshroger:for $entry (@arr) { $output .= ",$entry";} 11/02 23:10

→ freshroger:print FILE2 "$output\n"; 11/02 23:10

→ freshroger:Remember to add my $output; $output .= $_; before that 11/02 23:11

→ freshroger:If you only need a little data this is fine; for a lot, I suggest downloading the dat file directly 11/02 23:14

→ freshroger:and parsing that instead :) 11/02 23:14

Push freshroger:Here's a URL for reference: http://research.isb-sib.ch/ssmap/ 11/02 23:26
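Putting the comment suggestions together (list-context match, then append each entry after the PDB code), the per-line output step might look like the sketch below. It assumes the same UniProt link pattern as the original script, uses join instead of the .= loop to produce the "2v69, P00877, P00873" layout from the question, and drops repeated accessions in case a page links the same one more than once (an assumption, not checked against PDBsum). format_line and the sample HTML are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: builds one output line from a PDB code and the page HTML.
# The pattern is copied from the original script, with the dots escaped.
sub format_line {
    my ($pdbcode, $content) = @_;

    # List context: collect every accession that follows a UniProt link.
    my @arr = ($content =~ m{http://www\.uniprot\.org/uniprot/(\D\S{5})}g);

    # Keep only the first occurrence of each accession (assumes repeats are unwanted).
    my %seen;
    my @ids = grep { !$seen{$_}++ } @arr;

    return join ', ', $pdbcode, @ids;
}

# Hypothetical sample page that links P00877 twice.
my $html = '<a href="http://www.uniprot.org/uniprot/P00877">P00877</a> '
         . '<a href="http://www.uniprot.org/uniprot/P00873">P00873</a> '
         . '<a href="http://www.uniprot.org/uniprot/P00877">again</a>';

print format_line('2v69', $html), "\n";    # 2v69, P00877, P00873
```

Inside the original while (<FILE>) loop this would be called as print FILE2 format_line($_, $mech->content), "\n"; in place of the if/else block.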

Got it! Thanks :D Here's one more version I made with while:

    if ($mech->content =~ /(http:\/\/www.uniprot.org\/uniprot\/\D\d\d\d\d\d)/ms) {
        my $line = $mech->content;
        print FILE2 "$_ , ";
        while ($line =~ s/http:\/\/www.uniprot.org\/uniprot\/(\D\d\d\d\d\d)//ms) {
            print FILE2 "$1 ";
        }
        print FILE2 ",\n";
    }
    else {
        print FILE2 "$_ \n";
    }
}

It's pretty brute-force though..XD
※ Edited: adu from: 140.114.88.228 (11/03 09:37)
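The s/// loop works because each substitution deletes the link it just matched, so the next pass finds the following one. A non-destructive alternative, sketched here under the same assumed \D\d{5} pattern, is to loop on m//g in scalar context: Perl keeps pos() on the string between iterations, so the page text does not have to be modified. collect_ids and the sample HTML are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collects every accession by looping on m//g in scalar context.
# Each successful match leaves pos($content) just after the hit, so the
# next iteration continues from there; $content itself is not modified.
sub collect_ids {
    my ($content) = @_;
    my @ids;
    while ($content =~ m{http://www\.uniprot\.org/uniprot/(\D\d{5})}g) {
        push @ids, $1;
    }
    return @ids;
}

# Hypothetical sample page text.
my $html = '<a href="http://www.uniprot.org/uniprot/P00877">P00877</a> '
         . '<a href="http://www.uniprot.org/uniprot/P00873">P00873</a>';

my @ids = collect_ids($html);
print "2v69 , @ids ,\n";    # same layout as the s/// version: 2v69 , P00877 P00873 ,
```

With this helper the initial if check becomes optional: an empty @ids means no UniProt link was found.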