Re: [Discussion] Ideas for a data-scraping program

Author: Spanner (孝任)

Board: Soft_Job

Title: Re: [Discussion] Ideas for a data-scraping program

Time: Wed May 14 15:16:47 2014

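※ Quoting StupidGaGa (笨嘎嘎):
: For parsing or picking apart web pages, there are a few approaches:
: 01. Json, XML
: 02. Html Agility Pack
: 03. string
: With 01, just deserialize directly; that is the fastest.
: With 02, it takes a little learning, but it is quite simple.
: With 03, you would usually use string.IndexOf or string.Split.

I use XDocument myself. First I run the fetched markup through
HtmlAgilityPack to turn it into well-formed XML, then I query the
XDocument directly to pull out the elements I need.

For example, to get all the data in the table element with id=table4
from the page source:

XElement table = (from t in xdoc.Descendants("table")
                  where t.Attribute("id") != null
                        && t.Attribute("id").Value == "table4"
                  select t).Single();

// Grab every row of the table
List<XElement> trList = table.Descendants("tr").ToList();

// The first row holds the headers, so start at index 1
for (int i = 1; i < trList.Count; i++)
{
    // crawl each cell data.
    ..... (omitted)
    ..... (omitted)
}

--
※ Origin: 批踢踢實業坊(ptt.cc), From: 220.135.50.34
※ Article URL: http://www.ptt.cc/bbs/Soft_Job/M.1400051809.A.1BA.html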
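The post leaves out the per-cell step inside the loop. As a rough sketch only, here is one way it might be filled in. This is not the author's actual code. It assumes the page has already been converted to well-formed XML with no XML namespace on the elements, and that data cells are plain td elements. The class name TableExtractor, the method ReadRows, and the sample markup are all hypothetical names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableExtractor
{
    // Hypothetical helper: returns the trimmed text of every <td> cell in each
    // data row of the table whose id equals tableId. The first <tr> is treated
    // as the header row and skipped, as in the post.
    public static List<List<string>> ReadRows(XDocument xdoc, string tableId)
    {
        // Casting a missing attribute to string yields null, so tables
        // without an id are simply not matched. Single() throws if there is
        // not exactly one matching table.
        XElement table = xdoc.Descendants("table")
            .Single(t => (string)t.Attribute("id") == tableId);

        return table.Descendants("tr")
            .Skip(1) // skip the header row
            .Select(tr => tr.Elements("td")
                            .Select(td => td.Value.Trim())
                            .ToList())
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample input standing in for the converted page.
        string xml =
            "<html><body><table id=\"table4\">" +
            "<tr><th>Name</th><th>Price</th></tr>" +
            "<tr><td>Apple</td><td>30</td></tr>" +
            "<tr><td>Banana</td><td>12</td></tr>" +
            "</table></body></html>";

        XDocument xdoc = XDocument.Parse(xml);
        foreach (List<string> row in ReadRows(xdoc, "table4"))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Note that Descendants("tr") also picks up rows of any nested tables. If the target table contains other tables, the query would need to be narrowed.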