[Question] Scraping a web page

Author: david31408 (Hope)

Board: R_Language

Title: [Question] Scraping a web page

Time: Fri Aug 12 18:05:15 2016

[Software familiarity]: Beginner (I have written other programs but am unfamiliar with the syntax)

[Problem description]:
Hi everyone, I'm new to R, so lately I have been practicing by trying to scrape data from baseballreference with the XML package. Since I'm very much a beginner, I started by just trying things at random. The code and messages are below. Could it be that not every web page can be scraped with XML?

> library("XML", lib.loc="~/R/win-library/3.2")
> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> Hits <- readHTMLTable(url)
Error in UseMethod("xpathApply") :
  no applicable method for 'xpathApply' applied to an object of class "NULL"

In the case above I don't know why this error message appears, but my guess is that the page itself is not a table.

I then tried a second approach:

> url <- "http://www.baseball-reference.com/leaders/H_career.shtml"
> x <- xmlParse(url)

The error message is as follows:

Specification mandate value for attribute itemscope
attributes construct error
Couldn't find end of Start Tag html line
Extra content at the end of the document
Error: 1: Specification mandate value for attribute itemscope
2: attributes construct error
3: Couldn't find end of Start Tag html line 1
4: Extra content at the end of the document

Maybe baseballreference blocks this kind of access?

Thanks in advance for any guidance :)

[Keywords]: MLB, XML

--
※ Origin: PTT (ptt.cc), From: 140.109.55.227
※ Article URL: https://www.ptt.cc/bbs/R_Language/M.1470996319.A.26D.html
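A note on the second error above: xmlParse() is the strict XML parser in the XML package, and an attribute without a value (such as a bare itemscope on the html tag) is legal HTML but not well-formed XML, which matches the "Specification mandate value for attribute itemscope" message. The package also provides htmlParse(), which uses the lenient HTML parser. The following is only a minimal sketch of that difference. The inline page, the table id, and the player names are hypothetical stand-ins and are not the real baseball-reference markup, and nothing here confirms that the live site can be read this way. The first error (xpathApply applied to NULL) may have a different cause, such as the page not being fetched at all, which this sketch does not address.

```r
library(XML)

# Hypothetical inline page standing in for the real site (assumption:
# the real page also has a valueless "itemscope" attribute on <html>,
# as the error message in the post suggests).
html <- '<html itemscope><body>
<table id="leaders">
<tr><th>Player</th><th>Hits</th></tr>
<tr><td>Player A</td><td>100</td></tr>
<tr><td>Player B</td><td>90</td></tr>
</table></body></html>'

# Strict XML parsing: a valueless attribute is an error, so this fails.
# tryCatch() turns the failure into NULL so the script keeps running.
strict <- tryCatch(xmlParse(html, asText = TRUE),
                   error = function(e) NULL)

# Lenient HTML parsing accepts the same text.
doc <- htmlParse(html, asText = TRUE)

# readHTMLTable() on a parsed document returns a list with one entry
# per <table> element found.
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
hits <- tables[[1]]
print(hits)
```

For a live page, one would pass the downloaded page text (or the URL) to htmlParse() instead of the inline string. Whether the site serves the page to a script at all is a separate question.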

→ andrew43: Search the board's earlier posts first. 08/12 20:26

→ andrew43: Also, "trying things at random" like this is not a good way to learn. Read the documentation 08/12 20:27

→ andrew43: and other people's earlier examples. 08/12 20:27

→ david31408: Thanks. Does this count as web crawling? 08/12 20:33

→ celestialgod: Yes, it is web crawling. 08/12 22:20

→ david31408: Got it!! Thanks :) 08/12 23:43