Author: wtchen (the person with no presence)
Board: C_and_CPP
Title: [Question] Fetching a CGI-generated web page with C++
Time: Thu Apr 3 22:49:50 2014
Platform: (Ex: VC++, GCC, Linux, ...)
Linux + Eclipse + CDT
Library Used: (Ex: OpenGL, ...)
socket programming?
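Question:
I want to write a C++ program that fetches a table generated by a CGI script, for example this one:
http://ppt.cc/-wJ~ (the CGI URL is very long, so I shortened it)
Right now, using an example I found online, I can fetch the home page webbook.nist.gov,
but fetching a sub-directory (for example webbook.nist.gov/chemistry/fluid/) fails.
I want to write an interface in C++ that outputs this table (after the desired parameters are entered),
so that the output table can then be used for calculations.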
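
To make the "use the table for calculations" part concrete, here is a minimal sketch of turning table text into numbers. It assumes the table is already available as plain tab-separated text with one row per line, which is an assumption (the page in question returns HTML); `parse_table` is a hypothetical helper name, not something from the example code.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Parse tab-separated text into rows of numbers.
// Assumption: one row per line, columns separated by '\t'.
// Any row with a non-numeric cell (for example a header line) is skipped.
std::vector<std::vector<double>> parse_table(const std::string& text)
{
    std::vector<std::vector<double>> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (!line.empty() && line.back() == '\r')
            line.pop_back();                    // tolerate CRLF line endings
        std::istringstream cells(line);
        std::string cell;
        std::vector<double> row;
        bool numeric = !line.empty();
        while (numeric && std::getline(cells, cell, '\t')) {
            try {
                std::size_t used = 0;
                double value = std::stod(cell, &used);
                if (used != cell.size())
                    numeric = false;            // trailing junk after the number
                else
                    row.push_back(value);
            } catch (const std::exception&) {
                numeric = false;                // cell is not a number at all
            }
        }
        if (numeric && !row.empty())
            rows.push_back(row);
    }
    return rows;
}
```

Each inner vector is one table row, so a column can be read as `rows[i][k]` for a fixed `k`.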
Input:
The parameters the CGI requires
Expected Output:
The table output by that URL
Wrong Output:
Code: (please make good use of the paste site in the pinned post, and remember to format)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>

const char* host = "webbook.nist.gov";  // host name only, never a path
const char* path = "/";                 // request path, e.g. "/chemistry/fluid/"
int port = 80;

int main(void)
{
    char buffer[512];
    int isock;
    struct sockaddr_in pin;
    struct hostent* remoteHost;
    char message[512];
    int done = 0;
    int chars = 0;
    int l = 0;

    // Resolve the host name to an IPv4 address.
    if ((remoteHost = gethostbyname(host)) == 0)
    {
        printf("Error resolving host\n");
        exit(1);
    }
    memset(&pin, 0, sizeof(pin));
    pin.sin_family = AF_INET;
    pin.sin_port = htons(port);
    pin.sin_addr.s_addr = ((struct in_addr*)(remoteHost->h_addr))->s_addr;

    if ((isock = socket(AF_INET, SOCK_STREAM, 0)) == -1)
    {
        printf("Error opening socket!\n");
        exit(1);
    }

    // Build the request. The Host header must name the server being contacted,
    // and the header block ends with exactly one blank line.
    // "Connection: close" makes the server close the socket after the body,
    // which is what the read loop at the bottom relies on.
    snprintf(message, sizeof(message),
             "GET %s HTTP/1.1\r\n"
             "Host: %s\r\n"
             "Accept: */*\r\n"
             "User-Agent: Mozilla/4.0 (compatible)\r\n"
             "Connection: close\r\n"
             "\r\n",
             path, host);
    printf("%s", message);

    if (connect(isock, (const struct sockaddr*)&pin, sizeof(pin)) == -1)
    {
        printf("Error connecting to socket\n");
        exit(1);
    }
    if (send(isock, message, strlen(message), 0) == -1)
    {
        printf("Error in send\n");
        exit(1);
    }

    // Give up on a recv() that stays silent for more than one second.
    struct timeval timeout = {1, 0};
    setsockopt(isock, SOL_SOCKET, SO_RCVTIMEO, (char*)&timeout, sizeof(struct timeval));

    // Read the response headers one byte at a time, up to the first empty line.
    while (done == 0)
    {
        l = recv(isock, buffer, 1, 0);
        if (l <= 0)
            break;  // error, timeout, or connection closed: buffer holds no new data
        switch (*buffer)
        {
        case '\r':
            break;
        case '\n':
            if (chars == 0)
                done = 1;  // empty line: end of headers
            chars = 0;
            break;
        default:
            chars++;
            break;
        }
        printf("%c", *buffer);
    }

    // Read the body until the server closes the connection.
    do
    {
        l = recv(isock, buffer, sizeof(buffer) - 1, 0);
        if (l <= 0)
            break;
        buffer[l] = 0;
        fputs(buffer, stdout);
    } while (l > 0);

    close(isock);
    return 0;
}
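
The byte-by-byte loop in the code above looks for the first empty line, which separates the HTTP headers from the body. As a rough sketch, the same split can be done on a response that has already been read completely into a string; `split_response` is a hypothetical helper name, and having the whole response in memory is an assumption.

```cpp
#include <string>
#include <utility>

// Split a complete HTTP response into (headers, body) at the first blank line.
// If no blank line is found, everything is treated as headers and the body is empty.
std::pair<std::string, std::string> split_response(const std::string& response)
{
    const std::string sep = "\r\n\r\n";
    std::string::size_type pos = response.find(sep);
    if (pos == std::string::npos)
        return {response, std::string()};
    return {response.substr(0, pos), response.substr(pos + sep.size())};
}
```

The `second` member is then the page content (the HTML with the table), without the status line and headers.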
Supplement:
This code was taken from the internet; I only changed the URL.
It runs, but I don't know what to do next.
Thanks for the help!
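
One possible next step for the sub-directory case: gethostbyname() only accepts a host name, so an address like webbook.nist.gov/chemistry/fluid/ has to be separated into a host part and a path part first. The sketch below shows one way to do that; `UrlParts` and `split_url` are hypothetical names, and it assumes plain http with no port number or user info in the URL.

```cpp
#include <string>

struct UrlParts {
    std::string host;  // for gethostbyname() and the Host: header
    std::string path;  // for the request line, after "GET "
};

// Split "http://host/path?query" into host and path (the query stays in path).
// Assumption: plain http only, no port, no user info.
UrlParts split_url(const std::string& url)
{
    std::string rest = url;
    const std::string scheme = "http://";
    if (rest.compare(0, scheme.size(), scheme) == 0)
        rest.erase(0, scheme.size());      // drop the scheme if present

    UrlParts parts;
    std::string::size_type slash = rest.find('/');
    if (slash == std::string::npos) {
        parts.host = rest;
        parts.path = "/";                  // no path given: request the root
    } else {
        parts.host = rest.substr(0, slash);
        parts.path = rest.substr(slash);
    }
    return parts;
}
```

With this, `split_url("webbook.nist.gov/chemistry/fluid/")` gives the host `webbook.nist.gov` and the path `/chemistry/fluid/`.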
--
※ Posted from: PTT (ptt.cc), From: 90.27.31.118
※ Article URL: http://www.ptt.cc/bbs/C_and_CPP/M.1396536593.A.4EF.html
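→ diabloevagto: Which part of the program do you not understand? 04/03 23:52
→ wtchen: (followed by the string of parameter settings that the CGI needs) 04/04 00:05
→ wtchen: The result is Error resolving host 04/04 00:05
→ wtchen: So I can't just change the line char* host = "webbook.nist.gov"; 04/04 00:06
→ takingblue: Try putting the params or the path in the HTTP header 04/04 00:20
→ wtchen: Solved. It turns out they go right after "GET" in the sprintf(message, ...) call 04/04 00:40
→ wtchen: Thanks! 04/04 00:40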
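
For anyone landing here later, the fix from the comments can be sketched roughly like this: the host stays a bare host name, and the path plus the CGI parameters go on the request line after "GET ". `url_encode` and `build_get_request` are hypothetical helper names, and any parameter names passed in are placeholders, since the real ones depend on the CGI.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Percent-encode a parameter name or value for use in a query string.
std::string url_encode(const std::string& s)
{
    std::string out;
    for (unsigned char c : s) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);   // unreserved characters pass through
        } else {
            char hex[4];
            std::snprintf(hex, sizeof(hex), "%%%02X", c);
            out += hex;
        }
    }
    return out;
}

// Build a GET request; the path and the parameters go after "GET ",
// while the Host header carries only the host name.
std::string build_get_request(const std::string& host, const std::string& path,
                              const std::vector<std::pair<std::string, std::string>>& params)
{
    std::string target = path;
    for (std::size_t i = 0; i < params.size(); ++i) {
        target += (i == 0 ? '?' : '&');
        target += url_encode(params[i].first) + "=" + url_encode(params[i].second);
    }
    return "GET " + target + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Accept: */*\r\n"
           "Connection: close\r\n"
           "\r\n";
}
```

The returned string can be passed to send() in place of the fixed `message` buffer, using its `c_str()` and `size()`.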