How to Run a Web Crawler with Google Apps Script

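I first came across Google Apps Script when I saw someone writing crawler parameters inside Google Sheets. Later I noticed that the scripting language is essentially JavaScript, so I looked into how to write it. It can also run on a schedule, which makes it something like a small remote host, and it can write data back to Google Sheets and Google Drive. Many thanks to Google for offering this free service!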
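The scheduled execution mentioned above is set up with a time-driven trigger. Here is a minimal sketch, assuming the function to run is called `myFunction` and that an hourly interval suits the site being crawled. The helper name `createHourlyTrigger` and the interval are assumptions for illustration, not part of the original script.

```javascript
// Sketch: run this once by hand to schedule an hourly crawl.
// 'myFunction' and the one-hour interval are assumptions for illustration.
function createHourlyTrigger() {
  // Remove existing triggers for the same handler so repeated runs do not stack duplicates
  var triggers = ScriptApp.getProjectTriggers();
  for (var i = 0; i < triggers.length; i++) {
    if (triggers[i].getHandlerFunction() === 'myFunction') {
      ScriptApp.deleteTrigger(triggers[i]);
    }
  }
  // Time-driven trigger: call myFunction every hour
  return ScriptApp.newTrigger('myFunction')
    .timeBased()
    .everyHours(1)
    .create();
}
```

The same trigger can also be created by hand from the Triggers panel of the script editor, which may be simpler for a one-off project.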

Code walkthrough

  • 1. First, check that the starting URL responds correctly
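function myFunction() {
  var url = "url"; // replace with the starting URL to crawl
  // muteHttpExceptions: true returns the response instead of throwing on HTTP error codes
  var gresponse = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  // Continue only when the server answers 200 OK
  if (gresponse.getResponseCode() === 200) {
    var text = gresponse.getContentText();
    var gJson = JSON.parse(text); // the endpoint is expected to return JSON
    get_latest_new(gJson); // parsing function that is not shown in this post
  }
}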
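The check above does nothing visible when the status is not 200, and `JSON.parse` throws if the body is not valid JSON. One way to surface both cases is sketched below. `fetchJson` is a hypothetical helper name, not part of the original script.

```javascript
// Sketch: fetch a URL and return the parsed JSON,
// or null when the status is not 200 or the body is not valid JSON.
// fetchJson is a hypothetical helper, not part of the original script.
function fetchJson(url) {
  var response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
  var code = response.getResponseCode();
  if (code !== 200) {
    Logger.log('Unexpected status ' + code + ' for ' + url);
    return null;
  }
  try {
    return JSON.parse(response.getContentText());
  } catch (e) {
    // The server answered 200 but the body was not valid JSON
    Logger.log('Invalid JSON from ' + url + ': ' + e.message);
    return null;
  }
}
```

With this helper, `myFunction` could call `var gJson = fetchJson(url);` and continue only when the result is not `null`.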
  • 2. Download an image file to Google Drive
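function get_img() {
  var img = 'image url'; // replace with the image URL
  // Skip the download if a file with a matching title is already in the folder
  if (
    !DriveApp.searchFiles('"folder id" in parents and title contains "file name"').hasNext()
  ) {
    // Fetch the image as a blob and give it a file name
    var blob = UrlFetchApp.fetch(img).getBlob().setName('file name');
    // Save it into the first folder named "folder"
    DriveApp.getFoldersByName("folder").next().createFile(blob);
    //Logger.log(img);
  }
}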
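The snippet above looks up the folder by ID for the duplicate check but by name when saving, so the two could end up pointing at different folders. A sketch that uses one folder for both steps and takes the file name from the URL follows. `saveImageOnce`, `fileNameFromUrl`, and the `folderId` parameter are hypothetical names introduced here.

```javascript
// Sketch: save an image into a folder unless a file with that name already exists.
// saveImageOnce, fileNameFromUrl and the folderId parameter are hypothetical.
function fileNameFromUrl(url) {
  // Drop the fragment and query string, then keep the last path segment
  var path = url.split('#')[0].split('?')[0];
  return path.substring(path.lastIndexOf('/') + 1);
}

function saveImageOnce(imgUrl, folderId) {
  var name = fileNameFromUrl(imgUrl);
  var folder = DriveApp.getFolderById(folderId);
  // getFilesByName matches the exact name within this folder only
  if (folder.getFilesByName(name).hasNext()) {
    return false; // already downloaded
  }
  var blob = UrlFetchApp.fetch(imgUrl).getBlob().setName(name);
  folder.createFile(blob);
  return true;
}
```

Returning `true` or `false` lets the caller log or count how many new images were actually saved on each run.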
  • 3. Save the data to Google Drive, in a Google Sheets spreadsheet
function insert_sheet() {
  // gId, gcontent and gtopics are assumed to be globals set by the crawling step
  var sourceID = 'sheets id'; // replace with the spreadsheet ID
  var sourceSpreadSheet = SpreadsheetApp.openById(sourceID);
  var srcSheet = sourceSpreadSheet.getSheets()[0];
  var lastsr = srcSheet.getLastRow(); // last row that contains data
  var lastsc = srcSheet.getLastColumn(); // last column that contains data (unused here)

  // An empty sheet reports 0 rows; use 1 so the loop still runs once
  if (lastsr === 0) {
    lastsr = 1;
  }

  for (var row = 1; row <= lastsr; row++) {
    // getValue() returns the single cell value; getValues() would return a 2D array
    var value = srcSheet.getRange(row, 1).getValue();
    if (value != gId && row == lastsr) {
      // Reached the last row without finding the ID: append a new record
      var date = new Date();
      srcSheet.getRange(row + 1, 1).setValue(gId);
      srcSheet.getRange(row + 1, 2).setValue(gcontent);
      srcSheet.getRange(row + 1, 3).setValue(date);
      srcSheet.getRange(row + 1, 5).setValue(gtopics);
    }
    else if (value == gId) {
      var content_value = srcSheet.getRange(row, 2).getValue();
      // If the content has changed, mail the old text and write the new text to the sheet
      if (gcontent != content_value) {
        MailApp.sendEmail("email", gId + " update", String(content_value));
        srcSheet.getRange(row, 2).setValue(gcontent);
      }
      break;
    }
  }
}
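The loop above reads the sheet one cell at a time, which gets slow as the sheet grows. A possible alternative, sketched here, reads column A once and searches it in memory. `findRowById` and `upsertRecord` are hypothetical names, the column layout follows `insert_sheet`, and the email notification is left out to keep the sketch short.

```javascript
// Sketch: find the 1-based row of an ID in column A with a single read.
// findRowById and upsertRecord are hypothetical names; the column layout follows insert_sheet.
function findRowById(ids, id) {
  // ids is the 2D array returned by getValues() for one column: [[v1], [v2], ...]
  for (var i = 0; i < ids.length; i++) {
    if (String(ids[i][0]) === String(id)) {
      return i + 1; // sheet rows are 1-based
    }
  }
  return -1;
}

function upsertRecord(sheet, id, content, topics) {
  var lastRow = sheet.getLastRow();
  var ids = lastRow > 0 ? sheet.getRange(1, 1, lastRow, 1).getValues() : [];
  var row = findRowById(ids, id);
  if (row === -1) {
    // appendRow writes after the last row that has data; column 4 stays empty
    sheet.appendRow([id, content, new Date(), '', topics]);
    return 'inserted';
  }
  var cell = sheet.getRange(row, 2);
  if (String(cell.getValue()) !== String(content)) {
    cell.setValue(content);
    return 'updated';
  }
  return 'unchanged';
}
```

One difference to be aware of: on a blank sheet `appendRow` writes to row 1, whereas `insert_sheet` leaves row 1 empty and starts at row 2.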

Afterword

I have only excerpted fragments of the code here, but if you experiment with them you should be able to see how these modules fit together.

Next week's topic is "food": time to go eat something delicious!
