Cause of the incident

A few days ago I had to help a friend check the members of a Bilibili captain group. Looking users up in the captain list one by one is naturally not a programmer's first choice. The right thing to do is to hand the task over to the computer and let it do the work. With the theory settled, it was time to start coding.
So I spent a little time writing this crawler, which I called bilibili-live-captain-tools 1.0:

    const axios = require('axios')

    const roomid = "146088"
    const ruid = "642922"
    const url = `https://api.live.bilibili.com/xlive/app-room/v2/guardTab/topList?roomid=${roomid}&ruid=${ruid}&page_size=30`

    // Maps guard_level to a readable title
    const Captin = {
        1: 'Governor',
        2: 'Admiral',
        3: 'Captain'
    }

    const reqPromise = url => axios.get(url);

    let CaptinList = []
    let UserList = []

    // Fetch one page and append its entries to CaptinList
    async function crawler(URL, pageNow) {
        const res = await reqPromise(URL);
        // The top3 entries are only collected from the first page
        if (pageNow === 1) {
            CaptinList = CaptinList.concat(res.data.data.top3);
        }
        CaptinList = CaptinList.concat(res.data.data.list);
    }

    // Read the total number of pages from a response
    function getMaxPage(res) {
        const Info = res.data.data.info
        const { page: maxPage } = Info
        return maxPage
    }

    // Reduce the raw entries to uid, username and title
    function getUserList(res) {
        for (let item of res) {
            const userInfo = item
            const { uid, username, guard_level } = userInfo
            UserList.push({ uid, username, Captin: Captin[guard_level] })
        }
    }

    async function main(UID) {
        // Reset the module-level lists so repeated calls do not pile up duplicates
        CaptinList = []
        UserList = []
        const maxPage = await reqPromise(`${url}&page=1`).then(getMaxPage)
        for (let pageNow = 1; pageNow < maxPage + 1; pageNow++) {
            const URL = `${url}&page=${pageNow}`;
            await crawler(URL, pageNow);
        }
        getUserList(CaptinList)
        console.log(search(UID, UserList))
        return search(UID, UserList)
    }

    // Return the matching user, or 0 if the uid is not in the list
    function search(uid, UserList) {
        for (let i = 0; i < UserList.length; i++) {
            if (UserList[i].uid === uid) {
                return UserList[i];
            }
        }
        return 0
    }

    module.exports = {
        main
    }

Obviously, this crawler can only be triggered manually, and it needs a command line and a Node environment to run. So I set up a page service for it with Koa2 and wrote an extremely simple page:

    const Koa = require('koa');
    const app = new Koa();
    const path = require('path')
    const fs = require('fs');
    const router = require('koa-router')();
    const index = require('./index')
    const views = require('koa-views')

    app.use(views(path.join(__dirname, './'), {
        extension: 'ejs'
    }))
    app.use(router.routes());

    // Serve the static search page
    router.get('/', async ctx => {
        ctx.response.type = 'html';
        // Resolve against __dirname so the file is found regardless of the working directory
        ctx.response.body = fs.createReadStream(path.join(__dirname, 'index.html'));
    })

    // Run the crawler for the requested uid and render the result
    router.get('/api/captin', async (ctx) => {
        const UID = ctx.request.query.uid
        console.log(UID)
        const Info = await index.main(parseInt(UID, 10))
        await ctx.render('index', {
            Info,
        })
    });

    app.listen(3000);

The page has no throttling or debouncing, and this version crawls in real time on every request. The waiting time is long, and frequent refreshes naturally trigger Bilibili's anti-crawler mechanism, so the server's IP is currently under risk control.

So bilibili-live-captain-tools 2.0 was born:

    function throttle(fn, delay) {
        var timer;
        return function () {
            var _this = this;
            var args = arguments;
            // A call is already pending, so ignore this one
            if (timer) {
                return;
            }
            timer = setTimeout(function () {
                fn.apply(_this, args);
                // Clear the timer once fn has run after the delay,
                // so the next trigger can start a new timer
                timer = null;
            }, delay)
        }
    }

Version 2.0 adds throttling and debouncing and switches to a pseudo real-time crawler: a scheduled task crawls once a minute.

That means the crawler script has to run on a schedule. My first thought was the schedule feature of Egg.js, but I didn't want to make a small crawler so "overkill". When in doubt, I just search on Baidu. That led to the following plan.

Use Node Schedule to implement scheduled tasks

Node Schedule is a flexible cron and non-cron job scheduler for Node.js. It allows you to schedule a job (an arbitrary function) to be executed on specific dates, with optional recurrence rules. It only uses one timer at any given time (instead of re-evaluating upcoming jobs every second/minute).

1. Install node-schedule

    npm install node-schedule
    # or
    yarn add node-schedule

2. Basic Usage

Let's take a look at the official example.
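
    const schedule = require('node-schedule');

    const job = schedule.scheduleJob('42 * * * *', function(){
        console.log('The answer to life, the universe, and everything!');
    });

This runs the function whenever the minute is 42 (19:42, 20:42, and so on).

The first parameter of schedule.scheduleJob is a cron-style string. Its fields, read from left to right, are:

    second        0 - 59 (optional)
    minute        0 - 59
    hour          0 - 23
    day of month  1 - 31
    month         1 - 12
    day of week   0 - 7 (0 and 7 both mean Sunday)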
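To check which value lands in which cron field, a small helper can label them. This is only a sketch and not part of node-schedule: labelCronFields is a hypothetical name, and it assumes the six-field layout that node-schedule documents (an optional second, then minute, hour, day of month, month, day of week).

```javascript
// Field names in the order node-schedule reads them. The leading
// "second" field is optional, so 5-field expressions are valid too.
const FIELDS = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

// Hypothetical helper for illustration only; it labels fields and
// does not validate their values.
function labelCronFields(expression) {
  const parts = expression.trim().split(/\s+/);
  if (parts.length !== 5 && parts.length !== 6) {
    throw new Error(`Expected 5 or 6 fields, got ${parts.length}`);
  }
  // With 5 fields the seconds field was omitted, which means second 0.
  const values = parts.length === 5 ? ['0', ...parts] : parts;
  return Object.fromEntries(FIELDS.map((name, i) => [name, values[i]]));
}

console.log(labelCronFields('42 * * * *'));     // minute is '42', second is '0'
console.log(labelCronFields('*/30 * * * * *')); // second is '*/30'
```

Reading an expression this way makes it easier to spot a value that has slipped into the wrong position, which is the most common mistake when a seconds field is added or dropped.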
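Now that we understand the rules, let's write one ourselves:

    const schedule = require('node-schedule');

    // Define a time (months are 0-indexed in JavaScript, so 2 means March)
    let date = new Date(2021, 2, 10, 12, 0, 0);

    // Define a task
    let job = schedule.scheduleJob(date, () => {
        console.log("Current time:", new Date());
    });

The above example reports the time once, at 12:00 on March 10, 2021.

3. Advanced Usage

In addition to the basic usage, there are more flexible ways to define scheduled tasks.

3.1. Execute once every minute

    const schedule = require('node-schedule');

    // Define the rule
    let rule = new schedule.RecurrenceRule();
    rule.second = 0 // Execute once every minute, at second 0

    // Start the task
    let job = schedule.scheduleJob(rule, () => {
        console.log(new Date());
    });

The rule supports the following fields: second, minute, hour, date, dayOfWeek, month, year, etc. Each field accepts a single number or an array of numbers. Some common rules:

    Every minute, at second 0:                    rule.second = 0
    Every hour, at minute 30:                     rule.minute = 30; rule.second = 0
    Every day at 00:00:00:                        rule.hour = 0; rule.minute = 0; rule.second = 0
    On the 1st of every month at 10:00:00:        rule.date = 1; rule.hour = 10; rule.minute = 0; rule.second = 0
    Mon, Wed and Fri at 00:00:00 and 12:00:00:    rule.dayOfWeek = [1, 3, 5]; rule.hour = [0, 12]; rule.minute = 0; rule.second = 0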
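The way rule fields combine can be modelled in a few lines. The following is a simplified sketch, not node-schedule's implementation: fieldMatches and matchesRule are hypothetical names, plain objects stand in for RecurrenceRule, and an unset field is treated as "any value" (in the real RecurrenceRule, second defaults to 0 instead). The library also computes the next run time rather than testing dates one by one.

```javascript
// Simplified model of one rule field: unset means "any value",
// a number means an exact match, an array means any listed value.
function fieldMatches(spec, value) {
  if (spec === null || spec === undefined) return true;
  if (Array.isArray(spec)) return spec.includes(value);
  return spec === value;
}

// A date satisfies the rule only if every field matches.
function matchesRule(rule, date) {
  return (
    fieldMatches(rule.second, date.getSeconds()) &&
    fieldMatches(rule.minute, date.getMinutes()) &&
    fieldMatches(rule.hour, date.getHours()) &&
    fieldMatches(rule.date, date.getDate()) &&
    fieldMatches(rule.month, date.getMonth()) &&
    fieldMatches(rule.dayOfWeek, date.getDay())
  );
}

const everyMinute = { second: 0 };
const monWedFri = { dayOfWeek: [1, 3, 5], hour: [0, 12], minute: 0, second: 0 };

// 12 April 2021 was a Monday; months are 0-indexed, so 3 means April.
const mondayNoon = new Date(2021, 3, 12, 12, 0, 0);
const tuesdayNoon = new Date(2021, 3, 13, 12, 0, 0);

console.log(matchesRule(everyMinute, mondayNoon)); // true
console.log(matchesRule(monWedFri, mondayNoon));   // true
console.log(matchesRule(monWedFri, tuesdayNoon));  // false
```

The point of the model is that fields are combined with AND, while the values inside one array are combined with OR.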
4. Termination of the task

You can use cancel() to stop a scheduled task: it cancels the job's pending invocations. When a task runs into an abnormal condition, cancel it promptly.

    job.cancel();

Summary

node-schedule is a crontab-style module for Node.js. Scheduled tasks can be used to maintain a server system by performing necessary operations at fixed intervals, and also to send emails, crawl data, and so on.
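The 2.0 idea described earlier (crawl on a schedule, answer page requests from the last result) can be sketched as follows. This is a minimal sketch under assumptions, not the author's code: createCaptainCache and fetchUserList are hypothetical names, and fetchUserList stands in for whatever function performs the crawl and resolves to the list of { uid, username, Captin } objects.

```javascript
// Hypothetical helper: keeps the latest crawl result in memory so that
// page requests never hit the upstream API directly.
function createCaptainCache(fetchUserList) {
  let users = [];
  let updatedAt = null;

  // Replace the snapshot with a fresh crawl result.
  async function refresh() {
    try {
      users = await fetchUserList();
      updatedAt = new Date();
    } catch (err) {
      // Keep serving the previous snapshot if one crawl fails.
      console.error('crawl failed:', err.message);
    }
  }

  // Look a uid up in the current snapshot; null when it is absent.
  function search(uid) {
    return users.find(user => user.uid === uid) || null;
  }

  return { refresh, search, lastUpdated: () => updatedAt };
}

// Demo with a stand-in fetcher; a real one would call the Bilibili API.
const cache = createCaptainCache(async () => [
  { uid: 1, username: 'demo-user', Captin: 'Captain' },
]);

cache.refresh().then(() => {
  console.log(cache.search(1)); // the demo-user entry
  console.log(cache.search(2)); // null
});
```

With node-schedule, something like schedule.scheduleJob('* * * * *', cache.refresh) would refresh the snapshot once a minute, and the /api/captin route would call cache.search(uid) instead of crawling. Page refreshes then cost nothing upstream, at the price of data that can be up to a minute old.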