Make life productive
BitSky is open-source software for extracting data from websites (web crawling) and for web automation jobs with headless Chrome and Puppeteer, in a fast, simple, scalable, and extensible way.
Compared with other web crawling and web scraping frameworks or libraries (e.g. Scrapy, Apify SDK), BitSky has the following unique features:
- 1. BitSky has a desktop application for macOS, Windows, and Ubuntu that comes with the packages you need pre-installed, so you don't need to spend time installing or configuring an environment for Python, Node.js, or other programming languages.
- 2. BitSky supports any programming language (e.g. Python, Java, Node.js), so you can use a language you are already familiar with instead of learning a new one just for web crawling.
Besides these unique features, BitSky also offers the following:
- 1. Crawls any type of website: BitSky can crawl both static websites and single-page applications.
- 2. Built on a microservices architecture, so it naturally supports distributed deployment and is easy to scale and extend.
With BitSky, you only need to focus on extracting data; BitSky handles the rest of the work for you.
A Supplier links Retailers and Producers. It manages Retailer Configurations and Producer Configurations, receives Tasks from Retailers, assigns those Tasks to suitable Producers, and moves succeeded or failed Tasks to Task History.
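To make the Supplier's responsibilities concrete, here is a minimal in-memory sketch. It is not BitSky's implementation: the `Task` and `Supplier` classes, their field names, and the producer types `"headless"` and `"http"` are all hypothetical, and "suitable" is assumed to mean "a Producer of the same type as the Task".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    # Hypothetical Task shape; BitSky's real Task schema is not shown here.
    url: str
    producer_type: str        # the kind of Producer able to run it (hypothetical values)
    status: str = "pending"


@dataclass
class Supplier:
    producers: Dict[str, str] = field(default_factory=dict)   # producer_id -> producer type
    pending: List[Task] = field(default_factory=list)          # Tasks received from Retailers
    task_history: List[Task] = field(default_factory=list)     # finished Tasks

    def receive_tasks(self, tasks: List[Task]) -> None:
        """Queue Tasks sent by a Retailer."""
        self.pending.extend(tasks)

    def assign(self, producer_id: str) -> Optional[Task]:
        """Give the Producer the first pending Task of its type; None if there is none."""
        producer_type = self.producers.get(producer_id)
        for i, task in enumerate(self.pending):
            if task.producer_type == producer_type:
                task = self.pending.pop(i)
                task.status = "running"
                return task
        return None

    def finish(self, task: Task, succeeded: bool) -> None:
        """Record the outcome and move the Task to Task History."""
        task.status = "succeeded" if succeeded else "failed"
        self.task_history.append(task)


supplier = Supplier(producers={"p1": "headless", "p2": "http"})
supplier.receive_tasks([Task("https://example.com/a", "http"),
                        Task("https://example.com/b", "headless")])
task = supplier.assign("p1")          # the "headless" Task for /b
supplier.finish(task, succeeded=True)
```

Matching on type alone is a simplification; a real Supplier may also take the Producer Configuration's settings or a Producer's current load into account.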
A Producer Configuration controls whether a Producer can execute Tasks and how it executes them. A Producer MUST connect to a Producer Configuration before it can be assigned Tasks, and each Producer Configuration has a one-to-one relationship with a Producer.
A Retailer Configuration holds information about a Retailer, for example its Base URL, Health Check URL, and Receive Tasks URL. A Retailer MUST connect to a Retailer Configuration before it can create and receive Tasks, and each Retailer Configuration has a one-to-one relationship with a Retailer.
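A Retailer Configuration can be pictured as a small record plus a one-to-one binding rule. The sketch below assumes hypothetical field names and default paths (`/health`, `/tasks`); it illustrates the idea and is not BitSky's configuration schema.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin


@dataclass
class RetailerConfiguration:
    # Hypothetical fields mirroring the information described above.
    global_id: str
    base_url: str
    health_check_path: str = "/health"        # hypothetical default
    receive_tasks_path: str = "/tasks"        # hypothetical default
    connected_retailer: Optional[str] = None  # at most one Retailer (one-to-one)

    def health_check_url(self) -> str:
        return urljoin(self.base_url, self.health_check_path)

    def receive_tasks_url(self) -> str:
        return urljoin(self.base_url, self.receive_tasks_path)

    def connect(self, retailer_id: str) -> None:
        """Bind a Retailer to this configuration, enforcing the one-to-one rule."""
        if self.connected_retailer not in (None, retailer_id):
            raise ValueError(
                f"{self.global_id} is already connected to {self.connected_retailer}")
        self.connected_retailer = retailer_id


cfg = RetailerConfiguration("retailer-config-1", "http://localhost:8081")
cfg.connect("retailer-1")
print(cfg.receive_tasks_url())
```

Deriving the other URLs from the Base URL keeps the configuration short; storing each URL explicitly would work just as well.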
A Producer MUST connect to a Producer Configuration, and the Producer and its Producer Configuration should be of the same type.
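The Producer rules stated in this section (connect before receiving Tasks, one-to-one with a Producer Configuration, same type on both sides) can be checked in a few lines. This is a hypothetical sketch: the classes, the `active` flag (standing in for "whether a Producer can execute Tasks"), and `concurrency` (one possible "how") are assumptions, not BitSky's actual fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProducerConfiguration:
    # Hypothetical fields.
    global_id: str
    producer_type: str
    active: bool = True             # whether the Producer may execute Tasks
    concurrency: int = 1            # one example of how it executes them
    producer_id: Optional[str] = None


@dataclass
class Producer:
    producer_id: str
    producer_type: str
    config: Optional[ProducerConfiguration] = None

    def connect(self, config: ProducerConfiguration) -> None:
        """Connect to a configuration, enforcing same type and one-to-one."""
        if config.producer_type != self.producer_type:
            raise ValueError(f"type mismatch: {self.producer_type!r} vs {config.producer_type!r}")
        if config.producer_id not in (None, self.producer_id):
            raise ValueError(f"{config.global_id} already belongs to {config.producer_id}")
        if self.config is not None and self.config is not config:
            raise ValueError(f"{self.producer_id} is already connected to {self.config.global_id}")
        config.producer_id = self.producer_id
        self.config = config

    def can_take_tasks(self) -> bool:
        """Only a connected Producer with an active configuration may be assigned Tasks."""
        return self.config is not None and self.config.active


producer = Producer("p1", "headless")
producer.connect(ProducerConfiguration("pc-1", "headless"))
print(producer.can_take_tasks())
```

Keeping the check in `connect` means a mismatched pair is rejected up front instead of failing later during Task assignment.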
A Retailer creates Tasks and sends them to the Supplier, and the Supplier assigns them to suitable Producers. After a Producer successfully executes a Task, the Task is sent back to the Retailer with the crawled data (e.g. HTML). The Retailer can then extract useful information from the received Tasks or create more Tasks. The Retailer also decides where to store the extracted data and in which format. Most of your time will be spent creating your own Retailer.
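Since most of the work happens in the Retailer, here is a sketch of its core loop using only the Python standard library: parse the returned HTML, extract a record, create follow-up Tasks from discovered links, and store results as JSON Lines. It does not use any BitSky SDK; the shape of a returned Task (a dict with `"url"` and `"html"` keys) and all function names are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href> from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def handle_returned_tasks(tasks, seen):
    """Extract one record per returned Task and create Tasks for unseen links.

    Each task is assumed to be a dict with "url" and "html" keys (hypothetical).
    Returns (records, new_tasks).
    """
    records, new_tasks = [], []
    for task in tasks:
        parser = LinkAndTitleParser()
        parser.feed(task["html"])
        records.append({"url": task["url"], "title": parser.title.strip()})
        for href in parser.links:
            absolute = urljoin(task["url"], href)
            if absolute not in seen:
                seen.add(absolute)
                new_tasks.append({"url": absolute})
    return records, new_tasks


def store_as_json_lines(records, path):
    """The Retailer chooses storage and format; JSON Lines is just one option."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The `seen` set prevents the Retailer from creating duplicate Tasks for links it has already queued; a production Retailer would typically persist this state rather than keep it in memory.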