Docker

BitSky provides Docker images, so you can run it with Docker locally or on a cloud service. The images are available at https://hub.docker.com/u/bitskyai.

Before you continue, make sure you have tried the Quick Start first.

This guide shows, step by step, how to start BitSky using Docker images.

Docker Images Overview

First, a quick overview of what each Docker image is used for.

bitskyai/web-app

This is the BitSky Web Application. It is similar to the BitSky Desktop Application, except that it doesn't include the Headless Producer and HTTP Producer. It contains a BitSky Supplier and the BitSky UI.

bitskyai/headless-producer

It contains a BitSky Headless Producer. The Headless Producer is based on Puppeteer and is well suited to crawling data from Single Page Applications or executing JavaScript in the browser page.

bitskyai/http-producer

It contains a BitSky HTTP Producer. The HTTP Producer is based on Axios and is well suited to crawling data from sites that are not Single Page Applications.

bitskyai/hello-retailer

It contains bitsky-hello-retailer, an example retailer that crawls all blogs from https://exampleblog.bitsky.ai/

Setup

Now let us set up BitSky using Docker images, and use bitskyai/hello-retailer to test whether the setup was successful.

The following example uses a SQLite database. If you want to use MongoDB, see the documentation on Use MongoDB.

Before continuing, make sure Docker is installed and running.

In the following example, DON'T use localhost or 127.0.0.1; use your machine's IP address. Inside a Docker container, localhost refers to the container itself, so the containers cannot reach each other through it.
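
The rule above can be checked before any container is started. The sketch below is not part of BitSky: check_base_url is a hypothetical helper that extracts the host from a URL and rejects loopback names, assuming the URL has the form scheme://host[:port][/path].

```shell
# Hypothetical pre-flight check (not shipped with BitSky).
# Returns 0 when the URL is usable as BITSKY_BASE_URL, 1 when its host is
# localhost or a 127.x.x.x loopback address.
check_base_url() {
  # Drop the scheme, then everything from the first ':' or '/' onward,
  # leaving only the host part of the URL.
  host=${1#*://}
  host=${host%%[:/]*}
  case "$host" in
    localhost|127.*)
      echo "do not use $host; use the machine's IP address" >&2
      return 1
      ;;
  esac
  return 0
}
```

For example, check_base_url http://10.0.0.247:9099 succeeds, while check_base_url http://localhost:9099 prints a warning and fails.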

Start BitSky Web Application

Starting the BitSky Web Application is simple. Type the following command in your terminal:

docker run -d -p 9099:9099 bitskyai/web-app

Now you can open http://localhost:9099 in your browser
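
The container may need a moment before it accepts connections. As a rough sketch, the helper below polls a URL with curl until it answers. wait_for_url is a hypothetical name, not a BitSky command, and it assumes curl is installed and that the Web Application answers requests on the root path with a success status.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# $1 = URL
# $2 = number of attempts (default 30)
# $3 = seconds to sleep after each failed attempt (default 2)
wait_for_url() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure, -s: silent, -o /dev/null: drop the body
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A possible use is wait_for_url http://localhost:9099 && echo "BitSky Web Application is up".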

Next, follow 1. Create a Retailer Configuration to create a Retailer Configuration for the Hello Retailer Docker container.

When you create the Retailer Configuration, DON'T use localhost or 127.0.0.1; use your machine's IP address. Otherwise the Docker containers cannot reach each other.

Then follow 4. Create Producer Configurations to create Producer Configurations for the Headless Producer and HTTP Producer Docker containers.

Start Headless Producer Docker Container

Follow 5. Configure Headless Producer to get the Global ID, then type the following command in your terminal:

docker run -d -p 8100:80 \
-e BITSKY_BASE_URL=http://10.0.0.247:9099 \
-e HEADLESS=false \
-e GLOBAL_ID=59791964-de46-43c3-ac32-9a091ec4ac77 \
bitskyai/headless-producer
  1. Don't forget to replace GLOBAL_ID with your own Global ID.

  2. BITSKY_BASE_URL must use the IP address; don't use localhost or 127.0.0.1.
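
Both notes come down to filling in two values. The sketch below is not part of BitSky: headless_producer_cmd is a hypothetical helper that only prints the docker run command shown above with your IP address and Global ID substituted, so you can review it before running it. It assumes the Web Application is published on port 9099, as in the earlier step.

```shell
# Hypothetical helper (not shipped with BitSky): prints the docker run
# command for the Headless Producer instead of executing it.
# $1 = IP address of the machine running bitskyai/web-app
# $2 = Global ID copied from the Producer Configuration
headless_producer_cmd() {
  host_ip=$1
  global_id=$2
  if [ -z "$host_ip" ] || [ -z "$global_id" ]; then
    echo "usage: headless_producer_cmd HOST_IP GLOBAL_ID" >&2
    return 2
  fi
  printf 'docker run -d -p 8100:80 -e BITSKY_BASE_URL=http://%s:9099 -e HEADLESS=false -e GLOBAL_ID=%s bitskyai/headless-producer\n' \
    "$host_ip" "$global_id"
}
```

Call it with your own values, review the printed command, then paste it into your terminal.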

After it starts successfully, you can view the Headless Producer by opening http://localhost:8100 in your browser.

You can also view the container's desktop by opening http://localhost:8100/vnc/?password=welcome

Because the command above sets the HEADLESS environment variable to false, you should see Chrome open automatically to crawl data.

Start HTTP Producer Docker Container

Follow 6. Configure HTTP Producer to get the Global ID, then type the following command in your terminal:

docker run -d -p 8090:8090 \
-e BITSKY_BASE_URL=http://10.0.0.247:9099 \
-e GLOBAL_ID=815f5465-8feb-4e53-996f-53651d6cfc0d \
bitskyai/http-producer
  1. Don't forget to replace GLOBAL_ID with your own Global ID.

  2. BITSKY_BASE_URL must use the IP address; don't use localhost or 127.0.0.1.

After it starts successfully, you can view the HTTP Producer by opening http://localhost:8090 in your browser.

Start Hello Retailer Docker Container

Follow 2. Configure Hello Retailer Service to get the Global ID, then type the following command in your terminal:

docker run -d -p 8081:8081 \
-e BITSKY_BASE_URL=http://10.0.0.247:9099 \
-e GLOBAL_ID=9901faac-967e-4121-88a2-94275ca09672 \
bitskyai/hello-retailer
  1. Don't forget to replace GLOBAL_ID with your own Global ID.

  2. BITSKY_BASE_URL must use the IP address; don't use localhost or 127.0.0.1.

After it starts successfully, you can view Hello Retailer by opening http://localhost:8081 in your browser.

Crawl Data

Now open Retailer Configurations. You should see that Hello Retailer is connected.

Open Producer Configurations. You should see that the Headless and HTTP Producer Configurations are connected.
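
If a configuration does not show as connected, a first thing to check is whether its container is actually running. The sketch below is not part of BitSky: missing_images is a hypothetical helper that reads image names, one per line, and prints those of the four images used in this guide that are absent. It assumes the containers were started with the untagged image names used above.

```shell
# Hypothetical helper (not shipped with BitSky): reads running image names
# on stdin, one per line, and prints each expected image that is missing.
missing_images() {
  running=$(cat)
  for image in bitskyai/web-app bitskyai/headless-producer \
               bitskyai/http-producer bitskyai/hello-retailer; do
    # -x: match the whole line, -F: treat the name as a fixed string
    printf '%s\n' "$running" | grep -qxF "$image" || echo "$image"
  done
}
```

Feed it the list of running images with docker ps --format '{{.Image}}' | missing_images. No output means all four containers are running.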

If both the Retailer Configuration and the Producer Configurations are connected, follow 7. Activate Producers. After you activate them successfully, your Producer Configurations should look like this:

Now you can trigger Hello Retailer by clicking Add init tasks to add an init Task.

After several seconds, open Tasks History. You should see 14 Tasks.

Now you can view the collected data by clicking View Collected Data.

You should be able to view 10 blogs.

Conclusion

With Docker images, you can easily deploy BitSky to most cloud services, e.g. AWS, Oracle Cloud, Microsoft Azure, Alibaba Cloud, and so on.