Remote Storage
In Enterprise Edition you can not only store files on a local hard drive, but also connect Azure Blob Storage, Google Cloud Storage or any S3-compatible storage (e.g. AWS S3).
You can upload files from your PC to the connected cloud storage, or use files already uploaded to cloud storage as a source (without duplicating them).

How we store files

Supervisely uses DATA_PATH from .env (defaults to /supervisely/data) to keep caches, the database and so on. Here we are interested in the storage subfolder, where generated content such as uploaded images and neural networks is stored.
You can find two subfolders there:
    <something>-public/
    <something>-private/
That's because we maintain the same structure in local storage as if you were using a remote storage. In that case those two folders are buckets (or containers). You may notice that one has "public" in its name, but it only reflects the kind of data we store in it. Both buckets are private and do not allow anonymous reads.
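For illustration, on a default installation the layout on disk might look something like this (the abc123 bucket prefix below is just a placeholder):
    /supervisely/data/storage/
        abc123-public/
        abc123-private/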

Configure Supervisely to use S3 compatible storage (Amazon S3, Minio)

Edit the .env configuration file - you can find it by running the supervisely where command.
Change STORAGE_PROVIDER from http (local hard drive) to minio (S3 storage backend).
You also need to provide the STORAGE_ACCESS_KEY and STORAGE_SECRET_KEY credentials along with the endpoint of your S3 storage.
For example, here are the settings for Amazon S3:
    STORAGE_ENDPOINT=s3.amazonaws.com
    STORAGE_PORT=443
So in the end, here is how your .env settings could look:
    JUPYTER_DOWNLOAD_FILES_BEFORE_START=true
    STORAGE_JUPYTER_SYNC=true
    STORAGE_PROVIDER=minio
    STORAGE_ENDPOINT=s3.amazonaws.com
    STORAGE_PORT=443
    STORAGE_ACCESS_KEY=<hidden>
    STORAGE_SECRET_KEY=<hidden>
Execute sudo supervisely up -d to apply the new settings
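To check that the credentials and endpoint are valid, you can additionally list your buckets with the minio/mc client that is also used for migration later in this guide (the s3 alias name is arbitrary):
    mc config host add s3 https://s3.amazonaws.com <YOUR-ACCESS-KEY> <YOUR-SECRET-KEY>
    mc ls s3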

Configure Supervisely to use Azure Blob Storage

Edit the .env configuration file - you can find it by running the supervisely where command.
Change STORAGE_PROVIDER from http (local hard drive) to azure (Azure storage backend).
You also need to provide the STORAGE_ACCESS_KEY (your storage account name) and STORAGE_SECRET_KEY (secret key) credentials along with the endpoint of your blob storage.
Here is how your .env settings could look:
    JUPYTER_DOWNLOAD_FILES_BEFORE_START=true
    STORAGE_JUPYTER_SYNC=true
    STORAGE_ACCESS_KEY=<account name>
    STORAGE_ENDPOINT=https://<account name>.blob.core.windows.net
    STORAGE_PROVIDER=azure
    STORAGE_SECRET_KEY=<secret key 88 chars long or so: aflmg+wg23fWA+6gAafWmgF4a>
Execute sudo supervisely up -d to apply the new settings

Configure Supervisely to use Google Cloud Storage

Edit the .env configuration file - you can find it by running the supervisely where command.
Change STORAGE_PROVIDER from http (local hard drive) to google (GCS backend).
You also need to provide STORAGE_CREDENTIALS_PATH, the path to the credentials file generated by Google.
Here is how your .env settings could look:
    JUPYTER_DOWNLOAD_FILES_BEFORE_START=true
    STORAGE_JUPYTER_SYNC=true
    STORAGE_PROVIDER=google
    STORAGE_ENDPOINT=storage.googleapis.com
    STORAGE_CREDENTIALS_PATH=/gcs.json
Now create docker-compose.override.yml under $(sudo supervisely where):
    services:
      http-storage:
        volumes:
          - <path to the secret file>:/gcs.json:ro
Execute sudo supervisely up -d to apply the new settings

Migration from local storage

Now, copy your current storage to S3. As we mentioned before, because we maintain the same structure in the local filesystem, copying the files is enough.
We suggest using minio/mc to copy the files.
Run the minio/mc docker image and execute the following commands:
    mc config host add s3 https://s3.amazonaws.com <YOUR-ACCESS-KEY> <YOUR-SECRET-KEY>
    mc cp --recursive <DATA_STORAGE_FROM_HOST>/<your-buckets-prefix>-public s3/<your-buckets-prefix>-public/
    mc cp --recursive <DATA_STORAGE_FROM_HOST>/<your-buckets-prefix>-private s3/<your-buckets-prefix>-private/
Finally, restart the services to apply the new configuration: sudo supervisely up -d.

Keys from IAM Role

If you want to use an IAM Role, specify STORAGE_IAM_ROLE=<role_name> in the .env file; the STORAGE_ACCESS_KEY and STORAGE_SECRET_KEY variables can then be omitted.
IAM Roles are only supported for AWS S3.
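As a rough sketch, the relevant part of .env could then look like this (the role name is a placeholder):
    STORAGE_PROVIDER=minio
    STORAGE_ENDPOINT=s3.amazonaws.com
    STORAGE_PORT=443
    STORAGE_IAM_ROLE=<role_name>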

Frontend caching

Since AWS and Azure can be quite pricey under heavy reads, we enable image caching by default.
If an image is not in the previews cache but is in the storage cache, the preview will be generated and put into the previews cache without fetching the image from the remote server.
Here are the default values (you can alter them via the docker-compose.override.yml file):
    services:
      proxy:
        environment:
          CACHE_PREVIEWS_SIZE: 1g
          CACHE_PREVIEWS_EXPIRES: 12h
          CACHE_STORAGE_SIZE: 10g
          CACHE_STORAGE_EXPIRES: 7d
          CACHE_IMAGE_CONVERTER_SIZE: 10g
          CACHE_IMAGE_CONVERTER_EXPIRES: 7d

Links plugin cloud providers support

If you already have some files on Amazon S3/Google Cloud Storage/Azure Storage and you don't want to upload and store those files in Supervisely, you can use the "Links" plugin to link the files to the Supervisely server.
Instead of uploading actual files (e.g. images), you will need to upload .txt file(s) that contain a list of URLs to your files. If your URLs are publicly available (i.e. a link looks like https://s3-us-west-2.amazonaws.com/test1/abc and you can open it directly in your web browser), then you can stop reading and start uploading.
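For example, such a .txt file could simply contain one public URL per line (the URLs below are placeholders):
    https://s3-us-west-2.amazonaws.com/test1/abc.jpg
    https://s3-us-west-2.amazonaws.com/test1/def.jpg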
If your files are protected, however, you will need to provide credentials in the instance settings or manually create a configuration file.
If you are brave enough, you can create the configuration file manually.
Example configuration file:
    # amazon s3 example
    my-car-datasets:
      provider: minio
      endpoint: s3.amazonaws.com
      access_key: <your access key>
      secret_key: <your secret key>
      # iam_role: <or just use your iam role>
      region: eu-central-1
      # array of buckets
      buckets:
        - cars_2020_20_10
        - cars_2020_10_10

    # azure storage example
    my-boats-datasets:
      provider: azure
      endpoint: https://<account name>.blob.core.windows.net
      access_key: <account name>
      secret_key: <secret key 88 chars long or so: aflmg+wg23fWA+6gAafWmgF4a>
      # array of buckets
      buckets:
        - boats_bucket_2020_20_10
        - another_boats_bucket_2020_10_10

    # google cloud storage example
    my-planes-datasets:
      provider: google
      endpoint: storage.googleapis.com
      credentials_path: <path to the secret file inside the container>
      # array of buckets
      buckets:
        - planes_bucket_2020_20_10
        - another_planes_bucket_2020_10_10
Links file structure:
    <provider name>://<bucket name>/<object name>
Links file example:
    s3://cars_2020_20_10/truck.jpg
    azure://boats_bucket_2020_20_10/supersonicboat.jpg
    google://another_planes_bucket_2020_10_10/boeing.jpg
Create a new file docker-compose.override.yml under $(sudo supervisely where):
    services:
      http-storage:
        volumes:
          - <path to the configuration file>:/remote_links.yml:ro
Then execute the following to apply the changes:
    sudo supervisely up -d http-storage
If you use Google Cloud Storage, you also need to mount the secret file into the container; docker-compose.override.yml example:
    services:
      http-storage:
        volumes:
          - <path to the secret file>:/secret_planes.json:ro