How to run QGreenland Core
This project uses Docker and docker-compose to run each of its components as services. See Docker's Getting started guide. The docker-compose stack runs Luigi (with its visualizer at port 8082) as a service for running tasks, as well as NGINX (ports 80 and 443) for hosting outputs.
Caution
Docker Desktop for Mac has some "gotchas". Running with "Use gRPC FUSE for file sharing" enabled is recommended; you may see indefinite hangs otherwise. Please reference the Docker documentation for more info:
https://docs.docker.com/desktop/mac/
How to configure the service stack
Development overrides
Development overrides enable:
Building the Docker image from local source instead of using a versioned Docker image
Mounting the source code into the Docker container, so the container doesn't need to be rebuilt on each change
To set up development overrides on your machine:
ln -s docker-compose.dev.yml docker-compose.override.yml
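As a quick sanity check, you can ask docker-compose to render the merged configuration; the development overrides should appear in the output:
docker-compose config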
Envvars
Some envvars are used by the source code; others are used by the docker-compose config.
Mandatory envvars
In order to download data behind Earthdata Login, you must export the following environment variables:
QGREENLAND_EARTHDATA_USERNAME
QGREENLAND_EARTHDATA_PASSWORD
Developers at NSIDC may use the values stored in Vault at the following path: nsidc/apps/qgreenland. Those outside of NSIDC must use their personal Earthdata Login credentials. New Earthdata users can register here:
https://urs.earthdata.nasa.gov/users/new
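For example, in your shell (the values below are placeholders; substitute your own Earthdata Login credentials):
export QGREENLAND_EARTHDATA_USERNAME="your_username"
export QGREENLAND_EARTHDATA_PASSWORD="your_password"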
Optional envvars
The source code looks at these envvars, if set:
QGREENLAND_ENVIRONMENT: defaults to dev
QGREENLAND_ENV_MANAGER: defaults to conda
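If you need different values, export them before running the stack or CLI. A sketch, shown here with the defaults (substitute whatever values your deployment supports):
export QGREENLAND_ENVIRONMENT=dev
export QGREENLAND_ENV_MANAGER=conda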
Optional Docker Compose envvars
Our source code expects to run in a container and has hard-coded path constants. These envvars and their defaults should eventually move into the source code; for now, they configure the compose stack to map directories on the host to the hard-coded locations inside the container.
QGREENLAND_VERSION: the nsidc/qgreenland Docker image tag to use. Defaults to latest.
QGREENLAND_DATA_WORKING_STORAGE: defaults to ./data/working-storage
QGREENLAND_DATA_PRIVATE_ARCHIVE: defaults to ./data/private-archive
QGREENLAND_DATA_LOGS: defaults to ./data/logs
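For example, to pin a specific image tag and relocate working storage on the host before starting the stack (the tag and path below are hypothetical):
export QGREENLAND_VERSION=v2.0.0
export QGREENLAND_DATA_WORKING_STORAGE=/scratch/qgreenland/working-storage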
Visit our storage architecture reference documentation to learn more about storage locations.
How to start the service stack
Start the stack with docker-compose:
docker-compose up -d
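To verify the services came up, you can check their status and tail the logs with standard docker-compose commands (assuming the Luigi service is named luigi, per the container referenced below); once running, the Luigi visualizer should be reachable at http://localhost:8082:
docker-compose ps
docker-compose logs -f luigi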
How to run processing pipelines with the QGreenland CLI
The primary entrypoint for the CLI is ./scripts/cli.sh. This runs the CLI program inside the luigi container, allowing us to kick off pipelines or clean up data from standard locations without risking destructive actions on the user's computer.
To run the full pipeline:
./scripts/cli.sh run
To run in parallel:
./scripts/cli.sh run --workers=4
To run only the layers you care about (plus the background; useful for testing, but note that the final output will not be zipped):
./scripts/cli.sh run \
--include="background" \
--include="*my_layerid_mask*"
Collaborators outside NSIDC may want to run the QGreenland pipeline without "manual access" layers, which require difficult or impossible additional steps to prepare input data. See the Assets documentation to learn more about "manual access" assets.
./scripts/cli.sh run \
--exclude-manual-assets
Inclusion and exclusion flags can be combined arbitrarily. When --include and --exclude are used together, the final result is the union: the set of layers which are included or not excluded. This is different from the intersection: the set of layers which are included and not excluded.
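For example, under the union semantics described above, the following run (with hypothetical masks; --exclude is assumed to take a pattern like --include) selects every layer matching the include mask plus every layer not matching the exclude mask:
./scripts/cli.sh run \
    --include="*my_layerid_mask*" \
    --exclude="*some_slow_layer*"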
To clean up outputs while developing a new layer (deletes WIP and released layers matching the mask, as well as WIP and released packages; see --help for more):
./scripts/cli.sh cleanup --dev '*my_layerid_mask*'
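Before deleting anything, you can review all cleanup options and exactly what each deletes:
./scripts/cli.sh cleanup --help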
See the Luigi documentation for more information on running Luigi if you want to do anything not documented here.
How to debug a Luigi pipeline
Simply add breakpoint() anywhere in the pipeline code, then run the pipeline with 1 worker (the default) and whichever layer(s) you want to debug.
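For example, after adding breakpoint() to a layer's pipeline code, run just that layer with the default single worker so the debugger prompt is not interleaved with parallel output (the mask below is hypothetical):
./scripts/cli.sh run --include="*my_layerid_mask*"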