# How to run QGreenland Core

This project uses Docker and `docker-compose` to run each of its components as services.
See Docker's [Getting started guide](https://docs.docker.com/get-started/).

The `docker-compose` stack runs Luigi (with its visualizer on port 8082) as a service
for running tasks, and NGINX (ports 80 and 443) for hosting outputs.

```{caution}
Docker Desktop for macOS has some "gotchas". Running with "Use gRPC FUSE for file
sharing" _enabled_ is recommended; otherwise, you may see indefinite hangs. See the
Docker documentation for more information:

https://docs.docker.com/desktop/mac/
```


## How to configure the service stack

### Development overrides

Development overrides:

* Build the Docker image from local source instead of using a versioned Docker image
* Mount the source code into the Docker container, so the container doesn't need to be
  rebuilt on each change

To set up development overrides on your machine:

```
ln -s docker-compose.dev.yml docker-compose.override.yml
```


### Envvars

Some envvars are used by the source code; others are used by the `docker-compose`
config.


#### Mandatory envvars

In order to download data behind Earthdata Login, you must `export` the
following environment variables:

* `QGREENLAND_EARTHDATA_USERNAME`
* `QGREENLAND_EARTHDATA_PASSWORD`

Developers at NSIDC may use the values stored in Vault at the following path:
`nsidc/apps/qgreenland`. Those outside of NSIDC must use their personal
Earthdata Login credentials. New users to Earthdata can register here:
https://urs.earthdata.nasa.gov/users/new
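
A quick pre-flight check can confirm that both variables are set before you start
the stack. The sketch below is a hypothetical helper (`check_earthdata_env` is not
part of the QGreenland repository) that reports any missing variable and returns a
non-zero status.

```shell
# Hypothetical helper, for illustration only: verify that the mandatory
# Earthdata Login envvars are set and non-empty.
check_earthdata_env() {
  missing=0
  if [ -z "${QGREENLAND_EARTHDATA_USERNAME:-}" ]; then
    echo "Missing required envvar: QGREENLAND_EARTHDATA_USERNAME" >&2
    missing=1
  fi
  if [ -z "${QGREENLAND_EARTHDATA_PASSWORD:-}" ]; then
    echo "Missing required envvar: QGREENLAND_EARTHDATA_PASSWORD" >&2
    missing=1
  fi
  # Return 0 only when both variables are present.
  return "$missing"
}
```

For example, after exporting both variables, `check_earthdata_env && docker compose up -d`
starts the stack only if the check passes.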


#### Optional envvars

The source code looks at these envvars, if set:

* `QGREENLAND_ENVIRONMENT`: defaults to `dev`
* `QGREENLAND_ENV_MANAGER`: defaults to `conda`


#### Optional Docker Compose envvars

Our source code expects to run in a container and uses hard-coded path constants.
These envvars configure the compose stack to map directories on the host to those
hard-coded container locations. We should eventually move these envvars and their
defaults into the source code, but for now they live in the compose configuration.

* `QGREENLAND_VERSION`: The `nsidc/qgreenland` docker image tag to use. Defaults to
  `latest`.
* `QGREENLAND_DATA_WORKING_STORAGE`: defaults to `./data/working-storage`
* `QGREENLAND_DATA_PRIVATE_ARCHIVE`: defaults to `./data/private-archive`
* `QGREENLAND_DATA_LOGS`: defaults to `./data/logs`

Visit our [storage architecture reference
documentation](../reference/architecture/storage.md) to learn more about storage
locations.
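
To see which values the stack would use on your machine, the defaults listed above
can be reproduced with shell parameter expansion. This is a minimal sketch:
`print_qgreenland_compose_env` is a hypothetical helper, and it mirrors the
documented defaults rather than reading the compose file itself. It also assumes
that an empty variable is treated the same as an unset one.

```shell
# Hypothetical helper, for illustration only: print the effective value of each
# optional Docker Compose envvar, falling back to the documented default when
# the variable is unset or empty.
print_qgreenland_compose_env() {
  echo "QGREENLAND_VERSION=${QGREENLAND_VERSION:-latest}"
  echo "QGREENLAND_DATA_WORKING_STORAGE=${QGREENLAND_DATA_WORKING_STORAGE:-./data/working-storage}"
  echo "QGREENLAND_DATA_PRIVATE_ARCHIVE=${QGREENLAND_DATA_PRIVATE_ARCHIVE:-./data/private-archive}"
  echo "QGREENLAND_DATA_LOGS=${QGREENLAND_DATA_LOGS:-./data/logs}"
}
```

To override a location, `export` the variable before starting the stack, for
example `export QGREENLAND_DATA_LOGS=/tmp/qgreenland-logs` (an arbitrary example
path).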


## How to start the service stack

Start the stack in the background (`-d`) with `docker compose`:

```
docker compose up -d
```
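
To confirm the stack came up, you can list its containers and look at recent Luigi
logs. The helper below is a hypothetical convenience wrapper, not part of the
repository, and it assumes the Compose service is named `luigi` (this page mentions
a `luigi` container, but the service name is an assumption).

```shell
# Hypothetical helper, for illustration only: summarize the state of the stack.
stack_status() {
  # List the stack's containers and their current state.
  docker compose ps
  # Show the most recent log lines from the Luigi service
  # (service name "luigi" is assumed).
  docker compose logs --tail=20 luigi
}
```

Run `stack_status` from the directory containing the compose files. If the Luigi
service is running, its visualizer should be reachable on port 8082 of the host.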


## How to run processing pipelines with the QGreenland CLI

The primary entrypoint for the CLI is `./scripts/cli.sh`. This runs the CLI
program inside the `luigi` container, allowing us to kick off pipelines or
clean up data from standard locations without risking destructive actions on the
user's computer.

To run the full pipeline:

```
./scripts/cli.sh run
```

To run in parallel:

```
./scripts/cli.sh run --workers=4
```

To run only the layers you care about, plus the background (useful for
testing; note that the final output will not be zipped):

```
./scripts/cli.sh run \
  --include="background" \
  --include="*my_layerid_mask*"
```

Collaborators outside NSIDC may want to run the QGreenland pipeline without "manual
access" layers that require difficult or impossible additional steps to prepare
input data. See the [Assets](../reference/architecture/configuration.md#assets)
documentation to learn more about "manual access" assets.

```
./scripts/cli.sh run \
  --exclude-manual-assets
```

Inclusion and exclusion flags can be combined arbitrarily. When `--include` and
`--exclude` are used together, the final result is the set of layers which are
included _or_ not excluded. This is different from the set of layers which are
included _and_ not excluded.
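
The "included _or_ not excluded" rule can be illustrated with a small shell sketch.
Everything here is hypothetical: `select_layers` is not part of the CLI, the layer
IDs are made up, shell glob matching stands in for however the CLI matches masks,
and only one include and one exclude pattern are handled for brevity.

```shell
# Illustration only: keep a layer if it matches the include pattern
# OR does not match the exclude pattern.
# Usage: select_layers INCLUDE_GLOB EXCLUDE_GLOB LAYER_ID...
select_layers() {
  include=$1
  exclude=$2
  shift 2
  for layer in "$@"; do
    keep=no
    # Explicitly included layers are always kept...
    case "$layer" in $include) keep=yes ;; esac
    # ...and so is any layer that is not excluded.
    case "$layer" in $exclude) ;; *) keep=yes ;; esac
    if [ "$keep" = yes ]; then
      echo "$layer"
    fi
  done
}

select_layers 'ice_thickness' 'ice_*' background ice_thickness ice_velocity coastlines
# Prints:
#   background
#   ice_thickness
#   coastlines
```

Here `ice_thickness` survives even though it matches the exclude pattern, because
it is explicitly included, and `background` and `coastlines` survive because they
are not excluded. Under "included _and_ not excluded", none of the four layers
would be selected.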

To clean up outputs while developing a new layer (this deletes WIP and released
layers matching the mask, as well as WIP and released packages; see `--help` for
more):

```
./scripts/cli.sh cleanup --dev '*my_layerid_mask*'
```

See the [Luigi
documentation](https://luigi.readthedocs.io/en/stable/running_luigi.html) for
more information on running Luigi if you want to do anything not documented
here.


### How to debug a Luigi pipeline

Add `breakpoint()` anywhere in the pipeline code, then run the pipeline with one
worker (the default) and whichever layer(s) you want to debug.