Encounters pipeline

This repository contains the encounters pipeline, which finds vessel encounters based on AIS messages.

Running

Dependencies

You just need Docker and docker-compose on your machine to run the pipeline; no other dependencies are required.

Setup

The pipeline reads its input from BigQuery, so you first need to authenticate with your Google Cloud account inside the Docker containers. To do that, run the following command and follow the instructions:

TODO: a global GCP volume is used now; add the correct instructions for that.

    docker-compose run gcloud auth application-default login

Overview

The pipeline takes a start_date and an end_date. It pads start_date back by one day to warm up, reads the data from source_table, and computes encounters over the specified window. In incremental mode, start_date and end_date are the same date. The resulting raw encounters are appended to the table given by raw_table. A second pipeline is then run over this raw table: it merges encounters that are close in time into one long encounter and replaces the table given by sink_table with the merged results.
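The date handling described above can be sketched as follows. This is a minimal illustration, not the pipeline's code: read_window and days are hypothetical helpers, and the sketch assumes the one-day pad only moves the first date read back by one day, while encounters are still reported for start_date through end_date.

```python
from datetime import date, timedelta

# Assumption: the warm-up pad affects only the input that is read.
WARMUP = timedelta(days=1)


def read_window(start_date: date, end_date: date) -> tuple[date, date]:
    """Return the (first, last) dates of input read for a run."""
    if end_date < start_date:
        raise ValueError("end_date must not be before start_date")
    return start_date - WARMUP, end_date


def days(first: date, last: date) -> list[date]:
    """Every date from first to last, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


# Incremental mode: start_date == end_date, so one day is computed
# but two days of messages are read.
first, last = read_window(date(2018, 1, 2), date(2018, 1, 2))
print(days(first, last))  # [datetime.date(2018, 1, 1), datetime.date(2018, 1, 2)]
```

Under this assumption, an incremental run reads two days of input to produce one day of raw encounters.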

CLI

The pipeline includes a CLI that can be used to start both local test runs and remote full runs. Just run docker-compose run pipeline --help and follow the instructions there.

Examples:

In incremental mode, the command takes the following form:

    docker-compose run create_raw_encounters \
            --source_table SOURCE_TABLE \
            --start_date DATE \
            --end_date DATE \
            --max_encounter_dist_km DISTANCE \
            --min_encounter_time_minutes TIME \
            --raw_table RAW_TABLE \
            --project world-fishing-827 \
            --temp_location gs://world-fishing-827-dev-ttl30d/scratch/encounters \
            --job_name encounters-pip \
            --max_num_workers 200 \
            --setup_file ./setup.py \
            --requirements_file requirements.txt \
            --runner DataflowRunner \
            --disk_size_gb 100 \
            --region us-central1

The raw encounters are then merged together, removing duplicates and merging across day boundaries:

    docker-compose run merge_encounters \
            --raw_table RAW_TABLE \
            --vessel_id_table SEGMENT_TABLE \
            --sink_table MERGED_TABLE \
            --max_encounter_dist_km 0.5 \
            --min_encounter_time_minutes 120 \
            --start_date 2018-01-01 \
            --end_date 2018-12-31 \
            --project world-fishing-827 \
            --temp_location gs://world-fishing-827-dev-ttl30d/scratch/encounters \
            --job_name encounters-merge-test \
            --max_num_workers 50 \
            --setup_file ./setup.py \
            --requirements_file requirements.txt \
            --runner DataflowRunner \
            --disk_size_gb 100 \
            --region us-central1
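The merge step can be pictured roughly as follows. This is a hypothetical sketch, not the pipeline's implementation: the Encounter fields, the max_gap threshold, and the use of a minimum duration (in the spirit of min_encounter_time_minutes) as a post-merge filter are all assumptions, and the real merge may also take distance (max_encounter_dist_km) into account. Exact duplicates are dropped, and encounters between the same pair of vessels are joined when the gap between them is small, which is what stitches together an encounter split at a day boundary.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from itertools import groupby


@dataclass(frozen=True)
class Encounter:
    # Hypothetical record layout; not the pipeline's actual schema.
    vessel_1: str
    vessel_2: str
    start_time: datetime
    end_time: datetime


def pair_key(e: Encounter) -> tuple[str, ...]:
    # (A, B) and (B, A) describe the same pair of vessels.
    return tuple(sorted((e.vessel_1, e.vessel_2)))


def merge_encounters(raw: list[Encounter], max_gap: timedelta,
                     min_duration: timedelta) -> list[Encounter]:
    # set() drops exact duplicates (e.g. the same encounter written twice).
    unique = sorted(set(raw), key=lambda e: (pair_key(e), e.start_time))
    merged = []
    for _, group in groupby(unique, key=pair_key):
        current = None
        for e in group:
            if current is not None and e.start_time - current.end_time <= max_gap:
                # Close in time (or overlapping): extend the current encounter.
                current = replace(current, end_time=max(current.end_time, e.end_time))
            else:
                if current is not None:
                    merged.append(current)
                current = e
        merged.append(current)
    # Hypothetical: drop merged encounters shorter than the minimum duration.
    return [e for e in merged if e.end_time - e.start_time >= min_duration]


t0 = datetime(2018, 1, 1, 23, 0)
raw = [
    Encounter("A", "B", t0, t0 + timedelta(hours=1)),  # ends at midnight
    Encounter("B", "A", t0 + timedelta(hours=1, minutes=30),
              t0 + timedelta(hours=3)),                # resumes the next day
    Encounter("A", "B", t0, t0 + timedelta(hours=1)),  # exact duplicate
]
result = merge_encounters(raw, max_gap=timedelta(hours=1),
                          min_duration=timedelta(minutes=120))
print(len(result), result[0].end_time)  # 1 2018-01-02 02:00:00
```

Each half is only one or one and a half hours long, so neither would pass a 120-minute minimum on its own; only after merging does the three-hour encounter survive.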

Currently, raw encounters are created based on segment id, since it is a stable (static) identifier. During the merge process, encounters are merged using vessel id, which does a better job of stitching segments together but is not stable. This is feasible because the merge process runs later in the pipeline and is rerun across all time every day.
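A minimal sketch of the id switch described above, assuming the merge step has a lookup from segment id to vessel id built from vessel_id_table. The RawEncounter type, the to_vessel_ids function, and the choice to keep unmapped segment ids unchanged are all hypothetical. Once two segments map to the same vessel id, their encounters share a vessel pair and become candidates for merging.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class RawEncounter:
    # Hypothetical layout: raw encounters identify vessels by segment id.
    id_1: str
    id_2: str
    start_time: datetime
    end_time: datetime


def to_vessel_ids(raw: list[RawEncounter],
                  seg_to_vessel: dict[str, str]) -> list[RawEncounter]:
    """Swap segment ids for vessel ids; unmapped ids are kept unchanged (assumption)."""
    return [
        replace(e,
                id_1=seg_to_vessel.get(e.id_1, e.id_1),
                id_2=seg_to_vessel.get(e.id_2, e.id_2))
        for e in raw
    ]


# Two segments (seg-1a, seg-1b) that the lookup attributes to the same vessel.
seg_to_vessel = {"seg-1a": "v1", "seg-1b": "v1", "seg-2": "v2"}
raw = [
    RawEncounter("seg-1a", "seg-2", datetime(2018, 1, 1, 22), datetime(2018, 1, 1, 23)),
    RawEncounter("seg-1b", "seg-2", datetime(2018, 1, 1, 23, 30), datetime(2018, 1, 2, 1)),
]
relabeled = to_vessel_ids(raw, seg_to_vessel)
print({(e.id_1, e.id_2) for e in relabeled})  # {('v1', 'v2')}
```

Because this mapping can change from run to run, only the merged output depends on it; the raw shards stay keyed by the stable segment ids.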

Note that raw_table needs to be persistent, since it is date-sharded and new dates are added with each run. For example, the following commands create and then merge raw encounters for January 2018:

    docker-compose run create_raw_encounters \
            --source_table pipe_production_v20201001.position_messages_ \
            --start_date 2018-01-01 \
            --end_date 2018-01-31 \
            --max_encounter_dist_km 0.5 \
            --min_encounter_time_minutes 60 \
            --raw_table world-fishing-827:machine_learning_dev_ttl_120d.raw_encounters_test_ \
            --project world-fishing-827 \
            --temp_location gs://world-fishing-827-dev-ttl30d/scratch/encounters \
            --job_name encounters-pip \
            --max_num_workers 100 \
            --setup_file ./setup.py \
            --requirements_file requirements.txt \
            --runner DataflowRunner \
            --disk_size_gb 100 \
            --region us-central1


    docker-compose run merge_encounters \
            --raw_table machine_learning_dev_ttl_120d.raw_encounters_test_ \
            --vessel_id_table pipe_production_v20201001.segment_info \
            --sink_table world-fishing-827:machine_learning_dev_ttl_120d.encounters_test_v20210718 \
            --spatial_measures_table world-fishing-827.pipe_static.spatial_measures_20200311 \
            --min_encounter_time_minutes 120 \
            --start_date 2018-01-01 \
            --end_date 2018-01-31 \
            --project world-fishing-827 \
            --temp_location gs://world-fishing-827-dev-ttl30d/scratch/encounters \
            --job_name encounters-merge-test \
            --max_num_workers 50 \
            --setup_file ./setup.py \
            --requirements_file requirements.txt \
            --runner DataflowRunner \
            --disk_size_gb 100 \
            --region us-central1
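Since raw_table is date-sharded, the raw_table value in the example above is a prefix ending in an underscore. Assuming the shards use BigQuery's usual YYYYMMDD date suffix (an assumption; the suffix format is not documented here), the January 2018 run would cover the shards listed by the hypothetical helper shard_names below.

```python
from datetime import date, timedelta


def shard_names(table_prefix: str, start_date: date, end_date: date) -> list[str]:
    """Daily shard names, assuming a YYYYMMDD suffix (BigQuery date-sharding convention)."""
    n_days = (end_date - start_date).days + 1
    return [
        f"{table_prefix}{(start_date + timedelta(days=i)):%Y%m%d}"
        for i in range(n_days)
    ]


shards = shard_names("machine_learning_dev_ttl_120d.raw_encounters_test_",
                     date(2018, 1, 1), date(2018, 1, 31))
print(len(shards), shards[0])
# 31 machine_learning_dev_ttl_120d.raw_encounters_test_20180101
```

Each create run adds shards for its own dates, which is why the prefix must point at a persistent dataset rather than a short-lived one.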

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
