Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating an incremental table for vendor vehicle message age by day. #3477

Merged
merged 3 commits into from
Oct 2, 2024

Conversation

fsalemi
Copy link
Contributor

@fsalemi fsalemi commented Sep 25, 2024

Creating a new incremental table for vendor vehicle position message age by day.

Resolves issue #3478

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Yes,
poetry run dbt run -s +fct_daily_vendor_vehicle_positions_message_age_summary

20:17:15 Finished running 12 view models, 11 table models, 1 incremental model in 0 hours 4 minutes and 39.96 seconds (279.96s). 20:17:15 20:17:15 Completed successfully 20:17:15 20:17:15 Done. PASS=24 WARN=0 ERROR=0 SKIP=0 TOTAL=24

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

Copy link

github-actions bot commented Sep 25, 2024

Warehouse report 📦

Checks/potential follow-ups

Checks indicate the following action items may be necessary.

  • For new models, do they all have a surrogate primary key that is tested to be not-null and unique?

New models 🌱

calitp_warehouse.mart.gtfs_quality.fct_daily_vendor_vehicle_positions_message_age_summary

DAG

Legend (in order of precedence)

Resource type Indicator Resolution
Large table-materialized model Orange Make the model incremental
Large model without partitioning or clustering Orange Add partitioning and/or clustering
View with more than one child Yellow Materialize as a table or incremental
Incremental Light green
Table Green
View White

@vevetron
Copy link
Contributor

I'm helping @fsalemi install pre-commit hooks into his environment.

INNER JOIN {{ ref('dim_gtfs_service_data') }} AS GSD
ON VPM.gtfs_dataset_key = GSD.gtfs_dataset_key
WHERE {{ incremental_where(default_start_var='PROD_GTFS_RT_START') }}
AND customer_facing IS TRUE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to limit this table to just customer facing datasets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks or the hint. I deleted customer facing check.

Comment on lines 28 to 36
bridge_organization_x_gtfs_dataset AS (
SELECT
organization_name,
gtfs_dataset_name,
MIN(_valid_from) AS valid_from,
MAX(_valid_to) AS valid_to
FROM {{ ref('bridge_organizations_x_gtfs_datasets_produced') }}
WHERE
(gtfs_dataset_name LIKE '%VehiclePositions%'
OR gtfs_dataset_name LIKE '%Vehicle Positions%'
OR gtfs_dataset_name LIKE '%VehiclePosition%'
OR gtfs_dataset_name LIKE '%Vehicle Position%')
GROUP BY
organization_name, gtfs_dataset_name
),

vendor_vehicle_positions_ages AS (
SELECT DISTINCT
dt,
organization_name,
_header_message_age,
_vehicle_message_age,
_vehicle_message_age_vs_header
FROM vehicle_positions_ages AS VPA
INNER JOIN bridge_organization_x_gtfs_dataset AS BOGD
ON VPA.gtfs_dataset_name = BOGD.gtfs_dataset_name
AND dt BETWEEN DATE(valid_from) AND DATE(valid_to)
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we may need to select into the new bridge_organization_x_gtfs_dataset from the bridge_organizations_x_gtfs_datasets_produced. Instead, it seems like the vendor_vehicle_positions_ages could join directly from bridge_organizations_x_gtfs_datasets_produced. Also instead of matching on dataset name, we should be matching on gtfs_dataset_key.

Copy link
Contributor Author

@fsalemi fsalemi Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint. I used gtfs_dataset_key instead.

@fsalemi
Copy link
Contributor Author

fsalemi commented Oct 2, 2024

@evansiroky
Although joining directly with bridge_organizations_x_gtfs_datasets_produced also works, but aggregating it would save some processing time since there are a lot of consecutive records for every vendor-to-agency association by date in this table.
I incorporated your requested changes anyways.

Copy link
Member

@evansiroky evansiroky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good for a first pass. I think eventually we may also want to include the trip_id and the organization key. That could be done in a later effort.

Bringing in the organization key would help do more joins to the vendor's record that may be present elsewhere. The trip_id could help analyze things related to whether the vehicle appears to be doing revenue service.

@fsalemi fsalemi merged commit a601622 into main Oct 2, 2024
4 checks passed
@fsalemi fsalemi deleted the vendor_vehicle_positions branch October 2, 2024 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants