Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup post #2489 #2804

Merged
merged 15 commits into from
Jul 17, 2023
Merged

Cleanup post #2489 #2804

merged 15 commits into from
Jul 17, 2023

Conversation

lauriemerrell
Copy link
Contributor

@lauriemerrell lauriemerrell commented Jul 14, 2023

Description

Describe your changes and why you're making them. Please include the context, motivation, and relevant dependencies.

Resolves a whole bunch of issues that popped up after #2489 merged, primarily because of the larger time period of data covered in the full refresh compared to test data.

Specifically, this PR:

  • Uses fallback to fct_daily_schedule_feeds for RT/schedule mapping for cases where the GTFS schedule URL changes and RT data is scraped before the GTFS schedule download occurs. This was indirectly causing CAL-ITP-DATA-INFRA-268M, CAL-ITP-DATA-INFRA-268R, and CAL-ITP-DATA-INFRA-2685 because of insufficient handling for multiple schedule URLs on the same dt/service date
  • Prepares to add sampling to the testing on the intermediate tables here because these tables are large. Adjusts the sample to look at the 2AM UTC hour where bugs were occurring here; will only actually implement this once tests pass on whole tables in prod first (follow-up PR).
  • Adds schedule_base64_url to the intermediate table keys; this may help with some of the duplicates that were observed (cases where a key was duplicated because the same trip occurred across two schedule URL versions)
  • Adds forward-dating to the URL/Airtable mapper to handle gaps where RT data is downloaded from a URL that was deleted from Airtable in the gap before the new Airtable config is picked up. Uses this mapper to get Airtable records instead of joining directly.
  • Deletes fct_daily_trip_update_status_counts model that was erroring and (per Slack) is ok to delete

Resolves #2802
Resolves #2803

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

Include commands/logs/screenshots as relevant.

Ran all affected tables and tested them and downstream models.

Doing an unqualified run & test right now to confirm none of the fixed issues are present

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)
  • Full refresh +fct_trip_updates_messages+ +fct_vehicle_positions_messages+ +fct_service_alerts_messages+ to get the fixed keying
  • Follow-up PR to uncomment sampled testing on the intermediate models once they are passing

@lauriemerrell lauriemerrell marked this pull request as ready for review July 17, 2023 21:49
@lauriemerrell lauriemerrell changed the title Cleanup post 2489 Cleanup post #2489 Jul 17, 2023
Copy link
Contributor

@atvaccaro atvaccaro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline

@lauriemerrell lauriemerrell self-assigned this Jul 17, 2023
@lauriemerrell lauriemerrell added do-not-merge Do not merge, even if approved and removed do-not-merge Do not merge, even if approved labels Jul 17, 2023
@lauriemerrell
Copy link
Contributor Author

dbt run / test (unqualified) passed with no new failures (all failures predate #2489). Merging and kicking off full refresh

@lauriemerrell lauriemerrell merged commit 1901bf0 into main Jul 17, 2023
4 of 5 checks passed
@lauriemerrell lauriemerrell deleted the cleanup-post-2489 branch July 17, 2023 23:23
@lauriemerrell lauriemerrell mentioned this pull request Jul 18, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants