propagate new columns one more step: service alerts
Laurie Merrell committed Jun 16, 2023
1 parent 481309c commit 6c27670
Showing 3 changed files with 67 additions and 33 deletions.
@@ -9,15 +9,22 @@ int_gtfs_rt__service_alerts_fully_unnested AS (
gtfs_dataset_key,
dt,
hour,
_extract_ts,
header_timestamp,
base64_url,
_extract_ts,
_config_extract_ts,
_gtfs_dataset_name,
name,
schedule_gtfs_dataset_key,
schedule_base64_url,
schedule_name,
schedule_feed_key,
schedule_feed_timezone,

_header_message_age,
header_version,
header_incrementality,
header_timestamp,
id,

cause,
effect,

65 changes: 39 additions & 26 deletions warehouse/models/mart/gtfs/_mart_gtfs_fcts.yml
@@ -1009,31 +1009,28 @@ models:
# field: key
# config:
# where: '__rt_sampled__'
- name: gtfs_dataset_key
description: |
The primary key for the record in `dim_gtfs_datasets` associated with this message.
columns:
- <<: *gtfs_dataset_key
config:
where: '__rt_sampled__'
- name: dt
description: |
Date on which we scraped this message.
A date filter *must* be provided when querying this table, because of the size of the data.
- name: hour
description: |
Timestamp of the beginning of the hour in which this message was scraped,
ex. "2022-09-01T00:00:00+00".
- name: base64_url
description: |
URL-safe base64 encoding of the URL from which this message was scraped.
- name: _extract_ts
description: |
Time at which this message was scraped.
- name: _gtfs_dataset_name
description: |
String name of the GTFS dataset of which this message is a part.
This field is provided for human readability and should not be used as a join key.
- &calculated_service_date
name: calculated_service_date
description: |
Attempt to identify the `service_date` (corresponding to the related schedule feed) for trip activity referenced in a
GTFS RT feed. It uses the following fallback logic:
* If `trip_start_date` is populated, use that. This is assumed to be provided with respect to `schedule_feed_timezone`.
* Otherwise, for trip updates and vehicle positions, if `trip_update_timestamp` or `vehicle_timestamp` (respectively) are populated, convert that
to the `schedule_feed_timezone` and extract the date from that.
* Otherwise, use `header_timestamp` converted to `schedule_feed_timezone` and extract the date.
* Finally (and this generally should not happen, since `header_timestamp` should be populated), fall back to `_extract_ts` converted to `schedule_feed_timezone` and extract the date.
- *gtfs_rt_dataset_key
- *gtfs_rt_dt
- *gtfs_rt_hour
- *base64_url
- *gtfs_rt_extract_ts
- *gtfs_rt_config_extract_ts
- *gtfs_rt_name
- *gtfs_rt_schedule_dataset_key
- *gtfs_rt_schedule_dataset_name
- *gtfs_rt_schedule_feed_key
- *gtfs_rt_schedule_feed_timezone
- *_header_message_age
- name: url_text
description: |
See: https://gtfs.org/realtime/reference/#message-translation.
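The fallback order documented for `calculated_service_date` in the hunk above can be sketched in Python. This is an illustrative sketch only, not the dbt model's implementation: the function name, its parameters, and the optional `entity_timestamp` argument (standing in for `trip_update_timestamp` or `vehicle_timestamp`, which do not apply to service alerts) are assumptions, and all timestamps are assumed to be timezone-aware.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo
from typing import Optional


def calculated_service_date(
    trip_start_date: Optional[str],
    header_timestamp: Optional[datetime],
    extract_ts: datetime,
    feed_tz: tzinfo,
    entity_timestamp: Optional[datetime] = None,
) -> date:
    """Hypothetical helper mirroring the documented fallback order."""
    # 1. An explicit trip_start_date (YYYYMMDD) wins; it is assumed to be
    #    expressed in the schedule feed's timezone already.
    if trip_start_date:
        return datetime.strptime(trip_start_date, "%Y%m%d").date()
    # 2-4. Otherwise take the first populated timestamp, convert it to the
    #      schedule feed's timezone, and keep only the date.
    for ts in (entity_timestamp, header_timestamp, extract_ts):
        if ts is not None:
            return ts.astimezone(feed_tz).date()
    raise ValueError("no timestamp available to derive a service date")


# Fixed UTC-7 offset used purely for illustration.
feed_tz = timezone(timedelta(hours=-7))
header = datetime(2023, 6, 17, 2, 30, tzinfo=timezone.utc)
extract = datetime(2023, 6, 17, 8, 0, tzinfo=timezone.utc)

print(calculated_service_date("20230615", header, extract, feed_tz))  # 2023-06-15
print(calculated_service_date(None, header, extract, feed_tz))        # 2023-06-16
```

In practice the timezone would come from `schedule_feed_timezone`, for example via `zoneinfo.ZoneInfo`, rather than a fixed offset. The second call shows why the conversion matters: 02:30 UTC on June 17 is still June 16 in a UTC-7 feed.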
@@ -1111,7 +1108,23 @@ models:
description: |
`active_period_end` converted to a TIMESTAMP data type.
If `active_period_end` is null, will be midnight on January 1, 2099.
- *_header_message_age
- name: trip_start_time_interval
description: |
`trip_start_time` converted to a BigQuery INTERVAL type to allow handling for times after midnight.
See https://gtfs.org/schedule/reference/#field-types for how time strings are defined in GTFS.
*Note: If the interval is longer than 24 hours and `trip_start_date` is not populated, the interpretation for this
field becomes unclear.*
- name: trip_start_time_seconds
description: |
`trip_start_time` converted to a number of seconds after twelve hours before noon (usually midnight)
on `calculated_service_date`.
See https://gtfs.org/schedule/reference/#field-types for how time strings are defined in GTFS.
*Note: If this is larger than 86,400 (the number of seconds in one day) and `trip_start_date` is not populated, the interpretation for this
field becomes unclear.*
- name: fct_daily_service_alerts
description: |
Each row is a daily summary of a service alert.
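The two `trip_start_time` conversions described in the hunk above can be illustrated with a small Python sketch. The functions below borrow the names of the dbt macros used in this commit, but they are hypothetical Python stand-ins and make no claim about how the macros are implemented. The sketch assumes GTFS time strings of the form `HH:MM:SS` (or `H:MM:SS`), where the hour may exceed 23 for trips that run past midnight.

```python
import re
from datetime import timedelta
from typing import Optional

# GTFS times are HH:MM:SS (H:MM:SS also accepted); hours may exceed 23.
_GTFS_TIME = re.compile(r"^(\d{1,3}):([0-5]\d):([0-5]\d)$")


def gtfs_time_string_to_interval(time_string: Optional[str]) -> Optional[timedelta]:
    """Hypothetical stand-in: parse a GTFS time string into a duration."""
    if time_string is None:
        return None
    match = _GTFS_TIME.match(time_string.strip())
    if match is None:
        return None  # malformed input is treated like a null
    hours, minutes, seconds = (int(group) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def gtfs_interval_to_seconds(interval: Optional[timedelta]) -> Optional[int]:
    """Hypothetical stand-in: whole seconds elapsed since the service-day origin."""
    if interval is None:
        return None
    return int(interval.total_seconds())


late_trip = gtfs_time_string_to_interval("25:30:00")
print(gtfs_interval_to_seconds(late_trip))  # 91800, i.e. more than 86400
```

A value above 86,400 means the trip starts after midnight on the calendar day following the service date, which is why the column notes warn that the value is ambiguous when `trip_start_date` is missing.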
@@ -28,7 +28,8 @@ select_english AS (
trip_start_date,
trip_start_time,
stop_id
ORDER BY english_likelihood DESC, header_text_language ASC) AS english_rank
ORDER BY english_likelihood DESC, header_text_language ASC) AS english_rank,
{{ gtfs_time_string_to_interval('trip_start_time') }} AS trip_start_time_interval
FROM int_gtfs_rt__service_alerts_fully_unnested
QUALIFY english_rank = 1
),
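The ranking step in `select_english` (order by `english_likelihood` descending, then `header_text_language` ascending, and keep rank 1) can be sketched in Python as follows. The window function and the full partition key are cut off in this hunk, so the `key_fields` argument, the row dictionaries, and the numeric type of `english_likelihood` are all assumptions made for illustration. Null languages are treated as empty strings here to approximate nulls sorting first.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def select_english(
    rows: Iterable[Dict[str, Any]], key_fields: Sequence[str]
) -> List[Dict[str, Any]]:
    """Hypothetical sketch: keep the best-ranked translation per group."""
    best: Dict[Tuple[Any, ...], Tuple[Tuple[float, str], Dict[str, Any]]] = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        # Lower tuple sorts first: highest likelihood, then language A-Z.
        rank = (-row["english_likelihood"], row["header_text_language"] or "")
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]


alerts = [
    {"id": "a1", "header_text_language": "es", "english_likelihood": 0},
    {"id": "a1", "header_text_language": "en-US", "english_likelihood": 100},
    {"id": "a1", "header_text_language": "en", "english_likelihood": 100},
    {"id": "a2", "header_text_language": None, "english_likelihood": 50},
]
for chosen in select_english(alerts, key_fields=["id"]):
    print(chosen["id"], chosen["header_text_language"])
```

The secondary sort on language only serves to make the choice deterministic when two translations have the same likelihood.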
@@ -49,12 +50,23 @@ fct_service_alerts_messages_unnested AS (
service_alert_message_key,
gtfs_dataset_key,
dt,
-- try to figure out what the service date would be to join back with schedule: fall back from explicit to imputed
-- TODO: handle trip start time past midnight? subtract in that case?
COALESCE(
PARSE_DATE("%Y%m%d", trip_start_date),
DATE(header_timestamp, schedule_feed_timezone),
DATE(_extract_ts, schedule_feed_timezone)) AS calculated_service_date,
hour,
_extract_ts,
header_timestamp,
base64_url,
_extract_ts,
_config_extract_ts,
_gtfs_dataset_name,
name,
schedule_gtfs_dataset_key,
schedule_base64_url,
schedule_name,
schedule_feed_key,
schedule_feed_timezone,
header_timestamp,
_header_message_age,
header_version,
header_incrementality,
@@ -79,6 +91,8 @@ fct_service_alerts_messages_unnested AS (
trip_route_id,
trip_direction_id,
trip_start_time,
trip_start_time_interval,
{{ gtfs_interval_to_seconds('trip_start_time_interval') }} AS trip_start_time_seconds,
trip_start_date,
trip_schedule_relationship,
stop_id,