🐛 Mode of Interest "quality text" has errors #115
Comments
I'll plan to roll this change into my ongoing PR and watch out for other places we could be having this problem!
see discussion here: e-mission#115
@Abby-Wheelis Great catch! This is why we need better testing of the public dashboard, and a principled way to update pandas versions. We have found multiple such issues while upgrading the server code, for which we do indeed have proper tests. @MukuFlash03 @nataliejschultz for visibility
I discovered this while testing my metric vs imperial changes, but have realized it's a bug in the whole dashboard, not just my branch. Example of what I mean (from production):
The `mode_specific_timeseries` in particular is creating this issue with the quality text. It stems from the fact that pandas `groupby` is now creating ALL combinations, not just those with nonzero values, but then in scaffolding it counts the length of the dataframe rather than summing the trip counts (see the sketch after the example below). For example:
a dataset has 30 labeled trips from 5 users across 10 days and there are 3 possible modes
when we call `groupby`, every possible combination is generated = 5 * 10 * 3 = 150; even narrowing to 1 mode still leaves 50 rows, so when we compare 50 with 30 we're getting > 100%
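A minimal pandas sketch of what seems to be happening (invented data and column names, not the dashboard's actual schema): when the grouping keys behave like categoricals, `groupby` can emit every user/day/mode combination, so the length of the result no longer matches the number of real trips.

```python
import pandas as pd

# Invented example data: 5 users x 10 days x 3 modes, but only 30 labeled trips.
trips = pd.DataFrame({
    "user_id": [f"user_{i % 5}" for i in range(30)],
    "date": [f"2023-01-{(i % 10) + 1:02d}" for i in range(30)],
    "mode_confirm": ["walk"] * 10 + ["bike"] * 10 + ["drove_alone"] * 10,
})

# With categorical keys and observed=False, groupby emits every combination,
# including the zero-count ones.
for col in ["user_id", "date", "mode_confirm"]:
    trips[col] = trips[col].astype("category")

counts = (
    trips.groupby(["user_id", "date", "mode_confirm"], observed=False)
         .size()
         .reset_index(name="trip_count")
)

print(len(counts))                   # 150 rows: 5 * 10 * 3 combinations
print(counts["trip_count"].sum())    # 30: the actual number of labeled trips

walk = counts[counts["mode_confirm"] == "walk"]
print(len(walk))                     # 50 rows, even though only 10 walk trips exist
print(walk["trip_count"].sum())      # 10
```

Taking `len(walk)` here as the mode-of-interest trip count is exactly the 50 vs. 30 comparison described above.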
One way to help with this is to drop the day/user/mode combinations that have a trip count of 0, but I'm not sure that fixes the whole issue.
A more principled fix would be to get the total mode of interest count by adding up the per-row trip counts rather than taking the length of the dataframe; this seems like it would be more accurate. I'm not sure how we got here, but it might have to do with pandas updates to the way that `groupby` works, indicated by these warnings: `FutureWarning: Not prepending group keys to the result index of transform-like apply. In the future, the group keys will be included in the index, regardless of whether the applied function returns a like-indexed object.`
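A hedged sketch of both of the fixes mentioned above (column names are assumptions; `counts` stands in for the grouped output in scaffolding): dropping the zero-count combinations, and, more robustly, summing `trip_count` instead of taking `len()`.

```python
import pandas as pd

# 'counts' stands in for the grouped dataframe in scaffolding: 150 combination
# rows but only 30 real trips (assumed column names, not the real schema).
counts = pd.DataFrame({
    "mode_confirm": ["walk"] * 50 + ["bike"] * 50 + ["drove_alone"] * 50,
    "trip_count": ([0] * 40 + [1] * 10) * 3,
})

# Option 1: drop the user/day/mode combinations with a trip count of 0.
nonzero = counts[counts["trip_count"] > 0]
print(len(nonzero))                       # 30 rows survive here...
# ...but len() only matches the trip total when every count is 1, which is why
# Option 2 is the safer fix.

# Option 2 (more principled): sum trip_count instead of using len().
walk = counts[counts["mode_confirm"] == "walk"]
mode_total = walk["trip_count"].sum()     # 10, whereas len(walk) == 50
grand_total = counts["trip_count"].sum()  # 30
print(f"{mode_total / grand_total:.0%} of labeled trips are the mode of interest")
```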