
Use groupby in for loop instead of filter in generate_balance_sample #33

Open
crispy-wonton opened this issue Jun 13, 2024 · 0 comments

Collaborator

crispy-wonton commented Jun 13, 2024

Grouping by LSOA is more efficient than filtering the DataFrame once per LSOA. So move the `group_by` into the weighting loop in the script (instead of filtering inside the `generate_balance_sample` function), and test it on the EPC sample subset.

Original comment:
I'm not sure if it's worth checking and changing (I don't think it is at this stage), but FYI: I found in the evaluation that looping over a groupby was more efficient than a loop that subsets the df per LSOA on each iteration, i.e.:

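```python
from tqdm import tqdm

df_select = df.select(cols)  # the groupby was also faster with fewer columns, so worth doing this first too
# group_by(["lsoa"]) yields ((lsoa,), sub_df) pairs, so unpack the one-element key tuple
for (lsoa,), sample in tqdm(df_select.group_by(["lsoa"])):
    results = generate_balance_sample(sample, target_marginals, lsoa)
```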

Originally posted by @lizgzil in #26 (comment)

crispy-wonton mentioned this issue Jun 13, 2024