Use groupby in for loop instead of filter in `generate_balance_sample` #33

crispy-wonton · 2024-06-13T11:13:33Z

Groupby is more efficient than filtering, so move the groupby into the weighting loop in the script (instead of filtering inside the `generate_balance_sample` function) and test on an EPC sample subset.

Original comment:
I'm not sure if it's worth checking and changing (I don't think it is at this stage), but FYI that I found in the evaluation that a groupby loop was more efficient than a loop in which you subset the df per lsoa within the loop. i.e.

df_select = df.select(cols)  # the groupby was also faster with fewer columns, so worth doing this first too
for lsoa, sample in tqdm(df_select.group_by("lsoa")):
    results = generate_balance_sample(sample, target_marginals, lsoa)

Originally posted by @lizgzil in #26 (comment)
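
To illustrate why the groupby loop is likely cheaper, here is a minimal, library-agnostic sketch in plain Python (not polars, and not the project's code). The toy `rows` data and the helpers `split_by_filter` and `split_by_groupby` are hypothetical names made up for this example. It only counts how many rows each pattern touches while subsetting; real timings in polars will depend on the data and version.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the EPC sample: (lsoa, value).
rows = [("A", 1), ("B", 2), ("A", 3), ("C", 4), ("B", 5), ("A", 6)]


def split_by_filter(rows):
    """Subset the full table once per LSOA (the filter-in-loop pattern)."""
    visits = 0  # rows touched during the subsetting scans
    groups = {}
    for lsoa in sorted({r[0] for r in rows}):
        subset = []
        for r in rows:  # full scan of the table for every LSOA
            visits += 1
            if r[0] == lsoa:
                subset.append(r)
        groups[lsoa] = subset
    return groups, visits


def split_by_groupby(rows):
    """Partition the table in a single pass (the group-by pattern)."""
    visits = 0  # rows touched during the single partitioning scan
    groups = defaultdict(list)
    for r in rows:  # one scan in total
        visits += 1
        groups[r[0]].append(r)
    return dict(groups), visits


filter_groups, filter_visits = split_by_filter(rows)
groupby_groups, groupby_visits = split_by_groupby(rows)
```

Both helpers produce the same per-LSOA subsets, but the filter pattern rescans the table once per LSOA, so its work grows with rows × number of LSOAs, while the groupby pattern scans once.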


crispy-wonton mentioned this issue Jun 13, 2024

12 reweight epc #26

Merged

10 tasks
