
68 Add grid capacity feature #71

Open · wants to merge 2 commits into dev
Conversation

@shabrf (Collaborator) commented on Oct 2, 2024


Description

This pull request introduces a new feature to calculate the grid capacity for heat pump installations per LSOA. The main changes include:

  1. Implementation of data processing functions for each Distribution Network Operator (DNO): ENW, NPg, SPEN, SSEN, UKPN, and WPD.
  2. Creation of a unified substation dataset combining data from all DNOs (a toy sketch of this step appears below the description).
  3. Development of functions to distribute substation headroom to LSOAs and calculate headroom per household.
  4. Implementation of a heat pump suitability assessment function that calculates the percentage of households in each LSOA that could potentially install heat pumps based on available grid capacity.
  5. Integration of all components into a main calculate_grid_capacity() function.

Fixes #68
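
For orientation, a toy sketch of points 1 and 2 above. The stub getters, column names, and values below are invented purely for illustration; the real generate_<dno>_df() functions return GeoDataFrames read from S3.

    import pandas as pd

    # Invented stand-ins for the real generate_<dno>_df() getters: each one
    # normalises its operator's column layout and tags rows with "operator",
    # then everything is concatenated into a single substation dataset.
    def _toy_enw_df() -> pd.DataFrame:
        raw = pd.DataFrame({"Substation": ["ENW-001"], "Headroom (MVA)": [1.2]})
        return raw.rename(
            columns={"Substation": "id", "Headroom (MVA)": "headroom_mva"}
        ).assign(operator="ENW")

    def _toy_ssen_df() -> pd.DataFrame:
        raw = pd.DataFrame({"Primary": ["SSEN-042"], "headroom": [0.8]})
        return raw.rename(
            columns={"Primary": "id", "headroom": "headroom_mva"}
        ).assign(operator="SSEN")

    substations = pd.concat([_toy_enw_df(), _toy_ssen_df()], ignore_index=True)
    print(substations)  # one row per substation, with a shared schema across DNOs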

Instructions for Reviewer

To test the code in this PR, you need to:

  1. Run the main script containing the calculate_grid_capacity() function.
  2. Check the output DataFrame for correct column names and data types, especially the 'heatpump_installation_percentage' column.
  3. Check that this output is incorporated correctly into the wider pipeline.

Please pay special attention to:

  1. The data processing functions for each DNO (ENW, NPg, SPEN, SSEN, UKPN, WPD) to ensure they correctly handle the specific data formats and structures for each operator.
  2. The distribute_substation_headroom() function, which allocates substation capacity to LSOAs based on household count. Verify that this allocation is performed correctly and fairly.
  3. The assess_heatpump_suitability() function, particularly the calculation of 'max_heatpumps' and 'heatpump_installation_percentage'. Ensure these calculations are accurate and properly account for the power requirements of heat pumps. (A toy worked example of points 2 and 3 follows this list.)
  4. The main calculate_grid_capacity() function to confirm it correctly integrates all components and produces a coherent final dataset.
  5. Error handling and edge cases, such as LSOAs with very low household counts or unusually high/low headroom values.
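
A toy worked example of points 2 and 3, using invented numbers and assumed column names (the real code uses a spatial join between LSOA boundaries and substation geometries, so the details will differ):

    import pandas as pd

    # One substation (S1) serving two LSOAs, plus a second substation (S2)
    # serving only the second LSOA. Headroom and household counts are invented.
    joined = pd.DataFrame({
        "substation_id": ["S1", "S1", "S2"],
        "lsoa": ["E01000001", "E01000002", "E01000002"],
        "household_count": [600, 400, 400],
        "headroom_mva": [2.4, 2.4, 1.0],
    })

    # Point 2: split each substation's headroom across its LSOAs in proportion
    # to household count, then sum the shares each LSOA receives.
    total = joined.groupby("substation_id")["household_count"].transform("sum")
    joined["load_fraction"] = joined["household_count"] / total
    joined["lsoa_headroom_mva"] = joined["headroom_mva"] * joined["load_fraction"]
    lsoa = joined.groupby("lsoa").agg(
        headroom_mva=("lsoa_headroom_mva", "sum"),
        household_count=("household_count", "first"),
    )

    # Point 3: how many heat pumps the allocated headroom could supply, as a
    # percentage of households, capped at 100%. The per-heat-pump power figure
    # is an assumption for illustration only.
    power_per_heatpump_mva = 0.004
    lsoa["max_heatpumps"] = lsoa["headroom_mva"] / power_per_heatpump_mva
    lsoa["heatpump_installation_percentage"] = (
        100 * lsoa["max_heatpumps"] / lsoa["household_count"]
    ).clip(upper=100)
    print(lsoa)  # E01000001 -> 60%, E01000002 -> capped at 100%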

The expected output should be a DataFrame containing LSOA-level data, including the percentage of households that could install heat pumps in each LSOA. This percentage should never exceed 100% and should accurately reflect the relationship between available grid capacity and household count.
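
A quick reviewer-side sanity check along these lines (the helper function and the float-dtype expectation are my assumptions, not part of the PR):

    import pandas as pd

    def check_output(result: pd.DataFrame) -> None:
        """Reviewer-side sanity checks on the calculate_grid_capacity() output."""
        pct = result["heatpump_installation_percentage"]
        assert pd.api.types.is_float_dtype(pct), "expected a float column"
        assert pct.dropna().between(0, 100).all(), "percentages outside [0, 100]"

    # Example with a stand-in frame; in practice pass the real output instead.
    check_output(pd.DataFrame({"heatpump_installation_percentage": [60.0, 100.0, None]}))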

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

@shabrf marked this pull request as ready for review on October 4, 2024 at 15:43
@shabrf requested a review from danlewis85 on October 7, 2024 at 16:00
Comment on lines +128 to +133
elif fnmatch(file_name, "*.gpkg"):
with BytesIO(obj.get()["Body"].read()) as file:
return gpd.read_file(file)
elif fnmatch(file_name, "*.geojson"):
with BytesIO(obj.get()["Body"].read()) as file:
return gpd.read_file(file)
Nice, thanks for adding this :)

@lizgzil (Collaborator) left a comment

Thanks so much @shabrf 🎉 This is great - so thorough and easy to review :)

I made a few comments. My suggestion of moving some of the functions to other places in the repo is a nice-to-have - if you don't think you'll have time to do it then don't worry, we can create an issue and address it later.

I noticed a potential bug around duplicate substation data - no idea how significant it is or whether it is to be expected.

I've tagged @crispy-wonton and @danlewis85 in a few specific places to check too.

Thanks so much!

Comment on lines +300 to +302
pl.when(
pl.col("heatpump_installation_percentage") > grid_installation_threshold
)

This is fine for now, but just adding a thought: I wonder whether the score added should just be the continuous value pl.col("heatpump_installation_percentage") / 100, rather than a score based on whether this value is greater than 30% or not.

Can't work out whether that would make compute_df_max_score_per_row a bit hellish or not! e.g. would lines 359-362 be

pl.when(pl.col("heatpump_installation_percentage").is_not_null())
        .then(1)
        .otherwise(0)
        .alias("grid_capacity_max"),

or .then(pl.col("heatpump_installation_percentage").max() / 100) (which might be 1 anyway)
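
For illustration, a self-contained sketch of the continuous-score idea (the output column name and the /100 scaling are assumptions):

    import polars as pl

    df = pl.DataFrame({"heatpump_installation_percentage": [12.0, 45.0, None]})
    df = df.with_columns(
        pl.when(pl.col("heatpump_installation_percentage").is_not_null())
        .then(pl.col("heatpump_installation_percentage") / 100)  # continuous score in [0, 1]
        .otherwise(None)
        .alias("grid_capacity"),
    )
    print(df)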

SUBSTATION_SCALING_FACTOR = 1.0 # Factor to scale substation power rating (MVA) by


def _parse_binary_geometry(

this function could live in utils/geo_utils.py

return np.nan


def process_shapefile(shapefile_directory: str) -> gpd.GeoDataFrame:

this might be better in getters/s3_getters.py

shutil.rmtree(temp_dir)


def generate_enw_df() -> gpd.GeoDataFrame:

I think all these substation generate functions could live in a new file in the getters/ folder. That way, all our data-reading code (and moderate processing) sits in the same place. Also, the hardcoded S3 locations that you read from should all be in the config/base.yaml file, with a description of each (sorry, it's a pain) in config/README.md. I think that's right, but @crispy-wonton knows more about the structure!

This is also a nice-to-have and could be addressed in another PR if you don't have time.

ssen_shape, left_on="Primary Substation Name", right_on="Primary"
)

ssen_df["operator"] = "UKPN"

should this be "SSEN"?

Comment on lines +718 to +722
lsoa_boundaries = load_s3_data(
"asf-heat-pump-suitability",
"source_data/Output_Areas_2021_EW_BGC_V2_4299916833741807639.geojson",
)
lsoa_gdf = aggregate_oa_to_lsoa(lsoa_boundaries)

would be useful in the getters/get_datasets.py file


@crispy-wonton is this file consistent with other LSOA geometries used?

).to_pandas()

# Process substation and LSOA data
headroom_df = process_substation_lsoa_data(substations, lsoa_gdf, households_df)

I'm not sure how much of an issue this is, but len(substations) = 4225 while substations['id'].nunique() = 4115. When I look at some of the duplicated IDs, sometimes it's multiple duplicates of the entire row and sometimes there are lots of Nones.
(Screenshot of the duplicated substation rows omitted.)

So it might be worth deleting duplicates? I think this affects the

joined = gpd.sjoin(
        lsoa_with_households, substations_df, how="inner", predicate="intersects"
    )

line in the distribute_substation_headroom function.
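
A minimal sketch of the de-duplication being suggested here, with toy data standing in for the combined substations frame:

    import pandas as pd

    substations = pd.DataFrame({
        "id": ["A1", "A1", "B2", None, None],
        "headroom_mva": [1.0, 1.0, 2.0, 0.5, 0.7],
    })
    # Exact duplicate rows can be dropped outright; rows that merely share an id
    # (e.g. the None ids here) need a closer look before keying on "id" alone.
    deduped = substations.drop_duplicates()
    print(len(substations), "->", len(deduped))  # 5 -> 4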


do you think this is an issue that needs fixing or did you expect it?

joined = joined.merge(substation_total_households, on="index_right")

# Calculate the fraction of substation's load for each LSOA
joined["load_fraction"] = joined["household_count"] / joined["total_households"]

I checked that joined.groupby('index_right')['load_fraction'].sum() all equalled 1, but noticed that joined.groupby('id')['load_fraction'].sum() didn't. Then I realised the substation ID column isn't unique, which is how I noticed the duplicates above that might be better removed?

Comment on lines +735 to +737
result = assess_heatpump_suitability(
headroom_df, consumption_per_heatpump, TAKEUP_RATE
)

It doesn't look like this argument should be TAKEUP_RATE, as the function signature says it's voltage_factor?

return calculate_headroom_per_household(lsoa_headroom, households_df)


def assess_heatpump_suitability(

I don't have a good enough sense of capacity/voltage etc. to do a good job reviewing this function! @danlewis85 @crispy-wonton may be better placed.
