
68 Add grid capacity feature #71

Open · wants to merge 2 commits into dev
Conversation

@shabrf (Collaborator) commented on Oct 2, 2024


Description

This pull request introduces a new feature to calculate the grid capacity for heat pump installations per LSOA. The main changes include:

  1. Implementation of data processing functions for each Distribution Network Operator (DNO): ENW, NPg, SPEN, SSEN, UKPN, and WPD.
  2. Creation of a unified substation dataset combining data from all DNOs (a toy sketch of this step appears below the description).
  3. Development of functions to distribute substation headroom to LSOAs and calculate headroom per household.
  4. Implementation of a heat pump suitability assessment function that calculates the percentage of households in each LSOA that could potentially install heat pumps based on available grid capacity.
  5. Integration of all components into a main calculate_grid_capacity() function.

Fixes #68
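
For orientation, a toy sketch of points 1 and 2 above. The stub getters, column names, and values below are invented purely for illustration; the real generate_<dno>_df() functions return GeoDataFrames read from S3.

    import pandas as pd

    # Invented stand-ins for the real generate_<dno>_df() getters: each one
    # normalises its operator's column layout and tags rows with "operator",
    # then everything is concatenated into a single substation dataset.
    def _toy_enw_df() -> pd.DataFrame:
        raw = pd.DataFrame({"Substation": ["ENW-001"], "Headroom (MVA)": [1.2]})
        return raw.rename(
            columns={"Substation": "id", "Headroom (MVA)": "headroom_mva"}
        ).assign(operator="ENW")

    def _toy_ssen_df() -> pd.DataFrame:
        raw = pd.DataFrame({"Primary": ["SSEN-042"], "headroom": [0.8]})
        return raw.rename(
            columns={"Primary": "id", "headroom": "headroom_mva"}
        ).assign(operator="SSEN")

    substations = pd.concat([_toy_enw_df(), _toy_ssen_df()], ignore_index=True)
    print(substations)  # one row per substation, with a shared schema across DNOs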

Instructions for Reviewer

To test the code in this PR, you need to:

  1. Run the main script containing the calculate_grid_capacity() function.
  2. Check the output DataFrame for correct column names and data types, especially the 'heatpump_installation_percentage' column.
  3. Check that this output is incorporated correctly into the wider pipeline.

Please pay special attention to:

  1. The data processing functions for each DNO (ENW, NPg, SPEN, SSEN, UKPN, WPD) to ensure they correctly handle the specific data formats and structures for each operator.
  2. The distribute_substation_headroom() function, which allocates substation capacity to LSOAs based on household count. Verify that this allocation is performed correctly and fairly.
  3. The assess_heatpump_suitability() function, particularly the calculation of 'max_heatpumps' and 'heatpump_installation_percentage'. Ensure these calculations are accurate and properly account for the power requirements of heat pumps. (A toy worked example of points 2 and 3 follows this list.)
  4. The main calculate_grid_capacity() function to confirm it correctly integrates all components and produces a coherent final dataset.
  5. Error handling and edge cases, such as LSOAs with very low household counts or unusually high/low headroom values.
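
A toy worked example of points 2 and 3, using invented numbers and assumed column names (the real code uses a spatial join between LSOA boundaries and substation geometries, so the details will differ):

    import pandas as pd

    # One substation (S1) serving two LSOAs, plus a second substation (S2)
    # serving only the second LSOA. Headroom and household counts are invented.
    joined = pd.DataFrame({
        "substation_id": ["S1", "S1", "S2"],
        "lsoa": ["E01000001", "E01000002", "E01000002"],
        "household_count": [600, 400, 400],
        "headroom_mva": [2.4, 2.4, 1.0],
    })

    # Point 2: split each substation's headroom across its LSOAs in proportion
    # to household count, then sum the shares each LSOA receives.
    total = joined.groupby("substation_id")["household_count"].transform("sum")
    joined["load_fraction"] = joined["household_count"] / total
    joined["lsoa_headroom_mva"] = joined["headroom_mva"] * joined["load_fraction"]
    lsoa = joined.groupby("lsoa").agg(
        headroom_mva=("lsoa_headroom_mva", "sum"),
        household_count=("household_count", "first"),
    )

    # Point 3: how many heat pumps the allocated headroom could supply, as a
    # percentage of households, capped at 100%. The per-heat-pump power figure
    # is an assumption for illustration only.
    power_per_heatpump_mva = 0.004
    lsoa["max_heatpumps"] = lsoa["headroom_mva"] / power_per_heatpump_mva
    lsoa["heatpump_installation_percentage"] = (
        100 * lsoa["max_heatpumps"] / lsoa["household_count"]
    ).clip(upper=100)
    print(lsoa)  # E01000001 -> 60%, E01000002 -> capped at 100%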

The expected output should be a DataFrame containing LSOA-level data, including the percentage of households that could install heat pumps in each LSOA. This percentage should never exceed 100% and should accurately reflect the relationship between available grid capacity and household count.
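
A quick reviewer-side sanity check along these lines (the helper function and the float-dtype expectation are my assumptions, not part of the PR):

    import pandas as pd

    def check_output(result: pd.DataFrame) -> None:
        """Reviewer-side sanity checks on the calculate_grid_capacity() output."""
        pct = result["heatpump_installation_percentage"]
        assert pd.api.types.is_float_dtype(pct), "expected a float column"
        assert pct.dropna().between(0, 100).all(), "percentages outside [0, 100]"

    # Example with a stand-in frame; in practice pass the real output instead.
    check_output(pd.DataFrame({"heatpump_installation_percentage": [60.0, 100.0, None]}))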

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

@shabrf marked this pull request as ready for review on October 4, 2024 at 15:43
@shabrf requested a review from danlewis85 on October 7, 2024 at 16:00
Comment on lines +128 to +133
elif fnmatch(file_name, "*.gpkg"):
with BytesIO(obj.get()["Body"].read()) as file:
return gpd.read_file(file)
elif fnmatch(file_name, "*.geojson"):
with BytesIO(obj.get()["Body"].read()) as file:
return gpd.read_file(file)
Nice, thanks for adding this :)

@lizgzil (Collaborator) left a comment

Thanks so much @shabrf 🎉 This is great - so thorough and easy to review :)

I made a few comments. My suggestion of moving some of the functions to other places in the repo is a nice-to-have - if you don't think you'll have time to do it then don't worry, we can create an issue and address it later.

I noticed a potential bug around duplicate substation data - no idea how significant it is or whether it is to be expected.

I've tagged @crispy-wonton and @danlewis85 in a few specific places to check too.

Thanks so much!

Comment on lines +300 to +302
pl.when(
pl.col("heatpump_installation_percentage") > grid_installation_threshold
)

This is fine for now, but just adding a thought: I wonder whether the score added should just be the continuous value pl.col("heatpump_installation_percentage") / 100, rather than a score based on whether this value is greater than 30% or not.

Can't work out whether that would make compute_df_max_score_per_row a bit hellish or not! e.g. would lines 359-362 be

pl.when(pl.col("heatpump_installation_percentage").is_not_null())
        .then(1)
        .otherwise(0)
        .alias("grid_capacity_max"),

or .then(pl.col("heatpump_installation_percentage").max() / 100) (which might be 1 anyway)
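
For illustration, a self-contained sketch of the continuous-score idea (the output column name and the /100 scaling are assumptions):

    import polars as pl

    df = pl.DataFrame({"heatpump_installation_percentage": [12.0, 45.0, None]})
    df = df.with_columns(
        pl.when(pl.col("heatpump_installation_percentage").is_not_null())
        .then(pl.col("heatpump_installation_percentage") / 100)  # continuous score in [0, 1]
        .otherwise(None)
        .alias("grid_capacity"),
    )
    print(df)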

SUBSTATION_SCALING_FACTOR = 1.0 # Factor to scale substation power rating (MVA) by


def _parse_binary_geometry(

this function could live in utils/geo_utils.py

return np.nan


def process_shapefile(shapefile_directory: str) -> gpd.GeoDataFrame:

this might be better in getters/s3_getters.py

shutil.rmtree(temp_dir)


def generate_enw_df() -> gpd.GeoDataFrame:

I think all these substation generate functions could live in a new file in the getters/ folder. That way, all our data-reading code (and moderate processing) sits in the same place. Also, the hardcoded S3 locations that you read from should all be in the config/base.yaml file, with a description of each (sorry, it's a pain) in config/README.md. I think that's right, but @crispy-wonton knows more about the structure!

This is also a nice-to-have and could be addressed in another PR if you don't have time.

ssen_shape, left_on="Primary Substation Name", right_on="Primary"
)

ssen_df["operator"] = "UKPN"

should this be "SSEN"?

Comment on lines +718 to +722
lsoa_boundaries = load_s3_data(
"asf-heat-pump-suitability",
"source_data/Output_Areas_2021_EW_BGC_V2_4299916833741807639.geojson",
)
lsoa_gdf = aggregate_oa_to_lsoa(lsoa_boundaries)

would be useful in the getters/get_datasets.py file


@crispy-wonton is this file consistent with other LSOA geometries used?

).to_pandas()

# Process substation and LSOA data
headroom_df = process_substation_lsoa_data(substations, lsoa_gdf, households_df)

I'm not sure how much of an issue this is, but len(substations) = 4225 while substations['id'].nunique() = 4115. When I look at some of the duplicated IDs, sometimes it's multiple duplicates of the entire row and sometimes there are lots of Nones.
(Screenshot of the duplicated substation rows omitted.)

So it might be worth deleting duplicates? I think this affects the

joined = gpd.sjoin(
        lsoa_with_households, substations_df, how="inner", predicate="intersects"
    )

line in the distribute_substation_headroom function.
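
A minimal sketch of the de-duplication being suggested here, with toy data standing in for the combined substations frame:

    import pandas as pd

    substations = pd.DataFrame({
        "id": ["A1", "A1", "B2", None, None],
        "headroom_mva": [1.0, 1.0, 2.0, 0.5, 0.7],
    })
    # Exact duplicate rows can be dropped outright; rows that merely share an id
    # (e.g. the None ids here) need a closer look before keying on "id" alone.
    deduped = substations.drop_duplicates()
    print(len(substations), "->", len(deduped))  # 5 -> 4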


do you think this is an issue that needs fixing or did you expect it?

joined = joined.merge(substation_total_households, on="index_right")

# Calculate the fraction of substation's load for each LSOA
joined["load_fraction"] = joined["household_count"] / joined["total_households"]

I checked that joined.groupby('index_right')['load_fraction'].sum() all equalled 1, but noticed that joined.groupby('id')['load_fraction'].sum() didn't. Then I realised the substation ID column isn't unique, which is how I noticed the duplicates above that might be better removed?

Comment on lines +735 to +737
result = assess_heatpump_suitability(
headroom_df, consumption_per_heatpump, TAKEUP_RATE
)

It doesn't look like this argument should be TAKEUP_RATE, as the function signature says it's voltage_factor?

return calculate_headroom_per_household(lsoa_headroom, households_df)


def assess_heatpump_suitability(

I don't have a good enough sense of capacity/voltage etc. to do a good job reviewing this function! @danlewis85 @crispy-wonton may be better placed.
