-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
68 Add grid capacity feature #71
base: dev
Are you sure you want to change the base?
Conversation
elif fnmatch(file_name, "*.gpkg"): | ||
with BytesIO(obj.get()["Body"].read()) as file: | ||
return gpd.read_file(file) | ||
elif fnmatch(file_name, "*.geojson"): | ||
with BytesIO(obj.get()["Body"].read()) as file: | ||
return gpd.read_file(file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks for adding this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @shabrf 🎉 This is great - so thorough and easy to review :)
I made a few comments. My suggestion of moving some of the functions to other places in the repo is a nice to have - if you don't think you'll have time to do it then don't worry - we can create an issue and address it later.
I noticed a potential bug about duplicate substation data - no idea how big this is / whether it is to be expected.
I've tagged @crispy-wonton and @danlewis85 in a few specific places to check too.
Thanks so much!
pl.when( | ||
pl.col("heatpump_installation_percentage") > grid_installation_threshold | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine for now, but just adding a thought that I wonder whether the score added should just be the continuous value in pl.col("heatpump_installation_percentage")/100
, rather than a score based off whether this value is greater than 30% or not.
Can't work out whether that would make compute_df_max_score_per_row
a bit hellish or not! e.g. would lines 359-362 be
pl.when(pl.col("heatpump_installation_percentage").is_not_null())
.then(1)
.otherwise(0)
.alias("grid_capacity_max"),
or .then(pl.col("heatpump_installation_percentage").max/100)
(which might be 1 anyway)
SUBSTATION_SCALING_FACTOR = 1.0 # Factor to scale substation power rating (MVA) by | ||
|
||
|
||
def _parse_binary_geometry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function could live in utils/geo_utils.py
return np.nan | ||
|
||
|
||
def process_shapefile(shapefile_directory: str) -> gpd.GeoDataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might be better in getters/s3_getters.py
shutil.rmtree(temp_dir) | ||
|
||
|
||
def generate_enw_df() -> gpd.GeoDataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all these substation generate functions could all be in the getters/
folder in a new file. That way all our reading data files (and moderate processing) are in the same folder. Also, the hardcoded s3 locations that you read from should all be in the config/base.yaml
file - with a description of each (sorry it's a pain) in the config/README.md
. I think that's right but @crispy-wonton knows more about the structure!
I also think this is a nice to have situation, and can be addressed in another PR if you don't have time.
ssen_shape, left_on="Primary Substation Name", right_on="Primary" | ||
) | ||
|
||
ssen_df["operator"] = "UKPN" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be "SSEN"
lsoa_boundaries = load_s3_data( | ||
"asf-heat-pump-suitability", | ||
"source_data/Output_Areas_2021_EW_BGC_V2_4299916833741807639.geojson", | ||
) | ||
lsoa_gdf = aggregate_oa_to_lsoa(lsoa_boundaries) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be useful in the getters/get_datasets.py
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crispy-wonton is this file consistent with other LSOA geometries used?
).to_pandas() | ||
|
||
# Process substation and LSOA data | ||
headroom_df = process_substation_lsoa_data(substations, lsoa_gdf, households_df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how much of an issue this is but len(substations) = 4225
and substations['id'].nunique() = 4115
, and when I look at some of the duplicated IDs, sometimes it's due to multiple duplicates of the entire row and sometimes there are lots of Nones.
So, it might be worth deleting duplicates? I think this effects the
joined = gpd.sjoin(
lsoa_with_households, substations_df, how="inner", predicate="intersects"
)
line in thedistribute_substation_headroom
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think this is an issue that needs fixing or did you expect it?
joined = joined.merge(substation_total_households, on="index_right") | ||
|
||
# Calculate the fraction of substation's load for each LSOA | ||
joined["load_fraction"] = joined["household_count"] / joined["total_households"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that joined.groupby('index_right')['load_fraction'].sum()
all equalled 1, but noticed that joined.groupby('id')['load_fraction'].sum()
didn't. Then I realised the substation ID column isn't unique which made me notice some duplicates which might be better removed?
result = assess_heatpump_suitability( | ||
headroom_df, consumption_per_heatpump, TAKEUP_RATE | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like this argument should be TAKEUP_RATE as the function says its voltage_factor?
return calculate_headroom_per_household(lsoa_headroom, households_df) | ||
|
||
|
||
def assess_heatpump_suitability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a good sense of capacity/voltage etc to feel like I can do a good job reviewing this function! @danlewis85 @crispy-wonton may be better placed
Description
This pull request introduces a new feature to calculate the grid capacity for heat pump installations per LSOA. The main changes include:
calculate_grid_capacity()
function.Fixes #68
Instructions for Reviewer
In order to test the code in this PR, you need to:
calculate_grid_capacity()
function.Please pay special attention to:
distribute_substation_headroom()
function, which allocates substation capacity to LSOAs based on household count. Verify that this allocation is performed correctly and fairly.assess_heatpump_suitability()
function, particularly the calculation of 'max_heatpumps' and 'heatpump_installation_percentage'. Ensure these calculations are accurate and properly consider the power requirements of heat pumps.calculate_grid_capacity()
function to confirm it correctly integrates all components and produces a coherent final dataset.The expected output should be a DataFrame containing LSOA-level data, including the percentage of households that could install heat pumps in each LSOA. This percentage should never exceed 100% and should accurately reflect the relationship between available grid capacity and household count.
Checklist:
notebooks/
pre-commit
and addressed any issues not automatically fixeddev
README
s