Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RasterNormalizer#add_alpha_channel: tweak gdalwarp command so it adds the alpha channel correctly and uses compression when creating the output file #799

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jmartin-sul
Copy link
Member

@jmartin-sul jmartin-sul commented Feb 13, 2024

Why was this change made? 🤔

fixes #789
fixes #802
fixes #570 (hopefully!)

How was this change tested? 🤨

NOTE: as @edsu discovered (noted in the "Difficult Data" folder), enabling this compression can sometimes (one case we know of) lead to the alpha channel addition taking many hours. I realized after putting this PR up that this might occasionally lead to some very long running normalize-data jobs. Happily, the Sidekiq hotswap timeout for the GIS robots is already set to 4 days, which should be sufficient, as far as we've seen. And that timeout isn't even a hard limit on all gis-robot-suite jobs, just a cap on how long the job can run after a Sidekiq restart (typically triggered when updated code is deployed). See here, here, and here.

⚡ ⚠ If this change involves consuming from other services or writing to shared file systems, test that GIS accessioning works properly in [stage|qa] environment, in addition to specs. ⚡

@jmartin-sul jmartin-sul marked this pull request as draft February 13, 2024 23:50
@jmartin-sul
Copy link
Member Author

hmm, i noticed a honeybadger alert for two failed invocations of this updated call. will put this in draft while i investigate a bit.

RuntimeError: normalize-data: could not execute command successfully: false: /usr/bin/gdalwarp -co 'COMPRESS=LZW' -dstalpha /var/geomdtk/current/tmp/normalize_kx853bc3354/EPSG_4326/20220626_215939_ssc6_u0002_pansharpened_file_format.tif /var/geomdtk/current/tmp/normalize_kx853bc3354/EPSG_4326/20220626_215939_ssc6_u0002_pansharpened_file_format_alpha.tif

Screenshot 2024-02-13 at 3 52 24 PM

@justinlittman
Copy link
Contributor

Please note that add_alpha channel is being moved to Load Raster robot in #818

@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch from 005aaf1 to 17b104e Compare February 16, 2024 19:47
@jmartin-sul
Copy link
Member Author

Please note that add_alpha channel is being moved to Load Raster robot in #818

thanks for the heads up! as long as the method stays pretty much the same in its new home, should still be a pretty easy rebase (though would've been harder without that tip!).

this change will be a little more complicated with some discoveries i made testing yesterday (will leave notes on #802), but it should still be a pretty small change.

@jmartin-sul
Copy link
Member Author

jmartin-sul commented Feb 16, 2024

@edsu i requested your re-review, since this has changed a bit based on what i learned yesterday. also requested manual testing from kim, and will run through integration tests again before taking out of draft. i'll also squash into one commit -- just wanted to leave an easy to look at trail of what's been tried since this still needs a bit more testing to confirm that it's actually the right solution.

@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch 2 times, most recently from 31d8a6d to 2422941 Compare February 21, 2024 03:11
@jmartin-sul jmartin-sul changed the base branch from main to iss823-stop-adding-alpha-channel February 21, 2024 03:13
Base automatically changed from iss823-stop-adding-alpha-channel to main February 21, 2024 17:05
@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch from 2422941 to 5d4a9e8 Compare February 21, 2024 18:08
@jmartin-sul
Copy link
Member Author

rebased on #818 and #825, in case it's useful as a reference later

@jmartin-sul
Copy link
Member Author

jmartin-sul commented Feb 22, 2024

revisiting this branch with some new info that @kimdurante and @edsu found.

@kimdurante successfully used this command in manual testing for adding alpha channel to a variety of input files: gdalwarp -dstalpha -co COMPRESS=LZW -dstnodata 0

@edsu found similar advice here: https://gis.stackexchange.com/questions/41472/setting-transparency-in-an-sld-file-for-a-4-channel-geotiff

will deploy to stage for integration testing and manual testing (with one of the cloud optimized GeoTIFFs that ran into #802, and possibly some other test cases).

note: if this pans out when tested, the commits should be squashed and the commit message should be cleaned up before this is merged

@jmartin-sul jmartin-sul changed the title NormalizeData#add_alpha_channel: use compression option when calling gdalwarp RasterNormalizer#add_alpha_channel: tweak gdalwarp command so it adds the alpha channel correctly and uses compression when creating the output file Feb 22, 2024
Copy link
Contributor

@edsu edsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we find in testing that the outline of the raster looks good in GeoServer this looks good to go!

@edsu
Copy link
Contributor

edsu commented Feb 22, 2024

I think we're seeing an error in stage related to adding the alpha channel? From the log:

I, [2024-02-21T18:08:32.645994 #3239818]  INFO -- : done
I, [2024-02-22T06:42:20.794659 #3239817]  INFO -- : start
I, [2024-02-22T06:42:20.794978 #3239817]  INFO -- : druid:yx324nv8969 processing load-raster (gisDeliveryWF)
I, [2024-02-22T06:42:20.946038 #3239817]  INFO -- : load-raster: yx324nv8969 is compressing to EPSG:4326
I, [2024-02-22T06:42:23.852109 #3239817]  INFO -- : load-raster: adding alpha channel for /var/geomdtk/current/tmp/normalizeraster_yx324nv8969/g3ct001947.tif
E, [2024-02-22T06:42:32.619403 #3239817] ERROR -- : No such file or directory - getcwd

It seems like the robot might be unhappy because the directory where the input file should be doesn't exist? But then again I think the robot cleans up the directory on errors which is kind of annoying. Maybe #820 will help in situations like this...


@jmartin-sul
Copy link
Member Author

the getcwd error mentioned above went away when the stage gis-robot-suite sidekiq processes were restarted and the step retried. seems likely that was a manifestation of problems caused by Dir.chdir usage? see #827.

@edsu
Copy link
Contributor

edsu commented Feb 26, 2024

We discussed this in standup on 02/26. We're going to leave this work in draft and will merge after further testing:

  • does it work with 2GB GeoTIFF ok?
  • why is it adding an additional 5th band?
  • are there OpenLayers URL parameters we can pass to get the correct output from GeoServer.

Per @kimdurante this work would be nice to have but is currently lower priority relative to other work-cycle work that is ready or underway.

@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch 4 times, most recently from 6435987 to a958cca Compare February 27, 2024 01:49
@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch from a958cca to 03fb2e7 Compare March 20, 2024 21:57
…gdalwarp

ref #789
ref #802

uses a more circuitous way of adding alpha channel (gdalwarp to add alpha channel and output to vrt, then pipe that to gdal_translate for compression).  this provides better compression and faster computation than solely using gdalwarp.
@jmartin-sul jmartin-sul force-pushed the iss789-add-alpha-channel-gdalwarp-compress-option branch from 03fb2e7 to c419340 Compare March 26, 2024 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants