Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bundle httpfs by default #105

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

BittnerBarnabas
Copy link

This would be really useful for us, as we use duckdb behind a corporate proxy. There's only proxied internet access. Once we have httpfs installed, installing other plugins is easy by setting the default repo url. Is there any particular reason this was explicitly not bundled?

Also seems to be an issue: #29

@carlopi
Copy link
Collaborator

carlopi commented Jul 13, 2024

Hi! Thanks for the contribution, but I am not sure this makes particularly sense to bake httpfs in.

If the request is only around fetching extension, would it not make more sense to have a node-specific functionality to fetch a given extension from the https endpoint via node fetch ?

I need to think how to structure this at the duckdb general level, but it would make sense, to customize to INSTALL extension_name; so that it becomes an invocation of a client specific callback (say fetch in node, or urllib.request in Python, or equivalent in other clients) + INSTALL of a local file.

This also offers a workaround of sort:

  • build URL like https://extensions.duckdb.org/$version/$platform/httpfs.duckdb_extension.gz
  • perform fetch on the extension, await that
  • copy that to a local folder like path/to/https.duckdb_extension.gz
  • in duckdb, perform: INSTALL 'path/to/https.duckdb_extension.gz'
  • in duckdb, do LOAD httpfs

@BittnerBarnabas
Copy link
Author

BittnerBarnabas commented Jul 13, 2024

Thanks for the comment.

Yeah, the above implementation would work in duckdb to call back into node world to actually do the extension fetching. However, it seems a bit unnecessary, when we have the httpfs plugin which handles everything for us already (e.g. trusting system certs).

Is there any particular reason httpfs was removed? Or any reason why it can't / shouldn't be re-added?

Currently, this is the workaround we're taking however this relies on knowing up-front the duckdb version / platform:

  1. manually download the httpfs extension to a folder lets call /custom-duckdb/extensions
  2. install httpfs from '/custom-duckdb/extensions
  3. install delta from 'https://corporate/proxy/duckdb/extensions/' <- this step now works as httpfs is installed
  4. (optionally) step 3 could be simplified by setting the default repository url for extensions for the above url

edit: just for clarity our app is a native node app (electron) rather than something embedded in plain chrome

@carlopi
Copy link
Collaborator

carlopi commented Jul 15, 2024

Is there any particular reason httpfs was removed? Or any reason why it can't / shouldn't be re-added?
Reason is that extensions are meant so that's possible to upgrade / improve them.
If they were to be backed in it would impossible in the current setup to do so.

Also I don't see why node should behave differently than other extensions here.

Possible viable improvements that I can see as doable:

  1. have a functions (SQL side) to return the URL of a given extensions (backing in right platform and version)
  2. have a callback (duckdb-side), to be implemented in relevant clients, to allow using native capabilities to fetch extension and then install it (say something like INSTALL httpfs FROM core (MODE custom) that should build the URL (using step 1), ask node to fetch the extension (using node capabilities)
  3. have an additional npm package wrapper that installs both duckdb, the relevant httpfs extension AND sets the custom repository to be https://extensions.duckdb.org, so that INSTALL x will work out of the box via httpfs.

I think 1 might make sense to have in any case, 2 requires some more toughs, 3 can be likely consider already user-side / contributions are welcome.

@BittnerBarnabas
Copy link
Author

BittnerBarnabas commented Jul 15, 2024

Reason is that extensions are meant so that's possible to upgrade / improve them.

I agree with this in a general sense, however I'm not sure how much change we can expect from such an integral extension as httpfs? parquet gets bundled by default, so does json.

I might have a look at 3.) we do recompile duckdb so if there was a way to conditionally include httpfs based on env variables maybe that's a non-intruse option to add support for this?

@carlopi
Copy link
Collaborator

carlopi commented Jul 16, 2024

I think having an option to include httpfs at build time is nice to have, I can have a look at that, even just via the vendor.py passing a off by default option it could work.

With the caveat that you will be bulding a different package, and you will have to run the vendoring + build step yourself.

I can think about other solutions, currently some flexibility is missing.

@BittnerBarnabas
Copy link
Author

I added some logic to optionally compile httpfs. By default it doesn't do anything, however exporting DUCKDB_INCLUDE_HTTPFS=true will make it include all the files required to build httpfs plugin.

It's configurable so any extension can be added which is supported by duckdb out of the box.

Unsure on how the release process works I added all the vendored files in there as well?

@@ -44,6 +73,7 @@
libraries = cache['libraries']
windows_options = cache['windows_options']
cflags = cache['cflags']
optional_extensions = cache['optional_extensions']
elif 'DUCKDB_NODE_BINDIR' in os.environ:
Copy link
Author

@BittnerBarnabas BittnerBarnabas Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea about these ifs and elses. Makes it very hard to reason about code with re-assigning variables. Haven't found explicit usage of it, but wasn't confident of deleting these.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants