DuckDB package size #71

Open
sean-legitscript opened this issue Mar 29, 2024 · 4 comments

sean-legitscript commented Mar 29, 2024

Hey all, we want to use duckdb for our Parquet parsing needs in our Lambda functions. I ran npm install duckdb and it installed without issue. I am also able to successfully parse my Parquet files. The problem comes when trying to deploy my Lambda stack. When running a deployment, I get the error:

Resource handler returned message: "Unzipped size must be smaller than 262144000 bytes (Service: Lambda, Status Code: 400, Request ID: XXX)" (RequestToken: XXX, HandlerErrorCode: InvalidRequest)

When running du node_modules/duckdb, I can see that the package is 284400 KB, so roughly 284.4 MB. That alone is over the 262144000-byte limit from the error, so it is too big to deploy to Lambda with serverless. Is this the expected size of the duckdb package? If so, are there workarounds for this package size that duckdb can support?
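For reference, the comparison against the limit can be sketched in shell. This is only an illustration: it assumes du reported 1024-byte blocks (the GNU default; other platforms may use a different block size), and the helper names `dir_size_bytes` and `fits_in_lambda` are made up for this example. Note that the limit applies to the whole unzipped deployment, not just to duckdb.

```shell
#!/usr/bin/env bash
# Unzipped size limit, copied from the Lambda error message above.
LAMBDA_LIMIT_BYTES=262144000

# Print the size of a directory in bytes.
# "du -sk" reports 1024-byte blocks; the first tab-separated field is the count.
dir_size_bytes() {
  local kib
  kib=$(du -sk "$1" | cut -f1)
  echo $(( kib * 1024 ))
}

# Succeed (exit status 0) if the given byte count is below the limit.
fits_in_lambda() {
  [ "$1" -lt "$LAMBDA_LIMIT_BYTES" ]
}

# The 284400 figure from the issue, assumed here to be in KiB.
reported_bytes=$(( 284400 * 1024 ))
if fits_in_lambda "$reported_bytes"; then
  echo "fits"
else
  echo "too large by $(( reported_bytes - LAMBDA_LIMIT_BYTES )) bytes"
fi
```

To measure a real install instead of the reported figure, something like `fits_in_lambda "$(dir_size_bytes node_modules/duckdb)"` should work.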

carlopi (Collaborator) commented Apr 2, 2024

I spawned an EC2 instance:

[ec2-user@ip-172-31-91-131 ~]$ sudo yum install npm
[ec2-user@ip-172-31-91-131 ~]$ npm install duckdb
[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/
113M	node_modules/duckdb/

of which:

[ec2-user@ip-172-31-91-131 ~]$ du -sh node_modules/duckdb/*
4.0K	node_modules/duckdb/LICENSE
4.0K	node_modules/duckdb/Makefile
4.0K	node_modules/duckdb/README.md
24K	node_modules/duckdb/binding.gyp
4.0K	node_modules/duckdb/binding.gyp.in
4.0K	node_modules/duckdb/duckdb.js
53M	node_modules/duckdb/lib
4.0K	node_modules/duckdb/package.json
16K	node_modules/duckdb/scripts
60M	node_modules/duckdb/src
712K	node_modules/duckdb/test
4.0K	node_modules/duckdb/tsconfig.json
4.0K	node_modules/duckdb/vendor
8.0K	node_modules/duckdb/vendor.py

Of those, the src folder is optional: it can be removed and the package will still be functional.

Can you share how you got to 284.4 MB? Possibly by building from source?
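A minimal sketch of dropping the src folder before packaging, assuming the package is installed under node_modules/duckdb. The function name `prune_duckdb` is hypothetical, and only src/ is removed because that is the only directory confirmed as optional above.

```shell
#!/usr/bin/env bash
# Remove the optional src/ directory from an installed duckdb package.
# Only src/ is touched; lib/ (the part needed at runtime) is left alone.
prune_duckdb() {
  local pkg_dir="$1"
  # Refuse to run if the package directory does not exist.
  [ -d "$pkg_dir" ] || { echo "not found: $pkg_dir" >&2; return 1; }
  # ${pkg_dir:?} aborts if the variable is empty, so this can never expand to "/src".
  rm -rf "${pkg_dir:?}/src"
}

# Usage (path is an assumption about the project layout):
#   prune_duckdb node_modules/duckdb && du -sh node_modules/duckdb
```

This would have to run after npm install and before the deployment artifact is zipped, for example as a packaging step in the build script.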

tobilg commented Apr 17, 2024

@sean-legitscript you can try to use the DuckDB Lambda Node Layer I maintain: https://github.com/tobilg/duckdb-nodejs-layer. Also, the "normal" DuckDB package should only work on Node 20 runtimes, because every runtime below that uses Amazon Linux 2, which has GLIBC incompatibilities with the pre-compiled packages...

tobilg commented Apr 17, 2024

@carlopi I think the src/ and test/ directories could be removed before publishing (e.g. via .npmignore), right? IMO they are not needed for the package to function, only what's in lib/ is.
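As a rough sketch of what that could look like, the helper below appends entries to a .npmignore file without duplicating lines that are already there. The function name `add_npmignore_entry` is hypothetical. Whether src/ can really be excluded is an assumption: if the install step ever falls back to building from source, that path would need src/, so this would have to be checked first.

```shell
#!/usr/bin/env bash
# Append an entry to an npm ignore file unless the exact line already exists.
add_npmignore_entry() {
  local file="$1" entry="$2"
  # Create the file if it is missing so grep does not fail on it.
  touch "$file"
  # -x matches whole lines, -F treats the entry as a fixed string.
  grep -qxF -- "$entry" "$file" || echo "$entry" >> "$file"
}

# Usage, run from the package root before publishing:
#   add_npmignore_entry .npmignore "src/"
#   add_npmignore_entry .npmignore "test/"
#   npm pack --dry-run   # lists the files that would end up in the tarball
```

The `npm pack --dry-run` step is a way to confirm the result without publishing anything.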

tobilg commented May 29, 2024

Any updates regarding my last comment, @carlopi? Thanks!
