[JDBC] DuckDB JDBC driver SIGSEGV the JVM since 0.9.0 #14
See the corresponding core dump log: […]
Could you try with v0.9.1, which should contain some potentially relevant fixes?
Already tried, and we have the exact same issue (the stack trace and core dump are from 0.9.1; I downgraded to 0.9.0 with the same issue).
The fix in the httpfs extension went live a couple of hours ago; could you give it another try (while performing beforehand […])? This would NOT solve the […]
Even with […]. By the way, the […]
Can you try the solution mentioned in duckdb/duckdb#8708 (comment)?
@Mause the solution of setting LD_PRELOAD works; however, this is not a proper fix for us, as we cannot control the environment in which our users will run our code.
It's less a permanent solution than a confirmation that it's the same issue, and one we've seen before (though that was with TensorFlow and Python).
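For reference, the workaround discussed here amounts to preloading the system libstdc++ before the JVM starts, so that its symbols win over any copies bundled inside the JNI shared objects. A command-line sketch (the libstdc++ path is typical for Debian/Ubuntu on x86_64 and varies by distro; the jar name is a placeholder):

```shell
# Preload the system libstdc++ so the dynamic linker resolves C++ runtime
# symbols there first, not inside librocksdbjni or the DuckDB JNI library.
# Path is an assumption (typical for Debian/Ubuntu on x86_64).
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
java -jar app.jar   # placeholder for the actual application jar
```

This only changes symbol resolution order; it does not fix the underlying over-exporting of symbols in the shared objects.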
For our own reference, are you using any other Java libraries that are backed by a C++ library?
I notice […]
It is very difficult to answer this question, as this kind of information is usually not documented. We are using literally hundreds of libraries (maybe even more than a thousand, as we have 400 plugins). Our runtime uses Netty, which, for sure, uses native libraries.
Oh, this could explain why we only see this when using our Kafka runner and not our JDBC runner (we can launch Kestra with two different runners). So yes, I confirm this works when we don't use Kafka (and therefore no RocksDB).
For our future reference, this is enough to trigger the crash: https://github.com/Mause/duckdb_rocksdb_crash/blob/main/src/test/java/com/mycompany/app/AppTest.java
Or a crash anyway; not certain it's the same one.
Hi,
I wonder if this issue manifested itself in 0.9.x as a side effect of the release build moving to manylinux. I've built a local version of the DuckDB JDBC driver from the codebase as of the v0.9.2 tag using Ubuntu 22.04, and @Mause's reproducer from #14 no longer crashes. (Different JVMs also behave differently, with the Ubuntu build of OpenJDK not crashing even with the released version of the JDBC driver, but that's likely due to a different library loading order.) @Y-- helped me look at the difference between the two drivers, and it seems the manylinux-built driver contains two extra libraries that the Ubuntu-built driver does not: […]
Any update on how to fix it? We have users blocked on version 0.8 asking for features from the latest version.
Does the LD_PRELOAD workaround fix it for you as well?
@Mause yes, it works, but as I said, we cannot control the environment of our users, so it's not a solution.
I'm a Kestra user. Do you think that this problem will be fixed in the next DuckDB releases, or do I have to deal with […]?
I tried to reproduce the error to help resolve this issue, but it works. I created a Dockerfile using the latest version of […] and displayed the SQL results: ✅ Both 0.9.0 and 0.10.3 are working. Here is the gist with all the files used: https://gist.github.com/armetiz/e4ffd81189eb334c5acdf3e9e9796940. Can I try something else to reproduce the problem and hope for a solution? Regards
@armetiz on Kestra, this issue only occurs if the RocksDB native library is loaded before the DuckDB native library; this happens in Kestra EE.
Hi @Mause, I tried to reproduce your Maven configuration within a Docker container, but as you can see, I could not reproduce the error: https://github.com/armetiz/dockerfile-maven-duckdb-rockdbs
Reproducing this requires building DuckDB on […]. Output: […]
@Mause any update on this?
Comments from @elefeint on Slack
@loicmathieu does it help in some way? Or is the best option to wait for a follow-up on the "plans on DuckDB side to move up from manylinux2014 for their builds"?
I think I told them in their Slack, but we use Kafka Streams, which uses RocksDB, so we don't have a choice here.
Seeing the number of upvotes and comments, and the number of Kestra users asking about it on our Slack, the population of simultaneous RocksDB and DuckDB users doesn't seem to be that small. The issue has been open for a while now; @Mause @elefeint, could someone give specific guidance on how to proceed or fix this on the DuckDB side? This is a large hindrance for many users, and we are blocked.
I am getting this too, and for me it is happening in a CI build container with Kafka Streams + RocksDB + DuckDB. This works locally on Arch Linux, but the build container is some kind of Ubuntu (VERSION="24.04.1 LTS (Noble Numbat)").
Can I ask for a link to the RocksDB issue?
Sorry if it was confusing: I meant Kestra users who use RocksDB indirectly. I'm not aware of any RocksDB issue about this.
Could you please raise one, then? I'd be curious to hear what they have to say on this issue, given it seems to be something they're doing that causes this.
FWIW, I am using "plain" Spring Boot Cloud Kafka Streams (which obviously includes RocksDB), and I do see the failure when running unit tests in a specific Maven CI container (from Docker Hub). Note: I only use DuckDB in the unit tests, so RocksDB definitely loads first. There is nothing Kestra here on my end, though I do acknowledge that Kestra may have a material commercial interest in getting this fixed. ;) Since ELF is late-binding by default, and looking at the LD_PRELOAD magic that's being mentioned, I suspect that the (plain) Kafka Streams JAR (https://central.sonatype.com/artifact/org.apache.kafka/kafka-streams/versions) bundles a binary copy of RocksDB (which is C++, see https://github.com/facebook/rocksdb/blob/main/CMakeLists.txt) that has been built in a way, and with flags, that is ... "unexpected"? After all, all the LD_PRELOAD magic does is early-resolve the libstdc++ symbols, binding them to symbols from that very same ELF binary. I'll try to amend https://github.com/Mause/duckdb_rocksdb_crash with a containerized build script that allows for global reproducibility. I believe this is critical, as on my personal development environment (which is Arch Linux), all is - or appears to be! - well.
@shoffmeister you can use the Dockerfile in #14 (comment) to reproduce.
Using the baseline that @Mause created in https://github.com/Mause/duckdb_rocksdb_crash, this simple script will reproduce:
Assumptions:
Running that script then yields
The image used above is a very recent Ubuntu 24.04.1, directly from Docker Hub, with a very recent Maven and Java 17. Since that is a moving target, the digest pins the exact image identity for easier (future) reproducibility. Adding
to the mix removes the crash. Alas, adding
[…] both inside the container and on the host. I suspect that this is due to the way the Java reproducer has been done and that forking gets in the way. While I used https://github.com/Mause/duckdb_rocksdb_crash to reproduce, this most likely mirrors my local setup, where RocksDB gets pulled in by Kafka Streams (which is pulled in by Spring Cloud Kafka). To me it is somewhat surprising that my local development environment (Arch Linux, a rolling, always-up-to-date distribution) is totally fine, but that the rather modern Maven build container shows the problem. This suggests that the way the runtime distribution's user land has been built also plays a role.
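For readers trying to replicate this setup, a containerized run along these lines should work (the image tag, mount layout, and Maven goal are assumptions; the comment above pins an exact image digest, which is omitted here):

```shell
# Run the duckdb_rocksdb_crash reproducer's test suite inside a recent
# Maven / JDK 17 container from Docker Hub. Pin a digest instead of the
# tag for exact reproducibility.
git clone https://github.com/Mause/duckdb_rocksdb_crash
docker run --rm -v "$PWD/duckdb_rocksdb_crash":/src -w /src \
    maven:3-eclipse-temurin-17 \
    mvn -q test
```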
There is a different failure mode when building the reproducer as a fat JAR (outside of the container) and then running it inside the container: the process simply hangs. With the above, it is also clear, IMHO, that the distribution this is run on does matter.
The answer might be hiding in the (differential) output of
where the JAR is the fat JAR, and this is run inside two different Java-enabled containers with different user libraries, where one container shows the problem and the other does not.
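glibc's dynamic linker can log its binding decisions directly, which is one way to produce that differential output. A diagnostic sketch (the jar name is a placeholder):

```shell
# LD_DEBUG=bindings makes ld.so log, on stderr, which shared object each
# symbol reference was bound to. Capture this in both the working and the
# crashing container, then diff the two logs.
LD_DEBUG=bindings java -jar reproducer-fat.jar 2> bindings.log
# Focus on the C++ random-number symbols implicated in the crash:
grep '_ZNSt13random_device' bindings.log
```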
The content of tensorflow/tensorflow#61741 is intriguing. My initial reproducer in a large Kafka Streams application actually (also) shows […]
which I believe has the same root cause as the crash here.
The issue on the RocksDB side: facebook/rocksdb#13092
There is an interesting analysis at pytorch/pytorch#102360 in the comments, specifically on the subject of […]. They are fighting with what I gather to be a very similar problem; the root cause there apparently is "cross-talk" of C++ std::random functions between dynamically loaded binaries. They also preload the system libstdc++, which is the obvious solution in their case. If I interpret
right, from the LD_DEBUG above, in the affected container, and then add
getting
This means that RocksDB contains these symbols, and symbol resolution eventually hits them. The difference in the libstdc++ library between my local Arch Linux machine and the container is, in the container:
Arch:
so my Arch provides […]. In the Ubuntu container I reproduce with, libstdc++ does not have the general export, so DuckDB seems to bind to the equivalent export in RocksDB when attempting to initialize the random subsystem. And that ... is not good. From the looks of it, the RocksDB shared object should not contain those exports, i.e.
should return nothing. Note that […]
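To illustrate what "should return nothing" means, here is a simulated (not real) `nm` dump for a JNI library that wrongly exports libstdc++ internals alongside its JNI entry points; only the `Java_*` symbol legitimately belongs in the dynamic export table:

```shell
# Simulated output of `nm -D --defined-only librocksdbjni.so`
# (hypothetical addresses, for illustration only).
cat > /tmp/nm_output.txt <<'EOF'
00000000001a2b30 T _ZNSt13random_deviceC1Ev
00000000001a2c40 T _ZNSt13random_device9_M_getvalEv
0000000000042110 T Java_org_rocksdb_RocksDB_open
EOF
# Count the mangled std:: symbols (_ZNSt prefix) that leak out;
# a cleanly built JNI library would yield 0 here.
grep -c '_ZNSt' /tmp/nm_output.txt   # → 2
```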
@shoffmeister maybe you can add this information to the issue I just raised on the RocksDB side: facebook/rocksdb#13092
Added some context on the RocksDB repository, asking why
shows so many symbols exported from RocksDB in the […]
FWIW, I just took a look at the DuckDB JNI
or
and it would seem as if the DSO also exports a massive number of symbols beyond need. That does not isolate well, so DuckDB also seems to be doing what RocksDB is doing. cc'ing @carlopi because he seems to be the resident "having fun with operating-system library linkage" person ;) Example:
As mentioned for RocksDB, I would only expect these to be exported:
because anything else pollutes the ELF symbol namespace, interferes with (late-binding) symbol resolution, and can have side effects. A JNI library should not have symbol side effects when loaded into a native operating-system process, IMHO.
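A common way to get such a minimal export surface is to hide everything by default and whitelist only the JNI entry points with a linker version script. A build sketch using standard GCC/binutils flags (file names are hypothetical):

```shell
# Linker version script: export only the JNI entry points, hide the rest.
cat > exports.map <<'EOF'
{
  global: Java_*; JNI_OnLoad; JNI_OnUnload;
  local: *;
};
EOF
# -fvisibility=hidden hides symbols at compile time;
# -Wl,--exclude-libs,ALL keeps statically linked libstdc++ symbols private.
g++ -shared -fPIC -fvisibility=hidden \
    -Wl,--version-script=exports.map \
    -static-libstdc++ -Wl,--exclude-libs,ALL \
    -o libexample_jni.so example_jni.cpp
```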
FWIW, in facebook/rocksdb#13092 I have confirmed that a custom build of the RocksDB JNI shared object with libstdc++ symbols hidden makes the SIGSEGV in DuckDB go away. For details, see the conversation there.
What happens?
Since version 0.9.0, using the DuckDB JDBC driver in a Java application makes the application crash with a SIGSEGV.
The Java version is 17.0.5 (tested also on 17.0.8.1).
There is first a Java exception:
Then a JVM crash:
It works well with 0.8.0.
To Reproduce
Here is the SQL query:
The code uses the standard Java JDBC API (Connection & Statement), but it is not easily extracted as it runs via the Kestra DuckDB plugin.
OS:
Ubuntu 23.04
DuckDB Version:
0.9.0
DuckDB Client:
Java JDBC
Full Name:
Loïc Mathieu
Affiliation:
Kestra
Have you tried this on the latest main branch?
I have tested with a release build (and could not test with a main build).
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?