
[JDBC] DuckDB JDBC driver SIGSEGV the JVM since 0.9.0 #14

Open

loicmathieu opened this issue Oct 18, 2023 · 44 comments

@loicmathieu

What happens?

Since version 0.9.0, using the DuckDB JDBC driver in a Java application makes the application crash with a SIGSEGV.
The Java version is 17.0.5 (tested also on 17.0.8.1).

First, there is a Java exception:

java.sql.SQLException: random_device could not be read
	at org.duckdb.DuckDBNative.duckdb_jdbc_startup(Native Method)
	at org.duckdb.DuckDBConnection.newConnection(DuckDBConnection.java:48)
	at org.duckdb.DuckDBDriver.connect(DuckDBDriver.java:38)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
	at io.kestra.plugin.jdbc.JdbcConnectionInterface.connection(JdbcConnectionInterface.java:63)
	at io.kestra.plugin.jdbc.AbstractJdbcQuery.run(AbstractJdbcQuery.java:77)
	at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:148)
	at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:31)
	at io.kestra.core.runners.Worker$WorkerThread.run(Worker.java:674) 	

Then the JVM crashes:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f523603fd60, pid=37746, tid=39346
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  0x00007f523603fd60
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to <redacted>)
#
# An error report file with more information is saved as:
# <redacted>
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

It works well with 0.8.0.

To Reproduce

Here is the SQL query:

      INSTALL httpfs;
      SELECT Title, max("Days In Top 10") 
      from (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet'))
      where Type='Movie'
      GROUP BY Title
      ORDER BY max("Days In Top 10") desc
      limit 5;

The code uses the standard Java JDBC API (Connection & Statement), but it is not easily extracted as it runs via the Kestra DuckDB plugin.
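
For reference, here is a minimal standalone sketch of what the plugin does through JDBC. This is not the actual Kestra plugin code; the class name is made up, and the in-memory jdbc:duckdb: URL stands in for whatever database the task is configured with.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DuckDbJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.duckdb.DuckDBDriver"); // usually optional, the driver registers itself
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            stmt.execute("INSTALL httpfs");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT Title, max(\"Days In Top 10\") "
                    + "FROM (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet')) "
                    + "WHERE Type='Movie' GROUP BY Title "
                    + "ORDER BY max(\"Days In Top 10\") DESC LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}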

OS:

Ubuntu 23.04

DuckDB Version:

0.9.0

DuckDB Client:

Java JDBC

Full Name:

Loïc Mathieu

Affiliation:

Kestra

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@loicmathieu
Author

See the corresponding core dump log:
hs_err_pid37746.log

@carlopi
Contributor

carlopi commented Oct 18, 2023

Could you try with v0.9.1, which should contain some potentially relevant fixes?

@loicmathieu
Author

Already tried, and we have the exact same issue (the stack trace and core dump are from 0.9.1; I downgraded to 0.9.0 and got the same issue).

@carlopi
Contributor

carlopi commented Oct 18, 2023

The fix in the httpfs extension went live a couple of hours ago. Could you give it another try (after first running FORCE INSTALL httpfs once, as explained here: duckdb/duckdb#9340 (comment))?

This would NOT solve the random_device issue, but might solve the crash if they are independent.

@loicmathieu
Author

Even with FORCE INSTALL httpfs I have the same issue.

By the way, the random_device issue didn't appear on 0.8.0, so it may not be the same issue as duckdb/duckdb#9340

@Mause
Member

Mause commented Oct 18, 2023

Can you try the solution mentioned in duckdb/duckdb#8708 (comment) ?

@loicmathieu
Author

@Mause setting LD_PRELOAD works; however, this is not a proper fix for us, as we cannot control the environment in which our users will run our code.

@Mause
Member

Mause commented Oct 18, 2023

It's less a permanent solution than a way of confirming it's the same issue, and one we've seen before (though that was with TensorFlow and Python).

@Mause
Member

Mause commented Oct 18, 2023

For our own reference, are you using any other java libraries that are backed by a C++ library?

@Mause
Member

Mause commented Oct 18, 2023

I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump, do you see the issue if you exclude that library/don't load it before duckdb?

@loicmathieu
Author

For our own reference, are you using any other java libraries that are backed by a C++ library?

It is very difficult to answer this question as this kind of information is usually not documented. We are using literally hundreds of libraries (maybe even more than a thousand as we have 400 plugins).

Our runtime uses Netty which, for sure, uses native libraries.

@loicmathieu
Author

I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump, do you see the issue if you exclude that library/don't load it before duckdb?

Oh! That could explain why we only see this when using our Kafka runner and not our JDBC runner (we can launch Kestra with two different runners). So yes, I confirm it works when we don't use Kafka (and therefore don't load RocksDB).

@Mause
Member

Mause commented Nov 6, 2023

For our future reference, this is enough to trigger the crash: https://github.com/Mause/duckdb_rocksdb_crash/blob/main/src/test/java/com/mycompany/app/AppTest.java
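
In outline, the reproducer does little more than load the RocksDB JNI library before opening a DuckDB JDBC connection. A rough sketch (not the linked test verbatim; it assumes rocksdbjni and duckdb_jdbc are both on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;

import org.rocksdb.RocksDB;

public class CrashSketch {
    public static void main(String[] args) throws Exception {
        // Extract and load librocksdbjni*.so first, as Kafka Streams would do
        RocksDB.loadLibrary();

        // Opening a DuckDB connection then loads libduckdb_java*.so;
        // on affected systems this is where the JVM dies with SIGSEGV
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
            conn.createStatement().execute("SELECT 42");
        }
    }
}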

@Mause
Member

Mause commented Nov 6, 2023

Or a crash, anyway; I'm not certain it's the same one.

@loicmathieu
Author

Hi,
Do you have any news on this?
It prevents us from upgrading to driver version 0.9.2, which in turn prevents us from using MotherDuck, as MotherDuck only supports DuckDB 0.9.2!

@elefeint
Contributor

I wonder if this issue manifested itself in 0.9.x as a side effect of the release build moving to manylinux.

I've built a local version of the DuckDB JDBC driver from the codebase at the v0.9.2 tag on Ubuntu 22.04, and @Mause's reproducer from #14 no longer crashes. (Different JVMs also behave differently: the Ubuntu build of OpenJDK does not crash even with the released version of the JDBC driver, but that's likely due to a different library loading order.)

@Y-- helped me look at the difference between the two drivers, and it seems the manylinux-built driver depends on two extra libraries that the Ubuntu-built driver does not -- libdl.so.2 and libpthread.so.0:

/tmp/official> ldd libduckdb_java.so_linux_amd64
	linux-vdso.so.1 (0x00007ffc2e9e0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7a39e39000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7a39e34000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7a37200000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7a37519000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7a39e14000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a36e00000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7a39e52000)

/tmp/mine> ldd libduckdb_java.so_linux_amd64
	linux-vdso.so.1 (0x00007ffcbcb7f000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe37ac00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe3802f0000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe3802d0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe37a800000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe3803eb000)

@tchiotludo

Any update on how to fix it? We have users blocked on version 0.8 asking for features from the latest version.

@Mause
Member

Mause commented Feb 28, 2024

Any update on how to fix it? We have users blocked on version 0.8 asking for features from the latest version.

Does the LD_PRELOAD workaround fix it for you as well?

@loicmathieu
Author

@Mause yes it works, but as I said, we cannot control the environment of our users so it's not a solution.

@armetiz

armetiz commented May 20, 2024

I'm a Kestra user.

Do you think this problem will be fixed in upcoming DuckDB releases, or do I have to rely on the LD_PRELOAD workaround?

@hannes hannes transferred this issue from duckdb/duckdb May 24, 2024
@armetiz

armetiz commented Jun 5, 2024

I tried to reproduce the error to help resolve this issue, but it works.

I created a Dockerfile using the latest version of eclipse-temurin and a Java application that fetches a remote Parquet file.

Displaying the SQL results works: ✅

Both 0.9.0 and 0.10.3 work.

Here is the gist with all the files used: https://gist.github.com/armetiz/e4ffd81189eb334c5acdf3e9e9796940

Can I try something else to reproduce the problem, in the hope of finding a solution?

Regards


outputs

➜  duckdb-jdbc docker build -t helloworld .
➜  duckdb-jdbc docker run helloworld:latest
DuckDB - About SIGSEGV
01001_1, 1001_1, 01001, 1, bureau 1,  , Salle des fêtes, 01400, abergement clemenciat, 448, 448.0, 01001_0001, 
01002_1, 1002_1, 01002, 1, mairie, 1, Place de la Mairie, 01640, l abergement de varey, 157, 143.0, 01002_0001, 
01004_1, 1004_1, 01004, 1, b1 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 633, 630.0, 01004_0001, 
01004_2, 1004_2, 01004, 2, b2 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 640, 638.0, 01004_0002, 
01004_3, 1004_3, 01004, 3, b3 chateau des echelles,  , RUE DES ARENES, 01500, amberieu en bugey, 736, 730.0, 01004_0003, 
01004_4, 1004_4, 01004, 4, b4 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 532, 527.0, 01004_0004, 
01004_5, 1004_5, 01004, 5, b5 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 531, 529.0, 01004_0005, 
01004_6, 1004_6, 01004, 6, b6 groupe scolaire jules ferry,  , RUE VICTOR HUGO, 01500, amberieu en bugey, 628, 627.0, 01004_0006, 
01004_7, 1004_7, 01004, 7, b7 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 582, 577.0, 01004_0007, 
01004_8, 1004_8, 01004, 8, b8 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 691, 688.0, 01004_0008, 

@loicmathieu
Author

@armetiz on Kestra, this issue only occurs if the RocksDB native library is loaded before the DuckDB native library, which happens in Kestra EE.

@armetiz

armetiz commented Jun 5, 2024

Hi @Mause I tried to reproduce your Maven configuration within a Docker container.

But as you can see, I could not reproduce the error: https://github.com/armetiz/dockerfile-maven-duckdb-rockdbs

@elefeint
Contributor

elefeint commented Aug 9, 2024

Reproducing this requires building DuckDB on manylinux2014 but running the Java application on a modern system. Here is a Dockerfile reproducing the issue with a debug build of DuckDB: Dockerfile.txt

Output:

0.256 *** BEFORE LOADING ROCKSDB ***                                                                                                                          
0.363 *** AFTER LOADING ROCKSDB ***                                                                                                                           
2.095 #                                                                                                                                                       
2.095 # A fatal error has been detected by the Java Runtime Environment:                                                                                      
2.095 #
2.095 #  SIGSEGV (0xb) at pc=0x00007ee3aca595cc, pid=7, tid=8
2.095 #
2.095 # JRE version: OpenJDK Runtime Environment Temurin-17.0.12+7 (17.0.12+7) (build 17.0.12+7)
2.095 # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.12+7 (17.0.12+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
2.095 # Problematic frame:
2.095 # C  [libduckdb_java10883698283861250744.so+0x2cec5cc]  duckdb::Vector::GetVectorType() const+0xc
2.096 #
2.096 # Core dump will be written. Default location: //core.7
2.096 #
2.096 # An error report file with more information is saved as:
2.096 # //hs_err_pid7.log
2.249 #
2.249 # If you would like to submit a bug report, please visit:
2.249 #   https://github.com/adoptium/adoptium-support/issues
2.249 # The crash happened outside the Java Virtual Machine in native code.
2.249 # See problematic frame for where to report the bug.
2.249 #
2.460 Aborted (core dumped)

@Ben8t

Ben8t commented Oct 9, 2024

@Mause any update on this?

@Ben8t

Ben8t commented Oct 10, 2024

Comments from @elefeint on Slack

DuckDB labs has to balance supporting older systems with the problems that the older system libraries sometimes cause. We'll follow up to see if there are any plans on DuckDB side to move up from manylinux2014 for their builds, but I am curious -- from the Kestra side, is it necessary to include all the drivers in a Kestra installation? This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.
The other way you could unblock Kestra users is by building DuckDB from source on a newer base image.

@loicmathieu does it help in some way? Or is the best option to wait for a follow-up on the "plans on DuckDB side to move up from manylinux2014 for their builds"?

@loicmathieu
Author

I think I told them on their Slack, but we use Kafka Streams, which uses RocksDB, so we don't have a choice here.
We will need to wait, but this issue is one year old now...

@anna-geller

anna-geller commented Oct 18, 2024

This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.

Seeing the number of upvotes, comments and the number of kestra users asking about it on our Slack, the population of simultaneous RocksDB and DuckDB users doesn't seem to be that small. The issue has been open for a while now; @Mause @elefeint, could someone give specific guidance on how to proceed or fix this on the DuckDB side? This is a large hindrance for many users and we are blocked.

@shoffmeister

shoffmeister commented Oct 24, 2024

I am getting this, too, and for me it is happening in a CI build container, with Kafka Streams + RocksDB + DuckDB.

This works locally on Arch Linux, but the build container is some kind of Ubuntu ("VERSION=24.04.1 LTS (Noble Numbat)").

@Mause
Member

Mause commented Oct 24, 2024

This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.

Seeing the number of upvotes, comments and the number of kestra users asking about it on our Slack, the population of simultaneous RocksDB and DuckDB users doesn't seem to be that small. The issue has been open for a while now; @Mause @elefeint, could someone give specific guidance on how to proceed or fix this on the DuckDB side? This is a large hindrance for many users and we are blocked.

Can I ask for a link to the RocksDB issue?

@anna-geller

Sorry if it was confusing: I meant Kestra users who use RocksDB indirectly. I'm not aware of any RocksDB issue about this

@Mause
Member

Mause commented Oct 24, 2024

Could you please raise one then? I'd be curious to hear what they have to say on this issue, given it seems to be something they're doing to cause this

@shoffmeister

FWIW, I am using "plain" Spring Boot Cloud Kafka Streams (which obviously includes RocksDB), and I do see failures when running unit tests in a specific Maven CI container (from Docker Hub).

Note: I only use DuckDB in the unit tests, so RocksDB definitely loads first. There is nothing Kestra here on my end. I do acknowledge that Kestra may have material commercial interest in getting that fixed. ;)

Since ELF is late-binding by default, and looking at the LD_PRELOAD magic that's being mentioned, I suspect that the (plain) Kafka Streams JAR (https://central.sonatype.com/artifact/org.apache.kafka/kafka-streams/versions) bundles a binary copy of RocksDB (which is C++, see https://github.com/facebook/rocksdb/blob/main/CMakeLists.txt) which has been built in a way, and with flags, that is ... "unexpected"?

After all, all the LD_PRELOAD magic does is resolve the libstdc++ symbols early, to symbols from that very same ELF binary.

I'll try to amend https://github.com/Mause/duckdb_rocksdb_crash with a containerized build script that allows for global reproducibility. I believe this is critical, as on my personal development environment (which is Arch Linux), all is - or appears to be! - well.

@elefeint
Contributor

@shoffmeister you can use the Dockerfile in #14 (comment) to reproduce

@shoffmeister

Using the baseline that @Mause created in https://github.com/Mause/duckdb_rocksdb_crash, this simple script will reproduce:

#!/usr/bin/env bash

# https://hub.docker.com/_/maven
# https://github.com/carlossg/docker-maven/blob/8cfe24baffa5b250f7bb2d31ce233fc28f3c4f20/eclipse-temurin-17/Dockerfile
IMAGE=maven:3-eclipse-temurin-17
IMAGE=maven@sha256:cf1bca11a285e887efebe851d8e55e4defa326b7ca29a68920f1c9dccc5dad4f

docker run -it --rm --name duckdb-rocksdb_crash \
    -u "$(id -u):$(id -g)" \
    -v "$(pwd)":/usr/src/build:rw \
    -v "$HOME/.m2":/usr/src/.m2:rw \
    -e "MAVEN_OPTS=-Dmaven.repo.local=/usr/src/.m2/repository" \
    -w /usr/src/build \
    ${IMAGE} \
    mvn --offline clean verify

Assumptions:

  • your Maven settings.xml does not exist (or is fully usable inside a container)
  • run mvn clean package once (outside the container) beforehand, so the local Maven repository is populated for the offline build

Running that script then yields

SIGSEGV (0xb) at pc=0x00007d0bb4fc00e0, pid=1, tid=26

The image used above is a very recent Ubuntu 24.04.01, directly from Docker Hub, with a very recent Maven, and Java 17. Since that is a moving target, the digest nails the exact image identity for easier (future) reproducibility.

Adding

    -e "LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6" \

to the mix removes the crash. Alas, adding LD_DEBUG=all hangs and does not yield useful output beyond

/usr/bin/java: error: symbol lookup error: undefined symbol: JNI_OnLoad_rocksdbjni-linux64 (fatal)

both inside the container and on the host. I suspect that this is due to the way the Java reproducer has been set up, and that forking gets in the way.

While I used https://github.com/Mause/duckdb_rocksdb_crash to reproduce, this most likely mirrors my local setup where RocksDB gets pulled by Kafka Streams (which is pulled by Spring Cloud Kafka)

To me it is somewhat surprising that my local development environment is totally fine (Arch Linux as a rolling always-up-to-date distribution), but that the rather modern Maven build container shows the problem.

This would suggest that the way the runtime distribution's user land has been built also plays a role.

@shoffmeister

There is a different failure mode when building the reproducer as a fat JAR application (outside of the container) and then running that inside the container: the process simply hangs.

With the above it is also clear, IMHO, that the distribution on which this is run does matter.

@shoffmeister

The answer might be hiding in the (differential) output of

LD_DEBUG=symbols LD_DEBUG_OUTPUT=debug.log java -jar target/my-app-1.0-SNAPSHOT-jar-with-dependencies.jar

where the JAR is the fat JAR, and this is run inside two different Java-enabled containers with different user libraries, where one container shows the problem and the other does not.

@shoffmeister

The content of tensorflow/tensorflow#61741 is intriguing.

My initial reproducer in a large Kafka Streams application actually (also) shows

java.sql.SQLException: Invalid Error: random_device could not be read: Bad file descriptor
	at org.duckdb.DuckDBNative.duckdb_jdbc_startup(Native Method) ~[duckdb_jdbc-1.1.0.jar:na]

which I believe has the same root cause as this crash here.

@loicmathieu
Author

The issue on the RocksDB side: facebook/rocksdb#13092
Hopefully they will be able to help us.

@shoffmeister

shoffmeister commented Oct 25, 2024

There is interesting analysis in the comments of pytorch/pytorch#102360, specifically on the subject of USE_POSIX_FILE_IO, _GLIBCXX_USE_CXX11_ABI, and the binding of imports.

They are fighting with what I gather to be a very similar problem; the root cause there apparently is "cross-talk" of C++ std::random functions between dynamically loaded binaries. They also preload the system libstdc++, which is the obvious solution in their case.

If I interpret

        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=java [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/opt/java/openjdk/bin/../lib/libjli.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/opt/java/openjdk/lib/server/libjvm.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/tmp/libduckdb_java14682458476252195639.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libstdc++.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=java [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/opt/java/openjdk/bin/../lib/libjli.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/opt/java/openjdk/lib/server/libjvm.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/tmp/librocksdbjni1494878192266256145.so [0]

right (this is from LD_DEBUG in the affected container), and then run

nm --demangle /tmp/librocksdbjni1494878192266256145.so | grep random_device

I get

0000000000a7b080 T std::random_device::_M_init_pretr1(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0000000000a80f60 T std::random_device::_M_init_pretr1(std::string const&)
0000000000a7b280 T std::random_device::_M_getval_pretr1()
0000000000a7af90 T std::random_device::_M_fini()
0000000000a7aee0 T std::random_device::_M_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0000000000a80eb0 T std::random_device::_M_init(std::string const&)
0000000000a7afb0 T std::random_device::_M_getval()

This means that RocksDB exports these symbols, and symbol resolution eventually hits them.

The difference in the libstdc++ library between my local Arch Linux machine and the container is:

container:

  4840: 00000000000e9910  1024 FUNC    GLOBAL DEFAULT   15 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21

arch:

  4841: 00000000000e0950  1045 FUNC    GLOBAL DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
   527: 000000000009c01a   122 FUNC    LOCAL  DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE.cold
  5799: 00000000000e0950  1045 FUNC    GLOBAL DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

so my Arch provides _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE.

In the Ubuntu container I reproduce with, libstdc++ does not have the general export, so DuckDB seems to be binding to the equivalent export in RocksDB upon the attempt to init the random subsystem. And that ... is not good.

From the looks of it, the RocksDB shared object should not contain those exports, i.e.

readelf --wide --syms /tmp/librocksdbjni1494878192266256145.so | grep random_device

should return nothing.

Note that /tmp/librocksdbjni1494878192266256145.so is what the JVM extracts into /tmp from the RocksDB JAR as the JNI library.
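
If you want to confirm from inside an affected application which of these extracted JNI libraries are actually mapped into the process, a small Linux-only helper along these lines will do (a sketch; the class and method names are made up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class MappedJniLibs {
    // Call this once RocksDB and DuckDB have both been touched.
    public static void print() throws IOException {
        // /proc/self/maps lists every file mapped into this process, including the
        // JNI shared objects that rocksdbjni and duckdb_jdbc extract into /tmp.
        Files.lines(Paths.get("/proc/self/maps"))
             .filter(line -> line.contains(".so"))
             .map(line -> line.substring(line.indexOf('/')))
             .distinct()
             .filter(path -> path.contains("rocksdb") || path.contains("duckdb"))
             .forEach(System.out::println);
    }
}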

@loicmathieu
Author

@shoffmeister maybe you can add this information to the issue I just raised on the RocksDB side: facebook/rocksdb#13092

@shoffmeister

Added some context on the RocksDB repository asking why

unzip -o rocksdbjni-9.6.1.jar librocksdbjni-linux64.so -d . &&  nm --demangle ./librocksdbjni-linux64.so | cut -c 18- | grep 'T std::' | sort

shows so many symbols exported from RocksDB in the std:: namespace.

@shoffmeister

FWIW, I just took a look at the DuckDB JNI

nm --demangle ./libduckdb_java15504998828832578825.so | cut -c 18- | grep 'T '

or

readelf --syms --wide --demangle ./libduckdb_java15504998828832578825.so

and it would seem as if the DSO also exports a massive number of symbols beyond what is needed. That does not isolate well, so DuckDB also seems to be doing what RocksDB is doing. cc'ing @carlopi because he seems to be the resident "having fun with operating system library linkage" person ;)

Example:

FUNC    GLOBAL DEFAULT   11 icu_66::double_conversion::Bignum::SubtractBignum(icu_66::double_conversion::Bignum const&)

As mentioned for RocksDB, I would only expect these to be exported:

readelf --syms --wide --demangle ./libduckdb_java15504998828832578825.so | grep Java_org_duckdb

because anything else pollutes the ELF symbol namespace, interferes with (late-binding) symbol resolution, and can have side effects. A JNI library should not have symbol side effects when loaded into a native operating system process, IMHO.

@shoffmeister

FWIW, in facebook/rocksdb#13092 I have confirmed that a custom build of the RocksDB JNI shared object with libstdc++ symbols hidden makes the SIGSEGV in DuckDB go away.

For details, see the conversation there.
