
[JDBC] DuckDB JDBC driver SIGSEGV the JVM since 0.9.0 #14

Open

loicmathieu opened this issue Oct 18, 2023 · 44 comments

@loicmathieu

What happens?

Since version 0.9.0, using the DuckDB JDBC driver in a Java application makes the application crash with a SIGSEGV.
The Java version is 17.0.5 (tested also on 17.0.8.1).

First, there is a Java exception:

java.sql.SQLException: random_device could not be read
	at org.duckdb.DuckDBNative.duckdb_jdbc_startup(Native Method)
	at org.duckdb.DuckDBConnection.newConnection(DuckDBConnection.java:48)
	at org.duckdb.DuckDBDriver.connect(DuckDBDriver.java:38)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:681)
	at java.sql/java.sql.DriverManager.getConnection(DriverManager.java:190)
	at io.kestra.plugin.jdbc.JdbcConnectionInterface.connection(JdbcConnectionInterface.java:63)
	at io.kestra.plugin.jdbc.AbstractJdbcQuery.run(AbstractJdbcQuery.java:77)
	at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:148)
	at io.kestra.plugin.jdbc.duckdb.Query.run(Query.java:31)
	at io.kestra.core.runners.Worker$WorkerThread.run(Worker.java:674) 	

Then the JVM crashes:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f523603fd60, pid=37746, tid=39346
#
# JRE version: OpenJDK Runtime Environment Temurin-17.0.5+8 (17.0.5+8) (build 17.0.5+8)
# Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.5+8 (17.0.5+8, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  0x00007f523603fd60
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to <redacted>)
#
# An error report file with more information is saved as:
# <redacted>
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

It works well with 0.8.0.

To Reproduce

Here is the SQL query:

      INSTALL httpfs;
      SELECT Title, max("Days In Top 10") 
      from (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet'))
      where Type='Movie'
      GROUP BY Title
      ORDER BY max("Days In Top 10") desc
      limit 5;

The code uses the standard Java JDBC API (Connection & Statement), but it is not easily extracted as it runs via the Kestra DuckDB plugin.
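
For reference, here is a minimal standalone sketch of what the plugin does through JDBC. This is not the actual Kestra plugin code; the class name is made up, and the in-memory jdbc:duckdb: URL stands in for whatever database the task is configured with.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DuckDbJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.duckdb.DuckDBDriver"); // usually optional, the driver registers itself
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:");
             Statement stmt = conn.createStatement()) {
            stmt.execute("INSTALL httpfs");
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT Title, max(\"Days In Top 10\") "
                    + "FROM (SELECT * FROM read_parquet('s3://duckdb-md-dataset-121/netflix_daily_top_10.parquet')) "
                    + "WHERE Type='Movie' GROUP BY Title "
                    + "ORDER BY max(\"Days In Top 10\") DESC LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }
}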

OS:

Ubuntu 23.04

DuckDB Version:

0.9.0

DuckDB Client:

Java JDBC

Full Name:

Loïc Mathieu

Affiliation:

Kestra

Have you tried this on the latest main branch?

I have tested with a release build (and could not test with a main build)

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
@loicmathieu
Author

See the corresponding core dump log:
hs_err_pid37746.log

@carlopi
Contributor

carlopi commented Oct 18, 2023

Could you try with v0.9.1, which should contain some potentially relevant fixes?

@loicmathieu
Author

Already tried, and we have the exact same issue (the stack trace and core dump are from 0.9.1; I downgraded to 0.9.0 and got the same issue).

@carlopi
Contributor

carlopi commented Oct 18, 2023

The fix in the httpfs extension went live a couple of hours ago. Could you give it another try (after first running FORCE INSTALL httpfs once, as explained here: duckdb/duckdb#9340 (comment))?

This would NOT solve the random_device issue, but might solve the crash if they are independent.

@loicmathieu
Author

Even with FORCE INSTALL httpfs I have the same issue.

By the way, the random_device issue didn't appear on 0.8.0, so it may not be the same issue as duckdb/duckdb#9340

@Mause
Member

Mause commented Oct 18, 2023

Can you try the solution mentioned in duckdb/duckdb#8708 (comment) ?

@loicmathieu
Author

@Mause setting LD_PRELOAD works; however, this is not a proper fix for us, as we cannot control the environment in which our users will run our code.

@Mause
Member

Mause commented Oct 18, 2023

It's less a permanent solution than a way of confirming it's the same issue, and one we've seen before (though that was with TensorFlow and Python).

@Mause
Member

Mause commented Oct 18, 2023

For our own reference, are you using any other java libraries that are backed by a C++ library?

@Mause
Member

Mause commented Oct 18, 2023

I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump, do you see the issue if you exclude that library/don't load it before duckdb?

@loicmathieu
Author

For our own reference, are you using any other java libraries that are backed by a C++ library?

It is very difficult to answer this question as this kind of information is usually not documented. We are using literally hundreds of libraries (maybe even more than a thousand as we have 400 plugins).

Our runtime uses Netty which, for sure, uses native libraries.

@loicmathieu
Author

I notice /tmp/librocksdbjni14687608028396635175.so is mentioned in the dump, do you see the issue if you exclude that library/don't load it before duckdb?

Oh! That could explain why we only see this when using our Kafka runner and not our JDBC runner (we can launch Kestra with two different runners). So yes, I confirm it works when we don't use Kafka (and therefore don't load RocksDB).

@Mause
Member

Mause commented Nov 6, 2023

For our future reference, this is enough to trigger the crash: https://github.com/Mause/duckdb_rocksdb_crash/blob/main/src/test/java/com/mycompany/app/AppTest.java
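
In outline, the reproducer does little more than load the RocksDB JNI library before opening a DuckDB JDBC connection. A rough sketch (not the linked test verbatim; it assumes rocksdbjni and duckdb_jdbc are both on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;

import org.rocksdb.RocksDB;

public class CrashSketch {
    public static void main(String[] args) throws Exception {
        // Extract and load librocksdbjni*.so first, as Kafka Streams would do
        RocksDB.loadLibrary();

        // Opening a DuckDB connection then loads libduckdb_java*.so;
        // on affected systems this is where the JVM dies with SIGSEGV
        try (Connection conn = DriverManager.getConnection("jdbc:duckdb:")) {
            conn.createStatement().execute("SELECT 42");
        }
    }
}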

@Mause
Member

Mause commented Nov 6, 2023

Or a crash, anyway; I'm not certain it's the same one.

@loicmathieu
Author

Hi,
Do you have any news on this?
It prevents us from upgrading to driver version 0.9.2, which in turn prevents us from using MotherDuck, as MotherDuck only supports DuckDB 0.9.2!

@elefeint
Contributor

I wonder if this issue manifested itself in 0.9.x as a side effect of the release build moving to manylinux.

I've built a local version of the DuckDB JDBC driver from the codebase at the v0.9.2 tag on Ubuntu 22.04, and @Mause's reproducer from #14 no longer crashes. (Different JVMs also behave differently: the Ubuntu build of OpenJDK does not crash even with the released version of the JDBC driver, but that's likely due to a different library loading order.)

@Y-- helped me look at the difference between the two drivers, and it seems the manylinux-built driver depends on two extra libraries that the Ubuntu-built driver does not -- libdl.so.2 and libpthread.so.0:

/tmp/official> ldd libduckdb_java.so_linux_amd64
	linux-vdso.so.1 (0x00007ffc2e9e0000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7a39e39000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7a39e34000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f7a37200000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7a37519000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7a39e14000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7a36e00000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7a39e52000)

/tmp/mine> ldd libduckdb_java.so_linux_amd64
	linux-vdso.so.1 (0x00007ffcbcb7f000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fe37ac00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fe3802f0000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fe3802d0000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe37a800000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fe3803eb000)

@tchiotludo

Any update on how to fix it? We have users blocked on version 0.8 asking for features from the latest version.

@Mause
Member

Mause commented Feb 28, 2024

Any update on how to fix it? We have users blocked on version 0.8 asking for features from the latest version.

Does the LD_PRELOAD workaround fix it for you as well?

@loicmathieu
Author

@Mause yes it works, but as I said, we cannot control the environment of our users so it's not a solution.

@armetiz

armetiz commented May 20, 2024

I'm a Kestra user.

Do you think this problem will be fixed in upcoming DuckDB releases, or do I have to rely on the LD_PRELOAD workaround?

@hannes hannes transferred this issue from duckdb/duckdb May 24, 2024
@armetiz

armetiz commented Jun 5, 2024

I tried to reproduce the error to help resolve this issue, but it works.

I created a Dockerfile using the latest version of eclipse-temurin and a Java application that fetches a remote Parquet file.

Displaying the SQL results works: ✅

Both 0.9.0 and 0.10.3 work.

Here is the gist with all the files used: https://gist.github.com/armetiz/e4ffd81189eb334c5acdf3e9e9796940

Can I try something else to reproduce the problem, in the hope of finding a solution?

Regards


outputs

➜  duckdb-jdbc docker build -t helloworld .
➜  duckdb-jdbc docker run helloworld:latest
DuckDB - About SIGSEGV
01001_1, 1001_1, 01001, 1, bureau 1,  , Salle des fêtes, 01400, abergement clemenciat, 448, 448.0, 01001_0001, 
01002_1, 1002_1, 01002, 1, mairie, 1, Place de la Mairie, 01640, l abergement de varey, 157, 143.0, 01002_0001, 
01004_1, 1004_1, 01004, 1, b1 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 633, 630.0, 01004_0001, 
01004_2, 1004_2, 01004, 2, b2 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 640, 638.0, 01004_0002, 
01004_3, 1004_3, 01004, 3, b3 chateau des echelles,  , RUE DES ARENES, 01500, amberieu en bugey, 736, 730.0, 01004_0003, 
01004_4, 1004_4, 01004, 4, b4 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 532, 527.0, 01004_0004, 
01004_5, 1004_5, 01004, 5, b5 espace 1500,  , AVENUE LEON BLUM, 01500, amberieu en bugey, 531, 529.0, 01004_0005, 
01004_6, 1004_6, 01004, 6, b6 groupe scolaire jules ferry,  , RUE VICTOR HUGO, 01500, amberieu en bugey, 628, 627.0, 01004_0006, 
01004_7, 1004_7, 01004, 7, b7 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 582, 577.0, 01004_0007, 
01004_8, 1004_8, 01004, 8, b8 ecole maternelle de tiret,  , RUE JACQUES PREVERT, 01500, amberieu en bugey, 691, 688.0, 01004_0008, 

@loicmathieu
Author

@armetiz on Kestra, this issue only occurs if the RocksDB native library is loaded before the DuckDB native library, which happens in Kestra EE.

@armetiz

armetiz commented Jun 5, 2024

Hi @Mause I tried to reproduce your Maven configuration within a Docker container.

But as you can see, I could not reproduce the error: https://github.com/armetiz/dockerfile-maven-duckdb-rockdbs

@elefeint
Contributor

elefeint commented Aug 9, 2024

Reproducing this requires building DuckDB on manylinux2014 but running the Java application on a modern system. Here is a Dockerfile reproducing the issue with a debug build of DuckDB: Dockerfile.txt

Output:

0.256 *** BEFORE LOADING ROCKSDB ***                                                                                                                          
0.363 *** AFTER LOADING ROCKSDB ***                                                                                                                           
2.095 #                                                                                                                                                       
2.095 # A fatal error has been detected by the Java Runtime Environment:                                                                                      
2.095 #
2.095 #  SIGSEGV (0xb) at pc=0x00007ee3aca595cc, pid=7, tid=8
2.095 #
2.095 # JRE version: OpenJDK Runtime Environment Temurin-17.0.12+7 (17.0.12+7) (build 17.0.12+7)
2.095 # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.12+7 (17.0.12+7, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
2.095 # Problematic frame:
2.095 # C  [libduckdb_java10883698283861250744.so+0x2cec5cc]  duckdb::Vector::GetVectorType() const+0xc
2.096 #
2.096 # Core dump will be written. Default location: //core.7
2.096 #
2.096 # An error report file with more information is saved as:
2.096 # //hs_err_pid7.log
2.249 #
2.249 # If you would like to submit a bug report, please visit:
2.249 #   https://github.com/adoptium/adoptium-support/issues
2.249 # The crash happened outside the Java Virtual Machine in native code.
2.249 # See problematic frame for where to report the bug.
2.249 #
2.460 Aborted (core dumped)

@Ben8t

Ben8t commented Oct 9, 2024

@Mause any update on this?

@Ben8t

Ben8t commented Oct 10, 2024

Comments from @elefeint on Slack

DuckDB labs has to balance supporting older systems with the problems that the older system libraries sometimes cause. We'll follow up to see if there are any plans on DuckDB side to move up from manylinux2014 for their builds, but I am curious -- from the Kestra side, is it necessary to include all the drivers in a Kestra installation? This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.
The other way you could unblock Kestra users is by building DuckDB from source on a newer base image.

@loicmathieu does it help in some way? Or is the best option to wait for a follow-up on the "plans on DuckDB side to move up from manylinux2014 for their builds"?

@loicmathieu
Author

I think I told them on their Slack, but we use Kafka Streams, which uses RocksDB, so we don't have a choice here.
We will need to wait, but this issue is one year old now...

@anna-geller

anna-geller commented Oct 18, 2024

This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.

Seeing the number of upvotes, comments and the number of kestra users asking about it on our Slack, the population of simultaneous RocksDB and DuckDB users doesn't seem to be that small. The issue has been open for a while now; @Mause @elefeint, could someone give specific guidance on how to proceed or fix this on the DuckDB side? This is a large hindrance for many users and we are blocked.

@shoffmeister

shoffmeister commented Oct 24, 2024

I am getting this, too, and for me it is happening in a CI build container, with Kafka Streams + RocksDB + DuckDB.

This works locally on Arch Linux, but the build container is some kind of Ubuntu ("VERSION=24.04.1 LTS (Noble Numbat)").

@Mause
Member

Mause commented Oct 24, 2024

This problem only manifests when both RocksDB and DuckDB are both on the classpath, and the population of users who need both simultaneously is likely small.

Seeing the number of upvotes, comments and the number of kestra users asking about it on our Slack, the population of simultaneous RocksDB and DuckDB users doesn't seem to be that small. The issue has been open for a while now; @Mause @elefeint, could someone give specific guidance on how to proceed or fix this on the DuckDB side? This is a large hindrance for many users and we are blocked.

Can I ask for a link to the RocksDB issue?

@anna-geller

Sorry if it was confusing: I meant Kestra users who use RocksDB indirectly. I'm not aware of any RocksDB issue about this

@Mause
Member

Mause commented Oct 24, 2024

Could you please raise one then? I'd be curious to hear what they have to say on this issue, given it seems to be something they're doing to cause this

@shoffmeister

FWIW, I am using "plain" Spring Boot Cloud Kafka Streams (which obviously includes RocksDB), and I do see failures when running unit tests in a specific Maven CI container (from Docker Hub).

Note: I only use DuckDB in the unit tests, so RocksDB definitely loads first. There is nothing Kestra here on my end. I do acknowledge that Kestra may have material commercial interest in getting that fixed. ;)

Since ELF is late-binding by default, and looking at the LD_PRELOAD magic that's being mentioned, I suspect that the (plain) Kafka Streams JAR (https://central.sonatype.com/artifact/org.apache.kafka/kafka-streams/versions) bundles a binary copy of RocksDB (which is C++, see https://github.com/facebook/rocksdb/blob/main/CMakeLists.txt) which has been built in a way, and with flags, that is ... "unexpected"?

After all, all the LD_PRELOAD magic does is resolve the libstdc++ symbols early, to symbols from that very same ELF binary.

I'll try to amend https://github.com/Mause/duckdb_rocksdb_crash with a containerized build script that allows for global reproducibility. I believe this is critical, as on my personal development environment (which is Arch Linux), all is - or appears to be! - well.

@elefeint
Contributor

@shoffmeister you can use the Dockerfile in #14 (comment) to reproduce

@shoffmeister

Using the baseline that @Mause created in https://github.com/Mause/duckdb_rocksdb_crash, this simple script will reproduce:

#!/usr/bin/env bash

# https://hub.docker.com/_/maven
# https://github.com/carlossg/docker-maven/blob/8cfe24baffa5b250f7bb2d31ce233fc28f3c4f20/eclipse-temurin-17/Dockerfile
IMAGE=maven:3-eclipse-temurin-17
IMAGE=maven@sha256:cf1bca11a285e887efebe851d8e55e4defa326b7ca29a68920f1c9dccc5dad4f

docker run -it --rm --name duckdb-rocksdb_crash \
    -u "$(id -u):$(id -g)" \
    -v "$(pwd)":/usr/src/build:rw \
    -v "$HOME/.m2":/usr/src/.m2:rw \
    -e "MAVEN_OPTS=-Dmaven.repo.local=/usr/src/.m2/repository" \
    -w /usr/src/build \
    ${IMAGE} \
    mvn --offline clean verify

Assumptions:

  • your Maven settings.xml does not exist (or is fully usable inside a container)
  • run mvn clean package once (outside the container) beforehand, so the local Maven repository is populated for the offline build

Running that script then yields

SIGSEGV (0xb) at pc=0x00007d0bb4fc00e0, pid=1, tid=26

The image used above is a very recent Ubuntu 24.04.01, directly from Docker Hub, with a very recent Maven, and Java 17. Since that is a moving target, the digest nails the exact image identity for easier (future) reproducibility.

Adding

    -e "LD_PRELOAD=/lib/x86_64-linux-gnu/libstdc++.so.6" \

to the mix removes the crash. Alas, adding LD_DEBUG=all hangs and does not yield useful output beyond

/usr/bin/java: error: symbol lookup error: undefined symbol: JNI_OnLoad_rocksdbjni-linux64 (fatal)

both inside the container and on the host. I suspect that this is due to the way the Java reproducer has been set up, and that forking gets in the way.

While I used https://github.com/Mause/duckdb_rocksdb_crash to reproduce, this most likely mirrors my local setup where RocksDB gets pulled by Kafka Streams (which is pulled by Spring Cloud Kafka)

To me it is somewhat surprising that my local development environment is totally fine (Arch Linux as a rolling always-up-to-date distribution), but that the rather modern Maven build container shows the problem.

This would suggest that the way the runtime distribution's user land has been built also plays a role.

@shoffmeister

There is a different failure mode when building the reproducer as a fat JAR application (outside of the container) and then running that inside the container: the process simply hangs.

With the above it is also clear, IMHO, that the distribution on which this is run does matter.

@shoffmeister

The answer might be hiding in the (differential) output of

LD_DEBUG=symbols LD_DEBUG_OUTPUT=debug.log java -jar target/my-app-1.0-SNAPSHOT-jar-with-dependencies.jar

where the JAR is the fat JAR, and this is run inside two different Java-enabled containers with different user libraries, where one container shows the problem and the other does not.

@shoffmeister

The content of tensorflow/tensorflow#61741 is intriguing.

My initial reproducer in a large Kafka Streams application actually (also) shows

java.sql.SQLException: Invalid Error: random_device could not be read: Bad file descriptor
	at org.duckdb.DuckDBNative.duckdb_jdbc_startup(Native Method) ~[duckdb_jdbc-1.1.0.jar:na]

which I believe has the same root cause as this crash here.

@loicmathieu
Author

The issue on the RocksDB side: facebook/rocksdb#13092
Hopefully they will be able to help us.

@shoffmeister

shoffmeister commented Oct 25, 2024

There is interesting analysis in the comments of pytorch/pytorch#102360, specifically on the subject of USE_POSIX_FILE_IO, _GLIBCXX_USE_CXX11_ABI, and the binding of imports.

They are fighting with what I gather to be a very similar problem; the root cause there apparently is "cross-talk" of C++ std::random functions between dynamically loaded binaries. They also preload the system libstdc++, which is the obvious solution in their case.

If I interpret

        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=java [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/opt/java/openjdk/bin/../lib/libjli.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/opt/java/openjdk/lib/server/libjvm.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/tmp/libduckdb_java14682458476252195639.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKSs;  lookup in file=/lib/x86_64-linux-gnu/libstdc++.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=java [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/opt/java/openjdk/bin/../lib/libjli.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libdl.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/opt/java/openjdk/lib/server/libjvm.so [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/librt.so.1 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/lib/x86_64-linux-gnu/libm.so.6 [0]
        11:	symbol=_ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE;  lookup in file=/tmp/librocksdbjni1494878192266256145.so [0]

right (this is from LD_DEBUG in the affected container), and then run

nm --demangle /tmp/librocksdbjni1494878192266256145.so | grep random_device

I get

0000000000a7b080 T std::random_device::_M_init_pretr1(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0000000000a80f60 T std::random_device::_M_init_pretr1(std::string const&)
0000000000a7b280 T std::random_device::_M_getval_pretr1()
0000000000a7af90 T std::random_device::_M_fini()
0000000000a7aee0 T std::random_device::_M_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
0000000000a80eb0 T std::random_device::_M_init(std::string const&)
0000000000a7afb0 T std::random_device::_M_getval()

This means that RocksDB exports these symbols, and symbol resolution eventually hits them.

The difference in the libstdc++ library between my local Arch Linux machine and the container is:

container:

  4840: 00000000000e9910  1024 FUNC    GLOBAL DEFAULT   15 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21

arch:

  4841: 00000000000e0950  1045 FUNC    GLOBAL DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE@@GLIBCXX_3.4.21
   527: 000000000009c01a   122 FUNC    LOCAL  DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE.cold
  5799: 00000000000e0950  1045 FUNC    GLOBAL DEFAULT   12 _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

so my Arch provides _ZNSt13random_device7_M_initERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE.

In the Ubuntu container I reproduce with, libstdc++ does not have the general export, so DuckDB seems to be binding to the equivalent export in RocksDB upon the attempt to init the random subsystem. And that ... is not good.

From the looks of it, the RocksDB shared object should not contain those exports, i.e.

readelf --wide --syms /tmp/librocksdbjni1494878192266256145.so | grep random_device

should return nothing.

Note that /tmp/librocksdbjni1494878192266256145.so is what the JVM extracts into /tmp from the RocksDB JAR as the JNI library.
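
If you want to confirm from inside an affected application which of these extracted JNI libraries are actually mapped into the process, a small Linux-only helper along these lines will do (a sketch; the class and method names are made up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public final class MappedJniLibs {
    // Call this once RocksDB and DuckDB have both been touched.
    public static void print() throws IOException {
        // /proc/self/maps lists every file mapped into this process, including the
        // JNI shared objects that rocksdbjni and duckdb_jdbc extract into /tmp.
        Files.lines(Paths.get("/proc/self/maps"))
             .filter(line -> line.contains(".so"))
             .map(line -> line.substring(line.indexOf('/')))
             .distinct()
             .filter(path -> path.contains("rocksdb") || path.contains("duckdb"))
             .forEach(System.out::println);
    }
}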

@loicmathieu
Author

@shoffmeister maybe you can add this information to the issue I just raised on the RocksDB side: facebook/rocksdb#13092

@shoffmeister

Added some context on the RocksDB repository asking why

unzip -o rocksdbjni-9.6.1.jar librocksdbjni-linux64.so -d . &&  nm --demangle ./librocksdbjni-linux64.so | cut -c 18- | grep 'T std::' | sort

shows so many symbols exported from RocksDB in the std:: namespace.

@shoffmeister

FWIW, I just took a look at the DuckDB JNI

nm --demangle ./libduckdb_java15504998828832578825.so | cut -c 18- | grep 'T '

or

readelf --syms --wide --demangle ./libduckdb_java15504998828832578825.so

and it would seem as if the DSO also exports a massive number of symbols beyond what is needed. That does not isolate well, so DuckDB also seems to be doing what RocksDB is doing. cc'ing @carlopi because he seems to be the resident "having fun with operating system library linkage" person ;)

Example:

FUNC    GLOBAL DEFAULT   11 icu_66::double_conversion::Bignum::SubtractBignum(icu_66::double_conversion::Bignum const&)

As mentioned for RocksDB, I would only expect these to be exported:

readelf --syms --wide --demangle ./libduckdb_java15504998828832578825.so | grep Java_org_duckdb

because anything else pollutes the ELF symbol namespace, interferes with (late-binding) symbol resolution, and can have side effects. A JNI library should not have symbol side effects when loaded into a native operating system process, IMHO.

@shoffmeister

FWIW, in facebook/rocksdb#13092 I have confirmed that a custom build of the RocksDB JNI shared object with libstdc++ symbols hidden makes the SIGSEGV in DuckDB go away.

For details, see the conversation there.
