Dependencies pollution wrap-up #16673

andsel · 2024-11-13T12:54:36Z

Abstract

This issue summarizes the discussion in logstash-plugins/logstash-integration-kafka#178 (comment), which came up during the development of a PR to add AWS IAM as a SASL mechanism in the Kafka integration plugin.
Adding that extension raised two problems:

The size of the transitive dependencies pulled in is not negligible. The gem grows from the original 13MB to 65MB, and that is just for the AWS service provider. If in the foreseeable future it also includes GCP IAM, the size could grow even more.
Shadowing of class names. Some of the transitive dependencies (e.g. Netty) are also included by other plugins (like the TCP or Beats inputs), and not necessarily at the same version. Because the classpath is flat and shared between core and all plugins, this can cause binary compatibility issues that aren't easy to resolve, or that can't be resolved at all.

Shadowing of class names (pollution of classpath)

This typical problem can be resolved in two ways:

shading the conflicting dependencies under a specific package name (for example one related to the plugin).
implementing a child classloader that handles the loading of classes for each plugin.

The following sections discuss the pros and cons of each solution.
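Before choosing between the two approaches, it can help to see which classes actually collide. The following is a minimal sketch, not part of Logstash: `DuplicateClassFinder` is a hypothetical helper that lists the `.class` entries of the jars passed on the command line and reports every entry provided by more than one jar. On a flat classpath only one of those copies is visible to all the code, which is exactly the shadowing described above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class DuplicateClassFinder {

    /** Lists the .class entries contained in the jar at the given path. */
    static List<String> classEntries(String jarPath) throws IOException {
        try (JarFile jarFile = new JarFile(jarPath)) {
            return jarFile.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    .collect(Collectors.toList());
        }
    }

    /** Maps each class entry found in more than one jar to the jars that provide it. */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> entriesByJar) {
        Map<String, List<String>> providers = new TreeMap<>();
        entriesByJar.forEach((jar, entries) ->
                entries.forEach(entry ->
                        providers.computeIfAbsent(entry, k -> new ArrayList<>()).add(jar)));
        // Keep only the entries that appear in at least two jars.
        providers.values().removeIf(jars -> jars.size() < 2);
        return providers;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> entriesByJar = new LinkedHashMap<>();
        for (String jarPath : args) {
            entriesByJar.put(jarPath, classEntries(jarPath));
        }
        findDuplicates(entriesByJar).forEach((entry, jars) ->
                System.out.println(entry + " <- " + jars));
    }
}
```

Running it against the jars bundled by two plugins gives a rough list of shading candidates. It compares names only, so it cannot tell whether two copies of a class are binary compatible.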

Shading dependencies

Shading all transitive dependencies under a package name related to the plugin is doable with the GradleUp/shadow Gradle plugin (https://github.com/GradleUp/shadow).
Before applying shading, keep the following points in mind:

Don't shade libraries that have fairly stable APIs, which don't change much across minor releases. For example, the Netty API shouldn't change much between, say, versions 4.1.n and 4.1.n+20 (the same goes for log4j-api), so such libraries can be kept out of the shading process.
Don't shade class names that can be used in configuration. In the logstash-input-kafka AWS IAM authentication PR logstash-plugins/logstash-integration-kafka#178 there are a couple of settings (sasl_jaas_config and sasl_client_callback_handler_class) that receive the names of public AWS classes (like software.amazon.msk.auth.iam.IAMLoginModule and software.amazon.msk.auth.iam.IAMClientCallbackHandler). Those class names are an implicit interface exposed to the user, so they can't be shaded.
A shading best practice is to create a new library (or submodule) for each shaded tree. For example, if a project depends on the library software.amazon.msk:aws-msk-iam-auth:2.2.0, a new submodule (for example input-kafka-aws-msk-iam-auth-shaded) must be created that contains only the dependency to shade and the shading instructions.
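The naming decisions in the points above can be sketched as a small rule. This is only an illustration of which class names would change and which must stay untouched; the actual bytecode rewriting is done by the Shadow plugin, and this is not its implementation. `RelocationRule`, the shaded prefix `org.logstash.kafka.shaded` and the choice of relocating everything under `software.amazon` are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a relocation decision, for illustration only.
final class RelocationRule {
    private final String shadedPrefix;
    private final List<String> relocatedPackages;
    private final List<String> keptPackages;

    RelocationRule(String shadedPrefix, List<String> relocatedPackages, List<String> keptPackages) {
        this.shadedPrefix = shadedPrefix;
        this.relocatedPackages = relocatedPackages;
        this.keptPackages = keptPackages;
    }

    /** Returns the class name after shading, or the same name if it must not be moved. */
    String apply(String className) {
        // Names that users write in configuration strings always win over relocation.
        for (String kept : keptPackages) {
            if (className.startsWith(kept + ".")) {
                return className;
            }
        }
        for (String pkg : relocatedPackages) {
            if (className.startsWith(pkg + ".")) {
                return shadedPrefix + "." + className;
            }
        }
        // Anything not listed (e.g. libraries with stable APIs) is left on the shared classpath.
        return className;
    }

    public static void main(String[] args) {
        RelocationRule rule = new RelocationRule(
                "org.logstash.kafka.shaded",                    // hypothetical prefix
                Arrays.asList("software.amazon"),               // assumed relocation target
                Arrays.asList("software.amazon.msk.auth.iam")); // class names used in settings
        for (String name : Arrays.asList(
                "software.amazon.awssdk.regions.Region",
                "software.amazon.msk.auth.iam.IAMLoginModule",
                "io.netty.channel.Channel")) {
            System.out.println(name + " -> " + rule.apply(name));
        }
    }
}
```

With these assumed settings only the first name gets the shaded prefix. `IAMLoginModule` keeps its public name so that an existing sasl_jaas_config value still resolves, and the Netty class is untouched because Netty is not in the relocation list.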

Pros & cons
Pros:

easier to implement: there is a Gradle plugin that does this very well
no need to dig into the classloading mechanisms of Java and JRuby

Cons:

changes have to be made to each plugin that uses external jars
care has to be taken not to shade (fully qualified) class names that are also used in configuration strings
not all dependencies have a good reason to be shaded, for example Netty or log4j-api, which have fairly stable APIs
size of the uber-jar
could require handling the publishing of the uber-jar
minimal class reuse and potential class space explosion

Classloader per plugin

Another way to solve the classpath pollution problem is to isolate each plugin in its own classloader, so that different versions of the same class can coexist under different classloaders, one per plugin.
This is harder to implement because it reaches into, and is intertwined with, the way JRuby operates. The classloader segregation also has to work for mixed plugins, that is, plugins that are Ruby gems containing some shim Ruby code but that bundle the jar classes they need to operate. In this context, the classloader should cooperate with JRuby so that when the plugin's JRuby code loads a Java class, it uses the segregating classloader and not the standard JRuby classloading.
More investigation is needed to understand if and how this is feasible. In addition, classloading code can be hard to get right.
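On the plain Java side, the segregation described above is usually built as a child-first classloader. The following is a minimal sketch under that assumption; `ChildFirstPluginClassLoader` is a hypothetical name and not existing Logstash code. It deliberately leaves out the hard part mentioned above, which is making JRuby resolve Java classes through such a loader.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Hypothetical per-plugin classloader, for illustration only.
final class ChildFirstPluginClassLoader extends URLClassLoader {

    ChildFirstPluginClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (name.startsWith("java.")) {
                    // JDK classes must always come from the parent chain.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // Child first: prefer the copy bundled in the plugin's own jars.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not bundled by the plugin: fall back to the shared parent.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    // Usage: java ChildFirstPluginClassLoader.java <plugin.jar> <fully.qualified.ClassName>
    public static void main(String[] args) throws Exception {
        URL[] jars = { Paths.get(args[0]).toUri().toURL() };
        try (ChildFirstPluginClassLoader loader =
                     new ChildFirstPluginClassLoader(jars, ClassLoader.getSystemClassLoader())) {
            Class<?> loadedClass = loader.loadClass(args[1]);
            System.out.println(loadedClass.getName() + " loaded by " + loadedClass.getClassLoader());
        }
    }
}
```

With one such loader per plugin, two plugins can each load their own Netty version. The trade-off is that classes loaded by different loaders are distinct types, so objects of those classes cannot be passed between plugins, or to core, unless the type comes from the shared parent.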

Pros & cons
Pros:

more elegant way to solve the problem
no need to change the plugins' build scripts
no need to select which dependencies to shade or not
no need to potentially ship and handle the lifecycle of the uber-jar

Cons:

hooking into the classloader hierarchy could be tough
doesn't eliminate the problem of big gems with a lot of transitive dependencies

Size of the generated plugin gem

The other side of this problem is the size that the gem can reach. In particular, as in the SASL configuration use case, a gem could grow considerably just to ship transitive dependencies that are needed only with specific settings. In the majority of use cases those dependencies are probably not used, but users still pay the penalty of a gem that bundles everything.
To limit the size of the transitive dependencies in a gem, the idea is to offload the non-mandatory ones into another artifact, for example:

another gem that can be installed when needed
an uber-jar that can be downloaded and put into the class loader's path.

Gem with optional dependencies

In this case the plugin that has optional dependencies would also generate and publish (how?) on rubygems a set of additional gems containing the full set of transitive dependencies required only for the optional behaviours. The documentation then has to be updated to explain how and when those optional artifacts need to be installed.
For the "when" question, the answer is the documentation itself.
For the "how", an extension to the bin/logstash-plugin tool can be imagined, so that it can also install this kind of extension gem.

Who should maintain the transitive shaded uber-jars?

In this case a strategy similar to the one used for the JDBC plugin's driver could be followed: ask the user to manually download an uber-jar that contains all the shaded dependencies and to set its path in the plugin configuration, so that the plugin can explicitly load the jar, as in:

require "uber.jar"
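In the plugin itself this step would happen from Ruby through `require`. As a JVM-side sketch of the same idea, the following hypothetical `UberJarLoader` validates a user-configured path and exposes the jar through a `URLClassLoader`. The class name, the method name and the error messages are assumptions for illustration. The point of the sketch is that a missing or unreadable jar should fail early with a clear message rather than surface later as a `ClassNotFoundException`.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
final class UberJarLoader {

    /** Builds a classloader for the uber-jar found at the path set in the plugin configuration. */
    static URLClassLoader fromConfiguredPath(String configuredPath, ClassLoader parent) throws IOException {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            throw new IllegalArgumentException("No uber-jar path configured");
        }
        Path jar = Paths.get(configuredPath).toAbsolutePath();
        if (!Files.isRegularFile(jar) || !Files.isReadable(jar)) {
            throw new IllegalArgumentException("Uber-jar not found or not readable: " + jar);
        }
        return new URLClassLoader(new URL[] { jar.toUri().toURL() }, parent);
    }

    // Usage: java UberJarLoader.java <path/to/uber.jar> <fully.qualified.ClassName>
    public static void main(String[] args) throws Exception {
        try (URLClassLoader loader = fromConfiguredPath(args[0], ClassLoader.getSystemClassLoader())) {
            System.out.println("Loaded " + loader.loadClass(args[1]).getName());
        }
    }
}
```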

This poses a couple of questions: where to publish the uber jar and when to update it.

Where to publish?

At first thought, since it is a Java jar, the answer could be the Maven repository, but it has to be checked whether there is any limit on the size of the jar that can be uploaded. This is strictly related to the other question.

When to update?

A new version of the uber-jar should be published when the plugin that requires it updates its library dependency version and releases a new gem version. Creating and publishing the uber-jar could be either a task of the Gradle build script or an external CI pipeline that is triggered manually. The optimal solution is to automate that step too, so that the plugin developer doesn't have to remember it.


andsel added enhancement status:needs-triage labels Nov 13, 2024

andsel self-assigned this Nov 13, 2024
