Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-15111: Use JDK8 Base64 instead of own implementation #2252

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

asalamon74
Copy link
Contributor

Description

JDK8 has a builtin Base64 encoder and decoder, there is no need to use own implementaion for this.

Solution

Eliminate own implementation.

Tests

Unit tests.

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the master branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Ref Guide (for Solr changes only).

Copy link
Contributor

@madrob madrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised to see StandardCharsets.ISO_8859_1 in use, most of the bytes we work with are UTF-8. Can you clarify?

@@ -298,7 +299,7 @@ private static String getFieldFlags( SchemaField f )

BytesRef bytes = field.binaryValue();
if (bytes != null) {
f.add( "binary", Base64.byteArrayToBase64(bytes.bytes, bytes.offset, bytes.length));
f.add( "binary", StandardCharsets.ISO_8859_1.decode(Base64.getEncoder().encode(bytes.wrapToByteBuffer())).toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, why are we decoding something that we just encoded?

@asalamon74
Copy link
Contributor Author

@madrob

ISO_8859_1

In the first version I used UTF_8 but later I checked the source code of java.util.Base64 ( http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/java/util/Base64.java ) and they are using ISO-8859-1 internally, so I thought to use the same.

public byte[] decode(String src) {
    return decode(src.getBytes(StandardCharsets.ISO_8859_1));
}

Probably we could use both character encodings, since base64 encoded strings only use a small subset of the characters.

encode + decode

StandardCharsets.ISO_8859_1.decode(Base64.getEncoder().encode(bytes.wrapToByteBuffer())).toString());

First I base64 encode the ByteBuffer which gives me a ByteBuffer then I convert this ByteBuffer to String using Charsets decode ( https://stackoverflow.com/a/39845152 ). So it's an encode + a decode but it's a different type of encoding/decoding.

@madrob
Copy link
Contributor

madrob commented Jan 29, 2021

Unfortunately, we can’t use the solution provided from stack overflow -http://apache.org/legal/resolved.html#stackoverflow

Can you contact the original author and ask for permission to use this? Otherwise we will need somebody who hasn’t looked this code to create a clean room implementation.

@asalamon74
Copy link
Contributor Author

We also used a different way for String conversion, I modified the lines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants