Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support inspect on strings with unicode #79

Merged
merged 3 commits into from
May 11, 2024

Conversation

dubzzz
Copy link
Contributor

@dubzzz dubzzz commented May 7, 2024

Fixes #78

expect(inspect('🐱🐱🐱', { truncate: 5 })).to.equal("'🐱…'")
})

it('truncates strings involving graphemes than truncate (5)', () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting graphemes, is way more complex than surrogate pairs. As such splitting a grapheme (aka a visual unit from a user point of view) is acceptable for Unicode as the string remains valid. On the opposite splitting a surrogate pair is responsible to create strings that are considered invalid from an Unicode point of view as they contain illegal characters.

test/strings.js Outdated
@@ -59,6 +59,22 @@ describe('strings', () => {
expect(inspect('foobarbaz', { truncate: 3 })).to.equal("'…'")
})

it('truncates strings involving surrogate pairs longer than truncate (7)', () => {
expect(inspect('🐱🐱🐱', { truncate: 7 })).to.equal("'🐱🐱…'") // not '🐱🐱\ud83d…'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm the author of fast-check: a property based testing library. This case is well-suited for such tests, so I have to at least let you know of it. Test would be something like:

it('truncates strings into valid unicode strings', () => {
  fc.assert(fc.property(fc.fullUnicodeString(), fc.nat(), (text, truncate) => {
    expect(() => encodeURI(inspect(text, { truncate }))).not.to.throw()
  }))
})

And it would cover this specific case but possibly other that might came one day.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds awsome! Is there a chai integration 😉?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No dedicated chai integration. At least nothing deeper than the suggestion I made 😅

I had a chat with maintainers of Jest in the past to wrap assert+property into some kind of expect(...).toSomething(...) but I/we never pushed the idea further. The only integration I have are mostly connecting it/test of the test runners with the library. But nothing at the level of expect itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keithamus let me know if the technique might be interesting for loupe/chaijs? If you feel it might I can try to propose a pull request with some tests using the technique (I can work on such PR even if it never gets merged)

@dubzzz
Copy link
Contributor Author

dubzzz commented May 7, 2024

Note: Based on the note that loupe is reimplementing part of Node.js' util.inspect() I started to have a look into how the built-in library was dealing with such strings. Actually it does not handle them properly and truncates them in the middle of pairs and thus produces invalid strings.

@43081j
Copy link
Contributor

43081j commented May 7, 2024

Note: Based on the note that loupe is reimplementing part of Node.js' util.inspect() I started to have a look into how the built-in library was dealing with such strings. Actually it does not handle them properly and truncates them in the middle of pairs and thus produces invalid strings.

this was going to be my main question. what does node do?

i suppose we need to decide if it makes sense to match node or to make it "visually sensible" (assuming node just splits it down the middle like loupe does)

Copy link
Member

@keithamus keithamus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I was going to comment to the effect of "let's try to just truncate left if we're in the middle of a surrogate pair" in #78 but I thought I'd hold back to see what you came up with. This looks like an elegant fix and I'm glad we're tackling it nicely. A+ from me.

@dubzzz
Copy link
Contributor Author

dubzzz commented May 8, 2024

Woups, lint is not passing I'll fix it

src/helpers.ts Outdated Show resolved Hide resolved
@dubzzz
Copy link
Contributor Author

dubzzz commented May 8, 2024

🟢 Lint fixed
🟢 Shorter and more performant version of surrogate check

@keithamus keithamus merged commit e02467e into chaijs:main May 11, 2024
6 checks passed
dubzzz added a commit to dubzzz/loupe that referenced this pull request May 23, 2024
Following the discussion started on chaijs#79. I'm opening this PR as an opened question and suggestion to better cover the library and its edge-cases.

First, what is property-based testing?

It's a technic coming from functional world and aiming to help devs into detecting edge cases without having them to think of them too much.

Why, property-based testing?

Well, in order to reduce the risks of bugs and potentially regressions on key libraries.

Could it found problems?

Well, probably. I tried another property but I'm not sure whether or not the failure is considered as ok or not. As such I'm not sure of what to expect from truncate so I was not able to make a decision.

```js
it('produces strings with at most <truncate> characters', () => {
  fc.assert(
    fc.property(
      fc.anything({
        withBigInt: true,
        withBoxedValues: true,
        withDate: true,
        withMap: true,
        withNullPrototype: true,
        withObjectString: true,
        withSet: true,
        withSparseArray: true,
        withTypedArray: true,
        withUnicodeString: true,
      }),
      fc.nat(),
      (source, truncate) => {
        const output = inspect(source, { truncate })
        expect(output).to.have.lengthOf.at.most(truncate)
      }
    )
  )
})
```

Who am I?

I'm the author of fast-check, the library added by this PR. It's the leading property-based testing library today.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loupe is creating wrongly-formed strings when inspecting some strings
3 participants