[Feature Request] Adding function(s) to return the printable width of a String or Stringlike types? #3785
Comments
Hi, if you mean the number of UTF-8 bytes that a character needs, all 3 string-like types have a … PS: if you're working a lot with strings, you should take a look at all the helpers in …
Hey @martinvuyk, thanks for the response! I don't mean the byte length, but the width of the character once printed. Some characters have a printable width of 0, and others, like emoji and East Asian characters, can have a width of 2. For example, 🔥🔥🔥🔥 has a printable width of 8, a length of 4, and a byte length of 16. Please correct me if I'm wrong on the byte length!
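To make the three measurements concrete, here is a minimal sketch in Python (not Mojo, and not the API of any width library). It assumes a simplified per-code-point rule built on the standard `unicodedata` module: combining marks, enclosing marks and format characters count as 0 cells, East Asian Wide (`W`) or Fullwidth (`F`) characters count as 2, and everything else counts as 1. The names `char_width` and `display_width` are hypothetical.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate terminal cell width of one code point.

    Simplified heuristic, not the full wcwidth / UAX #11 behavior:
    marks and format characters -> 0, East Asian Wide/Fullwidth -> 2,
    everything else -> 1.
    """
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(s: str) -> int:
    """Sum of per-code-point widths (ignores grapheme clustering)."""
    return sum(char_width(ch) for ch in s)


s = "🔥🔥🔥🔥"
print(display_width(s))        # 8  (printable width)
print(len(s))                  # 4  (code points)
print(len(s.encode("utf-8")))  # 16 (UTF-8 bytes: 4 per 🔥)
```

Because it sums code points one at a time, this sketch will not give the single two-cell width a terminal usually draws for emoji ZWJ sequences such as 🤷‍♂️; handling those correctly requires grapheme-cluster segmentation first.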
Oh OK, now I understand why the code branches for East Asian characters 😄. I think things like this and grapheme clusters may or may not be worth adding to the stdlib. From what I've read, this seems to be used only for terminal printing, so IMO it has less of a chance than grapheme clusters. Another thing I think about very often is memory cost, in particular global lookup tables and the cost of having them in scope. I'm not sure how well Mojo prunes away unused code for some functions after compilation, but my main concern is that the more load we add to the memory requirements of using the Mojo stdlib, the harder it will be for Mojo to run on microcontrollers. My personal hope is to stop having to write C (or C-flavored C++ when using Arduino), because everything else is just too heavy to run on ~300 KiB of DRAM (~4 KiB of stack) and 1 or 2 RISC CPU cores running at ~160 MHz. (This worry would be moot if Mojo pruned everything perfectly at compile time 🤷‍♂️.) PS: I think this is a great case for a community library that provides tools for developing CLI libraries (exactly what your prism repo is ;) ).
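For context on the lookup-table concern: one common way width libraries store Unicode width data is as a sorted table of inclusive code-point ranges that is binary-searched per character. The Python sketch below illustrates that idea with a deliberately tiny, hypothetical table; `WIDE_RANGES`, `STARTS` and `is_wide` are illustrative names, not taken from any of the linked libraries, and a real table would be generated from the Unicode data files.

```python
import bisect

# Hypothetical, deliberately tiny table of inclusive (first, last) code-point
# ranges treated as two cells wide. A real table would be generated from the
# Unicode data files and would be far larger.
WIDE_RANGES = [
    (0x1100, 0x115F),  # Hangul Jamo leading consonants
    (0x3041, 0x3096),  # Hiragana letters
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
]

# Range start points, precomputed once so each lookup is O(log n).
STARTS = [first for first, _ in WIDE_RANGES]


def is_wide(cp: int) -> bool:
    """Return True if code point cp falls inside one of WIDE_RANGES."""
    # Index of the last range whose start is <= cp (or -1 if none).
    i = bisect.bisect_right(STARTS, cp) - 1
    return i >= 0 and cp <= WIDE_RANGES[i][1]


print(is_wide(ord("中")))  # True
print(is_wide(ord("a")))   # False
```

If each range were stored as a pair of 32-bit code points, the table would cost 8 bytes per range, so its footprint grows linearly with the number of ranges; whether an unused table survives in the final binary depends on how well the compiler eliminates dead globals.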
Understandable! I figured other languages kept it external for similar reasons. It's a specific domain, so perhaps it's better off living in a blessed library in the future. Maybe it could piggyback off of the existing UTF-8 validation logic in the utils module, but I'm no Unicode expert, so that's a task for future me or someone more knowledgeable, haha.
What is your request?
For terminal-based applications, it's usually necessary to know the printable width of characters. Could this be added to the stdlib for String, StringSlice, etc.? I've usually seen these unicode-width packages implemented outside the standard library in a few languages, so I wanted to hear thoughts from the team/contributors!
Some examples:
Rust: https://github.com/unicode-rs/unicode-width/tree/master
Go: https://github.com/mattn/go-runewidth/tree/master
My simple port of go-runewidth: https://github.com/thatstoasty/gojo/blob/main/src/gojo/unicode/utf8/width.mojo
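As an illustration of why terminal applications need this, here is a minimal Python sketch (not Mojo, and not the API of any of the libraries above) that aligns a column of labels. It assumes the same simplified per-code-point width rule as a stand-in for a real width table; `display_width` and `pad_right` are hypothetical helpers. Python's built-in `str.ljust` pads by code-point count, so wide characters push later columns out of line, while padding by display width keeps them aligned.

```python
import unicodedata


def display_width(s: str) -> int:
    """Simplified per-code-point width: 0 for marks/format chars,
    2 for East Asian Wide/Fullwidth, 1 otherwise."""
    total = 0
    for ch in s:
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def pad_right(s: str, cells: int) -> str:
    """Left-align s in a column `cells` terminal cells wide."""
    return s + " " * max(0, cells - display_width(s))


rows = [("ascii", "ok"), ("中文", "ok"), ("🔥🔥", "ok")]
for label, status in rows:
    # label.ljust(8) would pad "中文" with 6 spaces (code-point count);
    # pad_right pads it with 4 so every "|" lands in the same column.
    print(pad_right(label, 8) + "| " + status)
```

Strings already wider than the column are returned unpadded rather than truncated; truncating by width would need the same width function applied character by character.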
What is your motivation for this change?
It would be nice to have, but I understand keeping the stdlib lean as well.
Any other details?
No response