Encoding autodetection #7
Any plans for this? I'm a bit sceptical about how well autodetection of arbitrary encodings would work on IRC (the inputs are very short), but I'd like to see xchat/irssi-style input detection for UTF-8, i.e. if the input is invalid UTF-8, try one other predefined encoding. node-irc is not using iconv-lite; it uses iconv and node-icu-charset-detector. The difference is that those are native addons, and you have to install libicu manually (iconv seems to come bundled). Most encoding-detection modules seem to be based on ICU, but there are some JS-only ones too, for example node-chardet. If we only go for UTF-8 detection, there is utf-8-validate. It's a native addon, but at least it has no external dependencies. I guess I'll toss a coin and test one of these options soon (UTF-8 detection or full autodetection).
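A minimal TypeScript sketch of the irssi-style approach described above; it is not code from node-irc, The Lounge, or irssi. It relies on the standard WHATWG `TextDecoder` with `fatal: true`, which throws a `TypeError` on malformed UTF-8 instead of inserting U+FFFD replacement characters. The fallback is assumed to be ISO-8859-1 (latin1), decoded by mapping each byte to the code point of the same value; `decodeWithFallback` and `decodeLatin1` are hypothetical names.

```typescript
// Sketch only: irssi-style decoding, i.e. "try UTF-8, else one fixed fallback".
// Assumption: the fallback encoding is ISO-8859-1 (latin1).

// With fatal: true, decode() throws a TypeError on malformed UTF-8
// instead of substituting U+FFFD replacement characters.
const strictUtf8 = new TextDecoder("utf-8", { fatal: true });

// Hypothetical helper: latin1 maps each byte 0x00..0xFF to the code point
// of the same value, so no lookup table is needed.
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Hypothetical helper: decode one IRC message, preferring UTF-8.
function decodeWithFallback(
  bytes: Uint8Array,
  fallback: (bytes: Uint8Array) => string = decodeLatin1
): string {
  try {
    return strictUtf8.decode(bytes);
  } catch {
    // Not valid UTF-8: hand the raw bytes to the fallback decoder.
    return fallback(bytes);
  }
}

// Usage
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]); // "café" in UTF-8
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]); // "café" in latin1
console.log(decodeWithFallback(utf8Bytes)); // café
console.log(decodeWithFallback(latin1Bytes)); // café
```

Pure-ASCII messages take the UTF-8 path either way, so only messages with high bytes are affected. A known weakness of this heuristic is that some latin1 byte sequences also happen to be valid UTF-8 (for example `0xC3 0xA9`, which reads as "Ã©" in latin1), and those get decoded as UTF-8; such collisions are uncommon in ordinary text, which is why the fallback tends to behave better than statistical detection on short inputs. Fallbacks other than latin1 (e.g. windows-1252) would need a real codec such as iconv-lite.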
utf-8-validate also recently got a JS-only fallback implementation, used when the native implementation fails, here. I guess this function accomplishes mostly the same as the native one? Maybe something like it could be used: if the buffer is not valid UTF-8, decode it with a configured fallback encoding instead?
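For a check without native addons, a pure-JS validator in the spirit of utf-8-validate's fallback only needs to walk the buffer and verify each multi-byte sequence. The TypeScript sketch below is written from the well-formed byte-sequence table in the Unicode Standard (Table 3-7), not copied from utf-8-validate; `isValidUtf8` is a hypothetical name. It rejects overlong encodings, UTF-16 surrogates, code points above U+10FFFF, and truncated sequences.

```typescript
// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Follows the Unicode Table 3-7 byte ranges; not utf-8-validate's actual code.
function isValidUtf8(bytes: Uint8Array): boolean {
  const n = bytes.length;
  let i = 0;
  while (i < n) {
    const b0 = bytes[i];
    if (b0 < 0x80) {
      i += 1; // 1-byte ASCII
      continue;
    }
    let needed: number; // number of continuation bytes after b0
    let min: number; // allowed range for the FIRST continuation byte
    let max: number;
    if (b0 >= 0xc2 && b0 <= 0xdf) {
      needed = 1; min = 0x80; max = 0xbf;
    } else if (b0 === 0xe0) {
      needed = 2; min = 0xa0; max = 0xbf; // excludes overlong 3-byte forms
    } else if (b0 === 0xed) {
      needed = 2; min = 0x80; max = 0x9f; // excludes surrogates U+D800..U+DFFF
    } else if (b0 >= 0xe1 && b0 <= 0xef) {
      needed = 2; min = 0x80; max = 0xbf;
    } else if (b0 === 0xf0) {
      needed = 3; min = 0x90; max = 0xbf; // excludes overlong 4-byte forms
    } else if (b0 >= 0xf1 && b0 <= 0xf3) {
      needed = 3; min = 0x80; max = 0xbf;
    } else if (b0 === 0xf4) {
      needed = 3; min = 0x80; max = 0x8f; // caps code points at U+10FFFF
    } else {
      // 0x80..0xC1 (stray continuation or overlong 2-byte lead) or 0xF5..0xFF
      return false;
    }
    if (i + needed >= n) return false; // sequence truncated by end of buffer
    const b1 = bytes[i + 1];
    if (b1 < min || b1 > max) return false;
    // Remaining continuation bytes must all be in 0x80..0xBF.
    for (let k = 2; k <= needed; k++) {
      const b = bytes[i + k];
      if (b < 0x80 || b > 0xbf) return false;
    }
    i += needed + 1;
  }
  return true;
}
```

A caller would then pick the decoder from the result: decode as UTF-8 when `isValidUtf8(bytes)` is true, and with the configured fallback encoding otherwise.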
UTF-8 input detection with a fallback looks rather trivial, but it's not on the roadmap. @kiwiirc, would you accept a PR?
@apihlaja definitely! I'm not an encoding pro, and I just haven't got around to looking further into it yet, so this would be very helpful.
So, this was the main thing that bugged me after switching to The Lounge, and I happened to have some spare hacking time over Easter. I tried both autodetection and the irssi-style fallback, and the latter turned out to be by far the less disastrous alternative. I would have preferred an option in iconv-lite to throw when decoding fails, but an implementation using utf-8-validate was simple and functional enough. Feedback appreciated, as this is literally the first time I've done anything with Node. :)
The current version of node-irc does this using the same iconv-lite library.