Unable to translate '\\uD83D': older conpty issue -- only high surrogate gets pasted to buffer #3832

Closed
3 tasks done
DanielEmeka2003 opened this issue Oct 12, 2023 · 15 comments


@DanielEmeka2003

Prerequisites

  • Write a descriptive title.
  • Make sure you are able to repro it on the latest released version
  • Search the existing issues, especially the pinned issues.

Exception report

### Exception

System.Text.EncoderFallbackException: Unable to translate Unicode character \\uD83D at index 0 to specified code page.
   at System.Text.EncoderExceptionFallbackBuffer.Fallback(Char charUnknown, Int32 index)
   at System.Text.Encoding.GetBytesWithFallback(ReadOnlySpan`1 chars, Int32 originalCharsLength, Span`1 bytes, Int32 originalBytesLength, EncoderNLS encoder)
   at System.Text.Encoding.GetBytesWithFallback(Char* pOriginalChars, Int32 originalCharCount, Byte* pOriginalBytes, Int32 originalByteCount, Int32 charsConsumedSoFar, Int32 bytesWrittenSoFar, EncoderNLS encoder)
   at System.Text.EncoderNLS.GetBytes(Char* chars, Int32 charCount, Byte* bytes, Int32 byteCount, Boolean flush)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.Dispose(Boolean disposing)
   at System.IO.TextWriter.Dispose()
   at Microsoft.PowerShell.PSConsoleReadLine.<>c__DisplayClass99_0.<WriteHistoryRange>b__0()
   at Microsoft.PowerShell.PSConsoleReadLine.WithHistoryFileMutexDo(Int32 timeout, Action action)
   at Microsoft.PowerShell.PSConsoleReadLine.WriteHistoryRange(Int32 start, Int32 end, Boolean overwritten)
   at Microsoft.PowerShell.PSConsoleReadLine.IncrementalHistoryWrite()
   at Microsoft.PowerShell.PSConsoleReadLine.MaybeAddToHistory(String result, List`1 edits, Int32 undoEditIndex, Boolean fromDifferentSession, Boolean fromInitialRead)
   at Microsoft.PowerShell.PSConsoleReadLine.InputLoop()
   at Microsoft.PowerShell.PSConsoleReadLine.ReadLine(Runspace runspace, EngineIntrinsics engineIntrinsics, CancellationToken cancellationToken, Nullable`1 lastRunStatus)

Screenshot

powershellException

Environment data

### Environment
PSReadLine: 2.2.6
PowerShell: 7.3.7
OS: Microsoft Windows 10.0.22621
BufferWidth: 182
BufferHeight: 13

Last 63 Keys:

 c d RightArrow Enter
 c l s Enter
 c d RightArrow Enter
 . RightArrow Enter
 c d RightArrow Enter
 . RightArrow Enter
 c d RightArrow Enter
 . RightArrow Enter
 c l e a e Enter
 c l e a r Enter
 c d RightArrow Enter
 . RightArrow Enter
 c l s Enter
 c d RightArrow Enter
 . RightArrow Enter
 � RightArrow LeftArrow Enter

Steps to reproduce

Paste a lot of Unicode emojis from the Windows clipboard.
Note: This error only occurred in the PowerShell 7 terminal integrated into Visual Studio Code. In Windows Terminal, PowerShell 7 works just fine, probably because I edited the JSON settings file a bit to use Cascadia Code as the font face, but who knows.

Expected behavior

Some sort of delay, then an error message is displayed.

Actual behavior

N/A

@microsoft-github-policy-service microsoft-github-policy-service bot added the Needs-Triage 🔍 It's a new issue that core contributor team needs to triage. label Oct 12, 2023
@daxian-dbw
Member

@DanielEmeka2003 What is the emoji you were using in the screenshot? Please share that emoji for me to reproduce the issue.

@DanielEmeka2003
Author

DanielEmeka2003 commented Oct 13, 2023

> @DanielEmeka2003 What is the emoji you were using in the screenshot? Please share that emoji for me to reproduce the issue.

The issue isn't specific to a particular emoji.
Let's say I were to copy and paste or just paste a Unicode emoji from the Windows clipboard [🦬]. This bison emoji would be displayed like this:
[PS C:\Users\MIKE EMEKA\Documents\C++> echo �]
Then when I echo the character to the console it would be displayed accurately:
[🦬 ]

@daxian-dbw
Member

Hmm, I cannot reproduce this locally with the Visual Studio Code terminal, and my OS version is the same as yours.

old-no-streaming

@SeeminglyScience Any ideas why the emoji is not rendered correctly in VSCode terminal but is fine in Windows Terminal?

@DanielEmeka2003
Author

DanielEmeka2003 commented Oct 13, 2023 via email

@StevenBucher98
Collaborator

StevenBucher98 commented Oct 16, 2023

cc @andyleejordan since this may be integrated shell related. @DanielEmeka2003 can you share what version of the PowerShell extension you are using? Edit: oops, not extension related.

@andyleejordan
Member

@StevenBucher98 from the screenshot this looks like it's happening not in the Extension Terminal but in a VS Code hosted terminal with pwsh, so the extension seems unrelated. VS Code's terminals are Xterm.js: https://github.com/xtermjs/xterm.js/

@daxian-dbw
Member

daxian-dbw commented Oct 16, 2023

Right, it's not in the integrated console from the PowerShell extension, but in the VS Code default pwsh terminal.
With that being said, I guess it would be the same even in the PowerShell integrated console, as it looks like it is caused by the terminal -- only the high surrogate character \\uD83D was received by PSReadLine and the low surrogate character was lost.
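
For readers unfamiliar with surrogate pairs, here is a minimal Python sketch of how UTF-16 splits a code point above U+FFFF into a high and a low surrogate. The function name `to_surrogate_pair` and the sample code point U+1F600 are illustrative assumptions, not taken from PSReadLine or from the emoji actually pasted in this issue.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# U+1F600 is a hypothetical example; many common emoji share the
# high surrogate D83D and differ only in the low surrogate.
high, low = to_surrogate_pair(0x1F600)
print(f"\\u{high:04X} \\u{low:04X}")  # prints "\uD83D \uDE00"
```

If only the first half of such a pair reaches the read-line buffer, what remains is not a valid character on its own.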

@daxian-dbw
Member

> Let's say I were to copy and paste or just paste a Unicode emoji from the Windows clipboard [🦬]. This bison emoji would be displayed like this:

@DanielEmeka2003, one more question, when you pasted the emoji in VS Code terminal, did you use mouse right click, or Ctrl+v?

@SeeminglyScience
Contributor

Another thing to check, do you have this setting set to false?

    "terminal.integrated.windowsEnableConpty": false,

If you do, try it as true. Should default to true though so unless you've changed it in the past to troubleshoot something I wouldn't expect it to be present.

@DanielEmeka2003
Author

> > Let's say I were to copy and paste or just paste a Unicode emoji from the Windows clipboard [🦬]. This bison emoji would be displayed like this:
>
> @DanielEmeka2003, one more question, when you pasted the emoji in VS Code terminal, did you use mouse right click, or Ctrl+v?

"Ctrl+v"

@DanielEmeka2003
Author

> Another thing to check, do you have this setting set to false?
>
>     "terminal.integrated.windowsEnableConpty": false,
>
> If you do, try it as true. Should default to true though so unless you've changed it in the past to troubleshoot something I wouldn't expect it to be present.

It is set to true in my VS Code settings.

@DanielEmeka2003
Author

> @StevenBucher98 from the screenshot this looks like it's happening not in the Extension Terminal but in a VS Code hosted terminal with pwsh, so the extension seems unrelated. VS Code's terminals are Xterm.js: https://github.com/xtermjs/xterm.js/

True, because it seems to affect only VS Code's integrated terminals, and most of them.

@DanielEmeka2003
Author

When I attempt to enter Unicode characters into the pwsh console, either as input to a program or to echo the character, it generally renders something unexpected like this: �. Then, when I actually try to echo the character to the terminal, an exception is thrown.
But running a process that displays Unicode characters still renders them okay (although they appear jumbled together).

I have five terminals integrated into my VS Code:

  1. PowerShell 7 - pwsh
  2. Windows PowerShell (the default version installed on my Windows 11) - powershell
  3. Git Bash - bash
  4. Ubuntu (running on my virtual machine) - wsl
  5. Command Prompt - cmd

None of them seems to render Unicode correctly, with Ubuntu being the exception. I do think the exception being thrown is a fault of the integrated version of pwsh only, because Windows PowerShell (the default version of PowerShell) does not throw an exception, although it does not render the Unicode correctly either.
Maybe a recent update of VS Code, or a change to a particular setting in the global PowerShell settings, conflicts with the way the integrated terminal works.

Oh, and I did try to rectify the issue by changing my integrated terminal font to Cascadia Code, but the exception is still thrown regardless.

@daxian-dbw
Member

daxian-dbw commented Oct 17, 2023

The � shown in the VSCode terminal appears because only the high surrogate of the surrogate pair that represents the emoji was pasted into the read-line buffer, which is puzzling to me. When you press Enter, PSReadLine tries to save the command line to the history file with UTF-8 encoding, but since it's an incomplete surrogate pair, the conversion of that high surrogate character to UTF-8 fails -- that's why you see the exception.

So, the root cause here is

  • The low surrogate of the emoji got lost when you paste it in VSCode terminal
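
The encoding failure described above can be sketched in Python as an analogue; this is not PSReadLine's code, and `try_encode_utf8` is a hypothetical helper. It assumes Python's strict UTF-8 encoder, which, like the .NET encoder with an exception fallback seen in the stack trace, refuses to encode an unpaired surrogate.

```python
def try_encode_utf8(text: str) -> str:
    """Describe what strict UTF-8 encoding does with `text`."""
    try:
        return f"ok: {len(text.encode('utf-8'))} bytes"
    except UnicodeEncodeError as err:
        # An unpaired surrogate is not a valid Unicode scalar value,
        # so a strict encoder rejects it instead of writing bytes.
        return f"failed: {err.reason}"


# A complete emoji (illustrative choice, U+1F600) encodes to 4 UTF-8 bytes.
print(try_encode_utf8("\U0001F600"))  # prints "ok: 4 bytes"

# A lone high surrogate, as left in the buffer after the paste, cannot be encoded.
print(try_encode_utf8("\ud83d"))
```

The second call reports a failure rather than producing bytes, which mirrors the history-file write throwing when Enter is pressed.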

@Tyriar can you please shed some light on this issue?

A quick summary of the issue:

When pasting an emoji into the VSCode terminal, the author found that only the high surrogate of the surrogate pair gets pasted into the read-line buffer for PSReadLine (pwsh hosted by the VSCode terminal, not the integrated console from the PS Extension), but it works fine with Windows Terminal for the author.

I cannot reproduce the issue locally with the VSCode terminal (1.83.1), but would like to know if you've seen anything similar reported for the VSCode terminal or xterm.js in general. Thanks!

@Tyriar

Tyriar commented Oct 26, 2023

I think this is the older version of conpty that ships with Windows not handling it correctly. We are sending the right character there but this is what we get back (this is from the frontend "Terminal" log):

image

I also verified that's what the native API in node-pty is handing off to us ("Pty Host" log):

image

So this will likely get fixed in a future Windows update.

@daxian-dbw daxian-dbw added Resolution-External and removed Needs-Triage 🔍 It's a new issue that core contributor team needs to triage. labels Jan 31, 2024
@daxian-dbw daxian-dbw changed the title Powershell 7 could unable to translate Unicode character \\uD83D at index 0 to specified code page Unable to translate Unicode \\uD83D: older conpty issue -- only high surrogate gets pasted to buffer Jan 31, 2024
@daxian-dbw daxian-dbw changed the title Unable to translate Unicode \\uD83D: older conpty issue -- only high surrogate gets pasted to buffer Unable to translate '\\uD83D': older conpty issue -- only high surrogate gets pasted to buffer Jan 31, 2024