Add controller disconnect detection and resume #2496

Closed
agowa opened this issue Mar 29, 2024 · 7 comments
Comments

@agowa

agowa commented Mar 29, 2024

Description

Hi,

I have some issues with my controller and noticed that the software gives no user feedback when the controller disconnects. For some reason my controller keeps reconnecting mid-run (and only mid-run; I still need to look into that further). ttyUSB0 vanishes and a new ttyUSB1 shows up. The software doesn't detect any of this; it just stops getting updates. It shows no error or timeout, and it does not clear the Controller State but keeps showing the last information received. Meanwhile the CNC finishes the last command the software sent and then just waits with the spindle still spinning. The software also does not appear to be able to reconnect, and the "Run from..." action did not appear to work either (I entered the value of controller:rowsSent + 1 as the starting value). My shape is quite simple: just a rectangular cut starting at the center of the material (which is also where my zero is).

I think this may be a power glitch in the controller board. Maybe it's faulty, I don't know yet. I've this one: https://www.vevor.de/holzstichmaschine-c_11142/cnc-fraesmaschine-graviermaschine-3018-pro-500mw-laser-fuer-holz-leder-kunststoff-p_010923569466

And this is the project:
test.ugsd

@breiler
Collaborator

breiler commented Mar 30, 2024

UGS has no way to figure out which ID the device has changed to, so it cannot reconnect on its own after losing the connection. UGS should, however, be able to show that it lost its connection to the hardware and display a proper status. This functionality is currently missing from the serial communications library that we are using. I'll see if there are any updates on this.
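One generic way to notice a silent connection, independent of what the serial library reports, is a watchdog on incoming status reports. The following is only a minimal sketch in Python, not how UGS (which is written in Java) implements it. It assumes the sender polls the controller with `?` at a regular interval; the class name `StatusWatchdog` and the 2 second timeout are made up for illustration.

```python
import time


class StatusWatchdog:
    """Flags a connection as stale when no status report arrives in time.

    Hypothetical helper: the name and the default timeout are assumptions.
    """

    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()

    def on_status_report(self):
        # Call whenever a "<...>" status line is received from the controller.
        self._last_seen = self._clock()

    def is_stale(self):
        # True when the controller has been silent for longer than the timeout.
        return self._clock() - self._last_seen > self.timeout_s
```

A sender could check `is_stale()` on every polling tick and switch the displayed controller state to "disconnected" instead of continuing to show the last received values.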

There is annoyingly little information about the controller. The only information that I can work with is this blurry image:
image

I cannot read which USB serial chip is being used, but based on its form factor it looks to be a CH340. These are very commonly unstable and cause corrupted communication, which I guess can lead to disconnects.

To debug this problem you can try and run the gcode program without the spindle. Does it still disconnect?

From the pictures it looks like the machine is shipped with a pretty decent USB cable. Make sure that you are not using any USB extension cables or USB hubs.

My best advice is to contact the vendor about these problems.

@agowa
Author

agowa commented Apr 2, 2024

I see how reconnecting is an issue when the port changes. However, resuming after the user has manually changed the connection settings and hit connect may be possible, at least while the machine is still working through the last instructions it was sent. It keeps sending confirmations during that time, so trying to reconnect before it is done results in an error message along the lines of "unexpected command confirmed", somewhere around the time the software tries to send "$I" to it.

> To debug this problem you can try and run the gcode program without the spindle. Does it still disconnect?

No it doesn't. That's why I think it may be a power glitch...

> From the pictures it looks like the machine is shipped with a pretty decent USB cable. Make sure that you are not using any USB extension cables or USB hubs.

It is directly connected to the back of the computer. I already tried the host's USB3 and USB2 ports (so also different controllers on the mainboard).

My next step would have been to tap into the cable with an oscilloscope to look for a voltage drop...

Edit: Here is what I get when manually reconnecting while it is still executing the gcode from before losing the connection (I tried twice here). I think that, at least when a run uses absolute coordinates, there is a very high likelihood that these coordinates are unique within the entire run, so the software may be able to work out where the machine currently is from the following information:

  • Where it disconnected (needs disconnect detection) and what was last sent, kind of like "checkpointing".
  • The last received acknowledgement "ok"
  • The gcode within the open file
  • The MPos sent by the machine.

ok
>>> G1X-6.4Y12.9Z-1.2
ok
>>> G1X-9.1Y15.6Z-1.2
ok
**** Connection closed ****
*** Connecting to jserialcomm://ttyUSB1:115200
*** Fetching device status
>>> ?
���<Run|MPos:-19.000,5.516,-1.230|FS:100,1000>
>>> $I
**** Connection closed ****
Error while processing response <ok>: An unexpected command was completed by the controller.
Error while processing response <ok>: An unexpected command was completed by the controller.
Error while processing response <ok>: An unexpected command was completed by the controller.
Error while processing response <ok>: An unexpected command was completed by the controller.
Error while processing response <ok>: An unexpected command was completed by the controller.
Error while processing response <ok>: An unexpected command was completed by the controller.
An unexpected error was detected: (error:8) Grbl '$' command cannot be used unless Grbl is IDLE. Ensures smooth operation during a job.
*** Connecting to jserialcomm://ttyUSB1:115200
*** Fetching device status
>>> ?
<Run|MPos:-5.500,-13.659,-1.330|FS:100,1000>
ok
>>> $I
An error was detected while sending '$I': (error:8) Grbl '$' command cannot be used unless Grbl is IDLE. Ensures smooth operation during a job. Streaming has been paused.
*** Could not detect the GRBL version
**** Connection closed ****

(The "Connection closed" lines above are from me manually hitting the disconnect button once the GUI stopped updating, i.e. once the USB device had reconnected, which is also visible in udev.)
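The log shows two things a reconnecting sender has to cope with: garbage bytes in front of the first status report, and `$I` being rejected with error:8 while the controller is still in the Run state. Below is a minimal Python sketch of parsing the status line and deciding whether a `$` command should be sent yet. It only handles the `<State|MPos:x,y,z|...>` shape seen in this log; the function names are hypothetical, and treating Idle as the only safe state is an assumption taken from the error text above.

```python
import re

# Matches the state and MPos fields of a report such as
# "<Run|MPos:-19.000,5.516,-1.230|FS:100,1000>".
STATUS_RE = re.compile(r"<(?P<state>[A-Za-z]+)(?::\d+)?\|MPos:(?P<pos>[-\d.,]+)")


def parse_status(line):
    """Return (state, (x, y, z)), or None if the line is not a status report."""
    match = STATUS_RE.search(line)  # search() skips leading garbage bytes
    if match is None:
        return None
    values = match.group("pos").split(",")
    if len(values) < 3:
        return None
    x, y, z = (float(v) for v in values[:3])
    return match.group("state"), (x, y, z)


def can_send_system_command(line):
    """Assumption: '$' commands such as $I are only accepted in the Idle state."""
    parsed = parse_status(line)
    return parsed is not None and parsed[0] == "Idle"
```

With a check like this, a reconnect could keep polling `?` and postpone `$I` until the controller reports Idle, instead of failing version detection.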

@agowa
Author

agowa commented Apr 3, 2024

My oscilloscope does not go down to 2 µs; however, at the 10 µs resolution I can already see the voltage on the D+ pin drop below 1 V, so it is probably a power glitch that triggers the reset...

btw, this is what the linux kernel logs on a disconnect:

Apr 03 03:45:47 pc kernel: ch341-uart ttyUSB0: usb_serial_generic_write_bulk_callback - nonzero urb status: -71
Apr 03 03:45:47 pc kernel: usb 3-6-port3: disabled by hub (EMI?), re-enabling...
Apr 03 03:45:47 pc kernel: usb 3-6.3: USB disconnect, device number 90
Apr 03 03:45:47 pc kernel: usb 3-6.3: failed to send control message: -19
Apr 03 03:45:47 pc kernel: ch341-uart ttyUSB0: ch341-uart converter now disconnected from ttyUSB0
Apr 03 03:45:47 pc kernel: ch341 3-6.3:1.0: device disconnected
Apr 03 03:45:48 pc kernel: usb 3-6.3: new full-speed USB device number 91 using xhci_hcd
Apr 03 03:45:48 pc kernel: usb 3-6.3: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
Apr 03 03:45:48 pc kernel: usb 3-6.3: New USB device strings: Mfr=0, Product=2, SerialNumber=0
Apr 03 03:45:48 pc kernel: usb 3-6.3: Product: USB Serial
Apr 03 03:45:48 pc kernel: ch341 3-6.3:1.0: ch341-uart converter detected
Apr 03 03:45:48 pc kernel: usb 3-6.3: ch341-uart converter now attached to ttyUSB1
Apr 03 03:48:57 pc kernel: usb 3-6.3: USB disconnect, device number 91
Apr 03 03:48:57 pc kernel: ch341-uart ttyUSB1: ch341-uart converter now disconnected from ttyUSB1
Apr 03 03:48:57 pc kernel: ch341 3-6.3:1.0: device disconnected
Apr 03 03:58:41 pc kernel: ch341-uart ttyUSB1: usb_serial_generic_write_bulk_callback - nonzero urb status: -71
Apr 03 03:58:41 pc kernel: usb 3-6-port3: disabled by hub (EMI?), re-enabling...
Apr 03 03:58:41 pc kernel: usb 3-6.3: USB disconnect, device number 94
Apr 03 03:58:41 pc kernel: usb 3-6.3: failed to send control message: -19
Apr 03 03:58:41 pc kernel: ch341-uart ttyUSB1: ch341-uart converter now disconnected from ttyUSB1
Apr 03 03:58:41 pc kernel: ch341 3-6.3:1.0: device disconnected
Apr 03 03:58:42 pc kernel: usb 3-6.3: new full-speed USB device number 95 using xhci_hcd
Apr 03 03:58:42 pc kernel: usb 3-6.3: New USB device found, idVendor=1a86, idProduct=7523, bcdDevice= 2.64
Apr 03 03:58:42 pc kernel: usb 3-6.3: New USB device strings: Mfr=0, Product=2, SerialNumber=0
Apr 03 03:58:42 pc kernel: usb 3-6.3: Product: USB Serial
Apr 03 03:58:42 pc kernel: ch341 3-6.3:1.0: ch341-uart converter detected
Apr 03 03:58:42 pc kernel: usb 3-6.3: ch341-uart converter now attached to ttyUSB0
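The attach and disconnect lines in a log like this can be picked out mechanically to see which tty name the adapter ended up on after each re-enumeration. This is a small Python sketch that matches only the exact message wording shown in this log; other drivers or kernel versions may word it differently, and `tty_events` is a hypothetical helper name.

```python
import re

# Message fragments as they appear in the kernel log above.
ATTACH_RE = re.compile(r"converter now attached to (ttyUSB\d+)")
DETACH_RE = re.compile(r"converter now disconnected from (ttyUSB\d+)")


def tty_events(log_lines):
    """Return a list of ("attached" | "disconnected", tty_name) in log order."""
    events = []
    for line in log_lines:
        match = ATTACH_RE.search(line)
        if match:
            events.append(("attached", match.group(1)))
            continue
        match = DETACH_RE.search(line)
        if match:
            events.append(("disconnected", match.group(1)))
    return events
```

Fed the lines from 03:45:47 and 03:45:48 above, this reports ttyUSB0 being disconnected followed by ttyUSB1 being attached, which matches the port change described at the top of this issue.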

@breiler
Collaborator

breiler commented Apr 3, 2024

Automatic resuming will (in my opinion) not be possible.

When you connect to a GRBL controller, or when it temporarily loses power as in your case, it will reset. That means that its coordinates cannot be guaranteed. If homing is enabled, it will also require you to rehome the machine before it can be used. Without knowing the exact position of the machine it would be dangerous to resume the job.

The second problem is that UGS only knows about commands that have been OK:ed by the controller. It does not know which commands the controller has completed. So you can't resume from the latest sent line.
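To illustrate the distinction: a sender can only match each incoming "ok" to the oldest line it has sent and not yet seen acknowledged. The Python sketch below is a simplified model of that bookkeeping, not UGS's actual code; `AckTracker` and its methods are hypothetical names. It also shows why stray "ok" responses after a reconnect surface as an "unexpected command was completed" style error, since there is nothing pending to match them against.

```python
from collections import deque


class AckTracker:
    """Tracks which sent lines the controller has acknowledged with "ok".

    An "ok" here means the line was accepted by the controller, not that
    the motion it describes has finished executing.
    """

    def __init__(self):
        self._pending = deque()   # line numbers sent but not yet acknowledged
        self.last_acked = None

    def on_sent(self, line_number):
        self._pending.append(line_number)

    def on_ok(self):
        if not self._pending:
            # Mirrors the situation in the log: an "ok" with nothing pending.
            raise RuntimeError("unexpected ok: no command awaiting acknowledgement")
        self.last_acked = self._pending.popleft()
        return self.last_acked

    def unacknowledged(self):
        return list(self._pending)
```

After a disconnect, `last_acked` only gives an upper bound on progress: the machine is executing at or before that line, and anything in `unacknowledged()` may or may not have reached the controller.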

You suggested that UGS could figure out which command was executing by extrapolating from the current controller position. Even if the position could be guaranteed, it would be impossible to guess because of rounding differences on both the controller and UGS, and because of how certain commands are handled, for instance arcs, which the controller breaks down into small line segments.

While I don't like the idea of trying to fix these kinds of hardware problems with software, you have a point. The disconnect detection needs to be fixed, and it should also try to clear all buffers, both on the controller and in UGS.

Once that is in place we could ease the process of resuming. One option could be to highlight in the editor the last line executed before the disconnect, so the user could safely evaluate it and use the "Run from line" feature.

@agowa
Author

agowa commented Apr 3, 2024

> When you connect to a GRBL controller, or when it temporarily loses power as in your case, it will reset.

No, it doesn't. But by now I have also discovered that something is pulling D+ on the USB low (or rather, the voltage on the USB port falls below the limit when the spindle makes contact...), i.e. it triggers a USB protocol reset. So the disconnect/reconnect bug I have is probably not representative...

For me, even after reconnecting in Universal G-Code Sender, the controller still has my manually configured offsets and everything. It just doesn't continue the run. All I have to do is hit "run from xxx" and enter a line somewhere around where it left off (as it uses absolute coordinates).

> The second problem is that UGS only knows about commands that have been OK:ed by the controller. It does not know which commands the controller has completed. So you can't resume from the latest sent line.

That's a solvable issue: we know enough to narrow down the possible options to the point where we could say with certainty where it left off.
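A rough Python sketch of what such narrowing could look like for the simple case of absolute, linear moves: list every line whose segment passes within a tolerance of the reported position. Everything here is an assumption for illustration: the function names, the 0.05 tolerance, the start position, and the premise that the position has already been converted from MPos (machine coordinates) to the work coordinates used in the file. It ignores arcs, relative moves, and comments, so it does not answer the objections about rounding and arc segmentation.

```python
import math
import re

WORD_RE = re.compile(r"([XYZ])(-?\d+(?:\.\d+)?)")


def segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (3D tuples)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    length_sq = sum(c * c for c in ab)
    if length_sq == 0:
        t = 0.0
    else:
        t = sum(x * y for x, y in zip(ap, ab)) / length_sq
        t = max(0.0, min(1.0, t))  # clamp to the segment's end points
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def candidate_lines(gcode_lines, position, start=(0.0, 0.0, 0.0), tolerance=0.05):
    """Indices of moves whose straight segment passes within tolerance of position."""
    candidates = []
    current = start
    for index, line in enumerate(gcode_lines):
        words = dict(WORD_RE.findall(line.upper()))
        if not words:
            continue  # no X/Y/Z word on this line
        target = tuple(float(words.get(axis, current[i])) for i, axis in enumerate("XYZ"))
        if segment_distance(position, current, target) <= tolerance:
            candidates.append(index)
        current = target
    return candidates
```

For a rectangular cut, a position in the middle of an edge yields a single candidate, while a position at a corner yields two adjacent lines, so the result is a shortlist to intersect with the last acknowledged line rather than a definitive answer.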

But that's probably just because my initial assumptions about the effects of a disconnect are biased by the special case I'm hitting...

It's probably not always the USB-to-serial chip's fault when a disconnect/reconnect happens...

Edit: In reply to your previous message, it's a CH341-uart, not a CH340, but since the part numbers are so close they're probably the same except for the voltage levels (UART rather than RS232). So the issue I'm hitting may be representative after all?

> I cannot read which USB serial chip is being used, but based on its form factor it looks to be a CH340. These are very commonly unstable and cause corrupted communication, which I guess can lead to disconnects.

Edit2: It only shows up as a "CH341-uart" on the computer; the chip itself is labeled "CH340G". That is probably because it uses the same drivers: "Built-in firmware, software compatible with CH341, use VCP driver of CH341 directly." https://www.wch-ic.com/products/CH340.html

@agowa
Author

agowa commented Apr 14, 2024

I managed to work out the issue. It's the host side that is incompatible with that controller.

Apparently on the newer mainboard the USB 2.0 ports are connected through a USB 3.0 controller. The chip didn't like this (probably because they're bending the specs somewhere). The host controller reset the USB protocol on these errors, which caused the device to re-register.

For everyone else, here is how you get it working:

  1. Buy an older PC: Intel Skylake or older, with a true USB 2.0 controller and port.
  2. Use a much shorter cable than the included one.
  3. Add a USB opto-isolator.
  4. Connect it to the back of the PC.

The "real USB 2.0" controller made it way better. The device doesn't re-register there. (I didn't test the original cable with the new controller, so step 2 may be unnecessary.) However, the CNC then managed to freeze the entire computer (less frequently than it reconnected on the USB 3.0 controller motherboard, but often enough to be annoying). Adding a USB opto-isolator fixed that too. The only thing I don't understand is why, on that older PC, the controller wasn't detected at all when connected to the front IO but works flawlessly when connected to the back with an opto-isolator...

I despise recommending buying a different computer to work around a shitty controller; however, these old computers are now cheaper than a Raspberry Pi (when bought used, at least)...

@breiler
Collaborator

breiler commented Jun 28, 2024

I am closing this as we now should have controller disconnect detection as of #2562. I think that we should evaluate how this is working before adding any "resume from last job" action.

@breiler breiler closed this as completed Jun 28, 2024