
unstable: high cpu usage by /sbin/urngd #46

Open

lePereT opened this issue Apr 12, 2020 · 14 comments
@lePereT

lePereT commented Apr 12, 2020

Hi all, I'm getting a lot of instability. I'm on macOS Mojave, running Docker version 19.03.8 and docker-machine version 0.16.2.

If I just use the command from the README:

docker run --rm -it openwrtorg/rootfs

I get a number of error messages during launch:

rich$ docker run --rm -it openwrtorg/rootfs
Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
Press the [f] key and hit [enter] to enter failsafe mode
Press the [1], [2], [3] or [4] key and hit [enter] to select the debug level
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted
Please press Enter to activate this console.

When in the shell, it's sluggish, and I notice that one core of my CPU is pegged at 100%. Running top inside the container reveals the following:

Mem: 433964K used, 579256K free, 290552K shrd, 9536K buff, 323160K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.99 0.58 0.24 2/163 817
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   92     1 root     R      780   0% 100% /sbin/urngd
  279     1 root     S     1300   0%   0% /sbin/rpcd -s /var/run/ubus.sock -t 30
  434     1 root     S     1196   0%   0% /sbin/netifd
    1     0 root     S     1116   0%   0% /sbin/procd
   76     1 root     S     1084   0%   0% /bin/ash --login

Am I doing something wrong?

@aparcar
Member

aparcar commented Apr 13, 2020

Thanks for the report. I've never touched urngd, but maybe @ynezz has a clue...

@lePereT
Author

lePereT commented Apr 13, 2020

So, quickly typing killall /sbin/urngd after terminal access is gained appears to make urngd behave. Not ideal. Also, what are the following error messages about:

Failed to resize receive buffer: Operation not permitted
ip: RTNETLINK answers: Operation not permitted
...
ip: can't send flush request: Operation not permitted
ip: SIOCSIFFLAGS: Operation not permitted
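
For anyone following along, the workaround described above as a copy-pasteable sketch (assuming a BusyBox userland inside the container):

# Workaround sketch: stop the spinning daemon once a shell is available.
killall urngd

# Confirm CPU usage has dropped (BusyBox top in batch mode, one iteration):
top -b -n 1 | head -n 10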

@lePereT
Author

lePereT commented Apr 13, 2020

Just to confirm: the problem persists with an Ubuntu 18.04 VM as the host.

Mem: 865520K used, 143284K free, 984K shrd, 34440K buff, 579772K cached
CPU:  99% usr   0% sys   0% nic   0% idle   0% io   0% irq   0% sirq
Load average: 0.39 0.11 0.04 4/154 711
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   91     1 root     R      776   0%  99% /sbin/urngd
  444     1 root     S     1208   0%   0% /sbin/netifd
    1     0 root     S     1176   0%   0% /sbin/procd

@aparcar
Member

aparcar commented Apr 28, 2020

I can't reproduce the error. Did you try to reproduce it on other machines?

@ynezz
Member

ynezz commented May 6, 2020

I can't reproduce that error even on Ubuntu 18.04 (but with a 5.6.7 kernel). It would help to get strace output from urngd while it's in this state; it should be as easy as running opkg update; opkg install strace; strace --no-abbrev --attach $(pidof urngd) inside a container spawned with docker run --cap-add SYS_PTRACE --rm -it openwrtorg/rootfs.
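
For convenience, the same steps as a copy-pasteable sequence (commands verbatim from the comment above):

# On the host: spawn the container with the ptrace capability.
docker run --cap-add SYS_PTRACE --rm -it openwrtorg/rootfs

# Inside the container: install strace and attach to urngd.
opkg update
opkg install strace
strace --no-abbrev --attach $(pidof urngd)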

@lePereT
Author

lePereT commented May 6, 2020

I'll attempt to do this in the next week or so. I'll close the issue for now to prevent noise :) Thanks to both of you for your responses.

lePereT closed this as completed May 6, 2020
@thg2k

thg2k commented Jan 30, 2021

I would like to reopen this issue.

I am running into the same bug when OpenWrt runs in a Docker container that does not allow the RNDADDENTROPY ioctl on /dev/random.

This causes an infinite busy loop with high CPU usage: the WRITE poll event keeps triggering but can never be satisfied (the ioctl always fails), so urngd spins forever.

Should I provide a possible fix? I would simply stop polling for a certain amount of time whenever RNDADDENTROPY fails.
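
A hedged way to confirm this diagnosis from inside an affected container (assumes strace was installed as described earlier and the container was started with SYS_PTRACE):

# Watch only the event-loop and ioctl syscalls of urngd.
strace -e trace=poll,epoll_wait,ioctl -p "$(pidof urngd)"
# If the diagnosis above is correct, this should show a tight cycle of
# poll/epoll_wait returning a writable event immediately, followed by a
# failing RNDADDENTROPY ioctl, over and over.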

aparcar reopened this Jan 30, 2021
@databill

databill commented Feb 7, 2021

I have the same issue with an Ubuntu 18.04 VM and OpenWrt (19.07.02) in a Docker container.

@aparcar
Member

aparcar commented Feb 7, 2021

@thg2k please provide a fix

@thg2k

thg2k commented Feb 7, 2021

@aparcar I did, but it was refused by the maintainer.

http://lists.openwrt.org/pipermail/openwrt-devel/2021-January/033587.html

It is indeed a very crude workaround, but it solves the problem without risking any regressions, and it's easy to audit. A better fix would be to use uloop timers and improve the logging, but I have no interest in spending more time on this. It is still a fix, and I recommend merging it.

@cyijun

cyijun commented Jun 20, 2021

I got this problem on my MT7621 router too; maybe there is something wrong in the source code.

@Haizs

Haizs commented Feb 26, 2022

I ran into this same problem when using PVE to run OpenWrt in a Linux container. According to the random(4) Linux manual page, the CAP_SYS_ADMIN capability is required for almost all of the related ioctl requests.

I had included the default OpenWrt config file (the same as this lxc-template), which contains lxc.cap.drop = sys_admin. After removing that line, /sbin/urngd no longer pegs my CPU.

I think there is also a way to grant the SYS_ADMIN capability to a Docker container, but that capability is very broad, so the decision is yours.

Moreover, it seems that simply uninstalling the urngd package could also solve this problem, but I'm not sure about the side effects.
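
For reference, a sketch of the three options above (config paths and package names are as mentioned in this thread; verify against your own setup):

# Option 1 (LXC/PVE): stop dropping CAP_SYS_ADMIN by removing or
# commenting out this line in the container config:
#   lxc.cap.drop = sys_admin

# Option 2 (Docker): grant CAP_SYS_ADMIN when starting the container.
# Note that SYS_ADMIN is a very broad capability.
docker run --cap-add SYS_ADMIN --rm -it openwrtorg/rootfs

# Option 3: remove the urngd package inside the container
# (side effects untested, as noted above).
opkg remove urngd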

@pmelange

pmelange commented Jun 27, 2022

I ran into this problem today on a Linksys WRT1900ACS with an uptime of 248 days, running:

~# cat /etc/openwrt_release 
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='21.02.0'
DISTRIB_REVISION='r16279-5cc0535800'
DISTRIB_TARGET='mvebu/cortexa9'
DISTRIB_ARCH='arm_cortex-a9_vfpv3-d16'
DISTRIB_DESCRIPTION='OpenWrt 21.02.0 r16279-5cc0535800'
DISTRIB_TAINTS=''

Suddenly at around 1am my load jumped.
[screenshot: system load graph showing the jump]

Killing urngd helped, but restarting it brought the load back up again. So now I've killed urngd without restarting it. I will keep the system up to see if there are any impacts of having urngd stopped.
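
If anyone wants to keep it stopped across reboots, a minimal sketch, assuming the package ships the usual /etc/init.d/urngd init script:

# Stop the running daemon and prevent it from starting at boot.
/etc/init.d/urngd stop
/etc/init.d/urngd disable

# To undo later:
/etc/init.d/urngd enable
/etc/init.d/urngd start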

What, by the way, could be using urngd? Maybe those processes just need a restart. Perhaps dnsmasq? Anything else? Does OLSRd or babeld use urngd?

@bantu

bantu commented Sep 15, 2022

It looks like I am also seeing this on a TP-Link Archer C7 v2.

root@foobar:~# cat /etc/openwrt_release
DISTRIB_ID='OpenWrt'
DISTRIB_RELEASE='19.07.2'
DISTRIB_REVISION='r10947-65030d81f3'
DISTRIB_TARGET='ar71xx/generic'
DISTRIB_ARCH='mips_24kc'
DISTRIB_DESCRIPTION='OpenWrt 19.07.2 r10947-65030d81f3'
DISTRIB_TAINTS=''

