Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsync fails while copying data from an already executed test #3359

Open
tuliom opened this issue Nov 12, 2024 · 3 comments
Open

rsync fails while copying data from an already executed test #3359

tuliom opened this issue Nov 12, 2024 · 3 comments

Comments

@tuliom
Copy link

tuliom commented Nov 12, 2024

Recently we started to see rsync failures on the Testing Farm.
The error happens from time to time, so I believe there is some parallelism taking place.
This is happening with tmt version 1.38.0

I believe the following fragment helps to understand the order of the events:

        execute task #1: clang-tests on default-0
        how: tmt
            00:00:00 pass /clang-tests/tests/clang-format-diff (on default-0) [1/15]
            00:00:03 pass /clang-tests/tests/fedora-flags (on default-0) [2/15]
            00:00:01 pass /clang-tests/tests/gcc-clang-compatibility (on default-0) [3/15]
            00:00:00 pass /clang-tests/tests/libomp (on default-0) [4/15]
            00:32:56 pass /clang-tests/tests/llvm-test-suite (on default-0) [5/15]
            00:00:00 pass /clang-tests/tests/long-double (on default-0) [6/15]
            00:00:00 pass /clang-tests/tests/openmp-rpm (on default-0) [7/15]
            00:00:10 pass /clang-tests/tests/rhbz_1647130 (on default-0) [8/15]
            00:00:01 pass /clang-tests/tests/rhbz_1657544 (on default-0) [9/15]
            00:00:00 pass /clang-tests/tests/rhbz_2239619 (on default-0) [10/15]
            00:00:00 pass /clang-tests/tests/rhbz_482491 (on default-0) [11/15]
            00:00:00 pass /clang-tests/tests/rpmmacros (on default-0) [12/15]
            00:00:08 pass /clang-tests/tests/toolchains (on default-0) [13/15]
            00:00:01 pass /clang-tests/tests/ucrt64-toolchain (on default-0) [14/15]
            00:00:00 pass /clang-tests/tests/use-correct-dwarf-default (on default-0) [15/15]
        
        execute task #2: compiler-rt-tests on default-0
        how: tmt
            00:00:06 pass /compiler-rt-tests/tests/broken-symlinks (on default-0) [1/3]
            00:00:03 pass /compiler-rt-tests/tests/fp16-abi (on default-0) [2/3]
            00:00:00 pass /compiler-rt-tests/tests/sanity (on default-0) [3/3]
        
        execute task #3: libomp-tests on default-0
        how: tmt
            00:00:00 pass /libomp-tests/tests/sanity (on default-0) [1/1]
        
        execute task #4: llvm-tests on default-0
        how: tmt
        fail: Failed to push workdir to the guest. This usually means that login as 'root' to the guest does not work.
        fail: Command 'rsync -s -R -r -z --links --safe-links --delete -e 'ssh -oForwardX11=no -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -oConnectionAttempts=5 -oConnectTimeout=60 -oServerAliveInterval=5 -oServerAliveCountMax=60 -oIdentitiesOnly=yes -p22 -i /etc/citool.d/id_rsa_artemis -oPasswordAuthentication=no' /var/ARTIFACTS/work-snapshot-gatingx591m28o/tests/snapshot-gating [email protected]:/' returned 24.
    
        summary: 19 tests executed
    finish
    
        summary: 0 tasks completed

plan failed

The exception was caused by 1 earlier exceptions

Cause number 1:

    execute step failed

    The exception was caused by 1 earlier exceptions

    Cause number 1:

        Command 'rsync -s -R -r -z --links --safe-links --delete -e 'ssh -oForwardX11=no -oStrictHostKeyChecking=no -oUserKnownHostsFile=/dev/null -oConnectionAttempts=5 -oConnectTimeout=60 -oServerAliveInterval=5 -oServerAliveCountMax=60 -oIdentitiesOnly=yes -p22 -i /etc/citool.d/id_rsa_artemis -oPasswordAuthentication=no' /var/ARTIFACTS/work-snapshot-gatingx591m28o/tests/snapshot-gating [email protected]:/' returned 24.

        stderr (3 lines)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        Warning: Permanently added '18.117.234.166' (ED25519) to the list of known hosts.
        file has vanished: "/var/ARTIFACTS/work-snapshot-gatingx591m28o/tests/snapshot-gating/execute/data/guest/default-0/clang-tests/tests/rhbz_1647130-8/output.txt.copy"
        rsync warning: some files vanished before they could be transferred (code 24) at main.c(1338) [sender=3.3.0]
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, while copying the files for llvm-tests, the files from the other tests end up being copied too, while something else is still playing with output from clang-tests.

The workdir for this execution is available at: https://artifacts.dev.testing-farm.io/26f8be4d-e57f-46af-8098-9900b7bb8d6d/work-snapshot-gatingx591m28o/

@happz
Copy link
Collaborator

happz commented Nov 12, 2024

I don't think there's any *.copy created by tmt, I do suspect TF might be a culprit here.

@ukulekek since you were involved in the archiving part of TF recently, WDYT? Could it be that TF creates various *.copy files in tmt's workdir, removing them at some point? Because if that's the case, I think we see a race condition in action: tmt starting an rsync command, with output.txt.copy in the workdir, but the file then gets removed by TF while rsync is still running, which results in rsync reporting it as a vanishing file.

@ukulekek
Copy link
Collaborator

Yes, we caused the problem because we do create copies of files in the working directory and delete them after syncing to the artifacts storage server. I created TFT-2973 issue and will work on it, so it should land next release

@tuliom
Copy link
Author

tuliom commented Nov 13, 2024

In that case, it's fair to close this issue, unless the community believes this report should trigger changes to tmt too.
Thank you all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants