Seeing failure in reduction tests on Perlmutter-CPU with nvidia #161

Open
xylar opened this issue Nov 15, 2024 · 1 comment

xylar commented Nov 15, 2024

I just ran CTests on Perlmutter-CPU with nvidia and I'm seeing:

1: Global sum I4:    PASS (exp,act=2,2)
1: Global sum I8:    PASS (exp,act=4,4)
1: Global sum R4:    PASS (exp,act=6.000002,6.000002)
1: Global sum R8:    PASS (exp,act=8.000000000000201,8.000000000000201)
1: Global sum real:  PASS (exp,act=10.000002000000000,10.000002000000000)
1: Global sum A1DI4: PASS (exp,act=90,90)
1: Global sum A2DI4: PASS (exp,act=9900,9900)
1: Global sum A1DI8: PASS (exp,act=90,90)
1: Global sum A2DI8: PASS (exp,act=9900,9900)
1: Global sum A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
1: Global sum A2DR4: PASS (exp,act=9900.098633,9900.098633)
1: Global sum A1DR8: PASS (exp,act=90.0000000000020,90.0000000000020)
1: Global sum A2DR8: PASS (exp,act=9900.0000000009859,9900.0000000009859)
1: Global min I4:    PASS (exp,act=0,0)
1: Global max I4:    PASS (exp,act=1,1)
1: Global min R8:    PASS (exp,act=4.0000000000001,4.0000000000001)
1: Global max R8:    PASS (exp,act=5.0000000000001,5.0000000000001)
1: Global min A1DI4: PASS
1: Global max A1DI4: FAIL
1: Global sum device A1DI4: PASS (exp,act=90,90)
1: Global sum device A2DI4: PASS (exp,act=9900,9900)
1: Global sum device A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
0: Global sum I4:    PASS (exp,act=2,2)
0: Global sum I8:    PASS (exp,act=4,4)
0: Global sum R4:    PASS (exp,act=6.000002,6.000002)
0: Global sum R8:    PASS (exp,act=8.000000000000201,8.000000000000201)
0: Global sum real:  PASS (exp,act=10.000002000000000,10.000002000000000)
0: Global sum A1DI4: PASS (exp,act=90,90)
0: Global sum A2DI4: PASS (exp,act=9900,9900)
0: Global sum A1DI8: PASS (exp,act=90,90)
0: Global sum A2DI8: PASS (exp,act=9900,9900)
0: Global sum A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
0: Global sum A2DR4: PASS (exp,act=9900.098633,9900.098633)
0: Global sum A1DR8: PASS (exp,act=90.0000000000020,90.0000000000020)
0: Global sum A2DR8: PASS (exp,act=9900.0000000009859,9900.0000000009859)
0: Global min I4:    PASS (exp,act=0,0)
0: Global max I4:    PASS (exp,act=1,1)
0: Global min R8:    PASS (exp,act=4.0000000000001,4.0000000000001)
0: Global max R8:    PASS (exp,act=5.0000000000001,5.0000000000001)
0: Global min A1DI4: PASS
0: Global max A1DI4: FAIL
0: Global sum device A1DI4: PASS (exp,act=90,90)
0: Global sum device A2DI4: PASS (exp,act=9900,9900)
0: Global sum device A1DR4: PASS (exp,act=90.0001983643,90.0001983643)
srun: error: nid004451: tasks 0-1: Exited with exit code 10
srun: Terminating StepId=32907910.24

Note: `Global max A1DI4` reports FAIL on both MPI tasks (0 and 1), and srun then exits with code 10.

All other tests are passing.
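
The array min/max lines in the log above print only PASS or FAIL, without the `(exp,act=...)` values that the other tests show, so the log alone does not say which element is wrong. As a rough illustration only, here is a small Python sketch of the check a max reduction over a 1D integer array is normally expected to satisfy (elementwise max across tasks), with a report that lists mismatching indices. It does not use MPI, the function names `elementwise_max` and `report` are hypothetical, and the input values are made up; the actual test data and implementation are not shown in this issue.

```python
from typing import List, Sequence


def elementwise_max(per_task: Sequence[Sequence[int]]) -> List[int]:
    """Reference result: for each index, the largest value held by any task."""
    return [max(column) for column in zip(*per_task)]


def report(name: str, expected: Sequence[int], actual: Sequence[int]) -> str:
    """Format a PASS/FAIL line and, on failure, list the mismatching indices."""
    bad = [i for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
    if not bad and len(expected) == len(actual):
        return f"{name}: PASS"
    details = ", ".join(f"[{i}] exp={expected[i]} act={actual[i]}" for i in bad)
    return f"{name}: FAIL ({details})"


# Hypothetical per-task inputs; the real test arrays are an unknown here.
task_arrays = [[0, 5, 2], [1, 3, 7]]
expected = elementwise_max(task_arrays)

# A matching result, then a result where the last element was not reduced.
print(report("Global max A1DI4", expected, [1, 5, 7]))
print(report("Global max A1DI4", expected, [1, 5, 2]))
```

Printing the per-index expected and actual values in this way would make it easier to tell whether the whole array is wrong or only some elements.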

@xylar xylar added the bug Something isn't working label Nov 15, 2024
@amametjanov amametjanov self-assigned this Nov 15, 2024
brian-oneill commented

For the record, I'm seeing the same error with nvidiagpu on pm-gpu.
