
Disable XLA with TensorFlow determinism #20315

Merged

Conversation

@Frightera (Contributor) commented Oct 2, 2024

Currently, if someone runs https://keras.io/examples/keras_recipes/reproducibility_recipes/, they get an error:

(PR for fixing recipe keras-team/keras-io#1941)

UnimplementedError: Graph execution error:

Detected at node gradient_tape/sequential_1/max_pooling2d_1_2/MaxPool2d/MaxPoolGrad defined at (most recent call last):
<stack traces unavailable>
GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
	 [[{{node gradient_tape/sequential_1/max_pooling2d_1_2/MaxPool2d/MaxPoolGrad}}]]
	tf2xla conversion failed while converting __inference_one_step_on_data_2978[]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions.
	 [[StatefulPartitionedCall]] [Op:__inference_one_step_on_iterator_3061]

Considering that Keras 2 had jit_compile = None (XLA not enabled by default), wouldn't it be better not to use XLA when TF determinism is enabled?

(I can check the failed tests if the team is willing to merge this PR)
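
For context, a minimal sketch of the kind of guard this change adds when resolving `jit_compile` on the TensorFlow backend. The helper name and call site are illustrative, not the exact Keras internals; the sketch assumes TF's private `is_op_determinism_enabled` helper, which lives in the same module as the `enable_op_determinism`/`disable_op_determinism` functions used in the test further down:

```python
def resolve_auto_jit_compile(model):
    """Sketch: decide whether jit_compile="auto" should enable XLA."""
    from keras.src import backend

    if backend.backend() == "tensorflow":
        from tensorflow.python.framework.config import (
            is_op_determinism_enabled,
        )

        # Some XLA kernels (e.g. the GPU MaxPool gradient in the traceback
        # above) have no deterministic implementation, so fall back to
        # non-XLA execution when op determinism is on.
        if is_op_determinism_enabled():
            return False
    # Otherwise defer to the usual capability checks (elided here).
    return True
```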

@codecov-commenter commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.45%. Comparing base (084b7e1) to head (afe9f86).
Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| keras/src/trainers/trainer.py | 50.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master   #20315      +/-   ##
==========================================
- Coverage   78.83%   74.45%   -4.38%
==========================================
  Files         512      512
  Lines       48995    49062      +67
  Branches     9021     9035      +14
==========================================
- Hits        38624    36529    -2095
- Misses       8509    10728    +2219
+ Partials     1862     1805      -57
```
| Flag | Coverage Δ |
|---|---|
| keras | 74.32% <25.00%> (-4.37%) ⬇️ |
| keras-jax | 62.26% <0.00%> (-0.04%) ⬇️ |
| keras-numpy | ? |
| keras-tensorflow | 63.55% <25.00%> (-0.02%) ⬇️ |
| keras-torch | 62.25% <0.00%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.


@fchollet (Member) commented Oct 2, 2024

Thanks for the PR! Yes, this seems like a reasonable change. Please take a look at the test failure:

```
FAILED keras/src/trainers/trainer_test.py::TrainerDistributeTest::test_end_to_end_tf_distribute - RuntimeError: Random ops require a seed to be set when determinism is enabled. Please set a seed before running the op, e.g. by calling tf.random.set_seed(1).
```
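For reference, the remedy the error message itself suggests is to seed TF's global generator before the distributed test runs; a minimal sketch, not necessarily the exact change the PR made:

```python
import tensorflow as tf

# With op determinism enabled, stateful random ops refuse to run unseeded;
# setting the global seed (as the error message suggests) resolves this.
tf.random.set_seed(1)
```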

Frightera marked this pull request as ready for review October 2, 2024 23:58
@Frightera (Contributor, Author) commented:

Hi @fchollet, I have no idea why the jax backend test failed, since I specifically decorated the test for the tensorflow backend:

```python
pytest.mark.skipif(
    backend.backend() != "tensorflow",
    reason="This test is only applicable to TensorFlow.",
)
@pytest.mark.requires_trainable_backend
def test_jit_compile_with_tf_determinism(self):
    from tensorflow.python.framework.config import disable_op_determinism
    from tensorflow.python.framework.config import enable_op_determinism

    enable_op_determinism()
    model = ExampleModel(units=3)
    model.compile(
        optimizer=optimizers.SGD(),
        loss=losses.MeanSquaredError(),
        metrics=[metrics.MeanSquaredError()],
    )
    self.assertFalse(model.jit_compile)
    disable_op_determinism()
```

If I remove @pytest.mark.requires_trainable_backend, the numpy backend fails on the same test. Any ideas?

```
@@ -1718,6 +1718,28 @@ def call(self, x, training=None):
        for v in model._compile_loss.variables:
            self.assertAllClose(v, 0.0)

    pytest.mark.skipif(
```
Review comment from a Member:

You forgot the @ for the decorator
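
For clarity, the form the review is asking for. Without the leading `@`, `pytest.mark.skipif(...)` merely builds a marker object that is never attached to the test, so the skip never fires and the test runs on every backend — which also explains the jax and numpy failures above:

```python
@pytest.mark.skipif(
    backend.backend() != "tensorflow",
    reason="This test is only applicable to TensorFlow.",
)
@pytest.mark.requires_trainable_backend
def test_jit_compile_with_tf_determinism(self):
    ...
```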

Reply from @Frightera (Contributor, Author):
:-) it happens...

google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels Oct 3, 2024
fchollet merged commit 5aa5f88 into keras-team:master Oct 3, 2024
6 checks passed
google-ml-butler bot removed the awaiting review, ready to pull, and kokoro:force-run labels Oct 3, 2024
Frightera deleted the frightera/disable_xla_with_determinisim branch October 12, 2024