
Disable XLA with TensorFlow determinism #20315

Merged

Conversation

@Frightera (Contributor) commented Oct 2, 2024

Currently, if someone runs https://keras.io/examples/keras_recipes/reproducibility_recipes/, they get an error:

(PR for fixing recipe keras-team/keras-io#1941)

UnimplementedError: Graph execution error:

Detected at node gradient_tape/sequential_1/max_pooling2d_1_2/MaxPool2d/MaxPoolGrad defined at (most recent call last):
<stack traces unavailable>
GPU MaxPool gradient ops do not yet have a deterministic XLA implementation.
	 [[{{node gradient_tape/sequential_1/max_pooling2d_1_2/MaxPool2d/MaxPoolGrad}}]]
	tf2xla conversion failed while converting __inference_one_step_on_data_2978[]. Run with TF_DUMP_GRAPH_PREFIX=/path/to/dump/dir and --vmodule=xla_compiler=2 to obtain a dump of the compiled functions.
	 [[StatefulPartitionedCall]] [Op:__inference_one_step_on_iterator_3061]

Considering that Keras 2 had jit_compile = None (XLA not enabled by default), wouldn't it be better not to use XLA when TF determinism is enabled?

(I can check the failed tests if the team is willing to merge this PR)
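
For context, a minimal sketch of the kind of guard this change adds when resolving `jit_compile` on the TensorFlow backend. The helper name and call site are illustrative, not the exact Keras internals; the sketch assumes TF's private `is_op_determinism_enabled` helper, which lives in the same module as the `enable_op_determinism`/`disable_op_determinism` functions used in the test further down:

```python
def resolve_auto_jit_compile(model):
    """Sketch: decide whether jit_compile="auto" should enable XLA."""
    from keras.src import backend

    if backend.backend() == "tensorflow":
        from tensorflow.python.framework.config import (
            is_op_determinism_enabled,
        )

        # Some XLA kernels (e.g. the GPU MaxPool gradient in the traceback
        # above) have no deterministic implementation, so fall back to
        # non-XLA execution when op determinism is on.
        if is_op_determinism_enabled():
            return False
    # Otherwise defer to the usual capability checks (elided here).
    return True
```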

@codecov-commenter commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 50.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.45%. Comparing base (084b7e1) to head (afe9f86).
Report is 8 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| keras/src/trainers/trainer.py | 50.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##           master   #20315      +/-   ##
==========================================
- Coverage   78.83%   74.45%   -4.38%
==========================================
  Files         512      512
  Lines       48995    49062      +67
  Branches     9021     9035      +14
==========================================
- Hits        38624    36529    -2095
- Misses       8509    10728    +2219
+ Partials     1862     1805      -57
```
| Flag | Coverage Δ |
|---|---|
| keras | 74.32% <25.00%> (-4.37%) ⬇️ |
| keras-jax | 62.26% <0.00%> (-0.04%) ⬇️ |
| keras-numpy | ? |
| keras-tensorflow | 63.55% <25.00%> (-0.02%) ⬇️ |
| keras-torch | 62.25% <0.00%> (-0.04%) ⬇️ |

Flags with carried forward coverage won't be shown.


@fchollet (Member) commented Oct 2, 2024

Thanks for the PR! Yes, this seems like a reasonable change. Please take a look at the test failure:

```
FAILED keras/src/trainers/trainer_test.py::TrainerDistributeTest::test_end_to_end_tf_distribute - RuntimeError: Random ops require a seed to be set when determinism is enabled. Please set a seed before running the op, e.g. by calling tf.random.set_seed(1).
```
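For reference, the remedy the error message itself suggests is to seed TF's global generator before the distributed test runs; a minimal sketch, not necessarily the exact change the PR made:

```python
import tensorflow as tf

# With op determinism enabled, stateful random ops refuse to run unseeded;
# setting the global seed (as the error message suggests) resolves this.
tf.random.set_seed(1)
```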

Frightera marked this pull request as ready for review October 2, 2024 23:58
@Frightera (Contributor, Author) commented:

Hi @fchollet, I have no idea why the jax backend test failed, since I specifically decorated the test for the tensorflow backend:

```python
pytest.mark.skipif(
    backend.backend() != "tensorflow",
    reason="This test is only applicable to TensorFlow.",
)
@pytest.mark.requires_trainable_backend
def test_jit_compile_with_tf_determinism(self):
    from tensorflow.python.framework.config import disable_op_determinism
    from tensorflow.python.framework.config import enable_op_determinism

    enable_op_determinism()
    model = ExampleModel(units=3)
    model.compile(
        optimizer=optimizers.SGD(),
        loss=losses.MeanSquaredError(),
        metrics=[metrics.MeanSquaredError()],
    )
    self.assertFalse(model.jit_compile)
    disable_op_determinism()
```

If I remove @pytest.mark.requires_trainable_backend, the numpy backend fails on the same test. Any ideas?

```
@@ -1718,6 +1718,28 @@ def call(self, x, training=None):
        for v in model._compile_loss.variables:
            self.assertAllClose(v, 0.0)

    pytest.mark.skipif(
```
Review comment from a Member:

You forgot the @ for the decorator
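
For clarity, the form the review is asking for. Without the leading `@`, `pytest.mark.skipif(...)` merely builds a marker object that is never attached to the test, so the skip never fires and the test runs on every backend — which also explains the jax and numpy failures above:

```python
@pytest.mark.skipif(
    backend.backend() != "tensorflow",
    reason="This test is only applicable to TensorFlow.",
)
@pytest.mark.requires_trainable_backend
def test_jit_compile_with_tf_determinism(self):
    ...
```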

Reply from @Frightera (Contributor, Author):
:-) it happens...

google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels Oct 3, 2024
fchollet merged commit 5aa5f88 into keras-team:master Oct 3, 2024
6 checks passed
google-ml-butler bot removed the awaiting review, ready to pull, and kokoro:force-run labels Oct 3, 2024
Frightera deleted the frightera/disable_xla_with_determinisim branch October 12, 2024