
CRF layer continued #1363

Closed
wants to merge 12 commits into from

Conversation

@gabrieldemarmiesse commented Mar 21, 2020

This is a new take on #377.

I started from the implementation of @howl-anderson and tried to make it as Keras-friendly as possible.

Disclaimer: I don't know anything about CRFs or the math behind them, so I won't be able to answer any questions about the math. This PR is just #377 with the code moved around.

Review process

I suggest that if we agree on the general idea, we merge this and then add a docstring and tutorials. I didn't add the layer to the public API, so it's fine to merge even with some rough edges. We can merge this PR as-is and let @howl-anderson polish the CRF in follow-up pull requests, since 90% of this pull request is his work.

With this new architecture, you need two models: one for training and one for inference. Both can be saved and loaded normally in the Keras format. There seems to be a minor bug when using the TF format; I believe it's a TensorFlow bug, so I'll open an issue and link it here.

Minimal example:

There are two models, one for training and one for inference, and they share some layers.

One layer is in charge of computing the loss, which is returned as a tensor in the network. We then give fake targets (zeros) and ask .compile() to use MAE, which leaves the loss tensor untouched: the mean absolute error between the loss tensor and a zero target is the loss itself.
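
For illustration, such a loss layer could look like the sketch below. This is a hypothetical reconstruction, not the exact CRFLossLayer from this PR's diff; it assumes tfa.text.crf_log_likelihood as the underlying op and the four inputs shown in the example further down.

import tensorflow as tf
import tensorflow_addons as tfa

class CRFLossLayer(tf.keras.layers.Layer):
    # Sketch only: computes the negative log-likelihood as a per-sample
    # tensor, so the "loss" flows through the graph like any other output.
    def call(self, inputs):
        potentials, y_true, sequence_length, chain_kernel = inputs
        log_likelihood, _ = tfa.text.crf_log_likelihood(
            potentials,
            tf.cast(y_true, tf.int32),
            sequence_length,
            chain_kernel,
        )
        return -log_likelihood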

I took the numerical accuracy test from @howl-anderson and the loss is exactly the same :)

import numpy as np
import tensorflow as tf

# CRF, CRFLossLayer and get_test_data come from this PR's code and tests.
x_np, y_np = get_test_data()

x_input = tf.keras.layers.Input(shape=x_np.shape[1:])
y_input = tf.keras.layers.Input(shape=y_np.shape[1:])

# The CRF layer returns the decoded tags plus everything needed for the loss.
crf_outputs = CRF(5, name="L")(x_input)
decoded_sequence, potentials, sequence_length, chain_kernel = crf_outputs

# The loss is computed inside the graph and returned as a tensor.
crf_loss = CRFLossLayer()([potentials, y_input, sequence_length, chain_kernel])

# Two models sharing the same layers: one for inference, one for training.
inference_model = tf.keras.Model(x_input, decoded_sequence)
training_model = tf.keras.Model([x_input, y_input], crf_loss)

# MAE against zero targets leaves the loss tensor untouched.
training_model.compile("adam", loss="mae")
training_model.fit([x_np, y_np], y=np.zeros((2,)))
training_model.evaluate([x_np, y_np], y=np.zeros((2,)))

decoded_sequences_numpy = inference_model.predict(x_np)
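
Saving and restoring the inference model in the Keras format could then look like this (a sketch; the custom_objects mapping is an assumption, since the layer isn't in the public API yet):

# Hypothetical round-trip; CRF must be passed so Keras can deserialize it.
inference_model.save("inference_model.h5")
restored_model = tf.keras.models.load_model(
    "inference_model.h5", custom_objects={"CRF": CRF}
)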

@gabrieldemarmiesse changed the title from "CRF continued" to "CRF layer continued" on Mar 21, 2020

@gabrieldemarmiesse

@howl-anderson In TensorFlow 2.2, it's possible to override the train_step method. Here is an example with GANs: https://twitter.com/fchollet/status/1250622989541838848

Maybe we could keep the CRF layer as implemented here and let the user call crf_log_likelihood directly in train_step? We then don't have to subclass Layer or Loss, and we still have a clean implementation without private API and without hacks like passing the targets as inputs to the model.

The implementation would then be the CRF layer implemented here plus a tutorial showing how to override train_step. Do you think it's a good idea?
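
A rough, untested sketch of that idea, assuming the model's call returns the same four CRF outputs as in the example above:

import tensorflow as tf
import tensorflow_addons as tfa

class CRFTrainingModel(tf.keras.Model):
    # Overrides train_step (TensorFlow >= 2.2) to call crf_log_likelihood
    # directly, so no loss layer or fake targets are needed.
    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            decoded_sequence, potentials, sequence_length, chain_kernel = self(
                x, training=True
            )
            log_likelihood, _ = tfa.text.crf_log_likelihood(
                potentials, tf.cast(y, tf.int32), sequence_length, chain_kernel
            )
            loss = -tf.reduce_mean(log_likelihood)
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        return {"loss": loss}

Such a model would be compiled with just an optimizer, e.g. model.compile("adam"), since the loss is computed inside train_step.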

@gabrieldemarmiesse

Superseded by #1733
