
Attention layer does not accept output of previous layers in functional API #20318

Open
jorgenorena opened this issue Oct 2, 2024 · 2 comments

@jorgenorena

As an exercise to get acquainted with Keras, I want to train a simple model with attention to translate sentences.

I am not calling any tf function directly, only using Keras layers, but I get the following error:

A KerasTensor cannot be used as input to a TensorFlow function. A KerasTensor is a symbolic placeholder for a shape and dtype, used when constructing Keras Functional models or Keras Functions. You can only use it as input to a Keras layer or a Keras operation (from the namespaces keras.layers and keras.operations). [...]

Here is the code for the model using Keras' functional API:

encoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)
decoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)

embed_size = 128
encoder_inputs_ids = text_vec_layer_en(encoder_inputs)
decoder_inputs_ids = text_vec_layer_es(decoder_inputs)
encoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
decoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
encoder_embeddings = encoder_embedding_layer(encoder_inputs_ids)
decoder_embeddings = decoder_embedding_layer(decoder_inputs_ids)

encoder = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True)
encoder_outputs, *encoder_state = encoder(encoder_embeddings)

decoder = tf.keras.layers.LSTM(512, return_sequences=True)
decoder_outputs = decoder(decoder_embeddings, initial_state=encoder_state)

# Attention layer here!
# Problems getting it to work on Keras 3
attention_layer = tf.keras.layers.Attention()
attention_outputs = attention_layer([decoder_outputs, encoder_outputs])

output_layer = tf.keras.layers.Dense(vocab_size, activation="softmax")
Y_probas = output_layer(attention_outputs)

Expected behavior: The Keras attention layer accepts Keras tensor inputs. Or a more helpful error message is given.

Python version: 3.11.0
TensorFlow version: 2.17.0
Keras version: 3.4.1 (bundled with that TensorFlow version)

@mehtamansi29
Collaborator

Hi @jorgenorena -

Thanks for reporting the issue. From the code, I understand that you are trying to create a model with attention to translate sentences.
Instead of tf.keras.layers.Attention, you can use tf.keras.layers.MultiHeadAttention, passing query, key, and value for the dot-product attention. The attention output then needs to be combined with the decoder output before building the model with the functional API.
Attached gist for your reference here.

tf.keras.layers.Attention does not accept inputs passed like this: attention_outputs = attention_layer([decoder_outputs, encoder_outputs]). Here you can find more details about the Attention layer.
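The suggested workaround can be sketched as below. This is a minimal, self-contained version of the model from the issue: the text-vectorization layers are replaced with plain integer-token inputs, and the vocabulary, embedding, and LSTM sizes are toy values chosen here for brevity, not taken from the gist.

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_size = 100, 32  # toy sizes for illustration

# Integer token IDs stand in for the TextVectorization outputs in the issue.
encoder_inputs = tf.keras.layers.Input(shape=(None,), dtype="int32")
decoder_inputs = tf.keras.layers.Input(shape=(None,), dtype="int32")

encoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(encoder_inputs)
decoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(decoder_inputs)

encoder_outputs, *encoder_state = tf.keras.layers.LSTM(
    64, return_sequences=True, return_state=True)(encoder_embeddings)
decoder_outputs = tf.keras.layers.LSTM(
    64, return_sequences=True)(decoder_embeddings, initial_state=encoder_state)

# MultiHeadAttention takes query/value/key as named arguments
# rather than a list of tensors.
attention_outputs = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(
    query=decoder_outputs, value=encoder_outputs, key=encoder_outputs)

# Combine the attention output with the decoder output, then project.
combined = tf.keras.layers.Add()([decoder_outputs, attention_outputs])
Y_probas = tf.keras.layers.Dense(vocab_size, activation="softmax")(combined)

model = tf.keras.Model([encoder_inputs, decoder_inputs], Y_probas)

enc = np.random.randint(0, vocab_size, size=(2, 7))
dec = np.random.randint(0, vocab_size, size=(2, 5))
print(model([enc, dec]).shape)  # one probability vector per decoder position
```

Note that mask_zero is omitted here; whether masking interacts with the Attention layer is exactly the behavior under discussion in this issue.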

@jorgenorena
Author

Thanks @mehtamansi29. This does give a working model.

However, my interest was not so much to build a model to translate, but rather to understand how the Keras interface works. Is the behavior of the Attention layer expected? If so, what is the logic? Or is this a bug?

Thanks for your help.
