
Attention layer does not accept output of previous layers in functional API #20318

Open
jorgenorena opened this issue Oct 2, 2024 · 2 comments

@jorgenorena

As an exercise to get acquainted with Keras, I want to train a simple model with attention to translate sentences.

I am not calling any tf function directly, only using Keras layers, but I get the following error:

A KerasTensor cannot be used as input to a TensorFlow function. A KerasTensor is a symbolic placeholder for a shape and dtype, used when constructing Keras Functional models or Keras Functions. You can only use it as input to a Keras layer or a Keras operation (from the namespaces keras.layers and keras.operations). [...]

Here is the code for the model using Keras' functional API:

encoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)
decoder_inputs = tf.keras.layers.Input(shape=[], dtype=tf.string)

embed_size = 128
encoder_inputs_ids = text_vec_layer_en(encoder_inputs)
decoder_inputs_ids = text_vec_layer_es(decoder_inputs)
encoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
decoder_embedding_layer = tf.keras.layers.Embedding(vocab_size, embed_size, mask_zero=True)
encoder_embeddings = encoder_embedding_layer(encoder_inputs_ids)
decoder_embeddings = decoder_embedding_layer(decoder_inputs_ids)

encoder = tf.keras.layers.LSTM(512, return_sequences=True, return_state=True)
encoder_outputs, *encoder_state = encoder(encoder_embeddings)

decoder = tf.keras.layers.LSTM(512, return_sequences=True)
decoder_outputs = decoder(decoder_embeddings, initial_state=encoder_state)

# Attention layer here!
# Problems getting it to work on Keras 3
attention_layer = tf.keras.layers.Attention()
attention_outputs = attention_layer([decoder_outputs, encoder_outputs])

output_layer = tf.keras.layers.Dense(vocab_size, activation="softmax")
Y_probas = output_layer(attention_outputs)

Expected behavior: The Keras attention layer accepts Keras tensor inputs. Or a more helpful error message is given.

Python version: 3.11.0
TensorFlow version: 2.17.0
Keras version: 3.4.1 (bundled with that TensorFlow version)

@mehtamansi29
Collaborator

Hi @jorgenorena -

Thanks for reporting the issue. From the code, I understand that you are trying to create a model with attention to translate sentences.
Instead of tf.keras.layers.Attention, you can use tf.keras.layers.MultiHeadAttention, passing query, key, and value for the dot-product attention. The attention output then needs to be combined with the decoder output before building the model with the functional API.
Attached gist for your reference here.

tf.keras.layers.Attention does not accept inputs passed like this: attention_outputs = attention_layer([decoder_outputs, encoder_outputs]). Here you can find more details about the Attention layer.
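The suggested workaround can be sketched as below. This is a minimal, self-contained version of the model from the issue: the text-vectorization layers are replaced with plain integer-token inputs, and the vocabulary, embedding, and LSTM sizes are toy values chosen here for brevity, not taken from the gist.

```python
import numpy as np
import tensorflow as tf

vocab_size, embed_size = 100, 32  # toy sizes for illustration

# Integer token IDs stand in for the TextVectorization outputs in the issue.
encoder_inputs = tf.keras.layers.Input(shape=(None,), dtype="int32")
decoder_inputs = tf.keras.layers.Input(shape=(None,), dtype="int32")

encoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(encoder_inputs)
decoder_embeddings = tf.keras.layers.Embedding(vocab_size, embed_size)(decoder_inputs)

encoder_outputs, *encoder_state = tf.keras.layers.LSTM(
    64, return_sequences=True, return_state=True)(encoder_embeddings)
decoder_outputs = tf.keras.layers.LSTM(
    64, return_sequences=True)(decoder_embeddings, initial_state=encoder_state)

# MultiHeadAttention takes query/value/key as named arguments
# rather than a list of tensors.
attention_outputs = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=64)(
    query=decoder_outputs, value=encoder_outputs, key=encoder_outputs)

# Combine the attention output with the decoder output, then project.
combined = tf.keras.layers.Add()([decoder_outputs, attention_outputs])
Y_probas = tf.keras.layers.Dense(vocab_size, activation="softmax")(combined)

model = tf.keras.Model([encoder_inputs, decoder_inputs], Y_probas)

enc = np.random.randint(0, vocab_size, size=(2, 7))
dec = np.random.randint(0, vocab_size, size=(2, 5))
print(model([enc, dec]).shape)  # one probability vector per decoder position
```

Note that mask_zero is omitted here; whether masking interacts with the Attention layer is exactly the behavior under discussion in this issue.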

@jorgenorena
Author

Thanks @mehtamansi29. This does give a working model.

However, my interest was not so much to build a model to translate, but rather to understand how the Keras interface works. Is the behavior of the Attention layer expected? If so, what is the logic? Or is this a bug?

Thanks for your help.
