sparse_categorical_crossentropy with ignore_class=-1 makes the loss nan #734

Open
innat opened this issue Feb 5, 2024 · 3 comments

innat commented Feb 5, 2024

This behaviour occurs in Keras 2; the same code works in Keras 3.

I tried to train a multi-output model whose targets look like the following:

y1_dummy = [1,  2,   0, -1,  0,  -1,  -1, -1,  3,  -1]
y2_dummy = [-1, -1, -1,  2,  -1,  0,   3,  1, -1,   2]

These two target arrays are complementary: wherever y1_dummy has a valid class index, y2_dummy has -1, and vice versa (for example, y1_dummy[0] is 1 while y2_dummy[0] is -1). At training time I set ignore_class=-1, as shown below.

def custom_loss(y_true, y_pred):
    loss = sparse_categorical_crossentropy(
        y_true, y_pred, ignore_class=-1
    )
    return loss

The code works in Keras 3, but in Keras 2 the loss becomes nan. Below is the full reproduction.

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam

# Dummy inputs and complementary targets; -1 marks positions to ignore.
num_samples = 10
num_classes = 4
input_shape = (224, 224, 3)
x_dummy = np.random.rand(num_samples, *input_shape).astype('float32')
y1_dummy = [1,  2,   0, -1,  0,  -1,  -1, -1,  3,  -1]
y2_dummy = [-1, -1, -1,  2,  -1,  0,   3,  1, -1,   2]
# Build a tf.data pipeline yielding (image, (label1, label2)) batches.
_sample = tf.data.Dataset.from_tensor_slices(x_dummy)
_labels = tf.data.Dataset.from_tensor_slices((y1_dummy, y2_dummy))
_data = tf.data.Dataset.zip((_sample, _labels))
_data = _data.batch(batch_size=3, drop_remainder=True)

def custom_loss(y_true, y_pred):
    # Entries labelled -1 should be excluded from the loss.
    loss = sparse_categorical_crossentropy(
        y_true, y_pred, ignore_class=-1
    )
    return loss

# One shared backbone with two softmax heads, one per target array.
input_layer = keras.Input(shape=input_shape)
flatten_layer = layers.Flatten()(input_layer)
output_layer1 = layers.Dense(
    num_classes, activation='softmax', name='out1'
)(flatten_layer)
output_layer2 = layers.Dense(
    num_classes, activation='softmax', name='out2'
)(flatten_layer)
A = keras.Model(
    inputs=input_layer,
    outputs=[output_layer1, output_layer2]
)
A.compile(
    optimizer=Adam(),
    loss={
        'out1': custom_loss,
        'out2': custom_loss,
    }
)
A.fit(
    _data,
    epochs=2,
)
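
For what it's worth, with batch_size=3 and drop_remainder=True the very first batch contains only -1 labels for out2, so averaging the loss over zero kept entries would already produce nan, which is likely what Keras 2 runs into. Until that is fixed, here is a minimal sketch of a manual workaround (masked_loss and ignore_value are my own names, not library API): it applies the mask itself and guards the division for batches where every label is ignored.

import tensorflow as tf
from tensorflow.keras.losses import sparse_categorical_crossentropy

def masked_loss(y_true, y_pred, ignore_value=-1):
    # Flatten in case labels arrive as (batch, 1) instead of (batch,).
    labels = tf.reshape(tf.cast(y_true, tf.int32), [-1])
    # Mark which labels should contribute to the loss.
    mask = tf.not_equal(labels, ignore_value)
    # Swap ignored labels for a valid class index (0) so the
    # crossentropy op itself never sees -1.
    safe_labels = tf.where(mask, labels, tf.zeros_like(labels))
    per_sample = sparse_categorical_crossentropy(safe_labels, y_pred)
    keep = tf.cast(mask, per_sample.dtype)
    # Zero out ignored positions and average over the kept ones only;
    # tf.maximum guards batches where every label is ignored.
    return tf.reduce_sum(per_sample * keep) / tf.maximum(tf.reduce_sum(keep), 1.0)

It should drop in wherever custom_loss is used above, e.g. loss={'out1': masked_loss, 'out2': masked_loss}.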

tilakrayal (Collaborator) commented Feb 6, 2024

@innat,
Thank you for reporting the issue. Since this issue is present only in Keras 2, could you please provide a PR that makes the change in tf-keras? Thank you!

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions bot added the stale label Feb 21, 2024

innat (Author) commented Feb 21, 2024

Hi, I'm not able to take on the contribution myself. I came across this issue via https://stackoverflow.com/questions/77930212/multitask-learning-to-classify-on-dog-images/77935240#77935240
