
Inference on ONNX YOLOv8 model #2460

Open · cflavsAmbev opened this issue Jun 25, 2024 · 5 comments
Labels: type:Bug (Something isn't working)

cflavsAmbev commented Jun 25, 2024

I trained a YOLOv8 XS model and exported it as an ONNX file.

I created the inference session with the code below:

import onnxruntime as rt

# Create the inference session from the exported ONNX file
sess = rt.InferenceSession(MODEL_PATH, providers=rt.get_available_providers())

model_inputs = sess.get_inputs()
input_names = [inp.name for inp in model_inputs]
input_shape = model_inputs[0].shape

model_outputs = sess.get_outputs()
output_names = [out.name for out in model_outputs]

# Run inference on a preprocessed image tensor
outputs = sess.run(output_names, {input_names[0]: image.numpy()})
boxes, raw_scores = outputs

When I execute this code, output_names is ['box', 'class']. However, when I inspect each output, the box output has shape (1, 8400, 64) and raw_scores has shape (1, 8400, 6).
For each box I get an array of 64 values, including negative values. How can I extract the bounding boxes from this output?

Example: array([-0.41838697,  4.6691685 , -0.31398767,  0.903074  , -0.4759045 ,
       -0.1759668 ,  0.13982718,  0.17300718,  0.09568841,  0.25713545,
       -0.48056224, -1.2766149 , -0.28608924, -0.49068266, -0.6215335 ,
       -1.3962162 ,  3.5720341 ,  2.485091  , -0.05451238,  1.2690719 ,
        0.26219204, -0.4092333 , -0.7778678 ,  0.09583969, -1.0101943 ,
       -1.1997509 , -0.7010503 , -0.33682668, -0.84273565, -1.0975788 ,
       -0.46223986, -1.0694335 ,  0.29062057,  1.8742971 ,  2.205299  ,
        0.57432723, -1.205116  ,  1.618118  , -0.07109317, -0.61723953,
       -0.9500371 , -0.41608053, -0.20256181, -1.1494515 , -0.6518638 ,
       -0.19544908, -0.84548193, -1.1186968 ,  0.49770474,  2.1773698 ,
        0.43691462, -0.5621399 ,  1.1421276 , -0.02915113,  0.565164  ,
       -0.08713075, -0.31974483, -0.77772075, -1.0449475 , -0.3073484 ,
       -0.6940311 , -1.1068969 , -0.9517348 , -0.96367306], dtype=float32)
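
For context on the shapes above: YOLOv8-style heads regress boxes with distribution focal loss, so the 64 channels appear to be 4 box sides × 16 bins, and 8400 = 80² + 40² + 20² is the anchor count for a 640×640 input at strides 8/16/32. Below is a minimal NumPy sketch of decoding a single anchor's 64 logits into an xyxy box, assuming that layout; the anchor point and stride are hypothetical inputs, not values from this model.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_single(regression64, anchor_xy, stride):
    # (64,) -> (4, 16): one 16-bin distribution per box side (left, top, right, bottom)
    dist = regression64.reshape(4, 16)
    # Expected value of each distribution = distance from the anchor point
    # to that side, measured in stride units
    ltrb = (softmax(dist) * np.arange(16, dtype=np.float32)).sum(axis=-1)
    x1y1 = anchor_xy - ltrb[:2]
    x2y2 = anchor_xy + ltrb[2:]
    return np.concatenate([x1y1, x2y2]) * stride  # xyxy box in pixels

Applying this over all 8400 anchors (plus class-score argmax and NMS) is what the keras_cv helpers discussed in the replies below do in graph form.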
sachinprasadhs added the type:Bug (Something isn't working) label on Jun 26, 2024
cflavsAmbev (Author) commented Jun 26, 2024

I tried following two previously reported issues, #1337 and #2298, and implemented the following code:

import tensorflow as tf
from tensorflow import keras

from keras_cv import ops
from keras_cv.models.object_detection.yolo_v8.yolo_v8_detector import (
    YOLOV8Detector,
    decode_regression_to_boxes,
    dist2bbox,
    get_anchors,
)

BOX_REGRESSION_CHANNELS = 64

# Reshape the raw regression output into (-1, 4, 16):
# one 16-bin distribution per box side
preds = model.outputs[0]
model.outputs[0] = tf.reshape(preds, [-1, 4, BOX_REGRESSION_CHANNELS // 4])

# Take the expected value of each distribution (softmax-weighted bin index)
model.outputs[0] = tf.linalg.matmul(
    keras.backend.softmax(model.outputs[0], axis=-1),
    keras.backend.arange(BOX_REGRESSION_CHANNELS // 4, dtype="float32")[..., None],
)
model.outputs[0] = tf.squeeze(model.outputs[0], -1)

# Convert side distances into xyxy boxes in pixel coordinates
anchor_points, stride_tensor = get_anchors(image_shape=model.input_shape[1:3])
stride_tensor = keras.backend.expand_dims(stride_tensor, axis=-1)

model.outputs[0] = dist2bbox(model.outputs[0], anchor_points) * stride_tensor

model = tf.keras.Model(inputs=model.inputs, outputs=model.outputs)
model.summary()
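
(The export call itself isn't shown in this thread; a later comment mentions tf2onnx v1.16.1 with opset 18, so it was presumably something along these lines. The input signature and output path below are illustrative, not taken from the original code.)

import tf2onnx

# Hypothetical export; the fully variable input shape matches the model above
spec = (tf.TensorSpec((None, None, None, 3), tf.float32, name="images"),)
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=18, output_path="yolov8_decoded.onnx"
)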

I saved the model as an ONNX file, but when I call sess.run I get this exception:

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Sub node. Name:'model_7/tf.math.subtract_5/Sub' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:629 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 31950 is invalid.

kvlsky commented Jul 9, 2024

@sachinprasadhs do you have any updates on this issue?

christian-plourde commented
I'm having the same issue. I tried the solution here and ran into the same problem:

Error: SessionRun(Msg("Non-zero status code returned while running Sub node. Name:'YOLOv8_1/Sub' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:666 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 26880 is invalid.

I'm using onnxruntime 1.19.0 to do the inference. The model was produced with tf2onnx v1.16.1, with the opset set to 18. If I lower the opset to 13, I instead get this error:

Error: SessionRun(Msg("Non-zero status code returned while running Add node. Name:'YOLOv8_1/Add' Status Message: /home/runner/work/ort-artifacts-staging/ort-artifacts-staging/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:666 onnxruntime::Broadcaster::Broadcaster(gsl::span, gsl::span) largest <= 1 was false. Can broadcast 0 by 0 or 1. 26880 is invalid.

christian-plourde commented
I think I have isolated the problem.

With the solution proposed here, the YOLOV8Detector model is wrapped and the prediction decoding is done after the fact with the helper functions from keras_cv.

The steps are as follows:

  1. Get the anchor points from the image by passing in the width and height of the image:
    anchor_points, stride_tensor = get_anchors(image_shape=model.input_shape[1:3])
    This results in two tensors with the following shapes:
    anchor_points: tf.Tensor([], shape=(0, 2), dtype=float32)
    stride_tensor: tf.Tensor([], shape=(0, 1), dtype=float32)
    Which is where the problems begin.

  2. Next, decode the predictions with:
    decoded = decode_regression_to_boxes(regression)
    where regression has the shape:
    <KerasTensor shape=(None, None, 64), dtype=float32, sparse=False, name=keras_tensor_295>
    resulting in decoded with a shape of:
    <KerasTensor shape=(None, None, 4), dtype=float32, sparse=True, name=keras_tensor_300>
    Which is the expected shape (4 values: x1, y1, x2, y2).

  3. Next, get the distance to the bounding boxes with:
    boxes = dist2bbox(decoded, anchor_points) * stride_tensor
    This results in this shape for the boxes:
    boxes <KerasTensor shape=(None, 0, 4), dtype=float32, sparse=True, name=keras_tensor_306>
    Which is incorrect: it should have shape (None, None, 4). Because it doesn't, the runtime session crashes with the error from my previous comment.

It ends up with this shape because of what happens inside the dist2bbox method:

def mydist2bbox(distance, anchor_points):
    left_top, right_bottom = ops.split(distance, 2, axis=-1)
    # left_top: <KerasTensor shape=(None, None, 2), dtype=float32, sparse=False, name=keras_tensor_301> (makes sense, 2 values per prediction)
    # right_bottom: <KerasTensor shape=(None, None, 2), dtype=float32, sparse=False, name=keras_tensor_302> (makes sense, 2 values per prediction)
    x1y1 = anchor_points - left_top
    # x1y1: <KerasTensor shape=(None, 0, 2), dtype=float32, sparse=False, name=keras_tensor_303> (this is wrong because left_top is subtracted from anchor_points, which has the unexpected shape noted above)
    x2y2 = anchor_points + right_bottom
    # x2y2: <KerasTensor shape=(None, 0, 2), dtype=float32, sparse=False, name=keras_tensor_304> (same problem as x1y1)
    return ops.concatenate((x1y1, x2y2), axis=-1)  # xyxy bbox

This gives the boxes the shape <KerasTensor shape=(None, 0, 4), dtype=float32, sparse=True, name=keras_tensor_306>, which is not expected. Is there a way to make the shape of the boxes (None, None, 4) as expected?
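
(Step 1's failure mode can be reproduced in isolation; a quick check, assuming keras_cv's get_anchors with its default strides of 8, 16, and 32:)

from keras_cv.models.object_detection.yolo_v8.yolo_v8_detector import get_anchors

# With concrete dimensions, the anchor count is well defined:
# (640/8)^2 + (640/16)^2 + (640/32)^2 = 6400 + 1600 + 400 = 8400
anchor_points, stride_tensor = get_anchors(image_shape=(640, 640))
print(anchor_points.shape)  # (8400, 2)
print(stride_tensor.shape)  # (8400, 1)

# With a variable model input, image_shape comes through as (None, None) and
# the same call yields the empty (0, 2) / (0, 1) tensors described in step 1.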

christian-plourde commented
I figured out the problem. The input shape is variable, (None, None, None, 3) (batch size, height, width, channels). When this is passed to get_anchors, it doesn't know how large to make the resulting tensor, so it produces the (0, 2) tensor I described. If I instead fix the image size (in my case 1280 × 1024), every call to get_anchors produces a tensor of shape (26880, 2), which matches the number of bounding boxes. The add and subtract operations in the wrapped model can then broadcast properly, eliminating the errors during inference. Hopefully this is helpful to someone else.
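
(A sketch of that fix, assuming the model is a keras_cv YOLOV8Detector wrapped as in the earlier comment; the variable names here are illustrative, not from the original code:)

import tensorflow as tf
from keras_cv.models.object_detection.yolo_v8.yolo_v8_detector import get_anchors

H, W = 1024, 1280  # pin the input resolution before wrapping and exporting

# Rebuild the trained detector on a fixed-shape input so that get_anchors
# sees concrete dimensions instead of (None, None)
inputs = tf.keras.Input(shape=(H, W, 3))
outputs = model(inputs)
fixed_model = tf.keras.Model(inputs, outputs)

anchor_points, stride_tensor = get_anchors(image_shape=(H, W))
# (1024/8)*(1280/8) + (1024/16)*(1280/16) + (1024/32)*(1280/32)
#   = 20480 + 5120 + 1280 = 26880 anchors, matching the count in the errors above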
