Experiences from porting YOLOv8 to Axon #475
Comments
You need to pass
Thanks!
@hansihe Thank you for this very detailed write up! It's really helpful for improving the framework. Also, would you be interested in adding your Yolo implementation upstream to Bumblebee? We don't have any object detection models yet and I think it would be super useful to the community cc @jonatanklosko for his take as well
This is a good question and something I have debated for quite some time. Relevant issue in #459. For most cases a module can just be replaced with an Elixir function; this is the pattern we follow in Bumblebee. For example: https://github.com/elixir-nx/bumblebee/blob/main/lib/bumblebee/text/bert.ex#L554
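To make that pattern concrete, here is a rough sketch of "modules as plain Elixir functions". The module and function names (`MyModel`, `conv_block/2`) are purely illustrative, not part of Axon or Bumblebee:

```elixir
defmodule MyModel do
  # A "module" is just a function from an Axon node to an Axon node.
  # Composing these replaces PyTorch's nn.Module nesting.
  def conv_block(x, channels) do
    x
    |> Axon.conv(channels, kernel_size: 3, padding: :same)
    |> Axon.batch_norm()
    |> Axon.activation(:silu)
  end

  def model do
    Axon.input("image", shape: {nil, 224, 224, 3})
    |> conv_block(16)
    |> conv_block(32)
    |> Axon.flatten()
    |> Axon.dense(10)
  end
end
```

The trade-off discussed in this thread is that the function call structure is invisible to Axon itself, so nothing groups the resulting layers or their parameter names.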
I agree though that there is no easy way to explicitly group layers/models, and this is a serious drawback in the API right now. I have considered adding something like that before, and I will continue thinking through the best way to integrate it in a functional way, without necessarily forcing the OOP/module-style approach. In Bumblebee we often reference "blocks" as a group of subnetworks, so maybe we can introduce an abstraction along those lines.
Dot-separated paths are also how we do it in Bumblebee. I agree that we can probably do a better job of making this easier under the hood; again, maybe a dedicated abstraction would help here.
I believe we have a utility in Bumblebee that basically implements this, but I think this is a sign to upstream it.
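For context, the pattern being discussed (extracting individual values from a container output) currently needs one graph node per extracted field. A sketch of how that looks today, assuming a map-shaped container built with `Axon.container/1` and extraction via `Axon.nx/2`:

```elixir
# A layer whose output is a map container of two tensors.
x = Axon.input("x", shape: {nil, 8})

pair =
  Axon.container(%{
    "a" => Axon.dense(x, 4),
    "b" => Axon.dense(x, 2)
  })

# Today each field must be pulled out with its own Axon.nx/2 node;
# an upstreamed destructuring utility would replace this boilerplate.
a = Axon.nx(pair, fn %{"a" => a} -> a end)
b = Axon.nx(pair, fn %{"b" => b} -> b end)
```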
I'm a bit confused by this one.
Maybe the confusion is here? Axon networks don't come with any parameters; when you're creating the network you're just building up an Elixir data structure.
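A minimal sketch of that lifecycle, assuming the two-tuple return of `Axon.build/2` (exact signatures may vary between Axon versions):

```elixir
# Building the model only constructs an Elixir data structure;
# no parameters exist and no backend state is touched yet.
model = Axon.input("x", shape: {nil, 8}) |> Axon.dense(4)

# Axon.build/2 compiles the graph into two pure functions.
{init_fn, predict_fn} = Axon.build(model)

# init_fn materializes the parameters from a shape template.
params = init_fn.(Nx.template({1, 8}, :f32), %{})

# predict_fn is a pure function of (params, input): same inputs,
# same outputs, with no hidden mutable state in the backend.
output = predict_fn.(params, %{"x" => Nx.iota({1, 8}, type: :f32)})
```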
I think this issue is with Livebook outputs and not with Axon? Larger models are difficult to inspect in general. PyTorch also has a nested "tree-view" of a model as their default representation, which may be helpful for us here.
Good point, I think in Bumblebee we have debug logs to indicate which parameters were missing in an initial state. I think we can add something similar here. I don't think we should raise though as we have to consider the training case where you partially initialize a model. I think debug here might be sufficient. In a debug mode we can also log unused parameters.
Good question. The unpickler is a separate library.
I would definitely be interested in getting the implementation into Bumblebee! I wasn't sure what the intention for Bumblebee was initially, but now that I hear it's meant to be a package of models it definitely makes sense to add it. YOLOv8 also has object classification and segmentation detection heads, adding those should not require a whole lot of effort. It's also not missing a whole lot of stuff in order to actually train the models either, mainly the DFL loss implementation + some image augmentation system. It would be interesting to get those pieces working as well.
That makes a lot of sense, it's pretty close to what I ended up with as well.
I think something like that would be really nice; for me it would serve two main functions. I'm of course not familiar with the internals of the library, but something along those lines would be great to have.
Thanks for the clarification, things make a lot more sense now. Not being familiar with the internals, I had no idea whether these functions called into a NIF to initialize some state on the backend. Maybe we could add a few sentences to the documentation for these functions clarifying that.
Yep, this is certainly more LiveBook related, but I thought I would bring it up here since they seem fairly adjacent in the ecosystem :)
👍
That makes a lot of sense. Let's work towards getting the YOLO implementation into Bumblebee. It then also makes sense for me to refactor the implementation a bit to use as much of the utilities and structure of Bumblebee as possible.
Yeah! Bumblebee is pretty much pre-trained models, rewritten so we can import params from HuggingFace! More info here: https://huggingface.co/docs/transformers/model_doc/yolos
I also have another point that came to mind: YOLO is inherently a model that works pretty well with images of different dimensions, so it would make sense to be able to specify the spatial dimensions of the input image as unknown. There are probably several places that make that difficult, but the first one I encountered was the resize operation. Is there any other way of doing this? The resize node in ONNX takes either a target size or a scale, which makes it possible to represent this operation on an unknown input size.
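A sketch of the friction point, assuming Axon's `nil`-dimension inputs and a resize layer that takes a fixed target size (hedged; signatures may differ by version):

```elixir
# Spatial dimensions declared as unknown (nil) so the same graph could
# accept images of different sizes.
image = Axon.input("image", shape: {nil, nil, nil, 3})

# This only works because resize pins a concrete output size; a
# scale-based resize (as ONNX supports) would need the runtime input
# size, which the fixed-shape layer cannot express.
model = Axon.resize(image, {640, 640})
```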
@seanmor5 definitely, though I don't see any checkpoint for yolov8 on HF. There are files in this GitHub release, but what they store is a yolo-specific map and it has the whole PyTorch model serialized, not just the parameters.
@seanmor5 we only have
The only generic part would be loading the serialized data.
YOLOS is a different model based on the vision transformer architecture, while the regular YOLO series is a more traditional CNN-based object detection model. As I understand it, YOLOS is mainly meant as an exploration in vision transformers, and is not necessarily meant to achieve state-of-the-art performance. YOLOv8 gets quite a bit higher mAP than YOLOS.
The YOLOv8 model and the HF community don't have much overlap, I believe. Is Bumblebee only meant for transformer models / models with a presence on HF? Those are indeed the checkpoint files you tend to use for the main YOLOv8 implementation; my implementation supports loading parameters from those files. YOLOv8 models are also not that expensive to train from scratch, ranging from under a day for the smallest variant to around 8 days for the x variant (all on a V100), so training our own base checkpoints without importing them is very feasible.
We currently only add models whose parameters we can import from the HF Hub. We could support other "providers", but I don't think that's applicable to cases like this, where both the format and the location are highly model-specific. One way to approach this would be to implement the model in Bumblebee, then convert the parameters, as you did, and dump them into HF Hub repos. However, for this we need to figure out how we want to store all the configuration in a repo, similar to the existing config files.
The reason why I gravitate towards loading directly from the original checkpoint files: in general, the use case of YOLO is more often training/transfer-learning it on your own data than using the off-the-shelf provided parameters directly. I'm willing to donate the model to Bumblebee if it is wanted, but I'm pretty neutral on it. It does sound like it might be a better fit for its own repo, since the YOLO ecosystem doesn't seem to overlap with the HF community very much. Anyway, I feel like this thread got a little out of hand, since I mainly meant it as a place for some feedback on Axon. Maybe we should continue in an issue on either my repo or Bumblebee if we want to discuss it further?
I recently ported the YOLOv8 object detection model to Axon, and just wanted to share my experiences with it.
https://github.com/hansihe/yolov8_elixir
- Is there something like `Module` from PyTorch? `Axon.namespace` looks somewhat like it, but there doesn't seem to be a way of differentiating "this is a module that contains other layers, may depend on other layers outside of the module" vs "this is a subnetwork that is fully independent all the way to inputs" (parameter paths end up like `04.c2f.m.0.bottle_neck.cv2.conv.conv2d`). A `C2f` layer contains many `Bottleneck` layers, which contain other layers again.
- Is there anything in `Axon` for destructuring a container? Say one layer returns an `%{"a" => _, "b" => _}` container; it would be nice to have a way to destructure that in another layer without making many different `Axon.layer`s that just pull out one of the inner values.
- `Axon.build` could be a little bit clearer on what the `init` vs `predict` functions actually do. Can `predict` modify mutable internal state in `XLA` or other backends?
- It would be nice if `predict` had different, stricter modes for stuff like: