[Feature Request] Implement LBaaS in Yaook #719

Open
2 tasks
anjastrunk opened this issue Aug 29, 2024 · 6 comments

@anjastrunk
Contributor

anjastrunk commented Aug 29, 2024

Yaook, as a further implementation of the SCS standards, does not yet support a standard-conformant load balancer. We have to provide one. The only requirement is to provide an OpenStack-conformant endpoint to the user. The behavior behind the scenes does not matter.

Tasks:

This issue is related to #587, which standardizes mandatory and recommended IaaS services; LBaaS should be part of it.

anjastrunk added the labels "enhancement" (New feature or request) and "SCS-VP10" (Related to tender lot SCS-VP10) on Aug 29, 2024
@markus-hentsch
Contributor

Evaluate options for LBaaS

FTR, here is one of the main problems that prevented integration of Octavia in Yaook so far: https://storyboard.openstack.org/#!/story/2007370#comment-153426

In Yaook, all database instances run behind HAProxy instances. According to the linked issue, this seems to lead to severe problems with Octavia in production.

We should at least consider improving Octavia and/or its integration, since re-implementing the whole Octavia LBaaS v2 API on top of a different LB framework will be no easy feat either.

We should get in touch with @horazont and check if there were other issues observed with Octavia than the one mentioned above that would need to be addressed as well.

@markus-hentsch
Contributor

I had a discussion with @horazont about this:

  • the upstream issue report [1] seems to suggest that the issue is a race condition between (a) the Octavia API instructing its workers via RPC and (b) MariaDB syncing the Octavia API's database write to the other replicas: the workers attempt to read the entry while HAProxy routes them to a different DB replica that has not yet received the sync
    • however, @horazont said he is not convinced that this actually is the problem, since the HAProxy instances in Yaook are configured to always route DB queries to the first DB replica
    • we should reproduce and analyze the issue
  • there seems to be an OVN backend driver [2] for Octavia; we should have a look at that one too
  • creating an Octavia alternative with full API compatibility is a huge task; I think we should first try all other options for getting Octavia to work correctly in Yaook
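
To make the suspected race easier to discuss, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the `Cluster` class, its explicit `sync()` step and the `create_and_fetch` helper are hypothetical names for illustration, and the model does not reflect how MariaDB/Galera, HAProxy or Octavia are actually implemented. It only shows why a worker routed to a different replica could miss a freshly written row, and why pinning all queries to the first replica would avoid that.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One database node holding its own copy of the rows (toy model)."""
    rows: dict = field(default_factory=dict)


class Cluster:
    """Toy cluster: a write lands on one node and reaches the others on sync()."""

    def __init__(self, size: int = 3):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # writes not yet applied on the other nodes

    def write(self, node: int, key: str, value: str) -> None:
        self.replicas[node].rows[key] = value
        self.pending.append((node, key, value))

    def sync(self) -> None:
        # Apply every pending write on all nodes except its origin.
        for origin, key, value in self.pending:
            for index, replica in enumerate(self.replicas):
                if index != origin:
                    replica.rows[key] = value
        self.pending.clear()

    def read(self, node: int, key: str):
        return self.replicas[node].rows.get(key)


def create_and_fetch(cluster: Cluster, api_node: int, worker_node: int):
    """The API writes a row and notifies the worker, which reads before sync."""
    cluster.write(api_node, "lb-1", "PENDING_CREATE")
    # Assumption: the RPC message reaches the worker before replication finishes.
    seen = cluster.read(worker_node, "lb-1")
    cluster.sync()
    return seen


# Worker is routed to a different replica: the row is not visible yet.
stale = create_and_fetch(Cluster(), api_node=0, worker_node=1)
# All queries pinned to the first replica: the worker sees the row.
pinned = create_and_fetch(Cluster(), api_node=0, worker_node=0)
print(stale, pinned)
```

If the pinned routing really is in effect in Yaook, this model would predict that the race cannot occur in this form, which matches the doubt raised above and is a reason to reproduce the issue before attempting a fix.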

Footnotes

  1. https://storyboard.openstack.org/#!/story/2007370#comment-153426

  2. https://docs.openstack.org/ovn-octavia-provider/latest/admin/driver.html

@markus-hentsch
Contributor

In a small kickoff meeting on this topic with @kgube, @josephineSei and @kitsudaiki, we identified the following tasks:

  • Get in touch with the relevant CSPs and check if the SCS reference implementation ever experienced issues like the one mentioned above [1].
  • Research which subset of the Octavia API is actually used and strictly needed by the KaaS part of the SCS reference implementation.
  • Identify all possible use cases that the Octavia API offers and how each can be tested.
  • Implement an Octavia operator prototype for Yaook to integrate Octavia in Yaook.
  • Test the Octavia integration in Yaook, try to reproduce the original issue [1] and find a fix for it.

Note that, aside from the last point, most of these tasks are independent and can be addressed in parallel.
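
As a starting point for the "which use cases and how to test them" task, the following Python sketch lists the Octavia v2 API calls needed for one basic HTTP load balancer. It is a hedged illustration only: `smoke_test_plan`, the resource names and the `{lb_id}`-style placeholders are hypothetical, and the plan covers just a small subset of the API. It sends no requests; it only builds the method, path and body of each call.

```python
# Hypothetical helper: names and placeholder IDs are illustrative only.
API_PREFIX = "/v2/lbaas"


def smoke_test_plan(subnet_id: str, member_ip: str) -> list:
    """Return (method, path, body) tuples for one basic HTTP load balancer.

    Placeholders such as {lb_id} stand for IDs returned by earlier calls.
    """
    return [
        ("POST", f"{API_PREFIX}/loadbalancers",
         {"loadbalancer": {"name": "smoke-lb", "vip_subnet_id": subnet_id}}),
        ("POST", f"{API_PREFIX}/listeners",
         {"listener": {"loadbalancer_id": "{lb_id}", "protocol": "HTTP",
                       "protocol_port": 80}}),
        ("POST", f"{API_PREFIX}/pools",
         {"pool": {"listener_id": "{listener_id}", "protocol": "HTTP",
                   "lb_algorithm": "ROUND_ROBIN"}}),
        ("POST", f"{API_PREFIX}/pools/{{pool_id}}/members",
         {"member": {"address": member_ip, "protocol_port": 80}}),
        ("POST", f"{API_PREFIX}/healthmonitors",
         {"healthmonitor": {"pool_id": "{pool_id}", "type": "HTTP",
                            "delay": 5, "timeout": 3, "max_retries": 3}}),
    ]


# Print the call sequence; the subnet ID and member address are made up.
for method, path, body in smoke_test_plan("example-subnet-id", "192.0.2.10"):
    print(method, path, sorted(body))
```

A list like this could be compared against the calls actually made by the KaaS part of the reference implementation to decide which resources a Yaook integration has to support first.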

Footnotes

  1. https://storyboard.openstack.org/#!/story/2007370#comment-153426

@berendt
Member

berendt commented Oct 1, 2024

@markus-hentsch In Kolla (used by OSISM by default to deploy OpenStack), there are two ways to access the MariaDB Galera cluster: HAProxy and ProxySQL. Either way, all nodes in a cluster access the database through the same node, namely the one that holds the primary IP address managed by Keepalived. If Keepalived is not used and the database is accessed some other way, possibly through more than one node, I think Galera still ensures that the data is identical on all nodes, because Galera implements a multi-master cluster.

@kitsudaiki

I started on a first prototype of an octavia-operator for YAOOK. I reactivated the old GitLab issue regarding the Octavia integration ( https://gitlab.com/yaook/operator/-/issues/186 ) and created a new branch for the octavia-operator ( https://gitlab.com/yaook/operator/-/tree/feature/add-octavia-operator ) and an Octavia Docker image for YAOOK ( https://gitlab.com/yaook/images/octavia/-/tree/feature/initial-version ) for the implementation.

@berendt
Member

berendt commented Oct 2, 2024

Started with a first prototypical octavia-operator for YAOOK. Reactivated the old issue on gitlab regarding the octavia integration ( https://gitlab.com/yaook/operator/-/issues/186 ) and create a new brauch for an octavia-operator ( https://gitlab.com/yaook/operator/-/tree/feature/add-octavia-operator ) and octavia docker-image for YAOOK ( https://gitlab.com/yaook/images/octavia/-/tree/feature/initial-version ) for the implementation.

For Amphora images, you can use https://github.com/osism/openstack-octavia-amphora-image. I will add 2024.2 images later today.
