Migration and risk management plans #13

Open
53 of 58 tasks
ezio-melotti opened this issue Feb 18, 2022 · 2 comments
ezio-melotti commented Feb 18, 2022

This issue describes the migration plan, testing strategy, execution plan, and risk management plan. The list of steps is not final: new steps might be added, the time estimates should be made more accurate, and each step should be assigned to someone. This plan overrides PEP-588, and might eventually be turned into a PEP. For the time being it is kept here for convenience.

This document uses the following terms:

  • (bpo) export: exporting issues from bpo (bugs.python.org) to a zip archive using a custom-made script
  • (ECI) import: importing the zip archive with the issues into a new repo on GitHub through the ECI (Enterprise Cloud Importer)
  • transfer: transferring issues from the repo where the issues got imported into an existing repo (e.g. python/cpython)
  • migration: the whole process including the three steps above and possibly additional minor steps

Migration plan

These are the steps required to migrate issues from bpo to GitHub:

  1. Inform the users about the migration (~2w)
  2. Start the migration by making bpo read-only
  3. Export all issues from bpo (<1h -- ~22m without attachments)
  4. Import issues in a new repo through the ECI (~12h; initial estimate ~25h *)
  5. Enable the issues tab on the cpython repo
  6. Transfer issues to the cpython repo (~20h; initial estimate ~4-7d **)
  7. Possibly setup and run post-migration actions
  8. Test everything and remove the issue template from the cpython repo
  9. Inform the users that the migration happened

* Importing 500 issues (without attachments) on a Friday morning (Europe)/Thursday night (US) took 13m. We currently have almost 60k issues, so it should take around 25h. Earlier imports took about half of this time though, so it might depend on the server load. Further testing showed that it takes about 12h.

** The transfer has been optimized, and it now takes about 20h.
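
The initial ~25h import figure follows from a linear extrapolation of the 500-issue sample. A minimal sketch of that arithmetic, assuming import time scales linearly with the number of issues (the later 12h measurement shows it does not always) and rounding "almost 60k" up to 60,000:

```python
# Back-of-the-envelope import estimate, using only the figures quoted above.
SAMPLE_ISSUES = 500      # issues in the test import
SAMPLE_MINUTES = 13      # time the test import took
TOTAL_ISSUES = 60_000    # "almost 60k issues", rounded up (assumption)


def estimate_hours(total_issues, sample_issues=SAMPLE_ISSUES,
                   sample_minutes=SAMPLE_MINUTES):
    """Linearly extrapolate the sample import time to the full data set."""
    minutes = total_issues / sample_issues * sample_minutes
    return minutes / 60


print(f"Estimated import time: {estimate_hours(TOTAL_ISSUES):.0f}h")
```

With the rounded-up issue count this gives 26h, which is an upper bound consistent with the "around 25h" quoted above.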

Testing strategy

Each step of the previous list should be tested (if possible):

  1. ✔️ Informing users is tested by telling them and seeing their reaction.
  2. ✔️ Should be tested on a local instance of bpo. The test should verify that it's not possible to create new issues or edit existing ones (this includes both changing fields and adding new comments). Issue redirects can also be tested and enabled before the migration starts.
  3. ✔️ This has been tested several times already, but a full test export should be performed shortly before the actual migration.
  4. ✔️ Like 3. this has also been tested and should be tested with a full import before the actual migration.
  5. ✔️ The issue template config has been tested on a separate repo and on python/cpython.
  6. ✔️ We already performed a test import with a subset of the issues (~500). We will perform more tests using small subsets until all the issues are ironed out, and we should perform a full test import before doing the actual migration.
  7. ✔️ GitHub Actions (e.g. updating issue references) can be tested on separate repos, and possibly added to the source tree before the migration starts. We currently don't have any additional actions.
  8. ✔️ This is just a matter of merging a PR that removes the issue template config file. (Remove the issue template config after the migration python/cpython#32106)
  9. ✔️ This doesn't require testing for emails/social media, but it does for Notify bpo users once the migration is done #12.

Execution plan

If all goes well, these are the actions that we will take:

  1. Users should be informed through different means, including but not limited to mails to python-dev/python-committers, posts on Discourse, blog posts and other social media, and a banner on bpo.
  2. [Fri 25, evening] When the migration starts, the PR that makes bpo read-only will be merged and tested. The PR should also include a banner for bpo to explain to users that the migration is in progress.
  3. [Fri 25, evening] After the PR has been merged and deployed, and after verifying that bpo is read-only, the export tool will be used to produce a zip file.
  4. [Fri 25, evening] The zip file will then be fed into the ECI. Given the number of issues, the ECI might time out and must be monitored to ensure that the import completes successfully. This will result in a new and separate repo that will include all the bpo issues.
    • Import the archive into the ECI (@ezio-melotti)
    • Start a backup import ~4h in (@ezio-melotti)
      • GitHub says it will only increase the load and make the first import slower
    • Save the migration ID/GUID of the import (@ezio-melotti)
    • Get the name of the on-call GitHub engineer (@ezio-melotti)
    • Monitor the import overnight until it's complete (@ezio-melotti, GitHub team)
      • If the import gives an error, use the "Retry" button to resume
      • If it gets stuck without errors, ping GitHub
  5. [Sat 26, morning] At this point, we can enable the issues tab, with the issue template config already in place.
  6. [Sat 26, morning] After everything is ready, we will inform GitHub. They will then start the issue transfer. This will need to be monitored in case of errors.
  7. [Sun 27, morning] Once the transfer is complete, we might need to run some post-migration actions (e.g. to update issue references). We will also manually run some of the other installed actions to make sure they work properly. Note that some actions might need to be tested after the next step. (@ambv, @ezio-melotti)
  8. [Sun 27, morning] Once all the issues have been transferred and tested, the issue template config will be removed from the cpython repo, allowing users to create new issues.
  9. [Sun 27, afternoon] Pre-written messages will be sent out on MLs and social media to inform the users. The script required for Notify bpo users once the migration is done #12 could be run now or later. Additional actions (e.g. weekly summary) could also be installed later.
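
The bookkeeping in step 4 (save the migration ID, retry on errors, escalate if the import gets stuck) could be sketched as follows. This is only an illustration: `get_state` and `retry` are hypothetical callables standing in for whatever the operator uses to check the ECI and press "Retry", the state names `"running"`, `"error"` and `"done"` are made up, and nothing here is an actual ECI API.

```python
import json
import time
from pathlib import Path


def save_migration_id(path, migration_id):
    """Persist the migration ID so it survives a crash of the local machine."""
    Path(path).write_text(json.dumps({"migration_id": migration_id}))


def load_migration_id(path):
    """Read back a previously saved migration ID."""
    return json.loads(Path(path).read_text())["migration_id"]


def monitor_import(get_state, retry, poll_seconds=60, max_polls=1000,
                   sleep=time.sleep):
    """Poll the import until it is done.

    get_state() is a hypothetical callable returning "running", "error"
    or "done"; retry() is a hypothetical callable that resumes the import.
    Returns True on completion, False if max_polls is exhausted (the
    "stuck without errors" case, where GitHub should be pinged).
    """
    for _ in range(max_polls):
        state = get_state()
        if state == "done":
            return True
        if state == "error":
            retry()  # equivalent of pressing the "Retry" button
        sleep(poll_seconds)
    return False
```

Passing `sleep` as a parameter keeps the loop testable without waiting; in a real run the default `time.sleep` would be used.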

There are also a number of related changes that should be done:

After the migration, and once we have the bpo->GH mapping, we could:

  • replace bpo-* refs with actual GH-* refs (this enables the mouse-over popup)
  • replace the dependencies list with a checklist of GH issues (this enables task tracking)
  • replace the superseder with Duplicate of GH-* (this enables duplicates tracking)

These changes affect the "Last update" datetime, so we could do them lazily through a GitHub action whenever someone edits an existing issue.
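
A minimal sketch of the three rewrites listed above, assuming the bpo->GH mapping is available as a plain dictionary. The mapping values and function names are hypothetical and only serve as illustration; the real mapping only exists once the transfer is complete.

```python
import re

# Hypothetical bpo -> GitHub issue number mapping (made-up numbers).
BPO_TO_GH = {1234: 45678, 4321: 87654}

BPO_REF = re.compile(r"\bbpo-(\d+)\b")


def replace_bpo_refs(text, mapping):
    """Replace bpo-N with GH-M when N is in the mapping; leave other refs as-is."""
    def _sub(match):
        gh_id = mapping.get(int(match.group(1)))
        return f"GH-{gh_id}" if gh_id is not None else match.group(0)
    return BPO_REF.sub(_sub, text)


def dependencies_checklist(bpo_ids, mapping):
    """Render a dependencies list as a Markdown task list of GH issues."""
    return "\n".join(f"- [ ] GH-{mapping[i]}" for i in bpo_ids if i in mapping)


def superseder_line(bpo_id, mapping):
    """Render the superseder field as a 'Duplicate of' line."""
    return f"Duplicate of GH-{mapping[bpo_id]}"


print(replace_bpo_refs("See bpo-1234 and bpo-9999.", BPO_TO_GH))
print(dependencies_checklist([1234, 4321], BPO_TO_GH))
print(superseder_line(4321, BPO_TO_GH))
```

References that are missing from the mapping are left untouched, so the rewrite can be applied lazily and repeatedly without corrupting text that was already converted.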

Risk management plan

This section discusses the failures we might encounter during each step of the migration and suggests ways to prevent them and deal with them. None of these things are expected to happen, but we should have a plan B just in case.

  1. Once we inform the users:

    • They might protest, but at this point the migration is going to happen, so the best we can do is address their feedback to the best of our ability.
  2. When we make bpo read-only:

    • If we fail to make bpo read-only, the migration will be delayed until we have verified that it is not possible to create/edit issues. This should also be tested on a local copy of the tracker beforehand.
    • If we make bpo read-only, but people (or bots) somehow manage to create a few issues and/or messages some other way, we could just inform them and ask them to recreate them on GitHub once the migration is done (if it's just bot messages we could even ignore them).
  3. Exporting issues from bpo:

    • This is easy to test but if somehow a new/recent issue/message breaks the exporter, I could try to identify and fix the problem on the fly, causing a small delay. If the issue is too complex to fix quickly, we might reopen bpo and reschedule the migration.
    • We depend heavily on the devguide documentation to ease the transition from bpo to GitHub Issues for users unfamiliar with GitHub Issues.
  4. Import issues in the ECI:

    • This is also easy to test, but time-consuming. We could also import the archive twice at the same time, so that if an import fails the other might succeed. If they both succeed we will also have a backup repo in case something goes wrong during the transfer.
    • If the import times out (as often happens with big archives), a "Retry" button appears that will generally make the import resume. The timeouts also report a code and the migration ID, and these can be used by GitHub to investigate the issue.
    • If the import fails because of a problem with the archive, either the problem should be fixed by opening and editing the archive manually, or by fixing the exporting tool and exporting a new archive. A full test import before the migration should help mitigate this risk.
    • If the import fails because of a problem with the ECI and can't be resumed, we will have to restart the import.
    • If the PC performing the import crashes or in case of blackout, it won't be possible to hit "Retry" from the ECI, but we could use the migration IDs to resume and complete the migration. The migration IDs should be saved beforehand. If this happens soon after the migration starts, it might be better to restart it from the ECI.
  5. Possibly partially lock the cpython repo:

    • Once we decided if/how to do this we should be able to test it on a separate repo, so it shouldn't fail as long as we document the steps and follow them
    • If locking doesn't work and people are somehow able to create issues, this will interfere with the numbering but I guess we will have to live with it (the numbering is changed anyway). As long as we advertise somehow that the migration is happening and users shouldn't create/edit issues, I think it's ok if those issues get lost.
  6. Transfer issues to the cpython repo:

    • This is handled entirely by the GitHub team, so we have little control over this. It seems they have a certain degree of control, and they can transfer in batches and/or resume/retry the transfer. Doing a full test transfer will ensure that there are no issues with problematic fields.
    • If an issue can't be transferred, it might be possible to edit the source issue and try again. If the import stops at the first failure, we might be able to preserve the ID ordering, if not, it could also be transferred again at the end or even after the migration.
    • Transferring deletes issues from the source repo, so -- unless there is a way to preserve them -- if something goes wrong and the transfer needs to be performed again, the archive will need to be imported again. This could be done preemptively so that after exporting the bpo issues we import the archive twice in two separate repos.
  7. Possibly setup and run post-migration actions

    • This depends on the actual actions being executed.
    • Once the migration is completed successfully, every other non-critical action could be done afterward, and should only cause minor inconveniences.
  8. Unlock the cpython repo and test everything

    • If something went wrong, we could disable the issues tab and unlock the repo while we investigate. We might be able to fix the issue directly, or possibly we will have to lock it again for a short time to re-import a few issues. Worst case scenario we will have to wipe away all issues and redo the transfer from scratch. Having a script able to inspect/edit/remove one or more issues through the API (since if the issues tab is disabled we won't be able to do it from there) might be helpful.
  9. Inform the users that the migration happened

    • After the migration is complete, we should be able to address any concern that didn't arise before it. Informing the users clearly, widely, and in advance will help ensure that people know about the migration, about what is getting transferred, about the duration of the downtime, and other things. This should help minimize surprises and hostile reactions.

hugovk commented Apr 15, 2022

This is now done.


hugovk commented Dec 23, 2022

  • Install actions in .github/actions/ on python/cpython

Is there anything to be done for this or can we check it off?
