Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(robot-server): Avoid features that will be removed in SQLAlchemy 2.0 #16926

Open
wants to merge 4 commits into
base: EXEC-655-store-commands-error-list-in-db
Choose a base branch
from

Conversation

SyntaxColoring
Copy link
Contributor

@SyntaxColoring SyntaxColoring commented Nov 21, 2024

Overview

Because of tests added in #16772, we're now seeing some SQLAlchemy deprecation warnings for the first time. They were always there, but now they're being surfaced.

But it turns out that there are very few of them and they're all easy to fix. So let's nip them in the bud now and make our lives easier whenever we get around to updating SQLAlchemy to 2.0.

Test Plan and Hands on Testing

Everything I'm changing should be covered well by automated tests.

Changelog

  • Avoid executing raw strs as statements. Use higher-level constructs instead, or, if that's not possible, sqlalchemy.text().
  • Avoid executing statements on a sqlalchemy.engine.Engine. Execute them on an open connection or transaction object instead.
  • Turn SQLAlchemy "removed in 2.0" warnings into errors, to stop us from adding any more.
  • Deduplicate the add_column() utility function, which was copied across several migrations.

Review requests

None in particular.

This is based on #16697 and will not merge until after that does.

Risk assessment

Low.

@SyntaxColoring SyntaxColoring requested review from a team November 21, 2024 00:36
@SyntaxColoring SyntaxColoring requested review from a team as code owners November 21, 2024 00:36
Copy link
Member

@sfoster1 sfoster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thsi will definitely make our lives easier

2. Use SQLAlchemy's `metadata.create_all()` to create an empty table with the new
schema, including the new column.
3. Copy rows from the old table to the new one, populating the new column
however you please.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is what Alembic is for.

But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.

Copy link
Contributor Author

@SyntaxColoring SyntaxColoring Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, this is what Alembic is for.

That is what we thought, but I looked into it when working on this, and I was underwhelmed. Alembic does implement this dance, and that is appealing. But:

First, like you said, there's a full Alembic ecosystem that we'd have to contend with. In particular, we would probably want to run Alembic "inside" our existing migration system. (It can't completely replace our own migration system, because we need to account for migrating regular files, and Alembic can only help with the schema inside the .db file.) Embedding it like that...seems messy? Like, one hypothetical way for it to work would be if Alembic gave us a standalone util function like alembic.add_all_column_constraints(command_table.c.command_status), and we'd call that from within our existing system, but it doesn't seem like Alembic works like that.

Second, as far as I can tell, Alembic leaves us on our own for the data part of these migrations, e.g. populating the new non-nullable column. They recommend some general patterns in sqlalchemy/alembic#972 (reply in thread) and https://alembic.sqlalchemy.org/en/latest/cookbook.html#data-migrations-general-techniques. Those patterns strike me as their own dances that are only marginally better. Especially if we need to rearchitect to fit into the Alembic ecosystem just for the privilege of using them.

But I could definitely have big misconceptions about all of this. I've never actually used Alembic for real. If you want to make a sketch or proof of concept to show what it would look like, I'd definitely love something better than what we have now.

But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.

Yeah. I'm not sure if this what you're getting at, but there is https://alembic.sqlalchemy.org/en/latest/autogenerate.html, and we might be able to combine that with https://alembic.sqlalchemy.org/en/latest/offline.html. So one option is to run Alembic once on our laptops to autogenerate the ALTER TABLE dance, and then manually integrate that with our data migrations. Is that what you have in mind?

with engine.begin() as transaction:
transaction.execute(
sqlalchemy.text(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use schemas here? Most of the SqlAlchemy code I've seen before passes around a tuple of (schema, table_name) for functions like this. Or is the schema included in the table_name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this is what you mean by schema in this context, but we represent the SQL schema in terms of Python objects with a sqlalchemy.Metadata. That lives in, e.g. robot_server.persistence.tables.schema_8.

What would passing (schema, table_name) do?

Copy link
Contributor

@ddcc4 ddcc4 Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, the word "schema" is way too overloaded in the programming world.

For organizing big projects, databases like PostgreSQL let you create folders/directories/namespaces/whatever-you-want-to-call-them to divide up your data. PostgreSQL calls these folders "schemas" (which have nothing to do with the colloquial use of "schema" to refer to the shape of a table). And tables, enum definitions, server-side functions, etc., all live inside a schema.

So to refer to a table in PostgreSQL, you would need its fully qualified name, like:
SELECT something FROM myschema.mytable WHERE ...

The practical upshot is that in the code I worked on, whenever you pass a table name around, you would also need to pass the schema name around. https://docs.sqlalchemy.org/en/20/core/metadata.html#specifying-the-schema-name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaaaah, gotcha, thank you.

No, we don't use that kind of schema. Our .db file has only one, implicit, "main" schema. In SQLite, the myschema.mytable syntax is used for when you're opening multiple .db files in the same connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In SQLite, the myschema.mytable syntax is used for when you're opening multiple .db files in the same connection.

Oh neat! Then as a sidenote, I think that would let us solve the TODO in your copy_rows_unmodified() (where you wanted to avoid pulling the whole DB into Python/SqlAlchemy and then writing it back out). You could open both the source and destination tables in the same connection, then do INSERT INTO new_table SELECT * FROM old_table, and have the copy be done entirely inside the SQL engine.

sqlalchemy.text(
f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, if you want to try something cute, folks online seem to recommend constructing the statement like this:

compiler = engine.dialect.ddl_compiler(engine.dialect, None)
column_specification = compiler.get_column_specification(column)
...execute(f"ALTER TABLE {table_name} ADD COLUMN {column_specification}")

to have SqlAlchemy's compiler generate the column definition for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants