refactor(robot-server): Avoid features that will be removed in SQLAlchemy 2.0 #16926
base: EXEC-655-store-commands-error-list-in-db
Conversation
- Execute on a transaction or connection, not the raw engine.
- Use `sqlalchemy.text()`, not a raw string.
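For readers unfamiliar with the two patterns named above, here is a minimal sketch of what they look like together. The table name `example` and the in-memory SQLite URL are assumptions for illustration, not taken from robot-server.

```python
import sqlalchemy

# In-memory SQLite database, used here only for illustration.
engine = sqlalchemy.create_engine("sqlite://")

# Legacy style that SQLAlchemy 2.0 removes: engine.execute("CREATE TABLE ...")

# engine.begin() opens a connection and a transaction. The transaction is
# committed when the block exits normally and rolled back if it raises.
with engine.begin() as transaction:
    transaction.execute(
        sqlalchemy.text("CREATE TABLE example (id INTEGER PRIMARY KEY, name TEXT)")
    )
    # Bound parameters use the :name syntax with sqlalchemy.text().
    transaction.execute(
        sqlalchemy.text("INSERT INTO example (name) VALUES (:name)"),
        {"name": "hello"},
    )

with engine.connect() as connection:
    result = connection.execute(sqlalchemy.text("SELECT name FROM example"))
    names = [row[0] for row in result]
```

This style should behave the same on SQLAlchemy 1.4 and 2.0, which is the point of adopting it early.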
This will definitely make our lives easier.
2. Use SQLAlchemy's `metadata.create_all()` to create an empty table with the new
schema, including the new column.
3. Copy rows from the old table to the new one, populating the new column
however you please.
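Steps 2 and 3 above could look roughly like the following sketch. Everything specific here is an assumption: the table names `command_old` and `command`, the column names, the placeholder value `"succeeded"`, and the idea that the old table sits in the same database under a different name. The real migration code may be organized quite differently.

```python
import sqlalchemy

# Hypothetical old schema, without the new column.
old_metadata = sqlalchemy.MetaData()
old_table = sqlalchemy.Table(
    "command_old",
    old_metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("command", sqlalchemy.String, nullable=False),
)

# Hypothetical new schema, with a new non-nullable column.
new_metadata = sqlalchemy.MetaData()
new_table = sqlalchemy.Table(
    "command",
    new_metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("command", sqlalchemy.String, nullable=False),
    sqlalchemy.Column("command_status", sqlalchemy.String, nullable=False),
)

engine = sqlalchemy.create_engine("sqlite://")

# Set up some pre-migration data so the sketch is self-contained.
with engine.begin() as transaction:
    old_metadata.create_all(transaction)
    transaction.execute(
        sqlalchemy.insert(old_table),
        [{"command": "home"}, {"command": "aspirate"}],
    )

with engine.begin() as transaction:
    # Step 2: create an empty table with the new schema.
    new_metadata.create_all(transaction)

    # Step 3: copy rows across, populating the new column for each row.
    old_rows = transaction.execute(sqlalchemy.select(old_table)).all()
    new_rows = [
        {"id": row.id, "command": row.command, "command_status": "succeeded"}
        for row in old_rows
    ]
    transaction.execute(sqlalchemy.insert(new_table), new_rows)

with engine.connect() as connection:
    migrated = [
        tuple(row)
        for row in connection.execute(
            sqlalchemy.select(new_table).order_by(new_table.c.id)
        )
    ]
```

Because the new table is created from the Python schema definition, it picks up every constraint on the new column without hand-written DDL.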
lol, this is what Alembic is for.
But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.
> lol, this is what Alembic is for.
That is what we thought, but I looked into it when working on this, and I was underwhelmed. Alembic does implement this dance, and that is appealing. But:
First, like you said, there's a full Alembic ecosystem that we'd have to contend with. In particular, we would probably want to run Alembic "inside" our existing migration system. (It can't completely replace our own migration system, because we need to account for migrating regular files, and Alembic can only help with the schema inside the .db file.) Embedding it like that...seems messy? Like, one hypothetical way for it to work would be if Alembic gave us a standalone util function like `alembic.add_all_column_constraints(command_table.c.command_status)`, and we'd call that from within our existing system, but it doesn't seem like Alembic works like that.
Second, as far as I can tell, Alembic leaves us on our own for the data part of these migrations, e.g. populating the new non-nullable column. They recommend some general patterns in sqlalchemy/alembic#972 (reply in thread) and https://alembic.sqlalchemy.org/en/latest/cookbook.html#data-migrations-general-techniques. Those patterns strike me as their own dances that are only marginally better. Especially if we need to rearchitect to fit into the Alembic ecosystem just for the privilege of using them.
But I could definitely have big misconceptions about all of this. I've never actually used Alembic for real. If you want to make a sketch or proof of concept to show what it would look like, I'd definitely love something better than what we have now.
> But I wonder if there's a way to use Alembic as a fancy SQL generator for diffs without buying into the full Alembic ecosystem.

Yeah. I'm not sure if this is what you're getting at, but there is https://alembic.sqlalchemy.org/en/latest/autogenerate.html, and we might be able to combine that with https://alembic.sqlalchemy.org/en/latest/offline.html. So one option is to run Alembic once on our laptops to autogenerate the `ALTER TABLE` dance, and then manually integrate that with our data migrations. Is that what you have in mind?
with engine.begin() as transaction:
    transaction.execute(
        sqlalchemy.text(
            f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
Do we use schemas here? Most of the SQLAlchemy code I've seen before passes around a tuple of `(schema, table_name)` for functions like this. Or is the schema included in the `table_name`?
I'm not 100% sure this is what you mean by "schema" in this context, but we represent the SQL schema in terms of Python objects with a `sqlalchemy.MetaData`. That lives in, e.g., `robot_server.persistence.tables.schema_8`.
What would passing `(schema, table_name)` do?
Oh sorry, the word "schema" is way too overloaded in the programming world.
For organizing big projects, databases like PostgreSQL let you create folders/directories/namespaces/whatever-you-want-to-call-them to divide up your data. PostgreSQL calls these folders "schemas" (which have nothing to do with the colloquial use of "schema" to refer to the shape of a table). And tables, enum definitions, server-side functions, etc., all live inside a schema.
So to refer to a table in PostgreSQL, you would need its fully qualified name, like:
SELECT something FROM myschema.mytable WHERE ...
The practical upshot is that in the code I worked on, whenever you pass a table name around, you would also need to pass the schema name around. https://docs.sqlalchemy.org/en/20/core/metadata.html#specifying-the-schema-name
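As a rough illustration of the namespace kind of schema, here is a minimal sketch of SQLAlchemy's `schema` argument to `Table`. The names `myschema`, `mytable`, and `something` come from the query above; the column type is an assumption.

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()

# Passing schema= makes SQLAlchemy emit the fully qualified table name.
mytable = sqlalchemy.Table(
    "mytable",
    metadata,
    sqlalchemy.Column("something", sqlalchemy.Integer),  # assumed type
    schema="myschema",
)

# Compile the statement to a string with the default dialect, without
# connecting to any database.
sql = str(sqlalchemy.select(mytable.c.something))
print(sql)
```

The printed statement refers to the table as `myschema.mytable`, which is why code written against PostgreSQL tends to carry the schema name alongside the table name.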
Aaaaah, gotcha, thank you.
No, we don't use that kind of schema. Our .db file has only one, implicit, "main" schema. In SQLite, the `myschema.mytable` syntax is used for when you're opening multiple .db files in the same connection.
> In SQLite, the `myschema.mytable` syntax is used for when you're opening multiple .db files in the same connection.
Oh neat! Then as a sidenote, I think that would let us solve the TODO in your `copy_rows_unmodified()` (where you wanted to avoid pulling the whole DB into Python/SQLAlchemy and then writing it back out). You could open both the source and destination tables in the same connection, then do `INSERT INTO new_table SELECT * FROM old_table`, and have the copy be done entirely inside the SQL engine.
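A minimal sketch of that idea, using SQLite's `ATTACH DATABASE` through the standard-library `sqlite3` module to keep it small. The file names, the `old_db` alias, and the table layout are assumptions; how this would fit into `copy_rows_unmodified()` and SQLAlchemy's connection handling is not shown.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as directory:
    old_path = os.path.join(directory, "old.db")
    new_path = os.path.join(directory, "new.db")

    # Create a source database file with a couple of rows.
    old = sqlite3.connect(old_path)
    old.execute("CREATE TABLE old_table (id INTEGER PRIMARY KEY, value TEXT)")
    old.executemany("INSERT INTO old_table (value) VALUES (?)", [("a",), ("b",)])
    old.commit()
    old.close()

    # Open the destination file, then attach the source file to the same
    # connection under the schema name "old_db".
    connection = sqlite3.connect(new_path)
    connection.execute("CREATE TABLE new_table (id INTEGER PRIMARY KEY, value TEXT)")
    connection.execute("ATTACH DATABASE ? AS old_db", (old_path,))

    # The copy runs entirely inside SQLite; no rows pass through Python.
    connection.execute("INSERT INTO main.new_table SELECT * FROM old_db.old_table")
    connection.commit()

    copied = connection.execute(
        "SELECT id, value FROM new_table ORDER BY id"
    ).fetchall()

    connection.execute("DETACH DATABASE old_db")
    connection.close()
```

`SELECT *` relies on both tables having the same columns in the same order, so this only suits tables that are copied unmodified.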
        sqlalchemy.text(
            f"ALTER TABLE {table_name} ADD COLUMN {column.key} {column_type}"
        )
    )
BTW, if you want to try something cute, folks online seem to recommend constructing the statement like this:
compiler = engine.dialect.ddl_compiler(engine.dialect, None)
column_specification = compiler.get_column_specification(column)
...execute(f"ALTER TABLE {table_name} ADD COLUMN {column_specification}")
to have SqlAlchemy's compiler generate the column definition for you.
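Put together into something runnable, that suggestion might look like the sketch below. The `add_column()` signature, the `example` table, and the `status` column are hypothetical, and the snippet relies on the DDL compiler accepting `None` as its statement, as in the recipe above.

```python
import sqlalchemy

metadata = sqlalchemy.MetaData()

# Hypothetical table definition; only the new "status" column matters here.
example_table = sqlalchemy.Table(
    "example",
    metadata,
    sqlalchemy.Column("id", sqlalchemy.Integer, primary_key=True),
    sqlalchemy.Column("status", sqlalchemy.String, nullable=True),
)


def add_column(engine, table_name, column):
    """Add `column` to an existing table (hypothetical helper signature)."""
    # Ask the dialect's DDL compiler to render the column's name, type,
    # and constraints, the same way it would inside CREATE TABLE.
    compiler = engine.dialect.ddl_compiler(engine.dialect, None)
    column_specification = compiler.get_column_specification(column)
    with engine.begin() as transaction:
        transaction.execute(
            sqlalchemy.text(
                f"ALTER TABLE {table_name} ADD COLUMN {column_specification}"
            )
        )


engine = sqlalchemy.create_engine("sqlite://")

# Simulate a database that predates the new column.
with engine.begin() as transaction:
    transaction.execute(
        sqlalchemy.text("CREATE TABLE example (id INTEGER PRIMARY KEY)")
    )

add_column(engine, "example", example_table.c.status)

column_names = [
    column["name"] for column in sqlalchemy.inspect(engine).get_columns("example")
]
```

SQLite restricts what `ALTER TABLE ... ADD COLUMN` accepts, for example a `NOT NULL` column needs a non-null default, so this approach only covers simple columns.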
Overview
Because of tests added in #16772, we're now seeing some SQLAlchemy deprecation warnings for the first time. They were always there, but now they're being surfaced.
But it turns out that there are very few of them and they're all easy to fix. So let's nip them in the bud now and make our lives easier whenever we get around to updating SQLAlchemy to 2.0.
Test Plan and Hands on Testing
Everything I'm changing should be covered well by automated tests.
Changelog
- Don't use raw `str`s as statements. Use higher-level constructs instead, or, if that's not possible, `sqlalchemy.text()`.
- Don't execute statements directly on a `sqlalchemy.engine.Engine`. Execute them on an open connection or transaction object instead.
- Factor out the `add_column()` utility function, which was copied across several migrations.
Review requests
None in particular.
This is based on #16697 and will not merge until after that does.
Risk assessment
Low.