Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce statement snapshot type for Rdb_transaction #1497

Open
wants to merge 1 commit into
base: fb-mysql-8.0.32
Choose a base branch
from

Conversation

laurynas-biveinis
Copy link
Contributor

Previously, the snapshot state was tracked by a combination of
flags (m_is_delayed_snapshot, m_tx_read_only), and by certain snapshot pointers
being null or non-null. (i.e. m_read_opts[USER_TABLE].snapshot,
m_explicit_snapshot). The transaction state specified by different combinations
of the above was not immediately obvious and the purpose of the read only flag
was not the RO-ness as such but rather using the DB instead of Transaction API
to acquire snapshots.

Consolidate the states into an enum class snapshot_type { ... }
statement_snapshot_type field, consuming the two flags above. Introduce and
assert invariant across the state values and snapshot pointers being null or
non-null. Add to the invariant of AC-NL-RO-RC transactions. Rename
snapshot_created to assign_snapshot to better reflect what the method does (as
it does not, in fact, take a newly-created snapshot i.e. in the case of existing
explicit snapshot). Make share_explicit_snapshot and create_explicit_snapshot
set the m_read_opts snapshot and remove the redundant code from the callers. For
the latter, make it create the snapshot instead of taking the m_read_opts one.

Two of the possible states are EXPLICIT and READ_ONLY_TRX, and the existing code
implicitly had another state which was the combination of the two. Avoid
introducing such new state, and use EXPLICIT in it too. This results in a minor
user-visible change that START TRANSACTION WITH SHARED|EXISTING SNAPSHOT
transactions are grouped with CREATE|ATTACH EXPLICIT SNAPSHOT ones instead of
START TRANSACTION WITH CONSISTENT SNAPSHOT for the purposes of
INFORMATION_SCHEMA read-only flag and diagnostics on write attempts. Update the
diagnostic messages to reflect this.

@facebook-github-bot
Copy link

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@luqun
Copy link
Contributor

luqun commented Oct 31, 2024

could you take a look main.subselect_debug 'rocksdb_intrinsic_table' MTR? The following assert failed

storage/rocksdb/ha_rocksdb.cc:4976: std::unique_ptrrocksdb::Iterator myrocks::Rdb_transaction::get_iterator(rocksdb::ColumnFamilyHandle &, bool, const rocksdb::Slice &, const rocksdb::Slice &, myrocks::TABLE_TYPE, bool, bool): Assertion `statement_snapshot_type != snapshot_type::NONE || create_snapshot' failed.

@facebook-github-bot
Copy link

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

@laurynas-biveinis
Copy link
Contributor Author

@luqun , fixed, the assert did not accept temp tables. Also rebased, ready for review

@facebook-github-bot
Copy link

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

storage/rocksdb/ha_rocksdb.cc Outdated Show resolved Hide resolved
@facebook-github-bot
Copy link

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

// This is used by transactions started with "START TRANSACTION WITH
// CONSISTENT [ROCKSDB] SNAPSHOT". The snapshot has to be created via
// DB::GetSnapshot(), not via Transaction API.
READ_ONLY_TRX,
Copy link
Contributor

@luqun luqun Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From rocksdb_explicit_snapshot/create_explicit_snapshot, Looks like EXPLICIT snapshot also create via DB::GetSnapshot(). maybe add a comment for EXPLICIT

API.
*/
bool is_tx_read_only() const { return m_tx_read_only; }
[[nodiscard]] bool is_tx_read_only() const noexcept {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about add _snapshot into method name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the whole transaction that is read only

@@ -15163,7 +15362,8 @@ int ha_rocksdb::external_lock(THD *const thd, int lock_type) {
DBUG_RETURN(HA_ERR_UNSUPPORTED);
}

if (thd->get_explicit_snapshot()) {
if (unlikely(tx->has_explicit_snapshot() ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need check both(tx and thd)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state lifetimes don't fully match but only overlap. For instance,

CREATE EXPLICIT ROCKSDB SNAPSHOT;
INSERT INTO T1 VALUES(); # tx state not explicit, thd flag = true

And the opposite case:

START TRANSACTION WITH SHARED ROCKSDB SNAPSHOT;
INSERT INTO T1 VALUES(); # tx state explicit, thd flag = false

The tx state is set during transaction execution. The thd snapshot is set on
CREATE/ATTACH ROCKSDB SNAPSHOT, when there is no tx object.

I'm adding a clarifying comment.

@facebook-github-bot
Copy link

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

@laurynas-biveinis
Copy link
Contributor Author

@luqun , ready for review

@facebook-github-bot
Copy link

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@hermanlee hermanlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me.

Comment on lines 3716 to 3727
case snapshot_type::NONE:
assert(m_explicit_snapshot == nullptr);
assert(m_read_opts[USER_TABLE].snapshot == nullptr);
break;
case snapshot_type::CURRENT:
assert(m_explicit_snapshot == nullptr);
assert(m_read_opts[USER_TABLE].snapshot != nullptr);
break;
case snapshot_type::CURRENT_DELAYED:
assert(m_explicit_snapshot == nullptr);
assert(m_read_opts[USER_TABLE].snapshot == nullptr);
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are all the same, it might be simpler to combine them all. Logically, it means that even though these are slightly different snapshot cases, they follow the same invariantsonce the snapshot is allocated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do (but only for the 1st and the 3rd: the 2nd has the 2nd condition check inverted)

Comment on lines 4128 to 4143
switch (statement_snapshot_type) {
case snapshot_type::CURRENT:
assert(m_read_opts[USER_TABLE].snapshot == nullptr);
break;
case snapshot_type::CURRENT_DELAYED:
assert(m_read_opts[USER_TABLE].snapshot == nullptr);
statement_snapshot_type = snapshot_type::CURRENT;
break;
case snapshot_type::READ_ONLY_TRX:
assert(m_read_opts[USER_TABLE].snapshot == nullptr);
break;
case snapshot_type::EXPLICIT:
assert(snapshot ==
m_explicit_snapshot->get_snapshot()->snapshot());
break;
case snapshot_type::NONE:
assert(false);
__builtin_unreachable();
break;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe have this whole block be under NDEBUG, and pull out the case for changing the snapshot type from CURRENT_DELAYED to CURRENT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, also merging the case branches like for the previous comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And removing __builtin_unreachable() because it no longer has effect in debug-only code

storage/rocksdb/ha_rocksdb.cc Show resolved Hide resolved
Previously, the snapshot state was tracked by a combination of
flags (m_is_delayed_snapshot, m_tx_read_only), and by certain snapshot pointers
being null or non-null. (i.e. m_read_opts[USER_TABLE].snapshot,
m_explicit_snapshot). The transaction state specified by different combinations
of the above was not immediately obvious and the purpose of the read only flag
was not the RO-ness as such but rather using the DB instead of Transaction API
to acquire snapshots.

Consolidate the states into an enum class snapshot_type { ... }
statement_snapshot_type field, consuming the two flags above. Introduce and
assert invariant across the state values and snapshot pointers being null or
non-null. Add to the invariant of AC-NL-RO-RC transactions. Rename
snapshot_created to assign_snapshot to better reflect what the method does (as
it does not, in fact, take a newly-created snapshot i.e. in the case of existing
explicit snapshot). Make share_explicit_snapshot and create_explicit_snapshot
set the m_read_opts snapshot and remove the redundant code from the callers. For
the latter, make it create the snapshot instead of taking the m_read_opts one.

Two of the possible states are EXPLICIT and READ_ONLY_TRX, and the existing code
implicitly had another state which was the combination of the two. Avoid
introducing such new state, and use EXPLICIT in it too. This results in a minor
user-visible change that START TRANSACTION WITH SHARED|EXISTING SNAPSHOT
transactions are grouped with CREATE|ATTACH EXPLICIT SNAPSHOT ones instead of
START TRANSACTION WITH CONSISTENT SNAPSHOT for the purposes of
INFORMATION_SCHEMA read-only flag and diagnostics on write attempts. Update the
diagnostic messages to reflect this.
@facebook-github-bot
Copy link

@laurynas-biveinis has updated the pull request. You must reimport the pull request before landing.

@laurynas-biveinis
Copy link
Contributor Author

Ready for review again

@facebook-github-bot
Copy link

@luqun has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@luqun
Copy link
Contributor

luqun commented Nov 14, 2024

rocksdb_rpl.rpl_skip_trx_api_binlog_format MTR failed with
virtual void myrocks::Rdb_writebatch_impl::acquire_snapshot(bool, TABLE_TYPE): Assertion `statement_snapshot_type == snapshot_type::CURRENT' failed.

@laurynas-biveinis
Copy link
Contributor Author

rocksdb_rpl.rpl_skip_trx_api_binlog_format MTR failed with virtual void myrocks::Rdb_writebatch_impl::acquire_snapshot(bool, TABLE_TYPE): Assertion `statement_snapshot_type == snapshot_type::CURRENT' failed.

@luqun, I am unable to reproduce. Do you have more details?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants