
Problem of online scaling for source backfill #18300

Closed · Tracked by #16003 · xxchan opened this issue Aug 28, 2024 · 5 comments

xxchan (Member) commented Aug 28, 2024

Problem

Currently, SourceBackfill has a hack: it needs to scan() the whole state table, including splits written by other actors.

```rust
/// All splits finished backfilling.
///
/// We check all splits for the source here, including other actors' splits, before going
/// to the forward stage. Otherwise, if we break early but an unfinished split is migrated
/// to this actor after rescheduling, we would still need to backfill it.
async fn backfill_finished(&self, states: &BackfillStates) -> StreamExecutorResult<bool> {
    Ok(states
        .values()
        .all(|state| matches!(state, BackfillState::Finished))
        && self
            .backfill_state_store
            .scan()
            .await?
            .into_iter()
            .all(|state| matches!(state, BackfillState::Finished)))
}
```

The reason is to handle split migration (i.e., online scaling). SourceBackfill has two stages (backfill -> forward upstream), and once it has entered stage 2, it cannot go back to stage 1. So if unfinished backfill work is migrated to an actor already in stage 2, that actor cannot do the backfilling.

However, the hack doesn't work correctly now, as shown in #18033 (comment). The reason is that at the beginning, an actor can only read the state written by itself; it needs to wait until all actors' written data is readable, i.e., until the first checkpoint is available.

Note 1: checkpointing is asynchronous, so we cannot rely on something like "wait for N barriers to pass".
Note 2: an actor doesn't know the total number of splits, so we cannot rely on a condition like states.len() == num_splits. This is unlike the WatermarkFilter executor, which has a similar hack but can rely on "all vnodes are written"; source splits are not distributed by vnodes. (See the sketch below.)
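
To make Note 2 concrete, here is a minimal sketch of the two completeness conditions. All types and names here are hypothetical stand-ins, not the real executor code:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for illustration only.
type SplitId = String;
#[allow(dead_code)]
enum BackfillState {
    Backfilling,
    Finished,
}

// WatermarkFilter-style check: the executor owns a known, fixed set of vnodes,
// so "every owned vnode has a written state row" is a well-defined condition.
fn all_vnodes_written(owned_vnodes: &[u16], written: &HashSet<u16>) -> bool {
    owned_vnodes.iter().all(|v| written.contains(v))
}

// The analogous check for source splits is NOT well-defined: an actor only sees
// the splits currently assigned to it and never learns the global split count,
// so there is no trustworthy `num_splits` to compare `states.len()` against.
fn all_splits_finished(states: &HashMap<SplitId, BackfillState>, num_splits: usize) -> bool {
    states.len() == num_splits // unknowable at the actor level!
        && states
            .values()
            .all(|state| matches!(state, BackfillState::Finished))
}
```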

Solution

There are several solutions to fix this bug:

  1. Patch the hack: wait until the state is "inited", i.e., the first checkpoint is committed and readable. We can use try_wait_epoch for this; see the sketch after this list.

  2. Rewrite the code to allow transitions between stage 1 and stage 2: I don't think there are any technical restrictions preventing us from doing this. The reason we didn't do it in the first place might just be an oversight. (A single-direction transition does look slightly more natural, though.)

  3. Disallow online scaling: then the problem disappears! This is inspired by @BugenZhao: MV backfilling is actually in a similar situation. It also goes from stage 1 to stage 2 in one direction, and we currently disallow online scaling for it, so we can do the same for source backfilling.

    More precisely,

    • For foreground (blocking) DDL, scaling will be rejected. If the cluster goes down, the backfill will fail.
    • For background DDL, scaling is done by rebuilding the actors. (So the new actor can start from stage 1, and the code doesn't need to handle a stage 2 -> 1 transition.)

    More on this: https://risingwave-labs.slack.com/archives/C05AZUR6Q5P/p1724227403344349
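
To illustrate the idea behind solution 1, here is a toy, self-contained simulation, where a tokio watch channel stands in for try_wait_epoch on the state store; all names are made up for this sketch:

```rust
use std::time::Duration;
use tokio::sync::watch;

// Toy model: the watch channel carries the latest committed epoch, playing the
// role of `try_wait_epoch` on the state store.
async fn wait_epoch(rx: &mut watch::Receiver<u64>, target: u64) {
    while *rx.borrow() < target {
        rx.changed().await.expect("committed-epoch sender dropped");
    }
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = watch::channel(0u64);

    // Storage side: the first checkpoint (epoch 1) commits asynchronously,
    // some time after the barrier has passed through the actors.
    tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(10)).await;
        tx.send(1).unwrap();
    });

    // Executor side: until the first checkpoint is readable, scan() only
    // reflects this actor's own writes, so `backfill_finished` must not be
    // trusted yet. Block here first, then run the check.
    wait_epoch(&mut rx, 1).await;
    println!("first checkpoint visible; scan() now sees all actors' states");
}
```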

I prefer solution 3 because it removes the hack and makes the logic simpler to understand. In the long term, if we want to allow online scaling for foreground DDL, solution 2 might be the best option (both for Source and MV), but it will take a larger effort to implement.

edit: We ended up doing both 1 and 3. 3 comes for free once DDL is blocking, because of reschedule_lock. But we found that alone is not enough, and we still need 1. See #18112.

edit: If we had neither blocking DDL nor progress tracking, there would seem to be no way to disallow online scaling. So either 1 or 2 would still be needed.

A little more

Relationship of the split migration problem with #18338 (blocking DDL) and #18299 (Finish backfill faster):

Blocking DDL requires finishing backfill faster for better UX, and the faster backfill finishes, the lower the chance that we need to handle split migration at all.

Previously we only considered background backfill, where the backfill can be blocked for a long time, so we had to handle split migration gracefully. (But we forgot about the idea of rebuilding actors.)

xxchan (Member, Author) commented Sep 2, 2024

I think there's no special code change needed to disallow online scaling for blocking DDL on source.

The existing mechanism is to grab reschedule_lock when creating a streaming job. For blocking DDL, create_streaming_job does not return until the backfill finishes, so scaling is naturally prevented while a blocking DDL on a source is in progress.

```rust
let _reschedule_job_lock = self.stream_manager.reschedule_lock_read_guard().await;
```
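
As a toy model of why this works (assuming reschedule_lock behaves like an async RwLock: blocking DDL holds the read guard for its whole duration, while rescheduling needs the write guard):

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    // Toy stand-in for the stream manager's reschedule_lock.
    let reschedule_lock = Arc::new(RwLock::new(()));

    // Blocking DDL: holds the read guard until the backfill "finishes".
    let ddl_lock = reschedule_lock.clone();
    let ddl = tokio::spawn(async move {
        let _guard = ddl_lock.read().await;
        println!("create_streaming_job: backfilling...");
        tokio::time::sleep(Duration::from_millis(50)).await;
        println!("create_streaming_job: backfill finished, lock released");
    });

    // Rescheduling: needs the write guard, so it cannot proceed until the
    // blocking DDL has finished and dropped its read guard.
    tokio::time::sleep(Duration::from_millis(10)).await; // let the DDL grab the lock first
    let _write_guard = reschedule_lock.write().await;
    println!("reschedule: acquired write lock only after DDL completed");

    ddl.await.unwrap();
}
```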

fuyufjh (Member) commented Sep 3, 2024

Does this problem exist for a normal SourceExecutor? It also has to handle split migration (i.e., online scaling), and it stores offsets in its state table.

xxchan (Member, Author) commented Sep 3, 2024

No, because SourceExecutor is a single loop; it doesn't need to transition between stages (Backfilling -> Finished).
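
For contrast, a minimal sketch of the single-loop shape (illustrative names only, not the real SourceExecutor): split migration is absorbed at any barrier, because there is no stage to fall back to.

```rust
// Illustrative message type; the real executor consumes barriers and chunks.
enum Message {
    Barrier { split_assignment: Option<Vec<String>> },
    Chunk,
}

fn run_source_loop(messages: Vec<Message>) {
    let mut splits = vec!["split-0".to_owned()];
    for msg in messages {
        match msg {
            Message::Barrier { split_assignment: Some(new_splits) } => {
                // Split migration: just replace the assignment and keep reading.
                // There is no "finished" stage, so nothing can be stranded.
                splits = new_splits;
            }
            Message::Barrier { split_assignment: None } => {}
            Message::Chunk => {
                // ... read from `splits` at the offsets stored in the state table ...
            }
        }
    }
    let _ = splits;
}
```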

(A comment from @fuyufjh was marked as resolved.)

xxchan (Member, Author) commented Sep 3, 2024

See this PR for more: #18112

xxchan changed the title from "Disallow online scaling for shared source during (blocking) backfill" to "Problem of online scaling for shared source during (blocking) backfill" on Sep 4, 2024
xxchan changed the title from "Problem of online scaling for shared source during (blocking) backfill" to "Problem of online scaling for source backfill" on Sep 4, 2024
xxchan closed this as completed on Sep 11, 2024