Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Runtime schema declaration #165

Open
wants to merge 21 commits into
base: main
Choose a base branch
from
Open

Conversation

crwen
Copy link
Contributor

@crwen crwen commented Sep 24, 2024

support runtime schema

@KKould KKould mentioned this pull request Sep 25, 2024
2 tasks
@crwen crwen marked this pull request as ready for review September 26, 2024 12:16
@crwen crwen marked this pull request as draft September 26, 2024 14:14
crwen and others added 12 commits September 27, 2024 14:07
* fix point query at sst

* code fmt
… the storage layer and support multiple storage (tonbo-io#163)

* refactor: use [fusio](https://github.com/tonbo-io/fusio) to implement the storage layer and support multiple storage

* chore: remove println macros codegen

* fix: blocking of LevelStream in `open_file` under Unix system

* refactor: use fusio on Benchmark

* fix: benchmark/writes.rs file not found

* test: add unit tests for serdes

* chore: update read & write for fusio

* fix: example datafusion projection error

* fix: `Fs::list` may not sorted

* ci: add exmaples check

* feat: enable SQL query support in datafusion example (tonbo-io#169)

Added support for executing SQL queries using DataFusion's parser and physical plan execution. This enhancement allows querying the "music" table with SQL statements, improving flexibility and functionality.

* chore: resolve review

---------

Co-authored-by: Xwg <[email protected]>
…untime-schema

# Conflicts:
#	src/compaction/mod.rs
#	src/inmem/immutable.rs
#	src/inmem/mutable.rs
#	src/lib.rs
#	src/ondisk/scan.rs
#	src/option.rs
#	src/record/mod.rs
#	src/record/runtime/column.rs
#	src/record/runtime/mod.rs
#	src/record/runtime/record.rs
#	src/record/runtime/record_ref.rs
#	src/stream/package.rs
#	src/transaction.rs
@ethe ethe marked this pull request as ready for review September 29, 2024 03:03
@ethe ethe changed the title feat/runtime schema Runtime schema declaration Sep 29, 2024
@ethe ethe requested review from KKould and ethe and removed request for KKould September 30, 2024 05:43
@@ -69,6 +69,23 @@ jobs:
command: fmt
args: -- --check

exmaples:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flume = { version = "0.11", features = ["async"] }
fusio = { git = "https://github.com/tonbo-io/fusio.git", package = "fusio", rev = "317b1b0621b297f52145b41b90506632f2dc7a1d", features = ["tokio", "dyn"] }
fusio-parquet = { git = "https://github.com/tonbo-io/fusio.git", package = "fusio-parquet", rev = "317b1b0621b297f52145b41b90506632f2dc7a1d" }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the latest is: ab6a8073b966f9c7d7ca61c32646b7d45dbc4f31

where
A: ArrowArrays,
A::Record: Send,
{
fn from(mutable: SkipMap<Timestamped<<A::Record as Record>::Key>, Option<A::Record>>) -> Self {
fn from(
mutable: (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming variables separately is more readable

(mutable, instance): (
  SkipMap<Timestamped<<A::Record as Record>::Key>, Option<A::Record>>,
  &RecordInstance,
)

src/lib.rs Outdated
let instance =
RecordInstance::Runtime(DynRecord::empty_record(column_descs, primary_index));

let db = Self::build(option, executor, instance, manager).await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the variable db can be removed

fixed_projection.append(&mut projection);
fixed_projection.dedup();

let mask = ProjectionMask::roots(
&arrow_to_parquet_schema(R::arrow_schema()).unwrap(),
&arrow_to_parquet_schema(&self.schema.record_instance.arrow_schema::<R>()).unwrap(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need to call this method self.schema.record_instance.arrow_schema::<R>(), use primary_key_index directly

}

pub(crate) fn size(&self) -> usize {
todo!()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lack of size() implementation should cause the data to be unable to trigger Compaction (the threshold will not be reached)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not used, I will delete it

{
match self {
RecordInstance::Normal => R::primary_key_path(),
RecordInstance::Runtime(_) => todo!(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing implementation

let datatype = col.datatype;
let name = col.name.to_string();
let value = match datatype {
Datatype::Int8 => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think macros can be used to simplify this to simply implement most types.

type Ref<'r> = DynRecordRef<'r>;

fn primary_key_index() -> usize {
unreachable!("This method is not used.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this method be used to implement another trait such as StaticRecord instead of unreachable! ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants