Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A StableBuffer module. #299

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

A StableBuffer module. #299

wants to merge 3 commits into from

Conversation

matthewhammer
Copy link
Contributor

Addresses #298

@matthewhammer matthewhammer marked this pull request as draft November 2, 2021 17:58
@nomeata
Copy link
Contributor

nomeata commented Nov 2, 2021

It might be worth making Buffer a boring OO wrapper around this, to avoid code duplication and allow cheap conversion

@matthewhammer
Copy link
Contributor Author

matthewhammer commented Nov 2, 2021

It might be worth making Buffer a boring OO wrapper around this, to avoid code duplication and allow cheap conversion

Yeah, seems like a good idea, to the extent that we can do this.

It's nearly possible to do it fully, except in the binary operation cases, where it is not (since there is no way to coerce the object type we'd have in the OO API into the stable type that we have in this one). That's been a long-standing difference and issue with the OO API designs in all modules. Generally, paying the cost of abstraction for the OO API means that binary operations (and some others) are less efficient.

@matthewhammer
Copy link
Contributor Author

matthewhammer commented Nov 2, 2021

Next step is to adapt the tests. Then ready for review. (But anyone is welcome to look now, too)

@nomeata
Copy link
Contributor

nomeata commented Nov 2, 2021

Presumably the stable buffer would be a var of the OO Buffer. What if you make it a public var, or add a getStableBuffer method? Could you then implement the binary operations?

@matthewhammer
Copy link
Contributor Author

matthewhammer commented Nov 3, 2021

Good questions --- Yeah, if we get rid of the data abstraction (at least exposing the representation to those clients that want to look) then I think the earlier impasse could vanish, yes. Seems like a perfect compromise: " Both sides" get something, but also give something up.*

BTW, do you fancy the public var or the accessor version?

FWIW, seems like the accessor version at least maintains some semblance of abstraction.**

*, ** : Though it really doesn't, since the accessor would also expose implementation details by exposing the part of the StableBuffer's internal array that is not holding any valid data. However, that seems to be a reasonable price to get other things we want though, IMO.

@matthewhammer matthewhammer marked this pull request as ready for review November 3, 2021 16:49
@matthewhammer
Copy link
Contributor Author

matthewhammer commented Nov 3, 2021

This PR has a complete module and test now. It's ready for review.

However, since starting the PR, @nomeata and I agreed that more work should follow soon:

  • Buffer uses this module (once merged) instead of having a copy of most of its code. It can expose this representation, and then support some binary operations more efficiently.
  • StableHashMap -- a stable version of HashMap

I'll save those tasks for separate PRs.

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

Are we happy with the name? Maybe something more succinct to encourage it's use (assuming we don't want to take over the existing Buffer name)?

@matthewhammer
Copy link
Contributor Author

I'm not satisfied with the name. Yeah, perhaps this should be called Buffer and the other one called FlexBuffer or something?

WDYT @rossberg ?

FWIW, I am recalling why we went with the OO API for the HashTable, now that I am revisiting that one in another branch. It requires some function parameters for many of the operations (put, replace, get) and like the Trie module does today, it will become less ergonomic when I move away from the OO API. That's one reason to use a long name for StableHashTable, I suppose. Arguably, we could rename Trie to StableTrie and TrieMap could just be Trie, for a similar reason.

Another thought I had was to use the path structure of base more than we do today to distinguish these vesions. For instance, choose two of the following: mo:base/Buffer versus mo:base/stable/Buffer versus mo:base/flexible/Buffer

@chenyan-dfinity
Copy link
Contributor

Stable is an overloaded term, especially when combined with a data structure. It can be mistaken to mean that the data structure has some stability property. The name FlexBuffer is already taken by a variant of protobuf.

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

Oh, right about the HashMap thing. Bummer. Stable functions would be useful :-D

@matthewhammer
Copy link
Contributor Author

Stable is an overloaded term, especially when combined with a data structure.

Really? I have heard of a stable algorithm (like "stable sorting"), but not much of stable data structures.

Motoko already has given this word some extra attention, by using it as a keyword (in the role that we mean here).

@chenyan-dfinity
Copy link
Contributor

It's not a popular term, but the same concept can be applied to data structures, for example: https://ece.uwaterloo.ca/~dwharder/aads/Projects/4/Stable_binary_heap/#:~:text=A%20heap%20is%20said%20to,were%20placed%20into%20the%20heap.

SAC also has a concept of stability, and we call it retroactive data structures. But it can be called as "stable data structures" :)

My main point is that the name for a data structure module specifies the property of the data structure, but Stable is an IC specific term. Maybe we can just call them pure or functional?

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

Maybe we can just call them pure or functional?

But it’s not pure or functional in the sense of immutability, as some might expect. procedural might work along this axis, but wouldn’t be helpful either :-)

@matthewhammer
Copy link
Contributor Author

How about "cumbersome" instead of "stable"?

CumbersomeBuffer or cumbersome/Buffer?

CumbersomeHashMap or cumbersome/HashMap?

They are "cumbersome" because:

  • The syntax is less ergonomic, especially for "chained calls" (A.foo(B.bar(x)) instead of x.bar().foo())
  • Sometimes, operations need extra high-order functions that would be captured by a class's closure in the OO version

Of course, the word "cumbersome" is itself too cumbersome! Alas!

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

How about this idiom:

For HashMap, a single module defines two types

module HashMap {
type S<K,V> = … // the mutable bits. S for “store” or “stable”
type T<K,V> = record … // the imperative OO interface. “T” for “Type”
}

and used like this:

stable var usersStore : HashMap.S<Nat,UserData> = HashMap.emptyStore();
let users : HashMap.T<Nat,UserData> = HashMap.T(usersStore, Nat.hash, Nat.eq);

This way:

  • One can use users.put etc. as before.
  • Only the data is in a stable var, OP works as before.
  • The parameters (hash, eq) are still passed to an object constructor, so no need to pass them every time
  • No need for manual pre/postupgrade code
  • T can still expose the S it wraps, for binary operations.

@matthewhammer
Copy link
Contributor Author

I like that idiom, and it matches my existing intuitions about how the layers of the Motoko PL naturally give rise to layers in each data structure's module, as you suggest.

But where you wrote type T<K,V> = record … // the imperative OO interface. “T” for “Type” I had expected = class instead, to match the other lines (where T names the class as a type, and also the class constuctor as a value).

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

Yeah, object or class, don't make a big difference to me

@matthewhammer
Copy link
Contributor Author

matthewhammer commented Nov 3, 2021

In another PR, for StableHashMap, I have the start of this layered type definition idiom, but sans the actual OO wrapper itself (the class definition that would use these functions).

  public type KVs<K, V> = AssocList.AssocList<K, V>;

  public type HashMap<K, V> = {
    var table : [var KVs<K, V>];
    var count : Nat;
  };

  public type HashMap_<K, V> = {
    var table : [var KVs<K, V>];
    var count : Nat;
    // (the class params:)
    keyEq : (K, K) -> Bool;
    keyHash : K -> Hash.Hash;
    initCapacity : Nat;
  };

Just to make the non-OO code ergonomic, I found myself introducing two record types, one is stable (like your S) and one is not (because it contains the functions we need sometimes). But the other is not quite what you have there as T, or maybe it is?

Have a look if you're curious, and feel free to comment there, even thought it's still "draft".

If this module were to also give the OO interface as well, as you propose, I think there'd be a need for a third type, for the object type that comes from the class definition, where all of the private variables are hidden in the type, and only the methods are given (the T type in your proposal).

@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

But the other is not quite what you have there as T, or maybe it is?

I think it is! Or rather, it'd be the encapsulated immutable state of the OO-style class, so it wouldn't show up as a type of its own. And all operations would be class ops and have easy access to these parameters. I don't think you'd explicitly give it a type, though.

nomeata added a commit to nomeata/motoko-base that referenced this pull request Nov 3, 2021
this is to demonstrate what I meant in
dfinity#300 (comment)
and
dfinity#299 (comment),
and how to introduce this without breaking changes (although it’s kinda
ugly)

What I would _want_ here is to introduce a second, more general,
constructor for the given class, but Motoko does not allow me to do that
easily. But I can hack around that by

 * Creating a new class, not public `HashMap_` with the constructor I
   want
 * In `HashMap`’s constructor, call the `HashMap_` constructor to create
   an inner object (`i`)
 * In `HashMap`, simply copy all the fields from the inner objects to
   the outer object.
 * A public module-level function (here `wrapS`) exposes the new
   constructor.

With this (generic, ugly) trick I can suppor the idiom
```
stable var userS : HashMap.S <UserId,UserData> = newS();
let user : HashMap.HashMap<UserId,UserData> = HashMap.wrapS(10, Nat.eq, Nat.hash, userS)
```
without changing the API.

But it is ugly, and the effect on documentation generation is probably
bad as well. So maybe a better course of action would be to have a midly
breaking change where we only have the “new” constructor, and people
will have to fix their code by passing `HashMap.newS()` as a new fourth
argument if they want their old behavior. Probably better than piling up
hacks like this.

In that case, simply rename `class HashMap_` to `HashMap`, remove `wrapS` and
the old `class HashMap`.
@nomeata
Copy link
Contributor

nomeata commented Nov 3, 2021

See #301 for how to achieve this without new modules

@ggreif ggreif force-pushed the master branch 2 times, most recently from d52aecd to 08507fc Compare October 21, 2022 12:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants