Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection expressions: inline collections in spreads and foreach #7864

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

cston
Copy link
Member

@cston cston commented Jan 23, 2024

Support collection expressions inline in expression contexts where the collection type is unspecified.

In spreads to add elements conditionally to the containing collection:

int[] items = [x, y, .. b ? [z] : []];

In foreach:

foreach (bool b in [false, true]) { }


An expression `E` is *spreadable* as values of type `Tₑ` if one of the following holds:
* `E` is a *collection expression* and for each element `Eᵢ` in the collection expression there is an implicit conversion to `Tₑ`.
* `E` is a [*conditional expression*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#1115-conditional-operator) `b ? x : y`, and `x` and `y` are *spreadable* as values of type `Tₑ`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only curious: Is there anywhere in the spec that states this in the opposite form, for purposes like target-typed new inside a conditional expression looking outward? Could both directions be unified somehow in some definition of type inference for any language constructs that support "seeing through" them?

@@ -250,6 +259,8 @@ If the target type is an *array*, a *span*, a type with a *[create method](#crea
* For each element in order:
* If the element is an *expression element*, the initialization instance *indexer* is invoked to add the evaluated expression at the current index.
* If the element is a *spread element* then one of the following is used:
* If the spread element expression is a *collection expression*, then the collection expression elements are evaluated in order as if the elements were in the containing collection expression.
* If the spread element expression is a *conditional expression*, then the condition is evaluated and if `true` the second-, otherwise the third-operand is evaluated as if the operand was an element in the containing collection expression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we did not specify treating conditional expressions differently during construction than other expressions, and leave compiler optimizations discretionary?

@CyrusNajmabadi
Copy link
Member

I like the col<T> concept. We might want to call out an 'anonymous collection type'. Both to draw a correspondence to the existing anonymous -type feature, and to make it clear why it would not be possible to reference in source or metadata.

@cston cston changed the title Collection expressions: spread conditional expression Collection expressions: inline collections in spreads and foreach Feb 4, 2024
@cston cston marked this pull request as ready for review February 4, 2024 18:06
@cston cston requested a review from a team as a code owner February 4, 2024 18:06

### Conversions

To support conversions for inline collection expressions, an abstract *collection_type* is introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abstract is a bit ambiguous. I would say anonymous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced "abstract" with "compile time".


The [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of a *collection_type* `col<E>` is `E`.

An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.
An implicit *collection type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to refer to iteration type? or should we say element type? (just thinking upon the last few ldm talks).

>
> The implicit *collection expression conversion* exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` where for each *element* `Eᵢ` in the collection expression:
> * If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`.
> * If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion **from `Sᵢ` to the *collection_type* `col<Tₑ>`**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!


The elements of a nested collection expression within a spread can be added to the containing collection instance directly, without instantiating an intermediate collection.

The elements of the nested collection expression are evaluated in order within the nested collection. There is no implied evaluation order between the elements of the nested collection and other elements within the containing collection expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems wrong to me. there should def be an implied evaluation order. if i have

[a, b, .. c ? [d] : [e], f, g]

i would expect that d and e would be evaluated before f and g.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

The elements of a collection expression within a spread are evaluated in order as if the spread elements were declared in the containing collection expression directly.

* If `Eᵢ` is an *expression element*, the type of `Eᵢ`.
* If `Eᵢ` is a *spread element* `..Sᵢ`, the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ`.

Since collection expressions do not have a *type* or an *iteration type*, the *best common type* does not consider elements from nested collection expressions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pity. mght be something to reconsider later. for example:

[a, b, .. c ? [d] : [e] f, g]. I could imaginet he type is the bct of a, c, d, e, f, g

```csharp
byte[] x = [1, .. b ? [2] : []]; // error: cannot convert int to byte
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
Predicate<int>[] predicates = [a, .. b ? [i => i.ToString()] : []]; // No type for lambda.
Predicate<int>[] predicates = [a, .. b ? [(int i) => i.ToString()] : []]; // Func<int, string> not compatible with Predicate<int>

Natural type would also allow a subset of the scenarios that are supported above. But since the above proposal relies on target typing and natural type relies on *best common type* of the elements within the collection expression, there will be some limitations if we don't also support target typing with nested collection expressions.
```csharp
byte[] x = [1, .. b ? [2] : []]; // error: cannot convert int to byte
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
string?[] y = [1, .. b ? [null] : []]; // error: no type for [null]


For a collection expression used as the collection in a `foreach` statement, the compiler may use any conforming representation for the collection instance, including eliding the collection.

The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler *may* evaluate subsequent elements before executing the loop body for preceding elements.
Copy link

@KevinCathcart KevinCathcart Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Is that really something that should be left unspecified? The compiler cannot easilly change it later, as that would be an observable breaking change, and one that feels pretty likely to break real programs, whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

Edit: Actually an even better example of a case where users would absolutely notice a change if one is made later is if some element throws an exception when evaluated. That is a case where it could make a really big difference to users as to whether the exception occurs before any loop iterations, or not.

I'd honestly expect all the elements to be evaluated first before any loop executions, simply because that it what would happen if the expression were converted to literally any runtime type.

The only sensible implementation strategies I can even think of that would allow for delayed evaluation would be unrolling the loop body (even if an unusual unrolling like: loop and process these initial literal elements, then loop and process these elements from a spread), or the creation of a generator that yields the elements one by one, and then iterating over that generator's output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really something that should be left unspecified?

Yes. It allows the compiler great leeway in the impl strategy for converting the expressions to a collection.

The compiler cannot easilly change it later, as that would be an observable breaking change

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

and one that feels pretty likely to break real programs,

I would definitely hope not.

whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

NOte: it's fine for someFunctionWithSideEffects to have side effects. Those will happen after myFoo is evaluated, and before any other elements are evaluated. What is unspecified is what it means is if the iteration of that evaluated value itself mutates. That would be a pattern we consider too 'out there', and we're willing to say that if your iterators do this, then collection expressions are not a good fit for you. We want to optimzie for the 99.99% case, where iteration order will not have an impact.

Note: to reiterate. Evaluation-order of the elements is still guaranteed to be left-to-right, just as it always has been in C#. So side effects from prior element evaluations will be seen by later element evaluations.

I'd honestly expect all the elements to be evaluated first before any loop executions

That would require stack space for all their values. That would def be undesirable in many cases. Simply imagine a collection expr with 1000 elements. If we require all that evaluation up front, we need to store all those values somewhere (like teh stack) then copy them into the dest. However, if we give the compiler freedom, it can do something like use a single space on the stack, evaluate each element it it, add to the dest, and the move on to the next (reusing that single space).

Copy link

@KevinCathcart KevinCathcart Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and one that feels pretty likely to break real programs,
I would definitely hope not.

A better example is if evaluation of any element throws. I can predict with high certainty that users will notice and care about the difference between processing the loop body for the first handful elements and then throwing vs throwing before processing any elements. Imagine if say the loop body is writing to a file and the 15th element throws when evaluated. Pretty significant difference if the file ends up empty, vs if it has lines for the previously processed elements.

If initially implemented with one at a time evaluation, then users may come to depend on the file having the lines from all elements before the throwing was was evaluated. If initially pre-evaluating all elements then users could easily become reliant on that behavior.

And files are hardly the only such scenarios. This could also apply to GUI scenarios like adding elements to listbox, or whatever.

Obviously users understand they can get partial execution of the loop if the body throws. And users quickly learn about lazy Enumerables, like those created by generators and or LINQ, where they can get partial execution from the enumerator throwing. But users are used to any type of foreach-able construct either being eagerly evaluated or lazily evaluated.

But here you are trying to reserve the right for the compiler to sometimes treat the collection expression as eagerly evaluated (like arrays or spans or most collection classes), and other times to be lazily evaluated like a generator or LINQ Enumerables, and possibly even mix the two within one expression in undefined ways that may change in the future. Which is very much not something users are used to, and something which will be hard to explain to users.

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

Average users don't read the compiler specification closely enough to have any idea what is marked as unspecified or not. They see the new feature announcement which gives the basic overview (which probably won't have a giant warning about eagerness being unspecified) or they discover the feature in some codebase. Either way they play around with it a bit, and check (experimentally) to see the current behavior for things they might care about (like eager or lazy evaluation) and then assume it won't change in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better example is if evaluation of any element throws

As mentioned previously, evaluation order of elements is strictly defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and something which will be hard to explain to users

For the tiny set of people that run into an issue here, the situation can be explained.

and then assume it won't change in the future.

Then that is the consequence of behaving that way. We have made a decision balancing the needs of the ecosystem as a whole. If developers run into an issue because of that, we are ok with it on balance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also apply to GUI scenarios like adding elements to listbox, or whatever.

It shouldn't. You can only get the final collection if no throwing occurred during the creation of the collectino. It is all assigned to a temp. So you either get it all, or you get nothing (since the actual code that cn reference that newly created collection doesn't run due to the exception).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which probably won't have a giant warning about eagerness being unspecified

For nearly all users, there's no need to specify. Nearly all users with nearly all cases will simply get collections that work well including wrt to the efficiency of producing the result. Users that absolutely depend not only on element evaluation order, but enumeration order of those elements are going to be extremely slim. To the point that we literally haven't heard of a single user being affected by this in the entire beta and release period since release.

This indicates to me that this is not the sort of footgun being warned about. But is rather just a small part of the design that most people won't ever need to know about or be affected by. This is in line with many other features we have where there's some niche case that might result in slightly unexpected or undefined behavior if the user is doing something very far out of the norm. As such, designing and documenting it in the same vein seems totally ok :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi I really don't think we are talking about the same thing, and that may explain some confusion.

As best I can tell, you seem to be talking about part of the already shipped collection expression feature for building collections where iteration order of spreads when building a collection may vary relative to the evaluation of elements. If that is what you are talking about then yes, I fully agree that it is really unlikely that changes there break people in practice.

But what I am talking about here is part of proposed specification for foreach loops over collection expressions. What I am looking at seems to say executions of the body of the foreach loop can interleave in unspecified ways with evaluating the elements of collection expression passed to the foreach loop:

The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Which says that foreach (int i in [1, myfunc()]) { foo(i); } might do:

  • evaluate 1
  • run foo(1)
  • evaluate myfunc() (which happens to throw)

or it might do:

  • evaluate 1.
  • evaluate myfunc() (which happens to throw).
  • Would now try to run loop body for each, which won't happen because of the throw.

And furthermore which one it does might change in the future. That is what I have concerns about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concerns. We'll def consider them during the ldm discussions on this topic.

@KevinCathcart
Copy link

If I understand correctly, this specification would not enable users to do:

foreach(int digit in isEven ? [0,2,4,6,8] : [1,3,5,7,9] )
{...}

That is because the proposed changes for foreach are defined in terms of the collection directly being a collection expression. So instead users would need to wrap that with another collection expression and a spread: foreach(int digit in [..isEven ? [0,2,4,6,8] : [1,3,5,7,9]]). (As an aside, if I am reading this right, that wrapping won't work with an implicitly typed iteration variable because it won't have any best common type, which feels unfortunate, so users would be forced to explicitly type the iteration variable for this scenario.)

Allowing foreach to support any col<Tₑ> when there is an explicitly typed iteration variable (of type Tₑ), seems like it might be possible. This could be done be by offering col<Tₑ> as a target type after exhausting the previously defined foreach binding approaches. Admittedly the lowering could get somewhat complicated. For example, the code could have a conditional operator with an IEnumerable<char> on one side, and a ReadOnlySpan<char> on the other, and as far as I can tell that conditional would happily target type to col<char>, but the lowerings for each side of that conditional are different. A combination lowering should be possible to implement though, as I show later.

Supporting similar for implicitly typed iteration variables, like in foreach(var digit in isEven ? [0,2,4,6,8] : [1,3,5,7,9] ) would seem harder to specify. Perhaps it can be done by specifying that if the type of the collection cannot be found without target typing, then recurse into the expression, collecting up a bunch of types/expressions) to use to calculate a best common type as follows:

  • for an expression with an iteration type, that iteration type gets contributed towards finding the best common type.
  • for a conditional expression recurse each side with this algorithm
  • for a collection expression each non-spread element gets contributed towards finding the best common type while the spreads would recurse.

I know that isn't correct specification language, but hopefully it gets the idea across.

Now it may be that further extending foreach beyond what this proposal has to support that scenario is too much work for too little gain, but thought it was worth considering.

Example of lowering a conditional `col` when sides do not share a type.
public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<Char> right)
{
    foreach(var c in b ? left : right)
    {
        //body
    }
}

could be lowered as:

public void Z(bool b, IEnumerable<char> left, ReadOnlySpan<char> right)
{
    // Save off condition, as we only want to evaluate it once.
    bool _cond = b;
    // left side helper variable(s)
    IEnumerator<char>? _enumerator = default;
    // right side helper variable(s)
    ReadOnlySpan<char> _span = default;
    int _num = default;

    if(_cond) {
        // left setup;
        _enumerator = left.GetEnumerator();
    } else {
        //right setup;
         _span = right;
        _num=0;
    }

    try{
        while(_cond ? _enumerator!.MoveNext() : _num<_span.Length)
        {
            char c = _cond ? _enumerator!.Current : _span[_num];

            // Body goes here

            if(_cond) {
                //left side end of loop
            } else {
                // right side end of loop
                _num++;
            }
        }
    }
    finally
    {
        if (_cond)
        {
            // left side finally
            _enumerator?.Dispose();
        }
        else
        {
            //right side finally
        }
    }
}

I am obviously using the convention of leading underscore indicating helper variables not visible to user written code, rather than indicating fields or similar.

@CyrusNajmabadi
Copy link
Member

If I understand correctly, this specification would not enable users to do:

The intent is for that to be possible. And our last design meeting on the topic indicated a means by which that would be possible.

@CyrusNajmabadi
Copy link
Member

The general idea (similar to what you were saying) is that we will target to some collection_type<Te> (just for the purposes of binding all the elements. But we will also effectively say that that collection_type itself is irrelevant and that the compiler is free to substitute any type it wants there (or eschew using a type at all).

@hez2010
Copy link

hez2010 commented Mar 31, 2024

I think some part of this proposal needs to be updated with #7994
which makes using Span<T>, ReadOnlySpan<T> as the underlying type a thing in async code as well.

@KennethHoff
Copy link

KennethHoff commented Mar 31, 2024

I realize this isn't really intended to add new syntax, but instead make existing syntax work inside collections (.. it's also a PR, and not an issue or even a discussion), but it would be cool to have a binary operator that conditionally does something and nothing otherwise, like the following:

List<int> ints = [1, 2, 3, a && ..[4], 10];

Which equates to

List<int> ints = [1, 2, 3, ..a ? [4] : [], 10)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants