Support Strict 32 Bit Alignment Platforms #332

Draft · wants to merge 3 commits into main

Conversation

cart commented May 9, 2020

Support Strict 32 Bit Alignment Platforms

Pathfinder currently uses 1-byte and 2-byte data types in many of its shaders, but some graphics APIs (such as WebGPU) do not support those types. In preparation for a WebGPU backend, I have added a shader_alignment_32_bits feature flag. When enabled, it treats both 1-byte and 2-byte datatypes as 4-byte datatypes.

I added the AlignedU8, AlignedU16, AlignedI8, and AlignedI16 type aliases, which point to either their real type (e.g. u8) or their aligned type (e.g. u32) depending on the feature flag. I also added their corresponding attribute types (e.g. ALIGNED_U8_ATTR), which resolve to the correct VertexAttrType.
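
As a rough illustration (a sketch only, not necessarily the exact definitions in this PR), the aliases can be cfg-gated on the feature flag like this:

#[cfg(not(feature = "shader_alignment_32_bits"))]
pub type AlignedU8 = u8;
#[cfg(feature = "shader_alignment_32_bits")]
pub type AlignedU8 = u32;

#[cfg(not(feature = "shader_alignment_32_bits"))]
pub type AlignedU16 = u16;
#[cfg(feature = "shader_alignment_32_bits")]
pub type AlignedU16 = u32;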

This change means that the strides for some shader layouts change based on the feature flag. To make this more straightforward, I added a VertexBufferDescriptor type, which automatically calculates the offsets and strides of its VertexAttrDescriptors. It also sets their index and divisor, which are always the same within a vertex buffer. This significantly reduces the boilerplate required for vertex attributes and eliminates several classes of errors:

let fill_attrs = &[
    device.get_vertex_attr(&fill_program.program, "FromSubpx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "ToSubpx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "FromPx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "ToPx").unwrap(),
    device.get_vertex_attr(&fill_program.program, "TileIndex").unwrap(),
];

static FILL_BUFFER: Lazy<VertexBufferDescriptor> = Lazy::new(|| {
    let mut descriptor = VertexBufferDescriptor {
        index: 1,
        divisor: 1,
        vertex_attrs: vec![
            VertexAttrDescriptor::datatype_only(VertexAttrClass::FloatNorm, VertexAttrType::U8, 2),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::FloatNorm, VertexAttrType::U8, 2),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 1),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 1),
            VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U16, 1),
        ]
    };
    descriptor.update_attrs();
    descriptor
});

device.bind_buffer(&vertex_array, &vertex_buffer, BufferTarget::Vertex);
FILL_BUFFER.configure_vertex_attrs(device, &vertex_array, fill_attrs);
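
For context, here is a minimal sketch of how update_attrs might derive the offsets and strides (illustrative only, not the PR's actual code; the field names follow the descriptors above, and byte_size_of is a hypothetical helper mapping a VertexAttrType to its size in bytes):

impl VertexBufferDescriptor {
    pub fn update_attrs(&mut self) {
        // Lay the attributes out back to back, accumulating the byte offset,
        // and propagate the buffer's index and divisor to each attribute.
        let mut offset = 0;
        for attr in &mut self.vertex_attrs {
            attr.buffer_index = self.index;
            attr.divisor = self.divisor;
            attr.offset = offset;
            offset += attr.size * byte_size_of(attr.attr_type);
        }
        // Every attribute's stride is the total size of one vertex.
        for attr in &mut self.vertex_attrs {
            attr.stride = offset;
        }
    }
}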

The only place I didn't apply this change was StencilVertexArray, which has a stride that doesn't line up with its datatype. @pcwalton, is this intentional or is it a bug?

device.configure_vertex_attr(&vertex_array, &position_attr, &VertexAttrDescriptor {
    size: 3,
    class: VertexAttrClass::Float,
    attr_type: VertexAttrType::F32,
    // this would normally be 4 * 3
    stride: 4 * 4,
    offset: 0,
    divisor: 0,
    buffer_index: 0,
});

To decrease the odds of breaking things and to retain clarity, this PR doesn't try to optimize the layouts for 32-bit alignment; however, there is plenty of potential in that area.

Without the shader_alignment_32_bits flag everything works as expected:

[screenshot: normal]

Unfortunately, I have a few artifacts when I enable the shader_alignment_32_bits flag:

[screenshot: 32bitaligned]

I am investigating this now, but if anyone has ideas let me know.

@pcwalton because he's the expert here :)
@kvark because he is probably invested in wgpu compatibility
@tangmi because he started work on a wgpu backend and I'm assuming he will eventually be blocked by this

cart (Author) commented May 9, 2020

More context here: #306

kvark (Member) commented May 9, 2020

WebGPU supports Char2 and UChar2. Why do you want to align those to 32 bits?
My recommendation is: don't have a separate code path; just change Pathfinder to use those 2-byte formats where the original used 1-byte. Last time I checked, Pathfinder does most of the work in fragment shaders, so I don't think one more byte per vertex is going to change any benchmarks.
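
On the descriptor side (reusing the helper from the PR description above, purely for illustration), that would roughly mean widening the one-component byte attributes to two components and letting the shader ignore the extra one:

// Before: a one-component 8-bit attribute, which WebGPU can't express directly.
VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 1)
// After (sketch): a two-component 8-bit attribute (WebGPU's UChar2); the
// second component is padding that the shader never reads.
VertexAttrDescriptor::datatype_only(VertexAttrClass::Int, VertexAttrType::U8, 2)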

cart (Author) commented May 9, 2020

Those are vector types, right? You're right that "32 bit alignment" isn't strictly required given that 16-bit types exist. But wouldn't that complicate the code more, given that we'd be dealing with vec2s when the code assumes a single int/uint?

tangmi (Contributor) commented May 9, 2020

My plan for the wgpu backend (untested!) was to use Char2 and UChar2 where Pathfinder wants the scalar variants (same with Char3, etc.) and just let the extra components of one value overlap the next value (hopefully none of the wgpu/gfx-hal implementations will check and complain!). If this doesn't work, @kvark's proposal of splatting (or zeroing) the Pathfinder buffers into a supported WebGPU format would work fine, I think.

I'm also hoping to take advantage of some of the jank of shader input binding... in my experience the CPU vertex attribute layout and the shader declarations don't need to agree. I think this quickly goes into undefined-behavior territory, but it should (??) be safe if the shader declares fewer components than the CPU vertex attribute. @kvark, do you have thoughts/ideas on this? My goal here is to avoid needing to modify the shaders for WebGPU if I can.

kvark (Member) commented May 9, 2020

@cart

But wouldn't that complicate the code more given that we'd be dealing with vec2s when the code assumes a single int/uint?

My proposal is to make this code assume vec2/uvec2. It's not a big complication.

@tangmi

but should (??) be safe if the shader declares fewer components than the cpu vertex attribute

We haven't discussed this particular aspect of validation with the WebGPU group. And when we do, such an idea will be hard to sell: if the user specifies vec4 in the vertex attribute format, they might as well specify it in the shader input and just not use all of its components. Validating that these two sides match makes for a simpler spec and a simpler implementation, with no obvious downsides for users.

So, in the end, we may end up allowing that. I don't think you should be betting on it, though. As I said, Pathfinder shouldn't be bottlenecking on vertex fetches (correct me if that's wrong!), so that extra byte isn't going to harm anybody.

cart (Author) commented May 9, 2020

I think all of the options discussed are viable, but they all introduce weirdness somewhere:

  • this PR:
    • pros: original code path / shaders largely untouched; CPU and GPU datatypes stay in sync
    • cons: two code paths, so it's easier to regress and harder to maintain; type aliases that change size based on a cargo feature reduce clarity; 8-bit datatypes become 32-bit datatypes, which will hurt performance (notably more than the vec2 types mentioned below would)
  • replace all cases of int/uint with vec2/uvec2 in shaders and Rust datatypes
    • pros: single code path for all backends; CPU and GPU datatypes stay in sync
    • cons: slight loss of clarity, since "vectorness" is not relevant to the functionality; probably a tiny performance loss
  • define overlapping vec types in the wgpu backend's shader layouts whenever int/uint is encountered
    • pros: doesn't touch existing code paths / scoped to the wgpu backend
    • cons: may be unsupported by wgpu in the future; adds jank to the wgpu backend
  • leave Rust datatypes intact, but change shaders (and their descriptors) to expect vec2/uvec2
    • pros: Rust side stays mostly the same
    • cons: loss of clarity in shaders; probably a tiny performance loss; CPU/GPU datatypes out of sync; complicates buffer creation because we can't just do Vec::<OriginalRustDataType>::get_bytes(self) anymore and have to add padding

Does that seem correct (and fair)?

I'm happy to make any of those options work. As my ultimate goal here is to get a solution to this problem merged so I don't need to fork Pathfinder, the opinion that matters most to me is pcwalton's. (I'm not @-ing him anymore because I've done that enough already.)

edit: forgot to mention the relatively "big" GPU data types this PR uses as a con

pcwalton (Contributor) commented

Yeah, I was thinking that to work around this limitation we would just pack all the vertex attributes into three uints (each instance is 12 bytes long, right?) and unpack them manually in the shader. It's not just WebGPU; Metal has limitations on vertex attributes too (though not as severe) and I'm tired of working around them :)

I would try to avoid making the tiles longer; the CPU side is very memory bound and every byte I could shave off improved performance. Though fills are more important to optimize than tiles.
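
As a rough sketch of the packing idea (not code from this PR; the function names are hypothetical), two 16-bit fields can be combined into a single uint on the CPU side and unpacked with the equivalent shifts and masks in the shader:

// Pack two 16-bit values into one u32 on the CPU side.
fn pack_u16x2(lo: u16, hi: u16) -> u32 {
    (lo as u32) | ((hi as u32) << 16)
}

// The shader would undo this with the same shifts and masks.
fn unpack_u16x2(packed: u32) -> (u16, u16) {
    ((packed & 0xffff) as u16, (packed >> 16) as u16)
}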

cart (Author) commented May 13, 2020

That makes sense to me!

Some good-ish news: I have a semi-working backend for my engine/render graph built on top of wgpu:
[screenshot]

The bad news is that the image above is supposed to be a house 😄

Note: This is using the "vulkan-style" shaders from the other PR, but I'm not using the "32 bit alignment" code from this PR. Instead I just added some offsets / vec2s like kvark suggested.

I'm honestly at a bit of a loss. The offset changes / vulkan-style shaders work fine with the GL backend, so this is almost certainly an issue with my backend.
