Bug in D3D11VA Deinterlacing #15197

Open
softworkz opened this issue Oct 28, 2024 · 8 comments

softworkz commented Oct 28, 2024

mpv Information

Has existed for 7 years.

Other Information

- Windows version: Any with DirectX 11.1 support
- GPU model, driver and version: All Intel & all AMD GPUs (less obvious with Nvidia)
- Source of mpv: https://github.com/mpv-player/mpv/blob/master/video/filter/vf_d3d11vpp.c
- Introduced in this commit: https://github.com/mpv-player/mpv/commit/49f73eaf7b6f58e82376fc764ab0743c039d5278 (which introduced the filter)

Reproduction Steps

Play back an interlaced video that has sharp horizontal edges or text/graphics.
Play it with D3D11 hardware decoding and deinterlacing enabled.
It's not always immediately obvious, so you need to look closely: use a fixed scaling of 1.0 or 2.0, or set the resolution of a 4K display to 1600x900 and watch it there.

Expected Behavior

This is how it looks with bwdif (software) deinterlacing:

flickering_bwdif.mp4

Actual Behavior

And here are the results with D3D11VPP deinterlacing:

Intel & AMD

flickering_intel.mp4

The example is with Intel GPU and it looks the same with AMD GPUs.

Nvidia

Nvidia looks more like bwdif at first sight, but that's just because it supports the "blend" deinterlacing mode. As a result it doesn't show the flickering/shaking, but the image quality is degraded.

Here's a screenshot from Nvidia:

image

And this from Intel:

image

Sample Files

mpv_d3d11_deint_sample.zip

softworkz commented Oct 28, 2024

I don't have a build and development workflow for mpv, which is why I'm filing an issue rather than a PR, but I think I've figured out why it doesn't work properly.

Issue 1

The first part that is wrong is this:

if (!mp_refqueue_should_deint(p->queue)) {
    d3d_frame_format = D3D11_VIDEO_FRAME_FORMAT_PROGRESSIVE;
} else if (mp_refqueue_is_top_field(p->queue)) {
    d3d_frame_format = D3D11_VIDEO_FRAME_FORMAT_INTERLACED_TOP_FIELD_FIRST;
} else {
    d3d_frame_format = D3D11_VIDEO_FRAME_FORMAT_INTERLACED_BOTTOM_FIELD_FIRST;
}
ID3D11VideoContext_VideoProcessorSetStreamFrameFormat(p->video_ctx,
                                                      p->video_proc,
                                                      0, d3d_frame_format);

wm4 had committed this code with the following comment:

Another strange detail is how to select top/bottom fields and field
dominance. At least I'm getting quite similar results to vavpp on Linux,
so I'm content with it for now.

The answer to this is simple: you don't have to do that, because the frames you get from the DXVA decoders always contain both fields already.
What ffmpeg does in that case is emit the same decoded frame twice, declaring it once as field0 and once as field1 (for software output, it also adds the linesize to the pointer, so that the data starts on the second line instead of the first).
That's probably where the confusion came from, and it ended up in the wrong code quoted above. Directly above those lines is similar code, but that part is correct: it tells the video processor about the field order (whether top or bottom first). That's something that doesn't change frequently (almost never).
Yet the quoted code changes the field order on each field, which is obviously not correct.
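
A minimal sketch of the point above, assuming the field order is a property of the decoded frame and not of the field currently being output. The enum and the function `pick_frame_format` are hypothetical stand-ins that mirror the three frame formats from the quoted code; this is not mpv code and has not been tested against any driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three D3D11_VIDEO_FRAME_FORMAT values
 * used in the quoted code; not the real SDK enum. */
enum frame_format {
    FMT_PROGRESSIVE,
    FMT_INTERLACED_TFF,
    FMT_INTERLACED_BFF,
};

/* Choose the stream frame format from frame-level properties only.
 * `second_field` says which output field is being produced; it is
 * accepted and deliberately ignored to show that the format should not
 * toggle between the two outputs of one decoded frame. */
static enum frame_format pick_frame_format(bool should_deint,
                                           bool top_field_first,
                                           bool second_field)
{
    (void)second_field;
    if (!should_deint)
        return FMT_PROGRESSIVE;
    return top_field_first ? FMT_INTERLACED_TFF : FMT_INTERLACED_BFF;
}
```

With this shape, both output fields of a top-field-first frame map to the same format, which is the behavior the analysis above argues for.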

softworkz commented Oct 28, 2024

Issue 2

This is about the deinterlacing modes. For illustration, here are some screenshots from DXVAChecker (nice tool):

Nvidia

image

Intel

image

AMD

image

Summary

Intel and Nvidia have one "video processor", AMD has two (separate one for deinterlacing), but all of them are indicating multiple deinterlacing methods, so there's not one for method X and another one for method Y.
How can you choose?

Original Commit

Not a new question at all - this is another part from the original commit of the D3D11VPP filter:

I'm not sure how to select the deinterlacing mode at all. You can
enumate the available video processors, but at least on Intel, all of
them either signal support for all deinterlacers, or none (the latter is
apparently used for IVTC). I haven't found anything that actually tells
the processor which algorithm to use.

I came to wonder about the same thing yesterday, and even 8 years later it was still tough to find out, because it's just not documented clearly.
The answer is that the deinterlacing mode is determined implicitly by other parameters you set and by how you use the individual APIs:

  • Blend
    Blend means that two fields are merged into a single frame, so there's no doubling of the frame rate. No reference frames are needed either: you provide a frame with both fields and get a single frame back.
    • To choose Blend, set the output frame rate of the video processor to the same rate as the input frame rate.
  • Bob
    Bob means frame rate doubling. It doesn't need any reference frames, and it's also a kind of fallback (see below).
    • To choose Bob, set the output frame rate to double the input frame rate and submit the (combined) input frame twice to the processor to get the two output frames.
  • Adaptive and Motion Compensation
    These are the really interesting ones. I don't think you can make an explicit choice between them, but to get one of them, you do the same as for Bob and additionally provide the number of future and past reference frames indicated by the processor (see screenshots). If you don't provide those frames, it automatically falls back to Bob.
  • Inverse Telecine
    Again, this is controlled by the input/output frame rate ratio and the way you feed frames to and receive frames from the processor.

That's also the answer to this line in the code:

// TODO: so, how do we select which rate conversion mode the processor uses?
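
The selection rules in the list above can be written down as a small decision function. This is only a mental model of the description in this comment, not a D3D11 API: the names `implied_mode`, `struct rate` and the enum are hypothetical, and real drivers may behave differently.

```c
/* Hypothetical labels for the modes described above. */
enum deint_mode {
    MODE_BLEND,          /* output rate == input rate */
    MODE_BOB,            /* output rate == 2x input rate, references missing */
    MODE_ADAPTIVE_OR_MC, /* output rate == 2x input rate, references supplied */
    MODE_OTHER,          /* any other ratio (e.g. inverse telecine), not modelled */
};

struct rate {
    unsigned num;
    unsigned den;
};

/* refs_required: number of reference frames the processor reports.
 * refs_supplied: number of reference frames the caller actually passes. */
static enum deint_mode implied_mode(struct rate in, struct rate out,
                                    unsigned refs_supplied,
                                    unsigned refs_required)
{
    /* Compare out/in by cross-multiplication to avoid floating point. */
    unsigned long long lhs = (unsigned long long)out.num * in.den;
    unsigned long long rhs = (unsigned long long)in.num * out.den;

    if (lhs == rhs)
        return MODE_BLEND;
    if (lhs == 2 * rhs) {
        if (refs_required > 0 && refs_supplied >= refs_required)
            return MODE_ADAPTIVE_OR_MC;
        return MODE_BOB; /* fallback when references are not provided */
    }
    return MODE_OTHER;
}
```

For example, 30000/1001 in and 60000/1001 out with no references supplied yields the Bob fallback in this model.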

Conclusions

  • The modes "blend" and "ivtc" can be removed right away from code and docs, because they are not compatible with the implementation (expecting frame doubling)
  • The input and output frame rates need to be set in the D3D11_VIDEO_PROCESSOR_CONTENT_DESC structure before creating the enumerator and probably also on the processor after creation with VideoProcessorSetStreamOutputRate (https://learn.microsoft.com/en-us/windows/win32/api/d3d11/nf-d3d11-id3d11videocontext-videoprocessorsetstreamoutputrate).
  • Reference frames should be provided to achieve the best possible deinterlacing results
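
As a sketch of the second conclusion, the following computes an input/output rate pair for frame-doubling deinterlacing. The types are hypothetical local stand-ins (a rational like DXGI_RATIONAL, and only the two rate fields of a content description); nothing here calls into D3D11, and the doubling itself is the assumption taken from the Bob/Adaptive description above.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for a rational frame rate. */
struct rational {
    unsigned numerator;
    unsigned denominator;
};

/* Only the two rate fields of interest; a real content description
 * carries more members, which are omitted here. */
struct content_rates {
    struct rational input_frame_rate;
    struct rational output_frame_rate;
};

/* Keep the denominator and double the numerator, so that e.g.
 * 30000/1001 becomes 60000/1001 exactly. */
static struct content_rates rates_for_doubling(struct rational input)
{
    assert(input.denominator != 0);
    assert(input.numerator <= UINT_MAX / 2); /* guard against overflow */

    struct content_rates r = {
        .input_frame_rate = input,
        .output_frame_rate = { input.numerator * 2, input.denominator },
    };
    return r;
}
```

The same pair would then be used both when filling the content description and when setting the stream output rate on the processor.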

@softworkz
Author

Issue 3

When we set input and output frame rates on the processor, there's still one question remaining: when the input and output rates differ, how are N input frames transformed into M output frames?

The key to that are the frame numbers on the input and output side. These need to be set in a way that reflects the relation between input and output frames, and in the current code that's not right either.

I had looked into various other implementations, all different and most of them wrong (as I know now). I had my own theory but wasn't sure about it (yet I was right, hehe).
This morning, I came to wonder how GPU vendors are supposed to implement all this on the driver side when the documentation is so vague about many crucial parts. So I looked at "the other side", the Windows driver development documentation, and guess what? All the things that are left out of the consuming API documentation are there...

Required Changes

D3D11_VIDEO_PROCESSOR_STREAM stream = {
    .Enable = TRUE,
    .pInputSurface = in_view,
};
int frame = mp_refqueue_is_second_field(p->queue);
hr = ID3D11VideoContext_VideoProcessorBlt(p->video_ctx, p->video_proc,
                                          out_view, frame, 1, &stream);

  • In the Blt call above, a constant value of 0 needs to be supplied instead of frame.

  • The value of frame needs to be applied to the OutputIndex member of the D3D11_VIDEO_PROCESSOR_STREAM structure.

Then all should be good...

Docs: https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/d3dumddi/ns-d3dumddi-_dxvahdddi_stream_data
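
To make the two bullet points concrete, here is a sketch that contrasts where the field number ends up in the current code and in the proposed change. `struct blt_indices` and both functions are hypothetical; they only model the `OutputFrame` argument of the Blt call and the `OutputIndex` member of the stream structure, and the proposal itself is untested.

```c
#include <stdbool.h>

/* Hypothetical pair modelling the two values discussed above. */
struct blt_indices {
    unsigned output_frame; /* the OutputFrame argument of VideoProcessorBlt */
    unsigned output_index; /* the OutputIndex member of the stream struct */
};

/* Current code: the field number is passed as OutputFrame, while
 * OutputIndex is left at its zero-initialized default. */
static struct blt_indices indices_in_current_code(bool is_second_field)
{
    struct blt_indices idx = {
        .output_frame = is_second_field ? 1u : 0u,
        .output_index = 0u,
    };
    return idx;
}

/* Proposed change: OutputFrame stays 0 and the field number moves
 * into OutputIndex. */
static struct blt_indices indices_as_proposed(bool is_second_field)
{
    struct blt_indices idx = {
        .output_frame = 0u,
        .output_index = is_second_field ? 1u : 0u,
    };
    return idx;
}
```

In real code the second value would be written into the stream initializer next to `.Enable` and `.pInputSurface`, and the Blt call would receive a literal 0.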

@kasper93
Contributor

Thank you for the detailed analysis. I’m aware this filter is glued together, and when adding scaling, I was really hoping not to run into issues with the rest of this code.

That being said, it’s not as simple as you suggest; neither of your recommendations improves the situation, and in fact, they make it worse. But I only tried for about 10 minutes, so don't take my word too seriously.

I don't have a build and development workflow for MPV

I can look into this when I find the time. Frankly, though, you’re already well ahead with the research you’ve done. Building mpv with MSYS2 is actually quite straightforward. If you want to give it a try, see compile-windows.md. I can do it too, but it’s not a high priority for me, so I probably won't get to it soon.

@softworkz
Author

Thank you for the detailed analysis. I’m aware this filter is glued together, and when adding scaling, I was really hoping not to run into issues with the rest of this code.

I'm sure it's not related to the scaling you have added, it has always been like that (one of our beta users mentioned it's a long-standing issue in MPV).
In fact, the scaling works great. By setting video-unscaled and gpu-dumb-mode and adding some code that adapts the scale factor to the output window size, I was able to achieve playback with much lower energy consumption. It may have inferior quality, but when running on battery, it's probably an acceptable compromise for many.
So for me, the scaling has been a very welcome addition! ❤️

That being said, it’s not as simple as you suggest; neither of your recommendations improves the situation, and in fact, they make it worse. But I only tried for about 10 minutes, so don't take my word too seriously.

I kind of hoped it might be like a 10 minute thing for someone who has a working dev environment.
But as always, devil is in the details, and I'm also not sure whether the Intel and AMD deinterlacers might not depend on getting the reference frames supplied as they are indicating.
One other test I made though is that I ran the test file through ffmpeg with full-qsv hardware transcoding and the deinterlacing from vpp_qsv, so I can confirm that the Intel deinterlacing itself can actually produce proper results.

I can look into this when I find the time. Frankly, though, you’re already well ahead with the research you’ve done. Building mpv with MSYS2 is actually quite straightforward. If you want to give it a try, see compile-windows.md. I can do it too, but it’s not a high priority for me, so I probably won't get to it soon.

Originally I had expected to have to do this at some point, but mpv is very flexible and works very reliably, so my integration via libmpv is almost complete, and this is the first issue where dealing with the mpv source would be needed. Right now, I'm a bit short on time, so let's see who is first to find some time and passion for looking into this. 😆

Thanks

@kasper93
Contributor

I kind of hoped it might be like a 10 minute thing for someone who has a working dev environment.
But as always, devil is in the details, and I'm also not sure whether the Intel and AMD deinterlacers might not depend on getting the reference frames supplied as they are indicating.

Yes, I suspect this has to be set properly. In fact, mpv already has a queue for frames, so we can do it without much trouble, apart from understanding what exactly the API expects from us.

I've tested in madVR, which has all this implemented properly (I think), and it produces a proper result. There is also a little bit of flicker just after new elements appear, which makes me think that it does indeed use previous frames. That makes sense, but it needs some love put into mpv to get it all set up correctly.

@softworkz
Author

I have reviewed all implementations on GitHub that use those APIs; if you like, I can send you the links. Those known to be good all provide reference frames. The two samples from Microsoft don't, but they're just samples that keep things simple.

The important thing to consider is that for the D3D11 APIs a frame means two combined fields, while the refqueue has one entry per field (even though every two entries contain both fields). So if the current refqueue frame is frame 0 and the deinterlacer wants two future frames, we need to provide refqueue frames 2 and 4 (not 1 and 2).
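
The index mapping described above is simple stride-2 arithmetic. The helper below is a hypothetical illustration, not an mpv function, and the stride of 2 is taken directly from the statement that the refqueue holds one entry per field.

```c
#include <stddef.h>

/* Fill out[0..count-1] with the refqueue positions of the future
 * reference frames, given the position of the current entry.
 * Each D3D11 reference frame holds two fields, so consecutive
 * references are two refqueue entries apart. */
static void future_ref_positions(int current, size_t count, int *out)
{
    for (size_t k = 0; k < count; k++)
        out[k] = current + 2 * (int)(k + 1);
}
```

For the current entry 0 and two requested future frames this yields positions 2 and 4, matching the example in the comment.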
