4502522a7a changed the way mpv handled and
saved cached files. In particular, it made a separate boolean option for
actually enabling cache and left the *-dir options as purely just a path
(i.e. having a dir set didn't mean you save cache). This technically
regressed people's configs, so let's just turn the cache on by default.
Linux users already expect random stuff in ~/.cache and well everyone
else can just live with some files possibly appearing in their config
directory.
I originally left `drmprime_overlay` as higher priority because
`drmprime` was new, and because I didn't have any hardware where both
worked (only one or the other) so I couldn't compare relative
performance, and if only one worked, the priority didn't matter.
But with time and more usage, we've reached a point where we can say we
would recommend using `drmprime` in situations where both work, and
we've also been able to identify hardware where both do indeed work and
it seems that `drmprime` is more reliable.
So, let's flip them.
Vulkan Video Decoding has finally become a reality, as it's now
showing up in shipping drivers, and the ffmpeg support has been
merged.
With that in mind, this change introduces HW interop support for
ffmpeg Vulkan frames. The implementation is functionally complete - it
can display frames produced by hardware decoding, and it can work with
ffmpeg vulkan filters. There are still various caveats due to gaps and
bugs in drivers, so YMMV, as always.
Primary testing has been done on Intel, AMD, and nvidia hardware on
Linux with basic Windows testing on nvidia.
Notable caveats:
* Due to driver bugs, video decoding on nvidia does not work right now,
unless you use the Vulkan Beta driver. It can be worked around, but
requires ffmpeg changes that are not considered acceptable to merge.
* Even if those work-arounds are applied, Vulkan filters will not work
on video that was decoded by Vulkan, due to additional bugs in the
nvidia drivers. The filters do work correctly on content decoded some
other way, and then uploaded to Vulkan (eg: Decode with nvdec, upload
with --vf=format=vulkan)
* Vulkan filters can only be used with drivers that support
VK_EXT_descriptor_buffer which doesn't include Intel ANV as yet.
There is an MR outstanding for this.
* When dealing with 1080p content, there may be some visual distortion
in the bottom lines of frames due to chroma scaling incorporating the
extra hidden lines at the bottom of the frame (1080p content is
actually stored as 1088 lines), depending on the hardware/driver
combination and the scaling algorithm. This cannot be easily
addressed as the mechanical fix for it violates the Vulkan spec, and
probably requires a spec change to resolve properly.
All of these caveats will be fixed in either drivers or ffmpeg, and so
will not require mpv changes (unless something unexpected happens)
If you want to run on nvidia with the non-beta drivers, you can this
ffmpeg tree with the work-around patches:
* https://github.com/philipl/FFmpeg/tree/vulkan-nvidia-workarounds
We will need the full ra_ctx to be able to look up all the state
required to initialise an ffmpeg vulkan hwcontext, so pass let's
pass the ra_ctx instead of just the ra.
It was unsafe to return pointer to memory that was freed on another
thread, just copy the string to caller owned sturcture.
Fixes crashes when displaying passes stats with gpu-next.
This adds cache as a possible path for mpv to internally pick
(~/.cache/mpv for non-darwin unix-like systems, the usual config
directory for everyone else). For gpu shader cache and icc cache,
controlling whether or not to write such files is done with the new
--gpu-shader-cache and --icc-cache options respectively. Additionally,
--cache-on-disk no longer requires explicitly setting the --cache-dir
option. The old options, --cache-dir, --gpu-shader-cache-dir, and
--icc-cache-dir simply set an override for the directory to save cache
files. If unset, then the cache is saved in XDG_CACHE_HOME.
This reworks all of mpv's unit tests so they are compiled as separate
executables (optional) and run via meson test. Because most of the tests
are dependant on mpv's internals, existing compiled objects are
leveraged to create static libs and used when necessary. As an aside, a
function was moved into video/out/gpu/utils for sanity's sake (otherwise
most of vo would have been needed). As a plus, meson multithreads
running tests automatically and also the output no longer pollutes the
source directory. There are tests that can break due to ffmpeg changes,
so they require a specific minimum libavutil version to be built.
This came up in #9828. According to the header comments, creating a 1D
ra_tex requires height and depth to be set to 1. For a 2D texture, it
requires depth be set to 1. There were a couple of spots in mpv's code
where this wasn't being followed. Although there was no known bug from
this, the rest of the code works like this so it was a good idea to go
ahead and sync it up. As a followup, let's just add some simple asserts
to ra.c to enforce this so it doesn't go unnoticed in the future.
ra_ctx_opts.want_alpha and vo_wayland_set_opaque_region's alpha
argument are only used as bool but both are ints. Particularly for the
function argument, passing a 0 or 1 is confusing - at first glance it
looks like you're specifying an alpha value of 0 or 1.
Since they're only used as bools, make them bools.
c784820454 introduced a bool option type
as a replacement for the flag type, but didn't actually transition and
remove the flag type because it would have been too much mundane work.
For OpenGL, this is based on simply comparing GL_VENDOR strings against
a list of allowed vendors.
Co-authored-by: Nicolas F. <ovdev@fratti.ch>
Co-authored-by: Niklas Haas <git@haasn.dev>
In debug mode the macro causes an assertion failure.
In release mode it works differently and tells the compiler that it can
assume the codepath will never execute. For this reason I was conversative
in replacing it, e.g. in mpv-internal code that exhausts all valid values
of an enum or when a condition is clear from directly preceding code.
Currently, the lcms-related options are only defined when lcms2 is
enabled at build time. However, this causes issues (e.g. segfault) for
vo_gpu_next, which relies on the presence of these options (to forward
them to libplacebo).
(That libplacebo internally depends on lcms2 as well is an
implementation detail - compiling mpv *without* lcms against libplacebo
*with* lcms should be possible in principle)
Fixes: #10891Closes: #10856
This is a very simple but easy way of doing it. Ideally, it would be
nice if we could also add some sort of introspection about shader
parameters at runtime, ideally exposing the entire list of parameters as
a custom property dict. But that is a lot of effort for dubious gain.
It's worth noting that, as currently implemented, re-setting
`glsl-shader-opts` to a new value doesn't reset back previously mutated
values to their defaults.
vo_dmabuf_wayland has its own ra and context so it can handle all the
different hwdec correctly. Unfortunately, this API was pretty clearly
designed with vo_gpu/vo_gpu_next in mind and has a lot of concepts that
don't make sense for vo_dmabuf_wayland. We still want to bolt on a
ra_ctx, but instead let's rework the ra_ctx logic a little bit. First,
this introduces a hidden bool within the ra_ctx_fns that is used to hide
the wldmabuf context from users as an option (unlike the other usual
contexts). We also want to make sure that hidden contexts wouldn't be
autoprobed in the usual ra_ctx_create, so we be sure to skip those in
that function. Additionally, let's create a new ra_ctx_create_by_name
function which does exactly what says. It specifically selects a context
based on a passed string. This function has no probing or option logic
is simplified just for what vo_dmabuf_wayland needs. The api/context
validations functions are modified just a little bit to make sure hidden
contexts are skipped and the documentation is updated to remove this
entries. Fixes#10793.
Wayland VO that can display images from either vaapi or drm hwdec
The PR adds the following changes:
1. a context_wldmabuf context with no gl dependencies
2. no-op ra_wldmabuf and dmabuf_interop_wldmabuf objects
no-op because there is no need to map/unmap the drmprime buffer,
and there is no need to manage any textures.
Tested on both x86_64 and rk3399 AArch64
We've had some annoying names for interops, which we can't simply
rename because that would break config files and command lines. So we
need to put a little more effort in and add a concept of legacy names
that allow us to continue loading them, but with a warning.
The two I'm renaming here are:
* vaapi-egl -> vaapi (vaapi works with Vulkan too)
* drmprime-drm -> drmprime-overlay (actually describes what it does)
* cuda-nvdec -> cuda (cuda interop is not nvdec specific)
Today, lavfi filters are provided a hw_device from the first
hwdec_interop that was loaded, regardless of whether it's the right one
or not. In most situations where a hardware based filter is used, we
need more control over the device.
In this change, a `hwdec_interop` option is added to the lavfi wrapper
filter configuration and this is used to pick the correct hw_device to
inject into the filter or graph (in the case of a graph, all filters
get the same device).
Note that this requires the use of the explicit lavfi syntax to allow
for the extra configuration.
eg:
```
mpv --vf=hwupload
```
becomes
```
mpv --vf=lavfi=[hwupload]:hwdec_interop=cuda-nvdec
```
or
```
mpv --vf=lavfi-bridge=[hwupload]:hwdec_interop=cuda-nvdec
```
If we want to be able to handle conversion between hw formats in filter
chains, then we need to be able to load hwdec_interops from filters, as
the VO is only ever going to initialise one interop, based on its
configuration. That means that in almost all situations, only one of
the required interops will be loaded at the time the filter is
initialised.
The existing code has some assumptions that new hwdec_interops will not
be loaded after the vo has picked one to use. This change fixes two
instances:
* Refusing to load a new hwdec_interop if there is at least one
loaded already.
* Not recalculating the set of formats known to the autoconvert
filter when a new output format shows up. This leads to autoconvert
not knowing that a new format is supported when the hwdec interop is
lazily loaded.
The older overlay based drmprime hwdec should be preferred to the new
texture mapping one. This is for a few reasons:
1. In any situation where both hwdecs work, it's probably right to use
the more mature one by default, for now.
2. It seems like the overlay path primarily works on older SoCs
where the texture path is less performant, and in at least one
tested case is visually buggy, so you definitely want it to be
tried first.
3. In situations where the old hwdec doesn't work, it will fall through
to the new one.
In the confusing landscape of hardware video decoding APIs, we have had
a long standing support gap for the v4l2 based APIs implemented for the
various SoCs from Rockship, Amlogic, Allwinner, etc. While VAAPI is the
defacto default for desktop GPUs, the developers who work on these SoCs
(who are not the vendors!) have preferred to implement kernel APIs
rather than maintain a userspace driver as VAAPI would require.
While there are two v4l2 APIs (m2m and requests), and multiple forks of
ffmpeg where support for those APIs languishes without reaching
upstream, we can at least say that these APIs export frames as DRMPrime
dmabufs, and that they use the ffmpeg drm hwcontext.
With those two constants, it is possible for us to write a
hwdec-interop without worrying about the mess underneath - for the most
part.
Accordingly, this change implements a hwdec-interop for any decoder
that produces frames as DRMPrime dmabufs. The bulk of the heavy
lifting is done by the dmabuf interop code we already had from
supporting vaapi, and which I refactored for reusability in a previous
set of changes.
When we combine that with the fact that we can't probe for supported
formats, the new code in this change is pretty simple.
This change also includes the hwcontext_fns that are required for us to
be able to configure the hwcontext used by `hwdec=drm-copy`. This is
technically unrelated, but it seemed a good time to fill this gap.
From a testing perspective, I have directly tested on a RockPRO64,
while others have tested with different flavours of Rockchip and on
Amlogic, providing m2m coverage.
I have some other SoCs that I need to spin up to test with, but I don't
expect big surprises, and when we inevitably need to account for new
special cases down the line, we can do so - we won't be able to support
every possible configuration blindly.
Using cmsFLAGS_HIGHRESPRECALC results in Little-CMS generating an
internal 49x49x49 3DLUT, from which it then samples our own 3DLUT. This
is completely pointless and not only destroys the accuracy of the 3DLUT,
but also results in no additional gain from increasing the 3DLUT
precision further.
The correct flag for us to be using is cmsFLAGS_NOOPTIMIZE, which
suppresses this internal 3DLUT generation and gives us the full
precision. We can also specify cmsFLAGS_NOCACHE, which is negligible but
in theory prevents Little-CMS from unnecessary pixel equality tests.