Until recently, ao_lavc and vo_lavc started encoding whenever the core
happened to send them data. Since audio and video are not initialized at
the same time, and the muxer was not necessarily opened when the first
encoder started to produce data, the resulting packets were put into a
queue. As soon as the muxer was opened, the queue was flushed.
Change this to make the core wait with sending data until all encoders
are initialized. This has the advantage that we don't need to queue up
the packets.
The main change is that we wait with opening the muxer ("writing
headers") until we have data from all streams. This fixes race
conditions at init due to broken assumptions in the old code.
This also changes a lot of other stuff. I found and fixed a few API
violations (often things for which better mechanisms were invented, and
the old ones are not valid anymore). I try to get away from the public
mutex and shared fields in encode_lavc_context. For now it's still
needed for some timestamp-related fields, but most are gone. It also
removes some bad code duplication between audio and video paths.
Print them as a warning.
Note that there may be some cases where it underruns, without being a
bad condition. This could possibly happen e.g. if the last chunk is
written, and then it resumes playback some time after that. Eventually I
want to add more code to avoid such spurious warnings.
There is a dedicated thread for feeding audio to the ALSA API from a
buffer with a larger size. There is little reason to have such a large
device buffer.
One can now set the number of buffers and the buffer size.
This can reduce the CPU usage and the total latency stays mostly the same.
As there are sync mechanisms the A/V sync continue intact and working.
It also modifies 6.1 channel order, as per OpenAL spec
and add AOPLAY_FINAL_CHUNK support
OpenAL Soft's AL_SOFT_source_latency extension allows one to correctly
get the device output latency, facilitating the syncronization with
video.
Also added a simpler generic fallback that does not take into account
latency of the device.
Uses OpenAL Soft's AL_DIRECT_CHANNELS_SOFT extension and can be controlled through
a new CLI option, --openal-direct-channels.
This allows one to send the audio data direrctly to the desired channel without
effects applied.
Although half (non-fast track on sink rate) or one-third (non-fast track not on sink rate) of the buffer size of the created AudioTrack instance as the SL Enqueue buffer size is basically enough for dropout-free playback, only using the full size can avoid stutter upon (re)start of playback.
Here are the various buffer sizes on different track/sink rate when on Bluetooth audio on Android O:
aptX @ 48kHz:
Sink rate: 48000 Hz
44100 Hz: 10632 frames (241.09 ms)
48000 Hz: 11544 frames (240.50 ms)
88200 Hz: 21216 frames (240.54 ms)
96000 Hz: 23088 frames (240.50 ms)
176400 Hz: 42384 frames (240.27 ms)
192000 Hz: 46128 frames (240.25 ms)
SBC/AAC/aptX @ 44.1kHz:
Sink rate: 44100 Hz
44100 Hz: 10776 frames (244.35 ms)
48000 Hz: 11748 frames (244.75 ms)
88200 Hz: 21552 frames (244.35 ms)
96000 Hz: 23448 frames (244.25 ms)
176400 Hz: 43056 frames (244.08 ms)
192000 Hz: 46848 frames (244.00 ms)
The above results were produced with the following code:
import android.media.AudioAttributes;
import android.media.AudioFormat;
import android.media.AudioTrack;
class AudioInfo {
public static void main(String[] args) {
int nosr = AudioTrack.getNativeOutputSampleRate(3);
System.out.printf("Sink rate: %d Hz\n", nosr);
int[] rates = {44100,48000,88200,96000,176400,192000};
for (int rate: rates) {
AudioAttributes aa = new AudioAttributes.Builder().setFlags(256).build();
AudioFormat af = new AudioFormat.Builder().setSampleRate(rate).build();
AudioTrack at = new AudioTrack(aa, af, 4, 1, 0);
int sr = at.getSampleRate();
int bs = at.getBufferSizeInFrames();
float ms = bs * (float) 1000 / sr;
at.release();
System.out.printf("%d Hz: %d frames (%.2f ms)\n", sr, bs, ms);
}
}
}
Therefore bumping the device buffer size to 250ms.
If you set desired.samples to 0, SDL will return a default buffer size
on obtained.samples. This was broken, because ceil_power_of_two(0)
returns 1. Since 0 is usually not considered a power of two, this is
probably correct, but we still want to set desired.samples to 0 in this
case.
You can use --audio-buffer=0 to minimize the audio buffer size. But if
the AO reports no device buffer size (like e.g. ao_jack does), then the
buffer size is actually 0, and playback can never work properly.
Make it fallback to a size of 1, which is unlikely to work properly, but
you get what you asked for, instead of a freeze.
While the soft buffer size is already by default 200ms, it is not enough to guarantee dropout-free playback on Bluetooth audio. Bumping the device buffer size to the same value seems to suffice.
Always make the hw params dump function use MSGL_DEBUG, and remove the
MSGL_V use. That means you need -v -v to see them. The detailed
information is usually not very interesting, so this reduces the log
noise.
The af_get_best_sample_formats() function had an argument of
int[AF_FORMAT_COUNT], which is slightly incorrect, because it's 0
terminated and should in theory have AF_FORMAT_COUNT+1 entries. It won't
actually write this many formats (since some formats are fundamentally
incompatible), but it still feels annoying and incorrect. So fix it, and
require that callers pass an AF_FORMAT_COUNT+1 array.
Note that the array size has no meaning in C function arguments (just
another issue with C static arrays being weird and stupid), so get rid
of it completely.
Not changing the af_lavcac3enc use, since that is rewritten in another
branch anyway.
This commit eliminates the following clang warning:
warning: macro expansion producing 'defined' has undefined behavior [-Wexpansion-to-defined]
Going by the clang commit message, this seems to be explicitly specified
as UB by the standard, and they added this warning because MSVC
apparently results in different behavior. Whatever, we can just avoid
the warning with some small changes.
stdatomic.h defines no atomic_float typedef. We can't just use _Atomic
unconditionally, because we support compilers without C11 atomics. So
just create a custom atomic_float typedef in the wrapper, which uses
_Atomic in the C11 code path.
This does what af_volume used to do. Since we couldn't relicense it,
just rewrite it. Since we don't have a new filter mechanism yet, and the
libavfilter is too inconvenient, do applying the volume gain in ao.c
directly. This is done before handling the audio data to the driver.
Since push.c runs a separate thread, and pull.c is called asynchronously
from the audio driver's thread, the volume value needs to be
synchronized. There's no existing central mutex, so do some shit with
atomics. Since there's no atomic_float type predefined (which is at
least needed when using the legacy wrapper), do some nonsense about
reinterpret casting the float value to an int for the purpose of atomic
access. Not sure if using memcpy() is undefined behavior, but for now I
don't care.
The advantage of not using a filter is lower complexity (no filter auto
insertion), and lower latency (gain processing is done after our
internal audio buffer of at least 200ms).
Disavdantages include inability to use native volume control _before_
other filters with custom filter chains, and the need to add new
processing for each new sample type.
Since this doesn't reuse any of the old GPL code, nor does indirectly
rely on it, volume and replaygain handling now works in LGPL mode.
How to process the gain is inspired by libavfilter's af_volume (LGPL).
In particular, we use exactly the same rounding, and we quantize
processing for integer sample types by 256 steps. Some of libavfilter's
copyright may or may not apply, but I think not, and it's the same
license anyway.
Looks like this is covered by LGPL relicensing agreements now.
Notes about contributors who could not be reached or who didn't agree:
Commit 7fccb6486e has tons of mp_msg changes look like they are not
copyrightable (even if they were, all mp_msg calls were rewritten in
mpv times again). The additional play() change looks suspicious, but
the function was rewritten several times anyway (first time after that
commit in 4f40ec312).
Commit 89ed1748ae was rewritten in commit 325311af3 and then again
several times after that. Basically all this code is unnecessary in
modern mpv and has been removed.
No code survived from the following commits: 4d31c3c53, 61ecf838f2,
d38968bd, 4deb67c3f. At least two cosmetic typo fixes are not
considered as well.
Commit 22bb046ad is reverted (this wasn't a valid warning anyway, just
a C++-ism icc applied to C). Using the constants is nicer, but at least
I don't have to decide whether that change was copyrightable.
I _think_ this confuses Coverity and it thinks there is uninitialized
data to be read. Initialize the array to change/remove the warning, or
if there's a real problem, to make it easier to detect. (Basically apply
defensive coding.)
This should actually cover all of them, if you take into account that
some unchanged GPL source files include header files with such checks.
Also this was done already for the libaf derived code.
This is only for "safety" and to avoid misunderstandings.
Just reimplement it in some way, as mp_audio is GPL-only.
Actually I wanted to get rid of audio_buffer.c completely (and instead
have a list of mp_aframes), but to do so would require rewriting some
more player core audio code. So to get this LGPL relicensing over
quickly, just do some extra work.
Completely untested (rsound dev libs unavailable on my system). Trivial
enough that it's very likely that it'll just work. No port selection,
but could be added by parsing it as part of the device name.
Should fix#4714.
This is pretty pointless, but I believe it allows us to claim that the
new code is not affected by the copyright of the old code. This is
needed, because the original mp_audio struct was written by someone who
has disagreed with LGPL relicensing (it was called af_data at the time,
and was defined in af.h).
The "GPL'ed" struct contents that surive are pretty trivial: just the
data pointer, and some metadata like the format, samplerate, etc. - but
at least in this case, any new code would be extremely similar anyway,
and I'm not really sure whether it's OK to claim different copyright. So
what we do is we just use AVFrame (which of course is LGPL with 100%
certainty), and add some accessors around it to adapt it to mpv
conventions.
Also, this gets rid of some annoying conventions of mp_audio, like the
struct fields that require using an accessor to write to them anyway.
For the most part, this change is only dumb replacements of mp_audio
related functions and fields. One minor actual change is that you can't
allocate the new type on the stack anymore.
Some code still uses mp_audio. All audio filter code will be deleted, so
it makes no sense to convert this code. (Audio filters which are LGPL
and which we keep will have to be ported to a new filter infrastructure
anyway.) player/audio.c uses it because it interacts with the old filter
code. push.c has some complex use of mp_audio and mp_audio_buffer, but
this and pull.c will most likely be rewritten to do something else.
Any bad HRESULTs should have been printed already and lots of failure modes
don't have an HRESULT leading to awkward hr = E_FAIL business.
This also checks the exit status of GetBufferSize in the align hack. A final
fatal message is added if either of the retry hacks fail.
Previously, the entire convert_buffer was being copied to the desination without
regard to the fact that it may be packed and therefore smaller.
The allocated conversion buffer was also way to big
bytes * (channels * samples) ** 2
instead of
bytes * channels * samples
This shouldn't affect which are chosen, but it should speed up the search by
putting more common configurations earlier so that a working sample format and
sample rates can be found sooner obviating the need to search them for each
iteration of the outer loops.
The loop to select the native wasapi_format for the incoming audio was
not breaking correctly when it found the most desirable format. It
therefore executed completely leaving the least desirable format (u8) as
the choice.
fixes#4582
I'd actually be somewhat interested in supporting this, as it could help
testing the S24 conversion code. But then again it's only a pain,
there's no immediate need, and it would require new options to make
ao_pcm.c select this output format at all.
Do conversion directly, using the infrastructure that was added before.
This also rewrites part of format negotation, I guess.
I couldn't test the format that was used for S24 - my hardware does not
report support for it. So I commented it, as it could be buggy. Testing
this with the wasapi_formats[] entry for 24/24 uncommented would be
appreciated.
Instead of the infrastructure added in the previous commit to do the
conversion within the AO.
If this is used, and snd_pcm_status_get_avail() returns more frames than
snd_pcm_write*() actually accepts, you will get some nice audio
corruption.
Also, this mutates the data passed via play(), which is rather fishy,
but sort of doesn't matter for now. Surely this will cause unintended
bugs and WTFs.
I plan to remove the S24 sample formats in mpv. It seems like we should
still support this _somehow_ in AOs though. So the idea is to convert
the data to more obscure representations (that would not be useful for
filtering etc. anyway) within the AO.
This commit adds helper to enable this. ao_convert_fmt is meant to
provide mechanisms for this, rather than a generic audio format
description (as the latter leads only to overly generic misery). The
conversion also supports only cases which we think will be needed at
all.
The main advantage of this approach is that we get S24 out of sight,
and that we could support other crazy formats (like S20). The main
disadvantage is that usually S32 will be selected (if both S32 and S24
are available), and there's no user control to force S24. That doesn't
really matter though, and at worst makes testing harder or will lead
to unpleasant arguments with audiophiles (they'd be wrong anyway).
ao_convert_fmt.pad_lsb is ignored, although if we ever find a case in
which playing S32 with data in the LSBs breaks when playing it as padded
24 bit format. (For example, WAVEFORMATEXTENSIBLE recommends setting the
unused bits to 0 if wValidBitsPerSample implies LSB padding.)
UWP does not support the whole IMMDevice API. Instead, you need to use a
new API (available starting from Windows 8), which is in addition not in
MinGW, and extremely unpleasant to use.
The wasapiuwp2.dll wrapper is a small custom MSVC DLL, which does this
instead, and returns a normal IAudioClient.
Before this, ao_wasapi did not initialize on UWP.
While this is perfectly OK on Unix, it causes annoying valgrind
warnings, and might be otherwise confusing to others.
On Windows, the runtime can actually abort the process if this is
called.
push.c part taken from a patch by Pedro Pombeiro.
The code accounting for the terrible AUDCLNT_E_BUFFER_SIZE_NOT_ALIGNED
semantics (which MSDN claims can happen "starting with Windows 7" - so
probably on Windows 10 too) duplicated the call for creating the
IAudioClient. That's not great, so get rid of it.
Let wasapi_thread_init() handle this. It has a retry loop anyway. This
redoes device lookup and format negotiation, but potential failures due
to race conditions (what if the driver decides to change behavior)
shouldn't be worse than before.
Before this change, AOs could have internal alignment, and play() would
not consume the trailing data if the size passed to it is not aligned.
Change this to require AOs to report their alignment (via period_size),
and make sure to always send aligned data.
The buffer reported by get_space() now always has to be correct and
reliable. If play() does not consume all data provided (which is bounded
by get_space()), an error is printed.
This is preparation for potential further AO changes.
I casually checked alsa/lavc/null/pcm, the other AOs might or might not
work.
Right now, the current order pretty much means that pulse defaults to
S16 for arbitrary unsupported formats, but fallback to float would make
more sense since it's the easiest to convert everything to without
requiring dithering, and PA will probably just internally convert things
to float anyway.
Also move S32 above S16, which essentially means format_maps is sorted
by preference. (Although ao_pulse currently ignores this and always
picks the first as a fallback)
The user bugmen0t was apparently a shared github account with publicly
available login. Thus, we can't get LGPL relicensing permission from the
people who used this account. To relicense successfully, we have to
remove all their changes.
This commit should remove 20d1fc13, f26fb009, defbe48d. It also should
remove whatever test fragments were copied from the ancient configure,
as well as some configure logic (potentially that device path stuff).
I think this change still preserves the most important use-cases of OSS:
BSDs, and the Linux OSS emulation (the latter for testing only).
According to an OSS user, the 4front checks were probably broken anyway.
The SunAudio stuff was probably for (Open)Solaris, which is dead.
ao_oss.c itself will remain GPL, and still contains bugmen0t changes.
All relevant authors have agreed to the relicensing.
Problem cases:
eca47b1a5e: someone else gets credited for the "idea" of this change,
but it doesn't seem like it was a patch (otherwise reimar would have
said "patch"). Also, the associated code got essentially removed again
anyway. (The option parsing was rewritten fully.)
ffb529e4eb: anonymous/unknown author, but the code was fully removed
anyway. The struct was removed, and the modern code does explicit
read/write calls.
40789473d2: author was not contacted, but this code was removed
anyway. The magic number (0x7ffff000) is still in the new code, but I
don't think that is copyright relevant.
c750b8ab2d: the message was entirely removed.
All contributors of the current code have agreed. ao.c requires a
"driver" entry for each audio output - we assume that if someone who
didn't agree to LGPL added a line, it's fine for ao.c to be LGPL
anyway. If the affected audio output is not disabled at compilation
time, the resulting binary will be GPL anyway, and ootherwise the
code is not included.
The audio output code itself was inspired or partially copied from
libao in 7a2eec4b59 (thus why MPlayer's audio code is named libao2).
Just to be sure we got permission from Aaron Holtzman, Jack Moffitt, and
Stan Seibert, who according to libao's SVN history and README are the
initial author. (Something similar was done for libvo, although the
commit relicensing it forgot to mention it.)
242aa6ebd4: anders mostly disagreed with the LGPL relicensing, but we
got permission for this particular commit.
0ef8e55573: nick could not be reached, but the include statement was
removed again anyway.
879e05a7c1: iive agreed to LGPL v3+ only, but this line of code was
removed anyway, so ao_null.c can be LGPL v2.1+.
9dd8f241ac: patch author could not be reached, but the corresponding
code (old slave mode interface) was completely removed later.
All authors have agreed.
One exception is 71247a97b3, whose author was not asked, but we deem
the change as trivial. (And technically it was replaced when the audio
chain dropped non-native endian sample formats.)
All authors have agreed to the relicensing.
The code was pretty much rewritten by Stefano Pigozzi. Since the rewrite
happened incrementally, and seems to include refactored portions of
older code, this relicensing was done on the pre-refactor code do.
The original commit adding this AO (as ao_macosx.c) credits Timothy J.
Wood as original author. He was asked and agreed to LGPL. It's not
entirely sure from which project this code came from, but it's probably
libao. In that project, Stanley Seibert made some changes to it (who as
a major developer of libao was asked just to be sure), and also Ralph
Giles and Ben Hines made two small changes. The latter were not asked,
but none of their code survived anyway.
IMMDeviceEnumerator_RegisterEndpointNotificationCallback() will start
listening for notifications, and is the point at which callbacks can
start firing. These callbacks will read the fields we set after the
register calls, which is a potential race condition. Move it upwards.
This tried to use AF_FORMAT_DOUBLE as KSDATAFORMAT_SUBTYPE_IEEE_FLOAT,
with wBitsPerSample==64. This is probably not allowed, and drivers
appear to react inconsistently to it. (With one user, the format was
accepted during format negotiation, but then rejected on actual init.)
Remove it, which essentially forces it to fall back to some other
format. (Looks like it'll use af_select_best_samplerate(), which would
probably make it try S32 next.)
The af_fmt_from_planar() is so that we don't have to care about
AF_FORMAT_FLOATP. Wasapi always requires packed data anyway.
This should actually handle other potentially unknown sample formats
better.
This changes that set_waveformat() always set the exact format. Now it
might set a "close" format instead. But all callers seem to deal with
this well. Although in theory, callers should probably handle the
fallback. The next cleanup (if ever) can take care of this.
The buffer_size may be updated before the process callback is called for
the first time. Or, the connection graph could change, which changes the
latency of the pipeline after mpv's output. Ensure we keep on top of
these changes by registering callbacks to update our latency estimation.
It appears some device can be missing if we filter too many. In
particular, I've seen devices starting with "front" and "sysdefault"
being mapped to different hardware. I conclude that it's not sane trying
to present a nice device list to users in ALSA. It's fucked. (Although
kodi appears to attempt some intense "beautification" of the device
list, which includes parsing parameters from the device name and such.
Well, let's not.)
No other audio API requires such ridiculous acrobatics.
Apparently POLLERR can be set if poll is called while the device is in
the SND_PCM_STATE_PREPARED state. So assume that we can simply call
snd_pcm_status() to check whether the error is because the device went
away (i.e. we expect it to return ENODEV if this happened).
This avoids sporadic device lost warnings and AO reloads. The actual
device lost case is untested.
For example, previously, --audio-device='alsa/' would provide ao->device="" to
the alsa driver in spite of the fact that this is an already parsed option. To
avoid requiring a check of ao->device[0] in every driver, make sure this never
happens.