RGB to YUV converison was previously baked into every scale shader, but
this work has been moved to the YUV packing shaders. The scale shaders
now write RGBA instead. In the case where base and output resolutions
are identical, the render texture is forwarded directly to the YUV pack
step, skipping an entire fullscreen pass.
Intel GPA, SetStablePowerState, Intel HD Graphics 530, NV12
1920x1080, Before:
RGBA -> UYVX: ~321 us
UYVX -> Y: ~480 us
UYVX -> UV: ~127 us
1920x1080, After:
[forward render texture]
RGBA -> Y: ~487 us
RGBA -> UV: ~131 us
1920x1080 -> 1280x720, Before:
RGBA -> UYVX: ~268 us
UYVX -> Y: ~209 us
UYVX -> UV: ~57 us
1920x1080 -> 1280x720, After:
RGBA -> RGBA (rescale): ~268 us
RGBA -> Y: ~210 us
RGBA -> UV: ~58 us
There are devices like the GV-USB2 that produce frames with smmoth
timestamps at an uneven pace, which causes OBS to stutter because the
unbuffered path is designed to aggressively operate on the latest frame.
We can make the unbuffered path work by making two adjustments:
- Don't discard the current frame until it has elapsed.
- Don't skip frames in the queue until they have elapsed.
The buffered path still has problems with deinterlacing GV-USB2 output,
but the unbuffered path is better anyway.
Testing:
GV-USB2, Unbuffered: Stuttering is gone!
GV-USB2, Buffered: No regression (still broken).
SC-512N1-L/DVI, Unbuffered: No regression (still works).
SC-512N1-L/DVI, Buffered: No regression (still works).
It's a waste of GPU time to do two fullscreen passes to render final mix
previews. Use blend states to simulate the black background of
DrawBackdrop() for the following situations:
- Main preview window (Studio Mode off)
- Studio Mode: Program
This does not effect:
- Studio Mode: Preview (still uses DrawBackdrop)
- Fullscreen Projector (uses GPU clear to black)
- Windowed Projector (uses GPU clear to black)
intel GPA, SetStablePowerState, Intel HD Graphics 530, 1920x1080
Before:
DrawBackdrop: ~529 us
main texture: ~367 us (Cheaper than drawing a black quad?)
After:
[DrawBackdrop optimized away]
main texture: ~383 us
Add a separate shader for area upscaling to take advantage of bilinear
filtering. Iterating over texels is unnecessary in the upscale case
because a target pixel can only overlap 1 or 2 texels in X and Y
directions. When only overlapping one texel, adjust UVs to sample texel
center to avoid filtering.
Also add "base_dimension" uniform to avoid unnecessary division.
Intel HD Graphics 530, 644x478 -> 1323x1080: ~836 us -> ~232 us
Unlike get_properties, there is not reason to not call get_defaults if it is
given in addition to get_defaults2. Additonally this fixes the bug with
'init_encoder' which would only ever call get_defaults, resulting in broken
encoders if those used get_defaults2.
This implements pausing of outputs. To accomplish this, raw audio/video
data is halted to the encoders or raw output. Pausing is as precisely
timed as possible according to the timing of the obs_output_pause call,
and audio data will be spliced down to the exact audio sample in
accordance to that timing at the start/end marks.
Outputs that support this (outputs used for recording) can set the
OBS_OUTPUT_CAN_PAUSE capability flag.
If the audio subsystem was buffered to any extent, the audio of a raw
output would start off at a negative offset, requiring each raw output
to implement a "prepare_audio" function (as seen in the FFmpeg output)
in order to ensure proper synchronization with video. This did not
apply to encoded outputs because it was already being performed by the
obs-encoder code.
If an audio source does not provide enough data at a steady pace, the
timestamp update does not happen, and buffering increases until it
maxes out. To counteract this, update the timestamp anyway.
Another issue for decoupled audio sources is that timing is not
adjusted for divergence from system time. Making this adjustment is
better for timing stability.
5+ hours of stable audio without any buffering on my GV-USB2 where it
used to add 21ms every 5 mintues or so.
Fixes https://obsproject.com/mantis/view.php?id=1269
Fix ternary test to use BGRX render targets for YUV to RGB
conversions. The previous behavior may have been fine though since
the shaders fill the alpha channel with 1.0 anyway.
Code submissions have continually suffered from formatting
inconsistencies that constantly have to be addressed. Using
clang-format simplifies this by making code formatting more consistent,
and allows automation of the code formatting so that maintainers can
focus more on the code itself instead of code formatting.
In 'libobs/CMakeLists.txt', use '${CMAKE_INSTALL_LIBDIR}' instead of
'${CMAKE_INSTALL_PREFIX}/lib', as the latter results into 'libobs.pc'
being installed under '/lib' when '/lib64' would be more appropriate.
In 'libobs/libobs.pc.in', use '@CMAKE_INSTALL_FULL_LIBDIR@' for
'libdir', '@CMAKE_INSTALL_FULL_INCLUDEDIR@' for 'includedir',
and '@CMAKE_INSTALL_PREFIX@' for 'prefix'.
Gentoo-Bug: https://bugs.gentoo.org/644538
The cache coherency of rasterization for full-screen passes is better
using an oversized triangle that is clipped rather than two triangles.
Traversal order of rasterization is GPU-specific, but will almost
certainly be better using an undivided primitive.
A smaller benefit is that quads along the diagonal are not evaluated
multiple times, but that's minor in comparison.
Redo format shaders to bypass vertex buffer, and input layout. Add
global shader bool "obs_glsl_compile" to make API-specific decisions,
i.e. handle upside-down UVs. gl_ortho is not needed for format
conversion because the vertex shader does not use ViewProj anymore.
This can be applied to more situations, but start small first.
Testbed full screen passes, Intel HD Graphics 530:
RGBA -> UYVX: 467 -> 439 us, ~6% savings
UYVX -> uv: 295 -> 239 us, ~19% savings
Similar to item_visible, this event fires whenever a scene item is
locked or unlocked. This allows the UI and libobs to remain in sync
regarding scene elements' statuses.
Switch for loop to do/while because we know the condition is always
true for the first loop.
Replace int math with float math to play nicely with more GPUs.
Add variables imagesize/targetsize to avoid redundant reciprocals.
Intel GPA results: 1166 -> 836 us
Remove three instances of unnecessary double-buffering. They are not
needed to avoid stalls, and cause increased memory traffic when
measured on Intel HD 530, presumably because texture data will remain
in cache if sampled immediately after write.
(Note: GPU timings from Intel GPA are volatile.)
NV12, 3 Draws:
RGBA -> UYVX: 628 us -> 543 us
UYVX -> Y: 522 us -> 507 us
UYVX -> UV: 315 us -> 187 us
Total, Duration: 1594 us -> 1153 us
Total, GTI Read Throughput: 25.2 MB -> 15.9 MB
Normally, paired encoders are unpaired when they stop. However, if the
pairing occurs before the encoders actually start, and the encoders
never actually end up starting, they are never unpaired, and that
pairing stays with them until the next time an output is started up
again. That in turn can cause an output that uses one of the encoders
but not the other to not function correctly, and neither properly
"start" nor stop because the data is queued continually in the
interleaved packet array.
For example, let's say there are two outputs, two video encoders, and
one audio encoder. This can be reproduced by using advanced output mode
and making the two outputs use separate video encoders while sharing
track 1's audio encoder. If you start up the stream output first and it
fails to fully connect for whatever reason (bad server, bad stream key,
etc), then you start up the recording output, the recording output will
appear to be running, but will not stop when you hit "stop recording".
It will stay perpetually on "stopping recording" and will get stuck that
way. This is because when the streaming output started, the streaming
output would initially pair video encoder A with audio encoder A before
the encoders actually fully started up (as the encoders do not fully
start up until a connection is successfully made), and when the
recording output starts up after that disconnection, audio encoder A
will wait for video encoder A rather than video encoder B because that
pairing was never actually cleared.
So, instead of pairing encoders when the output starts, wait until the
encoders themselves are being started and then pair the encoders at that
point in time. This ensures that the encoders start up and will clear
their pairing when no longer in use.
A previous refactoring to make DrawMatrix unnecessary has left behind
unreachable YUV conversions. Even if this code was somehow reachable,
DrawMatrix for YUV -> RGB doesn't exist anymore, so they would render
incorrectly anyway.
(This commit also modifies the UI, obs-ffmpeg, and obs-output modules)
Fixes a long-time regression where the program would lock up if an
encode call fails. Shuts down all outputs associated with the failing
encoder and displays an error message to the user.
Ideally, it would be best if a more detailed error could be displayed to
the user about the nature of the error, though the primary problem is
the encoder errors are typically not something the user would be able to
understand. The current message is a bit of a generic error message;
improvement is welcome.
Another suggestion is to try to have the encoder restart seamlessly,
though it would take a significant amount of work to be able to make it
do something like that properly, and it sort of assumes that encoder
failures are sporadic, which may not necessarily be the case with some
hardware encoders on some systems. It may be better just to use another
encoder in that case. For now, seamless restart is ruled out.
The issue with the current bilinear_lowres_scale effect is that it
samples adjacent texels, disregarding the texel-to-pixel ratio. If the
ratio is large, this can lead to aliasing. This change provides a fair
set of texture samples across the entire pixel.
The 8-sample pattern used here comes from Direct3D.
This commit adds a function to forcefully stop a transition, and to
increment/decrement the showing counter for a source with the MAIN_VIEW
type.
These functions are needed for the transition previews to work as
intended.
There are cases where alpha is multiplied unnecessarily. This change
attempts to use premultiplied alpha blending for composition.
To keep this change simple, The filter chain will continue to use
straight alpha. Otherwise, every source would need to modified to output
premultiplied, and every filter modified for premultiplied input.
"DrawAlphaDivide" shader techniques have been added to convert from
premultiplied alpha to straight alpha for final output. "DrawMatrix"
techniques ignore alpha, so they do not appear to need changing.
One remaining issue is that scale effects are set up here to use the
same shader logic for both scale filters (straight alpha - incorrectly),
and output composition (premultiplied alpha - correctly). A fix could be
made to add additional shaders for straight alpha, but the "real" fix
may be to eliminate the straight alpha path at some point.
For graphics, SrcBlendAlpha and DestBlendAlpha were both ONE, and could
combine together to form alpha values greater than one. This is not as
noticeable of a problem for UNORM targets because the channels are
clamped, but it will likely become a problem in more situations if FLOAT
targets are used.
This change switches DestBlendAlpha to INVSRCALPHA. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
obs-transitions: Use premultiplied alpha blend, and simplify shaders
because both inputs and outputs use premultiplied alpha now.
Fixes https://obsproject.com/mantis/view.php?id=1108
Ensures that functions loaded by `os_dlsym()` come only from the
specified library that was loaded with `os_dlopen()` rather than the set
of libraries loaded by the specified library.
libobs: Add support for limited to full color range conversions when
using RGB or Y800 formats, and move RGB converison for Y800 formats to
the GPU.
decklink: Stop hiding color space/range properties for RGB formats, and
remove "YUV" from "YUV Color Space" and "YUV Color Range".
win-dshow: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
UI: Remove "YUV" from "YUV Color Space" and "YUV Color Range".
Fixes handling of the `obs_source_frame::full_range` member variable,
which is often set to false by default by many plugins even when using
RGB, which would cause RGB to be marked as "partial range". This change
is crucial for when partial range RBG support is implemented.
Adds `obs_source_frame2` structure that replaces the `full_range` member
variable with a `range` variable, which uses the `video_range_type` enum
to allow handling default range values. This member variable treats
VIDEO_RANGE_DEFAULT as full range if the format is RGB, and partial
range if the format is YUV.
Also adds `obs_source_output_video2` and `obs_source_preload_video2`
functions which use the `obs_source_frame2` structure instead of the
`obs_source_frame` structure.
When using the original `obs_source_frame`, `obs_source_output_video`,
and `obs_source_preload_video` functions, RGB will always be full range
by default for backward compatibility purposes.
The line drawing functions previously assumed the upper-left 3x3 for
box_transform only held scale. The matrix can also hold rotation, so
pass in scale separately.
Fixes https://obsproject.com/mantis/view.php?id=1442
Property groups allow multiple properties to be combined into a single
visual group and thus can be used to reduce visual clutter and improve
user experience by giving additional context to a property. The name of
a child of a group is not affected by the group and thus all property
names must still be unique.
A new parent field has been added to obs_properties_t to ease the
tracking of existing properties. That way properties inside a group will
also be able to verify that they have a unique name.
Currently several shaders need "DrawMatrix" techniques to support the
possibility that the input texture is a "YUV" format. Also, "DrawMatrix"
is overloaded for translation in both directions when it is written for
RGB to "YUV" only.
A cleaner solution is to handle "YUV" to RGB up-front as part of format
conversion, and ensure only RGB inputs reach the other shaders. This is
necessary to someday perform correct scale filtering without the cost of
redundant "YUV" conversions per texture tap.
A necessary prerequisite for this is to add conversion support for
VIDEO_FORMAT_I444, and that is now in place. There was already a hack in
place to cover VIDEO_FORMAT_Y800. All other "YUV" formats already have
conversion functions.
"DrawMatrix" has been removed from shaders that only supported "YUV" to
RGB conversions. It still exists in shaders that perform RGB to "YUV"
conversions, and the implementations have been sanitized accordingly.
Adds more log information for effects, such as the reformatted effect code, parameters, techniques and passes. This is useful for tracking down issues and what actually ends up being compiled in the end, without having to guess where the error is actually at, as what is compiled and what was originally there usually differ by a lot.
This information is only logged if libobs is compiled in debug and _DEBUG_SHADERS is defined and not compiled if either of those are not the case.
Add D3D/GL debug markers to make RenderDoc captures easier to tranverse.
Also add obs_source_get_name_no_null() to avoid boilerplate for safe
string formatting.
Closesobsproject/obs-studio#1799
Add support for debug markers via D3DPERF API and KHR_debug. This makes
it easier to understand RenderDoc captures.
D3DPERF is preferred to ID3DUserDefinedAnnotation because it supports
colors. d3d9.lib is now linked in to support this.
This feature is disabled by default, and is controlled by
GS_USE_DEBUG_MARKERS.
From: obsproject/obs-studio#1799
We can't compare addresses of ComPtr for self-reference directly because
the address-of operator is overloaded, causing a compiler error. This
fix more or less matches the WRL implementation.
Currently SrcBlendAlpha and DestBlendAlpha are both ONE, and can
combine together to form two. This is not a noticeable problem for
UNORM targets because the channels are clamped, but it will likely
become a problem if FLOAT targets are more widely used.
This change switches DestBlendAlpha to INVSRCALPHA, and starts
backgrounds as opaque black instead of transparent black. The blending
behavior of stacked transparents is preserved without overflowing the
alpha channel.
Allows removing a property from a properties list in get_properties or
modified callback to enable dynamic property generation. The behavior
is undefined if the UI properties list is not refreshed by returning
true from the modified callback.
It appears there's a projection flip that is applied in some situations,
like the preview pane in studio mode, and the shader math fails when
it's active causing the output color to be zero. This fixes the math for
GLSL (with a tiny redundancy penalty to HLSL), and cleans up some
unnecessary code along the way.
Use abs() to avoid zero area in case the OpenGL projection flip is
active. Also simplify the math, and remove the unnecessary sampler
state.
While performing a release candidate, it is important to use an actual
version before the actual release candidate version so that the
auto-updater will know to update to 23.1.0
OBS_EFFECT_AREA from 7d811499e was inserted in the middle of the enum,
which breaks ABI for any binaries that use
OBS_EFFECT_PREMULTIPLIED_ALPHA or OBS_EFFECT_BILINEAR_LOWRES.