This may be faster or slower than the existing specialized kernels,
so I opted not to prefer it by default. I also deliberately didn't expose
additional filter function capabilities yet.
The main motivating reason here is to get correct anti-aliasing behavior
when downscaling, which is currently completely broken.
Signed-off-by: Niklas Haas <git@haasn.dev>
Useful for GPU-based filters, which may also need to compute filter weights.
Since we cannot cross-link to internal functions, we need to recompile this
helper inside libavfilter.c.
Signed-off-by: Niklas Haas <git@haasn.dev>
This fallback function is used if external MMX is available,
while inline MMX and intrinsics for emitting emms are unavailable.
It is implemented as an avpriv function, which has several
drawbacks for shared builds:
1. The function is so small (3 bytes; 16 with padding)
that the overhead of exporting and importing it dwarfs
the gains from code deduplication.
2. A call to an external library has more overhead than
a library-internal one.
3. It may cause linking failures when a libavutil not exporting
avpriv_emms_asm() is paired with a library needing it
(if inline assembly and intrinsics were unavailable when building
the dependent library). I am not aware of this ever happening.
4. We would be forced to keep avpriv_emms_asm() around for ABI stability
even after it is no longer needed.
This commit therefore uses the STLIBOBJS, SHLIBOBJS approach
to duplicate it into each library that needs it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
libnpp and the corresponding filters were deprecated
in commit 994a368451
on 2025-09-26. By the time of our next release,
a year will have passed, so they are removed immediately.
Note: Passing --enable-libnpp to configure results in
a warning about the deprecation and is otherwise a no-op.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ass filter exposes libass' shaping mode selection so callers can
request complex shaping for scripts such as Arabic. The subtitles filter
uses the same renderer path but did not expose the option.
This left the zero-initialized shaping field to select
ASS_SHAPING_SIMPLE implicitly.
Expose the same shaping option for subtitles and default it to auto,
matching the ass filter. This allows subtitles=...:shaping=complex to
render Arabic lam-alef correctly when libass is built with HarfBuzz
support.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This patch adds ONNX Runtime as a new DNN backend for FFmpeg's dnn_processing
filter, enabling hardware-accelerated neural network inference on multiple
GPU and NPU platforms.
Execution Providers Supported:
- CPU execution provider (default)
- CUDA execution provider (NVIDIA GPUs)
- DirectML execution provider (AMD/Intel/NVIDIA GPUs on Windows)
- VitisAI execution provider (AMD Ryzen AI NPU)
The options for dnn_processing with dnn_backend=onnx:
- device: execution provider, one of cpu, cuda, dml, or vitisai (default: cpu)
- device_id: GPU device index (default: 0)
- threads_per_operation: inference thread count for CPU EP (default: 0, auto)
- input: input tensor name. When omitted, the backend resolves it from the loaded session
- output: output tensor name. When omitted, the backend resolves it from the loaded session
Example usage:
# CPU inference
ffmpeg -i input.mp4 -vf "format=rgb24,dnn_processing=dnn_backend=onnx:model=model.onnx:input=image_in:output=image_out" output.mp4
# CUDA GPU inference
ffmpeg -i input.mp4 -vf "dnn_processing=dnn_backend=onnx:model=model.onnx:device=cuda:device_id=0" output.mp4
# DirectML GPU inference (Windows)
ffmpeg -i input.mp4 -vf "dnn_processing=dnn_backend=onnx:model=model.onnx:device=dml:device_id=0" output.mp4
# VitisAI NPU inference
ffmpeg -i input.mp4 -vf "dnn_processing=dnn_backend=onnx:model=model.onnx:device=vitisai" output.mp4
Note: depending on the model, you may need a format filter (e.g. format=rgb24 or format=grayf32) before dnn_processing to convert the frames to the pixel format the model's input tensor expects.
Signed-off-by: younengxiao <steven.xiao@amd.com>
Reviewed-by: Guo Yejun <yejun.guo@intel.com>
Slice-based filter workers compute their per-thread row/sample/channel
boundaries as total * jobnr / nb_jobs. The total * jobnr product is
evaluated in int and overflows signed int for large dimensions and many
slice threads, before the division by nb_jobs brings it back in range.
deinterlace_slice() computed per-thread row boundaries with int
multiplication height * (jobnr + 1). With a tall frame and many filter
threads the product overflows signed int before the division by nb_jobs.
Use int64_t for the intermediate product before converting back to int
row indices.
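
As an illustration of the widening described above, here is a minimal
sketch. slice_bound() is a hypothetical helper name, not a function from
the tree; the only point is that the product is formed in int64_t and
narrowed again after the division:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function): boundary number jobnr
 * out of nb_jobs for a dimension of size total. */
static int slice_bound(int total, int jobnr, int nb_jobs)
{
    assert(total >= 0 && nb_jobs > 0 && jobnr >= 0 && jobnr <= nb_jobs);
    /* Widen before multiplying: total * jobnr may exceed INT_MAX, but
     * the quotient is at most total, so the cast back to int is safe. */
    return (int)((int64_t)total * jobnr / nb_jobs);
}
```

A worker would then process rows slice_bound(h, jobnr, nb_jobs) up to, but
not including, slice_bound(h, jobnr + 1, nb_jobs).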
Found-by: Kery (Qi Kery <qikeyu2001@outlook.com>)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The device-only compilation path of vf_scale_cuda.h pulled in <stdint.h>
solely to obtain uint8_t for the CUdeviceptr typedef. On Windows-on-ARM
(aarch64 mingw) this drags in _mingw.h, whose ARM __prefetch intrinsic is
guarded by !__has_builtin(__prefetch). During clang's --cuda-device-only
pass __has_builtin has deferred/inconsistent semantics on the auxiliary
(host) target, so the guard mis-fires, the inline __prefetch definition is
emitted, and clang rejects it:
_mingw.h: error: definition of builtin function '__prefetch'
This broke the msys2-clangarm64 FATE slot once ffnvcodec (and thus the
nvcc-compiled CUDA filters) was enabled for aarch64 Windows.
uint8_t is unsigned char, so use that directly and drop the <stdint.h>
include. Device-only code should not depend on the host C runtime headers.
No functional or ABI change.
Signed-off-by: Diego de Souza <ddesouza@nvidia.com>
libplacebo versions before v365 passed .flags = 0 when retrieving the queues
from imported Vulkan devices, so we have to error out on a mismatch to avoid
behavior that the Vulkan spec leaves undefined.
See-Also: https://code.videolan.org/videolan/libplacebo/-/merge_requests/856
Sponsored-by: nxtedition AB
Signed-off-by: Niklas Haas <git@haasn.dev>
Log the script and direction picked by HarfBuzz, plus codepoint and
glyph counts, so the shaper choice can be verified. Differing
codepoint and glyph counts indicate reordering / ligation /
decomposition.
Codepoints are sampled before hb_shape(), which flips the buffer
content type to GLYPHS.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
shape_text_hb() set HB_SCRIPT_LATIN and called
hb_buffer_guess_segment_properties() on an empty buffer, so the
inference was a no-op. Bengali and other Indic / USE scripts reached
the default OT shaper instead of their script-specific shaper,
leaving the virama visible and consonants disjointed (e.g. স্টারমার
rendered as স্ টারমার).
Add the UTF-8 text first, keep the existing LTR direction used by the
FriBidi visual-order pipeline, then guess segment properties so the
script comes from the actual Unicode contents.
Fixes: https://code.ffmpeg.org/FFmpeg/FFmpeg/issues/23014
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
For glyphs whose source is already in bitmap form (color emoji fonts such
as NotoColorEmoji.ttf), FT_Glyph_To_Bitmap(..., destroy=0) returns the
input pointer unchanged. The result is that glyph->bglyph[idx] aliases
glyph->glyph (and analogously border_bglyph[t] may alias border_glyph).
glyph_enu_free then called FT_Done_Glyph on both, double-freeing the
underlying object.
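
One way to guard against this kind of aliasing is to release the converted
handle only when it is a distinct object. The sketch below models that with
a stand-in type: FakeGlyph, fake_done_glyph() and release_pair() are
hypothetical names used for illustration and are neither FreeType API nor
necessarily the check this commit adds:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a glyph handle; counts how often it is released. */
typedef struct FakeGlyph {
    int release_count;
} FakeGlyph;

/* Stand-in for the release call; a real one would free the object. */
static void fake_done_glyph(FakeGlyph *g)
{
    if (g)
        g->release_count++;
}

/* Release a source glyph and its converted form exactly once each,
 * even when the conversion returned the source pointer unchanged. */
static void release_pair(FakeGlyph *source, FakeGlyph *converted)
{
    if (converted != source)
        fake_done_glyph(converted);
    fake_done_glyph(source);
}
```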
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The code as written can add such glyphs to the cache, so we need to check
glyphs from the cache too.
This should be the simplest and most robust solution.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
out_layout is a zeroed struct at this point, and even after it is filled in further
down it is guaranteed to be the same as outlink->side_data.
The actual check should be between the inlink and outlink layouts. If they differ, swr
will remix, and the downmix info side data will no longer be valid for any
filter or encoder down the chain.
Signed-off-by: James Almer <jamrial@gmail.com>
try_push_frame() decides whether an input buffer is already tracked by testing
`j == i` (the channel index) instead of `j == nb_buffers`. Once an earlier
channel shared a buffer, nb_buffers falls behind i and a genuinely new buffer is
never referenced, so it is freed while the output frame still points at it.
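
The difference between the two tests can be seen in a reduced model of the
tracking loop. collect_unique_buffers() is a hypothetical helper written
for this note, not the actual try_push_frame() code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of the tracking loop: store each distinct
 * pointer from chan_buf[0..nb_channels) in uniq[] and return the count. */
static int collect_unique_buffers(const void **chan_buf, int nb_channels,
                                  const void **uniq)
{
    int nb_buffers = 0;

    assert(nb_channels >= 0);
    for (int i = 0; i < nb_channels; i++) {
        int j;

        for (j = 0; j < nb_buffers; j++)
            if (uniq[j] == chan_buf[i])
                break;
        /* The search ran off the end of the tracked list, so the buffer
         * is new. Testing j == i instead only works as long as no two
         * channels have shared a buffer so far. */
        if (j == nb_buffers)
            uniq[nb_buffers++] = chan_buf[i];
    }
    return nb_buffers;
}
```

With channels mapped to buffers {A, A, B}, the `j == i` test skips B
(j ends at 1 while i is 2), whereas `j == nb_buffers` tracks it.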
Reported by Franciszek Kalinowski (isec.pl / striga.ai) and Bartosz Smigielski.