FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2026-06-24 08:48:37 +00:00

Author	SHA1	Message	Date
Niklas Haas	30155f9c3a	swscale/uops: split planes when generating ops lists This updates uops_macros.h and the graph.c implementation in lockstep, otherwise we'd have an intermediate commit with a bunch of broken formats. Overall speedup=1.008x faster, min=0.144x max=5.550x The min/max numbers are mostly measurement noise, but the real speedup for affected formats is anywhere from 0.9x to around 2x-3x. It's worth noting that the slowdown for the formats which currently regress is because we don't yet refcopy the planes, but I have another series in the works which will take care of this soon. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-23 11:48:13 +00:00
Niklas Haas	ed12cf7515	swscale/uops: simplify uop mask printing slightly We can re-use the helper we just added. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-23 11:48:13 +00:00
Niklas Haas	27ff50f6ab	swscale/ops_dispatch: add SWS_OP_FLAG_DRY_RUN Avoids us having to write awkward code like `output ? &pass : NULL`. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-23 11:48:13 +00:00
Niklas Haas	5163b05552	swscale/uops: add SWS_UOP_READ_PALETTE This commit only adds the uop itself; it does not yet add any implementations. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-20 14:08:49 +00:00
Niklas Haas	168dfcb557	swscale/ops: add SWS_RW_PALETTE for PAL8 read type I decided to model this as a separate read/write type, rather than as a separate operation (e.g. SWS_OP_PALETTE), because it makes the semantics surrounding the read value range, plane pointer setup, etc. much cleaner. (This will become evident in the upcoming changes to the dispatch layer) Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-20 14:08:49 +00:00
Niklas Haas	b3689e792f	swscale/uops: simplify permute naming scheme We also drop the useless/unused mask from the permute ops. Avoids a bunch of otherwise duplicate permute ops. Now that this is handled by SWS_UOP_MOVE for x86, there is no downside to this. The FATE change is a pure rename of the uops dumps. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-20 03:10:22 +02:00
Ramiro Polla	10e9f273ee	swscale/uops: relax detection of exact computations in linear The first computation in a linear row doesn't have anything to accumulate to, so a multiply-accumulate instruction won't be used either way. This led to identical functions being instantiated for different params.	2026-06-18 14:50:45 +00:00
Ramiro Polla	80cd1a4361	swscale/uops: skip offset from unity detection for linear There is no easy optimization that can be triggered by knowing that the offset is exactly 1. This led to identical functions being instantiated for different params.	2026-06-18 14:50:45 +00:00
Niklas Haas	625ab011f4	swscale/uops: add default fallback for translate_op() Makes it a bit easier to add ops and uops in separate commits. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas	b488ee5553	swscale/ops: generalize SwsReadWriteOp.packed to enum I want to start adding more data layouts, like semiplanar formats (nv12), or palette formats. I made an effort to distinguish existing checks for rw.packed into "mode != PLANAR" and "mode == PACKED", based on the intent of the surrounding code, in anticipation of these new layouts. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas	11900e4e12	swscale/ops: generalize SWS_OP_FILTER_* result type Instead of hard-coding SWS_PIXEL_F32 here. This is not really useful yet, but I wanted to clean up the semantics here regardless. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas	091149b187	swscale/ops: group filtered rw metadata into struct This is a minor cosmetic improvement that allows me to use more convenient names for filter-related metadata fields, without confusion. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-11 16:27:47 +00:00
Niklas Haas	36004d681f	swscale/uops: add SWS_UOP_MOVE for optimal register-register swizzles This decomposes a swizzle mask into a series of optimal register-register moves, using at most two temporary scratch registers. This is a better match for ASM-style backends than the existing PERMUTE/COPY uops that are designed for the needs of the C backend (or other backends which either apply the swizzle mask directly or permute pointers). I originally had logic equivalent to this written in NASM macros, but it was just such a complicated mess that I think it's better to rewrite it in C and have the resulting metadata be an explicit part of the uop definition. This commit only adds the uop, I'll update the x86 implementation in the next step. Co-authored-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	228ef8d97b	swscale/ops: make compile() take const SwsOpList * The old x86 backend was the only backend that actually mutated the ops list. With this gone, we can constify this parameter. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	6057759ffc	swscale/uops: parametrize filter op result type The ops.h infrastructure currently hard-codes this as SWS_PIXEL_F32, but I want to at least properly parametrize this in case we ever decide to revisit this decision in the future. In particular, it may become relevant for trivial kernels or kernels whose intermediates are bounded, exact integers (which could possibly be output directly as e.g. U16 or U32). The FATE change is just because the filter op names gained a suffix. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	4a8a1f5b8b	swscale/uops: add SWS_UOP_READ_PLANAR_FV_FMA Analog of SWS_UOP_READ_PLANAR_FV for FMA-enabled backends. The logic for determining when we can safely use FMA is maybe a bit obtuse, given that a `return type == SWS_PIXEL_U8` would have just done the trick as well, but better to be safe than sorry, if we ever decide to tune this constant in the future. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	dbe961b4cd	swscale/uops: add SWS_UOP_LINEAR_FMA and SWS_UOP_FLAG_FMA This is like SWS_UOP_LINEAR but parametrized by which matrix entries can use FMA instead of bitexact IEEE mul/add instructions. I decided to make these a separate uop to avoid bogging down the reference backend with arch-specific details like FMA. However, I think FMA ops are quite common/universal so I pre-emptively split it into its own separate flag rather than defining something like SWS_UOP_FLAG_X86. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	4e18068165	swscale/uops: also generate macros under SWS_BITEXACT And SWS_BITEXACT|SWS_ACCURATE_RND, for completeness. This roughly doubles the runtime of the uops macros generation. Let's hope it doesn't explode further. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	157f586e5c	swscale/uops: thread SwsContext through ff_sws_ops_translate() Needed to access ctx->flags, in particular SWS_BITEXACT. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	f97ba8cbe7	swscale/uops: loop over all flags when generating macros This list is currently empty but will be expanded by the following commit. I briefly tested whether it would be worth avoiding the free/realloc on the uops array, but found the performance difference to be negligible. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	02a168a576	swscale/uops: keep track of input range during op translation Needed for the FMA decision logic. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	3f9219d605	swscale/uops: add SwsUOpFlags to ff_sws_ops_translate() These will be used to e.g. enable extra uops during translation. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	adaf142647	swscale/uops: generate uop helper macros This follows the same approach as is used currently by ops_entries_aarch64, except I decided to have the generation logic live directly in uops.c to allow re-using internal helpers and move it closer to the other helpers that depend on the exact set of uops and their fields. Unlike libswscale/tests/sws_ops.c, we make an effort to actually test all relevant flag combinations, since these can affect the generated op lists. I will use these macros to auto-generate both the C template-based kernels, as well as the entire x86 backend, in the near future, hence their excessive flexibility. Re-use the libswscale/tests/sws_ops.c that we already compile. We could put it in its own file but this is just as convenient, and it's easily moved anyways. Having it be a FATE test ensures that it is always up-to-date. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 18:27:20 +02:00
Niklas Haas	aaf6a52fe6	swscale/uops: add uop translation logic This will replace the fuzzy matching logic in op_match() that is used by the C and x86 implementations, as well as the translation to AARCH64_OP_* that is used by the NEON asmgen backend. Down the line, this function will also take a set of flags to enable backend-specific kernels like FMA variants, but I also decided to keep it initially simple to ease the transition. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:10:39 +02:00
Niklas Haas	dc88bcdf8c	swscale/uops: add uop definitions Taken from AARCH64_OP_*, but generalized/simplified a bit and updated to add missing op types, especially for special cases that already have dedicated implementations on x86. This initial definition is kept intentionally simple and close to SwsOp, to make it easier to port the existing ops backends to the new infrastructure. However, in the future, this will be refactored dramatically - distinctions like convert vs expand will cease to exist on the SwsOp level, and will instead be introduced by separate optimization passes on the uops level. SWS_UOP_LINEAR in particular will most likely be broken up into multiple uops. I also took this opportunity to redefine the mask in a more useful way. I decided to split up SWS_OP_CONVERT as well, because it was making x86 codegen unnecessarily difficult due to the strong interaction between exact pixel sizes. Signed-off-by: Niklas Haas <git@haasn.dev>	2026-06-09 01:09:34 +02:00
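
Commit dbe961b4cd distinguishes FMA from "bitexact IEEE mul/add instructions". The sketch below illustrates why the two can differ: a separate multiply and add round twice, while a fused multiply-add rounds once. It is not taken from swscale. The function names are hypothetical, and the fused variant is emulated in double precision (exact for the inputs shown, though not for every input) rather than using the standard C99 `fmaf` from `<math.h>`.

```c
/* Multiply-add with two roundings: the product is rounded to float
 * before the addition. The volatile store keeps the compiler from
 * contracting the expression into a hardware FMA. */
static float mul_add_separate(float a, float b, float c)
{
    volatile float prod = a * b;
    return prod + c;
}

/* Emulated fused multiply-add (hypothetical helper): the product of two
 * floats is exact in double, so only the final conversion back to float
 * rounds for the inputs used in this example. */
static float mul_add_fused_emulated(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}

/* Returns 1 when fusing the operation changes the result. */
static int fma_changes_result(float a, float b, float c)
{
    return mul_add_separate(a, b, c) != mul_add_fused_emulated(a, b, c);
}
```

With `a = 1 + 2^-13`, `b = 1 - 2^-13` and `c = -1`, the exact product is `1 - 2^-26`, which rounds to `1.0f` as a float. The separate version therefore returns `0`, while the fused version keeps the `-2^-26` term. This kind of mismatch is why a bit-exact mode would need to avoid FMA wherever the intermediate product is not exactly representable.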

25 Commits
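
Commit 36004d681f describes decomposing a swizzle mask into a series of register-register moves. The following is a minimal sketch of the classic way to sequentialize such a "parallel move", assuming four component registers and a single scratch register. It is not FFmpeg's implementation (which the commit says uses at most two scratch registers), and `Move`, `swizzle_to_moves` and `apply_moves` are hypothetical names.

```c
#define NREGS   4        /* component registers 0..3 */
#define SCRATCH NREGS    /* index of the single scratch register */

typedef struct Move { int dst, src; } Move;

/* Turn the swizzle "new reg[i] = old reg[mask[i]]" into sequential moves.
 * `out` must hold 2 * NREGS entries; returns the number of moves written. */
static int swizzle_to_moves(const int mask[NREGS], Move *out)
{
    int src[NREGS], pending[NREGS], n = 0, left = 0;

    for (int i = 0; i < NREGS; i++) {
        src[i]     = mask[i];
        pending[i] = mask[i] != i;   /* identity entries need no move */
        left      += pending[i];
    }

    while (left > 0) {
        int progress = 0;
        for (int i = 0; i < NREGS; i++) {
            int blocked = 0;
            if (!pending[i])
                continue;
            /* reg[i] may only be overwritten once nobody still reads it */
            for (int j = 0; j < NREGS; j++)
                if (j != i && pending[j] && src[j] == i)
                    blocked = 1;
            if (blocked)
                continue;
            out[n++]   = (Move){ i, src[i] };
            pending[i] = 0;
            left--;
            progress = 1;
        }
        if (!progress) {
            /* Only cycles remain: save one register to the scratch slot
             * and redirect its readers there, which unblocks the cycle. */
            int i = 0;
            while (!pending[i])
                i++;
            out[n++] = (Move){ SCRATCH, i };
            for (int j = 0; j < NREGS; j++)
                if (pending[j] && src[j] == i)
                    src[j] = SCRATCH;
        }
    }
    return n;
}

/* Execute the moves on a register file of NREGS + 1 values. */
static void apply_moves(int regs[NREGS + 1], const Move *moves, int n)
{
    for (int k = 0; k < n; k++)
        regs[moves[k].dst] = regs[moves[k].src];
}
```

For the mask `{2, 1, 0, 3}` (swap components 0 and 2) this yields three moves: save register 0 to scratch, copy register 2 into 0, then copy scratch into 2. A broadcast mask such as `{0, 0, 0, 0}` contains no cycle and needs no scratch register at all.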