3494 Commits

Author SHA1 Message Date
Andreas Rheinhardt
762b94e672 libs: Bump major version of all libraries
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-06-23 19:15:57 +02:00
Andreas Rheinhardt
2d3776b8cc avutil/x86/emms: Unavpriv avpriv_emms_asm()
This fallback function is used if external MMX is available,
while inline MMX and intrinsics for emitting emms are unavailable.
It is implemented as an avpriv function, which has several
drawbacks for shared builds:
1. The function is so small (3 bytes; 16 with padding)
that the overhead of exporting and importing it dwarfs
the gains from code deduplication.
2. A call to an external library has more overhead than
a library-internal one.
3. It may cause linking failures when a libavutil not exporting
avpriv_emms_asm() is paired with a library needing it
(if inline assembly and intrinsics were unavailable when building
the dependent library). I am not aware of this ever happening.
4. We would be forced to keep avpriv_emms_asm() around for ABI stability
even after it is no longer needed.

This commit therefore uses the STLIBOBJS/SHLIBOBJS approach
to duplicate it into each library that needs it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2026-06-23 19:15:57 +02:00
Niklas Haas
30155f9c3a swscale/uops: split planes when generating ops lists
This updates uops_macros.h and the graph.c implementation in lockstep,
otherwise we'd have an intermediate commit with a bunch of broken formats.

Overall speedup=1.008x faster, min=0.144x max=5.550x

The min/max numbers are mostly measurement noise, but the real speedup for
affected formats is anywhere from 0.9x to around 2x-3x.

It's worth noting that the formats which currently regress do so
because we don't yet refcopy the planes, but I have another series in the
works which will take care of this soon.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
09c0cd6837 swscale/tests/sws_ops: split passes when printing ops lists
This affects a large number of conversions across the board, either:

1. Lifting a constant alpha/chroma clear out from the conversion pass:

 rgb24 16x16 -> yuva444p 16x16:
+  [ u8 $XXX] SWS_OP_CLEAR        : {255 _ _ _}
+  [ u8 XXXX] SWS_OP_WRITE        : 1 elem(s) planar >> 0, via {3}
+    ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
+ translated micro-ops:
+    u8_clear_x_1
+    u8_write_planar_x
+ Sub-pass #1:
   [ u8 +++X] SWS_OP_READ         : 3 elem(s) packed >> 0
   [ u8 +++X] SWS_OP_CONVERT      : u8 -> f32
   [f32 ...X] SWS_OP_LINEAR       : matrix3+off3 [...]
   [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
   [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
-  [ u8 +++$] SWS_OP_CLEAR        : {_ _ _ 255}
-  [ u8 XXXX] SWS_OP_WRITE        : 4 elem(s) planar >> 0
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
     ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)

 gray 16x16 -> yuv444p 16x16:
+  [ u8 $$XX] SWS_OP_CLEAR        : {128 128 _ _}
+  [ u8 XXXX] SWS_OP_WRITE        : 2 elem(s) planar >> 0, via {2, 1}
+    ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
+ translated micro-ops:
+    u8_clear_xy_xx
+    u8_write_planar_xy
+ Sub-pass #1:
   [ u8 =XXX] SWS_OP_READ         : 1 elem(s) planar >> 0
   [ u8 =XXX] SWS_OP_CONVERT      : u8 -> f32
   [f32 .XXX] SWS_OP_LINEAR       : luma [...]
   [f32 .XXX] SWS_OP_DITHER       : 16x16 matrix + {0 -1 -1 -1}
   [f32 +XXX] SWS_OP_CONVERT      : f32 -> u8
-  [ u8 +$$X] SWS_OP_CLEAR        : {_ 128 128 _}
-  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+  [ u8 XXXX] SWS_OP_WRITE        : 1 elem(s) planar >> 0
     ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
  translated micro-ops:
     u8_read_planar_x
     u8_to_f32_x
     f32_linear_x_x000x
     f32_dither_x_0_16x16
     f32_to_u8_x
-    u8_clear_yz_xx
-    u8_write_planar_xyz
+    u8_write_planar_x

or

2. Passing through a plane that was previously unmodified by an ops chain:

 gbrap 16x16 -> yuva444p 16x16:
-  [ u8 ====] SWS_OP_READ         : 4 elem(s) planar >> 0, via {2, 0, 1, 3}
-  [ u8 ====] SWS_OP_CONVERT      : u8 -> f32
-  [f32 ...=] SWS_OP_LINEAR       : matrix3+off3 [...]
-  [f32 ...=] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
-  [f32 +++=] SWS_OP_CONVERT      : f32 -> u8
-  [ u8 XXXX] SWS_OP_WRITE        : 4 elem(s) planar >> 0
+  [ u8 =XXX] SWS_OP_READ         : 1 elem(s) planar >> 0, via {3}
+  [ u8 XXXX] SWS_OP_WRITE        : 1 elem(s) planar >> 0, via {3}
     ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
  translated micro-ops:
-    u8_read_planar_xyzw
-    u8_to_f32_xyzw
+    u8_read_planar_x
+    u8_write_planar_x
+ Sub-pass #1:
+  [ u8 ===X] SWS_OP_READ         : 3 elem(s) planar >> 0, via {2, 0, 1}
+  [ u8 ===X] SWS_OP_CONVERT      : u8 -> f32
+  [f32 ...X] SWS_OP_LINEAR       : matrix3+off3 [...]
+  [f32 ...X] SWS_OP_DITHER       : 16x16 matrix + {0 3 2 -1}
+  [f32 +++X] SWS_OP_CONVERT      : f32 -> u8
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
+ translated micro-ops:
+    u8_read_planar_xyz
+    u8_to_f32_xyz
     f32_linear_xyz_xxx0x_xxx0x_xxx0x
     f32_dither_xyz_0_3_2_16x16
-    f32_to_u8_xyzw
-    u8_write_planar_xyzw
+    f32_to_u8_xyz
+    u8_write_planar_xyz

(Op lists are abridged slightly for brevity)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
0901ca4108 swscale/ops_dispatch: add option to split const/copied subpasses
This already helps performance as-is, but will help performance massively
once we add the ability for the memcpy backend to do a refcopy instead of
an actual copy.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
65197f67ff swscale/ops_dispatch: add option to link subpass outputs together
Not needed currently but will be used for parallel splits.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
291e849ee3 swscale/ops_optimizer: add ff_sws_op_list_split_planes()
Can be used to extract a reduced subset of operations affecting only certain
output planes, e.g. splitting an op list into a "memcpy" and a "non-memcpy"
part, or splitting apart op lists for independent or subsampled planes.
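
As a rough illustration of the "memcpy" versus "non-memcpy" split described above, the sketch below partitions output planes by the per-component state characters used in the op list dumps ('=' copied, '$' const). The `ToySplit` type and `toy_split_planes()` are hypothetical names for this example only; they are not the signature of ff_sws_op_list_split_planes().

```c
#include <assert.h>

/* Hypothetical result type: one bitmask of output planes per category. */
typedef struct ToySplit {
    unsigned copy_mask;  /* planes still equal to their input ('=') */
    unsigned const_mask; /* planes holding a cleared constant ('$') */
    unsigned rest_mask;  /* planes that need the full op chain      */
} ToySplit;

/* flags[i] is the state of component i just before the write. */
static ToySplit toy_split_planes(const char *flags, int planes)
{
    ToySplit s = {0, 0, 0};

    assert(flags && planes >= 0 && planes <= 4);
    for (int i = 0; i < planes; i++) {
        if (flags[i] == '=')
            s.copy_mask |= 1u << i;
        else if (flags[i] == '$')
            s.const_mask |= 1u << i;
        else
            s.rest_mask |= 1u << i;
    }
    return s;
}
```

For the state string "+$$X" with three output planes, this puts planes 1 and 2 into the constant group and plane 0 into the remainder.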

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
ed12cf7515 swscale/uops: simplify uop mask printing slightly
We can re-use the helper we just added.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
4f6d5c1794 swscale/uops: add a helper to print a comp mask as a string
For debugging/logging purposes exclusively.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
b494a82321 swscale/ops: remove now-unneeded function
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
2eb254c5fd swscale/ops_dispatch: avoid possible infinite recursion
If the filter cannot actually be optimized into the read (for whatever
reason), this code would previously loop infinitely. Bail out cleanly
instead.

The FFSWAP is there to make the error message print the remainder (the one
containing unsplittable ops), rather than the noop list.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
848753352d swscale/ops_dispatch: substantially refactor subpass compilation
Instead of a loop with fixed structure, this function now recursively calls
itself as many times as needed to satisfy all criteria.

This is absolutely needed for the upcoming refactor which will allow for
also splitting apart ops lists as needed to e.g. handle partially subsampled
ops lists, which may need a complex sequence of filtering and merge steps
to be fully satisfied.

This does slightly modify the way in which subpasses are compiled, in that
each new subpass is first tried again un-split, rather than a single split
resulting in all subsequent passes being split as well. This is mostly a
benign change, though it might matter one day.
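
The recursive structure can be sketched with a toy model. Everything here is an assumption made for illustration: ops are plain ints, a list is taken to compile as a single pass only if it holds at most one "heavy" (nonzero) op, and `toy_compile_recursive()` is a hypothetical name rather than the real function.

```c
#include <assert.h>
#include <stddef.h>

/* Toy criterion (an assumption): a list compiles as one pass only if it
 * contains at most one "heavy" (nonzero) op. */
static int toy_can_compile_single(const int *ops, size_t n)
{
    size_t heavy = 0;

    for (size_t i = 0; i < n; i++)
        heavy += ops[i] != 0;
    return heavy <= 1;
}

/* Returns the number of passes emitted for the given list. */
static int toy_compile_recursive(const int *ops, size_t n)
{
    size_t split = 0, seen = 0;

    assert(ops || !n);
    if (!n)
        return 0;
    /* Every (sub)list is first tried un-split. */
    if (toy_can_compile_single(ops, n))
        return 1;

    /* Otherwise split right before the second heavy op and recurse. */
    for (size_t i = 0; i < n; i++) {
        if (ops[i] && ++seen == 2) {
            split = i;
            break;
        }
    }
    return toy_compile_recursive(ops, split) +
           toy_compile_recursive(ops + split, n - split);
}
```

Because each recursive call starts with the un-split attempt, one split does not force the remainder to be split any further than it needs to be.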

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
2509eb3e8c swscale/ops_optimizer: extract subpass splitting logic to helper
I will also delete the old name in an upcoming commit.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
409b870bd6 swscale/tests/sws_ops: only print actually compiled ops lists
We already have the unoptimized reference ops; printing each intermediate
stage here is just noise that makes this file harder to scroll through IMO.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
17154619e5 swscale/ops_dispatch: group compilation args into struct
This will make it easier to keep passing around these parameters in helper
functions in the upcoming refactor.

Take the opportunity to also rename the plain `compile` function to
`compile_single`.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
27ff50f6ab swscale/ops_dispatch: add SWS_OP_FLAG_DRY_RUN
Avoids us having to write awkward code like `output ? &pass : NULL`.
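
A minimal sketch of the idea, assuming a flag-based interface: only the flag name comes from this commit, while its value, the `ToyPass` type and `toy_compile()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    SWS_OP_FLAG_DRY_RUN = 1 << 0, /* name from the commit; value assumed */
};

/* Hypothetical stand-in for a compiled pass. */
typedef struct ToyPass {
    int num_ops;
} ToyPass;

/* Validates the request and, unless this is a dry run, fills in *out.
 * Returns 0 on success and -1 on error. */
static int toy_compile(int num_ops, unsigned flags, ToyPass *out)
{
    if (num_ops <= 0)
        return -1;
    if (flags & SWS_OP_FLAG_DRY_RUN)
        return 0; /* checked only; the output is left untouched */

    assert(out);
    out->num_ops = num_ops;
    return 0;
}
```

Callers can then forward the same output pointer unconditionally and let the flag decide whether anything is written.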

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
e3d15d4606 swscale/ops_dispatch: move compile flags from ops.h
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
07e6ee54e7 swscale/ops_dispatch: move no-op check after optimization pass
Otherwise, this check yields a false negative if the redundant operations
haven't been optimized away yet, resulting in unnecessary memcpy operations.
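
The ordering issue can be shown with a toy optimizer (all names hypothetical) in which two identical swaps cancel out; the no-op test only succeeds once that cancellation has happened.

```c
#include <assert.h>
#include <stddef.h>

enum ToyOp { TOY_SWAP_XZ, TOY_CLEAR_W };

/* Removes adjacent pairs of identical swaps in place and returns the
 * new length of the list. */
static size_t toy_optimize(enum ToyOp *ops, size_t n)
{
    size_t out = 0;

    assert(ops || !n);
    for (size_t i = 0; i < n; i++) {
        if (out > 0 && ops[i] == TOY_SWAP_XZ && ops[out - 1] == TOY_SWAP_XZ)
            out--; /* swapping twice restores the original order */
        else
            ops[out++] = ops[i];
    }
    return out;
}

/* A list is a no-op when no operations remain. */
static int toy_is_noop(size_t n)
{
    return n == 0;
}
```

Running `toy_is_noop()` on the raw two-swap list reports "not a no-op", while running it on the length returned by `toy_optimize()` reports a no-op.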

Fixes: a534156083
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
d86f0ae534 swscale/ops_dispatch: don't assume first operation is a read
Makes ff_sws_compile_pass() more robust; will be needed for plane splitting.
Besides, it's perfectly valid to have an operation list that starts with
e.g. SWS_OP_CLEAR.
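
A small sketch of the more defensive approach, using hypothetical toy types rather than the real op structures: the number of input planes is derived from the first op only if that op is in fact a read.

```c
#include <assert.h>
#include <stddef.h>

enum ToyOpType { TOY_OP_READ, TOY_OP_CLEAR, TOY_OP_WRITE };

typedef struct ToyOp {
    enum ToyOpType type;
    int elems; /* number of components read, cleared or written */
} ToyOp;

/* Returns how many input planes the list consumes; a list that does not
 * start with a read (e.g. clear + write) consumes none. */
static int toy_input_planes(const ToyOp *ops, size_t n)
{
    assert(ops || !n);
    if (n > 0 && ops[0].type == TOY_OP_READ)
        return ops[0].elems;
    return 0;
}
```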

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
c1eee3d4d8 swscale/graph: add a function to allow reusing output buffers
Used for plane splitting, among other things (e.g. plane passthrough).

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
aecca98488 swscale/ops: fix noop check ignoring read/write filters
Fixes a few cases where we previously didn't actually scale:

 gbrpf32le 16x16 -> gbrpf32le 16x32:
-  (no-op)
+  [f32 ...X] SWS_OP_READ         : 3 elem(s) planar >> 0 + 2 tap bilinear filter (V)
+    min: {nan nan nan _}, max: {nan nan nan _}
+  [f32 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
+    ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
+ translated micro-ops:
+    f32_read_planar_fv_xyz_f32
+    u32_write_planar_xyz

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
cd2109b3a6 swscale/ops: fix stale comment
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Niklas Haas
b992479539 swscale/ops: keep track of copied/cleared components
These represent components which have not (yet) been modified from their
input values (i.e. after a read or clear). Such components can basically
be passed through via a refcopy (where applicable); tracking them also helps
to distinguish dissimilar types of planes (for plane splitting).
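
A minimal forward-tracking sketch, reusing the state characters from the dump legend ('=' copied, '$' const, 'X' unused). The helper names are hypothetical, '.' is used here for "modified", and marking unread components as 'X' is a simplification made for this example.

```c
#include <assert.h>

/* After a read, the first `elems` components are still copies of their
 * input; the remaining ones are treated as unused. */
static void toy_read(char flags[5], int elems)
{
    assert(elems >= 0 && elems <= 4);
    for (int i = 0; i < 4; i++)
        flags[i] = i < elems ? '=' : 'X';
    flags[4] = '\0';
}

/* Sets the state of every component selected by `mask`, e.g. '$' after a
 * clear or '.' after an op that changes the values. */
static void toy_mark(char flags[5], unsigned mask, char state)
{
    for (int i = 0; i < 4; i++) {
        if (mask & (1u << i))
            flags[i] = state;
    }
}

/* A component that is still a plain copy is a refcopy candidate. */
static int toy_can_refcopy(const char flags[5], int comp)
{
    assert(comp >= 0 && comp < 4);
    return flags[comp] == '=';
}
```

Reading four components and then marking the first three as modified yields "...=", so only the fourth component remains eligible for passthrough.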

Generates benign diffs like:

 gray 16x16 -> yuv444p 16x16:
-  [ u8 +XXX] SWS_OP_READ         : 1 elem(s) planar >> 0
+  [ u8 =XXX] SWS_OP_READ         : 1 elem(s) planar >> 0
     min: {0 _ _ _}, max: {255 _ _ _}
-  [ u8 +XXX] SWS_OP_CONVERT      : u8 -> f32
+  [ u8 =XXX] SWS_OP_CONVERT      : u8 -> f32
     min: {0 _ _ _}, max: {255 _ _ _}
   [f32 .XXX] SWS_OP_LINEAR       : luma [[73/85 0 0 0 16] [0 1 0 0 0] [0 0 1 0 0] [0 0 0 1 0]]
     min: {16 _ _ _}, max: {235 _ _ _}
   [f32 .XXX] SWS_OP_DITHER       : 16x16 matrix + {0 -1 -1 -1}
     min: {16.001953 _ _ _}, max: {235.998047 _ _ _}
   [f32 +XXX] SWS_OP_CONVERT      : f32 -> u8
     min: {16 _ _ _}, max: {235 _ _ _}
-  [ u8 +++X] SWS_OP_CLEAR        : {_ 128 128 _}
+  [ u8 +$$X] SWS_OP_CLEAR        : {_ 128 128 _}
     min: {16 128 128 _}, max: {235 128 128 _}
   [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) planar >> 0
-    (X = unused, z = byteswapped, + = exact, 0 = zero)
+    ('X' unused, 'z' byteswapped, '=' copied, '$' const, '+' integer, '0' zero)
  translated micro-ops:
     u8_read_planar_x
     u8_to_f32_x
     f32_linear_x_x000x
     f32_dither_x_0_16x16
     f32_to_u8_x
     u8_clear_yz_xx
     u8_write_planar_xyz

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-23 11:48:13 +00:00
Ramiro Polla
d09330e578 swscale/aarch64/ops: mark more operations as type-invariant
This prevents the generation of a few more duplicate functions (where
there would be both f32 and u32 functions).

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-06-22 13:56:31 +02:00
Ramiro Polla
a016f34d17 swscale/aarch64/ops: remove redundant linear combinations
There is no easy optimization that can be triggered by knowing that the
offset is exactly 1. This led to identical functions being instantiated
for different params.

Also simplified the AVRational comparisons a bit.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-06-22 13:56:31 +02:00
Ramiro Polla
7fc637fc0c swscale/aarch64/ops: fix mask for swizzle ops
The mask for swizzle ops assumed that merely having a component assigned
to itself was enough to detect whether the swizzle was needed for that
component, but that wasn't correct. We should also take into account
whether the component is needed for the next operation or not.

Additionally, prevent duplicate functions from being generated by
clearing the swizzle index for unused components.
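
The corrected rule can be sketched as follows. The function name and data layout are hypothetical, and resetting unused indices to 0 is one assumed reading of "clearing".

```c
#include <assert.h>

/* swz[i] is the source component for output component i, and `needed` is
 * a bitmask of the components consumed by the next operation. Returns the
 * mask of components that actually have to be moved. */
static unsigned toy_swizzle_mask(int swz[4], unsigned needed)
{
    unsigned mask = 0;

    assert(swz);
    for (int i = 0; i < 4; i++) {
        if (!(needed & (1u << i))) {
            swz[i] = 0; /* canonicalize indices nobody reads */
            continue;
        }
        if (swz[i] != i)
            mask |= 1u << i;
    }
    return mask;
}
```

With only components 0 and 1 needed, the swizzles {2, 1, 0, 3} and {2, 1, 3, 0} both reduce to {2, 1, 0, 0} with mask 0x1, so they would map to the same generated function.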

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-06-22 13:56:31 +02:00
Ramiro Polla
083089e047 swscale/aarch64/ops: remove redundant single-component packed read/write
These functions are essentially the same as single-component planar
read/write, and are actually never instantiated. This was left over
from the initial implementation.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-06-22 13:56:31 +02:00
Niklas Haas
9cbd889670 swscale/x86/ops: add AVX2/SSE4 path for SWS_UOP_READ_PALETTE
The AVX2 path is a fairly straightforward vpgatherdd + 4x4 transpose. The SSE4
fallback is an unrolled scalar loop, for lack of anything better to do.
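
For orientation, here is a plain scalar sketch of what a palette read does conceptually: gather one 32-bit entry per index, then scatter its four bytes into four component planes. The function name is hypothetical, and taking component k from bits 8k..8k+7 is an assumption of this example, not a statement about the actual component order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Looks up each index in a 256-entry palette and writes the four bytes
 * of the entry to four separate output planes. */
static void toy_read_palette(const uint8_t *idx, const uint32_t *pal,
                             uint8_t *out[4], size_t n)
{
    assert(idx && pal && out);
    for (size_t i = 0; i < n; i++) {
        uint32_t entry = pal[idx[i]];        /* the "gather" step    */
        for (int k = 0; k < 4; k++)          /* the "transpose" step */
            out[k][i] = (uint8_t)(entry >> (8 * k));
    }
}
```

A SIMD version performs the same two steps on several pixels at once, with the gather fetching multiple entries and the transpose regrouping them by component.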

checkasm:
 - CPU: AMD Ryzen 9 9950X3D 16-Core Processor (00B40F40)
 - Timing source: x86 (rdtsc)
 - Bench duration: 10000 µs per function (45898205 cycles)
 - Random seed: 2518020648

Benchmark results:
  name                             cycles (vs ref)
  u8_read_palette_xyzw_c:          2877.5
  u8_read_palette_xyzw_x86_sse4:   1951.9 ( 1.47x)
  u8_read_palette_xyzw_x86_avx2:   1051.6 ( 2.74x)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
d4b9b94ccb swscale/format: add support for AV_PIX_FMT_PAL8
This is handled using the new SWS_RW_PALETTE read op mode. We need to be a bit
careful to use the correct pixfmt descriptor downstream, because the descriptor
for PAL8 itself merely describes the *index*, rather than the actual data
values.

Accomplish this by introducing a new function to map the palette format to the
resulting pixel format after applying the palette (explicitly documented as
AV_PIX_FMT_RGB32).

+pal8 16x16 -> rgb24 16x16:
+  [ u8 +++X] SWS_OP_READ         : 4 elem(s) palette >> 0
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 +++X] SWS_OP_SWIZZLE      : 2103
+    min: {0 0 0 _}, max: {255 255 255 _}
+  [ u8 XXXX] SWS_OP_WRITE        : 3 elem(s) packed >> 0
+    (X = unused, z = byteswapped, + = exact, 0 = zero)
+ translated micro-ops:
+    u8_read_palette_xyzw
+    u8_permute_xz_zx
+    u8_write_packed_xyz
...

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
ffd6855a50 swscale/vulkan/ops: properly error out for unsupported read modes
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
6217350269 swscale/uops_backend: add SWS_UOP_READ_PALETTE reference implementation
This does not actually generate any code yet as the macro is still empty,
but that will change once I add support for generated palette reads to
the format handling code. This logic merely needs to be in place first
to avoid introducing broken intermediate states where palette uops are
generated but not implemented by the reference backend.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
5163b05552 swscale/uops: add SWS_UOP_READ_PALETTE
This commit only adds the uop itself; it does not yet add any implementations.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
efbfd75434 swscale/ops_dispatch: add support for dispatching palette reads
This requires some tiny bit of extra setup work from the dispatch layer.
Specifically, we need to arrange for the palette data pointer to end up in
exec.in[1], and to disable the pointer advancement logic for this plane (this
can be accomplished by just setting the stride and bump to 0).

We also want to disable the tail buffer / overflow pixel copying logic for
the palette, which can be accomplished by ensuring that p->planes_in only
contains the number of *data* planes, excluding the fixed palette.
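
The pointer setup can be sketched with a toy execution context. The struct and function names are hypothetical and only the stride is modelled; the point is that a plane with stride 0 stays put while the data plane advances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced execution context. */
typedef struct ToyExec {
    const uint8_t *in[4];
    ptrdiff_t in_stride[4];
} ToyExec;

/* Plane 0 carries the index data; plane 1 carries the fixed palette. */
static void toy_setup_palette(ToyExec *e, const uint8_t *indices,
                              ptrdiff_t stride, const uint8_t *palette)
{
    assert(e && indices && palette);
    e->in[0] = indices;
    e->in_stride[0] = stride;
    e->in[1] = palette;
    e->in_stride[1] = 0; /* the palette pointer must never advance */
}

/* Advances the first `planes` input pointers by one line. */
static void toy_next_line(ToyExec *e, int planes)
{
    assert(e && planes >= 0 && planes <= 4);
    for (int i = 0; i < planes; i++)
        e->in[i] += e->in_stride[i];
}
```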

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
d26994d35f swscale/format: exclude palette formats for output
In theory, we could learn to handle them internally, using the same
systematic palette trick, but I'll defer this for now, as vf_scale already
handles this internally.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
168dfcb557 swscale/ops: add SWS_RW_PALETTE for PAL8 read type
I decided to model this as a separate read/write type, rather than as a
separate operation (e.g. SWS_OP_PALETTE), because it makes the semantics
surrounding the read value range, plane pointer setup, etc. much cleaner.

(This will become evident in the upcoming changes to the dispatch layer)

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 14:08:49 +00:00
Niklas Haas
b3689e792f swscale/uops: simplify permute naming scheme
We also drop the useless/unused mask from the permute ops.

Avoids a bunch of otherwise duplicate permute ops. Now that this is
handled by SWS_UOP_MOVE for x86, there is no downside to this.

The FATE change is a pure rename of the uops dumps.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:10:22 +02:00
Niklas Haas
8cc6b2ddaf swscale/tests/swscale: fix unscaled subsampled chroma format check
This should be matching against the *chroma* scaler, not the main scaler.
Of course, under normal circumstances, scaler_sub matches scaler, but this
allows users to explicitly override this defaulting by setting e.g.

-scaler none -scaler_sub bicubic

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:08:33 +02:00
Niklas Haas
fa1ca69a8b swscale/filters: add ability to set a virtual output size
Odd-size luma planes are not exact multiples of the chroma plane, but the
sample grid is still matched as though they were. We need to account for this
when translating a luma sample to the corresponding chroma sample coordinates.
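
One plausible way to picture the size mismatch, with hypothetical helper names: the chroma plane size is the luma size shifted right with rounding up, and the "virtual" luma size is the size the chroma grid behaves as if it covered.

```c
#include <assert.h>

/* Chroma plane size for a given luma size and subsampling shift,
 * rounded up so that odd sizes keep their last partial sample. */
static int toy_chroma_size(int luma_size, int shift)
{
    assert(luma_size >= 0 && shift >= 0 && shift < 8);
    return (luma_size + (1 << shift) - 1) >> shift;
}

/* Luma size implied by the chroma grid, i.e. the next multiple of the
 * subsampling factor. */
static int toy_virtual_luma_size(int luma_size, int shift)
{
    return toy_chroma_size(luma_size, shift) << shift;
}
```

For a 5-sample luma row with 2x subsampling, the chroma row has 3 samples and the virtual luma size is 6 rather than 5.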

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:08:21 +02:00
Niklas Haas
8215e9bbea swscale/filters: add option for adding an input pixel offset
This is needed for chroma subsampling, which requires a different filter
offset for chroma subsamples (according to the frame's chroma location).

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:08:07 +02:00
Niklas Haas
1f6dc79c80 swscale/format: factor out ff_sws_chroma_pos() helper
Moved here from graph.c, as it's needed for the new chroma scaling code.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:07:35 +02:00
Niklas Haas
4653e68aab swscale/graph: nuke SwsGraph.field
No longer needed after the previous commit.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:07:20 +02:00
Niklas Haas
aedede0cee swscale/format: add SwsFormat.field
This metadata is needed to compute the correct chroma sampling offsets.
We previously stored this in graph->field, but that's a bad place for it,
because it doesn't survive the translation to the ops abstraction layer.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:06:21 +02:00
Niklas Haas
7e7c1c0d94 swscale/format: nuke ff_props_equal()
And merge it with the clearer ff_fmt_equal().

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:05:36 +02:00
Niklas Haas
3a2c5050c6 swscale: fix format equality check
I can't say I remember why this logic was written this way, but I can't
think of any good reason why we should exclude comparing the image
dimensions here - the intent is obviously to allow passthrough / noop.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:05:08 +02:00
Niklas Haas
ba1c1d9eee swscale/graph: separate pass dispatch size from buffer size
This allows adding passes which will be dispatched over a reduced number of
lines, without affecting the allocated buffer dimensions - e.g. for passes
which purely write to subsampled chroma planes.

A few hard-coded references to pass->width/height need to be replaced by
the corresponding output frame references, but it's not a huge deal.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:04:27 +02:00
Niklas Haas
cb8a006f8a swscale/graph: don't over-allocate pass buffer lines
This is not only wasteful but also serves no real purpose. Looping over
the correct number of lines is trivial; there is far less point in vertical
padding than horizontal padding.

Furthermore, this might actually introduce issues when linking output buffers,
since the extra padding depends on the pass's alignment and threading
requirements, which may differ from pass to pass.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:04:01 +02:00
Niklas Haas
d474b408f2 swscale/ops_optimizer: simplify unused op check (cosmetic)
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:03:38 +02:00
Niklas Haas
b120505ce2 swscale/ops: apply ff_sws_comp_mask_swizzle() in-place
More convenient at every use site.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 03:02:41 +02:00
Niklas Haas
faac9fa705 swscale/ops_optimizer: set correct range metadata after split pass
Replaces a few "nan" value ranges by real values, and drops a bunch of
redundant non-FMA variants that resulted from this bug.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 02:52:02 +02:00
Niklas Haas
e52459195c swscale/ops: simplify SWS_OP_READ default comps handling
We can still pre-fill the prev array here; ff_sws_apply_op_q() is a no-op.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-06-20 02:50:26 +02:00