jkqxz: most of these are also part of DXVA, Hantro HW (and same for Rockchip), parse some of the slice header itself
but you need to tell it how many bits to skip for what isn't being parsed, this information is well known by the parser + we are not allowed to parse in the kernel
jkqxz: the context is that paulk-leonov started with writing a VA driver for the V4L2 interface, which was for Cedrus/Allwinner, initially it kind of worked since Cedrus is slice base decoder, which mostly match the VA-API design
we later introduced Hantro and RK drivers, but this hardware is frame base, you pass the full frame, in fact you don't even have to give it the offset to each slices, it does that on the HW side
but it had this issue that it needs to know the size of few things to skip them
an option for the VA driver that pH5 could have considered is to parse the slice header again inside the VA driver, but then it really makes VA a miss-fit, forcing double or triple parsing
jkqxz: to be fair, I think Intel will completely ignore this pull request, as they ignore pretty much anything that isn't Intel, anyone not Intel is second citizen in the VA API space
Are you one of the folks working on Vulkan interface ? I really think this is the way forward for HW accelerator that are bound to a GPU command stream
I thought for a short moment that someone could write a V4L2 shm driver that calls into the GPU driver, but that would be so awkward, and it's only a solution if you don't use any shaders, we can't compile shaders inside the kernel (or at least it's complicated)
Everything in that pull request looks like a special which doesn't appear anywhere else to me.
idr_pic_id doesn't affect decoding at all, so there is no reason to give it to a decoder (it's only used for AU splitting, which is already done when we get here).  If hardware pointlessly wants it then giving it zero will be harmless.
The only use of the num_ref_idx_lN_default_active values is as the default value for the slice.  Given you already know the real value for the slice, you can just give the parser that and it will get the same answer.
(That's equivalent to there being a new PPS with the correct num_ref_idx_lN_active value in front of every slice.)
For the element size stuff, I wonder whether the easiest answer would be to serialise the slice parameters into something which looks kindof like a slice header and then give it to the hardware in place of the real one which it can't parse properly.
E.g. Mesa does that for JPEG, because the AMD hardware wants to parse some of the JPEG headers itself: <https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/va/picture_mjpeg.c#n118>.
Intel can be pushed into accepting things in libva if you make it easy for them.  Adding to the core decoder structures is going to be hard, because it pushes something onto every user.
jkqxz: see G1_REG_DEC_CTRL5_IDR_PIC_ID in the kernel, I don't think bbrezillon or ezequielg have tested leaving this to zero
it was in the HW interface, so we added it into the kernel
(it's harmless extra info)
and then G1_REG_DEC_CTRL6_REFIDX0_ACTIVE/G1_REG_DEC_CTRL6_REFIDX1_ACTIVE for the next one, but yeah, maybe that could be given the real value, cc andrzej_p
jkqxz: and for the last one, this is correct, I told pH5 that we didn't need that, we just need a golum encoder for the few fields we want to know the size
but on Linux side, we believe it make more sense to just pass that information through, avoiding the glue code when possible
jkqxz: btw, feel free to give a look and comment on the uAPI, I don't think it's that useful for AMD, as your decoder is bound to the GPU command stream, but it seems like you could provide good feedback at least
jkqxz: be aware that my intention is to avoid using that VA driver, since the post-processing defined there is completely orthogonal to how it work in Chips&Media and Verisilion/Hantro designs
and it's really annoying to have to map v4l2 format to DRM to then map it to GStreamer
ndufresne:  Where do you mean for the API to look at?
In the Linux kernel, include/media/h264-ctrls.h, include/media/mpeg2-ctrls.h, include/media/vp8-ctrls.h
VP9/HEVC proposal exist too, this is all non-public yet, as we want further eyes
and you can comment over the linux-media mailing list
or here
Yeah, the non-codec stuff in VA is often not what you want.  Making the codec parts alone work would give you support for programs like mpv or vlc without any changes on their side, though.
there is also documentation on what's the rules around V4L2 M2M and stateless codecs
there is a native implement for the v4l2 stateless codec for ffmpeg around already
mpv and vlc uses ffmpeg, in that context
and anyway, there is so many flaws in VA, like it's single codec per PC, if you have multiple GPUs, it will only use one, no device enumeration
the new Intel HW have shader base and HW codec now, so they ended adding a new profile (the LP/lowpower profiles)
that's a very horrible hack, they can't fix anything properly
Uh, you are baking this weird structure size thing into the kernel API?  That does not sound fun.
that's fine, if the HW needs it, the kernel should not be in the way for this
Multiple GPUs work trivially by being different DRI devices at the top level.  Multiple interfaces inside a single device is not so good, yeah.
yeah, maybe I need to look further, but it's not super neat the GPU selection stuff
Doesn't that set a bad precedent, though?  If every new device which comes along gets its own special fields which the user needs to fill to work around its quirks then you've lost a load of the benefit of a stable API.
well, if we had this in place before Hantro was there, maybe, but Hantro is too widely used now, we can't affort not supporting it's subtilities
there is only 3 brands of non-GPU HW the we are aware of that don't do everything in a firmware, well there's possible a fourth coming
On the stable API front, we hope most HW will fit what is in place, the big challenge was slice vs frame based (to me frame base is odd, but it's the most common)
jkqxz: just to make sure, it's only two sizes that you think are bad right ? dec_ref_pic_marking_bit_size and pic_order_cnt_bit_size
pic_order_cnt_bit_size is a simple function of other fields, so while redundant it's easy to calculate.
dec_ref_pic_marking_bit_size is rather nasty.
from a bitstream parser perspective, they are both trivial, you know that information while parsing, but from VA API, indeed
Yeah.  And none of the other APIs have it either (DXVA or VDPAU).
There is no point in including num_slice_groups_minus1 and slice_group_change_cycle.  Noone is ever going to support FMO in hardware, and you would need a load of additional fields to do it anyway.
jkqxz: those are in the spec, we don't care we just put them in
jkqxz: there was someone concern about ability to do reconstruction, and we simply agreed to put everything define by MPEG LA, regardless if there is FMO or ASO HW available
Then you're missing all of the other FMO fields?  It seemed strange to only include some of them.
(Still, really doesn't matter.)
do we ?
Making first_mb_in_slice only a uint16_t will overflow on 8K video.
ok, can you send an email about this please ?
(Or more generally on levels > 5.2.)
andrzej_p: ^
jkqxz: lol, it's 16bit in VA-API ;-D
I don't think any Intels can do 8K H.264.  (Only H.265 and maybe VP9.)
VA-API is supposed to be a common abstraction, hence my point of trying not to use it ;-P
but 8K H264 is likely pretty rare
Yeah.  I doubt that would actually cause any problems.
we can still change it, I bet they looked at Hantro  and Cedrus registers
plus VA-API
It's bad in DXVA too.
It's not clear what pic_num on DPB entries is.  (PicNum or LongTermPicNum depending on which is valid, maybe?  If so, uint16_t is not sufficient.)
that I'd need bbrezillon to look at ^
DPB entries are used by hantro driver to build hantro specific set of ref list
those list were in the uAPI initially, but we managed to move them away
but the DPB entries becomes a little bit more important in that context