ndufresne, so after thinking about it for a long time, it feels like the DPB concept is really decoder-specific and should be dropped from our interface
to the benefit of passing the short and long term lists only (along with the buffer info relevant to v4l2)
and having in-kernel helpers for building "common" dpbs
like dpbs where each reference needs to keep the same entry
mripard: paulk-leonov: what happened to the pixel format helpers in the end?
I saw some pushback from DRM folks a few weeks ago
yeah, IIRC danvet wasn't so enthusiastic about it
maybe after all we should go back to just having V4L2 helpers?
as ezequielg posted some time ago
I think these are silly issues to resolve
and really think it's worth bringing the discussion to an actual conclusion
well, those often take most of the time and energy ;/
okay, let me read through the whole thread again and see if I can post some motivating comments
thanks :)
thanks for working on this
tfiga: Daniel and Mauro rejected it, so I guess that won't really go anywhere
mripard: you mean, the V4L2-only one?
the common one
uhh
hverkuil: I'm testing https://patchwork.linuxtv.org/patch/55947/
so I guess we should go for the v4l2-only one ezequielg posted some time ago
then I'm now even more in favour than before of just having V4L2 helpers
yeah
hverkuil: and have one question => should we set V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF on the last slice of a frame?
hverkuil: hm, nevermind, I think I understand how that's supposed to work
bbrezillon: it's up to the application. It is OK to always set it, but if it is the last slice of a frame then the capture buffer won't be released until the first slice of the next frame is queued. So if the application knows that it is the last slice of a frame, then it can leave out this flag so the buffer is released immediately.
was having a hard time understanding why there was this check done at the beginning of device_run()
I guess that's here to handle the case where the next frame decoding starts before the previous frame has ended
and yes, the flag should not be set on the last slice of the last frame
but I guess clearing it on all last slices is also good
It was my understanding that applications do not always know if a slice is the last slice, at least not until they start parsing the next slice in the bitstream.
hverkuil: actually, there's one problematic case: say the last slice of a frame N had the flag set, and the first slice of frame N+1 doesn't
since you're testing the flag on the current slice, the capture buf won't be released
in practice that shouldn't happen
because, either all last slices are not flagged with "hold buf" or they all are
hverkuil: I'm testing with FFmpeg, and I indeed don't have this info out of the box
(the is_last_slice info)
but if I rely on the auto-dequeue stuff, how does that work for the last slice of the last frame?
will ->device_run() be called one last time?
good question.
Do you stop the decoder in that case? On decoder stop it should release any held capture buffer, but that's not implemented in this patch.
hverkuil: hm, I think we want to dequeue the cap buffer before stopping the decoder
Sorry, this is a stateless decoder, not stateful. Disregard my comment.
note that I can easily find when this is the last slice in ffmpeg, so that shouldn't be a problem for my use case
I don't think this can be solved in a general way without using a decoder stop command to release the last capture buffer.
Or queue an empty output buffer, that would also work. But that feels like a hack.
yes, I considered that too, but don't like the idea
hverkuil: BTW, why do we need to check for V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF in v4l2_m2m_release_capture_buf()?
mripard: I don't agree. a common 4CC library is useful, I think we should continue working on it
isn't it enough to check the timestamps?
and Daniel didn't reject it. he was concerned that it would slow down DRM development, but we can address his concerns
bbrezillon: I was wondering the same thing. I need to take another look at it later today.
hverkuil: ok, cool
paulk-leonov, hmm, removing the dpb would be annoying, since this is how bbrezillon managed to support the rockchip without p/b0/b1 lists; basically we recalculate the lists from the reference/dpb information
paulk-leonov, and that reference list is also passed to the HW, so why do you say that list is HW specific?
bbrezillon, ok, so ffmpeg is synchronous (i.e. unable to make use of queues)?
because through vaapi, you can queue more slices even if the previous frame is not done decoding
it just happens that you'll get NO_SURFACE when the pool has been exhausted
And then you can leave it to the driver to detect a new slice being queued (through timestamp change)
hverkuil, fyi ^
bbrezillon, about the timestamp vs the flag, the idea is that the timestamp should be the default: you let go of the frame only when the timestamp changes, which is when you receive the start of the following frame
bbrezillon, the flag, on the other hand, is for the case where userspace knows which one is last, in which case you know you have completed decoding asap, and can let go of the frame immediately
The difference is a frame of latency, since in the worst case there will be 1 frame duration between the last bit of a frame and the first of the following
bbrezillon, if ffmpeg is unable to make use of the queues, then it should never set the flag and keep doing the hack it's doing now
so basically, for each slice, it will not set the flag, so the capture will be released, and then it should requeue that capture before handling the following slice of the same frame
if you want to make use of the queuing at least for multiple slices, you could, on new slices, queue the pending slice, and then on end_frame, queue the final slice with the flag removed
which would be less hacky and would avoid the dq/q dance
ndufresne: actually we had a discussion with paulk-leonov this morning, and I think it's mainly a naming issue
what is passed as a 'DPB' is actually just a list of long/short refs extracted from the bitstream
ndufresne, yes, there's about no functional difference
but the spec also describes the DPB as the list internally maintained by the decoder block (which can be re-ordered/shuffled for various reasons)
pinchartl, thanks for keeping hope alive for common formats, I strongly believe in it too :)
I think daniels' concerns can be resolved indeed
ndufresne: (switching to the hold flag topic) oh, so it's an || not an && we want in the release_cap_buf() func
ndufresne: regarding the ffmpeg, I'm currently doing exactly what you suggest
unset the flag from frame_end() and always keep one slice in the ffmpeg queue before flushing it (so that we always have one slice when reaching ->end_frame())
ndufresne: what about the fact that we're testing the flag on the new cap buf instead of the previous one?
bbrezillon, paulk-leonov: I'm all in favour of calling this references, or reference_list, that would make our API closer to VAAPI/VDPAU too
great
bbrezillon, ok, I think I don't understand this bit " the fact that we're testing the flag on the new cap buf instead of the previous one"
say you have 2 frames N and N+1 and each have 2 slices
yes, so basically 4 output buffers, 2 capture buffers right?
a decoder that knows the boundary in advance would do [ts1, FLAG] [ts1, NONE] [ts2, FLAG] [ts2, NONE]
bbrezillon: I'm fixing that, I'm close to posting a v2 of that patch.
you queue FnS0 with the hold flag set, but the test is done against FnS1
while if you don't know, you simply do [ts1, FLAG] [ts1, FLAG] [ts2, FLAG] [ts2, FLAG]
and it's the new ts2 that will cause the buffer to be released
I think that's how I understood hverkuil's proposal
my proposal was the other way around, the flag was to ask the buffer to be released, and then the rest was done on timestamp change
a capture buffer, which in turn will become a reference, cannot have two timestamps; the timestamp must be a unique identifier for the reference
bbrezillon, ah, that one just looks like a bug ;-D
ndufresne: I can't do that: the normal default behavior is to always release the capture buffer. I can't suddenly hold on to buffers for specific drivers, that's asking for problems.
I'm just saying that you do the test on the newly queued cap buf instead of the previous one (the one that actually asked to hold/release the cap buf)
but apparently hverkuil is fixing that
hverkuil, yes, I didn't consider that the framework would do that for us
so there's nothing to see here :)
I've mixed up cap and out bufs in my explanation, but you get the idea :)
hehe, we all mix those
ezequielg and I wanted to use def to change these names in the code, was not very popular
well m2m talks about src/dst already, which is a step forward :)
jernej, Have you looked at the rcar_fdp1 driver? That's a de-interlacer.
ndufresne: paulk-leonov: posted v2. Using the decoder stop command is just a proposal.
thanks hverkuil
Although it's not unreasonable, I think. But it definitely needs review.
it would be great to have this ready by the time the rockchip driver is ready for merge
(which will also depend on my ability to test/update cedrus for this)
paulk-leonov: you're welcome. let's make it happen and motivate Maxime :-)
Adding decoder stop support is independent of this new flag and can be done later, so that's not blocking.
sidenote, I've started this thread about the issue I see with firmware-based decoders that don't do the decoding in the firmware: https://lkml.org/lkml/2019/5/12/81
the situation is the same for the video decoding blocks in nvidia GPUs too
currently nouveau uses blobs too, but it could use the v4l2 interface
with a free firmware instead
paulk-leonov, but nouveau's blob is for VDPAU right ?
have you taken a look at the new NVDEC interface? they seem to have split the parsing and the decoding
I'm with ndufresne in that you can't enforce this. Recommend, yes, but if you try to enforce it, then that will prevent supporting a lot of codecs.
ffmpeg uses its own parsers notably, so NVDEC is implemented as an accelerator there
ndufresne, I think the firmware blob itself is pretty much agnostic, but yeah they have software for it through vdpau
but vdpau also provides parsed bitstream too
so it seems to be "as split as could be"
hverkuil, my personal opinion is that it's worth rejecting a few codecs to make sure the ones we support don't depend on blobs
but it's a debate
either way, reverse engineering will happen, and leverage/high standards are not the only way to resolve the situation
Step one is to have every codec use the same API. That's hard enough. And if we reject drivers for such codecs, then SoC vendors will just keep using their own crappy APIs.
Basically, pick your battles.
well, IMO it's still nearly as crappy if there's a firmware API, it's just integrated better
it's all about hiding crap
so that's why I prefer no crap at all
paulk-leonov, there is a staging driver for tegra 2 or 3 (not sure), and indeed it felt like an in between, might be interesting to give a look
since in this case, a firmware is not technically legit, but just a way to hide out how the hardware works
ndufresne, yes there's a chance it's the same cores as the ones used in nvidia GPUs
paulk-leonov, I think hverkuil's point is that we need to have a strong response in terms of kernel API, because reversing some of this starts convincing vendors that no-blob is better for this
and it's a pretty sensitive thing for encoders, where everyone believes they have the best bitrate adapter in the world and needs to keep it private (I've even seen things being obfuscated)
my point is that we shouldn't be trying to convince them to do it right, but set a standard
right, so I think cedrus/rockchip is good part of such a plan
history showed that the industry tends to take whatever path is the easiest, so we have to make it so that the right path is the only viable easy path
well, of course it could make them buy more Hantro, they are the big winner now ;-D
haha yeah looks that way
either way I doubt there is such a huge diversity of decoders
so once we have the hantro exposed, we have n platforms that can just use that even if there was previously a firmware needed
if it makes the design companies more open in terms of register interface, that would be great too
I have very little knowledge around decoding IP companies, I only know 3, Chips&Media (coda), Allegro (running on ZynqMP) and Hantro (rk3288 and imx8m)
someone once criticized the linux-media community as being too far from vendors; I guess it works both ways, the real vendors hide behind SoC vendors, so it's pretty hard to get to know them
indeed
kbingham: no, not yet, but thanks for pointing it out
I'll check it out now
jernej, I'm sure I supported various input buffer types with that. It was quite some time ago now though.
my main concern is to support multiple buffers for fields
the approach taken with m2m-deinterlace is not ideal for userspace apps
Yes, I think I split the input queue into "fdp1_field_buffer" structures, and then process field by field. So that can be input as either 1 field per buffer, or 2 fields per buffer and the code handles it accordingly.
Then fdp1_field_complete() handles returning the input buffers back when the last field is processed.
So if there is just a single field per buffer, it all still works.
see fdp1_buf_prepare_field() for the field management
kbingham: how did you test your driver? with v4l2-ctl or something else?
paulk-leonov: are you sure ffmpeg passed the diff of the ref list?
to me, it looks like it just fills the DPB with the short/long term ref list
https://github.com/bbrezillon/FFmpeg/blob/rk-vpu-h264/libavcodec/v4l2_request_h264.c#L73
and that's really what I need to build the P0/B0/B1 lists that are expected by the hantro IP
bbrezillon, ah yes my bad, I had misunderstood
we must pass what ffmpeg has in its short_ref and long_ref
and maybe call them accordingly
Kwiboo, jernej: I know we picked-up the "v4l2-request" name for our libva, but I think we should discuss renaming the thing :p
like v4l2_m2m_request
you mean in ffmpeg too?
ndufresne, ^^^^^
jernej, I mean especially in ffmpeg
:)
it's a v4l2 m2m implementation too
the code is really nice
how's hevc doing with it?
I think I'm still missing custom scaling lists
and interlaced support
well, hevc works to some degree
I added scaling lists IIRC
nice
https://github.com/LibreELEC/LibreELEC.tv/blob/master/projects/Allwinner/patches/linux/0008-HEVC-improvements.patch
the changes in that patch would make much more sense if they were split up and had proper commit messages
paulk-leonov: I've been planning to review your series for a long time, but I always forget
thanks
jernej, hehe no problem
if you don't get one in one week please ping me
jernej, mostly we're aiming for quick inclusion at this point
we can always rework the controls later
so if it is functional, it should be good to go
I would like quick inclusion too
since then I can start sending my improvements
I changed buffer management slightly, so memory is allocated on demand
definitely
ah nice, it wasn't very good indeed
this helps with 4K playback
I got an H6 just today btw
cheap android box
the UI is terrible and yet manages to run kodi
but we can do much better
sure we can :)
btw, kwiboo has patches which should enable HDR playback on H6
but were not yet tested
ooh that's cool
I think we have an HDR tv at the office
only issue is that sun4i-drm driver doesn't support 10-bit colors yet, so 8-bit with HDR doesn't make much sense
anyway, we are very close with Cedrus being usable
I just can't figure out the remaining issues with the HEVC and H264 codecs
the vast majority works, but not all
well I definitely plan on doing a round of fixups once we have things in place :)
and yeah 10-bit should be added to drm too
well, on H6 Cedrus already decodes 10-bit videos and converts them to 8-bit (default setting, so nothing has to be done for this)
however, all 10-bit videos have corrupted video in the top left corner
that's one of three annoying issues I was talking about
I see
paulk-leonov: rename of the ffmpeg hwaccel makes sense :-), btw there is also a possibility that there will be work on a new v4l2 m2m stateful decoder in ffmpeg, the current one has some flaws and rather hard to understand code (due to a custom buffer reference count instead of using AVBufferRef, and more)
the ffmpeg v4l2 request code is heavily inspired by the vaapi code in ffmpeg, so it's no coincidence that controls, reference lists etc. look/are configured similarly
paul374, totally agree on renaming v4l2-request, and the VA backend name should follow
bbrezillon: you should probably rebase your ffmpeg branch when you have a chance :-), https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel should have some minor improvements, like auto selecting media/video device with help of libudev (--enable-libudev)
Kwiboo: nice
though my tests are pretty basic right now
and specifying the devs to use is not so hard when you're just testing
yep, that is why I did not care much about media/video dev node selection in the beginning; my H3 dev board only had one media/video device, and it was only when I started testing on rk that using the correct dev nodes started to matter