ndufresne, so after thinking about it for a long time, it feels like the DPB concept is really decoder-specific and should be dropped from our interface in favour of passing only the short and long term lists (along with the buffer info relevant to v4l2), and having in-kernel helpers for building "common" dpbs, like dpbs where each reference needs to keep the same entry
mripard: paulk-leonov: what happened to the pixel format helpers in the end?
I could see some pushback from DRM folks a few weeks ago
yeah, IIRC danvet wasn't so enthusiastic about it
maybe after all we should go back to just having V4L2 helpers? as ezequielg posted some time ago
I think these are silly issues to resolve, and I really think it's worth bringing the discussion to an actual conclusion
well, those often take most of the time and energy ;/
okay, let me read through the whole thread again and see if I can post some motivating comments
thanks :)
thanks for working on this
tfiga: Daniel and Mauro rejected it, so I guess that won't really go anywhere
mripard: you mean, the V4L2-only one?
the common one
uhh
hverkuil: I'm testing https://patchwork.linuxtv.org/patch/55947/
so I guess we should go for the v4l2-only one ezequielg posted some time ago
then I'm now even more in favour than before of just having V4L2 helpers
yeah
hverkuil: and I have one question => should we set V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF on the last slice of a frame?
hverkuil: hm, nevermind, I think I understand how that's supposed to work
bbrezillon: it's up to the application. It is OK to always set it, but if it is the last slice of a frame then the capture buffer won't be released until the first slice of the next frame is queued. So if the application knows that it is the last slice of a frame, then it can leave out this flag so the buffer is released immediately.
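The rule hverkuil describes can be captured in a tiny userspace-side sketch. This is only an illustration, not code from the patch: `slice_buf_flags()` is a made-up helper, and the flag value is defined locally so the snippet is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-in for V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF so this sketch
 * compiles on its own; the real definition lives in <linux/videodev2.h>
 * once the patch lands.
 */
#define M2M_HOLD_CAPTURE_BUF 0x00000200u

/*
 * Decide the buffer flags for one OUTPUT (slice) buffer.  Always
 * setting the hold flag is valid, but then the capture buffer is only
 * released once the first slice of the NEXT frame is queued.  Clearing
 * it on a known last slice releases the capture buffer immediately.
 */
uint32_t slice_buf_flags(bool known_last_slice)
{
    return known_last_slice ? 0 : M2M_HOLD_CAPTURE_BUF;
}
```

An application like FFmpeg that cannot tell which slice is last would simply always pass `false` here and accept the extra latency discussed below.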
I was having a hard time understanding why this check was done at the beginning of device_run()
I guess that's there to handle the case where decoding of the next frame starts before the previous frame has ended
and yes, the flag should not be set on the last slice of the last frame
but I guess clearing it on all last slices is also good
It was my understanding that applications do not always know if a slice is the last slice, at least not until they start parsing the next slice in the bitstream.
hverkuil: actually, there's one problematic case: say the last slice of frame N had the flag set, and the first slice of frame N+1 doesn't
since you're testing the flag on the current slice, the capture buf won't be released
in practice that shouldn't happen, because either all last slices are flagged with "hold buf" or none of them are
hverkuil: I'm testing with FFmpeg, and I indeed don't have this info out of the box (the is_last_slice info)
but if I rely on the auto-dequeue stuff, how does that work for the last slice of the last frame? will ->device_run() be called one last time?
good question. Do you stop the decoder in that case? On decoder stop it should release any held capture buffer, but that's not implemented in this patch.
hverkuil: hm, I think we want to dequeue the cap buffer before stopping the decoder
Sorry, this is a stateless decoder, not stateful. Disregard my comment.
note that I can easily find out when this is the last slice in ffmpeg, so that shouldn't be a problem for my use case
I don't think this can be solved in a general way without using a decoder stop command to release the last capture buffer. Or queue an empty output buffer, that would also work. But that feels like a hack.
yes, I considered that too, but I don't like the idea
hverkuil: BTW, why do we need to check for V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF in v4l2_m2m_release_capture_buf()?
mripard: I don't agree.
a common 4CC library is useful, I think we should continue working on it
isn't it enough to check the timestamps?
and Daniel didn't reject it. he was concerned that it would slow down DRM development, but we can address his concerns
bbrezillon: I was wondering the same thing. I need to take another look at it later today.
hverkuil: ok, cool
paulk-leonov, hmm, removing the dpb would be annoying, since this is how bbrezillon managed to support the rockchip without p/b0/b1 lists: basically we recalculate the lists from the reference/dpb information
paulk-leonov, and that reference list is also passed to the HW, why do you say that list is HW-specific?
bbrezillon, ok, so ffmpeg is synchronous (e.g. is unable to make use of queues)? because through vaapi, you can queue more slices even if the previous frame is not done decoding, it just happens that you'll get NO_SURFACE when the pool is exhausted
And then you can leave it to the driver to detect a new slice being queued (through a timestamp change)
hverkuil, fyi ^
bbrezillon, about the timestamp vs the flag: the idea is that the timestamp should be the default, but you let go of the frame only when the timestamp changes, which is when you receive the start of the following frame
bbrezillon, the flag on the other hand is for the case where userspace knows which one is last, in which case you know you have completed decoding asap, and can let go of the frame immediately
The difference is a frame of latency, since in the worst case there will be 1 frame duration between the last bit of a frame and the first of the following one
bbrezillon, if ffmpeg is unable to make use of the queues, then it should never set the flag, and keep doing the hack it's doing now
so basically, for each slice, it will not set the flag, so the capture will be released, and then it should requeue that capture before handling the following slice of the same frame
if you want to make use of the queuing at least for multiple slices, you could, on new slices, queue the pending slice, and then on end_frame, queue the final slice with the flag removed
which would be less hacky and would avoid the dq/q dance
ndufresne: actually we had a discussion with paulk-leonov this morning, and I think it's mainly a naming issue
what is passed as a 'DPB' is actually just a list of long/short refs extracted from the bitstream
ndufresne, yes, there's about no functional difference
but the spec also describes the DPB as the list internally maintained by the decoder block (which can be re-ordered/shuffled for various reasons)
pinchartl, thanks for keeping hope alive for common formats, I strongly believe in it too :)
I think daniels' concerns can be resolved indeed
ndufresne: (switching to the hold flag topic) oh, so it's an || not an && we want in the release_cap_buf() func
ndufresne: regarding ffmpeg, I'm currently doing exactly what you suggest
unset the flag from frame_end() and always keep one slice in the ffmpeg queue before flushing it (so that we always have one slice when reaching ->end_frame())
ndufresne: what about the fact that we're testing the flag on the new cap buf instead of the previous one?
bbrezillon, paulk-leonov: I'm all in favour of calling these references, or reference_list, that would make our API closer to VAAPI/VDPAU too
great
bbrezillon, ok, I think I don't understand this bit: "the fact that we're testing the flag on the new cap buf instead of the previous one"
say you have 2 frames, N and N+1, and each has 2 slices
yes, so basically 4 output buffers, 2 capture buffers, right?
a decoder that knows the boundary in advance would do [ts1, FLAG] [ts1, NONE] [ts2, FLAG] [ts2, NONE]
bbrezillon: I'm fixing that, I'm close to posting a v2 of that patch.
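The "|| not &&" remark can be made concrete with a simplified model of the release test (this is an illustrative sketch, not the actual v4l2-mem2mem code; `HOLD_FLAG` stands in for the real flag value). The held capture buffer must be released when the newly queued output buffer either does not carry the hold flag, or has a different timestamp (i.e. a new frame starts); with `&&`, a frame boundary signalled only by a timestamp change would never release the buffer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF. */
#define HOLD_FLAG 0x00000200u

/*
 * Simplified model of the check being discussed: release the held
 * capture buffer when the newly queued output (slice) buffer either
 * does not ask to hold it, or belongs to a new frame (different
 * timestamp).  Using && instead of || would keep the buffer held
 * across a timestamp change, which is the bug mentioned above.
 */
bool release_capture_buf(uint32_t new_out_flags,
                         uint64_t held_ts, uint64_t new_out_ts)
{
    return !(new_out_flags & HOLD_FLAG) || new_out_ts != held_ts;
}
```

Note that the timestamp works as a frame identifier here only because a capture buffer that becomes a reference keeps a single, unique timestamp, as ndufresne points out.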
you queue FnS0 with the hold flag set, but the test is done against FnS1
while if you don't know, you simply do [ts1, FLAG] [ts1, FLAG] [ts2, FLAG] [ts2, FLAG]
and it's the new ts2 that will make the buffer be released
I think that's how I understood hverkuil's proposal
my proposal was the other way around: the flag was there to ask for the buffer to be released, and the rest was done on timestamp change
a capture buffer, which in turn will become a reference, cannot have two timestamps, the timestamp must be a unique identifier for the reference
bbrezillon, ah, that one just looks like a bug ;-D
ndufresne: I can't do that: the normal default behavior is to always release the capture buffer. I can't suddenly hold on to buffers for specific drivers, that's asking for problems.
I'm just saying that you do the test on the newly queued cap buf instead of the previous one (the one that actually asked to hold/release the cap buf)
but apparently hverkuil is fixing that
hverkuil, yes, I didn't consider that the framework would do that for us
so there's nothing to see here :)
I've mixed up cap and out bufs in my explanation, but you get the idea :)
hehe, we all mix those up
ezequielg and I wanted to use a define to change these names in the code, it was not very popular
well, m2m talks about src/dst already, which is a step forward :)
jernej, have you looked at the rcar_fdp1 driver? That's a de-interlacer.
ndufresne: paulk-leonov: posted v2. Using the decoder stop command is just a proposal.
thanks hverkuil
Although it's not unreasonable, I think. But it definitely needs review.
it would be great to have this ready by the time the rockchip driver is ready for merge (which will also depend on my ability to test/update cedrus for this)
paulk-leonov: you're welcome. let's make it happen and motivate Maxime :-)
Adding decoder stop support is independent of this new flag and can be done later, so that's not blocking.
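The two queueing patterns above can be traced with a small model of the "release on missing hold flag or timestamp change" rule from the discussion. This is an illustrative sketch under those assumptions; `release_point()` is a made-up helper, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define HOLD 0x00000200u /* stand-in for V4L2_BUF_FLAG_M2M_HOLD_CAPTURE_BUF */

struct slice {
    uint64_t ts;    /* frame timestamp (unique per frame) */
    uint32_t flags; /* HOLD or 0 */
};

/*
 * Return the 1-based index of the queued slice whose handling releases
 * the capture buffer of frame `frame_ts`, or 0 if it is never released
 * (the "last frame" problem that needs a decoder stop command or an
 * empty output buffer).
 */
int release_point(const struct slice *s, int n, uint64_t frame_ts)
{
    bool seen = false;

    for (int i = 0; i < n; i++) {
        if (s[i].ts == frame_ts) {
            seen = true;
            if (!(s[i].flags & HOLD))
                return i + 1; /* last slice known: released immediately */
        } else if (seen) {
            return i + 1;     /* released when the next frame starts */
        }
    }
    return 0;
}
```

With boundaries known in advance, {1,HOLD} {1,0} {2,HOLD} {2,0} releases frame 1's capture buffer right after slice 2. With every slice flagged, {1,HOLD} {1,HOLD} {2,HOLD} {2,HOLD} releases frame 1 only when slice 3 arrives (the one-frame latency ndufresne mentions) and never releases frame 2 on its own.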
sidenote, I've started a thread about the issue I see with firmware-based decoders that don't do the decoding in the firmware: https://lkml.org/lkml/2019/5/12/81
the situation is the same for the video decoding blocks in nvidia GPUs too
currently nouveau uses blobs too, but it could use the v4l2 interface with a free firmware instead
paulk-leonov, but nouveau's blob is for VDPAU, right? have you had a look at the new NVDEC interface? they seem to have split the parsing and the decoding
I'm with ndufresne in that you can't enforce this. Recommend, yes, but if you try to enforce it, then that will prevent supporting a lot of codecs.
ffmpeg uses its own parsers notably, so NVDEC is implemented as an accelerator there
ndufresne, I think the firmware blob itself is pretty much agnostic, but yeah, they have software for it through vdpau
but vdpau also provides the parsed bitstream too, so it seems to be "as split as could be"
hverkuil, my personal opinion is that it's worth rejecting a few codecs to make sure the ones we support don't depend on blobs
but it's a debate; either way, reverse engineering will happen, and leverage/high standards is not the only way to resolve the situation
Step one is to have every codec use the same API. That's hard enough. And if we reject drivers for such codecs, then SoC vendors will just keep using their own crappy APIs. Basically, pick your battles.
well, IMO it's still nearly as crappy if there's a firmware API, it's just integrated better
it's all about hiding crap
so that's why I prefer no crap at all
paulk-leonov, there is a staging driver for tegra 2 or 3 (not sure which), and indeed it felt like an in-between, might be interesting to have a look
since in this case, a firmware is not technically needed, but just a way to hide how the hardware works
ndufresne, yes, there's a chance it's the same cores as the ones used in nvidia GPUs
paulk-leonov, I think hverkuil's point is that we need to have a strong response in terms of kernel API, because reverse engineering some of this starts convincing vendors that no-blob is better for this
and it's a pretty sensitive thing for encoders, where everyone believes they have the best bitrate adaptation in the world and needs to keep it private (I've even seen things being obfuscated)
my point is that we shouldn't be trying to convince them to do it right, but to set a standard
right, so I think cedrus/rockchip is a good part of such a plan
history has shown that the industry tends to take whatever path is easiest, so we have to make it so that the right path is the only viable easy path
well, of course it could make them buy more Hantro, they are the big winner now ;-D
haha, yeah, looks that way
either way, I doubt there is such a huge diversity of decoders
so once we have the hantro one exposed, we have n platforms that can just use that, even if a firmware was previously needed
if it makes the design companies more open in terms of register interfaces, that would be great too
I have very little knowledge of decoding IP companies, I only know 3: Chips&Media (coda), Allegro (running on ZynqMP) and Hantro (rk3288 and imx8m)
someone once criticized the linux-media community as being too far from vendors; I guess it works both ways, the real vendors hide behind SoC vendors, so it's pretty hard to get to know them
indeed
kbingham: no, not yet, but thanks for pointing it out, I'll check it out now
jernej, I'm sure I supported various input buffer types with that. It was quite some time ago now though.
my main concern is to support multiple buffers for fields
the approach taken with m2m-deinterlace is not ideal for userspace apps
Yes, I think I split the input queue into "fdp1_field_buffer" structures, and then process field by field. So the input can be either 1 field per buffer or 2 fields per buffer, and the code handles it accordingly. Then fdp1_field_complete() handles returning the input buffers when the last field is processed. So if there is just a single field per buffer, it all still works.
see fdp1_buf_prepare_field() for the field management
kbingham: how did you test your driver? with v4l2-ctl or something else?
paulk-leonov: are you sure ffmpeg passes the diff of the ref list? to me, it looks like it just fills the DPB with the short/long term ref lists: https://github.com/bbrezillon/FFmpeg/blob/rk-vpu-h264/libavcodec/v4l2_request_h264.c#L73
and that's really what I need to build the P0/B0/B1 lists that are expected by the hantro IP
bbrezillon, ah yes, my bad, I had misunderstood
we must pass what ffmpeg has in its short_ref and long_ref, and maybe call them accordingly
Kwiboo, jernej: I know we picked up the "v4l2-request" name for our libva, but I think we should discuss renaming the thing :p
like v4l2_m2m_request
you mean in ffmpeg too?
ndufresne, ^^^^^
jernej, I mean especially in ffmpeg :)
it's a v4l2 m2m implementation too
the code is really nice
how's hevc doing with it?
I think I still missed custom scaling lists and interlaced support
well, hevc works to some degree
I added scaling lists IIRC
nice
https://github.com/LibreELEC/LibreELEC.tv/blob/master/projects/Allwinner/patches/linux/0008-HEVC-improvements.patch
the changes in that patch would make much more sense if they were split up and had proper commit messages
paulk-leonov: I've been planning to review your series for a long time, but I always forget
thanks jernej, hehe, no problem
if you don't get one within a week, please ping me
jernej, mostly we're aiming for quick inclusion at this point
we can always rework the controls later
so if it is functional, it should be good to go
I would like quick inclusion too, since then I can start sending my improvements
I changed buffer management slightly, so memory is allocated on demand
definitely
ah nice, it wasn't very good indeed
this helps with 4K playback
I got an H6 just today btw
cheap android box
the UI is terrible and yet manages to run kodi
but we can do much better
sure we can :)
btw, kwiboo has patches which should enable HDR playback on H6, but they have not been tested yet
ooh, that's cool
I think we have an HDR tv at the office
the only issue is that the sun4i-drm driver doesn't support 10-bit colors yet, so 8-bit with HDR doesn't make much sense
anyway, we are very close to Cedrus being usable
I just can't figure out the remaining issues with the HEVC and H264 codecs
the vast majority of videos work, but not all
well, I definitely plan on doing a round of fixups once we have things in place :)
and yeah, 10-bit should be added to drm too
well, on H6 Cedrus already decodes 10-bit videos and converts them to 8-bit (the default setting, so nothing has to be done for this)
however, all 10-bit videos have corrupted video in the top left corner
that's one of the three annoying issues I was talking about
I see
paulk-leonov: renaming the ffmpeg hwaccel makes sense :-), btw there is also a possibility that there will be work on a new v4l2 m2m stateful decoder in ffmpeg, the current one has some flaws and rather hard to understand code (due to custom buffer reference counting instead of using AVBufferRef, and more)
the ffmpeg v4l2 request code is heavily inspired by the vaapi code in ffmpeg, so it is no coincidence that controls, reference lists etc. look/are configured similarly
paul374, totally agree on the renaming of v4l2-request, and the VA backend name should follow
bbrezillon: you should probably rebase your ffmpeg branch when you have a chance :-), https://github.com/Kwiboo/FFmpeg/commits/v4l2-request-hwaccel should have some minor improvements, like auto-selecting the media/video device with help of libudev (--enable-libudev)
Kwiboo: nice, though my tests are pretty basic right now, and specifying the devs to use is not so hard when you're just testing
yep, that is why I did not care much about media/video dev node selection in the beginning; my H3 dev board only had one media/video device, it was only when I started testing on rk that using the correct dev nodes started to matter