[05:34] <gnurou_> ndufresne: paulk-leonov: great discussion. having potentially multiple OUTPUT buffers kind of breaks the buffer symmetry we were expecting from m2m, but it is clear that we need to handle this use-case
[05:35] <gnurou_> and thanks to venus we know that we can use m2m in that non-symmetric way, even though it is a stateful driver
[05:35] *** gnurou_ is now known as gnurou
[06:57] <tfiga> ndufresne: RK does the per-slice reordering internally in the hardware, so it needs the original DPB ref pic lists
[06:58] <tfiga> ndufresne: moreover, it parses the slice headers itself, including the slice type
[06:58] <tfiga> so it chooses itself from the 3 lists
[12:54] <paulk-leonov> tfiga, what difference do you make between "original DPB ref piclist" and not-original? I think that (except for cedrus/H264) the DPB we pass is what was in the bitstream
[12:55] <paulk-leonov> quick general question: does exporting MPLANE formats to dma-buf imply having 2 dma-buf fds?
[12:55] <bbrezillon> paulk-leonov: yep :)
[12:55] <paulk-leonov> thanks bbrezillon :)
[12:56] <bbrezillon> that's one of the things the EXT_BUF APIs are trying to address
[12:56] <bbrezillon> well, not exactly actually
[12:56] <paulk-leonov> I should take a look at it then
[12:56] <bbrezillon> I only tried to address the import case
[12:57] <bbrezillon> (being able to pass dmafd+offset so that you can pass the same fd for all planes + a different offset)
[12:58] <paulk-leonov> so basically matching DRM behavior as far as I know
[12:58] <bbrezillon> yes
[12:58] <paulk-leonov> that would be neat indeed
[12:59] <paulk-leonov> mhh we actually allow 1 handle per buffer plane on DRM
[13:00] <paulk-leonov> although DRM itself won't export that
[13:00] <bbrezillon> yes, but we have an offset
[13:00] <bbrezillon> you can pass the same handle for all planes
[13:00] <paulk-leonov> right
[13:00] <bbrezillon> what's missing in V4L is the offset
[13:00] <paulk-leonov> I see
[13:01] <paulk-leonov> I haven't looked much at dma-buf import on the v4l2 side yet
[13:02] <paulk-leonov> so there are currently cases where v4l2 can't import a v4l2 buffer from another device, if it's MPLANE
[13:54] <ndufresne> tfiga, I was reading mpp library yesterday, and they have full reordering implemented there
[13:55] <ndufresne> But I'm still working on tracing the ref lists
[13:55] <ndufresne> tfiga, so my plan is to instrument the best RK implementation so far, the mpp, and from there I'll be able to compare against ffmpeg and chromium
[13:56] <ndufresne> from there, I'll try and make recommendations to lower the difference, otherwise we'll end up with shitty userspace for RK if it's too special
[13:59] <ndufresne> Question though, as we have the DPB, flipping order does not seem like a very difficult task
[13:59] <ndufresne> just a double for loop, on very small lists
[14:01] <ndufresne> paulk-leonov, all formats have been duplicated, so even if you use mplane buffer, you can use single DMABuf
[14:01] <ndufresne> the problem that bbrezillon needs to address is if you need one allocation with "non-standard" offsets
[14:02] <ndufresne> and importing DMABuf with offsets in general, which is not really supported
[14:02] <ndufresne> (e.g. if you have 1 DMABuf, but need to import it as MPLANE for a specific HW, you'll need to be able to import that DMABuf n-times with offsets)
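The "same fd for all planes plus a different offset" idea discussed above can be illustrated with a small sketch. This is not the V4L2 or EXT_BUF API: `PlaneImport` and `nv12_plane_imports` are hypothetical names, and the layout assumed here is plain NV12 (a full-size Y plane followed immediately by a half-height interleaved UV plane) in a single allocation. Hardware needing "non-standard" offsets would place the planes differently.

```python
from dataclasses import dataclass


@dataclass
class PlaneImport:
    """Hypothetical per-plane import descriptor (not a real V4L2 structure)."""
    fd: int      # dmabuf file descriptor, the same one for every plane
    offset: int  # byte offset of the plane, relative to the start of the dmabuf
    length: int  # plane size in bytes


def nv12_plane_imports(fd, width, height, stride=None):
    """Describe an NV12 frame held in one dmabuf as two (fd, offset) planes.

    Assumes the UV plane starts right after the Y plane, with no padding.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 needs even width and height")
    stride = width if stride is None else stride
    if stride < width:
        raise ValueError("stride must be at least the width")
    y_size = stride * height          # 1 byte per luma sample
    uv_size = stride * (height // 2)  # interleaved Cb/Cr at half vertical resolution
    return [
        PlaneImport(fd=fd, offset=0, length=y_size),
        PlaneImport(fd=fd, offset=y_size, length=uv_size),
    ]


# Example with a made-up fd number: both planes reference the same dmabuf.
planes = nv12_plane_imports(fd=7, width=1920, height=1080)
```

Keeping the offsets relative to the dmabuf, as in the DRM convention mentioned in the discussion, means the importer never has to guess a component order.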
[14:09] <ndufresne> tfiga, hmm, sorry, they do have reordering, but in the encoder, sorry for that
[14:10] <paulk-leonov> ndufresne, pardon my ignorance, but how is reordering different from eviction and what role does it play?
[14:11] <ndufresne> paulk-leonov, it's another reorder ;-P
[14:11] <ndufresne> it's the processing described in 8.2.4.2 of the spec
[14:13] <ndufresne> I believe it ensures a more linear processing by the accelerator
[14:13] <ndufresne> but the H264 spec never puts any rational for any of the sections, so it's pure guessing
[14:14] <ndufresne> * rationale
[14:14] <bbrezillon> ndufresne: I'm almost sure the problem tfiga wanted me to address in the first place was the multi-plane buf with all planes pointing to the same dmabuf, each plane at a different offset
[14:14] <paulk-leonov> indeed, I often fail to see the point of the things described in the spec
[14:15] <ndufresne> those exist, but they are private to the members, I'm not a member ;-D
[14:15] <ndufresne> they provide a spec, but they don't want to help anyone create a codec
[14:15] <bbrezillon> not sure he mentioned the non-standard offset thing
[14:16] <ndufresne> bbrezillon, which is a special case of what is already supported, since we already support having NV12 in one allocation exposed through the v4l2_buffer_mplane structure
[14:16] <paulk-leonov> good to know they did it on purpose at least...
[14:16] <paulk-leonov> tfiga, ndufresne: I think it would make more sense for our API to provide the ordered list
[14:16] <ndufresne> bbrezillon, but we discovered that using data_offset to import these is not ideal, and it requires passing the same dmabuf multiple times, which may cause overhead
[14:16] <paulk-leonov> we certainly don't want re-order done in the kernel, nor to provide two redundant lists
[14:17] <ndufresne> paulk-leonov, what I'm thinking, and that I'll try, is that you can redo the DPB order simply by iterating the DPB we have and looking up in the ref list to build a RK list
[14:17] <bbrezillon> ndufresne: ok, so you are (ab)using the data_offset field to encode that
[14:18] <paulk-leonov> ndufresne, totally, that seems manageable
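One possible reading of the "double for loop, on very small lists" idea is sketched below. It walks the DPB slot by slot and looks each picture up in the ordered reference list that userspace provides. Everything here is an assumption for illustration: pictures are represented by hypothetical integer ids, `refs_in_dpb_order` is a made-up helper, and the discussion does not state what format the RK hardware actually expects.

```python
def refs_in_dpb_order(dpb, ref_list):
    """Return (dpb_slot, ref_list_position) pairs, in DPB order.

    dpb      -- picture ids in DPB slot order (hypothetical ids)
    ref_list -- picture ids in the ordered (already reordered) reference list
    DPB entries that do not appear in ref_list are skipped.
    """
    result = []
    for slot, pic in enumerate(dpb):            # outer loop: DPB slots
        for pos, ref in enumerate(ref_list):    # inner loop: ordered ref list
            if ref == pic:
                result.append((slot, pos))
                break
    return result


# Example: picture 11 is in the DPB but not referenced by this slice.
pairs = refs_in_dpb_order(dpb=[10, 11, 12, 13], ref_list=[12, 10, 13])
```

With a DPB of at most 16 frames and similarly short reference lists, the quadratic lookup is cheap, which matches the expectation that the conversion is low cost.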
[14:18] <ndufresne> not me personally, we had no idea, data_offset wasn't exactly specified
[14:18] <ndufresne> for capture, it works, pretty awkward thing since it's included in the size, but there are no semantic issues there
[14:19] <ndufresne> but it's also not sufficient really
[14:20] <ndufresne> paulk-leonov, basically, if it's low cost, I'd try to stick with the userspace expectation, and if you look at vaapi, or the FFMPEG accelerator abstraction, ordered list is what userspace expects
[14:20] <paulk-leonov> sounds good
[14:20] <ndufresne> anyway, today I'm tracing those RK lists, and will be able to truly confirm
[14:20] <bbrezillon> ndufresne: note that I still pass the DMAbuf several times (once per plane) in the new API
[14:20] <ndufresne> bbrezillon, hmm, the thing is that it's error prone
[14:21] <ndufresne> since you end up having to translate the offsets back and forth
[14:21] <bbrezillon> just followed what DRM is doing
[14:21] <ndufresne> in GStreamer we always keep the offsets relative to the image, assuming a specific component order
[14:22] <ndufresne> the side effect is that for DRM we need to translate this offset relative to the DMABuf if there is more than one
[14:22] <ndufresne> well, actually, it might be Gst making it complicated here
[14:22] <bbrezillon> it's also more flexible this way
[14:22] <ndufresne> because of the multi-segment flat representation of memory
[14:22] <bbrezillon> though we probably don't care about this flexibility
[14:23] <ndufresne> in gst, the point is that if the buffer is non-writable (two data owners), you'll copy in a single buffer, and the offset remains unchanged
[14:23] <ndufresne> drivers (except for USB drivers maybe) should in general never have to copy
[14:24] <bbrezillon> not sure I follow you
[14:24] <bbrezillon> the offset I pass to the new API is relative to the DMA buf
[14:25] <ndufresne> in userspace, we often multiplex images across multiple threads, which in turn makes the buffer non-writable, so if you want to overlay something in one thread, you have to copy first
[14:25] <bbrezillon> ok
[14:25] <ndufresne> (copy-on-write)
[14:25] <ndufresne> but while doing so, we simplify the memory by moving to a single allocated block, it's also faster to allocate
[14:26] <bbrezillon> still okay
[14:26] <ndufresne> anyway, this was just a parenthesis, should not affect your work
[14:27] <bbrezillon> what I don't understand is why passing a single DMAbuf fd is simpler than passing X
[14:27] <ndufresne> bbrezillon, ezequielg: Btw, where is the WIP RK mainline H264 driver code again?
[14:28] <bbrezillon> I mean X times the same fd
[14:28] <ndufresne> bbrezillon, for importation I guess it's simple, things get messy on exportation
[14:28] <bbrezillon> oh, I don't support allocating multiplanar buffers in a single chunk so far
[14:28] <ndufresne> in the sense that you need to decide if you expose two DMABuf pointing to the same (just a ref) or make the API more complex to expose 1
[14:29] <ndufresne> but some HW might require that
[14:29] <ndufresne> my question is how do we create something symmetrical
[14:29] <bbrezillon> yes, last time I asked it wasn't clear whether this was needed or not
[14:30] <bbrezillon> so I just omitted that part
[14:30] <ndufresne> IP interfaces are a bit of a jungle
[14:31] <bbrezillon> ndufresne: regarding H264, I don't know
[14:31] <bbrezillon> probably somewhere in ezequielg's tree
[14:31] <ndufresne> some little bird told me it's your tree now ;-D
[14:31] <bbrezillon> :)
[14:32] <bbrezillon> nope, unless ezequielg gives me his korg ssh key :)
[14:32] <bbrezillon> oh no, it's infradead
[14:32] <ndufresne> haha, I was joking
[14:32] <bbrezillon> ndufresne: anyway, would appreciate your feedback on v2 of the RFC
[14:32] <bbrezillon> if you find some time
[14:36] <ndufresne> sure, I'll try and dive deeper in it next pass
[15:42] *** benjiG has left