How do you feed data (e.g. YUV images) into an encoder? I'm looking at this function: https://bit.ly/3yvje5v and it seems that `void* mem` is not used.
If I'm correct, that particular case makes use of DMA transfers between devices, as described here https://bit.ly/3s0xWzc
roxlu: the buffer is referenced by the fd argument
dmabufs are buffers that are referenced by a handle, in the form of a file descriptor
they can be passed to devices
there's no dma transfer between devices
the camera writes to the buffer using DMA, and the encoder reads from the buffer using DMA, there's no direct communication between the camera and the encoder
dmabuf allows using the same buffer for the camera and the encoder, avoiding a copy from the camera buffer to a separate encoder buffer. that's about all there is to it
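(For illustration, a minimal sketch of what queueing such a dmabuf-backed buffer on an encoder's output queue can look like; `enc_fd`, `dmabuf_fd` and the single-plane assumption are mine rather than from the discussion, and VIDIOC_REQBUFS with V4L2_MEMORY_DMABUF is assumed to have been done already:)
```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Queue a buffer referenced by a dmabuf fd on the encoder's output
 * queue. Assumes VIDIOC_REQBUFS was called with V4L2_MEMORY_DMABUF
 * and that the format uses a single plane. */
int queue_dmabuf(int enc_fd, int dmabuf_fd, unsigned int index,
                 unsigned int bytesused)
{
        struct v4l2_plane plane;
        struct v4l2_buffer buf;

        memset(&plane, 0, sizeof(plane));
        memset(&buf, 0, sizeof(buf));

        plane.m.fd = dmabuf_fd;      /* the fd is the buffer handle */
        plane.bytesused = bytesused;

        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        buf.memory = V4L2_MEMORY_DMABUF;
        buf.index = index;
        buf.m.planes = &plane;
        buf.length = 1;              /* one entry in the planes array */

        return ioctl(enc_fd, VIDIOC_QBUF, &buf);
}
```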
hey pinchartl! thanks for answering
for this test case I want to provide YUV buffers from CPU/application memory. If I'm correct, I can use either user pointers or mmap'd memory.
USERPTR is not recommended
MMAP is better in this case
Ok cool, I did remember that from ages ago :) 
USERPTR was a historical mistake, before we had dmabuf
I'm trying to create some output/capture buffers but failing. Whenever I set `v4l2_buffer.length`, the VIDIOC_QUERYBUF ioctl fails
what do you set .length to ?
e.g. when I uncomment this line: https://gist.github.com/roxlu/61fa7299071358b67ee9ac9099c5d2c8#file-v4l2-tests-cpp-L37 
and what error does the ioctl return ?
for the _OUTPUT I set it to 1, I'm using I420 as input
the ioctl fails with EINVAL (invalid argument)
why do you set it to 1 when the planes array has three entries ?
that's probably wrong indeed
here https://github.com/raspberrypi/libcamera-apps/blob/56ae02a57193d703524ebf6ca64cf093df44de36/encoder/h264_encoder.cpp#L149 they do the same
When I use 1 plane instead of 3 and set .length to 1 I'm getting the same error.
the rpi code you point to queries the capture buffer, which uses V4L2_PIX_FMT_H264, which requires a single plane
try to increase the number of planes to 3
although V4L2_PIX_FMT_YUV420 should use a single plane as well
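(A minimal sketch of such a multiplanar QUERYBUF call; `fd` and `index` are assumed to come from the caller, and sizing the planes array with VIDEO_MAX_PLANES sidesteps the plane-count question entirely:)
```c
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int query_buffer(int fd, unsigned int index)
{
        /* .length is the number of entries in the planes array; it
         * must cover the format's plane count, so VIDEO_MAX_PLANES is
         * always safe. The driver fills in the actual count. */
        struct v4l2_plane planes[VIDEO_MAX_PLANES];
        struct v4l2_buffer buf;

        memset(planes, 0, sizeof(planes));
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.index = index;
        buf.m.planes = planes;
        buf.length = VIDEO_MAX_PLANES;

        if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0) {
                perror("VIDIOC_QUERYBUF");
                return -1;
        }
        return 0;
}
```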
if that fails, you can trace the call inside the kernel to see where it fails
increasing the video device debug level and/or the vb2 debug level may help
/sys/module/videobuf2_common/parameters/debug
/sys/module/videobuf2_v4l2/parameters/debug
oh awesome! I was thinking about how to get more debug info
/sys/class/video4linux/video*/dev_debug
you have the source code for the kernel, so you can add printk statements there if all else fails
but start with the debug levels
pinchartl: ok, I was thinking about that, but I have no experience with it (yet); I'd love to learn more, as it's very helpful to be able to debug like that
let me first change the length to 3 and see what happens (I think that failed before)
ok yeah that fails
you can set all the debug levels to 10 (I think the maximum level that is used is 3, so 10 will for sure give you all messages). that will be pretty verbose
ok, how do you set those? like `echo -n "10" > /sys/module/videobuf2_common/parameters/debug` ?
yes
(no need for -n)
ok thanks
https://gist.github.com/roxlu/960446530c85c3999e8ab22d9064cf80#file-dmesg-log-L83 hmm not sure if I see anything in particular wrong here
I've increased the verbosity a bit: https://gist.githubusercontent.com/roxlu/f91c158f4d3176baef191b1581a786cb/raw/f8955c846b632786b64c820e3018fa9c47b0fb85/dmesg.log 
ok .. I think I've to do a facepalm 
yep... omg, https://gist.github.com/roxlu/61fa7299071358b67ee9ac9099c5d2c8#file-v4l2-tests-cpp-L42 
(0 == r) -> (0 != r)
ok, now I can start feeding in some buffers.  
so if I'm correct I need to fetch a buffer from OUTPUT and then fill it with my yuv data?
:-)
yes, you need to take an output buffer, fill it with data, and queue it
and to take an output buffer, I use `VIDIOC_QUERYBUF` and pass a `v4l2_buffer` set up in a similar way as in my pastes, but with the `index` value changed to the one that is available?
not exactly
after allocating buffers, they're all available to the application
so you can take any buffer you want
to mmap() the memory, you need to know the offset to pass to mmap() that corresponds to the buffer
and that's what QUERYBUF gives you
you should call QUERYBUF on all buffers after allocating them and save the offsets
and then mmap() the buffers
that's all init-time operations, after that you don't need to call QUERYBUF anymore
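(A sketch of that init sequence, under the assumption of a single-plane multiplanar output queue with four MMAP buffers; `mem`/`mem_size` are placeholder names and error handling is trimmed:)
```c
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

#define NUM_BUFFERS 4

static void *mem[NUM_BUFFERS];
static size_t mem_size[NUM_BUFFERS];

int init_output_buffers(int fd)
{
        struct v4l2_requestbuffers req;

        memset(&req, 0, sizeof(req));
        req.count = NUM_BUFFERS;
        req.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
        req.memory = V4L2_MEMORY_MMAP;
        if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0)
                return -1;

        for (unsigned int i = 0; i < req.count; i++) {
                struct v4l2_plane plane;
                struct v4l2_buffer buf;

                memset(&plane, 0, sizeof(plane));
                memset(&buf, 0, sizeof(buf));
                buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
                buf.memory = V4L2_MEMORY_MMAP;
                buf.index = i;
                buf.m.planes = &plane;
                buf.length = 1;
                if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0)
                        return -1;

                /* QUERYBUF returns the offset that mmap() expects */
                mem_size[i] = plane.length;
                mem[i] = mmap(NULL, plane.length, PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, plane.m.mem_offset);
                if (mem[i] == MAP_FAILED)
                        return -1;
        }
        return 0;
}
```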
ok, and after the initialization phase you use QBUF and DQBUF to hand over buffers to the encoder and retrieve used ones?
yes
ok I see, and that's what you have to do for both "streams", I mean the _OUTPUT and _CAPTURE types?
on the output side, you have to fill buffers with data to encode, and queue them
on the capture side, you can just queue all buffers initially
and you'll dequeue them when they're ready
ah ok, so the capture side is a bit simpler to implement; I mean, I can imagine that for the output side you might need to wait for buffers to be reusable? or maybe you just need to make sure you've got enough buffers
on both sides you'll cycle through buffers
on the capture side you enqueue everything at the beginning, and then have to wait until a buffer is available for dequeue
at that point, you dequeue it, process it (for instance write its contents to a file), and requeue it
similar story on the output side, you fill buffers, queue them, wait for buffers to have been processed, dequeue them, refill them, requeue them
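(A sketch of the capture-side half of that cycle; `write_out` and `capture_mem` are hypothetical, with `capture_mem` holding the mmap'd planes from the init step:)
```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

extern void write_out(const void *data, size_t size);
extern void *capture_mem[];  /* mmap'd capture planes, from init */

/* Dequeue one finished capture buffer, consume the encoded bytes,
 * then hand the buffer back to the driver. */
int cycle_capture(int fd)
{
        struct v4l2_plane plane;
        struct v4l2_buffer buf;

        memset(&plane, 0, sizeof(plane));
        memset(&buf, 0, sizeof(buf));
        buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        buf.memory = V4L2_MEMORY_MMAP;
        buf.m.planes = &plane;
        buf.length = 1;

        if (ioctl(fd, VIDIOC_DQBUF, &buf) < 0)
                return -1;

        write_out(capture_mem[buf.index], plane.bytesused);
        return ioctl(fd, VIDIOC_QBUF, &buf);
}
```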
ok yes thanks, I see how this works now ... I had to grasp a few new concepts but now I see how nicely everything fits together! 
to wait for buffers to be ready, you should use select() or poll()
.. well I do have to look into how to implement this all correctly, and how/when poll() can be used to check for available buffers; the rpi code uses that.
^^ yeah 
in order to wait on both the capture and output queues at the same time
my initial idea was to create a thread for the output side. when a user has YUV data, I copy it (not optimal) and add it to a CPU-side queue, which is then fed/copied into the OUTPUT buffers in that thread.
it would be better not to copy the YUV data to another thread, and to assume there is always an OUTPUT buffer available to fill
threads are overkill for this. an event-driven design is good enough
and you should avoid copies
they are VERY expensive
yeah 
though if you have no control over the OUTPUT side, there is no real alternative to copying the data the user gives you
i'll also create a version that uses DMABUFs
do it the other way around then, give to the user the buffers they need to fill
design your APIs to avoid all copies
ah that's a good idea!
does this also work with GL/Vulkan texture buffers .. where you can directly transfer e.g. webcam frames into the GPU?
GL/Vulkan have extensions to use dmabuf, yes
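(For example, with the EGL_EXT_image_dma_buf_import extension a dmabuf fd can be wrapped in an EGLImage that a GL texture is then bound to; a single-plane sketch, with all parameters assumed to come from the caller:)
```c
#include <EGL/egl.h>
#include <EGL/eglext.h>

/* Wrap a dmabuf fd in an EGLImage. fourcc is a DRM format code and
 * pitch the plane's stride in bytes. */
EGLImage import_dmabuf(EGLDisplay dpy, int fd, int width, int height,
                       int fourcc, int pitch)
{
        const EGLAttrib attrs[] = {
                EGL_WIDTH, width,
                EGL_HEIGHT, height,
                EGL_LINUX_DRM_FOURCC_EXT, fourcc,
                EGL_DMA_BUF_PLANE0_FD_EXT, fd,
                EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
                EGL_DMA_BUF_PLANE0_PITCH_EXT, pitch,
                EGL_NONE,
        };

        /* this target requires EGL_NO_CONTEXT and a NULL client buffer */
        return eglCreateImage(dpy, EGL_NO_CONTEXT,
                              EGL_LINUX_DMA_BUF_EXT, NULL, attrs);
}
```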
oh nice!
I was also wondering about those DMABUFs, when e.g. using frames from a webcam directly as input for an encoder, is the `fd` value for each buffer different? 
(not sure if I explained that correctly)
I was wondering about that, because here https://github.com/raspberrypi/libcamera-apps/blob/56ae02a57193d703524ebf6ca64cf093df44de36/encoder/h264_encoder.cpp#L196 
no offset is given that could be used to get a pointer to some buffer; instead it uses the given `fd`, which made me think that each buffer has its own `fd`
yes, the fd is different for each buffer
the fd is a buffer handle
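(If you need such per-buffer fds from a plain V4L2 device, VIDIOC_EXPBUF exports an MMAP buffer as a dmabuf; a sketch, with `fd` and `index` assumed to come from the caller:)
```c
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Export buffer `index`, plane 0, as a dmabuf; returns the new fd
 * or -1 on error. Each buffer/plane gets its own fd. */
int export_dmabuf(int fd, unsigned int index)
{
        struct v4l2_exportbuffer exp;

        memset(&exp, 0, sizeof(exp));
        exp.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
        exp.index = index;
        exp.plane = 0;

        if (ioctl(fd, VIDIOC_EXPBUF, &exp) < 0)
                return -1;
        return exp.fd;
}
```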
ok
pinchartl: when I create 2 threads can I use poll on the same FD, with POLLIN? and how would I be able to see what buffer is ready (e.g. the _OUTPUT or _CAPTURE)? 
.. I mean, let's say I want to start feeding yuv data, then I must get access to a buffer that I can fill, but how do I know when I can start feeding data? how would I know what buffer to fill? ..and how would I know when I can reuse the buffer?
polling on the same fd will work, but you can't tell which of the two threads will be woken up
once select() returns with events on the fd
you can just call VIDIOC_DQBUF twice, once with each buffer type (capture and output)
if VIDIOC_DQBUF fails with -EAGAIN, you just continue
if it succeeds, you process the dequeued buffer
make sure to open the device in non-blocking mode (O_NONBLOCK)
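(Putting those points together, a sketch of such an event loop; the structure follows the advice above, the helper name `try_dqbuf` is mine:)
```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Try to dequeue one buffer of the given type; returns 0 on success,
 * -1 with errno == EAGAIN if nothing is ready yet (the device must
 * have been opened with O_NONBLOCK). */
static int try_dqbuf(int fd, enum v4l2_buf_type type,
                     struct v4l2_buffer *buf, struct v4l2_plane *plane)
{
        memset(plane, 0, sizeof(*plane));
        memset(buf, 0, sizeof(*buf));
        buf->type = type;
        buf->memory = V4L2_MEMORY_MMAP;
        buf->m.planes = plane;
        buf->length = 1;
        return ioctl(fd, VIDIOC_DQBUF, buf);
}

void event_loop(int fd)
{
        /* POLLIN signals ready capture buffers, POLLOUT output buffers */
        struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };

        for (;;) {
                struct v4l2_plane plane;
                struct v4l2_buffer buf;

                if (poll(&pfd, 1, -1) < 0)
                        break;

                if (try_dqbuf(fd, V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
                              &buf, &plane) == 0)
                        ; /* consume encoded data, requeue the buffer */
                else if (errno != EAGAIN)
                        break;

                if (try_dqbuf(fd, V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
                              &buf, &plane) == 0)
                        ; /* this output buffer is free to refill */
                else if (errno != EAGAIN)
                        break;
        }
}
```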
you need to keep track of buffers
ah ok, so it would be better to just create one thread for the poll()ing
if you allocate N buffers, initially you have buffers 0..N-1 that you can use
you pick any of them, fill it, and queue it
at that point you need to record that that buffer has been queued and can't be used anymore
and do I use the `.index` member to keep track of which buffer is used/reusable?
once you dequeue it with DQBUF, you put it back in the pool of available buffers
yes, you can use the buffer index for that
you have to maintain a pool of indexes that are free
or even indices
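(A trivial way to track that, using the buffer index as the key; `NUM_BUFFERS` is assumed to match the count passed to REQBUFS:)
```c
#define NUM_BUFFERS 4

/* Pool of free buffer indices: pop to pick a buffer before QBUF,
 * push the index back after DQBUF. */
static unsigned int free_idx[NUM_BUFFERS];
static unsigned int free_count;

static void pool_init(void)
{
        for (free_count = 0; free_count < NUM_BUFFERS; free_count++)
                free_idx[free_count] = free_count;
}

static int pool_take(void)                /* -1 when all buffers are queued */
{
        return free_count ? (int)free_idx[--free_count] : -1;
}

static void pool_put(unsigned int index)  /* call after VIDIOC_DQBUF */
{
        free_idx[free_count++] = index;
}
```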
yep, I got that from the rpi example indeed
thanks
it would be nice to have a userspace library for codecs that would handle all that for you
like libcamera does for cameras
yeah, I think in general a nice clean hw-encoding/decoding library would be nice
not a library like ffmpeg, just a small one that does encoding/decoding
would an ioctl block when not using O_NONBLOCK?
of course you could also use a framework like gstreamer that would handle it for you
or some other function?
VIDIOC_DQBUF will block if no buffer is available and the device is open in blocking mode
yeah gstreamer is nice but to me it's similar in size to ffmpeg (probably a bit smaller)
ok
blocking mode makes very little sense in general. in a very small test application that just captures frames from one device, maybe, but as soon as you make something "real", non-blocking mode is the way to go
yes