[00:52] *** mchehab has quit IRC (Read error: No route to host) [01:05] *** iive has quit IRC (Quit: They came for me...) [06:11] *** bparrot has quit IRC (Remote host closed the connection) [08:41] <pinchartl> hverkuil: ping [08:41] <hverkuil> pinchartl: pong [08:41] <pinchartl> how are you doing ? [08:42] <hverkuil> good! Thank you. You as well, I hope? [08:42] <pinchartl> a bit jetlagged, but otherwise fine :-) [08:42] <pinchartl> I have a V4L2 API question for you [08:42] <pinchartl> and a nasty one I'm afraid [08:43] <pinchartl> I have been informed of this use case [08:43] <pinchartl> a mem-to-mem device, with 4 DMA in and 5 DMA out [08:43] <pinchartl> that's 9 DMA engines [08:43] <pinchartl> so 9 video nodes [08:44] <pinchartl> the issue is that calling VIDIOC_QBUF 9 times for every processing pass is too costly in terms of performance [08:44] <pinchartl> the required frame rate can't be achieved [08:44] <hverkuil> what's the framerate? [08:44] <pinchartl> I can't disclose lots of information about the details, but it's high [08:45] <pinchartl> is there a way we could extend V4L2 to support this without changing the whole API ? [08:46] <pinchartl> I've received other questions for which I've been able to propose more or less simple V4L2 extensions [08:46] <pinchartl> but for this one I'm a bit stuck [08:46] <pinchartl> I'm not sure we can do something without extensive modifications [08:46] <hverkuil> More than 60Hz? Is it certain that it is the ioctl syscall itself that is the cause of the overhead, and not cache management or something like that? Do you have real numbers? [08:47] <hverkuil> Is this on an arm SoC or on Intel? [08:47] <pinchartl> it's ARM [08:47] <pinchartl> I'll ask for more details on the numbers [08:47] <pinchartl> but I wanted to pick your brain already on whether there is a not-too-difficult way forward that may be escaping me at the moment [08:49] <hverkuil> Before doing anything I really want to first know for certain that it is the ioctl that causes the problem. Also, I know that for Intel CPUs recent kernels take measures to reduce the risk of spectre attacks etc. which cause a performance hit for syscalls. Those can be turned off, and it will be interesting to know how much of a difference that makes in the ioctl performance. [08:49] <hverkuil> It would be very, very useful to have some actual numbers there. [08:51] <pinchartl> ioctls on ARM have become more costly for the same reason as on x86 [08:51] <pinchartl> but I'll try to get numbers, yes [08:52] <hverkuil> The cost of ioctls has been a common theme lately, but it would be nice to have some real numbers. It's something we can take into account when designing new APIs. [08:53] <hverkuil> This information can feed directly into the discussion about the new streaming ioctls. [08:58] <kbingham> http://fpshighspeedcam.com/ says some of the high end sony cameras can do 1000 fps now! Yikes. [08:58] <kbingham> some weird reduction in frame sizes, but I guess it allows capturing 'slow motion' videos. [09:03] <hverkuil> In any case, we need to start thinking about providing support for high speed capturing, and devices with many capture/output streams. [09:04] <pinchartl> and https://www.hasselblad.com/h6d-multishot/ does 400MP [09:05] <pinchartl> likely not at 1000fps though :-) [09:05] <hverkuil> High-speed capturing would probably need more than 32 buffers per queue, and needs to (de)queue more than one frame per (D)QBUF.
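To make the batching idea concrete, here is a purely hypothetical sketch of a "queue M buffers in one syscall" interface. Nothing below exists in the V4L2 UAPI; the ioctl numbers, struct and field names are invented for illustration only.

/*
 * HYPOTHETICAL ONLY: not part of the V4L2 UAPI.  A sketch of a batched
 * queue operation, so that userspace pays the syscall cost once for M
 * buffers instead of issuing VIDIOC_QBUF M times.
 */
#include <linux/videodev2.h>

struct v4l2_buffer_batch {
	__u32			count;		/* number of entries in 'buffers' */
	__u32			flags;
	struct v4l2_buffer	*buffers;	/* 'count' buffers to (de)queue */
	__u32			reserved[8];
};

/* Invented ioctl numbers, for illustration only. */
#define VIDIOC_QBUF_BATCH	_IOWR('V', 192, struct v4l2_buffer_batch)
#define VIDIOC_DQBUF_BATCH	_IOWR('V', 193, struct v4l2_buffer_batch)

A multi-stream ("NxM") variant, as discussed below, would additionally need a per-entry way to identify which stream each buffer belongs to.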
[09:06] <pinchartl> yes, batching buffers is also needed [09:06] <hverkuil> For m2m devices with multiple streams we need to be able to (de)queue those additional streams with the same filehandle (discussed in Lyon). [09:07] <hverkuil> And probably also (please confirm for your use-case) (de)queue all those streams with one ioctl. [09:07] <hverkuil> This assumes all those streams are synchronized. [09:08] <hverkuil> So the streaming ioctls need batching and must allow queuing/dequeuing buffers for multiple streams. [09:08] <pinchartl> for mem-to-mem devices we can consider that everything works synchronously, yes [09:09] <hverkuil> Basically an NxM operation for N simultaneous streams and M buffers batched. [09:09] <pinchartl> that's departing "a bit" from the V4L2 model :-) [09:10] <kbingham> So we simply need to move enormous amounts of data with zero bandwidth usage, and at extremely high rates with zero latency. Simples. :-D haha [09:10] <kbingham> I'll go collect my pink unicorn on my way out. [09:11] <hverkuil> Not really. It's just a new ioctl. Basically you are pushing some of the things you would have to do in userspace into kernelspace to avoid the ioctl overhead. The basic concept remains the same. [09:11] <hverkuil> This is definitely a separate ioctl since few devices need support for this and the ioctl will be a lot more complex API-wise. [09:12] <pinchartl> well, it would ditch most of the ioctls we have for a new one (let's keep in mind that we need to be able to configure formats, controls, selection rectangles and links in requests) [09:13] <pinchartl> with lots of DMA engines, and thus lots of video nodes and subdev nodes, the ioctl overhead of having to propagate a format change through the pipeline is too high using per-pad ioctls [09:13] <pinchartl> it will be interesting. in a masochistic "I like challenges" way :-) [09:30] <hverkuil> There are two mostly separate areas where ioctls become a pain: per-frame configuration changes for complex pipelines, and having to handle many buffers in parallel (i.e. processing many parallel streams, high-speed capture). [09:31] <hverkuil> The second part is relatively easy to implement (I think), the first would require a radical redesign. [09:31] <bbrezillon> hverkuil: is it okay if I add a v4l2_ctrl_s_ctrl_compound() helper similar to v4l2_ctrl_s_ctrl_string,u64() but for a compound control? [09:32] <hverkuil> bbrezillon: sure, as long as you have a user for that function. [09:32] <bbrezillon> or should I add a specialized version for the VP9 frame context ctrl? [09:32] <bbrezillon> I have one user => VP9 [09:33] <hverkuil> How about a v4l2_ctrl_s_ctrl_compound() in v4l2-ctrls.c and static inlines in v4l2-ctrls.h for specialized versions? [09:34] <hverkuil> I want to avoid adding lots of specialized functions to v4l2-ctrls.c that take up space when they are quite possibly not used.
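As a sketch of what hverkuil suggests here, modeled on the existing v4l2_ctrl_s_ctrl()/v4l2_ctrl_s_ctrl_int64() pattern in include/media/v4l2-ctrls.h: an unlocked core in v4l2-ctrls.c, a generic locking wrapper, and (if needed) per-control static inlines on top. The names, the exact signature and the VP9 struct are illustrative, not an existing kernel API.

/* v4l2-ctrls.c (sketch): copy 'size' bytes of payload into the control and
 * apply it.  Must be called with the control handler lock held. */
int __v4l2_ctrl_s_ctrl_compound(struct v4l2_ctrl *ctrl,
				const void *p, size_t size);

/* v4l2-ctrls.h (sketch): locking wrapper, same shape as v4l2_ctrl_s_ctrl(). */
static inline int v4l2_ctrl_s_ctrl_compound(struct v4l2_ctrl *ctrl,
					    const void *p, size_t size)
{
	int rval;

	v4l2_ctrl_lock(ctrl);
	rval = __v4l2_ctrl_s_ctrl_compound(ctrl, p, size);
	v4l2_ctrl_unlock(ctrl);

	return rval;
}

/* A specialized wrapper, if it turns out to be useful, would then be a
 * one-liner; the struct name here is hypothetical. */
static inline int v4l2_ctrl_s_ctrl_vp9_frame_ctx(struct v4l2_ctrl *ctrl,
				const struct v4l2_ctrl_vp9_frame_ctx *fctx)
{
	return v4l2_ctrl_s_ctrl_compound(ctrl, fctx, sizeof(*fctx));
}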
[09:34] <bbrezillon> I do agree [09:34] <bbrezillon> not even sure the specialized wrapper is useful [09:35] <bbrezillon> let's go for the s_ctrl_compound() first [09:35] <bbrezillon> we'll see if there's a need for a specialized wrapper afterwards [09:46] <hverkuil> right [11:18] *** NiksDev has quit IRC (Remote host closed the connection) [11:49] <dafna21> hverkuil: hi, there is a crash when opening a subdevice fh while the device is unregistered; it crashes when trying to reference `sd->v4l2_dev`, since it is set to NULL in `v4l2_device_unregister_subdev` [11:50] <dafna21> this also happens when removing the device with "echo 0 > /configfs/vimc/dev/hotplug" in the configfs feature, so I don't know if this feature is ready yet [11:51] <dafna21> I guess a crash caused by unbinding the device with "echo > /sys/.../unbind" is not as bad [11:52] <dafna21> I wonder how this should be solved [11:53] <hverkuil> subdev_open should call video_is_registered(vdev) and return with -ENODEV if it is not registered. Similar to what v4l2_open does. [11:54] <hverkuil> Only release() can be called for a fh whose device node was unregistered, all other ops should return -ENODEV. [11:55] <hverkuil> compare v4l2-subdev.c with v4l2-dev.c. [11:55] <dafna21> but shouldn't it be inside a lock? sd->v4l2_dev can be set to NULL right after it [11:57] <hverkuil> yes, it should. [11:57] <dafna21> it should be in a lock? [11:58] <dafna21> so the unregister_subdev should also be under the same lock? [11:59] <hverkuil> I need to think about this a bit more. [11:59] <hverkuil> The video_is_registered(vdev) check is definitely required, so you can start with that. [12:00] <hverkuil> drivers/media/mc/mc-device.c seems to have the same problem. [12:00] <hverkuil> It should check against media_devnode_is_registered(). [12:01] <dafna21> when this check is not under a lock it seems pointless, because the device can always be unregistered right after this check [12:02] <hverkuil> The problem here is not with that call, it is that sd->v4l2_dev is set to NULL while there are still users of the sd. [12:03] <hverkuil> I'll get back to you later. [13:32] *** mchehab has quit IRC (Ping timeout: 265 seconds) [13:37] <svarbanov> hverkuil, request API related question: let's say I allocate a request_fd, call s_ctrls(request_fd) ... qbuf(out queue) and finally MEDIA_REQUEST_QUEUE. Can I expect that the buffer from DQBUF(capture queue) will have the same request_fd [13:38] <svarbanov> i.e. the dequeued capture buffer has the requested controls applied? [13:44] <hverkuil> svarbanov: no, the request_fd is not copied to capture buffers. The capture queue does not use the request API for m2m devices. [13:44] <hverkuil> You need to use the timestamp field to associate capture buffers with output buffers. [14:08] <svarbanov> hverkuil, sounds asymmetric to me, I have to think more about whether that is usable for passing metadata like HDR and other custom data [14:09] <hverkuil> svarbanov: I'm confused. Are you talking about the request API with m2m devices (stateless codecs) or with capture devices? [14:10] <hverkuil> you were talking about out_queue and capture_queue, so I assumed an m2m device. [14:11] <svarbanov> hverkuil, I'm thinking about Venus (stateful codec) and how to pass metadata between userspace and the driver [14:12] <hverkuil> Ah, OK. [14:12] <svarbanov> because the metadata is per buffer and it is bidirectional, i.e. userspace fills a metadata buffer for the output queue and reads metadata on the capture buffer, but the information is somehow related
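A minimal userspace sketch of the timestamp-based association hverkuil describes above ([13:44]): tag each OUTPUT buffer when queuing it and read the tag back from the dequeued CAPTURE buffer, assuming the driver copies the OUTPUT timestamp to the matching CAPTURE buffer, as m2m codec drivers are expected to do. Buffer setup and error handling are omitted.

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Queue an OUTPUT buffer, encoding an application-chosen tag in the
 * timestamp (only the value matters, not wall-clock time). */
static void queue_output(int fd, unsigned int index, uint64_t tag)
{
	struct v4l2_plane planes[VIDEO_MAX_PLANES];
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	memset(planes, 0, sizeof(planes));
	buf.type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.index = index;
	buf.m.planes = planes;
	buf.length = 1;
	buf.timestamp.tv_sec = tag / 1000000;
	buf.timestamp.tv_usec = tag % 1000000;
	ioctl(fd, VIDIOC_QBUF, &buf);
}

/* Dequeue a CAPTURE buffer and recover the tag of the OUTPUT buffer that
 * produced it, assuming the driver copied the timestamp across. */
static uint64_t dequeue_capture_tag(int fd)
{
	struct v4l2_plane planes[VIDEO_MAX_PLANES];
	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	memset(planes, 0, sizeof(planes));
	buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
	buf.memory = V4L2_MEMORY_MMAP;
	buf.m.planes = planes;
	buf.length = 1;
	ioctl(fd, VIDIOC_DQBUF, &buf);

	return (uint64_t)buf.timestamp.tv_sec * 1000000 + buf.timestamp.tv_usec;
}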
[14:13] <hverkuil> are you talking encoder or decoder? [14:13] <svarbanov> encoder [14:14] <hverkuil> what sort of metadata do you get on the capture queue? [14:14] <hverkuil> I understand the need for metadata on the output queue, but what you get back on the capture queue isn't clear to me. [14:19] <svarbanov> see this header https://github.com/realme-kernel-opensource/realme5-kernel-source/blob/master/include/uapi/media/msm_vidc_utils.h [14:21] <hverkuil> does that mean that with the venus encoder you need to splice that information into the generated bytestream in userspace? How does userspace use this information? [14:22] <svarbanov> it is complicated and I still don't understand how that information is used [14:23] <svarbanov> for now I only know that the metadata should be associated with the data buffer [14:24] <svarbanov> and currently in the downstream code they have a separate v4l2_plane for that [14:25] <hverkuil> In any case, this is really information that should be returned as read-only controls in the output request. You process a request on the output queue and report back any associated metadata produced as controls in the request. [14:26] <hverkuil> The output queue is per-frame, so it's easy to keep everything together. The capture queue is just to capture the result. [14:26] <hverkuil> Provided the encoder produces a single compressed frame per capture buffer, you can easily map to the output buffer using the timestamp as a tag. [14:27] <hverkuil> Other encoders might generate a continuous bytestream on the capture side, and then using the request API for capture buffers fails. [14:29] <hverkuil> don't attempt to have both sides of an m2m device implement the request API, that's going to be very painful. [14:29] <svarbanov> I'd like to avoid v4l2 controls because they are expensive for such big chunks of data :( [14:30] <svarbanov> I have to investigate more on that, just wondering how the request API is applicable in this case [14:30] <hverkuil> Can this data be DMAed to a buffer by the hardware? Or is it always going to be a memcpy? [14:31] <hverkuil> If it can't be DMAed, then there is not much of a downside to using controls in a request. [14:31] <svarbanov> actually in the Venus case this extradata is filled by the remote processor, so to the v4l2 driver it looks DMAed [14:32] <svarbanov> you pass a virtual address of a buffer and the remote side just fills it with that information [14:32] <hverkuil> If it can be DMAed, then you'd need some metadata queue. And then we run into a problem that was discussed in Lyon in that m2m devices can only use a single video node and there is no support for substreams. [14:33] <hverkuil> Depending on the details, it might be possible that the remote side can just write to the buffer containing the control value directly. [14:35] <hverkuil> You'd have one memcpy when you get the controls in userspace after the request is completed. [14:35] <svarbanov> yeah, but we still have copy_to|from_user in the v4l2 control APIs [14:37] <hverkuil> Step one is to look into how this metadata is used. [14:38] <hverkuil> And if you want to use metadata 'DMA' for this, then that won't be supported for some time yet. [14:42] <hverkuil> Also, it might be useful to measure how much additional time it will actually take to do the memcpy. I.e. how expensive is the G_EXT_CTRLS call for a u32 and for a u32 array of N elements? At what value of N does the performance really begin to matter?
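A rough userspace sketch of the measurement suggested above: time a single VIDIOC_G_EXT_CTRLS call for a payload control of a given size. The control ID is a placeholder for whatever compound/array control the driver exposes, and a single sample mostly measures latency (as noted below), so the call should be repeated many times and averaged.

#include <string.h>
#include <time.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Return the wall-clock duration of one VIDIOC_G_EXT_CTRLS call, in
 * nanoseconds, for a payload control of 'size' bytes read into 'payload'. */
static long time_g_ext_ctrls_ns(int fd, unsigned int ctrl_id,
				void *payload, size_t size)
{
	struct v4l2_ext_control ctrl;
	struct v4l2_ext_controls ctrls;
	struct timespec t0, t1;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&ctrls, 0, sizeof(ctrls));
	ctrl.id = ctrl_id;
	ctrl.size = size;
	ctrl.ptr = payload;
	ctrls.which = V4L2_CTRL_WHICH_CUR_VAL;
	ctrls.count = 1;
	ctrls.controls = &ctrl;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (ioctl(fd, VIDIOC_G_EXT_CTRLS, &ctrls) < 0)
		return -1;
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
}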
[14:43] <svarbanov> that is clear, just wanted to go in the right direction :) [14:43] <hverkuil> That would be useful information for all of us in deciding when to go with memcpy and when metadata streaming is needed. [14:44] <svarbanov> I've started profiling get and set control with ctrl.ptr and ctrl.size = 16KB [14:45] <svarbanov> on my SoC it is ~50us, but occasionally it can be 100, 200 or even 500us when the CPU which executes the control is interrupted by a hard IRQ [14:46] <svarbanov> and I even saw a control call from userspace start on cpu0 but finish on cpu4 [14:46] <svarbanov> thanks for the advice, I will look in that direction [14:48] <hverkuil> That's measuring latency, not the actual memcpy time. [14:54] <svarbanov> correct, I wanted to know the latency of the get/set control, I will measure copy_to|from_user (if I remember correctly it was ~5us @ 16KB) [14:58] <svarbanov> and the cpu freq was 1.8GHz [15:14] *** mszyprow has quit IRC (Ping timeout: 245 seconds) [15:20] <syoung> mchehab1: https://git.linuxtv.org/ says internal server error [15:33] <garrettkajmowicz> Is there a comprehensive V4L2 python library? The closest I've found is pyv4l2 but it lacks a lot of functionality. [15:48] <ndufresne> svarbanov, I like the hdr10+ metadata payload, size + data (simple) [15:49] <svarbanov> ndufresne, in the header file? [15:49] <ndufresne> https://github.com/realme-kernel-opensource/realme5-kernel-source/blob/master/include/uapi/media/msm_vidc_utils.h#L190 [15:50] <svarbanov> ndufresne, yes :), probably the format isn't known at that time ? [15:51] <ndufresne> hdr10+ should be somewhere, I still need to find the time to read the spec [15:51] <ndufresne> right now I'm trying to compare the gst HDR10 stuff to see if it holds up against other implementations [15:53] <ndufresne> svarbanov, for msm_vidc_mastering_display_colour_sei_payload, is that in q16 ? [15:56] <svarbanov> ndufresne, I have to check, is it in q16 in the SEI packet? [15:57] <ndufresne> I guess I can check that, just asking, in GStreamer the guy who integrated it just used two uints, numerator / denominator for the coordinates [15:57] <ndufresne> (should be future proof) [15:59] <ndufresne> it's a bit funky in the bitstream, but it's 16bit fractions [16:00] <ndufresne> it's stored Rx_n|Ry_n|Rx_d|Ry_x and then the same for green and blue [16:01] *** mtretter has quit IRC (Quit: Leaving) [16:01] <ndufresne> svarbanov, so the msm C struct does not match the bitstream apparently :-( [16:01] <ndufresne> unless it's a GStreamer bug, checking now [16:02] <svarbanov> ndufresne, what is the difference? [16:02] <svarbanov> the parameters should be the same [16:02] <svarbanov> for hdr10 [16:02] <svarbanov> hdr10+ is another story I guess [16:03] <ndufresne> ah, had it wrong [16:04] <ndufresne> in HEVC, the coordinate is normalized to 16bits, with 0.00002 increments, range 0 to 50K [16:06] <ndufresne> in GStreamer, we normalize as a fraction between 0 and 1 [16:06] <ndufresne> so go figure what msm normalized to, but I suspect it might be Q16 (that would be consistent)
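To make the normalization ndufresne describes concrete: the HEVC mastering display colour volume SEI stores chromaticity coordinates in 0.00002 increments, so the code word range 0..50000 maps to 0.0..1.0. A small sketch of converting between the SEI code word and a numerator/denominator fraction of 1.0, as mentioned above (illustrative only):

#include <stdint.h>

#define SEI_CHROMA_DEN	50000	/* 1 / 0.00002 */

struct chroma_frac {
	uint32_t num;
	uint32_t den;
};

/* SEI code word (0..50000) -> fraction of 1.0 */
static struct chroma_frac sei_to_frac(uint16_t sei)
{
	struct chroma_frac f = { .num = sei, .den = SEI_CHROMA_DEN };

	return f;
}

/* fraction of 1.0 -> SEI code word, rounded to the nearest 0.00002 step */
static uint16_t frac_to_sei(struct chroma_frac f)
{
	return (uint16_t)(((uint64_t)f.num * SEI_CHROMA_DEN + f.den / 2) / f.den);
}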
[16:09] <ndufresne> svarbanov, so are you trying to see how to plumb this kind of metadata to stateful encoders ? [16:10] <ndufresne> if it was stateless, I'd make this a control and include it in the request [16:11] <svarbanov> yes, I'm trying to investigate how to pass this metadata between the v4l2 driver and the v4l2 client [16:11] *** benjiG has left [16:12] <ndufresne> I would really hate it if that was a separate v4l2 node [16:13] <svarbanov> me too :) [16:13] <ndufresne> and I don't think it can actually work without loads of plumbing, since you have to add an API to associate the two instances [16:14] <ndufresne> userspace wise, we'll have to add request support, so controls bundled in a request seem like a nice way to "pass" metadata [16:14] <ndufresne> I'd like to hear from pinchartl how it works for cameras to read back the applied controls with sensors, since this is similar to reading back metadata [16:23] <svarbanov> actually this kind of metadata is synchronous with the data buffers; on the other hand, metadata on a separate video node is more asynchronous (because the hw which produces it is separate from the one doing the processing) [16:24] <ndufresne> well, for UVC, the data is synchronous, basically you have a sequence number, and the metadata for the frame matches in seq [16:24] <ndufresne> (synchronous, or serialized) [16:25] <svarbanov> ndufresne, can you point me to the userspace code which does that matching? [16:26] <ndufresne> I don't have code myself for that [16:26] <ndufresne> if there is OSS code, that would be in OpenHMD [16:27] <ndufresne> (but inside-out tracking is quite early + sensors are not always standard uvc) [16:28] <ndufresne> In fact, I have no support for UVC metadata in GStreamer [16:28] <ndufresne> would first need to add code to match the video nodes ;-D [16:29] <ndufresne> but I think the design would be to manage both nodes as a single stream, and just dq both together, to avoid having to do the matching [16:31] <ndufresne> svarbanov, I'm happy you are taking a look at this, I have a lot of potential HDR work, where metadata is crucial [16:32] <svarbanov> that is one of the things I'd like to avoid, i.e. waiting for both buffers in userspace when they come together in the kernel driver [16:34] <svarbanov> ndufresne, this (metadata) is one of the obstacles to using the upstream driver in Android [16:34] <ndufresne> it seems rather racy, since if userspace is woken up on one node, it always has to do an extra non-blocking poll on the other nodes to make sure it was not fired "at the same time" but after [16:34] <svarbanov> the other one is compressed NV12 [16:35] <ndufresne> about compressed NV12, make sure to sync with bbrezillon then, he's looking at AFBC for various cases, and codec support seems like a must [16:36] <svarbanov> ndufresne, yes, I'm following his patchset which introduces v4l2 modifiers [16:36] <ndufresne> in the end, we need to plumb modifiers, there is no way we'll be able to add thousands of formats linearly [16:37] <ndufresne> there was a guy from NVidia at XDC saying there are around 50K possible modifiers on these chips [16:38] <svarbanov> that sounds like over-engineering ;) [16:38] <ndufresne> they have variable tiling and compression configurations which they pick by doing some math with some parameters [16:39] <ndufresne> it's just a tiler configuration thing [16:40] <ndufresne> (for a gpu, a 1% gain in optimization is justified) [16:40] <ndufresne> while in streaming, if you reach your target, you are done, nobody will benchmark raw speed
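A sketch of the sequence-number matching ndufresne describes above for the UVC case, assuming one video capture node and one metadata capture node whose buffers carry the same v4l2_buffer.sequence for a given frame. Buffer setup, mmap and requeuing are omitted, and the function names are illustrative.

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

static int dqbuf(int fd, enum v4l2_buf_type type, struct v4l2_buffer *buf)
{
	memset(buf, 0, sizeof(*buf));
	buf->type = type;
	buf->memory = V4L2_MEMORY_MMAP;

	return ioctl(fd, VIDIOC_DQBUF, buf);
}

/* Dequeue one frame and one metadata buffer and check that they belong
 * together; a real implementation has to requeue and retry whichever side
 * is lagging instead of simply giving up. */
static int dequeue_matched_pair(int video_fd, int meta_fd,
				struct v4l2_buffer *frame,
				struct v4l2_buffer *meta)
{
	if (dqbuf(video_fd, V4L2_BUF_TYPE_VIDEO_CAPTURE, frame) < 0)
		return -1;
	if (dqbuf(meta_fd, V4L2_BUF_TYPE_META_CAPTURE, meta) < 0)
		return -1;

	return frame->sequence == meta->sequence ? 0 : -1;
}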
[16:42] <svarbanov> ndufresne, you will be in Tokyo next week, right? [16:42] <ndufresne> right, landing Sunday [16:43] <ndufresne> I think the metadata will be on the table, so I'm trying to read as much as I can now [16:43] <svarbanov> ok, if you guys have time, talk about metadata ... oops, you beat me to it [16:43] <ndufresne> ok [16:45] <ndufresne> one thing I'd try to learn there is how you go from, let's say, a 48bit dynamic sensor to a final HDR10 or HDR10+ encoded image (high level, I'm not very into this math anymore) [16:45] <svarbanov> I will try to prepare a document with the use-case and our concerns about the performance impact and synchronization in userspace [16:45] <ndufresne> sounds good [16:45] <ndufresne> I'm sure they have the same concerns with the sensors [16:47] <ndufresne> ezequielg will also be there, but he must be on a plane already [17:17] *** pH5 has quit IRC (Quit: bye) [18:51] *** syoung has quit IRC (Read error: Connection reset by peer) [19:50] <ezequielg> Actually, I'm on a bus, but will get on a plane later today. [21:08] *** tensa has quit IRC (Quit: Ping timeout (120 seconds)) [21:08] *** indy has quit IRC (Quit: ZNC - http://znc.sourceforge.net) [21:08] *** blinky42 has quit IRC (Quit: No Ping reply in 180 seconds.) [21:12] *** cybrNaut has quit IRC (Ping timeout: 268 seconds) [22:00] *** sailus has quit IRC (Quit: leaving) [22:06] *** dv_ has quit IRC (Ping timeout: 240 seconds) [23:10] *** andrey_utkin has quit IRC (Quit: Gateway shutdown)