[01:54] <ldiamond> I'm trying to change the settings of a video capture device with v4l2-ctl without success [01:55] <ldiamond> If I open the device in other software like OBS or a web browser, the FPS is set correctly. When I open it with `mpv` it drops to 5fps. How do I set it to 30fps so that it's the default? [09:12] <jc44> The codec works fine without its pipeline full. It just works better if it is. What custom API are you talking about? [09:19] *** jc44 has left [12:24] <ndufresne> jc44: the phase stuff [12:25] <ndufresne> jc44: I thought about this a little more; to really get the best single-stream performance, it will be handy if userspace is aware of your pipeline depth (in this case 2) [12:25] <ndufresne> because to make use of this pipeline, we need to allocate enough buffers [12:26] <ndufresne> jc44: do the two phases need to execute the same stream, or can you decode in parallel while using these phases? [12:30] <ndufresne> Kwiboo: when you worked on rkvdec hevc, did you test slice-per-slice decoding? And if so, does it work? [12:30] <jc44> Assuming you are looking at FFmpeg, the phase stuff is just me hacking FFmpeg s.t. it will execute more than 1 hwaccel decode at once. The V4L2 API hasn't changed at all.
[12:31] <ndufresne> ah, it looked like there were some controls behind it, sorry then [12:31] <ndufresne> jc44: for context, I'm the person writing the GStreamer side of this, and an API reviewer [12:32] <jc44> It is a bit overengineered as it is being reused from another decoder which did need more thorough execution control [12:32] <ndufresne> in GStreamer we noticed an important performance gain from only syncing at the output phase, post re-ordering [12:32] <ndufresne> but if you have a baseline profile without B-frames, you end up outputting immediately, so there is no gain [12:33] <jc44> Yup - exactly - my VAAPI shim does exactly that [12:33] <ndufresne> while I know what's next to solve this in GStreamer, the ffmpeg side remains a bit of an unknown [12:33] <ndufresne> to me I mean [12:34] <ndufresne> but I feel like for now (mainline-wise I mean) it's not urgent, as we have plenty of other things to improve first [12:34] <ndufresne> like finishing the HEVC support as an example ;-D [12:34] <jc44> Well - in FFmpeg my MT hwaccel hack does do the trick - it is compatible with non-pipelined V4L2 "stateless" decoders too [12:35] <ndufresne> basically, even if you don't have that pipelining or multi-core support, just the fact of picking a job on the IRQ is a huge performance gain [12:37] <ndufresne> jc44: we know from the reference doc that the Hantro/VeriSilicon can be configured with up to 4 cores, and all cores can be the same function, so it's been a reflection ezequielg and I have had on how to cleanly allow this [12:38] <ndufresne> that being said, you are more likely to find this configuration on a PCI card, which all come with proprietary drivers unfortunately, or run a full OS on the card (so you don't play with the accelerators directly) [12:39] <grkblood13> is this channel a good place to ask about errors from the v4l build script?
[12:40] <jc44> Well sometime back in Feb (probably) I posted an [RFC] to LinuxMedia that decoupled capture buffer completion from job completion (which is what my decoder uses) - it vanished into a black hole [12:41] <jc44> By my reading of the published API that should be legit [12:42] <jc44> I now can't find it as Google sinks reflected posts (grumble mutter) [12:45] <paulk-leonov> jc44: can you elaborate on this idea of decoupling? [12:45] <jc44> Re multi-stream decode - yes the decoder can decode two different streams at once [12:46] <jc44> I'll see if I can find the original post/patch [12:47] <broonie> jc44: https://lore.kernel.org/linux-media/ might turn it up. [12:47] <ndufresne> jc44: that thread? "Multiple active jobs with the V4L2 request i/f?" (sorry if it went unnoticed) [12:48] <ndufresne> jc44: do you know if "[PATCH] media: videobuf2: Fix length check for single plane dmabuf queueing" was merged? [12:48] <jc44> No - OK I'm confused, I'm sure I posted a patch but it isn't there [12:48] <grkblood13> media_build/v4l/tw686x-core.c:89:50: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types] module_param_call(dma_mode, tw686x_dma_mode_set, tw686x_dma_mode_get, [12:49] <ndufresne> yeah, I don't have that patch in my linux-media folder [12:49] <jc44> Hang on - I'll find you a GitHub reference to the patch - cos I do use it [12:50] <paulk-leonov> jc44: it looks like the blocking point is that v4l2 m2m requires both capture and output to be completed before it schedules another request [12:50] <paulk-leonov> another job* [12:50] <jc44> Yup - and that was what this patch fixes [12:50] <paulk-leonov> this is quite separate from the request API as far as I understand [12:50] <ndufresne> jc44: but we all agree that some of the simplistic approaches in the m2m framework design are unfortunate, and it seems to discourage a lot of folks [12:50] <paulk-leonov> so yeah I think m2m needs to be extended to support multiple jobs in
parallel [12:51] <ndufresne> it goes further, we might want proper scheduling of these jobs, so it respects the general OS/process configuration [12:51] <paulk-leonov> it looks like allowing a job to complete before both buffers are done is an improper fix [12:52] <paulk-leonov> right [12:52] <ndufresne> so far we avoided this by using locking, which triggers the general purpose scheduler [12:52] <jc44> OK - but it is an easy fix that didn't break anything [12:52] <paulk-leonov> mhh it looks like it goes against the logic of m2m [12:54] <jc44> I thought I might get a small patch accepted - wholesale reengineering was never going to fly [12:54] <paulk-leonov> looks like this deserves a serious rework IMO [12:55] <paulk-leonov> as ndufresne said, a simplistic approach was taken initially [12:55] <paulk-leonov> I guess it's in part because the design was aimed at firmware-based decoders :) [12:56] <tfiga> grkblood13: I think it might be a good idea to describe what you are trying to do [12:56] <tfiga> what drivers are you building, on what system, etc. [12:57] <grkblood13> tfiga, I'm trying to run the media_build script located in the git://linuxtv.org/media_build repo [12:57] <grkblood13> it's on an Nvidia Jetson Nano SBC [12:58] <grkblood13> i only care about the xc5000 driver, since that's what the hauppauge tuner uses, but I don't see anything in the instructions on how to single that one driver out [13:05] *** benjiG has left [13:05] <jc44> ndufresne: My patch is here: https://github.com/jc-kynesim/linux/commit/dad65a0a21ca6e852cae62600ad6e521779c0b0c#diff-a82075c19f662d89b479054fc8d82500 [13:07] <jc44> The idea is that you can detach the capture buffer from the job for later completion. It enforces capture buffer ordering to respect the job ordering to help avoid confusion.
If a decoder doesn't care about it then the patch does nothing [13:08] <jc44> A more fully fledged multi-job thing would of course be better - but that was way past my understanding of how it should be done in order to avoid breaking other stuff [13:11] <ndufresne> jc44: not sure it's the right approach, since it's in violation of the stateless decode spec, which says that userspace only needs to wait on the request to know the capture is ready and can be decoded without blocking [13:12] <ndufresne> I'd rather prefer a multi-job API than this [13:16] <ndufresne> jc44: on the GPU side, the way they design this would mean that what we call the "request" or the job would stay internal to the driver; instead the driver would immediately provide a dmabuf fence for the picture buffer to be produced [13:16] <ndufresne> I think that design is better, but we need to live with v4l2 legacy [13:16] <jc44> Fair enough - once that turns up I'll be very happy to use it. [13:17] <ndufresne> (at least in the short term) [13:21] <jc44> What exactly is a dmabuf fence - is it just a blocking event that gets set once decode has completed? [13:54] <jc44> I don't think we need a multi-job (user) API - all we need is a driver-side API for m2m to allow a new job to start before signalling job completion to the user?
[13:54] <ndufresne> jc44: a fence is a synchronization object that will reach completion state (or error) when a pending read or write operation completes [13:55] <ndufresne> jc44: yes, that means we need multiple concurrent jobs to be allowed [13:55] <ndufresne> With fences, the driver will deliver the buffer with the promise it will be filled when the fence is signaled [13:56] <jc44> Yes - so it is basically an event object [13:56] <ndufresne> on the GFX side, this is how they manage to program the next HW in the pipeline without paying the cost of the userspace scheduling delay, with the benefit of no added latency and lower memory usage [13:56] <ndufresne> jc44: the terms come from GL/Vulkan fwiw [13:57] <jc44> Yeah - thought so - my GL is poor [13:59] <ndufresne> the request is very similar, but there is only 1 request, while in GFX they would likely pass fences for every bitstream buffer, then possibly pass a fence for the picture buffer (as it might still be used by render, you have to sync on that), and the driver will return a fence for the picture buffer [13:59] <jc44> The idea of only 1 job at once seemed to be baked pretty hard into the mem2mem code when I looked at it - it isn't going to be a simple task to unwind that [13:59] <ndufresne> in gfx, the number of input/output buffers per "job" is arbitrary [13:59] <ndufresne> jc44: worst case you just ditch it and write a better helper, this is all unstable / internal API [14:00] <ndufresne> I'm always surprised when kernel folks complain about internal design errors, there is no stable API, life is so much easier [14:01] <jc44> Still - rather you trying to get that through than me! [14:03] <jc44> I'll cheer from the sidelines :-) [14:09] <tfiga> grkblood13: what's the kernel version running on that system? [14:10] <grkblood13> tfiga, 4.9.140-tegra [14:11] <tfiga> Okay, so it is quite old, but not that old that things would break.
[14:12] <tfiga> I think it indeed tries to build some other driver, which perhaps was never tested for an ARM build [14:12] <tfiga> But I'm not that familiar with the media build either. Perhaps there is a way to make it build only the specific drivers you need... [14:13] <tfiga> grkblood13: for a bit more luck, I'd try to put all the information in an email and send it to the linux-media mailing list [14:14] <tfiga> Remember to make it plain text (no html) and follow the other mailing list communication guidelines [14:14] <grkblood13> i'm hoping the nvidia guys will help, but i'm not holding my breath. i'll more than likely just return this sbc. [14:15] <tfiga> http://vger.kernel.org/lkml/ for mailing list guidelines [14:16] <tfiga> Generally the kernel your board is running is a custom branch from Nvidia, so we can't provide any support here [14:16] <tfiga> Although someone with some good will might show up if you ask on the lists [14:17] <tfiga> Ideally you would try to find an SBC that runs the mainline kernel [14:19] <grkblood13> i needed something with an nvidia gpu to take advantage of the hwaccel decoding and encoding they support via ffmpeg [14:20] <grkblood13> so my options are extremely limited [14:22] <tfiga> I believe Nvidia is one of the worst choices if it's about hardware decoding on linux [14:23] <tfiga> Although currently there is some work on a tegra v4l2 video driver so there is a chance that it could improve [14:24] <tfiga> Rockchip, Allwinner, i.MX, Qualcomm and others already have mainline support for video codecs using v4l2 [14:29] <ndufresne> tfiga: note, it's not the worst choice if you don't care about using Open Source, and as said, their proprietary stack is widely supported across MM frameworks (they even do the maintenance themselves) [14:30] <grkblood13> i do have a rock64 [14:30] <grkblood13> it has an rk3328 [14:30] <ndufresne> that's a variant of 3399 fyi [14:30] <ndufresne> I think it was paulk-leonov who started adding support?
[14:31] <ndufresne> it's progressing for sure, there is also a gameboy-like device running it [14:31] <grkblood13> if i remember, the last time i messed with it there was no hw encoding support for ffmpeg [14:31] <ndufresne> yeah, no ffmpeg support for that, no [14:31] <paulk-leonov> ndufresne: support for what, sorry? [14:32] <ndufresne> weren't you working on rk3328? [14:32] <ndufresne> I can be wrong [14:32] <paulk-leonov> nope, I was doing px30 [14:32] <paulk-leonov> IIRC Kwiboo worked on rk3328? [14:32] <ndufresne> ok, but it's again in the same family [14:32] <paulk-leonov> well, it's a rockchip :p [14:33] <paulk-leonov> but px30 is close to rk3326 [14:33] <ndufresne> no, but I mean it runs the same VPU [14:33] <ndufresne> px30 is the industrial grade version of it [14:33] <paulk-leonov> let me dig my table out [14:33] <paulk-leonov> about what is similar to what [14:34] <grkblood13> i need a sub-$100 option that supports ffmpeg h262 hwdec and ffmpeg h264 hwenc [14:34] <paulk-leonov> mhh can't find it anymore [14:35] <paulk-leonov> grkblood13: there is no stateless h264 enc support so far [14:35] <paulk-leonov> although I've started working towards that [14:35] <paulk-leonov> so you'll need to be looking at a stateful encoder I guess [14:35] <paulk-leonov> (shaaaame) [14:36] <grkblood13> what do you mean by stateless?
[14:36] <ndufresne> paulk-leonov: well, tfiga's team made a v4l2 plugin, and a downstream driver [14:36] <paulk-leonov> there are two main types of VPUs [14:36] <ndufresne> we made it run on rk3399 recently, with gstreamer [14:36] <paulk-leonov> ndufresne: right, and there's also mpp [14:36] <paulk-leonov> with the rockchip kernel [14:36] <ndufresne> and there's also mpp with the rk mailbox driver [14:36] <ndufresne> indeed [14:37] <paulk-leonov> grkblood13: VPUs that use direct hardware blocks are called stateless and those that go through a microcontroller with a non-free firmware are called stateful [14:37] <ndufresne> but it's far from a plug-and-play experience yet [14:37] <paulk-leonov> grkblood13: this is because the firmware retains the state of the current decoding and does bitstream parsing to configure the actual hardware blocks [14:37] <ndufresne> does ffmpeg have libmpp support? [14:38] <ndufresne> perhaps it's a viable option for grkblood13? [14:38] <paulk-leonov> ndufresne: I think I've seen it around [14:38] <paulk-leonov> but maybe it was a downstream version [14:38] <paulk-leonov> not sure [14:38] <paulk-leonov> or it's only for decoding in upstream and not encoding [14:38] <paulk-leonov> something like that [14:38] <paulk-leonov> rockchip sure has a downstream version that supports both [14:39] <ndufresne> paulk-leonov: let's admit it, we are still trying to figure out what to expose and how to expose it; the h1 HW-assisted rate control is really bad, in fact libmpp no longer uses it [14:39] <paulk-leonov> right [14:39] <paulk-leonov> it still helps in my case [14:39] <ndufresne> so we might go the easy route, and just expose it as fixed QP, and let userspace figure out some frame-based RC without feedback [14:39] <paulk-leonov> but I had the feeling both mpp and chromeos implementations were doing things wrong [14:40] <grkblood13> i also have an orangepi on hand with an h3 [14:41] <paulk-leonov> no support for that encoder yet
[14:42] <tfiga> grkblood13: actually, why do you need to use the media build? The xc5000 driver seems to have been there for a long time [14:42] <tfiga> Maybe all you need is building a custom kernel from the tegra sources, with the driver enabled? [14:42] <grkblood13> tfiga, it's not in the nvidia jetpack sdk [14:42] <tfiga> There must be kernel sources provided for sure [14:43] <grkblood13> yea, i have it [14:43] <tfiga> And that driver is included in Linux 4.9 [14:43] <tfiga> So it is probably disabled in the kernel config [14:44] <grkblood13> i'll pastebin the directory tree of what they provide. maybe you'll know what file i need to poke around in [14:44] <ndufresne> paulk-leonov: well, mpp is a mailbox, it's wrong from the kernel point of view, low security, and very much geared toward a closed-source userspace driver; but chromeos isn't very wrong, it's pre-request API, but otherwise it's just a v4l2 m2m driver, though it does not produce stream headers, and depends on many controls that aren't specified yet or even generalized [14:44] <paulk-leonov> yeah I know [14:44] <grkblood13> https://pastebin.com/raw/rWwzDekZ [14:44] <paulk-leonov> I've seen the code a lot :) [14:45] <ndufresne> paulk-leonov: as a shortcut, they implemented a libv4l2 plugin that hides the custom controls and finishes the bitstream data to make it behave like other stateful encoders, I was impressed that it just worked over gstreamer [14:46] <ndufresne> the RC is funny, but patent-free, and works well enough, it's a PID algo [14:46] <paulk-leonov> yup [14:46] <paulk-leonov> but I suspect the checkpoint part is not implemented correctly [14:46] <tfiga> grkblood13: please check some build manuals from the SDK.
There could be some kernel configuration guide [14:46] <paulk-leonov> I didn't run the code so I couldn't really confirm [14:47] <paulk-leonov> but maybe that's why it's considered so bad [14:47] <paulk-leonov> in my case it does help to achieve CBR [14:47] <ndufresne> paulk-leonov: the checkpoints are likely broken in the HW design [14:47] <paulk-leonov> ? [14:47] <paulk-leonov> no they work [14:47] <paulk-leonov> for sure [14:47] <ndufresne> not completely [14:47] <paulk-leonov> what's wrong with them? [14:48] <paulk-leonov> at least on px30 the hardware behaves as expected [14:48] <ndufresne> the hacks you see in mpp are from the reference code; in the ref code they have comments explaining the cases where they must not be used [14:48] <paulk-leonov> I even did the resulting Qp calculation by hand and all [14:48] <paulk-leonov> I don't really remember hacks in mpp [14:49] <ndufresne> basically, for some frame types, and some corner cases, there is an overflow resulting in an invalid stream [14:49] <paulk-leonov> ah ok [14:49] <paulk-leonov> maybe I just didn't hit that [14:49] <tfiga> ndufresne: paulk-leonov: h1 had many revisions [14:49] <tfiga> It's possible that later on something was fixed [14:49] <paulk-leonov> ok [14:49] <paulk-leonov> well anyway, it doesn't make a huge difference either way [14:50] <paulk-leonov> so just frame-wide Qp is fine [14:50] <ndufresne> the design behind rk3399/26/28 was not updated, they only rewrote the register layout to avoid some patent [14:50] <ndufresne> so this is all about g1v6 [14:50] <paulk-leonov> lol that's the reason they changed the layout? [14:50] <ndufresne> I never played with a v7, and in VC8000E the checkpoints are gone [14:51] <paulk-leonov> is v7 on the imx8mm? [14:51] <grkblood13> tfiga, so i found the .config file from their build manual. tons of variables in there, but nothing with "5000" in the name. not sure what i'm looking for.
[14:51] <paulk-leonov> or some later version [14:51] <ndufresne> paulk-leonov: I don't have an imx8m mini to verify [14:51] <paulk-leonov> well if it disappeared from later revisions that's even less incentive to support it [14:52] <tfiga> grkblood13: I'm not familiar with the Nvidia kernel build. Normally you would run something like make menuconfig [14:52] <tfiga> And that would give you a console GUI for editing the config [14:52] <ndufresne> paulk-leonov: that was my point, I'm sure it has some merit, since we all know that whatever per-macroblock thing you use, it's better than frame-based [14:52] <tfiga> But in your case, there might be some custom build system which wouldn't work well with that [14:53] <tfiga> So please check the SDK documentation [14:53] <ndufresne> Intel added an API for per-macroblock QP, which would have been even more powerful than checkpoints, but they never actually implemented it in the drivers :-( [14:53] <paulk-leonov> ok [14:54] <paulk-leonov> anyway I'm fine with CBR being best effort [14:54] <paulk-leonov> I guess it more or less always is [14:54] <ndufresne> paulk-leonov> but think of it, that's a lot of data, we are already worried about v4l2 controls with a pretty small set of data [14:54] <paulk-leonov> yes that too [14:54] <ndufresne> I also suspect that complexity analysis will move to ML accelerators in the future [14:55] <ndufresne> but that might lead to an exploding amount of encoder configurations [14:55] <ndufresne> we'll see [14:56] <paulk-leonov> then we'll need a quantum encoder to handle the combinatorial explosion :p [15:07] <grkblood13> tfiga, so i got into menuconfig and Device Drivers -> Multimedia Support -> Analog/Digital TV Support were unchecked. I enabled those. [15:09] <grkblood13> before I rebuild the kernel (which takes a while), do you think I should look for anything else?
[15:22] <tfiga> I'd say better not to change too much, as those vendor sources are often fragile [15:23] <tfiga> I haven't used dvb on linux myself, so not sure what to look for specifically [15:23] <tfiga> I guess you may have to do some trial and error [15:38] <Kwiboo> ndufresne: re rkvdec hevc, I have only tested frame-based decoding, it is working pretty well so I have not seen any need to try slice-based decoding [15:39] <Kwiboo> see https://github.com/Kwiboo/linux-rockchip/commits/linuxtv-rkvdec-work-in-progress for current code, I plan to do some minor updates to improve rk3328 support (use 64-byte cache line etc) and test rk3288 hevc this weekend [15:50] <ndufresne> Kwiboo: nice, didn't you look at rk3288? I felt it was a slightly different IP, I never looked deeper [15:51] <ndufresne> Kwiboo: the problem is that there is nothing in the uAPI to tell userspace if we can do 1 slice or have to do all slices, and to make it worse, there are only 16 slices, while HEVC supports 600 [15:51] <ndufresne> of course if you hardcode your build for the platform you're fine, but that is not really my goal [16:08] <Kwiboo> ndufresne: the rk3288 hevc block regs seem to match rkvdec, it just seems to miss the reg space needed for full h264 and vp9 and has only a single cache, so I am expecting the driver to only need minimal changes to support it [16:16] <Kwiboo> and I agree, for now I just hardcoded 16 slices to get something running, a proper solution is needed for variable-length slice param ctrls :-) [17:01] <grkblood13> tfiga, I've got activity lights! [17:15] <ndufresne> Kwiboo: and cedrus works fine frame-based with that, or haven't you tested cedrus back with these changes?
[17:16] <ndufresne> btw, I was already on that branch, it's what I've been working with this week, as we don't have the G2 driver ready yet [17:17] <ndufresne> I'm starting to have mostly good frames now [17:29] <Kwiboo> ffmpeg should send 1-16 slice ctrls depending on what .cfg.dims is used by the driver, so 1 ctrl for cedrus and 16 for rkvdec; 16 is the max currently defined in my ffmpeg code [17:38] <Kwiboo> cool, that branch has not changed since monday, I will push some updates after rk3288/rk3328 testing/tuning :-) [17:39] <ndufresne> uh, I see, didn't know that was how you handled it [17:40] <ndufresne> I just removed that dim in fact, cause it would otherwise fail if you don't set exactly 16 entries [18:27] <Kwiboo> I had one or two 4k sample videos that used 2-4 slices so I just defined something low and shared that between ffmpeg/driver, ffmpeg was already trying to set .size to max_slices * sizeof(ctrl) [18:29] <Kwiboo> and since the kernel just ignored anything beyond 1 * sizeof(ctrl), I just changed that to use max(1, ctrl.elms) instead of min(16, max_slices) [18:49] <ndufresne> Kwiboo: I wanted to ask, mpp enabled HW RPS for some revisions, did you try this mode? why did you decide on software RPS? [19:02] <Kwiboo> ndufresne: I also noticed that they enabled it for the "v345" hw rev, guess it is the revision used in newer socs; the hw rev used in rk3328 is "v341", guessing based on mpp code the hw rps mode may be a new feature [19:08] <ndufresne> I see, so the HW you use likely needs the software version [19:09] <Kwiboo> yep, for h264 I can see a flag for hw/sw mode in the docs, but for hevc it is only: "HEVC rps contains a number of slice data composed by 2 units of 32-byte data."
[19:09] <ndufresne> Kwiboo: anyway, we will probably need to add a third value to the decoding mode; slice-based is expected to allow from 1 to N slices, frame-based in H264 was done with the idea that we didn't have to pass any per-slice data, so a third one should be frame-based with slice data [19:10] <ndufresne> interesting, that means they allow up to 300 slices, not 600 [19:11] <ndufresne> anyway, that limit depends on the level, 200 for 5.X and 600 slices for 6.X [19:12] <ndufresne> anyway, step a) will be to fix the control interface [19:12] <ndufresne> then adding a new enum will be trivial, just need to find a name [19:16] <Kwiboo> the bitmap for sps/pps listed in the docs does not match 100% with the hw, so I do not fully trust the docs; there is also another note with "sw_slice_num max value is change to 600", who knows what it supports :-) [20:51] <ndufresne> Kwiboo: 600 is level 6+, but looking at it that level seems a bit much, but yeah, who knows, testing is key