#v4l 2019-05-24,Fri

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)

WhoWhatWhen
ezequielgndufresne: iirc, vicodec is still not following (or was last time i tried) the stateful api. and so it doesn't work with gstreamer. i sent a few patches that were not applied, and then dropped the ball. [00:00]
ndufresne: you do have this: https://patchwork.kernel.org/cover/10614385/ right? [00:05]
................................................................................ (idle for 6h38mn)
hverkuilezequielg: are you using this vicodec branch: https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vicodec
That should follow the stateful API including changes that are planned for the next RFC of that API.
ndufresne: bbrezillon: not setting the hold flag when userspace should will indeed result in undefined behavior. That isn't a problem IMHO, except that this should be tested to ensure the driver won't crash when this happens.
[06:43]
................ (idle for 1h17mn)
***arnd has quit IRC (Ping timeout: 252 seconds) [08:02]
..... (idle for 24mn)
bbrezillonhverkuil: are you okay with the "slice bufs might be retained until the frame if fully decoded" solution
I talking about the multi-slice per frame case
where you'd have to trigger X V4L reqs to get the final capture frame, each time with a new slice buffer
I'm currently trying to implement that in the rockchip driver, and the greatly complexify the whole thing (now I have to figure out when ouput buffers should be retained too)
[08:26]
..... (idle for 20mn)
paulk-leonovjernej, decoding the same MPEG-2 frame with your clock patch gives a significant boost, going from 5683 us to 2495 us
on H3
that is
and that's an I-frame, subsequent predicted frames take < 2 ms
so that should be pretty good for 60 fps :)
[08:48]
........ (idle for 36mn)
hverkuilbbrezillon: so this is for HW that needs all slices of a frame before it can start decoding?
paulk-leonov: are you planning to post a series for hevc as well?
[09:24]
bbrezillonhverkuil: yes
paulk-leonov: I think Kwiboo needed extra flags attached to the DPB/ref entries
we need a way to define which field of an interlaced capture frame the reference is pointing to
[09:25]
hverkuilbbrezillon: and in that case you hold on to the OUTPUT buffers until all the slices are present, and then you start decoding and when done release all the requests? [09:27]
paulk-leonovhverkuil, yes definitely
hverkuil, (about hevc)
[09:31]
bbrezillonhverkuil: that's what ndufresne suggested, yes [09:31]
paulk-leonovhverkuil, also these series require patch https://patchwork.linuxtv.org/patch/56206/ [09:31]
bbrezillonthis being said, I started implementing that in the rockchip driver and it's a huge mess [09:32]
paulk-leonovbbrezillon, the plane is to merge things as-is and rework them as needed later -- there will be lots more changes to bring anyway
the plan*
[09:32]
bbrezillonpaulk-leonov: ok
that'd be great to have them merge soon then :)
*merged
[09:32]
hverkuilbbrezillon: I don't think that's a good idea. It's the holding on to the OUTPUT buffers that's the problem: that has implications for userspace since it means it will have to request more buffers since it can't recycle an OUTPUT buffer until the frame was decoded. [09:33]
bbrezillonhverkuil: yes, that's what I pointed out yesterday [09:34]
hverkuilbbrezillon: It's OK if the OUTPUT buffers are copied to an internal scratch buffer first. [09:34]
bbrezillonbut that's just the tip of the iceberg
there's even more bookeeping to be done driver side because of that
[09:34]
hverkuilbut that means it is no longer zero copy. [09:34]
bbrezillonyes, I know :-/ [09:35]
hverkuilI think that if the driver requires all slices to be present, then there are two options (sketched very high-level, the details are up to you guys!): [09:36]
bbrezillonthe other option would be to allow passing all the slices in one decode-frame operation [09:36]
paulk-leonovhverkuil, both series are nowsent out :)
now out*
[09:37]
Kwiboobbrezillon: thats the way I want to do it in ffmpeg, all slices in one output buffer ;-) [09:37]
hverkuil1) Add a new pixelformat to indicate that the driver requires all slices to be part of a single OUTPUT buffer. [09:37]
paulk-leonovdidn't read all the backlog, but "multi-slice per frame" is not a specific case or a feature, it's the de-facto natural granularity for video decoding [09:38]
hverkuil2) Signal somehow that all OUTPUT buffers containing slices for a frame should be added to the same request.
I think 1 is the easiest.
[09:38]
bbrezillonhverkuil: #1 should also specify the separator (start_code) format between each slice [09:39]
hverkuilYes. [09:40]
paulk-leonovsee https://www.mail-archive.com/linux-media@vger.kernel.org/msg146976.html
I don't think a pixel format is a relevant solution to this
[09:40]
bbrezillonpaulk-leonov: no matter how we signal this, if we want zero copy, it has to be passed as a single output buffer containing all the slices [09:41]
paulk-leonovbbrezillon, why is that? [09:41]
hverkuilActually, userspace can check if VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF is set: this only makes sense for hardware that can properly process slices. [09:41]
paulk-leonovdoesn't ffmpeg get data per-slice anyway?
hverkuil, every decoding block should be able to operate per-slice
[09:42]
bbrezillonpaulk-leonov: I'm talking about HW support here [09:42]
hverkuilA driver that needs all slices for a frame would simply not set that flag. [09:42]
bbrezillonpaulk-leonov: the hantro HW is not fully stateless [09:42]
paulk-leonovbbrezillon, maybe, but IIRC we've established that it can operate per-slice [09:43]
bbrezillonthat is, if you decode the first slice of a frame, the next operation you *have to* do is decode the second slice
if you do something else in between, the state is lost
and you have to go back to decoding the first slice again
[09:43]
paulk-leonovmhh [09:44]
bbrezillonthat's a problem if we want to support multiple users [09:44]
paulk-leonovso there is some hardcoded address for metadata or something? [09:44]
bbrezillondon't know what's kept in the state exactly [09:44]
paulk-leonovor it reuses the same SRAM or something
well it feels like it could just be an implementation detail
that we don't know how to handle yet
[09:44]
bbrezillonnot really [09:45]
paulk-leonovif there's this kind of restriction, then yes, the driver should probably only allow one client [09:45]
bbrezillonunless by implementation detail, you mean we can't support zero-copy for that case [09:45]
hverkuilAn alternative is to have a flag added to the format (ENUM_FMT) indicating that the HW can't decode until all slices are present. And support a scratch buffer as well. So userspace can either just push one slice per request and the driver will copy to a scratch buffer, or software can combine all slices in one buffer, clear the HOLD flag and give it to the driver. Which will then just decode that buffer without copying. [09:45]
paulk-leonovI don't see how it's related to zero-copy
hverkuil, that's pretty much what I was suggesting on the email I linked
have the same interface with either 1 or n slices
and the driver indicates what it can do
either way, we can't assume that gathered slices without per-slice metadata will work well
so if we pass multiple slices per output buffer in a single request, we need control arrays
[09:45]
bbrezillonhverkuil: sounds like something that could work [09:48]
Kwiboopaulk-leonov: I would belive that if we submit multiple slices (including start_code in between) the hw only would need initial slice header information, why would it need slice information for each slice if hw is capable of frame by frame decoding ? [09:48]
hverkuilAn alternative (reading back bbrezillon's description of the problem) is that the driver can influence the scheduling of jobs: i.e. while decoding slices for a frame, block others until the full frame is finished. [09:49]
bbrezillonhverkuil: but that means you can then have a deadlock caused by userspace [09:49]
paulk-leonovKwiboo, you're assuming that the hardware will do slice header parsing
Kwiboo, which IIRC is the case on the hantro but not generally speaking
[09:49]
bbrezillonhverkuil: I mean, assuming we allow passing slice by slice [09:50]
paulk-leonovbbrezillon, we absolutely need the ability to work slice-by-slice, I think it is non-negotiable at this point :) [09:50]
bbrezillonpaulk-leonov: I think any HW that's capable of decoding a full frame is also able to parse the bitstream, at least to some extent [09:50]
hverkuilbbrezillon: it's going to hurt somewhere. [09:50]
paulk-leonovbbrezillon, I don't see why that would have to be true [09:51]
bbrezillonpaulk-leonov: I'm not arguing against the slice by slice approach [09:51]
paulk-leonovit would make sense, but it's an arbitrary choice [09:51]
Kwiboopaulk-leonov: exactly, and for other case when hw do not support header parsing each slice can be submitted independently [09:51]
paulk-leonovmhh [09:51]
bbrezillonI'm just saying that we also need frame by frame support [09:51]
paulk-leonovKwiboo, what I have in mind is rk3399's ability to pre-program registers [09:51]
bbrezillonit's not one or the other
I think we need both
[09:51]
hverkuilIf passing one slice per buffer is a requirement (not unreasonable IMHO), then there are two choices: either change the scheduling, or copy to a scratch buffer. At least, I can't think of another solution. [09:52]
paulk-leonovbbrezillon, understood :)
Kwiboo, so we could use a per-frame interface where the regs for each slice are pre-programmed
which would be efficient use of the hardware
but that requires having the parsed values
so this makes me think that we also need an array of the per-slice controls for the per-frame case
[09:52]
bbrezillonhverkuil: in that case, I guess the extra copy is our best option [09:54]
paulk-leonovhverkuil, actually we could have it so that the full-frame format matches what userspace gets from the bitstream
although it would still have to parse it, we could avoid copies this way
[09:54]
bbrezillonI don't thing preventing multi-user usage or allowing userspace to lock the whole thing is acceptable [09:55]
Kwiboopaulk-leonov: for such case you would still need to parse all slices, so submitting them one by one should not be an issue? would it?
another more pressing latency / performance related issue is that in current request handling all processing (building ref list etc in rk case) is done when vpu does nothing (single client), requests should probably have a pre-process stage that can run at the same vpu is running
[09:55]
paulk-leonovthen there are two cases: either the format works for the hardware and we have no bounce buffer, either it doesn't and the driver re-orders things
Kwiboo, you *always* need to parse all the slices anyway
which is different than doing an extra copy of the whole thing
[09:55]
bbrezillontfiga: ^ [09:56]
paulk-leonovKwiboo, yup that would be a nice optimization and per-frame submission can help
(well maybe not that much, it's a slightly different perf issue)
Kwiboo, we could have a "prepare" m2m step that the core allows to take place while a m2m job is running
which just sets up cached registers
should be dead easy with regmap
[09:56]
Kwiboopaulk-leonov: that sounds like a good idea, then the main thing to do in device_run would be to flush regmap to hwregs, should give us best possible utilization of vpu with a single client [10:02]
paulk-leonovagreed [10:02]
bbrezillonKwiboo: talking about regmap
did you consider using the regmap infra to deal with reg layout differences between 2 HW revisions?
[10:06]
Kwiboopaulk-leonov: rk mpp library even keep a cache of up to 16 prepared hwregs to achieve best possible performance, in order to match that a "prepare" step at submit of request may be needed ?
bbrezillon: no I did not, I know to little about what you can do with regmaps ;)
[10:07]
bbrezillonmight be worth giving it a try
hverkuil, paulk-leonov: so, to sum-up, the idea is to keep per-slice decoding with the HOLD_CAP_BUF flag approach, and them let drivers do the buffering (we might be able to provide generic helpers to do that) when they can't do per-slice decoding
[10:08]
............ (idle for 55mn)
hverkuil: crazy idea, but who knows => could we return -EAGAIN when a slice decoding is submitted but the HW needs more data?
this way we would let userspace continue filling the output buffer where it stopped and submit things again
paulk-leonov, Kwiboo ^
[11:06]
paulk-leonovseems tricky to coordinate [11:12]
bbrezilloncan you elaborate?
doesn't sound more tricky than the other things we've been suggesting to me :P
[11:12]
hverkuilbbrezillon: if this is done, then it has to happen at request validation time. And EAGAIN is definitely not the right error code for that.
And basically isn't this the same as setting a flag telling userspace that all slices should be in a single buffer?
[11:16]
bbrezillonfair enough
but this solution has been rejected, right?
[11:18]
hverkuilpaulk-leonov: I had hoped to make a PR for the cedrus driver, but I've run out of time for today. It did pass the test build, but I want to do a final review of the series before posting the PR. [11:20]
paulk-leonovhverkuil, sure thing, thanks! [11:20]
hverkuilPlease double check that this branch contains all the patches you need: https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=for-v5.3b
I had to add the coda patches as well to prevent conflicts (it appears you based your patches on top of those, or at least the v4l2-ctrls.c patch)
[11:21]
paulk-leonovhverkuil, looks good to me [11:29]
........ (idle for 38mn)
bbrezillonpaulk-leonov, hverkuil, ndufresne, tfiga: so, I'm gonna try and implement paulk-leonov's suggestion (https://www.mail-archive.com/linux-media@vger.kernel.org/msg146976.html)
any objections
?
[12:07]
paulk-leonovnot on my side :)
and thanks!
[12:10]
..... (idle for 23mn)
bbrezillonpaulk-leonov: did you already think about how slice ctrls should be exposed now that we can pass max 16 slices per decode operation?
should we extend V4L2_CID_MPEG_VIDEO_H264_SLICE_PARAMS with a param so we have 16 different ctrl ids (one per slice)
?
[12:33]
paulk-leonovbbrezillon, we can simply pass a control array instead :) [12:42]
bbrezillonI think I found my answer (you can use arrays of controls) [12:42]
paulk-leonovhehe [12:42]
bbrezillonyep, just figured that out [12:42]
........................ (idle for 1h58mn)
***benjiG has left [14:40]
bbrezillonpH5: nice!
one of use will have to rebase its work on the other, cause the branch you based your work on has changed a bit (rockchip_vpu_common.h no longer exists)
*one of us
[14:41]
***prabhakarlad has left [14:43]
..... (idle for 23mn)
pH5bbrezillon: I'd be happy to rebase, can you point me to the right place?
rk-vpu-mpeg2-v5 rebased onto media-tree master maybe?
[15:06]
bbrezillonpH5: https://github.com/bbrezillon/linux/commits/rk-vpu-h264-vp8
up to https://github.com/bbrezillon/linux/commit/ff6b6d2589ecc72817abafb9bbf921021b609e07
[15:10]
pH5bbrezillon: ok, thanks. has that been posted yet? [15:11]
bbrezillonI'll rebase my H264 changes on top once your changes are merged
ezequielg: will post a v6 on monday (or is it v7? I don't remember)
[15:11]
ezequielgv6
according to my math.
[15:11]
pH5alright [15:13]
bbrezillonpH5: looks like everything fits in a single driver in the end
except that you might have to split m2m nodes
[15:18]
....... (idle for 31mn)
pH5bbrezillon: yes, I think you were right. sharing all the driver boilerplate is worth the added indirections in this case. [15:50]
............... (idle for 1h10mn)
***lucaceresoli has quit IRC (Remote host closed the connection) [17:00]
....................................... (idle for 3h11mn)
flokhi. i believe there's a problem in libv4l-dev. I ask it (via VIDIOC_S_FMT) to set 640x480 yet I get the full raspberry pi camera resolution (3kx2k) [20:11]
................. (idle for 1h22mn)
ezequielgndufresne: hey
fwiw, these work for me:
gst-launch-1.0 -v v4l2src io-mode=dmabuf-import ! kmssink driver-name=virtio_gpu force_modesetting=true
gst-launch-1.0 -v v4l2src io-mode=dmabuf ! kmssink driver-name=virtio_gpu force_modesetting=true
gst-launch-1.0 -v v4l2src io-mode=mmap ! kmssink driver-name=virtio_gpu force_modesetting=true
on 5.1.0-rc5
I don't have any crazy patches.. I think (!)
[21:33]
............. (idle for 1h3mn)
ndufresne: let me know if i'm missing anything, but doesn't this cover all cases? [22:38]

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)