#v4l 2019-03-08,Fri

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)

WhoWhatWhen
***cybrNaut has quit IRC (Ping timeout: 258 seconds) [03:34]
........ (idle for 35mn)
ndufresnevtnewb, gst-plugins-bad have intel MSDK support now, it replaces the other stuff
glad you worked it ou
vtnewb, btw, in v4l2-ctl norm is also called, standard, v4l2-ctl --list-standards / --set-standard / --get-detected-standard
[04:09]
......................... (idle for 2h1mn)
***bingbu has quit IRC (Ping timeout: 244 seconds) [06:13]
............. (idle for 1h2mn)
ayaka has quit IRC (Ping timeout: 250 seconds) [07:15]
............................................................ (idle for 4h56mn)
hverkuilpinchartl: ping [12:11]
pinchartlhverkuil: pong [12:19]
hverkuilpinchartl: do you have 5-10 minutes? I could use your advice on an MC issue I found. [12:20]
pinchartlgo ahead [12:22]
hverkuilvimc is loaded. Then I stream video using v4l2-ctl --stream-mmap.
Next I unbind all subdevs, while streaming is in progress.
When one or more subdevs are unbound, the vimc driver will unregister the media device (the topology is no longer valid when that happens).
When the media device is unregistered all entities in the graph are destroyed: media_gobj_destroy(&entity->graph_obj);
This sets the entity->graph_obj.mdev to NULL.
The problem is that when the video device closes it calls stop_streaming which in turn calls media_pipeline_stop(&vcap->vdev.entity);
And media_pipeline_stop uses entity->graph_obj.mdev, which by now is a NULL pointer.
[12:25]
pinchartllifetime management is amazing, isn't it ? :-)
sailus: ^
[12:30]
hverkuilI am currently fixing this by testing if entity->graph_obj.mdev is NULL and just returning in media_pipeline_stop in that case. [12:31]
pinchartlthat's a hack
(and could possibly be racy ?)
[12:31]
hverkuilI suspected as much, but I am not sure what should really happen. [12:32]
pinchartlI think the entity should remain valid as long as it's in use
and thus as long as the device is streaming
here's what I think should happen
1. at the beginning of the subdev unbind, set a flag to indicate an unbind is in progress, and wake up all tasks waiting (for instance waiting on buffers)
2. at the beginning of the top-level ioctl handler, increment a use count to tell an ioctl is in progress
3. right before the increment, check the unbind in progress flag, and return an error if it is set
2. and 3. should be done atomically
4. after setting the flag in 1., wait for the ioctl counter to return to 0
5. at the end of the top-level ioctl handler, decrease the counter, and wake up any task (4.) waiting on it if the counter reaches 0
6. the subdev unbind can now proceed
in addition to that, you need
(I think)
to refcount entities, and only delete them when the refcount reaches 0
although I'm not sure that's needed at this point
but you need to ensure that setting entity->graph_obj.mdev isn't set to NULL until all file handles are closed
so you need to delay media_gobj_destroy()
[12:32]
hverkuilI think that's the core problem here. [12:37]
pinchartlit's the core problem, but without 1-6 above, you'll have a race condition
so both are needed I think
I think we need to split media_gobj_destroy() out of entity unregistration
and refcount entities
an entity unregistration we should mark the entity as being removed from the graph, possibly with a special status, but we need to ensure that all references stay valid
certainly the entity pointer
but also the things that the entity point to, possibly
in a nutshell
we store lots of pointers in lots of structures
each time that is done, we need to ensure we either get a new reference, or borrow a reference
borrowing a reference means skipping refcount++ on the account that we can ensure that the pointer will stay valid for the full lifetime of the structure counting the pointer
e.g.
if you want to pass 20 arguments to a function, and decide to store them in a struct
and one of those arguments is a media_entity pointer
provided that the function doesn't store the pointer internally, there's no need to refcount++ the entity just to store its pointer in the struct
(it's a simple example)
it's really a contract
my advice would be to start by writing down the rules of the contract (what is refcounted, what outlives what, ...)
and then we'll have to go through core code and check each pointer to ensure it doesn't break the contract
[12:37]
***svarbanov has quit IRC (Ping timeout: 255 seconds) [12:43]
hverkuilpinchartl: to what extent has sailus been working on this? (too long ago since I last looked at his patch series)
do you remember?
[12:44]
pinchartlsailus: ping
I'm afraid I don't
he hasn't worked on 1-6 at all as far as I remember
and I don't think he has really worked on entity lifetime management, only on the media_device
(but I could be wrong)
[12:46]
hverkuilRealistically I won't be working on your proposal any time soon. It's a lot of work and the HW codec project is simply more important. [12:47]
pinchartlthat's fine, we've lived with those races for so long that there's no urgency
I mean, it's very easy to crash most V4L2/DVB/MC drivers
and it's been the case for several years
we know this needs to be fixed, we know it's a huge amount of work
and nobody has time for it at the moment
the core issue is that lifetime management hasn't been taken into account for a long time, and especially during the last core rework of the MC
nowadays a few more people understand the problem, which is already a good step forward
we still need to fix it though :-)
but please, no workaround that reduces the size of the race window
those workarounds make reworking the code to really fix the problem more difficult
it's still fixable now
but at some point, if we keep adding more and more bandaid, it will be too difficult to fix it and only a rework will be possible
(remember how vb1 was broken beyond repair and couldn't be fixed in-place)
so, for now, we have to acknowledge that unbind can crash
[12:47]
hverkuilI'll keep this test in test-media, but will comment it out for now. [12:51]
pinchartlwhat is the HW codec project you mentioned ? getting the cedrus driver merged ? [12:52]
hverkuilSeveral things: reviewing and hopefully merging stateless codec support for rockchip and codecs other then just MPEG2 (h264, vpx, hevc).
And adding compliance tests (needed before the HW codec documentation can be merged).
I'm behind on that since the work on the regression tests is taking longer than I expected.
[12:53]
pinchartlI'm not sure that's more important than ensuring we have solid foundations to build drivers on, but I understand priorities may vary :-)
there's little incentive to fix issues such as lifetime management
and I think it's the responsibility of maintainers to set priorities there
remember how Linus threatened to stop merging any arch/arm/ code if nothing was done about board files ?
I don't think we would have moved to DT without that
and I think it was the right move
[12:55]
hverkuilpinchartl: feel free to take over from me! :-) [12:58]
pinchartlI don't think it will ever happen if we don't collectively agree that it is a priority and that new code will not be merged until we get it fixed (or at least if we don't get it fixed by a given deadline) [12:59]
hverkuilBTW, I completely agree with you and I hope to get back to this. But right now I have to switch back to the codec work. [12:59]
pinchartlI think it's the kind of topic that would benefit from a code camp. we get you, Sakari and I locked in a room for a week, and only get out when it's done :-) [13:00]
.... (idle for 15mn)
sailuspinchartl: Pong. [13:15]
pinchartlsailus: please help yourself to the backlog :-) [13:17]
sailusIt's way too long... :-P
I remember discussing the issue two years ago. :-)
And I believe a proper solution is needed.
And it will be based on refcounting.
[13:18]
.............. (idle for 1h9mn)
vtnewbndufresne jeez if the intel MSDK is built into GST-plugins-bad that's both awesome and terrible, because I spent a ton of hours bootstrapping their installation instructions into this Up Square board hah. I'll check it out though, I'd love to see it stabil enough for me to get out of this old kernel. Thanks again for your help [14:29]
.............. (idle for 1h7mn)
***benjiG has left [15:36]
hverkuilshuah: tested the Media Device Allocator API and it all looks good. I plan on making a pull request on Monday. Phew... [15:41]
shuahhverkuil: awesome - thanks. You made my Friday!
hverkuil: I have some data on VBI I want it to run it by you. I will send email
[15:46]
hverkuilshuah: you're welcome!
shuah: a suggestion for your test document: describe testing with just arecord, v4l2-ctl and dvbv5-scan. I.e., all low-level utilities that can easily be used in regression test scripts.
mchehab described yesterday in this channel how to test with dvbv5-scan and it worked great.
[15:48]
shuahhverkuil: good point - I will update it - will check the logs for mchehab's tips on dvbv5-scan [15:50]
hverkuilmchehab: thanks for the help!
shuah: hint for v4l2-ctl: the --sleep option is very useful. You can do things like 'v4l2-ctl -T --sleep 10' to get the analog tuner information and v4l2-ctl will sleep for 10 seconds afterwards while keeping the video device open. Attempting to use dvbv5-scan in another shell will nicely return busy in that case.
[15:51]
shuahthanks. I am going to add this to the document. [15:54]
hverkuilI also fixed a bug in v4l2-compliance. It failed testing the au0828 media device, but that was the compliance test itself. [15:54]
mchehabhverkuil: anytime [15:55]
hverkuilv4l2-compliance -m /dev/mediaX now passes. [15:55]
shuahIt is becoming tough to test with xawtv, vlc, etc. - analog channels are disappearing fast and cable is going too [15:55]
hverkuilfor tests like this I want to only use very basic utilities. If something fails it is really hard to debug, whereas with these simple utilities it is much easier to debug. [15:57]
shuahyeah especially when strace data is needed [15:57]
hverkuilAnd also to make a test script to automate testing. [15:57]
shuahright
I should look into automating some of this testing
[15:57]
***nst has quit IRC (Ping timeout: 240 seconds) [16:12]
................ (idle for 1h19mn)
shuahhverkuil: no rush on this - can v4l2-ctl be used to capture vbi - I am using qv4l2 - looking for simpler alternative [17:31]
............. (idle for 1h3mn)
hverkuilshuah: v4l2-ctl -d /dev/vbi0 --stream-mmap [18:34]
shuahhverkuil: I tried that and see "unsupported stream type" - I will look into this further. [18:44]
hverkuilyou have the very latest v4l-utils?
Support for vbi streaming was added in November last year.
[18:45]
shuahMy v4l2 says I cloned it back in December - I will update and try [18:53]
.......................................... (idle for 3h29mn)
***indy has quit IRC (Remote host closed the connection) [22:22]
....... (idle for 33mn)
tfiga has quit IRC (Read error: Connection reset by peer) [22:55]

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)