#v4l 2015-12-09,Wed

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)

WhoWhatWhen
***darvon has quit IRC (Ping timeout: 245 seconds)
lexano has quit IRC (Ping timeout: 260 seconds)
[00:22]
........... (idle for 52mn)
ndufresne has quit IRC (Read error: Connection reset by peer) [01:14]
............................................. (idle for 3h44mn)
theBear has quit IRC (Ping timeout: 250 seconds) [04:58]
............................................................................................... (idle for 7h52mn)
ribalda_ is now known as ribalda [12:50]
sailushverkuil: Heippa! [13:03]
hverkuilsailus: hi [13:03]
sailusHow do you do? [13:04]
hverkuilvery busy
but feel free to ask quick questions :-)
It's one of those days where things only get worse, not better :-(
made me wish I'd stayed in bed.
[13:04]
sailushverkuil: I acked the final patch in the set. [13:13]
hverkuilthanks [13:13]
sailusOtherwise it's non-trivial and I don't have time to properly review it but I believe others have done that already.
So I guess we're good.
[13:13]
............................................. (idle for 3h43mn)
***benjiG has left [16:57]
.......................... (idle for 2h9mn)
awalls has left [19:06]
.......................... (idle for 2h8mn)
mchehabpinchartl: why the MC sometimes uses a mutex and sometimes use a spin_lock?
that doesn't sound right to me
[21:14]
pinchartlmchehab: as a general rule a spinlock is used for short section that don't need to sleep, with code that needs to be callable from interrupt context, and a mutex for larger code sections that are always called in non-sleepable context. if you tell me exactly what you're referring to I can probably provide more information [21:17]
mchehabwell, media_device_unregister* uses the spinlock...
but the methods to remove stuff uses the mutex
that sounds wrong
[21:18]
pinchartlthe spinlock protects the list of entities, to make entity lookup in non-sleepable context possible [21:19]
mchehabbut we're protecting all the rest with the mutex [21:20]
pinchartlthe graph mutex protects more complex operations, in particular graph walk
this will need to be revisited to implement dynamic changes to the graph
[21:20]
mchehabwell, we're using it also for media_create_intf_link() / media_devnode_remove() [21:21]
pinchartlI expect that most of the code wil need to be rewritten
s/wil/will/
[21:21]
mchehab(as before: creating/removing things were using mutexes) [21:21]
pinchartls/the code/that code/ [21:21]
mchehabI suspect that this is causing some very hard to track bugs, like the ones that Shuah complains, when she uses the Kernel instrumentation for memory/lock troubles [21:22]
pinchartlI wonder whether we'll need rcu to handle dynamic updates [21:22]
mchehabyes, RCU is one alternative
not easy to make it work
[21:22]
pinchartlI haven't analyzed the problem yet so I don't know what the best solution will be
but it will certainly require very careful design
and lots of prototyping and testing
[21:23]
mchehabyes [21:23]
shuahmutex hold during device removal is definitely a problem [21:23]
mchehabyes, and we do that
at V4L2 core
[21:25]
pinchartlwe'll need reference counting
references to entities don't bother me too much
[21:25]
shuahOne thing I considered is making media_device_unregister execute only once [21:25]
mchehabpinchartl: well, I removed refcount per your request ;) [21:25]
pinchartlwhat bothers much is drivers using those references to travel the graph through links [21:25]
mchehabI was using it on the first version of the MC patchset [21:26]
pinchartlshuah: so you can remove a single entity only, and then it will always return an error ? :-)
mchehab: it wasn't needed without dynamic updates :-)
[21:26]
mchehabthere's one problem with refcount: we'll likely need to dynamically allocate entity, instead of using media_entity embedded [21:27]
shuahwhat I mean is keep state that we started unregister process upon first call and then return [21:27]
pinchartlthe reason I asked for it to be removed was that there was, in my opinion, no way to implement correct reference counting without a full design for dynamic updates
we would have ended up with some kind of reference counting
that wouldn't have been wrong per se
[21:27]
mchehabpinchartl: the problem I'm facing is not related to dynamic update.... it is related to device unbind [21:27]
pinchartlbut that would most likely not have fulfilled the dynamic updates use case
requiring a rework of the code
[21:27]
shuahcan we make graph rcu list? We probably have more readers than writers [21:28]
pinchartlshuah: rcu is an option, but I have no way to tell whether it's a good one without studying it in detail first [21:29]
shuahright - also it could be more complex because rcu constructs need to be followed for the solution to be correct [21:29]
pinchartlit looks like it would be very complex, I agree [21:31]
shuahWhat I mean is we will have to restructure the code and provide API so drivers don't hold mutex directly - it has to all go through API so rcu constructs don't have to spread out in driver code [21:31]
pinchartlI think the base issue will be to implement an efficient and easy to use API for graph traversal without any race condition with entity addition/removal [21:31]
shuahright [21:32]
pinchartlrcu in driver code is likely the last thing we want :-) [21:32]
shuahexactly [21:32]
pinchartlif we were to expose rcu to drivers we would have absolute certainty that none of them would get it right :-) [21:32]
mchehabagreed
well, for now I'll be balancing graph object creation/removal using the spinlock
keeping the mutex *only* for graph traversal
[21:32]
shuahok one thing we can do is reduce the critical sections that need mutex hold [21:34]
mchehab(although this still doesn't sound right)
this way, I'll remove the mutex_lock() from the non-atomic context
[21:34]
shuahso read only use mutex and write cases use spinlock [21:35]
mchehabthat it is currently called during device removal [21:35]
shuahreaders could still contend on the mutex though - won't graph walks take longer [21:36]
mchehabshuah: on a dynamic change, this is still problematic...
what happens if some graph elements are inserted or removed while graph traversal is running?
[21:36]
pinchartlthe code is known to be racy currently
I would try to avoid making it worse
but I'm not sure it makes sense fixing it completely before implementing dynamic updates
[21:36]
mchehabpinchartl: I suspect that the current code is broken for device driver removal
probably unnoticed, because this is something that people don't do on embedded environment
[21:37]
pinchartldevice driver removal is broken by design
it's an unsupported feature of the kernel
[21:38]
mchehabpinchartl: well, but USB device removal is supported [21:38]
pinchartlunbinding a device from a driver, on the other hand, is supposed to be supported in a race-free way [21:38]
mchehaband this is where the problem happens
when a device is removed, the V4L core calls the *remove* functions in non-atomic context
[21:38]
pinchartlno, it's not supported. it works most of the time and is available to users, but it's known to be broken
device removal must be supported of course
[21:39]
mchehabtwo are protected by the spin_lock [21:39]
pinchartlunplugging a device should not cause issues [21:39]
mchehabthe rest are "protected" by mutex [21:40]
pinchartlthe usb disconnect callback is called directly from an interrupt handler ? I thought there was a work queue nowadays
but anyway
[21:40]
mchehabexcept that mutex doesn't protect on non-atomic context and causes KASAN to hang [21:40]
pinchartlwhen a usb device is disconnected [21:40]
mchehabkasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] SMP KASAN [21:41]
pinchartlthere's no need to remove the v4l2 devices right away
look at what uvcvideo does, it only removes the v4l2 devices when the last reference goes away
[21:41]
mchehabpinchartl: the problem is acually when the au8522 subdev gots unbind
due to the device removal
[21:42]
pinchartlwhat does it do then ? [21:42]
mchehabhttp://pastebin.com/Yuvyjg9c
[   68.005776] BUG: sleeping function called from invalid context at include/linux/sched.h:2776
[21:43]
pinchartllooks like me might need to design dynamic updates sooner than later
but not at nearly midnight I'm afraid
[21:45]

↑back Search ←Prev date Next date→ Show only urls(Click on time to select a line by its url)