↑back Search ←Prev date Next date→ Show only urls | (Click on time to select a line by its url) |
Who | What | When |
---|---|---|
*** | darvon has quit IRC (Ping timeout: 245 seconds)
lexano has quit IRC (Ping timeout: 260 seconds) | [00:22] |
........... (idle for 52mn) | ||
ndufresne has quit IRC (Read error: Connection reset by peer) | [01:14] | |
............................................. (idle for 3h44mn) | ||
theBear has quit IRC (Ping timeout: 250 seconds) | [04:58] | |
............................................................................................... (idle for 7h52mn) | ||
ribalda_ is now known as ribalda | [12:50] | |
sailus | hverkuil: Heippa! | [13:03] |
hverkuil | sailus: hi | [13:03] |
sailus | How do you do? | [13:04] |
hverkuil | very busy
but feel free to ask quick questions :-) It's one of those days where things only get worse, not better :-( made me wish I'd stayed in bed. | [13:04] |
sailus | hverkuil: I acked the final patch in the set. | [13:13] |
hverkuil | thanks | [13:13] |
sailus | Otherwise it's non-trivial and I don't have time to properly review it but I believe others have done that already.
So I guess we're good. | [13:13] |
............................................. (idle for 3h43mn) | ||
*** | benjiG has left | [16:57] |
.......................... (idle for 2h9mn) | ||
awalls has left | [19:06] | |
.......................... (idle for 2h8mn) | ||
mchehab | pinchartl: why the MC sometimes uses a mutex and sometimes use a spin_lock?
that doesn't sound right to me | [21:14] |
pinchartl | mchehab: as a general rule a spinlock is used for short section that don't need to sleep, with code that needs to be callable from interrupt context, and a mutex for larger code sections that are always called in non-sleepable context. if you tell me exactly what you're referring to I can probably provide more information | [21:17] |
mchehab | well, media_device_unregister* uses the spinlock...
but the methods to remove stuff uses the mutex that sounds wrong | [21:18] |
pinchartl | the spinlock protects the list of entities, to make entity lookup in non-sleepable context possible | [21:19] |
mchehab | but we're protecting all the rest with the mutex | [21:20] |
pinchartl | the graph mutex protects more complex operations, in particular graph walk
this will need to be revisited to implement dynamic changes to the graph | [21:20] |
mchehab | well, we're using it also for media_create_intf_link() / media_devnode_remove() | [21:21] |
pinchartl | I expect that most of the code wil need to be rewritten
s/wil/will/ | [21:21] |
mchehab | (as before: creating/removing things were using mutexes) | [21:21] |
pinchartl | s/the code/that code/ | [21:21] |
mchehab | I suspect that this is causing some very hard to track bugs, like the ones that Shuah complains, when she uses the Kernel instrumentation for memory/lock troubles | [21:22] |
pinchartl | I wonder whether we'll need rcu to handle dynamic updates | [21:22] |
mchehab | yes, RCU is one alternative
not easy to make it work | [21:22] |
pinchartl | I haven't analyzed the problem yet so I don't know what the best solution will be
but it will certainly require very careful design and lots of prototyping and testing | [21:23] |
mchehab | yes | [21:23] |
shuah | mutex hold during device removal is definitely a problem | [21:23] |
mchehab | yes, and we do that
at V4L2 core | [21:25] |
pinchartl | we'll need reference counting
references to entities don't bother me too much | [21:25] |
shuah | One thing I considered is making media_device_unregister execute only once | [21:25] |
mchehab | pinchartl: well, I removed refcount per your request ;) | [21:25] |
pinchartl | what bothers much is drivers using those references to travel the graph through links | [21:25] |
mchehab | I was using it on the first version of the MC patchset | [21:26] |
pinchartl | shuah: so you can remove a single entity only, and then it will always return an error ? :-)
mchehab: it wasn't needed without dynamic updates :-) | [21:26] |
mchehab | there's one problem with refcount: we'll likely need to dynamically allocate entity, instead of using media_entity embedded | [21:27] |
shuah | what I mean is keep state that we started unregister process upon first call and then return | [21:27] |
pinchartl | the reason I asked for it to be removed was that there was, in my opinion, no way to implement correct reference counting without a full design for dynamic updates
we would have ended up with some kind of reference counting that wouldn't have been wrong per se | [21:27] |
mchehab | pinchartl: the problem I'm facing is not related to dynamic update.... it is related to device unbind | [21:27] |
pinchartl | but that would most likely not have fulfilled the dynamic updates use case
requiring a rework of the code | [21:27] |
shuah | can we make graph rcu list? We probably have more readers than writers | [21:28] |
pinchartl | shuah: rcu is an option, but I have no way to tell whether it's a good one without studying it in detail first | [21:29] |
shuah | right - also it could be more complex because rcu constructs need to be followed for the solution to be correct | [21:29] |
pinchartl | it looks like it would be very complex, I agree | [21:31] |
shuah | What I mean is we will have to restructure the code and provide API so drivers don't hold mutex directly - it has to all go through API so rcu constructs don't have to spread out in driver code | [21:31] |
pinchartl | I think the base issue will be to implement an efficient and easy to use API for graph traversal without any race condition with entity addition/removal | [21:31] |
shuah | right | [21:32] |
pinchartl | rcu in driver code is likely the last thing we want :-) | [21:32] |
shuah | exactly | [21:32] |
pinchartl | if we were to expose rcu to drivers we would have absolute certainty that none of them would get it right :-) | [21:32] |
mchehab | agreed
well, for now I'll be balancing graph object creation/removal using the spinlock keeping the mutex *only* for graph traversal | [21:32] |
shuah | ok one thing we can do is reduce the critical sections that need mutex hold | [21:34] |
mchehab | (although this still doesn't sound right)
this way, I'll remove the mutex_lock() from the non-atomic context | [21:34] |
shuah | so read only use mutex and write cases use spinlock | [21:35] |
mchehab | that it is currently called during device removal | [21:35] |
shuah | readers could still contend on the mutex though - won't graph walks take longer | [21:36] |
mchehab | shuah: on a dynamic change, this is still problematic...
what happens if some graph elements are inserted or removed while graph traversal is running? | [21:36] |
pinchartl | the code is known to be racy currently
I would try to avoid making it worse but I'm not sure it makes sense fixing it completely before implementing dynamic updates | [21:36] |
mchehab | pinchartl: I suspect that the current code is broken for device driver removal
probably unnoticed, because this is something that people don't do on embedded environment | [21:37] |
pinchartl | device driver removal is broken by design
it's an unsupported feature of the kernel | [21:38] |
mchehab | pinchartl: well, but USB device removal is supported | [21:38] |
pinchartl | unbinding a device from a driver, on the other hand, is supposed to be supported in a race-free way | [21:38] |
mchehab | and this is where the problem happens
when a device is removed, the V4L core calls the *remove* functions in non-atomic context | [21:38] |
pinchartl | no, it's not supported. it works most of the time and is available to users, but it's known to be broken
device removal must be supported of course | [21:39] |
mchehab | two are protected by the spin_lock | [21:39] |
pinchartl | unplugging a device should not cause issues | [21:39] |
mchehab | the rest are "protected" by mutex | [21:40] |
pinchartl | the usb disconnect callback is called directly from an interrupt handler ? I thought there was a work queue nowadays
but anyway | [21:40] |
mchehab | except that mutex doesn't protect on non-atomic context and causes KASAN to hang | [21:40] |
pinchartl | when a usb device is disconnected | [21:40] |
mchehab | kasan: GPF could be caused by NULL-ptr deref or user memory accessgeneral protection fault: 0000 [#1] SMP KASAN | [21:41] |
pinchartl | there's no need to remove the v4l2 devices right away
look at what uvcvideo does, it only removes the v4l2 devices when the last reference goes away | [21:41] |
mchehab | pinchartl: the problem is acually when the au8522 subdev gots unbind
due to the device removal | [21:42] |
pinchartl | what does it do then ? | [21:42] |
mchehab | http://pastebin.com/Yuvyjg9c
[ Â 68.005776] BUG: sleeping function called from invalid context at include/linux/sched.h:2776 | [21:43] |
pinchartl | looks like me might need to design dynamic updates sooner than later
but not at nearly midnight I'm afraid | [21:45] |
↑back Search ←Prev date Next date→ Show only urls | (Click on time to select a line by its url) |