path: root/kernel/rcu/tree.c
* Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds, 2017-05-10, 1 file, -417/+293]

    Pull RCU updates from Ingo Molnar:
     "The main changes are:

       - Debloat RCU headers
       - Parallelize SRCU callback handling (plus overlapping patches)
       - Improve the performance of Tree SRCU on a CPU-hotplug stress test
       - Documentation updates
       - Miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
      rcu: Open-code the rcu_cblist_n_lazy_cbs() function
      rcu: Open-code the rcu_cblist_n_cbs() function
      rcu: Open-code the rcu_cblist_empty() function
      rcu: Separately compile large rcu_segcblist functions
      srcu: Debloat the <linux/rcu_segcblist.h> header
      srcu: Adjust default auto-expediting holdoff
      srcu: Specify auto-expedite holdoff time
      srcu: Expedite first synchronize_srcu() when idle
      srcu: Expedited grace periods with reduced memory contention
      srcu: Make rcutorture writer stalls print SRCU GP state
      srcu: Exact tracking of srcu_data structures containing callbacks
      srcu: Make SRCU be built by default
      srcu: Fix Kconfig botch when SRCU not selected
      rcu: Make non-preemptive schedule be Tasks RCU quiescent state
      srcu: Expedite srcu_schedule_cbs_snp() callback invocation
      srcu: Parallelize callback handling
      kvm: Move srcu_struct fields to end of struct kvm
      rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
      rcu: Use true/false in assignment to bool
      rcu: Use bool value directly
      ...
| * rcu: Open-code the rcu_cblist_n_lazy_cbs() function  [Paul E. McKenney, 2017-05-02, 1 file, -1/+1]

    Because the rcu_cblist_n_lazy_cbs() function just samples the
    ->len_lazy counter, and because the rcu_cblist structure is quite
    straightforward, it makes sense to open-code rcu_cblist_n_lazy_cbs(p)
    as p->len_lazy, cutting out a level of indirection.  This commit
    makes this change.

    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * rcu: Open-code the rcu_cblist_n_cbs() function  [Paul E. McKenney, 2017-05-02, 1 file, -5/+4]

    Because the rcu_cblist_n_cbs() function just samples the ->len
    counter, and because the rcu_cblist structure is quite
    straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as
    p->len, cutting out a level of indirection.  This commit makes this
    change.

    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
| * rcu: Open-code the rcu_cblist_empty() function  [Paul E. McKenney, 2017-05-02, 1 file, -5/+4]

    Because the rcu_cblist_empty() function just samples the ->head
    pointer, and because the rcu_cblist structure is quite
    straightforward, it makes sense to open-code rcu_cblist_empty(p) as
    !p->head, cutting out a level of indirection.  This commit makes
    this change.

    Reported-by: Ingo Molnar <mingo@kernel.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
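    [ Illustration: the open-coding pattern applied by the three commits
      above, as a minimal hedged sketch.  The rcu_cblist field names come
      from the commit text; the full structure layout is an assumption. ]

        struct rcu_cblist {
                struct rcu_head *head;  /* NULL when the list is empty. */
                struct rcu_head **tail; /* Where the next callback goes. */
                long len;               /* Number of queued callbacks. */
                long len_lazy;          /* Number of lazy callbacks. */
        };

        /* Before: each query pays a function call for a one-field read. */
        static inline bool rcu_cblist_empty(struct rcu_cblist *rclp)
        {
                return !rclp->head;
        }

        /* After: callers read the fields directly, for example: */
        static void example_caller(struct rcu_cblist *rclp)
        {
                if (!rclp->head)        /* was rcu_cblist_empty(rclp) */
                        pr_info("empty list: %ld lazy of %ld total\n",
                                rclp->len_lazy, rclp->len);
        }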
| * srcu: Make rcutorture writer stalls print SRCU GP state  [Paul E. McKenney, 2017-04-26, 1 file, -8/+4]

    In the past, SRCU was simple enough that there was little point in
    making the rcutorture writer stall messages print the SRCU
    grace-period number state.  With the advent of Tree SRCU, this has
    changed.  This commit therefore makes Classic, Tiny, and Tree SRCU
    report this state to rcutorture as needed.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Tested-by: Mike Galbraith <efault@gmx.de>
| *-. Merge branches 'doc.2017.04.12a', 'fixes.2017.04.19a' and 'srcu.2017.04.21a' into HEAD  [Paul E. McKenney, 2017-04-21, 1 file, -407/+289]

    doc.2017.04.12a: Documentation updates
    fixes.2017.04.19a: Miscellaneous fixes
    srcu.2017.04.21a: Parallelize SRCU callback handling
| | | * rcu: Make non-preemptive schedule be Tasks RCU quiescent state  [Paul E. McKenney, 2017-04-21, 1 file, -1/+21]

    Currently, a call to schedule() acts as a Tasks RCU quiescent state
    only if a context switch actually takes place.  However, just the
    call to schedule() guarantees that the calling task has moved off of
    whatever tracing trampoline it might previously have been running on.
    This commit therefore plumbs schedule()'s "preempt" parameter into
    rcu_note_context_switch(), which then records the Tasks RCU quiescent
    state, but only if this call to schedule() was -not- due to a
    preemption.

    To avoid adding overhead to the common-case context-switch path, this
    commit hides the rcu_note_context_switch() check under an existing
    non-common-case check.

    Suggested-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
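    [ Illustration: a minimal sketch of the plumbing described above.
      Only schedule()'s "preempt" flag and rcu_note_context_switch() are
      named in the commit text; the guard placement and the helper that
      records the Tasks RCU quiescent state are assumptions. ]

        /* Sketch: a voluntary (non-preempt) schedule() reports a Tasks
         * RCU quiescent state, since the task is off any trampoline. */
        void rcu_note_context_switch(bool preempt)
        {
                /* ... existing context-switch bookkeeping ... */
                if (!preempt)
                        rcu_note_voluntary_context_switch(current); /* assumed helper */
        }

        static void __sched notrace __schedule(bool preempt)
        {
                /* ... */
                rcu_note_context_switch(preempt);
                /* ... */
        }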
| | | * srcu: Parallelize callback handling  [Paul E. McKenney, 2017-04-21, 1 file, -0/+6]

    Peter Zijlstra proposed using SRCU to reduce mmap_sem contention
    [1,2], however, there are workloads that could result in a high
    volume of concurrent invocations of call_srcu(), which with current
    SRCU would result in excessive lock contention on the srcu_struct
    structure's ->queue_lock, which protects SRCU's callback lists.
    This commit therefore moves SRCU to per-CPU callback lists, thus
    greatly reducing contention.

    Because a given SRCU instance no longer has a single centralized
    callback list, starting grace periods and invoking callbacks are
    both more complex than in the single-list Classic SRCU
    implementation.  Starting grace periods and handling callbacks are
    now handled using an srcu_node tree that is in some ways similar to
    the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched (for
    example, the srcu_node tree shape is controlled by exactly the same
    Kconfig options and boot parameters that control the shape of the
    rcu_node tree).

    In addition, the old per-CPU srcu_array structure is now named
    srcu_data and contains an rcu_segcblist structure named
    ->srcu_cblist for its callbacks (and a spinlock to protect this).
    The srcu_struct gets an srcu_gp_seq that is used to associate
    callback segments with the corresponding completion-time
    grace-period number.  These completion-time grace-period numbers
    are propagated up the srcu_node tree so that the grace-period
    workqueue handler can determine whether additional grace periods
    are needed on the one hand and where to look for callbacks that are
    ready to be invoked on the other.

    The srcu_barrier() function must now wait on all instances of the
    per-CPU ->srcu_cblist.  Because each ->srcu_cblist is protected by
    ->lock, srcu_barrier() can remotely add the needed callbacks.  In
    theory, it could also remotely start grace periods, but in practice
    doing so is complex and racy.  And interestingly enough, it is never
    necessary for srcu_barrier() to start a grace period because
    srcu_barrier() only enqueues a callback when a callback is already
    present--and it turns out that a grace period has to have already
    been started for this pre-existing callback.  Furthermore, it is
    only the callback that srcu_barrier() needs to wait on, not any
    particular grace period.  Therefore, a new rcu_segcblist_entrain()
    function enqueues the srcu_barrier() function's callback into the
    same segment occupied by the last pre-existing callback in the list.
    The special case where all the pre-existing callbacks are on a
    different list (because they are in the process of being invoked)
    is handled by enqueuing srcu_barrier()'s callback into the
    RCU_DONE_TAIL segment, relying on the done-callbacks check that
    takes place after all callbacks are invoked.

    Note that the readers use the same algorithm as before.  Note that
    there is a separate srcu_idx that tells the readers what counter to
    increment.  This unfortunately cannot be combined with srcu_gp_seq
    because they need to be incremented at different times.

    This commit introduces some ugly #ifdefs in rcutorture.  These will
    go away when I feel good enough about Tree SRCU to ditch Classic
    SRCU.

    Some crude performance comparisons, courtesy of a quickly hacked
    rcuperf asynchronous-grace-period capability:

                        Callback Queuing Overhead
                        -------------------------
        # CPUS          Classic SRCU    Tree SRCU
        ------          ------------    ---------
             2          0.349 us        0.342 us
            16          31.66 us        0.4 us
            41          ---------       0.417 us

    The times are the 90th percentiles, a statistic that was chosen to
    reject the overheads of the occasional srcu_barrier() call needed
    to avoid OOMing the test machine.  The rcuperf test hangs when
    running Classic SRCU at 41 CPUs, hence the line of dashes.  Despite
    the hacks to both the rcuperf code and the statistics, this is a
    convincing demonstration of Tree SRCU's performance and scalability
    advantages.

    [1] https://lwn.net/Articles/309030/
    [2] https://patchwork.kernel.org/patch/5108281/

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Fix initialization if synchronize_srcu_expedited() called first. ]
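    [ Illustration: a hedged structural sketch of the per-CPU layout the
      commit describes.  The names srcu_data, ->srcu_cblist, srcu_node,
      srcu_gp_seq, and srcu_idx come from the commit text; the spinlock
      name, the ->sda pointer, and everything else are assumptions. ]

        /* Per-CPU SRCU state: callbacks now live here instead of on one
         * central list, so call_srcu() contends only on a per-CPU lock. */
        struct srcu_data {
                spinlock_t lock;                  /* Protects ->srcu_cblist. */
                struct rcu_segcblist srcu_cblist; /* Segmented callback list. */
                /* ... grace-period snapshots, counters ... */
        };

        struct srcu_struct {
                struct srcu_node *node;         /* Combining tree, rcu_node-like. */
                struct srcu_data __percpu *sda; /* Per-CPU callback state. */
                unsigned long srcu_gp_seq;      /* Completion-time GP number. */
                int srcu_idx;                   /* Which counter readers bump. */
                /* ... */
        };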
| | | * srcu: Make num_rcu_lvl[] array be external  [Paul E. McKenney, 2017-04-18, 1 file, -1/+1]

    This commit makes the num_rcu_lvl[] array external so that SRCU can
    make use of it for initializing its upcoming srcu_node tree.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * rcu: Remove redundant levelcnt[] array from rcu_init_one()  [Paul E. McKenney, 2017-04-18, 1 file, -6/+3]

    The levelcnt[] array is identical to num_rcu_lvl[], so this commit
    removes levelcnt[].

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * srcu: Move rcu_init_levelspread() to rcu_tree_node.h  [Paul E. McKenney, 2017-04-18, 1 file, -25/+0]

    This commit moves the rcu_init_levelspread() function from
    kernel/rcu/tree.c to kernel/rcu/rcu.h so that SRCU can access it.
    This is another step towards enabling SRCU to create its own
    combining tree.  This commit is code-movement only, give or take
    knock-on adjustments.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * srcu: Move rcu_seq_start() and friends to rcu.h  [Paul E. McKenney, 2017-04-18, 1 file, -35/+0]

    This commit moves rcu_seq_start(), rcu_seq_end(), rcu_seq_snap(),
    and rcu_seq_done() from kernel/rcu/tree.c to kernel/rcu/rcu.h.
    This will allow SRCU to use these functions, which in turn will
    allow SRCU to move from a single global callback queue to a per-CPU
    callback queue.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
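    [ Illustration: a hedged sketch of the grace-period sequence-counter
      convention behind rcu_seq_start() and friends: the bottom bit of
      the counter flags a grace period in progress.  Bodies are
      paraphrased from that era's tree.c and should be treated as
      approximate. ]

        static void rcu_seq_start(unsigned long *sp)
        {
                WRITE_ONCE(*sp, *sp + 1);       /* Now odd: GP in progress. */
        }

        static void rcu_seq_end(unsigned long *sp)
        {
                WRITE_ONCE(*sp, *sp + 1);       /* Now even: GP complete. */
        }

        /* Smallest counter value at which a full grace period will have
         * elapsed for a request arriving now. */
        static unsigned long rcu_seq_snap(unsigned long *sp)
        {
                return (READ_ONCE(*sp) + 3) & ~0x1UL;
        }

        /* Has the grace period captured by snapshot "s" completed? */
        static bool rcu_seq_done(unsigned long *sp, unsigned long s)
        {
                return ULONG_CMP_GE(READ_ONCE(*sp), s);
        }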
| | | * srcu: Allow SRCU to access rcu_scheduler_active  [Paul E. McKenney, 2017-04-18, 1 file, -1/+1]

    This is primarily a code-movement commit in preparation for allowing
    SRCU to handle early-boot SRCU grace periods.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * srcu: Abstract multi-tail callback list handling  [Paul E. McKenney, 2017-04-18, 1 file, -236/+112]

    RCU has only one multi-tail callback list, which is implemented via
    the nxtlist, nxttail, nxtcompleted, qlen_lazy, and qlen fields in
    the rcu_data structure, and whose operations are open-coded
    throughout the Tree RCU implementation.  This has been more or less
    OK in the past, but upcoming callback-list optimizations in SRCU
    could really use a multi-tail callback list there as well.

    This commit therefore abstracts the multi-tail callback list
    handling into a new kernel/rcu/rcu_segcblist.h file, and uses this
    new API.  The simple head-and-tail pointer callback list is also
    abstracted and applied everywhere except for the NOCB
    callback-offload lists.  (Yes, the plan is to apply them there as
    well, but this commit is already bigger than would be good.)

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
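    [ Illustration: a hedged sketch of the segmented-list idea: one
      linked list of callbacks with several tail pointers marking
      segment boundaries.  The segment names appear elsewhere in this
      log; the exact field layout is an assumption. ]

        /* Segments of the single callback list, oldest to newest. */
        #define RCU_DONE_TAIL       0   /* Ready to invoke. */
        #define RCU_WAIT_TAIL       1   /* Waiting for current grace period. */
        #define RCU_NEXT_READY_TAIL 2   /* Waiting for next grace period. */
        #define RCU_NEXT_TAIL       3   /* Not yet associated with a GP. */
        #define RCU_CBLIST_NSEGS    4

        struct rcu_segcblist {
                struct rcu_head *head;                    /* Oldest callback. */
                struct rcu_head **tails[RCU_CBLIST_NSEGS]; /* Segment ends. */
                unsigned long gp_seq[RCU_CBLIST_NSEGS];   /* Per-segment GP numbers. */
                long len;                                 /* Total callbacks. */
                long len_lazy;                            /* Lazy callbacks. */
        };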
| | | * rcu: Place guard on rcu_all_qs() and rcu_note_context_switch() actions  [Paul E. McKenney, 2017-04-18, 1 file, -14/+24]

    The rcu_all_qs() and rcu_note_context_switch() functions do a series
    of checks, taking various actions to supply RCU with quiescent
    states, depending on the outcomes of the various checks.  This is a
    bit much for scheduling fastpaths, so this commit creates a separate
    ->rcu_urgent_qs field in the rcu_dynticks structure that acts as a
    global guard for these checks.  Thus, in the common case,
    rcu_all_qs() and rcu_note_context_switch() check the ->rcu_urgent_qs
    field, find it false, and simply return.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
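    [ Illustration: a minimal sketch of the fastpath guard, assuming a
      bool ->rcu_urgent_qs field in a per-CPU rcu_dynticks instance; the
      accessor shown is an assumption. ]

        void rcu_all_qs(void)
        {
                /* Common case: RCU needs nothing urgent from this CPU. */
                if (!this_cpu_read(rcu_dynticks.rcu_urgent_qs))
                        return;
                /* Slow path: the original series of checks and
                 * quiescent-state reports. */
                /* ... */
        }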
| | | * rcu: Eliminate flavor scan in rcu_momentary_dyntick_idle()  [Paul E. McKenney, 2017-04-18, 1 file, -50/+12]

    The rcu_momentary_dyntick_idle() function scans the RCU flavors,
    checking that one of them still needs a quiescent state before doing
    an expensive atomic operation on the ->dynticks counter.  However,
    this check reduces overhead only after a rare race condition, and
    increases complexity.  This commit therefore removes the scan and
    the mechanism enabling the scan.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * rcu: Pull rcu_qs_ctr into rcu_dynticks structure  [Paul E. McKenney, 2017-04-18, 1 file, -9/+6]

    The rcu_qs_ctr variable is yet another isolated per-CPU variable,
    so this commit pulls it into the pre-existing rcu_dynticks per-CPU
    structure.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * rcu: Pull rcu_sched_qs_mask into rcu_dynticks structure  [Paul E. McKenney, 2017-04-18, 1 file, -7/+5]

    The rcu_sched_qs_mask variable is yet another isolated per-CPU
    variable, so this commit pulls it into the pre-existing rcu_dynticks
    per-CPU structure.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | * rcu: Semicolon inside RCU_TRACE() for tree.c  [Paul E. McKenney, 2017-04-18, 1 file, -4/+4]

    The current use of "RCU_TRACE(statement);" can cause odd bugs,
    especially where "statement" is a local-variable declaration, as it
    can leave a misplaced ";" in the source code.  This commit therefore
    converts these to "RCU_TRACE(statement;)", which avoids the
    misplaced ";".

    Reported-by: Josh Triplett <josh@joshtriplett.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
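    [ Illustration: why the semicolon placement matters.  The RCU_TRACE()
      definition sketched here follows the usual pattern for such
      config-dependent macros. ]

        #ifdef CONFIG_RCU_TRACE
        #define RCU_TRACE(stmt) stmt
        #else
        #define RCU_TRACE(stmt)
        #endif

        /* Old form: with CONFIG_RCU_TRACE=n, the statement vanishes but
         * the trailing ";" survives as a stray empty statement, which is
         * ill-placed when it lands among declarations. */
        RCU_TRACE(unsigned long mask);

        /* New form: the ";" sits inside the macro argument, so it is
         * compiled away together with the statement. */
        RCU_TRACE(unsigned long mask;)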
| | | * rcu: Maintain special bits at bottom of ->dynticks counter  [Paul E. McKenney, 2017-04-18, 1 file, -16/+61]

    Currently, IPIs are used to force other CPUs to invalidate their
    TLBs in response to a kernel virtual-memory mapping change.  This
    works, but degrades both battery lifetime (for idle CPUs) and
    real-time response (for nohz_full CPUs), and in addition results in
    unnecessary IPIs due to the fact that CPUs executing in usermode are
    unaffected by stale kernel mappings.  It would be better to cause a
    CPU executing in usermode to wait until it is entering kernel mode
    to do the flush, first to avoid interrupting usermode tasks and
    second to handle multiple flush requests with a single flush in the
    case of a long-running user task.

    This commit therefore reserves a bit at the bottom of the ->dynticks
    counter, which is checked upon exit from extended quiescent states.
    If it is set, it is cleared and then a new rcu_eqs_special_exit()
    macro is invoked, which, if not supplied, is an empty single-pass
    do-while loop.  If this bottom bit is set on -entry- to an extended
    quiescent state, then a WARN_ON_ONCE() triggers.

    This bottom bit may be set using a new rcu_eqs_special_set()
    function, which returns true if the bit was set, or false if the CPU
    turned out to not be in an extended quiescent state.  Please note
    that this function refuses to set the bit for a non-nohz_full CPU
    when that CPU is executing in usermode because usermode execution is
    tracked by RCU as a dyntick-idle extended quiescent state only for
    nohz_full CPUs.

    Reported-by: Andy Lutomirski <luto@amacapital.net>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
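    [ Illustration: a hedged sketch of the reserved-bit scheme.  The
      names rcu_eqs_special_set() and rcu_eqs_special_exit() come from
      the commit text; the mask name, the increment of 2, and the helper
      body are assumptions. ]

        #define RCU_DYNTICK_SPECIAL_MASK 0x1    /* assumed name */

        /* EQS exit: the counter's low bit is reserved for "special"
         * requests; honor any request that arrived while we were idle. */
        static void rcu_dynticks_eqs_exit(void)
        {
                struct rcu_dynticks *rdtp = this_cpu_ptr(&rcu_dynticks);
                int seq = atomic_add_return(2, &rdtp->dynticks);

                if (seq & RCU_DYNTICK_SPECIAL_MASK) {
                        atomic_andnot(RCU_DYNTICK_SPECIAL_MASK,
                                      &rdtp->dynticks);
                        rcu_eqs_special_exit(); /* e.g., deferred TLB flush */
                }
        }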
| | * rcu: Fix typo in PER_RCU_NODE_PERIOD header comment  [Paul E. McKenney, 2017-04-19, 1 file, -1/+1]

    This commit just changes a "the the" to "the" to reduce repetition.

    Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * rcu: Use bool value directly  [Nicholas Mc Guire, 2017-04-19, 1 file, -1/+1]

    The beenonline variable is declared bool, so there is no need for an
    explicit comparison, especially not against the constant zero.

    Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * rcu: Improve comments for hotplug/suspend/hibernate functions  [Paul E. McKenney, 2017-04-19, 1 file, -4/+37]

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | * rcu: Remove obsolete comment from rcu_future_gp_cleanup() header  [Paul E. McKenney, 2017-04-19, 1 file, -3/+1]

    The rcu_nocb_gp_cleanup() function is now invoked elsewhere, so this
    commit drags this comment into the year 2017.

    Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | rcu/tracing: Add rcu_disabled to denote when rcu_irq_enter() will not work  [Steven Rostedt (VMware), 2017-04-10, 1 file, -2/+16]

    Tracing uses rcu_irq_enter() as a way to make sure that RCU is
    watching when it needs to use rcu_read_lock() and friends.  This is
    because tracing can happen as RCU is about to enter user space, or
    about to go idle, and RCU does not watch for RCU read-side critical
    sections as it makes the transition.

    There is a small location within the RCU infrastructure in which
    rcu_irq_enter() itself will not work.  If tracing were to occur in
    that section, it will break if it tries to use rcu_irq_enter().

    Originally, this happened with the stack_tracer, because it will
    call save_stack_trace when it encounters stack usage that is greater
    than any stack usage it had encountered previously.  There was a
    case where that happened in the RCU section where rcu_irq_enter()
    did not work, and lockdep complained loudly about it.  To fix it,
    stack tracing added a way to be disabled, and RCU disables stack
    tracing during the critical section in which rcu_irq_enter() is
    inoperable.  This solution worked, but there are other cases that
    use rcu_irq_enter(), and it would be a good idea for RCU to provide
    a way to let others know that rcu_irq_enter() will not work.  For
    example, in trace events.

    Another helpful aspect of this change is that it also moves the
    per-CPU variable checked in the RCU critical section into the same
    cache locality as the other RCU per-CPU variables used in that same
    location.

    I'm keeping the stack_trace_disable() code, as that still could be
    used in the future by places that really need to disable it.  And
    since it's only a static inline, it won't take up any kernel text if
    it is not used.

    Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home

    Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* | rcu: Fix dyntick-idle tracing  [Paul E. McKenney, 2017-04-10, 1 file, -25/+23]

    The tracing subsystem started using rcu_irq_enter() and
    rcu_irq_exit() (with my blessing) to allow the current _rcuidle
    alternative tracepoint name to be dispensed with while still
    maintaining good performance.  Unfortunately, this causes RCU's
    dyntick-idle entry code's tracing to appear to RCU like an interrupt
    that occurs where RCU is not designed to handle interrupts.

    This commit fixes this problem by moving the zeroing of
    ->dynticks_nesting after the offending trace_rcu_dyntick()
    statement, which narrows the window of vulnerability to a pair of
    adjacent statements that are now marked with comments to that
    effect.

    Link: http://lkml.kernel.org/r/20170405093207.404f8deb@gandalf.local.home
    Link: http://lkml.kernel.org/r/20170405193928.GM1600@linux.vnet.ibm.com

    Reported-by: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
* sched/headers: Prepare for new header dependencies before moving code to <linux/sched/debug.h>  [Ingo Molnar, 2017-03-02, 1 file, -0/+1]

    We are going to split <linux/sched/debug.h> out of <linux/sched.h>,
    which will have to be picked up from other headers and a couple of
    .c files.

    Create a trivial placeholder <linux/sched/debug.h> file that just
    maps to <linux/sched.h> to make this patch obviously correct and
    bisectable.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched/headers: Prepare for new header dependencies before moving code to <uapi/linux/sched/types.h>  [Ingo Molnar, 2017-03-02, 1 file, -0/+1]

    We are going to move scheduler ABI details to
    <uapi/linux/sched/types.h>, which will be used from a number of .c
    files.

    Create empty placeholder header that maps to <linux/types.h>.

    Include the new header in the files that are going to need it.

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
* rcu: Separate the RCU synchronization types and APIs into <linux/rcupdate_wait.h>  [Ingo Molnar, 2017-03-02, 1 file, -1/+1]

    So rcupdate.h is a pretty complex header, in particular it includes
    <linux/completion.h> which includes <linux/wait.h> - creating a
    dependency that includes <linux/wait.h> in <linux/sched.h>, which
    prevents the isolation of <linux/sched.h> from the derived
    <linux/wait.h> header.

    Solve part of the problem by decoupling rcupdate.h from completions:
    this can be done by separating out the rcu_synchronize types and
    APIs, and updating their usage sites.

    Since these are mostly RCU-internal types, this will not just
    simplify <linux/sched.h>'s dependencies, but will make all the
    hundreds of .c files that include rcupdate.h but not completions or
    wait.h build faster.

    ( For rcutiny this means that two dependent APIs have to be
      uninlined, but that shouldn't be much of a problem as they are
      rare variants. )

    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
*-. Merge branches 'doc.2017.01.15b', 'dyntick.2017.01.23a', 'fixes.2017.01.23a', 'srcu.2017.01.25a' and 'torture.2017.01.15b' into HEAD  [Paul E. McKenney, 2017-01-25, 1 file, -85/+177]

    doc.2017.01.15b: Documentation updates
    dyntick.2017.01.23a: Dyntick tracking consolidation
    fixes.2017.01.23a: Miscellaneous fixes
    srcu.2017.01.25a: SRCU rewrite, fixes, and verification
    torture.2017.01.15b: Torture-test updates
| | * rcu: Make rcu_cpu_starting() use its "cpu" argument  [Paul E. McKenney, 2017-01-23, 1 file, -1/+1]

    The rcu_cpu_starting() function uses this_cpu_ptr() to locate the
    incoming CPU's rcu_data structure.  This works for the boot CPU and
    for all CPUs onlined after rcu_init() executes (during very early
    boot).  Currently, this is the full set of CPUs, so all is well.
    But if anyone ever parallelizes boot before rcu_init() time, it will
    fail.  This commit therefore replaces the rcu_cpu_starting()
    function's this_cpu_ptr() with per_cpu_ptr(), future-proofing the
    code and (arguably) improving readability.

    This commit inadvertently fixes a latent bug: If there ever had been
    more than just the boot CPU online at rcu_init() time, the old code
    would not initialize the non-boot CPUs, but rather would repeatedly
    initialize the boot CPU.

    Reported-by: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
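    [ Illustration: the one-line change, sketched as a before/after.
      The surrounding variable and field names (rdp, rsp->rda) are
      assumptions. ]

        /* Before: always resolves to the *running* CPU's rcu_data,
         * which is wrong whenever rcu_cpu_starting(cpu) runs on a CPU
         * other than "cpu". */
        rdp = this_cpu_ptr(rsp->rda);

        /* After: honors the "cpu" argument regardless of which CPU is
         * executing. */
        rdp = per_cpu_ptr(rsp->rda, cpu);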
| | * rcu: Don't wake rcuc/X kthreads on NOCB CPUs  [Paul E. McKenney, 2017-01-23, 1 file, -1/+1]

    Chris Friesen noticed that rcuc/X kthreads were consuming CPU even
    on NOCB CPUs.  This makes no sense because the only purpose of these
    kthreads is to invoke normal (non-offloaded) callbacks, of which
    there will never be any on NOCB CPUs.  This problem was due to a bug
    in cpu_has_callbacks_ready_to_invoke(), which should have been
    checking ->nxttail[RCU_NEXT_TAIL] for NULL, but which was instead
    (incorrectly) checking ->nxttail[RCU_DONE_TAIL].  Because
    ->nxttail[RCU_DONE_TAIL] is never NULL, the only effect is to cause
    the rcuc/X kthread to execute when it should not do so.

    This commit therefore checks ->nxttail[RCU_NEXT_TAIL], which is NULL
    for NOCB CPUs.

    Reported-by: Chris Friesen <chris.friesen@windriver.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * rcu: Once again use NMI-based stack traces in stall warnings  [Paul E. McKenney, 2017-01-23, 1 file, -5/+7]

    This commit is for all intents and purposes a revert of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to
    suppose that this can now safely be reverted is the presence of
    42a0bb3f7138 ("printk/nmi: generic solution for safe printk in
    NMI"), which is said to have made NMI-based stack dumps safe.

    However, this reversion keeps one nice property of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that only
    those CPUs blocking the grace period are dumped.  The new
    trigger_single_cpu_backtrace() is used to make this happen, as
    suggested by Josh Poimboeuf.

    Reported-by: Vince Weaver <vincent.weaver@maine.edu>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Cc: Petr Mladek <pmladek@suse.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Reviewed-by: Petr Mladek <pmladek@suse.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * rcu: Remove short-term CPU kicking  [Paul E. McKenney, 2017-01-23, 1 file, -5/+0]

    Commit 4914950aaa12d ("rcu: Stop treating in-kernel CPU-bound
    workloads as errors") added a (relatively) short-timeout call to
    resched_cpu().  This was inspired by an issue that was fixed by
    b7e7ade34e61 ("sched/core: Fix remote wakeups").  But given that
    this issue was fixed, it is time for the current commit to remove
    this call to resched_cpu().

    Reported-by: Byungchul Park <byungchul.park@lge.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * rcu: Add long-term CPU kicking  [Paul E. McKenney, 2017-01-23, 1 file, -0/+6]

    This commit prepares for the removal of short-term CPU kicking (in a
    subsequent commit).  It does so by starting to invoke resched_cpu()
    for each holdout at each force-quiescent-state interval that is more
    than halfway through the stall-warning interval.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * rcu: Remove unused but set variable  [Tobias Klauser, 2017-01-23, 1 file, -2/+0]

    Since commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking
    for RCU"), the variable mask in rcu_init_percpu_data is set but no
    longer used.  Remove it to fix the following warning when building
    with 'W=1':

        kernel/rcu/tree.c: In function ‘rcu_init_percpu_data’:
        kernel/rcu/tree.c:3765:16: warning: variable ‘mask’ set but not used [-Wunused-but-set-variable]

    Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| | * rcu: Only dump stalled-tasks stacks if there was a real stall  [Byungchul Park, 2017-01-23, 1 file, -3/+3]

    The print_other_cpu_stall() function currently unconditionally
    invokes rcu_print_detail_task_stall().  This is OK because if there
    was a stall sufficient to cause print_other_cpu_stall() to be
    invoked, that stall is very likely to persist through the entire
    print_other_cpu_stall() execution.  However, if the stall did not
    persist, the variable ndetected will be zero, and that variable is
    already tested in an "if" statement.  Therefore, this commit moves
    the call to rcu_print_detail_task_stall() under that pre-existing
    "if" to improve readability, with a very rare reduction in overhead.

    Signed-off-by: Byungchul Park <byungchul.park@lge.com>
    [ paulmck: Reworked commit log. ]
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Adjust FQS offline checks for exact online-CPU detection  [Paul E. McKenney, 2017-01-23, 1 file, -15/+2]

    Commit 7ec99de36f40 ("rcu: Provide exact CPU-online tracking for
    RCU"), as its title suggests, got rid of RCU's remaining CPU-hotplug
    timing guesswork.  This commit therefore removes the one-jiffy
    kludge that was used to paper over this guesswork.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead  [Paul E. McKenney, 2017-01-23, 1 file, -12/+34]

    Commit 4a81e8328d37 ("rcu: Reduce overhead of cond_resched() checks
    for RCU") moved quiescent-state generation out of cond_resched() and
    commit bde6c3aa9930 ("rcu: Provide cond_resched_rcu_qs() to force
    quiescent states in long loops") introduced cond_resched_rcu_qs(),
    and commit 5cd37193ce85 ("rcu: Make cond_resched_rcu_qs() apply to
    normal RCU flavors") introduced the per-CPU rcu_qs_ctr variable,
    which is frequently polled by the RCU core state machine.

    This frequent polling can increase grace-period rate, which in turn
    increases grace-period overhead, which is visible in some benchmarks
    (for example, the "open1" benchmark in Anton Blanchard's "will it
    scale" suite).  This commit therefore reduces the rate at which
    rcu_qs_ctr is polled by moving that polling into the
    force-quiescent-state (FQS) machinery, and by further polling it
    only after the grace period has been in effect for at least
    jiffies_till_sched_qs jiffies.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Abstract extended quiescent state determination  [Paul E. McKenney, 2017-01-23, 1 file, -14/+38]

    This commit is the fourth step towards full abstraction of all
    accesses to the ->dynticks counter, implementing previously
    open-coded checks and comparisons in new rcu_dynticks_in_eqs() and
    rcu_dynticks_in_eqs_since() functions.  This abstraction will ease
    changes to the ->dynticks counter operation.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Abstract dynticks extended quiescent state enter/exit operations  [Paul E. McKenney, 2017-01-23, 1 file, -26/+62]

    This commit is the third step towards full abstraction of all
    accesses to the ->dynticks counter, implementing the previously
    open-coded atomic add of 1 and entry checks in a new
    rcu_dynticks_eqs_enter() function, and the same but with exit checks
    in a new rcu_dynticks_eqs_exit() function.  This abstraction will
    ease changes to the ->dynticks counter operation.

    Note that this commit gets rid of the smp_mb__before_atomic() and
    the smp_mb__after_atomic() calls that were previously present.  The
    reason that this is OK from a memory-ordering perspective is that
    the atomic operation is now atomic_add_return(), which, as a
    value-returning atomic, guarantees full ordering.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    [ paulmck: Fixed RCU_TRACE() statements added by this commit. ]
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Abstract the dynticks snapshot operation  [Paul E. McKenney, 2017-01-17, 1 file, -3/+16]

    This commit is the second step towards full abstraction of all
    accesses to the ->dynticks counter, implementing the previously
    open-coded atomic add of zero in a new rcu_dynticks_snap()
    function.  This abstraction will ease changes to the ->dynticks
    counter operation.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
| * rcu: Abstract the dynticks momentary-idle operation  [Paul E. McKenney, 2017-01-16, 1 file, -5/+14]

    This commit is the first step towards full abstraction of all
    accesses to the ->dynticks counter, implementing the previously
    open-coded atomic add of two in a new rcu_dynticks_momentary_idle()
    function.  This abstraction will ease changes to the ->dynticks
    counter operation.

    Note that this commit gets rid of the smp_mb__before_atomic() and
    the smp_mb__after_atomic() calls that were previously present.  The
    reason that this is OK from a memory-ordering perspective is that
    the atomic operation is now atomic_add_return(), which, as a
    value-returning atomic, guarantees full ordering.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
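    [ Illustration: the memory-ordering point made by the two
      abstraction commits above, as a hedged before/after sketch; the
      variable names are assumptions. ]

        /* Before: explicit fences bracket a non-value-returning atomic. */
        smp_mb__before_atomic();        /* Order prior accesses before the add. */
        atomic_add(2, &rdtp->dynticks);
        smp_mb__after_atomic();         /* Order the add before later accesses. */

        /* After: a value-returning atomic is fully ordered by itself,
         * so the explicit fences can be dropped. */
        special = atomic_add_return(2, &rdtp->dynticks);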
* rcu: Narrow early boot window of illegal synchronous grace periods  [Paul E. McKenney, 2017-01-15, 1 file, -13/+20]

    The current preemptible RCU implementation goes through three phases
    during bootup.  In the first phase, there is only one CPU that is
    running with preemption disabled, so that a no-op is a synchronous
    grace period.  In the second mid-boot phase, the scheduler is
    running, but RCU has not yet gotten its kthreads spawned (and, for
    expedited grace periods, workqueues are not yet running).  During
    this time, any attempt to do a synchronous grace period will hang
    the system (or complain bitterly, depending).  In the third and
    final phase, RCU is fully operational and everything works normally.

    This has been OK for some time, but some synchronous grace periods
    have recently been showing up during the second mid-boot phase.
    This code worked "by accident" for a while, but started failing as
    soon as expedited RCU grace periods switched over to workqueues in
    commit 8b355e3bc140 ("rcu: Drive expedited grace periods from
    workqueue").  Note that the code was buggy even before this commit,
    as it was subject to failure on real-time systems that forced all
    expedited grace periods to run as normal grace periods (for example,
    using the rcu_normal ksysfs parameter).  The callchain from the
    failure case is as follows:

        early_amd_iommu_init()
        |-> acpi_put_table(ivrs_base);
        |-> acpi_tb_put_table(table_desc);
        |-> acpi_tb_invalidate_table(table_desc);
        |-> acpi_tb_release_table(...)
        |-> acpi_os_unmap_memory
        |-> acpi_os_unmap_iomem
        |-> acpi_os_map_cleanup
        |-> synchronize_rcu_expedited

    The kernel showing this callchain was built with
    CONFIG_PREEMPT_RCU=y, which caused the code to try using workqueues
    before they were initialized, which did not go well.  This commit
    therefore reworks RCU to permit synchronous grace periods to proceed
    during this mid-boot phase.  This commit is therefore a fix to a
    regression introduced in v4.9, and is therefore being put forward
    post-merge-window in v4.10.

    This commit sets a flag from the existing rcu_scheduler_starting()
    function which causes all synchronous grace periods to take the
    expedited path.  The expedited path now checks this flag, using the
    requesting task to drive the expedited grace period forward during
    the mid-boot phase.  Finally, this flag is updated by a
    core_initcall() function named rcu_exp_runtime_mode(), which causes
    the runtime codepaths to be used.

    Note that this arrangement assumes that tasks are not sent POSIX
    signals (or anything similar) from the time that the first task is
    spawned through core_initcall() time.

    Fixes: 8b355e3bc140 ("rcu: Drive expedited grace periods from workqueue")
    Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
    Reported-by: Borislav Petkov <bp@alien8.de>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Tested-by: Stan Kain <stan.kain@gmail.com>
    Tested-by: Ivan <waffolz@hotmail.com>
    Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
    Tested-by: Bruno Pesavento <bpesavento@infinito.it>
    Tested-by: Borislav Petkov <bp@suse.de>
    Tested-by: Frederic Bezies <fredbezies@gmail.com>
    Cc: <stable@vger.kernel.org> # 4.9.0-
* rcu: Don't kick unless grace period or request  [Paul E. McKenney, 2016-11-14, 1 file, -1/+2]

    The current code can result in spurious kicks when there are no
    grace periods in progress and no grace-period-related requests.
    This is sort of OK for a diagnostic aid, but the resulting
    ftrace-dump messages in dmesg are annoying.  This commit therefore
    avoids spurious kicks in the common case.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Remove obsolete comment from __call_rcu()  [Paul E. McKenney, 2016-11-14, 1 file, -7/+0]

    The __call_rcu() comment about opportunistically noting grace period
    beginnings and endings is obsolete.  RCU still does such
    opportunistic noting, but in __call_rcu_core() rather than
    __call_rcu(), and there already is an appropriate comment in
    __call_rcu_core().  This commit therefore removes the obsolete
    comment.

    Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Remove obsolete rcu_check_callbacks() header comment  [Paul E. McKenney, 2016-11-14, 1 file, -2/+1]

    In the deep past, rcu_check_callbacks() was only invoked if
    rcu_pending() returned true.  Which was fine, but these days
    rcu_check_callbacks() is invoked unconditionally.  This commit
    therefore removes the obsolete sentence from the header comment.

    Reported-by: Michalis Kokologiannakis <mixaskok@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Tighten up __call_rcu() rcu_head alignment check  [Paul E. McKenney, 2016-11-14, 1 file, -1/+3]

    Commit 720abae3d68ae ("rcu: force alignment on struct
    callback_head/rcu_head") forced the rcu_head (AKA callback_head)
    structure's alignment to pointer size, that is, to 4-byte boundaries
    on 32-bit systems and to 8-byte boundaries on 64-bit systems.  This
    commit therefore checks for this same alignment in __call_rcu(),
    which used to contain a looser check for two-byte alignment.

    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
    Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Reviewed-by: Josh Triplett <josh@joshtriplett.org>
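    [ Illustration: the tightened check, sketched as a before/after.
      The exact expression used in __call_rcu() is an assumption. ]

        /* Before: only two-byte alignment of the rcu_head was required. */
        WARN_ON_ONCE((unsigned long)head & 0x1);

        /* After: require pointer-size alignment, matching the forced
         * alignment of struct callback_head/rcu_head. */
        WARN_ON_ONCE((unsigned long)head & (sizeof(void *) - 1));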
* Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux  [Linus Torvalds, 2016-10-15, 1 file, -1/+1]

    Pull gcc plugins update from Kees Cook:
     "This adds a new gcc plugin named "latent_entropy".  It is designed
      to extract as much possible uncertainty from a running system at
      boot time as possible, hoping to capitalize on any possible
      variation in CPU operation (due to runtime data differences,
      hardware differences, SMP ordering, thermal timing variation,
      cache behavior, etc).

      At the very least, this plugin is a much more comprehensive
      example for how to manipulate kernel code using the gcc plugin
      internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
      latent_entropy: Mark functions with __latent_entropy
      gcc-plugins: Add latent_entropy plugin
| * latent_entropy: Mark functions with __latent_entropy  [Emese Revfy, 2016-10-10, 1 file, -1/+1]

    The __latent_entropy gcc attribute can be used only on functions and
    variables.  If it is on a function, then the plugin will instrument
    it for gathering control-flow entropy.  If the attribute is on a
    variable, then the plugin will initialize it with random contents.
    The variable must be an integer, an integer array type, or a
    structure with integer fields.

    These specific functions have been selected because they are init
    functions (to help gather boot-time entropy), are called at
    unpredictable times, or they have variable loops, each of which
    provide some level of latent entropy.

    Signed-off-by: Emese Revfy <re.emese@gmail.com>
    [kees: expanded commit message]
    Signed-off-by: Kees Cook <keescook@chromium.org>
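    [ Illustration: a hedged usage sketch of the attribute.  That this
      file's one-line change marks rcu_process_callbacks() is an
      assumption, as is the fallback #define shown. ]

        /* Without the plugin, the attribute expands to nothing. */
        #ifndef __latent_entropy
        #define __latent_entropy
        #endif

        /* Called at unpredictable times, so its control flow becomes a
         * useful entropy source once the plugin instruments it. */
        static void __latent_entropy rcu_process_callbacks(struct softirq_action *unused)
        {
                /* ... invoke any ready RCU callbacks ... */
        }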