linux - linux

	Commit message (Collapse)	Author	Files	Lines
2011-07-14	tracing: Have dynamic size event stack traces	Steven Rostedt	4	-19/+88
	Currently the stack trace per event in ftace is only 8 frames. This can be quite limiting and sometimes useless. Especially when the "ignore frames" is wrong and we also use up stack frames for the event processing itself. Change this to be dynamic by adding a percpu buffer that we can write a large stack frame into and then copy into the ring buffer. For interrupts and NMIs that come in while another event is being process, will only get to use the 8 frame stack. That should be enough as the task that it interrupted will have the full stack frame anyway. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14	perf: Robustify proc and debugfs file recording	Sonny Rao	1	-91/+29
	While attempting to create a timechart of boot up I found perf didn't tolerate modules being loaded/unloaded. This patch fixes this by reading the file once and then writing the size read at the correct point in the file. It also simplifies the code somewhat. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Sonny Rao <sonnyrao@chromium.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Link: http://lkml.kernel.org/r/10011.1310614483@neuling.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14	ftrace: Fix dynamic selftest failure on some archs	Steven Rostedt	1	-0/+26
	Archs that do not implement CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, will fail the dynamic ftrace selftest. The function tracer has a quick 'off' variable that will prevent the call back functions from being called. This variable is called function_trace_stop. In x86, this is implemented directly in the mcount assembly, but for other archs, an intermediate function is used called ftrace_test_stop_func(). In dynamic ftrace, the function pointer variable ftrace_trace_function is used to update the caller code in the mcount caller. But for archs that do not have CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST set, it only calls ftrace_test_stop_func() instead, which in turn calls __ftrace_trace_function. When more than one ftrace_ops is registered, the function it calls is ftrace_ops_list_func(), which will iterate over all registered ftrace_ops and call the callbacks that have their hash matching. The issue happens when two ftrace_ops are registered for different functions and one is then unregistered. The __ftrace_trace_function is then pointed to the remaining ftrace_ops callback function directly. This mean it will be called for all functions that were registered to trace by both ftrace_ops that were registered. This is not an issue for archs with CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST, because the update of ftrace_trace_function doesn't happen until after all functions have been updated, and then the mcount caller is updated. But for those archs that do use the ftrace_test_stop_func(), the update is immediate. The dynamic selftest fails because it hits this situation, and the ftrace_ops that it registers fails to only trace what it was suppose to and instead traces all other functions. The solution is to delay the setting of __ftrace_trace_function until after all the functions have been updated according to the registered ftrace_ops. Also, function_trace_stop is set during the update to prevent function tracing from calling code that is caused by the function tracer itself. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14	ftrace: Update filter when tracing enabled in set_ftrace_filter()	Steven Rostedt	1	-0/+4
	Currently, if set_ftrace_filter() is called when the ftrace_ops is active, the function filters will not be updated. They will only be updated when tracing is disabled and re-enabled. Update the functions immediately during set_ftrace_filter(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-14	ftrace: Balance records when updating the hash	Steven Rostedt	1	-16/+33
	Whenever the hash of the ftrace_ops is updated, the record counts must be balance. This requires disabling the records that are set in the original hash, and then enabling the records that are set in the updated hash. Moving the update into ftrace_hash_move() removes the bug where the hash was updated but the records were not, which results in ftrace triggering a warning and disabling itself because the ftrace_ops filter is updated while the ftrace_ops was registered, and then the failure happens when the ftrace_ops is unregistered. The current code will not trigger this bug, but new code will. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-08	ftrace: Do not disable interrupts for modules in mcount update	Steven Rostedt	1	-5/+11
	When I mounted an NFS directory, it caused several modules to be loaded. At the time I was running the preemptirqsoff tracer, and it showed the following output: # tracer: preemptirqsoff # # preemptirqsoff latency trace v1.1.5 on 2.6.33.9-rt30-mrg-test # -------------------------------------------------------------------- # latency: 1177 us, #4/4, CPU#3 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: modprobe-19370 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: ftrace_module_notify # => ended at: ftrace_module_notify # # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| /_--=> lock-depth # \|\|\|\|\|/ delay # cmd pid \|\|\|\|\|\| time \| caller # \ / \|\|\|\|\|\| \ \| / modprobe-19370 3d.... 0us!: ftrace_process_locs <-ftrace_module_notify modprobe-19370 3d.... 1176us : ftrace_process_locs <-ftrace_module_notify modprobe-19370 3d.... 1178us : trace_hardirqs_on <-ftrace_module_notify modprobe-19370 3d.... 1178us : <stack trace> => ftrace_process_locs => ftrace_module_notify => notifier_call_chain => __blocking_notifier_call_chain => blocking_notifier_call_chain => sys_init_module => system_call_fastpath That's over 1ms that interrupts are disabled on a Real-Time kernel! Looking at the cause (being the ftrace author helped), I found that the interrupts are disabled before the code modification of mcounts into nops. The interrupts only need to be disabled on start up around this code, not when modules are being loaded. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-08	tracing: Still trace filtered irq functions when irq trace is disabled	Steven Rostedt	2	-17/+35
	If a function is set to be traced by the set_graph_function, but the option funcgraph-irqs is zero, and the traced function happens to be called from a interrupt, it will not be traced. The point of funcgraph-irqs is to not trace interrupts when we are preempted by an irq, not to not trace functions we want to trace that happen to be in a irq. Luckily the current->trace_recursion element is perfect to add a flag to help us be able to trace functions within an interrupt even when we are not tracing interrupts that preempt the trace. Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2011-07-07	tracing, x86/irq: Do not trace arch_local_{,irq_}() functions	Steven Rostedt	1	-5/+6
	I triggered a triple fault with gcc 4.5.1 because it did not honor the inline annotation to arch_local_save_flags() function and that function was added to the pool of functions traced by the function tracer. When preempt_schedule() called arch_local_save_flags() (called by irqs_disabled()), it was traced, but the first thing the function tracer does is disable preemption. When it enables preemption, the NEED_RESCHED flag will not have been cleared and the preemption check will trigger the call to preempt_schedule() again. Although the dynamic function tracer crashed immediately, the static version of the function tracer (CONFIG_DYNAMIC_FTRACE is not set) actually was able to show where the problem was. swapper-1 3.N.. 103885us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103886us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103887us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule swapper-1 3.N.. 103888us : arch_local_save_flags <-preempt_schedule It went on for a while before it triple faulted with a corrupted stack. The arch_local_save_flags and arch_local_irq_* functions should not be traced. Even though they are marked as inline, gcc may still make them a function and enable tracing of them. The simple solution is to just mark them as notrace. I had to add the <linux/types.h> for this file to include the notrace tag. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110702033852.733414762@goodmis.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-05	perf report/annotate/script: Add option to specify a CPU range	Anton Blanchard	8	-0/+102
	Add an option to perf report/annotate/script to specify which CPUs to operate on. This enables us to take a single system wide profile and analyse each CPU (or group of CPUs) in isolation. This was useful when profiling a multiprocess workload where the bottleneck was on one CPU but this was hidden in the overall profile. Per process and per thread breakdowns didn't help because multiple processes were running on each CPU and no single process consumed an entire CPU. The patch converts the list of CPUs returned by cpu_map__new into a bitmap for fast lookup. I wanted to use -C to be consistent with perf top/record/stat, but unfortunately perf report already uses -C <comms>. v2: Incorporate suggestions from David Ahern: - Added -c to perf script - Check that SAMPLE_CPU is set when -c is used - Update documentation v3: Create perf_session__cpu_bitmap() Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: David Ahern <dsahern@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-02	x86: Don't use frame pointer to save old stack on irq entry	Frederic Weisbecker	2	-42/+15
	rbp is used in SAVE_ARGS_IRQ to save the old stack pointer in order to restore it later in ret_from_intr. It is convenient because we save its value in the irq regs and it's easily restored using the leave instruction. However this is a kind of abuse of the frame pointer which role is to help unwinding the kernel by chaining frames together, each node following the return address to the previous frame. But although we are breaking the frame by changing the stack pointer, there is no preceding return address before the new frame. Hence using the frame pointer to link the two stacks breaks the stack unwinders that find a random value instead of a return address here. There is no workaround that can work in every case. We are using the fixup_bp_irq_link() function to dereference that abused frame pointer in the case of non nesting interrupt (which means stack changed). But that doesn't fix the case of interrupts that don't change the stack (but we still have the unconditional frame link), which is the case of hardirq interrupting softirq. We have no way to detect this transition so the frame irq link is considered as a real frame pointer and the return address is dereferenced but it is still a spurious one. There are two possible results of this: either the spurious return address, a random stack value, luckily belongs to the kernel text and then the unwinding can continue and we just have a weird entry in the stack trace. Or it doesn't belong to the kernel text and unwinding stops there. This is the reason why stacktraces (including perf callchains) on irqs that interrupted softirqs don't work very well. To solve this, we don't save the old stack pointer on rbp anymore but we save it to a scratch register that we push on the new stack and that we pop back later on irq return. This preserves the whole frame chain without spurious return addresses in the middle and drops the need for the horrid fixup_bp_irq_link() workaround. And finally irqs that interrupt softirq are sanely unwinded. Before: 99.81% perf [kernel.kallsyms] [k] perf_pending_event \| --- perf_pending_event irq_work_run smp_irq_work_interrupt irq_work_interrupt \| \|--41.60%-- __read \| \| \| \|--99.90%-- create_worker \| \| bench_sched_messaging \| \| cmd_bench \| \| run_builtin \| \| main \| \| __libc_start_main \| --0.10%-- [...] After: 1.64% swapper [kernel.kallsyms] [k] perf_pending_event \| --- perf_pending_event irq_work_run smp_irq_work_interrupt irq_work_interrupt \| \|--95.00%-- arch_irq_work_raise \| irq_work_queue \| __perf_event_overflow \| perf_swevent_overflow \| perf_swevent_event \| perf_tp_event \| perf_trace_softirq \| __do_softirq \| call_softirq \| do_softirq \| irq_exit \| \| \| \|--73.68%-- smp_apic_timer_interrupt \| \| apic_timer_interrupt \| \| \| \| \| \|--96.43%-- amd_e400_idle \| \| \| cpu_idle \| \| \| start_secondary Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02	x86: Remove useless unwinder backlink from irq regs saving	Frederic Weisbecker	1	-1/+0
	The unwinder backlink in interrupt entry is very useless. It's actually not part of the stack frame chain and thus is never used. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02	x86,64: Separate arg1 from rbp handling in SAVE_REGS_IRQ	Frederic Weisbecker	1	-1/+2
	Just for clarity in the code. Have a first block that handles the frame pointer and a separate one that handles pt_regs pointer and its use. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02	x86,64: Simplify save_regs()	Frederic Weisbecker	1	-27/+17
	The save_regs function that saves the regs on low level irq entry is complicated because of the fact it changes its stack in the middle and also because it manipulates data allocated in the caller frame and accesses there are directly calculated from callee rsp value with the return address in the middle of the way. This complicates the static stack offsets calculation and require more dynamic ones. It also needs a save/restore of the function's return address. To simplify and optimize this, turn save_regs() into a macro. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jan Beulich <JBeulich@novell.com>
2011-07-02	x86: Fetch stack from regs when possible in dump_trace()	Frederic Weisbecker	1	-2/+5
	When regs are passed to dump_stack(), we fetch the frame pointer from the regs but the stack pointer is taken from the current frame. Thus the frame and stack pointers may not come from the same context. For example this can result in the unwinder to think the context is in irq, due to the current value of the stack, but the frame pointer coming from the regs points to a frame from another place. It then tries to fix up the irq link but ends up dereferencing a random frame pointer that doesn't belong to the irq stack: [ 9131.706906] ------------[ cut here ]------------ [ 9131.707003] WARNING: at arch/x86/kernel/dumpstack_64.c:129 dump_trace+0x2aa/0x330() [ 9131.707003] Hardware name: AMD690VM-FMH [ 9131.707003] Perf: bad frame pointer = 0000000000000005 in callchain [ 9131.707003] Modules linked in: [ 9131.707003] Pid: 1050, comm: perf Not tainted 3.0.0-rc3+ #181 [ 9131.707003] Call Trace: [ 9131.707003] <IRQ> [<ffffffff8104bd4a>] warn_slowpath_common+0x7a/0xb0 [ 9131.707003] [<ffffffff8104be21>] warn_slowpath_fmt+0x41/0x50 [ 9131.707003] [<ffffffff8178b873>] ? bad_to_user+0x6d/0x10be [ 9131.707003] [<ffffffff8100c2da>] dump_trace+0x2aa/0x330 [ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50 [ 9131.707003] [<ffffffff8101b164>] perf_callchain_kernel+0x54/0x70 [ 9131.707003] [<ffffffff810d391f>] perf_prepare_sample+0x19f/0x2a0 [ 9131.707003] [<ffffffff810d546c>] __perf_event_overflow+0x16c/0x290 [ 9131.707003] [<ffffffff810d5430>] ? __perf_event_overflow+0x130/0x290 [ 9131.707003] [<ffffffff810107d3>] ? native_sched_clock+0x13/0x50 [ 9131.707003] [<ffffffff8100fbb9>] ? sched_clock+0x9/0x10 [ 9131.707003] [<ffffffff810752e5>] ? T.375+0x15/0x90 [ 9131.707003] [<ffffffff81084da4>] ? trace_hardirqs_on_caller+0x64/0x180 [ 9131.707003] [<ffffffff810817bd>] ? trace_hardirqs_off+0xd/0x10 [ 9131.707003] [<ffffffff810d5764>] perf_event_overflow+0x14/0x20 [ 9131.707003] [<ffffffff810d588c>] perf_swevent_hrtimer+0x11c/0x130 [ 9131.707003] [<ffffffff817821a1>] ? error_exit+0x51/0xb0 [ 9131.707003] [<ffffffff81072e93>] __run_hrtimer+0x83/0x1e0 [ 9131.707003] [<ffffffff810d5770>] ? perf_event_overflow+0x20/0x20 [ 9131.707003] [<ffffffff81073256>] hrtimer_interrupt+0x106/0x250 [ 9131.707003] [<ffffffff812a3bfd>] ? trace_hardirqs_off_thunk+0x3a/0x3c [ 9131.707003] [<ffffffff81024833>] smp_apic_timer_interrupt+0x53/0x90 [ 9131.707003] [<ffffffff81789053>] apic_timer_interrupt+0x13/0x20 [ 9131.707003] <EOI> [<ffffffff817821a1>] ? error_exit+0x51/0xb0 [ 9131.707003] [<ffffffff8178219c>] ? error_exit+0x4c/0xb0 [ 9131.707003] ---[ end trace b2560d4876709347 ]--- Fix this by simply taking the stack pointer from regs->sp when regs are provided. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-07-02	x86: Save stack pointer in perf live regs savings	Frederic Weisbecker	1	-0/+5
	In order to prepare for fetching the stack pointer from the regs when possible in dump_trace() instead of taking the local one, save the current stack pointer in perf live regs saving. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-07-01	perf, powerpc: Fix build borkage	Peter Zijlstra	1	-1/+1
	The patch a8b0ca17b80e ("perf: Remove the nmi parameter from the swevent and overflow interface") missed a spot in the ppc hw_breakpoint code, fix this up so things compile again. Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-09pfip95g88s70iwkxu6nnbt@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf stat: Add noise output for csv mode	Zhengyu He	1	-3/+6
	Previously, when you want perf-stat to output the statistics in csv mode, no information of the noise will be printed out. For example right now we output this --repeat information: ./perf stat -r3 -x, sleep 1 1.164789,task-clock 8,context-switches 0,CPU-migrations 219,page-faults 3337800,cycles With this patch, the output will be appended with an additional entry for the noise value: ./perf stat -r3 -x, sleep 1 1.164789,task-clock,3.75% 8,context-switches,75.00% 0,CPU-migrations,100.00% 219,page-faults,0.00% 3337800,cycles,3.36% Signed-off-by: Zhengyu He <zhengyuh@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Stephane Eranian <eranian@google.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/1308861942-4945-1-git-send-email-zhengyuh@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf: export perf_event_refresh() to modules	Avi Kivity	2	-1/+7
	KVM needs one-shot samples, since a PMC programmed to -X will fire after X events and then again after 2^40 events (i.e. variable period). Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-4-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	x86, perf: Add constraints for architectural PMU	Avi Kivity	1	-5/+18
	The v1 PMU does not have any fixed counters. Using the v2 constraints, which do have fixed counters, causes an additional choice to be present in the weight calculation, but not when actually scheduling the event, leading to an event being not scheduled at all. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-3-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf: Add context field to perf_event	Avi Kivity	12	-20/+44
	The perf_event overflow handler does not receive any caller-derived argument, so many callers need to resort to looking up the perf_event in their local data structure. This is ugly and doesn't scale if a single callback services many perf_events. Fix by adding a context parameter to perf_event_create_kernel_counter() (and derived hardware breakpoints APIs) and storing it in the perf_event. The field can be accessed from the callback as event->overflow_handler_context. All callers are updated. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1309362157-6596-2-git-send-email-avi@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf, arch: Add generic NODE cache events	Peter Zijlstra	19	-2/+298
	Add a NODE level to the generic cache events which is used to measure local vs remote memory accesses. Like all other cache events, an ACCESS is HIT+MISS, if there is no way to distinguish between reads and writes do reads only etc.. The below needs filling out for !x86 (which I filled out with unsupported events). I'm fairly sure ARM can leave it like that since it doesn't strike me as an architecture that even has NUMA support. SH might have something since it does appear to have some NUMA bits. Sparc64, PowerPC and MIPS certainly want a good look there since they clearly are NUMA capable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: David Miller <davem@davemloft.net> Cc: Anton Blanchard <anton@samba.org> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf, intel: Try alternative OFFCORE encodings	Peter Zijlstra	2	-8/+41
	Since the OFFCORE registers are fully symmetric, try the other one when the specified one is already in use. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1306141897.18455.8.camel@twins Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf_events: Add Intel Sandy Bridge offcore_response low-level support	Stephane Eranian	2	-3/+11
	This patch adds Intel Sandy Bridge offcore_response support by providing the low-level constraint table for those events. On Sandy Bridge, there are two offcore_response events. Each uses its own dedictated extra register. But those registers are NOT shared between sibling CPUs when HT is on unlike Nehalem/Westmere. They are always private to each CPU. But they still need to be controlled within an event group. All events within an event group must use the same value for the extra MSR. That's not controlled by the second patch in this series. Furthermore on Sandy Bridge, the offcore_response events have NO counter constraints contrary to what the official documentation indicates, so drop the events from the contraint table. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110606145712.GA7304@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf_events: Fix validation of events using an extra reg	Stephane Eranian	2	-18/+57
	The validate_group() function needs to validate events with extra shared regs. Within an event group, only events with the same value for the extra reg can co-exist. This was not checked by validate_group() because it was missing the shared_regs logic. This patch changes the allocation of the fake cpuc used for validation to also point to a fake shared_regs structure such that group events be properly testing. It modifies __intel_shared_reg_get_constraints() to use spin_lock_irqsave() to avoid lockdep issues. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110606145708.GA7279@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf_events: Update Intel extra regs shared constraints management	Stephane Eranian	3	-152/+200
	This patch improves the code managing the extra shared registers used for offcore_response events on Intel Nehalem/Westmere. The idea is to use static allocation instead of dynamic allocation. This simplifies greatly the get and put constraint routines for those events. The patch also renames per_core to shared_regs because the same data structure gets used whether or not HT is on. When HT is off, those events still need to coordination because they use a extra MSR that has to be shared within an event group. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110606145703.GA7258@quad Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf: Remove the perf_output_begin(.sample) argument	Peter Zijlstra	4	-28/+23
	Since only samples call perf_output_sample() its much saner (and more correct) to put the sample logic in there than in the perf_output_begin()/perf_output_end() pair. Saves a useless argument, reduces conditionals and shrinks struct perf_output_handle, win! Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf: Remove the nmi parameter from the swevent and overflow interface	Peter Zijlstra	46	-141/+119
	The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf, x86: Add hw_watchdog_set_attr() in a sake of nmi-watchdog on P4	Cyrill Gorcunov	3	-1/+38
	Due to restriction and specifics of Netburst PMU we need a separated event for NMI watchdog. In particular every Netburst event consumes not just a counter and a config register, but also an additional ESCR register. Since ESCR registers are grouped upon counters (i.e. if ESCR is occupied for some event there is no room for another event to enter until its released) we need to pick up the "least" used ESCR (or the most available one) for nmi-watchdog purposes -- so MSR_P4_CRU_ESCR2/3 was chosen. With this patch nmi-watchdog and perf top should be able to run simultaneously. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> CC: Lin Ming <ming.m.lin@intel.com> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: Frederic Weisbecker <fweisbec@gmail.com> Tested-and-reviewed-by: Don Zickus <dzickus@redhat.com> Tested-and-reviewed-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110623124918.GC13050@sun Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	events: Ensure that timers are updated without requiring read() call	Eric B Munson	1	-2/+13
	The event tracing infrastructure exposes two timers which should be updated each time the value of the counter is updated. Currently, these counters are only updated when userspace calls read() on the fd associated with an event. This means that counters which are read via the mmap'd page exclusively never have their timers updated. This patch adds ensures that the timers are updated each time the values in the mmap'd page are updated. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308932786-5111-1-git-send-email-emunson@mgebm.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	events: Move lockless timer calculation into helper function	Eric B Munson	1	-7/+15
	Take the timer calculation from perf_output_read and move it to a helper function for any place that needs timer values but cannot take the ctx->lock. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308861279-15216-2-git-send-email-emunson@mgebm.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	events: Add note to update_event_times comment about holding ctx->lock	Eric B Munson	1	-0/+1
	Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1308861279-15216-1-git-send-email-emunson@mgebm.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf: Remove 64-bit alignment padding from perf_event_context	Richard Kennedy	1	-1/+1
	Reorder perf_event_context to remove 8 bytes of 64 bit alignment padding shrinking its size to 192 bytes, allowing it to fit into a smaller slab and use one fewer cache lines. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1307460819.1950.5.camel@castor.rsk Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	perf_events: Fix perf buffer watermark setting	Vince Weaver	2	-9/+13
	Since 2.6.36 (specifically commit d57e34fdd60b ("perf: Simplify the ring-buffer logic: make perf_buffer_alloc() do everything needed"), the perf_buffer_init_code() has been mis-setting the buffer watermark if perf_event_attr.wakeup_events has a non-zero value. This is because perf_event_attr.wakeup_events is a union with perf_event_attr.wakeup_watermark. This commit re-enables the check for perf_event_attr.watermark being set before continuing with setting a non-default watermark. This bug is most noticable when you are trying to use PERF_IOC_REFRESH with a value larger than one and perf_event_attr.wakeup_events is set to one. In this case the buffer watermark will be set to 1 and you will get extraneous POLL_IN overflows rather than POLL_HUP as expected. [ avoid using attr.wakeup_events when attr.watermark is set ] Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	irq_work, alpha: Fix up arch hooks	Peter Zijlstra	1	-1/+1
	Commit e360adbe29 ("irq_work: Add generic hardirq context callbacks") fouled up the Alpha bit, not properly naming the arch specific function that raises the 'self-IPI'. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: stable@kernel.org # 37+ Link: http://lkml.kernel.org/n/tip-gukh0txmql2l4thgrekzzbfy@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-07-01	irq_work, ppc: Fix up arch hooks	Peter Zijlstra	1	-1/+1
	Commit e360adbe29 ("irq_work: Add generic hardirq context callbacks") fouled up the ppc bit, not properly naming the arch specific function that raises the 'self-IPI'. Cc: Huang Ying <ying.huang@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: stable@kernel.org # 37+ Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-eg0aqien8p1aqvzu9dft6dtv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-06-30	perf tools: Only display parent field if explictly sorted	Frederic Weisbecker	1	-1/+8
	We don't need to display the parent field if the parent sorting machinery is only used for parent filtering (as in "-p foo"). However if parent filtering is used in combination with explicit parent sorting ( -s parent), we want to display it. Result with: perf report -p kernel_thread -s parent Before: # Overhead Parent symbol # ........ ............. # 0.07% \| --- ioread8 ata_sff_check_status ata_sff_tf_load ata_sff_qc_issue ata_bmdma_qc_issue ata_qc_issue ata_scsi_translate ata_scsi_queuecmd scsi_dispatch_cmd scsi_request_fn __blk_run_queue __make_request generic_make_request submit_bio submit_bh journal_submit_commit_record jbd2_journal_commit_transaction kjournald2 kthread kernel_thread_helpe After: # Overhead Parent symbol # ........ ............. # 0.07% kernel_thread_helper \| --- ioread8 ata_sff_check_status ata_sff_tf_load ata_sff_qc_issue ata_bmdma_qc_issue ata_qc_issue ata_scsi_translate ata_scsi_queuecmd scsi_dispatch_cmd scsi_request_fn __blk_run_queue __make_request generic_make_request submit_bio submit_bh journal_submit_commit_record jbd2_journal_commit_transaction kjournald2 kthread kernel_thread_helper Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30	perf tools: Allow sort dimensions to be registered more than once	Frederic Weisbecker	1	-6/+6
	So that the parent sort dimension can be registered twice: once if we add it as an explicit sort dimension (-s parent) and twice if we request a parent filter (-p foo). We'll have only one parent sort dimension in the end but this allows to override the default parent filter with we gave in "-p" option. The goal of this is to prepare to allow the use of "-s parent" and "-p foo" at the same time, ie: sort by filtered parent. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30	perf tools: Don't display ignored entries on stdio ui	Frederic Weisbecker	1	-0/+3
	As for newt ui, don't display entries that have been marked as ignored. The practical current effect of this is to make parent filtering really working. Before, entries that were ignored were given a null parent but were still displayed. This resulted in some weird effects: # Overhead Command Shared Object Symbol # ........ ........... ................. ............ # ^A \| --- __lock_acquire \| \|--95.97%-- lock_acquire \| \| \| \|--30.75%-- _raw_spin_lock Discard these from the stdio display. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30	perf tools: Remove sort print helpers declarations	Frederic Weisbecker	1	-6/+0
	These are probably some old leftovers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30	perf tools: Make sort operations static	Frederic Weisbecker	2	-120/+99
	These don't need to be globally visible. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Cc: Sam Liao <phyomh@gmail.com>
2011-06-30	perf tools: Add inverted call graph report support.	Sam Liao	5	-11/+53
	Add "caller/callee" option to support inverted butterfly report, in the inverted report (with caller option), the call graph start from the callee's ancestor. Users can use such view to catch system's performance bottleneck from a sysprof like view. Using this option with specified sort order like pid gives us high level view of call graph statistics. Also add "-G" alias for inverted call graph. Signed-off-by: Sam Liao <phyomh@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2011-06-28	Linux 3.0-rc5v3.0-rc5	Linus Torvalds	1	-1/+1

2011-06-28	drm/i915: more struct_mutex locking	Hugh Dickins	2	-2/+6
	When auditing the locking in i915_gem.c (for a prospective change which I then abandoned), I noticed two places where struct_mutex is not held across GEM object manipulations that would usually require it. Since one is in initial setup and the other in driver unload, I'm guessing the mutex is not required for either; but post a patch in case it is. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drm/i915: use shmem_truncate_range	Hugh Dickins	1	-5/+2
	The interface to ->truncate_range is changing very slightly: once "tmpfs: take control of its truncate_range" has been applied, this can be applied. For now there is only a slight inefficiency while this remains unapplied, but it will soon become essential for managing shmem's use of swap. Change i915_gem_object_truncate() to use shmem_truncate_range() directly: which should also spare i915 later change if we switch from inode_operations->truncate_range to file_operations->fallocate. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drm/i915: use shmem_read_mapping_page	Hugh Dickins	2	-17/+15
	Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once "tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can be applied to ease the transition. Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in the one place it's needed; elsewhere use shmem_read_mapping_page(), with the mapping's gfp_mask properly initialized. Forget about __GFP_COLD: since tmpfs initializes its pages with memset, asking for a cold page is counter-productive. Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now declared there too, we shall remove the prototype from linux/mm.h later. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drm/ttm: use shmem_read_mapping_page	Hugh Dickins	1	-2/+3
	Soon tmpfs will stop supporting ->readpage and read_mapping_page(): once "tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can be applied to ease the transition. ttm_tt_swapin() and ttm_tt_swapout() use shmem_read_mapping_page() in place of read_mapping_page(), since their swap_space has been created with shmem_file_setup(). Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drivers/tty/serial/8250_pci.c: fix warning	Andrew Morton	1	-1/+1
	Fis the warning drivers/tty/serial/8250_pci.c:1457: warning: initialization from incompatible pointer type Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drivers/misc/ioc4.c: fix section mismatch / race condition	Ralf Baechle	1	-1/+1
	Fix this section mismatch: WARNING: drivers/misc/ioc4.o(.data+0x144): Section mismatch in reference from the variable ioc4_load_modules_work to the function .devinit.text:ioc4_load_modules() The variable ioc4_load_modules_work references the function __devinit ioc4_load_modules() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: driver, _template, _timer, _sht, _ops, _probe, _probe_one, _console This one is potentially fatal; by the time ioc4_load_modules is invoked it may already have been freed. For that reason ioc4_load_modules_work can't be turned to __devinitdata but also because it's referenced in ioc4_exit. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Brent Casavant <bcasavan@sgi.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drivers/leds/leds-lp5523.c: fix section mismatches	Ralf Baechle	1	-2/+2
	Fix this section mismatch: WARNING: drivers/leds/leds-lp5523.o(.text+0x12f4): Section mismatch in reference from the function lp5523_probe() to the function .init.text:lp5523_init_led() The function lp5523_probe() references the function __init lp5523_init_led(). This is often because lp5523_probe lacks a __init annotation or the annotation of lp5523_init_led is wrong. Fixing this one triggers one more mismatch, fix that one as well. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-06-28	drivers/leds/leds-lp5521.c: fix section mismatches	Ralf Baechle	1	-2/+2
	Fix this section mismatch: WARNING: drivers/leds/leds-lp5521.o(.text+0xf2c): Section mismatch in reference from the function lp5521_probe() to the function .init.text:lp5521_init_led() The function lp5521_probe() references the function __init lp5521_init_led(). This is often because lp5521_probe lacks a __init annotation or the annotation of lp5521_init_led is wrong. Fixing this mismatch triggers one more mismatch, fix that one as well. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>