summaryrefslogtreecommitdiffstats
path: root/include/uapi (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2015-05-311-0/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) default CONFIG_NETFILTER_INGRESS to y for easier compile-testing of all options. 2) Allow to bind a table to net_device. This introduces the internal NFT_AF_NEEDS_DEV flag to perform a mandatory check for this binding. This is required by the next patch. 3) Add the 'netdev' table family, this new table allows you to create ingress filter basechains. This provides access to the existing nf_tables features from ingress. 4) Kill unused argument from compat_find_calc_{match,target} in ip_tables and ip6_tables, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: nf_tables: allow to bind table to net_devicePablo Neira Ayuso2015-05-261-0/+2
| | | | | | | | | | | | | | | | | | | | This patch adds the internal NFT_AF_NEEDS_DEV flag to indicate that you must attach this table to a net_device. This change is required by the follow up patch that introduces the new netdev table. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | Merge branch 'master' of ↵David S. Miller2015-05-311-0/+25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-05-28 This series contains updates to ethtool, ixgbe, i40e and i40evf. John adds helper routines for ethtool to pass VF to rx_flow_spec. Since the ring_cookie is 64 bits wide which is much larger than what could be used for actual queue index values, provide helper routines to pack a VF index into the cookie. Then John provides a ixgbe patch to allow flow director to use the entire queue space. Neerav provides a i40e patch to collect XOFF Rx stats, where it was not being collected before. Anjali provides ATR support for tunneled packets, as well as stats to count tunnel ATR hits. Cleaned up PF struct members which are unnecessary, since we can use the stat index macro directly. Cleaned up flow director ATR/SB messages to a higher debug level since they are not useful unless silicon validation is happening. Greg provides a patch to disable offline diagnostics if VFs are enabled since ethtool offline diagnostic tests are not designed (out of scope) to disable VF functions for testing and re-enable afterward. Also cleans up TODO comment that is no longer needed. Vasu provides a fix an FCoE EOF case where i40e_fcoe_ctxt_eof() maybe called before i40e_fcoe_eof_is_supported() is called. Jesse adds skb->xmit_more support for i40evf. Then provides a performance enhancement for i40evf by inlining some functions which provides a 15% gain in small packet performance. Also cleans up the use of time_stamp since it is no longer used to determine if there is a tx_hang and was a part of a previous tx_hang design which is no longer used. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ethtool: Add helper routines to pass vf to rx_flow_specJohn Fastabend2015-05-281-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ring_cookie is 64 bits wide which is much larger than can be used for actual queue index values. So provide some helper routines to pack a VF index into the cookie. This is useful to steer packets to a VF ring without having to know the queue layout of the device. CC: Alex Duyck <alexander.h.duyck@redhat.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | | bpf: allow BPF programs access skb->skb_iif and skb->dev->ifindex fieldsAlexei Starovoitov2015-05-311-0/+2
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | classic BPF already exposes skb->dev->ifindex via SKF_AD_IFINDEX extension. Allow eBPF program to access it as well. Note that classic aborts execution of the program if 'skb->dev == NULL' (which is inconvenient for program writers), whereas eBPF returns zero in such case. Also expose the 'skb_iif' field, since programs triggered by redirected packet need to known the original interface index. Summary: __skb->ifindex -> skb->dev->ifindex __skb->ingress_ifindex -> skb->skb_iif Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Create percpu rt6_infoMartin KaFai Lau2015-05-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | After the patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception', we need to compensate the performance hit (bouncing dst->__refcnt). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-05-232-1/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2015-05-161-0/+3
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== The following patchset contains Netfilter fixes for your net tree, they are: 1) Fix a leak in IPVS, the sysctl table is not released accordingly when destroying a netns, patch from Tommi Rantala. 2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is compiled as module, from Florian Westphal. 3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch from Jesper Dangaard Brouer. 4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores a map, from Mirek Kratochvil. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | conntrack: RFC5961 challenge ACK confuse conntrack LAST-ACK transitionJesper Dangaard Brouer2015-05-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In compliance with RFC5961, the network stack send challenge ACK in response to spurious SYN packets, since commit 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets"). This pose a problem for netfilter conntrack in state LAST_ACK, because this challenge ACK is (falsely) seen as ACKing last FIN, causing a false state transition (into TIME_WAIT). The challenge ACK is hard to distinguish from real last ACK. Thus, solution introduce a flag that tracks the potential for seeing a challenge ACK, in case a SYN packet is let through and current state is LAST_ACK. When conntrack transition LAST_ACK to TIME_WAIT happens, this flag is used for determining if we are expecting a challenge ACK. Scapy based reproducer script avail here: https://github.com/netoptimizer/network-testing/blob/master/scapy/tcp_hacks_3WHS_LAST_ACK.py Fixes: 0c228e833c88 ("tcp: Restore RFC5961-compliant behavior for SYN packets") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | rename RTNH_F_EXTERNAL to RTNH_F_OFFLOADRoopa Prabhu2015-05-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RTNH_F_EXTERNAL today is printed as "offload" in iproute2 output. This patch renames the flag to be consistent with what the user sees. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: pkt_cls: remove unused macros from uapiFlorian Westphal2015-05-221-27/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jamal points out that this header also contains kernel internal magic that cannot be used from userspace for anything meaningful. Lets remove what the kernel doesn't use anymore and wrap remainder with __KERNEL__. Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: add tcpi_segs_in and tcpi_segs_out to tcp_infoMarcelo Ricardo Leitner2015-05-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tracks the total number of inbound and outbound segments on a TCP socket. One may use this number to have an idea on connection quality when compared against the retransmissions. RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut These are a 32bit field each and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->segs_out was placed near tp->snd_nxt for good data locality and minimal performance impact, while tp->segs_in was placed near tp->bytes_received for the same reason. Join work with Eric Dumazet. Note that received SYN are accounted on the listener, but sent SYNACK are not accounted. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | bpf: allow bpf programs to tail-call other bpf programsAlexei Starovoitov2015-05-211-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program. bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal. New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables. The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32. Use cases: Acked-by: Daniel Borkmann <daniel@iogearbox.net> ========== - simplify complex programs by splitting them into a sequence of small programs - dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event; For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp; - for TC use case, bpf_tail_call() allows to implement reclassify-like logic - bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks. In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue. - bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid. - jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net-next: ethtool: Added port speed macros.Parav Pandit2015-05-191-1/+5
| |_|/ |/| | | | | | | | | | | Signed-off-by: Parav Pandit <parav.pandit@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: add netfilter ingress hook after handle_ing() under unique static keyPablo Neira2015-05-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the Netfilter ingress hook just after the existing tc ingress hook, that seems to be the consensus solution for this. Note that the Netfilter hook resides under the global static key that enables ingress filtering. Nonetheless, Netfilter still also has its own static key for minimal impact on the existing handle_ing(). * Without this patch: Result: OK: 6216490(c6216338+d152) usec, 100000000 (60byte,0frags) 16086246pps 7721Mb/sec (7721398080bps) errors: 100000000 42.46% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 25.92% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.81% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.62% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.70% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.34% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.44% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch: Result: OK: 6214833(c6214731+d101) usec, 100000000 (60byte,0frags) 16090536pps 7723Mb/sec (7723457280bps) errors: 100000000 41.23% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 26.57% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 7.72% kpktgend_0 [pktgen] [k] pktgen_thread_worker 5.55% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.78% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 2.06% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 1.43% kpktgend_0 [kernel.kallsyms] [k] __build_skb * Without this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9269001(c9268821+d179) usec, 100000000 (60byte,0frags) 10788648pps 5178Mb/sec (5178551040bps) errors: 100000000 40.99% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.50% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.77% kpktgend_0 [cls_u32] [k] u32_classify 5.62% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.18% kpktgend_0 [pktgen] [k] pktgen_thread_worker 3.23% kpktgend_0 [kernel.kallsyms] [k] tc_classify 2.97% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 1.83% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.50% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk 0.99% kpktgend_0 [kernel.kallsyms] [k] __build_skb * With this patch + tc ingress: tc filter add dev eth4 parent ffff: protocol ip prio 1 \ u32 match ip dst 4.3.2.1/32 Result: OK: 9308218(c9308091+d126) usec, 100000000 (60byte,0frags) 10743194pps 5156Mb/sec (5156733120bps) errors: 100000000 42.01% kpktgend_0 [kernel.kallsyms] [k] __netif_receive_skb_core 17.78% kpktgend_0 [kernel.kallsyms] [k] kfree_skb 11.70% kpktgend_0 [cls_u32] [k] u32_classify 5.46% kpktgend_0 [kernel.kallsyms] [k] tc_classify_compat 5.16% kpktgend_0 [pktgen] [k] pktgen_thread_worker 2.98% kpktgend_0 [kernel.kallsyms] [k] ip_rcv 2.84% kpktgend_0 [kernel.kallsyms] [k] tc_classify 1.96% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_internal 1.57% kpktgend_0 [kernel.kallsyms] [k] netif_receive_skb_sk Note that the results are very similar before and after. I can see gcc gets the code under the ingress static key out of the hot path. Then, on that cold branch, it generates the code to accomodate the netfilter ingress static key. My explanation for this is that this reduces the pressure on the instruction cache for non-users as the new code is out of the hot path, and it comes with minimal impact for tc ingress users. Using gcc version 4.8.4 on: Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 8 [...] L1d cache: 16K L1i cache: 64K L2 cache: 2048K L3 cache: 8192K Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | geneve: add initial netdev driver for GENEVE tunnelsJohn W. Linville2015-05-131-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an initial implementation of a netdev driver for GENEVE tunnels. This implementation uses a fixed UDP port, and only supports point-to-point links with specific partner endpoints. Only IPv4 links are supported at this time. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | packet: rollover statisticsWillem de Bruijn2015-05-131-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rollover indicates exceptional conditions. Export a counter to inform socket owners of this state. If no socket with sufficient room is found, rollover fails. Also count these events. Finally, also count when flows are rolled over early thanks to huge flow detection, to validate its correctness. Tested: Read counters in bench_rollover on all other tests in the patchset Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tc: introduce Flower classifierJiri Pirko2015-05-131-0/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a flow-based filter. So far, the very essential packet fields are supported. This patch is only the first step. There is a lot of potential performance improvements possible to implement. Also a lot of features are missing now. They will be addressed in follow-up patches. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: use counter to break reclassify loopsFlorian Westphal2015-05-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seems all we want here is to avoid endless 'goto reclassify' loop. tc_classify_compat even resets this counter when something other than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't break hypothetical loops induced by something other than perpetual TC_ACT_RECLASSIFY return values. skb_act_clone is now identical to skb_clone, so just use that. Tested with following (bogus) filter: tc filter add dev eth0 parent ffff: \ protocol ip u32 match u32 0 0 police rate 10Kbit burst \ 64000 mtu 1500 action reclassify Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-05-132-0/+11
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Four minor merge conflicts: 1) qca_spi.c renamed the local variable used for the SPI device from spi_device to spi, meanwhile the spi_set_drvdata() call got moved further up in the probe function. 2) Two changes were both adding new members to codel params structure, and thus we had overlapping changes to the initializer function. 3) 'net' was making a fix to sk_release_kernel() which is completely removed in 'net-next'. 4) In net_namespace.c, the rtnl_net_fill() call for GET operations had the command value fixed, meanwhile 'net-next' adjusted the argument signature a bit. This also matches example merge resolutions posted by Stephen Rothwell over the past two days. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-05-133-0/+17
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Handle max TX power properly wrt VIFs and the MAC in iwlwifi, from Avri Altman. 2) Use the correct FW API for scan completions in iwlwifi, from Avraham Stern. 3) FW monitor in iwlwifi accidently uses unmapped memory, fix from Liad Kaufman. 4) rhashtable conversion of mac80211 station table was buggy, the virtual interface was not taken into account. Fix from Johannes Berg. 5) Fix deadlock in rtlwifi by not using a zero timeout for usb_control_msg(), from Larry Finger. 6) Update reordering state before calculating loss detection, from Yuchung Cheng. 7) Fix off by one in bluetooth firmward parsing, from Dan Carpenter. 8) Fix extended frame handling in xiling_can driver, from Jeppe Ledet-Pedersen. 9) Fix CODEL packet scheduler behavior in the presence of TSO packets, from Eric Dumazet. 10) Fix NAPI budget testing in fm10k driver, from Alexander Duyck. 11) macvlan needs to propagate promisc settings down the the lower device, from Vlad Yasevich. 12) igb driver can oops when changing number of rings, from Toshiaki Makita. 13) Source specific default routes not handled properly in ipv6, from Markus Stenberg. 14) Use after free in tc_ctl_tfilter(), from WANG Cong. 15) Use softirq spinlocking in netxen driver, from Tony Camuso. 16) Two ARM bpf JIT fixes from Nicolas Schichan. 17) Handle MSG_DONTWAIT properly in ring based AF_PACKET sends, from Mathias Kretschmer. 18) Fix x86 bpf JIT implementation of FROM_{BE16,LE16,LE32}, from Alexei Starovoitov. 19) ll_temac driver DMA maps TX packet header with incorrect length, fix from Michal Simek. 20) We removed pm_qos bits from netdevice.h, but some indirect references remained. Kill them. From David Ahern. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (90 commits) net: Remove remaining remnants of pm_qos from netdevice.h e1000e: Add pm_qos header net: phy: micrel: Fix regression in kszphy_probe net: ll_temac: Fix DMA map size bug x86: bpf_jit: fix FROM_BE16 and FROM_LE16/32 instructions netns: return RTM_NEWNSID instead of RTM_GETNSID on a get Update be2net maintainers' email addresses net_sched: gred: use correct backlog value in WRED mode pppoe: drop pppoe device in pppoe_unbind_sock_work net: qca_spi: Fix possible race during probe net: mdio-gpio: Allow for unspecified bus id af_packet / TX_RING not fully non-blocking (w/ MSG_DONTWAIT). bnx2x: limit fw delay in kdump to 5s after boot ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits. ARM: net fix emit_udiv() for BPF_ALU | BPF_DIV | BPF_K intruction. mpls: Change reserved label names to be consistent with netbsd usbnet: avoid integer overflow in start_xmit netxen_nic: use spin_[un]lock_bh around tx_clean_lock (2) net: xgene_enet: Set hardware dependency net: amd-xgbe: Add hardware dependency ...
| | * | mpls: Change reserved label names to be consistent with netbsdTom Herbert2015-05-101-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since these are now visible to userspace it is nice to be consistent with BSD (sys/netmpls/mpls.h in netBSD). Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | mpls: Move reserved label definitionsTom Herbert2015-05-061-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move to include/uapi/linux/mpls.h to be externally visibile. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Merge branch 'for-upstream' of ↵David S. Miller2015-05-042-0/+7
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-05-04 Here's the first bluetooth-next pull request for 4.2: - Various fixes for at86rf230 driver - ieee802154: trace events support for rdev->ops - HCI UART driver refactoring - New Realtek IDs added to btusb driver - Off-by-one fix for rtl8723b in btusb driver - Refactoring of btbcm driver for both UART & USB use Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the ↵Tatyana Nikolova2015-05-051-0/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | connecting peer to its clients Add functionality to enable the port mapper on the passive side to provide to its clients the actual (non-mapped) ip/tcp address information of the connecting peer 1) Adding remote_info_cb() to process the address info of the connecting peer The address info is provided by the user space port mapper service when the connection is initiated by the peer 2) Adding a hash list to store the remote address info 3) Adding functionality to add/remove the remote address info After the info has been provided to the port mapper client, it is removed from the hash list Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | | | net_sched: gred: add TCA_GRED_LIMIT attributeDavid Ward2015-05-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a GRED qdisc, if the default "virtual queue" (VQ) does not have drop parameters configured, then packets for the default VQ are not subjected to RED and are only dropped if the queue is larger than the net_device's tx_queue_len. This behavior is useful for WRED mode, since these packets will still influence the calculated average queue length and (therefore) the drop probability for all of the other VQs. However, for some drivers tx_queue_len is zero. In other cases the user may wish to make the limit the same for all VQs (including the default VQ with no drop parameters). This change adds a TCA_GRED_LIMIT attribute to set the GRED queue limit, in bytes, during qdisc setup. (This limit is in bytes to be consistent with the drop parameters.) The default limit is the same as for a bfifo queue (tx_queue_len * psched_mtu). If the drop parameters of any VQ are configured with a smaller limit than the GRED queue limit, that VQ will still observe the smaller limit instead. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | bonding: add netlink support for sys prio, actor sys mac, and port keyAndy Gospodarek2015-05-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds netlink support for the following bonding options: * BOND_OPT_AD_ACTOR_SYS_PRIO * BOND_OPT_AD_ACTOR_SYSTEM * BOND_OPT_AD_USER_PORT_KEY When setting the actor system mac address we assume the netlink message contains a binary mac and not a string representation of a mac. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> [jt: completed the setting side of the netlink attributes] Signed-off-by: Jonathan Toppins <jtoppins@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | codel: add ce_threshold attributeEric Dumazet2015-05-111-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For DCTCP or similar ECN based deployments on fabrics with shallow buffers, hosts are responsible for a good part of the buffering. This patch adds an optional ce_threshold to codel & fq_codel qdiscs, so that DCTCP can have feedback from queuing in the host. A DCTCP enabled egress port simply have a queue occupancy threshold above which ECT packets get CE mark. In codel language this translates to a sojourn time, so that one doesn't have to worry about bytes or bandwidth but delays. This makes the host an active participant in the health of the whole network. This also helps experimenting DCTCP in a setup without DCTCP compliant fabric. On following example, ce_threshold is set to 1ms, and we can see from 'ldelay xxx us' that TCP is not trying to go around the 5ms codel target. Queue has more capacity to absorb inelastic bursts (say from UDP traffic), as queues are maintained to an optimal level. lpaa23:~# ./tc -s -d qd sh dev eth1 qdisc mq 1: dev eth1 root Sent 87910654696 bytes 58065331 pkt (dropped 0, overlimits 0 requeues 42961) backlog 3108242b 364p requeues 42961 qdisc codel 8063: dev eth1 parent 1:1 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7363778701 bytes 4863809 pkt (dropped 0, overlimits 0 requeues 5503) rate 2348Mbit 193919pps backlog 255866b 46p requeues 5503 count 0 lastcount 0 ldelay 1.0ms drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 72384 qdisc codel 8064: dev eth1 parent 1:2 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 7636486190 bytes 5043942 pkt (dropped 0, overlimits 0 requeues 5186) rate 2319Mbit 191538pps backlog 207418b 64p requeues 5186 count 0 lastcount 0 ldelay 694us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 69873 qdisc codel 8065: dev eth1 parent 1:3 limit 1000p target 5.0ms ce_threshold 1.0ms interval 100.0ms Sent 11569360142 bytes 7641602 pkt (dropped 0, overlimits 0 requeues 5554) rate 3041Mbit 251096pps backlog 210446b 59p requeues 5554 count 0 lastcount 0 ldelay 889us drop_next 0us maxpacket 68130 ecn_mark 0 drop_overlimit 0 ce_mark 37780 ... Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Glenn Judd <glenn.judd@morganstanley.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | netlink: allow to listen "all" netnsNicolas Dichtel2015-05-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | More accurately, listen all netns that have a nsid assigned into the netns where the netlink socket is opened. For this purpose, a netlink socket option is added: NETLINK_LISTEN_ALL_NSID. When this option is set on a netlink socket, this socket will receive netlink notifications from all netns that have a nsid assigned into the netns where the socket has been opened. The nsid is sent to userland via an anscillary data. With this patch, a daemon needs only one socket to listen many netns. This is useful when the number of netns is high. Because 0 is a valid value for a nsid, the field nsid_is_set indicates if the field nsid is valid or not. skb->cb is initialized to 0 on skb allocation, thus we are sure that we will never send a nsid 0 by error to the userland. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge tag 'linux-can-next-for-4.2-20150506' of ↵David S. Miller2015-05-101-0/+6
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2015-05-06 this is a pull request of a seven patches for net-next/master. Andreas Gröger contributes two patches for the janz-ican3 driver. In the first patch, the documentation for already existing sysfs entries is added, the second patch adds support for another module/firmware variant. A patch by Shawn Landden makes the padding in the struct can_frame explicit. The next 4 patches target the flexcan driver, the first one is by David Jander adding some documentation, the reaming three by me add more documentation and two small code cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | can.h: make padding given by gcc explicitShawn Landden2015-05-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current definition of struct can_frame has a 16-byte size, with 8-byte alignment, but the 3 bytes of padding are not explicit like the similar 2 bytes of padding of struct canfd_frame. Make it explicit so it is easier to read. Signed-off-by: Shawn Landden <shawn@churchofgit.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
* | | | | Merge tag 'mac80211-next-for-davem-2015-05-06' of ↵David S. Miller2015-05-091-12/+16
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Lots of updates for net-next for this cycle. As usual, we have a lot of small fixes and cleanups, the bigger items are: * proper mac80211 rate control locking, to fix some random crashes (this required changing other locking as well) * mac80211 "fast-xmit", a mechanism to reduce, in most cases, the amount of code we execute while going from ndo_start_xmit() to the driver * this also clears the way for properly supporting S/G and checksum and segmentation offloads ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STAArik Nemtsov2015-05-061-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GO_CONCURRENT regulatory definition can be extended to station interfaces requesting to IR as part of TDLS off-channel operations. Rename the GO_CONCURRENT flag to IR_CONCURRENT and allow the added use-case. Change internal users of GO_CONCURRENT to use the new definition. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | | | tcp: add TCPWinProbe and TCPKeepAlive SNMP countersEric Dumazet2015-05-091-0/+2
| |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Diagnosing problems related to Window Probes has been hard because we lack a counter. TCPWinProbe counts the number of ACK packets a sender has to send at regular intervals to make sure a reverse ACK packet opening back a window had not been lost. TCPKeepAlive counts the number of ACK packets sent to keep TCP flows alive (SO_KEEPALIVE) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: provide SYN headers for passive connectionsEric Dumazet2015-05-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows a server application to get the TCP SYN headers for its passive connections. This is useful if the server is doing fingerprinting of clients based on SYN packet contents. Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN. The first is used on a socket to enable saving the SYN headers for child connections. This can be set before or after the listen() call. The latter is used to retrieve the SYN headers for passive connections, if the parent listener has enabled TCP_SAVE_SYN. TCP_SAVED_SYN is read once, it frees the saved SYN headers. The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP headers. Original patch was written by Tom Herbert, I changed it to not hold a full skb (and associated dst and conntracking reference). We have used such patch for about 3 years at Google. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tc: remove unused redirect ttlJamal Hadi Salim2015-05-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | improves ingress+u32 performance from 22.4 Mpps to 22.9 Mpps Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: sched: remove TC_MUNGED bitsFlorian Westphal2015-05-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not used. pedit sets TC_MUNGED when packet content was altered, but all the core does is unset MUNGED again and then set OK2MUNGE. And the latter isn't tested anywhere. So lets remove both TC_MUNGED and TC_OK2MUNGE. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-05-031-1/+1
|\ \ \ \ \ | | |/ / / | |/| / / | |_|/ / |/| | | | | | | Merge net into net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | virtio: fix typo in vring_need_event() doc commentStefan Hajnoczi2015-05-021-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | Here the "other side" refers to the guest or host. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | tcp: add TCP_CC_INFO socket optionEric Dumazet2015-04-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some Congestion Control modules can provide per flow information, but current way to get this information is to use netlink. Like TCP_INFO, let's add TCP_CC_INFO so that applications can issue a getsockopt() if they have a socket file descriptor, instead of playing complex netlink games. Sample usage would be : union tcp_cc_info info; socklen_t len = sizeof(info); if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: prepare CC get_info() access from getsockopt()Eric Dumazet2015-04-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We would like that optional info provided by Congestion Control modules using netlink can also be read using getsockopt() This patch changes get_info() to put this information in a buffer, instead of skb, like tcp_get_info(), so that following patch can reuse this common infrastructure. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: add tcpi_bytes_received to tcp_infoEric Dumazet2015-04-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tracks total number of payload bytes received on a TCP socket. This is the sum of all changes done to tp->rcv_nxt RFC4898 named this : tcpEStatsAppHCThruOctetsReceived This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_received was placed near tp->rcv_nxt for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: add tcpi_bytes_acked to tcp_infoEric Dumazet2015-04-291-0/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tracks total number of bytes acked for a TCP socket. This is the sum of all changes done to tp->snd_una, and allows for precise tracking of delivered data. RFC4898 named this : tcpEStatsAppHCThruOctetsAcked This is a 64bit field, and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->bytes_acked was placed near tp->snd_una for best data locality and minimal performance impact. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Eric Salo <salo@google.com> Cc: Martin Lau <kafai@fb.com> Cc: Chris Rapier <rapier@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-04-271-1/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Another set of mainly bugfixes and a couple of cleanups. No new functionality in this round. Highlights include: Stable patches: - Fix a regression in /proc/self/mountstats - Fix the pNFS flexfiles O_DIRECT support - Fix high load average due to callback thread sleeping Bugfixes: - Various patches to fix the pNFS layoutcommit support - Do not cache pNFS deviceids unless server notifications are enabled - Fix a SUNRPC transport reconnection regression - make debugfs file creation failure non-fatal in SUNRPC - Another fix for circular directory warnings on NFSv4 "junctioned" mountpoints - Fix locking around NFSv4.2 fallocate() support - Truncating NFSv4 file opens should also sync O_DIRECT writes - Prevent infinite loop in rpcrdma_ep_create() Features: - Various improvements to the RDMA transport code's handling of memory registration - Various code cleanups" * tag 'nfs-for-4.1-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (55 commits) fs/nfs: fix new compiler warning about boolean in switch nfs: Remove unneeded casts in nfs NFS: Don't attempt to decode missing directory entries Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" NFS: Rename idmap.c to nfs4idmap.c NFS: Move nfs_idmap.h into fs/nfs/ NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h NFS: Add a stub for GETDEVICELIST nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes nfs: fix DIO good bytes calculation nfs: Fetch MOUNTED_ON_FILEID when updating an inode sunrpc: make debugfs file creation failure non-fatal nfs: fix high load average due to callback thread sleeping NFS: Reduce time spent holding the i_mutex during fallocate() NFS: Don't zap caches on fallocate() xprtrdma: Make rpcrdma_{un}map_one() into inline functions xprtrdma: Handle non-SEND completions via a callout xprtrdma: Add "open" memreg op xprtrdma: Add "destroy MRs" memreg op xprtrdma: Add "reset MRs" memreg op ...
| * \ Merge tag 'nfs-rdma-for-4.1-1' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust2015-04-232-4/+16
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS: NFSoRDMA Client Changes This patch series creates an operation vector for each of the different memory registration modes. This should make it easier to one day increase credit limit, rsize, and wsize. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | NFS: Move nfs_idmap.h into fs/nfs/Anna Schumaker2015-04-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file is only used internally to the NFS v4 module, so it doesn't need to be in the global include path. I also renamed it from nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include file. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-04-261-0/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second batch of KVM changes from Paolo Bonzini: "This mostly includes the PPC changes for 4.1, which this time cover Book3S HV only (debugging aids, minor performance improvements and some cleanups). But there are also bug fixes and small cleanups for ARM, x86 and s390. The task_migration_notifier revert and real fix is still pending review, but I'll send it as soon as possible after -rc1" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (29 commits) KVM: arm/arm64: check IRQ number on userland injection KVM: arm: irqfd: fix value returned by kvm_irq_map_gsi KVM: VMX: Preserve host CR4.MCE value while in guest mode. KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C KVM: PPC: Book3S HV: Streamline guest entry and exit KVM: PPC: Book3S HV: Use bitmap of active threads rather than count KVM: PPC: Book3S HV: Use decrementer to wake napping threads KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu KVM: PPC: Book3S HV: Minor cleanups KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update KVM: PPC: Book3S HV: Accumulate timing information for real-mode code KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT KVM: PPC: Book3S HV: Add ICP real mode counters KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock KVM: PPC: Book3S HV: Add guest->host real mode completion counters KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte ...
| * | | | KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.Michael Ellerman2015-04-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some PowerNV systems include a hardware random-number generator. This HWRNG is present on POWER7+ and POWER8 chips and is capable of generating one 64-bit random number every microsecond. The random numbers are produced by sampling a set of 64 unstable high-frequency oscillators and are almost completely entropic. PAPR defines an H_RANDOM hypercall which guests can use to obtain one 64-bit random sample from the HWRNG. This adds a real-mode implementation of the H_RANDOM hypercall. This hypercall was implemented in real mode because the latency of reading the HWRNG is generally small compared to the latency of a guest exit and entry for all the threads in the same virtual core. Userspace can detect the presence of the HWRNG and the H_RANDOM implementation by querying the KVM_CAP_PPC_HWRNG capability. The H_RANDOM hypercall implementation will only be invoked when the guest does an H_RANDOM hypercall if userspace first enables the in-kernel H_RANDOM implementation using the KVM_CAP_PPC_ENABLE_HCALL capability. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | | | | Merge tag 'sound-fix-4.1-rc1' of ↵Linus Torvalds2015-04-241-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Here are a few fixes that have been pending since the previous pull request: a regression fix for HD-audio multiple SPDIF / HDMI devices, several ALC256 codec fixes, a couple of i915 HDMI audio fixes, and various small fixes. Nothing exciting, just boring, but things good to have" * tag 'sound-fix-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - fix headset mic detection problem for one more machine ALSA: hda/realtek - Fix Headphone Mic doesn't recording for ALC256 ALSA: hda - fix "num_steps = 0" error on ALC256 ALSA: usb-audio: Fix audio output on Roland SC-D70 sound module ALSA: hda - add AZX_DCAPS_I915_POWERWELL to Baytrail ALSA: hda - only sync BCLK to the display clock for Haswell & Broadwell ALSA: hda - Mute headphone pin on suspend on XPS13 9333 sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) ALSA: asound.h - use SNDRV_CTL_ELEM_ID_NAME_MAXLEN ALSA: hda - potential (but unlikely) uninitialized variable ALSA: hda - Fix regression for slave SPDIF setups ALSA: intel8x0: Check pci_iomap() success for DEVICE_ALI ALSA: hda - simplify azx_has_pm_runtime
| * | | | | ALSA: asound.h - use SNDRV_CTL_ELEM_ID_NAME_MAXLENVinod Koul2015-04-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | we have defined SNDRV_CTL_ELEM_ID_NAME_MAXLEN as size of name array so use this define instead of numeric value Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>