summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* bpf: allow option for setting bpf_l4_csum_replace from scratchDaniel Borkmann2017-01-241-0/+1
| | | | | | | | | | | | | When programs need to calculate the csum from scratch for small UDP packets and use bpf_l4_csum_replace() to feed the result from helpers like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0 that would ignore the case of current csum being 0, and which would still allow for the helper to set the csum and transform when needed to CSUM_MANGLED_0. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: Introduce sample tc actionYotam Gigi2017-01-243-0/+77
| | | | | | | | | | | | | | | | | | | | | | | | | This action allows the user to sample traffic matched by tc classifier. The sampling consists of choosing packets randomly and sampling them using the psample module. The user can configure the psample group number, the sampling rate and the packet's truncation (to save kernel-user traffic). Example: To sample ingress traffic from interface eth1, one may use the commands: tc qdisc add dev eth1 handle ffff: ingress tc filter add dev eth1 parent ffff: \ matchall action sample rate 12 group 4 Where the first command adds an ingress qdisc and the second starts sampling randomly with an average of one sampled packet per 12 packets on dev eth1 to psample group 4. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Introduce psample, a new genetlink channel for packet samplingYotam Gigi2017-01-243-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a general way for kernel modules to sample packets, without being tied to any specific subsystem. This netlink channel can be used by tc, iptables, etc. and allow to standardize packet sampling in the kernel. For every sampled packet, the psample module adds the following metadata fields: PSAMPLE_ATTR_IIFINDEX - the packets input ifindex, if applicable PSAMPLE_ATTR_OIFINDEX - the packet output ifindex, if applicable PSAMPLE_ATTR_ORIGSIZE - the packet's original size, in case it has been truncated during sampling PSAMPLE_ATTR_SAMPLE_GROUP - the packet's sample group, which is set by the user who initiated the sampling. This field allows the user to differentiate between several samplers working simultaneously and filter packets relevant to him PSAMPLE_ATTR_GROUP_SEQ - sequence counter of last sent packet. The sequence is kept for each group PSAMPLE_ATTR_SAMPLE_RATE - the sampling rate used for sampling the packets PSAMPLE_ATTR_DATA - the actual packet bits The sampled packets are sent to the PSAMPLE_NL_MCGRP_SAMPLE multicast group. In addition, add the GET_GROUPS netlink command which allows the user to see the current sample groups, their refcount and sequence number. This command currently supports only netlink dump mode. Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: Fix typo for MDIO module boilerplate commentFlorian Fainelli2017-01-241-1/+1
| | | | | | | | | | The module boilerplate macro is named mdio_module_driver and not module_mdio_driver, fix that. Fixes: a9049e0c513c ("mdio: Add support for mdio drivers.") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: multicast to unicastFelix Fietkau2017-01-242-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Implements an optional, per bridge port flag and feature to deliver multicast packets to any host on the according port via unicast individually. This is done by copying the packet per host and changing the multicast destination MAC to a unicast one accordingly. multicast-to-unicast works on top of the multicast snooping feature of the bridge. Which means unicast copies are only delivered to hosts which are interested in it and signalized this via IGMP/MLD reports previously. This feature is intended for interface types which have a more reliable and/or efficient way to deliver unicast packets than broadcast ones (e.g. wifi). However, it should only be enabled on interfaces where no IGMPv2/MLDv1 report suppression takes place. This feature is disabled by default. The initial patch and idea is from Felix Fietkau. Signed-off-by: Felix Fietkau <nbd@nbd.name> [linus.luessing@c0d3.blue: various bug + style fixes, commit message] Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Introduce a sysctl that modifies the value of PROT_SOCK.Krister Johansen2017-01-242-0/+11
| | | | | | | | | | | | | | | | | | | Add net.ipv4.ip_unprivileged_port_start, which is a per namespace sysctl that denotes the first unprivileged inet port in the namespace. To disable all privileged ports set this to zero. It also checks for overlap with the local port range. The privileged and local range may not overlap. The use case for this change is to allow containerized processes to bind to priviliged ports, but prevent them from ever being allowed to modify their container's network configuration. The latter is accomplished by ensuring that the network namespace is not a child of the user namespace. This modification was needed to allow the container manager to disable a namespace's priviliged port restrictions without exposing control of the network namespace to processes in the user namespace. Signed-off-by: Krister Johansen <kjlx@templeofstupid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: add a longest prefix match trie map implementationDaniel Mack2017-01-231-0/+7
| | | | | | | | | | | | | | | | | | | | This trie implements a longest prefix match algorithm that can be used to match IP addresses to a stored set of ranges. Internally, data is stored in an unbalanced trie of nodes that has a maximum height of n, where n is the prefixlen the trie was created with. Tries may be created with prefix lengths that are multiples of 8, in the range from 8 to 2048. The key used for lookup and update operations is a struct bpf_lpm_trie_key, and the value is a uint64_t. The code carries more information about the internal implementation. Signed-off-by: Daniel Mack <daniel@zonque.org> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: bcm7xxx: Add entry for BCM7278Florian Fainelli2017-01-221-0/+1
| | | | | | | Add support for the BCM7278 28nm process Gigabit Ethernet PHY. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: Remove hwmon supportAndrew Lunn2017-01-201-8/+0
| | | | | | | | | | Only the Marvell mv88e6xxx DSA driver made use of the HWMON support in DSA. The temperature sensor registers are actually in the embedded PHYs, and the PHY driver now supports it. So remove all HWMON support from DSA and drivers. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'mlx5-updates-2017-01-19' of ↵David S. Miller2017-01-203-26/+278
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 and mlx5e updates 2017-01-19 This series includes some updates for mlx5 core and mlx5e netdevice driver. From Leon, a small fix that remove an unnecessary print. From Eli Cohen, a fix to the FW version printout in case of internal error. From Eugenia Emantayev, two patches, the 1st adds mlx5 1pps (pulse per second) mlx5 infrastructure support and the 2nd adds the necessary bits for mlx5e ptp logic and structures. From Mohamad, add support for s-tagged packet receive when in promiscuous mode. Form Gal Pressman, MCAM (Management capabilities mask register) and PCAM (Ports capabilities mask register) registers infrastructure, those registers are needed in order to query the different statistics registers support in FW, in order for the driver to enable/disable query and reporting them back to user. On top of this infrastructure we've exposed new set of statistics groups: - MPCNT: Physical layer statistical counters (For symbol errors) - PPCNT: PCIe performance counters In addition to the statistics capabilities series we've moved the mlx5 HCA capabilities fields to a dedicated struct under the driver private data. At the end a small patch to update & query statistics in the most desired order. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5: Move cached hca caps to designated caps structGal Pressman2017-01-192-19/+19
| | | | | | | | | | | | | | | | The caps structure consists of hca caps and port/management caps, all under one roof. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Add MPCNT register infrastructureGal Pressman2017-01-193-0/+47
| | | | | | | | | | | | | | Add the needed infrastructure for future use of MPCNT register. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Add PPCNT physical layer statistical group infrastructureGal Pressman2017-01-192-0/+38
| | | | | | | | | | | | | | | | | | Add the needed infrastructure for future use of PPCNT physical layer statistical group. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
| * net/mlx5: Query and cache PCAM, MCAM registers on initializationGal Pressman2017-01-192-0/+10
| | | | | | | | | | | | | | | | | | On load_one, we now cache our capabilities registers internally, similar to QUERY_HCA_CAP. Capabilities can later be queried using macros introduced in this patch. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Expose PCAM, MCAM registers infrastructureGal Pressman2017-01-193-1/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | PCAM: Ports capabilities mask register. MCAM: Management capabilities mask register. PCAM and MCAM registers will provide information regarding firmware support for different features, in order to avoid cases where new driver combined with old firmware results in syndromes (for ex. PCIe counters before this patchset). Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Add support to s-tag in mlx5 firmware interfaceMohamad Haj Yahia2017-01-191-5/+7
| | | | | | | | | | | | | | | | | | Add svlan_tag and rename vlan_tag to cvlan_tag in flow table entry match param. Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
| * net/mlx5: Add MTPPS and MTPPSE registers infrastructureEugenia Emantayev2017-01-193-1/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement query and set functionality for MTPPS and MTPPSE registers. MTPPS (Management Pulse Per Second) provides the device PPS capabilities, configures the PPS in and out modules and holds the PPS in time stamp. Query MTPPS is supported only when HCA_CAP.pps is set and modify is supported when HCA_CAP.pps_modify is set. MTPPSE (Management Pulse Per Second Event) configures the different event generation modes for PPS. Supported when HCA_CAP.pps is set. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | csum: eliminate sparse warning in remcsum_unadjust()Lance Richardson2017-01-201-1/+1
| | | | | | | | | | | | | | Cast second parameter of csum_sub() from __sum16 to __wsum. Signed-off-by: Lance Richardson <lrichard@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tipc: make replicast a user selectable optionJon Paul Maloy2017-01-201-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the bearer carrying multicast messages supports broadcast, those messages will be sent to all cluster nodes, irrespective of whether these nodes host any actual destinations socket or not. This is clearly wasteful if the cluster is large and there are only a few real destinations for the message being sent. In this commit we extend the eligibility of the newly introduced "replicast" transmit option. We now make it possible for a user to select which method he wants to be used, either as a mandatory setting via setsockopt(), or as a relative setting where we let the broadcast layer decide which method to use based on the ratio between cluster size and the message's actual number of destination nodes. In the latter case, a sending socket must stick to a previously selected method until it enters an idle period of at least 5 seconds. This eliminates the risk of message reordering caused by method change, i.e., when changes to cluster size or number of destinations would otherwise mandate a new method to be used. Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bpf: add bpf_probe_read_str helperGianluca Borello2017-01-201-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a simple helper with the same semantics of strncpy_from_unsafe(): int bpf_probe_read_str(void *dst, int size, const void *unsafe_addr) This gives more flexibility to a bpf program. A typical use case is intercepting a file name during sys_open(). The current approach is: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 bpf_probe_read(buf, sizeof(buf), ctx->di); /* consume buf */ } This is suboptimal because the size of the string needs to be estimated at compile time, causing more memory to be copied than often necessary, and can become more problematic if further processing on buf is done, for example by pushing it to userspace via bpf_perf_event_output(), since the real length of the string is unknown and the entire buffer must be copied (and defining an unrolled strnlen() inside the bpf program is a very inefficient and unfeasible approach). With the new helper, the code can easily operate on the actual string length rather than the buffer size: SEC("kprobe/sys_open") void bpf_sys_open(struct pt_regs *ctx) { char buf[PATHLEN]; // PATHLEN is defined to 256 int res = bpf_probe_read_str(buf, sizeof(buf), ctx->di); /* consume buf, for example push it to userspace via * bpf_perf_event_output(), but this time we can use * res (the string length) as event size, after checking * its boundaries. */ } Another useful use case is when parsing individual process arguments or individual environment variables navigating current->mm->arg_start and current->mm->env_start: using this helper and the return value, one can quickly iterate at the right offset of the memory area. The code changes simply leverage the already existent strncpy_from_unsafe() kernel function, which is safe to be called from a bpf program as it is used in bpf_trace_printk(). Signed-off-by: Gianluca Borello <g.borello@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | device: Implement a bus agnostic dev_num_vf routinePhil Sutter2017-01-202-2/+7
| | | | | | | | | | | | | | | | | | Now that pci_bus_type has num_vf callback set, dev_num_vf can be implemented in a bus type independent way and the check for whether a PCI device is being handled in rtnetlink can be dropped. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
* | device: bus_type: Introduce num_vf callbackPhil Sutter2017-01-201-0/+4
| | | | | | | | | | | | | | | | This allows for bus types to implement their own method of retrieving the number of virtual functions a NIC on that type of bus supports. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sock: use hlist_entry_safeGeliang Tang2017-01-201-2/+1
| | | | | | | | | | | | | | Use hlist_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: remove bh disabling around percpu_counter accessesEric Dumazet2017-01-202-15/+2
| | | | | | | | | | | | | | | | | | | | | | Shaohua Li made percpu_counter irq safe in commit 098faf5805c8 ("percpu_counter: make APIs irq safe") We can safely remove BH disable/enable sections around various percpu_counter manipulations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phy: increase size of MII_BUS_ID_SIZE and bus_idVolodymyr Bendiuga2017-01-191-6/+2
|/ | | | | | | | | | | | Some bus names are pretty long and do not fit into 17 chars. Increase therefore MII_BUS_ID_SIZE and phy_fixup.bus_id to larger number. Now mii_bus.id can host larger name. Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com> Signed-off-by: Magnus Öberg <magnus.oberg@westermo.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Remove usage of net_device last_rx memberTobias Klauser2017-01-181-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The network stack no longer uses the last_rx member of struct net_device since the bonding driver switched to use its own private last_rx in commit 9f242738376d ("bonding: use last_arp_rx in slave_last_rx()"). However, some drivers still (ab)use the field for their own purposes and some driver just update it without actually using it. Previously, there was an accompanying comment for the last_rx member added in commit 4dc89133f49b ("net: add a comment on netdev->last_rx") which asked drivers not to update is, unless really needed. However, this commend was removed in commit f8ff080dacec ("bonding: remove useless updating of slave->dev->last_rx"), so some drivers added later on still did update last_rx. Remove all usage of last_rx and switch three drivers (sky2, atp and smc91c92_cs) which actually read and write it to use their own private copy in netdev_priv. Compile-tested with allyesconfig and allmodconfig on x86 and arm. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: dsa: store CPU switch structure in the treeVivien Didelot2017-01-181-4/+4
| | | | | | | | | | Store a dsa_switch pointer to the CPU switch in the tree instead of only its index. This avoids the need to initialize it to -1. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv6: remove nowait arg to rt6_fill_nodeDavid Ahern2017-01-181-1/+1
| | | | | | | | | | | All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and simplify rt6_fill_node accordingly. rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the nowait arg from it as well. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: implement sender-side procedures for SSN Reset Request ParameterXin Long2017-01-183-0/+18
| | | | | | | | | | | | | | | | This patch is to implement sender-side procedures for the Outgoing and Incoming SSN Reset Request Parameter described in rfc6525 section 5.1.2 and 5.1.3. It is also add sockopt SCTP_RESET_STREAMS in rfc6525 section 6.3.2 for users. Note that the new asoc member strreset_outstanding is to make sure only one reconf request chunk on the fly as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add sockopt SCTP_ENABLE_STREAM_RESETXin Long2017-01-182-0/+11
| | | | | | | | | This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set strreset_enable to indicate which reconf request type it supports, which is described in rfc6525 section 6.3.1. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add reconf_enable in asoc ep and netnsXin Long2017-01-182-2/+8
| | | | | | | | | | | | | | | | This patch is to add reconf_enable field in all of asoc ep and netns to indicate if they support stream reset. When initializing, asoc reconf_enable get the default value from ep reconf_enable which is from netns netns reconf_enable by default. It is also to add reconf_capable in asoc peer part to know if peer supports reconf_enable, the value is set if ext params have reconf chunk support when processing init chunk, just as rfc6525 section 5.1.1 demands. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add stream reconf primitiveXin Long2017-01-183-1/+5
| | | | | | | | | | | | This patch is to add a primitive based on sctp primitive frame for sending stream reconf request. It works as the other primitives, and create a SCTP_CMD_REPLY command to send the request chunk out. sctp_primitive_RECONF would be the api to send a reconf request chunk. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add stream reconf timerXin Long2017-01-183-0/+9
| | | | | | | | | | | | | | | | | | | | This patch is to add a per transport timer based on sctp timer frame for stream reconf chunk retransmission. It would start after sending a reconf request chunk, and stop after receiving the response chunk. If the timer expires, besides retransmitting the reconf request chunk, it would also do the same thing with data RTO timer. like to increase the appropriate error counts, and perform threshold management, possibly destroying the asoc if sctp retransmission thresholds are exceeded, just as section 5.1.1 describes. This patch is also to add asoc strreset_chunk, it is used to save the reconf request chunk, so that it can be retransmitted, and to check if the response is really for this request by comparing the information inside with the response chunk as well. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: add support for generating stream reconf ssn reset request chunkXin Long2017-01-183-1/+34
| | | | | | | | | | | | This patch is to add asoc strreset_outseq and strreset_inseq for saving the reconf request sequence, initialize them when create assoc and process init, and also to define Incoming and Outgoing SSN Reset Request Parameter described in rfc6525 section 4.1 and 4.2, As they can be in one same chunk as section rfc6525 3.1-3 describes, it makes them in one function. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: reset tb->fastreuseport when adding a reuseport skJosef Bacik2017-01-181-0/+9
| | | | | | | | | | | | | | | | | If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and never set it again. Which means that in the future if we end up adding a bunch of reuseport sk's to that tb we'll have to do the expensive scan every time. Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family so we know what comparison to make, and the ipv6 only setting so we can make sure to compare with new sockets appropriately. Once one sk has made it onto the list we know that there are no potential bind conflicts on the owners list that match that sk's rcv_addr. So copy the sk's information into our bind bucket and set tb->fastruseport to FASTREUSESOCK_STRICT so we know we have to do an extra check for subsequent reuseport sockets and skip the expensive bind conflict check. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: kill smallest_size and smallest_portJosef Bacik2017-01-181-1/+0
| | | | | | | | | | | | | | | | | | | | In inet_csk_get_port we seem to be using smallest_port to figure out where the best place to look for a SO_REUSEPORT sk that matches with an existing set of SO_REUSEPORT's. However if we get to the logic if (smallest_size != -1) { port = smallest_port; goto have_port; } we will do a useless search, because we would have already done the inet_csk_bind_conflict for that port and it would have returned 1, otherwise we would have gone to found_tb and succeeded. Since this logic makes us do yet another trip through inet_csk_bind_conflict for a port we know won't work just delete this code and save us the time. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: drop ->bind_conflictJosef Bacik2017-01-182-11/+0
| | | | | | | | | The only difference between inet6_csk_bind_conflict and inet_csk_bind_conflict is how they check the rcv_saddr, so delete this call back and simply change inet_csk_bind_conflict to call inet_rcv_saddr_equal. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: collapse ipv4/v6 rcv_saddr_equal functions into oneJosef Bacik2017-01-183-8/+2
| | | | | | | | | | We pass these per-protocol equal functions around in various places, but we can just have one function that checks the sk->sk_family and then do the right comparison function. I've also changed the ipv4 version to not cast to inet_sock since it is unneeded. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: sparse fixes in br_ip6_multicast_alloc_query()Lance Richardson2017-01-171-2/+2
| | | | | | | | | | | | Changed type of csum field in struct igmpv3_query from __be16 to __sum16 to eliminate type warning, made same change in struct igmpv3_report for consistency. Fixed up an ntohs() where htons() should have been used instead. Signed-off-by: Lance Richardson <lrichard@redhat.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2017-01-1719-84/+136
|\
| * virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmitRolf Neugebauer2017-01-171-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch part reverts fd2a0437dc33 and e858fae2b0b8 which introduced a subtle change in how the virtio_net flags are derived from the SKBs ip_summed field. With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to ip_summed == CHECKSUM_NONE, which should be the same. Further, the virtio spec 1.0 / CS04 explicitly says that VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver. Fixes: fd2a0437dc33 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb") Fixes: e858fae2b0b8 (" virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-01-176-7/+16
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Handle multicast packets properly in fast-RX path of mac80211, from Johannes Berg. 2) Because of a logic bug, the user can't actually force SW checksumming on r8152 devices. This makes diagnosis of hw checksumming bugs really annoying. Fix from Hayes Wang. 3) VXLAN route lookup does not take the source and destination ports into account, which means IPSEC policies cannot be matched properly. Fix from Martynas Pumputis. 4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger. 5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky. 6) If lwtunnel_fill_encap() fails, we do not abort the netlink message construction properly in fib_dump_info(), from David Ahern. 7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan Schmidt. 8) Openvswitch conntack actions need to maintain a correct checksum, fix from Lance Richardson. 9) ax25_disconnect() is missing a check for ax25->sk being NULL, in fact it already checks this, but not in all of the necessary spots. Fix from Basil Gunn. 10) Action GET operations in the packet scheduler can erroneously bump the reference count of the entry, making it unreleasable. Fix from Jamal Hadi Salim. Jamal gives a great set of example command lines that trigger this in the commit message. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits) net sched actions: fix refcnt when GETing of action after bind net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions net/mlx4_core: Fix racy CQ (Completion Queue) free net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered net/mlx5e: Fix a -Wmaybe-uninitialized warning ax25: Fix segfault after sock connection timeout bpf: rework prog_digest into prog_tag tipc: allocate user memory with GFP_KERNEL flag net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types ip6_tunnel: Account for tunnel header in tunnel MTU mld: do not remove mld souce list info when set link down be2net: fix MAC addr setting on privileged BE3 VFs be2net: don't delete MAC on close on unprivileged BE3 VFs be2net: fix status check in be_cmd_pmac_add() cpmac: remove hopeless #warning ravb: do not use zero-length alignment DMA descriptor mlx4: do not call napi_schedule() without care openvswitch: maintain correct checksum state in conntrack actions tcp: fix tcp_fastopen unaligned access complaints on sparc ...
| | * bpf: rework prog_digest into prog_tagDaniel Borkmann2017-01-164-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") was recently discussed, partially due to admittedly suboptimal name of "prog_digest" in combination with sha1 hash usage, thus inevitably and rightfully concerns about its security in terms of collision resistance were raised with regards to use-cases. The intended use cases are for debugging resp. introspection only for providing a stable "tag" over the instruction sequence that both kernel and user space can calculate independently. It's not usable at all for making a security relevant decision. So collisions where two different instruction sequences generate the same tag can happen, but ideally at a rather low rate. The "tag" will be dumped in hex and is short enough to introspect in tracepoints or kallsyms output along with other data such as stack trace, etc. Thus, this patch performs a rename into prog_tag and truncates the tag to a short output (64 bits) to make it obvious it's not collision-free. Should in future a hash or facility be needed with a security relevant focus, then we can think about requirements, constraints, etc that would fit to that situation. For now, rework the exposed parts for the current use cases as long as nothing has been released yet. Tested on x86_64 and s390x. Fixes: 7bd509e311f4 ("bpf: add prog_digest and expose it via fdinfo/netlink") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * Merge tag 'mac80211-for-davem-2017-01-13' of ↵David S. Miller2017-01-161-1/+3
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have a number of fixes, in part because I was late in actually sending them out - will try to do better in the future: * handle VHT opmode properly when hostapd is controlling full station state * two fixes for minimum channel width in mac80211 * don't leave SMPS set to junk in HT capabilities * fix headroom when forwarding mesh packets, recently broken by another fix that failed to take into account frame encryption * fix the TID in null-data packets indicating EOSP (end of service period) in U-APSD * prevent attempting to use (and then failing which results in crashes) TXQs on stations that aren't added to the driver yet ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * cfg80211: consider VHT opmode on station updateBeni Lev2017-01-111-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this attribute is only fetched on station addition, but not on station change. Since this info is only present in the assoc request, with full station state support in the driver it cannot be present when the station is added. Thus, add support for changing the VHT opmode on station update if done before (or while) the station is marked as associated. After this, ignore it, since it used to be ignored. Signed-off-by: Beni Lev <beni.lev@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | tcp: fix tcp_fastopen unaligned access complaints on sparcShannon Nelson2017-01-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up a data alignment issue on sparc by swapping the order of the cookie byte array field with the length field in struct tcp_fastopen_cookie, and making it a proper union to clean up the typecasting. This addresses log complaints like these: log_unaligned: 113 callbacks suppressed Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360 Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360 Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360 Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360 Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2017-01-161-0/+1
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd fixes from Bruce Fields: "Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for older bugs" * tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux: svcrdma: avoid duplicate dma unmapping during error recovery sunrpc: don't call sleeping functions from the notifier block callbacks svcrpc: don't leak contexts on PROC_DESTROY nfsd: fix supported attributes for acl & labels
| | * | | sunrpc: don't call sleeping functions from the notifier block callbacksScott Mayhew2017-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inet6addr_chain is an atomic notifier chain, so we can't call anything that might sleep (like lock_sock)... instead of closing the socket from svc_age_temp_xprts_now (which is called by the notifier function), just have the rpc service threads do it instead. Cc: stable@vger.kernel.org Fixes: c3d4879e01be "sunrpc: Add a function to close..." Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | Merge branch 'i2c/for-current' of ↵Linus Torvalds2017-01-151-0/+1
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Bugfixes for I2C. Mostly core this time which is a bit unusual but nothing really scary in there" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: piix4: Avoid race conditions with IMC i2c: fix spelling mistake: "insufficent" -> "insufficient" i2c: print correct device invalid address i2c: do not enable fall back to Host Notify by default i2c: fix kernel memory disclosure in dev interface
| | * | | | i2c: do not enable fall back to Host Notify by defaultDmitry Torokhov2017-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Falling back unconditionally to HostNotify as primary client's interrupt breaks some drivers which alter their functionality depending on whether interrupt is present or not, so let's introduce a board flag telling I2C core explicitly if we want wired interrupt or HostNotify-based one: I2C_CLIENT_HOST_NOTIFY. For DT-based systems we introduce "host-notify" property that we convert to I2C_CLIENT_HOST_NOTIFY board flag. Tested-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>