linux - linux

	Commit message	Author	Files	Lines
2018-02-13	net: Convert audit_net_ops	Kirill Tkhai	1	-0/+1
	This patch starts to convert pernet_subsys, registered from postcore initcalls. audit_net_init() creates netlink socket, while audit_net_exit() destroys it. The rest of the pernet_list are not interested in the socket, so we make audit_net_ops async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert rtnetlink_net_ops	Kirill Tkhai	1	-0/+1
	rtnetlink_net_init() and rtnetlink_net_exit() create and destroy netlink socket net::rtnl. The socket is used to send rtnl notification via rtnl_net_notifyid(). There is no problem creating and destroying it in parallel with other pernet operations, as we link net in setup_net() after the socket is created, and destroy it in cleanup_net() after net is unhashed from all the lists and there are no RCU references to it. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert netlink_net_ops	Kirill Tkhai	1	-0/+1
	The methods of netlink_net_ops create and destroy the "netlink" file, which is of no interest to foreign pernet_operations. So, netlink_net_ops may safely be made async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert net_defaults_ops	Kirill Tkhai	1	-0/+1
	net_defaults_ops introduce only net_defaults_init_net method, and it acts on net::core::sysctl_somaxconn, which is not interesting for the rest of pernet_subsys and pernet_device lists. Then, make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert net_inuse_ops	Kirill Tkhai	1	-0/+1
	net_inuse_ops methods expose statistics in /proc. Nothing else in the pernet_subsys or pernet_device lists touches net::core::inuse. So, it's safe to make net_inuse_ops async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert nf_log_net_ops	Kirill Tkhai	1	-0/+1
	The pernet_operations would have had a problem in parallel execution with others if init_net could be released. But it cannot, and the rest is safe. There is a memory allocation, which nobody else is interested in, and a sysctl registration. So, we make them async. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert netfilter_net_ops	Kirill Tkhai	1	-0/+1
	Methods netfilter_net_init() and netfilter_net_exit() initialize net::nf::hooks and change the net-related proc directory of net. Other pernet_operations are not interested in foreign net::nf::hooks or proc entries, so it's safe to execute them in parallel with methods of other pernet operations. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert sysctl_pernet_ops	Kirill Tkhai	1	-0/+1
	This patch starts to convert pernet_subsys, registered from core initcalls. Methods sysctl_net_init() and sysctl_net_exit() initialize net::sysctls table of a namespace. pernet_operations::init()/exit() methods from the rest of the list do not touch net::sysctls of strangers, so it's safe to execute sysctl_pernet_ops's methods in parallel with any other pernet_operations. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert net_ns_ops methods	Kirill Tkhai	1	-0/+1
	This patch starts to convert pernet_subsys, registered from pure initcalls. net_ns_ops::net_ns_net_init/net_ns_net_exit methods use only ida_simple_* functions, which do not need external synchronization: they are synchronized by the idr subsystem. So, net_ns_ops methods are able to be executed in parallel with methods of other pernet operations. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Convert proc_net_ns_ops	Kirill Tkhai	1	-0/+1
	This patch starts to convert pernet_subsys, registered before initcalls. proc_net_ns_ops::proc_net_ns_init()/proc_net_ns_exit() {un,}register pernet net->proc_net and ->proc_net_stat. Constructors and destructors of other pernet_operations are not interested in a foreign net's proc_net and proc_net_stat. Proc filesystem primitives are synchronized on proc_subdir_lock. So, proc_net_ns_ops methods are able to be executed in parallel with methods of any other pernet operations. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Allow pernet_operations to be executed in parallel	Kirill Tkhai	2	-10/+26
	This adds a new pernet_operations::async flag to mark operations whose ->init(), ->exit() and ->exit_batch() methods are allowed to be executed in parallel with the methods of any other pernet_operations. When there are only asynchronous pernet_operations in the system, net_mutex won't be taken for a net construction and destruction. Also, remove BUG_ON(mutex_is_locked()) from net_assign_generic() without replacing it with the equivalent net_sem check, as there is one more lockdep assert below. v3: Add comment near net_mutex. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
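The rule in this message (net_mutex is skipped once every registered pernet_operations is async) can be modelled in a few lines. This is a toy Python model rather than kernel code: `PernetOps`, `PernetRegistry` and `needs_net_mutex` are hypothetical names, and tracking the synchronous operations with a simple counter is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PernetOps:
    """Toy stand-in for struct pernet_operations (hypothetical model)."""
    name: str
    is_async: bool = False  # mirrors the pernet_operations::async flag


class PernetRegistry:
    """Tracks registered operations and how many of them are synchronous."""

    def __init__(self):
        self.ops = []
        self.nr_sync = 0  # assumption: a plain counter of non-async ops

    def register(self, ops):
        self.ops.append(ops)
        if not ops.is_async:
            self.nr_sync += 1

    def unregister(self, ops):
        self.ops.remove(ops)
        if not ops.is_async:
            self.nr_sync -= 1

    def needs_net_mutex(self):
        # The mutex is only required while at least one synchronous
        # pernet_operations is still registered.
        return self.nr_sync > 0
```

In this model, converting an operation to async never changes what it does; it only removes one reason for namespace setup and teardown to serialize on the mutex.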
2018-02-13	net: Move mutex_unlock() in cleanup_net() up	Kirill Tkhai	1	-1/+2
	net_sem protects pernet_list from changing, while ops_free_list() does a simple kfree() and can't race with other pernet_operations callbacks. So we may release net_mutex earlier than before. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Introduce net_sem for protection of pernet_list	Kirill Tkhai	3	-15/+29
	Currently, the mutex is mostly used to protect the pernet operations list. It orders setup_net() and cleanup_net() with parallel {un,}register_pernet_operations() calls, so the ->exit{,batch} methods executed for a dying net belong to the same pernet operations whose ->init methods were called, even after the net namespace is unlinked from net_namespace_list in cleanup_net().
	But there are several problems with scalability. The first one is that more than one net can't be created or destroyed at the same moment on the node. For big machines with many cpus running many containers this is very noticeable.
	The second one is that synchronize_rcu() is needed after the net is removed from net_namespace_list:
	  Destroy net_ns:
	  cleanup_net()
	    mutex_lock(&net_mutex)
	    list_del_rcu(&net->list)
	    synchronize_rcu()                  <--- Sleep there for ages
	    list_for_each_entry_reverse(ops, &pernet_list, list)
	      ops_exit_list(ops, &net_exit_list)
	    list_for_each_entry_reverse(ops, &pernet_list, list)
	      ops_free_list(ops, &net_exit_list)
	    mutex_unlock(&net_mutex)
	This primitive is not fast, especially on systems with many processors and/or when preemptible RCU is enabled in config. So, for as long as cleanup_net() is waiting for an RCU grace period, creation of new net namespaces is not possible, and the tasks that attempt it sleep on the same mutex:
	  Create net_ns:
	  copy_net_ns()
	    mutex_lock_killable(&net_mutex)    <--- Sleep there for ages
	I observed 20-30 second hangs of "unshare -n" on an ordinary 8-cpu laptop with preemptible RCU enabled after a CRIU tests round is finished.
	The solution is to convert net_mutex to an rw_semaphore and add fine-grained locks to the really small number of pernet_operations that actually need them. Then, pernet_operations::init/::exit methods, modifying the net-related data, will require down_read() locking only, while down_write() will be used for changing pernet_list (i.e., when modules are being loaded and unloaded).
	This gives a significant performance increase after the whole patch set is applied, as you may see here:
	  %for i in {1..10000}; do unshare -n bash -c exit; done
	  before
	    real 1m40,377s
	    user 0m9,672s
	    sys 0m19,928s
	  after
	    real 0m17,007s
	    user 0m5,311s
	    sys 0m11,779
	  (5.8 times faster)
	This patch starts replacing net_mutex with net_sem. It adds the rw_semaphore, describes the variables it protects, and uses it where appropriate. net_mutex is still present, and next patches will kick it out step-by-step. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Cleanup in copy_net_ns()	Kirill Tkhai	1	-11/+9
	Line up destructor actions in the reverse order of the constructors. Next patches will add more actions, and keeping this order will make that easier. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-13	net: Assign net to net_namespace_list in setup_net()	Kirill Tkhai	1	-10/+3
	This patch merges two repeating pieces of code into one, and they will live in setup_net() now. The only change is that the assignment: init_net_initialized = true; becomes reordered with: list_add_tail_rcu(&net->list, &net_namespace_list); The order does not have a visible effect, and it is a simple cleanup because init_net_initialized is used in the !CONFIG_NET_NS case to order proc_net_ns_ops registration occurring at boot time: start_kernel()->proc_root_init()->proc_net_init(), with net_ns_init()->setup_net(&init_net, &init_user_ns) also occurring at boot time from the same init_task. When there are no other tasks to race with them, for the single task it does not matter in which order two sequential independent loads are made. So we reorder them. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12	i40e/i40evf: Add support for new mechanism of updating adaptive ITR	Alexander Duyck	8	-257/+528
	This patch replaces the existing mechanism for determining the correct value to program for adaptive ITR with yet another new and more complicated approach. The basic idea from a 30K foot view is that this new approach will push the Rx interrupt moderation up so that by default it starts in low latency and is gradually pushed up into a higher latency setup as long as doing so increases the number of packets processed, if the number of packets drops to 4 to 1 per packet we will reset and just base our ITR on the size of the packets being received. For Tx we leave it floating at a high interrupt delay and do not pull it down unless we start processing more than 112 packets per interrupt. If we start exceeding that we will cut our interrupt rates in half until we are back below 112. The side effect of these patches are that we will be processing more packets per interrupt. This is both a good and a bad thing as it means we will not be blocking processing in the case of things like pktgen and XDP, but we will also be consuming a bit more CPU in the cases of things such as network throughput tests using netperf. One delta from this versus the ixgbe version of the changes is that I have made the interrupt moderation a bit more aggressive when we are in bulk mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The main motivation behind moving this is to address the fact that we need to update less frequently, and have more fine grained control due to the separate Tx and Rx ITR times. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e/i40evf: Split container ITR into current_itr and target_itr	Alexander Duyck	8	-86/+115
	This patch is mostly prep-work for replacing the current approach to programming the dynamic aka adaptive ITR. Specifically here what we are doing is splitting the Tx and Rx ITR each into two separate values. The first value current_itr represents the current value of the register. The second value target_itr represents the desired value of the register. The general plan by doing this is to allow for deferring the update of the ITR value under certain circumstances. For now we will work with what we have, but in the future I hope to change the behavior so that we always only update one ITR at a time using some simple logic to determine which ITR requires an update. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
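The split into current_itr and target_itr lets the register write be deferred until the two values disagree. A minimal Python sketch of that idea follows; it is a toy model, not driver code. `RingContainer`, `flush_itr` and the `write_reg` callback are hypothetical names chosen for illustration.

```python
class RingContainer:
    """Toy model of a Tx or Rx ITR container (hypothetical, not driver code)."""

    def __init__(self, itr):
        self.current_itr = itr  # value last written to the register
        self.target_itr = itr   # value we would like the register to hold


def flush_itr(rc, write_reg):
    """Write target_itr to the register only if it differs from current_itr.

    write_reg is a caller-supplied callback standing in for the register
    write. Returns True when a write was issued.
    """
    if rc.target_itr == rc.current_itr:
        return False
    write_reg(rc.target_itr)
    rc.current_itr = rc.target_itr
    return True
```

Keeping both values means the moderation logic can adjust target_itr as often as it likes, while the comparatively expensive register access happens only when something actually changed.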
2018-02-12	i40evf: Correctly populate rxitr_idx and txitr_idx	Alexander Duyck	1	-9/+15
	While testing code for the recent ITR changes I found that updating the Tx ITR appeared to have no effect with everything defaulting to the Rx ITR. A bit of digging narrowed it down to the fact that we were asking the PF to associate all causes with ITR 0 as we weren't populating the itr_idx values for either Rx or Tx. To correct it I have added the configuration for these values to this patch. In addition I did some minor clean-up to just add a local pointer for the vector map instead of dereferencing it based off of the index repeatedly. In my opinion this makes the resultant code a bit more readable and saves us a few characters. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e/i40evf: Use usec value instead of reg value for ITR defines	Alexander Duyck	6	-56/+79
	Instead of using the register value for the defines when setting up the ring ITR we can just use the actual values and avoid the use of shifts and macros to translate between the values we have and the values we want. This helps to make the code more readable as we can quickly translate from one value to the other. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
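To illustrate the difference between a microsecond value and a register value, here is a small Python sketch. The 2-microsecond register granularity is an assumption made for this example, and the helper names are hypothetical; neither is quoted from the driver.

```python
ITR_GRANULARITY_USECS = 2  # assumption: one register count per 2 microseconds


def itr_usecs_to_reg(usecs):
    """Translate a delay in microseconds into a register count."""
    return usecs // ITR_GRANULARITY_USECS


def itr_reg_to_usecs(reg):
    """Translate a register count back into microseconds."""
    return reg * ITR_GRANULARITY_USECS


def ints_per_sec(usecs):
    """Upper bound on interrupts per second for a given delay."""
    return 1_000_000 // usecs
```

With defines expressed in microseconds, the translation to register units happens in one place at write time, instead of being baked into every constant.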
2018-02-12	net: make getname() functions return length rather than use int* parameter	Denys Vlasenko	55	-203/+159
	Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int socklen" parameter and return zero on success. "int socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
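The change of calling convention can be shown with a small Python model. It is a sketch only: `getname_old`, `getname_new`, `caller` and the dictionary used as a socket are hypothetical stand-ins, and a one-element list plays the role of the C `int *socklen` out-parameter.

```python
import errno


def getname_old(sock, out_len):
    """Old convention: return 0 or -errno, store the length via out_len."""
    if sock is None:
        return -errno.ENOTCONN
    out_len[0] = len(sock["addr"])
    return 0


def getname_new(sock):
    """New convention: return the sockaddr length (>= 0) or -errno."""
    if sock is None:
        return -errno.ENOTCONN
    return len(sock["addr"])


def caller(sock):
    """Callers must test 'ret < 0'; a plain truthiness test would
    misread every successful positive length as a failure."""
    ret = getname_new(sock)
    if ret < 0:
        return "error"
    return f"ok, {ret} bytes"
```

This is why the message notes that tests in callers move from "if (err)" to "if (err < 0)": a successful call now returns a positive number.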
2018-02-12	i40e/i40evf: Don't bother setting the CLEARPBA bit	Alexander Duyck	2	-2/+20
	The CLEARPBA bit in the dynamic interrupt control register actually has no effect either way on the hardware. As per errata 28 in the XL710 specification update the interrupt is actually cleared any time the register is written with the INTENA_MSK bit set to 0. As such the act of toggling the enable bit actually will trigger the interrupt being cleared and could lead to potential lost events if auto-masking is not enabled. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e/i40evf: Clean-up of bits related to using q_vector->reg_idx	Alexander Duyck	3	-15/+15
	This patch is a further clean-up related to the change over to using q_vector->reg_idx when accessing the ITR registers. Specifically the code appears to have several other spots where we were computing the register offset manually and this resulted in errors in a few spots. Specifically in the i40evf functions for mapping queues to vectors it appears we may have had an off by 1 error since (v_idx - 1) for the first q_vector with an index of 0 would result in us returning -1 if I am not mistaken. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e: use changed_flags to check I40E_FLAG_DISABLE_FW_LLDP	Alan Brady	1	-7/+15
	Currently in i40e_set_priv_flags we use new_flags to check for the I40E_FLAG_DISABLE_FW_LLDP flag. This is an issue for a few reasons. DISABLE_FW_LLDP is persistent across reboots/driver reloads. This means we need some way to detect if FW LLDP is enabled on init. We do this by trying to init_dcb and if it fails with EPERM we know LLDP is disabled in FW. This could be a problem on older FW versions or NPAR enabled PFs because there are situations where the FW could disable LLDP, but they do _not_ support using this flag to change it. If we do end up in this situation, the flag will be set, then when the user tries to change any priv flags, the driver thinks the user is trying to disable FW LLDP on a FW that doesn't support it and essentially forbids any priv flag changes. The fix is simple: instead of checking if this flag is set, we should be checking if the user is trying to _change_ the flag on unsupported FW versions. This patch also adds a comment explaining that the cmpxchg is the point of no return. Once we put the new flags into pf->flags we can't back out. Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
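The difference between "the flag is set" and "the flag is being changed" comes down to an XOR of the old and new flag words. A hedged Python sketch: the bit positions, the function name `set_priv_flags` and the `fw_supports_lldp_toggle` parameter are all hypothetical and chosen only to illustrate the check.

```python
FLAG_DISABLE_FW_LLDP = 1 << 0      # hypothetical bit positions
FLAG_LINK_DOWN_ON_CLOSE = 1 << 1


def set_priv_flags(orig_flags, new_flags, fw_supports_lldp_toggle):
    """Return the flags to store, or None if the request must be rejected."""
    # Bits that differ between the old and the requested flag words.
    changed_flags = orig_flags ^ new_flags
    if (changed_flags & FLAG_DISABLE_FW_LLDP) and not fw_supports_lldp_toggle:
        return None
    return new_flags
```

With this check, a device whose firmware already has LLDP disabled can still have unrelated private flags changed, because the LLDP bit itself is untouched by the request.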
2018-02-12	i40e: Warn when setting link-down-on-close while in MFP	Paweł Jabłoński	1	-0/+6
	This patch adds a warning message when the link-down-on-close flag is set. The warning is printed only on MFP devices. Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e: Add delay after EMP reset for firmware to recover	Filip Sadowski	1	-0/+11
	This patch adds necessary delay for 4.33 firmware to recover after EMP reset. Without this patch driver occasionally reinitializes structures too quickly to communicate with firmware after EMP reset causing AdminQ to timeout. Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e/i40evf: Clean up logic for adaptive ITR	Alexander Duyck	2	-86/+28
	The logic for dynamic ITR update is confusing at best as there were odd paths chosen for how to find the rings associated with a given queue based on the vector index and other inconsistencies throughout the code. This patch is an attempt to clean up the logic so that we can more easily understand what is going on. Specifically if there is a Rx or Tx ring that is enabled in dynamic mode on the q_vector it is allowed to override the other side of the interrupt moderation. While it isn't correct all this patch is doing is cleaning up the logic for now so that when we come through and fix it we can more easily identify that this is wrong. The other big change made here is that we replace references to: vsi->rx_rings[q_vector->v_idx]->itr_setting with: q_vector->rx.ring->itr_setting The general idea is we can avoid the long pointer chase since just accessing q_vector->rx.ring is a single pointer access versus having to chase down vsi->rx_rings, and then finding the pointer in the array, and finally chasing down the itr_setting from there. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx	Alexander Duyck	9	-55/+53
	The rings are already split out into Tx and Rx rings so it doesn't make sense to have any single ring store both a Tx and Rx itr_setting value. Since that is the case drop the pair in favor of storing just a single ITR value. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	i40e: fix typo in function description	Alan Brady	2	-2/+2
	'bufer' should be 'buffer' Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12	Linux 4.16-rc1 (tag: v4.16-rc1)	Linus Torvalds	1	-2/+2

2018-02-11	unify {de,}mangle_poll(), get rid of kernel-side POLL...	Al Viro	8	-142/+47
	except, again, POLLFREE and POLL_BUSY_LOOP. With this, we finally get to the promised end result: - POLL{IN,OUT,...} are plain integers and not in __poll_t, so any stray instances of ->poll() still using those will be caught by sparse. - eventpoll.c and select.c warning-free wrt __poll_t - no more kernel-side definitions of POLL... - userland ones are visible through the entire kernel (and used pretty much only for mangle/demangle) - same behavior as after the first series (i.e. sparc et.al. epoll(2) working correctly). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11	vfs: do bulk POLL* -> EPOLL* replacement	Linus Torvalds	297	-913/+913
	This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script:
	  for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
	    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
	    for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
	  done
	with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL constants have the same values as the POLL* constants do. But the keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
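The per-line effect of the scripted replacement can be approximated in Python. This is a sketch of the idea and not a port of the script: `epollify` is a hypothetical helper, Python's `\b` stands in for the sed word-boundary anchors, and the file selection done by `git grep` is left out.

```python
import re

POLL_NAMES = ["IN", "OUT", "PRI", "ERR", "RDNORM", "RDBAND",
              "WRNORM", "WRBAND", "HUP", "RDHUP", "NVAL", "MSG"]


def epollify(line):
    """Prefix POLL<V> with 'E' when it appears before any double quote.

    Like the sed expression without a 'g' flag, at most one occurrence of
    each constant is rewritten per line.
    """
    for v in POLL_NAMES:
        pattern = re.compile(r'^([^"]*)\b(POLL' + v + r')\b')
        line = pattern.sub(r'\1E\2', line, count=1)
    return line
```

Because the prefix group excludes double quotes, names that only appear inside string literals are left alone, and an already converted `EPOLLIN` no longer matches since `E` removes the word boundary before `POLL`.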
2018-02-11	xtensa: fix build with KASAN	Max Filippov	1	-0/+2
	The commit 917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage") removed KASAN_SHADOW_SCALE_SHIFT definition from include/linux/kasan.h and added it to architecture-specific headers, except for xtensa. This broke the xtensa build with KASAN enabled. Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h Reported by: kbuild test robot <fengguang.wu@intel.com> Fixes: 917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage") Acked-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-02-11	nios2: defconfig: Cleanup from old Kconfig options	Krzysztof Kozlowski	2	-2/+0
	Remove old, dead Kconfig option INET_LRO. It is gone since commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
2018-02-11	nios2: dts: Remove leading 0x and 0s from bindings notation	Mathieu Malaterre	1	-8/+8
	Improve the DTS files by removing all the leading "0x" and zeros to fix the following dtc warnings:
	  Warning (unit_address_format): Node /XXX unit name should not have leading "0x"
	and
	  Warning (unit_address_format): Node /XXX unit name should not have leading 0s
	Converted using the following command:
	  find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} +
	For simplicity, two sed expressions were used to solve each warning separately. To make the regex expression more robust a few other issues were resolved, namely setting unit-address to lower case, and adding a whitespace before the opening curly brace: https://elinux.org/Device_Tree_Linux#Linux_conventions This is a follow up to commit 4c9847b7375a ("dt-bindings: Remove leading 0x from bindings notation") Reported-by: David Daney <ddaney@caviumnetworks.com> Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
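The two substitutions can be tried on single lines with a short Python sketch. It is an approximation under stated assumptions: `fix_unit_address` is a hypothetical helper, and since Python's `re` has no `\L` escape, the lower-casing is done in a replacement function instead.

```python
import re

# Same two patterns as the sed expressions: a leading "0x", then leading zeros.
_HEX_PREFIX = re.compile(r'@0x([0-9a-fA-F.]+)\s?\{')
_LEADING_ZEROS = re.compile(r'@0+([0-9a-fA-F.]+)\s?\{')


def _lower(match):
    # Lower-case the unit address and put one space before the brace.
    return '@' + match.group(1).lower() + ' {'


def fix_unit_address(line):
    """Apply both rewrites in order, as the two -e expressions do."""
    line = _HEX_PREFIX.sub(_lower, line)
    return _LEADING_ZEROS.sub(_lower, line)
```

Running the expressions in sequence matters: an address such as `@0x0A000000` first loses its `0x` and only then has its remaining leading zero stripped.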
2018-02-10	powerpc/pci: Fix broken INTx configuration via OF	Alexey Kardashevskiy	1	-2/+3
	59f47eff03a0 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper") replaced of_irq_parse_pci() + irq_create_of_mapping() with of_irq_parse_and_map_pci(), but neglected to capture the virq returned by irq_create_of_mapping(), so virq remained zero, which caused INTx configuration to fail. Save the virq value returned by of_irq_parse_and_map_pci() and correct the virq declaration to match the of_irq_parse_and_map_pci() signature. Fixes: 59f47eff03a0 "powerpc/pci: Use of_irq_parse_and_map_pci() helper" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-02-10	mconsole_proc(): don't mess with file->f_pos	Al Viro	1	-1/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-10	kconfig: remove const qualifier from sym_expand_string_value()	Masahiro Yamada	3	-4/+4
	This function returns realloc'ed memory, so the returned pointer must be passed to free() when done. So, 'const' qualifier is odd. It is allowed to modify the expanded string. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-10	kconfig: add xrealloc() helper	Masahiro Yamada	6	-5/+16
	We already have xmalloc(), xcalloc(). Add xrealloc() as well to save tedious error handling. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-10	platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems	Vadim Pasternak	1	-0/+142
	It adds support for new Mellanox system types of basic classes qmb7, sn34, sn37, containing systems QMB700 (40x200GbE InfiniBand switch), SN3700 (32x200GbE and 16x400GbE Ethernet switch) and SN3410 (6x400GbE plus 48x50GbE Ethernet switch). These are the Top of the Rack systems, equipped with Mellanox COM-Express carrier board and switch board with Mellanox Quantum device, which supports InfiniBand switching with 40X200G ports and line rate of up to HDR speed or with Mellanox Spectrum-2 device, which supports Ethernet switching with 32X200G ports line rate of up to HDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-10	platform/x86: mlx-platform: Add support for new msn201x system type	Vadim Pasternak	1	-0/+59
	It adds support for new Mellanox system types of basic half unit size class msn201x, containing system MSN2010 (18x10GbE plus 4x4x25GbE) half and its derivatives. This is the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-10	platform/x86: mlx-platform: Add support for new msn274x system type	Vadim Pasternak	1	-0/+124
	It adds support for new Mellanox system types of basic class msn274x, containing system MSN2740 (32x100GbE Ethernet switch with cost reduction) and its derivatives. These are the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09	ibmvnic: Remove skb->protocol checks in ibmvnic_xmit	John Allen	1	-4/+1
	Having these checks in ibmvnic_xmit causes problems with VLAN tagging and balance-alb/tlb bonding modes. The restriction they imposed can be removed. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	bpf: fix rlimit in reuseport net selftest	Daniel Borkmann	1	-1/+20
	Fix two issues in the reuseport_bpf selftests that were reported by Linaro CI:
	  [...]
	  + ./reuseport_bpf
	  ---- IPv4 UDP ----
	  Testing EBPF mod 10...
	  Reprograming, testing mod 5...
	  ./reuseport_bpf: ebpf error. log:
	  0: (bf) r6 = r1
	  1: (20) r0 = *(u32 *)skb[0]
	  2: (97) r0 %= 10
	  3: (95) exit
	  processed 4 insns
	  : Operation not permitted
	  + echo FAIL
	  [...]
	  ---- IPv4 TCP ----
	  Testing EBPF mod 10...
	  ./reuseport_bpf: failed to bind send socket: Address already in use
	  + echo FAIL
	  [...]
	For the former adjust rlimit since this was the cause of failure for loading the BPF prog, and for the latter add SO_REUSEADDR. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://bugs.linaro.org/show_bug.cgi?id=3502 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	sctp: verify size of a new chunk in _sctp_make_chunk()	Alexey Kodanev	1	-1/+6
	When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. 
As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size is bigger than SCTP_MAX_CHUNK_LEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
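The length check described above can be illustrated with a small userspace sketch. It is not the kernel patch itself: the constant EX_MAX_CHUNK_LEN, the 4-byte header size, the padding to a multiple of 4 and the function name are assumptions made for the example. The idea shown is to compute the padded chunk length in a wide type and reject it before it is ever stored in a 16-bit length field.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define EX_CHUNK_HDR_LEN  4u                                  /* type, flags, 16-bit length */
#define EX_MAX_CHUNK_LEN  ((size_t)65536 - sizeof(uint32_t))  /* 65532 */

/* Round len up to the next multiple of 4. */
static size_t ex_pad4(size_t len)
{
	return (len + 3u) & ~(size_t)3u;
}

/* Return 1 if a chunk carrying paylen payload bytes fits the limit, else 0.
 * The sum is computed in size_t so an oversized payload cannot silently
 * wrap around the way it would in a u16 length field. */
static int ex_chunk_len_ok(size_t paylen)
{
	size_t chunklen = ex_pad4(EX_CHUNK_HDR_LEN + paylen);

	return chunklen <= EX_MAX_CHUNK_LEN;
}
```

With these assumed values, a payload of 120156 bytes (the "put" size in the trace above) is rejected up front, while ordinary small payloads pass.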
2018-02-09	s390/qeth: fix SETIP command handling	Julian Wiedmann	2	-6/+13
	send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses all command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	s390/qeth: fix underestimated count of buffer elements	Ursula Braun	1	-1/+1
	For a memory range/skb where the last byte falls onto a page boundary (i.e. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select an IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
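The off-by-one can be reproduced with a few lines of userspace arithmetic. The log does not show the driver's expression, so the form PFN_UP(end - 1) used for the faulty variant is an assumption, as are the 4 KiB page size, the EX_ macro names and the function names. Here 'end' is one past the last byte of the range.

```c
/* Assumed 4 KiB pages; the macros mirror the usual PFN helpers. */
#define EX_PAGE_SHIFT  12
#define EX_PAGE_SIZE   (1UL << EX_PAGE_SHIFT)
#define EX_PFN_UP(x)   (((x) + EX_PAGE_SIZE - 1) >> EX_PAGE_SHIFT)
#define EX_PFN_DOWN(x) ((x) >> EX_PAGE_SHIFT)

/* Assumed faulty form: undercounts when the last byte sits on a page boundary. */
static unsigned long ex_pages_faulty(unsigned long start, unsigned long end)
{
	return EX_PFN_UP(end - 1) - EX_PFN_DOWN(start);
}

/* Rounding 'end' itself up counts every page the range touches. */
static unsigned long ex_pages_fixed(unsigned long start, unsigned long end)
{
	return EX_PFN_UP(end) - EX_PFN_DOWN(start);
}
```

For start = 0x1ff0 and end = 0x2001 the range covers bytes 0x1ff0 through 0x2000 and therefore touches two pages; the faulty form yields 1 and the fixed form yields 2. For ranges whose last byte is not on a page boundary, both forms agree.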
2018-02-09	ptr_ring: try vmalloc() when kmalloc() fails	Jason Wang	1	-5/+8
	This patch switches to kvmalloc_array(), which falls back to vmalloc() in case kmalloc() fails. Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE	Jason Wang	1	-0/+2
	To keep slab from warning about an exceeded size, fail early if the queue occupies more than KMALLOC_MAX_SIZE. Reported-by: syzbot+e4d4f9ddd4295539735d@syzkaller.appspotmail.com Fixes: 2e0ab8ca83c12 ("ptr_ring: array based FIFO for pointers") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	net: stmmac: remove redundant enable of PMT irq	Niklas Cassel	2	-4/+1
	For dwmac4, GMAC_INT_DEFAULT_ENABLE already includes GMAC_INT_PMT_EN, so it is redundant to check if hw->pmt is set, and if so, setting the bit again. For dwmac1000, GMAC_INT_DEFAULT_MASK does not include GMAC_INT_DISABLE_PMT, so it is redundant to check if hw->pmt is set, and if so, clearing an already cleared bit. Improve code readability by removing this redundant code. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09	net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4	Niklas Cassel	2	-3/+3
	GMAC_INT_DEFAULT_MASK is written to the interrupt enable register. In previous versions of the IP (e.g. dwmac1000), this register was instead an interrupt mask register. To improve clarity and reflect reality, rename GMAC_INT_DEFAULT_MASK to GMAC_INT_DEFAULT_ENABLE. Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>