path: root/drivers/net/ethernet
Commit message [Author, Age, Files, Lines]
* ionic: add support for ethtool extended stat link_down_count [Nitya Sunkad, 2023-06-12, 3 files, -0/+12]
    Following the example of 'commit 9a0f830f8026 ("ethtool: linkstate: add a statistic for PHY down events")', added support for link down events.

    Add callback ionic_get_link_ext_stats to ionic_ethtool.c to support link_down_count, a property of netdev that gets reported exclusively on physical link down events.

    Run ethtool -I <devname> to display the device link down count.

    Signed-off-by: Nitya Sunkad <nitya.sunkad@amd.com>
    Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue [David S. Miller, 2023-06-12, 4 files, -48/+84]
    Tony Nguyen says:

    ====================
    ice: Improve miscellaneous interrupt code

    Jacob Keller says:

    This series improves the driver's use of the threaded IRQ and the communication between ice_misc_intr() and the ice_misc_intr_thread_fn() which was previously introduced by commit 1229b33973c7 ("ice: Add low latency Tx timestamp read").

    First, a new custom enumerated return value is used instead of a boolean for ice_ptp_process_ts(). This significantly reduces the cognitive burden when reviewing the logic for this function, as the expected action is clear from the return value name.

    Second, the unconditional loop in ice_misc_intr_thread_fn() is removed, replacing it with a write to the Other Interrupt Cause register. This causes the MAC to trigger the Tx timestamp interrupt again. This makes it possible to safely use the ice_misc_intr_thread_fn() to handle other tasks beyond just the Tx timestamps. It is also easier to reason about since the thread function will exit cleanly if we do something like disable the interrupt and call synchronize_irq().

    Third, refactor the handling for external timestamp events to use the miscellaneous thread function. This resolves an issue with the external time stamps getting blocked while processing the periodic work function task.

    Fourth, a simplification of the ice_misc_intr() function to always return IRQ_WAKE_THREAD, and schedule the ice service task in the ice_misc_intr_thread_fn() instead.

    Finally, the Other Interrupt Cause is kept disabled over the thread function processing, rather than immediately re-enabled.

    Special thanks to Michal Schmidt for the careful review of the series and pointing out my misunderstandings of the kernel IRQ code.

    It has been determined that the race outlined as being fixed in previous series was actually introduced by this series itself, which I've since corrected.
    ====================

    Signed-off-by: David S. Miller <davem@davemloft.net>
| * ice: do not re-enable miscellaneous interrupt until thread_fn completes [Jacob Keller, 2023-06-08, 1 file, -4/+5]
    The ice driver uses threaded IRQ for managing Tx timestamps via the devm_request_threaded_irq() interface. The ice_misc_intr() handler function is responsible for processing the hard interrupt context, and can wake the ice_misc_intr_thread_fn() by returning IRQ_WAKE_THREAD.

    The request_threaded_irq() function comment says:

        @handler is still called in hard interrupt context and has to check whether the interrupt originates from the device. If yes, it needs to disable the interrupt on the device and return IRQ_WAKE_THREAD which will wake up the handler thread and run the @thread_fn.

    We currently re-enable the Other Interrupt Cause Register (OICR) at the end of ice_misc_intr(). In practice, this seems to be ok, but it can make communicating between the handler function and the thread function difficult. This is because the interrupt can trigger again while the thread function is still processing.

    Move the OICR update to the end of the thread function, leaving the other interrupt cause disabled in hardware until we complete one pass of the thread function. This prevents the miscellaneous interrupt from firing until after we finish the thread function.

    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling [Jacob Keller, 2023-06-08, 1 file, -2/+9]
    In ice_misc_intr_thread_fn(), if we do not complete all Tx timestamp work, the thread function will poll continuously forever.

    For E822 hardware, this wastes time as the return value from ice_ptp_process_ts() is accurate and always reports correctly that the PHY actually has new timestamp data.

    In addition, if we receive enough timestamps with the right pacing, we may never exit this polling. Should this occur, other tasks handled by the ice_misc_intr_thread_fn() will never be processed.

    Fix this by instead writing to PFINT_OICR, causing an emulated interrupt to be triggered immediately. This does take slightly more processing than just re-checking the timestamps. However, it allows all of the other interrupt causes a chance to be processed first in the hard IRQ function.

    Note that the OICR interrupt is configured to be throttled to no more than once every 124 microseconds. This gives an effective interrupt rate of ~8000 interrupts per second. This should thus not cause a significant increase in overall CPU usage when compared to sleeping.

    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
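The throttling figure quoted in the message above can be checked with one line of arithmetic. This is only an illustrative sketch: the 124 microsecond interval comes from the commit message, while the function name is hypothetical and not part of the ice driver.

```python
def max_interrupts_per_second(min_interval_us: float) -> float:
    """Upper bound on the interrupt rate for a given minimum spacing in microseconds."""
    return 1_000_000 / min_interval_us


# 124 us is the throttle interval quoted in the commit message.
rate = max_interrupts_per_second(124)
print(f"at most {rate:.0f} interrupts per second")
```

The result is a little above 8000, which matches the "~8000 interrupts per second" stated in the message.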
| * ice: introduce ICE_TX_TSTAMP_WORK enumeration [Jacob Keller, 2023-06-08, 3 files, -22/+42]
    The ice_ptp_process_ts() function and its various helper functions return a boolean value indicating whether any work is remaining. This use of a boolean has grown confusing as we have multiple helpers that pass status between each other. Readers must be aware of what "true" and "false" mean, and it is very easy to get their meaning inverted. The names of the functions are not standard "yes/no" questions, which is the best practice for boolean returns.

    Replace this use of a boolean with a custom type, enum ice_tx_tstamp_work. This enumeration clearly indicates whether all work is done, or if more work is pending.

    To aid in readability, factor the actual list iteration and processing out into ice_ptp_process_tx_tstamp(), making it void. Then call this in ice_ptp_tx_tstamp() ensuring that we always check the Tracker list at the end when determining the appropriate return value.

    Now the return value is an explicit name instead of the true or false value. This is easier to follow and makes reading the resulting callers much simpler.

    In addition, this paves the way for future work to allow E822 hardware to process timestamps for all functions using a single interrupt on the clock owning PF.

    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
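The readability argument in the message above can be illustrated with a small userspace model. This is a sketch in Python rather than the driver's C: the class, its members, and the function below are hypothetical stand-ins for `enum ice_tx_tstamp_work` and its users, and the list-based tracker is an assumption made for illustration.

```python
from enum import Enum
from typing import List, Set


class TxTstampWork(Enum):
    """Hypothetical stand-in for the C type `enum ice_tx_tstamp_work`."""
    DONE = 0     # every outstanding timestamp request was handled
    PENDING = 1  # at least one request is still waiting for hardware


def process_tx_tstamps(tracker: List[str], ready: Set[str]) -> TxTstampWork:
    """Complete the requests whose timestamps are ready, then report what is left."""
    # Drop every request that hardware has already timestamped.
    tracker[:] = [req for req in tracker if req not in ready]
    # Always inspect the tracker at the end to choose the return value.
    return TxTstampWork.PENDING if tracker else TxTstampWork.DONE
```

A caller can then write `if process_tx_tstamps(...) is TxTstampWork.PENDING:` and the intent is visible at the call site, which is the benefit the message attributes to the named return value.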
| * ice: always return IRQ_WAKE_THREAD in ice_misc_intr() [Karol Kolacinski, 2023-06-08, 1 file, -10/+4]
    Refactor the ice_misc_intr() function to always return IRQ_WAKE_THREAD, and schedule the service task during the soft IRQ thread function instead of at the end of the hard IRQ handler.

    Remove the duplicate call to ice_service_task_schedule() that happened when we got a PCI exception.

    Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
| * ice: handle extts in the miscellaneous interrupt thread [Karol Kolacinski, 2023-06-08, 4 files, -19/+33]
    The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both scheduled on the same kthread worker, pf.ptp.kworker. The ice_ptp_periodic_work() function sends messages to the firmware to interact with the PHY, and must block to wait for responses.

    This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt cause, ultimately resulting in disruption to processing an input signal if the frequency is high enough. In our testing, even 100 Hz signals get disrupted.

    Fix this by instead processing the signal inside the miscellaneous interrupt thread prior to handling Tx timestamps.

    Use atomic bits in a new pf->misc_thread bitmap in order to safely communicate which tasks require processing within the ice_misc_intr_thread_fn(). This ensures the communication of desired tasks from the ice_misc_intr() are correctly processed without racing even in the event that the interrupt triggers again before the thread function exits.

    Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
    Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
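The pending-work bitmap described above can be modelled in a few lines of userspace code. This is a rough sketch and not the driver's implementation: the class, the bit names, and the use of a lock in place of the kernel's atomic bit operations are all assumptions made for illustration.

```python
import threading


class MiscThreadBits:
    """Toy model of a pending-work bitmap shared by an IRQ handler and its thread."""

    EXTTS_EVENT = 0  # hypothetical bit: external timestamp event pending
    TX_TSTAMP = 1    # hypothetical bit: Tx timestamp work pending

    def __init__(self):
        self._bits = 0
        self._lock = threading.Lock()  # stands in for atomic bit operations

    def set_bit(self, bit):
        with self._lock:
            self._bits |= 1 << bit

    def test_and_clear_bit(self, bit):
        """Return whether the bit was set, clearing it in the same step."""
        with self._lock:
            was_set = bool(self._bits & (1 << bit))
            self._bits &= ~(1 << bit)
            return was_set


def hard_irq(bits, causes):
    """Hard IRQ side: only record which causes fired."""
    for cause in causes:
        bits.set_bit(cause)


def thread_fn(bits):
    """Thread side: consume pending work, external timestamps before Tx timestamps."""
    handled = []
    if bits.test_and_clear_bit(MiscThreadBits.EXTTS_EVENT):
        handled.append("extts")
    if bits.test_and_clear_bit(MiscThreadBits.TX_TSTAMP):
        handled.append("tx_tstamp")
    return handled
```

Because each bit is tested and cleared in one step, a cause recorded while the thread function is running is not lost: it stays set and is picked up on the next pass.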
* | net: renesas: rswitch: Use hardware pause features [Yoshihiro Shimoda, 2023-06-10, 2 files, -22/+21]
    Since this driver used the "global rate limiter" feature of GWCA, the TX performance of each port was reduced when multiple ports transmitted frames simultaneously. To improve performance, remove the use of the "global rate limiter" feature and use "hardware pause" features of the following:
    - "per priority pause" of GWCA
    - "global pause" of COMA

    Note that these features are not related to the ethernet PAUSE frame.

    Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: renesas: rswitch: Use napi_gro_receive() in RX [Yoshihiro Shimoda, 2023-06-10, 1 file, -1/+1]
    This hardware can receive multiple frames, so using napi_gro_receive() instead of netif_receive_skb() improves RX performance.

    Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
    Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | sfc: generate encap headers for TC offload [Edward Cree, 2023-06-10, 1 file, -9/+185]
    Support constructing VxLAN and GENEVE headers, on either IPv4 or IPv6, using the neighbouring information obtained in encap->neigh to populate the Ethernet header.

    Note that the ef100 hardware does not insert UDP checksums when performing encap, so for IPv6 the remote endpoint will need to be configured with udp6zerocsumrx or equivalent.

    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sfc: neighbour lookup for TC encap action offload [Edward Cree, 2023-06-10, 8 files, -6/+569]
    For each neighbour we're interested in, create a struct efx_neigh_binder object which has a list of all the encap_actions using it. When we receive a neighbouring update (through the netevent notifier), find the corresponding efx_neigh_binder and update all its users.

    Since the actual generation of encap headers is still only a stub, the resulting rules still get left on fallback actions.

    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
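The binder relationship described above (one object per neighbour, holding every encap action that depends on it) can be sketched as a toy model. Everything here is hypothetical and written in Python for illustration only: the class names, the dictionary keyed by IP address, and the `dst_mac` field are assumptions, not the sfc driver's data structures.

```python
class EncapAction:
    """Toy encap action: it needs a destination MAC before it can be offloaded."""

    def __init__(self, name):
        self.name = name
        self.dst_mac = None  # unknown until the neighbour entry resolves


class NeighBinder:
    """Toy stand-in for struct efx_neigh_binder: one per neighbour of interest."""

    def __init__(self, ip):
        self.ip = ip
        self.users = []  # every encap action that depends on this neighbour

    def bind(self, action):
        self.users.append(action)

    def update(self, mac):
        # Propagate the new neighbour state to all users of this binder.
        for action in self.users:
            action.dst_mac = mac


binders = {}  # hypothetical lookup table, keyed by neighbour IP address


def on_neigh_update(ip, mac):
    """Model of the notifier path: find the binder, then update all its users."""
    binder = binders.get(ip)
    if binder is None:
        return False  # not a neighbour we are interested in
    binder.update(mac)
    return True
```

Keeping the list of users on the binder means a single neighbour event touches only the actions that actually depend on that neighbour.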
* | sfc: MAE functions to create/update/delete encap headers [Edward Cree, 2023-06-10, 2 files, -2/+95]
    Besides the raw header data, also pass the tunnel type, so that the hardware knows it needs to update the IP Total Length and UDP Length fields (and corresponding checksums) for each packet.

    Also, populate the ENCAP_HEADER_ID field in efx_mae_alloc_action_set() with the fw_id returned from efx_mae_allocate_encap_md().

    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sfc: add function to atomically update a rule in the MAE [Edward Cree, 2023-06-10, 2 files, -0/+24]
    efx_mae_update_rule() changes the action-set-list attached to an MAE flow rule in the Action Rule Table. We will use this when neighbouring updates change encap actions.

    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sfc: some plumbing towards TC encap action offload [Edward Cree, 2023-06-10, 5 files, -3/+284]
    Create software objects to manage the metadata for encap actions that can be attached to TC rules. However, since we don't yet have the neighbouring information (needed to generate the Ethernet header), all rules with encap actions are marked as "unready" and thus insert the fallback action into hardware rather than actually offloading the encapsulation action.

    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | sfc: add fallback action-set-lists for TC offload [Edward Cree, 2023-06-10, 2 files, -0/+77]
    When offloading a TC encap action, the action information for the hardware might not be "ready": if there's currently no neighbour entry available for the destination address, we can't construct the Ethernet header to prepend to the packet. In this case, we still offload the flow rule, but with its action-set-list ID pointing at a "fallback" action which simply delivers the packet to its default destination (as though no flow rule had matched), thus allowing software TC to handle it.

    Later, when we receive a neighbouring update that allows us to construct the encap header, the rule will become "ready" and we will update its action-set-list ID in hardware to point at the actual offloaded actions.

    This patch sets up these fallback ASLs, but does not yet use them.

    Reviewed-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com>
    Signed-off-by: Edward Cree <ecree.xilinx@gmail.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
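The "unready" to "ready" transition described above can be sketched as a tiny state model. This is an illustration in Python under stated assumptions: the constant, class, and method names are hypothetical, and the behaviour when a neighbour becomes invalid again is a guess that the message does not confirm.

```python
FALLBACK_ASL_ID = 0  # hypothetical ID of the "deliver to default destination" list


class FlowRule:
    """Toy flow rule whose hardware action-set-list (ASL) ID can be swapped."""

    def __init__(self, offload_asl_id):
        self.offload_asl_id = offload_asl_id
        # The rule is installed immediately, but starts on the fallback list
        # because no neighbour entry is known yet ("unready").
        self.hw_asl_id = FALLBACK_ASL_ID

    @property
    def ready(self):
        return self.hw_asl_id == self.offload_asl_id

    def neigh_update(self, neigh_valid):
        # A valid neighbour lets the encap header be built, so point the rule
        # at the real actions. Falling back again on an invalid neighbour is
        # an assumption made for this sketch.
        self.hw_asl_id = self.offload_asl_id if neigh_valid else FALLBACK_ASL_ID
```

The point of the design is that the match stays in hardware throughout; only the list of actions it points at changes.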
* | net: move gso declarations and functions to their own files [Eric Dumazet, 2023-06-10, 4 files, -0/+4]
    Move declarations into include/net/gso.h and code into net/core/gso.c

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Stanislav Fomichev <sdf@google.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230608191738.3947077-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | chelsio/chtls: Use splice_eof() to flush [David Howells, 2023-06-09, 3 files, -0/+11]
    Allow splice to end a Chelsio TLS record after prematurely ending a splice/sendfile due to getting an EOF condition (->splice_read() returned 0) after splice had called sendmsg() with MSG_MORE set when the user didn't set MSG_MORE.

    Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
    Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Ayush Sawal <ayush.sawal@chelsio.com>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | Merge tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux [Jakub Kicinski, 2023-06-09, 14 files, -63/+111]
    Saeed Mahameed says:

    ====================
    mlx5-updates-2023-06-06

    1) Support 4 ports VF LAG, part 2/2
    2) Few extra trivial cleanup patches

    Shay Drory says:

    ================
    Support 4 ports VF LAG, part 2/2

    This series continues the series[1] "Support 4 ports VF LAG, part1/2".

    This series adds support for 4 ports VF LAG (single FDB E-Switch).

    This series of patches refactors LAG code that makes assumptions about VF LAG supporting only two ports and then enables 4 ports VF LAG.

    Patch 1:
    - Fix for ib rep code
    Patches 2-5:
    - Refactors LAG layer.
    Patches 6-7:
    - Block LAG types which don't support 4 ports.
    Patch 8:
    - Enable 4 ports VF LAG.

    This series specifically allows HCAs with 4 ports to create a VF LAG with only 4 ports. It is not possible to create a VF LAG with 2 or 3 ports using HCAs that have 4 ports.

    Currently, the Merged E-Switch feature only supports HCAs with 2 ports. However, upcoming patches will introduce support for HCAs with 4 ports.

    In order to activate VF LAG a user can execute:

        devlink dev eswitch set pci/0000:08:00.0 mode switchdev
        devlink dev eswitch set pci/0000:08:00.1 mode switchdev
        devlink dev eswitch set pci/0000:08:00.2 mode switchdev
        devlink dev eswitch set pci/0000:08:00.3 mode switchdev
        ip link add name bond0 type bond
        ip link set dev bond0 type bond mode 802.3ad
        ip link set dev eth2 master bond0
        ip link set dev eth3 master bond0
        ip link set dev eth4 master bond0
        ip link set dev eth5 master bond0

    Where eth2, eth3, eth4 and eth5 are net-interfaces of pci/0000:08:00.0 pci/0000:08:00.1 pci/0000:08:00.2 pci/0000:08:00.3 respectively.

    User can verify LAG state and type via debugfs:

        /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/state
        /sys/kernel/debug/mlx5/0000\:08\:00.0/lag/type

    [1] https://lore.kernel.org/netdev/20230601060118.154015-1-saeed@kernel.org/T/#mf1d2083780970ba277bfe721554d4925f03f36d1
    ================

    * tag 'mlx5-updates-2023-06-06' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
      net/mlx5e: simplify condition after napi budget handling change
      mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager
      net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure
      net/mlx5e: TC, refactor access to hash key
      net/mlx5e: Remove RX page cache leftovers
      net/mlx5e: Expose catastrophic steering error counters
      net/mlx5: Enable 4 ports VF LAG
      net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports
      net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports
      net/mlx5: LAG, change mlx5_shared_fdb_supported() to static
      net/mlx5: LAG, generalize handling of shared FDB
      net/mlx5: LAG, check if all eswitches are paired for shared FDB
      {net/RDMA}/mlx5: introduce lag_for_each_peer
      RDMA/mlx5: Free second uplink ib port
    ====================

    Link: https://lore.kernel.org/r/20230607210410.88209-1-saeed@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | net/mlx5e: simplify condition after napi budget handling change [Jakub Kicinski, 2023-06-07, 1 file, -1/+1]
    Since a recent commit, budget can't be 0 here.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | mlx5/core: E-Switch, Allocate ECPF vport if it's an eswitch manager [Bodong Wang, 2023-06-07, 1 file, -1/+2]
    Eswitch vport is needed for eswitch manager when creating LAG, to create egress rules. However, this was not handled when ECPF is an eswitch manager.

    Signed-off-by: Bodong Wang <bodong@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: Skip inline mode check after mlx5_eswitch_enable_locked() failure [Jiri Pirko, 2023-06-07, 1 file, -1/+2]
    Commit bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode") added inline mode checking to esw_offloads_start() with a warning printed out in case there is a problem. The inline mode checking was done even after the mlx5_eswitch_enable_locked() call failed, which is pointless.

    Later on, commit 8c98ee77d911 ("net/mlx5e: E-Switch, Add extack messages to devlink callbacks") converted the error/warning prints to extack setting, which caused the inline mode check error to overwrite a possible previous extack message when mlx5_eswitch_enable_locked() failed. The user then gets a confusing error message.

    Fix this by skipping the inline mode check after the mlx5_eswitch_enable_locked() call failed.

    Signed-off-by: Jiri Pirko <jiri@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5e: TC, refactor access to hash key [Oz Shlomo, 2023-06-07, 1 file, -6/+3]
    Currently, a temp object is filled and used as a key for rhashtable_lookup. Lookups will only work while the key remains the first attribute in the relevant rhashtable node object.

    Fix this by passing a key, instead of an object containing the key.

    Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
    Reviewed-by: Paul Blakey <paulb@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5e: Remove RX page cache leftovers [Tariq Toukan, 2023-06-07, 1 file, -7/+0]
    Remove unused definitions left after the removal of the RX page cache feature.

    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5e: Expose catastrophic steering error counters [Lama Kayal, 2023-06-07, 1 file, -0/+10]
    Add generated_pkt_steering_fail and handled_pkt_steering_fail to the devlink health reporter.

    generated_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation within the vport steering domain.

    handled_pkt_steering_fail indicates the number of packets dropped due to illegal steering operation, originated by the vport.

    Also, update devlink reporter functionality documentation with the newly exposed counters.

    Signed-off-by: Lama Kayal <lkayal@nvidia.com>
    Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: Enable 4 ports VF LAG [Shay Drory, 2023-06-07, 3 files, -5/+6]
    Now that all preparations are done, enable 4 ports VF LAG.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: LAG, block multiport eswitch LAG in case ldev have more than 2 ports [Shay Drory, 2023-06-07, 1 file, -0/+4]
    Multiport eswitch LAG is not supported over more than two ports. Add a check in order to block multiport eswitch LAG over such devices.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: LAG, block multipath LAG in case ldev have more than 2 ports [Shay Drory, 2023-06-07, 1 file, -0/+4]
    Multipath LAG is not supported over more than two ports. Add a check in order to block multipath LAG over such configurations.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: LAG, change mlx5_shared_fdb_supported() to static [Shay Drory, 2023-06-07, 2 files, -2/+1]
    mlx5_shared_fdb_supported() is used only in a single file. Change the function to be static.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: LAG, generalize handling of shared FDB [Shay Drory, 2023-06-07, 1 file, -28/+38]
    Shared FDB handling is using the assumption that shared FDB can only be created from two devices. In order to support shared FDB of more than two devices, iterate over all LAG ports instead of hard coding only the first two LAG ports whenever handling shared FDB.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | net/mlx5: LAG, check if all eswitches are paired for shared FDB [Shay Drory, 2023-06-07, 2 files, -1/+12]
    Shared FDB LAG can only work if all eswitches are paired. Also, whenever two eswitches are paired, devcom is marked as ready. Therefore, in the case of a device with two eswitches, checking devcom was sufficient. However, this is not correct for a device with more than two eswitches, which will be introduced in a downstream patch. Hence, explicitly check that all eswitches are paired.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
| * | {net/RDMA}/mlx5: introduce lag_for_each_peer [Shay Drory, 2023-06-07, 2 files, -14/+31]
    Introduce a generic API to iterate over all the devices which are part of the LAG. This API replaces mlx5_lag_get_peer_mdev(), which retrieves only a single peer device from the LAG.

    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reviewed-by: Mark Bloch <mbloch@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
* | | net: fman_memac: use pcs-lynx's check for fwnode availability [Russell King (Oracle), 2023-06-09, 1 file, -1/+1]
    Use pcs-lynx's check rather than our own when determining if the device is available. This fixes a bug where the reference gained by of_parse_phandle() is not dropped if the device is not available.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: dpaa2: use pcs-lynx's check for fwnode availability [Russell King (Oracle), 2023-06-09, 1 file, -6/+5]
    Use pcs-lynx's check rather than our own when determining if the device is available.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: fman_memac: use lynx_pcs_create_fwnode() [Russell King (Oracle), 2023-06-09, 1 file, -9/+4]
    Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: dpaa2-mac: use lynx_pcs_create_fwnode() [Russell King (Oracle), 2023-06-09, 1 file, -8/+10]
    Use lynx_pcs_create_fwnode() to create a lynx PCS from a fwnode handle.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: fman_memac: allow lynx PCS to handle mdiodev lifetime [Russell King (Oracle), 2023-06-09, 1 file, -6/+1]
    Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev it's using.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: dpaa2-mac: allow lynx PCS to manage mdiodev lifetime [Russell King (Oracle), 2023-06-09, 1 file, -4/+1]
    Put the mdiodev after lynx_pcs_create() so that the Lynx PCS driver can manage the lifetime of the mdiodev it's using.

    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | net: pch_gbe: Allow build on MIPS_GENERIC kernel [Jiaxun Yang, 2023-06-09, 1 file, -1/+1]
    The MIPS Boston board, which uses the MIPS_GENERIC kernel, has an EG20T PCH and thus needs this driver.

    The dependency of PCH_GBE, PTP_1588_CLOCK_PCH, is also fixed for MIPS_GENERIC.

    Note that CONFIG_PCH_GBE has been selected in arch/mips/configs/generic/board-boston.config for a while, but somehow it was never wired up in Kconfig.

    Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230607055953.34110-1-jiaxun.yang@flygoat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | mlxsw: spectrum_nve_vxlan: Fix unsupported flag regression [Ido Schimmel, 2023-06-09, 1 file, -2/+4]
    The recently added 'VXLAN_F_LOCALBYPASS' flag is set by default on VXLAN devices and denotes a behavior that is irrelevant for the hardware data path. Add it to the lists of IPv4 and IPv6 supported flags to avoid rejecting offload of VXLAN devices which have this flag set.

    Fixes: 69474a8a5837 ("net: vxlan: Add nolocalbypass option to vxlan.")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/5533e63643bf719bbe286fef60f749c9cad35005.1686139716.git.petrm@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net [Jakub Kicinski, 2023-06-08, 19 files, -108/+144]
    Cross-merge networking fixes after downstream PR.

    Conflicts:

    net/sched/sch_taprio.c
      d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping")
      dced11ef84fb ("net/sched: taprio: don't overwrite "sch" variable in taprio_dump_class_stats()")

    net/ipv4/sysctl_net_ipv4.c
      e209fee4118f ("net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294")
      ccce324dabfe ("tcp: make the first N SYN RTO backoffs linear")
      https://lore.kernel.org/all/20230605100816.08d41a7b@canb.auug.org.au/

    No adjacent changes.

    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | bnxt_en: Implement .set_port / .unset_port UDP tunnel callbacks [Somnath Kotur, 2023-06-08, 1 file, -7/+18]
    As per the new udp tunnel framework, drivers which need to know the details of a port entry (i.e. port type) when it gets deleted should use the .set_port / .unset_port callbacks.

    Implementing the current .udp_tunnel_sync callback would mean that the deleted tunnel port entry would be all zeros. This used to work on older firmware because it would not check the input when deleting a tunnel port. With newer firmware, the delete will now fail and subsequent tunnel port allocation will fail as a result.

    Fixes: 442a35a5a7aa ("bnxt: convert to new udp_tunnel_nic infra")
    Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
    Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Signed-off-by: Michael Chan <michael.chan@broadcom.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | bnxt_en: Prevent kernel panic when receiving unexpected PHC_UPDATE eventPavan Chebbi2023-06-082-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware can send PHC_RTC_UPDATE async event on a PF that may not have PTP registered. In such a case, there will be a null pointer dereference for bp->ptp_cfg when we try to handle the event. Fix it by not registering for this event with the firmware if !bp->ptp_cfg. Also, check that bp->ptp_cfg is valid before proceeding when we receive the event. Fixes: 8bcf6f04d4a5 ("bnxt_en: Handle async event when the PHC is updated in RTC mode") Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
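The two halves of the fix described above (do not register for the event without PTP state, and still guard the handler) can be sketched in plain C. This is an illustrative model only, not the bnxt_en code: struct toy_dev, struct toy_ptp_cfg, toy_want_phc_event() and toy_handle_phc_update() are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real driver keeps this state in bp->ptp_cfg. */
struct toy_ptp_cfg {
	unsigned long long rtc_ns;
};

struct toy_dev {
	struct toy_ptp_cfg *ptp_cfg; /* NULL when PTP was never registered */
};

/* First half of the fix: only ask for the event if PTP state exists. */
static bool toy_want_phc_event(const struct toy_dev *dev)
{
	assert(dev != NULL);
	return dev->ptp_cfg != NULL;
}

/*
 * Second half: the event may still arrive, so check before dereferencing.
 * Returns true if the update was applied, false if it was ignored.
 */
static bool toy_handle_phc_update(struct toy_dev *dev, unsigned long long ns)
{
	assert(dev != NULL);
	if (!dev->ptp_cfg)
		return false;
	dev->ptp_cfg->rtc_ns = ns;
	return true;
}
```

Guarding both places is deliberate in this sketch: skipping registration avoids the event in the common case, while the handler check covers an event that arrives regardless.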
| * | bnxt_en: Skip firmware fatal error recovery if chip is not accessibleVikas Gupta2023-06-081-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver starts firmware fatal error recovery by detecting heartbeat failure or fw reset count register changing. But these checks are not reliable if the device is not accessible. This can happen while DPC (Downstream Port containment) is in progress. Skip firmware fatal recovery if pci_device_is_present() returns false. Fixes: acfb50e4e773 ("bnxt_en: Add FW fatal devlink_health_reporter.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | bnxt_en: Query default VLAN before VNIC setup on a VFSomnath Kotur2023-06-081-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to call bnxt_hwrm_func_qcfg() on a VF to query the default VLAN that may be set up by the PF. If a default VLAN is enabled, the VF cannot support VLAN acceleration on the receive side and the VNIC must be set up to strip out the default VLAN tag. If a default VLAN is not enabled, the VF can support VLAN acceleration on the receive side. The VNIC should be set up to strip or not strip the VLAN based on the RX VLAN acceleration setting. Without this call to determine the default VLAN before calling bnxt_setup_vnic(), the VNIC may not be set up correctly. For example, bnxt_setup_vnic() may set up to strip the VLAN tag based on stale default VLAN information. If RX VLAN acceleration is not enabled, the VLAN tag will be incorrectly stripped and the RX data path will not work correctly. Fixes: cf6645f8ebc6 ("bnxt_en: Add function for VF driver to query default VLAN.") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
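The strip decision described in the message above reduces to a small truth table. The following C sketch models it under stated assumptions: struct toy_vf_state and toy_vnic_should_strip() are hypothetical names, and the sketch says nothing about how the driver stores or refreshes these values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the VF state the commit message refers to. */
struct toy_vf_state {
	bool default_vlan_enabled; /* fresh value queried from the PF */
	bool rx_vlan_accel;        /* RX VLAN acceleration setting */
};

/* Decide whether the VNIC should strip the VLAN tag on receive. */
static bool toy_vnic_should_strip(const struct toy_vf_state *vf)
{
	assert(vf != NULL);
	/* A default VLAN must always be stripped. */
	if (vf->default_vlan_enabled)
		return true;
	/* Otherwise follow the RX VLAN acceleration setting. */
	return vf->rx_vlan_accel;
}
```

The failure mode in the message corresponds to calling such a function with a stale default_vlan_enabled value of true while rx_vlan_accel is false: the tag is stripped although nothing asked for it.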
| * | bnxt_en: Don't issue AP reset during ethtool's reset operationSreekanth Reddy2023-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only older NIC controller's firmware uses the PROC AP reset type. Firmware on 5731X/5741X and newer chips does not support this reset type. When bnxt_reset() issues a series of resets, this PROC AP reset may actually fail on these newer chips because the firmware is not ready to accept this unsupported command yet. Avoid this unnecessary error by skipping this reset type on chips that don't support it. Fixes: 7a13240e3718 ("bnxt_en: fix ethtool_reset_flags ABI violations") Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | bnxt_en: Fix bnxt_hwrm_update_rss_hash_cfg()Pavan Chebbi2023-06-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must specify the vnic id of the vnic in the input structure of this firmware message. Otherwise we will get an error from the firmware. Fixes: 98a4322b70e8 ("bnxt_en: update RSS config using difference algorithm") Reviewed-by: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
| * | net: bcmgenet: Fix EEE implementationFlorian Fainelli2023-06-083-14/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had a number of shortcomings: - EEE must be re-evaluated whenever the state machine detects a link change, as we might be switching from a link partner with EEE enabled/disabled - tx_lpi_enabled controls whether EEE should be enabled/disabled for the transmit path, which applies to the TBUF block - We do not need to forcibly enable EEE upon system resume, as the PHY state machine will trigger a link event that will do that, too Fixes: 6ef398ea60d9 ("net: bcmgenet: add EEE support") Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20230606214348.2408018-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | eth: ixgbe: fix the wake conditionJakub Kicinski2023-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Flip the netif_carrier_ok() condition in queue wake logic. When I moved it to inside __netif_txq_completed_wake() I missed negating it. This made the condition ineffective and could probably lead to crashes. Fixes: 301f227fc860 ("net: piggy back on the memory barrier in bql when waking queues") Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230607010826.960226-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
| * | eth: bnxt: fix the wake conditionJakub Kicinski2023-06-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The down condition should be the negation of the wake condition, IOW when I moved it from: if (cond && wake()) to if (__netif_txq_completed_wake(cond)) Cond should have been negated. Flip it now. This bug leads to occasional crashes with netconsole. It may also lead to queue never waking up in case BQL is not enabled. Reported-by: David Wei <davidhwei@meta.com> Fixes: 08a096780d92 ("bnxt: use new queue try_stop/try_wake macros") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20230607010826.960226-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
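The negation described above can be illustrated with a toy model in C. This is a sketch, not the kernel helper: struct toy_txq, toy_wake_old() and toy_completed_wake() are hypothetical, and the only property taken from the commit message is that the new-style helper receives a "down" condition (a reason not to wake) where the old code tested a "wake" condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a TX queue. */
struct toy_txq {
	bool stopped;
};

/* Old pattern: if (cond && wake()), where cond is a reason TO wake. */
static bool toy_wake_old(struct toy_txq *q, bool wake_cond)
{
	assert(q != NULL);
	if (wake_cond && q->stopped) {
		q->stopped = false;
		return true;
	}
	return false;
}

/* New-style pattern: the argument is a reason NOT to wake. */
static bool toy_completed_wake(struct toy_txq *q, bool down_cond)
{
	assert(q != NULL);
	if (down_cond || !q->stopped)
		return false;
	q->stopped = false;
	return true;
}
```

In this model, toy_wake_old(q, cond) and toy_completed_wake(q, !cond) behave identically, while passing cond through unchanged inverts the behavior: the queue is woken exactly when it should stay stopped and left stopped when it should be woken.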
| * | ice: make writes to /dev/gnssX synchronousMichal Schmidt2023-06-074-72/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current ice driver's GNSS write implementation buffers writes and works through them asynchronously in a kthread. That's bad because: - The GNSS write_raw operation is supposed to be synchronous[1][2]. - There is no upper bound on the number of pending writes. Userspace can submit writes much faster than the driver can process, consuming unlimited amounts of kernel memory. A patch that's currently on review[3] ("[v3,net] ice: Write all GNSS buffers instead of first one") would add one more problem: - The possibility of waiting for a very long time to flush the write work when doing rmmod, softlockups. To fix these issues, simplify the implementation: Drop the buffering, the write_work, and make the writes synchronous. I tested this with gpsd and ubxtool. [1] https://events19.linuxfoundation.org/wp-content/uploads/2017/12/The-GNSS-Subsystem-Johan-Hovold-Hovold-Consulting-AB.pdf "User interface" slide. [2] A comment in drivers/gnss/core.c:gnss_write(): /* Ignoring O_NONBLOCK, write_raw() is synchronous. */ [3] https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20230217120541.16745-1-karol.kolacinski@intel.com/ Fixes: d6b98c8d242a ("ice: add write functionality for GNSS TTY") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>