linux - linux

	Commit message (Collapse)	Author	Files	Lines
2024-04-07	dmaengine: fsl-edma: add safety check for 'srcid'	Frank Li	1	-0/+7
	Ensure that 'srcid' is a non-zero value to avoid dtb passing invalid 'srcid' to the driver. Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240323-8ulp_edma-v3-2-c0e981027c05@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: fsl-edma: remove 'slave_id' from fsl_edma_chan	Frank Li	3	-8/+7
	The 'slave_id' field is redundant as it duplicates the functionality of 'srcid'. Remove 'slave_id' from fsl_edma_chan to eliminate redundancy. Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240323-8ulp_edma-v3-1-c0e981027c05@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dma: dw-axi-dmac: support per channel interrupt	Baruch Siach	2	-8/+23
	Hardware might not support a single combined interrupt that covers all channels. In that case we have to deal with interrupt per channel. Add support for that configuration. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Link: https://lore.kernel.org/r/ebab52e886ef1adc3c40e636aeb1ba3adfe2e578.1711453387.git.baruchs-c@neureality.ai Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	Avoid hw_desc array overrun in dw-axi-dmac	Joao Pinto	2	-4/+3
	I have a use case where nr_buffers = 3 and in which each descriptor is composed by 3 segments, resulting in the DMA channel descs_allocated to be 9. Since axi_desc_put() handles the hw_desc considering the descs_allocated, this scenario would result in a kernel panic (hw_desc array will be overrun). To fix this, the proposal is to add a new member to the axi_dma_desc structure, where we keep the number of allocated hw_descs (axi_desc_alloc()) and use it in axi_desc_put() to handle the hw_desc array correctly. Additionally I propose to remove the axi_chan_start_first_queued() call after completing the transfer, since it was identified that unbalance can occur (started descriptors can be interrupted and transfer ignored due to DMA channel not being enabled). Signed-off-by: Joao Pinto <jpinto@synopsys.com> Link: https://lore.kernel.org/r/1711536564-12919-1-git-send-email-jpinto@synopsys.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: dw-axi-dmac: Add support for StarFive JH8100 DMA	Tan Chun Hau	1	-0/+3
	JH8100 requires reset operation only in device probe. Signed-off-by: Tan Chun Hau <chunhau.tan@starfivetech.com> Link: https://lore.kernel.org/r/20240327025126.229475-3-chunhau.tan@starfivetech.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dt-bindings: dma: snps,dw-axi-dmac: Add JH8100 support	Tan Chun Hau	1	-0/+1
	Add support for StarFive JH8100 SoC in Sysnopsys Designware AXI DMA controller. Both JH8100 and JH7110 require reset operation in device probe. However, JH8100 doesn't need to apply different configuration on CH_CFG registers. Signed-off-by: Tan Chun Hau <chunhau.tan@starfivetech.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240327025126.229475-2-chunhau.tan@starfivetech.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: axi-dmac: move to device managed probe	Nuno Sa	1	-44/+34
	In axi_dmac_probe(), there's a mix in using device managed APIs and explicitly cleaning things in the driver .remove() hook. Move to use device managed APIs and thus drop the .remove() hook. Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240328-axi-dmac-devm-probe-v3-2-523c0176df70@analog.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: axi-dmac: fix possible race in remove()	Nuno Sa	1	-1/+1
	We need to first free the IRQ before calling of_dma_controller_free(). Otherwise we could get an interrupt and schedule a tasklet while removing the DMA controller. Fixes: 0e3b67b348b8 ("dmaengine: Add support for the Analog Devices AXI-DMAC DMA controller") Cc: stable@kernel.org Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20240328-axi-dmac-devm-probe-v3-1-523c0176df70@analog.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dma: Add lockdep asserts to virt-dma	Sean Anderson	1	-0/+10
	Add lockdep asserts to all functions with "vc.lock must be held by caller" in their documentation. This will help catch cases where these assumptions do not hold. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://lore.kernel.org/r/20240308210034.3634938-4-sean.anderson@linux.dev Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dma: xilinx_dpdma: Remove unnecessary use of irqsave/restore	Sean Anderson	1	-6/+4
	xilinx_dpdma_chan_done_irq and xilinx_dpdma_chan_vsync_irq are always called with IRQs disabled from xilinx_dpdma_irq_handler. Therefore we don't need to save/restore the IRQ flags. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://lore.kernel.org/r/20240308210034.3634938-3-sean.anderson@linux.dev Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: imx-sdma: support dual fifo for DEV_TO_DEV	Shengjiu Wang	1	-1/+17
	SSI and SPDIF are dual fifo interface, when support ASRC P2P with SSI and SPDIF, the src fifo or dst fifo number can be two. The p2p watermark level bit 13 and 14 are designed for these use case. This patch is to complete this function in driver. Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Signed-off-by: Joy Zou <joy.zou@nxp.com> Acked-by: Iuliana Prodan <iuliana.prodan@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240329-sdma_upstream-v4-3-daeb3067dea7@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: imx-sdma: Support 24bit/3bytes for sg mode	Shengjiu Wang	1	-0/+4
	Update 3bytes buswidth that is supported by sdma. Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Signed-off-by: Vipul Kumar <vipul_kumar@mentor.com> Signed-off-by: Srikanth Krishnakar <Srikanth_Krishnakar@mentor.com> Acked-by: Robin Gong <yibin.gong@nxp.com> Reviewed-by: Joy Zou <joy.zou@nxp.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240329-sdma_upstream-v4-2-daeb3067dea7@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: imx-sdma: Support allocate memory from internal SRAM (iram)	Nicolin Chen	1	-10/+36
	Allocate memory from SoC internal SRAM to reduce DDR access and keep DDR in lower power state (such as self-referesh) longer. Check iram_pool before sdma_init() so that ccb/context could be allocated from iram because DDR maybe in self-referesh in lower power audio case while sdma still running. Reviewed-by: Shengjiu Wang <shengjiu.wang@nxp.com> Signed-off-by: Nicolin Chen <b42378@freescale.com> Signed-off-by: Joy Zou <joy.zou@nxp.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20240329-sdma_upstream-v4-1-daeb3067dea7@nxp.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dt-bindings: dma: snps,dma-spear1340: Fix data{-,_}width schema	Rob Herring	1	-21/+21
	'data-width' and 'data_width' properties are defined as arrays, but the schema is defined as a matrix. That works currently since everything gets decoded in to matrices, but that is internal to dtschema and could change. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240401204354.1691845-1-robh@kernel.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: idma64: Add check for dma_set_max_seg_size	Chen Ni	1	-1/+3
	As the possible failure of the dma_set_max_seg_size(), it should be better to check the return value of the dma_set_max_seg_size(). Fixes: e3fdb1894cfa ("dmaengine: idma64: set maximum allowed segment size for DMA") Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20240403024932.3342606-1-nichen@iscas.ac.cn Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-04-07	dmaengine: idxd: Check for driver name match before sva user feature	Jerry Snitselaar	1	-8/+9
	Currently if the user driver is probed on a workqueue configured for another driver with SVA not enabled on the system, it will print out a number of probe failing messages like the following: [ 264.831140] user: probe of wq13.0 failed with error -95 On some systems, such as GNR, the number of messages can reach over 100. Move the SVA feature check to be after the driver name match check. Cc: Vinod Koul <vkoul@kernel.org> Cc: dmaengine@vger.kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> Link: https://lore.kernel.org/r/20240405213941.3629709-1-jsnitsel@redhat.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-03-28	MAINTAINERS: Drop Gustavo Pimentel as EDMA Reviewer	Manivannan Sadhasivam	1	-1/+0
	Gustavo Pimentel seems to have left Synopsys, so his email is bouncing. And there is no indication from him expressing willingness to continue contributing to the driver. So let's drop him from the MAINTAINERS entry. Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Link: https://lore.kernel.org/r/20240326085256.12639-1-manivannan.sadhasivam@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
2024-03-24	Linux 6.9-rc1v6.9-rc1	Linus Torvalds	1	-2/+2

2024-03-24	efi: fix panic in kdump kernel	Oleksandr Tymoshenko	1	-0/+2
	Check if get_next_variable() is actually valid pointer before calling it. In kdump kernel this method is set to NULL that causes panic during the kexec-ed kernel boot. Tested with QEMU and OVMF firmware. Fixes: bad267f9e18f ("efi: verify that variable services are supported") Signed-off-by: Oleksandr Tymoshenko <ovt@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24	x86/efistub: Don't clear BSS twice in mixed mode	Ard Biesheuvel	1	-1/+2
	Clearing BSS should only be done once, at the very beginning. efi_pe_entry() is the entrypoint from the firmware, which may not clear BSS and so it is done explicitly. However, efi_pe_entry() is also used as an entrypoint by the mixed mode startup code, in which case BSS will already have been cleared, and doing it again at this point will corrupt global variables holding the firmware's GDT/IDT and segment selectors. So make the memset() conditional on whether the EFI stub is running in native mode. Fixes: b3810c5a2cc4a666 ("x86/efistub: Clear decompressor BSS in native EFI entrypoint") Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24	x86/efistub: Call mixed mode boot services on the firmware's stack	Ard Biesheuvel	1	-0/+9
	Normally, the EFI stub calls into the EFI boot services using the stack that was live when the stub was entered. According to the UEFI spec, this stack needs to be at least 128k in size - this might seem large but all asynchronous processing and event handling in EFI runs from the same stack and so quite a lot of space may be used in practice. In mixed mode, the situation is a bit different: the bootloader calls the 32-bit EFI stub entry point, which calls the decompressor's 32-bit entry point, where the boot stack is set up, using a fixed allocation of 16k. This stack is still in use when the EFI stub is started in 64-bit mode, and so all calls back into the EFI firmware will be using the decompressor's limited boot stack. Due to the placement of the boot stack right after the boot heap, any stack overruns have gone unnoticed. However, commit 5c4feadb0011983b ("x86/decompressor: Move global symbol references to C code") moved the definition of the boot heap into C code, and now the boot stack is placed right at the base of BSS, where any overruns will corrupt the end of the .data section. While it would be possible to work around this by increasing the size of the boot stack, doing so would affect all x86 systems, and mixed mode systems are a tiny (and shrinking) fraction of the x86 installed base. So instead, record the firmware stack pointer value when entering from the 32-bit firmware, and switch to this stack every time a EFI boot service call is made. Cc: <stable@kernel.org> # v6.1+ Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-24	x86/boot/64: Move 5-level paging global variable assignments back	Tom Lendacky	1	-9/+7
	Commit 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") moved assignment of 5-level global variables to later in the boot in order to avoid having to use RIP relative addressing in order to set them. However, when running with 5-level paging and SME active (mem_encrypt=on), the variables are needed as part of the page table setup needed to encrypt the kernel (using pgd_none(), p4d_offset(), etc.). Since the variables haven't been set, the page table manipulation is done as if 4-level paging is active, causing the system to crash on boot. While only a subset of the assignments that were moved need to be set early, move all of the assignments back into check_la57_support() so that these assignments aren't spread between two locations. Instead of just reverting the fix, this uses the new RIP_REL_REF() macro when assigning the variables. Fixes: 63bed9660420 ("x86/startup_64: Defer assignment of 5-level paging global variables") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/2ca419f4d0de719926fd82353f6751f717590a86.1711122067.git.thomas.lendacky@amd.com
2024-03-24	x86/boot/64: Apply encryption mask to 5-level pagetable update	Tom Lendacky	1	-1/+1
	When running with 5-level page tables, the kernel mapping PGD entry is updated to point to the P4D table. The assignment uses _PAGE_TABLE_NOENC, which, when SME is active (mem_encrypt=on), results in a page table entry without the encryption mask set, causing the system to crash on boot. Change the assignment to use _PAGE_TABLE instead of _PAGE_TABLE_NOENC so that the encryption mask is set for the PGD entry. Fixes: 533568e06b15 ("x86/boot/64: Use RIP_REL_REF() to access early_top_pgt[]") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/8f20345cda7dbba2cf748b286e1bc00816fe649a.1711122067.git.thomas.lendacky@amd.com
2024-03-24	x86/cpu: Add model number for another Intel Arrow Lake mobile processor	Tony Luck	1	-0/+1
	This one is the regular laptop CPU. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322161725.195614-1-tony.luck@intel.com
2024-03-24	x86/fpu: Keep xfd_state in sync with MSR_IA32_XFD	Adamos Ttofari	2	-6/+13
	Commit 672365477ae8 ("x86/fpu: Update XFD state where required") and commit 8bf26758ca96 ("x86/fpu: Add XFD state to fpstate") introduced a per CPU variable xfd_state to keep the MSR_IA32_XFD value cached, in order to avoid unnecessary writes to the MSR. On CPU hotplug MSR_IA32_XFD is reset to the init_fpstate.xfd, which wipes out any stale state. But the per CPU cached xfd value is not reset, which brings them out of sync. As a consequence a subsequent xfd_update_state() might fail to update the MSR which in turn can result in XRSTOR raising a #NM in kernel space, which crashes the kernel. To fix this, introduce xfd_set_state() to write xfd_state together with MSR_IA32_XFD, and use it in all places that set MSR_IA32_XFD. Fixes: 672365477ae8 ("x86/fpu: Update XFD state where required") Signed-off-by: Adamos Ttofari <attofari@amazon.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20240322230439.456571-1-chang.seok.bae@intel.com Closes: https://lore.kernel.org/lkml/20230511152818.13839-1-attofari@amazon.de
2024-03-24	Documentation/x86: Document that resctrl bandwidth control units are MiB	Tony Luck	1	-4/+4
	The memory bandwidth software controller uses 2^20 units rather than 10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M" Linux define for 0x00100000. Update the documentation to use MiB when describing this feature. It's too late to fix the mount option "mba_MBps" as that is now an established user interface. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240322182016.196544-1-tony.luck@intel.com
2024-03-23	x86/mpparse: Register APIC address only once	Thomas Gleixner	1	-5/+5
	The APIC address is registered twice. First during the early detection and afterwards when actually scanning the table for APIC IDs. The APIC and topology core warn about the second attempt. Restrict it to the early detection call. Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
2024-03-23	x86/topology: Handle the !APIC case gracefully	Thomas Gleixner	1	-0/+11
	If there is no local APIC enumerated and registered then the topology bitmaps are empty. Therefore, topology_init_possible_cpus() will die with a division by zero exception. Prevent this by registering a fake APIC id to populate the topology bitmap. This also allows to use all topology query interfaces unconditionally. It does not affect the actual APIC code because either the local APIC address was not registered or no local APIC could be detected. Fixes: f1f758a80516 ("x86/topology: Add a mechanism to track topology via APIC IDs") Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.242709302@linutronix.de
2024-03-23	x86/topology: Don't evaluate logical IDs during early boot	Thomas Gleixner	1	-5/+7
	The local APICs have not yet been enumerated so the logical ID evaluation from the topology bitmaps does not work and would return an error code. Skip the evaluation during the early boot CPUID evaluation and only apply it on the final run. Fixes: 380414be78bf ("x86/cpu/topology: Use topology logical mapping mechanism") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.186943142@linutronix.de
2024-03-23	x86/cpu: Ensure that CPU info updates are propagated on UP	Thomas Gleixner	3	-37/+14
	The boot sequence evaluates CPUID information twice: 1) During early boot 2) When finalizing the early setup right before mitigations are selected and alternatives are patched. In both cases the evaluation is stored in boot_cpu_data, but on UP the copying of boot_cpu_data to the per CPU info of the boot CPU happens between #1 and #2. So any update which happens in #2 is never propagated to the per CPU info instance. Consolidate the whole logic and copy boot_cpu_data right before applying alternatives as that's the point where boot_cpu_data is in it's final state and not supposed to change anymore. This also removes the voodoo mb() from smp_prepare_cpus_common() which had absolutely no purpose. Fixes: 71eb4893cfaf ("x86/percpu: Cure per CPU madness on UP") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20240322185305.127642785@linutronix.de
2024-03-23	lkdtm/bugs: Improve warning message for compilers without counted_by support	Nathan Chancellor	1	-1/+1
	The current message for telling the user that their compiler does not support the counted_by attribute in the FAM_BOUNDS test does not make much sense either grammatically or semantically. Fix it to make it correct in both aspects. Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240321-lkdtm-improve-lack-of-counted_by-msg-v1-1-0fbf7481a29c@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-23	overflow: Change DEFINE_FLEX to take __counted_by member	Kees Cook	8	-22/+58
	The norm should be flexible array structures with __counted_by annotations, so DEFINE_FLEX() is updated to expect that. Rename the non-annotated version to DEFINE_RAW_FLEX(), and update the few existing users. Additionally add selftests for the macros. Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/r/20240306235128.it.933-kees@kernel.org Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-22	efi/libstub: fix efi_random_alloc() to allocate memory at alloc_min or ↵	KONDO KAZUMA(近藤　和真)	1	-1/+1
	higher address Following warning is sometimes observed while booting my servers: [ 3.594838] DMA: preallocated 4096 KiB GFP_KERNEL pool for atomic allocations [ 3.602918] swapper/0: page allocation failure: order:10, mode:0xcc1(GFP_KERNEL\|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0-1 ... [ 3.851862] DMA: preallocated 1024 KiB GFP_KERNEL\|GFP_DMA pool for atomic allocation If 'nokaslr' boot option is set, the warning always happens. On x86, ZONE_DMA is small zone at the first 16MB of physical address space. When this problem happens, most of that space seems to be used by decompressed kernel. Thereby, there is not enough space at DMA_ZONE to meet the request of DMA pool allocation. The commit 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") tried to fix this problem by introducing lower bound of allocation. But the fix is not complete. efi_random_alloc() allocates pages by following steps. 1. Count total available slots ('total_slots') 2. Select a slot ('target_slot') to allocate randomly 3. Calculate a starting address ('target') to be included target_slot 4. Allocate pages, which starting address is 'target' In step 1, 'alloc_min' is used to offset the starting address of memory chunk. But in step 3 'alloc_min' is not considered at all. As the result, 'target' can be miscalculated and become lower than 'alloc_min'. When KASLR is disabled, 'target_slot' is always 0 and the problem happens everytime if the EFI memory map of the system meets the condition. Fix this problem by calculating 'target' considering 'alloc_min'. Cc: linux-efi@vger.kernel.org Cc: Tom Englund <tomenglund26@gmail.com> Cc: linux-kernel@vger.kernel.org Fixes: 2f77465b05b1 ("x86/efistub: Avoid placing the kernel below LOAD_PHYSICAL_ADDR") Signed-off-by: Kazuma Kondo <kazuma-kondo@nec.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2024-03-22	kprobes/x86: Use copy_from_kernel_nofault() to read from unsafe address	Masami Hiramatsu (Google)	1	-1/+10
	Read from an unsafe address with copy_from_kernel_nofault() in arch_adjust_kprobe_addr() because this function is used before checking the address is in text or not. Syzcaller bot found a bug and reported the case if user specifies inaccessible data area, arch_adjust_kprobe_addr() will cause a kernel panic. [ mingo: Clarified the comment. ] Fixes: cc66bb914578 ("x86/ibt,kprobes: Cure sym+0 equals fentry woes") Reported-by: Qiang Zhang <zzqq0103.hey@gmail.com> Tested-by: Jinghao Jia <jinghao7@illinois.edu> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/171042945004.154897.2221804961882915806.stgit@devnote2
2024-03-22	x86/pm: Work around false positive kmemleak report in msr_build_context()	Anton Altaparmakov	1	-5/+5
	Since: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") kmemleak reports this issue: unreferenced object 0xf68241e0 (size 32): comm "swapper/0", pid 1, jiffies 4294668610 (age 68.432s) hex dump (first 32 bytes): 00 cc cc cc 29 10 01 c0 00 00 00 00 00 00 00 00 ....)........... 00 42 82 f6 cc cc cc cc cc cc cc cc cc cc cc cc .B.............. backtrace: [<461c1d50>] __kmem_cache_alloc_node+0x106/0x260 [<ea65e13b>] __kmalloc+0x54/0x160 [<c3858cd2>] msr_build_context.constprop.0+0x35/0x100 [<46635aff>] pm_check_save_msr+0x63/0x80 [<6b6bb938>] do_one_initcall+0x41/0x1f0 [<3f3add60>] kernel_init_freeable+0x199/0x1e8 [<3b538fde>] kernel_init+0x1a/0x110 [<938ae2b2>] ret_from_fork+0x1c/0x28 Which is a false positive. Reproducer: - Run rsync of whole kernel tree (multiple times if needed). - start a kmemleak scan - Note this is just an example: a lot of our internal tests hit these. The root cause is similar to the fix in: b0b592cf0836 x86/pm: Fix false positive kmemleak report in msr_build_context() ie. the alignment within the packed struct saved_context which has everything unaligned as there is only "u16 gs;" at start of struct where in the past there were four u16 there thus aligning everything afterwards. The issue is with the fact that Kmemleak only searches for pointers that are aligned (see how pointers are scanned in kmemleak.c) so when the struct members are not aligned it doesn't see them. Testing: We run a lot of tests with our CI, and after applying this fix we do not see any kmemleak issues any more whilst without it we see hundreds of the above report. From a single, simple test run consisting of 416 individual test cases on kernel 5.10 x86 with kmemleak enabled we got 20 failures due to this, which is quite a lot. With this fix applied we get zero kmemleak related failures. Fixes: 7ee18d677989 ("x86/power: Make restore_processor_context() sane") Signed-off-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20240314142656.17699-1-anton@tuxera.com
2024-03-22	x86/kexec: Do not update E820 kexec table for setup_data	Dave Young	1	-16/+1
	crashkernel reservation failed on a Thinkpad t440s laptop recently. Actually the memblock reservation succeeded, but later insert_resource() failed. Test steps: kexec load -> /* make sure add crashkernel param eg. crashkernel=160M */ kexec reboot -> dmesg\|grep "crashkernel reserved"; crashkernel memory range like below reserved successfully: 0x00000000d0000000 - 0x00000000da000000 But no such "Crash kernel" region in /proc/iomem The background story: Currently the E820 code reserves setup_data regions for both the current kernel and the kexec kernel, and it inserts them into the resources list. Before the kexec kernel reboots nobody passes the old setup_data, and kexec only passes fresh SETUP_EFI/SETUP_IMA/SETUP_RNG_SEED if needed. Thus the old setup data memory is not used at all. Due to old kernel updates the kexec e820 table as well so kexec kernel sees them as E820_TYPE_RESERVED_KERN regions, and later the old setup_data regions are inserted into resources list in the kexec kernel by e820__reserve_resources(). Note, due to no setup_data is passed in for those old regions they are not early reserved (by function early_reserve_memory), and the crashkernel memblock reservation will just treat them as usable memory and it could reserve the crashkernel region which overlaps with the old setup_data regions. And just like the bug I noticed here, kdump insert_resource failed because e820__reserve_resources has added the overlapped chunks in /proc/iomem already. Finally, looking at the code, the old setup_data regions are not used at all as no setup_data is passed in by the kexec boot loader. Although something like SETUP_PCI etc could be needed, kexec should pass the info as new setup_data so that kexec kernel can take care of them. This should be taken care of in other separate patches if needed. Thus drop the useless buggy code here. Signed-off-by: Dave Young <dyoung@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Bohac <jbohac@suse.cz> Cc: Eric DeVolder <eric.devolder@oracle.com> Cc: Baoquan He <bhe@redhat.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lore.kernel.org/r/Zf0T3HCG-790K-pZ@darkstar.users.ipa.redhat.com
2024-03-21	sched/doc: Update documentation for base_slice_ns and CONFIG_HZ relation	Mukesh Kumar Chaurasiya	1	-0/+3
	The tunable base_slice_ns is dependent on CONFIG_HZ (i.e. TICK_NSEC) for any significant performance improvement. The reason being the scheduler tick is not frequent enough to force preemption when base_slice expires in case of: base_slice_ns < TICK_NSEC The below data is of stress-ng: Number of CPU: 1 Stressor threads: 4 Time: 30sec On CONFIG_HZ=1000 \| base_slice \| avg-run (msec) \| context-switches \| \| ---------- \| -------------- \| ---------------- \| \| 3ms \| 2.914 \| 10342 \| \| 6ms \| 4.857 \| 6196 \| \| 9ms \| 6.754 \| 4482 \| \| 12ms \| 7.872 \| 3802 \| \| 22ms \| 11.294 \| 2710 \| \| 32ms \| 13.425 \| 2284 \| On CONFIG_HZ=100 \| base_slice \| avg-run (msec) \| context-switches \| \| ---------- \| -------------- \| ---------------- \| \| 3ms \| 9.144 \| 3337 \| \| 6ms \| 9.113 \| 3301 \| \| 9ms \| 8.991 \| 3315 \| \| 12ms \| 12.935 \| 2328 \| \| 22ms \| 16.031 \| 1915 \| \| 32ms \| 18.608 \| 1622 \| base_slice: the value of base_slice in ms avg-run (msec): average time of the stressor threads got on cpu before it got preempted context-switches: number of context switches for the stress-ng process Signed-off-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240320173815.927637-2-mchauras@linux.ibm.com
2024-03-21	fbdev: panel-tpo-td043mtea1: Convert sprintf() to sysfs_emit()	Li Zhijian	1	-9/+4
	Per filesystems/sysfs.rst, show() should only use sysfs_emit() or sysfs_emit_at() when formatting the value to be returned to user space. coccinelle complains that there are still a couple of functions that use snprintf(). Convert them to sysfs_emit(). CC: Helge Deller <deller@gmx.de> CC: linux-omap@vger.kernel.org CC: linux-fbdev@vger.kernel.org CC: dri-devel@lists.freedesktop.org Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Helge Deller <deller@gmx.de>
2024-03-21	nvmet-rdma: remove NVMET_RDMA_REQ_INVALIDATE_RKEY flag	Guixin Liu	1	-5/+3
	We can simply use invalidate_rkey to check instead of adding a flag. Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-21	nvme: remove redundant BUILD_BUG_ON check	Guixin Liu	1	-3/+0
	Remove redundant BUILD_BUG_ON check of struct nvme_dsm_range, it's already checked in nvme_init_ctrl(). Signed-off-by: Guixin Liu <kanie@linux.alibaba.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-21	dm-integrity: align the outgoing bio in integrity_recheck	Mikulas Patocka	1	-2/+10
	It is possible to set up dm-integrity with smaller sector size than the logical sector size of the underlying device. In this situation, dm-integrity guarantees that the outgoing bios have the same alignment as incoming bios (so, if you create a filesystem with 4k block size, dm-integrity would send 4k-aligned bios to the underlying device). This guarantee was broken when integrity_recheck was implemented. integrity_recheck sends bio that is aligned to ic->sectors_per_block. So if we set up integrity with 512-byte sector size on a device with logical block size 4k, we would be sending unaligned bio. This triggered a bug in one of our internal tests. This commit fixes it by determining the actual alignment of the incoming bio and then makes sure that the outgoing bio in integrity_recheck has the same alignment. Fixes: c88f5e553fe3 ("dm-integrity: recheck the integrity tag after a failure") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
2024-03-21	firewire: core: add memo about the caller of show functions for device ↵	Takashi Sakamoto	1	-0/+2
	attributes In the case of firewire core function, the caller of show functions for device attributes is not only sysfs user, but also device initialization. This commit adds memo about it against the typical assumption that the functions are just dedicated to sysfs user. Link: https://lore.kernel.org/lkml/20240318091759.678326-1-o-takashi@sakamocchi.jp/ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
2024-03-21	selftests: forwarding: Fix ping failure due to short timeout	Ido Schimmel	2	-4/+4
	The tests send 100 pings in 0.1 second intervals and force a timeout of 11 seconds, which is borderline (especially on debug kernels), resulting in random failures in netdev CI [1]. Fix by increasing the timeout to 20 seconds. It should not prolong the test unless something is wrong, in which case the test will rightfully fail. [1] # selftests: net/forwarding: vxlan_bridge_1d_port_8472_ipv6.sh # INFO: Running tests with UDP port 8472 # TEST: ping: local->local [ OK ] # TEST: ping: local->remote 1 [FAIL] # Ping failed [...] Fixes: b07e9957f220 ("selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6") Fixes: 728b35259e28 ("selftests: forwarding: Add VxLAN tests with a VLAN-aware bridge for IPv6") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/netdev/24a7051fdcd1f156c3704bca39e4b3c41dfc7c4b.camel@redhat.com/ Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240320065717.4145325-1-idosch@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21	spi: spi-mt65xx: Fix NULL pointer access in interrupt handler	Fei Shao	1	-10/+12
	The TX buffer in spi_transfer can be a NULL pointer, so the interrupt handler may end up writing to the invalid memory and cause crashes. Add a check to trans->tx_buf before using it. Fixes: 1ce24864bff4 ("spi: mediatek: Only do dma for 4-byte aligned buffers") Signed-off-by: Fei Shao <fshao@chromium.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://msgid.link/r/20240321070942.1587146-2-fshao@chromium.org Signed-off-by: Mark Brown <broonie@kernel.org>
2024-03-21	MAINTAINERS: step down as netfilter maintainer	Florian Westphal	1	-1/+0
	I do not feel that I'm up to the task anymore. I hope this to be a temporary emergeny measure, but for now I'm sure this is the best course of action for me. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20240319121223.24474-1-fw@strlen.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21	sh: hd64461: Make setup_hd64461() static	Artur Rojek	1	-1/+1
	Enforce internal linkage for setup_hd64461(). This fixes the following error: arch/sh/cchips/hd6446x/hd64461.c:75:12: error: no previous prototype for 'setup_hd64461' [-Werror=missing-prototypes] Signed-off-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/20240211193451.106795-1-contact@artur-rojek.eu Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-03-21	netfilter: nf_tables: Fix a memory leak in nf_tables_updchain	Quan Tian	1	-13/+14
	If nft_netdev_register_hooks() fails, the memory associated with nft_stats is not freed, causing a memory leak. This patch fixes it by moving nft_stats_alloc() down after nft_netdev_register_hooks() succeeds. Fixes: b9703ed44ffb ("netfilter: nf_tables: support for adding new devices to an existing netdev chain") Signed-off-by: Quan Tian <tianquan23@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-21	net: dsa: mt7530: fix handling of all link-local frames	Arınç ÜNAL	2	-4/+46
	Currently, the MT753X switches treat frames with :01-0D and :0F MAC DAs as regular multicast frames, therefore flooding them to user ports. On page 205, section "8.6.3 Frame filtering" of the active standard, IEEE Std 802.1Q™-2022, it is stated that frames with 01:80:C2:00:00:00-0F as MAC DA must only be propagated to C-VLAN and MAC Bridge components. That means VLAN-aware and VLAN-unaware bridges. On the switch designs with CPU ports, these frames are supposed to be processed by the CPU (software). So we make the switch only forward them to the CPU port. And if received from a CPU port, forward to a single port. The software is responsible of making the switch conform to the latter by setting a single port as destination port on the special tag. This switch intellectual property cannot conform to this part of the standard fully. Whilst the REV_UN frame tag covers the remaining :04-0D and :0F MAC DAs, it also includes :22-FF which the scope of propagation is not supposed to be restricted for these MAC DAs. Set frames with :01-03 MAC DAs to be trapped to the CPU port(s). Add a comment for the remaining MAC DAs. Note that the ingress port must have a PVID assigned to it for the switch to forward untagged frames. A PVID is set by default on VLAN-aware and VLAN-unaware ports. However, when the network interface that pertains to the ingress port is attached to a vlan_filtering enabled bridge, the user can remove the PVID assignment from it which would prevent the link-local frames from being trapped to the CPU port. I am yet to see a way to forward link-local frames while preventing other untagged frames from being forwarded too. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21	net: dsa: mt7530: fix link-local frames that ingress vlan filtering ports	Arınç ÜNAL	2	-9/+23
	Whether VLAN-aware or not, on every VID VLAN table entry that has the CPU port as a member of it, frames are set to egress the CPU port with the VLAN tag stacked. This is so that VLAN tags can be appended after hardware special tag (called DSA tag in the context of Linux drivers). For user ports on a VLAN-unaware bridge, frame ingressing the user port egresses CPU port with only the special tag. For user ports on a VLAN-aware bridge, frame ingressing the user port egresses CPU port with the special tag and the VLAN tag. This causes issues with link-local frames, specifically BPDUs, because the software expects to receive them VLAN-untagged. There are two options to make link-local frames egress untagged. Setting CONSISTENT or UNTAGGED on the EG_TAG bits on the relevant register. CONSISTENT means frames egress exactly as they ingress. That means egressing with the VLAN tag they had at ingress or egressing untagged if they ingressed untagged. Although link-local frames are not supposed to be transmitted VLAN-tagged, if they are done so, when egressing through a CPU port, the special tag field will be broken. BPDU egresses CPU port with VLAN tag egressing stacked, received on software: 00:01:25.104821 AF Unknown (382365846), length 106: \| STAG \| \| VLAN \| 0x0000: 0000 6c27 614d 4143 0001 0000 8100 0001 ..l'aMAC........ 0x0010: 0026 4242 0300 0000 0000 0000 6c27 614d .&BB........l'aM 0x0020: 4143 0000 0000 0000 6c27 614d 4143 0000 AC......l'aMAC.. 0x0030: 0000 1400 0200 0f00 0000 0000 0000 0000 ................ BPDU egresses CPU port with VLAN tag egressing untagged, received on software: 00:23:56.628708 AF Unknown (25215488), length 64: \| STAG \| 0x0000: 0000 6c27 614d 4143 0001 0000 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ BPDU egresses CPU port with VLAN tag egressing tagged, received on software: 00:01:34.311963 AF Unknown (25215488), length 64: \| Mess \| 0x0000: 0000 6c27 614d 4143 0001 0001 0026 4242 ..l'aMAC.....&BB 0x0010: 0300 0000 0000 0000 6c27 614d 4143 0000 ........l'aMAC.. 0x0020: 0000 0000 6c27 614d 4143 0000 0000 1400 ....l'aMAC...... 0x0030: 0200 0f00 0000 0000 0000 0000 ............ To prevent confusing the software, force the frame to egress UNTAGGED instead of CONSISTENT. This way, frames can't possibly be received TAGGED by software which would have the special tag field broken. VLAN Tag Egress Procedure For all frames, one of these options set the earliest in this order will apply to the frame: - EG_TAG in certain registers for certain frames. This will apply to frame with matching MAC DA or EtherType. - EG_TAG in the address table. This will apply to frame at its incoming port. - EG_TAG in the PVC register. This will apply to frame at its incoming port. - EG_CON and [EG_TAG per port] in the VLAN table. This will apply to frame at its outgoing port. - EG_TAG in the PCR register. This will apply to frame at its outgoing port. EG_TAG in certain registers for certain frames: PPPoE Discovery_ARP/RARP: PPP_EG_TAG and ARP_EG_TAG in the APC register. IGMP_MLD: IGMP_EG_TAG and MLD_EG_TAG in the IMC register. BPDU and PAE: BPDU_EG_TAG and PAE_EG_TAG in the BPC register. REV_01 and REV_02: R01_EG_TAG and R02_EG_TAG in the RGAC1 register. REV_03 and REV_0E: R03_EG_TAG and R0E_EG_TAG in the RGAC2 register. REV_10 and REV_20: R10_EG_TAG and R20_EG_TAG in the RGAC3 register. REV_21 and REV_UN: R21_EG_TAG and RUN_EG_TAG in the RGAC4 register. With this change, it can be observed that a bridge interface with stp_state and vlan_filtering enabled will properly block ports now. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-21	x86/config: Fix warning for 'make ARCH=x86_64 tinyconfig'	Masahiro Yamada	1	-0/+1
	Kconfig emits a warning for the following command: $ make ARCH=x86_64 tinyconfig ... .config:1380:warning: override: UNWINDER_GUESS changes choice state When X86_64=y, the unwinder is exclusively selected from the following three options: - UNWINDER_ORC - UNWINDER_FRAME_POINTER - UNWINDER_GUESS However, arch/x86/configs/tiny.config only specifies the values of the last two. UNWINDER_ORC must be explicitly disabled. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20240320154313.612342-1-masahiroy@kernel.org