2792 Commits

Author SHA1 Message Date
Linus Torvalds
48e3694ae7 Merge tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:

 - Add KUnit test for the printk ring buffer

 - Fix the check of the maximal record size which is allowed to be
   stored into the printk ring buffer. It prevents corruptions of the
   ring buffer.

   Note that printk() is on the safe side. The messages are limited by
   1kB buffer and are always small enough for the minimal log buffer
   size 4kB, see CONFIG_LOG_BUF_SHIFT definition.

* tag 'printk-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  printk: ringbuffer: Fix data block max size check
  printk: kunit: support offstack cpumask
  printk: kunit: Fix __counted_by() in struct prbtest_rbdata
  printk: ringbuffer: Explain why the KUnit test ignores failed writes
  printk: ringbuffer: Add KUnit test
2025-10-04 11:13:11 -07:00
Linus Torvalds
e406d57be7 Merge tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:

 - "ida: Remove the ida_simple_xxx() API" from Christophe Jaillet
   completes the removal of this legacy IDR API

 - "panic: introduce panic status function family" from Jinchao Wang
   provides a number of cleanups to the panic code and its various
   helpers, which were rather ad-hoc and scattered all over the place

 - "tools/delaytop: implement real-time keyboard interaction support"
   from Fan Yu adds a few nice user-facing usability changes to the
   delaytop monitoring tool

 - "efi: Fix EFI boot with kexec handover (KHO)" from Evangelos
   Petrongonas fixes a panic which was happening with the combination of
   EFI and KHO

 - "Squashfs: performance improvement and a sanity check" from Phillip
   Lougher teaches squashfs's lseek() about SEEK_DATA/SEEK_HOLE. A mere
   150x speedup was measured for a well-chosen microbenchmark

 - plus another 50-odd singleton patches all over the place

* tag 'mm-nonmm-stable-2025-10-02-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (75 commits)
  Squashfs: reject negative file sizes in squashfs_read_inode()
  kallsyms: use kmalloc_array() instead of kmalloc()
  MAINTAINERS: update Sibi Sankar's email address
  Squashfs: add SEEK_DATA/SEEK_HOLE support
  Squashfs: add additional inode sanity checking
  lib/genalloc: fix device leak in of_gen_pool_get()
  panic: remove CONFIG_PANIC_ON_OOPS_VALUE
  ocfs2: fix double free in user_cluster_connect()
  checkpatch: suppress strscpy warnings for userspace tools
  cramfs: fix incorrect physical page address calculation
  kernel: prevent prctl(PR_SET_PDEATHSIG) from racing with parent process exit
  Squashfs: fix uninit-value in squashfs_get_parent
  kho: only fill kimage if KHO is finalized
  ocfs2: avoid extra calls to strlen() after ocfs2_sprintf_system_inode_name()
  kernel/sys.c: fix the racy usage of task_lock(tsk->group_leader) in sys_prlimit64() paths
  sched/task.h: fix the wrong comment on task_lock() nesting with tasklist_lock
  coccinelle: platform_no_drv_owner: handle also built-in drivers
  coccinelle: of_table: handle SPI device ID tables
  lib/decompress: use designated initializers for struct compress_format
  efi: support booting with kexec handover (KHO)
  ...
2025-10-02 18:44:54 -07:00
Petr Mladek
7a75a5da79 Merge branch 'rework/ringbuffer-kunit-test' into for-linus 2025-10-02 10:33:08 +02:00
Linus Torvalds
7f70725741 Merge tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nathan Chancellor:

 - Extend modules.builtin.modinfo to include module aliases from
   MODULE_DEVICE_TABLE for builtin modules so that userspace tools (such
   as kmod) can verify that a particular module alias will be handled by
   a builtin module

 - Bump the minimum version of LLVM for building the kernel to 15.0.0

 - Upgrade several userspace API checks in headers_check.pl to errors

 - Unify and consolidate CONFIG_WERROR / W=e handling

 - Turn assembler and linker warnings into errors with CONFIG_WERROR /
   W=e

 - Respect CONFIG_WERROR / W=e when building userspace programs
   (userprogs)

 - Enable -Werror unconditionally when building host programs
   (hostprogs)

 - Support copy_file_range() and data segment alignment in gen_init_cpio
   to improve performance on filesystems that support reflinks such as
   btrfs and XFS

 - Miscellaneous small changes to scripts and configuration files

* tag 'kbuild-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux: (47 commits)
  modpost: Initialize builtin_modname to stop SIGSEGVs
  Documentation: kbuild: note CONFIG_DEBUG_EFI in reproducible builds
  kbuild: vmlinux.unstripped should always depend on .vmlinux.export.o
  modpost: Create modalias for builtin modules
  modpost: Add modname to mod_device_table alias
  scsi: Always define blogic_pci_tbl structure
  kbuild: extract modules.builtin.modinfo from vmlinux.unstripped
  kbuild: keep .modinfo section in vmlinux.unstripped
  kbuild: always create intermediate vmlinux.unstripped
  s390: vmlinux.lds.S: Reorder sections
  KMSAN: Remove tautological checks
  objtool: Drop noinstr hack for KCSAN_WEAK_MEMORY
  lib/Kconfig.debug: Drop CLANG_VERSION check from DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
  riscv: Remove ld.lld version checks from many TOOLCHAIN_HAS configs
  riscv: Unconditionally use linker relaxation
  riscv: Remove version check for LTO_CLANG selects
  powerpc: Drop unnecessary initializations in __copy_inst_from_kernel_nofault()
  mips: Unconditionally select ARCH_HAS_CURRENT_STACK_POINTER
  arm64: Remove tautological LLVM Kconfig conditions
  ARM: Clean up definition of ARM_HAS_GROUP_RELOCS
  ...
2025-10-01 20:58:51 -07:00
Linus Torvalds
4b81e2eb9e Merge tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull VDSO updates from Thomas Gleixner:

 - Further consolidation of the VDSO infrastructure and the common data
   store

 - Simplification of the related Kconfig logic

 - Improve the VDSO selftest suite

* tag 'timers-vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests: vDSO: Drop vdso_test_clock_getres
  selftests: vDSO: vdso_test_abi: Add tests for clock_gettime64()
  selftests: vDSO: vdso_test_abi: Test CPUTIME clocks
  selftests: vDSO: vdso_test_abi: Use explicit indices for name array
  selftests: vDSO: vdso_test_abi: Drop clock availability tests
  selftests: vDSO: vdso_test_abi: Use ksft_finished()
  selftests: vDSO: vdso_test_abi: Correctly skip whole test with missing vDSO
  selftests: vDSO: Fix -Wunitialized in powerpc VDSO_CALL() wrapper
  vdso: Add struct __kernel_old_timeval forward declaration to gettime.h
  vdso: Gate VDSO_GETRANDOM behind HAVE_GENERIC_VDSO
  vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
  vdso: Drop Kconfig GENERIC_VDSO_DATA_STORE
  vdso: Drop kconfig GENERIC_COMPAT_VDSO
  vdso: Drop kconfig GENERIC_VDSO_32
  riscv: vdso: Untangle Kconfig logic
  time: Build generic update_vsyscall() only with generic time vDSO
  vdso/gettimeofday: Remove !CONFIG_TIME_NS stubs
  vdso: Move ENABLE_COMPAT_VDSO from core to arm64
  ARM: VDSO: Remove cntvct_ok global variable
  vdso/datastore: Gate time data behind CONFIG_GENERIC_GETTIMEOFDAY
2025-09-30 16:58:21 -07:00
Linus Torvalds
755fa5b4fb Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:

 - Extensive cpuset code cleanup and refactoring work with no functional
   changes: CPU mask computation logic refactoring, introducing new
   helpers, removing redundant code paths, and improving error handling
   for better maintainability.

 - A few bug fixes to cpuset including fixes for partition creation
   failures when isolcpus is in use, missing error returns, and null
   pointer access prevention in free_tmpmasks().

 - Core cgroup changes include replacing the global percpu_rwsem with
   per-threadgroup rwsem when writing to cgroup.procs for better
   scalability, workqueue conversions to use WQ_PERCPU and
   system_percpu_wq to prepare for workqueue default switching from
   percpu to unbound, and removal of unused code including the
   post_attach callback.

 - New cgroup.stat.local time accounting feature that tracks frozen time
   duration.

 - Misc changes including selftests updates (new freezer time tests and
   backward compatibility fixes), documentation sync, string function
   safety improvements, and 64-bit division fixes.

* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits)
  cpuset: remove is_prs_invalid helper
  cpuset: remove impossible warning in update_parent_effective_cpumask
  cpuset: remove redundant special case for null input in node mask update
  cpuset: fix missing error return in update_cpumask
  cpuset: Use new excpus for nocpu error check when enabling root partition
  cpuset: fix failure to enable isolated partition when containing isolcpus
  Documentation: cgroup-v2: Sync manual toctree
  cpuset: use partition_cpus_change for setting exclusive cpus
  cpuset: use parse_cpulist for setting cpus.exclusive
  cpuset: introduce partition_cpus_change
  cpuset: refactor cpus_allowed_validate_change
  cpuset: refactor out validate_partition
  cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers
  cpuset: refactor CPU mask buffer parsing logic
  cpuset: Refactor exclusive CPU mask computation logic
  cpuset: change return type of is_partition_[in]valid to bool
  cpuset: remove unused assignment to trialcs->partition_root_state
  cpuset: move the root cpuset write check earlier
  cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock
  cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock
  ...
2025-09-30 09:55:41 -07:00
Linus Torvalds
9cc220a422 Merge tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:

 - Refactor SCLP memory hotplug code

 - Introduce common boot_panic() decompressor helper macro and use it to
   get rid of nearly few identical implementations

 - Take into account additional key generation flags and forward it to
   the ep11 implementation. With that allow users to modify the key
   generation process, e.g. provide valid combinations of XCP_BLOB_*
   flags

 - Replace kmalloc() + copy_from_user() with memdup_user_nul() in s390
   debug facility and HMC driver

 - Add DAX support for DCSS memory block devices

 - Make the compiler statement attribute "assume" available with a new
   __assume macro

 - Rework ffs() and fls() family bitops functions, including source code
   improvements and generated code optimizations. Use the newly
   introduced __assume macro for that

 - Enable additional network features in default configurations

 - Use __GFP_ACCOUNT flag for user page table allocations to add missing
   kmemcg accounting

 - Add WQ_PERCPU flag to explicitly request the use of the per-CPU
   workqueue for 3590 tape driver

 - Switch power reading to the per-CPU and the Hiperdispatch to the
   default workqueue

 - Add memory allocation profiling hooks to allow better profiling data
   and the /proc/allocinfo output similar to other architectures

* tag 's390-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
  s390/mm: Add memory allocation profiling hooks
  s390: Replace use of system_wq with system_dfl_wq
  s390/diag324: Replace use of system_wq with system_percpu_wq
  s390/tape: Add WQ_PERCPU to alloc_workqueue users
  s390/bitops: Switch to generic ffs() if supported by compiler
  s390/bitops: Switch to generic fls(), fls64(), etc.
  s390/mm: Use __GFP_ACCOUNT for user page table allocations
  s390/configs: Enable additional network features
  s390/bitops: Cleanup __flogr()
  s390/bitops: Use __assume() for __flogr() inline assembly return value
  compiler_types: Add __assume macro
  s390/bitops: Limit return value range of __flogr()
  s390/dcssblk: Add DAX support
  s390/hmcdrv: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/debug: Replace kmalloc() + copy_from_user() with memdup_user_nul()
  s390/pkey: Forward keygenflags to ep11_unwrapkey
  s390/boot: Add common boot_panic() code
  s390/bitops: Optimize inlining
  s390/bitops: Slightly optimize ffs() and fls64()
  s390/sclp: Move memory hotplug code for better modularity
  ...
2025-09-29 19:14:25 -07:00
Linus Torvalds
a5ba183bde Merge tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
 "One notable addition is the creation of the 'transitional' keyword for
  kconfig so CONFIG renaming can go more smoothly.

  This has been a long-standing deficiency, and with the renaming of
  CONFIG_CFI_CLANG to CONFIG_CFI (since GCC will soon have KCFI
  support), this came up again.

  The breadth of the diffstat is mainly this renaming.

   - Clean up usage of TRAILING_OVERLAP() (Gustavo A. R. Silva)

   - lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
     (Junjie Cao)

   - Add str_assert_deassert() helper (Lad Prabhakar)

   - gcc-plugins: Remove TODO_verify_il for GCC >= 16

   - kconfig: Fix BrokenPipeError warnings in selftests

   - kconfig: Add transitional symbol attribute for migration support

   - kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI"

* tag 'hardening-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  lib/string_choices: Add str_assert_deassert() helper
  kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
  kconfig: Add transitional symbol attribute for migration support
  kconfig: Fix BrokenPipeError warnings in selftests
  gcc-plugins: Remove TODO_verify_il for GCC >= 16
  stddef: Introduce __TRAILING_OVERLAP()
  stddef: Remove token-pasting in TRAILING_OVERLAP()
  lkdtm: fortify: Fix potential NULL dereference on kmalloc failure
2025-09-29 17:48:27 -07:00
Linus Torvalds
18b19abc37 Merge tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull namespace updates from Christian Brauner:
 "This contains a larger set of changes around the generic namespace
  infrastructure of the kernel.

  Each specific namespace type (net, cgroup, mnt, ...) embedds a struct
  ns_common which carries the reference count of the namespace and so
  on.

  We open-coded and cargo-culted so many quirks for each namespace type
  that it just wasn't scalable anymore. So given there's a bunch of new
  changes coming in that area I've started cleaning all of this up.

  The core change is to make it possible to correctly initialize every
  namespace uniformly and derive the correct initialization settings
  from the type of the namespace such as namespace operations, namespace
  type and so on. This leaves the new ns_common_init() function with a
  single parameter which is the specific namespace type which derives
  the correct parameters statically. This also means the compiler will
  yell as soon as someone does something remotely fishy.

  The ns_common_init() addition also allows us to remove ns_alloc_inum()
  and drops any special-casing of the initial network namespace in the
  network namespace initialization code that Linus complained about.

  Another part is reworking the reference counting. The reference
  counting was open-coded and copy-pasted for each namespace type even
  though they all followed the same rules. This also removes all open
  accesses to the reference count and makes it private and only uses a
  very small set of dedicated helpers to manipulate them just like we do
  for e.g., files.

  In addition this generalizes the mount namespace iteration
  infrastructure introduced a few cycles ago. As reminder, the vfs makes
  it possible to iterate sequentially and bidirectionally through all
  mount namespaces on the system or all mount namespaces that the caller
  holds privilege over. This allow userspace to iterate over all mounts
  in all mount namespaces using the listmount() and statmount() system
  call.

  Each mount namespace has a unique identifier for the lifetime of the
  systems that is exposed to userspace. The network namespace also has a
  unique identifier working exactly the same way. This extends the
  concept to all other namespace types.

  The new nstree type makes it possible to lookup namespaces purely by
  their identifier and to walk the namespace list sequentially and
  bidirectionally for all namespace types, allowing userspace to iterate
  through all namespaces. Looking up namespaces in the namespace tree
  works completely locklessly.

  This also means we can move the mount namespace onto the generic
  infrastructure and remove a bunch of code and members from struct
  mnt_namespace itself.

  There's a bunch of stuff coming on top of this in the future but for
  now this uses the generic namespace tree to extend a concept
  introduced first for pidfs a few cycles ago. For a while now we have
  supported pidfs file handles for pidfds. This has proven to be very
  useful.

  This extends the concept to cover namespaces as well. It is possible
  to encode and decode namespace file handles using the common
  name_to_handle_at() and open_by_handle_at() apis.

  As with pidfs file handles, namespace file handles are exhaustive,
  meaning it is not required to actually hold a reference to nsfs in
  able to decode aka open_by_handle_at() a namespace file handle.
  Instead the FD_NSFS_ROOT constant can be passed which will let the
  kernel grab a reference to the root of nsfs internally and thus decode
  the file handle.

  Namespaces file descriptors can already be derived from pidfds which
  means they aren't subject to overmount protection bugs. IOW, it's
  irrelevant if the caller would not have access to an appropriate
  /proc/<pid>/ns/ directory as they could always just derive the
  namespace based on a pidfd already.

  It has the same advantage as pidfds. It's possible to reliably and for
  the lifetime of the system refer to a namespace without pinning any
  resources and to compare them trivially.

  Permission checking is kept simple. If the caller is located in the
  namespace the file handle refers to they are able to open it otherwise
  they must hold privilege over the owning namespace of the relevant
  namespace.

  The namespace file handle layout is exposed as uapi and has a stable
  and extensible format. For now it simply contains the namespace
  identifier, the namespace type, and the inode number. The stable
  format means that userspace may construct its own namespace file
  handles without going through name_to_handle_at() as they are already
  allowed for pidfs and cgroup file handles"

* tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits)
  ns: drop assert
  ns: move ns type into struct ns_common
  nstree: make struct ns_tree private
  ns: add ns_debug()
  ns: simplify ns_common_init() further
  cgroup: add missing ns_common include
  ns: use inode initializer for initial namespaces
  selftests/namespaces: verify initial namespace inode numbers
  ns: rename to __ns_ref
  nsfs: port to ns_ref_*() helpers
  net: port to ns_ref_*() helpers
  uts: port to ns_ref_*() helpers
  ipv4: use check_net()
  net: use check_net()
  net-sysfs: use check_net()
  user: port to ns_ref_*() helpers
  time: port to ns_ref_*() helpers
  pid: port to ns_ref_*() helpers
  ipc: port to ns_ref_*() helpers
  cgroup: port to ns_ref_*() helpers
  ...
2025-09-29 11:20:29 -07:00
Linus Torvalds
b7ce6fa90f Merge tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
 "This contains the usual selections of misc updates for this cycle.

  Features:

   - Add "initramfs_options" parameter to set initramfs mount options.
     This allows to add specific mount options to the rootfs to e.g.,
     limit the memory size

   - Add RWF_NOSIGNAL flag for pwritev2()

     Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE
     signal from being raised when writing on disconnected pipes or
     sockets. The flag is handled directly by the pipe filesystem and
     converted to the existing MSG_NOSIGNAL flag for sockets

   - Allow to pass pid namespace as procfs mount option

     Ever since the introduction of pid namespaces, procfs has had very
     implicit behaviour surrounding them (the pidns used by a procfs
     mount is auto-selected based on the mounting process's active
     pidns, and the pidns itself is basically hidden once the mount has
     been constructed)

     This implicit behaviour has historically meant that userspace was
     required to do some special dances in order to configure the pidns
     of a procfs mount as desired. Examples include:

     * In order to bypass the mnt_too_revealing() check, Kubernetes
       creates a procfs mount from an empty pidns so that user
       namespaced containers can be nested (without this, the nested
       containers would fail to mount procfs)

       But this requires forking off a helper process because you cannot
       just one-shot this using mount(2)

     * Container runtimes in general need to fork into a container
       before configuring its mounts, which can lead to security issues
       in the case of shared-pidns containers (a privileged process in
       the pidns can interact with your container runtime process)

       While SUID_DUMP_DISABLE and user namespaces make this less of an
       issue, the strict need for this due to a minor uAPI wart is kind
       of unfortunate

       Things would be much easier if there was a way for userspace to
       just specify the pidns they want. So this pull request contains
       changes to implement a new "pidns" argument which can be set
       using fsconfig(2):

           fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
           fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0);

       or classic mount(2) / mount(8):

           // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc
           mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid");

  Cleanups:

   - Remove the last references to EXPORT_OP_ASYNC_LOCK

   - Make file_remove_privs_flags() static

   - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used

   - Use try_cmpxchg() in start_dir_add()

   - Use try_cmpxchg() in sb_init_done_wq()

   - Replace offsetof() with struct_size() in ioctl_file_dedupe_range()

   - Remove vfs_ioctl() export

   - Replace rwlock() with spinlock in epoll code as rwlock causes
     priority inversion on preempt rt kernels

   - Make ns_entries in fs/proc/namespaces const

   - Use a switch() statement() in init_special_inode() just like we do
     in may_open()

   - Use struct_size() in dir_add() in the initramfs code

   - Use str_plural() in rd_load_image()

   - Replace strcpy() with strscpy() in find_link()

   - Rename generic_delete_inode() to inode_just_drop() and
     generic_drop_inode() to inode_generic_drop()

   - Remove unused arguments from fcntl_{g,s}et_rw_hint()

  Fixes:

   - Document @name parameter for name_contains_dotdot() helper

   - Fix spelling mistake

   - Always return zero from replace_fd() instead of the file descriptor
     number

   - Limit the size for copy_file_range() in compat mode to prevent a
     signed overflow

   - Fix debugfs mount options not being applied

   - Verify the inode mode when loading it from disk in minixfs

   - Verify the inode mode when loading it from disk in cramfs

   - Don't trigger automounts with RESOLVE_NO_XDEV

     If openat2() was called with RESOLVE_NO_XDEV it didn't traverse
     through automounts, but could still trigger them

   - Add FL_RECLAIM flag to show_fl_flags() macro so it appears in
     tracepoints

   - Fix unused variable warning in rd_load_image() on s390

   - Make INITRAMFS_PRESERVE_MTIME depend on BLK_DEV_INITRD

   - Use ns_capable_noaudit() when determining net sysctl permissions

   - Don't call path_put() under namespace semaphore in listmount() and
     statmount()"

* tag 'vfs-6.18-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
  fcntl: trim arguments
  listmount: don't call path_put() under namespace semaphore
  statmount: don't call path_put() under namespace semaphore
  pid: use ns_capable_noaudit() when determining net sysctl permissions
  fs: rename generic_delete_inode() and generic_drop_inode()
  init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
  initramfs: Replace strcpy() with strscpy() in find_link()
  initrd: Use str_plural() in rd_load_image()
  initramfs: Use struct_size() helper to improve dir_add()
  initrd: Fix unused variable warning in rd_load_image() on s390
  fs: use the switch statement in init_special_inode()
  fs/proc/namespaces: make ns_entries const
  filelock: add FL_RECLAIM to show_fl_flags() macro
  eventpoll: Replace rwlock with spinlock
  selftests/proc: add tests for new pidns APIs
  procfs: add "pidns" mount option
  pidns: move is-ancestor logic to helper
  openat2: don't trigger automounts with RESOLVE_NO_XDEV
  namei: move cross-device check to __traverse_mounts
  namei: remove LOOKUP_NO_XDEV check from handle_mounts
  ...
2025-09-29 09:03:07 -07:00
Linus Torvalds
fde0ab43b9 Fix CC_HAS_ASM_GOTO_OUTPUT on non-x86 architectures
There's a silly problem with the CC_HAS_ASM_GOTO_OUTPUT test: even with
a working compiler it will fail on some architectures simply because it
uses the mnemonic "jmp" for testing the inline asm.

And as reported by Geert, not all architectures use that mnemonic, so
the test fails spuriously on such platforms (including arm and riscv,
but also several other architectures).

This issue avoided any obvious test failures because the build still
works thanks to falling back on the old non-asm-goto code, which just
generates worse code.

Just use an empty asm statement instead.

Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Fixes: e2ffa15b9b ("kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17")
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-09-29 08:54:12 -07:00
Christian Brauner
4055526d35 ns: move ns type into struct ns_common
It's misplaced in struct proc_ns_operations and ns->ops might be NULL if
the namespace is compiled out but we still want to know the type of the
namespace for the initial namespace struct.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-25 09:23:54 +02:00
Kees Cook
23ef9d4397 kcfi: Rename CONFIG_CFI_CLANG to CONFIG_CFI
The kernel's CFI implementation uses the KCFI ABI specifically, and is
not strictly tied to a particular compiler. In preparation for GCC
supporting KCFI, rename CONFIG_CFI_CLANG to CONFIG_CFI (along with
associated options).

Use new "transitional" Kconfig option for old CONFIG_CFI_CLANG that will
enable CONFIG_CFI during olddefconfig.

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20250923213422.1105654-3-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-09-24 14:29:14 -07:00
Thomas Gleixner
e2ffa15b9b kbuild: Disable CC_HAS_ASM_GOTO_OUTPUT on clang < 17
clang < 17 fails to use scope local labels with CONFIG_CC_HAS_ASM_GOTO_OUTPUT=y:

     {
     	__label__ local_lbl;
	...
	unsafe_get_user(uval, uaddr, local_lbl);
	...
	return 0;
	local_lbl:
		return -EFAULT;
     }

when two such scopes exist in the same function:

  error: cannot jump from this asm goto statement to one of its possible targets

There are other failure scenarios. Shuffling code around slightly makes it
worse and fail even with one instance.

That issue prevents using local labels for a cleanup based user access
mechanism.

After failed attempts to provide a simple enough test case for the 'depends
on' test in Kconfig, the initial cure was to mark ASM goto broken on clang
versions < 17 to get this road block out of the way.

But Nathan pointed out that this is a known clang issue and indeed affects
clang < version 17 in combination with cleanup(). It's not even required to
use local labels for that.

The clang issue tracker has a small enough test case, which can be used as
a test in the 'depends on' section of CC_HAS_ASM_GOTO_OUTPUT:

void bar(void **);
void* baz(void);

int  foo (void) {
    {
	    asm goto("jmp %l0"::::l0);
	    return 0;
l0:
	    return 1;
    }
    void *x __attribute__((cleanup(bar))) = baz();
    {
	    asm goto("jmp %l0"::::l1);
	    return 42;
l1:
	    return 0xff;
    }
}

Add another dependency to config CC_HAS_ASM_GOTO_OUTPUT for it and use the
clang issue tracker test case for detection by condensing it to obfuscated
C-code contest format. This reliably catches the problem on clang < 17 and
did not show any issues on the non broken GCC versions.

That test might be sufficient to catch all issues and therefore could
replace the existing test, but keeping that around does no harm either.

Thanks to Nathan for pointing to the relevant clang issue!

Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1886
Link: f023f5cdb2
2025-09-24 09:29:15 +02:00
Nathan Chancellor
95ee3364b2 Merge 6.17-rc6 into kbuild-next
Commit bd7c231212 ("pinctrl: meson: Fix typo in device table macro")
is needed in kbuild-next to avoid a build error with a future change.

While at it, address the conflict between commit 41f9049cff ("riscv:
Only allow LTO with CMODEL_MEDANY") and commit 6578a1ff6a ("riscv:
Remove version check for LTO_CLANG selects"), as reported by Stephen
Rothwell [1].

Link: https://lore.kernel.org/20250908134913.68778b7b@canb.auug.org.au/ [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-09-19 13:43:11 -07:00
Christian Brauner
7cf7303211 ns: use inode initializer for initial namespaces
Just use the common helper we have.

Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 16:22:38 +02:00
Christian Brauner
024596a4e2 ns: rename to __ns_ref
Make it easier to grep and rename to ns_count.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 16:22:38 +02:00
Christian Brauner
b36c823b9a time: support ns lookup
Support the generic ns lookup infrastructure to support file handles for
namespaces.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-19 14:26:15 +02:00
Heiko Carstens
f72e2cff13 compiler_types: Add __assume macro
Make the statement attribute "assume" with a new __assume macro available.

The assume attribute is used to indicate that a certain condition is
assumed to be true. Compilers may or may not use this indication to
generate optimized code. If this condition is violated at runtime, the
behavior is undefined.

Note that the clang documentation states that optimizers may react
differently to this attribute, and this may even have a negative
performance impact. Therefore this attribute should be used with care.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2025-09-18 14:06:40 +02:00
Geert Uytterhoeven
7479260860 init: INITRAMFS_PRESERVE_MTIME should depend on BLK_DEV_INITRD
INITRAMFS_PRESERVE_MTIME is only used in init/initramfs.c and
init/initramfs_test.c.  Hence add a dependency on BLK_DEV_INITRD, to
prevent asking the user about this feature when configuring a kernel
without initramfs support.

Fixes: 1274aea127 ("initramfs: add INITRAMFS_PRESERVE_MTIME Kconfig option")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 15:02:17 +02:00
Thorsten Blum
afd77d2050 initramfs: Replace strcpy() with strscpy() in find_link()
strcpy() is deprecated; use strscpy() instead.

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:52:02 +02:00
Thorsten Blum
beb022ef92 initrd: Use str_plural() in rd_load_image()
Add the local variable 'nr_disks' and replace the manual ternary "s"
pluralization with the standardized str_plural() helper function.

Use pr_notice() instead of printk(KERN_NOTICE) to silence a checkpatch
warning.

No functional changes intended.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:47:14 +02:00
Thorsten Blum
e60625e7ce initramfs: Use struct_size() helper to improve dir_add()
Use struct_size() to calculate the number of bytes to allocate for a new
directory entry.  No functional changes.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:42:56 +02:00
Thorsten Blum
84f1766bdb initrd: Fix unused variable warning in rd_load_image() on s390
The local variables 'rotator' and 'rotate' (used for the progress
indicator) aren't used on s390. Building the kernel with W=1 generates
the following warning:

init/do_mounts_rd.c:192:17: warning: variable 'rotate' set but not used [-Wunused-but-set-variable]
 192 |         unsigned short rotate = 0;
     |                        ^
1 warning generated.

Remove the preprocessor directives and use the IS_ENABLED(CONFIG_S390)
macro instead, allowing the compiler to optimize away unused variables
and avoid the warning on s390.

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-15 14:28:37 +02:00
Huacai Chen
e416f0ed3c init: handle bootloader identifier in kernel parameters
BootLoaders (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters.  But these identifiers are not
recognized by the kernel itself so will be passed to userspace.  However
user space init program also don't recognize it.

KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.

We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:

https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption) So the the best way is handle
(ignore) it by the kernel itself, which can avoid such boot warnings (if
we use something like init=/bin/bash, bootloader identifier can even cause
a crash):

Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.

[chenhuacai@loongson.cn: use strstarts()]
  Link: https://lkml.kernel.org/r/20250815090120.1569947-1-chenhuacai@loongson.cn
Link: https://lkml.kernel.org/r/20250721101343.3283480-1-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-13 17:32:45 -07:00
Linus Torvalds
4f553c1e2c Merge tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
 "20 hotfixes. 15 are cc:stable and the remainder address post-6.16
  issues or aren't considered necessary for -stable kernels. 14 of these
  fixes are for MM.

  This includes

   - kexec fixes from Breno for a recently introduced
     use-uninitialized bug

   - DAMON fixes from Quanmin Yan to avoid div-by-zero crashes
     which can occur if the operator uses poorly-chosen insmod
     parameters

   and misc singleton fixes"

* tag 'mm-hotfixes-stable-2025-09-10-20-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add tree entry to numa memblocks and emulation block
  mm/damon/sysfs: fix use-after-free in state_show()
  proc: fix type confusion in pde_set_flags()
  compiler-clang.h: define __SANITIZE_*__ macros only when undefined
  mm/vmalloc, mm/kasan: respect gfp mask in kasan_populate_vmalloc()
  ocfs2: fix recursive semaphore deadlock in fiemap call
  mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
  mm/mremap: fix regression in vrm->new_addr check
  percpu: fix race on alloc failed warning limit
  mm/memory-failure: fix redundant updates for already poisoned pages
  s390: kexec: initialize kexec_buf struct
  riscv: kexec: initialize kexec_buf struct
  arm64: kexec: initialize kexec_buf struct in load_other_segments()
  mm/damon/reclaim: avoid divide-by-zero in damon_reclaim_apply_parameters()
  mm/damon/lru_sort: avoid divide-by-zero in damon_lru_sort_apply_parameters()
  mm/damon/core: set quota->charged_from to jiffies at first charge window
  mm/hugetlb: add missing hugetlb_lock in __unmap_hugepage_range()
  init/main.c: fix boot time tracing crash
  mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range()
  mm/khugepaged: fix the address passed to notifier on testing young
2025-09-10 21:19:34 -07:00
Yi Tao
0568f89d4f cgroup: replace global percpu_rwsem with per threadgroup resem when writing to cgroup.procs
The static usage pattern of creating a cgroup, enabling controllers,
and then seeding it with CLONE_INTO_CGROUP doesn't require write
locking cgroup_threadgroup_rwsem and thus doesn't benefit from this
patch.

To avoid affecting other users, the per threadgroup rwsem is only used
when the favordynmods is enabled.

As computer hardware advances, modern systems are typically equipped
with many CPU cores and large amounts of memory, enabling the deployment
of numerous applications. On such systems, container creation and
deletion become frequent operations, making cgroup process migration no
longer a cold path. This leads to noticeable contention with common
process operations such as fork, exec, and exit.

To alleviate the contention between cgroup process migration and
operations like process fork, this patch modifies lock to take the write
lock on signal_struct->group_rwsem when writing pid to
cgroup.procs/threads instead of holding a global write lock.

Cgroup process migration has historically relied on
signal_struct->group_rwsem to protect thread group integrity. In commit
<1ed1328792ff> ("sched, cgroup: replace signal_struct->group_rwsem with
a global percpu_rwsem"), this was changed to a global
cgroup_threadgroup_rwsem. The advantage of using a global lock was
simplified handling of process group migrations. This patch retains the
use of the global lock for protecting process group migration, while
reducing contention by using per thread group lock during
cgroup.procs/threads writes.

The locking behavior is as follows:

write cgroup.procs/threads  | process fork,exec,exit | process group migration
------------------------------------------------------------------------------
cgroup_lock()               | down_read(&g_rwsem)    | cgroup_lock()
down_write(&p_rwsem)        | down_read(&p_rwsem)    | down_write(&g_rwsem)
critical section            | critical section       | critical section
up_write(&p_rwsem)          | up_read(&p_rwsem)      | up_write(&g_rwsem)
cgroup_unlock()             | up_read(&g_rwsem)      | cgroup_unlock()

g_rwsem denotes cgroup_threadgroup_rwsem, p_rwsem denotes
signal_struct->group_rwsem.

This patch eliminates contention between cgroup migration and fork
operations for threads that belong to different thread groups, thereby
reducing the long-tail latency of cgroup migrations and lowering system
load.

With this patch, under heavy fork and exec interference, the long-tail
latency of cgroup migration has been reduced from milliseconds to
microseconds. Under heavy cgroup migration interference, the multi-CPU
score of the spawn test case in UnixBench increased by 9%.

tj: Update comment in cgroup_favor_dynmods() and switch WARN_ONCE() to
    pr_warn_once().

Signed-off-by: Yi Tao <escape@linux.alibaba.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2025-09-10 07:44:51 -10:00
Linus Torvalds
b236920731 Merge tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rust fixes from Miguel Ojeda:

 - Two changes to prepare for the future Rust 1.91.0 release (expected
   2025-10-30, currently in nightly): a target specification format
   change and a renamed, soon-to-be-stabilized 'core' function.

* tag 'rust-fixes-6.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: support Rust >= 1.91.0 target spec
  rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
2025-09-06 12:33:09 -07:00
Thomas Weißschuh
bad53ae2dc vdso: Drop Kconfig GENERIC_VDSO_TIME_NS
All architectures implementing time-related functionality in the vDSO are
using the generic vDSO library which handles time namespaces properly.

Remove the now unnecessary Kconfig symbol.

Enables the use of time namespaces on architectures, which use the
generic vDSO but did not enable GENERIC_VDSO_TIME_NS, namely MIPS and arm.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/all/20250826-vdso-cleanups-v1-10-d9b65750e49f@linutronix.de
2025-09-04 11:23:50 +02:00
Mike Rapoport (Microsoft)
669602b5b7 init/main.c: fix boot time tracing crash
Steven Rostedt reported a crash with "ftrace=function" kernel command
line:

[    0.159269] BUG: kernel NULL pointer dereference, address: 000000000000001c
[    0.160254] #PF: supervisor read access in kernel mode
[    0.160975] #PF: error_code(0x0000) - not-present page
[    0.161697] PGD 0 P4D 0
[    0.162055] Oops: Oops: 0000 [#1] SMP PTI
[    0.162619] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.17.0-rc2-test-00006-g48d06e78b7cb-dirty #9 PREEMPT(undef)
[    0.164141] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[    0.165439] RIP: 0010:kmem_cache_alloc_noprof (mm/slub.c:4237)
[ 0.166186] Code: 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 83 ec 20 8b 05 c9 b6 7e 01 <44> 8b 77 1c 65 4c 8b 2d b5 ea 20 02 4c 89 6c 24 18 41 89 f5 21 f0
[    0.168811] RSP: 0000:ffffffffb2e03b30 EFLAGS: 00010086
[    0.169545] RAX: 0000000001fff33f RBX: 0000000000000000 RCX: 0000000000000000
[    0.170544] RDX: 0000000000002800 RSI: 0000000000002800 RDI: 0000000000000000
[    0.171554] RBP: ffffffffb2e03b80 R08: 0000000000000004 R09: ffffffffb2e03c90
[    0.172549] R10: ffffffffb2e03c90 R11: 0000000000000000 R12: 0000000000000000
[    0.173544] R13: ffffffffb2e03c90 R14: ffffffffb2e03c90 R15: 0000000000000001
[    0.174542] FS:  0000000000000000(0000) GS:ffff9d2808114000(0000) knlGS:0000000000000000
[    0.175684] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.176486] CR2: 000000000000001c CR3: 000000007264c001 CR4: 00000000000200b0
[    0.177483] Call Trace:
[    0.177828]  <TASK>
[    0.178123] mas_alloc_nodes (lib/maple_tree.c:176 (discriminator 2) lib/maple_tree.c:1255 (discriminator 2))
[    0.178692] mas_store_gfp (lib/maple_tree.c:5468)
[    0.179223] execmem_cache_add_locked (mm/execmem.c:207)
[    0.179870] execmem_alloc (mm/execmem.c:213 mm/execmem.c:313 mm/execmem.c:335 mm/execmem.c:475)
[    0.180397] ? ftrace_caller (arch/x86/kernel/ftrace_64.S:169)
[    0.180922] ? __pfx_ftrace_caller (arch/x86/kernel/ftrace_64.S:158)
[    0.181517] execmem_alloc_rw (mm/execmem.c:487)
[    0.182052] arch_ftrace_update_trampoline (arch/x86/kernel/ftrace.c:266 arch/x86/kernel/ftrace.c:344 arch/x86/kernel/ftrace.c:474)
[    0.182778] ? ftrace_caller_op_ptr (arch/x86/kernel/ftrace_64.S:182)
[    0.183388] ftrace_update_trampoline (kernel/trace/ftrace.c:7947)
[    0.184024] __register_ftrace_function (kernel/trace/ftrace.c:368)
[    0.184682] ftrace_startup (kernel/trace/ftrace.c:3048)
[    0.185205] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.185877] register_ftrace_function_nolock (kernel/trace/ftrace.c:8717)
[    0.186595] register_ftrace_function (kernel/trace/ftrace.c:8745)
[    0.187254] ? __pfx_function_trace_call (kernel/trace/trace_functions.c:210)
[    0.187924] function_trace_init (kernel/trace/trace_functions.c:170)
[    0.188499] tracing_set_tracer (kernel/trace/trace.c:5916 kernel/trace/trace.c:6349)
[    0.189088] register_tracer (kernel/trace/trace.c:2391)
[    0.189642] early_trace_init (kernel/trace/trace.c:11075 kernel/trace/trace.c:11149)
[    0.190204] start_kernel (init/main.c:970)
[    0.190732] x86_64_start_reservations (arch/x86/kernel/head64.c:307)
[    0.191381] x86_64_start_kernel (??:?)
[    0.191955] common_startup_64 (arch/x86/kernel/head_64.S:419)
[    0.192534]  </TASK>
[    0.192839] Modules linked in:
[    0.193267] CR2: 000000000000001c
[    0.193730] ---[ end trace 0000000000000000 ]---

The crash happens because on x86 ftrace allocations from execmem require
maple tree to be initialized.

Move maple tree initialization that depends only on slab availability
earlier in boot so that it will happen right after mm_core_init().

Link: https://lkml.kernel.org/r/20250824130759.1732736-1-rppt@kernel.org
Fixes: 5d79c2be50 ("x86/ftrace: enable EXECMEM_ROX_CACHE for ftrace allocations")
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reported-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Closes: https://lore.kernel.org/all/20250820184743.0302a8b5@gandalf.local.home/
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-03 17:10:36 -07:00
Alice Ryhl
c09461a0d2 rust: use the new name Location::file_as_c_str() in Rust >= 1.91.0
As part of the stabilization of Location::file_with_nul(), it was brought
up that the with_nul() suffix usually means something else in Rust APIs,
so the API is being renamed prior to stabilization [1].

Thus, use the new name on new rustc versions.

Link: https://www.github.com/rust-lang/rust/pull/145928 [1]
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/20250827-file_as_c_str-v1-1-d3f5a3916a9c@google.com
[ Kept `cfg` separation. Reworded slightly. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-08-31 23:34:34 +02:00
Nathan Chancellor
86a9b12506 hardening: Require clang 20.1.0 for __counted_by
After an innocuous change in -next that modified a structure that
contains __counted_by, clang-19 start crashing when building certain
files in drivers/gpu/drm/xe. When assertions are enabled, the more
descriptive failure is:

  clang: clang/lib/AST/RecordLayoutBuilder.cpp:3335: const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: Assertion `D && "Cannot get layout of forward declarations!"' failed.

According to a reverse bisect, a tangential change to the LLVM IR
generation phase of clang during the LLVM 20 development cycle [1]
resolves this problem. Bump the version of clang that enables
CONFIG_CC_HAS_COUNTED_BY to 20.1.0 to ensure that this issue cannot be
hit.

Link: 160fb1121c [1]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20250807-fix-counted_by-clang-19-v1-1-902c86c1d515@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-08-29 12:04:53 -07:00
David Disseldorp
6da752f55b initramfs_test: add filename padding test case
Confirm that cpio filenames with multiple trailing zeros (accounted for
in namesize) extract successfully.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Acked-by: Nicolas Schier <nsc@kernel.org>
Link: https://lore.kernel.org/r/20250819032607.28727-9-ddiss@suse.de
[nathan: Fix duplicate filesize initialization, reported at
         https://lore.kernel.org/202508200304.wF1u78il-lkp@intel.com/]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2025-08-21 12:00:10 -07:00
Linus Torvalds
e991acf1bc Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
 "Significant patch series in this pull request:

   - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets
     us closer to being able to remove page->mapping

   - "relayfs: misc changes" (Jason Xing) does some maintenance and
     minor feature addition work in relayfs

   - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches
     us from static preallocation of the kdump crashkernel's working
     memory over to dynamic allocation. So the difficulty of a-priori
     estimation of the second kernel's needs is removed and the first
     kernel obtains extra memory

   - "generalize panic_print's dump function to be used by other
     kernel parts" (Feng Tang) implements some consolidation and
     rationalization of the various ways in which a failing kernel
     splats information at the operator

* tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits)
  tools/getdelays: add backward compatibility for taskstats version
  kho: add test for kexec handover
  delaytop: enhance error logging and add PSI feature description
  samples: Kconfig: fix spelling mistake "instancess" -> "instances"
  fat: fix too many log in fat_chain_add()
  scripts/spelling.txt: add notifer||notifier to spelling.txt
  xen/xenbus: fix typo "notifer"
  net: mvneta: fix typo "notifer"
  drm/xe: fix typo "notifer"
  cxl: mce: fix typo "notifer"
  KVM: x86: fix typo "notifer"
  MAINTAINERS: add maintainers for delaytop
  ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below()
  ucount: fix atomic_long_inc_below() argument type
  kexec: enable CMA based contiguous allocation
  stackdepot: make max number of pools boot-time configurable
  lib/xxhash: remove unused functions
  init/Kconfig: restore CONFIG_BROKEN help text
  lib/raid6: update recov_rvv.c zero page usage
  docs: update docs after introducing delaytop
  ...
2025-08-03 16:23:09 -07:00
Andrew Morton
fefbeed8c6 init/Kconfig: restore CONFIG_BROKEN help text
Linus added it in 2003, it later was removed.  Put it back.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Betkov <bp@alien8.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleinxer <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:37 -07:00
Linus Torvalds
6a68cec16b Merge tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:

 - Add support for cgroup "cpu.max" interface

 - Code organization cleanup so that ext_idle.c doesn't depend on the
   source-file-inclusion build method of sched/

 - Drop UP paths in accordance with sched core changes

 - Documentation and other misc changes

* tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix scx_bpf_reenqueue_local() reference
  sched_ext: Drop kfuncs marked for removal in 6.15
  sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic
  kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments
  sched_ext: Add support for cgroup bandwidth control interface
  sched_ext, sched/core: Factor out struct scx_task_group
  sched_ext: Return NULL in llc_span
  sched_ext: Always use SMP versions in kernel/sched/ext_idle.h
  sched_ext: Always use SMP versions in kernel/sched/ext_idle.c
  sched_ext: Always use SMP versions in kernel/sched/ext.h
  sched_ext: Always use SMP versions in kernel/sched/ext.c
  sched_ext: Documentation: Clarify time slice handling in task lifecycle
  sched_ext: Make scx_locked_rq() inline
  sched_ext: Make scx_rq_bypassing() inline
  sched_ext: idle: Make local functions static in ext_idle.c
  sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
2025-07-31 16:29:46 -07:00
Linus Torvalds
beace86e61 Merge tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
 "As usual, many cleanups. The below blurbiage describes 42 patchsets.
  21 of those are partially or fully cleanup work. "cleans up",
  "cleanup", "maintainability", "rationalizes", etc.

  I never knew the MM code was so dirty.

  "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes)
     addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly
     mapped VMAs were not eligible for merging with existing adjacent
     VMAs.

  "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park)
     adds a new kernel module which simplifies the setup and usage of
     DAMON in production environments.

  "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig)
     is a cleanup to the writeback code which removes a couple of
     pointers from struct writeback_control.

  "drivers/base/node.c: optimization and cleanups" (Donet Tom)
     contains largely uncorrelated cleanups to the NUMA node setup and
     management code.

  "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman)
     does some maintenance work on the userfaultfd code.

  "Readahead tweaks for larger folios" (Ryan Roberts)
     implements some tuneups for pagecache readahead when it is reading
     into order>0 folios.

  "selftests/mm: Tweaks to the cow test" (Mark Brown)
     provides some cleanups and consistency improvements to the
     selftests code.

  "Optimize mremap() for large folios" (Dev Jain)
     does that. A 37% reduction in execution time was measured in a
     memset+mremap+munmap microbenchmark.

  "Remove zero_user()" (Matthew Wilcox)
     expunges zero_user() in favor of the more modern memzero_page().

  "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand)
     addresses some warts which David noticed in the huge page code.
     These were not known to be causing any issues at this time.

  "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park)
     provides some cleanup and consolidation work in DAMON.

  "use vm_flags_t consistently" (Lorenzo Stoakes)
     uses vm_flags_t in places where we were inappropriately using other
     types.

  "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy)
     increases the reliability of large page allocation in the memfd
     code.

  "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple)
     removes several now-unneeded PFN_* flags.

  "mm/damon: decouple sysfs from core" (SeongJae Park)
     implememnts some cleanup and maintainability work in the DAMON
     sysfs layer.

  "madvise cleanup" (Lorenzo Stoakes)
     does quite a lot of cleanup/maintenance work in the madvise() code.

  "madvise anon_name cleanups" (Vlastimil Babka)
     provides additional cleanups on top or Lorenzo's effort.

  "Implement numa node notifier" (Oscar Salvador)
     creates a standalone notifier for NUMA node memory state changes.
     Previously these were lumped under the more general memory
     on/offline notifier.

  "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan)
     cleans up the pageblock isolation code and fixes a potential issue
     which doesn't seem to cause any problems in practice.

  "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park)
     adds additional drgn- and python-based DAMON selftests which are
     more comprehensive than the existing selftest suite.

  "Misc rework on hugetlb faulting path" (Oscar Salvador)
     fixes a rather obscure deadlock in the hugetlb fault code and
     follows that fix with a series of cleanups.

  "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport)
     rationalizes and cleans up the highmem-specific code in the CMA
     allocator.

  "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand)
     provides cleanups and future-preparedness to the migration code.

  "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park)
     adds some tracepoints to some DAMON auto-tuning code.

  "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park)
     does that.

  "mm/damon: misc cleanups" (SeongJae Park)
     also does what it claims.

  "mm: folio_pte_batch() improvements" (David Hildenbrand)
     cleans up the large folio PTE batching code.

  "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park)
     facilitates dynamic alteration of DAMON's inter-node allocation
     policy.

  "Remove unmap_and_put_page()" (Vishal Moola)
     provides a couple of page->folio conversions.

  "mm: per-node proactive reclaim" (Davidlohr Bueso)
     implements a per-node control of proactive reclaim - beyond the
     current memcg-based implementation.

  "mm/damon: remove damon_callback" (SeongJae Park)
     replaces the damon_callback interface with a more general and
     powerful damon_call()+damos_walk() interface.

  "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes)
     implements a number of mremap cleanups (of course) in preparation
     for adding new mremap() functionality: newly permit the remapping
     of multiple VMAs when the user is specifying MREMAP_FIXED. It still
     excludes some specialized situations where this cannot be performed
     reliably.

  "drop hugetlb_free_pgd_range()" (Anthony Yznaga)
     switches some sparc hugetlb code over to the generic version and
     removes the thus-unneeded hugetlb_free_pgd_range().

  "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park)
     augments the present userspace-requested update of DAMON sysfs
     monitoring files. Automatic update is now provided, along with a
     tunable to control the update interval.

  "Some randome fixes and cleanups to swapfile" (Kemeng Shi)
     does what is claims.

  "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand)
     provides (and uses) a means by which debug-style functions can grab
     a copy of a pageframe and inspect it locklessly without tripping
     over the races inherent in operating on the live pageframe
     directly.

  "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan)
     addresses the large contention issues which can be triggered by
     reads from that procfs file. Latencies are reduced by more than
     half in some situations. The series also introduces several new
     selftests for the /proc/pid/maps interface.

  "__folio_split() clean up" (Zi Yan)
     cleans up __folio_split()!

  "Optimize mprotect() for large folios" (Dev Jain)
     provides some quite large (>3x) speedups to mprotect() when dealing
     with large folios.

  "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian)
     does some cleanup work in the selftests code.

  "tools/testing: expand mremap testing" (Lorenzo Stoakes)
     extends the mremap() selftest in several ways, including adding
     more checking of Lorenzo's recently added "permit mremap() move of
     multiple VMAs" feature.

  "selftests/damon/sysfs.py: test all parameters" (SeongJae Park)
     extends the DAMON sysfs interface selftest so that it tests all
     possible user-requested parameters. Rather than the present minimal
     subset"

* tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits)
  MAINTAINERS: add missing headers to mempory policy & migration section
  MAINTAINERS: add missing file to cgroup section
  MAINTAINERS: add MM MISC section, add missing files to MISC and CORE
  MAINTAINERS: add missing zsmalloc file
  MAINTAINERS: add missing files to page alloc section
  MAINTAINERS: add missing shrinker files
  MAINTAINERS: move memremap.[ch] to hotplug section
  MAINTAINERS: add missing mm_slot.h file THP section
  MAINTAINERS: add missing interval_tree.c to memory mapping section
  MAINTAINERS: add missing percpu-internal.h file to per-cpu section
  mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()
  selftests/damon: introduce _common.sh to host shared function
  selftests/damon/sysfs.py: test runtime reduction of DAMON parameters
  selftests/damon/sysfs.py: test non-default parameters runtime commit
  selftests/damon/sysfs.py: generalize DAMON context commit assertion
  selftests/damon/sysfs.py: generalize monitoring attributes commit assertion
  selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion
  selftests/damon/sysfs.py: test DAMOS filters commitment
  selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion
  selftests/damon/sysfs.py: test DAMOS destinations commitment
  ...
2025-07-31 14:57:54 -07:00
Linus Torvalds
56d5e32929 Merge tag 'x86-boot-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:

 - Implement support for embedding EFI SBAT data (Secure Boot Advanced
   Targeting: a secure boot image revocation facility) on x86 (Vitaly
   Kuznetsov)

 - Move the efi_enter_virtual_mode() initialization call from the
   generic init code to x86 init code (Alexander Shishkin)

* tag 'x86-boot-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Implement support for embedding SBAT data for x86
  x86/efi: Move runtime service initialization to arch/x86
2025-07-29 18:58:22 -07:00
Linus Torvalds
bf76f23aa1 Merge tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "Core scheduler changes:

   - Better tracking of maximum lag of tasks in presence of different
     slices duration, for better handling of lag in the fair scheduler
     (Vincent Guittot)

   - Clean up and standardize #if/#else/#endif markers throughout the
     entire scheduler code base (Ingo Molnar)

   - Make SMP unconditional: build the SMP scheduler's data structures
     and logic on UP kernel too, even though they are not used, to
     simplify the scheduler and remove around 200 #ifdef/[#else]/#endif
     blocks from the scheduler (Ingo Molnar)

   - Reorganize cgroup bandwidth control interface handling for better
     interfacing with sched_ext (Tejun Heo)

  Balancing:

   - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris
     Mason)

   - Remove sched_domain_topology_level::flags to simplify the code
     (Prateek Nayak)

   - Simplify and clean up build_sched_topology() (Li Chen)

   - Optimize build_sched_topology() on large machines (Li Chen)

  Real-time scheduling:

   - Add initial version of proxy execution: a mechanism for
     mutex-owning tasks to inherit the scheduling context of higher
     priority waiters.

     Currently limited to a single runqueue and conditional on
     CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra,
     Valentin Schneider)

   - Deadline scheduler (Juri Lelli):
      - Fix dl_servers initialization order (Juri Lelli)
      - Fix DL scheduler's root domain reinitialization logic (Juri
        Lelli)
      - Fix accounting bugs after global limits change (Juri Lelli)
      - Fix scalability regression by implementing less agressive
        dl_server handling (Peter Zijlstra)

  PSI:

   - Improve scalability by optimizing psi_group_change() cpu_clock()
     usage (Peter Zijlstra)

  Rust changes:

   - Make Task, CondVar and PollCondVar methods inline to avoid
     unnecessary function calls (Kunwu Chan, Panagiotis Foliadis)

   - Add might_sleep() support for Rust code: Rust's "#[track_caller]"
     mechanism is used so that Rust's might_sleep() doesn't need to be
     defined as a macro (Fujita Tomonori)

   - Introduce file_from_location() (Boqun Feng)

  Debugging & instrumentation:

   - Make clangd usable with scheduler source code files again (Peter
     Zijlstra)

   - tools: Add root_domains_dump.py which dumps root domains info (Juri
     Lelli)

   - tools: Add dl_bw_dump.py for printing bandwidth accounting info
     (Juri Lelli)

  Misc cleanups & fixes:

   - Remove play_idle() (Feng Lee)

   - Fix check_preemption_disabled() (Sebastian Andrzej Siewior)

   - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis
     Claudio R. Goncalves)

   - Correct the comment in place_entity() (wang wei)"

* tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits)
  sched/idle: Remove play_idle()
  sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
  sched: Start blocked_on chain processing in find_proxy_task()
  sched: Fix proxy/current (push,pull)ability
  sched: Add an initial sketch of the find_proxy_task() function
  sched: Fix runtime accounting w/ split exec & sched contexts
  sched: Move update_curr_task logic into update_curr_se
  locking/mutex: Add p->blocked_on wrappers for correctness checks
  locking/mutex: Rework task_struct::blocked_on
  sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
  sched/topology: Remove sched_domain_topology_level::flags
  x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled
  x86/smpboot: moves x86_topology to static initialize and truncate
  x86/smpboot: remove redundant CONFIG_SCHED_SMT
  smpboot: introduce SDTL_INIT() helper to tidy sched topology setup
  tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info
  tools/sched: Add root_domains_dump.py which dumps root domains info
  sched/deadline: Fix accounting after global limits change
  sched/deadline: Reset extra_bw to max_bw when clearing root domains
  sched/deadline: Initialize dl_servers after SMP
  ...
2025-07-29 17:42:52 -07:00
Linus Torvalds
f38b1f243e Merge tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull futex updates from Thomas Gleixner:

 - Switch the reference counting to a RCU based per-CPU reference to
   address a performance bottleneck vs the single instance rcuref
   variant

 - Make the futex selftest build on 32-bit architectures which only
   support 64-bit time_t, e.g. RISCV-32

 - Cleanups and improvements in selftests and futex bench

* tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully"
  selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t
  perf bench futex: Remove support for IMMUTABLE
  selftests/futex: Remove support for IMMUTABLE
  futex: Remove support for IMMUTABLE
  futex: Make futex_private_hash_get() static
  futex: Use RCU-based per-CPU reference counting instead of rcuref_t
  selftests/futex: Adapt the private hash test to RCU related changes
2025-07-29 14:39:42 -07:00
Linus Torvalds
0d5ec7919f Merge tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO / other driver updates from Greg KH:
 "Here is the big set of char/misc/iio and other smaller driver
  subsystems for 6.17-rc1. It's a big set this time around, with the
  huge majority being in the iio subsystem with new drivers and dts
  files being added there.

  Highlights include:
   - IIO driver updates, additions, and changes making more code const
     and cleaning up some init logic
   - bus_type constant conversion changes
   - misc device test functions added
   - rust miscdevice minor fixup
   - unused function removals for some drivers
   - mei driver updates
   - mhi driver updates
   - interconnect driver updates
   - Android binder updates and test infrastructure added
   - small cdx driver updates
   - small comedi fixes
   - small nvmem driver updates
   - small pps driver updates
   - some acrn virt driver fixes for printk messages
   - other small driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
  binder: Use seq_buf in binder_alloc kunit tests
  binder: Add copyright notice to new kunit files
  misc: ti_fpc202: Switch to of_fwnode_handle()
  bus: moxtet: Use dev_fwnode()
  pc104: move PC104 option to drivers/Kconfig
  drivers: virt: acrn: Don't use %pK through printk
  comedi: fix race between polling and detaching
  interconnect: qcom: Add Milos interconnect provider driver
  dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
  mei: more prints with client prefix
  mei: bus: use cldev in prints
  bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
  bus: mhi: host: Detect events pointing to unexpected TREs
  bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
  bus: mhi: host: Use str_true_false() helper
  bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
  bus: mhi: host: Fix endianness of BHI vector table
  bus: mhi: host: pci_generic: Disable runtime PM for QDU100
  bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
  dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
  ...
2025-07-29 09:52:01 -07:00
Linus Torvalds
c3018a2c6a Merge tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Optimization to avoid reference counts on non-cloned registered
   buffers. This is how these buffers were handled prior to having
   cloning support, and we can still use that approach as long as the
   buffers haven't been cloned to another ring.

 - Cleanup and improvement for uring_cmd, where btrfs was the only user
   of storing allocated data for the lifetime of the uring_cmd. Clean
   that up so we can get rid of the need to do that.

 - Avoid unnecessary memory copies in uring_cmd usage. This is
   particularly important as a lot of uring_cmd usage necessitates the
   use of 128b SQEs.

 - A few updates for recv multishot, where it's now possible to add
   fairness limits for limiting how much is transferred for each retry
   loop. Additionally, recv multishot now supports an overall cap as
   well, where once reached the multishot recv will terminate. The
   latter is useful for buffer management and juggling many recv streams
   at the same time.

 - Add support for returning the TX timestamps via a new socket command.
   This feature can work in either singleshot or multishot mode, where
   the latter triggers a completion whenever new timestamps are
   available. This is an alternative to using the existing error queue.

 - Add support for an io_uring "mock" file, which is the start of being
   able to do 100% targeted testing in terms of exercising io_uring
   request handling. The idea is to have a file type that can be
   anything the tester would like, and behave exactly how you want it to
   behave in terms of hitting the code paths you want.

 - Improve zcrx by using sgtables to de-duplicate and improve dma
   address handling.

 - Prep work for supporting larger pages for zcrx.

 - Various little improvements and fixes.

* tag 'for-6.17/io_uring-20250728' of git://git.kernel.dk/linux: (42 commits)
  io_uring/zcrx: fix leaking pages on sg init fail
  io_uring/zcrx: don't leak pages on account failure
  io_uring/zcrx: fix null ifq on area destruction
  io_uring: fix breakage in EXPERT menu
  io_uring/cmd: remove struct io_uring_cmd_data
  btrfs/ioctl: store btrfs_uring_encoded_data in io_btrfs_cmd
  io_uring/cmd: introduce IORING_URING_CMD_REISSUE flag
  io_uring/zcrx: account area memory
  io_uring: export io_[un]account_mem
  io_uring/net: Support multishot receive len cap
  io_uring: deduplicate wakeup handling
  io_uring/net: cast min_not_zero() type
  io_uring/poll: cleanup apoll freeing
  io_uring/net: allow multishot receive per-invocation cap
  io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags
  io_uring/net: use passed in 'len' in io_recv_buf_select()
  io_uring/zcrx: prepare fallback for larger pages
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: introduce io_populate_area_dma
  ...
2025-07-28 16:30:12 -07:00
Randy Dunlap
61a789ad43 pc104: move PC104 option to drivers/Kconfig
Put the PC104 kconfig option in drivers/Kconfig along with
other buses (AMBA, EISA, PCI, CXL, PCCard, & RapidIO).
This localizes PC104 with option bus kconfig options to make
it easier to find.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: William Breathitt Gray <wbg@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250722235431.3671754-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24 11:42:16 +02:00
Randy Dunlap
d1fbe1ebf4 io_uring: fix breakage in EXPERT menu
Add a dependency for IO_URING for the GCOV_PROFILE_URING symbol.

Without this patch the EXPERT config menu ends with
"Enable IO uring support" and the menu prompts for
GCOV_PROFILE_URING and IO_URING_MOCK_FILE are not subordinate to it.
This causes all of the EXPERT Kconfig options that follow
GCOV_PROFILE_URING to be display in the "upper" menu (General setup),
just following the EXPERT menu.

Fixes: 1802656ef8 ("io_uring: add GCOV_PROFILE_URING Kconfig option")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: io-uring@vger.kernel.org
Link: https://lore.kernel.org/r/20250720010456.2945344-1-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-20 12:32:54 -06:00
Lillian Berry
98aa4d5d24 init/main.c: add warning when file specified in rdinit is inaccessible
Avoid silently ignoring the initramfs when the file specified in rdinit is
not usable.  This prints an error that clearly explains the issue (file
was not found, vs initramfs was not found).

Link: https://lkml.kernel.org/r/20250707091411.1412681-1-lillian@star-ark.net
Signed-off-by: Lillian Berry <lillian@star-ark.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 19:08:26 -07:00
Kirill A. Shutemov
fdc5001b00 mm/vmstat: make MEMCG select VM_EVENT_COUNTERS
The vmstat_text array contains labels for counters displayed in
/proc/vmstat.  It is important to keep the labels in sync with the
counters.

There is a BUILD_BUG_ON() check in vmstat_start() that ensures the size of
the vmstat_text is not smaller than VM_EVENT_COUNTERS.  This helps to
catch cases where a new counter is added but the label is not.  However,
it does not help if a counter is removed but the label remains.

It would be nice to make the BUILD_BUG_ON() check more strict to catch
such cases.  However, when compiling with MEMCG enabled but
VM_EVENT_COUNTERS disabled, the vmstat_text array is larger than
NR_VMSTAT_ITEMS.

This issue arises because some elements of the vmstat_text array are
present when either MEMCG or VM_EVENT_COUNTERS is enabled, but
NR_VMSTAT_ITEMS only accounts for these elements if VM_EVENT_COUNTERS is
enabled.

Instead of adjusting the NR_VMSTAT_ITEMS definition to account for MEMCG,
make MEMCG select VM_EVENT_COUNTERS.  VM_EVENT_COUNTERS is enabled in most
configurations anyway.

Link: https://lkml.kernel.org/r/20250604095111.533783-1-kirill.shutemov@linux.intel.com
Fixes: ebc5d83d04 ("mm/memcontrol: use vmstat names for printing statistics")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19 18:59:46 -07:00
John Stultz
25c411fce7 sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable
Add a CONFIG_SCHED_PROXY_EXEC option, along with a boot argument
sched_proxy_exec= that can be used to disable the feature at boot
time if CONFIG_SCHED_PROXY_EXEC was enabled.

Also uses this option to allow the rq->donor to be different from
rq->curr.

Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://lkml.kernel.org/r/20250712033407.2383110-2-jstultz@google.com
2025-07-14 17:16:31 +02:00
Peter Zijlstra
8f2146159b Merge branch 'tip/sched/urgent'
Avoid merge conflicts

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-07-14 17:16:28 +02:00
Peter Zijlstra
56180dd20c futex: Use RCU-based per-CPU reference counting instead of rcuref_t
The use of rcuref_t for reference counting introduces a performance bottleneck
when accessed concurrently by multiple threads during futex operations.

Replace rcuref_t with special crafted per-CPU reference counters. The
lifetime logic remains the same.

The newly allocate private hash starts in FR_PERCPU state. In this state, each
futex operation that requires the private hash uses a per-CPU counter (an
unsigned int) for incrementing or decrementing the reference count.

When the private hash is about to be replaced, the per-CPU counters are
migrated to a atomic_t counter mm_struct::futex_atomic.
The migration process:
- Waiting for one RCU grace period to ensure all users observe the
  current private hash. This can be skipped if a grace period elapsed
  since the private hash was assigned.

- futex_private_hash::state is set to FR_ATOMIC, forcing all users to
  use mm_struct::futex_atomic for reference counting.

- After a RCU grace period, all users are guaranteed to be using the
  atomic counter. The per-CPU counters can now be summed up and added to
  the atomic_t counter. If the resulting count is zero, the hash can be
  safely replaced. Otherwise, active users still hold a valid reference.

- Once the atomic reference count drops to zero, the next futex
  operation will switch to the new private hash.

call_rcu_hurry() is used to speed up transition which otherwise might be
delay with RCU_LAZY. There is nothing wrong with using call_rcu(). The
side effects would be that on auto scaling the new hash is used later
and the SET_SLOTS prctl() will block longer.

[bigeasy: commit description + mm get/ put_async]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20250710110011.384614-3-bigeasy@linutronix.de
2025-07-11 16:02:00 +02:00
Pavel Begunkov
3a0ae385f6 io_uring/mock: add basic infra for test mock files
io_uring commands provide an ioctl style interface for files to
implement file specific operations. io_uring provides many features and
advanced api to commands, and it's getting hard to test as it requires
specific files/devices.

Add basic infrastucture for creating special mock files that will be
implementing the cmd api and using various io_uring features we want to
test. It'll also be useful to test some more obscure read/write/polling
edge cases in the future.

Suggested-by: chase xd <sl1589472800@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/93f21b0af58c1367a2b22635d5a7d694ad0272fc.1750599274.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-02 08:10:26 -06:00