Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:
 "Add support on AMD for assigning QoS bandwidth counters to resources
  (RMIDs) with the ability for those resources to be tracked by the
  counters as long as they're assigned to them.

  Previously, due to hw limitations, bandwidth counts from untracked
  resources would get lost when those resources are not tracked.

  Refactor the code and user interfaces to be able to also support
  other, similar features on ARM, for example"

* tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
  MAINTAINERS: resctrl: Add myself as reviewer
  x86/resctrl: Configure mbm_event mode if supported
  fs/resctrl: Introduce the interface to switch between monitor modes
  fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
  fs/resctrl: Introduce the interface to modify assignments in a group
  fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  fs/resctrl: Auto assign counters on mkdir and clean up on group removal
  fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
  fs/resctrl: Provide interface to update the event configurations
  fs/resctrl: Add event configuration directory under info/L3_MON/
  fs/resctrl: Support counter read/reset with mbm_event assignment mode
  x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
  x86/resctrl: Refactor resctrl_arch_rmid_read()
  fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode
  fs/resctrl: Pass struct rdtgroup instead of individual members
  fs/resctrl: Add the functionality to unassign MBM events
  fs/resctrl: Add the functionality to assign MBM events
  x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  fs/resctrl: Introduce event configuration field in struct mon_evt
  ...
Linus Torvalds
2025-09-30 13:29:42 -07:00
16 changed files with 2025 additions and 233 deletions

@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
MBA (Memory Bandwidth Allocation) "mba"
SMBA (Slow Memory Bandwidth Allocation) ""
BMEC (Bandwidth Monitoring Event Configuration) ""
ABMC (Assignable Bandwidth Monitoring Counters) ""
=============================================== ================================
Historically, new features were made visible by default in /proc/cpuinfo. This
@@ -256,6 +257,144 @@ with the following files:
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
0=0x30;1=0x30;3=0x15;4=0x15
"mbm_assign_mode":
The supported counter assignment modes. The enclosed brackets indicate which mode
is enabled. The MBM events associated with counters may reset when "mbm_assign_mode"
is changed.
::
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_event]
default
"mbm_event":
mbm_event mode allows users to assign a hardware counter to an RMID, event
pair and monitor the bandwidth usage as long as it is assigned. The hardware
continues to track the assigned counter until it is explicitly unassigned by
the user. Each event within a resctrl group can be assigned independently.
In this mode, a monitoring event can only accumulate data while it is backed
by a hardware counter. Use "mbm_L3_assignments" found in each CTRL_MON and MON
group to specify which of the events should have a counter assigned. The number
of counters available is described in the "num_mbm_cntrs" file. Changing the
mode may cause all counters on the resource to reset.
Moving to mbm_event counter assignment mode requires users to assign the counters
to the events. Otherwise, the MBM event counters will return 'Unassigned' when read.
The mode is beneficial for AMD platforms that support more CTRL_MON
and MON groups than available hardware counters. By default, this
feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
Monitoring Counters) capability, ensuring counters remain assigned even
when the corresponding RMID is not actively used by any processor.
"default":
In default mode, resctrl assumes there is a hardware counter for each
event within every CTRL_MON and MON group. On AMD platforms, it is
recommended to use the mbm_event mode, if supported, to prevent reset of MBM
events between reads resulting from hardware re-allocating counters. Such resets
can produce misleading values, or reads may return "Unavailable" if no counter
is assigned to the event.
* To enable "mbm_event" counter assignment mode:
::
# echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
* To enable "default" monitoring mode:
::
# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
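The bracket convention described for "mbm_assign_mode" can be checked from a script. The following is a minimal sketch, not part of resctrl itself: the function name enabled_mbm_mode is hypothetical, and the sketch assumes the file lists one mode per line with the enabled mode wrapped in brackets, as in the sample output above.

```shell
# Hypothetical helper (not provided by resctrl): print the enabled mode,
# i.e. the line wrapped in [brackets]. Input is read from stdin so the
# function can be exercised without resctrl mounted.
enabled_mbm_mode() {
    sed -n 's/^\[\(.*\)]$/\1/p'
}

# Usage on a live system (path taken from the text above):
#   enabled_mbm_mode < /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
```

Reading from stdin keeps the helper independent of where resctrl is mounted.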
"num_mbm_cntrs":
The maximum number of counters (total of available and assigned counters) in
each domain when the system supports mbm_event mode.
For example, on a system with maximum of 32 memory bandwidth monitoring
counters in each of its L3 domains:
::
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=32;1=32
"available_mbm_cntrs":
The number of counters available for assignment in each domain when mbm_event
mode is enabled on the system.
For example, on a system with 30 assignable hardware counters available
in each of its L3 domains:
::
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
0=30;1=30
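Since "num_mbm_cntrs" is described as the total of available and assigned counters, the number of counters currently assigned in a domain can be derived from the two files. A minimal sketch follows; the function names domain_value and assigned_cntrs are hypothetical, and the sketch assumes the "<domain>=<count>;<domain>=<count>" format shown in the examples above.

```shell
# Hypothetical helper: print the value recorded for one domain in a
# "<id>=<value>;<id>=<value>" list.
# $1: the list, e.g. "0=30;1=30"   $2: the domain ID to look up
domain_value() {
    printf '%s\n' "$1" | tr ';' '\n' | awk -F= -v id="$2" '$1 == id { print $2 }'
}

# Hypothetical helper: print the number of counters assigned in one domain,
# computed as num_mbm_cntrs minus available_mbm_cntrs for that domain.
# $1: num_mbm_cntrs contents  $2: available_mbm_cntrs contents  $3: domain ID
assigned_cntrs() {
    total=$(domain_value "$1" "$3")
    avail=$(domain_value "$2" "$3")
    # Fail if the domain ID is missing from either list.
    [ -n "$total" ] && [ -n "$avail" ] || return 1
    echo $((total - avail))
}

# Usage on a live system:
#   assigned_cntrs "$(cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs)" \
#                  "$(cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs)" 0
```

With the sample values used in this document (32 counters in total, 30 available), the result for each domain would be 2.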
"event_configs":
Directory that exists when "mbm_event" counter assignment mode is supported.
Contains a sub-directory for each MBM event that can be assigned to a counter.
Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
Each MBM event's sub-directory contains a file named "event_filter" that is
used to view and modify which memory transactions the MBM event is configured
with. The file is accessible only when "mbm_event" counter assignment mode is
enabled.
List of memory transaction types supported:
========================== ========================================================
Name Description
========================== ========================================================
dirty_victim_writes_all Dirty Victims from the QOS domain to all types of memory
remote_reads_slow_memory Reads to slow memory in the non-local NUMA domain
local_reads_slow_memory Reads to slow memory in the local NUMA domain
remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
local_non_temporal_writes Non-temporal writes to local NUMA domain
remote_reads Reads to memory in the non-local NUMA domain
local_reads Reads to memory in the local NUMA domain
========================== ========================================================
For example::
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads,local_non_temporal_writes,local_reads_slow_memory
Modify the event configuration by writing to the "event_filter" file within
the "event_configs" directory. The read/write "event_filter" file contains the
configuration of the event that reflects which memory transactions are counted by it.
For example::
# echo "local_reads, local_non_temporal_writes" > \
/sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
local_reads,local_non_temporal_writes
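A script may want to validate a filter string before writing it to "event_filter". The sketch below is illustrative only: the names VALID_TYPES and check_event_filter are hypothetical, the list of names is copied from the table above, and stripping spaces simply mirrors what the read-back in the example shows rather than any documented parsing rule.

```shell
# Transaction type names copied from the table above.
VALID_TYPES="dirty_victim_writes_all remote_reads_slow_memory local_reads_slow_memory remote_non_temporal_writes local_non_temporal_writes remote_reads local_reads"

# Hypothetical helper: print the filter with spaces removed, or fail if the
# list is empty or contains a name that is not in VALID_TYPES.
# $1: user-supplied list, e.g. "local_reads, local_non_temporal_writes"
check_event_filter() {
    cleaned=$(printf '%s' "$1" | tr -d ' ')
    [ -n "$cleaned" ] || return 1
    old_ifs=$IFS
    IFS=,
    for name in $cleaned; do
        case " $VALID_TYPES " in
            *" $name "*) ;;
            *)
                IFS=$old_ifs
                echo "unknown transaction type: $name" >&2
                return 1
                ;;
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$cleaned"
}

# Usage on a live system:
#   filter=$(check_event_filter "local_reads, local_non_temporal_writes") &&
#       echo "$filter" > /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
```

Checking the names up front gives a readable error message in the script instead of relying on the write to fail.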
"mbm_assign_on_mkdir":
Exists when "mbm_event" counter assignment mode is supported. Accessible
only when "mbm_event" counter assignment mode is enabled.
Determines if a counter will automatically be assigned to an RMID, MBM event
pair when its associated monitor group is created via mkdir. Enabled by default
on boot and whenever the mode is switched from "default" to "mbm_event" counter
assignment mode. Users can disable this capability by writing "0" to the interface.
"0":
Auto assignment is disabled.
"1":
Auto assignment is enabled.
Example::
# echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
0
"max_threshold_occupancy":
Read/write file provides the largest value (in
bytes) at which a previously used LLC_occupancy
@@ -380,10 +519,77 @@ When monitoring is enabled all MON groups will also contain:
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
where "YY" is the node number.
When the 'mbm_event' counter assignment mode is enabled, reading
an MBM event of a MON group returns 'Unassigned' if no hardware
counter is assigned to it. For CTRL_MON groups, 'Unassigned' is
returned if the MBM event does not have an assigned counter in the
CTRL_MON group nor in any of its associated MON groups.
"mon_hw_id":
Available only with debug option. The identifier used by hardware
for the monitor group. On x86 this is the RMID.
When monitoring is enabled all MON groups may also contain:
"mbm_L3_assignments":
Exists when "mbm_event" counter assignment mode is supported and lists the
counter assignment states of the group.
The assignment list is displayed in the following format:
<Event>:<Domain ID>=<Assignment state>;<Domain ID>=<Assignment state>
Event: A valid MBM event in the
/sys/fs/resctrl/info/L3_MON/event_configs directory.
Domain ID: A valid domain ID. When writing, '*' applies the changes
to all the domains.
Assignment states:
_ : No counter assigned.
e : Counter assigned exclusively.
Example:
To display the counter assignment states for the default group.
::
# cd /sys/fs/resctrl
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
Assignments can be modified by writing to the interface.
Examples:
To unassign the counter associated with the mbm_total_bytes event on domain 0:
::
# echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=e
mbm_local_bytes:0=e;1=e
To unassign the counter associated with the mbm_total_bytes event on all the domains:
::
# echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=_
mbm_local_bytes:0=e;1=e
To assign a counter associated with the mbm_total_bytes event on all domains in
exclusive mode:
::
# echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
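The assignment format above is straightforward to parse in a script. As a minimal sketch, the hypothetical function unassigned_pairs below reads "mbm_L3_assignments" content on stdin and reports every event and domain pair in the "_" (no counter assigned) state. It assumes only the "<Event>:<Domain ID>=<Assignment state>;..." format described above.

```shell
# Hypothetical helper: read assignment lines on stdin and print one
# "<Event> <Domain ID>" line for every pair whose state is "_".
unassigned_pairs() {
    awk -F: '{
        # $1 is the event name, $2 the ";"-separated per-domain states.
        n = split($2, doms, ";")
        for (i = 1; i <= n; i++) {
            split(doms[i], kv, "=")
            if (kv[2] == "_")
                print $1, kv[1]
        }
    }'
}

# Usage on a live system:
#   unassigned_pairs < /sys/fs/resctrl/mbm_L3_assignments
```

An empty result means every listed event has a counter assigned in every domain.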
When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
"mba_MBps_event":
@@ -1429,6 +1635,125 @@ View the llc occupancy snapshot::
# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
11234000
Examples on working with mbm_assign_mode
========================================
a. Check if MBM counter assignment mode is supported.
::
# mount -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
[mbm_event]
default
The "mbm_event" mode is detected and enabled.
b. Check how many assignable counters are supported.
::
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
0=32;1=32
c. Check how many assignable counters are available for assignment in each domain.
::
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
0=30;1=30
d. List the default group's assignment states.
::
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
e. To unassign the counter associated with the mbm_total_bytes event on domain 0.
::
# echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=e
mbm_local_bytes:0=e;1=e
f. To unassign the counter associated with the mbm_total_bytes event on all domains.
::
# echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=_;1=_
mbm_local_bytes:0=e;1=e
g. To assign a counter associated with the mbm_total_bytes event on all domains in
exclusive mode.
::
# echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
# cat /sys/fs/resctrl/mbm_L3_assignments
mbm_total_bytes:0=e;1=e
mbm_local_bytes:0=e;1=e
h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. Events
are read the same way whether or not counter assignment is in use.
::
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
562324232
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
212122123
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
121212144
i. Check the event configurations.
::
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads,local_non_temporal_writes,local_reads_slow_memory
j. Change the event configuration for mbm_local_bytes.
::
# echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" > \
/sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
local_reads,local_non_temporal_writes,local_reads_slow_memory,remote_reads
k. Now read the local events again. The first read may come back with "Unavailable"
status. The subsequent read of mbm_local_bytes will display the current value.
::
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
Unavailable
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
2252323
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
Unavailable
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
1566565
l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
done using the following command. Note that switching the mbm_assign_mode may reset all
the MBM counters (and thus all MBM events) of all the resctrl groups.
::
# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
mbm_event
[default]
m. Unmount the resctrl filesystem.
::
# umount /sys/fs/resctrl/
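Steps h and k above show that reading an MBM event file may return a byte count or a status string such as "Unavailable", and this document also describes an "Unassigned" status. A script reading events may want to tell these cases apart and retry once after "Unavailable", as step k does by hand. The following is a sketch only: the function names classify_mbm_read and read_mbm_event are hypothetical, and a single retry is an assumption based on step k rather than a documented guarantee.

```shell
# Hypothetical helper: classify the text returned by reading an MBM event
# file. Prints "bytes", "unassigned", "unavailable" or "unknown".
classify_mbm_read() {
    case "$1" in
        Unassigned)  echo unassigned ;;
        Unavailable) echo unavailable ;;
        ''|*[!0-9]*) echo unknown ;;
        *)           echo bytes ;;
    esac
}

# Hypothetical helper: read an event file; if the first read returns
# "Unavailable", read once more and print whatever the second read returns.
# $1: path to the event file
read_mbm_event() {
    value=$(cat "$1") || return 1
    if [ "$(classify_mbm_read "$value")" = unavailable ]; then
        value=$(cat "$1") || return 1
    fi
    printf '%s\n' "$value"
}

# Usage, before the filesystem is unmounted:
#   read_mbm_event /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
```

The caller can pass the printed value back through classify_mbm_read to decide whether it is a usable byte count.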
Intel RDT Errata
================