SHORT Main memory bandwidth in MBytes/s (experimental)

EVENTSET
FIXC1 ACTUAL_CPU_CLOCK
FIXC2 MAX_CPU_CLOCK
PMC0  RETIRED_INSTRUCTIONS
PMC1  CPU_CLOCKS_UNHALTED
DFC0  DATA_FROM_LOCAL_DRAM_CHANNEL
DFC1  DATA_TO_LOCAL_DRAM_CHANNEL

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz]  1.E-06*(FIXC1/FIXC2)/inverseClock
CPI  PMC1/PMC0
Memory read bandwidth [MBytes/s] 1.0E-06*(DFC0)*(4.0/num_numadomains)*64.0/time
Memory read data volume [GBytes] 1.0E-09*(DFC0)*(4.0/num_numadomains)*64.0
Memory write bandwidth [MBytes/s] 1.0E-06*(DFC1)*(4.0/num_numadomains)*64.0/time
Memory write data volume [GBytes] 1.0E-09*(DFC1)*(4.0/num_numadomains)*64.0
Memory bandwidth [MBytes/s] 1.0E-06*(DFC0+DFC1)*(4.0/num_numadomains)*64.0/time
Memory data volume [GBytes] 1.0E-09*(DFC0+DFC1)*(4.0/num_numadomains)*64.0

LONG
Formulas:
Memory read bandwidth [MBytes/s] = 4.0E-06*(DATA_FROM_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0/runtime
Memory read data volume [GBytes] = 4.0E-09*(DATA_FROM_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0
Memory write bandwidth [MBytes/s] = 4.0E-06*(DATA_TO_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0/runtime
Memory write data volume [GBytes] = 4.0E-09*(DATA_TO_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0
Memory bandwidth [MBytes/s] = 4.0E-06*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0/runtime
Memory data volume [GBytes] = 4.0E-09*(DATA_FROM_LOCAL_DRAM_CHANNEL+DATA_TO_LOCAL_DRAM_CHANNEL)*(4.0/num_numadomains)*64.0
-
Profiling group to measure memory bandwidth drawn by all cores of a socket.
Since this group is based on Uncore events it is only possible to measure on a
per socket base.
Even though the group provides almost accurate results for the total bandwidth
and data volume, the read and write bandwidths and data volumes seem off.
The metric formulas contain a correction factor of (4.0/num_numadomains) to
return the value for all 4 memory controllers in NPS1 mode. This is probably
a work-around. Requested info from AMD.
