Topic: CPU affinity and local memory usage

On dual-socket systems, that is, Xeon systems, let's say you have two CPUs. Memory is shared between the CPUs, but local memory, meaning the memory attached to a specific CPU, is faster for that CPU than memory fetched from the other CPU.
On optimizing programs with CPU affinity, I am looking for an answer to this:
Will the memory of a program bound to a specific core (CPU affinity) use the local memory of that CPU as long as there is sufficient space? Or will it most likely use it, but with no absolute guarantee?

I will post the answer if I find it. But if you know, please share. Thanks :-)

Re: CPU affinity and local memory usage

According to this Intel paper, the answer is as follows:
https://software.intel.com/en-us/articl … s-for-numa

If there is enough free memory on the local node, data will be placed locally. So if you set affinity to a core on one CPU or the other, that CPU's memory will be used, and only when local memory runs short will non-local memory be used. That, of course, is what you have to avoid.
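A toy Python model can make the policy described above concrete: memory comes from the node the pinned program runs on until that node is full, and only the overflow spills to the other CPU's memory. `place_pages` is a hypothetical helper and the page counts are made up; real Windows and Linux allocators decide placement per page (typically when a page is first touched) and are considerably more involved, so treat this as an illustration only.

```python
# Toy model of "local node first, spill to the other node only on shortage".
# NOT the real Windows/Linux allocator; numbers are made up for illustration.

def place_pages(free: list[int], home: int, pages: int) -> list[int]:
    """Place `pages` for a program pinned to node `home`.

    `free` holds the free page count per node and is updated in place.
    Returns how many pages ended up on each node.
    """
    placed = [0] * len(free)
    remaining = pages
    # Try the home (local) node first, then the other nodes in order.
    order = [home] + [i for i in range(len(free)) if i != home]
    for node in order:
        take = min(remaining, free[node])
        free[node] -= take
        placed[node] += take
        remaining -= take
        if remaining == 0:
            break
    if remaining:
        raise MemoryError("not enough free pages on any node")
    return placed


if __name__ == "__main__":
    free = [1000, 1000]                             # two nodes, made-up sizes
    print(place_pages(free, home=1, pages=800))     # fits locally: [0, 800]
    print(place_pages(free, home=1, pages=400))     # node 1 runs short: [200, 200]
```

The second call shows the situation the post warns about: once the local node is exhausted, part of the data lands in the other CPU's memory.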

For me, this means that when I run a series of programs with strict affinity, I can get up to double the total memory bandwidth of a single-CPU system.

It also means I can use MMCSS to put the audio drivers on the same CPU that Windows uses, and run my most CPU-heavy VST processing on the other CPU with its own memory.

Then I fill the remaining cores on the CPU that Windows uses with the lightest processing, leaving cores 0 and 1 for Windows and drivers only.
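The layout above can be written down as core lists. This sketch assumes a dual-socket machine without Hyper-Threading whose logical processors are numbered socket by socket (CPU 1 owns cores 0 to n-1, CPU 2 owns n to 2n-1); that numbering is an assumption, so check the real topology (for example with Sysinternals Coreinfo) before relying on it. `plan_layout` and the role names are hypothetical.

```python
# Sketch of the dual-socket core layout described above.
# ASSUMPTION: no Hyper-Threading, and logical processors are numbered
# socket by socket (first CPU = 0..n-1, second CPU = n..2n-1).

def plan_layout(cores_per_socket: int, reserved: int = 2) -> dict[str, list[int]]:
    """Split cores into roles: Windows/drivers, light work, heavy VST work."""
    first_cpu = list(range(cores_per_socket))
    second_cpu = list(range(cores_per_socket, 2 * cores_per_socket))
    return {
        "windows_and_drivers": first_cpu[:reserved],  # cores 0 and 1
        "light_processing": first_cpu[reserved:],     # rest of the Windows CPU
        "heavy_vst": second_cpu,                      # whole second CPU and its memory
    }


if __name__ == "__main__":
    # Example: two 4-core CPUs.
    for role, cores in plan_layout(4).items():
        print(f"{role}: {cores}")
```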

Re: CPU affinity and local memory usage

Difficult to answer; it surely also depends on the design of the mainboard.
I would ask the mainboard manufacturer, to be honest. That's very hardware-specific (but not RME-specific).

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

Re: CPU affinity and local memory usage

Here are more answers. It seems that this depends on OS policy, but tools exist for both Linux and Windows to control this behavior, both for launching and controlling programs and for writing them.

http://queue.acm.org/detail.cfm?id=2513149
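For the "writing programs" side on Linux, a program can pin itself with `os.sched_setaffinity` (Python's wrapper around `sched_setaffinity(2)`) before it touches its working memory. Under Linux's default local allocation policy, pages are normally placed on the node of the CPU that first touches them, as long as that node has free memory; this is a preference, not a guarantee. A minimal Linux-only sketch, where `pin_to_cpu` and `touch_pages` are hypothetical helpers:

```python
# Linux-only sketch: pin this process to one CPU, then touch its memory,
# so that (under the default local policy) pages tend to land on that
# CPU's NUMA node when the node has room.
import os


def pin_to_cpu(cpu: int) -> set[int]:
    """Restrict the calling process (pid 0 means "self") to one logical CPU."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def touch_pages(size_bytes: int, page_size: int = 4096) -> bytearray:
    """Allocate a buffer and write one byte per page so every page gets touched."""
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, page_size):  # 4096 = typical x86 page
        buf[offset] = 1
    return buf


if __name__ == "__main__":
    # Use the lowest-numbered CPU this process is currently allowed on.
    cpu = min(os.sched_getaffinity(0))
    print("pinned to:", pin_to_cpu(cpu))
    # Pin first, touch second: first touch decides placement.
    buf = touch_pages(16 * 1024 * 1024)
    print("touched", len(buf), "bytes")
```

Without changing code, `numactl --cpunodebind=1 --membind=1 ./program` asks for the same thing from the command line (Linux numbers nodes from 0). `--membind` is strict, while `--preferred=1` falls back to other nodes when node 1 is full, which matches the behavior described in the Intel paper.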

This document shows how to start programs with the Windows "start" command; the relevant parts are the affinity and node options.

https://ss64.com/nt/start.html

start /node 1 music-program.exe

As you can see, setting the affinity hex mask requires some math skills, but you can use a program like Process Lasso to do the core affinity placement and use the start command only to ensure memory placement, like this: Node 1 = CPU 1, Node 2 = CPU 2.
What is not at all clear is whether nodes start at 0 or 1.
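The hex mask math amounts to setting one bit per logical processor: bit n set means processor n is allowed. A small Python helper (hypothetical, not part of Windows) shows the idea. As for the numbering question, Windows numbers NUMA nodes from 0, so on a typical dual-socket board the two CPUs would normally be nodes 0 and 1. The `start` documentation also indicates that when `/node` and `/affinity` are combined, the mask is interpreted relative to that node's processors; it is worth verifying that on the actual machine.

```python
# Build the hexadecimal affinity mask used by "start /affinity"
# (and shown by tools like Process Lasso): bit n set = processor n allowed.

def affinity_mask(cpus: list[int]) -> str:
    """Return the hex mask string for the given logical processor numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit for this processor
    return hex(mask)


if __name__ == "__main__":
    print(affinity_mask([2, 3]))        # 0xc
    print(affinity_mask([4, 5, 6, 7]))  # 0xf0
```

For example, under the node-relative reading, `start /node 1 /affinity 0xc music-program.exe` would ask for the third and fourth processors of node 1.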

If this does not work, you need to apply this hotfix from Microsoft:
https://support.microsoft.com/en-us/hel … -windows-7


This Microsoft article explains MMCSS:

https://msdn.microsoft.com/en-us/librar … s.85).aspx

I can tell you that forcing the affinity of the audio drivers to the same core as Windows was the last tweak that made this laptop capable of real-time audio. Using FireWire with the modern driver rather than the legacy one also helped. (A little off-topic, I know.)

Re: CPU affinity and local memory usage

Sorry, but maybe somebody else can answer this for you.

BTW, are you having problems that make you fiddle around with Process Lasso and such?

I would guess that in a DUAL CPU system the Windows process scheduler simply does its job, and usually all applications should run fine that way. Maybe not 100% ideal, but good enough.

Re: CPU affinity and local memory usage

I cannot afford to solve audio latency by buying overpriced, brand-new CPUs. But what I have is time and the ability to learn.
I have found that optimizing absolutely everything possible, and then some, can turn a computer into a very low-latency music-making machine.
When you get there, you notice how everything suddenly runs so smoothly.

I use modular VST hosts like Bidule and VSTHost, with one CPU core and its hyper-thread per instance of the program, and each instance performing specific tasks.
This is for live music. For example, one program is a multi-effect guitar processor, another is an ambience/reverb/echo processor, another is a MIDI synth processor, and another can be a drum machine.
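The one-core-plus-its-hyper-thread-per-instance scheme can be turned into affinity masks for Process Lasso or `start /affinity`. This sketch assumes that the two hyper-threads of physical core k are logical processors 2k and 2k+1, which is common on Windows but should be checked with a topology tool. The program names and `instance_masks` are hypothetical, and core 0 is left to Windows, as in the 4-core laptop setup in this thread.

```python
# Give each host instance one physical core plus its Hyper-Threading sibling.
# ASSUMPTION: physical core k = logical processors 2k and 2k+1.

def instance_masks(programs: list[str], first_core: int = 1,
                   physical_cores: int = 4) -> dict[str, str]:
    """Map each program name to a hex affinity mask for one core + its HT sibling."""
    masks = {}
    for i, name in enumerate(programs):
        core = first_core + i
        if core >= physical_cores:
            raise ValueError(f"not enough physical cores for {name!r}")
        mask = (1 << (2 * core)) | (1 << (2 * core + 1))
        masks[name] = hex(mask)
    return masks


if __name__ == "__main__":
    # Hypothetical instance names; core 0 (logical 0 and 1) stays with Windows.
    for name, mask in instance_masks(["guitar-fx", "reverb", "synth"]).items():
        print(name, mask)
```

Keeping each instance on its own core pair is what avoids the thread migration and cache refills the next paragraph refers to.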

This way there is no CPU swapping or cache swapping, and everything runs silky smooth.

So I am only solving the problem of getting the absolute minimum latency while keeping the system usable.

On the laptop I use now, which is many years old (2.2 GHz, 4 cores with hyper-threading), I can run a stable system over FireWire at 44.1 kHz with a 48-sample buffer.
All 3 non-Windows cores are running hard.

Still, I am now waiting for my new (old) Lenovo C20 dual-Xeon music workstation with 4 cores per CPU. It is also an older computer, but as stable as it gets, and with top optimization I can perhaps get it to run at even lower latency, like 48 kHz with a 48-sample buffer.
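For reference, these buffer settings translate into time per buffer as samples divided by sample rate. The helper below is a hypothetical sketch; it gives the duration of a single buffer only, and actual round-trip latency is higher because it includes input and output buffering plus driver and converter overhead.

```python
# Duration of one audio buffer: buffer size in samples / sample rate.

def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Return the length of one buffer in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


if __name__ == "__main__":
    for rate in (44100, 48000):
        print(f"48 samples @ {rate} Hz: {buffer_latency_ms(48, rate):.3f} ms")
```

So moving from 44.1 kHz to 48 kHz at the same 48-sample buffer shortens each buffer from about 1.09 ms to 1.00 ms, roughly 8%.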

This is why all of this is important. And as I said, the result is great.