Thanks for sharing - interesting findings.
To gain further insight into the benchmark procedure, I opted for a more DAW-centric methodology: progressively loading Ableton with an increasing number of plug-in instances (Waves L4 VST3 and Ableton's built-in Echo, tested independently), with no other programs running.
I'm convinced that testing MMCSS through other means can yield different observations, since adding an independent background process load with CPU-Z may steer the test into a different context, likely with other repercussions. When the test is restricted to Ableton (or most DAWs), things are prioritized differently: the DAW is a foreground process (including the ASIO driver .dll it loads) and is handled differently by the OS scheduler. Even the driver can account for quite notable CPU usage under high load.
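
For readers unfamiliar with what the MMCSS flag changes at the thread level: on Windows, a thread opts in to MMCSS by calling AvSetMmThreadCharacteristics from avrt.dll with a task name such as "Pro Audio". The following is only a minimal sketch of that mechanism in Python via ctypes. It is not how the RME driver or Ableton implement it internally, and the helper names join_mmcss and leave_mmcss are made up for illustration. On anything other than Windows the helpers do nothing.

```python
import sys

def join_mmcss(task_name="Pro Audio"):
    """Register the calling thread with MMCSS; returns a handle, or None off Windows."""
    if sys.platform != "win32":
        return None  # MMCSS exists only on Windows
    import ctypes
    from ctypes import wintypes
    avrt = ctypes.WinDLL("avrt", use_last_error=True)
    fn = avrt.AvSetMmThreadCharacteristicsW
    fn.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(wintypes.DWORD)]
    fn.restype = wintypes.HANDLE
    task_index = wintypes.DWORD(0)  # must be 0 on the first call for a task
    handle = fn(task_name, ctypes.byref(task_index))
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    return handle

def leave_mmcss(handle):
    """Undo join_mmcss for the calling thread."""
    if handle is None:
        return
    import ctypes
    from ctypes import wintypes
    avrt = ctypes.WinDLL("avrt", use_last_error=True)
    fn = avrt.AvRevertMmThreadCharacteristics
    fn.argtypes = [wintypes.HANDLE]
    fn.restype = wintypes.BOOL
    if not fn(handle):
        raise ctypes.WinError(ctypes.get_last_error())

if __name__ == "__main__":
    h = join_mmcss()
    print("MMCSS registration:", "active" if h else "not available on this platform")
    leave_mmcss(h)
```

A thread that never makes this call is scheduled at its ordinary priority, which is presumably the difference being toggled by the driver flag in these tests.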

The tested system is somewhat aged but reliable and performant: Coffee Lake i7-8700K, 16 GB RAM, 140 mm Noctua cooler, running Windows 11 25H2 with very low DPC latency. I also simplified the system configuration for easier analysis (hitting the system ceiling earlier without needing a very high plug-in count) by restricting the CPU to 4 physical cores (no HT). The power plan is set to Ultimate Performance, and the system is very lean: one M.2 NVMe SSD and no extra peripherals connected. Other tweaks include C-states off and VBS disabled.
Buffer sizes:
AIO Pro / MADI FX: 32 samples (redundancy/mirror active on the MADI FX for theoretically lower resource usage)
Babyface Pro FS: 48 samples
I also tested everything on a more level playing field of 64 samples, but found that the differences are more pronounced at the lowest buffer sizes.
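
For a sense of scale, the duration of one buffer is simply samples divided by sample rate. The sample rate used in these tests is not stated above, so the sketch below assumes 44.1 kHz and 48 kHz purely for illustration; the function name is also just illustrative.

```python
def buffer_ms(samples: int, sample_rate_hz: int) -> float:
    """Duration of a single audio buffer in milliseconds."""
    return samples / sample_rate_hz * 1000.0

# Sample rates are assumptions; the buffer sizes are the ones listed above.
for rate in (44_100, 48_000):
    for samples in (32, 48, 64):
        print(f"{samples:>2} samples @ {rate} Hz = {buffer_ms(samples, rate):.3f} ms")
```

This is the time the audio thread has to finish each processing callback. It is not the round-trip latency, which also includes converter latency and any driver safety offsets on top of the input and output buffers.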
For extra coverage, the AIO Pro and MADI FX cards were installed in different configurations: swapped between the two PCIe slots (the x16 slot is wired directly to the CPU, the x1 slot to the PCH), and each card alone in each of the two available slots. No differences were observed.
It's interesting to note that the AIO Pro's RTL @ 64 samples is much closer to the Babyface's RTL @ 48 samples than to its RTL @ 64 samples.

To test single-core exhaustion, I assembled a project with a single track and increased the plug-in count. The other two test iterations used 4 and 8 tracks to distribute the load evenly across the cores.
To reiterate my test results with Ableton 12: with MMCSS active, performance is consistently better at very high loads on all 3 interfaces. This could be verified audibly and visually using Live's real-time meter (set to Current, to observe peaks and dips). In the vast majority of test cases, when I set the flag to off and reloaded the ASIO driver, audible dropouts occurred immediately, whereas with MMCSS enabled there were no playback stutters or dropouts. I recorded audio clips for most of the tests, using a laptop with an external interface to capture each interface's headphone output, in case anyone is curious to hear them.
The Babyface Pro FS (USB 2.0) hit its ceiling for stutter/dropout-free playback earlier, as expected (not by huge margins, to be clear). This was more prominent at the 48-sample buffer size (vs. AIO/MADI FX @ 32). The last one to hold up consistently was the AIO Pro, with the MADI FX a very close second.
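
The grid of runs described above (three interfaces, three track counts, MMCSS on and off) can be written out as a short sketch. This is not a script that was used for the tests, since the loads were built up by hand in Live; the function name and dictionary keys are made up for illustration. The optional override covers the additional pass with every interface at 64 samples.

```python
from itertools import product

# Lowest buffer size (in samples) used per interface, as listed in the post.
INTERFACES = {"AIO Pro": 32, "MADI FX": 32, "Babyface Pro FS": 48}
TRACK_COUNTS = (1, 4, 8)      # 1 track = single-core exhaustion test
MMCSS_STATES = (True, False)  # driver flag on / off

def build_cases(buffer_override=None):
    """Enumerate every interface x track count x MMCSS combination."""
    cases = []
    for (name, buf), tracks, mmcss in product(
        INTERFACES.items(), TRACK_COUNTS, MMCSS_STATES
    ):
        cases.append({
            "interface": name,
            "buffer": buffer_override if buffer_override else buf,
            "tracks": tracks,
            "mmcss": mmcss,
        })
    return cases

if __name__ == "__main__":
    for case in build_cases():
        flag = "on" if case["mmcss"] else "off"
        print(f'{case["interface"]:<16} {case["buffer"]:>2} samples  '
              f'{case["tracks"]} track(s)  MMCSS {flag}')
```

Each of these combinations is then loaded with plug-in instances until dropouts start, which is the point the audio clips below were captured at.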
Some audio clips from the 4-track load test @ 32 samples, on the verge of dropouts occurring:
AIO MMCSS https://od.lk/s/NDlfMzg5MjkzMjVf/AIO%20MMCSS.mp3
AIO https://od.lk/s/NDlfMzg5MjkzMjZf/AIO.mp3
MadiFX MMCSS https://od.lk/s/NDlfMzg5MjkzMjNf/MadiFX%20MMCSS.mp3
MadiFX https://od.lk/s/NDlfMzg5MjkzMjRf/MadiFX.mp3
*Note that the MADI FX presents audible dropout artifacts while the AIO Pro is still clean (with MMCSS).
For the Babyface Pro FS, here's another load test example @ 48 samples, with fewer plug-in instances than above:
MMCSS enabled
https://od.lk/s/NDlfMzg5MjkzMjFf/babyface%20MMCSS.mp3

MMCSS disabled
https://od.lk/s/NDlfMzg5MjkzMjJf/babyface.mp3

By the way, Ramses, I follow your postings and blogs with great interest. Your contributions to the community are highly valued.