Thanks everyone for your enlightening input. I think I need to move to RME immediately just because of the fantastic responses on this forum! Excellent.
So I was unaware that the RME driver did not offer buffer settings down to 16 samples, and thanks to Ramses' explanation, I totally understand why. In order to achieve stability across so many users in the user base, it makes perfect business sense. I was originally thrown off by the RME RayDat marketing material on their website under "specs", which says (quoted precisely):
"8 buffer sizes/latencies available: 0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms…"
Regardless, the facts are the facts as explained by Ramses. Although I am certain I can achieve complete stability with RME at 96 kHz and a 64-sample buffer (with a very high-end and finely tuned system, of course), that would not deliver the latency I'm after: my goal remains sub-1 ms RTL with direct DAW monitoring, if at all possible.
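For anyone following the arithmetic, here is a rough sketch of why 64 samples at 96 kHz can't get under 1 ms round trip. It only counts one input buffer plus one output buffer; real RTL also includes converter and driver safety-buffer delays, which I'm leaving out, so these numbers are a lower bound. The function names are just mine for illustration.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time it takes to fill (or play out) one buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def min_round_trip_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Lower bound on RTL: one input buffer plus one output buffer.

    Assumption: converter latency and any driver safety buffers are ignored,
    so the real figure will be higher than this.
    """
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)


if __name__ == "__main__":
    for samples in (16, 32, 64, 128, 256):
        one_way = buffer_latency_ms(samples, 96_000)
        rtl = min_round_trip_ms(samples, 96_000)
        print(f"{samples:>3} samples @ 96 kHz: {one_way:.3f} ms per buffer, "
              f">= {rtl:.3f} ms round trip")
```

At 96 kHz, 64 samples is about 0.67 ms per buffer, so at least 1.33 ms round trip before converters, while 16 samples is about 0.17 ms per buffer. As a side note, the 0.7 ms and 1.5 ms figures in the spec quote are consistent with 32 and 64 samples at 44.1 kHz, though that is my guess at how those numbers were derived.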
My use case will seem ridiculous, and it was crazy to me initially when it surfaced, but here it is: I only record classical guitarists, and I have three astute and very demanding clients. They come to me because I can endure endless takes over long periods of time, and I do everything I can to meet their expectations regarding the recording environment, particularly monitoring. The big demand is that when they play, they want to hear EXACTLY (and I mean exactly) what the finished product would sound like in their headphones while they are recording - the same reverb, the same compression, and the same EQ. I use an array of Neumann KM184 mics (usually 3 or 4) and high-end UA and Great River external preamps. I'm running a Black Lion Revolution ADAT converter via SMUX into the PreSonus Quantum 2626, and I'm direct monitoring their tracks through Reaper with Valhalla reverb, IK EQ, and UA 1176 plugins on the master bus.
When one client in particular started complaining about the guitar sounding "different" when using a 128 or 256 sample buffer (vs 16 samples), I thought maybe I was dealing with some mental instability. However, the more I listened to it (and played through it myself), the more I heard a discernible difference. These high-end classical guitars have a very fast and loud attack, and at higher buffers the VERY initial part of the attack is lost in the headphones, while the player can still feel (and vaguely hear via headphone leakage) the vibration of the guitar in real time. It comes across as slightly less bright sounding in the headphones at higher latency.
So there you have it. I know a solution can be found via direct monitoring through the interface, but the 2626 does not offer that - it's direct through the DAW, or nothing. Furthermore, direct monitoring in the interface means it will be difficult to offer the same compression, EQ, and reverb as what will eventually be used for production. I know there are solutions via other monitoring schemes and other interfaces, but I have grown to really prefer the simplicity and convenience of handling everything within the DAW including monitoring effects.
In terms of PC system tuning to achieve stability at 16 samples with my CURRENT system, I believe I have it maxed out. Over the last 8 years I've implemented BIOS tweaks, adjusted CPU C-states, toggled multithreading on and off, optimized water cooling fans, maximized performance plans, optimized overclocking, etc., and currently it runs continuously at 4.5 GHz in a stable state. At a buffer of 16 samples, I get a quiet pop or a click every 30 seconds or so. My main problematic client says the sound is perfect but the clicks are too annoying.
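To put that click rate in perspective, here is a quick back-of-the-envelope sketch. It assumes the session runs at 96 kHz (my assumption here, based on the SMUX setup) and that each click corresponds to a single missed buffer deadline, which may not be exactly what is happening under the hood.

```python
def callback_deadline_us(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time budget per audio callback, in microseconds."""
    return 1_000_000 * buffer_samples / sample_rate_hz


def callbacks_between_clicks(buffer_samples: int, sample_rate_hz: int,
                             seconds_between_clicks: float) -> int:
    """Number of buffers processed in the interval between two clicks."""
    return round(seconds_between_clicks * sample_rate_hz / buffer_samples)


if __name__ == "__main__":
    # Assumed values: 16-sample buffer, 96 kHz, one click every ~30 seconds.
    deadline = callback_deadline_us(16, 96_000)
    count = callbacks_between_clicks(16, 96_000, 30)
    print(f"Deadline per buffer: {deadline:.1f} us")
    print(f"Buffers between clicks: {count}")
```

Under those assumptions the system has roughly 167 microseconds to deliver each buffer and is missing about one deadline in 180,000, so it is very close to stable and only a rare scheduling hiccup is getting through.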
Ideally, I should be able to stabilize 16 sample operation by upgrading my system (after all, my PC is 8 years old), but the big problem I have right now is that there are no real native Thunderbolt implementations on AMD X870E chipset boards to support the Quantum 2626. AMD is the clear market leader in terms of performance currently (CPU would be AMD Ryzen 9 9950X3D) and I would hate to invest in an entirely new $4K build that is not the best of the best. I could build an Intel Core Ultra 9 box, but it seems clear now that this new AMD chip is the fastest consumer CPU available by a significant margin.
Final note: staying with my current PC is also not an option - I'm starting to lose fans, drives, etc. due to age, and I'm sure it won't be long before it signs off for good.