Topic: RayDat with RTL <1ms?

Greetings everyone.

I'm considering a new AMD system build along with a purchase of an RME RayDat interface. Assuming that the system is tuned properly for audio, and it's built with the very fastest consumer products available (AMD Ryzen 9 9950X, 64GB of 6000+ MT/s RAM, and PCIe 5.0 NVMe M.2), is it realistic to expect stable, sub-1 millisecond RTL when recording with direct monitoring on? I'd be creating very simple projects (under 8 tracks) using Reaper and a few very efficient plugins.

Currently I'm on an Intel i9-7900X, and I can achieve sub-1ms RTL with a Presonus 2626 at a 16 sample buffer (at least that's what Reaper reports), but I have issues with cracks/pops/sizzles occasionally.

Hoping to take full advantage of what the RayDat can do.

Thanks!

2 (edited by ramses 2025-01-15 22:01:08)

Re: RayDat with RTL <1ms?

I think this is over-optimistic. Look at these RTL values for different RME solutions at 44.1 kHz:

https://www.tonstudio-forum.de/blog/ent … cts-en-de/

If you talk about RTL, then you mean the complete path from A/D conversion through the PC and finally D/A.

The fast converters of the UFX III alone need 5/6 samples for D/A and A/D at single speed.
So, 11 samples for AD/DA at 44.1 kHz.
Then you need a few samples for processing inside the FPGA and for transport over e.g. ADAT, if applicable.
For MADI you need 3 samples per device at single speed (if you chain them).

The biggest delay is the trip over USB, back and forth, which depends heavily on
buffer settings: for ASIO, the ASIO buffer size; for Apple, the buffer size of the application (plus safety buffers).
Apple's internal audio processing also needs a few samples, as everything has to go through the internal sound system.

If you look at my table (or at my latest measurements here, cross-checked with the RTL Utility),
the RTL for a UFX III on its analog ports with a 32-sample ASIO buffer is 2.994 ms.
The absolute minimum is 1.953 ms, using the lowest ASIO buffer size of 128 samples at 192 kHz.

Note: it might be challenging for your system to process such an amount of data.
What matters is not the benchmark power of your CPU.
You need a CPU and drivers that support realtime processing.
In other words: good drivers, low DPC latencies.
The CPU design might also matter.
L3 caches distributed between two different chiplets (the AMD design for some CPUs)
add to internal latencies (DPC latencies), not to the "heard" latency of the audio.
It's more a question of whether your system can sustain lower ASIO buffer sizes at high loads.

Here are the results of some measurements using the RTL Utility between analog ports on the UFX III.

https://www.dropbox.com/scl/fi/9v4p3e2oyrunc54is4sz5/2025-01-15-21_55_14-UFX-III-Latencies.xlsx-Excel.jpg?rlkey=cumb1tww48dwegxp9yidimz8c&st=fislziim&dl=1

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

Re: RayDat with RTL <1ms?

Thanks Ramses for all the info!  Really appreciate it.

I must be doing something wrong.  Using the RTL Utility v1.0.8 by Oblique Audio, I am getting the following RTL values on my older Intel machine and the Presonus Quantum 2626 connected via Thunderbolt 3:

At 44.1 kHz:

16 sample buffer - 1.429 ms RTL
32 sample buffer - 2.154 ms RTL

At 96 kHz (which is what I normally use):

16 sample buffer - 0.688 ms RTL
32 sample buffer - 1.021 ms RTL
64 sample buffer - 1.688 ms RTL

I thought with a more powerful PC and the RME PCIe RayDat I'd be able to at least equal these RTL times.  Any ideas why the RayDat numbers you provided seem higher?

My goal is to get RTL under 1 ms at 96 kHz with stability.
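
A quick way to sanity-check measurements like the ones above is to convert each RTL back into samples and subtract one input and one output buffer. This is only a sketch: the "two buffers plus a fixed remainder" model and the function names are assumptions for illustration, not how the RTL Utility or any driver reports latency.

```python
# Measured RTL values (ms) quoted in the post above: Quantum 2626, RTL Utility.
# Keys are sample rate in Hz, then buffer size in samples.
MEASUREMENTS = {
    44_100: {16: 1.429, 32: 2.154},
    96_000: {16: 0.688, 32: 1.021, 64: 1.688},
}


def rtl_in_samples(rtl_ms: float, sample_rate_hz: int) -> int:
    """Convert a round-trip latency in milliseconds to whole samples."""
    return round(rtl_ms / 1000 * sample_rate_hz)


def fixed_overhead(rtl_ms: float, buffer_samples: int, sample_rate_hz: int) -> int:
    """Samples left over after removing one input and one output buffer.

    Assumed model: RTL = 2 * buffer + remainder (converters, transport,
    driver safety buffers). The remainder is whatever the buffers do not explain.
    """
    return rtl_in_samples(rtl_ms, sample_rate_hz) - 2 * buffer_samples


for rate, rows in MEASUREMENTS.items():
    for buf, rtl_ms in rows.items():
        total = rtl_in_samples(rtl_ms, rate)
        rest = fixed_overhead(rtl_ms, buf, rate)
        print(f"{rate} Hz, buffer {buf}: RTL {total} samples, remainder {rest}")
```

With the numbers quoted above, the remainder comes out at 34 samples for every 96 kHz row and 31 samples for both 44.1 kHz rows, which fits the idea that only the buffer term moves when the buffer size changes.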

4 (edited by ramses 2025-01-16 09:30:56)

Re: RayDat with RTL <1ms?

The RTL is fixed; it doesn't change with CPU power!

A more powerful CPU with good drivers (low DPC latency) can only process a higher workload and reduce the likelihood of audio drops. But even this is limited, as it depends on how well the application/DAW can utilize many cores.
It also depends on the DAW and the project. Some DAW projects need higher single-thread performance if you use many inserts in one track (which runs on one core) or when working with CPU-hungry VST/VSTi.

The Presonus allows for an unusually low ASIO buffer size of 16 samples (at single speed).
Maybe they even reduced the safety buffers.
In combination with Thunderbolt, this results in these low values.
As nice as it is to see such low values, the question is whether this allows for stable operation under a usual audio load.

I remember with FireWire and the older RME USB driver, RME decided not to go below 48 samples buffer size at single speed.
Only with the PCIe-/TB-based products and the newer MADIface USB driver, 32 samples are possible in the driver settings.

But why do you have such a demand for an RTL under 1ms?
For stability reasons and to have a little headroom I wouldn't use such low values at all.

Not everybody has a good system with low DPC Latencies (needs good drivers).
In order to achieve a stability that is practicable for as many systems as possible, it does not make much sense in my opinion to reduce the ASIO buffersize to 16 samples or to push it so far that it can only be used stably in a few setups / use cases.

I wouldn't even use the 32/48 samples from RME in every use case, and in most cases this isn't even necessary.
For typical recording you would use a much higher ASIO buffersize for highest stability.
For most recording situations, latency compensation supports you well.

Regarding your requirement below 1ms, what is your use case for such a demand?

To give two examples:

1. All I need to play guitar through a virtual amp is an RTL below 10 ms, which you can achieve with ASIO buffer sizes of 128 samples at single speed.

2. As a musician in a room or on stage, you have much higher latencies between musicians:
https://forum.rme-audio.de/viewtopic.ph … 20#p230720

From the link above:
a buffer size of 32 samples at 44.1 kHz with an RTL of 2.993 ms corresponds to a distance of only 1.03 m.
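
The distance comparison is simple arithmetic: latency multiplied by the speed of sound. A minimal sketch, assuming 343 m/s (dry air at roughly 20 °C); the constant and the function name are chosen for illustration and do not come from the linked post.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees Celsius


def latency_to_distance_m(latency_ms: float) -> float:
    """Distance sound travels through air during the given latency."""
    return latency_ms / 1000 * SPEED_OF_SOUND_M_PER_S


# RTL quoted above for a 32-sample buffer at 44.1 kHz
print(round(latency_to_distance_m(2.993), 2))  # 1.03
# The 10 ms budget mentioned for playing through a virtual amp
print(round(latency_to_distance_m(10.0), 2))   # 3.43
```

So a 10 ms RTL is comparable to standing about three and a half metres from a speaker.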

Where I need (want) lower latencies in my setup is in the AD/DA conversion on the device itself.
The reason is that I am using the UFX III as a parallel effects loop for my Marshall amp.

The dry guitar signal from the Marshall preamp ("Effect Send") is sent to the UFX III (Analog IN) and sent directly back to the Marshall power amp ("Effect Return"), to keep the punch in the amp signal. FX from Lexicon PCM units are added 100% wet to the analog output towards the power amp.
In this signal flow the RTL is not applicable, as everything happens on the unit without going over USB to the PC and back.
I have 5+6=11 samples of delay from AD and DA conversion, which is 0.246 ms, completely independent of the ASIO buffer size.
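
The converter figure can be checked by dividing the sample count by the sample rate. A small sketch, assuming 44.1 kHz as the single-speed rate; the helper name is hypothetical.

```python
def samples_to_ms(samples: int, sample_rate_hz: int) -> float:
    """Duration of a number of samples at a given sample rate, in milliseconds."""
    return samples / sample_rate_hz * 1000


CONVERTER_SAMPLES = 5 + 6  # AD plus DA at single speed, as stated in the post

print(round(samples_to_ms(CONVERTER_SAMPLES, 44_100), 3))  # 0.249
```

The plain division gives about 0.249 ms for 11 samples at 44.1 kHz, in the same range as the 0.246 ms quoted above; either way it is roughly a quarter of a millisecond and does not depend on the ASIO buffer size.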

Could you please describe the rationale / use case behind your demand of below 1 ms? That would be interesting.

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

5 (edited by vinark 2025-01-16 11:55:36)

Re: RayDat with RTL <1ms?

I don't think a new PC will help with this, but a better tuning of the one you have might.
Your CPU's base frequency is 3.3 GHz and max turbo 4.5 GHz. If not already done, you need to run it at a fixed frequency. What is doable will depend on the chip, your cooling, your OC capabilities, and what volts and temps you find acceptable. It can be anywhere between 4 and 5 GHz, and this will have a big impact on your lowest possible latency, provided you don't run heavy projects and are careful with the plugins you use.
It is a very fine CPU, and since you are not taxing it with a high load, more processing power in the sense of more cores, like the AMD, will probably not help. Also worth a try is disabling hyperthreading and/or running your DAW on one core only, which in Cubase is done by disabling multi-core processing (I don't know about other DAWs). Trying different DAWs is important too; might I suggest giving Reaper a try.
Good luck with your quest and keep me/us posted!

Vincent, Amsterdam
https://soundcloud.com/thesecretworld
BFpro fs, 2X HDSP9652 ADI-8AE, 2X HDSP9632

6 (edited by ramses 2025-01-16 12:20:44)

Re: RayDat with RTL <1ms?

vinark wrote:

I don't think a new PC will help with this, but a better tuning of the one you have might.
Your CPU's base frequency is 3.3 GHz and max turbo 4.5 GHz. If not already done, you need to run it at a fixed frequency. What is doable will depend on the chip, your cooling, your OC capabilities, and what volts and temps you find acceptable. It can be anywhere between 4 and 5 GHz, and this will have a big impact on your lowest possible latency, provided you don't run heavy projects and are careful with the plugins you use.
It is a very fine CPU, and since you are not taxing it with a high load, more processing power in the sense of more cores, like the AMD, will probably not help. Also worth a try is disabling hyperthreading and/or running your DAW on one core only, which in Cubase is done by disabling multi-core processing (I don't know about other DAWs). Trying different DAWs is important too; might I suggest giving Reaper a try.
Good luck with your quest and keep me/us posted!

Maybe I misunderstood your post, but the more important point for him to understand here is
that RTL is not improved by higher CPU performance or lower DPC latencies.
And I had a feeling that he thought exactly this when he said

> I thought with a more powerful PC and the RME PCIe RayDat I'd be able to at least equal these RTL times.

RTL is fixed and depends only on converter latency and the latency of the transport to/from the PC (USB/...),
and the latter depends on the chosen buffer size (ASIO buffer size) and additional safety buffers in the driver.
This is all fixed and independent of CPU performance / DPC latency.

For the RayDAT's RTL to improve, RME would need to allow e.g.
- lower ASIO buffer sizes for the HDSPe driver (16 at single speed), like Presonus does
- maybe also lower safety buffers

But what is the purpose / advantage of pushing this to the "bleeding edge", where issues become much more likely for potentially a lot of computers? In the end everybody would only complain about RME drivers (not understanding the causalities and how a PC works internally).

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

7 (edited by vinark 2025-01-16 12:39:34)

Re: RayDat with RTL <1ms?

He would then be able to use his Presonus in his current system, since going RME does not give him sub-1 ms latencies. But I agree wholeheartedly with what you posted, Ramses. Personally I never go below 128 samples except for testing, and I don't mind using 256 when playing VSTi (sometimes Kontakt needs this) and mixing at 1024 (more does not help performance), but I run heavy loads when mixing ITB. But the OP's goal is different: very low latency under very low load.
Also, I'm not suggesting bleeding edge with anywhere from 4 to 5 GHz. Saying you must run at 5.3 GHz per se, that would be bleeding edge, or should I say chip-baking edge, lol.

Vincent, Amsterdam
https://soundcloud.com/thesecretworld
BFpro fs, 2X HDSP9652 ADI-8AE, 2X HDSP9632

Re: RayDat with RTL <1ms?

Thanks everyone for your enlightening input.  I think I need to move to RME immediately just because of the fantastic responses on this forum!  Excellent.

So I was unaware that the RME driver did not offer buffer settings down to 16 samples, and thanks to Ramses' explanation, I totally understand why.  In order to achieve stability across so many users in the user base, it makes perfect business sense.  I was originally thrown off by the RME RayDat marketing material on their website under "specs", which says (quoted precisely):

"8 buffer sizes/latencies available: 0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms…"

Regardless, the facts are the facts as explained by Ramses, and although I am certain I can achieve complete stability with RME at 96kHz and 64 sample buffer (with a very high-end and finely tuned system of course), it would not facilitate the latency I'm after and my goal remains sub-1ms RTL with direct DAW monitoring if at all possible.

My use case will seem ridiculous, and it was crazy to me initially when it surfaced, but here it is: I only record classical guitarists, and I have three astute and very demanding clients. They come to me because I can endure endless takes over long periods of time, and I do everything I can to meet their expectations regarding the recording environment, particularly monitoring. The big demand is that when they play, they want to hear EXACTLY (and I mean exactly) what the finished product would sound like in their headphones while they are recording - the same reverb, the same compression, and the same EQ. I use an array of Neumann KM 184 mics (usually 3 or 4) and high-end UA and Great River external preamps. I'm running a Black Lion Revolution ADAT converter via SMUX into the Presonus Quantum 2626, and I'm direct monitoring their tracks through Reaper with Valhalla reverb, IK EQ, and UA 1176 plugins on the master bus.

When one client in particular started complaining about the guitar sounding "different" when using a 128 or 256 sample buffer (vs 16 samples), I thought maybe I was dealing with some mental instability. However, the more I listened to it (and played through it myself), there was a discernible difference. These high-end classical guitars have a very fast and loud attack, and at higher buffers the VERY initial part of the attack is lost to the listener, who at the same time can feel (and vaguely hear via headphone leakage) the vibration of the guitar while playing in real time. It comes across as slightly less bright sounding in the headphones at higher latency.

So there you have it.  I know a solution can be found via direct monitoring through the interface, but the 2626 does not offer that - it's direct through the DAW, or nothing.  Furthermore, direct monitoring in the interface means it will be difficult to offer the same compression, EQ, and reverb as what will eventually be used for production.  I know there are solutions via other monitoring schemes and other interfaces, but I have grown to really prefer the simplicity and convenience of handling everything within the DAW including monitoring effects.

In terms of PC system tuning to achieve stability at 16 samples with my CURRENT system, I believe I have it maxed out.  Over the last 8 years I've implemented BIOS tweaks, adjusted CPU c-states, toggled on and off multithreading, optimized water cooling fans, maximized performance plans, optimized overclocking, etc, and currently it runs continuously at 4.5 GHz in a stable state.  At a buffer of 16 samples, I get a quiet pop or a click every 30 seconds or so.  My main problematic client says the sound is perfect but the clicks are too annoying. 

Optimally, I should be able to stabilize 16 sample operation by upgrading my system (after all, my PC is 8 years old), but the big problem I have right now is that there are no real native Thunderbolt implementations on AMD X870E chipset boards to support the Quantum 2626. AMD is the clear market leader in terms of performance currently (CPU would be AMD Ryzen 9 9950X3D) and I would hate to invest in an entirely new $4K build that is not the best of the best. I could build an Intel Core Ultra 9 box, but it seems clear now that this new AMD chip is the fastest consumer CPU available by a significant margin.

Final note: staying with my current PC is also not an option - I'm starting to lose fans, drives, etc due to age, and I'm sure it won't be long before it signs off for good.

9 (edited by vinark 2025-01-16 20:14:56)

Re: RayDat with RTL <1ms?

If money is not a real problem, you could ask Scan Audio for system advice for your use case. I am not 100% sure AMD is the way to the lowest latency. For CPU power with heavy projects, yes, certainly, but not at the lowest latency. A very interesting use case btw. Classical guitarist by origin here too.

Vincent, Amsterdam
https://soundcloud.com/thesecretworld
BFpro fs, 2X HDSP9652 ADI-8AE, 2X HDSP9632

Re: RayDat with RTL <1ms?

I think your customers do not seem to understand the studio workflow: recording, mixing, mastering.
If I were them, then I would concentrate on playing the guitar and later listen to the final mix / master.

Doing all of this at once might be possible by using an analog console and analog equipment,
and using the PC, recording interface and DAW only for digital recording.

But using analog equipment of quality is expensive.
You also might have it easier with SNR when using digital equipment.

When I sold my UFX+ some time ago I visited a guy with a bigger Tascam Analog Console which had a MADI module.
This way he could record up to 64 channels at single speed or 32 channels at double speed directly from the analog console using the UFX+.

Sorry, I have no concrete idea, but I think you need to find your own way through it.
But I think it is wasted time to try to fulfill your customers' demands for such low RTLs.

What would be possible is to route the recorded signal directly from HW inputs to their headphones.
Then you have only the converter latency in between.

The UFX III at least has FX like EQ and compressor/limiter, and the converters are fast even at single speed:
5/6 samples for AD / DA. At higher sample rates, even faster.
I think it is sufficient to use double speed.

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

11 (edited by vinark Yesterday 09:24:34)

Re: RayDat with RTL <1ms?

Yes, Ramses is right. I would use TotalMix FX (TM FX) with its EQ and compressors set the same as in the DAW, and fly in the reverb you are going to use from the DAW (you don't record the TM FX effects). On the reverb, a few ms of extra latency is no problem, and you can even set the reverb pre-delay to zero.
The TM FX EQs will sound identical, the compressors maybe a little different, but if they notice that while playing, they are either superhuman or it's expectation bias in the negative sense.

Vincent, Amsterdam
https://soundcloud.com/thesecretworld
BFpro fs, 2X HDSP9652 ADI-8AE, 2X HDSP9632

12 (edited by ramses 2025-01-16 20:58:46)

Re: RayDat with RTL <1ms?

Possible setup, assuming you have separate control and recording rooms:

PC----USB2----ARC USB
|
| USB3
|
|    +-- Sandisk USB3 stick for DURec recording (backup recording in addition to DAW recording)
|   /
UFX III (control room, monitoring)
|      |     \------- ADAT1-----[SPDIF(o)]-----up to 15m to recording room-----ADI-2 Pro FS---2x Headphone output
|      |      \------ ADAT2-----[SPDIF(o)]-----up to 15m to recording room-----ADI-2 Pro FS---2x Headphone output
|      |       
|      |       
|      +---12Mic (recording room)----up to 12Mic inputs // 1x Headphone Output (reserve)
|      |                                     \ (optional ADI-2 Pro / Headphone outputs connected through the 12Mic):
|      |                                      \------ADAT1---ADI-2 Pro FS R BE- 2x Headphone out
+----+                                      \-----ADAT2---ADI-2 Pro FS R BE- 2x Headphone out
MADI(o)                                    \----ADAT3--....

Galvanic isolation by ADAT (optical SPDIF) and MADI (optical multimode cable: OM3 or OM4)

High-quality reference converters as headphone amps are not a must, but I don't know what quality they expect
or what high-impedance headphone monsters they want to connect ....

Instead of the UFX III you could also consider the HDSPe MADI FX card, which also has a full implementation of FX,
but lacks features like Autoset or DURec. But I think the UFX III might be better, as it offers the best mix of I/O ports that you might need and also has two nice headphone outputs. The Mic input could be useful for talkback...

MADI / 12Mic would be nice, as the MADI cables between devices in a MADI chain can each be up to 2 km long.
So you have the most flexibility, even if your studio is spread over several rooms.
The advantage over AVB/Dante is that clock synchronization is much easier. You have MADI dedicated to audio,
and clock slaves can easily follow the master if you work with different sample rates.
With the network-based protocols (AVB, Dante) you have to reconfigure the audio streams on each device for that.
Also, AVB needs an AVB-capable switch, and Dante needs QoS configuration on the Layer-2 switch and on Layer-3 level
if you are routing between subnets. With MADI things are easier and less expensive.

If you need Room EQ / Crossfeed, then take one of the newer HDSPe MADI FX cards; only those support that feature,
not the older cards.

The UFX III is also nice because it already offers high-quality Mic inputs and analog converters (those of the ADI-2 Pro FS, a slightly older model of the reference converters).

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

Re: RayDat with RTL <1ms?

dino757 wrote:

  I was originally thrown off by the RME RayDat marketing material on their website under "specs", which says (quoted precisely):

"8 buffer sizes/latencies available: 0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms…"

I guess that means one-way latency, not RTL, which is in principle approximately double.

As already suggested by others, using direct monitoring in TotalMix would do the trick to get below 1 ms.

FF UCX II, Digiface USB, Babyface Pro FS

14 (edited by Kubrak Yesterday 01:01:21)

Re: RayDat with RTL <1ms?

vinark wrote:

I am not 100% sure AMD is the way to the lowest latency. For CPU power with heavy projects, yes, certainly, but not at the lowest latency.

Why do you think AMD would have higher latencies than Intel? That was true for Zen 1, which had higher latencies because of its cache and core structure. But it has been solved in later generations.

On the contrary, certain big.little Intels have latency problems.... Besides other problems, like oxidation, random failures and so on.

But I agree that, concerning latencies, the AMD Ryzen 9 9950X3D might not be the best choice among AMD products, as it has a huge cache on one chiplet and an ordinary cache on the other chiplet.

FF UCX II, Digiface USB, Babyface Pro FS

15

Re: RayDat with RTL <1ms?

Kubrak wrote:
dino757 wrote:

  I was originally thrown off by the RME RayDat marketing material on their website under "specs", which says (quoted precisely):

"8 buffer sizes/latencies available: 0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms…"

I guess that means one way latency

Yes, in some older manuals the buffer size was stated in ms. That figure simply equals the number of samples from the buffer size dropdown in the Settings dialog expressed as time; it is not RTL.

Regards
Matthias Carstens
RME
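
The figures in the spec sheet line up with plain buffer sizes converted to time. The sketch below assumes the first five buffer sizes are 32 to 512 samples and that the spec sheet refers to 44.1 kHz; both are assumptions for illustration, as is the helper name.

```python
ASSUMED_BUFFER_SIZES = [32, 64, 128, 256, 512]  # samples; assumed, not taken from the spec sheet
ASSUMED_SAMPLE_RATE_HZ = 44_100                 # assumed single-speed rate


def buffer_ms(buffer_samples: int, sample_rate_hz: int = ASSUMED_SAMPLE_RATE_HZ) -> float:
    """One-way duration of a single buffer in milliseconds (not RTL)."""
    return buffer_samples / sample_rate_hz * 1000


for size in ASSUMED_BUFFER_SIZES:
    print(f"{size:>4} samples -> {buffer_ms(size):.1f} ms")
```

Under those assumptions the values are about 0.7, 1.5, 2.9, 5.8 and 11.6 ms, which round to the "0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms" in the quoted spec line. A round trip needs at least one such buffer in each direction, plus converter and driver overhead.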

16 (edited by vinark Yesterday 09:32:27)

Re: RayDat with RTL <1ms?

Kubrak wrote:
vinark wrote:

I am not 100% sure AMD is the way to the lowest latency. For CPU power with heavy projects, yes, certainly, but not at the lowest latency.

Why do you think AMD would have higher latencies than Intel? That was true for Zen 1, which had higher latencies because of its cache and core structure. But it has been solved in later generations.

On the contrary, certain big.little Intels have latency problems.... Besides other problems, like oxidation, random failures and so on.

But I agree that, concerning latencies, the AMD Ryzen 9 9950X3D might not be the best choice among AMD products, as it has a huge cache on one chiplet and an ordinary cache on the other chiplet.

Yes, that is all true! I would never buy an Intel big.little CPU. But I would also not assume that going to the latest AMD will give better results than you have now. One click every 30 seconds does not indicate a CPU shortage per se; it could just be a driver or Windows causing a hiccup.
There is no way to be sure without trying, unfortunately.

Vincent, Amsterdam
https://soundcloud.com/thesecretworld
BFpro fs, 2X HDSP9652 ADI-8AE, 2X HDSP9632

Re: RayDat with RTL <1ms?

Kubrak wrote:
dino757 wrote:

I was originally thrown off by the RME RayDat marketing material on their website under "specs", which says (quoted precisely):
"8 buffer sizes/latencies available: 0.7 ms, 1.5 ms, 3 ms, 6 ms, 12 ms…"

I guess that means one-way latency, not RTL, which is in principle approximately double.

0.7ms is the input latency, see my table here:
https://www.tonstudio-forum.de/blog/ent … cts-en-de/

Also, keep in mind that the RayDAT is a fully digital card without AD/DA.

Kubrak wrote:

As already suggested by others, using direct monitoring in TotalMix would do the trick to get below 1 ms.

Simple TM FX routing from input to output is enough, as has already been mentioned in the thread.
If you meant ADM (ASIO "Direct Monitoring"), that is not necessarily needed.

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14