Topic: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

I'm an audio programmer here at Naughty Dog in the US.  We're in the process of upgrading our DAWs here, and we're switching to dual HDSPe AES PCIe cards under Windows 10 as our primary digital I/O AES interface.  We've been trying to get this working since last year, and we've almost got all our issues ironed out, but we are still having a serious problem with random RME ASIO driver failures.  We reported this issue to Synthax back in October of last year but they were not able to help; they said they had never heard of this issue and couldn't reproduce it.  So, I thought I would try posting here to see if RME can directly help with the problem.  Recently, we believe we've been able to devise a 100% reproduction case, so hopefully RME engineers can now fix this.

Our configuration is as follows: A Windows 10 PC with dual HDSPe AES PCIe cards and an nVidia 1060 display adapter, 64GB of RAM, an Intel Xeon E5-2690 CPU, a 1TB SSD system drive, and a 6TB RAID using the motherboard's Intel RST driver.  We have used Pro Tools, Nuendo, and Reaper on this configuration.  The first RME card is the master system clock; the second one is clocked by the first using the external BNC connector (we've tried the internal sync cable, no difference).  We normally run at 96kHz, though the bug also happens at 48kHz.  We have both cards in multichannel mode, with WDM enabled for all I/O.  Windows Speaker devices are enabled for outputs 9-16 on the second card; this is looped back to the inputs via TotalMix FX; in this way we can monitor and capture any OS audio from any app (web browser, sound library, etc.) directly into the DAW.  We have other lower-numbered inputs used for various other input sources, all clocked against the first RME card (though the bug happens even if nothing is being input, i.e. with all other inputs physically disconnected).  The main outputs are the first card's outputs 1-8, which are sent to our speaker monitor controller.

The bug is that after some random amount of time, for no apparent reason, even when the system is just left idle, the ASIO outputs for the first card's I/O simply stop working.  This happens without warning; there is no error dialog or system event that indicates anything is wrong.  The first card's 16 outputs simply stop producing audio.  This is also confirmed by looking at the meters in TotalMix: they are dead.  This failure happens no matter what ASIO software you are using.  We've also run soak tests and noticed that this can happen even if no ASIO software is running: i.e., boot the computer, let it sit idle for a week, then try to run an ASIO app, and the first 16 outputs will not work right from the start.  The second card's ASIO I/O continues to work.  When this failure happens, it is always the first logical card that fails.  This is true even if the cards are swapped in the RME preferences; in that case, the other card will now be the first card, and it will lose its ASIO I/O.

We've tried swapping cards, swapping parts, changing drivers (this has happened on every RME driver since October, including the latest 4.18 drivers); nothing matters.  The frequency of the bug is somewhat random: it can take almost a week to happen, or it can happen multiple times in one day.  When we first encountered this, we were working around it by rebooting the PC, which resets the driver and fixes the problem.  The more we started using the PCs, the more frequently this would happen, until it got so frequent that rebooting was killing our workflow.

Some interesting points to note are that we have a few PCs that don't need 32 channels of I/O and are only using one RME AES card.  Those PCs are rock solid - they are not affected by this bug.  Those PCs are identical in every way to the ones that are failing: same motherboard, CPU, drives, and video card - the only difference is they have 1 RME card instead of 2.  So the bug seems unique to the scenario of using 2 RME cards.

We discovered an inelegant workaround that often saves us from rebooting: if, when this bug happens, you exit all audio applications, and then run a program called ASIOSigGen (you can find this on the internet), you can use the app to force both cards to change sample rates.  If you do this, then change the sample rate back again, it seems to reset the ASIO driver such that it works again.  For example: if you are working at 96kHz and the bug happens, close all the DAW applications, run ASIOSigGen, pick 48kHz, wait a few seconds, then pick 96kHz, wait a few seconds, exit the app, restart the DAW software, and now it's working again.
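The workaround above is really a fixed call sequence against the driver: switch to a temporary rate, wait, switch back, release the driver. The sketch below models it in Python using a hypothetical `MockAsioDriver` class whose method names only loosely mirror `ASIOInit`, `ASIOCanSampleRate`, `ASIOSetSampleRate`, and `ASIOExit` from the ASIO SDK. It does not talk to real hardware, and the default settle time is an assumption standing in for "wait a few seconds".

```python
import time


class MockAsioDriver:
    """Hypothetical stand-in for an ASIO driver that records the calls a host makes.

    Method names loosely mirror ASIOInit / ASIOCanSampleRate / ASIOSetSampleRate /
    ASIOExit from the ASIO SDK, but this is NOT the SDK and touches no hardware.
    """

    def __init__(self, rate: float) -> None:
        self.rate = rate
        self.calls: list[str] = []

    def init(self) -> None:
        self.calls.append("init")

    def can_sample_rate(self, rate: float) -> bool:
        # Assumed set of supported rates, for illustration only.
        return rate in (44100.0, 48000.0, 88200.0, 96000.0)

    def set_sample_rate(self, rate: float) -> None:
        self.calls.append(f"set_sample_rate({rate:g})")
        self.rate = rate

    def exit(self) -> None:
        self.calls.append("exit")


def toggle_sample_rate(driver: MockAsioDriver, temp_rate: float, settle_s: float = 3.0) -> None:
    """Switch to temp_rate, wait, switch back to the original rate, then release the driver.

    Assumes all DAW applications have already been closed, as the workaround requires.
    """
    original = driver.rate
    if not driver.can_sample_rate(temp_rate):
        raise ValueError(f"temporary rate {temp_rate:g} not supported")
    driver.init()
    try:
        driver.set_sample_rate(temp_rate)   # e.g. 96kHz -> 48kHz
        time.sleep(settle_s)                # "wait a few seconds"
        driver.set_sample_rate(original)    # back to the working rate
        time.sleep(settle_s)
    finally:
        driver.exit()                       # release the driver before restarting the DAW


if __name__ == "__main__":
    d = MockAsioDriver(96000.0)
    toggle_sample_rate(d, 48000.0, settle_s=0.0)
    print(d.calls)  # ['init', 'set_sample_rate(48000)', 'set_sample_rate(96000)', 'exit']
```

The `finally` clause makes sure the driver is released even if a rate change fails, so the DAW can open it cleanly afterwards.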

Although this is an improvement over rebooting, this is still not ideal.  However, it gets worse: we've recently started experimenting with Reaper, and we've found a normal workflow pattern in Reaper that causes the RME ASIO driver to fail 100% of the time (at least, for us).  That is good news, I guess - hopefully that will help RME engineers identify the bug and fix it.  It's bad news for our sound designers using Reaper because this failure is now much more frequent.

Here are the steps to reproduce the 2-card ASIO bug 100% of the time in Reaper:
1.    Restart the computer
2.    Load up a blank session in Reaper (latest version)
3.    Select a track in Reaper
4.    Go to “Actions” -> “Show actions list”
5.    Type inside the “filter” line: “move tracks to subproject”
6.    Select “Track: Move tracks to subproject” in the actions list and press “Run” at the bottom of the window

We really want to get this fixed because it is impeding our ability to finish upgrading our DAW hardware; none of the sound designers here want to use these new PCs until this bug is fixed.  We've got 1 user here who's been suffering with this bug since last October and another one who's been dealing with it since April.  People are starting to complain that maybe we should start looking at other audio hardware and I don't want to do that if I can avoid it because the RME has a unique feature set that really works well for us.  I need to roll out 6 more of these machines, so any help RME can give in fixing this would be appreciated.

As an audio programmer, I have access to developer tools, and I am willing to gather whatever diagnostics or forensics will help track the problem down.

Any help would be greatly appreciated.  Thanks!

- Jonathan

2 (edited by ramses 2017-08-19 07:34:41)

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

Very good problem description.

I wonder whether this issue is related solely to the RME driver, or to one of the following:
- Win10
- the hardware revision of the RME cards
- mainboard / BIOS
- the combination of hardware
- the nVidia driver

Additional questions to get the full picture:
1. Which build of Win10 and which mainboard are you using ?
2. Which board revision do the RME cards have ? Might be of interest for RME.
3. Which mainboard slots do you use for the 2 cards ? Is there maybe an impact from shared PCIe lanes depending on board design ?
4. Is energy saving disabled in the BIOS and did you set energy options under Windows for full power ?
5. Can it be excluded that the driver issues happen based on system going to sleep, then wakeup ?
6. With Cubase/Nuendo, do you use the option "modus for optimized audio performance ...."?
    It enables a special Steinberg energy profile and disables CPU core parking.
    Does this make a difference ?
7. Xeon-based systems usually offer a lot of configuration options in the BIOS.
    Did you base your BIOS settings on the optimized defaults (to have a known baseline;
    usually you don't need to change more than maybe one parameter)?
    And what other changes did you make, and in which areas?

Additional things I would check to find out, or confirm, whether this is an RME driver issue
(unless RME has a different idea).

First things that can easily be performed and reverted:

8. Disable all WDM devices in the driver settings dialog of all cards in a system. Does it make any difference ?
9. Does it make a difference if you disable Hyperthreading in the BIOS ?
10. Did you check whether an upgrade to the latest BIOS version makes a difference ? I think a BIOS upgrade is not too hard and can be reverted easily if the newest BIOS should cause issues.

Rationale for a Win7 cross-test:
I have been reading this forum for a long time, and I don't remember a similar-sounding bug description.
Therefore I assume that the RME AES card and drivers can be regarded as stable.
So I would find it especially interesting to do a parallel installation of Win7 on one system, to see whether it makes a difference.
You need at least Win7 Professional to be able to address all of the DRAM.

Windows 10 with its new idea of "Windows as a service" is a "moving target". You will get all upgrades, maybe delayed, but in the end you will get them, many of them. It's no longer possible to limit the changes to security updates only, as it was with Win7. Of course, this can have a greater negative impact on the stability of a recording PC.
I personally see, even on Mac OS X with its much more limited set of HW, that some updates cause issues.
My point is that any change to an OS can have an impact, and for my taste this happens
too often with Win10, so I would like to cross-check against a "proven" Win7 SP1.

If your setup works with Win7, then either the issue is entirely related to Win10, or
RME would at least know that it might only happen in combination with Windows 10:

11. Parallel installation of Windows 7 to check whether this issue is related to Win10, to exclude any impact of the combination of mainboard / BIOS / deployed Hardware / drivers.

These tests take more effort, so I would perform them after the cross-check with Win7:

12. Does this also happen on a different PC with different HW using the 2 RME AES cards ?
13. Do you have a graphics card other than nVidia, to exclude any impact from there (HW and/or driver) ?

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

Thanks for all the details, we will have a look in the next days.

BTW, I don't think ASIOSIG is necessary at all. Any software that supports ASIO should be able to do the same, even our own tool DIGICheck. Changing buffer size or sample rate will issue a kASIOResetRequest, which disposes the current buffers and completely resets/renews the ASIO driver state.
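For context, here is a rough model of how a host is expected to react to that request. In the ASIO SDK the selector is spelled `kAsioResetRequest` and reaches the host through its `asioMessage` callback; the host then stops the driver, disposes the buffers, exits, and re-initializes. The Python below is only a sketch of that flow: `Host`, `AsioMessage`, and the step names are hypothetical stand-ins for the SDK calls, and deferring the teardown to the host's main thread (instead of doing it inside the callback) is a common design choice, not something stated in this thread.

```python
import enum
import threading


class AsioMessage(enum.Enum):
    # Hypothetical Python mirror of two asioMessage() selectors from the ASIO SDK.
    SELECTOR_SUPPORTED = "kAsioSelectorSupported"
    RESET_REQUEST = "kAsioResetRequest"


class Host:
    """Minimal host model: answers driver messages and performs a deferred reset."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self._reset_pending = threading.Event()

    def asio_message(self, selector: AsioMessage, value: object = None) -> int:
        """Called by the driver, possibly from its own thread."""
        if selector is AsioMessage.SELECTOR_SUPPORTED:
            # Tell the driver which selectors this host understands.
            return 1 if value is AsioMessage.RESET_REQUEST else 0
        if selector is AsioMessage.RESET_REQUEST:
            # Only flag the request; tearing down inside the callback is avoided.
            self._reset_pending.set()
            return 1
        return 0

    def service(self) -> None:
        """Called periodically from the host's main (non-audio) thread."""
        if self._reset_pending.is_set():
            self._reset_pending.clear()
            # Stand-ins for ASIOStop, ASIODisposeBuffers, ASIOExit, then
            # ASIOInit, ASIOCreateBuffers, ASIOStart on the reloaded driver.
            for step in ("stop", "dispose_buffers", "exit",
                         "init", "create_buffers", "start"):
                self.log.append(step)


if __name__ == "__main__":
    h = Host()
    h.asio_message(AsioMessage.RESET_REQUEST)  # e.g. buffer size changed in the driver panel
    h.service()
    print(h.log)
```

Because the whole buffer set is disposed and recreated, any stale per-card state is discarded, which is consistent with the observation that a rate or buffer-size change clears the failure.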

Regards
Matthias Carstens
RME

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

Thanks for the suggestions; here are my comments:

ramses wrote:

1. Which build of Win10 and which mainboard are you using ?

Windows 10 Enterprise Anniversary Edition with the latest security updates.  The motherboard is an ASUS X99-E-10G WS.

ramses wrote:

2. Which board revision do the RME cards have ? Might be of interest for RME.

I'm not sure how to check that, but the boards are all new stock purchased within the last year.  All have the latest RME firmware installed.

ramses wrote:

3. Which mainboard slots do you use for the 2 cards ? Is there maybe an impact from shared PCIe lanes depending on board design ?

We use the lowest slot and the second slot up from that (slots 7 and 5), but there should be no impact here; the Xeon has 40 PCIe lanes and this motherboard can leverage all of them, and 32 lanes are dedicated to the PCIe slots.  All PCIe slots are direct to CPU, meaning they do not go through the PCH.  This is fairly unique for a single-CPU motherboard and one of the reasons we picked it: none of the integrated peripherals share any bandwidth or lanes with the card slots.

ramses wrote:

4. Is energy saving disabled in the BIOS and did you set energy options under Windows for full power ?

Enhanced SpeedStep and CPU C-States are both disabled in the BIOS.  Windows is configured through group policies to disable sleep and hibernation modes, and is configured for full power mode, as is the nVidia driver.  The USB driver also has USB low power mode disabled.  These PCs are set up to be used 24/7 if necessary.

ramses wrote:

5. Can it be excluded that the driver issues happen based on system going to sleep, then wakeup ?

Sleep mode is disabled.

ramses wrote:

6. With Cubase/Nuendo, do you use the option "modus for optimized audio performance ...."?
    It enables a special Steinberg energy profile and disables CPU core parking.
    Does this make a difference ?

We have not yet made any special tweaks to Nuendo; we've just started using it.  However, this bug affects all ASIO apps, so I don't think it has anything specific to do with Nuendo.  I will make note of your suggestion for tuning Nuendo, though so far Nuendo 8 has worked perfectly with no changes.

ramses wrote:

7. Xeon-based systems usually offer a lot of configuration options in the BIOS.
    Did you base your BIOS settings on the optimized defaults (to have a known baseline;
    usually you don't need to change more than maybe one parameter)?
    And what other changes did you make, and in which areas?

We've hand tuned these systems for DAW use.  This includes everything I've mentioned above, plus some additional tweaks to minimize interrupt load on the network drivers, as well as tuning for driver interrupt affinity to keep high frequency, high latency drivers from overlapping with the RME drivers.  We've been able to fully load the RAID drive, download at maximum bandwidth over the network, and copy multiple gigabytes from the USB 3 port simultaneously while bouncing in Pro Tools with no errors or dropouts.  The system is remarkably solid, with the exception of this ASIO issue.
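The post doesn't say which tool was used for the interrupt-affinity tuning. On Windows, a device's interrupt affinity is typically expressed as a processor bitmask in which bit n set means logical CPU n may service the interrupt. The sketch below builds such masks and reports any pair of devices that share a CPU; the device names and core assignments are hypothetical examples, not this system's actual configuration.

```python
def core_mask(cores: list[int]) -> int:
    """Build an affinity bitmask: bit n set means logical CPU n is allowed."""
    mask = 0
    for c in cores:
        if c < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << c
    return mask


def overlapping(masks: dict[str, int]) -> list[tuple[str, str]]:
    """Return every pair of devices whose affinity masks share at least one CPU."""
    names = sorted(masks)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if masks[a] & masks[b]]


if __name__ == "__main__":
    # Hypothetical assignment: keep the audio cards away from network and USB.
    masks = {
        "network": core_mask([0, 1]),
        "rme_card_1": core_mask([2, 3]),
        "rme_card_2": core_mask([4, 5]),
    }
    print(overlapping(masks))  # []
    masks["usb"] = core_mask([1, 2])  # a careless assignment
    print(overlapping(masks))  # [('network', 'usb'), ('rme_card_1', 'usb')]
```

A check like this only confirms that the intended separation is internally consistent; it says nothing about whether the OS actually honors the masks.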

ramses wrote:

8. Disable all WDM devices in the driver settings dialog of all cards in a system. Does it make any difference ?

All non-RME WDM audio devices are already disabled.  Motherboard audio is disabled in the BIOS and does not show up in the device list, and the HDMI audio device from the nVidia is disabled.  The only active audio devices in the system are the RME devices.

ramses wrote:

9. Does it make a difference if you disable Hyperthreading in the BIOS ?

No.

ramses wrote:

10. Did you check whether an upgrade to the latest BIOS version makes a difference ? I think a BIOS upgrade is not too hard and can be reverted easily if the newest BIOS should cause issues.

We have the latest BIOS, no difference.  This problem has persisted for several months; we've tried every BIOS from ASUS from the launch of the motherboard up until now with no changes.  I don't think there is a BIOS issue.

ramses wrote:

I have been reading this forum for a long time, and I don't remember a similar-sounding bug description.
Therefore I assume that the RME AES card and drivers can be regarded as stable.

I don't think that's a valid assumption; I'm not sure how many RME users will be using 2 cards instead of 1, and this bug can be fairly rare.  Also, it is cured by an ASIO reset event, so if a user's normal workflow does anything that regularly resets the ASIO device, they might never see this issue.  We've only recently been able to increase the frequency of failure by using Reaper with the steps that I mentioned.  I suspect our workflow is also slightly different from that of typical users.

ramses wrote:

11. Parallel installation of Windows 7 to check whether this issue is related to Win10, to exclude any impact of the combination of mainboard / BIOS / deployed Hardware / drivers.

It would be possible to install Windows 7 on a test machine, but I'm not sure what that would accomplish, other than possibly getting a data point about whether or not the bug was unique to Windows 10.  We require Windows 10 here, so I need it to work on Windows 10, regardless.  I honestly don't think Windows has anything to do with this issue, because the WDM drivers are perfectly functional.  To be clear: this bug only affects ASIO.  When the bug happens, WDM drivers on the same I/O that fail for ASIO continue to work perfectly via WDM.  I don't know how the RME drivers are architected internally, but from my perspective, it appears that this issue is exclusive to the ASIO driver layer.  The RME WDM driver is working flawlessly in all cases.

ramses wrote:

Windows 10 with its new idea of "Windows as a service" is a "moving target". You will get all upgrades, maybe delayed, but in the end you will get them, many of them. It's no longer possible to limit the changes to security updates only, as it was with Win7. Of course, this can have a greater negative impact on the stability of a recording PC.

Any change can have an impact on the stability of any PC.  We recently solved an issue with DAW instability caused by merely having an old flash card reader plugged into a USB port; this caused random blue screens.  Unplugging it solved the problem.  But anyway, we actually want the service updates.  This is why we are transitioning away from Mac OS DAWs to Windows 10 DAWs; security is critical for us, and Apple is making it impossible to stay up-to-date with Mac OS without breaking our DAW software.  Remaining vulnerable for months while waiting for Pro Tools to get certified for a Mac OS update, by which time another OS update has been released, is not tenable.  Staying insecure for any length of time is untenable; therefore using frozen OS releases and avoiding software updates to keep the DAW working does not work for us.  At least Windows is more backwards compatible, in general; but we will always have to be vigilant about testing updates before deploying them.  I don't see any other way around it if you want your DAW to be secure.

ramses wrote:

12. Does this also happen on a different PC with different HW using the 2 RME AES cards ?

We have not yet tested this.  I don't currently have another motherboard that I'm confident would make a reliable DAW to test this with.  We went through a few motherboards before we settled on the X99-E-10G WS, all with varying issues.  It's not easy to just take any motherboard and expect it to work as a reliable DAW.  It might be possible to take one of our older PC DAWs (using an older ASUS board that we vetted several years ago) and test it with the 2 cards, but currently those aren't yet available for me to experiment with.

ramses wrote:

13. Do you have a graphics card other than nVidia, to exclude any impact from there (HW and/or driver) ?

We do not.  We make video games, and unfortunately our proprietary in-house tools are written exclusively for nVidia hardware, so I have no choice but to use nVidia.  We don't have any non-nVidia cards here to test with.

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

MC wrote:

Thanks for all the details, we will have a look in the next days.

BTW, I don't think ASIOSIG is necessary at all. Any software that supports ASIO should be able to do the same, even our own tool DIGICheck. Changing buffer size or sample rate will issue a kASIOResetRequest, which disposes the current buffers and completely resets/renews the ASIO driver state.

OK, well that's good to know.  Would it be possible to add an ASIO driver reset button in the RME Hammerfall DSP settings app?  Ideally, one that resets the drivers without quitting the apps and lets the audio recover?  Of course I'd still like to get the bug fixed, but if the driver can be reset independently, that would still be a useful feature to have integrated into the settings.

Let me know if there's any more information I can provide that will help track this problem down.  Thanks!

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

I see you have tested a lot. I wrote all those questions because that is what I would have done in this situation,
to try everything possible on my side.

I see now that you need Win10, no matter what the Win7 results would show.
And yes, you are right: a 2-card setup might be rare, so it could be a driver bug.

RME is starting evaluation, so good luck !

BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub14

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

In fact, a 2-card setup with HDSPe AES is anything but rare. Still, jlanier described some pretty unusual settings (like WDM multichannel on the second card) that might cause this effect, while the majority will never see it.

Regarding the ASIO Reset Request: all you have to do is change the buffer size in the RME Settings dialog while Nuendo is active. Or, in Nuendo's dialog where you select the RME ASIO driver, use the reset button to restart the ASIO engine.

Regards
Matthias Carstens
RME

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

Mail sent.

Regards
Matthias Carstens
RME

Re: ASIO bug: driver failure when using multiple HDSPe AES PCIe cards

As a follow-up to this post, I'd like to give kudos to RME for their excellent support in solving all the issues we were having.  For anyone with dual cards encountering similar problems, driver version 4.21 fixed our issues.  See the readme.txt in the 4.21 driver for more information.