Topic: UFX II - FiFo Errors after exhaustive system work. Please help
GENERAL DESCRIPTION
Per the subject of this topic, I have experienced persistent and difficult-to-diagnose FiFo errors, which result in the UFX II entering a state of glitching. The specific character of the glitch is a gapping of playback accompanied by a loss of timing sync that manifests system-wide (for instance, playback timing of video files slows). In Pro Tools 12.8.2 this manifests initially as what I interpret from the USB Diagnosis as incrementing "drops," numbering up usually to a few hundred, before eventually entering the gapping state and displaying the FiFo Error. At this point, the system-wide timing becomes corrupt as I described earlier.
Because Pro Tools is coded in a way that restarting the UFX II results in system failure, the session must be restarted. However, once the problem has occurred, in most cases the destabilization seems to linger on some level of the system. On occasion, a power down/up of the UFX II allows work to continue.
The problem happens regardless of system load, but happens more severely and quickly in a PT Session with higher loads touching more subsystems. My work currently requires that the Avid Video Engine is instantiated.
I will give system specs next, and detail the diagnostic steps I have taken to date. Both Jeff and I have exhausted our ability to conjure up more tests, so I am posting this to provide documentation to him and those of you at RME who might be able to guide me--or find the information useful.
One other note: The UFX II was purchased to replace a Fireface 800 which experienced motherboard failure. It functioned well on this very same computer system!!! Rather than invest in repairing it, I decided to move to the latest RME flagship. I generally have amazing luck with your products, and still have an HDSP 9652 in an XP Gigastudio machine that is solid as a rock. But something about the UFX II does not agree with this system.
HELP IS WELCOME!!! Or...maybe I have discovered the world's most elusive bug, and I can help you kill it.
SYSTEM AND HARDWARE CONFIGURATION
ASUS X99 Deluxe Rev 1.xx
BIOS 3902 04/19/2018 (current, but the problem has manifested through other BIOS versions)
Intel Core i7 5930K
128 GB RAM, Corsair DDR4 (has never failed repeated memory testing)
AMD Radeon R9 380 Series 4GB GDDR5 (MSI branded)
Boot Drive is Seagate FireCuda hybrid SSD/Spinner (passes error checks)
Data Drives are 7200 RPM Spinners (all pass error checking)
UFX II (drivers/firmware current as of today...problem has persisted through several driver iterations with UFX II changes)
Sonnet Allegro Pro USB 3.0 PCIe - 4 discrete ports, one chip per port (Jeff recommendation). UFX II has its own discrete port.
All system drivers are the latest versions. ASUS-specific peripherals, such as on-board Audio, WiFi, Bluetooth, USB, Disk Controllers, etc. are disabled at the BIOS level. Disk Controllers and on-board Intel LAN ports are operated with the latest generic Microsoft-provided drivers. They have been tested with the ASUS drivers as well.
All power-saving or clock-related throttling has been disabled. Hyperthreading is disabled. Turbo Mode disabled/enabled has no effect. Intel Onboard Ethernet power saving, jumbo packet, offloading to CPU, etc. are disabled.
I have disabled the swap file as a diagnostic, since I have a large enough amount of RAM to do so. That had no effect, either.
Overall, the system is being run as cleanly as I have the ability to make it.
DIAGNOSTIC ACTIVITIES
As you can infer from the above, I have undertaken an exhaustive clean-up and optimization effort.
Working in Pro Tools is the quickest way to make this problem occur. However, it is not the only way, as it has occurred even when playing back streaming video files on YouTube. Obviously Pro Tools reaches out far and wide, and touches many subsystems on the machine. I've replaced hard drives to eliminate them as a possible cause, divided project files onto different drives, etc. None of the projects are taking this system to its limits.
The only two subsystems that seem potentially implicated are Video and Network. I will explain why I think so...
Here are other specific diagnostic activities I have undertaken:
1) I have toggled the GPU-usage modes in Pro Tools. The problem happens whether Pro Tools is offloading video tasks to the GPU or not.
2) I have toggled the "Use D2D" setting in Total Mix Prefs. No effect on the problem.
3) I've moved the USB 3.0 card to different slots, used different cables, tried a repeater cable, etc. I feel I've exhausted those kinds of basic failure points.
4) I have undertaken Latency Testing with Latency Mon and the accompanying In Depth Latency Tests app.
5) This does not seem related to Buffer Size in the driver. In fact, this system plays back flawlessly with latencies down to 32 Samples!! The problem comes at any buffer size, all the way to 2048 Samples.
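To put the buffer sizes in item 5 in perspective, here is a minimal sketch of the arithmetic. It assumes a 48 kHz sample rate, which is only an assumption (the session rate is not stated in this post, so substitute the real one). The function name and the constant are hypothetical, made up for illustration; nothing here comes from the RME driver or Pro Tools.

```python
# Rough arithmetic only: how long one driver buffer lasts at a given size.
# ASSUMED_SAMPLE_RATE_HZ is an assumption; the real session rate is not
# stated in this post.

ASSUMED_SAMPLE_RATE_HZ = 48_000


def buffer_period_us(buffer_samples: int, sample_rate_hz: int) -> float:
    """Return the duration of one audio buffer in microseconds."""
    return buffer_samples / sample_rate_hz * 1_000_000


if __name__ == "__main__":
    # The two extremes mentioned in item 5, plus the sizes in between.
    for size in (32, 64, 128, 256, 512, 1024, 2048):
        period = buffer_period_us(size, ASSUMED_SAMPLE_RATE_HZ)
        print(f"{size:>5} samples -> {period:>10.1f} microseconds per buffer")
```

At the assumed 48 kHz, 32 samples works out to roughly 667 microseconds per buffer and 2048 samples to roughly 42,667 microseconds, a 64x spread. A dropout that shows up across that whole range would seem hard to explain as an ordinary too-small-buffer underrun, which fits the observation in item 5.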
LATENCY TESTING
Here is where it gets interesting, to me.
Any time I bring up the system from a reboot, warm or cold, the Latency performance is outstanding on this machine. Especially after the exhaustive optimization efforts, and leaning down of subsystems and TSR/startup items, the system is otherwise rock solid. It could run for a year with no crashes...I have literally never seen a blue screen on it.
But there are interesting phenomena with the Latency Testing.
First, I have narrowed down a way to QUICKLY cause a problem. On a very lean Pro Tools file (one video, small footprint Avid codec that Pro Tools recommends) and simple audio, all I have to do is instantiate TOTAL MIX, and the problems will come. This is why I experimented with the video modes. The FiFo errors will tend to come on once I've had to, say, mute an output, or do something in Total Mix.
If, after I have had the FiFo error, I run Latency Mon, the system latencies are significantly higher, sometimes with DPC latencies high enough that Latency Mon will throw up the "problems" message. They'll generally be in the Kernel Mode driver, but will seem to implicate video/network subsystems as well.
And if I run the short-loop tests in the "In Depth Latency Tests," there will be a significant change there as well: "red" values, generally in the 385 microsecond range, on one or more processors.
Once the processor is in this state, restarting Pro Tools again results in immediate FiFo Errors. Pro Tools will throw Video Engine errors at that point--which I suspect are not very accurate or precise, just indicating that playback can't start everything in sync. On rare occasion, after I have run those latency tests, I can restart the UFX II and those latency tests will return to the normal system baseline. In those cases, I can restart Pro Tools, and it will run normally until the next FiFo Error. But in most cases, at that point the system cannot recover normal timing, and a reboot is required. Once I reboot, the system returns to its very low latencies, with tight loop latencies maxing out at less than 30 microseconds on CPU 0 and less than 8 microseconds on CPUs 1-5 at the Dispatch Level.
THAT seems interesting.
But then it couples with this ERROR MESSAGE that is occasionally thrown:
TMFXCallback: TotalMixFX.exe - Application Error
The instruction at 0x0000000000000000 referenced memory at
0x0000000000000000. The memory could not be read.
Click on OK to terminate the program
Could this all be related to an obscure Total Mix bug? Do you test against large memory footprints like mine, 128GB in my case? I feel a bit incompetent and out of my depth technically when it comes to computer coding, so please bear with a question such as this. I simply cannot find many other things that seem to be related.
Opening Total Mix while in a Pro Tools session DEFINITELY hastens the FiFo Error. That much is certain.
SO THAT LEADS ME TO THESE QUESTIONS:
Might I have stumbled upon an obscure Total Mix bug relating to something unusual in my system?
Can anyone explain EXACTLY what the USB Diagnostic is polling to throw up the "FiFo Error?"
Is there ANY possibility that a bad UFX II hardware component could be throwing bad data into the system?
Why would instantiating Total Mix hasten the problem? What is Total Mix adding to the equation that would bring on the error so much more quickly?
How is Total Mix communicating with the hardware? Does Total Mix talk directly to the GPU? Does that relate to the "D2D" setting? Are there any other settings which could affect this issue?
I am at the end of my rope.
Why does Pro Tools cause the error to manifest so quickly? I would assume that you test exhaustively against it, and that 12.8.2 would still have a significant user install base among RME customers.
Why would a system that ran the Fireface 800 without these kinds of errors (and which is very robust) be so susceptible to FiFo errors with the USB Driver?
I am trying to finish a feature film. I have just a few weeks, so I am unable to just rip my system to shreds. Is there a debug driver of some kind for Windows, which might catch more useful information? Is there anything I have not described above that you feel would be worthwhile to try? Am I missing the most stupid, obvious thing in the world?
I feel like I am pretty competent at wringing out systems...I've been doing it for years, and this is literally the most frustrating problem I have ever experienced with a Pro Tools system. I have never had an experience where an RME interface was anything but rock solid and tenaciously robust about working even when the system was badly taxed.
I am really hoping (pleading, actually) for help with this. Please let me know if there is anything else I can provide.
Best regards,
Bruce Richardson