The only way I can see this happening is by using the PC's CPU, which means going through ASIO and taking on its latency just like a DAW would (or some other proprietary streaming protocol - though I don't see how it could be much faster than ASIO @ 32 samples regardless).
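For a rough sense of scale, here is the buffer arithmetic behind "ASIO @ 32 samples". This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the round-trip figure is a lower bound that assumes exactly one input buffer plus one output buffer, ignoring converter delay and any safety buffers the driver adds.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time needed to fill one buffer, in milliseconds."""
    return buffer_samples / sample_rate_hz * 1000.0


def round_trip_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Lower bound on round-trip latency: one input buffer plus one output buffer.

    Assumption: converter delay and driver safety buffers are ignored,
    so real-world figures will be higher.
    """
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)


if __name__ == "__main__":
    for rate in (44100, 48000, 96000):
        print(
            f"{rate} Hz: one way {buffer_latency_ms(32, rate):.2f} ms, "
            f"round trip >= {round_trip_ms(32, rate):.2f} ms"
        )
```

So even at 32 samples the buffering alone costs well over a millisecond round trip at 44.1/48 kHz, and any other host-side streaming scheme would be up against the same math.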
I seriously doubt the FPGA used for TM-FX would be able to emulate x86/x64 code to run a VST without involving the host CPU and OS (meaning ASIO would still be needed to stream back and forth). So no free lunch. Maybe it could run 1 or 2 x86 VSTs under some sort of emulation - but efficiency goes out the window under most emulation environments...
The UAD hardware is basically far more powerful than the FPGA used in RME's TM-FX, and obviously UAD has an API that other plug-in vendors can write to.
UAD obviously incurs latency when streaming from the DAW to the UAD hardware and back as an "Insert". The UAD Apollo seems to address this a little differently in that it acts on the digital signal before it is handed off to the ASIO driver (so it is pretty quick with INPUT processing, as it is not streaming back and forth when tracking). I'd have to assume the Apollo still incurs noticeable latency when used as an Insert within the DAW (just like the UAD cards do).
We can dream - but physical limitations are what they are...
MADIface-XT+ARC / 3x HDSP MADI / ADI648
2x SSL Alphalink MADI AX
2x Multiface / 2x Digiface /2x ADI8