From what I gather, the RME driver should be reporting the true analog I/O latencies to Cubase. This means that as long as Cubase adheres to these reported offsets, an analog I/O loop WILL report "0", because Cubase is correctly using RME's reported analog values. Your -13 sample offset is likely a Mac driver issue (the driver reporting incorrect AD/DA offsets to the host), which Matthias hinted would be fixed.
I've also had issues using the digital I/O on the "hybrid" RME products (any RME PC interface that has both analog AD/DA and digital I/O - the driver will always report the analog I/O latency) when the external AD/DA converters are "quicker" than the RME's built-in AD/DA conversion. This is where a negative delay would be needed, and Cubase cannot currently accommodate this AFAIK.
I would LOVE to see a way to report the analog and digital I/O independently, but ASIO 2.0 does not support this (so RME can't really "fix" it in the driver) - and I believe this is the only way to solve the issue apart from Cubase adding individual offsets for each input and output connection.
The only workaround I came up with would be to have RME add the ability to toggle the reported AD/DA latency in their driver. If you run into a situation where you get a negative offset using 3rd party AD/DA converters through an RME PC interface, you would switch the RME driver to a "Report Digital I/O ASIO Latency" option, and the offset would naturally become positive at that point (but then the RME's analog I/O is no longer sample-accurate for regular I/O needs, including record offset placement - this might not be an issue depending on your specific circumstances).
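To put rough numbers on the negative offset and on what such a toggle would change, here is a minimal Python sketch. Every latency value is made up for illustration (not measured on any RME or third-party unit), and the "analog"/"digital" reporting modes are the hypothetical toggle described above, not an existing driver option.

```python
# All values are hypothetical round-trip latencies in samples.
RME_ANALOG = 80     # loop through the RME's own AD/DA
DIGITAL_ONLY = 10   # loop through the digital I/O with no converter attached
EXTERNAL_ADDA = 60  # AD+DA of an external converter, "quicker" than the RME's

def loop_offset(actual: int, reported: int) -> int:
    """Offset a loopback test shows after the host compensates by `reported`."""
    return actual - reported

# Actual latency of a loop through the external converter on the digital I/O.
external_actual = DIGITAL_ONLY + EXTERNAL_ADDA

# Compare what the host would see under each (hypothetical) reporting mode.
for mode, reported in (("analog", RME_ANALOG), ("digital", DIGITAL_ONLY)):
    print(mode,
          "RME analog loop:", loop_offset(RME_ANALOG, reported),
          "external loop:", loop_offset(external_actual, reported))
```

With these made-up numbers, reporting the analog latency gives 0 for the RME loop and -10 for the external loop, while reporting the digital latency gives 70 and 60: both positive, so they could be entered as a normal offset, but the RME's own analog path is no longer compensated automatically.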
There is not currently a perfect solution - using AD/DAs with varying delays will always be a kludge under ASIO 2.0 AFAIK. This issue is not as pronounced on the "digital only" boxes like the Digiface and MADI cards - but I still have issues over MADI due to my setup (I run two SSL Alphalinks; one Alphalink feeds the other via ADAT, which adds roughly 4-6 samples of latency compared to the Alphalink that is connected directly to the MADI card). I then specify the total analog I/O latency (above and beyond the digital latency reported over ASIO) in Cubase/Nuendo's "Record Offset Placement" field for sample-accurate overdubs. Not perfect, but as close as I can get with ASIO 2.0 and Nuendo...
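As a rough illustration of why this is only "as close as I can get", the sketch below assumes a single record offset value is applied to every input. The converter latency is a made-up figure, and 5 samples is just the midpoint of the 4-6 sample ADAT penalty mentioned above.

```python
# Hypothetical values in samples; only the 4-6 sample ADAT range comes from the setup above.
CONVERTER_IO = 40  # made-up AD+DA latency of one Alphalink beyond the reported digital latency
ADAT_HOP = 5       # midpoint of the 4-6 extra samples on the ADAT-fed unit

# Extra latency of each path that the driver does not report.
paths = {
    "direct MADI": CONVERTER_IO,
    "via ADAT": CONVERTER_IO + ADAT_HOP,
}

# One offset for everything: tuned here for the directly connected unit.
record_offset = paths["direct MADI"]

# Residual error per path after the single offset is applied.
errors = {name: latency - record_offset for name, latency in paths.items()}
print(errors)  # {'direct MADI': 0, 'via ADAT': 5}
```

Whichever path the offset is tuned for, the other one stays off by the ADAT hop, which is why per-connection offsets would be the cleaner fix.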
MADIface-XT+ARC / 3x HDSP MADI / ADI648
2x SSL Alphalink MADI AX
2x Multiface / 2x Digiface /2x ADI8