I would record at 96 kHz only if it is a real must, e.g. if you have special quality requirements for classical or jazz recordings.
If it is not really required, do yourself a favour and stick to 44.1 kHz; that is enough quality in terms of sample rate.
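To put rough numbers on the sample rate question, here is a small Python sketch (my own illustration, not from any DAW or driver). It uses two textbook facts: a sample rate can represent frequencies up to half its value (the Nyquist frequency), and uncompressed PCM needs sample rate × bytes per sample × channels of data per second. The function names are just made up for this example.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sample rate can represent: half the rate."""
    return sample_rate_hz / 2

def data_rate_bytes_per_s(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Uncompressed PCM data rate: samples per second * bytes per sample * channels."""
    return sample_rate_hz * (bit_depth // 8) * channels

for fs in (44_100, 96_000):
    print(f"{fs} Hz: Nyquist {nyquist_hz(fs):.0f} Hz, "
          f"{data_rate_bytes_per_s(fs, 24) / 1e6:.3f} MB/s per 24-bit mono track")
```

44.1 kHz already reaches 22.05 kHz, above the roughly 20 kHz limit of human hearing, while 96 kHz costs a bit more than twice the disk space and data throughput per track.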
But what is very important is that you record with at least 24 bit. This offers a higher dynamic range, so that you can leave larger
"safety buffers" in the gain settings of your inputs, e.g. working around -12 dBFS, at a safer distance from 0 dBFS.
One of the advantages is that you have more headroom in the final mix for post-processing / mastering.
Other advantages: with recordings that have very large dynamics (classical music), you record the very quiet signals with more bits than you would with only 16-bit depth. The distance to the noise floor is also greater.
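A quick way to see the 16 vs 24 bit difference is the usual rule of thumb of about 6.02 dB per bit, i.e. 20·log10(2^N) for ideal N-bit PCM. The Python sketch below (my own illustration; the helper names are hypothetical) also estimates how many bits a signal still "uses" when it peaks below full scale. These are theoretical values; the analog noise of real converters limits the usable range well below 144 dB.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per bit

def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of ideal N-bit PCM: 20*log10(2**N)."""
    return 20 * math.log10(2 ** bits)

def effective_bits(bits: int, level_dbfs: float) -> float:
    """Approximate bits still in use by a signal peaking at level_dbfs (<= 0)."""
    return bits + level_dbfs / DB_PER_BIT

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB range, "
          f"peak at -12 dBFS uses ~{effective_bits(bits, -12):.1f} bits, "
          f"quiet passage at -60 dBFS uses ~{effective_bits(bits, -60):.1f} bits")
```

So leaving 12 dB of headroom costs only about 2 bits, and in 24 bit a quiet passage at -60 dBFS is still captured with roughly as many bits as a full-scale 16-bit signal has in total minus two.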
Then you mention a buffer size of 4096. I think you mean the ASIO buffer size.
A value of 4096 is very, very high. I usually use between 64 and 256, in rare cases 32 or 512.
On the one hand, a buffer of 4096 saves a lot of CPU time: because so much is buffered, the CPU does not need to work as hard to transfer the data in time. On the other hand, the round-trip latency from the recording interface to the DAW and back becomes much longer...
Which leads me to your last question about zero latency. There is no "zero latency".
There is always a little latency, because converting the signal from analog to digital always takes a little time.
The same goes for the path from your recording interface via PCI/PCIe/USB/FireWire to the PC/DAW and back to the recording interface, where your headphones are connected.
So if you want to hear the music in your headphones the way you mix/master it in your DAW, including additional effects and such, there will always be latency.
And you make this latency much bigger by increasing the ASIO buffer size to the highest possible value (here 4096),
because audio is now processed in larger chunks of 4096 samples.
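To get a feeling for the numbers: one buffer always represents buffer size ÷ sample rate of audio. The Python sketch below (a simplified model of my own, not how any specific driver reports latency) counts one input and one output buffer for the trip through the DAW and ignores converter latency and driver safety offsets, so real round-trip figures will be somewhat higher.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Duration of one ASIO buffer: buffer size divided by sample rate, in ms."""
    return 1000 * buffer_samples / sample_rate_hz

def round_trip_estimate_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Rough round trip: one input buffer plus one output buffer.
    Simplification: converter latency and driver safety offsets are ignored."""
    return 2 * buffer_latency_ms(buffer_samples, sample_rate_hz)

for n in (32, 64, 256, 512, 4096):
    print(f"{n:5d} samples @ 44.1 kHz: {buffer_latency_ms(n, 44_100):7.2f} ms per buffer, "
          f"~{round_trip_estimate_ms(n, 44_100):7.2f} ms round trip")
```

At 44.1 kHz, 64 samples are about 1.45 ms per buffer, while 4096 samples are almost 93 ms per buffer, which is clearly audible when you try to play along.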
On the other hand, if you take the lowest possible value, which might be 48 or even 32 samples on PCI/PCIe-based solutions, the CPU has to hurry so that no audio data gets lost, because there is now much less buffering.
This increases the CPU load and the likelihood of losing data .....
If you have an audio project where you use a lot of VSTs (EQs, compressors, etc.) and possibly even virtual instruments, the CPU has a lot to calculate. Depending on the performance of your system and how complex / CPU-intensive the project is, the CPU may not be quick enough; audio packets get lost, and you hear audio interruptions or clicks ....
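The trade-off can be illustrated with a toy model: the CPU must finish each buffer before the interface needs the next one (deadline = buffer size ÷ sample rate), and every callback carries some fixed overhead that small buffers pay far more often per second. The Python sketch below uses made-up cost figures (OVERHEAD_MS and the per-sample costs are hypothetical, not measured on any real system); it only shows the shape of the effect.

```python
def buffer_deadline_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Time available to process one buffer before the interface needs it."""
    return 1000 * buffer_samples / sample_rate_hz

def dsp_load(buffer_samples: int, sample_rate_hz: int,
             overhead_ms: float, us_per_sample: float) -> float:
    """Fraction of each deadline used, under a toy cost model:
    a fixed per-callback overhead plus a cost proportional to the samples processed."""
    processing_ms = overhead_ms + buffer_samples * us_per_sample / 1000
    return processing_ms / buffer_deadline_ms(buffer_samples, sample_rate_hz)

# Hypothetical project costs (illustrative only, not measurements):
OVERHEAD_MS = 0.2
LIGHT_US_PER_SAMPLE = 10   # a few plugins
HEAVY_US_PER_SAMPLE = 20   # many VSTs plus virtual instruments

for n in (32, 64, 256, 4096):
    for label, cost in (("light", LIGHT_US_PER_SAMPLE), ("heavy", HEAVY_US_PER_SAMPLE)):
        load = dsp_load(n, 44_100, OVERHEAD_MS, cost)
        status = "DROPOUT" if load > 1 else "ok"
        print(f"{n:5d} samples, {label}: {load:6.1%} of deadline -> {status}")
```

In this model the heavy project runs fine at 256 or 4096 samples but misses its deadline at 32 and 64 samples, which is exactly where you would hear clicks and dropouts.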
In such a case, depending on your DAW's capabilities, you have ways to work around this:
a) in Cubase you can freeze tracks (render them together with their plugins, which frees up CPU)
b) you can make a downmix and record your parts against the downmix; then the complexity is gone and you can possibly record with an ASIO buffer of only 32 / 48 samples.
But in most cases you might not want or need to hear the mix coming from the DAW ....
You get closest to zero latency when you listen, e.g. on headphones, to all signals coming directly from your recording interface.
So if you record guitar or vocals to a "playback track", configure the routing so that in your headphones you hear the signals you are recording directly from your recording interface, together with the playback.
Then you only have the very, very low latency of the A/D converters plus the D/A conversion to your headphones.
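The difference between direct monitoring and monitoring through the DAW can be sketched in the same simplified way. In this Python example (my own model, not a driver calculation), AD_MS and DA_MS are placeholder converter latencies, NOT specifications of any RME interface; the through-DAW path adds one input and one output buffer and ignores driver safety offsets.

```python
def monitoring_latency_ms(ad_ms: float, da_ms: float,
                          buffer_samples: int = 0, sample_rate_hz: int = 44_100) -> float:
    """Simplified latency of what you hear in the headphones.
    buffer_samples=0 models direct monitoring in the interface (A/D + D/A only);
    buffer_samples>0 adds one input and one output ASIO buffer for the trip through the DAW."""
    buffers_ms = 2 * 1000 * buffer_samples / sample_rate_hz
    return ad_ms + buffers_ms + da_ms

# Placeholder converter latencies, NOT specifications of any real interface:
AD_MS = 0.5
DA_MS = 0.5

print(f"direct monitoring:      {monitoring_latency_ms(AD_MS, DA_MS):.2f} ms")
for n in (64, 256, 4096):
    print(f"through DAW, {n:4d} buf: {monitoring_latency_ms(AD_MS, DA_MS, n):.2f} ms")
```

Whatever the actual converter figures are, direct monitoring stays independent of the ASIO buffer size, while the DAW path grows with every sample of buffer you add.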
So you can potentially shoot yourself in the foot by using much too high an ASIO buffer, like in your case now (4096).
With RME you have proven products with excellent drivers that already have very low latency, so this is already a very good foundation.
With more experience and an optimized workflow, over time you will use your gear in the best possible way.
I hope my comments inspire you to work on your setup and find something that fully fits your needs.
BR Ramses - UFX III, 12Mic, XTC, ADI-2 Pro FS R BE, RayDAT, X10SRi-F, E5-1680v4, Win10Pro22H2, Cub13