Mostly because it is used in live systems where low latency is a must.
There are a number of reasons why live systems switched from point-to-point connections (like MADI) to network systems, such as cabling, routing, and endpoint control; one of the reasons AVB was chosen over Dante is that it is deterministic by design. You can alter the latency between two Dante endpoints by running other traffic on the network, which can become audible under some circumstances.
The latency itself (2 ms) comes from one of the AVB standards and was calculated as a worst-case connection from talker to listener over seven 100 Mbit switches. On Gigabit networks, this scales to roughly a dozen switches, if I recall correctly. This means that you can safely connect ten switches in a row, with one 12Mic on each switch. Route the mic input of the first 12Mic to the phones output of all other 12Mics in the network: they will all play the signal from the first 12Mic at the same time, sample- and phase-accurate, even if you run long cables between the switches and are downloading Windows or macOS updates from the internet on the same network. This is the out-of-the-box behavior without any configuration.
You can reduce the PTO (nowadays referred to as MTT, maximum transit time) to a lower value on each talker. For manufacturers, a higher latency means they have to buffer all the incoming audio for up to 2 ms in hardware. That translates to a benefit for the end user, because it means that they get guaranteed latency in very large networks (and can reduce it if they are on a small network). If a manufacturer claims to have a "super low latency AVB" implementation, it simply means that they use a non-standard default MTT, which in turn reduces the size of the network in which streams can reliably operate (in the best case, the stream will simply not be established when the calculated transit time exceeds the MTT setting).
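As a rough illustration of that last point, here is a minimal Python sketch of the check: add up a worst-case latency for every hop between talker and listener and compare the total with the talker's MTT. The per-hop figure is an assumption for illustration only (roughly the 2 ms budget divided over seven hops), not a value taken from the standard, and the function names are made up.

```python
# Hypothetical sketch: does a stream fit within the talker's MTT?
# The per-hop latency below is an illustrative assumption, not a spec value.

DEFAULT_MTT_US = 2000  # the 2 ms default discussed above, in microseconds

def accumulated_latency_us(hop_latencies_us):
    """Worst-case transit time from talker to listener: the sum over all hops."""
    return sum(hop_latencies_us)

def stream_can_be_established(hop_latencies_us, mtt_us=DEFAULT_MTT_US):
    """True when the calculated transit time does not exceed the MTT."""
    return accumulated_latency_us(hop_latencies_us) <= mtt_us

# Assumed worst case per 100 Mbit hop: about 2000 us / 7 hops.
PER_HOP_US = 285

seven_hops = [PER_HOP_US] * 7
print(stream_can_be_established(seven_hops))              # True with the default MTT
print(stream_can_be_established(seven_hops, mtt_us=500))  # False with a reduced MTT
print(stream_can_be_established([PER_HOP_US], mtt_us=500))  # True on a one-hop network
```

With the default MTT the seven-hop path fits; with a reduced MTT the same path is rejected, while a one-hop network still works.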
Some AVB devices have already implemented automatic MTT negotiation, which reduces the latency to a minimum automatically. That is only useful for static 1-1 connections, because when you add more listeners that are farther away from the talker, the latency would have to increase, and for that you have to stop the streams already running to the first device.
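The reason is easy to see in a small Python sketch: if every listener has to play out at the same time, the MTT has to cover the slowest path, so it is the maximum over all listeners. The listener names and path latencies are invented for illustration, and this is not how any particular device implements negotiation.

```python
# Hypothetical sketch: the smallest MTT that still covers every listener.
# Listener names and path latencies (in microseconds) are made up.

def required_mtt_us(listener_path_latencies_us):
    """The slowest talker-to-listener path determines the MTT."""
    return max(listener_path_latencies_us.values())

listeners = {"stage_left": 285}            # one listener, one hop away
print(required_mtt_us(listeners))          # 285

listeners["front_of_house"] = 1425         # a second listener, five hops away
print(required_mtt_us(listeners))          # 1425
```

Adding the distant listener raises the required MTT, so a value negotiated for the first listener alone no longer holds.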
It is really all about deterministic latency, not so much about the actual value of the latency. 2 ms is very low and acceptable for PA; in that time, sound travels less than a meter (about 0.7 m). For monitoring applications where low latency is very important, you can reduce it if needed.
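For anyone who wants to check the distance figure, here is the arithmetic in Python, assuming a speed of sound of about 343 m/s (dry air at roughly 20 °C). The function name is just for illustration.

```python
# Distance sound travels during a given latency.
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C

def distance_m(latency_ms):
    """Meters traveled by sound in latency_ms milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0

print(f"{distance_m(2.0):.2f} m")  # 0.69 m
```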