There's always lots of verbiage in threads like this, and there's meaningful content interspersed with irrelevant and/or incorrect statements. I'll itemize the basics here for reference.
1. Latency is undetectable whenever a listener is exposed only to the output of a system. That's a Really Good Thing, too, since the total latency when we listen to a recording can now exceed 100 years.
2. When a signal is mixed with a delayed version of itself, there will be multisource interference. For example, when a signal is mixed with the same signal delayed by 0.5 ms (note the decimal point), there will be a broad magnitude null centered at 1 kHz, because 0.5 ms is half the period of a 1 kHz tone and the two copies arrive in opposite polarity at that frequency. This is painfully audible, but the ability to hear it has nothing to do with your ability to detect latency.
3. Many purely acoustic instruments have substantial latency. A piano can have latency - defined as the time from the player keying a note until the hammer strikes the string - of as much as 200 ms and seldom less than ca. 20 ms. See: "Touch and temporal behavior of grand piano actions." A pipe organ can have latencies of multiple seconds.
4. If you're singing or playing a wind instrument or horn and monitoring yourself with cans or IEMs, there will always be mixing of the sort described in 2. above: you hear your voice or instrument via bone conduction as well as through your cans or IEMs. Any latency in one path relative to the other will cause multisource interference.
5. When a player initiates an event with physical movement - plucking a guitar string, pressing a piano key, drawing a bow across violin strings, etc. - they will experience latency viscerally: some amount of time elapses between the stimulus (physical action) and the response (audible sound). If that amount of time is sufficient, a player will be able to detect it. If you can hear both acoustic and amplified sound from your instrument, there will, in addition to latency, be the multisource interference described in 2. above.
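For anyone who wants to check the numbers in 2. above, here is a minimal Python sketch. It assumes the direct and delayed copies are mixed at equal level, which is the worst case. The function names are my own, made up for illustration. Complete cancellation occurs at odd multiples of 1/(2 x delay).

```python
import cmath
import math


def null_frequencies(delay_s, f_max_hz):
    """Frequencies (Hz), up to f_max_hz, where x(t) + x(t - delay_s) cancels."""
    nulls = []
    k = 0
    while True:
        # Nulls sit where the delay equals an odd number of half-periods.
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > f_max_hz:
            break
        nulls.append(f)
        k += 1
    return nulls


def magnitude_db(freq_hz, delay_s):
    """Level in dB of an equal-level sum of a signal and its delayed copy."""
    h = 1 + cmath.exp(-2j * math.pi * freq_hz * delay_s)
    mag = abs(h)
    return -math.inf if mag == 0 else 20 * math.log10(mag)


delay = 0.0005  # 0.5 ms, as in item 2
print(null_frequencies(delay, 6000))
for f in (250, 500, 1000, 2000):
    print(f, "Hz:", round(magnitude_db(f, delay), 1), "dB")
```

With unequal levels in the two paths the nulls get shallower but stay at the same frequencies.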
While relatively small amounts of latency may not by themselves cause a problem, adding a small amount of latency to a system that already has some may push the total beyond a given player's perceptual threshold, at which point it becomes noticeable. 10 ms may not be a problem for a given individual, but 12-13 ms could well be. That by itself is a strong argument in favor of quantifying every contribution to latency in a signal processing/recording/monitoring system.
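As a rough illustration of that kind of bookkeeping, here is a small Python sketch that adds up a latency budget. Every stage and number in it is a made-up assumption, not a measurement of any real device, and the 11 ms threshold is hypothetical as well, since thresholds vary by player. The only hard fact used is that a buffer of N frames at sample rate fs contributes N/fs seconds.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Latency in ms contributed by one buffer of `frames` samples."""
    return 1000.0 * frames / sample_rate_hz


def total_latency_ms(contributions):
    """Sum of all per-stage latencies (ms) in a monitoring chain."""
    return sum(contributions.values())


# Hypothetical monitoring chain; all values are illustrative assumptions.
chain = {
    "ADC": 0.5,
    "input buffer (128 frames @ 48 kHz)": buffer_latency_ms(128, 48000),
    "plugin processing": 4.0,
    "output buffer (128 frames @ 48 kHz)": buffer_latency_ms(128, 48000),
    "DAC": 0.5,
}
threshold_ms = 11.0  # hypothetical threshold for one particular player

before = total_latency_ms(chain)
chain["extra digital hop"] = 2.5  # hypothetical added stage
after = total_latency_ms(chain)

for label, total in (("before", before), ("after", after)):
    status = "over" if total > threshold_ms else "under"
    print(f"{label}: {total:.2f} ms ({status} threshold)")
```

The point is only that the added stage looks harmless on its own; it's the sum that crosses the line.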