Latency of pitch shifting algorithms cage fight

This is what you said:
Well, it basically really only depends on CPU power. More juice = less calculation time needed.

That quite literally says nothing about the technique used to do the pitch shifting. All it asserts - 1000% incorrectly btw - is that the more CPU power you have for your algorithm, the less calculation time for the algorithm to run. The implication being that you think latency is a result of not enough CPU power. And it simply isn't.
I'll throw some chum in the waters.... I agree.
Sascha's offhand statement was that throwing more CPU at it would "fix the problem", and it doesn't. If it did, this would have been done ages ago. By not mentioning an approach, it's suggesting that all approaches would have less calculation time with more horsepower.

Now that it's been dissected (by Orv), we can all see that there are multiple approaches to the task, all with latency trade-offs. You can move the goalposts ("well, actually I was talking about xyz"), but that is different from the original offhand comment.

I don't really have a dog in this fight and don't really care about the outcome. I just don't like it when someone makes some high-level armchair claim about something technical which you look at and go, yeah, that's not true.

Simple - almost trivial - trig functions (with a tiny bit of Fourier) give the answer. A sinusoid contains a single frequency, and - this is the important part - that frequency cannot exist in a time interval that is not an integer multiple of a full period. Take 100 Hz for example. That frequency has a period of 10ms. Before a pitch shift algorithm can do anything with 100 Hz, it must first detect 100 Hz, and that is only possible when 100 Hz actually exists. To make this simple: if it doesn't exist, you can't detect it. I hope this helps, but I'm not making any bets....
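
To put numbers on the 100 Hz example, here is a minimal Python sketch. It only does the period arithmetic; the function names, the 48 kHz sample rate and the extra test frequencies are picked purely for illustration, and it says nothing about how any particular pitch shifter works internally.

```python
def period_ms(freq_hz: float) -> float:
    """Duration of one full cycle of freq_hz, in milliseconds."""
    return 1000.0 / freq_hz


def period_samples(freq_hz: float, sample_rate: int = 48000) -> float:
    """Number of samples one full cycle occupies at sample_rate (assumed 48 kHz)."""
    return sample_rate / freq_hz


# 41.2 Hz and 82.4 Hz are roughly the low E of a bass and a guitar;
# 100 Hz is the example from the post above.
for f in (41.2, 82.4, 100.0, 440.0):
    print(f"{f:6.1f} Hz: {period_ms(f):6.2f} ms per cycle, "
          f"{period_samples(f):7.1f} samples at 48 kHz")
```

The lower the note, the longer one cycle lasts, and a faster CPU doesn't change that duration.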
 
So, how do pitch shifters manage to do the job with less latency? Because that's what's actually happening.
I know you folks prefer blaming me and refuse to answer this, but by your logic a pitch shifter can't work at low latencies at all, because it has to "analyze" or "detect" the input frequency first. That looks plausible "on paper" but isn't backed up by how these things actually work in real life (as in me actually using them with those "required" latencies).
So, educate me.
 
And btw, sample rate conversion can be used as some sort of pitch shifting. This is in fact what some software samplers seem to do under the hood, they're resampling internally. For that purpose, within a certain range of applications, it works pretty well, too. And it works without ever detecting any frequencies.
And before anyone starts going at me yet again: I'm not saying that this is an applicable method for our use case, but it still perfectly proves that pitch shifting doesn't need to "detect" frequencies.
For our use cases, we'd also have to time-stretch the resulting material in realtime, as sample rate conversion comes along with length changes as well - and I have no idea whether that'd be possible.
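
A rough sketch of the resampling idea in plain Python, just to illustrate the point. This is not how any specific sampler does it: the linear interpolation, the function names and the 48 kHz rate are all assumptions made for the example, and real resamplers use much better interpolation filters. The zero-crossing counter is only there to check the result afterwards; the shifting itself never looks at the frequency content.

```python
import math


def resample_linear(samples, ratio):
    """Read through `samples` at `ratio` times normal speed (linear interpolation).

    ratio > 1 raises the pitch and shortens the output,
    ratio < 1 lowers the pitch and lengthens it.
    """
    out = []
    pos = 0.0
    last = len(samples) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out


def estimate_freq(samples, sample_rate):
    """Crude check only: rising zero crossings per second of audio."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)


SAMPLE_RATE = 48000  # assumed for illustration
# One second of a 100 Hz sine.
tone = [math.sin(2 * math.pi * 100.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

ratio = 2 ** (7 / 12)  # seven semitones up
shifted = resample_linear(tone, ratio)

print("length in samples:", len(tone), "->", len(shifted))
print("estimated frequency: %.1f Hz -> %.1f Hz"
      % (estimate_freq(tone, SAMPLE_RATE), estimate_freq(shifted, SAMPLE_RATE)))
```

The output comes out roughly a fifth higher and correspondingly shorter, which is exactly the length change that would still have to be stretched back in realtime.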

Anyhow, as said, no, pitch shifting doesn't need to "detect" frequencies per se. It does only have to do that in case you wanted things such as pitch-to-MIDI, intelligent pitch shifting, pitch shifting of individual notes within a chord and what not. For plain, static pitch shifting, this is not a necessity.
Sure, it might work best if the algorithm actually detected the frequencies, but that would result in incredibly large latencies - which we are not seeing (that alone should render this explanation moot).
 