Latency of pitch shifting algorithms cage fight

Sascha Franck · 2025-11-28T03:55:35-0600

Orvillain said:
This entire exchange just gave me a taste of what it is actually like to be @jay mitchell - I take back all of those idiotic things I ever said about the man. He actually has the patience of a saint!

Sure, just like you, he's been posting stuff that isn't justified in real life.

Sascha Franck · 2025-11-28T05:15:00-0600

Our virtual friend says this:

---

Short answer:

No — for static pitch shifting (e.g., always shifting by a fixed interval like a perfect fifth), you generally do not need to detect the input pitch first.

Why pitch detection is not required

Plugins and hardware pitch-shifters that apply a fixed transposition amount (e.g., +7 semitones) usually rely on signal-processing algorithms that shift the entire waveform without needing to know what note is being played.

Common algorithms include:

FFT-based phase vocoder pitch shifters
Time-domain harmonic scaling (e.g., PSOLA)
Granular pitch shifters
Delay-based shifters (classic Eventide-style)

All of these methods mathematically “rescale” or recombine the waveform to raise or lower it by the desired interval.

None of these need pitch detection to shift by a fixed amount like +700 cents.

When pitch detection is required

Pitch tracking becomes necessary when the processor works in a note-aware or pitch-dependent way, such as:

Autotune / pitch correction (maps detected notes to target notes)
Harmonizers that generate musical intervals (e.g., add a 3rd or 5th based on the current note)
Adaptive effects that react differently depending on the fundamental frequency
Audio-to-MIDI conversion

These systems must know which note is present to compute the musical interval properly.

---
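To make the quoted claim a bit more concrete, here is a minimal Python sketch of a fixed +700 cents shift that never looks at what note is being played. It is only an illustration, not how any particular plugin or pedal does it: the names `cents_to_ratio` and `resample_shift`, the 48 kHz sample rate and the 110 Hz test tone are all assumptions made up for this example. It uses plain resampling, so the output also gets shorter, which is exactly the part that real-time shifters have to fix with grains, delay lines or a phase vocoder.

```python
import math

def cents_to_ratio(cents):
    """Frequency ratio for a shift given in cents (1200 cents = one octave)."""
    return 2.0 ** (cents / 1200.0)

def resample_shift(samples, ratio):
    """Naive shift: read the input `ratio` times faster, with linear interpolation.

    No pitch detection happens anywhere in here. The output is shorter than
    the input for ratio > 1, so this changes time as well as pitch.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
        pos += ratio
    return out

def count_rising_zero_crossings(samples):
    """Count negative-to-non-negative transitions (roughly one per cycle)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)

SAMPLE_RATE = 48000  # assumed sample rate for this example
ratio = cents_to_ratio(700)  # a perfect fifth up, about 1.4983

# One second of a 110 Hz sine as a stand-in for "some input".
tone = [math.sin(2.0 * math.pi * 110.0 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
shifted = resample_shift(tone, ratio)

# Rough frequency check on the output, only to verify the result.
estimate_hz = count_rising_zero_crossings(shifted) * SAMPLE_RATE / len(shifted)
print(f"ratio {ratio:.4f}, output length {len(shifted)}, roughly {estimate_hz:.1f} Hz")
```

The zero-crossing count at the end is only there to check the result afterwards. The shift itself is the same operation regardless of the input note.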

I am perfectly aware that in the OG Helix thread Orvillain mentioned those "non analysing" methods as well. But his first answer has still been that a full length wave cycle would have to be analyzed for any pitch shifting to work at all. Apparently he only looked up the other options later on (more on that below, btw), because otherwise, why would anyone try to come across with such an exclusive, generalizing (and even more: plain wrong for most use cases discussed here) statement in the first place? But hey, fortunately we have Jay backing him up with the same nonsense.

Then, about the other ways of pitch shifting: I looked them up. At least as well as I could within my "user only brain" limitations.
And yes, I was wrong with this:

Sascha Franck said:
Well, it basically really only depends on CPU power. More juice = less calculation time needed.

And yes, it seems that only pretty bad sounding pitch shifting algorithms (such as, say, delay-based ones) would be able to work within, say, the boundaries of the buffer size you've set for your DAW. Most decent sounding ones apparently do need an extra chunk of latency, which is caused by some "analysis" or "grain length" window. And then there's also a DSP technique called "PSOLA", which indeed needs at least one full waveform cycle as its analysis window.

The latter is very apparently what Orvillain and Jay are referring to. Just that they came up with answers (at least in their initial forms) making it seem like the only way pitch shifting would work at all. Which, as clear as a morning sky, is just complete bogus (and as said, yes, I know that Orvillain came up with some of the other options - but that was only after he got called out on that absolutist, generalizing "needs a full wave cycle" blurb).
In fact, most pitch shifting algorithms we are dealing with in our modelers and plugins are NOT using that very technique as it'd cause too much latency, especially in case you also wanted to cover lower frequencies. Which would render it useless for most realtime applications.
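As a rough illustration of why low notes are the problem for any method that has to wait for a full cycle, the arithmetic is just one divided by the frequency. The sketch below is not tied to any specific algorithm. The helper name `period_ms` is made up for this example, and the note frequencies assume standard tuning with A4 = 440 Hz.

```python
def period_ms(freq_hz):
    """Duration of one full cycle of a given frequency, in milliseconds."""
    return 1000.0 / freq_hz

# Fundamental frequencies in standard tuning (A4 = 440 Hz).
NOTES_HZ = {
    "E1 (bass low E)": 41.20,
    "B1 (7-string low B)": 61.74,
    "E2 (guitar low E)": 82.41,
    "E4 (guitar open high E)": 329.63,
}

for name, freq in NOTES_HZ.items():
    print(f"{name}: one full cycle takes {period_ms(freq):.2f} ms")
```

So a single cycle of the guitar's low E alone is already around 12 ms, before any processing has happened at all.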

TL;DR, the nutshell:
- Yes, there are pitch shifting methods requiring at least one full length wave cycle to work.
- But also yes: There are other methods which simply don't need a full length wave cycle.
- And even more so: The vast majority of pitch shifting algorithms we're dealing with in modeling land are of the latter kind (which can easily be proven by measuring their latency).

And fwiw, as the last thing: I already admitted I was wrong with my initial statement, and yet those methods not requiring a full length wave cycle to work do in fact profit from faster CPUs, especially in case they're grain/granular based. At least that's what I got out of my (admittedly very incomplete) research. They don't profit as much as I thought they would, though.
So it seems to me that, at least as long as there are no new pitch shifting methods, we won't be seeing decent sounding pitch shifting combined with very low latencies.
Still hasn't got anything to do with full length wave cycles in the vast majority of all cases, regardless of what Orvillain and Jay want to make you believe.

Sascha Franck · 2025-11-28T05:23:36-0600

But hey:

Orvillain said:
You really don't English very well do you?

At least I've learned that "to English" is a verb.

Chocol8 · 2025-11-28T05:35:08-0600

Sascha Franck said:
So, how do pitch shifters manage to do the job with less latency? Because that's what's actually happening.

They are basically cheating which is why they all sound like shit or have a lot of latency. Those are the only two choices. Why is this so hard for you to understand?

Sascha Franck · 2025-11-28T05:36:47-0600

Chocol8 said:
They are basically cheating which is why they all sound like shit or have a lot of latency.

They're not cheating. They're just using different algorithms.

Chocol8 · 2025-11-28T05:40:39-0600

Sascha Franck said:
They're not cheating. They're just using different algorithms.

They are using algorithms that can’t do a full and accurate polyphonic pitch shift. Call it cheating, taking a shortcut, doing a poor job, whatever. It’s still not going to be able to shift 6 notes and their harmonic content without errors and artifacts.

Chocol8 · 2025-11-28T05:44:58-0600

Why do you think after all these decades of digital signal processing no one has come up with a high quality low latency pitch shifter? Do you think no one has wanted to?

Sascha Franck · 2025-11-28T05:50:48-0600

Chocol8 said:
Why do you think after all these decades of digital signal processing no one has come up with a high quality low latency pitch shifter? Do you think no one has wanted to?

Please read my last longer post. I have done some reading up and learned a thing or two.
I still don't think it's "cheating", but it seems to be in the nature of things that you'll always have to deal with shortcomings.

Besides, that's not what started this discussion - it was the wrong assertion that pitch shifting required a full length wave cycle to start with in any case. Which it doesn't.

Chocol8 · 2025-11-28T05:53:29-0600

Sascha Franck said:
Which it doesn't.

Unless you want it to actually sound transparent and not a shitty mess.

Sascha Franck · 2025-11-28T05:56:02-0600

Chocol8 said:
Unless you want it to actually sound transparent and not a shitty mess.

As said, I'm aware of that. And still, even algorithms requiring a full wave length cycle don't seem to work that great, especially in a polyphonic context.

laxu · 2025-11-28T05:56:19-0600

Sascha Franck said:
Besides, that's not what started this discussion - it was the wrong assertion that pitch shifting required a full length wave cycle to start with in any case. Which it doesn't.

In the other thread @Orvillain posted a list of different methods and their pros and cons. Here, I'll even link it for you.

Post in thread 'Line 6 Helix Stadium Talk'

Thursday at 5:10 AM

laxu said:
Can you improve pitch detection latency simply by deciding you are supporting e.g only standard guitar range or maybe just high pass filtering the signal before detection? If I understand it correctly, this would allow for faster pitch detection but would produce errors if you are using a downtuned guitar, extended range guitar etc.

Pitch detection latency, yes you can save some time by only supporting an expected range, and anything outside of that range gets filtered out.

I'm not an expert, but from my brief reading on the topic:

Autocorrelation approaches - latency is...

Yet you got stuck at that one specific thing, and did not address your own claim of "throwing more CPU horsepower at it solves pitch shifting issues".

ian_dissonance · 2025-11-28T05:58:37-0600

Sascha Franck said:
Please read my last longer post. I have done some reading up and learned a thing or two.
I still don't think it's "cheating", but it seems to be in the nature of things that you'll always have to deal with shortcomings.

Besides, that's not what started this discussion - it was the wrong assertion that pitch shifting required a full length wave cycle to start with in any case. Which it doesn't.

I’m pretty sure this discussion started because you said that all you need for less pitch shifting latency is more CPU. In a discussion about polyphonic pitch shifting, correct? And then you’ve just moved the goalposts to address this cycle length discussion. And then you ChatGPT’d an answer to a question that wasn’t even being asked until you changed from “cpu power” to “cycle time” for your point of contention.

Sascha Franck · 2025-11-28T06:04:14-0600

laxu said:
In the other thread @Orvillain posted a list of different methods and their pros and cons. Here, I'll even link it for you.

I have already acknowledged that. Yet, he started all of it with a posting saying that analyzing a full length wave cycle would be a requirement for all pitch shifting to happen. The other stuff came after the fact.

ian_dissonance said:
And then you’ve just moved the goalposts to address this cycle length discussion.

No, I didn't. The one starting the full cycle length discussion wasn't me.
I already said that I was wrong - but it absolutely wasn't because pitch shifters would need a full length wave cycle (which is what Orvillain used as his entry post to the discussion), because that method typically isn't even used.

Orvillain · 2025-11-28T06:10:25-0600

Chocol8 said:
They are basically cheating which is why they all sound like shit or have a lot of latency. Those are the only two choices. Why is this so hard for you to understand?

I'd probably describe it as masking, or estimating perhaps, not necessarily cheating.

I did a LOT of reading on this yesterday. Much more than I have in the past. So you've got a few issues that crop up:

- Latency that comes from the fundamental physical truth that you cannot reliably predict a frequency without at least 1 full cycle of that frequency. This is a first principle. It is simply not debatable. Attempting to debate it is the height of ignorance, and the insistence that it is wrong is the height of hubris.

- This is further complicated by the fact that real signals in the real world are not test sine waves. They're not easily predictable, and you cannot always guarantee that you'll even be able to recognise a full cycle of a given note - harmonics, noise floor, everything throws it off. So you actually often need 2-3 cycles for robustness. Essentially - no periodicity, no fundamental. No fundamental, no pitch. No pitch, no tracking. No tracking, no pitch shift. All of these are, again, fundamental first principles.

- Enter FFT analysis. Using FFT you can then perform such things like noise and harmonic suppression, in order to arrive at the most likely fundamental frequency. You can also track stability and identify peaks more reliably.

- But this introduces its own problems, because FFT analysis itself introduces latency. This is determined by the FFT window size, the hop size, and the overlap. A window of 1024 samples at a 48 kHz sample rate has its own 21.33 ms of latency. Even with overlap-add techniques, you cannot fully eliminate window latency.

- So how else could we get at the fundamental? We could perform an autocorrelation. You can think of an autocorrelation as the signal multiplied by a delayed version of itself. Essentially you slide the signal against a shifted copy of itself and measure how well they line up. When the correlation at a given shift shows a strong positive peak, the alignment is good and that shift is a likely period of the fundamental. If the peak is low, the alignment is worse and that shift is an unlikely candidate.

- But this also has a latency. It is a little more controllable, but with caveats. Make your window too small and you get octave errors, jitter, instability, false pitches, and artifacts. Make it too long, and you get .... more latency.

- Enter resampling. Resampling is zero or near zero latency. But the downsides are transient smearing, formant destruction, modulation artifacts; all leading to a warbling robotic tone. Hello Whammy pedal. This is all fundamentally because you are not shifting pitch when you resample. You are shifting TIME.

- Phase-Vocoder. Is essentially a process of performing an STFT (Short Time Fourier Transform), then phase manipulation, and then an inverse FFT with overlap-add techniques. This is one of the most used techniques in pitch shifting, as I understand it.

- This gives you accurate and stable pitch shifting. But it also smears transients and produces phase artifacts. Pretty easy to understand if you know the physics behind it and how windowing a signal works. You get latency from the window size you decide to use. Long windows lead to good pitch extraction, but bad transients, and latency. Short windows lead to good transients, bad pitch resolution, and less latency.
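The window latency figure and the autocorrelation idea from the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production detector. The names `window_latency_ms`, `autocorr_pitch` and `make_tone` are made up for this example, as are the 48 kHz sample rate, the 70-500 Hz search range and the choice to normalise the correlation. The estimator deliberately refuses to answer unless the buffer holds at least two periods of the lowest frequency it is asked to find.

```python
import math

def window_latency_ms(window_size, sample_rate):
    """Time needed to fill one analysis window, in milliseconds."""
    return 1000.0 * window_size / sample_rate

def autocorr_pitch(samples, sample_rate, fmin=70.0, fmax=500.0):
    """Brute-force normalised autocorrelation pitch estimate in Hz.

    Returns None when the buffer is shorter than two periods of fmin,
    because the longest lag could then not be compared properly.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    n = len(samples) - max_lag  # overlap length, the same for every lag
    if n < max_lag:
        return None
    head = samples[:n]
    head_energy = sum(v * v for v in head)
    best_lag, best_score = None, -2.0
    for lag in range(min_lag, max_lag + 1):
        shifted = samples[lag:lag + n]
        cross = sum(a * b for a, b in zip(head, shifted))
        energy = sum(v * v for v in shifted)
        denom = math.sqrt(head_energy * energy)
        score = cross / denom if denom > 0.0 else 0.0
        if score > best_score:  # strongest alignment so far
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 48000  # assumed sample rate
LOW_E_HZ = 82.41     # guitar low E in standard tuning

def make_tone(num_samples):
    """Low E fundamental plus a quieter second harmonic (a toy test signal)."""
    w = 2.0 * math.pi * LOW_E_HZ / SAMPLE_RATE
    return [math.sin(w * i) + 0.5 * math.sin(2.0 * w * i) for i in range(num_samples)]

print(window_latency_ms(1024, SAMPLE_RATE))            # about 21.33 ms
print(autocorr_pitch(make_tone(1024), SAMPLE_RATE))    # None: buffer too short
print(autocorr_pitch(make_tone(2048), SAMPLE_RATE))    # close to the test tone
```

With the 1024-sample buffer this particular sketch gives up, while the 2048-sample buffer (about 42.7 ms at 48 kHz) lands close to the 82.41 Hz test tone. That is the window size versus latency trade-off described in the list.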

I might have some of that wrong in the details. But the principles are correct. My original statement that sent Sascha into a spazz attack, was correct.

ian_dissonance said:
I’m pretty sure this discussion started because you said that all you need for less pitch shifting latency is more CPU. In a discussion about polyphonic pitch shifting, correct? And then you’ve just moved the goalposts to address this cycle length discussion. And then you ChatGPT’d an answer to a question that wasn’t even being asked until you changed from “cpu power” to “cycle time” for your point of contention.

Yes, indeed. He's completely moved the goalposts to obfuscate from his initial comment that was completely wrong. I still maintain that it doesn't matter how much CPU you throw at the pitch shifting problem, there will ALWAYS be trade-offs because of the inherent latency and quality issues in all of the options we have to perform the operation.

ian_dissonance · 2025-11-28T06:10:48-0600

If the internet has taught me anything it’s not to argue with guys who make plugins about signal processing or Jay about math. They know more than us. And that’s ok.

Sascha Franck · 2025-11-28T06:12:54-0600

Orvillain said:
He's completely moved the goalposts to obfuscate from his initial comment that was completely wrong.

The one moving goalposts was you just as much, coming up with an "explanation" that it would be mandatory for any pitch shifting to read out a full length wave cycle to work at all. Which simply isn't a) the only truth, nor is it b) even happening in pretty much all pitch shifters.

Orvillain · 2025-11-28T06:43:51-0600

Sascha Franck said:
a full length wave cycle to work at all.

I literally never said that Sascha. Grow the hell up. You keep arguing about things that are inarguable. Silly person.

Sascha Franck · 2025-11-28T07:17:15-0600

Orvillain said:
I literally never said that Sascha.

Right.

Orvillain said:
Because you need at least a full cycle in order to correctly detect the pitch, in order to perform any shifting in the first place.

Try your BS on someone else.

ian_dissonance · 2025-11-28T07:18:51-0600

Maybe I’m dumb, but aren’t the shifters that are NOT doing that just doing some fancy guessing and not truly detecting a pitch?

banned · 2025-11-28T07:29:10-0600

Do you need the full wave to detect pitch? Assuming a pure sine wave (which is not what a guitar low E string is), wouldn't half a wave suffice (0 to peak to 0)? Or even a quarter wave? (0 to peak... you would need to detect that maximum and the falling edge)

Now assuming a regular E string which will also have a bunch of harmonics - could the harmonic makeup be used to also detect probability that it is an E string without waiting for the fundamental to complete its cycle?

Sorry if dumb questions.
