Kemper Profiler MK 2

  • Thread starter: Deleted member 490
The issue isn’t just using LUFS; it’s that null tests are useless beyond telling you whether something is identical or not.

My example above shows files sounding identical but having underlying differences that result in a poor null. How much it nulls doesn’t really tell us anything useful, especially in this context. There is literally nothing of value here; Leo continually using this methodology and (even worse) framing it as “scientific” needs to stop.

Appreciate your response. I think your YouTube video is really simple and easy to understand, and it debunks this whole thing very succinctly.

This discussion and the stuff posted here in the last few days has been a final, total, eye-opening confirmation for me on this whole "null-testing" issue.

I like to think I am self-aware enough to know when I am, or have been, out of my depth (hence my questions), but I try to catch up as best I can.
 
The other thing to bear in mind is that LUFS is sensitive to the input signals you're using. So you can do the null to get the residual, sure. But the residual depends entirely on the stimulus signal.

So if you don't use palm mutes, it won't tell you anything about palm mutes. If you only use palm mutes, it won't tell you about anything else.

Essentially, the technique doesn't tell you anything about accuracy, because the question of accuracy is how similar this model is to its reference - and to answer that question, you need a multi-factorial approach. LUFS plus a null test simply isn't that.

A null test with a single stimulus measures reproduction accuracy for that stimulus, not system accuracy.

And I said all of this weeks ago.
 
A null test checks whether two signals produce identical outputs for a given input: you invert one signal and sum them. Perfect cancellation indicates identical outputs for that stimulus. This works extremely well for linear, time-invariant systems (like filters, IRs, or convolution engines) because for those systems, identical outputs imply identical system behavior.
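
A minimal sketch of that procedure in Python (the filenames are placeholders; it assumes two already time-aligned mono files of the same length and sample rate):

```python
# Null test: subtract one file from the other (inversion + summation),
# then measure the residual's integrated loudness.
import soundfile as sf     # pip install soundfile
import pyloudnorm as pyln  # pip install pyloudnorm

a, rate = sf.read("amp.wav")     # reference (hypothetical filename)
b, _ = sf.read("capture.wav")    # capture under test (hypothetical filename)

residual = a - b                 # a perfect null leaves pure silence

meter = pyln.Meter(rate)         # ITU-R BS.1770 integrated loudness
print("residual LUFS:", meter.integrated_loudness(residual))
```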

But guitar amplifiers are nonlinear and dynamic. Their response depends on input level, frequency content, and signal history. Distortion, compression, bias shifts, transformer sag, and speaker interactions all contribute. A null test is only valid for the specific stimulus used. A model might do well on one DI track but behave differently with another playing style or pick attack.
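
A toy illustration of that stimulus dependence (contrived numbers, not any particular product): two soft clippers with the same small-signal behavior but slightly different drive null deeply on a quiet signal and poorly on a hot one.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)      # one second of noise standing in for a DI

def model_a(s):
    return np.tanh(s)

def model_b(s):                     # slightly more drive, same small-signal gain
    return np.tanh(1.05 * s) / 1.05

def null_depth_db(level):
    r = model_a(level * x) - model_b(level * x)
    return 10 * np.log10(np.mean(r**2) / np.mean(model_a(level * x)**2))

print(null_depth_db(0.01))  # quiet stimulus: very deep null
print(null_depth_db(1.0))   # hot stimulus: far shallower null, same two "amps"
```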

Null tests are also highly sensitive to tiny gain or phase differences, latency offsets, ADC/DAC noise, or time-variant behavior. Even two recordings of the same amp can produce a noticeable residual.
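
To put rough numbers on that sensitivity (a contrived check, not measured hardware): a 0.1 dB gain mismatch alone caps the null near -39 dB, and a single sample of misalignment can make a noise-like residual as loud as the signal itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)

def null_db(a, b):
    r = a - b
    return 10 * np.log10(np.mean(r**2) / np.mean(a**2))

print(null_db(x, x * 10 ** (-0.1 / 20)))  # 0.1 dB gain offset: ~ -38.8 dB
print(null_db(x[1:], x[:-1]))             # 1-sample delay: ~ +3 dB for noise
```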

Residuals can be diagnostically useful, but using integrated LUFS to rank captures or technologies is misleading. LUFS measures average loudness and masks where errors occur spectrally or dynamically. Two captures could have identical residual LUFS yet differ substantially in harmonic character, dynamic response, or transient behavior. One capture might measure “worse” but sound nearly identical in practice.
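
Continuing the sketch above (reusing `a`, `residual`, and `rate` from it), one way to see where the error lives, instead of collapsing it into a single number, is to compare power spectra band by band (the band edges here are arbitrary):

```python
import numpy as np
from scipy.signal import welch

f, p_sig = welch(a, rate, nperseg=4096)
_, p_res = welch(residual, rate, nperseg=4096)

# Per-bin residual relative to the signal: shows *where* the capture
# deviates, which a single integrated LUFS number hides.
err_db = 10 * np.log10(p_res / p_sig)
for lo, hi in [(80, 300), (300, 1200), (1200, 5000), (5000, 12000)]:
    band = (f >= lo) & (f < hi)
    print(f"{lo}-{hi} Hz: {err_db[band].mean():+.1f} dB")
```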

Null tests are very useful for alignment, debugging, and verifying linear blocks. But they are not reliable for ranking amp capture technologies. Accurate evaluation requires multiple tests: different signal levels, harmonic and intermodulation distortion analysis, transient response, and controlled listening experiments. A single null test with a LUFS residual tells very little about overall modeling accuracy.
 

Again. Much appreciated.
 
I think Leo just started using that method because it was showcased by Steve right around when NAM was released:

[embedded video]
Still, a frequency-based analysis like the one seen at the end of this other vid might be better overall, but it's still a static, "one-shot" average representation:

[embedded video]
In any case, Leo's vids don't really offend me personally but I'm sure if someone else lays out a better framework for gauging accuracy, we'll see a lot of folks like him follow suit.

It's still worth noting that the LUFS-I approach, for the most part, correlates with most folks' takes on the perceived accuracy of these profilers.
 
I don't think that is true.
Fair enough, but I'm just going from my own take when I tested them and from what I've seen others post online as well.

Kemper was clearly perceived as behind the QC when Neural Capture came out. ToneX vs QC was a mixed bag depending on what was being profiled, and most of the time I felt they were close to each other in their v1 incarnations; NAM was audibly above all of them, and those LUFS assessments, whether they're a bad take or not, kinda showed the same.

I'm not saying it's a scientific / preferred approach, but I think educating folks on the appropriate way to do it (if that's even widely documented / adopted) goes a longer way than tearing down something that's quasi-serviceable.
 
There isn’t really a good way to correlate audibly perceived differences into some kind of numbered ranking, otherwise everyone would use it. Using null tests and comparing the depth of the null is just assigning meaning to essentially random numbers. It seems like it should make sense, but it’s flawed in too many ways to show anything meaningful, or anything close to what it intends to.

This works extremely well for linear, time-invariant systems (like filters, IRs, or convolution engines) because for those systems, identical outputs imply identical system behavior.
Phase differences are also LTI, so it doesn’t really work well at all. The only thing a null test can really tell you (in this context) is whether two files are identical or not.

How much they null doesn’t correlate with how similar the files sound. You can have two files that have a terrible null and sound identical. They fail a null test, because they are different, but audibly they sound the same.
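
That's easy to demonstrate with a toy example (mine, not the files from earlier): a first-order allpass leaves the magnitude spectrum untouched, and a single stage like this is usually considered inaudible on most material, yet the null against the dry signal is terrible.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)

# First-order allpass: |H(f)| = 1 at every frequency, but the phase
# shift varies with frequency.
c = 0.5
y = lfilter([c, 1.0], [1.0, c], x)

null_db = 10 * np.log10(np.mean((x - y) ** 2) / np.mean(x ** 2))
print(null_db)               # ~ 0 dB: a "terrible" null
print(np.var(x), np.var(y))  # yet the two signals have the same energy/spectrum
```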
 
I think that null tests have benefited from correlating with subjective quality results. More or less, I think we all perceive NAM as the best-quality capture tech, followed by ToneX and then QC (well, I have to say that currently I like QC more than all of them, but I couldn't certify that, since it's a personal perception). And null tests have always shown a similar quality ranking... so one could think: yeah, null tests are telling the truth, because we all agree that is the accuracy order of those platforms.

And it's been very consistently so. NAM files converted for Valeton units throw worse LUFS numbers (just as we feel it, and as logic tells us). Full NAM players throw extremely good ones (as we feel it). Kemper had the worst ones (as we feel it). I mean... it's been a correlation that always made sense. There's never been a horrible null test for full NAM.

Being sure, as I am, about what MirrorProfiles, Orv, and others explain about this bullshit... I think the question is: then why on earth do null tests always line up with how we perceive capture quality?

The only way to end this "null test flu" is for NAM to throw horrible LUFS numbers... so all the people out there get convinced once and for all... :rofl

For me it's OK. I love QC V2 captures and don't care about null tests.
 
(…) Phase differences are also LTI, so it doesn’t really work well at all. (…)

Yes, even small phase or latency differences and other LTI variations can inflate residual LUFS without perceptibly affecting tone.

I just wanted to acknowledge that null tests remain valuable diagnostically for alignment, debugging, and verifying linear processing blocks, even though we agree fully that null tests and LUFS residuals alone are not suitable for perceptually ranking amp captures.
 
So a new Rig Manager just popped up on my screen, 4.0.56, the current one being 4.0.53.
Do I install it? Is it just the removal of the 'resonance control bug'? Nothing to see here... pay no attention to the man behind the curtain... the Great and Powerful Oz is just doing some housecleaning...? There's no mention of it on the forum where updates are usually posted.

I wouldn't have such cynical suspicion of a company I have loved and supported for at least ten years if not for the recent controversy that they did too little to clear up.

edit to add: CKemper has been in the forums there the last day or so answering questions and commenting, so that is a good sign at least. I put my tin foil hat away and installed the update.
 
I think that null tests have benefited from correlating with subjective quality results. (…)
Totally agree. The null test and the ear test always go in the same direction; it can't be coincidence. Maybe they aren't the best tests, but they work as a reference.
 
Is it just the removal of the 'resonance control bug'? (…) I put my tin foil hat away and installed the update.
Lettuce know if those controls disappear or stop working!
 
I put the test on YouTube too, but used a different source file and shuffled them up (so if anyone cheats by saving the previous files, they'd have to get this right too to prove me wrong):

[embedded video]
What % difference is -infinity and -12.6 LUFS?


THANK YOU!

I don't disagree with the idea of some kind of way to test whether profiles/captures/models are more or less accurate, but the null tests as conducted are not it. There needs to be some kind of normalization of timing/phase as part of the process, which might be something like measuring the sample offset on a sine wave through a clean, bypassed profile/capture and adjusting for that. Possibly other things that my feeble brain can't think of, too.
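
For what it's worth, the timing part of that is straightforward; a sketch of the kind of coarse alignment being suggested (a hypothetical helper using cross-correlation to estimate one broadband delay) could look like this:

```python
import numpy as np
from scipy.signal import correlate

def align(ref, cap):
    """Trim two mono arrays so a single broadband delay is removed.
    Note: this only fixes a constant latency offset, not any
    frequency-dependent phase differences."""
    lag = int(np.argmax(correlate(cap, ref, mode="full"))) - (len(ref) - 1)
    if lag > 0:        # capture lags the reference
        cap = cap[lag:]
    elif lag < 0:      # reference lags the capture
        ref = ref[-lag:]
    n = min(len(ref), len(cap))
    return ref[:n], cap[:n]
```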
 
Phase can’t really be corrected like that, because phase can vary with frequency. You can have the overall timing aligned while the delay at individual frequencies still differs slightly.

If you try and re-align my examples, the cancellation won’t get any better.
 
RE: Accuracy.

To my ears, the very first thing I tend to spot is rolled-off or attenuated high frequencies. A lot of these machine learning models do that on purpose, because that frequency range is a lot more prone to noise errors.

The next thing I tend to spot is things being undergained. I'm not always able to spot this with big wide open chords. But when I jump between palm mutes and big wide open chords, it becomes quite apparent.

After that, the other thing that jumps out to me more often than not is some kind of cocked-wah sound that sneaks into the capture.

With Kemper specifically, beyond all of the above, there was always some kind of textural difference. Even if the gain was the same, and even if the palm mutes felt good, the texture of the distortion would often jump out at me. I think this might've been partly caused by aliasing.

I literally never evaluate any of this stuff using lead playing, because I find it harder to tell the difference between the real amp and the model under such conditions.

To me, LUFS+residual null tests don't answer any of the questions above. They might hint in a certain direction, but they rarely line up with what I'm hearing as I play. And that is not something you can really communicate in a YouTube video.
 
Phase can’t really be corrected like that, because phase can vary with frequency. (…)

Then it almost gets back to needing a perfect capturing device that can capture both the original and the modeled version and compare them, right?

RE: Accuracy. (…) To me, LUFS+residual null tests don't answer any of the questions above. (…)

Agreed from a frequency-response perspective; in this case I would think a pink-noise average response chart would be helpful. Guessing waterfall plots might help as well, but I'm way over my skis here.
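
Something like this would produce the data for that kind of chart (hypothetical filenames: the same pink-noise burst re-amped through the real amp and through the capture), with the caveat mentioned earlier that it's a static average:

```python
import numpy as np
import soundfile as sf
from scipy.signal import welch

amp, rate = sf.read("pink_through_amp.wav")    # hypothetical filename
cap, _ = sf.read("pink_through_capture.wav")   # hypothetical filename

f, p_amp = welch(amp, rate, nperseg=8192)
_, p_cap = welch(cap, rate, nperseg=8192)

# Average response deviation in dB as a function of frequency:
diff_db = 10 * np.log10(p_cap / p_amp)
```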
 
Then it almost gets back to needing a perfect capturing device that can capture both the original and the modeled version and compare them, right?
In terms of capturing, it kind of works because you have a fixed reference and a fixed source. So the way ESR stuff works is to try and reduce those differences.

It makes less sense when you are comparing a capture on one platform to a capture on another - the numbers on one might look better, but they don't say anything about what those differences are or how much they matter. It's kind of why ESR numbers on their own don't tell the whole story about how accurate a model is. They give some clues about the progress of the model's training, but in terms of what we perceive it's all kind of a wash.
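
For reference, ESR as commonly used when training these models is just the energy of the error relative to the energy of the target (often with a pre-emphasis filter applied first); a bare-bones version:

```python
import numpy as np

def esr(target, pred):
    """Error-to-signal ratio: 0 is a perfect match; lower is better.
    Assumes the two mono arrays are already time-aligned."""
    return float(np.sum((target - pred) ** 2) / np.sum(target ** 2))
```

Like a null residual, it's computed against one particular stimulus, so it inherits the same caveats discussed above.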

Agreed from a frequency-response perspective; in this case I would think a pink-noise average response chart would be helpful. Guessing waterfall plots might help as well, but I'm way over my skis here.
I think our perception of frequencies has a LOT to do with the time domain, which frequency-response graphs don't convey. Something with a fast attack might be perceived as brighter than something with a slower attack. The harmonics also affect our perception a lot, and they are very non-linear. There's also some time-domain stuff that won't really come across well in a frequency-response curve.
 