No σ for you!
11-26-2024, 08:12 PM
Post: #21
RE: No σ for you!
(11-25-2024 03:58 PM)carey Wrote: It works, but working with absolute values in subsequent equations becomes unwieldy.

In what way?

(11-25-2024 03:58 PM)carey Wrote: The other way is to square the deviations, then take the mean and the square root as done in the standard deviation. In fact, the standard deviation is the root-mean-square (RMS) deviation, or RMSD. If we read its name backwards (from right to left), applying one word at a time (like an RPL or FORTH program), the SD algorithm is generated.

I like that!

(11-25-2024 04:01 PM)EdS2 Wrote: Think of Pythagoras and the hypotenuse: root mean square is a distance, in a usefully general sense. Sum of absolute differences is a different kind of distance (Manhattan distance) and isn't quite so well-behaved.

I'm familiar with Euclidean distances (I use them a lot in high-dimensional analysis), but that feels more like Pythagoras than like the SD.

(11-25-2024 04:49 PM)carey Wrote: e.g., the standard deviation is needed to define some continuous functions, e.g., the Gaussian (normal) distribution. Since the Central Limit Theorem ensures Gaussian distributions occur over a wide range of typical experimental conditions, this suggests that the standard deviation is Nature's way to characterize variation.

This seems more convincing, but I'll need to think about it more. In my mind the SD had little to do with the normal distribution, other than that we are familiar with the SD of a normal distribution covering 68/95/99.7% of values within 1/2/3 SDs. Having said that, the SDs are equally spaced along the normal distribution - is this what you mean? I.e., that the SD somehow defines the normal distribution? I need to think about this one...!

In my mind, squaring the deviations and then taking their mean is going to bias towards larger deviations. Also, it seems like it's using a property of maths for convenience (squaring then taking the square root gives a positive value). Maybe instead of RMSD we could have had MRSD - take the root first, then the mean - which gives you the mean absolute deviation.

D
11-26-2024, 10:58 PM
Post: #22
RE: No σ for you!
(11-26-2024 08:12 PM)dm319 Wrote: This seems more convincing, but I'll need to think about it more. In my mind the SD had little to do with the normal distribution, other than that we are familiar with the SD of a normal distribution covering 68/95/99.7% of values within 1/2/3 SDs. Having said that, the SDs are equally spaced along the normal distribution - is this what you mean? I.e., that the SD somehow defines the normal distribution? I need to think about this one...!

OK, so the Gaussian/normal distribution does seem to naturally use the SD/variance as part of its description. From what I've read, the mean absolute deviation is related to the SD by a constant for a given normal distribution. I found this interesting discussion on Stack Exchange: https://stats.stackexchange.com/question...-deviation especially about the Euclidean/Manhattan and Minkowski distances.
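Here is a quick numerical check of that constant (a Python sketch assuming NumPy is available; for a normal distribution, the ratio MAD/SD should be sqrt(2/pi) ≈ 0.798):

Code:
import numpy as np

# Monte Carlo check: for normal data, the mean absolute deviation (MAD)
# should be sigma * sqrt(2/pi), whatever sigma is.
rng = np.random.default_rng(seed=1)
x = rng.normal(loc=0.0, scale=2.5, size=1_000_000)  # sigma = 2.5

sd = x.std()                        # RMS deviation (population SD)
mad = np.abs(x - x.mean()).mean()   # mean absolute deviation

print(mad / sd)                     # ~0.798
print(np.sqrt(2 / np.pi))           # sqrt(2/pi) = 0.79788...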
Yesterday, 12:53 AM
(This post was last modified: Yesterday 12:55 AM by carey.)
Post: #23
RE: No σ for you!
dm319, thank you for asking these questions!
(11-26-2024 08:12 PM)dm319 Wrote:
(11-25-2024 03:58 PM)carey Wrote: It works, but working with absolute values in subsequent equations becomes unwieldy.
In what way?

Deriving equations when some terms include absolute values can be challenging, e.g., moving these terms from one side of an equation to the other and eventually removing the absolute-value brackets while getting the signs and inequality symbols correct.

(11-26-2024 08:12 PM)dm319 Wrote:
(11-25-2024 04:49 PM)carey Wrote: e.g., the standard deviation is needed to define some continuous functions, e.g., the Gaussian (normal) distribution. Since the Central Limit Theorem ensures Gaussian distributions occur over a wide range of typical experimental conditions, this suggests that the standard deviation is Nature's way to characterize variation.

The normal distribution cannot be defined without the standard deviation (it can be approximated without the standard deviation, but that's not the same). The actual definition of the normal distribution function requires the standard deviation, and since the normal distribution is the typical distribution of experimental uncertainties (guaranteed by the Central Limit Theorem), it is not an exaggeration to claim that the standard deviation is Nature's measure of variation or uncertainty.

(11-26-2024 08:12 PM)dm319 Wrote: In my mind, squaring the deviations and then taking their mean is going to bias towards larger deviations.

Yes, but note how this bias cancels a similar bias in applications involving the standard deviation, e.g., (i) defining the normal distribution and (ii) the outlier problem in least squares regression.

1) The numerator in the exponent of the normal distribution contains squared deviations that suffer from a similar bias. However, the denominator contains the square of the standard deviation. This has the effect of weighting the contributions of deviations inversely with the squares of their uncertainties, so large deviations (which usually have larger uncertainties) contribute less.

2) This is also how the outlier problem of ordinary least squares regression (where large deviations are given extra weight due to squaring) is solved. By dividing by the square of the standard error, each contribution is weighted inversely with the square of its uncertainty. As large deviations are usually more uncertain, they are weighted less. Least squares weighted inversely by the squares of the standard errors is called chi-squared (related to, but not the same as, the chi-squared used in counting experiments):

\[ \chi^{2} = \sum_{i}{\frac{(x_i - \mu)^2}{\sigma_i^2}} \]

(11-26-2024 08:12 PM)dm319 Wrote: Also, it seems like it's using a property of maths for convenience (squaring then taking the square root gives a positive value). Maybe instead of RMSD we could have had MRSD - take the root first, then the mean - which gives you the mean absolute deviation.

While the SD is more convenient than using absolute values, the rationale for the RMSD algorithm is rooted in its occurrence, by necessity (not convenience), in the naturally occurring normal distribution.
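To make the weighting concrete, here is a small numerical sketch in Python (the data, uncertainties, and one-parameter model y = A t are invented for illustration):

Code:
import numpy as np

# Hypothetical measurements y at times t, each with its own uncertainty sigma.
t     = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y     = np.array([4.1, 8.0, 12.3, 15.9, 30.0])  # last point is an outlier...
sigma = np.array([0.2, 0.2, 0.3, 0.3, 8.0])     # ...but with a large uncertainty

def chi2(A):
    """Chi-squared for the one-parameter linear model y = A*t."""
    return np.sum(((y - A * t) / sigma) ** 2)

# Minimizing chi2 for y = A*t gives the closed form
#   A = sum(t*y/sigma^2) / sum(t^2/sigma^2)
A_chi2 = np.sum(t * y / sigma**2) / np.sum(t**2 / sigma**2)

# Ordinary least squares (all weights equal) for comparison:
A_ols = np.sum(t * y) / np.sum(t**2)

print(A_chi2)   # ~4.0: the uncertain outlier barely contributes
print(A_ols)    # ~5.2: pulled upward by the outlier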
Yesterday, 01:04 AM
Post: #24
RE: No σ for you!
(11-26-2024 08:12 PM)dm319 Wrote:
(11-25-2024 03:58 PM)carey Wrote: It works, but working with absolute values in subsequent equations becomes unwieldy.
In what way?

Here I look at the computational side of deviation calculations, not their merit.

With absolute deviations, you can't start summing deviations without the final mean. This implies we may need to scan the data set in 2 passes. Also, adding or removing a data point may require recalculating all deviations. (We can keep the data sorted, but that is also expensive, in both time and space.)

With variance, you can add or remove an element with an instant O(1) update. If you only need the {µ, σ} statistics, you don't even need to store the data. You don't even need a 'final' mean, and can do a running mean and variance.

John D. Cook, Accurately computing running variance

With Σ taken over i = 1 to k-1:

Let m = Σ(xi) / (k-1)
Let S = Σ(xi - m)^2

Now we add a data point xk to the mix, for a new mean and variance.

mk = ((k-1)*m + xk)/k = (k*m + (xk - m))/k

Running mean: mk = m + z, where z = (xk - m) / k

Sk = Σ(xi - mk)^2 + (xk - mk)^2
   = Σ((xi - m) - z)^2 + (xk - mk)^2
   = S + (k-1)*z^2 + (xk - mk)^2               -- NOTE: Σ(xi - m) = 0
   = S + k*z*z + (xk - mk)^2 - z^2
   = S + (xk - m)*z + (xk - m)*(xk - mk - z)   -- NOTE: k*z = xk - m
   = S + (xk - m)*(xk - mk)

Running sample variance: Sk = S + (xk - m)*(xk - mk)  -->  s^2 = Sk / (k-1)

The formulas also work when we remove xk from the data set (back-solve {m, S} from {mk, Sk}).
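For completeness, a short Python sketch of this update rule (Welford's method, as in the Cook article; the class and variable names are mine):

Code:
class RunningStats:
    """Single-pass running mean and variance, using the recurrences above:
         m_k = m + (x_k - m)/k
         S_k = S + (x_k - m)*(x_k - m_k)
    """
    def __init__(self):
        self.k = 0      # number of points seen
        self.m = 0.0    # running mean
        self.S = 0.0    # running sum of squared deviations

    def push(self, x):
        self.k += 1
        old_m = self.m
        self.m += (x - old_m) / self.k          # m_k = m + z
        self.S += (x - old_m) * (x - self.m)    # S_k = S + (x_k - m)(x_k - m_k)

    def pop(self, x):
        """Remove point x: back-solve {m, S} from {m_k, S_k}."""
        mk = self.m
        self.k -= 1
        self.m = (mk * (self.k + 1) - x) / self.k
        self.S -= (x - self.m) * (x - mk)

    def variance(self):
        return self.S / (self.k - 1)    # sample variance s^2 = S/(k-1)

rs = RunningStats()
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    rs.push(v)
print(rs.m, rs.variance())   # 5.0 and 32/7, with no second pass over the data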
Yesterday, 01:09 PM
Post: #25
RE: No σ for you!
(11-25-2024 04:49 PM)carey Wrote: \[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2 / (2\sigma^2)} \]

OT, I find the standard normal (µ=0, σ=1) less confusing. The f(x) above has units of 1/σ, but pdf(z) is dimensionless.

pdf(z) := exp(-z*z/2) / sqrt(2*pi), where z = (x-µ)/σ

A dimensionless equation is better: more compact, and *only* 1 shape. We get a rough idea of probability without calculations (the 68-95-99.7 rule). Also, with the standard normal, we can solve for a missing µ or σ.
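A short Python sketch of working in z-units (standard library only; the example numbers are made up):

Code:
from math import erf, sqrt

def phi(z):
    """CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# P(µ - k*σ < X < µ + k*σ) depends only on k, not on µ or σ:
for k in (1, 2, 3):
    print(k, phi(k) - phi(-k))   # ~0.6827, 0.9545, 0.9973 (68-95-99.7 rule)

# Any normal problem reduces to the standard normal via z = (x - µ)/σ:
mu, sigma = 100.0, 15.0          # hypothetical values
x = 130.0
z = (x - mu) / sigma             # z = 2
print(1.0 - phi(z))              # P(X > 130) ~ 0.0228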
Yesterday, 06:17 PM
(This post was last modified: Yesterday 06:34 PM by carey.)
Post: #26
RE: No σ for you!
Albert, all good points!

(Yesterday 01:09 PM)Albert Chan Wrote: The f(x) above has units of 1/σ, but pdf(z) is dimensionless.

1) The fact that the normal distribution has "units of 1/σ" supports the claim that σ is Nature's measure (unit) of variability.

2) Interestingly, σ has no units other than the units of the numbers used to calculate it. If pure numbers are used, then σ is unitless.

(Yesterday 01:09 PM)Albert Chan Wrote: A dimensionless equation is better: more compact, and *only* 1 shape.

Yes, there are many benefits of non-dimensionalization…except one: model testing, because models have units! For example, while y = t is a linear function, it doesn't become a family of one-parameter linear models until we introduce a parameter, e.g., y = A t, and that doesn't become a specific one-parameter linear model until A is a number with units, y (m) = 4.02 (m/s) t (s), which can be tested experimentally.
Yesterday, 07:56 PM
Post: #27
RE: No σ for you!
Great discussion and thoughts, Carey and Albert!
I always find it strange how the very foundations of statistics can be so complicated... |
Yesterday, 08:52 PM
(This post was last modified: Yesterday 11:15 PM by carey.)
Post: #28
RE: No σ for you!