Deviation of mean

A few days ago, while running some experiments in Octave, I noticed a weird behaviour: for an array containing several equal elements, the mean value differed noticeably from the value of each element. I believe this is due to the non-uniform distribution of floating-point values in the double type (the gap between adjacent doubles grows with their magnitude), so the rounding errors made while summing the elements grow with the magnitude too. Still, running into this phenomenon unexpectedly is surprising at first.
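A minimal reproduction might look like the sketch below. The value 0.1 and the length 7000 are arbitrary choices for illustration, not taken from the experiments. The second part uses `eps(x)`, which in Octave returns the distance from `x` to the next larger double, to show that the spacing between representable values grows with magnitude.

```octave
% Minimal reproduction: the mean of identical values vs. the value itself.
x = 0.1;                      % arbitrary test value (assumption, not from the experiments)
n = 7000;                     % arbitrary array length
v = repmat(x, n, 1);          % n identical elements
dev = abs(mean(v) - x);       % deviation of the mean from the element value
printf("deviation of mean: %g\n", dev);

% The spacing between adjacent doubles grows with magnitude:
% eps(y) is the distance from y to the next larger double.
for p = [0 10 20 100]
  printf("eps(1e%d) = %g\n", p, eps(10^p));
end
```

The printed deviation may be zero or a tiny positive number depending on the value chosen; the `eps` lines show the spacing growing by roughly ten orders of magnitude for every ten orders of magnitude in the value.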

I decided to look into this topic a bit more and tested the effect of various array sizes and of the order of magnitude (power of ten) of the array value:

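  arrs = zeros(100);        % 100x100 result: row = power p, column = size index N
  for p = 1:100             % power of ten applied to the random value
    for N = 1:100           % array length is N*100, i.e. 100 ... 10000 elements
      same_vals = repmat(randn*10^p, N*100, 1);          % N*100 copies of one value
      arrs(p, N) = abs(same_vals(1) - mean(same_vals));  % deviation of the mean
    end
  end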

Now arrs stores the absolute differences between each array's mean and the actual value of its elements.
Figure 1 shows the deviation of the mean from the element value on a log scale (it is simply imagesc(log(arrs))).

Figure 1. Deviations of mean values from actual values.

The deviation seems to be independent of the array size, at least for arrays of 100 elements or more. So the change of the deviation with the power of the array value can be examined at a single array length, say, 7000 elements (N = 70 in the loop above). This dependency is visualized in Figure 2.
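The data behind Figure 2 can be sketched as follows. The fixed seed (`randn("state", 42)`) and the straight-line fit with `polyfit` are my own additions for repeatability and for putting a number on the slope; they are not part of the original analysis. A slope close to 1 would mean the deviation grows in proportion to the array value.

```octave
% Sketch: deviation of the mean vs. power p at one fixed array length.
randn("state", 42);           % fixed seed so the run is repeatable (assumption)
n_elem = 7000;                % fixed array length
p = (1:100)';                 % powers of ten, as in the loop above
dev = zeros(size(p));
for k = 1:numel(p)
  same_vals = repmat(randn * 10^p(k), n_elem, 1);
  dev(k) = abs(same_vals(1) - mean(same_vals));
end

% Exact zeros (the mean happened to round back to the value) give
% log10 = -Inf, so they are left out of the straight-line fit.
ok = dev > 0;
coeffs = polyfit(p(ok), log10(dev(ok)), 1);   % log10(dev) ~ slope*p + intercept
slope = coeffs(1);
printf("slope = %.3f, intercept = %.2f\n", slope, coeffs(2));
```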

Figure 2. Linear dependency between logarithm of deviation and power of array value.

During data generation, the array value was sampled from the standard normal distribution (randn) and multiplied by 10 to some power. The randomization was introduced to avoid undesirable effects that could be caused by accidentally choosing some "special" number for all tests. Nevertheless, the power can still serve as a rough estimate of the base-10 logarithm of the array value.

By eyeballing the figures, one can conclude that the logarithm of the deviation depends almost linearly on the power of the array value, i.e. the deviation of the mean grows roughly in proportion to the value of the array elements. Tests with more points per interval of array values might reveal some nonlinearities, especially for higher powers, but in any case the deviation of the mean should be expected to be about a dozen orders of magnitude smaller than the value of the array elements. I guess it is very rare (and most probably unwise) to work with numbers of such magnitudes that this effect becomes noticeable, but it is good to be aware of it.
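A standard worst-case error bound for recursive floating-point summation fits the "dozen orders" estimate. This sketch assumes that mean computes sum(x)/n; Octave's actual summation order may differ, and other orders (e.g. pairwise) only tighten the bound. Here u = 2^{-53} ≈ 1.1 × 10^{-16} is the unit roundoff of double:

```latex
\hat{s} = \mathrm{fl}\Big(\sum_{i=1}^{n} x_i\Big), \qquad
|\hat{s} - s| \le \gamma_{n-1} \sum_{i=1}^{n} |x_i|, \qquad
\gamma_k = \frac{k\,u}{1 - k\,u}

% For n equal elements x_i = x we have \sum_i |x_i| = n|x|, so
% (ignoring the single extra rounding of the division by n):
\left|\frac{\hat{s}}{n} - x\right| \le \gamma_{n-1}\,|x| \approx (n-1)\,u\,|x|
```

For n = 7000 this caps the relative deviation at roughly 7.8 × 10^{-13}, and the absolute deviation scales with |x|, which matches the linear trend in Figure 2. It is a worst case; actual errors are usually much smaller.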

The same analysis should apply to other functions that operate on floating-point data. So don't be scared when you get large values where zero is expected: maybe the problem is not in the algorithm, but in the data? 😉
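When a result that "should be zero" comes out as a large absolute number, comparing with a relative tolerance instead of exact equality usually settles the question. A minimal sketch; `approx_equal` is a hypothetical helper (not an Octave built-in), and the value 1.2345e50 and the tolerance 1e-10 are arbitrary choices:

```octave
% Sketch: compare floating-point results with a relative tolerance
% instead of testing for exact equality or an exact zero difference.
% approx_equal is a hypothetical helper, not an Octave built-in.
approx_equal = @(a, b, rtol) abs(a - b) <= rtol * max(abs(a), abs(b));

x = 1.2345e50;                    % a large value, as in the experiments (arbitrary)
v = repmat(x, 7000, 1);
d = mean(v) - x;                  % may be far from 0 in absolute terms

printf("absolute deviation: %g\n", abs(d));
printf("equal within rtol = 1e-10: %d\n", approx_equal(mean(v), x, 1e-10));
```

For tests, Octave's own assert(observed, expected, tol) also accepts a negative tol, which it treats as a relative tolerance.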
