Threaded Mode | Linear Mode

Claudio L. · (This post was last modified: 07-30-2018 06:52 PM by Claudio L..)

I recently implemented an adaptive Simpson, almost straight from Wikipedia's article.

I immediately noticed that on every test I threw at it, it produces a result that is consistently 2 or 3 digits more accurate than requested.
Since the algorithm halves the step on each pass, every additional digit requested needs roughly 3 more "splits". Roughly speaking, each time it splits the interval in 2, we need twice as many function evaluations as the step before, so those last 3 to 6 extra splits can be extremely costly in terms of time.

The first thing I notice is that this algorithm halves the requested tolerance every time it splits. Basically it's saying Error=Error_Left+Error_Right, therefore they want Error_Left and Error_Right to be less than half the requested tolerance to make sure it meets the requirement.
This seems to me quite conservative, once you are doing hundreds or thousands of steps, shouldn't the total error be proportional to the sqrt(N)*AvgError with N being the number of intervals considered?
If so, we could perhaps use 1/sqrt(2)=70%, and whenever we split an interval in half, simply request 70% of the error instead of 50%.

And there's also a magic factor 15 that I don't know where is coming from, perhaps that's another factor to play with.

EDIT:
For example (from another thread): Integral of exp(-x) between 0 and 500.
If I request 1e-8 tolerance, the result is:
1.00000000000704832....
This case has 2 more digits than requested.

Dieter · 07-31-2018, 09:35 AM

(07-30-2018 06:42 PM)Claudio L. Wrote: The first thing I notice is that this algorithm halves the requested tolerance every time it splits. Basically it's saying Error=Error_Left+Error_Right, therefore they want Error_Left and Error_Right to be less than half the requested tolerance to make sure it meets the requirement.
This seems to me quite conservative, once you are doing hundreds or thousands of steps, shouldn't the total error be proportional to the sqrt(N)*AvgError with N being the number of intervals considered?

Maybe a few general thoughts on this kind of algorithm may help.

First of all, adaptive Simpson methods have already been discussed here. Take a look at this thread and at this program.

The idea basically is to estimate the error of Simpson approximations with n and 2n intervals. This yields an even more accurate result with the value (16·S_2n – S_n)/15. That's where the 15 comes from.

For more details on this take a look at the German Wikipedia article where this is explained. The error term E^(N)(f) with the 16/15 factor is discussed earlier in that article.

As far as the threshold for quitting the iteration is concerned: this is a general problem with any method that produces successive approximations of the true result. Since the latter is unknown all you can do is compare the last two approximations, i.e. the last and the previous one. If these two agree within the desired accuracy, the iteration stops and the result is returned. But – if the final and the previous value agree in the desired, say, 8 places, this means that already the previous value was sufficiently accurate! And since the last approximation is more accurate than this, it may be correct to e.g. 11 places. So its true accuracy is greater than requred. That's what you observe here.

(07-30-2018 06:42 PM)Claudio L. Wrote: For example (from another thread): Integral of exp(-x) between 0 and 500.
If I request 1e-8 tolerance, the result is:
1.00000000000704832....
This case has 2 more digits than requested.

Yes, that's the ex-post accuracy when comparing the result with the true value. But the algorithm doesn't know the true result. All it can do is compare the final and the previous approximation.

Here and there the exit condition can be improved. Assume you know for sure (!) that the iteration converges quadratically and the difference between the last two approximations was 1E–2, 1E–4 and 1E–8. If the next approximation can be expected to change the result only in the 16th digit this means that the current (!) approximation already is this precise. So if you need 12 places you may stop now although the last two approximation only agreed in 8 places. Because one more iteration will only change the result in the 16th place.

BTW, integrating exp(–x) up to 500 doesn't make sense. At x=20 the function value has dropped to about 2E–9 so anything beyond this will not affect the first 8 decimals.

Dieter

Albert Chan · 07-31-2018, 11:42 AM

(07-30-2018 06:42 PM)Claudio L. Wrote: For example (from another thread): Integral of exp(-x) between 0 and 500.
If I request 1e-8 tolerance, the result is:
1.00000000000704832....
This case has 2 more digits than requested.

Was the thread Casio fx-991EX vs Hp 50g speed differences ?

Tolerance setting is against successive iterations, not against true result.
If the algorithm knew the true result, why ask for tolerance ?

The extra precision may be just lucky (the integral is well-behaved)
Integral convergence rate may stall or (rare, but possible) even got worse on some iterations.

Worse, integral may converge very well, but converged to a wrong result
See above thread post#12, integral of sin(x), from 0 to 200.

So, if time permitted, double check the numbers with other methods.

Claudio L. · (This post was last modified: 07-31-2018 05:05 PM by Claudio L..)

(07-31-2018 09:35 AM)Dieter Wrote: Maybe a few general thoughts on this kind of algorithm may help.

First of all, adaptive Simpson methods have already been discussed here. Take a look at this thread and at this program.

The idea basically is to estimate the error of Simpson approximations with n and 2n intervals. This yields an even more accurate result with the value (16·S_2n – S_n)/15. That's where the 15 comes from.

For more details on this take a look at the German Wikipedia article where this is explained. The error term E^(N)(f) with the 16/15 factor is discussed earlier in that article.

Thanks, now I know where the 15 comes from, yet we are talking very different algorithms from the one you provided links to. I'm not even sure why they still call the adaptive one "Simpson", since you are not using the summation that is the hallmark of the Simpson method, but the error analysis still applies.

(07-31-2018 09:35 AM)Dieter Wrote: As far as the threshold for quitting the iteration is concerned: this is a general problem with any method that produces successive approximations of the true result. Since the latter is unknown all you can do is compare the last two approximations, i.e. the last and the previous one. If these two agree within the desired accuracy, the iteration stops and the result is returned. But – if the final and the previous value agree in the desired, say, 8 places, this means that already the previous value was sufficiently accurate! And since the last approximation is more accurate than this, it may be correct to e.g. 11 places. So its true accuracy is greater than requred. That's what you observe here.

I understand, and I agree with your logic more than the wikipedia article for adaptive simpson: if the difference between the areas calculated with a step h and the one calculated with h/2 is less than the target error then we should be good, right?
The thing is, every time it halves the step size, it also halves the target error, and that's what I'm questioning if perhaps is too conservative, leading to the overshoot in accuracy. When you add all the tiny pieces back together the error of the sum will never be the sum of the estimated errors because they are randomly going to cancel each other, it's more like:

Error_whole ≅ √N * AvgError

Couldn't we at least say that (for one single "split" section)

Error_whole ≅ √2 * AvgError

So we could say our target error wouldn't be halved each way, but:

AvgError = Error_whole/√2 ≅ 0.7 * Error_whole

So we accept slightly more error as we reduce the step, knowing that when we look at the whole interval, we are going to be very close to meet the required error target. I guess the other approach tries to guarantee you'll meet the target, versus this approach wants to get close to the intended target, not necessarily guarantee it, but that could help reduce the number of unnecessary subdivisions.

(07-31-2018 09:35 AM)Dieter Wrote: Here and there the exit condition can be improved. Assume you know for sure (!) that the iteration converges quadratically and the difference between the last two approximations was 1E–2, 1E–4 and 1E–8. If the next approximation can be expected to change the result only in the 16th digit this means that the current (!) approximation already is this precise. So if you need 12 places you may stop now although the last two approximation only agreed in 8 places. Because one more iteration will only change the result in the 16th place.

Isn't the error proportional to h^4? Anyway, I'm not so sure how this one behaves because the function is subdivided more in the areas where it's less well-behaved and less in other areas, as long as the target error is met.
It's basically "minimal effort to locally achieve a target error".

You do have a good point, perhaps the overshoot in error is because halving the step increases the accuracy a lot more than what we needed.

(07-31-2018 09:35 AM)Dieter Wrote: BTW, integrating exp(–x) up to 500 doesn't make sense. At x=20 the function value has dropped to about 2E–9 so anything beyond this will not affect the first 8 decimals.

Dieter

I know, it came up on another thread as a speed test, and I wanted to compare speeds as well. But I tested with several other functions (polynomials, combined exponentials with trig, trig alone, etc) and I always get a few more digits than I requested, at the expense of a huge amount of wasted time.
For that example, using 1E-8 as a target error, I get good 11 digits in 3 seconds (doing 741 function evaluations), now if I request 11 digits I get 13 good digits but it takes 15 seconds doing 4293 function evaluations.
This is the real reason why I'm seeking advice to see if perhaps we could do better.

Claudio L. · 07-31-2018, 06:37 PM

Quote:Worse, integral may converge very well, but converged to a wrong result
See above thread post#12, integral of sin(x), from 0 to 200.

Great example, this is a badly behaved integral, all numerical methods will suffer due to massive cancellation.
So I requested a tolerance of 1e-6 and my algorithm converged to a completely wrong value in a couple of seconds. It clearly stopped prematurely, which can happen because of the function, not really anything bad about the algorithm.
Then I requested 8 digits, giving a tolerance of 1E-8. Now it took some time (didn't time it but felt close to a minute) and gave a good answer with 16 digits correct. It seems now the tighter tolerance avoided the premature stop and the algorithm was able to deal with the cancellation of areas. I would've expected all the smaller areas to cancel each other out with errors around 1E-10, to provide a final area close to the 1E-8 requested tolerance, but it seems it had to subdivide a lot to get the result.

Albert Chan · 07-31-2018, 07:53 PM

Hi, Claudio

The problem with the sin integral is not cancellation, but its periodic nature.
Simpson's Rule is also based on periodic sampling (all equally spaced).

If the sampling and periodic function were in sync, the samples are going to be biased.
For sin(x) 0 to 200, the first few iterations, none of sin(x) samples were above zero.

A non-linear transformed function can fix this, by scrambling the sample points.
Have you tried the non-linear transformed sin integral (same thread, post #13) ?

---

Regarding "excessive" accuracy, it will happen even if tolerance is good estimater for accuracy.
Not all integral have the same convergence rate.

Say, tolerance of 1e-8 somehow guaranteed 7 digits accuracy (it does not)
What is the chance of really getting 7 digits accuracy ? Almost zero.

With guaranteed minimum accuracy, average accuracy is going to be higher, say, 10 digits.
Iterations that not quite make it to tolerance will doubled the points, "wasting" accuracy.

Claudio L. · 07-31-2018, 08:40 PM

So I modified the algorithm to not change the error tolerance at all, so the tolerance stays constant no matter how tiny is the subdivision (in other words, we halve the step, keep the tolerance the same).

I did the SIN(x) from 0 to 200 with 1E-8 tolerance and got the following results:
Original algorithm: Error = -9.2E-16, number of evaluations = 28277
With fixed tolerance: Error = 2.0E-11, number of evaluations = 4025

Tried a couple of other functions as well and got errors much closer to the requested tolerance, so I think I will leave it with the fixed tolerance. It seems to be a closer match to the final tolerance that the user is requesting, and it's a time saver. For smaller tolerances it again over-delivers in accuracy on most functions (if I request 16 digits I get 21 for example).

Vtile · 07-31-2018, 09:06 PM

Sorry again replying half sleep from phone. Derailing a bit, but the title took my attention. Is there adaptive intehration method which would use simple derivative (or constant lenght vector) to analyse the function to see if there is proportionally rabid transition and therefore need for higher accuracy. Propably yes and propably computionally extremely inefficient.

Dieter · (This post was last modified: 08-01-2018 08:21 AM by Dieter.)

This is a reply to Claudio L's post #4 in this thread. Since the "Quote" button currently doesn't seem to work I have to do it this way:

Quote:Thanks, now I know where the 15 comes from, yet we are talking very different algorithms from the one you provided links to. I'm not even sure why they still call the adaptive one "Simpson", since you are not using the summation that is the hallmark of the Simpson method, but the error analysis still applies.

If you refer to the HP-41 program I linked to, or the VBA version (post #8) in the other thread: they both use the classic Simpson method. The sums with weight 4 and 2 are summed separately and a Simpson approximation is calculated from these. Then the number of intervals is doubled and the two sums are added together to get the new weight-2-sum. This way only the new points have to be calculated which sum up to the new weight-4-sum.

The additional "non-Simpson"-step then is the extrapolation (16·S_new – S_old)/15 which gives an improved, more accurate value for the integral. When two successive values of this improved approximation match with the desired accuracy the iteration exits.

Take a look at the VBA code: simp_old and simp_new are the regular Simpson approximations for n/2 and n intervals, whereas improved_old and improved_new are the extrapolated improved approximations.

Both the HP-41 program and the VBA version in the 2016 thread only use the last two Simpson approximations to calculate a new, improved integral. In his recent answer Albert Chan suggested using all previous Simpson approximations, similar to the Romberg method. This is another worthwile option, but I didn't want to work with arrays, so only the last two approximations were used.

Quote:You do have a good point, perhaps the overshoot in error is because halving the step increases the accuracy a lot more than what we needed.

The essential point here is that we cannot and do not know the error of the calculated approximation. We can't say for sure if it's 7, 8 or 9 digits that match the true result. All we got is an upper bound, and even this is only reliable if we assume that the iteration converges to the true result. In this case the upper error bound is the difference between the last two approximations. If they match in 6 decimals this can mean that the result is accurate to 6, 7, 8 or even more digits. But we can't say anything more than this.

To make this more clear, take a look at the Simpson approximations for the integral of f(x)=1/x from 1 to 2:

Code:

   n      Standard Simpson    Error

 -------------------------------------

   2      0,694444444444      1,3 E-03

   4      0,693253968254      1,1 E-04

   8      0,693154530655      7,4 E-06

  16      0,693147652819      4,7 E-07

  32      0,693147210290      3,0 E-08

  64      0,693147182421      1,9 E-09

 128      0,693147180676      1,2 E-10

Assume you want an accuracy of six decimals. The Error column shows that n=16 already met that target. But we can't know this during the calculation unless we also know the true value of the integral! The Simpson approximations for n=16 matches the previous one (n=8) in (nearly) 5 decimals, so after calculating the n=16 iapproximation there are only two things we can say:

1. The n=8 approximation agrees with the (most probably) more accurate n=16 approximation when rounded to 5 decimals (0,69315). So already the n=8 approximation probably had 5 valid decimals. Which is what we know after having calculated the n=16 approximation.

2. The n=16 approximaion most probably is more accurate than this, but we can't say by how much. It may have 6 valid decimals or more.

Dieter

Albert Chan · 08-01-2018, 02:23 PM

I finally have the time to learn about Adaptive Simpson's method.
I were misled to believe it is plain Simpson + correction (/15 only) ... Sorry

It is not one integral with increasing steps for better accuracy,
but increasing number of mini-integrals (2 intervals, thus 3 points) sum together.

The question really is, when sub-dividing integral into a bunch of recursive
mini-integrals, what should their tolerance be ?

Mini-integrals don't talk to each other, so don't know which is the more important.
Adaptive Simpson method have to be conservative, tolerance cut in half for each step.
So, even if both splitted integrals equally important, total error still below tolerance.

This is probably an overkill, but without knowing the integral, it is the best it can do.
For one sided function, say exp(-x), tolerance can really stay the same all the way down.

--

By setting a tolerance, adaptive Simpson rule ignore the "details", and
concern itself with the dominant sub-divided integrals, thus is faster.

This is just my opinion, speed-up by ignoring the details come with costs:

not recommended for function where details matter (dominant peaks cancelled out)
not recommended for non-linear transformed integral, with details suppressed.
But, how to ensure the integral not behaved like a non-linear transformed type ?
we need information about approximate size of expected result, to set tolerance.
tolerance too loose, important details will be ignored, and may get wrong result.
tolerance too tight, it may reach software recursion limit and crashes.
Since corrections are really extrapolation, by the time the corrected mini-integral
results cascade back to the top, result might be totally off.
Good corrections should be for actual iterations, not corrected estimates (which the cascade does).
I learn about this from Dieter's post regarding Namir's basic code (version 2) bug.
Hp thread: Simpson Revival, post #6
returned result have higher uncertainly due to all of above

For example, this transformed exponential integral will not work well with Adatpive scheme:

\(\int_0^{500}e^{-x}dx \) = \(\int_{-1}^{1}375(1-u^2) e^{-(125u (3 - u^2) + 250)} du \) = 1 - \(e^{-500}\) ~ 1.0

OTTH, if used correctly, this is a great tool. For example:

I(1000) = 4 \(\int_{0}^{\pi/2} cos(x) cos(2x) ... cos(1000x) dx \) = 0.000274258153608 ...

Adaptive Simpson Method (eps = 1e-9) were able to give 11 digits accuracy in 0.5 sec
Romberg's Method can only reached 8 digits accuracy ... in 400 seconds !

Thanks, Claudio

Claudio L. · 08-01-2018, 04:28 PM

Quote:Dieter wrote:
The additional "non-Simpson"-step then is the extrapolation (16·Snew – Sold)/15 which gives an improved, more accurate value for the integral. When two successive values of this improved approximation match with the desired accuracy the iteration exits.

The adaptive algorithm I implemented uses the exact same extrapolation, somewhat disguised in the code but it's there as Snew+ (Snew-Sold)/15 . It does it for each individual strip.

Quote:Dieter wrote:
The essential point here is that we cannot and do not know the error of the calculated approximation.

Exactly, which means we can play with the criteria, since "nobody" knows the right answer. Ideally I want the final output to be close to the tolerance the user provided, so if the user wants 1E-8 we don't waste a lot of time computing beyond that.
So I think my relaxed tolerance, which still overshoots accuracy by a long shot, is a fair compromise.

Quote:Albert Chan wrote:
The question really is, when sub-dividing integral into a bunch of recursive
mini-integrals, what should their tolerance be ?

Mini-integrals don't talk to each other, so don't know which is the more important.
Adaptive Simpson method have to be conservative, tolerance cut in half for each step.
So, even if both splitted integrals equally important, total error still below tolerance.

This is probably an overkill, but without knowing the integral, it is the best it can do.
For one sided function, say exp(-x), tolerance can really stay the same all the way down.

Yes, normally I wouldn't care and agree with you that it's better to be conservative. However, we are deciding between getting the answer you want within a few seconds or waiting a long time (in other words, whether the calculator would be useful or not to the user).
On a PC, both cases would feel instantaneous, so it wouldn't make sense to sacrifice accuracy.
Another good point (that Dieter made clear) is that we are really comparing that tolerance with the difference between two consecutive runs, on a little single strip, which is very loosely related to the actual tolerance on the total area, and I think that's why there is a disconnect between the tolerance the user requests on the final result and the tolerance we should be comparing to.
In the end, the user is getting way more than requested, at the expense of a lot of wasted time and coffee.

Albert Chan · (This post was last modified: 08-01-2018 06:01 PM by Albert Chan.)

Now I see why you set a fixed tolerance down the chain.
A slightly off answer is better than no answer.

But, are you better off keep the algorithm as-is, and raise tolerance ?
(To take advantage of the "wasted" accuracy)

To get 6 digits accuracy, say, set tolerance to 1e-4 ?

I do that for my Guassian Quadrature routine, expecting 2 extra good digits.
So, if difference between 2 iteration is +/- 1e-4, last result is about 6 digits accurate.

Edit:
another trick is to limit recursion depth, say, 10 levels deep.
This forced the calculator to give a reasonable answer in seconds.