Threaded Mode | Linear Mode

DavidM · (This post was last modified: 03-16-2018 11:50 PM by DavidM.)

(03-16-2018 09:47 PM)pier4r Wrote: I do not know if LSHUF is changed much since 1.1.0d (that is in one of my 50g, as I hadn't the need to update it), but it seems to behave a bit differently than random picks.

Changes were definitely made to LSHUF since that version. Although I don't know of anything specific to the 1.1.0d LSHUF that would cause that type of problem, it's certainly possible.

Is there some reason you haven't updated the library since then? There's been quite a few additions, and some rather significant speed improvements for several commands. Why not update?

pier4r · (This post was last modified: 03-17-2018 08:40 AM by pier4r.)

No particular reason. I learned the hard way that when I can move "according to necessity" is better so I keep doing it everywhere.

A little example that applies on this page: http://www.wiki4hp.com/doku.php?id=resources:start or here http://www.hpmuseum.org/forum/thread-9090.html instead of trying to find all the possible resources in the shortest time possible (although I ask for suggestions to speed up the thing), I settle on "if I find it in my explorations, I will add it". Otherwise the task feels overwhelming more often than not.

Same with software. If I need the update, I update the software. For example: otherwise I must pay (windows 7 to windows 10 update); or because of new feature (android apps or your library), or because the current version is not stable, or due to security concerns, etc..

This while I enjoy the development of a solution (in this case your library) that I know is great.

So now I have a nice need. I will test the LSHUF on 1.1.0d and then I update on the last version to see if there are changes. I'll post the test result soon.

Side note. It is interesting that most of the time only test programs, especially if they have to accumulate statistics, are what runs for long time on my 50g. Test driven usage!

pier4r · 03-17-2018, 10:14 AM

Anyway an additional list creation request, due to the testing being made (n1), is something like

LRSEQ (or whatehever the name)
@input:
@L2: a list of elements
@L1: size of the output list, say 'n'
@output:
@L1: a list of size 'n' containing elements of the input list randomly chosen.

A possible userRPL code at the moment (not in a program, typed on the fly).

Code:

{ 10 20 30 }

6 DUP 'listSize' STO

SWAP 'listEL' STO

1 SWAP 

START listEL LSHUF HEAD NEXT

listSize \->LIST

presuming LSHUF working as well as RAND. I am testing this indeed.

n1: For 20 iterations it will likely take 8 hours , given two different versions, 16 hours. For this I say that my 50g is mostly spending time on tests, that require random input and data collection, rather than problems that are solved quickly.

DavidM · 03-17-2018, 03:12 PM

(03-17-2018 10:14 AM)pier4r Wrote: presuming LSHUF working as well as RAND. I am testing this indeed.

n1: For 20 iterations it will likely take 8 hours , given two different versions, 16 hours. For this I say that my 50g is mostly spending time on tests, that require random input and data collection, rather than problems that are solved quickly.

One thing to keep in mind about your method:

Calling LSHUF to randomize a list just to get the HEAD entry will always be much slower than using <list> DUP SIZE RAND * IP 1 + GET. In the former case, you are essentially exploding the list, calling RAND for every list element and then swapping positions, then rebuilding the entire list only to then extract the first entry. In the latter, you are only determining a random pick once and extracting it. Much less work to do for that one, so it will be much faster. A quick test on even a small list (4 elements) shows the 'LSHUF' method takes about .057s whereas the 'RAND' method only takes .021s. The LSHUF method will get even worse as list sizes grow.

As it appears that you are running long loops of these types of tests, performance of the algorithm is important to consider.

Another consideration: could some of these initial tests be performed on a 50g emulator? It appears that there's a need to "test the test" first, and you might be able to validate your test method with a smaller sample size with much faster results on an emulated 50g than doing the full-sample-set test on the real hardware.

pier4r · (This post was last modified: 03-17-2018 09:16 PM by pier4r.)

For the speed, thanks for sharing. I imagined that LSHUF would have been costly on a large list, but I thought on small list (less than 100 elements) the difference would be negligible. It seems not from what you wrote.

Anyway before speed, come effectiveness. So I am testing the effectiveness now, whather it takes in time.

For the "wouldn't it be better to test if the test works on a smaller input rather than waiting 8 hours for maybe nothing?" I agree. While I fall in this trap from time to time (damn me), I am slowly making progress so I already tested the test on one iteration with 4 elements building a list of size 6 (max 4096 possibilities). It took 25 minutes (I expected less, but I did not consider the LSHUF slow down as you wrote) and so from this I estimated 8 hours for 20 iterations.

It worked as expected, or better, it worked but the results of LSHUF (1.1.0d) are not that encouraging.

testing code

Code:

gpLSHUFtestArg

  \<<

    @execute to get the arguments

    { 10 20 30 40 } @list of elements

    @produce a list of N picks of elements that is the goal list

    1 6

    START 

      { 10 20 30 40 } @list of elements

      4 RAND * 1 + IP

      GET

    NEXT 

    6 \->LIST

    1 @number of simulations

  \>>

  gpLSHUFtestEng

  \<<

    {} "lvAttempsLSHUF" DROP

    {} "lvAttempsRAND"  DROP

    0  "lvListElemSize" DROP

    0  "lvListGoalSize" DROP

    0  "lvAttemps1Sim"  DROP

    0  "lvMaxAttempts"  DROP

    {} "lvListRes"      DROP

    0 "lvGoalFound"    DROP

    \->

    @input

    lvListElem

    lvListGoal

    lvSimNum

    @var

    lvAttempsLSHUF

    lvAttempsRAND

    lvListElemSize

    lvListGoalSize

    lvAttemps1Sim

    lvMaxAttempts

    lvListRes

    lvGoalFound

    \<<

      lvListElem SIZE 'lvListElemSize' STO

      lvListGoal SIZE 'lvListGoalSize' STO

      lvListElemSize lvListGoalSize ^ 'lvMaxAttempts' STO

      WHILE

        lvSimNum 0 >

      REPEAT

        @####

        @LSHUF attempt

        0 'lvAttemps1Sim' STO

        0 'lvGoalFound' STO

        WHILE

          lvAttemps1Sim lvMaxAttempts <

          lvGoalFound NOT

          AND

        REPEAT

          1 lvListGoalSize

          START

            lvListElem LSHUF HEAD

          NEXT

          lvListGoalSize \->LIST

          IF

            lvListGoal ==

              @if equal to the goal

          THEN

            1 'lvGoalFound' STO

          END

          'lvAttemps1Sim' 1 STO+

        END

        'lvAttempsLSHUF' lvAttemps1Sim STO+

        lvAttemps1Sim "AttempsLSHUF" \->TAG

          @dropping on the stack if the simulation is interrupted

        @####

        @RAND attempt

        0 'lvAttemps1Sim' STO

        0 'lvGoalFound' STO

        WHILE

          lvAttemps1Sim lvMaxAttempts <

          lvGoalFound NOT

          AND

        REPEAT

          1 lvListGoalSize

          START

            lvListElem 

            lvListElemSize RAND * 1 + IP

            GET

          NEXT

          lvListGoalSize \->LIST

          IF

            lvListGoal ==

              @if equal to the goal

          THEN

            1 'lvGoalFound' STO

          END

          'lvAttemps1Sim' 1 STO+

        END

        'lvAttempsRAND' lvAttemps1Sim STO+

        lvAttemps1Sim "AttempsRAND" \->TAG

          @dropping on the stack if the simulation is interrupted

        'lvSimNum' 1 STO-

      END

      lvListGoal "listGoal" \->TAG

      lvMaxAttempts "MaxAttempts" \->TAG

      lvAttempsLSHUF "AttempsLSHUF" \->TAG

      lvAttempsRAND "AttempsRAND" \->TAG

    \>>

  \>>

21 iterations results (time is not the main focus)

AttempsLSHUF: { 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. }
the one iteration done before hand is also 4096. So LSHUF 1.1.0d was never able to produce a solution before doing enough attempts to go through all the possibilities.

AttempsRAND: { 2018. 1663. 1779. 4096. 2131. 370. 342. 2049. 2220. 4096. 2832. 822. 4096. 1305. 4096. 3202. 1997. 3936. 612. 3866. }
The one iteration I had before returned 2574.
Those made by rand are a bit more... reasonable?

So if my test is not wrong, I would say that LSHUF 1.1.0d is not producing enough randomness (I am not sure if the sentence makes sense). Now I will install the last update of listExt and I will see if something change. The 50g can crunch for 8 hours more in the night. As I said it, test driven usage.

More source code (although I think it is not needed), as usual in my assembla RPL repo: https://app.assembla.com/spaces/various-...cMathDir.s

pier4r · (This post was last modified: 03-17-2018 09:39 PM by pier4r.)

Hmm. Installed 1.1.3b4, but instead of LSHUF, SPLIT is called.

I thought "ok, the pointers are not yet updated, so the program, although userRPL, is pointing to old references". I edited the program, I added some spaces and I saved it again. The problem still persist. Split is called instead of LSHUF.

Oh, ok...now I see. Somehow the LSHUF name in the program changed to SPLIT. But I swear I did not change the program in the meanwhile.
Is it automatic that, when the command reference change, the 50g renames the commands in the RPL programs? That is pretty a neat work done for the user.

How much little but precious details the 50g has, after all those years without updates.

DavidM · 03-17-2018, 10:28 PM

(03-17-2018 09:38 PM)pier4r Wrote: Hmm. Installed 1.1.3b4, but instead of LSHUF, SPLIT is called. ...

Sorry about that. The order of the commands in the library changed shortly after that 1.1.0d was released. I had hoped the limited distribution of that older release would mean that few people would be impacted. There was some discussion in the thread about the potential for this to happen... many messages back I'm afraid.

Library command references in a UserRPL program aren't actually stored by name, but rather by number. They are converted to text just before you edit the program, then are compiled back into a number when you are done editing. Changing the library meant that the older references were now pointing to different commands.

You can "fix" this by re-installing the old library, convert the programs to text with →STR (and then saving wherever you like), then reinstall the new library and convert the text versions of the programs back into compiled RPL by issuing STR→ commands on them.

The good news is that this shouldn't happen again, as the command numbers shouldn't change moving forward. That's one of the benefits of the custom menu.

As for the test, I'm wondering if you would find different results with a list of 5 items instead of 4. See this thread for a potential issue that could be reducing the effectiveness of shuffles on a list of 4 items (specifically). A list of 4 elements seems to hit a harmonic with the built-in PRNG, at least given the way I'm currently using the seed results. If you don't mind performing your test with 5 instead of 4 elements, I'd be very curious as to the results.

pier4r · (This post was last modified: 03-17-2018 11:01 PM by pier4r.)

No problem about the command change. If I know how to solve a problem, then it is minor annoyance.

thanks for precising that the references to commands are numbers, I was presuming this (maybe I was not clear) as within computers normally ID are numbers. Well, even my ID in the EU is a number, this let me reflect...

For the fix once again no problem. The code lives on my pc (and in my git repo), I almost never have code written only on the calculator. Normally I clean up the used code - as I don't have much memory free - and I copy the new one.

For the test about 5 or more elements, sure. But now the 50g is running through the test with the last update of your library. Unless you say that 1.1.0d and 1.1.3b4 have no changes in code of LSHUF, so I can terminate the current test and start another.

Last observation though. In the other tread you say "look there it seems that the period is very short when random is mapped to few numbers". But if this is true, not only LSHUF should be affected, also it should happen to GET coupled with the RAND position. But this does not happen with GET coupled with RAND.

The crucial code in the two cases is:
lvListElem LSHUF HEAD

and
lvListElem
lvListElemSize RAND * 1 + IP
GET

They work with the same size. Only with IP I sample the leftmost digits of the random number. In the other thread you sample the rightmost digits.

DavidM · 03-17-2018, 11:11 PM

(03-17-2018 10:58 PM)pier4r Wrote: For the test about 5 or more elements, sure. But now the 50g is running through the test with the last update of your library. Unless you say that 1.1.0d and 1.1.3b4 have no changes in code of LSHUF, so I can terminate the current test and start another.

There were definitely changes (some performance related and at least one other that wasn't), but I'm not sure if they will have an impact on what you are seeing.

(03-17-2018 10:58 PM)pier4r Wrote: Last observation though. In the other tread you say "look there it seems that the period is very short when random is mapped to few numbers". But if this is true, not only LSHUF should be affected, but this should happen also to GET coupled with the RAND position. But this does not happen with GET coupled with RAND.

Using RAND/GET, you are multiplying the random result with an integer, which means the most significant digits of the seed have the most significant impact on the result. By using the RAND result as an integer and performing MOD 4 on it, the only digits that matter are literally the last two bits in the seed. Presuming what John was referencing is true (and I have no reason to doubt it), the least significant bits have a shorter period than the most significant, so that would explain one potential source for the difference.

I'm hopeful that a simple shift before the MOD will take care of that particular problem. But I'm still not sure that's the cause of what you're seeing. Testing with a list of 5 might help clarify that.

pier4r · (This post was last modified: 03-18-2018 07:36 PM by pier4r.)

Yup why not. Test driven usage is the best usage. At least the 50g is used to get some more bits of knowledge, until I forget them - for this I hope that the online discussions outlast my memory and myself.

So the current test will likely finish in the middle of tonight for me. In the US will be afternoon or so. The next test will be done tomorrow, with a list of 5. (or should I make 6 to be sure)

In the worst case if you have a spare 50g, you can copy/adapt my code and get the results faster.

One more question. Why did you use the rightmost digits of the random number, if you are going to draw another random number afterwards? I mean, any particular reason?

DavidM · 03-18-2018, 02:35 AM

(03-17-2018 11:19 PM)pier4r Wrote: In the worst case if you have a spare 50g, you can copy/adapt my code and get the results faster.

I'm attempting to run your code on an emulated 50g. The first run was with all the defaults as you set them in your earlier program. I used the 1.1.3b4 library, and it still had similar results (in one run, standard RAND method was 583, 4096 for LSHUF).

I then changed the "list of elements" (in both places) by adding a value of 50 to make it now have 5 elements, and changed the number before RAND in the "produce a list of N picks of elements that is the goal list" to 5 so that it would include all 5 elements in its picks. I left the length of the goal at 6. This time, both methods went "to the wall" at 15625 attempts. So neither method seemed to produce the targeted list. Did I neglect to change something appropriately? Granted, this was only 1 loop iteration so it may not be as meaningful.

(03-17-2018 11:19 PM)pier4r Wrote: One more question. Why did you use the rightmost digits of the random number, if you are going to draw another random number afterwards? I mean, any particular reason?

The choice I made wasn't specifically related to using the rightmost digits, but rather using a method that employed integer operations as opposed to floating point math. I knew that using MOD against a seed would allow me to quickly come up with a result in the range that I needed. It just happens that using that particular method inherently derives its result from the least significant digits in the seed. That method is now suspect due to the apparent "random qualities" of the bits in that area of the seed.

So the question remains: does shifting the seed give enough improvement to continue using this method, or should some other method be deployed?

Ironically, the main reason I opted to use the internal RAND functionality was because I thought it would keep me from having to come up with an alternate version that I would then have to validate. It just turns out that the method I chose to convert the resulting seed to the final number I need is now shown to be questionable.

pier4r · 03-18-2018, 09:34 AM

So results crunched in the night
:AttempsLSHUF: { 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. 4096. } plus 4096 as additional iteration

:AttempsRAND: { 1329. 1112. 1568. 4096. 4096. 2121. 486. 4096. 4096. 2326. 4096. 1711. 203. 4096. 4096. 39. 1288. 1991. 4096. 1811. } plus 4096 as additional iteration.

(03-18-2018 02:35 AM)DavidM Wrote: I then changed the "list of elements" (in both places) by adding a value of 50 to make it now have 5 elements, and changed the number before RAND in the "produce a list of N picks of elements that is the goal list" to 5 so that it would include all 5 elements in its picks. I left the length of the goal at 6. This time, both methods went "to the wall" at 15625 attempts. So neither method seemed to produce the targeted list. Did I neglect to change something appropriately? Granted, this was only 1 loop iteration so it may not be as meaningful.

No I believe you did exactly what was needed (sorry, lazy to put variables to avoid several changes at once. I'll do it later on, maybe). Also as you can see from the 4096, sometimes also RAND goes to the wall. The search is not that good.

Although this spawn an interesting problem. Knowing nothing about the "goal" to find, and assuming one cannot change the order of elements because they are needed, how can the search be improved? The best that I can see for the moment is: consume all the possibilities, just going hopping randomly around. Although if I am not wrong I did a test between sequential search and random search, and the random search was poorer. I may need to do it again as I don't remember if I removed the possibilities checked.
Also a random search requires to keep track of which possibilities were checked already, so the space requirement grows, and it may cost more than the speed requirement.

Quote:The choice I made wasn't specifically related to using the rightmost digits, but rather using a method that employed integer operations as opposed to floating point math.

Understood, well but at least you (re?)discovered, for you and those you never observe the property, how the LCG random generator behave with rightmost digits. I'd say: not bad!

Quote:So the question remains: does shifting the seed give enough improvement to continue using this method, or should some other method be deployed?

As far as I know, either one finds a solid theoretical hint to decide or, time for some tests? (plus here).

The 50g is not doing much anyway most of the time, so can be employed for science! (well, my understanding of science at least) That is quite a good employment. Sure it will take a while, as I employ mostly the real device and to get data one either employs a clever test, or a lot of iterations are needed.

DavidM · (This post was last modified: 03-18-2018 05:28 PM by DavidM.)

(03-18-2018 09:34 AM)pier4r Wrote: So results crunched in the night...

I've been experimenting with an alternative way to use the RAND seed, and a 20-simulation run that just completed gave the following results:

:AttempsRAND: { 4096. 800. 872. 1280. 2209. 116. 4096. 678. 1978. 4096. 3343. 798. 4096. 4096. 1372. 2945. 4096. 456. 335. 4096. }
:AttempsLSHUF: { 1625. 4096. 1756. 1525. 3452. 2842. 1662. 2474. 1800. 3436. 4096. 3265. 4096. 3699. 4096. 803. 4096. 930. 1481. 4096. }

A previous 10-simulation test (using the same method) gave these results:

:AttempsRAND: { 1748. 4096. 1751. 785. 754. 1933. 1816. 4096. 4096. 4096. }
:AttempsLSHUF: { 3846. 4096. 181. 197. 712. 1484. 33. 727. 864. 796. }

(03-18-2018 09:34 AM)pier4r Wrote: Although this spawn an interesting problem. Knowing nothing about the "goal" to find, and assuming one cannot change the order of elements because they are needed, how can the search be improved? The best that I can see for the moment is: consume all the possibilities, just going hopping randomly around. Although if I am not wrong I did a test between sequential search and random search, and the random search was poorer. I may need to do it again as I don't remember if I removed the possibilities checked.
Also a random search requires to keep track of which possibilities were checked already, so the space requirement grows, and it may cost more than the speed requirement.

I'm not sure what you're referring to in terms of "search" here, at least in terms of how that applies to the program you posted. It seems to me that the program isn't performing a search in the usual context, but rather is simply constructing random permutations of a list and counting how many permutations are tried before a match occurs. The program does nothing to sequence the permutations, so it could be generating duplicates as it progresses. To be honest, I'm not really sure what the value of the result is in this case (other than showing that the older version of LSHUF never seems to pick the first list element in the special order designated by the target within the allotted attempts Smile

).

(03-18-2018 09:34 AM)pier4r Wrote: Understood, well but at least you (re?)discovered, for you and those you never observe the property, how the LCG random generator behave with rightmost digits. I'd say: not bad!

Yes, I continue to learn how little I understand PRNGs and how to measure their effectiveness. Smile

But I'm glad you pointed out your observations, because I wouldn't have known there was an issue otherwise. I keep feeling like there must be a way to test the efficacy of the shuffle compared to the "<num> RAND * IP 1 +" alternative without porting everything to a different platform and investing huge amounts of time and effort to fine-tune the tests. So far I haven't seen any simpler methods.

pier4r · (This post was last modified: 03-18-2018 06:06 PM by pier4r.)

(03-18-2018 04:45 PM)DavidM Wrote: I've been experimenting with an alternative way to use the RAND seed, and a 20-simulation run that just completed gave the following results:

:AttempsRAND: { 4096. 800. 872. 1280. 2209. 116. 4096. 678. 1978. 4096. 3343. 798. 4096. 4096. 1372. 2945. 4096. 456. 335. 4096. }
:AttempsLSHUF: { 1625. 4096. 1756. 1525. 3452. 2842. 1662. 2474. 1800. 3436. 4096. 3265. 4096. 3699. 4096. 803. 4096. 930. 1481. 4096. }

A previous 10-simulation test (using the same method) gave these results:

:AttempsRAND: { 1748. 4096. 1751. 785. 754. 1933. 1816. 4096. 4096. 4096. }
:AttempsLSHUF: { 3846. 4096. 181. 197. 712. 1484. 33. 727. 864. 796. }

So doing some base stats (that one can use to get a glipse of things, although the result could be biased or meaningless sometimes)

2367.5 RAND avg
2275.4 LSHUF avg

1480 RAND std
1433 LSHUF std

71025 RAND sum
68262 LSHUF sum

233882079 RAND sum^2
216940694 LSHUF sum^2
(sum^2 together with sum is especially neat to see how much the sum distributed itself in "big numbers". One can have a smaller sum than another but a bigger sum^2, for example)

I'd say that - although the stats above are not rigorous - it is better than before.

Quote:I'm not sure what you're referring to in terms of "search" here, at least in terms of how that applies to the program you posted. It seems to me that the program isn't performing a search in the usual context, but rather is simply constructing random permutations of a list and counting how many permutations are tried before a match occurs. The program does nothing to sequence the permutations, so it could be generating duplicates as it progresses.

Ah sorry. I slightly extended the topic. The program that we are using for testing compares a list built with LSHUF or RAND vs a specific goal. This is a test but it can be seen also a "search the goal list". A random search of the goal list (with RAND or the new LSHUF) would need on average X attemps (considering the cutoff limit). See 2367 or 2275 above.
A sequential search (build all the possible lists, and go through the list of all the possible lists) may need more or less attempts.
This spawned the question. Given a goal list, that is "unknown" to the generator program, aside from its elements. Given that one cannot say anything about the elements (so one cannot apply a binary search, or order the elements in a tree, and so on). Which type of search could be better - on average - than backtracking/going through all the possible combinations of elements? That was the topic of my little digression.

Quote:Yes, I continue to learn how little I understand PRNGs and how to measure their effectiveness. But I'm glad you pointed out your observations, because I wouldn't have known there was an issue otherwise.

Indeed, it is like we discover a marginal bug that happens in certain cases that were not used until now. It happens.

Quote:I keep feeling like there must be a way to test the efficacy of the shuffle compared to the "<num> RAND IP 1 +" alternative without porting everything to a different platform and investing huge amounts of time and effort to fine-tune the tests.

Here I cannot follow. Why should you test the things on another platform? For the efficacy, you need to find (or develop) a reasonable metric for the context.

I, for starters, look at the basic things first, listed above. They are not "final" indicators, but they give an idea within the context of the test. Those indicators may fail with another type of test. I do not know if there exists a "quite comprehensive test and related metric" that would rule out problems in most of the possible applications. If someone knows it, please let us know!

DavidM · 03-18-2018, 06:36 PM

(03-18-2018 06:05 PM)pier4r Wrote:
Quote:I keep feeling like there must be a way to test the efficacy of the shuffle compared to the "<num> RAND * IP 1 +" alternative without porting everything to a different platform and investing huge amounts of time and effort to fine-tune the tests.

Here I cannot follow. Why should you test the things on another platform? For the efficacy, you need to find (or develop) a reasonable metric for the context.

...because the established test suites seem to require that the PRNGs exist as functions on the same platform that the suite itself is on. I've seen multiple comments about how they specifically exclude testing datasets produced by a PRNG as they aren't sufficient to determine effectiveness. So they apparently require the PRNG to exist as a function within the suite's environment so that both the input and output can be controlled and tested.

All of this amounts to waaaay more effort (and headache) than was ever intended for a simple command to help shuffle a list. If there's truly no other way to reasonably validate it, then I'll simply remove it from the library. I have no desire to misrepresent the effectiveness of the command by leaving it in place with uncertain results, nor do I intend to become an expert in PRNGs simply to include a command in a library. Smile

It's beginning to look like that expertise is required to create a reasonable shuffling routine, though.

pier4r · (This post was last modified: 03-18-2018 07:10 PM by pier4r.)

(03-18-2018 06:36 PM)DavidM Wrote: All of this amounts to waaaay more effort (and headache) than was ever intended for a simple command to help shuffle a list. If there's truly no other way to reasonably validate it, then I'll simply remove it from the library. I have no desire to misrepresent the effectiveness of the command by leaving it in place with uncertain results, nor do I intend to become an expert in PRNGs simply to include a command in a library. It's beginning to look like that expertise is required to create a reasonable shuffling routine, though.

I agree that too much effort, unless it is an excuse to learn something, is, well, too much. But removing it? Couldn't you put just a disclaimer "tested under this condition, it perturbates the list. Although for some specific context the perturbation may be not chaotic enough, refer to the discussion XY"? I find LSHUF still pretty useful in a lot of cases.

Because if I understood your argument you say "either I get someone (or myself) that I can trust that says that my command is random enough, or I remove it" it will be just removed, as the chances to find someone that worked for such an application (perturbating lists and related metrics) are slim.

Wouldn't be enough to update LSHUF with your new version, as it seems working better, until the next error report? (if any)

Anyway this is a good chance to expose another reason why I do not update a software (or a product). In the case a feature that was there and it was good enough for me, gets removed. It happened already with listExt although I don't remember the command. Was one that you said "oh, it can be replaced with a couple of RPL commands".

By the way the removal of expected features happens often also in other contexts. See for example a new calculator that doesn't have the previous (loved) programming language (see the 48 with RPL and little RPN. Prime with no RPL). This or that game that gets simplified. This or that software suite that drops features and so on. In all those case, I often consider whether to move on the update or not.

DavidM · 03-18-2018, 08:04 PM

(03-18-2018 07:05 PM)pier4r Wrote: Because if I understood your argument you say "either I get someone (or myself) that I can trust that says that my command is random enough, or I remove it" it will be just removed, as the chances to find someone that worked for such an application (perturbating lists and related metrics) are slim.

Unlike most commands where the ability to test effectiveness is a binary result (works or doesn't) and is fairly straightforward to test, it appears that there may not be a reasonable assessment for the effectiveness of LSHUF. Not only that, the evidence has shown that the likelihood of lurking failures is good unless someone possesses the expertise to design around them. This simply isn't my area of expertise, and I acknowledge that.

I went to great lengths in an attempt to use the built-in RAND routine in LSHUF precisely because I knew that I wasn't qualified to come up with an alternative, but it appears that my implementation still uses the RAND result in an inappropriate way. As I'm not knowledgeable enough to have confidence in the implementation, I'd rather not publish it (even with disclaimers). If anything, this is simply the culmination of a growing sense of reluctance I've had about this command from the beginning.

It's not hard to come up with an algorithm for something like the Fisher-Yates shuffle (which is what LSHUF uses). It's effectiveness depends entirely on the quality of the random selection for swapping, and I'm happy to let others choose their own level of comfort for how to do that. Smile

pier4r · 03-18-2018, 08:24 PM

mi

. I built the random list generator already at the time of the list challenges, but yours was faster, shorter and always there Sad

.

It wouldn't be even possible just to use RAND * 1 + IP within sysRPL? (that would solve it, otherwise even RAND is broken).

I mean already the missing checks for arguments when calling the internal RAND + and IP should save a lot of time, especially with hundreds/thousands of calls.

DavidM · 03-18-2018, 09:00 PM

(03-18-2018 08:24 PM)pier4r Wrote: mi . I built the random list generator already at the time of the list challenges, but yours was faster, shorter and always there .

It wouldn't be even possible just to use RAND * 1 + IP within sysRPL? (that would solve it, otherwise even RAND is broken).

I mean already the missing checks for arguments when calling the internal RAND + and IP should save a lot of time, especially with hundreds/thousands of calls.

The only part of the command that's written in SysRPL is the outer shell -- the bulk of it is in Saturn code. I'd have to rewrite the whole thing, and there's already been some discussion about how the RAND * IP 1 + approach introduces a bias. It seems like that was part of the "Random thoughts" thread some time back. I still think that method is probably "good enough", but I'm sure there are others who don't.

The method that I've currently implemented (see this post) may in fact be "good enough" as well, but I don't know of a reasonable way to validate that assumption.

pier4r · (This post was last modified: 03-18-2018 09:47 PM by pier4r.)

Hmm. If you are referring to this https://groups.google.com/d/topic/comp.s...discussion discussion, I read it.

While JH Meyers is more often than providing great info, I learned the hard way that is not bad sometimes to see the data of the analysis (not only the conclusion, as one may see the data a bit differently than others). In any case, seeing the data that led to the conclusion helps to understand the conclusion themselves (in the worst case, one can disprove the conclusions).

Anyway what I got from that post is:
- there are better LCRNG than the one used in the RPL systems. Ok.
- the 15C has a better LCRNG. Ok.
- if one uses the rightmost digits, one may have very short periods. And you confirmed it. Confirming claims is healthy.
- if one uses IP after random, one _might_ - due to how IP works with digits - get sometimes a rounding errors due to the guard digits sneaking values that shouldn't be there. On this I am a bit skeptical. I may just run 16 RAND * IP until I see a 16 slipping in. If it is after some X million times, it doesn't matter much in a calculator.
- TL;DR: RAND is not random enough for cryptography applications. Ok. For all the rest, it seems good enough.

My perspective is:
- I take RAND as good enough. As selected by HP and used by thousands of people in thousands of calculators maybe millions of times. It can be that millions of people are wrong, but in this case it seems that RAND was good enough for them.
- I set to build a command to shuffle a list that, compared to a shuffle based on RAND, is mostly impossible to discern for practical purpose. For this I may go after a super refined metric, but since I don't want to encrypt anything to let the NSA out, I set a couple of metrics that should capture an idea of the difference between a list shuffled with RAND and one built with my routine. I would say that comparing the number of attempts against a goal list is not a bad start. (see my test code for example)
- If the metric somehow shows close values on a large amount of simulations, that's it. Until someone provides a better insight.

It is like life, one settles on a solution that doesn't look bad until a better one is found.

Then again, that is how I would approach the problem.