(25C) Sampling without repetition
|
04-27-2017, 04:07 AM
Post: #1
|
|||
|
|||
(25C) Sampling without repetition
This program generates random sample of k elements out of total set of N elements. Element can be used only once. All such subsets are drawn with equal probabilitites (assuming RNG is fair).
Algorithm considers each integer i from 0 to N-1 and decides on each whether to include it in a sample or not with probability (k-done)/(N-i) where done is the number of already selected items. The generated elements are not stored, so k and N can be any positive integers, however this algorithm is only practical with relatively small N. Instruction: <N> sto 0 <k> sto 1 <0.random_seed> sto 5 f prgm r/s Output: first element of a sample 0..N-1 r/s Output: next element of a sample. When no more elements outputs -1 r/s after that starts a new sample. Code:
Example: 100 sto 0 10 sto 1 0.1234 sto 5 f prgm r/s [2] r/s [4] r/s [23] r/s [30] r/s [50] r/s [53] r/s [57] r/s [59] r/s [66] r/s [96] r/s [-1] -- indicates end of sample I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000. |
|||
10-27-2018, 03:52 PM
Post: #2
|
|||
|
|||
(42S) Sampling without repetition
Here's a translation of your program for the HP-42S:
Code: 00 { 41-Byte Prgm } However, I reversed the order so I could use DSE. Thus the generated samples appear in descending order. Example: 100 STO 00 10 STO 01 XEQ "SAMPLE" 85 R/S 64 R/S 62 R/S 59 R/S 55 R/S 53 R/S 39 R/S 32 R/S 30 R/S 11 R/S -1 (04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000. In this case we might simply generate them at random and hope there's no collision. Kind regards Thomas |
|||
10-28-2018, 09:18 AM
(This post was last modified: 10-28-2018 09:20 AM by Dieter.)
Post: #3
|
|||
|
|||
RE: (25C) Sampling without repetition
(10-27-2018 03:52 PM)Thomas Klemm Wrote: Here's a translation of your program for the HP-42S: One more case where something like INV DSZ would be useful (greetings to the TI community ;-)). This can be accomplished quite easily by having the DSE followed by a test that always tests false. Since X holds a positive number at this point, a simple x<0? will do. Code: ... Slightly shorter, slightly faster, saves one label. Edit: OK, in this special case it's neither shorter nor label-saving, simply because LBL 01 is required anyway. #-) (10-27-2018 03:52 PM)Thomas Klemm Wrote:(04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, for example, a sample of 30 out of 10000000.In this case we might simply generate them at random and hope there's no collision. In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-) Dieter |
|||
10-28-2018, 02:18 PM
(This post was last modified: 10-28-2018 04:52 PM by Albert Chan.)
Post: #4
|
|||
|
|||
RE: (25C) Sampling without repetition
(04-27-2017 04:07 AM)nsg Wrote: I wonder if it is possible to improve on the algorithm and generate a sample of modest k for arbitrary N, Code: import random Above code only need k random numbers, and return sorted samples. >>> randindex(10,10) # no repeat [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> random.seed(1) >>> randindex(30,10000000) [21060, 254459, 283474, 290410, 305901, 938595 ... |
|||
10-28-2018, 03:42 PM
Post: #5
|
|||
|
|||
RE: (25C) Sampling without repetition | |||
10-28-2018, 08:03 PM
(This post was last modified: 10-28-2018 08:04 PM by John Keith.)
Post: #6
|
|||
|
|||
RE: (25C) Sampling without repetition
(10-28-2018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-) Going further of-topic... This would be a variation of the Fisher-Yates shuffle, implemented in the ListExt Library as LSHUF. |
|||
10-28-2018, 08:51 PM
(This post was last modified: 10-28-2018 08:52 PM by Albert Chan.)
Post: #7
|
|||
|
|||
RE: (25C) Sampling without repetition
(10-28-2018 08:03 PM)John Keith Wrote:(10-28-2018 09:18 AM)Dieter Wrote: In such a case there is no significant difference between sampling with or without repetition. So it doesn't matter much if a number is drawn twice. Otherwise the idea is: record the sample in a (sorted) list, generate the next number, see if it's part of the list. If yes: get a new number, if not: insert it into the list. Hmmm... somehow this sounds like a job for the HP50. ;-) I do not see any shuffling from Dieter description. That seems more like "rinse, and repeat", throwing away (rare) repeated samples. My Python randindex(...) look more like shuffling, except working thru the samples list. |
|||
« Next Oldest | Next Newest »
|
User(s) browsing this thread: 2 Guest(s)