Threaded Mode | Linear Mode

DavidM · 12-27-2018, 08:31 PM

(12-27-2018 06:52 PM)pier4r Wrote: David I am not sure I understood your test. I am not sure I decoded the data properly.

That's fair, because I'm not clear on exactly what it is I'm supposed to be looking for with the original problem. Smile

I was simply experimenting to determine some of the possible outcomes for a series of randomized lists.

(12-27-2018 06:52 PM)pier4r Wrote: For example:

Code:

1 60726 2 60743 3 60133 4 59444 5 58566 6 57093 7 55206

It means that out of 1 million attempts, 60726 of them had at least one list of 70 elements that matches the requirement? 60743 times there were two matching lists?

Almost. I would change your description slightly to make it "out of 1 million randomly-ordered lists of 140 balls, 60726 of them had exactly one sublist of 70 elements that matches the requirement, 60743 times there were exactly two matching sublists..."

In the original problem description, you write:

pier4r Wrote:Could they find a way to ensure that Anna and Berta pick both 50 blue balls and 20 white balls? If yes, is a special arrangement of the balls needed to achieve it?

I believe the answer to the first question is simply "yes", as has been pointed out by ijabbott. There will always be at least one contiguous sublist of 70 balls containing the proper mix of 50 blue and 20 white balls. I also believe the answer to the second question is "no", because it does not appear that any special arrangement matters.

I'm not mathematically skilled enough to offer a rigorous proof of those assertions, so I should probably just call them "opinions". I am , however, mathematically curious enough to experiment with the results of repeated trials to see what patterns emerge. Hence my testing to see the results of counting the number of matches in random lists.

It's been a long, long time since I completed my last statistics course, but it appears to me that the distribution of match counts fits the right half of a normal curve (ie. the peak is at/near the lowest value in this case). I'm sure that has some significance, but I'm still not clear on exactly what that tells us. The shape of the curve seems to be more than a coincidence, though.

pier4r · 12-27-2018, 08:47 PM

Ok thanks for the clarification. And yes exploring patterns through data is totally fine, indeed I posted it in this thread.

Thomas Klemm · 12-27-2018, 10:20 PM

(12-27-2018 04:59 PM)DavidM Wrote: please be kind

Here's a variant of your program:

Code:

import random

def match_count(thelist):

    return sum(1 for blue in

        (sum(thelist[i:i+70])

            for i in range(71))

                if blue == 50)

def main(n):

    thelist = [1]*100 + [0]*40

    result = [0] * 72

    for _ in range(n):

        random.shuffle(thelist)

        result[match_count(thelist)] += 1

    for i, value in enumerate(result[1:], start=1):

        print('%d\t%d' % (i, value))

if __name__ == '__main__':

    main(1000000)

The function match_count uses generator expressions.
And since random.shuffle shuffles in place we can reuse thelist.
You can use enumerate to iterate over a list when both index and value are needed.

Cheers
Thomas

DavidM · 12-27-2018, 11:58 PM

(12-27-2018 10:20 PM)Thomas Klemm Wrote: The function match_count uses generator expressions.
And since random.shuffle shuffles in place we can reuse thelist.
You can use enumerate to iterate over a list when both index and value are needed.

Hi Thomas -

Thanks for the reminders (and great examples) about using generator expressions and enumerate. I've used both of those before in sample exercises I've worked through, but obviously haven't adopted them into my python code-building thoughts yet. Practice, practice, practice.

I'll need more time to analyze your implementation of match_count, though, as its use of the "blue" variable is puzzling me. Your code clearly works, though, so I'm sure you've crafted it carefully (as you always do).

Despite your code being much more compact, it was a bit slower on my laptop (506 seconds vs. 483 seconds for my original version). I may try some additional runs to see how much variance there is in both versions' run times. I'm wondering if garbage collection can account for at least part of the difference.

Thanks for doing this -- it's a great help in my learning process!

Albert Chan · (This post was last modified: 12-28-2018 01:55 AM by Albert Chan.)

Perhaps variable "blue" is unnecessary, since False=0, True=1:

Code:

def match_count(thelist):

    return sum(sum(thelist[i:i+70]) == 50 for i in range(71))

To speed up above, cache sublist sum:

Code:

def match_count(thelist):

    head, tail = thelist[:70], thelist[70:]

    s = sum(head)

    n = s == 50

    for h, t in zip(head, tail):

        s = s - h + t

        if s==50: n = n + 1

    return n

Thomas Klemm · 12-28-2018, 04:04 AM

(12-28-2018 01:38 AM)Albert Chan Wrote:

Code:

def match_count(thelist): return sum(sum(thelist[i:i+70]) == 50 for i in range(71))

Much better. I wasn't happy with the nested generator expressions.

Here's my attempt using Clojure:

Code:

(defn sum [list]

  (apply + list))

(defn match [value]

  (= value 50))

(defn match-count [the-list]

  (->> the-list

       (partition 70 1)

       (map sum)

       (filter match)

       (count)))

(defn -main [n]

  (let [blue (repeat 100 1)

        white (repeat 40 0)

        the-list (concat blue white)]

    (->> (repeatedly n #(match-count (shuffle the-list)))

         (frequencies)

         (into (sorted-map)))))

(-main 100000)

Thanks to the threading macro ->> it reads like a recipe or pipeline.

This is a typical result:

Code:

{1 6092, 2 6052, 3 5967, 4 5922, 5 5705, 6 5669, 7 5560, 8 5332, 9 5097, 10 4820, 11 4703, 12 4333, 13 3988, 14 3816, 15 3463, 16 3096, 17 2920, 18 2514, 19 2232, 20 2034, 21 1763, 22 1525, 23 1315, 24 1186, 25 928, 26 787, 27 681, 28 512, 29 452, 30 379, 31 282, 32 196, 33 180, 34 128, 35 115, 36 73, 37 59, 38 41, 39 34, 40 20, 41 10, 42 10, 43 4, 44 1, 46 2, 47 2}

Cheers
Thomas

ijabbott · (This post was last modified: 12-28-2018 09:10 AM by ijabbott.)

I'm not that fluent in Python, but surely the O(n^2) versions of match_count above could be replaced with an O(n) version to speed it up a bit?

EDIT: Oh, I see Albert Chan has already done it in his version!

DavidM · 12-28-2018, 02:00 PM

(12-28-2018 09:06 AM)ijabbott Wrote: I'm not that fluent in Python, but surely the O(n^2) versions of match_count above could be replaced with an O(n) version to speed it up a bit?

EDIT: Oh, I see Albert Chan has already done it in his version!

Yes, his version nicely speeds up the match_count process. I didn't take the time to absorb it when I first looked at it, but my "aha!" moment occurred when I was thinking about implementing a SysRPL version of this test. As I was working through the possibilities in my head, I thought about rearranging the list being tested to make it easier to keep track of the running totals, and it suddenly hit me: "ahh, that's what Albert was doing with zip!" The light bulb flashed in my head, and I felt a little less confused for a moment. Smile

python's cProfile airs the real "dirty laundry" of this process, though, and it's something that none of our code snippets have addressed: random.shuffle(). If I'm reading the the profile report correctly, it appears that about 92% of the time taken by the fastest version (so far) was spent performing shuffles:

Code:

         4849009 function calls in 3.345 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

        1    0.000    0.000    3.345    3.345 <string>:1(<module>)

        1    0.019    0.019    3.345    3.345 christmas_balls_ac.py:13(main)

    10000    0.229    0.000    0.245    0.000 christmas_balls_ac.py:4(match_count)

      142    0.000    0.000    0.000    0.000 cp1252.py:18(encode)

  1390000    1.520    0.000    2.054    0.000 random.py:224(_randbelow)

    10000    1.024    0.000    3.080    0.000 random.py:264(shuffle)

      142    0.000    0.000    0.000    0.000 {built-in method _codecs.charmap_encode}

        1    0.000    0.000    3.345    3.345 {built-in method builtins.exec}

    10000    0.002    0.000    0.002    0.000 {built-in method builtins.len}

       71    0.001    0.000    0.001    0.000 {built-in method builtins.print}

    10000    0.016    0.000    0.016    0.000 {built-in method builtins.sum}

  1390000    0.117    0.000    0.117    0.000 {method 'bit_length' of 'int' objects}

        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

  2028650    0.418    0.000    0.418    0.000 {method 'getrandbits' of '_random.Random' objects}

The code that caused that result was a combination of Thomas Klemm's main() with Albert Chan's match_count():

Code:

import random

import cProfile

def match_count(thelist):

    head, tail = thelist[:70], thelist[70:]

    s = sum(head)

    n = s == 50

    for h, t in zip(head, tail):

        s = s - h + t

        if s==50: n = n + 1

    return n

def main(n):

    thelist = [1]*100 + [0]*40

    result = [0] * 72

    for _ in range(n):

        random.shuffle(thelist)

        result[match_count(thelist)] += 1

    for i, value in enumerate(result[1:], start=1):

        print('%d\t%d' % (i, value))

if __name__ == '__main__':

    cProfile.run('main(10000)')

(I shortened the runs to 10000 due to my lack of patience)

Albert Chan · (This post was last modified: 12-28-2018 05:56 PM by Albert Chan.)

Hi DavidM

Yes, using random.shuffle may be overkill.
With only 2 kinds of balls, random.sample may be faster.

Code:

def main(n):

    idx = range(140)

    result = [0] * 72

    for _ in range(n):

        thelist = [1] * 140  # all blue balls

        for i in random.sample(idx, 40): thelist[i] = 0

        result[match_count(thelist)] += 1

    for x in zip(idx, result): print("%d\t%d" % x)

brickviking · 12-28-2018, 09:42 PM

(12-26-2018 09:34 PM)pier4r Wrote: The problem: there are 100 blue Christmas balls and 40 white Christmas balls in a line, randomly ordered. There are Anna and Berta that needs balls for their Christmas trees.

They decide that Anna will pick a continuous chunk of the line of length 70, for example between the position 3 and the position 72, the rest will be picked by Berta.

Could they find a way to ensure that Anna and Berta pick both 50 blue balls and 20 white balls? If yes, is a special arrangement of the balls needed to achieve it?

If a 'special arrangement of the balls" is necessary, wouldn't that mean that we couldn't in fact have a randomly ordered sequence after all? I contend (again, without a rigid mathematic proof, 'cos I'm not that bright) that you can't actually arrange the balls specially to 'guarantee' that both Anna and Bertha get their required proportions of Christmas baubles. In saying that, it does appear that in a very small number of sequences, you can actually get a clear run of 70 that matches the criteria, and thereby the remainder would also match the criteria, but only when glommed together into a line; but it's a fairly small percentage of chance.

(Post 324)

pier4r · (This post was last modified: 12-28-2018 09:55 PM by pier4r.)

(12-28-2018 09:42 PM)brickviking Wrote: If a 'special arrangement of the balls" is necessary, wouldn't that mean that we couldn't in fact have a randomly ordered sequence after all?

Maybe I should have clarified that special arrangement means "it works only with some permutations, rather than all". And for your answer, without looking at the answer already given (aside from the one with python and RPL), maybe you can play around with your calculator (or computer, or dice, or whatever instrument you use to generate random runs) to see if your answer matches with the experiments.

3298 · 12-28-2018, 11:21 PM

(12-28-2018 09:42 PM)brickviking Wrote: I contend (again, without a rigid mathematic proof, 'cos I'm not that bright) that you can't actually arrange the balls specially to 'guarantee' that both Anna and Bertha get their required proportions of Christmas baubles.

You can arrange them like that. It's actually pretty easy.

To start, pick the first 70 balls such that 50 of them are blue and 20 are white. That takes care of the case when Anna's chosen range starts at the first one.
Now, let's look at what happens when the range is shifted by one. The n-th ball (where n specifies where the range started before shifting) is put back into the line, and the (n+70)-th ball is removed from it. We want to keep Anna's counts of blue and white balls stable, so just make the n-th and (n+70)-th ball have the same color. Done!
In short, this means the arrangement has to be a pattern of 50 blue balls and 20 white ones repeated twice, which is consistent with the requirement that the entire line shall consist of 100 blue balls and 40 white ones.

As a bonus, this still works when Anna's range is allowed to wrap around. That's not exactly hard to achieve though, because Anna's selection is the inverse of Berta's - when Anna's range wraps, Berta's doesn't, so you could just swap the names.

Albert Chan · (This post was last modified: 12-29-2018 12:24 PM by Albert Chan.)

Hi 3298

Sorry I do not follow the logic ... (balls cannot be "rearranged", see post #331)

Actually match_count function (with head + tail version) already proved it.

Inside the loop: s = s - h + t, where h and t = 0 or 1
Thus s is only allowed changes of 0, ±1

s started as sum(head), ended as sum(tail), this implied s visited every integer within the gap.

If gap = 0, sum(head) = sum(tail) = 50, we found the sublist.
If not, 50 is within the gap of visited integers, s must be 50 at some point.

Thomas Klemm · 12-29-2018, 07:35 AM

(12-27-2018 11:58 PM)DavidM Wrote: I'll need more time to analyze your implementation of match_count, though, as its use of the "blue" variable is puzzling me.

Not sure if that's still the case after Albert's version. Otherwise persist and feel free to ask questions.

Quote:Despite your code being much more compact, it was a bit slower on my laptop (506 seconds vs. 483 seconds for my original version).

Performance wasn't my primary concern. I just thought you might like the more functional approach.
I've never compared the performance differences between list comprehensions and generator functions.
But I try to use the latter when I expect huge sequences of data to avoid loading all of it into memory.

In this context I can recommend Generator Tricks for Systems Programmers.

Cheers and have fun with Python
Thomas

3298 · 12-29-2018, 02:05 PM

(12-29-2018 12:10 AM)Albert Chan Wrote: Sorry I do not follow the logic ... (balls cannot be "rearranged", see post #331)

I described how to generate the "special arrangement" mentioned in the challenge. If you call that "rearranging", or "generating a pattern to look for in order to spot the good arrangements", I don't care. It's the same thing.
In case you are referring to

(12-28-2018 11:21 PM)3298 Wrote: The n-th ball [...] is put back into the line, and the (n+70)-th ball is removed from it.

then I'll point out that this alludes to how Anna is allowed to move balls (by taking them home and leaving the rest for Berta). For this sentence, it's best to imagine that Anna rescinds her decision to pick the range starting at the n-th ball, and picks the range starting at the (n+1)-th one instead. Putting one ball back exactly where it was taken from and grabbing another is what she needs to do to enact this change of mind.

DavidM · 12-29-2018, 04:06 PM

(12-28-2018 03:41 PM)Albert Chan Wrote: Yes, using random.shuffle may be overkill.
With only 2 kinds of balls, random.sample may be faster.
...

It is, noticeably. Full check of 1 million groups completed in 141s on my laptop, compared with 342s for the random.shuffle() version.

(12-28-2018 09:42 PM)brickviking Wrote: If a 'special arrangement of the balls" is necessary, wouldn't that mean that we couldn't in fact have a randomly ordered sequence after all? I contend (again, without a rigid mathematic proof, 'cos I'm not that bright) that you can't actually arrange the balls specially to 'guarantee' that both Anna and Bertha get their required proportions of Christmas baubles. In saying that, it does appear that in a very small number of sequences, you can actually get a clear run of 70 that matches the criteria, and thereby the remainder would also match the criteria, but only when glommed together into a line; but it's a fairly small percentage of chance.

I believe you are reinforcing why I think the original problem description isn't clear enough for us to know exactly what it is we are supposed to be determining. There's a lot of room for (mis)interpretation here, due to the ambiguity of the description. @Pier: this is not meant as a criticism of your original post, it's simply that myself (and probably others) may need more clarification of exactly what problem we're supposed to solve here. As evidence to support this, see the next section.

(12-28-2018 09:53 PM)pier4r Wrote: Maybe I should have clarified that special arrangement means "it works only with some permutations, rather than all".

I think this type of clarification is a good start, as it makes a difference in how the problem is interpreted. Another example: I'm of the belief that all permutations of ball arrangements are included in the set of "randomly ordered" arrangements. So in my mind, any arrangement (including sorted or grouped by some set of rules) needs to be considered within the domain of possible solutions. Without knowing further rules about possible assessments, how can we try to find an answer? An important example: is it valid to suggest that Anna should check every possible contiguous group of 70 balls for any given situation? Or is she only allowed to check one group for a solution? I believe the answer to that particular question changes the nature of this "puzzle" substantially.

(12-29-2018 07:35 AM)Thomas Klemm Wrote:
(12-27-2018 11:58 PM)DavidM Wrote: I'll need more time to analyze your implementation of match_count, though, as its use of the "blue" variable is puzzling me.

Not sure if that's still the case after Albert's version. Otherwise persist and feel free to ask questions.

Quote:Despite your code being much more compact, it was a bit slower on my laptop (506 seconds vs. 483 seconds for my original version).

Performance wasn't my primary concern. I just thought you might like the more functional approach.
I've never compared the performance differences between list comprehensions and generator functions.
But I try to use the latter when I expect huge sequences of data to avoid loading all of it into memory.

In this context I can recommend Generator Tricks for Systems Programmers.

Hey Thomas - thanks again for the pointers. Part of my confusion was cleared when I reformatted the code with fewer line breaks, which then made it easier to see the similarities with Albert's code (which was also quite helpful -- how is it possible that I had not yet seen [or possibly retained] that python booleans can be treated as 0 or 1?).

My comment about the performance was not a criticism, merely a somewhat surprising observation. I find that I more often lose patience with a long-running routine than discovering that I've run into memory constraints, and this exploration certainly fit into that category. Hence my looking at performance issues...

pier4r · (This post was last modified: 12-29-2018 10:05 PM by pier4r.)

David I read your request. I'll answer later below.

(12-27-2018 04:59 PM)DavidM Wrote: The final result of the above is a tab-delimited list of counts and quantities, and the following is a sample of one run:

Code:

1 60726 2 60743 3 60133 4 59444 5 58566 6 57093 7 55206 8 53189 9 51518 10 48693 11 45882 12 43496 13 40126 14 37631 15 34314 16 31327 17 28141 18 25558 19 22785 20 19674 21 17479 22 15290 23 13129 24 11113 25 9502 26 7924 27 6637 28 5407 29 4362 30 3518 31 2719 32 2233 33 1763 34 1291 35 950 36 690 37 509 38 399 39 282 40 177 41 123 42 100 43 54 44 40 45 24 46 25 47 5 48 3 49 4 50 1 51 2 52 0 53 0 54 0 55 0 56 0 57 0 58 0 59 0 60 0 61 0 62 0 63 0 64 0 65 0 66 0 67 0 68 0 69 0 70 0 71 0

Actually I reflected on this report of yours. Nice that you shared it. I realized with delay that having 30 or more (well, actually even 10 or more ) sublists matching is not an easy thing.

(12-29-2018 04:06 PM)DavidM Wrote: I believe you are reinforcing why I think the original problem description isn't clear enough for us to know exactly what it is we are supposed to be determining. There's a lot of room for (mis)interpretation here, due to the ambiguity of the description.

Ok I reread the initial post and I see that the problem could be the following (see bold parts)

Quote:Could they find a way to ensure that Anna and Berta pick both 50 blue balls and 20 white balls? If yes, is a special arrangement of the balls needed to achieve it?

The bold parts seems to suggest the possibility of an action by Anna and Berta. Instead they are quite passive.

In short they get the random line of 140 Christmas balls, Anna goes through them and picks the first continuous chunk of 70 balls that contains 50 blue balls and 20 white balls. Then she says "I am good, Berta you pick the rest".

The special arrangement actually means what I wrote before as clarification. Is Anna finding her chunk only when the line has a certain arrangement of the balls, that is: among all the possible permutations, only a subset of them works; or can she always find a chunk of the line that matches the requirements?

Is anything better now?

This goes giving a +1 for the statement: "No, it is not always the problem that is difficult. It happens often that is poorly worded".

DavidM · 12-30-2018, 01:26 PM

Thanks, Pier. That helps clarify things.

I will stick with my earlier assertions/contentions. Smile

DavidM · 12-31-2018, 04:30 PM

To bring this back into the realm of calculator code, I offer the following SysRPL routine for exploration of various list configurations:

Code:

!NO CODE

!ASM

% standard extable doesn't include these, but they are listed as safe

DC %50_     2FF08

DC RROLL_   2644A

!RPL

::

   ( 1 argument required -- a list )

   CK1NOLASTWD

   CK&DISPATCH1

   list ::

      ( split the list into two 70-element parts )

      DUP

      BINT1 BINT70 SUBCOMP

      SWAP BINT71 140 SUBCOMP

      ( sum the first 70-element list )

      OVER >R

      %0

      BINT70 ZERO_DO

         RSWAP 'R RSWAP

         %+

      LOOP

      RDROP

      ( initial count: sum==50 => 1, otherwise 0 )

      %50_ OVER EQUAL ITE BINT1 BINT0

      ( place both lists on return stack for processing )

      SWAP 2SWAP

      >R >R

      ( update sum, incrementing count for each instance of sum==50 )

      BINT70 ZERO_DO

         BINT3 RROLL_ 'R

         BINT3 RROLL_ 'R

         BINT3 RROLL_

         %- %+

         %50_ OVER EQUAL IT ::

            SWAP #1+ SWAP

         ;

      LOOP

      2RDROP

      ( drop the sum )

      DROP

      ( convert the count from system binary to standard real )

      UNCOERCE

   ;

;

@

This compiles with the built-in compiler (256.06 MENU, execute "ASM"), provided you have the standard extable installed in a port. Store the resulting code into a variable (I named mine "mc" for "match count").

This program assesses a list of real (approximate) numbers where 1. represents blue balls and 0. represents white balls. The result is a count of the number of contiguous 70-ball segments in the list that meet the requirement of containing 50 blue and 20 white balls.

The program is written with the assumption that the list you give it is properly constructed (140 real/approximate numbers -- exact integers aren't allowed). Any other content in the list (different quantity or object type) may cause your calculator to crash.

Why would I create such a fragile program? Because I was more concerned with speed than robustness in this case. Checking the argument's contents is normally a requirement for SysRPL programs that are meant to be general-purpose tools, but I opted to forego the quantity and type-checking this time to make it faster, leaving that responsibility to other programs that may call this one.

This routine will check all 71 sublists in a properly-formatted list argument in less than 0.2s on my 50g, which makes checking a variety of list arrangements possible in a reasonable time period. An emulated 50g on my laptop executes the same code in about 0.0045s.

Albert Chan · 12-31-2018, 07:59 PM

(12-27-2018 04:59 PM)DavidM Wrote: I suspect the probability of randomly generating a list that has 71 "matches" is quite small.

To get 71 matches, first half must have 50 blues.
Tried 5 runs of million random samples, P(first half has 50 blues) ~ 14.84%

The other half guaranteed 50 blue balls, but only 1 combination work. (when first half = second half)

P(71 matches) = 14.84% / nCr(70,50) ~ 9.17e-19