Ugh. I'm getting more and more forgetful... splitting more than a single bucket was the purpose of weaving this into the GCD routine. But as it turns out,
all of these buckets need swapping of their respective halves. Sooo... congratulations, you found another bug in my code!
(05-09-2021 01:21 AM)Albert Chan Wrote: The idea of getting buckets from gcd(2n,d) does not have much to back it up.
What this is supposed to achieve is the mod4 splitting you're doing in the recursive function, just generalized a bit and ahead of time so as not to bog the recursive function down. (It's enough of a hot-spot, I can tell as much without a profiler.) On 60 this splitting is upgraded into mod8 instead of mod4, because 60 is divisible by 4 but not by 8.
Let me show you how it's supposed to work. I'll use 30 for a direct comparison, where it's just the mod4 rule you developed.
GCD(n,d) buckets are:
- 1: {1 7 11 13 17 19 23 29}
- 2: {2 4 8 14 16 22 26 28}
- 3: {3 9 21 27}
- 6: {6 12 18 24}
- 5: {5 25}
- 10: {10 20}
- 15: {15}
- 30: {} (empty because it's the base)
I've deliberately sorted these not in increasing order of GCD, but by presence of prime factors of 30 in each GCD (top level sort: prime factor 5 present only in the last four; then prime factor 3, and finally 2). I've also listed an empty bucket for 30, because technically that can be assembled from the prime factors of 30 too, and it makes the sorting by prime factors cleaner.
GCD(2n,d) buckets are (new buckets from doubling n are indented further; sorting is the same):
- 1: {1 7 11 13 17 19 23 29}
- 2: {2 14 22 26}
- - 4: {4 8 16 28}
- 3: {3 9 21 27}
- 6: {6 18}
- - 12: {12 24}
- 5: {5 25}
- 10: {10}
- - 20: {20}
- 15: {15}
- 30: {} (empty because it's the base)
- - 60: {} (also empty because it's larger than the base)
30 is a 4k+2 base, so d4, d8, d12, d16, d20, d24, d28 need to be = 2 (mod 4), i.e. 2, 6, 10, 14, 18, 22, 26. That's exactly what's left in the buckets 2, 6, 10 together after splitting them up by doubling n.
d2, d6, d10, d14, d18, d22, d26 need to be = 0 (mod4), i.e. 4, 8, 12, 16, 20, 24, which is what's in the new buckets 4, 12, 20.
Voila, on n = 30 doing GCD(2n, d) nets us the same split that comes from the mod4 rule.
However, as you can easily see, on these split buckets the contents of each corresponds to the digit indices of the
other one, instead of its own. That is compensated by swapping them after filling, so I can later simply look them up using the recursion depth counter. (There would be an extra indirection through the list of GCDs by digit, but my code flattens that away too before entering the recursive function.)
Example for that: when looping through the possibilities on the 18th digit (d18 in your nomenclature), I pull up bucket GCD(2n, 18) which would normally be bucket 6, but swapping makes me hit the bucket I generated for 12, which contains {12 24}. Those are the digits that actually fit for d18, because they were in the bucket 6 on the old GCD(n, d) table
and match the mod4 rule.
---
The generalization to mod8 for 8k+4, mod16 for 16k+8, etc. uses the same principle that these necessarily even digits must be preceded by an odd digit. Let's use the variable p to denote the power of 2 that still divides the base, so that makes the generalization a (mod 2p)-rule for the two variants (2pk) and (2pk+p). Adapting your equations, because you can write this stuff up better than me (I hope I didn't slip up somewhere in these equations):
If n = 2pk:
\(d_{2p-1}\cdot(2pk) + d_{2p} = 0 \pmod{2p}\)
\(d_{2p} = 0 \pmod{2p}\) → \(d_{2p} = d_{4p} = d_{6p} = ... = 0 \pmod{2p}\) → \(d_{p} = d_{3p} = d_{5p} = ... = p \pmod{2p}\)
That part was uninteresting because clearly n is divisible by 2p, so we got a GCD bucket for 2p anyway, even without doubling the base thrown into the GCD algorithm.
If n = 2pk+p:
\(d_{2p-1}\cdot(2pk+p) + d_{2p} = 0 \pmod{2p}\)
\(2\cdot d_{2p-1} + d_{2p} = 0 \pmod{2p}\)
\(d_{2p}/2 = -d_{2p-1} = p/2 \pmod{p}\)
\(d_{2p} = p \pmod{2p}\) → \(d_{2p} = d_{4p} = d_{6p} = ... = p \pmod{2p}\) → \(d_{p} = d_{3p} = d_{5p} = ... = 0 \pmod{2p}\)
This is the part that is captured by doing GCD(2n, d) instead of GCD(n, d). We throw one additional 2 into the pot of prime factors available to build GCDs out of, and that splits the p bucket into p and 2p in line with this rule, as well as 3p into 3p and 6p, 5p into 5p and 10p, etc.
I just need to swap all of these split buckets, not only p and 2p as the code in my previous post does, but also e.g. 3p and 6p. Solid theoretical foundation (I think), sloppy practical implementation. It's not helpful that our only found solutions for this puzzle are on such low bases, because they don't trigger the bug, and on the higher ones there are no solutions that could be skipped due to it.
Edit: bugfix was simple, just loop over multiples of highest_2power_gcd, and check if there's actually a bucket at that index.
Code:
#include <assert.h>
#include <malloc.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef unsigned int singledata_t;
typedef unsigned long long doubledata_t;
static_assert (sizeof(singledata_t) * 2 == sizeof(doubledata_t));
typedef struct {
unsigned int len;
singledata_t data[];
} nearinflen_uint_t;
#define NEARINFLEN_UINT_MALLOC(len) ((nearinflen_uint_t *) \
malloc(sizeof(nearinflen_uint_t) + sizeof(singledata_t) * (len)))
#define NEARINFLEN_UINT_SHIFT (sizeof(singledata_t) * 8)
static inline nearinflen_uint_t *nearinflen_uint_zero() {
nearinflen_uint_t *n = NEARINFLEN_UINT_MALLOC(1);
assert(n != NULL);
n->len = 1;
n->data[0] = 0;
return n;
}
static inline bool nearinflen_uint_iszero(nearinflen_uint_t *n) {
int i = 0;
for (; i < n->len; ++i)
if (n->data[i] != 0) return false;
return true;
}
static inline void nearinflen_uint_release(nearinflen_uint_t *n) {
free(n);
}
static nearinflen_uint_t *nearinflen_uint_add(nearinflen_uint_t *n, singledata_t m) {
nearinflen_uint_t *res = NEARINFLEN_UINT_MALLOC(n->len + 1);
assert(res != NULL);
unsigned int i = 0;
doubledata_t carry = m;
for (; i < n->len && carry != 0; ++i) {
carry += n->data[i];
res->data[i] = (singledata_t) carry;
carry >>= NEARINFLEN_UINT_SHIFT;
}
if (carry != 0) {
res->data[n->len] = (singledata_t) carry;
res->len = n->len + 1;
} else {
res->len = n->len;
memcpy(res->data + i, n->data + i, sizeof(singledata_t) * (n->len - i));
}
return res;
}
static nearinflen_uint_t *nearinflen_uint_mul(nearinflen_uint_t *n, singledata_t m) {
nearinflen_uint_t *res = NEARINFLEN_UINT_MALLOC(n->len + 1);
assert(res != NULL);
unsigned int i = 0;
doubledata_t carry = 0;
for (; i < n->len; ++i) {
carry += ((doubledata_t) n->data[i]) * m;
res->data[i] = (singledata_t) carry;
carry >>= NEARINFLEN_UINT_SHIFT;
}
if (carry != 0) {
res->data[n->len] = (singledata_t) carry;
res->len = n->len + 1;
} else res->len = n->len;
return res;
}
static singledata_t nearinflen_uint_div2(nearinflen_uint_t *n, singledata_t m,
nearinflen_uint_t **res) {
if (res != NULL) {
*res = NEARINFLEN_UINT_MALLOC(n->len);
assert(*res != NULL);
memset((*res)->data, 0, sizeof(singledata_t) * n->len);
}
unsigned int i = n->len - 1;
doubledata_t rem = n->data[i];
if (m > rem) {
if (i == 0) {
if (res != NULL) (*res)->len = 1;
return n->data[0];
}
--i;
rem = (rem << NEARINFLEN_UINT_SHIFT) | n->data[i];
}
if (res != NULL) (*res)->len = i + 1;
doubledata_t m_shifted = m, bit = 1;
for (; rem >= m_shifted; bit <<= 1, m_shifted <<= 1);
while (bit != 1 || i != 0) {
if (bit == 1) {
bit <<= NEARINFLEN_UINT_SHIFT;
m_shifted <<= NEARINFLEN_UINT_SHIFT;
--i;
rem = (rem << NEARINFLEN_UINT_SHIFT) | n->data[i];
}
bit >>= 1;
m_shifted >>= 1;
if (rem >= m_shifted) {
rem -= m_shifted;
if (res != NULL) (*res)->data[i] |= bit;
}
}
return (singledata_t) rem;
}
typedef unsigned long long bitset_t;
typedef struct output_digit_t output_digit_t;
struct output_digit_t {
singledata_t digit;
output_digit_t *next;
};
singledata_t base;
bitset_t *valid_digits;
static void try_digits(nearinflen_uint_t *n, singledata_t i, bitset_t not_taken) {
bitset_t valid = not_taken & valid_digits[i];
singledata_t digit = i - nearinflen_uint_div2(n, i, NULL);
for (; digit < base; digit += i) {
bitset_t digit_bitset = 1 << (digit - 1);
if ((valid & digit_bitset) == 0) continue;
nearinflen_uint_t *tmp = nearinflen_uint_add(n, digit);
if (i + 1 != base) {
nearinflen_uint_t *tmp2 = nearinflen_uint_mul(tmp, base);
nearinflen_uint_release(tmp);
try_digits(tmp2, i + 1, not_taken & ~digit_bitset);
continue;
}
/* this destroys digit and n, but we cannot have
two results that only differ in the last digit anyway,
so print and break out */
nearinflen_uint_release(n);
digit = nearinflen_uint_div2(tmp, base, &n);
bool last;
output_digit_t *out = NULL, *prev;
do {
last = nearinflen_uint_iszero(n);
prev = out;
out = malloc(sizeof(output_digit_t));
assert(out != NULL);
out->next = prev;
out->digit = digit;
nearinflen_uint_release(tmp);
tmp = n;
digit = nearinflen_uint_div2(tmp, base, &n);
} while (!last);
nearinflen_uint_release(tmp);
do {
last = out->next == NULL;
printf("%u%c", out->digit, last ? '\n' : ';');
prev = out;
out = out->next;
free(prev);
} while (!last);
break;
}
nearinflen_uint_release(n);
}
static inline singledata_t gcd(singledata_t big, singledata_t small) {
do {
singledata_t tmp = big % small;
big = small;
small = tmp;
} while (small != 0);
return big;
}
int main(int argc, char **argv) {
if (argc < 2) {
printf("Usage: %s <num>\n", argv[0]);
return 1;
}
base = (singledata_t) atoi(argv[1]);
if (base < 2 || base > 65) {
printf("Invalid input, must be an integer between 2 and 65\n");
return 1;
}
if ((base & 1) != 0) return 0;
valid_digits = calloc(base, sizeof(bitset_t));
assert(valid_digits != NULL);
singledata_t *gcd_by_pos = calloc(base, sizeof(singledata_t));
assert(gcd_by_pos != NULL);
singledata_t i = 1;
bitset_t bit = 1;
for (; i < base; ++i, bit <<= 1) {
singledata_t tmp = gcd(i, base * 2);
gcd_by_pos[i] = tmp;
valid_digits[tmp] |= bit;
}
singledata_t highest_2power_gcd = 1;
while (base % highest_2power_gcd == 0) highest_2power_gcd *= 2;
for (i = highest_2power_gcd; i < base; i += highest_2power_gcd) {
bitset_t tmp = valid_digits[i];
if (tmp != 0) {
valid_digits[i] = valid_digits[i / 2];
valid_digits[i / 2] = tmp;
}
}
for (i = 1; i < base; ++i) valid_digits[i] = valid_digits[gcd_by_pos[i]];
free(gcd_by_pos);
nearinflen_uint_t *zero = nearinflen_uint_zero();
try_digits(zero, 1, ~((bitset_t) 0));
free(valid_digits);
return 0;
}
Code:
::
CK1NoBlame FPTR2 ^CK1Z
DUP FPTR2 ^Z># DUP #2-
BINT63 #> caseSIZEERR
DUPONE #AND #0<> case2DROP
WORDSIZE 1LAMBIND
ERRSET ::
BINT64 dostws BINT1 #>HXS
OVER ONE_DO
DUP ONE{}N 4UNROLL
DUP4UNROLL bitSL
LOOP
DROP' ::
OVER #2* #3+ GETLAM
4PICK bitAND UNROT
3GETLAM FPTR2 ^QMul 1GETLAM
3PICK 3PICKOVER FPTR2 ^#>Z
FPTR2 ^Mod FPTR2 ^Z># #-
DO
INDEX@ #2* #2+ GETLAM
4PICKOVER bitAND
OVER HXS==HXS %0=
ITE_DROP ::
bitNOT 5PICK bitAND
3PICK#1+_ 3PICK INDEX@
FPTR2 ^#>Z FPTR2 ^QAdd
OVER 1GETLAM #<>case
2GETEVAL
ROTDROPSWAP
#2* #2* #3- UNROLL
;
OVER +LOOP
4DROP
;
SWAP' NULLLAM OVER #2* #1+
NDUPN DOBIND
1GETLAM ONE_DO
1GETLAM #2* INDEX@
BEGIN
SWAPOVER #/ DROP
#0=UNTIL
DROP #2* #3+ DUP GETLAM
INNERCOMP INDEX@ ROTOVER
#2* #2+ GETLAM bitOR
ROT#1+ {}N SWAP PUTLAM
LOOP
BINT1
BEGIN
#2*
1GETLAM OVER #/ DROP
#0<> UNTIL
1GETLAM OVER#>
IT ::
1GETLAM OVER DO
INDEX@ #2* #3+
DUP GETLAM INNERDUP #1=
ITE 3DROP ::
INDEX@ #3+
DUP GETLAM INNERDUP
#4+PICK SWAPROT
OVER #4+ UNPICK_
{}N SWAP PUTLAM
{}N SWAP PUTLAM
;
DUP +LOOP
;
DROP
1GETLAM ONE_DO
INDEX@ #2* #3+ GETLAM
DUPTYPEHSTR?
ITE_DROP ::
INNERCOMP ONE_DO
DUPROT #2* #3+ PUTLAM
LOOP
DROP
;
LOOP
BINT0 #>HXS bitNOT BINT1 Z0_
2GETEVAL
ABND
; ERRTRAP ::
1GETLAM dostws ERRJMP
;
1GETABND dostws
;
Times haven't changed significantly, and results are (as expected) identical. The 50g takes 111.6_s for 24 and 377.9_s for 22, which is basically the same as the previously reported times. The laptop still does 60 in about a second, 56 and lower in at most a handful of minutes (I paid a little more attention to the clock this time around, 52 takes the crown at about 7 minutes), and 58 shouldn't change at all from its 2.5 hours, since the bugfix has no effect on numbers of the format 2*prime (the new loop just runs from 4 to n in steps of 4 and encounters a bucket only on 4 itself, which the buggy code took care of too). I would be very surprised if this new loop took more than a few microseconds.