Internal Object format
07-30-2021, 03:24 PM
Post: #2
RE: Internal Object format
I'll take the silence to mean no. So for the record, here is what I have so far. Note that there are two layouts: the in-memory layout and the serialized layout on disk. Differences are pointed out where relevant. Data is stored on disk in little-endian format.

The basic layout for all objects is:
Code:

struct THPObj {
  uint16_t reference_count; 
  // definition of flag bits varies with type,
  // but the least significant bit indicates a heap allocated object 
  uint8_t flags:4;
  ObjTag tag:4; // Indicates the type of this object
};

enum ObjTag {
  TAG_REAL = 0,
  TAG_INT = 1,
  TAG_STRING = 2,
  TAG_COMPLEX = 3,
  TAG_MATRIX = 4,
  TAG_ERROR = 5,
  TAG_LIST = 6,
  TAG_IDENT = 7,
  TAG_FUNC_CALL = 8,
  TAG_UNIT = 9,
  TAG_INSTRUCTION_SEQUENCE = 10, // Just a list with special semantics
  TAG_USERFUNC = 11,
  TAG_LIST_PROCESSOR = 12,
  TAG_EVALUATOR_REQUEST = 13,
  TAG_GEN = 14 // Wrapper around a giac::gen object
};
For clarity, since bit-field ordering is implementation-defined, the flags occupy the high 4 bits of the 3rd byte, while the tag occupies the low 4 bits.
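To make the byte layout concrete, here is a minimal sketch of decoding the 3-byte header from serialized bytes without relying on compiler bit-field ordering. The names HeaderView and decode_header are hypothetical helpers of mine, not anything from the firmware, and I'm assuming the heap-allocated bit reads the same way in a serialized header as it does in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded view of the 3-byte object header (hypothetical helper type).
struct HeaderView {
    uint16_t reference_count;
    uint8_t flags;  // high 4 bits of the 3rd byte
    uint8_t tag;    // low 4 bits of the 3rd byte (an ObjTag value)

    // Least significant flag bit: heap allocated object.
    bool heap_allocated() const { return (flags & 0x1) != 0; }
};

// Decode a header from raw bytes; the reference count is little-endian.
inline HeaderView decode_header(const uint8_t* bytes, std::size_t len) {
    assert(bytes != nullptr && len >= 3);
    HeaderView h;
    h.reference_count = static_cast<uint16_t>(bytes[0] | (bytes[1] << 8));
    h.flags = static_cast<uint8_t>(bytes[2] >> 4);
    h.tag = static_cast<uint8_t>(bytes[2] & 0x0F);
    return h;
}
```

For example, the bytes FF FF 12 would decode to a reference count of 0xFFFF, flags of 1, and a tag of 2 (TAG_STRING).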

In the serialized format, reference count values of 0xFFFF and 0xFFFE sometimes act as special indicators, but I haven't figured them all out yet.
Reals:
Code:

struct Real: public THPObj{
  int8_t sign_stuff;
  int32_t exponent;
  uint64_t mantissa; // Packed BCD
};
A flag value of 2 causes the number to be displayed in DMS (degrees, minutes, seconds) format.
For sign_stuff: 0 = NaN, 1 = positive normal, 2 = +Inf, -1 = negative normal, and -2 = -Inf.
Also, it looks like only the top 56 bits of the mantissa are used, but I haven't done much digging.
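As a rough illustration of the packed BCD mantissa, the sketch below pulls out the leading digits (14 by default, matching the top 56 bits) and names the sign_stuff values listed above. The helper names bcd_digits and describe_sign are hypothetical. I have not worked out how the exponent positions the decimal point, so this deliberately stops at the raw digits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Return the first `ndigits` packed-BCD digits of the mantissa,
// most significant nibble first. Nibbles above 9 are shown as '?'.
inline std::string bcd_digits(uint64_t mantissa, int ndigits = 14) {
    assert(ndigits >= 0 && ndigits <= 16);
    std::string out;
    for (int i = 0; i < ndigits; ++i) {
        const unsigned nibble =
            static_cast<unsigned>((mantissa >> (60 - 4 * i)) & 0xF);
        out.push_back(nibble < 10 ? static_cast<char>('0' + nibble) : '?');
    }
    return out;
}

// Map sign_stuff to the meanings observed so far.
inline const char* describe_sign(int8_t sign_stuff) {
    switch (sign_stuff) {
        case 0:  return "NaN";
        case 1:  return "+normal";
        case 2:  return "+Inf";
        case -1: return "-normal";
        case -2: return "-Inf";
        default: return "unknown";
    }
}
```

With a mantissa of 0x1234567890123400, bcd_digits returns the 14 digits 12345678901234 and ignores the low byte.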
Ints:
Code:

struct Integer: public THPObj {
  int8_t num_bits; // range of [-64, 64]; negative values indicate a signed value
  uint8_t padding[4];
  uint64_t data;
};
Flag bits appear to be used to change the displayed base of the number, but I haven't bothered mapping them all out yet.
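Here is a small sketch of how num_bits and data could be combined into a usable value. It assumes the value sits in the low |num_bits| bits of data and that signed values are two's complement; I haven't verified either, and normalize_integer is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Reduce `data` to |num_bits| bits, sign-extending to 64 bits when
// num_bits is negative (assumed two's complement).
inline uint64_t normalize_integer(int8_t num_bits, uint64_t data) {
    assert(num_bits >= -64 && num_bits <= 64);
    const int width = num_bits < 0 ? -num_bits : num_bits;
    if (width == 0) return 0;
    if (width == 64) return data;  // full width, nothing to mask
    const uint64_t mask = (uint64_t{1} << width) - 1;
    uint64_t value = data & mask;
    const bool sign_bit_set = ((value >> (width - 1)) & 1) != 0;
    if (num_bits < 0 && sign_bit_set) value |= ~mask;  // sign-extend
    return value;
}
```

The caller would cast the result to int64_t when num_bits is negative, so an 8-bit signed 0xFF reads as -1 while an 8-bit unsigned 0xFF stays 255.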

Strings:
Code:

struct String: public THPObj {
  uint8_t padding;
  TSize num_chars;
  TChar data[];
};
In memory, TChar = char32_t, while on disk it is char16_t.
TSize = uint32_t in memory, and also on disk when the ref count is 0xFFFF.
On disk, when the ref count is not 0xFFFF, TSize = uint16_t.
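The on-disk rules for strings can be sketched as a small parser. This assumes the serialized fields are packed back to back with no extra alignment and that the padding byte is also present on disk; neither is confirmed. parse_serialized_string is a hypothetical name, and the asserts stand in for real bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Parse a serialized String object; `p` points at its 3-byte header.
inline std::u16string parse_serialized_string(const uint8_t* p, std::size_t len) {
    assert(p != nullptr && len >= 6);
    const uint16_t ref_count = static_cast<uint16_t>(p[0] | (p[1] << 8));
    std::size_t pos = 4;  // 3 header bytes + 1 padding byte (assumed on disk)

    // TSize is 4 bytes when the ref count is 0xFFFF, otherwise 2 bytes.
    const std::size_t size_bytes = (ref_count == 0xFFFF) ? 4 : 2;
    assert(len >= pos + size_bytes);
    uint32_t num_chars = 0;
    for (std::size_t i = 0; i < size_bytes; ++i)
        num_chars |= static_cast<uint32_t>(p[pos + i]) << (8 * i);
    pos += size_bytes;

    // Characters are little-endian char16_t on disk.
    assert((len - pos) / 2 >= num_chars);
    std::u16string out;
    for (uint32_t i = 0; i < num_chars; ++i) {
        out.push_back(static_cast<char16_t>(p[pos] | (p[pos + 1] << 8)));
        pos += 2;
    }
    return out;
}
```

An in-memory reader would differ in two ways: TSize is always uint32_t and each character is a char32_t.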

Func Calls:
This info applies to the in-memory format only; I haven't looked much at the serialized format.
Code:

struct FunctionCall: public THPObj {
  uint8_t param_count;
  uint8_t padding[4]; // Probably only on 64-bit, but haven't verified.
  THPObj* unknown_obj;
  void* definition; // pointer to some kind of data structure which describes the function.
  THPObj* params[];
};
Note that in the serialized format, the list of parameter pointers is replaced with the serialized parameters themselves.


Messages In This Thread
Internal Object format - devin122 - 07-27-2021, 08:57 PM
RE: Internal Object format - devin122 - 07-30-2021, 03:24 PM
RE: Internal Object format - jfelten - 08-02-2021, 05:15 PM
RE: Internal Object format - devin122 - 08-04-2021, 06:45 PM
RE: Internal Object format - rprosperi - 08-04-2021, 07:24 PM
RE: Internal Object format - Tim Wessman - 08-04-2021, 11:58 PM
RE: Internal Object format - rprosperi - 08-02-2021, 10:43 PM