SimpleVM Part I

Okay. Not gonna lie, I’m frustrated with SimpleVM as a project right now. Granted, I undertook it out of ignorance as an effort to learn something, but the fact that my initial poor design is now making the entire system messy is just a tiny bit frustrating.

The hard part is the current feature branch, Malloc[/Realloc]/Free, which is supposed to add a working heap memory of sorts to SimpleVM. There are a few major issues here: the design of the heap, where heap memory is accessible from, and how the heap relates to the registers around which SimpleVM was originally designed.

SimpleVM was originally designed to allow a developer or compiler to assign to and operate on arbitrary register addresses in the positive half of the signed integer space. This is all well and good, except that it makes arrays and array operations laborious and unsafe. Because SVM code can LET to arbitrary register addresses, there is no way to really protect the registers’ integrity. The entire system is forced to place complete trust in the developer and/or compiler.

Take one at a solution was to employ a segmented RAM model whereby, as in more abstract languages like Python and Java, programmers would use symbolic names for addresses and variables. These symbolic names would be managed by the VM automatically and all would be well. However, that is really abstract and not really a hardware-level feature, so it got thrown out early in SVM’s design phase.

Take two was the Malloc[/Realloc]/Free branch, which used a set of arrays neither accessible from nor contained within the “normal” register address space. Calling MALLOC would set a register to the index of a chunk of raw C memory specially allocated for that MALLOC invocation. REALLOC, FREE and a dereference operator would allow this model to behave properly, even permitting pointers in MALLOC’d space.
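To make take two concrete, here is a rough sketch in C of what a chunk-index heap could look like. This is not the actual SimpleVM code: the names (Chunk, MAX_CHUNKS, svm_malloc, svm_free, svm_deref), the fixed table size and the int-sized cells are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch only; none of these names come from SimpleVM. */
#define MAX_CHUNKS 64            /* assumed fixed-size chunk table */

typedef struct {
    int   *data;                 /* raw C memory for this chunk, NULL if unused */
    size_t len;                  /* number of int cells in the chunk */
} Chunk;

static Chunk chunks[MAX_CHUNKS]; /* lives outside the register address space */

/* MALLOC: claim the first free slot and return its index (the "handle"),
 * or -1 if the table is full or the C allocation fails. */
int svm_malloc(size_t len) {
    for (int h = 0; h < MAX_CHUNKS; h++) {
        if (chunks[h].data == NULL) {
            /* allocate at least one cell so a live chunk is never NULL */
            chunks[h].data = calloc(len ? len : 1, sizeof(int));
            if (chunks[h].data == NULL) return -1;
            chunks[h].len = len;
            return h;
        }
    }
    return -1;
}

/* FREE: release the chunk and mark its slot as unused. */
void svm_free(int h) {
    if (h < 0 || h >= MAX_CHUNKS) return;
    free(chunks[h].data);
    chunks[h].data = NULL;
    chunks[h].len = 0;
}

/* Dereference: pointer to cell i of chunk h, or NULL if h or i is invalid. */
int *svm_deref(int h, size_t i) {
    if (h < 0 || h >= MAX_CHUNKS || chunks[h].data == NULL || i >= chunks[h].len)
        return NULL;
    return &chunks[h].data[i];
}
```

Note that the value a register holds after svm_malloc is a slot number in the chunk table, not a position in one linear memory. Adding 1 to it names a different chunk (or nothing at all), not the next cell of the same chunk.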

The issue with this design is in pointer offset arithmetic. The way that C and every machine with pointers I know of operates is that if B is an array of length 2, then in terms of raw addresses B[0] = *(B+0) and B[1] = *(B+SIZEOF(TYPEOF(B[0]))). Because the addresses which scheme #2 uses are NOT in fact addresses on a big linear chunk of memory, but rather are indices which CANNOT BE EDITED WITHOUT BREAKING THINGS, this kind of offset arithmetic does not work.
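For reference, here is the offset rule spelled out in plain C. The helper name element_address is made up for this sketch, and converting pointers to integers like this assumes a flat address space, which holds on mainstream platforms but is implementation-defined as far as the C standard goes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the byte address of element i of an int array is
 * the base address plus i times the element size. Assumes a flat address
 * space where pointer-to-integer conversion preserves byte offsets. */
uintptr_t element_address(const int *base, size_t i) {
    return (uintptr_t)base + i * sizeof(*base);
}
```

With int B[2] = {10, 20}, element_address(B, 1) should come out equal to (uintptr_t)&B[1], and C's own B + 1 does the same sizeof scaling for you behind the scenes. That only works because B is an address in one linear memory; a chunk index has no such property.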

After talking to Prof. Witchel about pulling values from memory on an ix86, it is obvious to me that most machines on the market today spend a LOT of time/instructions moving data between RAM and processor registers. The best thing to do, therefore, is to adopt an ix86-like design scheme whereby registers are the EXCEPTION, not the NORM, and where pointer arithmetic on RAM addresses does in fact work.

DESIGN DECISION, 10k feet overview: use RAM, not registers. To this end, reduce the number of registers to which programmers have access, and improve and focus on the use of RAM as the primary storage mechanism.
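A minimal sketch of what "RAM first" might look like, assuming one flat array of int cells addressed by cell number rather than by byte. The names (ram, RAM_CELLS, ram_store, ram_load) and the size are invented for illustration and nothing here is settled design.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch only; names and sizes are assumptions. */
#define RAM_CELLS 1024           /* assumed RAM size, in int-sized cells */

static int ram[RAM_CELLS];       /* one big linear chunk of memory */

/* Store value at a cell address. Returns 0 on success, -1 if out of range. */
int ram_store(size_t addr, int value) {
    if (addr >= RAM_CELLS) return -1;
    ram[addr] = value;
    return 0;
}

/* Load the value at a cell address into *out. Returns 0 on success,
 * -1 if the address is out of range (in which case *out is untouched). */
int ram_load(size_t addr, int *out) {
    if (addr >= RAM_CELLS) return -1;
    *out = ram[addr];
    return 0;
}
```

Because every address is just an offset into the same array, an array based at address b has its elements at b + 0, b + 1 and so on, so offset arithmetic works. The bounds check also gives the VM one place to refuse bad addresses instead of trusting the developer or compiler completely.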