SimpleVM Part I

23 Jan 2012

Okay. Not gonna lie, I’m frustrated with SimpleVM as a project right now. I understand that I undertook it out of ignorance, as an effort to learn something, but the fact that my initial poor design is now making the entire system messy is just a tiny bit…

The hard part is the current feature branch, Malloc[/Realloc]/Free, which is supposed to add a working heap of sorts to SimpleVM. There are a few major issues here: the design of the heap itself, where heap memory is accessible from, and how the heap relates to the registers around which SimpleVM was originally designed.

SimpleVM was originally designed to allow a developer or compiler to assign to and operate on arbitrary register addresses in the positive half of the signed integer space. This is all well and good, except that it makes arrays and array operations laborious and unsafe. Because SVM code can LET to arbitrary register addresses, there is no real way to protect the registers’ integrity: nothing stops a stray LET from overwriting an element of an array that happens to live in registers. The entire system is forced to place complete trust in the developer and/or compiler.
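To make the problem concrete, here is a minimal C sketch of that register model. It is not SVM’s actual code. It assumes 64-bit registers, `int32_t` register addresses, and a register file that grows whenever LET targets a register past its current end. `RegFile`, `reg_let` and `reg_get` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the original register model: any non-negative
 * int32_t is a valid register address, and the file grows on demand. */
typedef struct {
    int64_t *regs;   /* register contents */
    size_t   count;  /* number of registers currently allocated */
} RegFile;

/* LET: write value to register addr, growing the file if needed. */
static void reg_let(RegFile *rf, int32_t addr, int64_t value) {
    assert(addr >= 0);  /* only the positive half of the int space is usable */
    if ((size_t)addr >= rf->count) {
        size_t new_count = (size_t)addr + 1;
        int64_t *grown = realloc(rf->regs, new_count * sizeof *grown);
        assert(grown != NULL);
        for (size_t i = rf->count; i < new_count; i++)
            grown[i] = 0;  /* newly added registers start at zero */
        rf->regs = grown;
        rf->count = new_count;
    }
    rf->regs[addr] = value;
}

/* Read a register; registers never written read as 0. */
static int64_t reg_get(const RegFile *rf, int32_t addr) {
    assert(addr >= 0);
    return (size_t)addr < rf->count ? rf->regs[addr] : 0;
}
```

Suppose an “array” is laid out at r100 through r102. An unrelated LET to r101 overwrites element 1 without complaint, and nothing in the register file can tell an array element from any other register.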

Take one on a solution was a segmented RAM model in which, as in more abstract languages like Python and Java, programmers would use symbolic names for addresses and variables. The VM would manage these symbolic names automatically and all would be well. However, that is not really a hardware-level feature. It is far too abstract, and it got thrown out early in SVM’s design phase.

Take two was the Malloc[/Realloc]/Free branch. It used a set of arrays that were neither accessible from nor contained within the “normal” register address space. Calling MALLOC would set a register to the index of a chunk of raw C memory allocated specifically for that MALLOC invocation. REALLOC, FREE and a dereference operator would let this model behave properly, even permitting pointers stored in MALLOC’d space.
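Here is a rough C sketch of scheme #2. It illustrates the idea under assumptions and is not the branch’s code. Chunks hold 64-bit cells, and the handle table has a fixed 256 slots. `svm_malloc`, `svm_realloc`, `svm_free` and `svm_deref` are hypothetical names standing in for MALLOC, REALLOC, FREE and the dereference operator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of scheme #2: heap chunks live in a table outside the
 * register space, and MALLOC hands back an index (handle) into that table. */
#define MAX_CHUNKS 256

typedef struct {
    int64_t *data;  /* raw C memory for this chunk; NULL if the slot is free */
    size_t   len;   /* number of 64-bit cells in the chunk */
} Chunk;

static Chunk heap[MAX_CHUNKS];  /* zero-initialized: every slot starts free */

static void check_handle(int32_t h) {
    assert(h >= 0 && h < MAX_CHUNKS && heap[h].data != NULL);
}

/* MALLOC: returns the first free handle, or -1 if the table is full. */
static int32_t svm_malloc(size_t len) {
    assert(len > 0);
    for (int32_t h = 0; h < MAX_CHUNKS; h++) {
        if (heap[h].data == NULL) {
            heap[h].data = calloc(len, sizeof(int64_t));
            assert(heap[h].data != NULL);
            heap[h].len = len;
            return h;
        }
    }
    return -1;
}

/* REALLOC: resizes the chunk behind a handle. The handle itself does not
 * change, and any newly added cells are zeroed. */
static void svm_realloc(int32_t h, size_t len) {
    check_handle(h);
    assert(len > 0);
    int64_t *p = realloc(heap[h].data, len * sizeof(int64_t));
    assert(p != NULL);
    for (size_t i = heap[h].len; i < len; i++)
        p[i] = 0;
    heap[h].data = p;
    heap[h].len = len;
}

/* FREE: releases the chunk and makes its slot reusable. */
static void svm_free(int32_t h) {
    check_handle(h);
    free(heap[h].data);
    heap[h].data = NULL;
    heap[h].len = 0;
}

/* Dereference: address of cell i in chunk h, bounds-checked. */
static int64_t *svm_deref(int32_t h, size_t i) {
    check_handle(h);
    assert(i < heap[h].len);
    return &heap[h].data[i];
}
```

Storing one handle inside another MALLOC’d chunk is how pointers-in-the-heap work under this scheme. The catch is that a handle can only be followed; it can never be offset.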

The issue with this design is pointer offset arithmetic. In C, and on every machine with pointers I know of, if B is an array of length 2 then B[0] = *(B+0), and B[1] = *(B+SIZEOF(TYPEOF(B[0]))) when the offset is counted in bytes (C’s own B+1 does that scaling automatically). Scheme #2 cannot support this. The addresses it uses are NOT in fact addresses into one big linear chunk of memory. They are handles, which CANNOT BE EDITED WITHOUT BREAKING THINGS: B+1 is not the next element of B but the next slot in the chunk table, an entirely different allocation.
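The identity above, and the way handles break it, can both be checked directly in C. `chunk_a`, `chunk_b` and the two-entry `handles` table are hypothetical stand-ins for scheme #2’s chunk table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte distance from b to b + i. In C, b[i] is defined as *(b + i), and
 * b + i advances by i * sizeof *b bytes, so this returns i * sizeof *b. */
static ptrdiff_t byte_offset(const int32_t *b, size_t i) {
    return (const char *)(b + i) - (const char *)b;
}

/* Hypothetical two-chunk handle table standing in for scheme #2. */
static int32_t chunk_a[2] = {10, 11};
static int32_t chunk_b[2] = {20, 21};
static int32_t *const handles[2] = {chunk_a, chunk_b};

/* Correct: follow the handle first, then offset within the chunk. */
static int32_t load_offset_in_chunk(int32_t h, int32_t i) {
    return handles[h][i];
}

/* Broken: "pointer arithmetic" on the handle lands in a different chunk. */
static int32_t load_offset_handle(int32_t h, int32_t i) {
    return handles[h + i][0];
}
```

With handle 0 and offset 1, the correct path returns element 1 of `chunk_a` (11). Offsetting the handle instead returns element 0 of `chunk_b` (20).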

After talking to Prof. Witchel about pulling values from memory on an ix86, it is obvious to me that most machines on the market today spend a LOT of time and instructions moving data between RAM and processor registers. The best thing to do, therefore, is to adopt an ix86-like design in which registers are the EXCEPTION, not the NORM, and in which pointer arithmetic on RAM addresses does in fact work.

DESIGN DECISION, 10k-foot overview: use RAM, not registers. To this end, reduce the number of registers programmers have access to, and improve and focus on RAM as the primary storage mechanism.
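Here is a minimal sketch of what RAM-first could look like. The sizes are assumptions, not decided parameters: a flat 64 KiB byte-addressed RAM, 64-bit words, and a fixed set of 64 registers. `Machine`, `op_load`, `op_store` and `elem_addr` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the RAM-first design: one flat, byte-addressed
 * RAM plus a small, fixed register file. */
#define RAM_BYTES 65536
#define NUM_REGS 64

typedef struct {
    uint8_t ram[RAM_BYTES];
    int64_t regs[NUM_REGS];
} Machine;

/* LOAD: regs[r] = the 64-bit word starting at RAM address addr. */
static void op_load(Machine *m, int r, uint32_t addr) {
    assert(r >= 0 && r < NUM_REGS);
    assert(addr <= RAM_BYTES - sizeof(int64_t));
    memcpy(&m->regs[r], &m->ram[addr], sizeof(int64_t));
}

/* STORE: the 64-bit word starting at RAM address addr = regs[r]. */
static void op_store(Machine *m, int r, uint32_t addr) {
    assert(r >= 0 && r < NUM_REGS);
    assert(addr <= RAM_BYTES - sizeof(int64_t));
    memcpy(&m->ram[addr], &m->regs[r], sizeof(int64_t));
}

/* Array indexing is plain address arithmetic: element i of a word array
 * starting at base lives at base + i * 8, exactly as in C. */
static uint32_t elem_addr(uint32_t base, uint32_t i) {
    return base + i * (uint32_t)sizeof(int64_t);
}
```

Addresses are plain integers into one linear RAM. An array is therefore just a base address, and indexing is ordinary arithmetic the program is free to do itself.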

TODO LIST

Add a number (default 64) to the program file format that specifies how many registers the processor has to work with.
Throw out the register resizing code (now unneeded)
Provide a means for thread synchronization based on IO events. Because the LOAD and WRITE instructions are calls into a shared object, some amount of scheduling is in order anyway. Room for research? I suspect so… or maybe not. We shall see.

STANDING DESIGN ISSUES AND EFFECTS

SVM registers now CANNOT be assumed thread-safe; each thread MUST have its own set of N registers
An efficient system for memory allocation, reallocation and management is now sorely needed
Threads are now much closer to being viable, because thread-specific registers make it possible for threads to do computation in isolation from the rest of the program’s execution.
If threads are going to happen, a global synchronization system is needed for IO to “shared” RAM. This is kinda fun: since all C function calls are blocking, it is much easier to synchronize threads at IO time and even to lock shared addresses automatically (the i++ example).
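The thread model above can be sketched with POSIX threads. This is an illustration, not a design commitment. Each thread gets a private set of 64 registers (the proposed default), and RAM is a shared array. A single global mutex makes each LOAD/increment/STORE sequence atomic, which is the i++ case. `ThreadCtx`, `locked_increment` and `run_threads` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: each thread owns a private register set, RAM is
 * shared, and one global mutex serializes access to shared RAM. */
#define DEFAULT_NUM_REGS 64  /* mirrors the proposed program-file default */
#define SHARED_WORDS 16
#define MAX_THREADS 8

static int64_t shared_ram[SHARED_WORDS];
static pthread_mutex_t ram_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct {
    int64_t *regs;      /* this thread's private registers */
    int      iterations;
    uint32_t addr;      /* shared word this thread increments */
} ThreadCtx;

/* i++ on a shared word: LOAD, add and STORE happen under the lock, so
 * increments from different threads cannot interleave and get lost. */
static void locked_increment(ThreadCtx *t, uint32_t addr) {
    pthread_mutex_lock(&ram_lock);
    t->regs[0] = shared_ram[addr];  /* LOAD into a private register */
    t->regs[0] += 1;                /* compute in isolation */
    shared_ram[addr] = t->regs[0];  /* STORE back to shared RAM */
    pthread_mutex_unlock(&ram_lock);
}

static void *thread_main(void *arg) {
    ThreadCtx *t = arg;
    for (int i = 0; i < t->iterations; i++)
        locked_increment(t, t->addr);
    return NULL;
}

/* Starts n threads that each increment shared_ram[addr] `iterations`
 * times, waits for them, and returns the final value. */
static int64_t run_threads(int n, int iterations, uint32_t addr) {
    pthread_t tids[MAX_THREADS];
    ThreadCtx ctx[MAX_THREADS];
    assert(n > 0 && n <= MAX_THREADS && addr < SHARED_WORDS);
    for (int i = 0; i < n; i++) {
        ctx[i].regs = calloc(DEFAULT_NUM_REGS, sizeof(int64_t));
        if (ctx[i].regs == NULL) abort();
        ctx[i].iterations = iterations;
        ctx[i].addr = addr;
        if (pthread_create(&tids[i], NULL, thread_main, &ctx[i]) != 0) abort();
    }
    for (int i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        free(ctx[i].regs);
    }
    return shared_ram[addr];  /* safe unlocked: all threads have been joined */
}
```

Build with `-pthread`. Without the lock, two threads could both LOAD the same old value, and one increment would be lost. A single global lock is the simplest fix. Per-address locks would let threads touching different addresses proceed without contending.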