The current Rubinius VM is named Shotgun. It is Evan's hand-coded C migration from his earlier VM implemented in Ruby.
The VM has several basic parts: parser, CPU, garbage collector.
Object model
In Ruby, everything is an object. In the Rubinius C code, these are kept as @OBJECT@, which are uintptr_t.
For most data types, the OBJECT points to a struct rubinius_object. As with MRI, Rubinius keeps the following data types right in the OBJECT (so-called immediate types):
-
Symbol -
Fixnum - The special values @nil@, @false@, @true@, each with their own class
There also is an additional special type, undef.
To distinguish reference OBJECT values from the immediate ones, Rubinius uses the fact that most architectures align allocated objects on 8- or 16-byte boundaries; Rubinius currently only assumes an alignment by a multiple of two. Symbols are currently marked by setting the last two bits of the OBJECT to 3 (11), Fixnums have the last two bits set to 1 (01). @false@, @true@, nil and undef are currenly stored as 0, 2, 4, 6; i.e. they look a bit like references (they are even), but they are easily discernible by being too small numbers to be pointers. All other even values are expected to be references, represented by converting the C pointers to unsigned long (REFERENCE_P).
(The previous paragraph says currently way too often to remind readers that the object representation might change, so this knowledge should not be applied directly, but only through the defined C macros.)
References point to @struct rubinius_object@, which starts with a header that contains three fields for various object and GC flags, a hash value, and @NUM_FIELDS@, a number that gives the number of fields of this object, and the klass pointer (itself an OBJECT). After the header, NUM_FIELDS fields follow, each of which needs to be an OBJECT (otherwise, garbage collection will crash).
An exception to this is *byte arrays*, which are structured mostly the same way, but the place normally occupied by OBJECT values is filled with bytes
(that do not have any OBJECT semantics). Since the number of fields still counts in OBJECT size units, byte arrays can only exist in multiples of
REFSIZE (sizeof OBJECT), i.e. 4 for 32-bit and 8 for 64-bit architectures.
The Rubinius data structures are fixed in size once allocated; therefore, Array uses (points to from a field) an ancillary data structure Tuple to contain the actual array, and String uses an ancillary data structure ByteArray to contain the actual characters. By exchanging the ancillary data structure for a different-sized one, the Array/String can change its size and still stay the same object.
Source Files
The basic object model is defined in include/rubinius.h. All other C files are in @shotgun/lib@, a prefix which will be elided in the rest of this.
archive.c contains code to manage ZIP archives. ZIP archives are used by Rubinius to keep compiled Ruby code in .rba files, ZIP archives composed of .rbc files.
The following files define basic Ruby data types:
- array.c (Array)
- bignum.c (Bignum). This makes use of libtommath
- bytearray.c (ByteArray, see above)
- class.c (Class)
- io.c (IO)
- module.c (Module)
- object.c (Object)
- regexp.c (Regexp)
- string.c (String)
- symbol.c (Symbol)
- tuple.c (Tuple, see above)
auto.c contains code that sets up all the basic classes.
The classes for these basic Ruby data types are kept in state->global. This is set up in bootstrap.c.
The actual CPU code is in @cpu.c@, which makes use of cpu_instructions.c and cpu_primitives.c Most of the code there is generated by instructions.rb and @primitives.rb@, which generate the .gen files. machine.c contains ancillary code for stacktracing etc., and rubinius.c contains various glue.
cpu_marshal.c contains code for writing Rubinius objects to .rbc files and reading them back in. This format is different from MRI marshaling.
Some classes are used internally by Rubinius to represent methods and their activation records:
- metaclass.c, methctx.c, methtbl.c, var_table.c
grammar.y and friends are the Ruby parser.
@heap.c@, object_memory.c and state.c are the basis for memory management, see below.
Garbage Collection
Garbage Collection is central to the workings of a VM for a modern language like Ruby. Ruby does not provide a way for objects to be deleted; instead, they are just dropped on the virtual floor when no longer being used (i.e., no live reference points to them any longer). The garbage collector mops through the VM now and then and makes sure those dead objects don't use up memory.
Rubinius GC occurs in two levels:
- baker.c: New objects start in the Baker (stop-and-copy) GC heap. When that is full, all live objects are copied to another Baker heap, and the old heap is set aside to be used as the new heap in the next Baker collection step. To allow GC to wait until we are between two VM instructions (so that during execution of any single instruction, there is no need to think about objects relocated by this process), Baker allows allocation to spill over into the new heap. (As a result, the size of the Baker heap also currently limits the size of any object that can be allocated.)
- marksweep.c: After a few (currently 16) rounds of surviving stop-and-copy, objects are tenured into a separate space for mature objects. This is still being garbage collected, but only using a mark-and-sweep algorithm (i.e., not the entire heap is copied each time). Mark-and-sweep space is allocated in *chunks*, so there are multiple mark-and-sweep areas.
To ensure that objects in the Baker heap that are only pointed to from the mark-and-sweep heap are noticed to be alive, a remembered list is kept. This is updated in a so-called write barrier that makes sure fields in the mark-and-sweep heap that are written with new values from the Baker heap also make it into the remembered list.
Create your profile
Help contribute to this project by taking a few moments to create your personal profile. Create your profile »
0.10—69% complete
Completed 9 of 13 tickets
Pages
- Home
- FAQ
- IRC Info and Who's Who
- Releases
- Using Git
- Installation
- Getting Started
- Common Build Problems and Solutions
- Howto - Contribute
- Howto - Write a ticket
- Howto - Run my Rails app with Rubinius
- Howto - Write a Ruby spec
- Howto - Write a Rubinius spec
- Howto - Fix a failing spec
- Howto - Develop with a separate RubySpec repo
- Howto - Debug shotgun
- The Rubinius specs
- Shotgun - The Rubinius Virtual Machine
- Developer Readme
- Core Library - Coding Guidelines
- Coding Style Guide
- Contributor Platforms
- Stuff to read
- Extending Standard Library Specs
- Improve Gem Support in Rubinius
- Actors - Concurrent Rubinius
- FFI or Foreign Function Interface
