July 07, 2005

WCB: Pluggable bytecode loaders

This was one of the things I really wanted to get into parrot. Granted, mainly to support playing Zork, but as a side-effect we would've gotten the capability to load in JVM and .NET bytecode, along with Python bytecode, and, for the really adventurous, platform-native executables. (Or, even better, executables for other platforms.)

What I'm talking about here is giving the bytecode loading system of parrot the capability to have special-purpose bytecode loading libraries which can be loaded at runtime, as well as the capability of detecting what type of bytecode is being loaded and handing it off to the right loader. Parrot does have a version of this built in, but it's relatively rudimentary.

What I wanted to do was have a general-purpose mechanism in place to allow registering a loader and the conditions under which it would fire, and then have parrot walk the list of loaders every time a file was loaded up. This was already sort of necessary, and sort of implemented, to dispatch based on the extension of the file loaded in -- that's how parrot manages to handle bytecode, pasm, and pir files transparently. It's a bit hardcoded, though, and I'd rather it wasn't.
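To make the register-and-walk idea concrete, here is a minimal sketch in Python rather than parrot's own C. Every name in it (register_loader, by_extension, by_magic, load) is hypothetical and is not parrot's actual API. Each loader is registered together with a condition, and loading a file walks the list in registration order until a condition fires. The loaders themselves are stand-ins that just return a label.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types, not Parrot's API.
# A predicate sees the filename and the first few bytes of the file.
Predicate = Callable[[str, bytes], bool]
# A loader receives the filename and full contents; here it returns a label.
Loader = Callable[[str, bytes], str]

_loaders: List[Tuple[Predicate, Loader]] = []


def register_loader(predicate: Predicate, loader: Loader) -> None:
    """Add a loader and the condition under which it should fire."""
    _loaders.append((predicate, loader))


def by_extension(*exts: str) -> Predicate:
    """Condition: the filename ends with one of the given extensions."""
    def check(filename: str, head: bytes) -> bool:
        return filename.lower().endswith(exts)
    return check


def by_magic(magic: bytes) -> Predicate:
    """Condition: the file contents start with the given magic bytes."""
    def check(filename: str, head: bytes) -> bool:
        return head.startswith(magic)
    return check


def find_loader(filename: str, head: bytes) -> Optional[Loader]:
    """Walk the registered loaders in order; the first match wins."""
    for predicate, loader in _loaders:
        if predicate(filename, head):
            return loader
    return None


def load(filename: str, data: bytes) -> str:
    loader = find_loader(filename, data[:16])
    if loader is None:
        raise ValueError("no loader registered for %r" % filename)
    return loader(filename, data)


# Stand-in loaders for the file types the post mentions.
register_loader(by_extension(".pbc"), lambda f, d: "parrot bytecode")
register_loader(by_extension(".pasm"), lambda f, d: "pasm")
register_loader(by_extension(".pir"), lambda f, d: "pir")
# JVM class files begin with the bytes CA FE BA BE.
register_loader(by_magic(b"\xca\xfe\xba\xbe"), lambda f, d: "jvm class")
```

With this in place, load("hello.pir", data) picks the pir stand-in by extension, and a file whose first four bytes are CA FE BA BE goes to the JVM stand-in regardless of its name. Because the first match wins, registration order doubles as priority. That is one possible policy, not the only one.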

Why allow runtime additions to the bytecode loading system?

My personal favorite, all jokes aside, is the z-code loader. I fully expect that nobody sane will deploy it in a production environment, but I personally think it'd be really cool to be able to do:

parrot lurkinghorror.dat

and find myself on the campus of good old G.U.E. Tech (the George Underwood Edwards Institute of Technology).

That aside, there are a lot of different bytecode engines out there, and there's no real reason not to be able to do a transform from one to another. Combined with the loadable opcode library facilities that were supposed to go into parrot, there's no reason parrot shouldn't be able to handle other engines' bytecode. The simplest way is to have a library of opcode functions that exactly match the functionality of the original bytecode, and then do a transform from the original bytecode to parrot bytecode, something that'll likely be mostly just an 8->32 bit word transform with a little bit of opnumber munging. This is something that's pretty easy for parrot and much less easy for most other VMs, since we have such a huge range of opcodes. Doing it on the JVM or .NET engine would require a more complex transform of the inbound bytecode. (Which isn't a bad thing, of course. It's just a thing.)
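As a rough illustration of the 8->32 bit transform with opnumber munging, here is a sketch in Python against a made-up foreign instruction set. The three foreign opcodes, their operand counts, and the host op numbers (1200 and up, standing in for slots in a loaded opcode library) are all invented for the example. It also assumes one-byte operands and little-endian 32-bit output words, and it ignores whatever header or constant data a real bytecode file would need.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical foreign VM: opcode byte -> (name, number of one-byte operands).
FOREIGN_OPS: Dict[int, Tuple[str, int]] = {
    0x01: ("push", 1),
    0x02: ("add", 0),
    0x03: ("print", 0),
}

# Hypothetical op numbers assigned by the host VM to an opcode library
# whose functions match the foreign ops one for one.
HOST_OPNUM: Dict[str, int] = {"push": 1200, "add": 1201, "print": 1202}


def widen(code: bytes) -> List[int]:
    """Turn a stream of 8-bit ops and operands into 32-bit word values.

    Op numbers are remapped through HOST_OPNUM; operands are copied as-is.
    """
    words: List[int] = []
    i = 0
    while i < len(code):
        op = code[i]
        if op not in FOREIGN_OPS:
            raise ValueError("unknown opcode 0x%02x at offset %d" % (op, i))
        name, argc = FOREIGN_OPS[op]
        operands = code[i + 1:i + 1 + argc]
        if len(operands) != argc:
            raise ValueError("truncated operands at offset %d" % i)
        words.append(HOST_OPNUM[name])   # the opnumber munging
        words.extend(operands)           # each byte becomes its own word
        i += 1 + argc
    return words


def pack_words(words: List[int]) -> bytes:
    """Serialize the words as little-endian unsigned 32-bit integers."""
    return struct.pack("<%dI" % len(words), *words)
```

For example, widen(bytes([0x01, 5, 0x01, 7, 0x02, 0x03])) gives [1200, 5, 1200, 7, 1201, 1202], and packing that yields 24 bytes. The translation stays this simple only because the host is assumed to have an op for every foreign op. Without that, each foreign op would expand into a sequence of host ops instead of a single word.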

More usefully, if you consider source code just an odd and somewhat densely packed bytecode, then all you need for parrot to properly dispatch source to a compiler is a bytecode loader that takes the source, compiles it, and then executes it. Want to handle ruby? Have a bytecode loader that dispatches all the .rb files to the ruby compiler and runs them. Tcl? Same thing. Heck, do it for all the languages that have registered compilers. (Though this would argue for a deferred, just-in-time library loader so you don't pay to load in all the language compilers and bytecode translation modules every time parrot's started, but that's not a big deal.)
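The deferred loading mentioned in the parenthetical could look something like this sketch. Again it is Python with invented names (register_compiler, compiler_for, run_source), not anything parrot actually provides. Instead of registering a compiler directly, you register a factory that builds it, and the factory only runs the first time a file with that extension shows up. The "compilers" here are fakes that return a string, and the loaded list exists only to show when a factory actually ran.

```python
import os
from typing import Callable, Dict, List

# Hypothetical: a compiler takes source text and returns something runnable.
Compiler = Callable[[str], str]

_factories: Dict[str, Callable[[], Compiler]] = {}  # extension -> factory
_compilers: Dict[str, Compiler] = {}                # built on first use


def register_compiler(ext: str, factory: Callable[[], Compiler]) -> None:
    """Record how to build the compiler for ext without building it yet."""
    _factories[ext.lower()] = factory


def compiler_for(filename: str) -> Compiler:
    """Return the compiler for this file, constructing it if needed."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in _factories:
        raise ValueError("no compiler registered for %r" % ext)
    if ext not in _compilers:
        _compilers[ext] = _factories[ext]()  # the expensive load happens here
    return _compilers[ext]


def run_source(filename: str, source: str) -> str:
    """Treat source as one more 'bytecode' format: compile, then hand back."""
    return compiler_for(filename)(source)


loaded: List[str] = []  # which fake compilers have been built so far


def make_fake_compiler(lang: str) -> Callable[[], Compiler]:
    def factory() -> Compiler:
        loaded.append(lang)
        return lambda src: "%s bytecode for %d chars of source" % (lang, len(src))
    return factory


register_compiler(".rb", make_fake_compiler("ruby"))
register_compiler(".tcl", make_fake_compiler("tcl"))
```

After the two register_compiler calls nothing has been built. The first run_source("a.rb", ...) builds the ruby stand-in, later .rb files reuse it, and the tcl one is never built unless a .tcl file turns up. In a real system the factory would be the place to load the compiler library.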

Posted by Dan at July 7, 2005 02:24 PM
Comments

I would consider source code an odd and somewhat *sparsely* packed bytecode. But that's just me. :)

Posted by: Clinton Pierce at July 8, 2005 10:45 AM

Heh. You might, but doing a quick check on my compiler I find that a 2K source file generates a 68K PIR file and a 169K bytecode file. Granted, this might be a compiler problem -- I may dig into that to see what's up -- but C's not too far behind, with a 41 byte hello world source generating a 17K executable.

HLLs are generally more semantically dense than lower-level languages, and it doesn't get much lower-level than bytecode or machine code.

Posted by: Dan at July 8, 2005 11:13 AM

IKVM does JIT-compile Java bytecode to the .NET CLR's CIL bytecode, and I imagine it is quite a boon to CLR adoption. IKVM for Parrot will indeed be a large win. :)

Posted by: autrijus at July 15, 2005 11:09 PM