So, at work we're using MySQL as our main database for the obvious reasons--it's free and it's good enough. Personally I prefer Oracle, but that's just old habits and being used to massive operations where Oracle was the rightest answer available. (I'll withhold judgment on whether Oracle's ever really the right answer, but you take what you can)
Anyway, the new app design wants transactions. No problem, right? MySQL supports them with InnoDB tables. Fine. All the table create SQL has a TYPE=InnoDB, to make sure they're InnoDB tables. Run the app and...
Crap in tables. Rollbacks aren't working. This is odd. Futz around, and find all the tables are of type MyISAM, which doesn't support transactions. So, of course, we do an ALTER TABLE foo TYPE=InnoDB. Accepted, table noted as converted.
Except it isn't.
Turns out that this build of MySQL doesn't have InnoDB support built in. Which is fine (well, no, it isn't, but it's not the issue here, and we can work around that) but the damn table create and alter never complained! The alter even succeeded. It just didn't do anything. Or, rather, it did a lot of work to ultimately do nothing.
Yes, I know, I can build my own, and probably will if I can't get stock RPMs. (Though I really would prefer stock RPMs, for reasons I don't want to go into) What just pisses me off is that, not only did the software not tell me that it couldn't do what I asked, but it acted like it did. And I burned a whole afternoon.
Makes me wonder how many folks have DB installs that they think have transactions enabled but really don't.
Update: I now have a working MySQL with working InnoDB support built in, installed. Which is good. It will still quietly lie if the inno settings aren't set up, which is still bad.
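For the record, here's a sketch of the sort of check that would have saved that afternoon, using the MySQL C client library. The connection details and the table name are placeholders; the two queries (SHOW VARIABLES LIKE 'have_innodb' and SHOW TABLE STATUS) are what tell you whether InnoDB is really compiled in and enabled, and what type a table really ended up with.

    /* build roughly: cc check_innodb.c $(mysql_config --cflags --libs) */
    #include <stdio.h>
    #include <mysql.h>   /* or <mysql/mysql.h>, depending on the install */

    int main(void) {
        MYSQL *conn = mysql_init(NULL);
        if (!mysql_real_connect(conn, "localhost", "user", "password",
                                "mydb", 0, NULL, 0)) {
            fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
            return 1;
        }

        /* Is InnoDB actually there? */
        if (mysql_query(conn, "SHOW VARIABLES LIKE 'have_innodb'") == 0) {
            MYSQL_RES *res = mysql_store_result(conn);
            MYSQL_ROW row;
            while ((row = mysql_fetch_row(res)) != NULL)
                printf("%s = %s\n", row[0], row[1]);  /* YES, NO, or DISABLED */
            mysql_free_result(res);
        }

        /* And what did table 'foo' really end up as? */
        if (mysql_query(conn, "SHOW TABLE STATUS LIKE 'foo'") == 0) {
            MYSQL_RES *res = mysql_store_result(conn);
            MYSQL_ROW row = mysql_fetch_row(res);
            if (row)
                printf("table %s is type %s\n", row[0], row[1]);
            mysql_free_result(res);
        }

        mysql_close(conn);
        return 0;
    }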
Finally got around to downloading and installing valgrind to run against parrot, to try and track down some weird memory problems that show up on OS X, but not on Linux. (OS X is my primary platform, the patch that triggers the bug came from someone on Linux) Turns out that the bug does manifest on Linux, just nothing whines about it. OS X's C library's much pickier about things by default. This, I think, is a good thing.
Definitely a cool and useful toy, though. Almost enough to get me back to Linux as a development system...
Since I seem to end up going on about a variety of less well-known computer tricks, I figure I might as well make it a semi-regular feature. So welcome to the first official entry in the "What the heck is" series. :)
These are, or will be, questions that crop up as part of Parrot development. They're the sorts of things that you might run across when writing interpreters, OS kernel code, language compilers, and other relative esoterica. These are things that nobody (well, to a first approximation, at least) does, so the concepts are at best really fuzzy for most folks, and in many cases (such as with the continuation) completely unknown. So, rather than just grumbling about the sad state of people's knowledge about low level hardware and software concepts, I should try and do something about it.
Today, we're going to talk about walking the system stack, something that comes up with garbage collection. (I should talk about garbage collection. Later. Hopefully I'll remember to make a link here)
The system stack, for folks that aren't familiar with it, is just a chunk of memory that your machine's CPU uses to hold temporary values. There's usually a CPU register dedicated to it, either by convention or as part of the hardware itself, so access to the stack is fast, something that's generally considered a good thing.
Walking the system stack is the process of figuring out where the stack starts and ends and looking at all the data on it. Not necessarily changing the data, mind (as it's often write-protected, at least in part) but just looking at it. This is generally considered mildly evil, in part because there's no use you can make of the data on the stack that doesn't break encapsulation, any data hiding, or code modularity. Still, there are often good reasons to do it.
Garbage collection, for example, is one of those reasons. The whole purpose of a GC is to see what data is in use and what isn't. That means checking the stack if there could be a reference to your data there, perhaps because the language accessing the data uses the system stack for data storage. (C does this, for example, as do nearly all the compiled languages that don't support closures) You'd really hate to clean up after a variable you thought was dead because you didn't look at all the places that the variable could be referred to from.
How does one walk the stack? Well, you need to get the base of the stack, the spot where the stack is as empty as possible. Depending on the OS and CPU architecture, there may be an easy way to do this. If not, what you can do is get the address of a variable that's been allocated on the stack at the very beginning of your program, at a point where there can't possibly be anything of interest on the stack. Then, when you want to walk the stack, you just get the current stack pointer. Everything between the two is your stack data, and you just run through it like any other array of binary data. Depending on what you're looking for you may be able to cheat a bit--many systems require that stack data be aligned, so that a 4-byte pointer must be on a 4-byte boundary, which reduces the amount of data you have to look at. (And other systems don't put any requirements at all on the stack data, which makes things a bit of a pain)
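Here's a bare-bones sketch of that trick in C, assuming a downward-growing stack (true of x86 and most of the common platforms) and pointer-aligned stack slots. The function names are made up for illustration, and a real collector would check each candidate value against its heap before believing it's a pointer.

    #include <stdio.h>

    static void *stack_base;                 /* recorded at program start */

    void record_stack_base(void) {
        int marker;                          /* lives on the stack */
        stack_base = (void *)&marker;
    }

    void walk_stack(void (*look_at)(void *)) {
        int marker;                          /* roughly the current top of stack */
        void **sp  = (void **)&marker;
        void **end = (void **)stack_base;

        /* Everything between the two marks is stack data.  Stepping a
         * pointer at a time relies on the alignment requirement mentioned
         * above; on machines without one you'd have to go byte by byte. */
        for (; sp < end; sp++)
            look_at(*sp);
    }

    static void print_candidate(void *p) { printf("%p\n", p); }

    int main(void) {
        /* In a real program record_stack_base() would be the very first
         * thing main() does, and walk_stack() would be called from deep
         * inside the collector. */
        record_stack_base();
        walk_stack(print_candidate);
        return 0;
    }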
And that's it. Nothing at all fancy--walking the stack is just a matter of figuring out where and how big the current stack is, and grovelling over it. A trick you should almost never do, but when you need to, well, you need to.
Okay, let's talk for a moment about providing data for this proposed blog notification system. What is there, and what does it look like?
The data stream, I'm proposing, is divided up into channels. Each channel has four things associated with it:
The title is, as you might expect, the blog title. No big deal, other than being text so there are all those pesky character set issues to deal with. (Yes, I know, Unicode is the answer and will save us all! I think not) Blog titles are restricted to no more than 1023 octets. How many characters that is depends on the encoding, but worst case you're in full UTF-8, with room for 170, which ought to be more than enough.
The base URL is the URL that all content vectors off of. The channel should present something meaningful here, if queried. The URL is limited to 255 characters. I think. Should be enough.
The channel ID is a base64-encoded MD5 checksum of the original title and URL for a channel. If a channel later changes its title or URL, the checksum used for that channel doesn't change. The title and URL should be slammed together with no extra characters. (There's a sketch of the computation after the example below)
The public key is the public key of the channel. Everything that comes from this channel, or reports itself as from this channel, can be validated off this key. All outgoing messages are signed, so that clients and transit servers can run their signatures against this key and see if they're real messages. Or so is the plan, at least.
So, when you look at a channel, you may see:
Title: Squawks of the Parrot
Base URL:
Channel key: H0OQJSfvne/3yQ2lkISmvg
public key: SOMERANDOMSTRINGOFDIGITSANDLETTERS
(This isn't how it goes over the wire, just the data itself. We'll touch on wire format messages later. Maybe in this entry, maybe not. Dunno yet, and who edits blog text anyway? :)
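Just to pin down the "slam together and checksum" step, here's a sketch in C using OpenSSL's MD5 and base64 routines. The base URL below is a placeholder (the real one isn't shown above), so the output won't match the example key; the shape of the computation is the point.

    /* Channel key: base64(MD5(title . base_url)), with the base64 padding
     * stripped to give the 22-character keys shown above.
     * Compile with -lcrypto. */
    #include <stdio.h>
    #include <string.h>
    #include <openssl/md5.h>
    #include <openssl/evp.h>

    static void channel_key(const char *title, const char *base_url, char out[25]) {
        char buf[2048];                     /* 1023-octet title + 255-char URL fits */
        unsigned char digest[MD5_DIGEST_LENGTH];

        size_t len = (size_t)snprintf(buf, sizeof buf, "%s%s", title, base_url);
        if (len >= sizeof buf)
            len = sizeof buf - 1;           /* shouldn't happen given the limits */

        MD5((const unsigned char *)buf, len, digest);         /* 16-byte digest */
        EVP_EncodeBlock((unsigned char *)out, digest, MD5_DIGEST_LENGTH);
        out[22] = '\0';                     /* drop the trailing "==" padding */
    }

    int main(void) {
        char key[25];
        /* The URL here is a made-up placeholder, not the real base URL. */
        channel_key("Squawks of the Parrot", "http://example.org/blog/", key);
        printf("Channel key: %s\n", key);
        return 0;
    }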
Now, when you send a message across noting a change, it's one of three things: a new entry, a new comment, or a new trackback.
Note that change messages include deletions.
Each message has exactly four things in it:
The message type is one of the above things--new entry, new comment, new trackback.
The channel key is the MD5 checksum of the base channel's original data (title and URL)
The relative URL is the URL tacked onto the base channel URL (just a straight slam together) to get the full path to the data. Note that, since comments and trackbacks are considered modifications of the base data element, the URL for a comment would be the same as the URL for the thing being commented on.
The signature is the public-key signature of the message. Checking the message against the channel's public key should show that this signature matches. (I'm a bit fuzzy on the mechanics of asymmetrical public key crypto systems, so we'll put off the decision of what PK system is used, and just assume that something is)
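For concreteness, here's one possible in-memory shape for a change message, in C. The field sizes are guesses based on the limits mentioned earlier (a 22-character channel key, a 255-character URL), and the signature size is a placeholder until a PK system is actually picked.

    enum msg_type { MSG_NEW_ENTRY, MSG_NEW_COMMENT, MSG_NEW_TRACKBACK };

    struct change_message {
        enum msg_type type;            /* new entry, new comment, or new trackback */
        char channel_key[23];          /* base64 MD5 of the original title + URL   */
        char relative_url[256];        /* tacked onto the channel's base URL       */
        unsigned char signature[256];  /* PK signature; size depends on the scheme */
    };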
If a system is something like the Lambda weblog or an Everything engine system, where each comment is a node in its own right, a new comment generates a new post, and things get odd. I'm not sure what to do in that case, other than perhaps have a "response to" message with the URL being responded to and the URL of the response.
Limits of the system
There are some limits here, of course.
All data must hang off the base URL. I don't think this is an issue for anyone, though I can see it being a problem if there are multiple data sources sharing a base URL, or if the base URL needs things removed from it. (Chopping off the index.php, for example, to get the base for the relative URLs) Don't do that, at least not for now.
There's required PK crypto, or at least secure verifiable digesting. I expect this may run afoul of a number of laws in various countries. I'm up for alternate validation methods if anyone has one, but I don't know of any.
There's no way to sub-divide a channel. For a blog this may not be a problem, but I can see someone like the New York Times wanting one big "times channel" with a bunch of sub-channels for each section. (NYC Metro, Tech, Science, Sports, whatever) Too bad, we don't do that right now.
There's the issue of changing channel data as well, which could be... interesting. Punting for now, but that may involve verifying based on out-of-band data. (Key files on the original URL/machine or something)
I think, though, that what's here is sufficient for what I want, at least from a source end. Have at it, though, as I don't want to do any technical details until I'm sure that what's being proposed is semantically sufficient.
One of the best things about easter is the after-easter candy sales, when the leftover candy gets dumped, because this means I can pick up packages of peeps for 25 cents each. This is, in itself, a Good Thing, but this year I have a small blowtorch. You know what that means...
Peep S'Mores!
Mmmm, mmm good!
So, the power supply on my iBook crapped out. Again. First time was the plug into the computer, second the cable from the wall to the transformer, and this time the transformer itself. This'll be the third time parts have been replaced.
That doesn't bother me. Well, not much.
What bothers me, and really pisses me off, is the local Apple Store. I've got the extended warranty for this thing (I travel, and I'm not stupid) so anything that breaks should be replaced. This is the third time I've walked in with a broken power supply, and the third time they didn't have one in stock. It's not like this is some bizarre part--it's a damn power supply, and one that works on a lot of the iBook models.
This time, though.... not only did they not have any to replace mine with, but they had three on the shelf. Just to add insult to injury. Fix it under warranty? Sorry, call apple care, it'll be three days. But you can buy one right now, if you want. A complaint to the manager got some handwavey blather about repair parts being from different accounts than retail parts, but let's be blunt--I don't care, and it's not my problem. Period. I don't care what account anything came from, or whose problem it is. I have a broken part and a contract that says that part should be replaced, and they refused.
I'm seriously considering another vendor when it's time to replace this iBook. And there aren't any other vendors of OS X stuff...
Okay, I've been thinking about this some, and I might as well get this down so folks can go rip it to shreds as they want. (Yes, I know, I should write a continuations thing. Maybe tomorrow, but probably not)
The problem, if you'll remember, is that I loathe the current "poll for RSS" scheme of seeing when blogs update, along with an utter lack of notification for things like trackback links and comment pings. What I'm proposing is an NNTP-style system with a set of loosely connected server systems that take notifications from the end-blogs and pass them around to whoever's watching, with the notifications ultimately hitting clients connected to those servers waiting for notifications of changes. Nothing fancy, just your standard store-and-forward multicast system. We've been doing this with news for decades, with some success. And some failure, too, of course. Forgetting the failure would be a bad thing.
I think the project is too big to go in one blog entry, so this time it's just the general assumptions.
The first assumption is that, for any sufficiently large group of people, some of them will be scum and do abusive things. This is the single most important assumption. I don't particularly like it, but pretending otherwise leads to the current state of email and Usenet News, where trust and sense (or at least an available admin with a clue-by-four) are assumed but definitely not really present. So, people will try to spoof, abuse, and hijack the system, either to get spoofed data out in the wild or to do abusive things to some data source's system.
The second assumption is that we are not distributing content. We are distributing notifications of content changes. We may, at some point, talk about content change distribution, but not this time. This means that we aren't sending article contents, excerpts, titles, or whatever around, just notifications that action X happened to URL Y with rider data Z. Maybe. We might toss the rider data. (Though sending around trackback counts and comment counts would be useful)
The third assumption is that the protocols should be efficient. That means the message for a URL change might be something like "\0\23POST\n/archive/0005.html" rather than whatever monstrosity would result if we XML encoded it. (Yes, there are two bytes of binary data marking a length prepended, though I can see dropping it to one byte, and I can see forcing the message type to exactly N bytes with no terminator, for some value of N) Remember--if I was keen on XML the last thing I'd be doing was grumbling about bandwidth usage.
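As a sketch of what that wire format could look like in C (nothing here is settled; the two-byte network-order length prefix, the newline separator, and the function name are all just one way to do it):

    #include <arpa/inet.h>   /* htons */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Encode "<2-byte length><type>\n<relative url>" into buf.
     * Returns the total number of bytes written, or 0 if it won't fit. */
    static size_t encode_change(unsigned char *buf, size_t buflen,
                                const char *type, const char *rel_url) {
        size_t tlen = strlen(type), ulen = strlen(rel_url);
        size_t body = tlen + 1 + ulen;           /* e.g. "POST\n/archive/0005.html" */

        if (body > UINT16_MAX || body + 2 > buflen)
            return 0;

        uint16_t n = htons((uint16_t)body);      /* length prefix, network order */
        memcpy(buf, &n, 2);
        memcpy(buf + 2, type, tlen);
        buf[2 + tlen] = '\n';
        memcpy(buf + 3 + tlen, rel_url, ulen);   /* no terminator on the wire */
        return body + 2;
    }

    int main(void) {
        unsigned char msg[256];
        size_t n = encode_change(msg, sizeof msg, "POST", "/archive/0005.html");
        printf("encoded %zu bytes total, %zu of payload\n", n, n - 2);
        return 0;
    }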
The fourth assumption is that each data source will be in a channel, which you can subscribe to so your immediate upstream data provider can get the feed somehow. Hopefully not straight from the ultimate provider, though.
The fifth assumption is that, while the messages aren't that important, neither are they entirely meaningless (otherwise why bother in the first place?) so we need some form of store-and-forward system.
The final assumption (that I'm admitting to, at least) is that the protocols should be simple. Someone should be able to bodge together a client or feed submitter in a reasonably simple perl/python/ruby/scheme/unlambda module. Well, maybe not unlambda. Still, no ties to one language, and a simple enough protocol that one could probably do it by hand with a telnet client. Which argues against my proposal for message length. Damn.
I think that covers the assumptions. It's possible that something like Jabber or IRC can handle the middle server stuff, which would be just fine, though the store and forward thing may shoot that down. We'll see.
Next time is the feed end and channel stuff, I expect.
I hadn't planned on talking about this for a while, but this log entry (which, alas, has no comments enabled so I can't comment directly there) brings up a point I do want to talk about.
Inter-language interoperability under Parrot. What, exactly, does it mean?
Well, it means that any language implemented on top of parrot that respects parrot's calling conventions may make use of any code written in any other language that respects Parrot's calling conventions. This means that your perl 6 program can load up and use perl 6, perl 5, python, ruby, and (maybe, if we thump it to do so) Forth library code, and call between it all transparently. It means that if something hands your code an object, you can call methods on it regardless of what language created the object. (Heck, each method the object has may be written in a different language)
The nice thing is that if we get things right, nobody should have to do anything special to make it work. You can snag an all-perl module off of CPAN and use it in your python program, at least once we get the perl 5 and python compilers working. Should all just work, and the only potential issue will be making sure that the standard libraries of all the languages in use are handy. That and potentially having two or three (or four, or five, or six...) libraries that do the same thing, only slightly differently.
Things are a bit different when it comes to modules with C code, though. We're not going to present an interface that's compatible with any of the existing languages--they all rely on intimate knowledge of the internals of their respective interpreters, and getting them to work would be a massive and ultimately fruitless endeavor, as there's no way we can really duplicate the internal semantics.
That doesn't mean, though, that modules with C code can only be used by one language--like the native modules, if Parrot can load up a module with C code, anything that runs on parrot can use it. It just means that there won't be a simple recompile-and-go for C code.
We will try and present at least a minimal compatibility layer, as feasible, as there are some routines that can be macro'd up just fine. Perl's newSViv routine generates a new scalar with an integer value that's passed in. That can be easily handled with a macro that allocates a new PMC and assigns an integer value to it. It's when you get to things like SAVETMPS and other code that depends heavily on the performance of the internals of the interpreter (in this case the perl 5 interpreter, though I know the internals of Python are as exposed as perl's, if rather cleaner than what perl has) that there's no way to fake it with a few macros and some preprocessor sleight-of-hand.
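To make the easy half of that concrete, here's a toy of the newSViv-style shim in C. Everything in it--the PMC struct, pmc_new(), the single vtable slot--is a stand-in invented for illustration, not Parrot's real internals.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct PMC {
        void (*set_integer)(struct PMC *, long);  /* one toy vtable entry */
        long  int_val;
    } PMC;

    static void pmc_set_integer(PMC *p, long v) { p->int_val = v; }

    static PMC *pmc_new(void) {                   /* stand-in PMC allocator */
        PMC *p = malloc(sizeof *p);
        p->set_integer = pmc_set_integer;
        p->int_val = 0;
        return p;
    }

    /* The shim: same job as perl 5's newSViv(iv), "new scalar holding
     * this integer", expressed as "new PMC, set its integer value". */
    static PMC *compat_newSViv(long iv) {
        PMC *scalar = pmc_new();
        scalar->set_integer(scalar, iv);
        return scalar;
    }

    int main(void) {
        PMC *answer = compat_newSViv(42);
        printf("%ld\n", answer->int_val);
        free(answer);
        return 0;
    }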
Still, that's OK. We knew that perl's XS modules would be a casualty of the switchover, and were willing to accept that. And we never had any expectation of python or ruby C code making it over. (Though we'll probably be able to do something similar for them--you never know, and neither do I as I've just not looked deeply enough yet)
Since this has come up, and we're seeing more exposure in more places lately, it seems worth taking some time to lay out the history and purpose of Parrot. This stuff is all bits and pieces that are kicking around the 'Net and the Parrot docs, so nothing here is new, modulo any failings of memory, but...
The History Part
It all started at The Perl Conference 4. This was in summer 2000, and I'm not sure if it was an OSCON by that point, but that's irrelevant. There was a morning meeting of the perl 5 porters, the group of folks responsible for maintaining and extending perl. It was sort of a pre-meeting meeting, as the 'official' (which is to say, scheduled with a meeting room handy) meeting was in the afternoon. Apparently[1] people, including Larry, were working on the standard brainstorming group session that results in walls covered in pages of notes and ideas. Useful, but nothing earth-shattering. Or earthenware shattering, for that matter.
About halfway through, Jon Orwant walked in, and threw what has been described as the most tightly controlled tantrum that anyone has ever seen. He also threw mugs at the door, one mug per word, just to add a bit of emphasis. The words were something like "Perl is dead unless you do something big". The last mug pitched at the door shattered as a bit of good timing, Jon left, and everyone there started thinking.
At the afternoon p5p meeting, it was announced that Perl 6 was starting, and we were going to do something fairly radical. Jobs (language designer, internals manager, PR hack, corporate liaison, project manager, QA, and documentation wrangler) were handed out to the folks there who either wanted them, or were best qualified of the folks who didn't not want to do them. Which is how I ended up with my job, but that's another story.
The public comment and design phase started then, and has been more or less continuing ever since. But at that point I started sketching out designs for the new engine. Nothing solid, since we didn't know what Larry had in mind for perl 6, but I did know what about perl 5's internals I hated, and the places it got in the way, so there was at least a starting point. The design progressed, albeit somewhat slowly, since I didn't want to start committing to a design for the internals until I had some idea of what functionality the language needed. Note that, at this point, the project had no particular name, and was focussed entirely on perl 6.
The Perl community has a tradition of April Fools jokes. Some years they're quite good, others they're pretty understated and not impressive. (This year, FWIW, the joke was the hostile takeover of CPAN by the Matt's Script Archive folks, but that's another story) In 2001, Simon Cozens perpetrated a doozy. The gag was that Larry Wall and Guido van Rossum (the designer of Python) were burying the hatchet and designing a new language, Parrot, that would combine the best features of each language. Or the worst, if you were into that. There was a fake interview, some of the tech folks on both sides were in on it, and there was even an O'Reilly book announced, "Programming Parrot in a Nutshell". (Which is still in their online catalog) A good gag, very well executed, and if April 1st wasn't a weekend I think we would've had an amazing fit from a lot of folks. Ah, well.
Anyway, design on the perl 6 engine was still going on, but one thing that Simon and I both realized independently was that the engine we were designing really was suitable for pretty much any language in the same class as perl. (Dynamically typed, mostly OO, "scripting" language. Python, Ruby, and (I think) Tcl all fall in this category) Larry'd jammed an awful lot of functionality into perl 5, and all indications were that even more stuff was going into perl 6. It's not so much that we had to add things for Python or Ruby as that perl was a proper superset of them. By the time TPC 5 rolled around in 2001 we were both convinced we could do it. We got together and talked at TPC 5, some other folks in other language communities (notably some of the python folks) were interested, and so not long after we announced our Master Plan. Of course we had to call it Parrot, because how could we not? Life was, in some ways, imitating satire, and that's the sort of thing you just have to go along with.
The first big public unveiling of Parrot was at the first Little Languages workshop that was being held at MIT. (And I'll note for the record that the only reason they'd heard of us was that I worked a few blocks away, and came up for some of the talks the Dynamic Languages group gave, and at one point Ben Stuhl and I spent the better part of the afternoon talking to Erik Kidd about efficient multimethod dispatch, as Ben and I missed the announcement that the talk was postponed, and showed up anyway. Which was really useful) Simon and I both gave presentations on aspects of the Parrot project, picked up a few things (like the fact that Ruby does continuations, which is why Parrot has them) and generally had a good time. Things have pretty much progressed from there.
The Up-front Part
Parrot started as the code to run perl 6. That's what got the project in motion, that's the community where we got our first developers from, and that's what's driving a lot of the development. Perl 6 also needs us--we're the engine. OTOH, Python, Ruby, PHP, Z-code, Befunge, Forth, BASIC, C#, and all the rest don't need us--they all have their own engines and system. If Parrot went away, Perl 6 would be screwed, while Python would just chuckle. (Only in the nicest possible way, I'm sure)
However...
Our mandate, such as it is, has gotten rather larger than it used to be. It is part of Parrot's mission, for various reasons, to run Python and Ruby code. While Guido and Matz don't have much control over what we do, it's not like Larry's got that much either. (Though he does have a bit more, but only because he's committed to using Parrot. If Matz or Guido made the same commitment they'd get the same say)
We also are specifically shooting to be a good general-purpose dynamic language engine. There's a lot of research and tinkering going on in the field, but folks are stuck either writing their own back end or targeting a decidedly non-dynamic back end, such as the JVM or GCC. That strikes us as silly, and since we're shooting for Python and Ruby anyway, well, it falls out nicely. This doesn't conflict with the need to support Perl 6, since it's a dynamic language.
What does this mean to you, the non-perl language designer?
Honestly, not much. While we're not going to make any decisions that penalize perl, neither will we make any decisions that penalize any other language in our class. We want to run Ruby and Python code well, and if someone wants to take a shot at something like PHP (well, OK, someone wants to help out Sterling Hughes, who's already shooting at it) great. Not only am I all for it, if there's stuff you want that we don't do, or do awkwardly, let us know and we'll do what we can to accommodate. Many of the dynamic features we already have or are working on (dynamically loading opcode libraries, pluggable bytecode loaders, and suchlike stuff) lend themselves well to add-ons--worst case we don't do what you're looking for so you just go and write your own opcode library without us and use Parrot as a glorified memory allocator and runloop. That's OK, we don't mind. :)
I'm really big on making things better, not worse, and more rather than less open. So while I won't compromise Perl's performance, neither will I improve it at the expense of anyone else's performance or ease of use. And when engineering tradeoffs arise, as they always will, well, we do the best we can, and who loses depends very much on circumstance. Might even be perl, depending on what the issue is... one never knows.
[1] I say apparently, as I wasn't there--I was teaching that morning.
Software design is one of the very few places that, unfettered by physical and most outside constraints, we can make our tools act the way we think they should work, rather than the way they have to work. This is probably why our tools suck so badly.
Update: Bah, I really ought to edit things more before sending. This would be better expressed as "Software design is one of the very few places where we can make our tools do exactly what we want them to do. That's probably why they suck so badly."
I've received 3333 copies of the big@boss.com mail since January 9th. 13 so far today, and 25 yesterday. Doesn't anyone ever update their damn virus protection? (Yeah, I know, rhetorical question...)
One of the things I'm finding that annoys me the most about working offline is dealing with multiple change sets. At the moment I'm sitting in a coffee shop, blissfully internet-free, hacking away at parrot and work. That's cool, except I'm making a bunch of small, reasonably functionally independent changes, across multiple files. Unfortunately the changes affect the same sets of files in many cases.
What I want to do is be able to issue a CVS command that says "mark the current state as a snapshot/new base" and be able to commit the changes made for each snapshot marker separately. That way I can commit, with proper comments and an isolated set of diffs, the can/has changes, the half-stack changes, and the thread-safe queue changes, for example. Which would be nice.
Alas, they're all going to go in as one big lump. Bleah. I wonder if subversion will let me do this. (Maybe with a local repository and change sets--I think BitKeeper'll do that, but I don't want to deal with the issues there. Subversion, though...)
Hadn't planned on this, but since I've been mulling this over for the presentation at tonight's Boston Perlmongers meeting, I figure it's worth posting.
Consider, for a moment, the humble method call, the building block of any OO system. To wit:
SomeObject foo;
result = foo.bar(12)
Now, you might think that's simple, and in a statically typed language it is. You know at compile time what the type of foo is, you know where in its list of methods bar lives, and you can probably even precalculate the multimethod dispatch if your language does that sort of thing. So the final executable at best needs to fetch the class base pointer for foo's class (which it knows at compile time, and thus might even be resolvable by the linker), take an offset to find the bar method (whose offset you know at compile time), and call it. If you've a horrible runtime or linker it's at worst a search for foo's class, a pair of pointer fetches and an addition, plus the ultimate method call. If you have a good linker or runtime (so you know the offset of foo's class in your master class offset array) it's two adds and two pointer fetches, plus the method call. If you're really good, it can all be resolved at link time so there is no runtime cost at all over the method call.
Generally we're looking at the good-but-not-best case, since if you're going static you might as well do it right, but you don't want to touch your linker. So figure that in most cases making a method call costs two pointer fetches and two additions to find the method. That's not bad. I'd love to be able to do that with parrot.
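A hand-rolled C version of what that good-but-not-best case boils down to (roughly what a C++ compiler emits for a virtual call; the names here are made up):

    #include <stdio.h>

    struct SomeObject;
    struct vtable { int (*bar)(struct SomeObject *self, int arg); };  /* slot known at compile time */
    struct SomeObject { const struct vtable *vt; /* instance data would follow */ };

    static int someobject_bar(struct SomeObject *self, int arg) {
        (void)self;
        return arg * 2;                         /* stand-in method body */
    }
    static const struct vtable someobject_vt = { someobject_bar };

    int main(void) {
        struct SomeObject foo = { &someobject_vt };
        /* one pointer fetch (foo.vt), one known offset (the bar slot), one call */
        int result = foo.vt->bar(&foo, 12);
        printf("%d\n", result);
        return 0;
    }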
Alas, not, because all our method calls are essentially this more interesting thing:
$someobject $foo;
randomstring $bar;
$result = $foo.$bar(12);
Now, let's throw one more monkeywrench in the works here, since $foo's class can get in the act and decide how (or even if) to dispatch $bar, and can do it differently every time if it wants.
Given this, the only sane thing to do is to punt and delegate the method call entirely to the object. That means the degenerate case is a pointer fetch, addition, function call, and insanity as the class code does bizarre things, but we won't worry about that, since you can be degenerate in your own time.
The common case, then, will be a pointer fetch, addition, function call, then method lookup. And that method lookup's the killer. This is also the best case, and the only way to make the speed acceptable is to have a fast method lookup on the back end.
This back end lookup is nasty for a number of reasons, not the least of which is the fact that the inheritance hierarchy is mutable at runtime, the methods in the various classes in the tree are mutable at runtime--not just their bodies, but their very existence--and we may well end up having to redispatch so a method body that satisfies our lookup may not be the end of it, and we may need to keep on going. (Which is so much fun... though really useful)
Anyway, for that to all work out, we need a lot of support infrastructure. Method caches, a notification and event system so we can invalidate those caches, and fast optimized code available for the normal case, so most of the flexible stuff can just be tossed because we don't need it.
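As an illustration of what that infrastructure can look like, here's a toy method cache in C with the bluntest possible invalidation scheme: the notification system bumps a global generation counter, and any cached entry from an older generation is treated as stale. The names and the hashing are invented for the sketch, not Parrot's actual code.

    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_SLOTS 1024

    typedef void (*method_fn)(void);

    struct cache_entry {
        uint64_t  generation;   /* world version when this entry was filled */
        uint32_t  class_id;
        uint32_t  name_hash;    /* hashed method name */
        method_fn method;
    };

    static struct cache_entry cache[CACHE_SLOTS];
    static uint64_t world_generation = 1;

    /* Hook for the notification/event system: any change that could affect
     * dispatch (method added or removed, inheritance rearranged) bumps the
     * counter, which invalidates every cached entry at once. */
    void invalidate_method_caches(void) { world_generation++; }

    /* Stand-in for the real, slow lookup that walks the (mutable)
     * inheritance tree. */
    static void demo_method(void) { puts("demo method body"); }
    static method_fn find_method_slow(uint32_t class_id, uint32_t name_hash) {
        (void)class_id; (void)name_hash;
        return demo_method;
    }

    method_fn find_method(uint32_t class_id, uint32_t name_hash) {
        struct cache_entry *e = &cache[(class_id ^ name_hash) % CACHE_SLOTS];

        if (e->generation == world_generation &&
            e->class_id == class_id && e->name_hash == name_hash)
            return e->method;                  /* fast path: cache hit */

        method_fn m = find_method_slow(class_id, name_hash);
        e->generation = world_generation;      /* cache it for next time */
        e->class_id   = class_id;
        e->name_hash  = name_hash;
        e->method     = m;
        return m;
    }

    int main(void) {
        find_method(7, 12345)();     /* slow path, then cached */
        find_method(7, 12345)();     /* cache hit */
        invalidate_method_caches();  /* e.g. a method got redefined somewhere */
        find_method(7, 12345)();     /* slow path again */
        return 0;
    }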
Hrm. This is long, so I think I'll go on about parts of this later. But I have decided that to support the common case, which is calling a method whose name is a compile-time constant, we'll have a method call op like
callmeth Ix
as well as the
callmeth form (Since remember that parrot's calling conventions specify the method name in one of the string registers, so we don't have to put it in the instruction stream), where in the first case Ix is the hashed value of the method name, using parrot's default provided hash scheme. At least that way we don't have to recalc the hash every time to go look up the method in the cache...
We probably should have a
callmeth Px
in whatever PMC register might be free for this, in the case where the method PMC can't change, so we don't have to go look it up. Perl couldn't use this in many cases, but something like C# might. (Though, arguably, not when calling into perl code, but we could have some method/object property check to see if things are runtime fixed. Hrm)
I've been struggling with objects for Parrot for quite a while, as many people on p6i will attest to. I think I've sort of got it, but since writing down a mild draft of what I'm thinking about often helps (and isn't really suitable for a PDD) I figure I'll dump out here, and see how things go. This is also useful for folks looking at how we're trying to do OO stuff.
The first important thing to realize is that up until recently I didn't do much OO work, at least not "real" OO. Sure, data encapsulation and indirect function pointers and vtables and such, but all very explicit, so this is something of a new(ish) thing. That and my first intro to OO was ages ago with C++, and then to Object Cobol. I still bear the scars.
Parrot's also in an unusual position where it has two major OO systems it needs to deal with--the perl 5 style "anything goes, go for it, good luck, mind the bear traps" of objects, and what I'm told is a more traditional object system, along the lines of Java/C++/Ruby/Python. And each type needs to inherit from the other. With multiple inheritance. And multimethod/signature based dispatching. Oh, and let's not forget interfaces! At least on the back side I can count on there being classes, which is something.
Needless to say, this is something of a challenge. So, to help me deal with it, I've tried to partition it into pieces, so I can deal with each piece in turn. The pieces are:
There are probably more, and I expect I'll add to the list. Heck, that'll help partition things, which is good.
Anyway, this time around I want to talk about the first point, using objects. This is for code which treats objects as opaque things. There's no knowledge of the internals of the object--it's a thingie in its own right. (Yes, I know, some systems let, or even encourage, you to peek inside objects. Let's not go there at the moment, that's a different class of code from what I'm talking about)
User code needs to be able to call methods on objects. They need to get and set properties (which isn't, strictly speaking, an object thing, but...), get the property hash, get a PMC for a method for later calling, and methods have to override properties of the same name, so if you get a property by name and there's a method of that name, you get the result of that method being called with no args. Well, it can work that way, but it doesn't have to, strictly speaking. (Some languages may decide not to do this) Being able to get the class identifier for an object's a darned useful thing as well. We also need to see if an object is a member of a class, implements an interface, or has a method of a particular name. (And yes, we could fake up the name lookup with a fetch of the method PMC and check for failure, but we're not going to)
This list, luckily, is reasonably short. Parrot satisfies, or will satisfy, all of these requirements through PMC vtable entries. (Which also means that any PMC could, potentially, act as an object. Which is kind of cool, when you think of it, if you're really fond of objects. Or not at all fond of objects but hanging around with people who are fond of them)
To do this, Parrot needs the following vtable entries:
Plus versions that take keys, in case people do things like @foo[12].bar(). I don't expect that'll be too common, though I do think code that looks like %commands{$command}.run(@params) will be, if for no other reason than that's the sort of thing that I tend to do.
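As a rough guess at the shape those entries might take (just C declarations, with names that are plausible stand-ins rather than Parrot's real vtable names), something along these lines would cover the list above, with the keyed variants taking an extra key argument:

    typedef struct Interp Interp;
    typedef struct PMC PMC;
    typedef struct STRING STRING;

    typedef struct ObjectVtable {
        PMC  *(*find_method)(Interp *, PMC *obj, STRING *name);  /* method PMC for later calling */
        void  (*call_method)(Interp *, PMC *obj, STRING *name);  /* dispatch via the calling conventions */
        PMC  *(*get_prop)   (Interp *, PMC *obj, STRING *name);  /* property, or method override of it */
        void  (*set_prop)   (Interp *, PMC *obj, STRING *name, PMC *value);
        PMC  *(*get_props)  (Interp *, PMC *obj);                /* the whole property hash */
        PMC  *(*get_class)  (Interp *, PMC *obj);                /* class identifier */
        int   (*isa)        (Interp *, PMC *obj, STRING *class_name);
        int   (*does)       (Interp *, PMC *obj, STRING *interface_name);
        int   (*can)        (Interp *, PMC *obj, STRING *method_name);
    } ObjectVtable;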
That, as they say, is that. No knowledge of the internals of anything's needed.
Pity it's not quite enough to actually implement anything, since without a standard class system it's a bit fuzzy. I think that's it, though.
Listening to news reports and conversations around here (here being the northeast US) is really very bizarre some times.
For example, I've been hearing a lot lately about how the war is the real reason the economy sucks at the moment. It's either war jitters, or the lead-up to war jitters, and in fact the reason that things were so bad from, say, fall of 2002 until we started the big kaboom was people worried about the war.
Erm... I don't think so. That's certainly not how I remember things. It's a great after-the-fact excuse "Oh, people were worried about the war!" but what people were really worried about was an economy that had turned to mush, large and continuing job losses, and the cleanup after a massive amount of corporate fraud.
Blaming it on the war is a good way to dodge the responsibility (Though as we've seen, Iraq wasn't a threat in any meaningful way, which we knew, so the timing of the war was far more flexible than people think), but it doesn't address the real issues--our economy sucks, and it's not getting dealt with. Remembering a fantasy of the past is not going to help, since it means people won't act based on the realities, and acting based on a fantasy of what you wanted to have happen is always a dangerous thing.
Or at least are out of date. I am working, doing long-term consulting for WebEvent, a company in Andover, MA that does web-based calendaring stuff. They're cool people, and we're writing good code. (I'm working on the back-end DB/library code for the next version of the product, which is interesting. It's nice working on a small project. Relative to Parrot, at least....)
Which is the question that's been wandering around behind the scenes. Posting the closure and continuation entries has led me to a bunch of places on the web that I'd never have gone to otherwise (it's weird the places things get noted) and amongst the .NET folks there was some noting of "Well, what about S#?"
S# is a version of Smalltalk for .NET, by David Simmons of SmallScript fame.
Now, David's a darned smart guy, and he's done a lot of interesting stuff. I've heard him speak at a number of conferences, most recently at OOPSLA '02, where he talked about the stuff he's done to get smalltalk running on .NET. Which he's done. But...
Most of his talk was how he subverted .NET with add-on code to actually do what he wanted. The S# compiler doesn't generate code that will run on a stock .NET system--you have to have the add-on executable library pieces that he wrote to get around the limits of .NET's design (some of those limits were intentional, which is fine) and S# programs won't run as trusted code because they call out to DLLs outside the .NET core.
Basically, he cheated. Which is fine. I rather like cheating. But is it really pure .NET code?
The same is true of Java. If you use the Swing UI, is it pure Java? Swing sure as heck isn't Java all the way down--at the bottom it's sitting on C.
And the same is true of Perl. DBI is terrifically cool, as a database interface, but does code that uses it count as pure perl? DBI and the DBD drivers that connect to specific databases are written in C.
Where do you draw the line? (Is it even a meaningful line to draw?) What counts as pure? Is it pure if the only non-language code you execute is the VM? Is it OK if the only non-language code you execute ships with the language runtime? Is it OK if the only non-language code you execute is just written by someone else?
Does using SDBM_File (which ships with perl) still leave you with code that counts as pure perl? How about DBI, if you installed it from CPAN? What about using Inline::Python? Is it still pure perl if you're yanking in the damn python interpreter?
I dunno. I'm not sure I actually care, so much as care whether the code runs under certain circumstances. If it's a standalone "executable" for a VM, the question is more whether it works under a certain mode (trusted/untrusted) or across platforms (in which case embedding x86 code's an issue). But still... there's always that call, no matter how faint, for "pure X code!"
Oh, and for the record, since I mentioned his name, I do owe David an apology--I told him at LL1 that I didn't think that a portable, cross platform JIT was possible. He disagreed, and it turns out he was right. Sorry, Dave. (so I suppose he might pull this one off, but it didn't sound that way at OOPSLA....)
Like I said before, polling sucks rocks. Can't stand it, and I consider it an indication of a bad or badly thought out design. In this month alone (and it's only 7.5 days old, more or less) there've been 6273 requests for the index.rdf file, of which 4631 got 304'd, from 305 unique IP addresses. While that's flattering, it's also insane--there's no reason for all those queries. I just don't write that much stuff. And who knows how many of the page requests are from folks checking to see if anything's changed. And I can't imagine how many bits get flung across the wires just to find that Boing Boing hasn't been updated. Though, given how obsessively Cory seems to update the thing, perhaps most of them do get new stuff. Still, how much of the feeds is duplicated?
The system as it stands also doesn't serve readers or users of aggregators that well. I know they suck for me. I read a few blogs and comment occasionally on others. What I want, as a user, is to be pinged when what I care about changes--a posted trackback, or comment or blog entry. Sometimes all three, but often just one. I can't do that without regularly polling (and in some cases not really at all) which is annoying too.
So, on the provider end, it sucks. On the consumer end, it sucks. Arguably sucking less than not existing at all, but still... there's much suckiness to go around. That bugs me, and I find it troublesome.
What'd be better? I think an INN/newsfeed sort of system to transport the ping information. I'm not going to deal with the actual data, since there are a whole host of legal and technical issues involved there. Going with a feed system, though, adds in intermediate transport hosts, and initial upload hosts, plus potential distribution and authentication issues. Which I think I have solutions for. More complex than the current system, but robust and lower overhead, too.
I shall dump out the details in a bit, give or take some.
Someone posted a comment asking why, if we're doing closures and continuations in perl, we didn't choose a Lisp VM. (Or, by inference, a Scheme VM)
That's a darned good question. And, given that I'm pretty close to MIT as these things go (it was in comfy walking distance of where I worked for years) I do get asked it a lot.[1] This wasn't as common a question for perl 5, and I presume python and ruby, as those languages developed slowly and in concert with their interpreters.
The short answer, which made it into the Linux Magazine article (though not yet online--it's in the April 2003 issue, which just hit the newsstands around here) is that I didn't know about the Lisp and Scheme VMs when I started all this. Which is very true, and ultimately the real answer.
A more interesting answer is to the question "If you could do it again, would you choose a Scheme VM?" In this case most likely the Scheme-48 VM, as a number of folks who I respect speak very highly of it.
The answer to that, though, is still No. The reasons for that answer are the interesting ones. If you're a big Lisp or Scheme fan you may want to stop now, as I'm likely going to offend.
Since I'm responsible for making sure Perl 6 runs, I have three big concerns for any solution: functionality, portability, and supportability.
Functionality's the obvious one. If the chosen solution can't run the perl 6 Larry designs, it's no good. The language design is affected to some extent by the back end, and there have been features that have been changed because of features and limitations of the VM, but ultimately if Larry wants it I have to support it and, more to the point, I have to make it fast. That's fine, it's not like Scheme-48 is slow, but the design decisions that drive it are based on the needs of Scheme. Perl's needs are somewhat different--there's that whole "syntax" thing if nothing else. Sure the VM is turing complete, so everything I need for perl is doable, but doable isn't the issue. It's doable quickly. And I'm not sure there's a good match there. Could be wrong, of course, but if it came down to it, changes to Scheme-48 (or any Scheme or Lisp VM) that favored perl at the expense of Scheme or Lisp would be rejected, as they should be.
Portability is a second issue, but it's the one I'm worried the least about. This is, after all, Lisp we're pondering--if a piece of hardware had two bits, a program counter, and an accumulator, someone ported Lisp to it.
Supportability is the third issue. This is the big one, and by far the biggest killer of the deal.
For this to work, I need to support it. And I need to have enough folks in the perl community willing and able to support it. A good working knowledge of Lisp or Scheme is generally absent from the Perl community, and amongst those folks who do have an adequate knowledge of it the general feeling about Lisp is similar to the one I have when I find things with suckers or eyeballs on my pizza. (Which is fair enough, as many folks feel the same way about perl) It's possible that working on a Scheme VM wouldn't require Scheme knowledge, but... I really doubt that.
The development community backing Scheme and Lisp is also much smaller than the one backing Perl. Granted, there's a much higher percentage of people willing and able to bang the metal amongst Lispers than Perl folks, but still... numbers count. I don't need that many people to work on the engine, and I never expect more than five folks competent and active at any one time on the back end, but that core needs to be there, and it's going to have to come from the perl end of things, since I can't count on it from the Lisp end.
So, we have our own VM, one tailored to our needs, which is just fine. There's nothing wrong with another VM in the world--the world is, after all, a large place and there's plenty of room. And it's not like the Lisp or Scheme folks need anyone to choose their VMs for other projects for any sort of external validation.
[1] I presume that if I were in other places they'd ask the same about one of the Modulas, ML, or Haskell
If you haven't yet, read past this entry to the next one discussing continuations, then come back. Don't worry, I'll wait.
Now that you've done that, let's talk about where continuations make things interesting for VM implementation.
Since continuations are closures, that means that allocating variables on the stack is somewhat problematic--it's effectively impossible with a true, one-chunk, contiguous stack. It can be done if the system is using a linked call frame system rather than a stack, if the frames are garbage collected and there's proper copy-on-write magic added in the right spots, but it's generally easier to just go with a full lexical scratchpad deal.
Call stacks in general are a bit of an issue with continuations, since we may need to put them back in place at some point. Systems like .NET and the JVM use a single contiguous chunk of memory for their stack. As we talked about earlier, that makes calling and returning from subs faster. But to take a continuation with a system like this means stopping and making a full copy of the stack, since we may have to put it back later. With a call frame system, if there are only backlinks all you need to do is grab a handle to the current frame and you're set. Either way you need a garbage collection system of some sort to clean up after the stack chunks.
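Here's a small C sketch of the call-frame alternative, just to show why taking a continuation is cheap there: frames are allocated individually, linked by backlinks, and cleaned up by the garbage collector, so capturing a continuation means keeping a pointer rather than copying a stack. The structures are invented for illustration, not Parrot's (or anyone's) real ones.

    #include <stdio.h>
    #include <stdlib.h>

    struct call_frame {
        struct call_frame *caller;    /* backlink; frames form a GC-managed chain */
        void              *return_pc; /* plus registers, lexicals, and so on      */
    };

    struct continuation {
        struct call_frame *frame;     /* resuming means making this the current frame */
        void              *resume_pc;
    };

    /* With a single contiguous stack, this is where you'd have to memcpy()
     * the whole live stack region, and memcpy() it back on invocation. */
    static struct continuation take_continuation(struct call_frame *current, void *pc) {
        struct continuation k = { current, pc };
        return k;
    }

    int main(void) {
        struct call_frame *outer = calloc(1, sizeof *outer);
        struct call_frame *inner = calloc(1, sizeof *inner);
        inner->caller = outer;

        struct continuation k = take_continuation(inner, NULL);
        printf("captured frame %p, caller %p\n", (void *)k.frame, (void *)k.frame->caller);

        free(inner);
        free(outer);
        return 0;
    }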
The need to copy the stack is the big killer here. Ignoring everything else, the need to snapshot the stack immediately kills any possibility of using the system control primitives with .NET and the JVM, since you just aren't allowed to do that. (You could, if you were clever, write an extension to either with the C interface they have, but that immediately makes your code platform and interpreter dependent, not to mention unsafe. And really, really dodgy, since this is one of those things you're just Not Supposed To Do) If you can't use the system control primitives it means you need to do it by hand, which is very expensive. It also makes jumping across 'real' code (that is, code that runs normally on the JVM or .NET, rather than with your new wacky control primitives) dicey, since a continuation can't be passed across the 'real' code.
FWIW, that's a problem with Parrot as well--continuations can't pass across a parrot->C->parrot boundary, since we can't restore the C stack. That's less of an issue, though, since that sort of transition's not as likely, as code generally stays on one side or the other of the C fence, and so there won't often be reasons to pass continuations across it.
Languages that just take complete control of a system don't generally have this problem as you never leave them, but that's a matter for a different day.