Tail calls, and their more restricted sibling tail recursion, are one of those clever optimizer tricks that were invented many years ago, then promptly forgotten by most of the mainstream language implementors. (Naughty them!)
Consider, for a moment, this simple function:
sub foo (int a) { a += 15; return bar(a); }
Not a big deal at all--takes a parameter, adds 15 to it, then returns the result of calling bar on the new value of a. The important thing to notice here is that the very last statement calls a function then immediately returns the result of that function.
Now, think of how code's generally generated. When you enter a function, some stack space is allocated. When you exit from that function, the stack space is deallocated and a value is generally returned. So the sequence of events looks like:
STACK ALLOCATE
(do stuff)
call function
    STACK ALLOCATE
    STACK DEALLOCATE
    RETURN WITH VALUE
STACK DEALLOCATE
RETURN WITH VALUE
There are several interesting things here. If you look at it, the stack space that foo has isn't used after the bar function is called. While this isn't a huge amount of space, it is space, and space that isn't needed. If it were freed before bar was called, your program would be a little more space-efficient. For one function this isn't a big deal, but imagine what happens if you're 200 levels deep in a call (perhaps because there's some indirect recursion) and 100 of those levels look like return somefunc();. That's 100 chunks of stack that are being used for no reason. If your stack only had space for 199 chunks, not using those 100 chunks of stack would mean the difference between your program running and your program crashing.
There's also that pesky "return what I just got" action. In our example all foo does at the end is collect the return value from bar and then immediately return it. The act of collecting and returning the value is essentially wasted time. If somehow foo could make itself go away so bar returns its value to whoever called foo, we'd save the time it took to do that extra value collection and return. Is this a whole lot of time? No, not really, unless there's something really bizarre about the language you're using. But even if it's only a few microseconds, over thousands of calls it can add up.
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form return somefunc();
into the low-level sequence pop stack frame; goto somefunc();
In our example, that means before we call bar, foo cleans itself up and then, rather than calling bar as a subroutine, we do a low-level goto operation to the start of bar. Foo's already cleaned itself out of the stack, so when bar starts it looks like whoever called foo has really called bar, and when bar returns its value it returns it directly to whoever called foo, rather than returning it to foo which then returns it to its caller.
This, simply, is a tail call. Tail because it has to happen as the very last operation in a function, or as the function's tail operation. Call because that last operation is calling a function. Making tail calls can sometimes (though not by any means always) require support from your platform's calling conventions, linker, and runtime library. It also needs a non-trivial amount of effort in the compilers that support it.
There's also a more restricted version, called tail recursion, which only happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto
back to the beginning of yourself, which is a darned simple thing to do. That means code that looks like:
sub foo (int a, int b) {
    if (b == 1) {
        return a;
    } else {
        return foo(a*a + a, b - 1);
    }
}
gets quietly turned into:
sub foo (int a, int b) {
  label:
    if (b == 1) {
        return a;
    } else {
        a = a*a + a;
        b = b - 1;
        goto label;
    }
}
What was a recursive function, chewing up lots of stack, gets turned into a loop that chews up just one stack frame. And the compiler can do it as a simple under-the-hood transformation that needs no special knowledge of any other function's internals, or special function entry points or anything. This is a much easier piece of source transformation than full-blown tail calls for many languages, so it's more likely (though still not common, alas) to be implemented in a compiler.
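Just to make it concrete, here's the same transformation written out in C (a sketch only; the two function names are mine): the recursive form, and the loop that a tail-recursion-aware compiler effectively turns it into.

#include <stdio.h>

/* The recursive form: the very last thing it does is return its own result. */
int foo_recursive(int a, int b) {
    if (b == 1)
        return a;
    return foo_recursive(a * a + a, b - 1);
}

/* What a tail-recursion-aware compiler effectively turns it into:
   reuse the same frame, update the arguments, jump back to the top. */
int foo_loop(int a, int b) {
    while (b != 1) {
        a = a * a + a;
        b = b - 1;
    }
    return a;
}

int main(void) {
    printf("%d %d\n", foo_recursive(2, 3), foo_loop(2, 3));  /* 42 42 */
    return 0;
}

Same answer either way, but the loop version never needs more than one stack frame.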
Now, if you were taught to program using one of the more common procedural or OO languages, you were probably warned to avoid recursion because it was very easy to blow your stack out and die, which it is. If your compiler supports at least tail recursion (if not full tail calls), though, some of the recursive algorithms that used to be unsafe are now perfectly safe, since you're not allocating a chunk of stack for each level of recursion, which is a Good Thing. This can make some problem solutions easier, since you don't have to do the goto-style transformation yourself. (Especially since everyone these days seems to have a Pavlovian-conditioned aversion to gotos--a shame, since they're quite useful in a limited range of places--so you'll almost inevitably do bizarre contortions with infinite loops or something equally awkward.)
How big a deal are tail calls and tail recursion? In a language like Lisp, where just about everything requires a function call, and it's really easy to have a dozen or more functions in progress just in your code alone (not to mention the library functions your code calls, and the library functions they call), it makes a huge difference. Doing tail call optimizations is also really easy for Lisp, just because of the way the language works and is structured. Once you know about tail calls, it just falls out of the compiler, which is cool.
In procedural and OO languages the benefits are a bit smaller, since there are generally fewer function calls and more code executed between function calls, and many of the returns don't return just the result of a function call, but even then it does make a difference. Stack frames for procedural languages are usually larger than Lisp frames, since you have more stack-based variables--a C or Java stack frame may well be 10 or 20 times larger than a Lisp stack frame (since C and Java have stack-allocated variables, something Lisp generally doesn't have), so even if you can do a tail call optimization 5% or 10% of the time you'd save as much as you would for an entire Lisp run, give or take a lot of statistical handwaving. They probably won't happen that often (I'd bet the number's more like 1% or .1%) but still, a win's a win.
Is there a downside to tail call optimization? Well... no, not really. The only place it causes a problem is if you have a language that supports introspection sufficiently to allow you to inspect the call stack, as you'd not see any of the functions that tail-called out of themselves, but this isn't generally a big deal--the space and runtime wins are considered worth the hit to introspection capabilities. They also put a bit more complexity into a compiler, which is likely already a pretty damn complex piece of code, so there's a burden on the language implementor, but if you think about it at the beginning it's not too big a deal to implement. (And if you have a continuation passing style of calling functions, it turns out to be essentially free, which is really cool, though the topic of another WTHI entry)
So, tail calls. Your friend if you're a programmer, and a darned useful thing to implement as an optimization if you're a compiler writer.
The true Dr. Evil runs Nintendo.
I picked up a Gameboy Advance SP last night, along with a copy of Pokemon Ruby. Peregrine's got an original Gameboy Advance we got him for solstice last year, and since we've got the big flight to Portland in a few weeks, we decided to get a second Gameboy and both Ruby and Sapphire so we could link up and play on the plane.
Now, I kinda like Pokemon. The comics amuse me, I find the show funny, and the original game for the Gameboy was pretty engrossing, in a cute and mostly inoffensive way. But this thing... I got the system last night at about 7 PM, and got back to the hotel at about 7:30 or so. I have, so far, logged more than four and a half hours on it.
Evil, I tell you, pure and simple from the 8th dimension!
(The GBA SP system itself, though, is amazingly powerful and very cool in a geeky way, but that's another topic entirely)
Or maybe #2, I'm not sure. (Does he own much Starbucks stock?)
How do I know this? Well, the thing that finally sucked me into the iTunes Music Store was Quincy Jones' Soul Bossa Nova.
Curse you, Dr. Evil! :)
For one day, at least, but better than nothing! :) I'll be there on Friday the 25th, and flying out the evening of Sunday the 27th, so I'll be there for some social stuff at least. Wheee!
I want to get this down before I forget, since it's terribly handy.
Peregrine's birthday was a few weeks back and, for his cake, he wanted a seascape scene on it. Easy enough to do with some little dolphin and whale figures we picked up at A C Moore (it was something like $1.50 for a sleeve of eight) and some colored frosting. Unfortunately I'd had rather horrible luck a few years ago trying something similar--getting enough food coloring in the frosting to get a good sea blue means that all you taste is that nasty chemical colorant flavor. Definitely not good eats.
I'd considered a three-coat approach for the cake--crumb coat it, frost it with normal frosting, refrigerate it until it was a nice hard-set, and layer on a thin coating of colored frosting. Doable, but a fair amount of work, and it meant either making two batches of frosting or warming up the remnants of the first batch and frosting with that. Two batches is pretty wasteful (it's a nice buttercream, but that means a pound of butter and 6 egg yolks per batch) and I'd had horrible luck rewarming frosting, as it always curdles on me. Yech.
So, the sensible thing to do was use a colored powdered sugar sifted onto the top of the cake. I made a positive and negative stencil (There was a wave--sea green on the bottom, sky blue on top) out of some parchment paper and set out to make my own superfine sugar. The stores around me don't carry it, and I didn't want to use regular powdered sugar because it has corn starch in it that leaves an odd flavor. (It's OK when used in things or mixed with a liquid, but this wasn't and the flavor'd be obvious)
Making superfine sugar's simple--start with regular sugar and abuse the heck out of it until it's a powder. My first attempt was with the food processor and, while it did a little, there just wasn't enough blade contact to make it work well. The sugar tended to fly around and not actually get smacked with the blade.
My next attempt, though, worked remarkably well. I dug out our bar blender, dropped in some sugar, and hit the pulverize button. And pulverize it did--in nearly no time I had nothing but a very fine powder. The tall narrow shape of the container kept the sugar at the bottom, in contact with the blade, and it worked great. A couple of drops of color, another 30 seconds or so of abuse, and I had what turned out to be far too much colored sugar, though that was OK by me. (A quarter-cup goes a very long way, further than I expected. I think I used maybe a tablespoon of the resulting powder, and the stuff increased in volume as it got smashed) It worked really well, and I was very happy.
As an added bonus, the color got more intense as the sugar soaked in some of the liquid from the underlying buttercream, as well as getting a slightly mottled appearance from the slightly uneven thickness of the sugar. Turned out darned nice looking.
Important Art Tip for this: The good paste frosting colors are all oil-based (food oils, don't worry) and don't blend in well with the sugar. It works OK, but you get flecks of darker color where the paste hasn't mixed in well. A nice effect, but one you should get because you want it, not by happenstance. Liquid food coloring works much better and, while it's not great (I used some McCormick liquid food coloring we'd had around because of a plumbing problem we were tracking down) it works very well in this application.
FedEx dropped off a nice surprise package on my stairs today--a copy of Perl 6 Essentials, in the flesh. Or, rather, paper. Very, very cool.
Oh, and the cover animal is an aoudad, AKA a Barbary sheep. No, I have no idea what significance that has. :)
Well, I spent the weekend digging into some of the pending Parrot work, of which there is far too much that I bottleneck. (Important project management tip for folks doing project management--identify those places where one person is a bottleneck, and dive for those areas as quickly as you can. Places that no one else can work on can't get done until you do them. And Delegate, dammit!) Anyway, one of those places that's pending is the python bytecode stuff. Yeah, I know, it's not as high a priority as, say, objects or exceptions, but given that Larry's not specified either yet it makes sense to tackle bits that don't block on him. Besides, I really need a handle on Python's object system before I finish Parrot's, since I don't want to be building two different object systems if I don't have to.
The reason for this, of course, is the up and coming Great Python Pie-throwing Contest, coming soon to an OSCON 2004 near you! Or near Portland, Oregon, at least, since I have no idea where you are.
So, after an hour or so of digging through with google and the python.org search facilities, I located some reference docs for python's bytecode. (Link so I can find them again later) Oddly, in the 2.1 maint docs, but not in the 2.2 docs anywhere. (Dunno whether it was an oversight or on purpose) The docs are, as you'd generally expect, fairly terse, as this stuff tends to be, since it hangs on knowledge of a whole lot of other stuff--if you know the other stuff you only need a terse description in the docs, and if you don't then no amount of documentation is likely to help.
Reading through assembly documentation (which bytecode is, really) is always an interesting undertaking. All assemblies have quirks, oddities, and other assorted things that make you go "Hmmm...." built into them, as you find things that aren't done the way you might think. They're often a result of other limits (many older CPU assembly languages were downright odd because of the need to save transistors), the personality of the designer, or hasty decisions, or dead-ends accumulated over time, and finding those is fun.
Python, of course, is no different. The bytecode has a bunch of standard stuff--stack rotates, math, and binary operations--but it's got some other oddities. The four-item stack rotate, for example, is kind of interesting, and a bit unusual. (Generally you have top-two swap and top three rotate, and if more's needed there'll be a general-purpose "rotate N" op) PRINT_NEWLINE, which prints a newline, is also kind of interesting, as having an op dedicated to printing just a newline's not common. (This isn't normally a time-critical operation, but it does have the advantage of being able to emit a platform-native newline, without having to translate a linefeed character or \n escape sequence, so it's a niftyish idea)
Then you get to the places where it became obvious that a limit was hit. The python bytecode apparently mandates 16-bit parameters. That's OK for most things, but the EXTENDED_ARG op, which holds bits 16-31 of the argument for the following op (making it a 32-bit argument split into two pieces) shows those places where it wasn't quite enough. (I'm wondering whether I can safely merge the two args together when I generate parrot bytecode--I can see Really Tricky Code branching to the op following the EXTENDED_ARG op in some cases, either to have a variety of top-16 bits or to take it as either a 16- or 32-bit op, but I'd bet that's not going to happen, as the Python folks tend to violently disagree with such things, which works for me)
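If merging does turn out to be safe, the actual bit-twiddling is trivial--shift the EXTENDED_ARG value up and OR in the following op's argument. A quick sketch (the function and variable names here are mine, not Python's):

#include <stdint.h>
#include <stdio.h>

/* Sketch of folding an EXTENDED_ARG prefix into the 16-bit argument of the
   op that follows it, giving one 32-bit value. The names are mine; the
   layout (high bits in EXTENDED_ARG, low bits on the real op) is from the docs. */
uint32_t merge_extended_arg(uint16_t extended_arg, uint16_t op_arg) {
    return ((uint32_t)extended_arg << 16) | op_arg;
}

int main(void) {
    /* e.g. a jump target too big for 16 bits */
    printf("0x%x\n", merge_extended_arg(0x0001, 0x2345));  /* prints 0x12345 */
    return 0;
}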
You can also see how deeply various data types are welded into the interpreter. The tuple (basically a fixed-length list of items) and the hash (sorry, dictionary) are pretty fundamental to python, while objects are... well, objects are an afterthought. At least that's the way they look from under the hood. The interpreter does support them, but there's barely more than what Perl has under here. I'm not a Python Guy, so I can't speak for the language support for objects, which may or may not be more significant. (Whether it is depends on how much the person explaining them likes Python) You certainly don't need anything fancy under the hood to support a fancy model--heck, it's all bits fed to the CPU when you get to the end of it--so I don't much care. A simple model actually makes it easier on me.
And, of course, there's the hackish stuff. COMPARE_OP springs to mind--this is an opcode that takes the comparison operation as a parameter, then looks that comparison up in a table somewhere and does it. That's... interesting. On the face it seems rather excessively indirect, but as I don't have the python sources at hand I can't say for sure that it's silly. I suppose there could be an extendable comparison table facility to allow the addition of new operators, but that doesn't seem too Pythonish. There's also the LOAD_CLOSURE op which takes a single integer parameter that indexes into two name tables simultaneously--if the offset is within the first table (the co_cellvars
table) it's that name, and if it isn't it's an offset into the second (co_freevars
) name table (after subtracting the number of items in the first name table) and it looks like anything else that references the cell and free variable store does this too. I suppose if nothing else it'll keep you from adding in new cells to things at runtime...
Anyway, looking at this all as a whole, it should be pretty trivial to get parrot to run python bytecode, at least once the bytecode's preprocessed to turn it into parrot bytecode. At the moment the big issue is how much work should be done on all of it. Parrot, of course, is a register machine, while Python is a stack machine. It'd be dead-simple to have a bunch of python-specific ops that work just on the stack, but that's not the way to speed, honestly. It's even possible (though not particularly likely) that the python interpreter would actually be faster in that case, though honestly I doubt it. Might do it that way anyway, just to see, though. Handling the quirky bits shouldn't be a problem at all.
The bigger question is one of integration. The big reason I'm doing this is to get a chance to hurl a pie at Guido (well, probably Guido) which I've found to be rather motivating, but I'd rather this not end up as a one-off hack. (Ignoring the stack analysis code that'll result from it, though that's hardly trivial) What I want is for this to develop into a real, integrated python bytecode translator and interpreter, so Parrot can execute Python bytecode alongside other Parrot languages. Building a Python compiler to target Parrot would be the best way, and I suppose someone'll do that, but there's a limit to the motivating power of pie. Besides, there's a fair amount of bytecode around, so it's certainly a sensible thing to do.
The timeline, FWIW, is that at the Python Lightning talks at OSCON 2003 Guido and I will formally announce the details of the contest and the rules. (Which still need to be worked out, though we've a rough idea of what we're doing) Sometime by late December 2003 the bytecode for the test program will be generated, and at OSCON 2004 we'll have the actual face-off, followed by pie at 10 paces. (Mmmmm, pie!) While I expect it won't be a hugely exciting thing, I'm sure the video of the pie will haunt someone for years to come. Which should be fun.
(Yeah, yeah, I know... "Them's fightin' words!" Well, duh! Fight's already started :)
I've seen some other folks crank about this, but I'll add my voice to the din: Acrobat Reader 6.0 sucks.
The damn thing is horribly slow to launch, which is the biggest complaint I've seen around, and one I find really annoying. 5.x was slow, but nowhere near this slow. It rivals Pagemaker's launch speeds, and Pagemaker's a classic app and an amazing resource pig. (Unfair, I know--it does a lot and has a plugin facility that requires launch-time bindings)
Worse than that the search function for it is, in the normal case, utter crap. Yeah, sure, it's got recursive directory search capabilities, which are nifty, but I've used them exactly once so far. Firing up a search in 6.0 takes ages, resizes the damn document (which is probably why it takes so long), and eats up huge amounts of screen real-estate for not a whole lot. The 5.x style "pop up a little box with a few checkbox options" was much nicer. Yes, the full and fancy search is good as an option, but it really shouldn't be the default.
Oh, yeah, and the splash box is dopey too. Not only is it big, but it's got an overlay swoosh that extends outside it. (Well, at least on OS X it does) Thanks, it's clever, I'm impressed, please stop that. I'd turn it off, but given how long this thing takes to launch, well...
And this version has JavaScript support. Oh, joy, oh rapture, oh boy a potential security hole. I could've sworn that PDF files were a restricted subset of Postscript specifically designed to render documents pretty darned safely. (Postscript, while cool, is a full-on Forth-style language, and it's pretty easy to write a postscript doc that can do a DOS attack, or close to it. Go google for postscript-based raytracing (No, I do not joke) if you're interested) JavaScript, while sucking far less in its pure form than most people think, isn't. I just hope Adobe's got a very restricted set of functions exposed to it.
I think I'll stick with Acrobat 5.1 for now.
And when I say "wireless" I mean 802.11b (or g, I suppose, that'd do) wireless everything. I've this perfectly fine Airport card in my iBook, yet to do anything at home I'm tied down with cables. There's the network cable (since I don't have a basestation... yet) but also two or three USB cables thumped together for the printer, keyboard, and mouse. (It's a wireless rechargeable Logitech M700 mouse that I got because I wanted to not be tied to the keyboard for my sessions at OSCON. Very nice, even better than I expected. That's an entry for another time, but the short answer is "Go buy one, you'll like it". Though it does take some getting used to after using trackpads and trackballs for the past few years)
I want an 802.11[bg] capable wireless USB hub so I can plug in all the cables, dongles, and what-have-yous to it, and have my iBook use them via a network connection to it. None of this silly "hang another dongle off a USB port" the way that you have to do bluetooth. I want my bluetooth wireless dongle hanging off the wireless USB hub, dammit!
Don't suppose anyone knows of one of these beasts that works under OS X?
Well, except for the "state 'o parrot" talk, since it's a bit premature. Still, printouts are off to Vee for the OSCON tutorial, so that's one more thing finished. Alas I won't be going to YAPC::EU after all, as the training session that was paying for it fell through, which is a major bummer.
Cool thing posted to the perl6-internals list yesterday. Klaas-Jan Stol was, as part of a compiler class, writing a paper on a compiler. In this case, he was writing a Lua compiler to target Parrot. It's a good read. (And yes, I do know it points out some of the stuff we're missing, but that's OK :)
The compiler itself isn't done, but, then, neither is Parrot.
And, more importantly, why should you really care?
Nearly all programs create garbage as they run[1], and nearly all programs clean up that garbage in one way or another. Garbage Collection is what we call it when the cleanup is done automatically. The automatically part is what makes garbage collection interesting and useful.
In a language like C, you can explicitly allocate memory from the system with the malloc
function. It takes a size, and returns you back a pointer to the newly allocated memory. When your program is done with the memory, it calls the free
function, passing in a pointer to the memory that was allocated, and the memory is given back to the system. Nice, simple, straightforward. And entirely manual.
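For anyone who hasn't had the pleasure, the whole manual dance looks something like this little sketch:

#include <stdlib.h>
#include <string.h>

/* Manual memory management in C: ask for a block, use it, hand it back.
   Forget the free() and you leak; free() too early and later use crashes. */
void example(void) {
    char *buffer = malloc(64);        /* allocate 64 bytes */
    if (buffer == NULL)
        return;                       /* allocation can fail */
    strcpy(buffer, "hello, world");   /* use the memory */
    free(buffer);                     /* give it back when done */
    buffer = NULL;                    /* don't keep a dangling pointer around */
}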
The big drawback to the manual system is that if you make a mistake, Bad Things Happen. Freeing memory too soon means your program has a pointer to memory it no longer owns. Using that pointer will cause your program to crash as it tries to access memory outside its region. Or, worse yet, access memory that's been recycled and used for other things. Freeing memory too late, or not at all, means that your program leaks memory, leaving chunks of unused and unusable memory lying around. This is less of a problem, as at least you won't crash, but it will mean the system eventually runs out of memory to hand out, and your program will fail.
How do you then track allocated memory to make sure that you hold onto it as long as necessary, but no longer? The way a lot of programs do it is with something called reference counting. Each chunk of memory has an integer counter attached to it. Each time your program makes reference to the memory (generally by storing a pointer to it somewhere) the count is incremented. Each time a reference is deleted the reference count is decremented. When the reference count goes to zero, the memory is freed.
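A bare-bones version of the idea, with invented names, might look like this:

#include <stdlib.h>

/* A minimal reference-counted allocation: an integer counter rides along with
   every chunk, bumped for each new reference, dropped for each one that goes away. */
typedef struct {
    int refcount;
    /* ... the actual data follows ... */
} RefCounted;

RefCounted *rc_new(size_t size) {
    RefCounted *p = malloc(sizeof(RefCounted) + size);
    if (p != NULL)
        p->refcount = 1;              /* the creator holds the first reference */
    return p;
}

void rc_retain(RefCounted *p) {
    p->refcount++;                    /* someone else is pointing at it now */
}

void rc_release(RefCounted *p) {
    if (--p->refcount == 0)           /* last reference gone: hand it back */
        free(p);
}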
Unfortunately reference counting has a number of problems associated with it. First, of course, your code needs to fiddle with the reference count properly, and everywhere. This is, in a large program, potentially error-prone--memory may live forever, or get deleted too soon. Both are bad, leaving you with the leak or segfault problems. It also clutters your code with refcount fiddling calls and, even if you reduce them to macros or function calls, it's more code to wade through when you're writing and debugging the code.
The other big problem is the runtime cost. All your memory allocations need to be bigger, to hold the reference count. Your code is doing a lot of math and, while it's simple addition and subtraction, it takes time. And it means your deallocation system may need to be fancy, since if a chunk of memory that gets deallocated has pointers to other pieces of memory, those pieces need their refcounts manipulated. (While this is a cost you have to pay in the manual case, a refcount-based system is usually a general-purpose thing, so the deallocator's a generic deallocator rather than a special-purpose one with deep knowledge of the data being deallocated, so it needs to think more when deallocating things)
All this is definitely a pain. (Been there, done that, got the T-shirt) It's far better than leaking memory, though, and it does have the twin advantages of being conceptually simple and very portable. It's the system that many of the interpreted languages such as perl and python use, as well as some of the compiled languages like Objective C. (Though it's as much of a function of the Objective C object library as the language itself, but that's what's in widespread use)
The alternative is real, fully automatic, garbage collection. What a garbage collector does is track all your program's allocations, then look through all your data and figure out which chunks of memory are in use and which aren't. When it finds pieces of memory that aren't used, it releases them to the system for later reallocation.
Garbage collection always starts by assuming everything is dead. It then begins with what's called the root set--a set of things that are always considered alive. Anything the root set references is marked as living. Then everything that this newly declared "living" memory references is considered alive, so then we walk through that memory, and so on. By the time we're done, anything that's still dead is unreferenced and can be safely cleaned up. In object systems that have destructors, finalizers, last-gasp methods, or other things that get called right before something gets killed, this is the time it happens.
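Boiled way down, a mark-and-sweep collector looks roughly like this sketch (every name in it is invented for illustration; real collectors are considerably hairier):

#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Every allocation carries a mark bit, knows what it references, and sits
   on one big list so the collector can find it. */
typedef struct Object {
    bool marked;
    struct Object **children;   /* things this object references */
    size_t n_children;
    struct Object *next;        /* the allocator threads every object here */
} Object;

Object *all_objects;            /* head of the every-allocation list */
Object **root_set;              /* globals, stack, registers: always alive */
size_t n_roots;

/* Anything a living object references is itself alive. */
static void mark(Object *obj) {
    if (obj == NULL || obj->marked)
        return;                 /* nothing there, or already visited */
    obj->marked = true;
    for (size_t i = 0; i < obj->n_children; i++)
        mark(obj->children[i]);
}

void collect(void) {
    /* Phase 1: assume everything is dead, then mark outward from the roots. */
    for (Object *o = all_objects; o != NULL; o = o->next)
        o->marked = false;
    for (size_t i = 0; i < n_roots; i++)
        mark(root_set[i]);

    /* Phase 2: sweep. Anything still unmarked is unreachable garbage. */
    Object **link = &all_objects;
    while (*link != NULL) {
        Object *o = *link;
        if (!o->marked) {
            *link = o->next;    /* unlink it from the allocation list */
            free(o);            /* destructors/finalizers would run just before this */
        } else {
            link = &o->next;
        }
    }
}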
Garbage collection is a nice thing. Your main program code doesn't have to worry about tracking memory, since the garbage collection system handles it for you. That means less bookkeeping code to write and, almost as importantly, less bookkeeping code to execute. Your code allocates memory when it needs to, and puts references to that memory wherever it wants. There's no tracking, counting, or reference work.
A garbage collector is not a simple thing to write, not by any means. They're very system specific in places, and efficient collectors are generally complex, low-level, nasty pieces of code. That's the big downside. The garbage collection code is usually small and isolated, however, which offsets this to some extent. While you still need to understand it, only one or two people need to understand it to work on it, while the rest of the people working on the project don't have to worry about it at all. Memory allocation bugs are also isolated to a small section of your code--the memory allocator and the garbage collector--rather than scattered across the entire code base.
Luckily there are a number of pre-made garbage collection systems available free for use, such as the Boehm collector, that already handle all the nasty bits. You plug them into your program and they just work. Pre-made collectors tend to be more conservative and a bit slower than a hand-rolled custom collector, as they can't have as much knowledge about your program as you do, but the advantages involved in using a mature code base and just not having to bother are significant. While personally I like writing GC code, if you don't I'd recommend grabbing the Boehm collector.
The big downside to garbage collection is indeterminacy. Your program holds on to memory longer than it otherwise might, can't be sure of how long an allocation might take (since allocating memory may require firing off a garbage collection run if your program runs out), and can no longer predict exactly when some resource that requires active destruction may be cleaned up. While a number of the garbage collection schemes have bounded worst-case memory usage and collection times, their worst case performance (either in memory usage or pause) is generally worse than a refcounting scheme.
Interestingly, depending on your program's allocation patterns, a garbage collection system can have better average and best case performance than a refcounting system. (Both of which are beat by a manual tracking system's best case performance, though the complexity of a fully manual system generally makes it unsuitable for large systems just because it's such an enormous pain to get it right) A GC system doesn't need to allocate a counter word per allocation and doesn't need to twiddle that counter word, and a GC's code is normally more cache-friendly.
Refcount schemes can, of course, have lengthy pauses of their own--freeing up the root node of a structure with a million or more elements will make your program stall. A refcounting system also spends significantly more overall time doing memory management tasks, usually twice as much (except in the most pathological cases) so a program with a real GC will generally run a bit faster than one with a refcount system. Not hugely faster, since your program ought not be spending all that much time in the memory allocation system, but every percent or two counts.
[1] I do actually know of a few non-trivial programs that, when running, do not allocate any dynamic memory at all outside a very limited, and tightly bounded, bit of stack memory. (And I could probably, with work, put my hands on a few that don't use stack, either) They're quite rare, and somewhat special-purpose, but they do exist.
Well, I just finally broke down, more or less, and changed one of the major rules for DOD behavior for Parrot, or at least set the groundwork for it. (DOD = Dead Object Detection, the phase of the cleanup procedures that looks for unreferenced objects and kills them, which is separate from Garbage Collection, the process of reclaiming currently unused memory. More WTHI on those at a later point)
Now, we had originally set it so that Parrot had a tracing DOD system that was usually triggered on resource depletion. User programs could, if they chose, trigger off a DOD run, but that was about it. This, of course, clashes with Perl 5's system a bit. Perl 5 uses a refcounting scheme, which finds garbage as soon as it becomes garbage. Most of the time, at least, as it never finds circular garbage, which is a bit of an issue. A refcounting scheme also destroys things in order, a separate issue that we'll get to later. I wasn't that concerned with this, since for 99.5% of the collectable things a program uses it doesn't matter--there's no active destructor attached to it, so the only issue is the memory that it uses, a matter that's really for Parrot to deal with.
The things that do have active destructors, though, are the troublesome bit. The refcounting scheme that Perl 5 uses means that, if something is found to be dead, it's found close to immediately, usually on block exit when the lexicals are cleaned up. This provides a certain amount of immediacy that can be useful--filehandles get closed when they go out of scope, DBI statement handles that leave scope get destroyed, and so forth. While honestly very rare, they're very visible, and also very useful.
I wasn't, at the start, all that worried about it. The DOD sweep's pretty damn fast, and it happens with some frequency, so nothing lasts long past its death, and with the new features coming in, especially continuations, stuff will live past many folks' expectations anyway. Delayed cleanup isn't a problem. If languages want more timely cleanup, they can force a DOD sweep when they clean up and exit from their blocks. The win of tossing refcounts (less code, fewer errors, denser data, less bookkeeping overhead) was far bigger than the loss of some determinism. Resource exhaustion, such as running out of filehandles, triggers a DOD sweep, so you won't die for lack of filehandles or suchlike things. We also have some features in the pipeline, such as being able to push block exit actions onto the stack, that would take care of many of the things people use this for anyway. (Since many of the things that really want timely destruction actually want to be executed on block exit, and use the timely destruction mechanism as a way to do that)
That didn't last, of course. (If it did I wouldn't be writing this) Perl 5 generally does timely destruction, on block boundaries, so code translated from 5 to 6 would need to force a DOD sweep on every block exit. We may be fast, but that's a lot of extra work. I've also been finding myself writing code that uses a fair number of DBI statement handles that I just don't clean up, and while I don't think they truly want lexical cleanup, they ought to get cleaned up faster than they are. And, finally, work's running me across a problem with perl 5, in that things that survive to global destruction don't get destroyed in any particular order, and that's been causing some headaches. That lack of ordering is what we'd be guaranteeing all the time, which is bad.
So, the compromise. Parrot now has a lazysweep
opcode that triggers a DOD sweep, but only if there are objects that demand timely destruction. To go along with that, there's an "I demand timely destruction!" flag in the PMC structure, a counter in the interpreter structure, and a function to deal with them both. If a PMC is tagged as demanding timely destruction then the lazysweep op will trigger a DOD run (which recalculates the counter, so if they all die it stops) and if there aren't any, it doesn't. Languages that make some guarantees of determinacy can insert these as appropriate and, while not free, they are close to it.
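In rough, hypothetical C--this is the shape of the idea, not Parrot's actual source--it amounts to:

/* A hypothetical sketch of the lazy-sweep idea; names are invented. */
typedef struct {
    int timely_destruction_count;   /* live PMCs that demand timely destruction */
    /* ... the rest of the interpreter state ... */
} Interp;

/* Stand-in for the full trace-and-kill DOD pass; the real one walks the
   root set, kills dead objects, and recalculates the count as it goes. */
static void dod_sweep(Interp *interp) {
    interp->timely_destruction_count = 0;   /* pretend everything died */
}

/* The lazysweep op: do a full sweep only when something actually cares. */
void lazy_sweep(Interp *interp) {
    if (interp->timely_destruction_count > 0)
        dod_sweep(interp);
    /* otherwise nothing demands timeliness, so it's a cheap no-op */
}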
The second part of the compromise is ordered destruction. While we haven't got this implemented yet, if a flag is set we'll build a quick tree of dead PMCs if there's more than one dead PMC with an active destructor, and destroy them in order from outside to inside. If there's only one dead PMC with an active destructor we don't bother building, since a single node tree takes no effort. (And because we clean up unused PMCs after any destructors are called)
Great? No. Adds unavoidable overhead (calculating the timely destruction count on a DOD sweep), which I don't like. It can be reduced to a simple test in simple cases (those with no timely active destructors) but I'm not sure that will happen too often in real life. On the other hand, it does provide some guarantees, which people like, and which do make programming a bit simpler, so, well, there we go. The point is to make this all work right as cheaply as reasonable, so we deal. Guaranteed invariants cost, but there's no real way around that. At least the cost will be trivial for languages that don't need it, which is something at least.
In the last entry, modulo a rant or two, I explained what multimethod dispatch is, but I didn't really make a good case for what it's useful for. The comments on the entry gave some examples, but I think it warrants a bit more explanation.
I should start by noting that MMD is one of those features that you will use in a few very specific circumstances. This isn't the equivalent of addition we're talking about (though it does get used under the hood as part of addition) here. You'll either need it in a few well-defined places and use it, or you won't. If you don't, well, that's fine.
MMD is used in those places where you have a method that can take multiple types of things, and want to do something based on information in the inputs. Generally this means we check the types of the data, and do things differently depending on what types are passed in, but MMD isn't required to be limited to this. More on that in a bit.
Let's, for a moment, ponder a well-designed OO GUI system. (Yes, I know, but this is a thought experiment. Suspend your disbelief for a moment and follow along :) This system has an object for each GUI element, as well as a fully OO based event system. Even better, this system has typed events, so each event that you can get is a subclass of the generic Event class. Key presses, mouse clicks, window moves, resolution changes--they all have their own object type.
Your program, like so many other GUI systems, just spins its wheels, forever grabbing events and dispatching them off to do... something. Whatever's appropriate. In our case here, let's assume that each event comes with the GUI element the event applies to, just to make things easier. (The system tracks which GUI element each thing happened to, since it's a well-designed GUI system) The main loop of your program may look like:
while (1) {
    ($widget, $event) = GetEvent();
    $widget->handle_event($event);
}
How do you process the event? Well, your widget classes all have code that looks like:
class Window {
    method handle_event(Event $event) {
        switch ($event->type) {
            case KeyPress    { }
            case MouseDown   { }
            case SingleClick { }
            case DoubleClick { }
        }
    }
}
(Keeping in mind that I'm mostly making up syntax here) The handle_event method for each GUI element type needs to check the type of the event and do something based on that event type. This is what happens now, though you generally aren't lucky enough to have the event actually come in as a typed object, instead getting an event structure or something. Still, we're pondering an OO GUI system, and the type of an object is as good a place as any (or better) to put the event type info.
Anyway, big switch statement. Ick. Maintenance pain, definitely, and one big glob of code. How could MMD make this better? Well, the class would then look like:
class Window {
    multimethod handle_event(KeyPress $event)    { }
    multimethod handle_event(MouseDown $event)   { }
    multimethod handle_event(SingleClick $event) { }
    multimethod handle_event(DoubleClick $event) { }
}
Rather than one big method, we have one method per type. This gets us a number of wins.
First, of course, the system handles dispatching, rather than our code. That's less code for us to have to write (albeit only a little less) and, more importantly, the system is likely far better optimized for fast dispatch than we can be, because the system is in a position to cheat unmercifully. We, as application programmers, have to play by stricter rules.
Second, we've isolated the code for each event type into separate methods, which makes maintenance a bit safer. Twiddling code for one event doesn't risk the code for the rest. (Not a common problem, but a problem nonetheless) There is the potential for code duplication, but that's not usually an issue, since often the code to handle each event type will be in separate subroutines anyway.
Third, we've probably eliminated a sub dispatch. While simple event handling code can be put, with some risk, in the body of the switch, odds are if there's more than a few lines needed to handle it the code will be factored out to a separate subroutine. That means two dispatches for each event, while the MMD form has a single dispatch. (Not a huge savings per call, granted, but if we're in an event-heavy system efficiency translates to responsiveness)
Finally, it's easy to add support for new event types after the fact, and in classes we don't have access to. The window class may well be provided by the system, with a default event handler that takes plain Events and quietly discards them. The system instructions could well be "to handle an event of type X, add in a MMD handle_event that takes an event of the type you want to handle", or "To override default behaviours, add a handle_event that takes an event of type X in your Window child class". This provides a nice system--you get a fallback handler automatically, along with default behaviors the system provides (likely with full and evil access to the internals of the system) without having to explicitly redispatch. You want to handle an event of type X? Great, handle just it. You don't need to have a generic handle_event that dispatches to the parent handle_event for everything you don't understand. (You can, but that's error prone. We don't like errors)
Another place that MMD is used, as I talked about before, is in operator overloading. This is a place where being able to add in options to the dispatch table is extraordinarily useful. Without MMD, dispatch for overloaded operators is traditionally done based on the type of the left-hand side. That's fine if your object is on the left and it knows about the type of the object on the right, but less fine if your fancy object is on the right, or doesn't have a clue as to what to do with the object on the right. (Not that MMD will guaranteed save you, but at least there's an option)
Type based dispatching also has a nice property of allowing for closest-match lookups. If you have any sort of inheritance hierarchy, at some point you'll come across the case where no method exactly matches the types in the signature. In that case, what do you do?
You could just bail, of course. Fall back to the default method, or throw an exception, or otherwise pitch a fit. But that's not tremendously helpful. What most MMD systems do, then, is to find the closest match. This is done reasonably simply if you consider each level of the inheritance tree a unit of distance. If there's an exact match, it's 0 units away. If you have a Foo, and Foo inherits from Bar, then Foo is one unit from Bar. What you do, then, is look at each of the possible versions of a method and find out how 'close' you are to it. If your types all match, you're 0 units away, and you've got the method. If not, then you figure out how far away you are. This can be done simply, by figuring out how far off each parameter you have is and adding the distances up, or treating the parameters in the declared methods as defining a point in N-dimensional space and figuring out how far you are from each possibility. The former's easier, the latter is more correct, so which you choose to implement depends on how comfortable you (or the people implementing the engine) are with distances in N-dimensional spaces. Luckily, for speed reasons, the distance table can be completely precalculated, so the engine doesn't have to throw in the speed hit of transcendental functions in each method dispatch. When multimethods are defined, yes, but that's pretty rare relative to the number of times they're used.
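As a toy sketch of the "add up the distances" approach, with a made-up three-type hierarchy standing in for a real class tree:

#include <limits.h>
#include <stddef.h>
#include <stdio.h>

#define NTYPES  3
#define NPARAMS 2

/* parent_of[t] is t's single parent, or -1 at the root: 0 <- 1 <- 2 */
static const int parent_of[NTYPES] = { -1, 0, 1 };

/* Steps from 'actual' up to 'declared', or INT_MAX if unrelated. */
static int type_distance(int actual, int declared) {
    int d = 0;
    for (int t = actual; t != -1; t = parent_of[t], d++)
        if (t == declared)
            return d;
    return INT_MAX;
}

typedef struct {
    int types[NPARAMS];     /* the types this variant was declared with */
    const char *name;       /* stand-in for the code to run */
} Variant;

/* Pick the variant with the smallest total distance; 0 is an exact match. */
static const char *pick_variant(const Variant *v, size_t n,
                                const int actual[NPARAMS]) {
    const char *best = NULL;
    int best_dist = INT_MAX;
    for (size_t i = 0; i < n; i++) {
        int total = 0;
        for (int p = 0; p < NPARAMS && total != INT_MAX; p++) {
            int d = type_distance(actual[p], v[i].types[p]);
            total = (d == INT_MAX) ? INT_MAX : total + d;
        }
        if (total < best_dist) {
            best_dist = total;
            best = v[i].name;
        }
    }
    return best;            /* NULL means no applicable variant at all */
}

int main(void) {
    Variant variants[] = { {{0, 0}, "generic"}, {{1, 0}, "closer"} };
    int args[NPARAMS] = { 2, 0 };       /* no exact match for (2, 0) */
    printf("%s\n", pick_variant(variants, 2, args));  /* prints "closer" */
    return 0;
}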
Earlier I said that we generally dispatch on type, but we don't have to, and that's true. Type-based MMD is done mainly in the interest of speed--it's usually very quick to check the types of your parameters and look up in a big table to see which 'real' method you want to dispatch to, but you certainly aren't limited to that. Haskell, for example, allows you to dispatch based on the values that are passed, as well as their types. This is a tremendously powerful mechanism, though one that's not often implemented. (I have a nagging suspicion that's as much because it becomes close to impossible to mathematically prove a program's correctness with this scheme as the cost, and the folks doing language research tend to be of a theoretical bent. I may be wrong, though)
So you could certainly have something that looks like:
multi foo(int i < 5)  { print "Smaller than 5!"; }
multi foo(int i >= 5) { print "5 or bigger!"; }
in your program, if your language supports it.
The implementation issues are a bit of a pain, and while implementation issues shouldn't always get in the way, an explanation is generally warranted.
In a pure type-based MMD system, all we really need for each sub/method that can do MMD is an N-dimensional table, one dimension per parameter that participates in dispatch. Each type has a number assigned to it, so at dispatch time we grab the numbers, index into the table, and dispatch to the resulting method. It's quick, though in its naive implementation it can be pretty memory-hungry. (There are optimizations for that, of course) We can even take advantage of inheritance to find the best match if we don't have an exact match.
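As a concrete (if toy) sketch, with invented type numbers and functions, two-argument type-based dispatch is just a two-dimensional array lookup and an indirect call:

#include <stdio.h>

#define NTYPES 3
enum { T_INT = 0, T_FLOAT = 1, T_STRING = 2 };

typedef void (*AddFn)(void);

static void add_int_int(void)     { puts("integer add"); }
static void add_float_float(void) { puts("float add"); }
static void add_default(void)     { puts("generic add"); }

/* The two-dimensional dispatch table, filled in at startup. */
static AddFn add_table[NTYPES][NTYPES];

static void init_table(void) {
    for (int l = 0; l < NTYPES; l++)
        for (int r = 0; r < NTYPES; r++)
            add_table[l][r] = add_default;      /* fallback everywhere */
    add_table[T_INT][T_INT]     = add_int_int;  /* exact matches */
    add_table[T_FLOAT][T_FLOAT] = add_float_float;
}

int main(void) {
    init_table();
    /* Dispatch is just two array indexes and an indirect call. */
    add_table[T_INT][T_INT]();      /* integer add */
    add_table[T_STRING][T_FLOAT](); /* generic add  */
    return 0;
}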
In a value or expression based MMD system, it's not that easy. We need to evaluate the parameters to get their values, then run through all the expressions in the type signatures until we find one that matches. This can potentially take quite a while, relatively speaking. It also requires evaluating the parameters to the function at least once for the dispatch, which may cause an unusual and potentially unpredictable number of fetches of active data. It is, though, a much more interesting dispatch mechanism in a loosely typed language like Perl.
Now, in more concrete news, Perl 6 will support type-based MMD. That's no big deal. Any language that uses Parrot's native dispatching will also get the benefits of MMD, though whether you can actually declare a sub or method as MMD in that language is a separate question. (One whose answer is "No", I expect) I don't know if Perl 6 will support value-based dispatch, but it could. Parrot, however, will. Or, more to the point, Parrot will support arbitrary fallback lookup mechanisms. The default will be MMD type-based lookup, but we could, and probably will, provide a MMD value-based lookup that could alternately be swapped in. It's not any more trouble and, more importantly, has no particular overhead, as it's just one more potential vector off an inflection point (fallback dispatch) that we need to make indirect and overridable anyway, and we can add as many different ways as we like without additional overhead.
So... Even if Perl 6 doesn't do it, and Parrot doesn't provide it by default, you (or someone Altogether Too Clever) could add it in after the fact, try out some different syntaxes, and work something out for Perl 6.2. The nice thing about it is that, since MMD is already going to be in, it's not like value-based MMD will be adding in a new concept, it'll just be adding in a new variant of an existing concept, which is a much easier thing.
There's yet another variant of the sobig virus/worm/whatever thing running around. This one apparently replaces the Palyh thing (the one from "support@microsoft.com") that was running around. Palyh had a May 30 drop-dead date, so I can only assume the thing was proof-of-concept, designed either to show off someone's Studly Programming Skillz or as a study in spread characteristics in preparation for a nastier attack.
The proliferation of this damn thing isn't the big problem. The big problem is all the gateway filters that are in place that block the worm and incorrectly bounce back notice of an infected mail message. You'd think at this point that people would have figured out how to properly bounce, or at least not improperly bounce, these damn things, given how many worms and viruses have built-in SMTP capabilities with spoofed senders. (I can, after all, pretty much guarantee you that I'm not sending out infected mail) I've gotten more bouncemail for this thing than I have actual copies of the virus.
This recent spate of virus crap has prompted me to try and get some virus checking built into the mail server here, since I'm tired of wasting disk space on this crap, and I'd rather the other folks using the mail services here don't have to deal with it. (I think only one or two people are actually at risk, as everyone else is a mac or pine user, but....) I gave ClamAV a shot, to see if I could use it, as it seems like the only free game in town. And, while it seems to work pretty well as a virus checker, I haven't figured out how I can use it on my system.
It's not that there aren't instructions in there--there are. I can use it as a sendmail milter, or I can use it through amavis somehow. Unfortunately, the build of sendmail I'm running doesn't have milter support (and it's an 8.11 release of some vintage or other) and I can't for the life of me figure out how to make amavis-ng install and configure right. (The fact that the docs are all .texi files, and the version of makeinfo I have doesn't like them, doesn't help) Google wasn't much help locating instructions to use ClamAV with just procmail. Bleah.
I'm sure some of the problem is the vintage of the software on the server, and I'm getting more and more tempted to just rebuild it from scratch, but that really requires a second machine to build into. OTOH, this beast is getting pretty limiting for what it does, so a new box may well be in order, and I can relegate this thing (an old 300MHz no-cache Celeron) to being a slow Parrot tinderbox machine. (Though, given a few hundred spare bucks to buy a new server, well, I'd rather buy an iPod... :)