REPLs and Assemblers with OCD

Time to make a mess!

First, Let’s get our vocabulary straight. For those unfamiliar with the term REPL, it is short for a “Read-Eval-Print Loop“. It is basically a great way to poke at a language with a stick and get a taste for things you can do with it, and/or a great debugging aid, etc. Many modern languages offer a REPL (online even!), and I hadn’t realized until recently just how bad Eureka was going to be at it. But why?

Let’s take that REPL piece by piece and see where we fall short. (We can assume in advance that I can write the Loop part.)

The Read step is basically a solved problem, and has been for some time. As my earlier link to the “online REPLs” shows, you can do this any number of ways; easy peasy.

Now to the Eval. The heart of Eureka’s interpreter is wrapped in a cute candy coating known as ekContextEval(), which takes arbitrary Eureka code in a string and compiles / assembles / interprets it in one swoop. It returns false if the eval fails (any compile or runtime errors), and you can ask the ekContext for a list of text errors. I think E is covered.

Now we just need to Print, and we’re finally catching up to Every Other Language On The Planet. But what to print? My poor Eval function loves to tell me all of the ways I’ve made a mistake, but it never gives me anything interesting in the form of an actual value. Perhaps we need to find out the kind of data Eval should be returning first, and then work on why we don’t have it.

Let’s try another REPL so we know what kind of output we want. Using a Ruby REPL as an example, I did this (input and output alternate lines):

>  5
=> 5
>  [5,6,7]
=> [5, 6, 7]
>  a = 5
=> 5
>  a
=> 5

The flow here is pretty obvious. I give an expression (“5″), it compiles and runs the expression and spits out what its value is (=> 5). This is where Eureka breaks down; I don’t have that value anywhere!

It turns out Eureka’s assembler is a neat freak. Since Eureka’s interpreter relies almost entirely on two stacks (frames and values), the maintenance of those stacks is critical. If the interpreter allows things to linger in the value stack, it’ll just be a huge mess, and nobody wants a huge mess. The assembler carefully ensures that everything that is placed on the stack eventually has a permanent home, even if that home is in the garbage can (the free list).

Here is the assembly dump for Eureka when you give it the expression “5″:

.kint   0    5

.main
         pki  0
         pop  1
         ret  0

Simple stuff. Beforehand, it defines constant int index 0 with a value of 5. Then, the code begins by pushing constant integer index 0 (the value 5) onto the stack. Everything looking good so far! But wait, the next instruction is to pop a single value from the stack! Where did it come from? But what about our precious eval result?!

The pop instruction comes from the innocuous sounding function asmPad(), deep in the guts of Eureka’s assembler. Its job is to keep track of the flow of outputs of expressions (the “offer”) and the inputs of other expression (the “keeps”). When an rvalue expression doesn’t “offer” as much as the associated lvalue wants to “keep”, asmPad() saves the day and pads the stack with null values. This ensures that the ops are always self consistent, even if someone botches the arity of a function call or only requests two return values from a function that isn’t returning anything. However, it also protects the other direction as well. If the rvalue “offered” is more than the lvalue wanted to “keep”, asmPad() happily injects pop instructions to politely discard the rvalue’s hard work. This is a well oiled machine, except when you’d like to show off that discarded value to someone struggling with a Eureka REPL.

So, how to fix it? I tried a handful of terrible things in the guts of asmPad() and always came up short. There is no sense in punishing asmPad() for cleaning up after the messes of mismatched instructions; it is only doing the one job it has ever had! I tried injecting special versions of “ret” instead, to clue in the interpreter of when I wanted the data, etc. All of these attempts ended in failure. I finally realized that the answer was right under my nose: the pop instruction.

Most of the time, things popped by the pop instruction aren’t very interesting. However, when working in a REPL, all of the juiciest pieces of data (the results!) are either popped at the end of execution, or were recently stored in a variable (the vset instruction). All I need to do is pay attention to the most recent values affected by pop and vset, and I should have my REPL.

So I did just that; I added an optional argument to ekContextEval() which is an empty array. If it is present during evaluation, I simply clear its contents before every pop or vset, and add references to any values involved into the array, adding a ref count to them in the process. Since Eureka’s interpreter is refcounted, this is actually pretty cheap. The result array stabilizes at a small-ish size pretty quickly (especially if reused by the REPL code), and simply stores off pointers to ekValues in the same way the interpreter does. When ekContextEval() returns, whatever is in that array happens to be the eval result. Here is my local Eureka REPL running very similar inputs to the Ruby one earlier:

[chunks: 0] [globals: 0] $ 5
=> 5
[chunks: 0] [globals: 0] $ [5,6,7]
=> [ 5, 6, 7 ]
[chunks: 0] [globals: 0] $ var a = 5
=> 5
[chunks: 0] [globals: 1] $ a
=> 5

Success! I ended up with the REPL I was looking for, and a newfound appreciation for my neat freak assembler code. That said, it is good to get your hands dirty once in a while.

Posted in Blog