Introduction to KAPPA

Mon 24 July 2023

Why?

I'm not very sane for doing this, and the severity of the undertaking is much greater than when I wrote my software renderer. Writing a good software renderer isn't easy, but compared to writing a compiler, it really becomes light work. Oh yeah, I haven't answered the question. I'm making it because I wanted to mess with user scripting in C, but got too stubborn to steal someone else's scripting system, so I started writing my own, named KAPPA. The idea is that someday, my JRPG I plan to make will use these scripts to define weapon behavior, for example.

I wasn't really inspired by anything to do this, I just sat down one day and thought: "Wouldn't it be awesome to see how a programming language implementation can develop over time?" So I proceeded to write a tokenizer the very next day. I can actually turn to my savior Bisqwit for help, because he made a series on compilers. His diagram of defining a programming lanaguage's grammar was very helpful.

The progress

So, I don't really have any media of the progress, since I didn't have anyone nerdy enough to boast to about it, but the commit history of the repository makes it pretty clear. Starting in March, during my college semester, I sat down and wrote the first lines of code, creating the tokenizer. I had originally planned to make a timelapse of me coding it and uploading it to YouTube, but I ultimately scrapped it.

Code Parameters

Creating the tokenizer was very straightforward, we simply skip whitespace, identify which type of token it is since we have a list of tokens, then mark it down with some info. Some tokens consist of several characters, while some are single characters, so I noted those down too. Once I had a token stream, I did some analysis on it to remove comments, and recognize constants among other things.

Lexical Analysis

I don't actually know if this is what lexical analysis does, but in my case, this is what I did. To be fair, I know nothing about compilers, so a lot of what you will see is not really modern practice.

Assembly

I chose to have KAPPA be compiled into executable memory, as well as be a hard-typed language. Yes, it is compiled to executable memory, and I am aware I have opened a portal to the hacker dimension in any program that uses my system. My thinking is that none of these scripts will be networked, so the chance of getting rekt by a malicious script is the same as having an executable require a library that gets modified in a malicious way.

As I started writing the compiler and assembler, I think it actually ended up being easier than writing an interpreter. Initially I thought writing an interpreter would be easier, but I've reverse engineered enough that I feel more confident in my x86 assembly than runtime environments. Plus, the day I make my own instruction set, I can use this compiler to assemble to its respective bytecode.

Indeed, the whole assembler and compiler is still a large undertaking. Just getting a simple function to return the nth Fibonacci number took quite a while. It required some basic assembly, but getting started took some time as I haven't written something like this before. It required a function prelude, fastcall-like calling conventions, moves between registers, integer addition, and immediate value moves into registers. Oh yeah, and comparisons. But once those few assembler lines are implemented, you can calculate the Fibonacci sequence!

Structuce

Oh yeah, I haven't touched on the grammar of the language. Here's how it goes:

Global

Just like in C, you can declare a global variable or function. A global variable is stored in the data segment, and can have a constant expression initialize it. If the identifier after the declarator is not followed by a new expression, then it's an initializer, otherwise, it's a function. Then the expression contains a comma-separated list of parameters, which immediately get assembled into a fastcall convention, and space on the stack is allocated.

Statement

Here, you can declare a new variable, and even initialize it with a non-constant value! In which case an assignment operator after a declaration can parse an expression after. Otherwise, you can parse a keyword, such as "if" or "while", followed by an expression, followed by a new statement.

Expression

This one I fiddled around with for a while. Initially, I had recursive descent, where the right hand side of an operator would be another parsed expression, but then I wanted to implement order of operations, so expressions instead became this: Chain together operators, marking the left hand side, and right hand side. Sweep through, doing multiplications first, then additions, then logicals, then finally assignments. The recursive descent still appears in any nested expressions. This allows for order of operations, but it's a little messy, and has an ugly special case where if there are no operators, it immediately compiles the following token. Speaking of which, the elements between operators are single values, so we'll just assemble those as register moves, or immediate value moves, then chain together some push's and pop's to store calculations.

The great re-typing

When I first started writing the compiler, everything was treated as an unsigned 8-byte integer. As soon as I looked ahead to try to do floating-point arithmetic, I properly implemented types, and changed the assembler to accept different sized integers.

Floats

The most recent change is probably the most significant, and tough to implement. I knew that floats had to be treated with care in x86 assembly, but I was not ready for the amount of bogus I had to deal with. You can't push or pop floats, or move immediate values into them, or use jl, je type instructions with them. I essentially had to double the lines of code in the assembler to completely change how these are dealt with. In the end though, I almost got a good sine approximation using KAPPA, and for now, I feel pretty good about that. I'll iron out the random discontinuity another day.

Sine Reconstruction KAPPA Sine

Conclusion

I plan to fix a lot of the special case issues that have been created during work on KAPPA, as well as implement it into my own projects to see what it can do. As for the future of KAPPA itself, I want to add structure types next, and arrays. Who knows, it may be turing-complete after that.

Curiosity Didn't Kill The Coder