Author Topic: Spectere's Random Programming Bullshit  (Read 2541 times)

Spectere

  • \m/ (-_-) \m/
  • Administrator
  • Hero Member
  • *****
  • Posts: 5716
  • printf("%s\n", "Hi!");
    • View Profile
    • spectere.net
Spectere's Random Programming Bullshit
« on: December 03, 2020, 06:16:59 PM »
Sometimes I get in these weird moods, where I just want to try implementing something different. Kind of a nice way to learn new things and to keep myself sharp. Usually, these are little one-offs, like when I decided to implement a Wolfenstein 3D-style raycaster. This thread is dedicated to those one-offs.

One of my more recent one-offs has been a little expression parser that I've been writing in C#, currently targeting .NET 5.0 (originally .NET Core 3.1). I'm still derping around with it, but it combines a number of concepts and things together and is shaping up to be fairly nifty.

I did have a few objectives in mind when I approached this. In addition to implementing the parser, I wanted it to be able to access variables and API calls. Essentially, the variables would be key-value pairs that the expression engine could read values from and write values to, and the API would be implemented by passing the engine a class with methods, properties, and fields. Properties and fields in the API class can be modified using an expression, and methods can be called. Since I don't want types to be a concern, some sort of dynamic typing support will be required.

Since typing is one of the core factors, I went ahead and resurrected a class I wrote a while ago to handle dynamic types, then modernized it to use the most current language features (pattern matching and switch expressions helped to declutter the code quite a bit). This class contains an internal object and dynamically converts it between various types depending on what it contains. When a value is created, it will attempt to coerce the type to one of the following, in order: long, decimal, double, string. The reason for this is to ensure that the number is stored with full precision (which long and decimal are far more adept at doing) before using an approximate type.
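As a rough illustration, the coercion chain might look something like this (a minimal sketch; the class shape and names here are hypothetical, not the actual implementation):

Code: [Select]
```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the coercion order described above: try long,
// then decimal, then double, and fall back to string if nothing fits.
public class Value {
    public object Raw { get; }

    public Value(string s) => Raw = Coerce(s);
    public Value(long l) => Raw = l;
    public Value(decimal m) => Raw = m;
    public Value(double d) => Raw = d;

    private static object Coerce(string s) {
        if(long.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out var l))
            return l;
        if(decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out var m))
            return m;
        if(double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out var d))
            return d;
        return s;  // Not a number; keep it as a string.
    }
}
```

The ordering matters: long and decimal are exact, so they get first crack at the input, and double only catches values the exact types can't represent (such as numbers in scientific notation or outside decimal's range).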

This class also has overloads for the arithmetic and comparison operators, each of which converts the two sides to like types, performs the operation, and returns a new Value with the result. This includes provisions for string comparisons and concatenation, and an exception is thrown if two incompatible types are used (such as subtracting a string from a long). When all is said and done, stuff like this becomes possible:

Code: [Select]
var a = new Value("1234");
var b = new Value(0.567m);
var c = a + b;  // c = 1234.567

Note that a was initialized using a string, while b was initialized using a decimal, yet the addition was still successful and returned the expected result (as opposed to "12340.567", which would have been the result if string "addition"—that is, concatenation—had been used). We also put in a bunch of implicit conversion overloads, allowing a Value to be consumed as the desired type, with the conversion applied when necessary and possible.
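A compressed, self-contained sketch of how that operator behavior could be wired up (a toy version that promotes every numeric operand to decimal, which simplifies what the real class does):

Code: [Select]
```csharp
using System;
using System.Globalization;

// Hypothetical sketch of the mixed-type addition described above: each
// side stores an object, numeric-looking strings are coerced on
// construction, and operator+ adds decimals or concatenates strings.
public class Value {
    private readonly object _raw;

    public Value(object raw) {
        _raw = raw is string s && decimal.TryParse(s, NumberStyles.Number,
            CultureInfo.InvariantCulture, out var m) ? m : raw;
    }

    public static Value operator +(Value a, Value b) {
        if(a._raw is decimal x && b._raw is decimal y)
            return new Value(x + y);
        if(a._raw is string || b._raw is string)  // string "addition" = concatenation
            return new Value($"{a._raw}{b._raw}");
        throw new InvalidOperationException("Incompatible operand types.");
    }

    // An implicit conversion lets callers consume a Value as a built-in type.
    public static implicit operator decimal(Value v) =>
        Convert.ToDecimal(v._raw, CultureInfo.InvariantCulture);

    public override string ToString() =>
        Convert.ToString(_raw, CultureInfo.InvariantCulture);
}
```

With this, new Value("1234") + new Value(0.567m) comes out to 1234.567, while two non-numeric strings fall through to the concatenation branch.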

The next step is implementing the parser itself. This starts by tokenizing the string (that is, reading each symbol and assigning a type to it). For example, the tokenizer would translate this:

Code: [Select]
_a := (4 + -5) * 2
…into this:

Code: [Select]
Identifier (_a)
Assignment
OpenParens
Value (4)
Addition
Subtraction
Value (5)
CloseParens
Multiply
Value (2)
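A tokenizer along those lines can be sketched as a single pass over the string (hypothetical code; the token names follow the list above, but this is not the actual implementation):

Code: [Select]
```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical single-pass tokenizer: walk the input, emitting one
// (type, text) pair per symbol.
public static class Tokenizer {
    public static List<(string Type, string Text)> Tokenize(string expr) {
        var tokens = new List<(string, string)>();
        var i = 0;
        while(i < expr.Length) {
            var c = expr[i];
            if(char.IsWhiteSpace(c)) { i++; continue; }
            if(char.IsDigit(c)) {  // Numeric literal.
                var sb = new StringBuilder();
                while(i < expr.Length && (char.IsDigit(expr[i]) || expr[i] == '.'))
                    sb.Append(expr[i++]);
                tokens.Add(("Value", sb.ToString()));
                continue;
            }
            if(char.IsLetter(c) || c == '_') {  // Identifier.
                var sb = new StringBuilder();
                while(i < expr.Length && (char.IsLetterOrDigit(expr[i]) || expr[i] == '_'))
                    sb.Append(expr[i++]);
                tokens.Add(("Identifier", sb.ToString()));
                continue;
            }
            if(c == ':' && i + 1 < expr.Length && expr[i + 1] == '=') {
                tokens.Add(("Assignment", ":=")); i += 2; continue;
            }
            tokens.Add((c switch {
                '(' => "OpenParens", ')' => "CloseParens",
                '+' => "Addition", '-' => "Subtraction", '*' => "Multiply",
                _ => throw new InvalidOperationException($"Unexpected character '{c}'")
            }, c.ToString()));
            i++;
        }
        return tokens;
    }
}
```

Note that - is always emitted as Subtraction here; deciding whether it means binary subtraction or unary negation is deferred to the tree-building stage, which matches the token list above.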

The expression engine then takes that tokenized collection and evaluates it. It breaks the tokenized expression down into a tree, taking operator precedence into account, descending until it reaches leaf nodes (nodes with no children, such as values and identifiers). From there it works its way back up, evaluating the unary and binary nodes until it arrives at a final result. In the above example, the tree would look something like this:

Code: [Select]
root: NodeBinary(Assignment)
    left: NodeLeaf(Identifier: _a)
    right: NodeBinary(Multiplication)
        left: NodeParenthesis(4 + -5)
            value: NodeBinary(Addition)
                left: NodeLeaf(Value: 4)
                right: NodeUnary(Negation)
                    value: 5
        right: NodeLeaf(Value: 2)

First, the negation unary operation is applied to 5, giving us -5. Then, 4 is added to that -5, giving us -1. That value is multiplied by 2, giving us -2, which is finally assigned to the identifier _a.
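That bottom-up walk is just recursion: each node evaluates its children before applying its own operation. A stripped-down sketch (hypothetical node classes, doubles only, no identifiers or assignment):

Code: [Select]
```csharp
using System;

// Hypothetical sketch of bottom-up evaluation: every node evaluates its
// children first, then applies its own operation.
public abstract class Node { public abstract double Eval(); }

public class NodeLeaf : Node {
    private readonly double _value;
    public NodeLeaf(double value) => _value = value;
    public override double Eval() => _value;
}

public class NodeUnary : Node {
    private readonly Node _child;
    public NodeUnary(Node child) => _child = child;
    public override double Eval() => -_child.Eval();  // Negation.
}

public class NodeBinary : Node {
    private readonly char _op;
    private readonly Node _left, _right;
    public NodeBinary(char op, Node left, Node right) {
        _op = op; _left = left; _right = right;
    }
    public override double Eval() => _op switch {
        '+' => _left.Eval() + _right.Eval(),
        '*' => _left.Eval() * _right.Eval(),
        _ => throw new InvalidOperationException($"Unknown operator '{_op}'")
    };
}
```

Building the tree for (4 + -5) * 2 out of these nodes and calling Eval() on the root walks the whole structure and yields -2.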

Since the value type allows us to handle all types more or less equally, it also lets you do some wonky stuff. The following concatenates -16 and 45, then 23 and 52, together as strings, then adds the results together:

Code: [Select]
(-16 & 45) + (23 & 52)
-1645 + 2352
707

Okay, great. So we have a working expression parser. The next step is to implement variables and API calls. Identifier nodes are going to handle both of them, with the convention that simple variables are prefixed with an underscore while API calls handle…well, basically everything else. We also want everything to be case-insensitive, so we need to make sure to take that into account.

Variables are simple, so let's start with those. We're going to treat them as key/value pairs, so a Dictionary<string, Value> will do the trick. We'll handle all of that in the API context handler: if we see an underscore prefix, we treat the identifier as a variable; otherwise, we treat it as an API call. For both getting and setting variables, it's a simple matter of transforming the key name to lowercase for case-insensitivity. For gets, we check whether the key exists, returning the value if it does and null if it doesn't. For sets, we add a new entry to the Dictionary if the key doesn't exist and update the existing entry if it does.
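In sketch form (hypothetical class; plain object stands in for the Value type here):

Code: [Select]
```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the variable store: keys are lowercased for
// case-insensitivity; gets return null for unknown names, sets upsert.
public class VariableStore {
    private readonly Dictionary<string, object> _vars = new();

    public object Get(string name) =>
        _vars.TryGetValue(name.ToLowerInvariant(), out var v) ? v : null;

    public void Set(string name, object value) =>
        _vars[name.ToLowerInvariant()] = value;  // Adds or updates.
}
```

An alternative design is to construct the Dictionary with StringComparer.OrdinalIgnoreCase and skip the lowercasing entirely; both approaches get you case-insensitive keys.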

As far as APIs are concerned, we want to write this in such a way that it'll work regardless of what the class looks like, so we'll need to use reflection. Since we support methods, properties, and fields, and each of the three is handled slightly differently, we need to accommodate all of them. First, we search for the member using a combination of reflection and a simple LINQ expression:

Code: [Select]
var child = apiObject.GetType().GetMembers()
    .FirstOrDefault(e => string.Equals(e.Name, objectName, StringComparison.InvariantCultureIgnoreCase));
This gives us a case-insensitive match on the first member with that name. Since we're using an "OrDefault" query, we do a quick null check to make sure the member exists, throwing an exception if it doesn't. We can then check the MemberType property of the returned value. Since we're only concerned with fields, methods, and properties, we search for those.

Fields and properties are simple. All we have to do is cast those to FieldInfo or PropertyInfo, respectively, and call GetValue(). Return that to the parser and there we go. Methods are a bit more complicated, especially if you want to take optional parameters and polymorphism into account. We don't really care about that right now, so all we do is take whatever parameters the expression parser picked up (via a little rule where values in parens following an identifier are treated as parameters), convert them into an object array, cast the MemberInfo to MethodInfo, and return the result of the Invoke method.
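Pulling those pieces together, the dispatch can be sketched roughly like this (hypothetical helper and demo API class; the real code lives in the API context handler):

Code: [Select]
```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper: find the member case-insensitively, then branch
// on its MemberType.
public static class ApiDispatch {
    public static object GetMember(object apiObject, string objectName, object[] args) {
        var child = apiObject.GetType().GetMembers()
            .FirstOrDefault(e => string.Equals(e.Name, objectName,
                StringComparison.InvariantCultureIgnoreCase));

        if(child is null)
            throw new MissingMemberException(apiObject.GetType().Name, objectName);

        return child.MemberType switch {
            MemberTypes.Field => ((FieldInfo)child).GetValue(apiObject),
            MemberTypes.Property => ((PropertyInfo)child).GetValue(apiObject),
            MemberTypes.Method => ((MethodInfo)child).Invoke(apiObject, args),
            _ => throw new InvalidOperationException($"Unsupported member type: {child.MemberType}")
        };
    }
}

// Example API class for exercising all three member kinds.
public class DemoApi {
    public int Answer = 42;
    public string Name => "demo";
    public int Twice(int x) => x * 2;
}
```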

There is one little problem, however. We want to be able to write API functions like a normal .NET function, without necessarily having to use the Value types for parameters and the return value. The problem with this is that .NET's default binding wasn't really intended to handle something like the Value type. Since Values can be converted seamlessly to just about any type, we need to write a custom binder to ensure that conversion will work smoothly.

We do this by creating a ValueBinder class, which extends the .NET Binder class. The only method that we care about overriding is the ChangeType() method. All we have to do here is check the requested type and explicitly point it to the appropriate Value conversion method. The following works nicely:

Code: [Select]
public override object ChangeType(object value, Type type, CultureInfo culture) {
    var val = value as Value;
    if(type == typeof(Value))  // No conversion necessary.
        return value;

    if(val is null) return null;

    return Type.GetTypeCode(type) switch {
        TypeCode.Boolean => val.ToBool(),
        TypeCode.Decimal => val.ToDecimal(),
        TypeCode.Double => val.ToDouble(),
        TypeCode.Int32 => val.ToInt32(),
        TypeCode.Int64 => val.ToInt64(),
        TypeCode.Single => val.ToSingle(),
        TypeCode.String => val.ToString(),
        _ => val.ToObject()
    };
}

Now, all we need to do is pass an instance of ValueBinder to the Invoke call using the default binding flags, and it'll work far more reliably with functions that use built-in types.
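One detail worth noting: Binder is abstract, so a subclass must implement all of its members, not just ChangeType(). A workable pattern is to defer everything else to Type.DefaultBinder. A self-contained sketch (with a stand-in Value that only wraps a double; the real class handles far more):

Code: [Select]
```csharp
using System;
using System.Globalization;
using System.Reflection;

// Stand-in Value for this sketch.
public class Value {
    public double Raw { get; }
    public Value(double raw) => Raw = raw;
    public double ToDouble() => Raw;
}

// Only ChangeType() does anything interesting; the remaining abstract
// members simply defer to the default binder.
public class ValueBinder : Binder {
    private static readonly Binder Default = Type.DefaultBinder;

    public override object ChangeType(object value, Type type, CultureInfo culture) =>
        value is Value v && type == typeof(double) ? v.ToDouble() : value;

    public override FieldInfo BindToField(BindingFlags bindingAttr, FieldInfo[] match,
        object value, CultureInfo culture) =>
        Default.BindToField(bindingAttr, match, value, culture);

    public override MethodBase BindToMethod(BindingFlags bindingAttr, MethodBase[] match,
        ref object[] args, ParameterModifier[] modifiers, CultureInfo culture,
        string[] names, out object state) =>
        Default.BindToMethod(bindingAttr, match, ref args, modifiers, culture, names, out state);

    public override void ReorderArgumentArray(ref object[] args, object state) =>
        Default.ReorderArgumentArray(ref args, state);

    public override MethodBase SelectMethod(BindingFlags bindingAttr, MethodBase[] match,
        Type[] types, ParameterModifier[] modifiers) =>
        Default.SelectMethod(bindingAttr, match, types, modifiers);

    public override PropertyInfo SelectProperty(BindingFlags bindingAttr, PropertyInfo[] match,
        Type returnType, Type[] indexes, ParameterModifier[] modifiers) =>
        Default.SelectProperty(bindingAttr, match, returnType, indexes, modifiers);
}

// Example target function that uses a built-in parameter type.
public static class MathDemo {
    public static double Half(double x) => x / 2;
}
```

With that in place, calling Invoke with the binder routes mismatched arguments (like a Value passed where a double is expected) through ChangeType().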

And, with that, everything is pretty much done. My test fixtures gave me a whole bunch of green checkmarks, so I decided to give it a quick stress test, if you will: the quadratic formula. Nothing too crazy, sure, but it's a pretty decent all around test.

First of all, we'll need a square root function:

Code: [Select]
private class MathApi {
    public static Value sqrt(Value x) => Math.Sqrt(x);
}

Next, we write a quick test fixture:

Code: [Select]
[Test, Repeat(100)]
public void QuadraticFormula() {
    var vars = new Dictionary<string, Value>();
    var api = new MathApi();
    var ctx = new ApiContext(vars, api);

    var a = _rng.Next(1, 4);
    var b = _rng.Next(20, 40);
    var c = _rng.Next(1, 4);

    vars.Add("a", a);
    vars.Add("b", b);
    vars.Add("c", c);

    const string expr1 = "(-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)";
    const string expr2 = "(-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)";

    var expectedX1 = new Value((-b + Math.Sqrt(Math.Pow(b, 2) - (4 * a * c))) / (2 * a));
    var expectedX2 = new Value((-b - Math.Sqrt(Math.Pow(b, 2) - (4 * a * c))) / (2 * a));

    var actualX1 = new Parser(expr1).Eval(ctx);
    var actualX2 = new Parser(expr2).Eval(ctx);

    var message = $"a = {a}, b = {b}, c = {c}\n"
                + $"X1 => {expr1} = {actualX1} "
                + (expectedX1 == actualX1 ? "" : "(!!!)") + "\n"
                + $"X2 => {expr2} = {actualX2} "
                + (expectedX2 == actualX2 ? "" : "(!!!)") + "\n";

    Assert.AreEqual(expectedX1, actualX1, message);
    Assert.AreEqual(expectedX2, actualX2, message);
    Console.WriteLine(message);
}

And then we run it (output trimmed, since 100 iterations is a bit much):

Code: [Select]
a = 2, b = 37, c = 2
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.05421292112523979
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -18.44578707887476

a = 1, b = 24, c = 2
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.0836247121870155
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -23.916375287812983

a = 2, b = 33, c = 3
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.09141556395963946
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -16.40858443604036

Ding ding!

Now, for a quick bonus. Let's go ahead and work out the expression tree for one of those examples:

Code: [Select]
(-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)

root: NodeBinary (Division)
    left: NodeParens(-_b + sqrt(_b^2 - (4 * _a * _c)))
        child: NodeBinary (Addition)
            left: NodeUnary (Negation)
                value: NodeLeaf (Identifier: _b)
            right: NodeLeaf (Identifier: sqrt)
                params: NodeParens (_b^2 - (4 * _a * _c))
                    child: NodeBinary (Subtraction)
                        left: NodeBinary (Power)
                            left: NodeLeaf (Identifier: _b)
                            right: NodeLeaf (Value: 2)
                        right: NodeParens (4 * _a * _c)
                            child: NodeBinary (Multiply)
                                left: NodeBinary (Multiply)
                                    left: NodeLeaf (Value: 4)
                                    right: NodeLeaf (Identifier: _a)
                                right: NodeLeaf (Identifier: _c)
    right: NodeParens(2 * _a)
        child: NodeBinary(Multiply)
            left: NodeLeaf (Value: 2)
            right: NodeLeaf (Identifier: _a)

And there you have it! Wasn't that fun? :D
"This is a machine for making cows."

Bobbias

  • #1 Poster
  • Hero Member
  • *****
  • Posts: 7210
  • 404 Avatar not found.
    • View Profile
    • Magnetic Architect
Re: Spectere's Random Programming Bullshit
« Reply #1 on: December 07, 2020, 10:23:28 AM »
Ok, this is pretty damn neat. The closest thing I've worked on would be attempting to figure out how to do static analysis on self-modifying code... I did not succeed in that endeavor. I couldn't figure out exactly how to approach it. More precisely, what I was trying to do was analyze intcode programs from last year's AoC. Unfortunately, I got caught up in trying to understand how you even structure an approach to parsing a language where everything is integers (of arbitrary size) that may represent data, code, or both, and which makes heavy use of self-modification. I was trying to do it in Racket, since Racket provides a lot of functionality for creating DSLs. But if I wanted Racket to treat it as a language in the editor itself, then I'd need a way to go from intcode to a simplified representation without the self-modification. The only way I could think to do that would be either dynamic analysis or hardcore data-flow analysis, which I have yet to grok.
This is going in my sig. :)

BANNED FOR BAD PUNS X_x

Spectere

Re: Spectere's Random Programming Bullshit
« Reply #2 on: December 10, 2020, 01:32:32 AM »
Sometimes the hardest part about approaching a problem like that is knowing exactly what to search for, and sometimes not knowing what the most effective solution is. It's all too easy to search for a solution only to find out much later that there are far easier ways of accomplishing a given task.

That's one of the pitfalls I typically find myself jumping into if I opt to use a certain project to learn a language. On one hand, it's a great way to learn by doing, and pursuing a goal is far more likely to lead to viable results than just writing a bunch of test programs. On the other hand, there were a lot of little things that I wound up reworking in Plip (that GameBoy emulator) because I ended up approaching a number of problems from the perspective of a C programmer, not a C++ programmer. While the two are largely compatible, C++ adds significantly more safety and convenience.

With the expression parser, I just went ahead and implemented that in C#. I live and breathe that language so I was able to just focus on the implementation and not have to worry about language details. My test suite is admittedly a bit less than ideal, but it does the job. :P
"This is a machine for making cows."