Sometimes I get in these weird moods, where I just want to try implementing something different. Kind of a nice way to learn new things and to keep myself sharp. Usually, these are little one-offs, like when I decided to implement a Wolfenstein 3D-style raycaster. This thread is dedicated to those one-offs.
One of my more recent one-offs has been a little expression parser that I've been writing in C#, currently targeting .NET 5.0 (originally .NET Core 3.1). I'm still derping around with it, but it combines a number of concepts and things together and is shaping up to be fairly nifty.
I did have a few objectives in mind when I approached this. In addition to implementing the parser, I wanted it to be able to access variables and API calls. Essentially, the variables would be a key-value pair that the expression engine could read from and write values to, and the API would be implemented by passing it a class with methods, properties, and fields. Properties and fields in the API class can be modified using an expression, and methods can be called. Since I don't want types to be a concern, some sort of dynamic typing support will be required.
Since typing is one of the core factors, I went ahead and resurrected a class I wrote a while ago to handle dynamic types, then modernized it to use the most current language features (pattern matching and switch expressions helped to declutter the code quite a bit). This class contains an internal object and dynamically converts it between various types depending on what it contains. When a value is created, it will attempt to coerce the type to one of the following, in order: long, decimal, double, string. The reason for this is to ensure that the number is stored with full precision (which long and decimal are far more adept at doing) before using an approximate type.
This class also has overloads for all of the assignment and comparison operators, which all handle converting the two sides to like types, performing the operation, and returning a new value type with the result. This also includes provisions for string comparisons and concatenations, and will throw an exception if two incompatible types are used (such as subtracting a string from a long). When all is said and done, stuff like this becomes possible:
var a = new Value("1234");
var b = new Value(0.567m);
var c = a + b; // c = 1234.567
Note that a was initialized using a string, while b was initialized using a decimal, yet the addition was still successful and returned the expected result (as opposed to "12340.567", which would have been the result if string "addition"—that is, concatenation—were used). We also put in a bunch of different implicit conversion overloads, allowing it to return the desired type and converting it if necessary/possible.
The next step is implementing the parser itself. This was done by tokenizing the string (that is, reading each symbol and assigning a type to it). For example, this would translate this:
_a := (4 + -5) * 2
…into this:
Identifier (_a)
Assignment
OpenParens
Value (4)
Addition
Subtraction
Value (5)
CloseParens
Multiply
Value (2)
The expression engine then takes that tokenized collection and evaluates it. It does this by breaking down the tokenized expression into a tree, taking operator precedence into account. It does this until it descends into leaf nodes (nodes with no children, such as values and identifiers), then starts working its way up, evaluating the binary and unary nodes until it arrives upon a final result. In the above example, the tree would look something like this:
root: NodeBinary(Assignment)
left: NodeLeaf(Identifier: _a)
right: NodeBinary(Multiplication)
left: NodeParenthesis(4 + -5)
value: NodeBinary(Addition)
left: NodeLeaf(Value: 4)
right: NodeUnary(Negation)
value: 5
right: NodeLeaf(Value: 2)
5 has a negation unary operation applied, giving us -5. Then, 4 is added to the resulting -5, giving us -1. That value is multiplied by 2, giving us -2. That value is then assigned to the identifier _a.
Since the value type allows us to handle all types more or less equally, it also lets you do some wonky stuff. The following concatenates -16 and 45, and 23 and 52 together as strings, then adds the result together:
(-16 & 45) + (23 & 52)
-1645 + 2353
707
Okay, great. So we have a working expression parser. Next step is to implement variables and API calls. Identifier nodes are going to handle both of them, with the mnemonic that simple variables are prefixed with an underscore and API calls handle…well, basically everything else. We also want everything to be case-insensitive, so we need to make sure to take that into account.
Variables are simple, so let's start with those. We're going to treat them as key/value pairs, so we can just use a Dictionary<string, Value> and that'll do the trick. We'll handle all of that in the API context handler. If we see an underscore prefix, we treat it as a variable, otherwise we treat it like an API call. Both both getting and setting variables it's a simple matter of transforming the key name to lowercase for case-insensitivity. For gets, we check to see if the key exists, returning null if it doesn't or the value if it does. For sets, we add a new entry to the Dictionary if the key doesn't exist and update the existing entry if it does.
As far as APIs are concerned, we want to write this in such a way that it'll work regardless of what the class looks like, so we'll need to use reflection. Since we support methods, properties, and fields, and all three of them are handled slightly differently, we need to be able to accommodate all three. First, we search for the member using a combination of reflection and a simple LINQ expression:
var child = apiObject.GetType().GetMembers().FirstOrDefault(e => string.Equals(e.Name, objectName, StringComparison.InvariantCultureIgnoreCase))
This gives us a case-insensitive match to the first object containing that name. Since we're using an "OrDefault" query, we do a quick null check to make sure that it exists, throwing an exception if it doesn't. We can then check the MemberType property of the returned value. Since we're only concerned with fields, methods, and properties, we search for those.
Fields and properties are simple. All we have to do is cast those to FieldInfo or PropertyInfo, respectively, and call GetValue(). Return that to the parser and there we go. Methods are a bit more complicated, especially if you want to take optional parameters and polymorphism into account. We don't really care about that right now, so all we do is take whatever parameters the expression parser picked up (basically, by setting up a little rule where values in parens following identifiers are treated as parameters), convert it into an object array, cast the MemberInfo class to MethodInfo, then return the results of the Invoke method.
There is one little problem, however. We want to be able to write API functions like a normal .NET function, without necessarily having to use the Value types for parameters and the return value. The problem with this is that .NET's default binding wasn't really intended to handle something like the Value type. Since Values can be converted seamlessly to just about any type, we need to write a custom binder to ensure that conversion will work smoothly.
We do this by creating a ValueBinder class, which extends the .NET Binder class. The only method that we care about overriding is the ChangeType() method. All we have to do here is check the requested type and explicitly point it to the appropriate Value conversion method. The following works nicely:
public override object ChangeType(object value, Type type, CultureInfo culture) {
var val = value as Value;
if(type == typeof(Value)) // No conversion necessary.
return value;
if(val is null) return null;
return Type.GetTypeCode(type) switch {
TypeCode.Boolean => val.ToBool(),
TypeCode.Decimal => val.ToDecimal(),
TypeCode.Double => val.ToDouble(),
TypeCode.Int32 => val.ToInt32(),
TypeCode.Int64 => val.ToInt64(),
TypeCode.Single => val.ToSingle(),
TypeCode.String => val.ToString(),
_ => val.ToObject()
};
}
Now, all we need to do is pass an instance of ValueBinder to the Invoke call using the default binding flags, and now it'll work far more reliably with functions that use built-in types.
And, with that, everything is pretty much done. My test fixtures gave me a whole bunch of green checkmarks, so I decided to give it a quick stress test, if you will: the quadratic formula. Nothing too crazy, sure, but it's a pretty decent all around test.
First of all, we'll need a square root function:
private class MathApi {
public static Value sqrt(Value x) => Math.Sqrt(x);
}
Next, we write a quick test fixture:
[Test, Repeat(100)]
public void QuadraticFormula() {
var vars = new Dictionary<string, Value>();
var api = new MathApi();
var ctx = new ApiContext(vars, api);
var a = _rng.Next(1, 4);
var b = _rng.Next(20, 40);
var c = _rng.Next(1, 4);
vars.Add("a", a);
vars.Add("b", b);
vars.Add("c", c);
const string expr1 = "(-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)";
const string expr2 = "(-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)";
var expectedX1 = new Value((-b + Math.Sqrt(Math.Pow(b, 2) - (4 * a * c))) / (2 * a));
var expectedX2 = new Value((-b - Math.Sqrt(Math.Pow(b, 2) - (4 * a * c))) / (2 * a));
var actualX1 = new Parser(expr1).Eval(ctx);
var actualX2 = new Parser(expr2).Eval(ctx);
var message = $"a = {a}, b = {b}, c = {c}\n"
+ $"X1 => {expr1} = {actualX1} "
+ (expectedX1 == actualX1 ? "" : "(!!!)") + "\n"
+ $"X2 => {expr2} = {actualX2}"
+ (expectedX2 == actualX2 ? "" : "(!!!)") + "\n";
Assert.AreEqual(expectedX1, actualX1, message);
Assert.AreEqual(expectedX2, actualX2, message);
Console.WriteLine(message);
}
And then we run it (output trimmed, since 100 iterations is a bit much):
a = 2, b = 37, c = 2
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.05421292112523979
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -18.44578707887476
a = 1, b = 24, c = 2
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.0836247121870155
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -23.916375287812983
a = 2, b = 33, c = 3
X1 => (-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -0.09141556395963946
X2 => (-_b - sqrt(_b^2 - (4 * _a * _c))) / (2 * _a) = -16.40858443604036
Ding ding!
Now, for a quick bonus. Let's go ahead and work out the expression tree for one of those examples:
(-_b + sqrt(_b^2 - (4 * _a * _c))) / (2 * _a)
root: NodeBinary (Division)
left: NodeParens(-_b + sqrt(_b^2 - (4 * _a * _c)))
child: NodeBinary (Addition)
left: NodeUnary (Negation)
value: NodeLeaf (Identifier: _b)
right: NodeLeaf (Identifier: sqrt)
params: NodeParens (_b^2 - (4 * _a * _c))
child: NodeBinary (Subtraction)
left: NodeBinary (Power)
left: NodeLeaf (Identifier: _b)
right: NodeLeaf (Value: 2)
right: NodeParens (4 * _a * _c)
child: NodeBinary (Multiply)
left: NodeBinary (Multiply)
left: NodeLeaf (Value: 4)
right: NodeLeaf (Identifier: _a)
right: NodeLeaf (Identifier: _c)
right: NodeParens(2 * _a)
child: NodeBinary(Multiply)
left: NodeLeaf (Value: 2)
right: NodeLeaf (Identifier: _a)
And there you have it! Wasn't that fun?
