Cook Computing

Functional Style Regex Engine in C#

January 4, 2007 Written by Charles Cook

Wesner Moise's recent post Hard Problems, Simple Solutions speculates about implementing regular expression matching using a functional style. He points to a regular expression engine in 14 lines of Python and suggests this could be ported to C# using iterators and anonymous functions.

Wesner writes that the regular expression would be composed functionally through a string:


re = Sequence( term1, Plus( term2, Alternate( term3, term4 )), term 5)

and the results would be returned as a collection by applying the regular expression to the list to be searched:


results = re( list_to_search )

I thought I would attempt a port of the Python code to learn a bit more about iterators and anonymous functions and came up with the class Rgx below (non-generic for strings only to make the code easier to read). This allows the sample from the Python code to be written like this:


// c(a|d)+r

Rgx.Expr e = 
  Rgx.Seq(Rgx.Char('c'), 
    Rgx.Seq(Rgx.Plus(Rgx.Alt(Rgx.Char('a'), Rgx.Char('d'))), 
      Rgx.Char('r')));

foreach (string r in e("cdar"))
  Console.WriteLine("Match with remainder: {0}", r);

The lack of global functions in C# makes the code uglier Wesner's suggested code above (btw why no global functions in C#? I believe the CLR supports them) and because the yield statement cannot be used inside anonymous method blocks it is necessary create the separate iterators SeqIterator and CharIterator. Note that like the Python code this class doesn't do anything useful; think of it as a proof of concept.

     
class Rgx
{
  public delegate IEnumerable Expr(string s);

  static IEnumerable IconCat(IEnumerable xs, IEnumerable ys)
  {
    foreach (string x in xs) yield return x;
    foreach (string y in ys) yield return y;
  }

  static public IEnumerable Nil(string s)
  {
    yield return s;
  }

  static public Expr Seq(Expr l, Expr r)
  {
    return delegate(string s)
    {
      return SeqIterator(s, l, r);
    };
  }

  static IEnumerable SeqIterator(string s, Expr l, Expr r)
  {
    foreach (string sl in l(s))
      foreach (string sr in r(sl))
        yield return sr;
  }

  static public Expr Alt(Expr l, Expr r)
  {
    return delegate(string s)
    {
      return IconCat(l(s), r(s));
    };
  }

  static public Expr Star(Expr e)
  {
    return delegate(string s)
    {
      return IconCat(Nil(s), Seq(e, Star(e))(s));
    };
  }

  static public Expr Plus(Expr e)
  {
    return Seq(e, Star(e));
  }

  static public Expr Char(char c)
  {
    return delegate(string s)
    {
      return CharIterator(s, c);
    };
  }

  static IEnumerable CharIterator(string s, char c)
  {
    if (s.Length > 0 && s[0] == c)
    {
      yield return (s.Substring(1));
    }
  }
}