Splitting a String
I recently investigated a bug where someone had tried to split a string into space-separated words using String.Split in C#. The code failed to handle leading and trailing spaces, or runs of more than one space, in the string:
var input = " 1 22  3333  ";
var words = input.Split(' ');
foreach (var word in words) Console.Write("[" + word + "] ");
// outputs [] [1] [22] [] [3333] [] []
Not a particularly interesting bug, but investigating it did throw up a couple of mildly interesting points. I first thought I would try Regex.Split, but this still returns empty entries for the leading and trailing spaces:
var words = Regex.Split(input, " +"); // outputs [] [1] [22] [3333] []
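For completeness, trimming the input before calling Regex.Split can be sketched as below. This is only an illustration: the class and method names (TrimThenSplit, SplitWords) are made up for this example and are not from the code I was debugging.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TrimThenSplit
{
    // Hypothetical helper name, used only for this illustration.
    public static string[] SplitWords(string input)
    {
        // Trim removes leading and trailing white space, so the split
        // no longer produces an empty first or last entry.
        return Regex.Split(input.Trim(), " +");
    }

    public static void Main()
    {
        var words = SplitWords(" 1 22  3333  ");
        Console.WriteLine(string.Join(" ", words.Select(w => "[" + w + "]")));
        // outputs [1] [22] [3333]
    }
}
```

One caveat with this sketch: an input that is empty or contains only spaces still yields a single empty entry, because Regex.Split returns the whole input when the pattern never matches.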
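Ignoring the fact that I could just call String.Trim to fix this, I tried using Regex.Matches instead. The interesting point here is that, on .NET Framework, this doesn't compile:
var words = from match in Regex.Matches(input, @"[^ ]+") select match.ToString();
This is because MatchCollection, the type returned by Regex.Matches, implements only the non-generic IEnumerable and not IEnumerable<T>, so the compiler cannot infer the type of match. (In newer versions of .NET, MatchCollection also implements IEnumerable<Match>, and there the query compiles as written.) The fix is to declare the type of the range variable, which casts each element returned from the MatchCollection enumeration: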
var words = from Match match in Regex.Matches(input, @"[^ ]+") select match.ToString();
foreach (var word in words) Console.Write("[" + word + "] ");
// outputs [1] [22] [3333]
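Declaring the type in a from clause is shorthand for a call to Enumerable.Cast<T>, so the same query can be written in method syntax. A minimal sketch, with the class and method names (CastExample, Words) invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CastExample
{
    // Hypothetical helper name, used only for this illustration.
    public static IEnumerable<string> Words(string input)
    {
        return Regex.Matches(input, @"[^ ]+")
                    .Cast<Match>()                 // IEnumerable -> IEnumerable<Match>
                    .Select(match => match.Value); // Value is the matched text
    }

    public static void Main()
    {
        var words = Words(" 1 22  3333  ");
        Console.WriteLine(string.Join(" ", words.Select(w => "[" + w + "]")));
        // outputs [1] [22] [3333]
    }
}
```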
The other interesting point is that there is no ForEach extension method in LINQ. For example, with words being an IEnumerable<string>, this doesn't compile:
words.ForEach(word => Console.Write("[" + word + "] "));
I suppose this is because LINQ follows a functional style of programming, so its operators should be free of side effects. ForEach does not fit that model, since side effects are its whole purpose. If you want to use ForEach in this way, you can implement your own extension method:
public static class MyExtensions
{
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
        }
    }
}
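If you would rather not add an extension method, there are a couple of built-in routes to the same output. The sketch below shows two of them: List<T> has its own ForEach method, and string.Join avoids the explicit loop altogether. The names ForEachAlternatives and Bracketed are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForEachAlternatives
{
    // Hypothetical helper: builds the bracketed output as a single string.
    public static string Bracketed(IEnumerable<string> words)
    {
        return string.Join(" ", words.Select(word => "[" + word + "]"));
    }

    public static void Main()
    {
        IEnumerable<string> words = new[] { "1", "22", "3333" };

        // Option 1: copy the sequence into a List<T>, which defines ForEach itself.
        words.ToList().ForEach(word => Console.Write("[" + word + "] "));
        Console.WriteLine();

        // Option 2: build the whole string first, then write it once.
        Console.WriteLine(Bracketed(words));
    }
}
```

The first option allocates a list just to iterate over it, so for a large sequence a plain foreach loop is probably still the simplest choice.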
Going back to the original problem, I discovered that String.Split has an overload that takes a StringSplitOptions value, and passing RemoveEmptyEntries solves the problem:
var words = input.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var word in words) Console.Write("[" + word + "] ");
// outputs [1] [22] [3333]
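A related variation, in case the input may contain tabs or newlines as well as spaces: when the separator array passed to this overload is null, Split treats every white-space character as a delimiter. A minimal sketch, with the class and method names (WhitespaceSplit, SplitWords) and the sample input invented for the example:

```csharp
using System;
using System.Linq;

public static class WhitespaceSplit
{
    // Hypothetical helper name, used only for this illustration.
    public static string[] SplitWords(string input)
    {
        // A null separator array means "split on any white-space character".
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        var words = SplitWords("\t1 22 \n 3333  ");
        Console.WriteLine(string.Join(" ", words.Select(w => "[" + w + "]")));
        // outputs [1] [22] [3333]
    }
}
```

The cast to char[] is there because a bare null would be ambiguous between the char[] and string[] overloads of Split.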