Monday, May 07, 2007

Nesting Splitters

To recap on my previous post - C# provides the "yield" keyword that allows any function that returns an IEnumerable object to provide lazy iteration capability (a Custom Enumerator). This allows us to iterate across an array or list and filter elements, convert them to different types, aggregate them with another list etc.

My last post detailed how to "Split" an IEnumerable into two separate mutually exclusive enumerators. I.e. the equivalent of an if-else statement acting on each element.

At the end of the post I suggested that it would be great to output multiple enumerators i.e. the equivalent of if-then-else or switch type patterns. This may not be required as we can Nest Enumerators!

It goes like this:

inputList => split1 and split2 using our first test/predicate
split1 => split3 and split4 using our second test/predicate on the results of split1

We can then split the results add infinitum - creating a Binary Tree Splitter. At each node in the tree we queue the results of the "lagging" enumerator until it catches up. What is the benefit?

We can split out multiple enumerators and process each of them differently.

Wednesday, May 02, 2007

Splitting an IEnumerable into Multiple Custom Iterators

Bill Wagners MSDN column on custom iterators is excellent. It demonstrates clearly how to create a custom iterator so that IEnumerable iterators can iterate through collections in a lazy fashion. That is only processing the elements in the collection that are required.

As soon as it becomes evident that custom iterators take an IEnumerable as an input and output an IEnumerable it also becomes evident that we can "pipeline" custom iterators.

For Example:


  1. Start with a Collection of People
  2. Filter out all people with surname Grant
  3. Filter out all people with first name Mark
  4. Display List of Results on screen

This is a simple pipeline like so: 1 -> 2 -> 3 -> 4

We can also Transform iterators into different types and we can Merge or Aggregate multiple iterators into one be iterating over each in turn. This got me thinking is it possible to split an iterator into mutiple iterators.

By splitting an iterator it allows us to process the different sets of results differently. As an example:

We have an iterator that recurses a file structure. We need to split out two iterators -one for files and one for directories.

Out input iterator is of type IEnumerable and splits into a directoryIterator and fileIterator of type IEnumerable. These can then be Transformed into an IEnumerable and an IEnumerable respectively.

Having asked Bill if he could blog a solution I had some inspiration! Here is a test case for the solution:


using MarkGrant.Collections;

using NUnit.Framework;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;

namespace MarkGrantTest
{

/// <summary>
///
/// </summary>
/// <originalAuthor>mark.grant</originalAuthor>
/// <creationDate>01/05/2007 23:58:19</creationDate>
[TestFixture]
public class TestSplitter
{
[Test]
public void Test()
{
List<int> list = new List<int>();
for (int i = 0; i < 20; i++)
{
list.Add(i);
}

// intialize our splitter
Splitter<int> splitter = new Splitter<int>(list, delegate(int number) { return number % 3 == 0; });
IEnumerable<int> factor3 = splitter.GetMatchingElements();
IEnumerable<int> nonFactor3 = splitter.GetNonMatchingElements();

foreach (int factor in factor3)
{
Assert.IsTrue(factor % 3 == 0);
}

foreach (int nonFactor in nonFactor3)
{
Assert.IsTrue(nonFactor % 3 == 1 nonFactor % 3 == 2);
}
}
}
}


The test creates a Splitter class that Matches multiples of 3 and non-multiples. The IEnumerables can be called in any order and can overlap. As we don't know which iterator will be called the splitter will need to record the current state of the two iterators - i.e. which elements are left, which iterator they are related too etc. This can be achieved by queuing elements that are requested but belong to the other iterator.

The Splitter class:


using System;
using System.Collections.Generic;
using System.Text;

namespace MarkGrant.Collections
{
/// <summary>
/// Split an input enumerable into two mutually exclusive enumerables
/// </summary>
/// <typeparam name="T"></typeparam>
public class Splitter<T>
{

/// <summary>
/// Are we queuing the results from the first enumerator or the second - it
/// depends on which one is ahead
/// </summary>
private bool _queuingFirstIterator;

/// <summary>
/// We Queue one Enumerator unless the two are perfectly synchronised
/// </summary>
private Queue<T> _unyieldedElements = new Queue<T>();

/// <summary>
/// Our filter
/// </summary>
private Predicate<T> _predicate;

/// <summary>
/// Our input enumerator
/// </summary>
private IEnumerator<T> _inputEnumerator;

/// <summary>
/// Construct our Splitter with the input enumerator and the filter
/// </summary>
/// <param name="inputEnumerable">the input enumerator</param>
/// <param name="predicate">our filter</param>
public Splitter(IEnumerable<T> inputEnumerable, Predicate<T> predicate)
{
if (inputEnumerable == null)
{
throw new ArgumentNullException("inputEnumerable");
}

if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
_inputEnumerator = inputEnumerable.GetEnumerator();
_predicate = predicate;
}

/// <summary>
/// All elements that match the predicate
/// </summary>
/// <returns></returns>
public IEnumerable<T> GetMatchingElements()
{
return GetNextElement(true);
}

/// <summary>
/// All elements that don't match the predicate
/// </summary>
/// <returns></returns>
public IEnumerable<T> GetNonMatchingElements()
{
return GetNextElement(false);
}

/// <summary>
/// Returns the next element for our enumerator
/// </summary>
/// <param name="isMatching"></param>
/// <returns></returns>
private IEnumerable<T> GetNextElement(bool isMatching)
{
while (true)
{
// if we have an element in the queue for this iterator then
// yield it otherwise next element in the inputEnumerator is
// for this enumerable then yield it otherwise add it to the queue
// break when no more elements remain in either
if (_queuingFirstIterator == isMatching && _unyieldedElements.Count > 0)
{
yield return _unyieldedElements.Dequeue();
}
else if (_inputEnumerator.MoveNext())
{
if (_predicate(_inputEnumerator.Current) == isMatching)
{
yield return _inputEnumerator.Current;
}
else
{
_unyieldedElements.Enqueue(_inputEnumerator.Current);

_queuingFirstIterator = !isMatching;
}
}
else
{
break;
}
}
}

}
}





The Splitter class is initialized with the input IEnumerable and the Predicate that is used to filter the elements into the Matching and NonMatching itereators. As we enumerate through the input iterator, we keep iterating until we find an element that matches the one we are after, all the elements that we don't want we queue and a flag is set to record which iterator is "lagging" behind and therefore requires its elements to be queued.

We could for example alternate the requesting of results for either iterator. For example when matching factors of 3 starting with the iterator that matches 3, 6, 9 etc and an input list of 1,2,3,4,5,6 we get, where M is the Matching iterator (i.e. anything which is a factor of 3):

  1. M - Queue (1)
  2. Yield (1)
  3. M - Queue (2)
  4. Yield (2)
  5. M - Yield (3)
  6. Yield (4)
  7. M - Queue (5)
  8. Yield (5)
  9. M - Yield (6)

If anyone fancies improving the code to include multiple predicates, please let me know.