Huddled Masses
The internet home of Joel "Jaykul" Bennett...
Browse: Home / Custom IComparers in PowerShell (and Add-Type for v1)

Custom IComparers in PowerShell (and Add-Type for v1)

By Joel 'Jaykul' Bennett on 16-Dec-2008

Someone asked the question in #PowerShell (on irc.Freenode.net):

How do I find an item (in an array) based on one of it’s properties?

Actually, the question was rather more complicated than that. They were importing a bunch of users from a csv file, and wanted to sort them and search them based on a specific column. There are many ways to skin this cat. Imagine that you have a CSV file, and have imported it, like so:


Set-Content users.csv @"
LastName, FirstName, UserName, Url
Bennett, Joel, Jaykul, http://HuddledMasses.org
Rottenberg, Hal, HalR9000, http://halr9000.com
Hicks, Jeffrey, SapienScripter, http://blog.sapien.com/
"
@

$users = Import-Csv .\users.csv

Now, imagine that the CSV file has thousands of users in it, and that you need to not only sort the data by first name or last name on demand, but you also need to pull users from the list (by name) on demand.

These are trivial tasks in PowerShell:


$users = $users | Sort Lastname, Firstname
# or
$users = $users | Sort Firstname, Lastname

return $users | Where { $_.Firstname -eq "Bob" } | Select -First 1
 

This works but will have some performance issues when the list gets too long, because your search algorithm is … well, we could say it’s O(n), or just say it’s slow, because you’re basically going through the whole list every time. And I do mean the WHOLE list. You were looking for “Bob” ... but the Where-Object cmdlet doesn’t know your data is sorted, and doesn’t know you only want one user, so it’s going to iterate the whole list, because Select-Object can’t signal it to stop.

There are several ways to solve this, but what it comes down to is that you need a (better) search algorithm than Where-Object can implement: simply put, you’d like to use a binary search, since the data is already sorted.

Binary Search

I’m not going to get into this much, but to put it simply: a binary search algorithm is pretty much the fastest search you can have, but it depends on the data being pre-sorted in order. It works by starting in the middle, and comparing the middle item to the item you are searching for. If what you’re searching for comes BEFORE that item, then it discards the second half of the items, and examines the middle item of the first half. In this way, it eliminates half the items each time, and returns the correct item (if it exists) in roughly (log n) steps.

So how do we do that in PowerShell?

Well, that’s easy: Just use the static method: [Array]::BinarySearch… The only problem is, in order for BinarySearch to work, the items have to be not only sorted, but comparable. Because we’re trying to compare PSObjects, you can’t just compare two items from the $users array and determine the correct order…

If you wanted to use [Array]::Sort you could use an overload that lets you pass in a second array (of comparable items) to serve as the keys for the non-comparable items, but this won’t work for BinarySearch:


[Array]::Sort( $(@($users|%{$_.Firstname+" "+$_.Lastname})),$users )

So you need what is an “IComparer.” That is, a class which implements a “Compare” method which can be used to determine the order of the objects. Sadly, this requires a custom type — something you can’t do in pure PowerShell script, believe it or not. PowerShell does not have the ability to create custom classes (which is how you can tell it’s not a real programming language), so when you need a custom type in PowerShell you have to drop to another .Net language like C# or Python.

In PowerShell 2.0 there is a cmdlet “Add-Type” which can take a string representing C# source-code, and compile it in memory for immediate use, but in PowerShell 1.0 you need to have a script version of it, which I called New-Type, to avoid confusion because it only handles one of Add-Type’s four usage models:

Once you have Add-Type, you can create a ScriptBlock-based implementation of IComparer, by passing this as a string:

New-Type @"


using System;
using System.Collections;
using System.Management.Automation;
   public sealed class ScriptBlockComparer : IComparer
   {
      ScriptBlock _Comparer;
      public ScriptBlockComparer(ScriptBlock Comparer) { ComparerScript = Comparer; }
      public ScriptBlock ComparerScript
      {
         get { return _Comparer; }
         set { _Comparer = value; }
      }
      public int Compare(object x, object y)
      {
         try {
            return (int)ComparerScript.Invoke(x, y)[0].BaseObject;
         } catch(Exception ex) {
            throw new InvalidOperationException("Comparer Script failed to return an integer!", ex);
         }
      }
   }

"@

So now you have everything you need, and you can write your custom comparer, and do binary search. The one thing you need to be aware of is that when you’re doing binary search, you’re going to want to pass in the thing you’re searching for (ie: the first name) not an Object with a FirstName property. So your binary search needs a case for when the second parameter is a string:


$cmp = New-Object ScriptBlockComparer {
Param($one, $two)
if($two -isnot [string]) { throw "This comparer expects to compare objects to strings" }
   $first,$last= $two.Split(" ")
   $ord = $one.Firstname.CompareTo($first)
   if($ord -eq 0 -and $last) { $ord = $one.LastName.CompareTo($last) }
   return $ord
}

# Now you can use this to search
$users[[Array]::BinarySearch( $users, "Joel", $cmp )]
# Or this
$users[[Array]::BinarySearch( $users, "Joel Bennett", $cmp )]
 

Now, you may not think this is a big deal, so let me just share with you the results of searching a collection of about 49000 users:

Duration(Sec)Commmand
6.74000$users = $users | Sort FirstName, LastName
6.68200$users | Where { $_.FirstName -eq “Joel” }
0.02900$users[[Array]::BinarySearch( $users, “Joel”, $cmp )

What you see here is two things:

  1. Searching for a single user using Where-Object takes pretty much as long as (re)sorting the whole array, because every single item must be checked.
  1. Using BinarySearch rocks :D

What you might not notice is that [Array]::BinarySearch returns an index, not the item … so it’s particularly useful if you need to find the first of something, etc. because you can just increment the result.

BinarySearch is not always faster: If your array is initially unsorted, the requirement to pre-sort the array costs basically the same as any speedup you might get for a single search. Also, because of the call to the script, if the array is very small, like the four users in the sample above, it’s actually faster to just loop through them (20ms versus 24ms on my PC). The BinarySearch starts paying off pretty quickly as you scale up the number of searches and records, but it does take both more records (the break-even point for just the search is about 10, in this contrived example), and more searches (if you have to eat the cost of the initial sort).

In any case, New-Type certainly has many other uses, and the ScriptBlockComparer can be used for other things as well … although I will say that as far as I can tell, sorting is almost always better performed with the sort cmdlet than with [Array]::Sort and an IComparer. ;)

Similar Posts:

  • RFC: Information in PoshCode Module Manifests
  • Solving the “failed to grant minimum permission requests” error
  • Better error messages for PowerShell ValidatePattern
  • More Custom Attributes for PowerShell (Parameter Transformation)
  • How to Import Binary Modules from Network Shares

Posted in Huddled | Tagged .Net, BinarySearch, PowerShell, Scripting, Searching

« Previous Next »

Lijit Search

Tags

.Net .Net 2008 Scripting Games Automation Bugs Design Development Funny Gadgets GeoShell GUI Huddled Masses Internet licensing Microsoft Modules My Software News Personal PInvoke Pipeline Politics PoshCode PoshConsole PowerBoots PowerShell PowerShell Functions PowerTips Rants Recommender Repository Scripting ShowUI Software Solutions Textile Tips User Group UserInterface WalkThrough WebHosting Windows 7 WordPress WPF Xml

About Huddled Masses

This is web site is dedicated to the musings of Joel Bennett (aka Jaykul) about technology, software, software development, the web, and the world.

Any resemblance of the views expressed and the views of my employer, my terminal, or the view out my window are purely coincidental. The resemblance between them and my own views is non-deterministic. The question of the existence of views in the absence of anyone to hold them is left as an exercise for the reader.

P.S.: I occasionally link to things I think are great. When I do, I occasionally find a "referral code" so I can make a little cash. I promise that I don't link to anything just because of that cash (I wouldn't cross the street for the amount of cash those links bring in, never mind write a whole blog post) ... but I do not promise that things I link to will stay great as time passes, nor that you will agree with me about their greatness!

Archives

  • January 2012
  • October 2011
  • August 2011
  • July 2011
  • June 2011
  • March 2011
  • February 2011
  • January 2011
  • November 2010
  • August 2010

Copyright © 2012 Joel Bennett.

Powered by WordPress and Hybrid.