March 2006 - Posts
OrderBy, ThenBy, and Descending. Oh my
The last installment discussed how some of the extension methods delivered with the preview extract portions of a collection (either the first elements, or the last elements). That’s cool, but it’s rather pointless without the ability to order the elements based on some criteria you choose. So, the next samples in the preview discuss ordering operators.
First, a parenthetical side note. In the first few blog entries where I discussed LINQ, I was very deliberate about how I went over the code. I dissected every line of code. I explained every language feature. The next few entries were a bit less detailed. I still covered every sample in full, but I assumed a bit more background (both for me, and the reader). Well, now we reach the next level. If you’ve been following along, you’ve seen enough code that you should be familiar with the basic LINQ syntax. If not, see the links at the bottom of the page.
LINQ provides the orderby contextual keyword to order a collection:
string[] words = { "cherry", "apple", "blueberry" };
var sortedWords =
from w in words
orderby w
select w;
The collection sortedWords will appear in alphabetical order, because the default ordering for strings is alphabetical. Pretty simple, right?
But, of course, you’ll often want to order a collection using other properties of the collection. And, LINQ supports a number of ways to do that. First, you can specify a different property of the collection elements as the ordering field:
string[] words = { "cherry", "apple", "blueberry" };
var sortedWords =
from w in words
orderby w.Length
select w;
Or:
List<Product> products = GetProductList();
var sortedProducts =
from p in products
orderby p.ProductName
select p;
You’ve seen that you can order a collection using the default order for any of the fields in the objects that make up the collection. But, of course, this isn’t the only ordering you’ll need. Suppose you need to order a set of strings without concern for case. You can write a comparer, and use that for your ordering relation:
public class CaseInsensitiveComparer : IComparer<string>
{
public int Compare(string x, string y)
{ return string.Compare(x, y, true); }
}
string[] words = { "aPPLE", "AbAcUs", "bRaNcH",
"BlUeBeRrY", "ClOvEr", "cHeRry"};
var sortedWords = words.OrderBy(a => a,
new CaseInsensitiveComparer());
Note that the syntax above is a bit different. Rather than using the contextual orderby keyword, you explicitly call the OrderBy extension method.
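If you would rather not write the comparer class yourself, the framework already ships comparers that implement IComparer<string>. The sketch below is an assumption-laden variation on the sample above: it targets the released System.Linq namespace (not the preview's System.Query), uses top-level statements from a current .NET SDK, and swaps in StringComparer.OrdinalIgnoreCase, which compares ordinally rather than by culture as string.Compare(x, y, true) does.

```csharp
using System;
using System.Linq;

string[] words = { "aPPLE", "AbAcUs", "bRaNcH", "BlUeBeRrY", "ClOvEr", "cHeRry" };

// StringComparer.OrdinalIgnoreCase implements IComparer<string>,
// so it can stand in for a hand-written comparer class.
string[] sortedWords = words
    .OrderBy(w => w, StringComparer.OrdinalIgnoreCase)
    .ToArray();

// Prints: AbAcUs, aPPLE, BlUeBeRrY, bRaNcH, cHeRry, ClOvEr
Console.WriteLine(string.Join(", ", sortedWords));
```

For plain ASCII words like these, the ordinal and culture-aware comparers agree. They can differ for other alphabets, so pick the one that matches how your users expect text to sort.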
And, clearly, descending order is important:
double[] doubles = { 1.7, 2.3, 1.9, 4.1, 2.9 };
var sortedDoubles =
from d in doubles
orderby d descending
select d;
If you can order collections of built-in types using the descending keyword, you can order collections of user-defined types the same way:
List<Product> products = GetProductList();
var sortedProducts =
from p in products
orderby p.UnitsInStock descending
select p;
There’s nothing magic about any of the individual ordering clauses. You can combine the descending order with a custom comparer:
string[] words = { "aPPLE", "AbAcUs", "bRaNcH",
"BlUeBeRrY", "ClOvEr", "cHeRry"};
var sortedWords = words.OrderByDescending(a => a,
new CaseInsensitiveComparer());
Of course, you’ll often want to combine several orderby keys in order to create primary and secondary orderings:
string[] digits = { "zero", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine" };
var sortedDigits =
from d in digits
orderby d.Length, d
select d;
Those multiple orderby clauses are implemented using the OrderBy and the ThenBy extension methods, as shown below:
string[] words = { "aPPLE", "AbAcUs", "bRaNcH",
"BlUeBeRrY", "ClOvEr", "cHeRry"};
var sortedWords =
words.OrderBy(a => a.Length)
.ThenBy(a => a, new CaseInsensitiveComparer());
And, multiple orderings can also be combined with the descending keyword, which can be applied individually to each ordering:
List<Product> products = GetProductList();
var sortedProducts =
from p in products
orderby p.Category, p.UnitPrice descending
select p;
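To see how the query syntax and the extension methods line up, here is a small sketch that runs the same primary and secondary ordering both ways. It assumes the released System.Linq namespace and top-level statements, and the product tuples are hypothetical stand-ins for whatever GetProductList() returns.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in data; the real samples call GetProductList().
var products = new[]
{
    (Name: "Chai", Category: "Beverages", UnitPrice: 18.00m),
    (Name: "Tofu", Category: "Produce", UnitPrice: 23.25m),
    (Name: "Ipoh Coffee", Category: "Beverages", UnitPrice: 46.00m),
    (Name: "Longlife Tofu", Category: "Produce", UnitPrice: 10.00m),
};

// Query syntax: category ascending, then price descending within a category.
var byQuery =
    from p in products
    orderby p.Category, p.UnitPrice descending
    select p.Name;

// The same ordering spelled out with the extension methods.
var byMethods = products
    .OrderBy(p => p.Category)
    .ThenByDescending(p => p.UnitPrice)
    .Select(p => p.Name);

// Prints: Ipoh Coffee, Chai, Tofu, Longlife Tofu
Console.WriteLine(string.Join(", ", byQuery));
```

Both sequences come out in the same order, which is a handy check when you translate a query by hand.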
Setting a descending order is accomplished by a different extension method, OrderByDescending, or ThenByDescending:
string[] words = { "aPPLE", "AbAcUs", "bRaNcH",
"BlUeBeRrY", "ClOvEr", "cHeRry"};
var sortedWords =
words.OrderBy(a => a.Length)
.ThenByDescending(a => a, new CaseInsensitiveComparer());
Finally, you should be aware of the Reverse extension method. Reversing a collection is more efficient than performing a full sort, so if you know you have the collection sorted, but backwards, that’s the right choice:
string[] digits = { "zero", "one", "two", "three", "four",
"five", "six", "seven", "eight", "nine" };
var reversedIDigits = (
from d in digits
where d[1] == 'i'
select d)
.Reverse();
In this entry, we saw that LINQ provides a set of operators to order results. Next time, we’ll cover the grouping operators.
Part 1: The general query syntax
Part 2: The one where I discuss Object and Collection initializers
Part 3: The one where I finish restriction operators
Part 4: Beginning to discuss projections
Part 5: Anonymous types and projections
Part 6: Discussing indexed, filtered, and compound queries
Part 7: Finishing up the projection items
Part 8: Projection Operators and Extension methods
Ummm, or really ticking off Pizza Hut
This was forwarded through a few people, and it's a bit old, but I had to laugh.
I can't decide if all that work is worth it, but it is a great money saver.
Maximizing ROI at Pizza Hut: Or, getting the most for your money
Projection Operators and extension methods
This installment gives you a taste of extension methods. We’ll look at Partitioning Operations, which enable you to retrieve parts of a collection based on an item’s position in the collection. This functionality is defined using extension methods, which means that the methods can be used with any collection. Or, more correctly, these methods can be used with any class that implements IEnumerable<T>.
As I’ve been doing, we’ll walk through the pertinent sample code and see what new features these next samples show.
The first few samples show Take, which limits the number of returned values. These two methods show how to retrieve the first 3 elements from an array, and from a nested query:
public void Linq20() {
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var first3Numbers = numbers.Take(3);
Console.WriteLine("First 3 numbers:");
foreach (var n in first3Numbers) {
Console.WriteLine(n);
}
}
public void Linq21() {
List<Customer> customers = GetCustomerList();
var first3WAOrders = (
from c in customers,
o in c.Orders
where c.Region == "WA"
select new {c.CustomerID, o.OrderID, o.OrderDate} )
.Take(3);
Console.WriteLine("First 3 orders in WA:");
foreach (var order in first3WAOrders) {
ObjectDumper.Write(order);
}
}
The important lesson of these two samples is not that you can retrieve a small number of items, but rather that by writing an extension method, you can create functionality that can be reused in a large number of ways. The Take method is in Sequence.cs, which is delivered with the LINQ preview:
public static IEnumerable<T> Take<T>
(this IEnumerable<T> source, int count) {
if (count > 0) {
foreach (T element in source) {
yield return element;
if (--count == 0) break;
}
}
}
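One consequence of that yield-based implementation is worth a quick experiment: Take stops pulling from its source once it has enough elements, so it works even on a sequence that never ends. This is a sketch under a few assumptions: it uses the released System.Linq (not the preview's Sequence.cs), top-level statements, and a hypothetical Naturals() helper that counts how many elements it was asked to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Hypothetical endless source that records how many elements were requested.
IEnumerable<int> Naturals()
{
    for (int i = 0; ; i++)
    {
        produced++;
        yield return i;
    }
}

// Take(3) stops enumerating the source after three elements,
// so this finishes even though Naturals() never ends.
List<int> firstThree = Naturals().Take(3).ToList();

Console.WriteLine(string.Join(", ", firstThree)); // Prints: 0, 1, 2
Console.WriteLine($"Elements produced by the source: {produced}");
```

The second line of output shows that the source was only asked for a handful of values, not an unbounded number.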
Retrieving N items from a collection is incredibly simple. So, let’s skip forward and look at the two samples that use TakeWhile. TakeWhile continues to return elements as long as a test condition is true. For example, this sample returns the numbers in the array as long as all values encountered are less than 6:
public void Linq24() {
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var firstNumbersLessThan6 = numbers.TakeWhile(n => n < 6);
Console.WriteLine("First numbers less than 6:");
foreach (var n in firstNumbersLessThan6) {
Console.WriteLine(n);
}
}
The output is: 5,4,1,3. The 9 doesn’t pass the test, and therefore terminates the output. A more complicated example shows how to use a more interesting condition. The copy and paste demons were in play here, because the variable name (firstSmallNumbers) below doesn’t describe what it’s doing really well. This method will continue to return elements as long as the number returned is greater than or equal to its position.
public void Linq25() {
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var firstSmallNumbers = numbers.TakeWhile((n, index) => n >= index);
Console.WriteLine("First numbers not less than their position:");
foreach (var n in firstSmallNumbers) {
Console.WriteLine(n);
}
}
The output is 5,4. The 1 is less than its position in the array (2), so the iteration stops. As with Take(), TakeWhile() is an extension method that is delivered in Sequence.cs. There are two versions of TakeWhile(). The first examines some property of an element. The second examines some property of the element, and its position in the collection:
public static IEnumerable<T> TakeWhile<T>
(this IEnumerable<T> source, Func<T, bool> predicate) {
foreach (T element in source) {
if (!predicate(element)) break;
yield return element;
}
}
public static IEnumerable<T> TakeWhile<T>
(this IEnumerable<T> source, Func<T, int, bool> predicate) {
int index = 0;
foreach (T element in source) {
if (!predicate(element, index)) break;
yield return element;
index++;
}
}
Take and TakeWhile provide the means for you to examine the first part of a collection. Skip and SkipWhile are the way you access the latter parts of a collection. Skip() jumps past the first N elements in a collection. (I’m just showing some excerpts from the samples now, rather than the entire collection.)
// Skip the first 4 numbers in an array:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var allButFirst4Numbers = numbers.Skip(4);
// Skip the first 2 elements in the collection returned from a query:
List<Customer> customers = GetCustomerList();
var waOrders =
from c in customers,
o in c.Orders
where c.Region == "WA"
select new {c.CustomerID, o.OrderID, o.OrderDate};
var allButFirst2Orders = waOrders.Skip(2);
Just like Take(), Skip() is an extension method. By creating an extension method, you can apply Skip() to any type that implements IEnumerable<T>:
public static IEnumerable<T> Skip<T>
(this IEnumerable<T> source, int count) {
using (IEnumerator<T> e = source.GetEnumerator()) {
while (count > 0 && e.MoveNext()) count--;
if (count <= 0) {
while (e.MoveNext()) yield return e.Current;
}
}
}
Finally, in the same way that Take() is complemented by TakeWhile(), Skip() is complemented by SkipWhile():
// Skip until the returned value is divisible by 3:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var allButFirst3Numbers = numbers.SkipWhile(n => n % 3 != 0);
// allButFirst3Numbers contains 3,9,8,6,7,2,0
// Skip until the number is less than its position in the array:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var laterNumbers = numbers.SkipWhile((n, index) => n >= index);
// laterNumbers contains 1,3,9,8,6,7,2,0
One more time, repeat after me: This is an extension method so SkipWhile() can be used with any type that implements IEnumerable<T>. (And, like TakeWhile(), there are two versions. The first examines the element, and the second the element and its position in the array.)
public static IEnumerable<T> SkipWhile<T>
(this IEnumerable<T> source, Func<T, bool> predicate) {
bool yielding = false;
foreach (T element in source) {
if (!yielding && !predicate(element)) yielding = true;
if (yielding) yield return element;
}
}
public static IEnumerable<T> SkipWhile<T>
(this IEnumerable<T> source, Func<T, int, bool> predicate) {
int index = 0;
bool yielding = false;
foreach (T element in source) {
if (!yielding && !predicate(element, index)) yielding = true;
if (yielding) yield return element;
index++;
}
}
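If you want to check the Skip and SkipWhile results quoted above for yourself, a few lines will do it. This sketch assumes the released System.Linq namespace and top-level statements. The variable names are mine, and the last query (Skip followed by Take) is an extra illustration of how the partitioning operators compose, not one of the preview samples.

```csharp
using System;
using System.Linq;

int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

// Skip the first 4 elements.
int[] afterFirst4 = numbers.Skip(4).ToArray();

// Skip until a value is divisible by 3.
int[] fromFirstMultipleOf3 = numbers.SkipWhile(n => n % 3 != 0).ToArray();

// Skip until a value is less than its position in the array.
int[] fromFirstBelowIndex = numbers.SkipWhile((n, index) => n >= index).ToArray();

// Partition operators compose: skip 4, then take the next 3.
int[] middle = numbers.Skip(4).Take(3).ToArray();

Console.WriteLine(string.Join(",", afterFirst4));          // 9,8,6,7,2,0
Console.WriteLine(string.Join(",", fromFirstMultipleOf3)); // 3,9,8,6,7,2,0
Console.WriteLine(string.Join(",", fromFirstBelowIndex));  // 1,3,9,8,6,7,2,0
Console.WriteLine(string.Join(",", middle));               // 9,8,6
```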
Well, those are the sample partition operators. Later this week, we’ll move on with more samples.
Part 1: The general query syntax
Part 2: The one where I discuss Object and Collection Initializers
Part 3: The one where I finish restriction operators
Part 4: Beginning to discuss Projections
Part 5: Anonymous types and projections
Part 6: Discussing indexed, filtered, and compound queries
Part 7: Finishing up the projection items
More on a recent question from an Effective C# reader
About a month ago, I wrote this entry discussing the performance of GetHashCode() for different types.
Now, Vipul Patel, another C# MVP, has written an excellent treatise on why it matters for the C# FAQ blog:
How can I speed up Hashtable lookups? Why the code generated matters
What happens when intelligent people mis-use data
It seems Dr. Richard Grimes is at it again. He published this whitepaper asserting that Microsoft is running from .NET with its Vista release.
Of course, it produced the expected result on slashdot.
OK, repeat after me: "It's an operating system, not an application."
I could join in the mudslinging, and say that Sun must be running from Java because Solaris isn't built in Java. But I'd be wrong, so we'll just leave that as an analogy.
Instead, I'll point out numerous locations where Microsoft is investing in .NET to produce important (or even flagship) products:
- Visual Studio 2005: 7.5 million lines
- SQL Server 2005: 3 million lines
- BizTalk Server: 2 million lines
- Visual Studio Team System: 1.7 million lines
These products have hundreds of thousands of lines of managed code:
- Windows Presentation Foundation (Avalon): 900K lines
- Windows Sharepoint Services: 750K lines
- Expression Interactive Designer: 250K lines
- Sharepoint Portal Server: 200K lines
- Content Management Server: 100K lines
So, there's no doubt that Microsoft is investing in .NET, both for their own customers, and in their own products.
Dr. Grimes' facts on managed code delivered with Vista may be right, but his conclusions are just plain wrong.
Some days, I think I work with -3 level organizations
This was written quite a while ago, but is somehow apropos for a conversation I had recently about group dynamics and organizational structure.
The Capability Im-Maturity Model: CIMM
The second unsolvable code cipher solved in less than a month
I'm a couple days behind in reading and posting, but earlier this week, the same group of computer enthusiasts broke yet another 'unbreakable' WW II era coded message.
Somehow it seems fitting that they solved it on Pi Day.
The first unsolvable cipher announcement: My entry about the first cipher
BBC: Enigma project cracks second code: Yet another Enigma code broken
Finishing up the Projection items
Continuing the investigation of LINQ, this entry discusses the last two projection samples. Both illustrate a very important run-time feature of the way LINQ works. Namely, LINQ works by building a chain of query operators that executes only when the results are enumerated. So, that means you can create a complete chain of operations to execute knowing that you won’t create a collection for each interim step.
public void Linq18() {
List<Customer> customers = GetCustomerList();
DateTime cutoffDate = new DateTime(1997, 1, 1);
var orders =
from c in customers
where c.Region == "WA"
from o in c.Orders
where o.OrderDate >= cutoffDate
select new {c.CustomerID, o.OrderID};
ObjectDumper.Write(orders);
}
Notice that this sample uses multiple from clauses. And, the first contains a where filter. The output from the first clause provides the input to the second, and therefore, only those customers that match the first clause are passed to the second, where their orders are processed.
I’ll warn you right now, walking through multiple select queries can be very painful in the VS 2005 debugger. The current debugger doesn’t know enough about LINQ to handle it very well. But if you try, you’ll see a few cool things that explain my point about when queries are evaluated.
Orders is a System.Query.Sequence.SelectMany<Customer, <Projection>f__ed>. And, when you first step past the query statement, its internal fields (selector and source) are null. It’s only after you go into the ObjectDumper.Write() method that the query gets executed, and these fields have real values.
The Selector is an object that determines what to generate: it’s a System.Query.Func<Customer, IEnumerable<<Projection>f__ed>>. Embedded inside that is a delegate that examines a customer, and if the customer matches the where clause, it then calls another embedded Func delegate to process the orders for that customer.
That key point is worth repeating: It’s a chain that passes each object in the sequence through every step in turn, rather than generating the entire first result, processing it, and then generating the entire second result, etc.
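You don’t need the debugger to watch this happen. The sketch below logs every call made by the lambdas in a small chain. It assumes the released System.Linq namespace and top-level statements, and the log list and numbers array are hypothetical stand-ins for the customer data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<string>();
int[] numbers = { 1, 2, 3 };

// Building the query runs neither lambda.
IEnumerable<int> query = numbers
    .Where(n => { log.Add($"where {n}"); return n != 2; })
    .Select(n => { log.Add($"select {n}"); return n * 10; });

int callsBeforeEnumeration = log.Count;

// Enumerating pulls each element through the whole chain in turn.
List<int> results = query.ToList();

Console.WriteLine(callsBeforeEnumeration);      // 0
Console.WriteLine(string.Join(", ", log));      // where 1, select 1, where 2, where 3, select 3
Console.WriteLine(string.Join(", ", results));  // 10, 30
```

The interleaved log entries are the point: each element goes through the filter and then the projection before the next element is examined, and no intermediate collection is built between the two steps.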
Managing Order of interim results
The next sample shows another way to chain results from one select into another:
public void Linq19() {
List<Customer> customers = GetCustomerList();
var customerOrders =
customers.SelectMany(
(cust, custIndex) =>
cust.Orders.Select(o => "Customer #" + (custIndex + 1) +
" has an order with OrderID " + o.OrderID) );
ObjectDumper.Write(customerOrders);
}
Okay, this works similarly to the first sample in this entry. The syntax is different, and instead of creating a sequence of anonymous objects, it creates a sequence of strings.
There are a few new items here, so we’ll go through it in sequence:
SelectMany( …) creates a sequence of items.
(cust, custIndex) =>
says to select the customer object and the customer index from each customer.
cust.Orders.Select( … )
generates the sequence of strings from the orders for a given customer.
The:
o => " … "
generates each string, containing the customer’s position in the list (custIndex + 1) and the order ID.
Once again, the customerOrders variable has nulls for the sequence until such time as the ObjectDumper starts enumerating the results.
That finishes the projection queries. Stay tuned for more.
Part 1: The general query syntax
Part 2: The one where I discuss Object and Collection Initializers
Part 3: The one where I finish the restriction operators
Part 4: Beginning to discuss projections
Part 5: Anonymous types and projections
Part 6: Discussing indexed, filtered, and compound queries
This should be coming to a startup near us (I hope)
Michigan's state government has created several new initiatives aimed at growing the nascent tech economy in our state.
Well, Microsoft recently updated their Startup zone with quite a few resources for startup software companies (ISVs). (http://microsoftstartupzone.com/default.aspx)
I knew about this program a few months back, after chatting with John Nogrady at PDC last fall. John focuses on ISVs working in the consumer space. Other members of his team are dedicated to other vertical industries: Health Care, Line of Business application, Online Content, Collaboration, etc. (If you’re in the collaboration space, Don Dodge’s blog: http://dondodge.typepad.com/the_next_big_thing/ is a very useful read.)
The main landing page will tell you about the resources Microsoft has dedicated to helping startup companies leverage Microsoft’s platform to grow their business. Before you start laughing, understand that Microsoft has an ulterior motive: they want tomorrow’s successful software companies developing applications on .NET, instead of their competitors’ platforms. They’ll help you succeed if you’re helping to make Windows the platform for tomorrow’s apps.
And they do help. There are dedicated resources for different verticals. The Emerging Business Team partners with national VCs to help connect emerging businesses with capital resources. And, more than just capital, you can access Microsoft’s library of software research technology. See here: http://www.microsoft.com/mscorp/ip/ventures/ and here: http://www.microsoft.com/mscorp/ip/ventures/technologies.asp Don Dodge (referenced above) wrote a great introduction to the program last fall: http://dondodge.typepad.com/the_next_big_thing/2005/09/microsoft_resea.html
With the emphasis on diversifying Michigan’s economy in the face of the auto industry’s current challenges, Michigan’s entrepreneurial and capital communities should be looking into these resources. Because, if you aren’t, your competitors are. Would you rather partner with Microsoft, or compete against them?
Discussing indexed, filtered, and compound queries
This installment discusses the indexed, filtered, and compound projection
queries. Indexed queries create a new collection based on the index, and the
object referenced. Filtered queries create a set of new objects that pass some
filter (or where clause). Finally, compound queries create a new type based on
nested, or joined, collections.
Indexed Queries
LINQ 12 examines an array of numbers and creates a collection of objects
that record whether each number in the array is the same as its index in that array. It makes
use of the anonymous types I discussed in my last LINQ blog entry. Here’s the
code:
public void Linq12() {
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
var numsInPlace = numbers.Select((num, index) =>
new {Num = num, InPlace = (num == index)});
Console.WriteLine("Number: In-place?");
foreach (var n in numsInPlace) {
Console.WriteLine("{0}: {1}", n.Num, n.InPlace);
}
}
This query projects a new object containing the number, and a Boolean
describing whether or not the number matches its index. There’s not a lot of new
syntax above. Notice that the select examines the object, and its index in the
collection (two integers), and the output is the new anonymous type. That’s an
indexed query, in a nutshell. It creates a new collection based on the contents
of the collection, and the index of an individual item in the collection.
On to Filtered Queries.
A filtered query creates a new collection based on a set of criteria in an
existing collection. LINQ13 generates a new collection of strings from a list of
indices into the digits array.
Here’s the code:
public void Linq13() {
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
string[] digits = { "zero", "one", "two", "three",
"four", "five", "six", "seven", "eight", "nine" };
var lowNums =
from n in numbers
where n < 5
select digits[n];
Console.WriteLine("Numbers < 5:");
foreach (var num in lowNums) {
Console.WriteLine(num);
}
}
So, parsing through it a bit:
from n in numbers
where n < 5
Iterates all the values in the numbers array that are less than 5.
This statement:
select digits[n];
Returns the string from the nth location in the digits array of
strings.
The result:
Numbers < 5:
four
one
three
two
zero
The first Compound Query
Compound queries work on more than one sequence.
The first compound query examines two arrays of numbers and returns every
pair {a,b} where a < b.
Today, you could write this code as follows:
int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
int[] numbersB = { 1, 3, 5, 7, 8 };
foreach ( int a in numbersA )
foreach ( int b in numbersB )
if ( a < b )
{
// add a,b to the collection
}
LINQ’s query capabilities make this simpler:
public void Linq14() {
int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
int[] numbersB = { 1, 3, 5, 7, 8 };
var pairs =
from a in numbersA,
b in numbersB
where a < b
select new {a, b};
Console.WriteLine("Pairs where a < b:");
foreach (var pair in pairs) {
Console.WriteLine("{0} is less than {1}",
pair.a, pair.b);
}
}
This line:
from a in numbersA,
b in numbersB
creates the nested loop I showed above.
where a < b
is the test.
And, this:
select new {a, b};
returns the new anonymous type (a pair of two numbers a,b).
The output is as follows:
Pairs where a < b:
0 is less than 1
0 is less than 3
0 is less than 5
0 is less than 7
0 is less than 8
2 is less than 3
2 is less than 5
2 is less than 7
2 is less than 8
4 is less than 5
4 is less than 7
4 is less than 8
5 is less than 7
5 is less than 8
6 is less than 7
6 is less than 8
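One caveat if you try this outside the preview: the released C# compiler dropped the comma-separated from clause and uses a second from instead. The sketch below shows the same pairs query in that form, plus the SelectMany call it corresponds to. It assumes the released System.Linq namespace and top-level statements. The variable names pairList and viaMethods are mine.

```csharp
using System;
using System.Linq;

int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
int[] numbersB = { 1, 3, 5, 7, 8 };

// Released C# syntax: a second from clause replaces the comma.
var pairs =
    from a in numbersA
    from b in numbersB
    where a < b
    select new { a, b };

// Roughly the same query written with extension methods.
var viaMethods = numbersA
    .SelectMany(a => numbersB, (a, b) => new { a, b })
    .Where(pair => pair.a < pair.b);

var pairList = pairs.ToList();

Console.WriteLine(pairList.Count);                    // 16
Console.WriteLine(pairList.SequenceEqual(viaMethods)); // True
```

Anonymous types with the same property names and types compare by value, which is why SequenceEqual can confirm that the two forms produce the same pairs.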
The next compound query builds the list of all orders (from all customers)
whose total is less than 500.00. It iterates across all orders for all customers,
and keeps only those where the total is less than 500.00.
The output of the query is the customer ID, the order ID, and the total.
public void Linq15() {
List<Customer> customers = GetCustomerList();
var orders =
from c in customers,
o in c.Orders
where o.Total < 500.00M
select new {c.CustomerID, o.OrderID, o.Total};
ObjectDumper.Write(orders);
}
In case you are wondering, the ObjectDumper.Write() method uses reflection to
print out an object’s public properties, including iterating any contained
collection.
The next query is very similar. It returns all orders placed on or after Jan
1, 1998:
public void Linq16() {
List<Customer> customers = GetCustomerList();
var orders =
from c in customers,
o in c.Orders
where o.OrderDate >= new DateTime(1998, 1, 1)
select new {c.CustomerID, o.OrderID, o.OrderDate};
ObjectDumper.Write(orders);
}
Finally, if you look at both queries above, they request one of the values
twice. LINQ15 requested the total twice, LINQ16 requested the date twice. So,
LINQ 17 shows how to modify LINQ15 to cache the order total and avoid the extra
evaluation (it also changes the filter to totals of 2000.00 or more). Notice that
the from clause initializes a variable, total, to the value of o.Total for each order:
public void Linq17() {
List<Customer> customers = GetCustomerList();
var orders =
from c in customers,
o in c.Orders,
total = o.Total
where total >= 2000.0M
select new {c.CustomerID, o.OrderID, total};
ObjectDumper.Write(orders);
}
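In the released version of C#, that inline assignment is written with the let keyword. Here is a sketch of the Linq17 query in that form. It assumes the released System.Linq namespace and top-level statements, and the customers array (IDs, order numbers, and totals) is hypothetical data standing in for GetCustomerList().

```csharp
using System;
using System.Linq;

// Hypothetical stand-in data; the real sample calls GetCustomerList().
var customers = new[]
{
    new { CustomerID = "C1", Orders = new[]
        { new { OrderID = 1, Total = 2500.00m }, new { OrderID = 2, Total = 120.50m } } },
    new { CustomerID = "C2", Orders = new[]
        { new { OrderID = 3, Total = 2000.00m } } },
};

// let introduces a range variable that is computed once per order
// and reused in both the where and select clauses.
var orders =
    from c in customers
    from o in c.Orders
    let total = o.Total
    where total >= 2000.0M
    select new { c.CustomerID, o.OrderID, total };

var bigOrders = orders.ToList();

foreach (var order in bigOrders)
{
    Console.WriteLine($"{order.CustomerID} {order.OrderID} {order.total}");
}
```

With this data, the query keeps order 1 for C1 and order 3 for C2, and drops the 120.50 order.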
Well, this entry ended up a lot longer than I’d originally hoped. But,
it should have given you a good feel for executing queries across multiple
collections.
Part 1: The general query syntax
Part 2: Object and Collection Initializers
Part 3: Restriction Operators
Part 4: Beginning Projections
Part 5: Anonymous Types and Projections
Newer technology solves what was once "unsolvable"
In WWII, the Germans had invented a cipher machine called 'Enigma' that had, for its time, a complex set of machinery that enabled very hard-to-crack codes. It took the best efforts of folks like Alan Turing and his colleagues at Bletchley Park to break many of the cipher texts transmitted using this machine.
In order to speed the decipher tasks, they built a mechanical computer, called the bombe, to assist in the task.
Even so, some messages never were correctly deciphered during wartime. No one could find the right settings for the deciphering algorithms. Three were published in a cryptography journal in 1995 as a puzzle, and one has just been solved by a network of amateurs using a peer-to-peer network of personal computers.
Cool.
And, it makes you wonder when the current 'unsolvable' cryptography algorithms will fall to some new technology.
BBC News: Online amateurs crack Nazi codes: The first of three unsolved WWII ciphers has been cracked
The Code Book, by Simon Singh: This is a good overview of the history of cryptography