June 2006 - Posts
Soma discusses the LINQ components, and the LINQ names
Read all about it here:
http://blogs.msdn.com/somasegar/archive/2006/06/21/641795.aspx
And no, I'm not changing all my previous LINQ blog items to reflect the new names.
Do it now, or do it later
Today, I’m going to discuss two important LINQ concepts: custom sequence operators and deferred vs. immediate execution.
First, let’s consider the custom sequence operator. This method computes the dot product of two vectors:
public void Linq98() {
    int[] vectorA = { 0, 2, 4, 5, 6 };
    int[] vectorB = { 1, 3, 5, 7, 8 };
    int dotProduct = vectorA.Combine(vectorB, (a, b) => a * b).Sum();
    Console.WriteLine("Dot product: {0}", dotProduct);
}
The standard LINQ deliverables do not contain a Combine method. But the LINQ libraries are completely extensible. Just build your own:
public static class CustomSequenceOperators
{
    public static IEnumerable<T> Combine<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        Func<T, T, T> func) {
        using (IEnumerator<T> e1 = first.GetEnumerator(), e2 = second.GetEnumerator()) {
            while (e1.MoveNext() && e2.MoveNext()) {
                yield return func(e1.Current, e2.Current);
            }
        }
    }
}
There’s a lot going on here, so let’s look at it carefully.
The Combine method is a generic method with one type parameter. In this example, T will be an int, so you can mentally perform that substitution if it makes it easier.
Combine takes three parameters: two IEnumerable<T>, representing the two sequences to combine, and a Func<T,T,T>, which represents the combining function. It enumerates both sequences in step and calls the function with the Nth element from each sequence, stopping as soon as either sequence runs out. The return value is the sequence containing the result of each call. To get the dot product, the first method simply sums that sequence.
The important lesson of this sample is that you can write your own extension methods (like Combine()) to provide extra capabilities that you need.
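The preview doesn't ship a pairing operator, but later releases of the framework (.NET 4 and up) added Enumerable.Zip, which plays the same role as Combine. Here is a minimal sketch of the dot product written that way, assuming one of those later frameworks. DotProductSketch and DotProduct are names made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DotProductSketch
{
    // Pairs elements by position with the built-in Zip, multiplies each pair,
    // and sums the products. Zip stops at the end of the shorter sequence.
    public static int DotProduct(IEnumerable<int> a, IEnumerable<int> b)
    {
        return a.Zip(b, (x, y) => x * y).Sum();
    }

    public static void Main()
    {
        int[] vectorA = { 0, 2, 4, 5, 6 };
        int[] vectorB = { 1, 3, 5, 7, 8 };
        // 0*1 + 2*3 + 4*5 + 5*7 + 6*8 = 109
        Console.WriteLine("Dot product: {0}", DotProduct(vectorA, vectorB));
    }
}
```

Unlike Combine above, Zip lets the two inputs and the result have three different element types, which is handy when you pair, say, names with prices.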
The next two methods demonstrate the difference between deferred execution (the default), and immediate execution (which you can request).
Look at these two methods (the only difference is the ToList() call in the second):
public void Linq99() {
    int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    int i = 0;
    var q =
        from n in numbers
        select ++i;
    foreach (var v in q) {
        Console.WriteLine("v = {0}, i = {1}", v, i);
    }
}
public void Linq100() {
    int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    int i = 0;
    var q = (
        from n in numbers
        select ++i)
        .ToList();
    foreach (var v in q) {
        Console.WriteLine("v = {0}, i = {1}", v, i);
    }
}
The first method produces this:
v = 1, i = 1
v = 2, i = 2
v = 3, i = 3
v = 4, i = 4
v = 5, i = 5
v = 6, i = 6
v = 7, i = 7
v = 8, i = 8
v = 9, i = 9
v = 10, i = 10
The second, this:
v = 1, i = 10
v = 2, i = 10
v = 3, i = 10
v = 4, i = 10
v = 5, i = 10
v = 6, i = 10
v = 7, i = 10
v = 8, i = 10
v = 9, i = 10
v = 10, i = 10
The difference is that a query is executed only when the code requests data from it (in the first method, that's the foreach loop at the bottom). Only then does the value of 'i' get incremented, one step per element as the loop pulls each result. That’s why v and i move together from 1 to 10 in the first method. In the second method, the call to ToList() creates a list that contains all the results of the query, so it executes the whole query right there. Hence, the value of i in the second method is already 10 before the loop starts, and enumerating the list does not change it.
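One way to convince yourself of this is to count how often the selector actually runs. The sketch below is illustrative only (DeferredSketch and CountSelectorCalls are invented names, and it uses the method-call form of Select rather than query syntax). It enumerates the same query twice, once deferred and once after ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Returns how many times the selector ran after enumerating the query twice.
    public static int CountSelectorCalls(bool materialize)
    {
        int[] numbers = { 5, 4, 1, 3 };
        int calls = 0;

        // Nothing runs here: Select only builds the query.
        IEnumerable<int> query = numbers.Select(n => { calls++; return n * 2; });

        if (materialize)
        {
            // ToList() runs the selector once per element, right now.
            query = query.ToList();
        }

        foreach (int unused in query) { }   // first pass
        foreach (int unused in query) { }   // second pass

        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountSelectorCalls(false)); // prints 8: the query ran on both passes
        Console.WriteLine(CountSelectorCalls(true));  // prints 4: the query ran once, inside ToList()
    }
}
```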
Another example of this same behavior can be seen in this method, which reuses a query object after changing the underlying collection. The second iteration executes the query again, producing new results. Here’s the method:
public void Linq101() {
    int[] numbers = new int[]
        { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
    var lowNumbers =
        from n in numbers
        where n <= 3
        select n;
    Console.WriteLine("First run numbers <= 3:");
    foreach (int n in lowNumbers) {
        Console.WriteLine(n);
    }
    for (int i = 0; i < 10; i++) {
        numbers[i] = -numbers[i];
    }
    Console.WriteLine("Second run numbers <= 3:");
    foreach (int n in lowNumbers) {
        Console.WriteLine(n);
    }
}
The results are:
First run numbers <= 3:
1
3
2
0
Second run numbers <= 3:
-5
-4
-1
-3
-9
-8
-6
-7
-2
0
Note how the results are based on the contents of the source collection when the query is executed, not when it's created.
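If you want the first set of answers to survive the change to the source array, take a snapshot with ToArray() (or ToList()) before modifying it. A small sketch of that idea, using the same data; SnapshotSketch and Run are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotSketch
{
    // Returns { snapshot taken before the change, live query results after the change }.
    public static int[][] Run()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };

        IEnumerable<int> live = numbers.Where(n => n <= 3); // deferred
        int[] snapshot = live.ToArray();                    // executed now: 1, 3, 2, 0

        for (int i = 0; i < numbers.Length; i++)
        {
            numbers[i] = -numbers[i];
        }

        // The live query re-reads the modified array; the snapshot does not change.
        return new[] { snapshot, live.ToArray() };
    }

    public static void Main()
    {
        int[][] results = Run();
        Console.WriteLine(string.Join(", ", results[0])); // prints "1, 3, 2, 0"
        Console.WriteLine(string.Join(", ", results[1])); // prints "-5, -4, -1, -3, -9, -8, -6, -7, -2, 0"
    }
}
```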
In the next installment we’ll look at Join methods in LINQ.
Part 1: The general Query Syntax
Part 2: The one where I discuss Object and Collection Initializers
Part 3: The one where I finish restriction operators
Part 4: Beginning to discuss projections
Part 5: Anonymous types and projections
Part 6: Discussing indexed, filtered, and compound queries
Part 7: Finishing up the projection items
Part 8: Projection operators and extension methods
Part 9: OrderBy, ThenBy, and Descending, oh my
Part 10: Grouping operators and building nested groups
Part 11: Set Operations, you bet
Part 12: Conversions: caching collections
Part 13: Where U at item, where U at?
Part 14: Boolean tests on sequences
Part 15: Aggregation operators: Sum, Product, Averages, and more
Part 16: Concatenation and EqualAll
Disagreeing with Ken Spencer, but I see his point
Ken Spencer has written that he's very dissatisfied with Boston as a Tech Ed city location.
I have to disagree. I'm enjoying Boston quite a bit. I haven't experienced the traffic problems, because I've been walking everywhere. I find that a huge benefit, because I get some fresh air.
I find it takes a toll to spend all day in the conference center, the buses, and the hotel. The half hour walk from the Boston Gardens to the Convention center really helps start the day.
I did dislike Los Angeles because the traffic was bad, and the distance to and from Staples Center to anything made walking prohibitive.
In case you care
I'll be spending time in The Learning Center if you have questions, showing up for a Meet the Authors autograph session, and spending some time at the Regional Director booth as well.
Here are the times and locations:
Monday 9:00 - 12:00: TLC Developer Discussion Area.
Monday 1:30 - 3:30: Regional Director booth.
Wednesday 9:00 - 12:00: TLC Developer Discussion Area.
Wednesday 12:00 - 3:00: TLC Info Desk.
Wednesday 3:00 - 6:45: TLC Visual Studio Demo Station #2.
Thursday 9:00 - 12:00: TLC Visual Studio Demo Station #1.
Thursday 12:30 - 1:00: Meet the Authors at the Tech Ed Bookstore.
Thursday 3:00 - 6:00: TLC Developer Discussion Area.
Friday 9:00 - 12:00: TLC Info Desk.
Concatenation and EqualAll
This next installment is a grab bag of miscellaneous methods: Concat and EqualAll.
Concat() is pretty simple: it just builds a sequence that contains all the elements of the first collection followed by all the elements of the second:
int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
int[] numbersB = { 1, 3, 5, 7, 8 };
var allNumbers = numbersA.Concat(numbersB);
Console.WriteLine("All numbers from both arrays:");
foreach (var n in allNumbers) {
    Console.WriteLine(n);
}
Or, you can build a list from different sequences:
List<Customer> customers = GetCustomerList();
List<Product> products = GetProductList();
var customerNames =
    from c in customers
    select c.CompanyName;
var productNames =
    from p in products
    select p.ProductName;
var allNames = customerNames.Concat(productNames);
Console.WriteLine("Customer and product names:");
foreach (var n in allNames) {
    Console.WriteLine(n);
}
I lied a bit above, and that leads to the next big concept I’ll discuss in LINQ: deferred execution. The variables allNames and allNumbers are not arrays, lists, or any other stored collection. Instead, both are instances of IEnumerable<T>, for some T. The original collections are not copied, but rather an iterator accesses all the elements in the first collection, followed by the second. The eventual magic method looks like this (from Sequence.cs, delivered with the LINQ preview):
static IEnumerable<T> ConcatIterator<T>(
    IEnumerable<T> first, IEnumerable<T> second) {
    foreach (T element in first) yield return element;
    foreach (T element in second) yield return element;
}
Rocket science it ain’t, but it’s very useful.
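Because Concat only wires up an iterator, anything added to a source list before you enumerate still shows up in the results. A minimal sketch of that behavior, assuming List<int> sources; ConcatSketch and Run are invented names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConcatSketch
{
    public static string Run()
    {
        var first = new List<int> { 1, 2 };
        var second = new List<int> { 3 };

        // No elements are copied here; 'all' just remembers both sources.
        IEnumerable<int> all = first.Concat(second);

        // Added after the Concat call, but before anything is enumerated.
        second.Add(4);

        // string.Join enumerates 'all', which walks first and then second.
        return string.Join(",", all);
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "1,2,3,4"
    }
}
```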
The next miscellaneous method is EqualAll(). As its name implies, it compares two different sequences and returns true if elements in the sequences are all equal. The two sequences must be the same size, and have the same elements in the same order.
var wordsA = new string[] { "cherry", "apple", "blueberry" };
var wordsB = new string[] { "cherry", "apple", "blueberry" };
bool match = wordsA.EqualAll(wordsB); // returns true
And with the same words in a different order:
var wordsA = new string[] { "cherry", "apple", "blueberry" };
var wordsB = new string[] { "apple", "blueberry", "cherry" };
bool match = wordsA.EqualAll(wordsB); // returns false
Earlier in this blog post, I gave you a hint of what will be coming next: Query Execution. LINQ provides mechanisms for you to demand immediate execution, or defer execution until you examine the results of a query.
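A note on EqualAll: in the released version of LINQ the same operator is named SequenceEqual. If you want an order-insensitive comparison instead, one option is to sort both sides first. The sketch below assumes the released System.Linq API; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceEqualSketch
{
    // Positional comparison: same length, same elements, same order.
    public static bool SameOrder(IEnumerable<string> a, IEnumerable<string> b)
    {
        return a.SequenceEqual(b);
    }

    // Order-insensitive comparison: sort both sides, then compare positionally.
    public static bool SameElementsAnyOrder(IEnumerable<string> a, IEnumerable<string> b)
    {
        return a.OrderBy(w => w, StringComparer.Ordinal)
                .SequenceEqual(b.OrderBy(w => w, StringComparer.Ordinal));
    }

    public static void Main()
    {
        string[] wordsA = { "cherry", "apple", "blueberry" };
        string[] wordsB = { "apple", "blueberry", "cherry" };

        Console.WriteLine(SameOrder(wordsA, wordsB));            // prints "False"
        Console.WriteLine(SameElementsAnyOrder(wordsA, wordsB)); // prints "True"
    }
}
```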
How can people make such bad decisions?
Here in MI (Canton, to be exact), a new IKEA store opened today.
I think that IKEA has a lot of fine merchandise, but some people have strange priorities and even stranger decision making skills.
IKEA offered a promo that the first 100 customers would get a free chair (a $79 value). Some mental midgets began lining up on Sunday night. (Yes, that's right. They lined up on Sunday for a store opening on Wednesday.) So, that means a $79 chair is worth sitting outside waiting for a store to open for Three Freakin' Days! I've got to believe that almost anyone with a job makes more than $80 in three days. Not to mention the weather (it rained here), and other assorted things that one must do without while in line (bio breaks, family, any hobbies).
Sorry, but a free chair just isn't worth it.
But wait, some people were even more incredibly stupid.
This morning, the news announced that the line had grown to over 250 overnight. Yup, that's right. Some boneheads spent the night in the rain just so they could be too late in line to get the free chair.
These people demonstrated that they can't count, have no life, and have no clue.
Of course, as a marketing ploy, it's brilliant. Anyone in southeastern Michigan that has a TV, radio, or newspaper knows IKEA is opening a store. They also know that a lot of morons are waiting in line to see it open.
As for me, I'll probably make it to IKEA in July.
When 'Open' means 'Open if you're not Microsoft'
Adobe has released the PDF document format as a standard (http://www.aiim.org/standards.asp?id=25013).
Now, Adobe has not released PDF as an 'open' format: they alone control the spec. In their own words:
"I believe the Open vs Published is more than a semantic issue. The keys to remember are:
1) PDF is not Open Source. Adobe does not release the code to Acrobat or the Adobe Reader. The format itself is maintained by Adobe solely. Open Source software is generally maintained and added to by a community.
2) The PDF specification is a published specification. That is to say, Adobe shares the full technical underpinnings of the format.
3) The PDF specification is "open" to the extent that anyone can look it and-- if they are smart enough-- create good PDF using it.”
See: http://blogs.adobe.com/acrolaw/2005/12/acrobat_and_pdf.html
But, now Microsoft has introduced a "Save As PDF" command to the next version of Office. And Adobe complains. (http://news.com.com/2100-1012_3-6079320.html)
I'll make two observations:
1. Adobe is wrong. If anyone can create PDF documents (see above) then, anyone (including Microsoft) should be able to create PDF documents.
2. This will (in the end) work against Adobe. Microsoft has announced a competing open format (called XPS). (http://www.microsoft.com/whdc/xps/viewxps.mspx) If MS cannot work in the Adobe PDF format, they will throw all their marketing weight behind XPS. That will work to PDF's disadvantage. And, in the end, XPS will be the new, open, standard, document format.
Sum, Product, Averages and more
Danger, long post coming
The next (large) set of samples are the aggregation samples. The aggregation samples perform some calculation on the results of a query and return that single result. For example, this method counts the number of distinct values in the array containing the prime factors of 300:
int[] factorsOf300 = { 2, 2, 3, 5, 5 };
int uniqueFactors = factorsOf300.Distinct().Count();
Remember from earlier that Distinct() removes any duplicates from a sequence. Count() returns the number of members in a sequence.
Count can take a predicate, so that it counts only the number of items that match a pattern:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
int oddNumbers = numbers.Count(n => n % 2 == 1);
Count can be nested so that you can create new types that contain counts for particular conditions. This query creates a sequence of customers with the number of orders for that customer:
List<Customer> customers = GetCustomerList();
var orderCounts =
from c in customers
select new {c.CustomerID, OrderCount = c.Orders.Count()};
You can create groups and use the aggregate operators on the group:
List<Product> products = GetProductList();
var categoryCounts =
from p in products
group p by p.Category into g
select new {Category = g.Key, ProductCount = g.Count()};
Of course, counting is a rather elementary skill, and we expect more from our operators. There’s Sum:
int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
double numSum = numbers.Sum();
There’s Min:
string[] words = { "cherry", "apple", "blueberry" };
int shortestWord = words.Min(w => w.Length);
There’s Max:
List<Product> products = GetProductList();
var categories =
from p in products
group p by p.Category into g
select new {Category = g.Key,
MostExpensivePrice = g.Max(p => p.UnitPrice)};
There’s Average:
var categories =
from p in products
group p by p.Category into g
select new {Category = g.Key,
AveragePrice = g.Average(p => p.UnitPrice)};
All of those operators have several different versions: for integers, shorts, doubles, floats, and decimals. So, you can apply them to any of the numeric types. Also, while I only showed one sample for each of those methods, everything you can do with Count can be done with all these methods as well.
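The grouped Count, Max, and Average queries above depend on GetProductList() from the samples. To try them without the sample data, here is a self-contained sketch that puts all three in one query. The Product class and its three rows are hypothetical stand-ins, not the real sample data, and the value-tuple result needs a newer C# compiler than the preview.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Product type used by the samples.
public sealed class Product
{
    public string Category { get; set; }
    public string ProductName { get; set; }
    public decimal UnitPrice { get; set; }
}

public static class GroupAggregateSketch
{
    // Made-up rows, just enough to form two groups.
    public static List<Product> SampleProducts()
    {
        return new List<Product>
        {
            new Product { Category = "Beverages",  ProductName = "Chai",          UnitPrice = 18m },
            new Product { Category = "Beverages",  ProductName = "Chang",         UnitPrice = 19m },
            new Product { Category = "Condiments", ProductName = "Aniseed Syrup", UnitPrice = 10m },
        };
    }

    // One result per category: how many products, the highest price, the average price.
    public static List<(string Category, int Count, decimal Max, decimal Average)> Summarize(
        IEnumerable<Product> products)
    {
        var query =
            from p in products
            group p by p.Category into g
            select (Category: g.Key,
                    Count: g.Count(),
                    Max: g.Max(p => p.UnitPrice),
                    Average: g.Average(p => p.UnitPrice));
        return query.ToList();
    }

    public static void Main()
    {
        foreach (var row in Summarize(SampleProducts()))
        {
            Console.WriteLine("{0}: count = {1}, max = {2}, average = {3}",
                row.Category, row.Count, row.Max, row.Average);
        }
    }
}
```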
Finally, there’s the Aggregate extension method.
This sample produces the running product for the sequence of numbers:
double[] doubles = { 1.7, 2.3, 1.9, 4.1, 2.9 };
double product = doubles.Aggregate(
    (runningProduct, nextFactor) => runningProduct * nextFactor);
The Aggregate extension method takes the first number in the sequence (1.7), and uses it as the first argument (runningProduct). Then, the second number (2.3) is used as the second argument (nextFactor). The result (1.7 * 2.3) is stored back into runningProduct. Each successive number in the sequence will be used as nextFactor, until the sequence has completed. The final value of runningProduct is returned as the result.
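If it helps, the steps just described can be spelled out as a plain loop. This is only a sketch of the behavior, not the library's actual source; RunningProductSketch, WithAggregate, and WithLoop are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunningProductSketch
{
    public static double WithAggregate(IEnumerable<double> values)
    {
        return values.Aggregate(
            (runningProduct, nextFactor) => runningProduct * nextFactor);
    }

    // The same steps written by hand: the first element seeds the running value,
    // and every later element is folded in with the multiplication.
    public static double WithLoop(IEnumerable<double> values)
    {
        using (IEnumerator<double> e = values.GetEnumerator())
        {
            if (!e.MoveNext())
            {
                // The unseeded Aggregate also throws on an empty sequence.
                throw new InvalidOperationException("Sequence contains no elements");
            }

            double runningProduct = e.Current;
            while (e.MoveNext())
            {
                runningProduct = runningProduct * e.Current;
            }
            return runningProduct;
        }
    }

    public static void Main()
    {
        double[] doubles = { 1.7, 2.3, 1.9, 4.1, 2.9 };
        // Both lines print the same value, roughly 88.33081.
        Console.WriteLine(WithAggregate(doubles));
        Console.WriteLine(WithLoop(doubles));
    }
}
```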
Sometimes, you want to seed the aggregation with a starting value. In that case, you use the other version of Aggregate() that takes a starting value:
double startBalance = 100.0;
int[] attemptedWithdrawals = { 20, 10, 40, 50, 10, 70, 30 };
double endBalance =
attemptedWithdrawals.Aggregate(startBalance,
(balance, nextWithdrawal) =>
( (nextWithdrawal <= balance) ? (balance - nextWithdrawal) : balance ) );
The operation is the same (in principle), but the initial value is seeded using the startBalance (100). Each successive withdrawal is made only if the account has sufficient funds; otherwise the balance is left unchanged.
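To see which withdrawals go through, it helps to record the balance after every attempt. A minimal sketch using the same numbers; WithdrawalSketch, EndBalance, and BalanceHistory are names made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WithdrawalSketch
{
    // The seeded Aggregate from the sample above.
    public static double EndBalance(double startBalance, IEnumerable<int> attempts)
    {
        return attempts.Aggregate(startBalance,
            (balance, nextWithdrawal) =>
                nextWithdrawal <= balance ? balance - nextWithdrawal : balance);
    }

    // The same rule as a loop, keeping the balance after each attempt.
    public static List<double> BalanceHistory(double startBalance, IEnumerable<int> attempts)
    {
        var history = new List<double>();
        double balance = startBalance;
        foreach (int nextWithdrawal in attempts)
        {
            if (nextWithdrawal <= balance)
            {
                balance -= nextWithdrawal;
            }
            history.Add(balance);
        }
        return history;
    }

    public static void Main()
    {
        int[] attemptedWithdrawals = { 20, 10, 40, 50, 10, 70, 30 };

        // The 50, 70, and 30 withdrawals are refused because the balance is too low.
        Console.WriteLine(string.Join(", ", BalanceHistory(100.0, attemptedWithdrawals)));
        // prints "80, 70, 30, 30, 20, 20, 20"

        Console.WriteLine(EndBalance(100.0, attemptedWithdrawals)); // prints "20"
    }
}
```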
The ‘glue’ that holds all the aggregation operators together is that they return a single value based on an examination or operation involving every element in a sequence.