Bill Blogs in C#

Bill Wagner discusses C#, LINQ, and other items of interest

September 2006 - Posts

Joining Dianne in the Echo Chamber
Dianne Marsh wrote a great post on a number of upcoming user group / technology presentations in Ann Arbor next week. 

Dianne Marsh's announcement of our local activities
I really don't have anything to add
Posted by wwagner | with no comments
It's saner, more compact, and more clear

When we last left our heroes, we looked at Joins.  It’s been a while since I’ve posted some serious technical content. There’s been a lot going on internally in our company, and I’ve finally gotten the time to put some stuff together.

Also, the LINQ to SQL demos, while interesting, are not as instructive once you know the syntax from the core LINQ samples. The key point of LINQ to SQL is the translation from C# (or VB.NET syntax) into SQL. Once you’ve seen that, examining query after query is less instructive.

But, there are still things to learn.

In this post, I’m going to discuss some of the more interesting OrderBy and Group By methods.  They show a few bits of LINQ syntax that we haven’t seen before.

var categories =
    from p in db.Products
    group p by p.CategoryID into g
    orderby g.Key
    select new {
        g.Key,
        MostExpensiveProducts =
            from p2 in g
            where p2.UnitPrice == g.Max(p3 => p3.UnitPrice)
            select p2
    };

This builds a data structure that shows you the most expensive product (or products) in each category.  Let’s look at it line by line again:

from p in db.Products // list all the products (as p).
group p by p.CategoryID into g // each ‘g’ is a group of products containing the same product ID.
orderby g.Key // In increasing Key order.
select new { // Let’s build a new dictionary type.
g.Key, // containing the category key.
MostExpensiveProducts = // and the most expensive products.
from p2 in g // which are a list of the products in this group
where p2.UnitPrice == g.Max(p3 => p3.UnitPrice)
// where the Unit price is the maximum price in the category.
select p2 // select the entire product record.

Or, for the SQL weenie out there, you could write this:

SELECT [t1].[CategoryID], (
    SELECT COUNT(*)
    FROM [Products] AS [t2]
    WHERE ([t2].[UnitPrice] = [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
    ) AS [MostExpensiveProducts]
FROM (
    SELECT MAX([t0].[UnitPrice]) AS [value],
[t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
ORDER BY [t1].[CategoryID]
SELECT [t2].[ProductID], [t2].[ProductName], 
[t2].[SupplierID], [t2].[CategoryID],
[t2].[QuantityPerUnit], [t2].[UnitPrice],
[t2].[UnitsInStock], [t2].[UnitsOnOrder],
[t2].[ReorderLevel], [t2].[Discontinued]
FROM (
    SELECT MAX([t0].[UnitPrice]) AS [value],
[t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
CROSS JOIN [Products] AS [t2]
WHERE ([t2].[UnitPrice] = [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
ORDER BY [t1].[CategoryID], [t2].[ProductID]

Every time I see this, I’m more and more impressed by the power of LINQ 2 SQL, and the new C# language syntax.

Moving on, there are a few items in the GroupBy and Having section that are worth covering.

This little ditty shows how LINQ 2 SQL translates a C# standard query operator into T-SQL:

var q =
    from p in db.Products
    group p by p.CategoryID into g
    where g.Count() >= 10
    select new {
        g.Key,
        ProductCount = g.Count()
    };

The where clause (where g.Count() >= 10) finds all  product categories that have more than 10 products, and returns a pair structure that contains the key and the product count.

You can create a new anonymous type to hold the key, and create keys that are a composition of multiple columns:

var categories =
    from p in db.Products
    group p by new { p.CategoryID, p.SupplierID } into g
    select new {g.Key, g};

Let’s look at this one line by line again.

from p in db.Products // iterate all the products.
group p by new { p.CategoryID, p.SupplierID } into g // Group them by category ID and Supplier ID pair.
select new {g.Key, g}; // Create a new type.
// g.Key contains { CategoryID, SupplierID }, g contains the full list of matching product records.

Finally, you can create groups by performing computations on the columns. This sample creates a single group that contains all the products where the unit price is greater than 10:

var categories =
    from p in db.Products
    group p by new { Criterion = p.UnitPrice > 10 } into g
    select g;

I’m amazed at the T-SQL this generates:

SELECT [t2].[value] AS [Criterion], (
    SELECT COUNT(*)
    FROM [Products] AS [t3]
    WHERE [t2].[value] = (
        (CASE
            WHEN [t3].[UnitPrice] > @p1 THEN 1
             ELSE 0
         END))
    ) AS [Group]
FROM (
    SELECT [t1].[value]
    FROM (
        SELECT
            (CASE
                WHEN [t0].[UnitPrice] > @p0 THEN 1
                 ELSE 0
             END) AS [value]
        FROM [Products] AS [t0]
        ) AS [t1]
    GROUP BY [t1].[value]
    ) AS [t2]
ORDER BY [t2].[value]
SELECT [t3].[ProductID], [t3].[ProductName], 
[t3].[SupplierID], [t3].[CategoryID],
[t3].[QuantityPerUnit], [t3].[UnitPrice],
[t3].[UnitsInStock], [t3].[UnitsOnOrder],
[t3].[ReorderLevel], [t3].[Discontinued]
FROM (
    SELECT [t1].[value]
    FROM (
        SELECT
            (CASE
                WHEN [t0].[UnitPrice] > @p2 THEN 1
                 ELSE 0
             END) AS [value]
        FROM [Products] AS [t0]
        ) AS [t1]
    GROUP BY [t1].[value]
    ) AS [t2]
CROSS JOIN [Products] AS [t3]
WHERE [t2].[value] = (
    (CASE
        WHEN [t3].[UnitPrice] > @p3 THEN 1
         ELSE 0
     END))
ORDER BY [t2].[value], [t3].[ProductID]

Yup, I know I said I wasn’t going to show all the SQL, but you have to admit it’s an impressive display of the productivity in LINQ 2 SQL.  I don’t know about you, but I’m sure I can get more done by using the new C# syntax that writing all that SQL.



Posted by wwagner | with no comments
Filed under: ,
Following Bruce Eckel's advice on interview questions

Bruce Eckel posted on "The questions you forget to ask when you are interviewing for a job, but wish you'd asked after taking the job".

It's an apropos time for me to answer them, because we (SRT Solutions) are looking for an architect, a senior developer, and a couple junior developers.

Note on the Junior Developers:  When we bring in junior developers, we are looking to evaluate potential.  We're looking for people that can grow to senior developers, and eventually architects.

Specifically, the projects that are starting are ASP.NET 2.0 projects that will be using the AJAX / Atlas framework. If you're interested, please contact us at info@srtsolutions.com

On to Bruce's Questions:

  1. If I want to buy something like a book or a tool, how does the process work (how hard is it?). What's the cost limit before the approval must go up the management chain?
    Well, we have a short management chain, so it doesn't really apply. The simplest answer is that if it (the book, tool, whatever) will make you more productive, get it.
  2. What's the noise level like during the day?
    See question 5. It's up to you.
  3. How many meetings am I expected to attend, and how long do they usually last?
    On average, once a week, and hopefully less than an hour. Design reviews for larger projects are probably longer and likely more frequent early in a project, but less frequent and shorter later in a project.
  4. Is there a dress code?
    Dress so you don't embarrass yourself.
  5. Can I work from home sometimes?
    Yes.
  6. Does it matter when I work, as long as I come to meetings?
    Sometimes.  On some projects, our software drives rather expensive (and large) hardware. Occasionally, you'll need to go to the large and expensive hardware. I suppose that's similar to a meeting, but it's different as well.
  7. How many projects have succeeded/failed in the last five years? To what do you attribute the failures?
    Great question. 12 successes, 2 failures. One of the failures was a project that was already failing when we arrived. It was still failing after we left. The second was a rather typical story:  It was an ill-defined project, and what we built was not what the customer expected. Every (rather frequent) checkpoint resulted in a new task list, some of which undid the previous work.

So, if we sound like fun, you have the software engineering talent, and you want to work in South-Eastern MI (near Ann Arbor), contact us.



Posted by wwagner | with no comments
Will the madness ever end?

I received some excellent commentary on an old discussion of GetHashCode recently (links below). It points out that sometimes you can get so caught up in a discussion that you forget to restate all your assumptions as you continue the discussion. 

So, let's go back and uncover our assumptions and restate the proper recommendations, based on correctness first and performance second. I'm only going to cover reference types much here, because that's all Rachel mentioned in her comment. For value types, see Item 10 in Effective C#.

Let's start with correctness.  Remember that GetHashCode(), and Equals() are related:  If two objects are equal, as defined by Equals(), they must generate the same has code. (Note that objects that are not equal may generated the same hash code.) 

This paragraph applies only if you do not override Equals() in your type.  In that case, the default GetHashCode() works correctly for reference types. If you do override Equals() and change its behavior, you've likely broken the default GetHashCode(), and you need to override it to provide a correct definition.

But what about performance?  Can you do a better job than the default?  The answer is a resounding "Maybe".  Rachel says, "And since it’s [System.Object.GetHashCode()] correct (with the one caveat about overloading ==), it’s likely to get used a lot. If I were writing one of those classes, I’d feel obligated to ensure that default implementation didn’t result in poor performance."

That's very true.  Knowing nothing about the distribution of your hash code keys, a random distribution is the best you can do.  And, that's pretty good. 

The bottom line is that I've never defined my own GetHashCode() method because I thought I could get better performance for reference types.  It's not very likely.

However, in my experience, it's more common to use a value type as the key for a hash code. In that case, you need to override GetHashCode() in order to make it correct.  In addition, if my keys are reference types, that type almost always uses value semantics for Equals(). In those cases, you must override GetHashCode() to implement correct behavior. When you override GetHashCode() to provide correct behavior, it's important to write a version that performs well.  Or, at least as well as the default implementation.



My earlier GetHashCode post
answering another reader question (click Comments for Rachel's response)
Posted by wwagner | with no comments
Filed under:
Yes, they do work together

I downloaded and installed .NET 3.0 RC 1, the .NET 3.0 and the VS 2005 extentions for WWF, and finally the Orcas CTP Tools for the .NET Framework 3.0 RC1.

Well,  the Orcas CTP for .NET 3.0 does not include the LINQ bits (C# 3.0 / VB 9.0 compilers, System.Query namespace, etc.)

But, fear not, oh bleeding edge developers.  You can install the May LINQ CTP on top of .NET 3.0 bits and have both the .NET 3.0 functionality AND C# 3.0 goodness.

Note:  The May CTP is pre-release software, and the usual disclaimers apply. (I'm running this configuration in a Virtual PC for now. I've had no issues, but I'm not comfortable putting this on a production machine, yet.)

I've listed links to all the downloads below, in the order you need to install them.



Microsoft Pre-Release Software Microsoft .NET Framework 3.0 - Release Candidate
.NET 3.0 runtime components
Microsoft® Windows® Software Development Kit (Web) for RC 1 of Windows Vista and .NET Framework 3.0 Runtime Components
The .NET 3.0 SDK
Microsoft® Visual Studio® 2005 Extensions for Windows® Workflow Foundation Release Candidate 5
Windows Workflow Foundation
Microsoft Visual Studio Code Name “Orcas” Community Technology Preview – Development Tools for .NET Framework 3.0
VS 2005 refresh for .NET 3.0
Microsoft Visual Studio Code Name “Orcas” Language-Integrated Query, May 2006 Community Technology Preview
The LINQ bits

Posted by wwagner | with no comments
Filed under: ,
To give us everything we need to know about Click Once

Brian Noyes is coming to Detroit and Lansing to speak at the Great Lakes .NET User Group, and the Greater Lansing User Group.

It's great timing, because his Click Once book is ready to go

I've known Brian for several years, and I've always had a lot of respect for him professionally, and and as person.  We started working together when we both wrote for Visual C++ Developer's Journal (which has since be folded into Visual Studio Magazine). Back then, Brian and I tech-edited each others' work, until it reached the managing editor.  At that time, the managing editor was this guy. (which may explain a lot).

Brian was also one of the valuable editors for me when I wrote Effective C#.  The book would not be what it is today without this input.  I'm honored to have him come to Michigan, and I'd encourage you to come to one of his talks.  You'll learn a lot, and you won't be disappointed.



Posted by wwagner | with no comments
Sometimes you need join, sometimes you can use where

This blog post shows the way Linq2Sql translates language elements that look like Join statements. (I chose that way of phrasing it because not all queries result in the JOIN command in SQL.)  So, once again, I’m going to skimp on the output, instead concentrate on the code and the SQL query.

There is one other important key concept here.  Some of the queries use ‘where’ clauses. And some use ‘join … in … on … equals .. into …” clauses.  The difference is simple. You can use ‘where’ when your db to object mapping classes have defined the entity relationships.  You must use join when you create queries that don’t have that built in relationship.

So, the first Join sample finds all the orders for customers in a given city (London).

var q =
    from c in db.Customers
    from o in c.Orders
    where c.City == "London"
    select o;

That query uses the key navigation to find customers in the target city, and then retrieve all their orders.  Here’s the query:

SELECT [t1].[OrderID], [t1].[CustomerID], 
[t1].[EmployeeID], [t1].[OrderDate],
[t1].[RequiredDate], [t1].[ShippedDate],
[t1].[ShipVia], [t1].[Freight], [t1].[ShipName],
[t1].[ShipAddress], [t1].[ShipCity],
[t1].[ShipRegion], [t1].[ShipPostalCode],
[t1].[ShipCountry]
FROM [Customers] AS [t0], [Orders] AS [t1]
WHERE ([t0].[City] = @p0) AND ([t1].[CustomerID] = [t0].[CustomerID])

Which produces the following text:

OrderID=10289   CustomerID=BSBEV        
EmployeeID=7    OrderDate=8/26/1996    
RequiredDate=9/23/1996  ShippedDate=8/28/1996  
ShipVia=3       Freight=22.7700         ShipName=B's Beverages 
ShipAddress=Fauntleroy Circus   ShipCity=London        
ShipRegion=null         ShipPostalCode=EC2 5NT 
ShipCountry=UK  OrderDetails=...        Customer={ }    Employee={ }    Shipper={ }

Well, obviously, if you can look at one field, you can look at multiple fields.  This query looks at the suppliers table and the products table, matching on the supplier ID.  The goal is to find all products from suppliers in the US, where the product is in stock.

var q =
    from p in db.Products
    where p.Supplier.Country == "USA" && p.UnitsInStock == 0
    select p;

Producing this SQL:

SELECT [t0].[ProductID], [t0].[ProductName], 
[t0].[SupplierID], [t0].[CategoryID],
[t0].[QuantityPerUnit], [t0].[UnitPrice],
[t0].[UnitsInStock], [t0].[UnitsOnOrder],
[t0].[ReorderLevel], [t0].[Discontinued]
FROM [Products] AS [t0]
LEFT OUTER JOIN [Suppliers] AS [t1]
ON [t1].[SupplierID] = [t0].[SupplierID]
WHERE ([t1].[Country] = @p0) AND ([t0].[UnitsInStock] = @p1)

And this output:

ProductID=5     ProductName=Chef Anton's Gumbo Mix      SupplierID=2    
CategoryID=2    QuantityPerUnit=36 boxes       
UnitPrice=21.3500       UnitsInStock=0  UnitsOnOrder=0 
ReorderLevel=0  Discontinued=True       OrderDetails=...       
Category={ }    Supplier={ }

This is a similar variation, but the where clause matches multiple records (multiple employees are from Seattle), and an employee may have multiple territories.

var q =
    from e in db.Employees
    from et in e.EmployeeTerritories
    where e.City == "Seattle"
    select new {e.FirstName, e.LastName, et.Territory.TerritoryDescription};

And LINQ turns that C# into the following SQL:

SELECT [t0].[FirstName], [t0].[LastName], [t3].[TerritoryDescription]
FROM [Employees] AS [t0]
CROSS APPLY ((
        SELECT [t1].[TerritoryID]
        FROM [EmployeeTerritories] AS [t1]
        WHERE ([t0].[City] = @p0) AND ([t1].[EmployeeID] = [t0].[EmployeeID])
        ) AS [t2]
    LEFT OUTER JOIN [Territories] AS [t3] ON [t3].[TerritoryID] = [t2].[TerritoryID])

And the output is:

FirstName=Nancy         LastName=Davolio        TerritoryDescription=Wilton
FirstName=Nancy         LastName=Davolio        TerritoryDescription=Neward
FirstName=Laura         LastName=Callahan       TerritoryDescription=Philadelphia
FirstName=Laura         LastName=Callahan       TerritoryDescription=Beachwood
FirstName=Laura         LastName=Callahan       TerritoryDescription=Findlay
FirstName=Laura         LastName=Callahan       TerritoryDescription=Racine

This next Join query finds pairs of employees where the second employee reports to the first employee, and both are from the same city.

var q =
    from e1 in db.Employees
    from e2 in e1.Employees
    where e1.City == e2.City
    select new {
        FirstName1 = e1.FirstName, LastName1 = e1.LastName,
        FirstName2 = e2.FirstName, LastName2 = e2.LastName,
        e1.City
    };

For those happier with SQL than C# (strange folk you may be)

SELECT [t0].[FirstName] AS [FirstName1], [t0].[LastName] AS [LastName1], 
[t1].[FirstName] AS [FirstName2], [t1].[LastName] AS [LastName2],
[t0].[City]
FROM [Employees] AS [t0], [Employees] AS [t1]
WHERE ([t0].[City] = [t1].[City]) AND
([t1].[ReportsTo] = [t0].[EmployeeID])

And, the output:

FirstName1=Steven       LastName1=Buchanan      FirstName2=Michael      
LastName2=Suyama        City=London
FirstName1=Steven       LastName1=Buchanan      FirstName2=Robert      
LastName2=King  City=London
FirstName1=Steven       LastName1=Buchanan      FirstName2=Anne        
LastName2=Dodsworth     City=London

This next query executes an explicit join, and prints the number of orders from each customer:

var q =
    from c in db.Customers
    join o in db.Orders on c.CustomerID equals o.CustomerID into orders
    select new {c.ContactName, OrderCount = orders.Count()};

The SQL is:

SELECT [t0].[ContactName], (
    SELECT COUNT(*)
    FROM [Orders] AS [t1]
    WHERE [t0].[CustomerID] = [t1].[CustomerID]
    ) AS [OrderCount]
FROM [Customers] AS [t0]

Note that the results set only contains those columns you requested:

ContactName=Maria Anders        OrderCount=6
ContactName=Ana Trujillo        OrderCount=4
Etc.

And, if you can join two tables, you can join three (or more):

var q =
    from c in db.Customers
    join o in db.Orders on c.CustomerID equals o.CustomerID into ords
    join e in db.Employees on c.City equals e.City into emps
    select new {c.ContactName, ords=ords.Count(), emps=emps.Count()};

That code joins the customers, orders, and employees table in order to produce the customer’s contact name, the number of orders from that customer, and the number of employees in the same city as the customer. Which looks like this in SQL:

SELECT [t0].[ContactName], (
    SELECT COUNT(*)
    FROM [Orders] AS [t1]
    WHERE [t0].[CustomerID] = [t1].[CustomerID]
    ) AS [ords], (
    SELECT COUNT(*)
    FROM [Employees] AS [t2]
    WHERE [t0].[City] = [t2].[City]
    ) AS [emps]
FROM [Customers] AS [t0]

And these results:

ContactName=Maria Anders        ords=6  emps=0
ContactName=Ana Trujillo        ords=4  emps=0
Etc.

Now, it gets a bit more complicated.  This next query enumerates all the employees, joined to all that employee’s orders. If there are any orders, the result contains the employee’s first and last name, and the order. If there are no orders, you have the employee’s first and last name, concatenated with an empty order.

var q =
    from e in db.Employees
    join o in db.Orders on e equals o.Employee into ords
    from o in ords.DefaultIfEmpty()
    select new {e.FirstName, e.LastName, Order = o};

The SQL is getting more complicated, and here it is:

SELECT [t0].[FirstName], [t0].[LastName], 
[t2].[test], [t2].[OrderID], [t2].[CustomerID],
[t2].[EmployeeID], [t2].[OrderDate], [t2].[RequiredDate],
[t2].[ShippedDate], [t2].[ShipVia], [t2].[Freight],
[t2].[ShipName], [t2].[ShipAddress], [t2].[ShipCity],
[t2].[ShipRegion], [t2].[ShipPostalCode], [t2].[ShipCountry]
FROM [Employees] AS [t0]
LEFT OUTER JOIN (
    SELECT 1 AS [test], [t1].[OrderID], [t1].[CustomerID],
[t1].[EmployeeID], [t1].[OrderDate], [t1].[RequiredDate],
[t1].[ShippedDate], [t1].[ShipVia], [t1].[Freight],
[t1].[ShipName], [t1].[ShipAddress], [t1].[ShipCity],
[t1].[ShipRegion], [t1].[ShipPostalCode], [t1].[ShipCountry]
    FROM [Orders] AS [t1]
    ) AS [t2] ON [t0].[EmployeeID] = [t2].[EmployeeID]

I’m eliding the results, because it’s really long and boring. So, let’s go on to the next query, which shows how you can create new columns in the result set using some operation on existing columns:

var q = 
    from c in db.Customers
    join o in db.Orders on c.CustomerID
equals o.CustomerID into ords
    let z = c.City + c.Country
    from o in ords                 
    select new {c.ContactName, o.OrderID, z};

Which produces this slightly simpler SQL:

SELECT [t1].[ContactName], [t2].[OrderID], [t1].[value] AS [z]
FROM (
    SELECT [t0].[City] + [t0].[Country] AS [value],
[t0].[CustomerID], [t0].[ContactName]
    FROM [Customers] AS [t0]
    ) AS [t1]
CROSS JOIN [Orders] AS [t2]
WHERE [t1].[CustomerID] = [t2].[CustomerID]

And, if you need to join tables by comparing more than one column, it’s easy.  You create compatible anonymous types and compare them:

var q =
    from o in db.Orders
    from p in db.Products
    join d in db.OrderDetails
        on new {o.OrderID, p.ProductID} equals new {d.OrderID, d.ProductID}
        into details
    from d in details
    select new {o.OrderID, p.ProductID, d.UnitPrice};

Which expands into this SQL:

SELECT [t0].[OrderID], [t1].[ProductID], [t2].[UnitPrice]
FROM [Orders] AS [t0], [Products] AS [t1], [Order Details] AS [t2]
WHERE ([t0].[OrderID] = [t2].[OrderID]) AND ([t1].[ProductID] = [t2].[ProductID])

Finally, there’s one wrinkle on the JOIN related methods.  The following query shows how you need to modify the C# query when one of the columns (Orders.EmployeeID) may be null.  You need to cast the other side of the comparison to an appropriate nullable type. 

var q =
    from o in db.Orders
    join e in db.Employees
        on o.EmployeeID equals (int?)e.EmployeeID into emps
    from e in emps
    select new {o.OrderID, e.FirstName};

Which becomes:

SELECT [t0].[OrderID], [t1].[FirstName]
FROM [Orders] AS [t0], [Employees] AS [t1]
WHERE [t0].[EmployeeID] = [t1].[EmployeeID]

Join commands in LinqToSql exist so you can perform queries and projections outside of your original data relationships. And, because the Linq2Sql libraries write the T-SQL for you, you don't have to break apart everything, and create new queries and stored procs.  Cool.

Next up:  Ordering and Grouping.



Posted by wwagner | with no comments
Filed under: ,
I didn't want to write this post, but I will

Today was probably a tough day for everyone.  And, in simple activities I found myself reminded of the events of 5 years ago, no matter how much I tried to avoid it. For example, one of my habits is to pay bills on Monday morning. Normally, I don't even think as I write the checks. I found myself pausing as I wrote the date over and over. I miss not finding a date that significant.

I picked up my son from school today, because it was raining. He told me how his class watched news reports from 5 years ago. Childhood shouldn't involve planes flying into buildings.

So, I leave you with the link below, and this thought:

I want our society and culture to be remembered for what we built rather than what we destroyed.



Building the World Trade Center
Footage from the PBS Documentary

Posted by wwagner | with no comments
And the Ann Arbor .NET Developers Group will be there

Ok, this is late notice, but tonight the IT Zone / Spark is hosting the All Organization Networking / Open House.

The purpose is to showcase all the professional organizations in the Ann Arbor area. It will include business groups, and technical groups.

The Ann Arbor .NET Developers Group meets at Spark Central on the 2nd Wednesday of each month.



Special Event: All Organization Networking/Open House
The official invitation
The Ann Arbor .NET Developers Group
Our home page, and announcements
Posted by wwagner | with no comments
but worth it

After Scott's post and my response on Exception handling, another friend pointed to this post by Joel Spolsky (http://www.joelonsoftware.com/items/2003/10/13.html

Excerpt: "I consider exceptions to be no better than 'gotos'. ... In fact they are significantly worse than goto's".

He echoes many of the earlier concerns voiced by Tom Cargill in 1994 ("Exception Handling: A False Sense of Security" (http://www.awprofessional.com/content/images/020163371x/supplements/Exception_Handling_Article.html)

Excerpt:  "The really hard part of using exceptions is to write all the intervening code in such a way that an arbitrary exception can propagate from its throw site to its handler, arriving safely and without damaging other parts of the program along the way."


The problem with both these positions is that they highlight concerns that just don't apply as much.

Let's start with Tom Cargill's article.  Herb Sutter showed how to solve this problem in Exceptional C++ (Items 8 through 17). (I'm condensing to the point of error in the next paragraph, so please read Herb's book).  Basically, by enforcing the weak exception guarantee, you can write reasonably exception safe code. The Weak Exception Guarantee is that if an exception is thrown, the program is in a stable state, and won't leak resources.  The strong guarantee is harder, but better. The strong guarantee says that a method either completes successfully OR it does not change program state and exits by throwing an exception (or allowing an exception to propagate through it).  Read that again and understand the error implications carefully: If an exception exits from a method, the program state is unchanged.  It's like the method call never happened.

That's not easy, even in a .NET world.  (The weak guarantee is a simple matter of handling IDisposable correctly.)  But, the strong exception guarantee is a worthy goal. It makes it much easier for client code to handle exceptions, and it makes it much easier for all the intervening code to be robust. 

Next, let's look at Joel's objections:

1. Exceptions are invisible in source code, so it's hard to find even with careful code inspection.  True, but I'll add the unchecked return codes aren't much easier.  And, unchecked VALUES for error codes are almost impossible:  The code checked return values 0, 1, 2, 3.  But, can the method return 4,5,6 or other codes?  And, exceptions as an error mechanism makes it easier to see the core algorithms:  I can switch my mind from looking for the core logic, to looking for exception handling practices.  Two passes through the code, and I'm feelling confident.

2. They create too many possible exit points.  That's true.  The important follow up question:  Why does that matter?  Too many exit points matters if you don't enforce either the strong or the weak exception guarantee.  If the only place you clean up resources is at the bottom of a function, in the success path of the function, you've got problems.  And, the strong exception guarantee matters too:  If your function makes four or five transformations on program state, and the third way fails, you need to reset everything.  Depending on the application, this may or may not be difficult:  You might be able to use the copy and swap pattern advocated in Sutter's book. You might be able to lean on database transactions.  Or, in the case of non-mutating methods, you don't have any problems at all.

I'm not suggesting that writing exception safe code is easy.  But, I am saying that using error codes instead does not really make it any easier.  Your best plan is to understand the weak and strong exception guarantees, and try to enforce them throughout your code.



Posted by wwagner | with no comments
Filed under:
Not so much a code as what you might call 'guidelines'

Recently, one of the readers of Effective C# asked me some questions about Properties vs. Get and Set methods.  It turned into a rather interesting conversation, so I'm posting it all here:

The original Question:

I'm writing because of a discussion we've been having about the first item in your book. As you know, that item says to "Always use properties instead of accessible data members."

Our main issue is that the book doesn't explain the value of properties over the traditional set/get method approach. In fact, some of the folks on the team are arguing that properties are just syntactic sugar that have more negatives than positives. One example given is that if a property has both set and get implemented then the property could accidentally be assigned when the intention was comparison
        obj.serverDown = false
versus the desired
        obj.serverDown == false
Whereas, with the traditional set/get method, the above bug isn't possible
        ! obj.IsServerDown()

I'm curious what your opinion/rationale is on the topic? Also, do you know of any other
references (online or in print) that discuss this issue?

My first answer:

There are a few reasons why I believe Properties are beneficial vs. the traditional get and set methods used in earlier OO languages (such as C++). 

Consider SOA and a document centric API:

Properties are naturally serialized using the XML serializer. Therefore, a list of employees defined using properties becomes a simple XML document for wire transfer. (I’m assuming read/write properties for Name, Salary, etc).  You can’t easily use the XML serialize for a similar type implement using get / set methods. This is a necessity to build SOA applications where the client will request a document from the server.  The natural document is the set of public properties for a data type. (The same argument can be made for the client code, where a command document is a type with a set of public properties).

Databinding:

This one may be cheating, because one could argue that MS could implement databinding using get / set methods.  But they didn’t, so properties are the clear winner here. And, in fact, if MS did use a get / set metaphor, it would be similar to Java beans, where get_ and set_ denoted a property.  Personally, I dislike the idea that a methods name can give it magical behavior.

Clarity:

Despite the text in the article you reference below, I think there is merit to the property syntax, when used correctly.  It clearly shows that a type contains certain data elements (at least logically, even if not physically).  Point.X looks cleaner to me than Point.get_X(), or Point.set_X(256).  The syntax of the language expresses the design intent more clearly.  That’s certainly subjective, but that’s my opinion.

To address your concern below, there are two answers.  First, it’s much less likely to make the mistake of if (obj.ServerDown = false) because it can only be done with boolean types (not any arbitrary integral type as in C or C++). Secondly, the habit of putting the constant on the left side fixes that as well: if (false == obj.ServerDown).

On the question of hiding inefficiencies behind properties:

Let's consider this code:

public void Calc(int[] foo)
{
  Sum = 0;
  for (int i = 0; i < foo.Length; i++)
    Sum += foo[i];
}

It is indeed possible that the Length property (or the indexer) cause serious inefficiencies.  (Look at the code again, and imagine that Sum is a property with Get and Set accessors.) But, let's not get carried away. In most cases, the JIT compiler will inline simple properties, and there is no inefficiency at all. Of course, you may consider caching the temp variable anyway, and there may be some savings, if the setter has extensive validation.

Syntactically, you could hide anything inside a property's get or set methods.  You could even include database access to retrieve (or set) the value. But, just because it's syntactically correct doesn't mean you should do it.  I’d say it’s a bad practice to hide a database access behind a property for several reasons.  Would you ever make a database call to retrieve one scalar value?  (That’s what such a getter would look like.) Wouldn’t it be far more likely that you would design your application to pull some reasonable set of data from the database and have it locally available?  My point is that I wouldn’t believe GetCommission() makes a database call any more than a SalesPerson.Commission property. And, if the app really does use some form of lazy evaluation, you’re paying the performance hit regardless of the property or method syntax, so it really doesn’t matter.

So, if the argument is that properties are ‘hiding’ performance issues, I don’t think that’s often true.  Most developers would think getSomething() and setSomething() would contain the same code as a Something property. The JIT compiler knows a property is a get / set method pair and can perform the same optimizations as a get / set method pair. So, you really don’t lose anything.  In some sense, properties are syntactic sugar (there’s no new amazing functionality here), but the XML Document serialization and databinding are the two practical reasons to prefer them over get / set methods.  Your colleagues have found the one negative (= vs. ==), which has been around since the dawn of C, so we’ve probably got the habits to handle that one.

After that initial dialogue, the reader and I discussed some guidelines for deciding between properties and methods.  Because, there are performance pitfalls hiding behind properties and indexers, if misused.

Here's what we came up with:

  • Use a Property when all these are true:
    The getters should be simple and thus unlikely to throw exceptions.  Note that this implies no network (or database) access. Either might fail, and therefore would throw an exception.
  • They should not have dependencies on each other.  Note that this would include setting one property and having it affect another.  (For example, setting the FirstName property would affect a read-only FullName property that composed the first name + last name properties implies such a dependency )
  • They should be settable in any order
    The getter does not have an observable side effect Note this guideline doesn't preclude some forms of lazy evaluation in a property.
  • The method must always return immediately. (Note that this precludes a property that makes a database access call, web service call, or other similar operation).
  • Use a method if the member returns an array.
  • Repeated calls to the getter (without intervening code) should return the same value. 
  • Repeated calls to the setter (with the same value) should yield no difference from a single call.
  • The get should not return a reference to internal data structures (See item 23). A method could return a deep copy, and could avoid this issue.



Posted by wwagner | with no comments
Filed under: ,
Avg, Sum, Min, Max, and T-SQL

This post will cover the Count/Sum/Min/Max/Avg category for DLinq. I’m changing the format because what’s interesting in DLinq isn’t the C# code, it’s how the DLinq library translates the query operators (and your lambda BLOCKED EXPRESSION into SQL.  So, I’m going to concentrate less on the code and its output, and more on what SQL gets generated from your queries.  Unfortunately, I don’t know the exact mechanism LinqToSql uses to translate an expression tree into T-SQL.  But, seeing the inputs and the outputs are instructive.

A simple query on the count from a table:

var q = db.Customers.Count();

Generates this SQL:

SELECT COUNT(*) AS [value]
FROM [Customers] AS [t0]

A Count with a condition:

var q = db.Products.Count(p => !p.Discontinued);

Generates this SQL:

SELECT COUNT(*) AS [value]
FROM [Products] AS [t0]
WHERE NOT ([t0].[Discontinued] = 1)

You can sum a single field:

var q = db.Orders.Select(o => o.Freight).Sum();

Which generates a reasonably simple SQL script:

SELECT SUM([t0].[Freight]) AS [value]
FROM [Orders] AS [t0]

You can write a sum query slightly differently:

var q = db.Products.Sum(p => p.UnitsOnOrder);

Which generates almost the same query:

SELECT SUM([t0].[UnitsOnOrder]) AS [value]
FROM [Products] AS [t0]

Min can be written two different ways as well:

var q = db.Products.Select(p => p.UnitPrice).Min();
var q = db.Orders.Min(o => o.Freight);

Which generates these two (very similar) SQL scripts:

SELECT MIN([t0].[UnitPrice]) AS [value]
FROM [Products] AS [t0]
SELECT MIN([t0].[Freight]) AS [value]
FROM [Orders] AS [t0]

Sure, that stuff was simple.  I could write the SQL procedures as easily as I could write the C# code.  Now, it’s going to get a little more complex.

Look at this query:

var categories =
    from p in db.Products
    group p by p.CategoryID into g
    select new {
        CategoryID = g.Key,
        CheapestProducts =
            from p2 in g
            where p2.UnitPrice == g.Min(p3 => p3.UnitPrice)
            select p2
    };

This builds a list of product categories. Each product category contains two fields: The category ID, and a list of all the products in that category where the price is the cheapest price for that category.

The SQL to perform the query is:

SELECT [t1].[CategoryID], (
    SELECT COUNT(*)
    FROM [Products] AS [t2]
    WHERE ([t2].[UnitPrice] = [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
    ) AS [CheapestProducts]
FROM (
    SELECT MIN([t0].[UnitPrice]) AS [value],
[t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
ORDER BY [t1].[CategoryID]
SELECT [t2].[ProductID], [t2].[ProductName], [t2].[SupplierID], 
[t2].[CategoryID], [t2].[QuantityPerUnit], [t2].[UnitPrice],
[t2].[UnitsInStock], [t2].[UnitsOnOrder], [t2].[ReorderLevel],
[t2].[Discontinued]
FROM (
    SELECT MIN([t0].[UnitPrice]) AS [value], [t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
CROSS JOIN [Products] AS [t2]
WHERE ([t2].[UnitPrice] = [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
ORDER BY [t1].[CategoryID], [t2].[ProductID]

This shows quite a few of the benefits to LinqToSQL.  First, I don’t like writing T-SQL procedures.  LinqToSql means I don’t have to.  Secondly, most developers I work with would solve this problem by using a few standard queries on the database, and compose the final results in the business layer (or worse, the UI layer).  That results in an inefficient design:  You’ll make more round trips to the database layer, and each layer retrieves more information. LinqToSql composes the SQL script on the fly, preserving only round trip, and transferring only the answers you asked for.  Cool.

[[Editor’s note:  The next three samples essentially replace Min with Max.  So, rather than waste all that bandwidth in all these tubes, I’ll just say that Min and Max are very similar. If you can use one, you can use the other. ]]

[[Another Editor’s note: Avg works the same as either Min or Max.  Simple, huh? ]]

The final Average sample does some interesting computation:

var categories =
    from p in db.Products
    group p by p.CategoryID into g
    select new {
        g.Key,
        ExpensiveProducts =
            from p2 in g
            where p2.UnitPrice > g.Average(p3 => p3.UnitPrice)
            select p2
    };

This query generates a list of category keys where each category item contains the Category Key, and a list of all products whose unit price is greater than the average unit price for that category.  Here’s the SQL:

SELECT [t1].[CategoryID], (
    SELECT COUNT(*)
    FROM [Products] AS [t2]
    WHERE ([t2].[UnitPrice] > [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
    ) AS [ExpensiveProducts]
FROM (
    SELECT AVG([t0].[UnitPrice]) AS [value],
[t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
ORDER BY [t1].[CategoryID]
SELECT [t2].[ProductID], [t2].[ProductName], [t2].[SupplierID], 
[t2].[CategoryID], [t2].[QuantityPerUnit], [t2].[UnitPrice],
[t2].[UnitsInStock], [t2].[UnitsOnOrder], [t2].[ReorderLevel],
[t2].[Discontinued]
FROM (
    SELECT AVG([t0].[UnitPrice]) AS [value], [t0].[CategoryID]
    FROM [Products] AS [t0]
    GROUP BY [t0].[CategoryID]
    ) AS [t1]
CROSS JOIN [Products] AS [t2]
WHERE ([t2].[UnitPrice] > [t1].[value]) AND
([t1].[CategoryID] = [t2].[CategoryID])
ORDER BY [t1].[CategoryID], [t2].[ProductID]

As with the above samples, notice that the generated SQL is quite a bit more verbose than the C# 3.0 code.  Your tools have gotten more expressive, and you can get more done in less time.  (And, in my case, you don’t need to worry that your SQL procedure skills just aren’t that great.)



Posted by wwagner | with no comments
Filed under: ,