October 2008 - Posts

Book Review: Working Effectively with Legacy Code by Michael Feathers

 

I don’t think Working Effectively with Legacy Code gets the respect and readership that it should.  I believe that’s because most of us have a working definition of legacy code that implies something we want to avoid: We want to work on the cool new stuff, not the old legacy stuff. It makes us conjure images of C, or FORTRAN, or worse, COBOL. Or maybe something newer, but still mature enough that you want to move on.

That’s not the definition Michael uses in his book. Michael defines legacy code as “Code without tests”.  Based on that definition, do you work on legacy code? If you’re honest, you’ll say yes. Now, ask yourself if you want better techniques to work with code that doesn’t have tests.

If so, this is for you. You’ll learn several specific techniques that you can employ to take this code, make the absolute minimum number of modifications to get the code testable, and then you’ll feel safer applying your usual refactoring techniques.

I like the way the book is organized, with lengthy chapter titles that point to specific large scale code problems you’ll often find in code that doesn’t have tests. Example titles are “My Application has no Structure”, or “I can’t get this Class into a Test Harness”. Do those sound like problems you encounter? In these and other chapters, Michael identifies several common practices that lead to untestable code: dependencies on other system resources, unavailability of public interfaces to support testing, lack of interfaces for mocking, and so on.  Each chapter title is more or less a description of the current problem, and the chapter content is a set of techniques that will enable you to move that code into a more testable design.  Once the code is testable, you can add those tests and then go about your changes.

Other chapters show how to write tests that help you understand the current behavior. While this can seem silly, it does help ensure you don’t make mistakes as you move the code forward.
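A test that pins down current behavior might look like the minimal sketch below. The LegacyPricing class and its discount rule are hypothetical, invented purely for illustration; nothing here comes from the book. The point is that the expected values are whatever the code does today, not what a specification says it should do.

```csharp
using System;

// Hypothetical legacy code with no tests (invented for this sketch).
public static class LegacyPricing
{
    public static decimal Total(decimal unitPrice, int quantity)
    {
        decimal total = unitPrice * quantity;
        if (quantity > 10)
            total = total * 0.9m;   // undocumented bulk discount
        return Math.Round(total, 2);
    }
}

public static class CharacterizationTests
{
    // Fails loudly if the observed behavior ever drifts.
    public static void Check(decimal expected, decimal actual, string name)
    {
        if (expected != actual)
            throw new InvalidOperationException(
                name + ": expected " + expected + " but was " + actual);
    }

    public static void Main()
    {
        // Expected values were obtained by running the existing code,
        // not derived from a specification.
        Check(50.00m, LegacyPricing.Total(5.00m, 10), "ten items, no discount");
        Check(49.50m, LegacyPricing.Total(5.00m, 11), "eleven items, discounted");
        Console.WriteLine("Current behavior pinned down.");
    }
}
```

With checks like these in place, a refactoring that accidentally changes the discount threshold shows up immediately.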

Finally, the last section of the book is a set of techniques that help break dependencies between different parts of a legacy system so that it is easier to inject those tests.

I haven’t said anything about the languages used in the book for examples.  That’s because there are several: C++, Java, and C# all appear. One section that is specific to moving from procedural to OO techniques includes C. However, if you use a different language, don’t let that turn you off. The techniques are language agnostic, and that is proven by mixing the samples in different sections with different languages.

This is one of those books that will always be handy, and will be one of the resources I turn to often when I inherit that set of code that just doesn’t have any tests. If you find yourself staring at blocks of undecipherable code, you should do the same.

PDC Day 2

Once again, I’m posting my thoughts on day 2 a bit later than most.  My thoughts around the business ramifications of the keynote announcements went out in Matt Roush’s Great Lakes IT Report this morning.  You can read them archived here.

From the keynote, I’m actually debating putting Windows 7 on my primary laptop.  Bootable VHDs, multiple monitor support, even on VHDs. I’ve got needs, and Windows 7 looks like it will address some of them.

From the standpoint of the development enhancements, the point of the keynote is that new developer tools and libraries will be optimized for the kinds of applications discussed on the first day: distributed applications that use the web when connected, and work disconnected when necessary.

You may be thinking that you can do that now. You’re right, you can.  But the new capabilities will make it possible for you to write applications without even knowing whether they are connected or not. The synchronization happens at layers below your code, in the library.

Well, I’m glossing over lots of things, but I haven’t had much soak time on these topics.

However, one of the cool things about the PDC format is that I spent a fair amount of the afternoon with some folks on the Azure Tools team and learned quite a bit about how this is going to work.  I can’t wait to start playing with all of this more.

PDC Day One

Keynote:  Windows Azure

Editor’s note:  I wrote a piece for a regional newsletter on the business ramifications of Azure. That’s posted here. Now that it’s live, I’ll add my comments on Windows Azure from a technical side.

In addition to the keynote, I attended a number of the Azure sessions on Day one. My thoughts below mix those together.

Azure will let you think at a distributed web level. With today’s tools, you’re spending time thinking about servers (the hardware), back end communications (infrastructure), scale out (more infrastructure). A properly architected Azure application will let you concentrate on services, your logic, and your distributed metaphor.

I like the concept of on-demand scaling. I like making that ‘someone else’s’ problem.  In keeping with the vision of the breadth of abstraction rather than the level of abstraction, it appears that while you can offload all that configuration to the infrastructure, you will have ways of controlling those portions of the infrastructure that are interesting to you.

This vision separates Azure from the current concept of using a bank of virtual servers.  With current models, you’re still thinking about ‘servers’, not ‘services’. Why do I care about how many pieces of physical hardware are needed to satisfy peak loads?  I should only care that my service (on whatever physical infrastructure) can handle those peak loads, and then dial back to conserve resources (green scaling).

Finally, from the developer standpoint, it’s great that you’ll be able to use familiar tools (and logical, if radical, extensions to those tools) in order to create these new applications. That will provide two important technical goals:  1) you can quickly learn to create applications for this new cloud platform and 2) you have a path to move current applications into the cloud.

A break from the clouds:  Anders Hejlsberg

The session will be online in its entirety, so I won’t post my four pages of notes here, rather, I’ll discuss the big picture.

C# 4.0 is about adding support for dynamic idioms without abandoning static typing and its benefits.  C#’s vision is one of ‘static when you can, dynamic when you must’. That means in C# 4.0, you’ll have much easier syntax to support dynamically typed objects in the DLR,  COM objects, and .NET objects that you interact with via reflection.

The demos showed some great ways to work with JavaScript libraries, Python objects, and anonymous types.  In C# 4.0, anonymous types will support Duck Typing through dynamic objects.  Wicked cool.
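As a rough illustration of duck typing over anonymous types, here is a minimal sketch using the dynamic keyword as it appears in released C# 4.0. The Describe method and the sample objects are my own invention, not code from the session.

```csharp
using System;

public static class DynamicSketch
{
    // 'dynamic' defers member lookup to run time, so any object
    // with a Name property works here (duck typing).
    public static string Describe(dynamic thing)
    {
        return "Name: " + thing.Name;
    }

    public static void Main()
    {
        var first = new { Name = "Gadget", Count = 3 };   // anonymous type
        var second = new { Name = "Widget" };             // a different anonymous type

        Console.WriteLine(Describe(first));
        Console.WriteLine(Describe(second));
    }
}
```

Note that anonymous types are internal to their assembly, so this kind of dynamic access only works when the caller and Describe live in the same assembly.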

C# 4.0 will also support syntax that enables generics to be ‘safely covariant and contra-variant’ whereas now, they are invariant. 
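A minimal sketch of what safe variance allows, using the framework interfaces and delegates as annotated in the released .NET 4.0 (IEnumerable<out T> and Action<in T>). The CountItems helper is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class VarianceSketch
{
    public static int CountItems(IEnumerable<object> items)
    {
        int count = 0;
        foreach (object item in items)
            count++;
        return count;
    }

    public static void Main()
    {
        // Covariance: a sequence of strings can be used where
        // a sequence of objects is expected.
        IEnumerable<string> names = new List<string> { "a", "b", "c" };
        Console.WriteLine(CountItems(names));

        // Contravariance: a handler for any object can be used
        // where a handler for strings is expected.
        Action<object> printAny = delegate(object o) { Console.WriteLine(o); };
        Action<string> printString = printAny;
        printString("hello");
    }
}
```

Both conversions are rejected by the C# 3.0 compiler, because generic types there are invariant.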

The Post 4.0 world is where it gets very interesting:  The C# compiler will be a managed service with programmable APIs. Anders showed a demo where he created an interactive C# shell. It was less than 200 lines of code, and let you write C# on the console.  Anders was having way too much fun with that demo.  Longer term, I want to play with that code to create a DSL for something. That will be very powerful.

Another item from More Effective C# Posted on InformIT

Today, the fine folks at InformIT posted another item from More Effective C# for you to learn more about the style and the content from the book:

Item 29:  Enhance Constructed Generic Types with Extension Methods

If you have questions about the content, come by The Ann Arbor Computing Society (AACS) on Wed Nov 5th.  That's one of the topics I'll be discussing before signing books at Borders.

Update:  That would be the downtown Ann Arbor Borders, on Liberty.

Extension Methods and Null Arguments

A little while ago, I did a DNR TV on C# 3.0. During that, I talked about preserving null semantics when you write extension methods.  I made the point that you should never test if the first parameter of an extension method is null. That’s because it breaks the semantics of member methods, which is what extension methods appear to be at the call site.

For example, this bit of code would throw a null reference exception:

 

SomeType foo = null;
foo.SomeMethod();

 

However, if SomeMethod were actually implemented like this you’d lose error information and let possible error conditions go undetected:

public static Result SomeMethod(this SomeType thing)
{
    if (thing == null)
        return null;
    // etc.
}

 

Clearly that’s bad.  Well, recently one of the folks who saw this DNR TV episode asked me this question:

What is your opinion of checking for null on the first parameter and then throwing an ArgumentNullException?

public static void SendMail(this IEnumerable<Person> sequence)
{
    if (sequence == null)
        throw new ArgumentNullException…;
}

That also changes the semantics of the method relative to an instance method. It is more subtle, but it has still changed.

As I said above, de-referencing null throws a NullReferenceException. The sample in the question above throws an ArgumentNullException.

That’s semantically different.

Client code that wants to examine and recover from that coding mistake will be broken. 

Important side bar point: I’m not advocating catching NullReferenceExceptions as a program logic technique (In fact, I think that’s a bad idea).

Even though I don’t want client developers to control program logic by checking for NullReferenceException, I should ensure that my code obeys the existing semantics.

Well, suppose you changed the SendMail method as follows so it does follow the normal semantics:

public static void SendMail(this IEnumerable<Person> sequence)
{
    if (sequence == null)
        throw new NullReferenceException…;
}

Well, now it does obey the semantics of an instance method. That’s good.

However, you’ve now written extra lines of code that don’t do anything useful. If you removed the check, the code would behave exactly the same. As a general rule, I don’t like to write code that doesn’t do anything. (See gratuitous default constructors for an example). 

That’s why I make a practice of not checking null on the first parameter of extension methods.  I don’t write the extra code, and it still works correctly.
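To make that concrete, here is a minimal sketch of SendMail with no null check at all. The Person class body and the counting stand-in for sending mail are assumptions made for illustration. Because the body enumerates the sequence, calling it on null raises NullReferenceException without any extra code. That holds only as long as the body uses the parameter directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, filled in only so the sketch compiles.
public class Person
{
    public string Name { get; set; }
}

public static class MailExtensions
{
    // No null check on the first parameter. Enumerating a null
    // sequence dereferences it, so the runtime raises
    // NullReferenceException on its own.
    public static int SendMail(this IEnumerable<Person> sequence)
    {
        int sent = 0;
        foreach (Person p in sequence)
            sent++;   // stand-in for actually sending mail
        return sent;
    }
}

public static class NullSemanticsDemo
{
    public static void Main()
    {
        IEnumerable<Person> nobody = null;
        try
        {
            nobody.SendMail();
        }
        catch (NullReferenceException)
        {
            // Caught here for demonstration only, not as program logic.
            Console.WriteLine("NullReferenceException, as with an instance method");
        }
    }
}
```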

Getting ready for PDC

Well, today is being spent getting everything ready for traveling to Los Angeles for the PDC conference.

Like most people without a blue badge, I’m going as an attendee to learn as much as I can about the future of our industry and the platforms I use.  However, there is one known appointment on the schedule:

I’ll be at the PDC bookstore on Tuesday from 12:15 to 12:45 to sign copies of More Effective C#.  If you’re coming to PDC, stop by and say hello.

Yet another Book Post: The C# Programming Language 3rd Edition (Annotated)

I was recently notified that the 3rd edition of the C# Programming Language is out.

This version is new in several ways.  Obviously, it includes a description of all the new C# 3.0 language features. 

In addition, a number of people were invited to provide annotations on the language specification.  It’s an incredible group of smart people:

Brad Abrams and Krzysztof Cwalina (of Framework Design Guidelines fame)

Joseph Albahari (of C# in a Nutshell fame)

Don Box (of Don Box fame)

Jesse Liberty (of Programming C# fame)

Eric Lippert (member of the C# team, with a fantastic blog)

Fritz Onion (of Essential ASP.NET fame)

Vladimir Reshetnikov (SDET on the C# team)

Chris Sells (of Chris Sells fame)

Oh, and they let me add my annotations as well.

Excerpts of “More Effective C#” posted on InformIT

Three different items from More Effective C# have been posted publicly on the InformIT site:

Item 36:  Understand how Query Expressions Map to Method Calls.

“Linq is built on two concepts: A query language and a translation from that query language to a set of methods.”
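A small sketch of that translation, using sample data of my own rather than anything from the book: the query expression and the explicit method calls below produce the same sequence.

```csharp
using System;
using System.Linq;

public static class QueryMappingSketch
{
    // Returns true when both forms yield the same results.
    public static bool SameResults(int[] numbers)
    {
        // Query expression form.
        var query = from n in numbers
                    where n > 2
                    orderby n
                    select n * n;

        // Method call form the compiler translates the query into.
        var methods = numbers
            .Where(n => n > 2)
            .OrderBy(n => n)
            .Select(n => n * n);

        return query.SequenceEqual(methods);
    }

    public static void Main()
    {
        int[] numbers = { 5, 3, 8, 1, 9, 2 };
        Console.WriteLine(SameResults(numbers));   // True
    }
}
```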

Item 44: Prefer Storing Expression<> to Func<>

If your type will be storing expressions, passing those expressions to other objects not under your control, or if you will be composing expressions into more complex constructs, consider using expressions instead of func. You’ll have a richer set of APIs that will enable you to modify those expressions at runtime, and invoke them after you have examined them for your own internal purposes.
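The difference can be sketched as follows. The AddUpperBound helper is hypothetical; the System.Linq.Expressions calls are the standard library ones. A stored expression can be examined and composed before it is compiled, which a Func cannot.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionSketch
{
    // Composes "lower && parameter < limit" onto a stored expression.
    public static Expression<Func<int, bool>> AddUpperBound(
        Expression<Func<int, bool>> lower, int limit)
    {
        ParameterExpression parameter = lower.Parameters[0];
        return Expression.Lambda<Func<int, bool>>(
            Expression.AndAlso(
                lower.Body,
                Expression.LessThan(parameter, Expression.Constant(limit))),
            parameter);
    }

    public static void Main()
    {
        // An Expression is data describing the lambda, not compiled code.
        Expression<Func<int, bool>> isLarge = n => n > 10;

        // Examine the expression tree before using it.
        var body = (BinaryExpression)isLarge.Body;
        Console.WriteLine(body.NodeType);   // GreaterThan

        // Compile and invoke only after examining and composing.
        Func<int, bool> inRange = AddUpperBound(isLarge, 100).Compile();
        Console.WriteLine(inRange(50));    // True
        Console.WriteLine(inRange(500));   // False
    }
}
```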

Item 13: Use lock() as your First Choice for Synchronization.

Threads need to communicate with each other. Somehow, you need to provide a safe way for different threads in your application to send and receive data with each other. However, sharing data between threads introduces the potential for data integrity errors in the form of synchronization issues. Somehow you need to be certain that the current state of every shared data item is consistent. You achieve this safety by using synchronization primitives to protect access to the shared data. Synchronization primitives ensure that the current thread will not be interrupted until a critical set of operations is completed.
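A minimal sketch of lock() protecting shared data, assuming a simple Counter class of my own rather than an example from the book. Four threads update the same total, and the lock keeps each update from interleaving with another.

```csharp
using System;
using System.Threading;

public class Counter
{
    private readonly object syncRoot = new object();
    private int total;

    public void Add(int amount)
    {
        // Only one thread at a time can hold the lock on syncRoot.
        lock (syncRoot)
        {
            total += amount;
        }
    }

    public int Total
    {
        get { lock (syncRoot) { return total; } }
    }
}

public static class LockSketch
{
    public static void Main()
    {
        var counter = new Counter();
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100000; j++)
                    counter.Add(1);
            });
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();

        Console.WriteLine(counter.Total);   // 400000
    }
}
```

Locking on a private object, rather than on this or a public type, keeps outside code from taking the same lock.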

They will give you a taste of the rest of the book.

I hope you enjoy it.

Ann Arbor PDC Keynote Viewing and Discussion

I would hope you are going to the PDC, but if you can't make it, we may have the next best thing:

 SRT Solutions is hosting a web viewing of the Ray Ozzie keynote addresses (Monday and Tuesday) and a discussion around the talks.

Space is limited, so please sign up.  You can sign up for the Monday event here. You can sign up for the Tuesday event here.

Our local Microsoft office is graciously sponsoring lunch for both events.

 

More Effective C# available now

Today is the official release date for More Effective C#

Writing a book may seem to be a solitary activity, but nothing could be further from the truth. I have been lucky enough to work with fantastic editors, technical editors, and community members as I have put this together. If you read the acknowledgements, you’ll see what I mean.

What .NET Library Features do you use?

Scott Hanselman posted a survey (one little question) on what features do you use in the .NET framework: http://www.hanselman.com/blog/SurveyTimeWhatNETFrameworkFeaturesDoYouUse.aspx

 On the off chance that one of my two readers hasn't heard of Scott Hanselman, go respond.

Clarifying my comments on Previous Post

My last post generated valid criticisms from readers. Let’s clarify and correct that.

Keith Elder commented:

I got the impression reading your post that you "skimmed" my article and didn't really get what it was I was getting at.
[…]

The whole point of this is C# 3.0 can be written where it is unreadable (lambda as an example) but it can also make the code 200% more readable than the old way by combining several features, not just one.

Although you say you disagree, I know we both agree because we both agree the LINQ example is the most readable.  Hugs all around :)

Fair enough. The title of Keith’s post certainly led me to conclude that his point was that the new syntax made for less readable code. Not his intent, and clearly my fault.  My apologies for misinterpreting your intent.

Sean comments:

 

On the "don't use it just to use it" idea, I admit that just throwing something in can be reckless, how can you know when to use it if you don't blatantly experiment with it?

I would advocate ‘blatantly experimenting with it’. It’s a great way to learn where new features can help, and when you’ll run into weaknesses with that new feature.

But…

Before declaring a task done, take a moment and do your own review of the code.  In this sense, much like Keith explored multiple solutions in his blog post, look at alternatives, determine which is the most readable, and maintainable, and go with it.

Another important way to learn new techniques is reading code. Scott Hanselman’s “Weekly Source Code” series is a great example of how to do that.

Creating Readable LINQ

Keith Elder wrote a post asserting (or at least proposing) that a more imperative syntax for a problem is more readable than a LINQ based C# 3.0 version. (See here for his post).  Well, that got my hackles up.  Some days, my hackles get up before I do, but I digress.

His example code was examining IP addresses for a machine. His poor example was this:

string ipAddy = Dns.GetHostAddresses(Dns.GetHostName()).Single(i => ValidateIP4Address(i.ToString())).ToString();

(ValidateIP4Address is shown in Keith’s example, and is not repeated here).

I’ll agree with Keith on one point: that’s ugly code. Formatting is only part of it. It’s also not leveraging one of the features that make LINQ readable: query expressions. It uses two different conversions from each IP address to a string.  Much of the core logic is hidden.

Let’s try this instead:

var ipAddy = (from address in Dns.GetHostAddresses(Dns.GetHostName())
              let addressLabel = address.ToString()
              where ValidateIP4Address(addressLabel)
              select addressLabel).First();

The first line of the query defines the source: Host addresses.

The second line defines a local variable to cache the string representation of the address.

The third line defines the filter condition: A valid IP4 address.

And the fourth line defines the result: A single string, the first in the sequence.

I’ll agree that if you haven’t looked at LINQ code very much, this can still appear hard to read. But that’s a short-term argument: If you’re reading my blog, you're a developer, and you should be learning new features in whatever language you are using.

Some of Keith’s concerns are very valid: pulling out the latest new features just to use them and experiment with whatever looks interesting will create bad code. But there is a lot to be gained by using the new features carefully, and adding them in the appropriate manner, and following the best idioms for the newer features.

There are plenty of resources to help you. Use them, and discuss readability with your peers. It’s the only way you’ll know what other developers will understand.

Paul Kimmel interviews me about C#, LINQ, and writing books

Paul Kimmel spent some time over the summer chatting with me about More Effective C#, LINQ, upcoming C# features, and the process of writing books.  It's live here: http://www.informit.com/articles/article.aspx?p=1237069

Euler Problem 11, or it’s about time I wrote some code

I just needed to write some code last night, so I figured I’d pull out the Euler problems and solve another.

Problem 11 asks you to find the largest product of any sequence of 4 numbers in a 20 x 20 grid. The sequence can be horizontal, vertical, or diagonal. As with the other problems, my goal is to display how C# 3.0 syntax enhancements can be best exploited for this problem.

I wrote two very similar solutions to problem 11 so that you can see how LINQ mixes imperative and functional styles, and how lazy evaluation means that the different expressions of the solution actually have similar characteristics at runtime.

The algorithm must search through the grid four times finding all the candidate sequences: horizontal, vertical, down-right, and down-left. There are four LINQ queries that find possible answer sequences. To find the single answer, you must find the max value across all four queries.

A more purely functional solution creates a single query by concatenating the four queries together. It then orders the combined results in descending order and takes the first one to find the max:

var best = (from x in Enumerable.Range(0, 17) // horizontal
            from y in Enumerable.Range(0, 20)
            let first = data[x, y]
            let second = data[x + 1, y]
            let third = data[x + 2, y]
            let fourth = data[x + 3, y]
            let answer = first * second *
                    third * fourth
            select new
            {
                X = x,
                Y = y,
                Direction = "Right",
                Values =
                 string.Format("{0}, {1}, {2}, {3}",
                    first, second,
                    third, fourth),
                Answer = answer
            }).Concat(from x in Enumerable.Range(0, 20) // Vertical
                      from y in Enumerable.Range(0, 17)
                      let first = data[x, y]
                      let second = data[x, y + 1]
                      let third = data[x, y + 2]
                      let fourth = data[x, y + 3]
                      let answer = first * second *
                              third * fourth
                      select new
                      {
                          X = x,
                          Y = y,
                          Direction = "Down",
                          Values =
                           string.Format("{0}, {1}, {2}, {3}",
                              first, second,
                              third, fourth),
                          Answer = answer
                      }).Concat(from x in Enumerable.Range(0, 17)  // Down to the right
                                from y in Enumerable.Range(0, 17)
                                let first = data[x, y]
                                let second = data[x + 1, y + 1]
                                let third = data[x + 2, y + 2]
                                let fourth = data[x + 3, y + 3]
                                let answer = first * second *
                                        third * fourth
                                select new
                                {
                                    X = x,
                                    Y = y,
                                    Direction = "DownRight",
                                    Values =
                                     string.Format("{0}, {1}, {2}, {3}",
                                        first, second,
                                        third, fourth),
                                    Answer = answer
                                }).Concat(from x in Enumerable.Range(3, 17)
                                          from y in Enumerable.Range(0, 17)
                                          let first = data[x, y]
                                          let second = data[x - 1, y + 1]
                                          let third = data[x - 2, y + 2]
                                          let fourth = data[x - 3, y + 3]
                                          let answer = first * second *
                                                  third * fourth
                                          select new
                                          {
                                              X = x,
                                              Y = y,
                                              Direction = "DownLeft",
                                              Values =
                                               string.Format("{0}, {1}, {2}, {3}",
                                                  first, second,
                                                  third, fourth),
                                              Answer = answer
                                          }).OrderByDescending((record) => record.Answer).First();

Console.WriteLine(best);

Of course, many folks are turned off by seeing 70+ lines of code with a single semi-colon. So here’s a different refactoring of the same code. In the following version, I construct four different queries. It’s important to understand that constructing a query in LINQ is not the same as executing it. After constructing four distinct queries, I concatenate them, and find the max.  The C# compiler can correctly determine that all four queries project a sequence of the same anonymous type, so all four query results can be concatenated.  (If you really want, you can look in Reflector and find that the compiler generates only one anonymous type for the query result objects.)

var horizontal = from x in Enumerable.Range(0, 17) // horizontal
                 from y in Enumerable.Range(0, 20)
                 let first = data[x, y]
                 let second = data[x + 1, y]
                 let third = data[x + 2, y]
                 let fourth = data[x + 3, y]
                 let answer = first * second *
                         third * fourth
                 select new
                 {
                     X = x,
                     Y = y,
                     Direction = "Right",
                     Values =
                      string.Format("{0}, {1}, {2}, {3}",
                         first, second,
                         third, fourth),
                     Answer = answer
                 };
var vertical = from x in Enumerable.Range(0, 20) // Vertical
               from y in Enumerable.Range(0, 17)
               let first = data[x, y]
               let second = data[x, y + 1]
               let third = data[x, y + 2]
               let fourth = data[x, y + 3]
               let answer = first * second *
                       third * fourth
               select new
               {
                   X = x,
                   Y = y,
                   Direction = "Down",
                   Values =
                    string.Format("{0}, {1}, {2}, {3}",
                       first, second,
                       third, fourth),
                   Answer = answer
               };
var downRight = from x in Enumerable.Range(0, 17)  // Down to the right
                from y in Enumerable.Range(0, 17)
                let first = data[x, y]
                let second = data[x + 1, y + 1]
                let third = data[x + 2, y + 2]
                let fourth = data[x + 3, y + 3]
                let answer = first * second *
                        third * fourth
                select new
                {
                    X = x,
                    Y = y,
                    Direction = "DownRight",
                    Values =
                     string.Format("{0}, {1}, {2}, {3}",
                        first, second,
                        third, fourth),
                    Answer = answer
                };
var downLeft = from x in Enumerable.Range(3, 17)
               from y in Enumerable.Range(0, 17)
               let first = data[x, y]
               let second = data[x - 1, y + 1]
               let third = data[x - 2, y + 2]
               let fourth = data[x - 3, y + 3]
               let answer = first * second *
                       third * fourth
               select new
               {
                   X = x,
                   Y = y,
                   Direction = "DownLeft",
                   Values =
                    string.Format("{0}, {1}, {2}, {3}",
                       first, second,
                       third, fourth),
                   Answer = answer
               };
 
var best = horizontal.
    Concat(vertical).
    Concat(downRight).
    Concat(downLeft).
    OrderByDescending((record) => record.Answer).First();
Console.WriteLine(best);

There’s one other interesting bit of syntax that should be noted here.  I’ve used ‘let’ in all the queries to avoid recomputing the same result, and to ensure that I’ve avoided any copy/paste errors as I put together the answer.
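The claim that constructing a query is not the same as executing it can be checked with a small sketch. The Square helper and its log are hypothetical, added only to make the moment of execution visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSketch
{
    // Records each call so the moment of execution is visible.
    public static int Square(int n, List<string> log)
    {
        log.Add("squaring " + n);
        return n * n;
    }

    public static void Main()
    {
        var log = new List<string>();
        int[] data = { 1, 2, 3 };

        // Constructing the query runs none of the select logic yet.
        var squares = from n in data
                      select Square(n, log);
        Console.WriteLine(log.Count);   // 0

        // Enumerating the query is what executes it.
        int max = squares.Max();
        Console.WriteLine(log.Count);   // 3
        Console.WriteLine(max);         // 9
    }
}
```

The same applies to the four grid queries above: nothing is computed until First() pulls results through the concatenated, ordered sequence.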
