February 2010 - Posts

VS2010 RC: Much Faster than Beta 2

I spent all day yesterday working with VS2010 RC. (MSDN Subscribers could download late on Monday. It becomes public today).

First impression:  It is much faster, and more stable than the beta 2 build.

The reason for the extra public build and the delay in the launch date was performance. (Soma and the Gu discussed this previously).

Caveats:  It’s only been one day. If I were making a ship decision, I’d say VS2010 RC needs a lot more soak time.

I put VS2010 RC on the PDC Laptop. (The Windows Rating is a 3.2, due to the Windows Aero score). On that machine, the VS2010 Beta was rather painful. It worked, but I could type noticeably faster than VS2010 could display characters. Even worse, slewing characters would cause the display to fall more than a screen behind in drawing.

None of those effects occur in the the VS2010RC. (VS2010 RC uses the hardware rendering on this machine. See Jason Zander’s discussion of perf changes betwen B2 and the RC for more information.) I spent all day coding on the PDC laptop. It kept up during the entire day. Scrolling, slewing keys, intellisense. It’s downright zippy.

As an extra added bonus:  The Add Reference dialog appears in a heartbeat.

As of now, I am doing all development in VS2010 RC.

Euler Problem 14: Learn to Memoize

This is going to be fun.  It’s a bit of LINQ, a bit of academic Computer Science, and a bit of meteorology.

Euler Problem 14 concerns a sequence referred to as hailstorm numbers.

Hailstorm number sequences are generated by applying one of two functions to a number in order to generate the next number. In this case, the sequence is:

n –> n / 2 (n is even)

n –> 3n + 1 (n is odd)

When n is even, the next number in the sequence is smaller. When n is odd, the next number in the sequence is larger

In all cases, it is believed that for every starting number, the sequence will oscillate for some time, and then eventually converge to some minimum number. (In this case, that minimum number is 1.)

These sequences are called hailstorm numbers because they (very simplistically) act like hail in a storm: oscillating up and down in a thunderhead before eventually falling to earth.

This particular problem asks you to find the sequence that has the longest chain, given input numbers less than 1 million.

Writing some code

The brute force method will take your computer a very long time to compute. The problem asks for the longest sequence, and you’ve got a million of them to compute.

Stack size is a related problem.  You could write a method that computes the next value, and then finds the sequence size recursively:

 

   1: private static long Generate(long n)
   2: {
   3:     if (n == 1) { return 1; }
   4:     var next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
   5:     return Generate(next) + 1;
   6: }

A LINQ query gives you the answer:

   1: var answer = (from StartingValue in Enumerable.Range(1, 999999)
   2:              let SequenceSize = Generate(StartingValue)
   3:              orderby SequenceSize descending
   4:              select new { StartingValue, SequenceSize }).First();

This works, and does give the correct answer.

But I’m not really satisfied, because it takes more than 15 seconds (on my PDC laptop) to finish.

Making it Faster

There are two steps to making this faster.  The final version uses a technique called Memoization. Memoization enables you to avoid computing the same result more than once. Pure functions have a useful property in that the output depends on their inputs, and nothing else. Therefore, once you’ve computed the sequence length for, say 64, you should never need to compute it again. It’s always going to be the same.

Moemoization means to store the result for a computation and returned the stored result rather than do the work again. This can provide significant savings in a recursive algorithm like this. For example, memoization of the result for 64 (7) means saving the computations for 64, 32,16,8,4,2 and 1. Memoization of the results for longer sequences would mean correspondingly larger savings.

You could modify the Generate method to provide storage for previously computed results.  Here’s how you would do that:

 

   1: private static Dictionary<long, long> StoredResults = new Dictionary<long, long>();
   2:  
   3: private static long Generate(long n)
   4: {
   5:     if (StoredResults.ContainsKey(n))
   6:         return StoredResults[n];
   7:     if (n == 1) 
   8:     { 
   9:         StoredResults[n] = 1;
  10:         return 1; 
  11:     }
  12:     var next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
  13:     var answer = Generate(next) + 1;
  14:     StoredResults[n] = answer;
  15:     return answer;
  16: }

But, what’s the fun in that?  There’s no reuse. It’s a one off solution.

I want to write a generic Memoize function that lets me memoize any function with one variable. Wes Dyer’s post explains this technique in detail.  Memoize is a generic method that executes a function, abstracting away the types for both the parameter type and the result type:

   1: public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> function)
   2: {
   3:     var previousResults = new Dictionary<T, TResult>();
   4:  
   5:     // Important:  This is a lamdba, not the result.
   6:     return (T arg) =>
   7:     {
   8:         if (previousResults.ContainsKey(arg))
   9:             return previousResults[arg];
  10:         else
  11:         {
  12:             TResult result = function(arg);
  13:             previousResults.Add(arg, result);
  14:             return result;
  15:         }
  16:     };
  17: }

 

The first question you may have is how the previous results actually works. It’s a local variable, not a static storage. How can it possible live beyond the scope of the method?

Welll, that’s the magic of a closure. Memoize doesn’t return a value, it returns some func that enables you to find the value later. That func contains the dictionary. I go into this technique in Items 33 and 40 in More Effective C#.

In order to use this, you need to move Generate() from a regular method to a lamdba expression (even if it is a multi-line lambda):

   1: Func<long, long> GenerateSequenceSize = null;
   2: GenerateSequenceSize = (long n) =>
   3: {
   4:     if (n == 1) { return 1; }
   5:     var next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
   6:     return GenerateSequenceSize(next) + 1;
   7: };
   8:  
   9: GenerateSequenceSize = GenerateSequenceSize.Memoize();

Now, we have the generate function in a form we can memoize. The only trick here is that you have to set GenerateSequenceSize to null before you assign it in line 2. Otherwise, the compiler complains about using an unassigned value in line 6.

You can extend the Memoize function to methods with more than one input, but for now, I’ll leave that as an exercise for the reader.

Search

Go

Blog Group Links

Nascar style badges