August 2008 - Posts

Euler Problem 10

Patrick posted his solution earlier this week. I figure it was time to add mine.

Also, Octavio Hernandez pointed out in the comments to problem 8 that he had discussed similar code constructs in his blog a while back.

Problem 10 asks for the sum of all primes less than 2 million. It's also one line of LINQ code:

   1: var number = Common.Primes.GeneraePrimes().TakeWhile((n) => n < 2000000).Sum();
   2: Console.WriteLine(number);

As Patrick pointed out, this will cause an overflow unless you make some changes to the GeneratePrimes() method so that it returns a sequence of longs instead of a sequence of ints.

I also wasn't thrilled with the fact that it was taking a few minutes to finish instead of seconds.  That needed some fixing.

George Byrkit pointed out that the math can be optimized by recognizing that the highest factor you need to check for any N is Sqrt(N).

Sadly, the current .NET Framework does not contain methods that do the check efficiently. George suggested using List<T>FindAll() to find only those numbers smaller than the target. That method is a member of List and creates a new List object. Yuck. Also, it doesn't perform lazy evaluation, and it checks the entire list.

But now that the numbers have gotten much larger, it's worth the work to create my own version of those methods that speed this up.

Here's the new version of IsPrime:

 

   1: public static IEnumerable<long> GeneratePrimes()
   2: {
   3:     var range = Enumerable.Range(2, int.MaxValue-1);
   4:     var answers = from number in range
   5:                   where number.IsPrime()
   6:                   select (long)number;
   7:  
   8:     return answers;
   9: }

The only change here is to cast the prime number to a long before returning.

IsPrime gets a slightly larger makeover:

 

   1: private static bool IsPrime(this int number)
   2: {
   3:     bool isPrime = knownPrimes.TakeWhile(n => n*n <= number).
   4:         TrueForAll(n => number % n != 0);                   
   5:     if (isPrime)
   6:         knownPrimes.Add(number);
   7:     return isPrime;
   8: }

The astute reader will immediately notice that I'm calling TrueForAll using an IEnumerable<int> instead of a List<int>. That's not supposed to compile. And in fact, it doesn't, until I write my own version of TrueForAll:

   1: public static bool TrueForAll<T>(this IEnumerable<T> sequence, Predicate<T> test)
   2: {
   3:     foreach (T item in sequence)
   4:         if (!test(item))
   5:             return false;
   6:     return true;
   7: }

Note that this is typed for IEnumerable<T>, so the compiler will use this version on sequences that are not List<T>, but will still use the base framework version for all List<T> types.

And, now it produces the correct answers in under a minute.

Code from Lightning Talk: Compiler Tricks

My last Lightning Talk should how you can mix constructor calls and object initializers. It's a useful technique that can keep you from writing too much code you don't need.

To see what I mean, start with this simple person class:

   1: class Person
   2: {
   3:     public string FirstName
   4:     {
   5:         get;
   6:         set;
   7:     }
   8:  
   9:     public string LastName
  10:     {
  11:         get;
  12:         set;
  13:     }
  14:  
  15: }
  16:  

You can initialize a person using the new object initialier syntax:

   1: Person someone = new Person{ FirstName = "SomeOne", LastName = "Else"};

The compiler calls the default constructor, so the above line actual calls the default constructor. It's as though you've written this (which is valid as well):

   1: Person someone = new Person(){ FirstName = "SomeOne", LastName = "Else"};

That can be very useful if you later decide that you have some property of the object that should be immutable. For example, you might add a property for the social security number. People can change their name, but not the soc numbers. You'd modify the person class to look like this:

   1: class Person
   2: {
   3:     public string FirstName
   4:     {
   5:         get;
   6:         set;
   7:     }
   8:  
   9:     public string LastName
  10:     {
  11:         get;
  12:         set;
  13:     }
  14:  
  15:     public string SSN
  16:     {
  17:         get;
  18:         private set;
  19:     }
  20:     public Person(string ssn)
  21:     {
  22:         SSN = ssn;
  23:     }
  24: }

Now, you have to make use of the fact that you can explicitly call a constructor. You'd do something like this:

   1: Person someone = new Person("000-00-0000"){ FirstName = "SomeOne", LastName = "Else"};

The advantage is that the class writer does not need to create extra constructors to specify combinations of the different properties you'd like to set at initialization time. You can specify only the parameters that must be used to set immutable state, and let the caller specify whatever set of mutable properties using the object initializer syntax.

You can also make use of this when you create a collection. You can use the constructor to specify a capacity and still use the collection initializer syntax to add the elements:

   1: List<string> names = new List<string>(100) 
   2: { 
   3:     "string one", 
   4:     "string two" 
   5:     // 98 other strings elided.
   6: };
It's a nice little bit of syntactic sugar to apply at the right time.
Mads Torgersen keynoting at CodeMash

Dianne posted the official announcement to the CodeMash google group here, I'm happy to join the echo chamber and promote that Mads will be offering one of the keynote addresses at CodeMash.

 In the colloborative and lightheared spirit of CodeMash, Mads sent us this new bio:

Mads Torgersen is the Language PM for C# at Microsoft. He runs the C# language design meetings, takes the notes, maintains the language specification and that type of language lawyer activity. He also occasionally gets to go out and meet real customers, as well as some of the self-established customer spokespeople that populate events like CodeMash. Before joining Microsoft 3 years ago, Mads was almost a real academic, working as an Associate Professor at a Danish university and doing programming language research. During that time he ran a collaboration with Sun Microsystems to design and implement generic wildcards in Java.

This will be an exciting CodeMash.

Euler Problem 9

I was going to wait a bit to post this, but Patrick Steele posted his solution, so I figured I should post mine.

Patrick took the straight out brute force approach. I decided to think a bit differently and use the some LINQ code to generate my answer.

The problem is to find the only Pythagorean triplet {a,b,c} for which a + b + c = 1000.

The first step was to find a sequence of all triples {a,b,c} where a+b+c added to 1000. To save a little computing time, I made use of two important facts. First, in order to be a Pythagorean triplet, c must be greater than both a and b. Second, different orders of a and b don't matter: {3,4,5} would be the same as {4,3,5}. Therefore, I made the assumption that a must be less than or equal to b.

From those possible answers, I needed to find the only triplet that is a Pythagorean triplet. Here's the code:

 

   1: static void Main(string[] args)
   2: {
   3:     var possible = from A in Enumerable.Range(1, 333)
   4:                    from B in Enumerable.Range(A, (1000 - A) / 2 - A)
   5:                    let C = 1000 - A - B
   6:                    select new { A, B, C };
   7:     var answer = (from triple in possible
   8:                   where triple.A * triple.A + triple.B * triple.B == triple.C * triple.C
   9:                   select triple).Single();
  10:  
  11:     Console.WriteLine(answer);
  12:     Console.WriteLine(answer.A * answer.B * answer.C);
  13: }

There are two bits of LINQ goodness here that I haven't mentioned before.

First, there's the call to Single(). Single will throw an exception if the sequence does not contain exactly one answer. That helps me test: there must be exactly one.

Second, the C# compiler creates an override for ToString() in every anonymous type. That override writes the value of all the properties in the anonymous type. That's why I can simply use Console.Writeline(answer) to see the values of a,b,c.

It's just language goodness.

Euler Problem 8

It's time for another quick tour of LINQ and numeric problems.

Problem 8 asks you to find the largest product of 5 consecutive digits in a 1,000 digit number.

This can be broken into three different sub-problems.

The first is to convert a 1000+ character string (including line breaks) into a sequence of 1000 numbers.

The second is to find all 5 digit sequences in that filtered set of characters.

The final is to find the maximum product.

Here's my code:

   1: class Program
   2: {
   3:     const string massiveNumber =
   4:         @"73167176531330624919225119674426574742355349194934
   5: 3520312774506326239578318016984801869478851843
   6: 1560789112949495459501737958331952853208805511
   7: 0698747158523863050715693290963295227443043557
   8: 6648950445244523161731856403098711121722383113
   9: 9893423380308135336276614282806444486645238749
  10: 8907296290491560440772390713810515859307960866
  11: 2427121883998797908792274921901699720888093776
  12: 7333001053367881220235421809751254540594752243
  13: 4907711670556013604839586446706324415722155397
  14: 7817977846174064955149290862569321978468622482
  15: 2241375657056057490261407972968652414535100474
  16: 6370484403199890008895243450658541227588666881
  17: 7171479924442928230863465674813919123162824586
  18: 6458359124566529476545682848912883142607690042
  19: 9022671055626321111109370544217506941658960408
  20: 8403850962455444362981230987879927244284909188
  21: 0156166097919133875499200524063689912560717606
  22: 6116467109405077541002256983155200055935729725
  23: 6269561882670428252483600823257530420752963450";
  24:  
  25:     static void Main(string[] args)
  26:     {
  27:         int finalAnswer = 0;
  28:         var filteredSequence = (from c in massiveNumber
  29:                                 where Char.IsDigit(c)
  30:                                 select c).ToList();
  31:         for (int n = 0; n < filteredSequence.Count - 4; n++)
  32:         {
  33:             var sequence = (from c in filteredSequence
  34:                             select int.Parse(c.ToString())).Skip(n).Take(5);
  35:  
  36:             foreach (int val in sequence)
  37:             {
  38:                 Console.Write("{0}, ", val);
  39:             }
  40:             var answer = sequence.Aggregate(1, (product, value) => product * value);
  41:             Console.WriteLine(" => {0}", answer);
  42:             finalAnswer = Math.Max(finalAnswer, answer);
  43:         }
  44:         Console.WriteLine(finalAnswer);
  45:    }
  46: }

 

 

The first query removes the CR/LF pairs. I used the IsDigit() method to ensure that only numeric characters pass that filter.

Next, inside the loop, each consecutive 5 digit sequence gets generated.

Toward the end of the loop (line 40 above) uses the Aggregate method to generate the product. After that it's a simple call to Max to find the largest sequence.

Web site rule number one: Define your audience

A recent experience highlights how little anyone in any business thinks about software, especially web sites.

I was running out of checks in the home checking account. The handy-dandy reminder check came up. Normally, I would take that to the local branch and hand it to the teller. (I live in a small town, and the bank is just down the street.)

This time, the vendor wanted me to use the web, or an 800 number. I choose the web. They pointed me to www.<bankname>.com.

That site, like almost all company web sites, was designed as a marketing tool. There are pages upon pages on their product and service offerings, their competitive advantages, their strengths relative to their competitors and the like.

After several tries to find the page where I could order new checks, I tried their search functionality. Results 1-10 (of over 1,000) for "order new checks" did not give me any useful pages. I gave up and used the phone.

The core problem was that this vendor was mixing metaphors badly: a 'website' is a place on the web. With one part of their mind, they consider a website as a marketing tool. They tried to reach potential new customers. With another part of their collective corporate mind, they were trying to streamline communication with current customers. Those are two very different audiences, with very different goals. Would they have sent the same large envelope of brochures to existing customers and potential new customers? Of course not. It would cost money, and (more importantly) it wouldn't work. Yet this company, and many others would happily and unwisely use the answer "see our website" for everything. No matter the question, no matter the audience, the answer is "it's on the website."

Don't do that. It would be tragic to put everything in one place. None of the possible audiences would be able to find anything. That's exactly what this bank had done. "You can find it on the website" isn't an answer. Create different content areas for different purposes. And segregate those different purposes based on audience needs.

Euler Problem 7

Yes, it's true that I've been very lax in working on these problems.  It's been a combination of the day job, finishing all the editing tasks on "More Effective C#", and actually trying to keep the personal life intact.

The 7th problem is quite straightforward: find the 10,001st prime number. The main program is just two lines:

   1: static void Main(string[] args)
   2: {
   3:     var numbers = Common.Primes.GeneratePrimes().Skip(10000).First();
   4:  
   5:     Console.WriteLine(numbers);
   6: }

Simple LINQ stuff. The Skip() method skips the first N elements, and First() escapes the query and returns the first element.

Of course, the meat of the method is the GeneratePrimes() method:

   1: private static List<int> knownPrimes = new List<int>();
   2:  
   3: private static bool IsPrime(this int number)
   4: {
   5:     bool isPrime = knownPrimes.TrueForAll(n => number % n != 0);                   
   6:     if (isPrime)
   7:         knownPrimes.Add(number);
   8:     return isPrime;
   9: }
  10:  
  11: public static IEnumerable<int> GeneratePrimes()
  12: {
  13:     var range = Enumerable.Range(2, int.MaxValue-1);
  14:     var answers = from number in range
  15:                   where number.IsPrime()
  16:                   select number;
  17:  
  18:     return answers;
  19: }

GeneratePrimes() returns the sequence of primes. It uses the IsPrime() method to determine if a number is prime.  IsPrime() shows a couple of assumptions I've made to make this a reasonably simple method.  First of all, it assumes that all primes up to the number being examine have been found. Therefore, if the number has no prime factors smaller than itself, it must be a prime number. That saves quite a bit of work, because the set of primes smaller than N is much smaller than the set of numbers less than N.

To make this work, note that when a prime is found, it is added to the list of known primes.

Overload Resolution on Extension Methods

I had a question from a colleague the other day about extension methods.

I'm simplifying a little, but he was looking for a way to ensure that every source file used the same set of extension methods.

There is one way to enforce this, but it does have an effect on your code organization.

Section 7.5.5.2 of the C# language specification describes how the compiler determines which extension methods are in scope, and which extension methods take precedence over others.

This extract is key:

The search for (a candidate extension method) proceeds as follows:

ยท Starting with the closest enclosing namespace declaration, continuing with each enclosing namespace declaration, and ending with the containing compilation unit, successive attempts are made to find a candidate set of extension methods:

o If the given namespace or compilation unit directly contains non-generic type declarations Ci with eligible extension methods Mj, then the set of those extension methods is the candidate set.

o If namespaces imported by using namespace directives in the given namespace or compilation unit directly contain non-generic type declarations Ci with eligible extension methods Mj, then the set of those extension methods is the candidate set.

That means if you place extension methods in the same namespace as the code from which you call them, those extension methods win. Furthermore, extension methods in an enclosing namespace will be preferred over other namespaces that have been imported.

Therefore, if you place those extension methods that you want to have as part of your global library in the same namespace as your application, or in a parent namespace of that application, you'll be fine.

Note that you still need to import that namespace into each compilation unit, in order to make those methods accessible.

Search

Go

Blog Group Links

Nascar style badges