March 2008 - Posts
Here are my notes on the third Project Euler problem. The problem is "What is the largest prime factor of the number 600,851,475,143?"
Once again, this is a problem that is best solved creating some simple LINQ queries. From the outside, this query generates the list of all prime factors in the very large number:
long largeNumber = 600851475143;
var allPrimeFactors = from p in Primes.PrimeFactors(largeNumber)
orderby p descending
select p;
foreach (var f in allPrimeFactors)
Console.WriteLine(f);
As with the other first two problems, the query is fairly simple. Just grab the prime factors, and order them in descending order.
The work is in the Primes class. This class uses tail recursion to find all prime factors. It starts with the smallest prime number (2), and removes all copies of that number as a factor. Once all those are returned, it will look at the next larger factor and remove those.
A couple quick notes:
1. The algorithm, as written, must start with the smallest prime number. Otherwise, it will find non-prime factors.
2. It doesn't matter that MorePrimeFactors looks at possible factors that aren't prime. They won't appear because any smaller prime factors have already been removed. (for example, 6 won't appear because any factors of 2 or 3 have already been removed).
public static class Primes
{
// Find all prime factors.
public static IEnumerable<int> PrimeFactors(long number)
{
// Start by removing the lowest prime (2)
return MorePrimeFactors(number, 2);
}
// This recursive method finds all prime factors.
private static IEnumerable<int> MorePrimeFactors(long number, int factor)
{
// Find the next prime factor
while (number % factor != 0)
factor++;
// Return it.
yield return factor;
// look again...
if (number > factor)
// recursively look for this factor again, using Num/factor
// as the new big number
foreach (int factors in MorePrimeFactors(number / factor, factor))
yield return factors;
}
}
Note that the PrimeFactors algorithm is not linq based, but it does rely heavily on custom enumerators.
I kept looking for an updated version of CopySourceAsHTML (http://www.jtleigh.com/people/colin/software/CopySourceAsHtml/) that would work with Visual Studio 2008. (which is why some of my recent code snippets don't look the way I want).
While Colin & J.T. Leigh did not release any updates, Guy Burstein did. You can get a new version here: http://blogs.microsoft.co.il/blogs/bursteg/archive/2007/11/21/copy-source-as-html-copysourceashtml-for-visual-studio-2008-rtm.aspx
The second Euler problem asks you to find the sum of all even valued terms in the Fibonacci sequence which do not exceed 4,000,000.
Let's look at the solution from the outside in. Here's the query that finds the answer:
var EvenFibNumbers = (from n in Enumerable.Range(1, MAX)
let value = FibonacciSequence.Fibonacci(n)
where value % 2 == 0
select value).TakeWhile(n => n < MAX);
var answer = EvenFibNumbers.Sum();
There are a couple notes on the performance metrics used here:
The TakeWhile() method call does short circuit the query so that it does not continue calculating unneeded numbers. I could optimize this by trying to be very careful about which Fibonacci numbers I calculate, but it really doesn't matter. TakeWhile means I don't calculate them anyway. (Well, to be strictly accurate, I do calculate one more than I need, but that's no big.
Next, note the "let" statement in the query. I use that to avoid calculating the Nth Fibonacci number more than once. It just introduces a new local variable (value) and assigns it to the result of Fibonacci(n).
On this algorithm, there are some other interesting changes you could make. If you examine the Fibonacci sequence carefully, you'd see that every 3rd Fibonacci number is even. You could, possibly, save some work by using that fact. Because of how the Fibonacci sequence is calculated, I don't think it matters. (Comment if you have a way to optimize it that I'm not seeing).
What's more interesting, to me, is how I wrote the Fibonacci sequence. I used a mix of Functional and Object Oriented techniques. (For a more functional approach, read Dustin Campbell's article on memorization: http://diditwith.net/PermaLink,guid,7191176b-c72a-49db-986e-466845665fa1.aspx)
public static class FibonacciSequence
{
// Store all Nth Fibonacci numbers that have been calculated.
private static Dictionary<int, long> FibnocciNumbers = new Dictionary<int, long>();
// Return the Nth Fibonacci number.
public static long Fibonacci(int n)
{
long answer;
// If it's already been calculated, just return it.
if (FibnocciNumbers.TryGetValue(n, out answer))
return answer;
// The first and 2nd numbers are 1.
if (n < 2)
answer = n;
else
// It's the sum of the previous two numbers.
answer = Fibonacci(n - 1) + Fibonacci(n - 2);
// Store...
FibnocciNumbers.Add(n, answer);
// return
return answer;
}
}
You'll note that this class uses a recursive method to calculate the Nth Fibonacci number. However, it will only calculate each number in the sequence at most once. Once a calculation has been performed, the answer is stored in a private dictionary for later use. Rather than try and twist your brain around many of the more advanced Functional programming concepts, using C# 3.0, you can mix good old fashioned oo programming with good old fashioned functional programming. Cool!
Stay tuned for more on Friday, when I write up the notes on problem 3.
I am starting a new series. A friend recently pointed me at the Project Euler problems. You can see the project here: http://projecteuler.net/
As I finish them, I'll post an explanation of the problem and the code here. I've also posted the code on MSDN Code Gallery. As I finish a week's worth of code, I'll post a new release of source code there.
A few ground rules:
- I'm not going for the 'best' or 'most efficient' solution. I'm using this to demonstrate significant C# 3.0 features. If you have a different preferred algorithm, post it yourself. In fact, you can register at Project Euler and post your own code.
- I will post blog entries more or less at the same time I update the code. You might see the code before I discuss it. If you don't want spoilers, don't do it.
- I will discuss the algorithms and the answers freely. If you want to figure out the problems yourself, don't read too far before you try it yourself. Or, once again, signup at Project Euler and try yourself before I post my answers.
- I will do the problems in order, so if you're interested in avoiding spoilers, you can work in order, and stay ahead of where I'm at. (The first three problems took me less than 2 hours from start to finish).
Ok, with that out of the way, here's the code for the first problem. (Really, it's not that hard. I'm not giving much away here). In the first problem, you need to find the sum of all natural numbers below 1000 that are multiple of 3 and 5.
It's one (or two) lines of code in C# 3.0:
var nums = from n in
Generators.NumberSequence(0, 1000)
where ((n % 3 == 0) || (n % 5 == 0))
select n;
var answer = nums.Sum();
I should also show you the code for NumberSequence. I figure I'll need sequences of numbers fairly often in this projecxt, so I pulled that into a library:
public static IEnumerable<int> NumberSequence(int from, int to)
{
do
{
yield return from++;
} while (from < to);
}
Yes, this method assumes that from is less than to. Seems like a reasonable assumption at this time.
You can download the code for the first three problems here: http://code.msdn.microsoft.com/EulerCSharp
Note that I said 3 problems. I uploaded the code for the first three problems on Project Euler, so you may want to wait if you want to avoid spoilers.
My alma mater, The University of Illinois, just announced a joint venture with Microsoft, Intel, and the University of California at Berkeley which will enable commodity computer systems to make use of parallel computing techniques previously in the exclusive realm of super computers. You can read more here: http://www.upcrc.illinois.edu/news_031808.html
Wicked cool.
That, combined with reseach by compiler teams to make parallel computing more accessible is going to lead to a wild future.
Dave Donaldson asks "Do you write software at homer like you do at work?"
Dave wonders if we use the same discipline writing software at home (for our own use) that we do at work. I think the question is great, but I'm concerned about those developers that answer yes without thinking. In fact, should you use the same tools at home that you use at work? I'll answer all his questions for my own development, and I'll explain why I make the choices that I make. Here are the questions:
- Do you have a source code repository?
[BW] Yes. I make mistakes and I want to backup.
- Do you use it?
[BW] Yes.
- Do you still check-in many times while coding (like hopefully you do at work), or do you wait and do one massive check-in before you go to bed?
[BW] Somewhere between those. This will become more apparent as you read on, but think about why you checkin often at work. Does someone else need your work? Not integrating (via checkin) often means that different branches (each developer's working copy) get farther away from each other.
- Do you have a task management system?
[BW] Yes. I use the same notecards we use at work.
- Do you have a bug tracking system?
[BW] bugs = tasks, therefore they go on cards.
- Do you assign bugs a severity level?
[BW] Yes. But, it's a simple one. "Fix it now", "Fix it later", "Do this when I have nothing better to do".
- Do you set milestones?
[BW] That depends. Why am I doing the development? What's the pain? The milestones (if any) depend on why the work is being done.
- Do you hold yourself to them?
[BW] Related: If I was writing a Bracket Management system now, I've got a real external deadline approaching. It's immovable. However, if I'm just experimenting, no such external pressure applies.
- Do you use the same types of patterns?
[BW] Sure.
- Do you apply a separation of concerns?
[BW] As often as I do at work. Decoupling has both benefits and costs. Which is greater in this instance? (The same analysis should apply at work).
- Do you write clean code at home, or is it a mess just to get it done?
[BW] Habits are permanent. You have to follow the same practices at home, or the bad practices will bleed through.
- Do you write tests?
[BW] yes.
- Do you run them?
[BW] yes.
- Do you have a build server?
- Do you use it?
- Do you practice continuous integration?
[BW] No on all three. There's a reason. Why do you use these tools at work? I'll submit that it is to facilitate collaboration among the team. With a team of one you don't have the same benefits. A separate build server does ensure that I haven't forgotten to add files to the repository. Visual SVN does the same thing. So does delete / update / build (make a backup first). I have multiple development machines, and I use online folders to help me keep them in sync. That means I'm not concerned about a catastrophic failure on my desktop losing work. (See 'check in often' above). Continuous integration is the same. By definition, in a one person-development effort, if it builds on my machine, it builds on the build server. Running my tests on every build gives me the same results.
Now, I will submit that if I think of the CI server as an automated test server rather than as an automated build server, I start to think about possible benefits. Every checkin runs the test(s) in the background, and I'm continuing to work. But, that changes my habits. At work, I write tests, write code, see tests pass, check in. At the moment, without a CI server, I do the same thing. If I moved to using a CI Server (thinking it is the test server), my workflow might change to write tests -> check in (see CI build fail) -> write code -> Check in -> repeat until CI build passes (meaning all the tests pass). I don't think I want to adopt that process in my work environment. I'd likely break the build repeatedly every day. That's not good
My purpose in answering Dave's question is to get more people thinking. Underlying Dave's post are some values. If you follow bad habits in less structured environments, those same habits will creep into your more structured environments. Practice doesn't make perfect, practice makes permanent. Every time you break your own rules, you'll reinforce bad habits. That's especially true when no one else is watching you. Countering that is an understanding of why you follow certain processes. If the justification for a particular process doesn't apply, neither should the process. This is true at home, at work, and at other work locations. As I said above, I don't have a separate build server at home. I don't see the benefits on a single person project. At home, and at our office (< 15 developers) we use note cards to track tasks and bugs. That's a low-friction and reasonable answer for a small shop that's located in the same office (although folks do telecommute often, that's what phones are for.) I'm fully aware that the notecards don't scale to a large, or more distributed environment. That doesn't mean I'm against other tools, just that I don't believe they would be appropriate for us now.
By way of analogy, I use a basic claw hammer to hang a picture. That's the best tool for the job. When I'm helping to build sets, and we need to build several platforms and flats, I use the air hammer. Both choices are correct. It would take more time to setup the compressor, uncoil the hoses, get the compressor going and so on… I'm done using the hand tool before I even get the air hammer turned on. Of course, I can finish a whole platform in less time (including setup) using the air hammer. It's a matter of evaluating the goals, and picking the write tool for the job. The analogy still works for those practices where I use my normal work structure. Even for hanging the picture, I get the hammer. I don't use the nearest heavy object (like my wife's crystal candlestick). The cost of using the wrong tool (as opposed to the right lightweight tool) is rather high.
Jeff Beehler wrote this post clarifying some questions on VSTS licensing as it relates to build environments.
There is a common misconception (that I shared) that your build machine needed a licensed copy of VSTS team developer. In fact, that's not correct. Jeff's bottom line is refreshing in its simplicity:
"The way I think about it, the people that are using the Team editions need to be properly licensed which in turn ensures the that the build machines are covered as well."
He references the VSTS licensing whitepaper (warning: it's dry as dust) located here. I did look at the whitepaper, and and that does lead me to wonder about some scenarios. The whitepaper describes how the build process works in a Team Server environment (p. 9). What's not clear to me is if this same statement is true when developers are using Team Developer (or VS2008 Pro) to create unit tests, but not using TFS as the reposititory and ALM solution. (For instance, can SourceVault customers use this scenario? Or, because they aren't using TFS Server, do they need to have the appropriate developer SKU on the build machine?
Update: As Jeff comments below, even in the scenario I outlined above, if the developers have the license to create the tests, the build server can run them. Cool.
Over the last week, we've had an interesting discussion over what podcasts any of us listen to (or watch) to improve our skills as developers / software people / business people. It seemed interesting enough to share with others, so here's my list:
.NET Development
.NET Rocks: DNR is now twice a week. Carl and Richard cover quite a few different topics, and I almost always learn something. This podcast makes me reach for the keyboard and want to code.
Hanselminutes: Scott's is a mix of hard code tech stuff, processes, emerging (and not yet emerging) tools, and best practices. As much or more than any other developer podcast, this one makes me think.
On Microsoft: This video podcast features interviews with authors on the Pearson Education imprints (Addison-Wesley, Prentice Hall, Cisco Press, Exam Cram, IBM Press, Que Publishing, and SAMS Publishing). Full disclosure: Effective C# is an Addison Wesley book). This podcast gives a really good overview of new and upcoming technologies and tools coming from Microsoft.
Software in General
The Java Posse. This is a weekly news cast on developments happening in the java development space. Even though I do very little professional development in Java (or on any other language on the JVM), it's still an important part of the development space. My only complaint on the Java Posse is that they do have a noticeable anti-Microsoft bias. That bias, combined with a lack of knowledge about newer developments in the .NET space leads to some technical errors when they compare the .NET ecosystem to the Java / JVM ecosystem. But, they are honest about that bias, so if you take those items with a grain of salt, there is a lot of information in the podcasts.
On Software: This is another video podcast from Pearson. Discussions range to anything of interest to software developers: C++, Java, C#, Unit Testing, F#, continuous integration, tools, Agile methods, etc. All discussions are with known and respected experts in the field.
On OpenSource: Same source, same interview format. Discussions with open source developers. I watch most of these, but some of the projects (like Linux Kernel Development) cover topics of which I have no interest. I skip those.
On Security: Same source. Here I listen to the developer-centered interviews that discuss how to create software that is more secure, and more capable in secure environments. I often skip the interviews that are on network admin topics.
Business and Technology:
Stanford Entrepreneurial Thought Leaders: During the school sessions, this podcast features lectures and interviews with some of the most impressive leaders in the software and entrepreneurial business areas. You'll learn how folks have found and served new markets, created new businesses, and how some folks have grown several businesses over their careers.
Channel 10: Short videos about different items that are happening around Microsoft. Some technology, some business, some (not really funny) humor. They are short pointers to tech ideas that you might want to learn more about.
Women in Technology: It's about the technology, not the women. These are short interviews with some of the top people in the industry, sharing what they've done. My only complaint is that it's obviously recorded over the phone, and often the quality is rather poor.
Humor:
Mondays: Nerd humor. Not safe for work (although we certainly would allow it in our office, because we're nerds and we're humorous.)
R3TV: CBC Radio 3 TV. Not sure how to describe this. It's a mix of humor, and new Canadian Independent music. Where
Music:
CBC Radio 3: There are 5 podcasts from CBC Radio 3. I've subscribed to all of them: Track of the Day, CBC Radio 3 Podcast with Grant Lawrence, the R3-30, Radio 3 Sessions, and Special Edition. It's a northern thing. Us in the upper Midwest thrive on Hockey Night in Canada, CBC's Olympic coverage (did you know there are Olympic sports in which American's don't' win medals?) Some don't come out as often, so it's not as crazy as it sounds. Besides, where else am I going to get my fix of "Puck Rock"?
The Edge Videocast: A monthly videocast of new music videos. I save some, I delete some immediately.
IndieHeart podcasts: There are two podcasts here. Heart of the Night, and the Transatlantic Acoustic Show. Both will give you a taste of independent music from the middle of the US.
All Songs Considered: The Podcast version of NPR's All Songs Considered. I skip some, because Bob Boilen can be a bit annoying when he goes on about the music of his youth. But still, the music is often very good.
Detroit Jazz Stage: Detroit has an amazing Jazz scene. This will give you a taste of what's happening in this region. Also, I find that instrumental jazz is the best music to listen to while writing.
The Jazz Suite: Another weekly instrumental Jazz music.
DeadPod: Last, but clearly not least, DeadPod. Nothing could be better than a fix of dead music.
There you go, one person's long, strange trip through the podcasting universe.
The world-famous Carl Franklin invited be to be on DNR TV again. The episode is live here: http://dnrtv.com/default.aspx?showID=105
We discuss generics in C#, and how you can use generics to create more reusable block of code.
We go from fairly obvious generic uses like collections and EventArgs<T> all the way to how generics provide some of the underpinnings for C# 3.0.
Behind the scenes notes: Any of you that listen to Carl's podcasts know how much work he puts in to ensure quality production. You may not know how much work it really takes. The morning of our recording, woke up to find the house a chilly 62 degrees (F). Yup, the furnace had stopped working overnight. (I don't know why, but furnaces always seem to stop working in the middle of the night). After getting the kids to school, we called the furnace guys, and they were here by 9:00. We needed a complete new blower motor. Of course they came back with the new motor at the moment Carl and I started recording. Even though I could hear the banging as the guys mounted the motor on the furnace, none of that comes through in the recording. Although I think you can hear my teeth chattering while I was coding in 57 degree temp. The heat finally came on about 40 minutes into the recording.
One of our goals when we create software is to minimize the coupling between different components. By minimizing coupling, we make it easier to create different building blocks that can be used by more different types, in more different ways, and in more different combinations.
There are many different ways to get two (or more) components to work together. You can use inheritance, interface implementation, delegates, file transfer, object composition, message passing (WCF, or other protocol), events (yes, I know those are similar to delegates), and probably others I'm not thinking about. You can also add more flexibility by creating generic versions of these.
At one end of the spectrum you have tightly coupled components. Inheritance is a very tightly coupled relationship between base classes and derived classes. Think about how much changing a base class affects any and all derived classes. Add a new abstract method, you'll break every derived class. Add a method, you may break any derived class. Remove methods, change method names, change parameters. Something will likely break. The base classes and the derived classes are tightly coupled. It's also limiting. In .NET, only one base class is allowed. If you design your component to require a particular base class, you force all clients to derive their classes from your base class. It's very limiting for their designs.
At the other end of the spectrum are paradigms like LINQ, where you pass anonymous delegates around like beer at a frat party. There is much more freedom. Your code needs to define methods that match certain signatures. You can implement (or not implement) any interface. You can change base classes. You can use static methods, instance methods, anonymous methods, whatever.
It's clear that there is no one best answer all the time. But, where on the continuum should you be? And, how often should you move far toward one end or the other?
It's clear to me that most C# (and VB.NET) developers understand inheritance, and classic interfaces. Generic interfaces seem to be a bit fuzzier for some developers. Events seem reasonable to most folks. Delegates are a bit fuzzier. Generic delegates a bit more fuzzy. Anonymous delegates are where more people are still learning. Does that affect the right answer for a given problem?
At what point do you sacrifice the loosest possible coupling in return for a clearer (or easier to use) API? Is it ever in fact better to choose tighter coupling in return for clarity?
I admit that this didn't have the technical meat that I try to put in my posts. However, that's because I'm interested in your thoughts on this. I'll post mine in a few days,