Is F# faster than C#
Description
I had a bet with a co-worker that C# would outperform F# for a simple counting exercise. This turned out to be quite a surprise for many reasons. In this blog post I will try to explain why.
Problem
Sum the numbers from 1 to n using brute force (can't use the (1 + N)*(N/2) equation). Do this in both C# and F# and compare the speed. Since F# needs to do this recursively I thought this should be a no contest win for C#.
Using F#
We initially wrote the code without using tail recursion which immediately ended with a stack overflow. So we went back and wrote it doing it the functional way, with tail recursion. Here is what we came up with.
<span class="codeKeyword">module</span> FSharpLib
<span class="codeKeyword">let</span> sumNum fromNum toNum =
<span class="codeKeyword">let</span> <span class="codeKeyword">rec</span> sumNumTail fromNum toNum result =
<span class="codeKeyword">if</span> fromNum > toNum then
result
<span class="codeKeyword">else</span>
sumNumTail (fromNum+1L) toNum (result+fromNum)
sumNumTail fromNum toNum 0L
Using C# - Attempt 1
You'll see later why I say attempt 1. But, this is the simplest C# code I could think of.
<span class="codeKeyword">public</span> <span class="codeKeyword">static</span> <span class="codeKeyword">long</span> CSharpForLoop(<span class="codeKeyword">long</span> fromNumber, <span class="codeKeyword">long</span> toNumber) {
<span class="codeKeyword">long</span> result = 0;
<span class="codeKeyword">for</span> (<span class="codeKeyword">long</span> i = fromNumber; i <= toNumber; i++) {
result += i;
}
<span class="codeKeyword">return</span> result;
}
The Results - Attempt 1
This is the results of running the above code using a fromNumber of 1 and a toNumber of
10000000 and running it 10 times alternating between C# and F# to filter out start-up times, etc.
| F# | 0.8606902s | 11% faster |
|---|---|---|
| C# | 0.9556198s |
I ran it like 20 times because I couldn't believe it. I looked at the C# code for any possible optimizations I could add. But, being so simple there wasn't a lot I could do.
Reflector is an amazing tool
I turned to my good friend Reflector to see what was going on. Here is what the F# code looked like when viewing it as C# code.
<span class="codeKeyword">public</span> <span class="codeKeyword">static</span> <span class="codeKeyword">long</span> sumNum(<span class="codeKeyword">long</span> fromNum, <span class="codeKeyword">long</span> toNum)
{
FastFunc<<span class="codeKeyword">long</span>, FastFunc<<span class="codeKeyword">long</span>, FastFunc<<span class="codeKeyword">long</span>, <span class="codeKeyword">long</span>>>> sumNumTail = <span class="codeKeyword">new</span> clo@7();
<span class="codeKeyword">return</span> FastFunc<<span class="codeKeyword">long</span>, <span class="codeKeyword">long</span>>.InvokeFast3<<span class="codeKeyword">long</span>, <span class="codeKeyword">long</span>>(sumNumTail, fromNum, toNum, 0L);
}
and clo@7.Invoke looks like
<span class="codeKeyword">public</span> <span class="codeKeyword">override</span> <span class="codeKeyword">long</span> Invoke(<span class="codeKeyword">long</span> fromNum, <span class="codeKeyword">long</span> toNum, <span class="codeKeyword">long</span> result)
{
<span class="codeKeyword">while</span> (fromNum <= toNum)
{
result += fromNum;
toNum = toNum;
fromNum += 1L;
}
<span class="codeKeyword">return</span> result;
}
They used a while loop instead of a for loop. OK, I can do that. But, what happened to tail, I thought that tail is why the CLR is so good at functional language implementations.
C# - Attempt 2
<span class="codeKeyword">public</span> <span class="codeKeyword">static</span> <span class="codeKeyword">long</span> CSharpWhileLoop(<span class="codeKeyword">long</span> fromNumber, <span class="codeKeyword">long</span> toNumber) {
<span class="codeKeyword">long</span> result = 0;
<span class="codeKeyword">while</span> (fromNumber <= toNumber)
{
result += fromNumber;
<span class="codeComment">// I left out the toNum = toNum because it didn't do anything.</span>
fromNumber += 1L;
}
<span class="codeKeyword">return</span> result;
}
and the results (I've got you now F#)...
| F# | 0.8606902s |
|---|---|
| C# For Loop | 0.9556198s |
| C# While Loop | 0.9105592s |
Oh wait... DAMN! What could F# be doing to make it almost 10% faster than C#.
To the IL code Batman
This is the IL for the F# code.
L_0000: nop <span class="codeComment">// while loop</span>
L_0001: nop
L_0002: ldarg.1 <span class="codeComment">// fromNum <= toNum</span>
L_0003: ldarg.2
L_0004: ble.s L_0009
L_0006: nop <span class="codeComment">// return result</span>
L_0007: ldarg.3
L_0008: ret
L_0009: nop <span class="codeComment">// result += fromNum</span>
L_000a: ldarg.1
L_000b: ldc.i8 1
L_0014: add
L_0015: ldarg.2 <span class="codeComment">// fromNum += 1L</span>
L_0016: ldarg.3
L_0017: ldarg.1
L_0018: add
L_0019: starg.s result <span class="codeComment">// loop</span>
L_001b: starg.s toNum
L_001d: starg.s fromNum
L_001f: br.s L_0000
This is the IL for the C# while loop.
L_0000: nop <span class="codeComment">// long result = 0L</span>
L_0001: ldc.i4.0
L_0002: conv.i8
L_0003: stloc.0
L_0004: br.s L_0012 <span class="codeComment">// while loop</span>
L_0006: nop <span class="codeComment">// result += fromNumber</span>
L_0007: ldloc.0
L_0008: ldarg.0
L_0009: add
L_000a: stloc.0 <span class="codeComment">// fromNumber += 1L</span>
L_000b: ldarg.0
L_000c: ldc.i4.1
L_000d: conv.i8
L_000e: add
L_000f: starg.s fromNumber
L_0011: nop
L_0012: ldarg.0 <span class="codeComment">// fromNumber <= toNumber</span>
L_0013: ldarg.1
L_0014: cgt
L_0016: ldc.i4.0
L_0017: ceq
L_0019: stloc.2
L_001a: ldloc.2
L_001b: brtrue.s L_0006 <span class="codeComment">// loop</span>
L_001d: ldloc.0
L_001e: stloc.1
L_001f: br.s L_0021
L_0021: ldloc.1
L_0022: ret
What just happened.
Wow, the C# compiler is quite a bit more verbose. First of all, C# makes the while loop into a do/while loop. Secondly, in the loop condition C# creates a temporary variable to store the comparison and then checks for 'true' ("brtrue.s"). If you look at the F# IL you'll see the use of the "ble.s" instructions (branch if less or equal) instead. Finally we see that C# uses a few "conv.i8" instructions (convert to 8 byte/64-bit int) which I can only guess adds to it's defeat.
Conclusion
So is F# just a more performant language or is C# just compiling to more robust IL code which as F# matures will end up suffering the same fate? I personally think (hope actually) the later is true. I'm not an expert in compilers but I would have to assume the C# compiler team is using every trick in the book to make highly performant IL code without sacrificing robustness and I can only assume that F# will get to a point where it needs to make the same concessions to handle some obscure cases.
Very nice article! I think you are dead on though with your assumption that once F# is fully integrated into VS 2010/.NET 4.0 it might succumb to a very slight performance hit.From what I understand right now in the CTP its not fully integrated inside the .NET 4.0 CLR.
By looking at the IL you've presented it looks like you were running both samples in debug mode. Did you try release mode?I played with the C# code you have above and found the following:In debug mode, both the for loop and while loop versions perform roughly the same.In release mode, the while loop performed ~1.4x faster than the debug version, and the for loop performed ~3.6x faster (for loop was ~2.5x faster than while loop).I have never used F#, so I can't give any results regarding debug/release mode differences.But you may want to run your tests again to see if there are any differences (Maybe you didn't lose the bet after all... maybe).
My thoughts are if both versions are equivalently "correct", then C# is simply at fault for compiling to less optimized code. That is to say that there's no excuse why complexities elsewhere in the language mean that it has to create a poorer compilation as compared to the F# compiler.Personally I'd rather see the C# compiler become properly optimized in comparison to the F# compiler, not F# become slower. Don't know why you would "hope" for the latter. *rolls eyes*Interesting post, nonetheless. Looking forward to fiddling with F#.
F# is based on OCAML which was itself known for good performance. It allowed programmers to easily implement algorithms that would run at near-native speeds. It was even used with good success in some national programming competitions. The entire F# compiler has, of course, been developed from the ground up to output IL, but the language lends itself well to optimized output code.
How about comparing to a LINQ implementation? Good or bad, I'd be interested in seeing the results :)
"I would have to assume the C# compiler team is using every trick in the book to make highly performant IL"Are you kidding? The C# compiler does far less stuff, as you just saw right here. The F# compiler determined the code was a loop, and compiled it so. (It didn't need to use the tail prefix 'cause it did even better.)I'm not saying the C# compiler doesn't optimize, but it tries to keep a direct mapping from source to IL. (I think this is even sort of a goal for the C# compiler).I think I heard that F# will do other things, like when creating a closure, it could pass captured values in as arguments instead of a closure object+fields.Second, can you explain why you think the C# IL is more "robust"?Anyways, it looks as if you compiled with debug (all the nops), so this isn't a very good comparison after all...
I looked at the output of the same code in managed C++ in release and found two issues: (1) the compiler is making some fairly egregious mistakes in terms of doing redundant loads and store and (2) for some bizarre reason is loading 32 bit constants and converting them to 64 bits.The JIT, however, does a fairly decent job of turning the IL into x86 and nearly everything is in registers.I'm impressed with the F# compiler - that's pretty nice IL for the task.
This looks like unoptomized code with all those nops..
Execution time of Fibonacci sequence in C# and F# 4.0 from 0 to 39 (Console application run outside Visual Studio).F# method "let rec fib n = if n > 2 then fib (n-1) + fib (n-2) else if n > 0 then 1 else 0"C# method 1 (lambda method): "fib = n => n > 2 ? fib(n - 1) + fib(n - 2) : (n > 0 ? 1 : 0);"C# method 2 (static method): "if (x > 2) return Fibonacci(x - 1) + Fibonacci(x - 2); else if (x > 0) return 1; else return 0;"--- DEBUG ---F# takes 2.722 seconds
C# takes 3.554 seconds (lambda)
C# takes 3.317 seconds (static)--- RELEASE ---F# takes 1.260 seconds
C# takes 1.793 seconds (lambda)
C# takes 1.250 seconds (static)Consusion: C# is slightly faster in release mode.
The C# compiler does not aim at high optimized IL code. It only does some trivial optimizations, like unreachable code removal and removal of unused variables. It is the JIT compiler, which s doing more aggressive optimizations when generating the machine instructions. So your comparison should not stop at the IL level. Especially for an analysis of execution speed deviation, you should rather look at the disassambly. However, in order to do so, you will have to run in release mode without having any debugger attached or the JIT will refuse to optimize properly. At the end, I suppose, C# will slightly outperform F# but haven't tested it for your specific task.