To answer all the questions I received yesterday, I will make some clarifications.
First of all, one of the jobs where I use Erlang is distributed programming. More specifically, I am building an application that changes a lot (I am still developing some of its algorithms, which is where the desire to use a functional language comes from) and that needs to perform a humongous number of floating point operations. Yes, C is a better language for this, Octave is very good for doing math, and MPI is what academia uses, BUT I decided to use Erlang for this job, even though I had read some articles about the limitations of its floating point operations.
So I needed to optimize Erlang's floating point precision in some way. To the reader this might look like nonsense. Why use Erlang? Answer: because it is easy to learn, it is a functional language, you can think in higher-order constructs (well suited when you are doing a proof of concept), you have a concurrency model at your disposal (communication is done through message passing), and you think more and code less (you encode more in your head and write less). And if someone asks me why I didn’t use the Abstract Factory, Object Pool, Singleton, Adapter, Composite, Decorator, Flyweight, Iterator, or Memento patterns, I can answer with satisfaction: this is functional programming, not OOP, so these patterns are provided by the language constructs.
Therefore I looked for ways to adapt my development environment to what I need. Furthermore, let me point out that Slackware users have to compile almost everything themselves. One could argue that they want to squeeze every bit of performance out of their system.
The following question comes up: “If I compile the Erlang VM myself, will the performance of my system increase?” It depends. I am sure the Erlang team does its best to provide the highest performance while maintaining portability, by not making any assumptions about the underlying hardware. And usually, portability is more important than performance. Besides, what do you think would happen if some software developer made a program that reached its highest performance only on the Intel platform? Do you think AMD and Sun would be happy? Or vice versa: Sun’s Niagara vs. Intel’s processors.
This is the user’s job: to adapt what is provided to them (for free) to their needs. And so I did, and so should everybody. Do you have a server motherboard? If yes, did you know that it has I/O processors? Would somebody develop a “read call” version that specifically uses the instructions of the I/O processor? It seems not; the usual line is “I/O in Erlang is slow”, but what can you do about it?
So the least I could do was to compile Erlang with Intel’s compiler, telling it to use a stricter model when performing divisions and square roots, and to speed up floating point operations (with the trade-off leaning towards precision) and memory transfers.
So, if you use Erlang for I/O-bound operations of some sort (parsing XML, using it as a gateway for some sort of services), this kind of optimization will not improve performance at all. You should look for other types of optimizations.
Regarding the tests I performed, the comparison was made against a build with CC=gcc and no other arguments provided. In that case ./configure uses the optimization level provided by the Makefiles (-O3, -O2). Please note that it is not always wise to optimize everything; unrolling loops, for example, increases code size. So the command
icc -gcc-version=420 -mssse3 -ip -par-threshold=50 -fp-model strict -prec-div -prec-sqrt -pc80 -opt-streaming-stores always -par-runtime-control -par-schedule-static-balanced -static-libgcc -static-intel
can be interpreted as follows:
- “I will provide you” parameters that correspond to gcc version 4.2 (-gcc-version=420). Again, please note that I do NOT give optimization directives such as -O3 or -O2, firstly because they are specified in the Makefiles, and secondly because I think the Erlang team has performed extensive profiling to see where optimization enhances performance and where it does not.
- “If you can”, please increase the precision of sqrt and div (-fp-model strict -prec-div -prec-sqrt -pc80)
- “Use the existing resources as well as you can” (-mssse3)
- “My threads are memory bound”, and each of them has its own work to do (do not steal data from neighboring threads, just do your work!)
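For readers who want to try the same thing, here is a minimal sketch of how these flags could be handed to Erlang’s autoconf-based build. It assumes, as with the CC=gcc baseline, that everything goes through CC, so that the -O level still comes from the Makefiles. The helper name build_erlang_icc and its source-directory argument are hypothetical, and the sketch has not been checked against a particular OTP release.

```shell
#!/bin/bash
# Compiler flags from the command above, each listed once. No -O level
# is given: that is left to the optimization level in Erlang's Makefiles.
ICC_FLAGS="-gcc-version=420 -mssse3 -ip -par-threshold=50 \
-fp-model strict -prec-div -prec-sqrt -pc80 -opt-streaming-stores always \
-par-runtime-control -par-schedule-static-balanced -static-libgcc -static-intel"

# Hypothetical helper: configure and build an unpacked Erlang/OTP source
# tree (first argument, default: the current directory) with icc.
# Assumption: the flags travel inside CC, mirroring the CC=gcc baseline.
build_erlang_icc() {
  (
    cd "${1:-.}" || exit 1
    ./configure CC="icc $ICC_FLAGS" && make
  )
}
```

Usage, from a Bash prompt: build_erlang_icc path/to/otp_src. Keeping the flags inside CC rather than CFLAGS is a deliberate choice in this sketch: setting CFLAGS yourself could replace the default optimization flags the build would otherwise use, which would defeat the point of not passing -O levels.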
Last but not least, “Open a Bash console …”: yes, I really meant it, and I am not insulting anyone’s intelligence. This might be a crucial detail. I remember that some time ago, while working on a project, I was setting the path to a program using Bash syntax while I was actually in a C shell, and I didn’t know what was going on (I know the prompt is different, but somebody had played a joke by recompiling it from source with an altered prompt). Secondly, if you search Google, you will see that not setting the path, or setting it incorrectly, is a common mistake.
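A minimal Bash sketch of the “set the path the Bash way” step follows. The install directory /opt/intel/bin is an assumption (adjust it to wherever icc lives on your system); in a C shell the equivalent line would be setenv PATH /opt/intel/bin:${PATH}, and the export syntax below would not work there.

```shell
#!/bin/bash
# Stop early if this is not running under Bash (e.g. it was run with sh).
# A C shell would not even parse this file; use setenv there instead.
if [ -z "$BASH_VERSION" ]; then
  echo "Not running under Bash; set the path with your shell's syntax." >&2
  return 1 2>/dev/null || exit 1
fi

# Assumed install location of the Intel compiler (hypothetical; adjust it).
ICC_BIN=/opt/intel/bin

# Prepend ICC_BIN to PATH, Bash style, unless it is already there.
case ":$PATH:" in
  *":$ICC_BIN:"*) ;;
  *) export PATH="$ICC_BIN:$PATH" ;;
esac

# Sanity check: warn if icc still cannot be found through PATH.
command -v icc >/dev/null 2>&1 || echo "warning: icc not found in $ICC_BIN" >&2
```

Note that the file has to be sourced (source ./icc-path.sh), not executed, for the PATH change to reach your interactive shell; the file name here is only an example.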