ALMA software scalability experience with growing number of antennas

The ALMA Observatory is a challenging project in many ways. The hardware and software pieces were often designed specifically for ALMA, based on overall scientific requirements. The observatory is still in its construction phase, but it already started Early Science observations with 16 antennas in September 2011 and currently (June 2012) has 39 accepted antennas, with 1 or 2 new antennas delivered every month. The finished array will integrate up to 66 antennas in 2014. The on-line software is a critical part of operations: it controls everything from the low-level real-time hardware and data processing up to the observation scheduler and data storage. Many pieces of the software are eventually affected by the growing number of antennas, as more processes are integrated into the distributed system and more data flows to the Correlator and Database. Although some early scalability tests were performed in a simulated environment, the system proved to be very dependent on real deployment conditions, and several unforeseen scalability issues have been found in the last year, starting at a critical number of about 15 antennas. Processes that grow with the number of antennas tend to quickly demand more powerful machines unless alternatives are implemented. This paper describes the practical experience of dealing with (and hopefully preventing) blocking scalability issues during the construction phase, while expectant users push the system to its limits. This experience may also be a useful example for other upcoming radio telescopes with a large number of receivers.