Supercomputing group iVEC could more than double its technical headcount as it prepares to install and support all three components of the Federal Government’s $80 million Pawsey Centre project.
The group has so far delivered a $5 million Linux cluster named 'Epic' and a $4 million hybrid-GPU cluster named 'Fornax' under Stages 1A and 1B of the project.
It is expected to deliver a separate, petascale supercomputer by next year, under Stage 2 of the project.
IVEC acting executive director Paul Nicholls said the group had recruited two operational staff for Epic and five storage and HPC specialists for Fornax.
Nicholls said iVEC would gradually grow the size of its operational team over the next 12 months as Fornax went into production and it began designing, installing and testing the Stage 2 machine.
“We’re very, very close [to announcing the petascale technology],” he told iTnews.
“The solution has been identified and we are in discussions with five vendors ... it is a multi-vendor solution.”
Nicholls said staffing requirements would depend on the design of the final petascale machine, but indicated the operational team could grow to more than 15 people.
In May, iVEC also established a separate, six-person Supercomputing Technology and Applications Program (STAP) team to help scientists develop code for Pawsey machines.
The Western Australian Government last year committed $15.8 million over four years to attract skilled workers to the Pawsey Centre.
Fornax goes into production
The Fornax GPU cluster, installed at the University of Western Australia in January, will shortly enter its production phase, which will involve up to 20 scientific projects.
The supercomputer was supplied by SGI and comprises 96 nodes, each containing two six-core Intel Xeon CPUs, an Nvidia Tesla GPU, 48GB of memory and seven terabytes of local storage.
IVEC spokespeople told iTnews that the machine had performed largely as expected during a seven-week ‘early adopter’ phase that ended in June.
System administrator Ashley Chew highlighted a brief outage caused by subnet management issues in the InfiniBand network connecting Fornax's nodes to a 500-terabyte global file system.
Chew said the issue was resolved by running subnet management software on one of Fornax's nodes, rather than relying on the software embedded in an InfiniBand switch.
“InfiniBand is great at what it does but it requires a lot of tinkering to make it work,” he noted.
Additionally, Chew observed that researchers tended to use the older OpenGL application programming interface on Fornax, instead of newer OpenCL or CUDA technology.
“We have a job to educate users of the GPU cluster better,” Nicholls noted.
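For readers unfamiliar with the distinction, CUDA exposes the GPU directly as a grid of lightweight threads running small functions called kernels, rather than requiring compute work to be recast as graphics operations. The sketch below is a generic CUDA vector-addition example, not code from Fornax or the STAP team, illustrating the basic pattern of allocating device memory, copying data across, and launching a kernel; all names and sizes are illustrative.

```
// Generic CUDA sketch (illustrative only; not iVEC/Fornax code).
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                  // one million elements
    const size_t bytes = n * sizeof(float);

    // Host (CPU) buffers.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device (GPU) buffers, plus copies of the input data.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and spot-check one value (expect 3.0).
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

On a GPU cluster, this per-node pattern would typically sit inside a larger MPI or batch-scheduled job that spreads work across the machine's nodes.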
Five members of the STAP team were devoted to developing code for Fornax users, with the remaining member focused on Epic.
Nicholls said early adopters of Fornax had been chosen for their experience with GPU programming.
Early adopters used Fornax for geoscience, bioinformatics, chemistry, and radioastronomy projects including processing data from the Murchison Widefield Array — one of Australia’s pathfinders for the Square Kilometre Array.
Similarly, the first round of the production phase would likely involve researchers with GPU experience, while the STAP team worked with four to five other groups in preparation for round two in early 2013, Nicholls said.
IVEC expects Fornax’s combination of GPUs and fast local storage to be particularly well-suited to data-intensive computations in radioastronomy and the geosciences.