The nCore Y-Class supercomputer combines unprecedented low-power computational performance with a low-latency interconnect and a unified programming model.
This combination meets users' needs for energy-efficient operation and powerful computation, and it mitigates technical risk by preserving existing investment in developed applications.
Designed to meet the demands of high-performance applications in wide-ranging fields, the Y-Class system lets scientists and engineers gain insight into complex problems while delivering real-world results.
The nCore BrownDwarf Y-Class system unifies COTS technologies, high-performance SoCs, advanced low-latency interconnects, and optimized software to create a supercomputer delivering exceptional performance, reliability, power telemetry, reconfigurability, and programmability at significantly reduced power levels.
The heart of the system is an AdvancedTCA-compliant AMC card: a self-contained, high-performance compute node running the Linux operating system.
Each node combines 4 ARM A15 cores and 24 C66x DSP cores, with Linux running on the ARM cores, and each chassis holds 48 nodes. Each chassis delivers 23.1 TFLOPS (single precision) of ARM+DSP processing power.
The ARM and DSP cores communicate via shared memory over a packet-based switching fabric at 2 terabytes (TB) per second.
Each node has 26 GB of ECC DRAM with 51.2 gigabytes (GB) per second of memory bandwidth, so that large problems fit in memory while computational performance is maximized.
The extremely low-latency RapidIO interconnect provides 20 gigabits per second (Gbps) of non-blocking, point-to-point, bi-directional RDMA between system nodes.
Each four-node blade has 320 Gbps of non-blocking switch bandwidth. Each chassis switch blade moves 560 Gbps, for 2.2 Tbps of total switching capability, while externally presenting 14 RapidIO quad-lane ports via QSFP+ connectors for an aggregate of 280 Gbps between chassis.
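The external figure can be checked with simple arithmetic: 14 quad-lane ports at the 20 Gbps quoted for the RapidIO interconnect give 280 Gbps. The helper below is a minimal sketch of that check; the function name is hypothetical. The 2.2 Tbps total would match four switch blades at 560 Gbps each (2,240 Gbps), but the blade count is an inference, not a figure stated here.

```c
/* Hypothetical helper: aggregate bandwidth of identical links, in Gbps. */
unsigned aggregate_gbps(unsigned links, unsigned gbps_per_link)
{
    return links * gbps_per_link;
}

/* From the text: 14 quad-lane ports at 20 Gbps each -> 280 Gbps.
 * Assumption (not stated in the text): 4 switch blades at 560 Gbps
 * each -> 2240 Gbps, which rounds to the quoted 2.2 Tbps. */
```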
The Y-Class system programming model is a unified compute off-load model in which the DSP cores accelerate critical algorithm components via the OpenMP Accelerator Model.
Both the ARM cores and the DSP cores can be programmed in OpenMP 3.0. OpenCL will be available in Q4 2013.
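As an illustration of the off-load model, the sketch below uses the standard OpenMP target and map directives to run a loop on an accelerator. It is a minimal example, not nCore code: the function name and kernel are hypothetical, and it assumes the Y-Class toolchain accepts these directives in this form. A compiler without off-load support runs the same loop on the host.

```c
/* Hypothetical kernel: y[i] = a * x[i] + y[i], off-loaded to an accelerator. */
void saxpy_offload(float a, const float *x, float *y, int n)
{
    /* map(to:) copies x to the device; map(tofrom:) copies y in and back out.
     * Assumption: the device here would be the DSP cores. */
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The host code calls saxpy_offload like any other C function, so the surrounding application does not have to change when a loop is moved to the accelerator.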
For message passing, the system uses Open MPI via nCore's RapidIO BTL plugin. This preserves users' investment in existing applications and significantly reduces porting and tuning requirements.
The BrownDwarf Y-Class system compute node has unprecedented onboard power measurement capability for power consumption vs. compute experiments and energy usage monitoring.
Telemetry is captured from all power supply rails using a data acquisition system with two 16-channel multiplexers and six high-performance analog-to-digital converters.
With a user-controlled, nanosecond-accurate triggering and clocking mechanism, high-resolution measurements are captured out-of-band and directly synchronized with algorithm results during analysis.
There are 36 measurement points on each Y-Class node: 16 VDC/ADC measurement points on each of the base and mezzanine boards, plus additional low-resolution global VDC/ADC points and two temperature sensors on the base board.
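For a power-versus-compute experiment, rail telemetry of this kind can be reduced to an energy figure by integrating power over the sampling interval. The sketch below assumes evenly spaced voltage and current samples for one rail; the struct layout, names, and fixed sample period are hypothetical and do not describe the actual Y-Class telemetry format.

```c
#include <stddef.h>

/* Hypothetical sample layout: one voltage/current pair per trigger tick. */
typedef struct {
    double volts;
    double amps;
} rail_sample;

/* Rectangle-rule integration of power over time: E = sum(V * I * dt).
 * Returns energy in joules for n samples spaced dt_seconds apart. */
double rail_energy_joules(const rail_sample *s, size_t n, double dt_seconds)
{
    double joules = 0.0;
    for (size_t i = 0; i < n; ++i) {
        joules += s[i].volts * s[i].amps * dt_seconds;
    }
    return joules;
}
```

Summing the result across rails over the window in which an algorithm ran gives an energy-per-run figure to set against its runtime or result quality.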