22 July 2024
Optical circuit switching in disaggregated cloud and HPC infrastructures
Hyperscalers in cloud computing and other providers of high-performance computing (HPC) services must architect and scale their computing platforms to meet client demand for AI applications while controlling CAPEX and reducing power requirements. In particular, the processing power these applications require has increased by orders of magnitude.
Disaggregation of resources holds the key to lowering costs and reducing power
Instead of the building blocks of these platforms being tightly and inflexibly bundled into a fairly monolithic platform such as a standard server chassis, "disaggregating" the requisite component parts or sub-systems avoids the inefficiency, the underutilisation of key underlying resources and, importantly, the excessive power consumption that are inevitable when simply "racking and stacking" more servers.
In a disaggregated architecture, these resources (CPU, memory, storage, acceleration hardware in its various guises) are flexibly combined by interconnecting them using integrated high-speed digital transceivers and a dedicated interconnect fabric based on appropriate transport media and switching technologies. These can then be combined and appropriately scaled, independently of each other, to meet the demands of the expected workloads.
Flexible resource utilisation
The principle of disaggregation is shown in the diagram above. The required resources are bundled together in bespoke ratios to form flexibly proportioned "bare metal" hardware hosts, "composed" on-the-fly from a common pool of underlying fine-grained resources. The key building blocks in this case are the lower-level resource elements themselves, such as CPUs, memory, storage and various kinds of accelerators (GPUs, TPUs, FPGAs).
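As a rough illustration of the composition step, the sketch below models drawing resource elements from shared pools to build a host of the requested shape. The pool sizes, resource names and the compose_host helper are purely illustrative assumptions, not any vendor's composition API:

```python
from dataclasses import dataclass, field

@dataclass
class ResourcePool:
    """A shared pool of fine-grained resource elements of one type."""
    name: str
    free: int  # number of unallocated elements (e.g. CPUs, GPUs, GB of DRAM)

    def allocate(self, count: int) -> int:
        """Reserve `count` elements from the pool for one composed host."""
        if count > self.free:
            raise RuntimeError(f"not enough free {self.name} elements")
        self.free -= count
        return count

@dataclass
class ComposedHost:
    """A 'bare metal' host composed on-the-fly from the shared pools."""
    allocation: dict = field(default_factory=dict)

def compose_host(pools: dict, request: dict) -> ComposedHost:
    """Draw the requested ratio of resources from the pools (illustrative only)."""
    host = ComposedHost()
    for resource, count in request.items():
        host.allocation[resource] = pools[resource].allocate(count)
    return host

# Hypothetical pools and a workload-specific request (all numbers made up).
pools = {
    "cpu": ResourcePool("cpu", free=256),
    "dram_gb": ResourcePool("dram_gb", free=8192),
    "gpu": ResourcePool("gpu", free=64),
}
ai_training_host = compose_host(pools, {"cpu": 16, "dram_gb": 512, "gpu": 8})
print(ai_training_host.allocation)  # {'cpu': 16, 'dram_gb': 512, 'gpu': 8}
```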
Several levels of disaggregation can be defined, related to the granularity at which the resource blocks can be accessed and consumed.
In the most granular form of disaggregation, each resource block (e.g. a bank of DRAM, a CPU, an accelerator) has onboard hardware to facilitate the necessary high-speed, low-latency connection of its resources to an interconnect platform.
Less granular forms of resource disaggregation that are more compatible with current hardware implementations may be seen as a way to facilitate a more gradual transition towards fully disaggregated platforms. These include: optical interconnect overlaid on packet-switching fabric and repurposing conventional servers.
Optical interconnect overlaid on packet-switching fabric
In this application, the dynamically interconnected compute resource components are limited to accelerator hardware. By fitting the accelerators with single-mode optical transceivers, they can be flexibly and directly interconnected with those in other hosts over a dedicated optical switching fabric, which effectively acts as an overlay to the packet-switching fabric already used to provide most of the interconnect between the hosts in the cluster.
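As a sketch of how such an overlay might be driven, the snippet below pairs accelerator transceiver ports in different hosts by programming cross-connects on an optical circuit switch. The OpticalCircuitSwitch class and the port mapping are hypothetical stand-ins for whatever management interface a real switch exposes, not the DirectLight API:

```python
class OpticalCircuitSwitch:
    """Hypothetical stand-in for an optical circuit switch management interface."""

    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.cross_connects = {}  # port -> port, both directions recorded

    def connect(self, port_a: int, port_b: int) -> None:
        """Set up a transparent light path between two ports of the overlay fabric."""
        for port in (port_a, port_b):
            if port in self.cross_connects:
                raise RuntimeError(f"port {port} is already cross-connected")
        self.cross_connects[port_a] = port_b
        self.cross_connects[port_b] = port_a

# Illustrative mapping of accelerator transceivers to switch ports (made up).
gpu_ports = {
    ("host-a", "gpu0"): 1,
    ("host-a", "gpu1"): 2,
    ("host-b", "gpu0"): 17,
    ("host-b", "gpu1"): 18,
}

ocs = OpticalCircuitSwitch(num_ports=384)
# Pair accelerators across hosts for a bandwidth-heavy collective, so this bulk
# traffic bypasses the packet-switching fabric used for general host traffic.
ocs.connect(gpu_ports[("host-a", "gpu0")], gpu_ports[("host-b", "gpu0")])
ocs.connect(gpu_ports[("host-a", "gpu1")], gpu_ports[("host-b", "gpu1")])
```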
Repurposing conventional servers
Going beyond the interconnection of accelerator cards alone, this approach accesses more of the resources already present in fleets of conventional servers. A dedicated PCIe interconnection card, fitted with specialised SerDes processing hardware and firmware and with high-density, high-speed optical transceivers, acts as a high-performance gateway between the PCIe-connected compute resources in that chassis and the optical interconnect fabric.
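A highly simplified view of the gateway role is sketched below: the card enumerates PCIe-attached resources in its chassis and advertises each one as an endpoint on the optical fabric. The PCIeGateway class, the PCIe addresses and the port numbering are illustrative assumptions, not a real driver interface:

```python
class PCIeGateway:
    """Hypothetical model of a PCIe card bridging chassis resources to the optical fabric."""

    def __init__(self, chassis: str, fabric_ports: list):
        self.chassis = chassis
        self.fabric_ports = list(fabric_ports)  # free optical ports on the card
        self.endpoints = {}                     # PCIe address -> fabric port

    def expose(self, pcie_address: str) -> int:
        """Advertise one PCIe-connected resource as an endpoint on the fabric."""
        if not self.fabric_ports:
            raise RuntimeError("no free optical ports left on the gateway card")
        port = self.fabric_ports.pop(0)
        self.endpoints[pcie_address] = port
        return port

# Illustrative: expose two NVMe drives and a GPU from one conventional server.
gateway = PCIeGateway("server-42", fabric_ports=[101, 102, 103, 104])
for address in ["0000:3b:00.0", "0000:3c:00.0", "0000:5e:00.0"]:
    print(address, "-> fabric port", gateway.expose(address))
```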
The interconnect fabric
An optical interconnect fabric with transparent optical circuit switching provides deterministic, circuit-switched, fixed-bandwidth data paths. These are well suited to interconnecting hardware resource elements that would otherwise be directly and deterministically interconnected at a low level by dedicated traces on a server motherboard or via a specific bus technology such as PCI Express.
It also promises significant reductions in the power consumption of the fabric itself compared to an electrical fabric, much lower latencies along the data paths through it, and a better ability to physically scale the fabric up and out. Because the fabric is inherently transparent to the formats and line rates of the serialised data traffic between the optical transceivers associated with the disaggregated resource elements, it is also significantly better future-proofed.
The lowest loss optical circuit switches, such as POLATIS® DirectLight™ switches, allow for fabrics to be constructed with up to four or more stages of switching whilst keeping within the optical loss budgets of typical transceivers used with disaggregated resource elements.
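As a back-of-the-envelope illustration of why per-stage insertion loss matters, the snippet below accumulates the loss of a circuit through a multi-stage fabric and compares it with a transceiver's optical loss budget. All figures are placeholder assumptions rather than published specifications for any particular switch or transceiver:

```python
def fabric_loss_db(stages: int, loss_per_stage_db: float,
                   connector_loss_db: float, connectors: int) -> float:
    """Accumulated insertion loss of a circuit through a multi-stage optical fabric."""
    return stages * loss_per_stage_db + connectors * connector_loss_db

# Placeholder figures (illustrative assumptions, not datasheet values).
budget_db = 4.0          # assumed transceiver optical loss budget
per_stage_db = 0.6       # assumed insertion loss per switching stage
per_connector_db = 0.25  # assumed loss per fibre connector pair

for stages in range(1, 6):
    loss = fabric_loss_db(stages, per_stage_db, per_connector_db, connectors=stages + 1)
    verdict = "within budget" if loss <= budget_db else "exceeds budget"
    print(f"{stages} stage(s): {loss:.2f} dB ({verdict})")
```

With these assumed figures a four-stage fabric still fits within the budget, which is the point above: the lower the per-stage loss, the more switching stages can be cascaded before the transceivers' loss budget is exhausted.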
The benefits of disaggregated computing:
- Hardware computing platforms can be composed on-the-fly.
- Platforms can be scaled to whatever size and ratio of the available resource types is appropriate to the kinds of workloads that will be run on the hardware.
- Platforms can be resized during the course of running a particular workload as the resource consumption requirements evolve.
- Resources not required can be temporarily powered down, resulting in OPEX savings.
Disaggregation enables operators to:
- Select best-of-breed vendors for the various component building blocks.
- Use those resources that support only the specific functions they need.
- Upgrade different types and/or blocks of resource element as and when required.