Understanding the Open Architecture of AI Hardware: OAI & OAM

16 May 2024

Open Accelerator Infrastructure (OAI) is a subgroup of the Open Compute Project (OCP), one of the world's most influential open hardware organizations. Since 2019, OAI has focused on defining AI accelerator card form factors suited to large-scale deep learning training, addressing the problem that accelerator cards come in many shapes with incompatible interfaces. By releasing the OAI-UBB (Universal Base Board) 1.0 design specification, OAI has promoted the standardization of AI acceleration hardware platforms so that one platform can support products from different manufacturers without modification, which significantly improves the scalability and flexibility of AI modules.

The Architecture Adopted by AI Giants - OAM (Open Accelerator Module)

For AI server developers, the Open Accelerator Module (OAM) brings significant benefits. Because AI accelerator chips are so diverse and specialized, developers face higher development costs and longer development cycles. OAM gives these developers an efficient, scalable way to integrate new AI accelerators. This lowers the barrier to entry and shortens time-to-market.

Advantages and Challenges of OAM

OAM has three notable advantages, making it particularly important in today's rapidly developing AI market:

    1. High Performance and Efficiency: OAM can significantly improve processing performance and efficiency, particularly for compute-intensive workloads such as deep learning and other machine learning applications.
    2. Scalability: OAM's design allows modules to be used flexibly across different systems and baseboards, so platforms can keep pace with growing compute loads and new technology.
    3. Support for Diverse Application Scenarios: OAM is applicable to various fields, including AI inference, scientific simulations, and data analysis, enabling it to meet a wide range of business needs.

These advantages demonstrate OAM's potential and flexibility in modern data centers and high-performance computing environments.

However, adopting OAM also comes with several challenges:

    1. Technical and Design Complexity: Current specialized AI hardware systems are complex in both technology and design. Integrating a new AI accelerator into a system often takes 6 to 12 months, which slows the adoption of new, competitive technologies.
    2. High Power Consumption Management: For OAM designs with power above 600W, the Base Specification already recommends liquid cooling. Managing power is therefore a significant challenge, especially as future modules may easily exceed 700W.
    3. Standardization and Compatibility: AI accelerators are evolving quickly and vary widely, which raises the bar for standardization and compatibility. Standards must support scalability and high-speed communication links between different hardware acceleration solutions.
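To make the power challenge concrete, here is a minimal Python sketch. The 600W threshold for recommending liquid cooling and the 700W figure come from the text above. The function names and the module count of 8 per baseboard are assumptions for illustration only, not values taken from the Base Specification.

```python
# Threshold above which the article says liquid cooling is recommended.
LIQUID_COOLING_THRESHOLD_W = 600

# Hypothetical module count per baseboard, used only for illustration.
ASSUMED_MODULES_PER_BASEBOARD = 8


def recommend_cooling(module_power_w: float) -> str:
    """Return 'liquid' when module power exceeds the threshold, else 'air'."""
    if module_power_w > LIQUID_COOLING_THRESHOLD_W:
        return "liquid"
    return "air"


def baseboard_module_power_w(
    module_power_w: float,
    modules: int = ASSUMED_MODULES_PER_BASEBOARD,
) -> float:
    """Total power drawn by the accelerator modules alone (no CPUs, fans, losses)."""
    return module_power_w * modules


if __name__ == "__main__":
    for power in (450, 600, 700):
        total = baseboard_module_power_w(power)
        print(f"{power} W module: {recommend_cooling(power)} cooling, "
              f"{total:.0f} W across {ASSUMED_MODULES_PER_BASEBOARD} modules")
```

Even this rough arithmetic shows why cooling dominates the design: under the assumed module count, 700W parts put several kilowatts of heat on one baseboard before any other component is counted.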

What Can NEXTRON Do for OAM Designers?

NEXTRON is one of the few solution providers on the market with combined expertise in high-speed transmission, structural design, and cooling modules. It has already supplied high-speed I/O and OAM-related products to several leading AI chip designers. In OAM design, NEXTRON has identified two common challenges:

    1. Structural Design for Cooling: Top Stiffener with Thermal Solution
    The Top Stiffener is only minimally covered in the Base Specification, yet it needs a sound structural design to complement the OAM design; otherwise cooling efficiency suffers directly. A good Top Stiffener provides solid support and thermal conduction, so that air-cooling solutions such as 3U- or 4U-height 3D VC can work effectively. For next-generation designs with over 600W TDP, integration with cold-plate liquid cooling is also necessary. This tests the manufacturer's structural design, material selection, and understanding of cooling technology.

    2. Challenges in Processing: OAM Bottom Stiffener
    A poor connection between the OAM and the UBB is a common problem in practice, often because the OAM Bottom Stiffener has been neglected. OAM generally uses the Mirror Mezz Pro Connector, and the Base Specification states that tolerances must be held to ±0.15mm. Because assembly and machining capabilities differ between manufacturers, a poorly made Bottom Stiffener can push the final assembly tolerance out of range or leave the ends uneven, resulting in poor connections.
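The tolerance problem above can be illustrated with a simple stack-up calculation. The following Python sketch is not NEXTRON's method or anything prescribed by the Base Specification. It assumes a few hypothetical tolerance contributors (the names and values are made up for illustration) and compares a worst-case sum and a root-sum-square (RSS) estimate against the ±0.15mm limit mentioned in the text.

```python
import math

# Limit quoted in the article for the OAM-to-UBB connection.
TOLERANCE_LIMIT_MM = 0.15


def worst_case_stack_mm(tolerances_mm):
    """Worst-case stack-up: every contributor at its extreme simultaneously."""
    return sum(abs(t) for t in tolerances_mm)


def rss_stack_mm(tolerances_mm):
    """Statistical (root-sum-square) stack-up, assuming independent contributors."""
    return math.sqrt(sum(t * t for t in tolerances_mm))


def within_limit(stack_mm, limit_mm=TOLERANCE_LIMIT_MM):
    """True when the stacked tolerance stays inside the allowed band."""
    return stack_mm <= limit_mm


if __name__ == "__main__":
    # Hypothetical contributors in mm: stiffener flatness, PCB thickness,
    # connector seating. These values are illustrative only.
    contributors = [0.05, 0.08, 0.06]

    worst = worst_case_stack_mm(contributors)
    rss = rss_stack_mm(contributors)
    print(f"worst case: {worst:.3f} mm, within limit: {within_limit(worst)}")
    print(f"RSS:        {rss:.3f} mm, within limit: {within_limit(rss)}")
```

With these made-up numbers the statistical estimate passes while the worst case does not, which is the kind of margin that disappears when one part, such as the Bottom Stiffener, is made less precisely than planned.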

Conclusion

AI-related technologies have developed rapidly in recent years, bringing cross-disciplinary challenges. Working with AI leaders to overcome these difficulties, NEXTRON has learned that every small detail matters in achieving impressive AI performance. NEXTRON hopes its accumulated experience and capabilities can help more AI product developers solve problems, making this civilization-changing technology more accessible to everyone.

