1、 Preface
CXL (Compute Express Link) is a high-speed interconnect designed to provide higher data throughput and lower latency for modern computing and storage systems. It was originally developed by Intel, and the CXL Consortium formed around it in 2019 with founding members including Google and Microsoft; AMD and other vendors joined soon after.
CXL's goal is to close the memory gap between CPUs and devices, and between devices themselves. A server has a large pool of host memory plus many PCIe-based accelerators, each with a substantial amount of memory of its own. This fragmentation of memory leads to waste, inconvenience, and lost performance, and CXL was created to solve it.
CXL builds on PCIe (Peripheral Component Interconnect Express), the standard interface for connecting components inside a computer. A PCIe device can use DMA to access host memory as long as it knows the target physical address, but PCIe does not keep device caches and CPU caches coherent. Before CXL, several protocols tried to fill this gap, including OpenCAPI led by IBM, CCIX supported by Arm, Gen-Z supported by AMD, and NVLink from NVIDIA. Despite many improvements, PCIe alone struggles to meet the high-bandwidth, low-latency communication needs between modern processors and accelerators, and CXL emerged in response.
CXL technology has a wide range of application scenarios, including data centers, artificial intelligence, and processor interconnections. In the field of data centers, CXL technology can interconnect different computing and storage resources, improving system performance and efficiency. In the field of artificial intelligence, CXL technology can enable accelerators such as GPUs and FPGAs to collaborate better with the main processor, improving the speed of AI model training and inference. In terms of processor interconnection, CXL technology can achieve interconnection between processors from different manufacturers, improving the overall performance and flexibility of the system.
2、 CXL Technology Overview
2.1. What is CXL technology?
CXL (Compute Express Link) is a high-speed serial interconnect protocol for fast, reliable data transfer between components within a computer system. The CXL 1.0 specification was released in 2019 and CXL 2.0 followed in 2020; the consortium behind it includes Intel, Dell, and HPE. It targets bottlenecks in high-performance computing, including memory capacity, memory bandwidth, and I/O latency. CXL also enables memory expansion and memory sharing, and lets the host communicate with peripherals such as compute accelerators (for example GPUs and FPGAs), providing faster and more flexible data exchange and processing.
CXL provides not only high-speed transfer but also memory sharing and virtualization, making cooperation between devices tighter and more efficient. This helps modern data centers handle large-scale processing and analysis, and gives better support to emerging applications such as AI, machine learning, and blockchain.
2.2. Three sub-protocols of CXL: CXL.io, CXL.cache, and CXL.mem
The CXL protocol consists of three sub-protocols:
CXL.io: This sub-protocol carries ordinary I/O traffic. It is functionally equivalent to PCIe and is used for device discovery, configuration, register access, interrupts, and DMA. Every CXL device must support it.
CXL.cache: This sub-protocol lets a device coherently cache data from host memory. The device can keep frequently used data in its local cache while the host keeps all copies consistent, which reduces memory access time and improves overall system performance.
CXL.mem (also written CXL.memory): This sub-protocol lets the host use memory attached to a device as if it were main memory, which increases memory capacity. The CPU accesses this memory with ordinary load and store instructions, so the device appears as extended memory that can hold more data.
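To make the idea of device memory appearing as extended memory more concrete: on Linux, CXL-attached memory that has been brought online as system RAM typically shows up as a NUMA node that has no CPUs of its own. The sketch below lists such nodes by reading the per-node `cpulist` files in sysfs. It is a minimal illustration, not a CXL detection tool: a CPU-less node is not necessarily CXL memory, and the `sysfs_root` parameter is an assumption added here so the function can be pointed at a test directory.

```python
from pathlib import Path


def cpuless_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted ids of NUMA nodes that have no local CPUs."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    nodes = []
    for entry in root.glob("node[0-9]*"):
        cpulist = entry / "cpulist"
        if not cpulist.is_file():
            continue
        # An empty cpulist means no CPU is local to this node.
        if cpulist.read_text().strip() == "":
            nodes.append(int(entry.name[len("node"):]))
    return sorted(nodes)


if __name__ == "__main__":
    print("CPU-less NUMA nodes:", cpuless_numa_nodes())
```

On a machine without such nodes, or on a system without this sysfs directory, the function simply returns an empty list.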
CXL.io is not a separate physical layer. All three sub-protocols share the PCIe physical layer, so a CXL device plugs into a standard PCIe slot, and during link training the two ends negotiate whether to run as plain PCIe or as CXL.
The physical layer uses SerDes technology, which converts parallel data into serial data and back, and the three sub-protocols are multiplexed over the same link. The traffic can include bandwidth-intensive data streams, low-latency command and control information, and configuration register and status accesses. Because CXL.io inherits PCIe semantics, it also supports features such as hot-plug and link training.
The electrical characteristics, timing requirements, and connector interfaces are those of the PCIe specification. CXL 1.1 and 2.0 run on the PCIe 5.0 physical layer at 32 GT/s per lane, and CXL 3.0 moves to PCIe 6.0 at 64 GT/s per lane. CXL does not define a dedicated connector; devices use standard PCIe form factors such as add-in cards.
The CXL specification defines three types of devices:
Type 1: Accelerators with a local cache but no host-managed memory of their own, typically add-in cards installed in a PCIe slot. They use CXL.io and CXL.cache to coherently cache host memory. A typical example is a smart network card.
Type 2: Accelerators that have both a cache and their own attached memory, such as GPUs and FPGAs. They use all three sub-protocols, so they have every Type 1 capability and can also expose their memory to the host. They are typically used in compute-dense scenarios.
Type 3: Memory devices that use CXL.io and CXL.mem to give the host low-latency, high-throughput access to additional memory. They serve as memory buffers or expanders that increase memory bandwidth and capacity.
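The relationship between the three device types and the three sub-protocols can be written down as a small lookup table: Type 1 devices use CXL.io and CXL.cache, Type 2 devices use all three, and Type 3 devices use CXL.io and CXL.mem. The sketch below is only an illustration of that mapping; the names `DEVICE_TYPE_PROTOCOLS` and `classify_device` are hypothetical and do not come from any CXL software library.

```python
# Sub-protocols used by each CXL device type.
DEVICE_TYPE_PROTOCOLS = {
    1: frozenset({"CXL.io", "CXL.cache"}),             # caching device, e.g. smart NIC
    2: frozenset({"CXL.io", "CXL.cache", "CXL.mem"}),  # accelerator with its own memory
    3: frozenset({"CXL.io", "CXL.mem"}),               # memory expander
}


def classify_device(protocols):
    """Return the device type whose protocol set matches exactly, or None."""
    wanted = frozenset(protocols)
    for device_type, supported in DEVICE_TYPE_PROTOCOLS.items():
        if supported == wanted:
            return device_type
    return None


if __name__ == "__main__":
    print(classify_device(["CXL.io", "CXL.mem"]))  # memory expander: type 3
```

Note that every valid combination includes CXL.io, so a protocol set without it matches no device type.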
Figure: Architecture of CXL technology
3、 The advantages of CXL technology
Faster data transfer: CXL 1.1 and 2.0 run at 32 GT/s per lane, or roughly 64 GB/s in each direction on a x16 link, which is twice the rate of the widely deployed PCIe 4.0. This speeds up data processing and transfer in high-performance settings such as data centers.
Lower latency: CXL lets computing devices such as CPUs, GPUs, and FPGAs access memory directly with load/store semantics, avoiding the overhead of a traditional I/O path, thereby lowering latency and improving computing efficiency.
Higher energy efficiency: CXL supports shared memory among multiple computing devices, which reduces redundant copies of data and improves energy efficiency. CXL also supports memory virtualization and pooling, so memory can be allocated dynamically according to application load, further improving system energy efficiency.
Stronger scalability: CXL supports memory expansion, allowing more memory capacity to be added without downtime, which increases system scalability and prepares for future application needs.
Broader application scenarios: CXL applies not only to high-performance computing environments such as data centers, but also to fields such as artificial intelligence, blockchain, and the Internet of Things.
In short, CXL supports high-bandwidth, low-latency data transfer, offers better flexibility and scalability, and allows different types of hardware devices to be used together.
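As a back-of-the-envelope check on link bandwidth, the raw throughput of a PCIe-style link in one direction is the per-lane transfer rate times the number of lanes times the line-encoding efficiency, divided by 8 bits per byte. The sketch below applies this to a x16 link at 32 GT/s (the PCIe 5.0 rate that CXL 1.1 and 2.0 run on) and at 16 GT/s (PCIe 4.0), both of which use 128b/130b encoding. It ignores packet and protocol overhead, so real-world throughput is lower; the function name is chosen here for illustration.

```python
def link_bandwidth_gbytes(gt_per_s, lanes, encoding=128 / 130):
    """Raw one-direction bandwidth in GB/s for a serial link.

    gt_per_s: per-lane transfer rate in GT/s
    lanes:    number of lanes (e.g. 16 for a x16 link)
    encoding: payload bits per transmitted bit (128b/130b by default)
    """
    return gt_per_s * lanes * encoding / 8


if __name__ == "__main__":
    cxl2 = link_bandwidth_gbytes(32, 16)   # PCIe 5.0 rate, used by CXL 1.1/2.0
    pcie4 = link_bandwidth_gbytes(16, 16)  # PCIe 4.0 rate
    print(f"CXL 2.0 x16:  {cxl2:.1f} GB/s")   # prints 63.0 GB/s
    print(f"PCIe 4.0 x16: {pcie4:.1f} GB/s")  # prints 31.5 GB/s
```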
4、 Application of CXL Technology
4.1 Application in Computer Systems
CXL can be used in many applications in computer systems:
High-performance computing: CXL provides a low-latency, high-bandwidth protocol for connecting CPUs, GPUs, FPGAs, and other processors, giving faster data transfer in high-performance computing.
Storage acceleration: CXL can be used to connect storage devices, such as SSDs and NVMe drives, to provide faster storage access. In addition, CXL can be integrated with the memory controller to provide more bandwidth for storage acceleration.
Artificial intelligence: CXL can be used to connect AI chips and provide faster data transfer, improving the performance and efficiency of AI workloads.
Network acceleration: CXL can also be used to connect network adapters and provide faster network transfer, improving the performance of web applications.
4.2 Applications in Data Centers
In the data center, CXL can be applied in the following areas:
High-performance computing: CXL can move data with lower latency than traditional PCIe I/O, and at higher rates than earlier PCIe generations, improving the efficiency and throughput of high-performance computing.
Storage acceleration: CXL can connect a storage accelerator directly to the host CPU, giving faster data access and higher IOPS and improving storage performance.
AI acceleration: CXL can connect AI accelerators directly to CPUs, GPUs, FPGAs, and other processors, giving faster model training and inference and improving the performance of artificial intelligence applications.
Large-scale virtualization: CXL can combine multiple CPU and memory resources into a large-scale virtualization cluster, improving resource utilization and flexibility and reducing the complexity of virtualization management.
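The resource-utilization benefit of pooling can be illustrated with a toy model: a shared pool of memory from which hosts borrow capacity when their load rises and to which they return it afterwards. The sketch below is purely conceptual. The `MemoryPool` class and its methods are hypothetical names invented for this example and do not correspond to any CXL fabric-manager API.

```python
class MemoryPool:
    """Toy model of a shared memory pool divided among hosts (sizes in GiB)."""

    def __init__(self, capacity_gib):
        self.capacity_gib = capacity_gib
        self.allocations = {}  # host name -> GiB currently assigned

    @property
    def free_gib(self):
        return self.capacity_gib - sum(self.allocations.values())

    def allocate(self, host, gib):
        """Assign gib to host if the pool has room; return True on success."""
        if gib <= 0 or gib > self.free_gib:
            return False
        self.allocations[host] = self.allocations.get(host, 0) + gib
        return True

    def release(self, host, gib):
        """Return up to gib from host to the pool; return the amount freed."""
        held = self.allocations.get(host, 0)
        freed = max(0, min(gib, held))
        if freed == held:
            self.allocations.pop(host, None)
        else:
            self.allocations[host] = held - freed
        return freed


if __name__ == "__main__":
    pool = MemoryPool(capacity_gib=512)
    pool.allocate("host-a", 300)
    print(pool.allocate("host-b", 300))  # False: only 212 GiB are free
    pool.release("host-a", 200)
    print(pool.allocate("host-b", 300))  # True: capacity was returned to the pool
```

The point of the model is that capacity released by one host immediately becomes available to another, instead of sitting idle in a single server.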
4.3 Applications in the field of artificial intelligence
In the field of artificial intelligence, CXL can play the following roles:
Improving data transmission efficiency: For tasks such as deep learning, a large amount of data transmission and computation is required. In traditional PCIe interconnects, slow data transmission speed and high round-trip latency can lead to low computational efficiency. By using CXL interconnection, low latency and high-speed data transmission can be achieved, improving computational efficiency.
Accelerated model training: As deep learning models become increasingly complex, more computational resources are needed for training. By using CXL interconnection, the CPU, GPU, and other accelerator devices can work closely together to improve the speed and efficiency of model training.
Integrating AI with the Internet of Things: CXL can help servers that run AI applications process data arriving from IoT devices more quickly, enabling faster data processing and response. This is crucial for real-time applications that require quick response.
Reducing energy consumption: CXL can reduce the power needed to move data and cut down on redundant copies of data in the system, enabling more efficient energy management.
5、 Comparison of CXL Technology with Other Technologies
Comparison with technologies such as PCIe and NVMe:
Bandwidth: CXL's raw bandwidth matches that of the PCIe generation it runs on. CXL 2.0 uses the PCIe 5.0 physical layer at 32 GT/s per lane, twice the 16 GT/s of PCIe 4.0. NVMe, in contrast, is a protocol rather than an interconnect, and its bandwidth depends on the interconnect it runs over.
Latency: Both CXL and PCIe offer low latency, but CXL slightly outperforms PCIe because its cache and memory sub-protocols are designed for low-latency access. NVMe also performs well on latency for a storage protocol.
Features: CXL supports memory expansion, cache coherence, and load/store access to device memory, none of which PCIe or NVMe provides on its own.
Application scenario: PCIe is mainly used to connect peripheral devices such as GPUs, network cards, and storage devices. NVMe is mainly used to connect solid-state drives. CXL is more flexible and can be used to connect processors, storage devices, network adapters, and other peripherals, giving it a wider range of applications.
Compatibility: Because CXL is a relatively new technology, many older devices and platforms do not support it. PCIe is a universal connection standard and is already widely deployed.
Cost: Currently, CXL hardware is relatively expensive, while PCIe is more widespread and cost-effective.
CXL and CCIX are both high-speed interconnect standards for connecting different chips, but they differ in several respects. The main points of comparison are:
Performance: CXL provides high bandwidth and low latency, giving it advantages in high-performance computing, machine learning, artificial intelligence, and similar fields. CCIX pursues similar goals with a symmetric coherence model in which every device is a peer, whereas CXL's host-managed model keeps device logic simpler.
Compatibility: CXL runs on the PCIe physical layer and is therefore compatible with existing PCIe slots. CCIX also builds on PCIe, but it adds its own transaction layer and optional faster link speeds, which to some extent limits its compatibility.
Application scenario: CXL suits scenarios that require high performance and large memory expansion, such as large server clusters and supercomputers. CCIX targets coherent attachment of accelerators, mainly in data centers.
Supported architectures: CXL is supported across multiple processor architectures, including x86, Power, and Arm. CCIX focuses mainly on Arm and Power architectures, with limited support for x86.
6、 Challenges and Solutions for Implementing CXL Technology
The implementation of CXL technology faces the following challenges:
Complexity: The implementation of CXL technology requires highly complex system design and integration, which means adapting to different hardware, software, and tools.
Security: Because CXL involves low-level hardware operations and may involve data sharing between multiple devices, security risk is an important consideration. Appropriate measures are needed to ensure data security and privacy.
Performance: CXL technology needs to provide high-speed data transmission and low latency to meet the requirements for computing and storage capabilities. This requires efficient protocols and optimized hardware and software design.
Compatibility: CXL technology needs to be compatible with existing interfaces and protocols to support upgrades of old devices and systems. This requires appropriate converters and middleware.
To address these challenges, the following solutions can be adopted:
Standardization: Develop unified standards and specifications to ensure compatibility and interoperability between devices and systems from different manufacturers.
Optimized design: Improve performance and security by optimizing hardware and software design, for example by adding hardware acceleration, memory caching, and error correction.
Manage data sharing: Take appropriate measures to manage data sharing between devices, such as access control, authentication, and encryption.
Provide middleware: Provide converters and middleware to support upgrades and compatibility of existing systems.
7、 Conclusion
Wider application scenarios: With the increasing importance of data centers, CXL technology will be applied in more application scenarios, such as supercomputers, AI accelerators, network accelerators, NVMe SSDs, etc.
Higher bandwidth and lower latency: CXL technology supports higher bandwidth and lower latency, which gives it an advantage in processing large-scale data.
Better memory scalability: CXL technology allows multiple devices to share the same block of memory, greatly improving memory scalability and flexibility, and has broad application prospects in large computing clusters and supercomputers.
Better compatibility: Because CXL is built on the PCIe physical layer, it is compatible with existing PCIe interfaces, which makes it easier to adopt and to scale.
Targeting different processor architectures: CXL technology can support different processor architectures such as x86, ARM, and Power, providing more choices for different systems.