Demand for high-speed communication and ultra-fast computing keeps growing in order to support "triple play" applications. This poses new challenges for system developers, algorithm developers and hardware engineers, who must integrate various standards, components and networking devices into a single system.
At the same time, developers must keep pace with rising performance requirements while keeping costs low. Both goals can be met by using an FPGA with a serial RapidIO interface as a DSP coprocessor.
Because triple play applications integrate voice, video and data, new algorithms are needed to define the development parameters and the system optimization strategy. Developers must also solve the following problems: building a scalable architecture, supporting distributed processing, adopting standards-based design, and optimizing performance and cost.
On closer study, these challenges come down to two themes. The first is connectivity, which essentially means "fast" data transfer between devices, boards and systems. The second is computing power, which refers to the processing resources available in each device, board and system.
Connection between computing platforms
Standards-based design is usually much simpler than ad hoc ("free play") design, and it is the typical design approach today. Parallel interconnect standards (PCI, PCI-X, EMIF, etc.) can meet current requirements, but they fall short once scalability is taken into account. As packet processing technology advances, interconnect standards are clearly moving toward high-speed serial links, as Figure 1 shows.
High-speed serial standards such as PCIe and GbE/XAUI are already used in the desktop and networking industries, but data processing systems in wireless infrastructure have slightly different interconnect requirements:
1. Low pin count;
2. Backplane and chip-to-chip connections;
3. Scalable bandwidth and speed;
4. DMA and message passing;
5. Support for complex, scalable topologies;
6. Support for multipoint (multicast) transmission;
7. High reliability;
8. Time-of-day synchronization;
9. Quality of service (QoS)
Figure 1: trends towards serial connections
The serial RapidIO (sRIO) protocol standard meets, and in some cases exceeds, most of these requirements. It has therefore become the mainstream interconnect for the data plane in wireless communication infrastructure. An sRIO network is built from two basic building blocks: endpoints and switches. Endpoints send and receive packets. Switches forward packets between ports but do not interpret them. Figure 2 shows the building blocks of an sRIO network.
Figure 2: sRIO network building blocks
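The division of labor between endpoints and switches can be pictured with a small model. The sketch below is illustrative only: the `Packet` and `Switch` names, the field layout and the route table contents are assumptions made for this example, not structures taken from the RapidIO specification. It shows a switch choosing an output port from the destination ID alone, without ever reading the payload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    dest_id: int      # destination device ID (hypothetical field name)
    src_id: int       # source device ID (hypothetical field name)
    payload: bytes    # opaque to the switch


class Switch:
    """Forwards packets by destination ID; never inspects the payload."""

    def __init__(self, routes: dict[int, int]):
        # routes maps destination device ID -> output port number
        self.routes = dict(routes)

    def forward(self, packet: Packet) -> int:
        """Return the output port for this packet."""
        try:
            return self.routes[packet.dest_id]
        except KeyError:
            raise LookupError(f"no route for device ID {packet.dest_id}") from None


# Example route table with made-up IDs and port numbers.
switch = Switch({0x01: 0, 0x02: 1, 0x10: 3})
port = switch.forward(Packet(dest_id=0x10, src_id=0x01, payload=b"\x00" * 8))
print(port)  # 3
```

Keeping the lookup independent of the payload is what lets a switch stay simple while endpoints carry the protocol intelligence.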
According to the specification, serial RapidIO has a three-layer architecture, as shown in Figure 3
Figure 3: sRIO architecture
The three layers are:
Physical layer - describes the device-level interface, including the packet transport mechanism, flow control, electrical characteristics and low-level error management
Transport layer - provides the routing information needed to move packets between endpoints. Switches operate at the transport layer and route packets by device ID
Logical layer - defines the overall protocol and the packet formats. Each packet carries a payload of up to 256 bytes. Transactions access a 34-, 50- or 66-bit address space through load, store or DMA operations
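One practical consequence of the 256-byte payload limit is that a larger buffer has to be split across several packets. The following is a minimal sketch of that splitting step only. The function name `segment` is hypothetical, and real hardware also adds headers, alignment rules and CRCs that are not modeled here.

```python
MAX_PAYLOAD = 256  # maximum payload per packet, per the text above


def segment(data: bytes, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split data into consecutive chunks no longer than max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [data[i:i + max_payload] for i in range(0, len(data), max_payload)]


chunks = segment(bytes(1000))
print(len(chunks), [len(c) for c in chunks])  # 4 [256, 256, 256, 232]
```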
sRIO has many advantages. A 4-lane sRIO link running at 3.125 Gbps per lane can deliver 10 Gbps of traffic while fully maintaining data integrity. Like a microprocessor bus, sRIO performs memory and device addressing and packet processing in hardware, which greatly reduces I/O processing overhead and latency and increases system bandwidth relative to other bus interfaces. Unlike most other bus interfaces, however, sRIO has a low pin count, and its bandwidth scales with the high-speed serial link, whose line rate can be set between 1.25 and 3.125 Gbps. Figure 4 summarizes the sRIO specification
Figure 4: sRIO specification
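The 10 Gbps figure quoted above follows from simple arithmetic once line coding is taken into account. The sketch below assumes 8b/10b coding on each lane (8 data bits carried in every 10 line bits), which is what reconciles 4 lanes at 3.125 Gbps with 10 Gbps of data. Packet header overhead is ignored, and the function name is hypothetical.

```python
LINE_RATES = (1.25, 2.5, 3.125)  # per-lane line rates in Gbps
LANE_WIDTHS = (1, 4)             # 1-lane and 4-lane links


def data_rate_gbps(lanes: int, line_rate: float) -> float:
    """Data rate after line coding; assumes 8b/10b (8 data bits per 10 line bits)."""
    return lanes * line_rate * 8 / 10


for lanes in LANE_WIDTHS:
    for rate in LINE_RATES:
        print(f"{lanes} lane(s) at {rate} Gbps line rate: "
              f"{data_rate_gbps(lanes, rate):g} Gbps of data")
```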
Computing resources in the platform
With configurable processing resources, developers can implement in hardware applications that used to be implemented only in software, such as data compression and encryption algorithms, or even complete firewall and security applications. Doing so requires a large parallel ecosystem with shared bandwidth and strong processing capacity, in which CPUs, NPUs, FPGAs and/or ASICs carry out shared or distributed processing. To build such a system, the computing resources must meet requirements that include:
1. Distributed processing over complex topologies;
2. Highly reliable direct peer-to-peer communication;
3. Support for multiple heterogeneous operating systems;
4. A communication data layer that spans multiple heterogeneous operating systems;
5. A modular, scalable platform with broad ecosystem support
The sRIO protocol specification and architecture support the varied requirements of computing devices in embedded and wireless infrastructure. sRIO makes systems independent of any particular architecture and allows scalable systems to be deployed with carrier-grade reliability, advanced traffic management, high performance and high throughput. In addition, a broad vendor ecosystem makes it easier for designers to build sRIO systems from off-the-shelf components. sRIO is a packet-based protocol that supports:
1. Data movement through packet operations (reads, writes and messaging);
2. Both non-coherent I/O and cache-coherent operation;
3. Efficient interworking and protocol encapsulation through data streaming and segmentation and reassembly (SAR);
4. A traffic management framework that supports millions of data streams, 256 traffic classes and lossy operation;
5. Flow control across multiple transaction request flows, including QoS provisioning;
6. Prioritization to alleviate problems such as bandwidth allocation, transaction ordering and deadlock avoidance;
7. Standard (tree and mesh) and arbitrary (daisy-chain) hardware topologies through system discovery, configuration and learning, including support for multiple hosts;
8. Error management and classification (recoverable, notification and critical)
Serial RapidIO IP solutions
Xilinx and other vendors have designed their endpoint IP solutions to the latest RapidIO v1.3 specification, so that user data can be sent and received with fully compliant maximum-payload operation through the target and source interfaces of the logical (I/O) and transport layer IP
Figure 5 shows Xilinx's complete sRIO endpoint IP scheme, which includes the following components:
1. LogiCORE RapidIO logical (I/O) and transport layer IP;
2. Buffer layer reference design;
3. LogiCORE serial RapidIO physical layer IP;
4. Register manager reference design
Figure 5: sRIO endpoint IP architecture of Xilinx
IP architecture
Xilinx provides the source code of the buffer layer reference design, which handles automatic queuing and prioritization of packets. The sRIO physical layer IP implements link training and initialization, discovery and management, and error and retry recovery. In addition, high-speed transceivers are instantiated in the physical layer IP to support 1-lane and 4-lane sRIO links at line rates of 1.25 Gbps, 2.5 Gbps and 3.125 Gbps
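The queuing and prioritization performed by a buffer layer can be pictured as a priority queue. The sketch below is not the Xilinx reference design. It is a minimal software model that assumes integer priorities where a larger value is more urgent, and the class name `PriorityBuffer` is hypothetical. Packets of equal priority leave in arrival order.

```python
import heapq
import itertools


class PriorityBuffer:
    """Highest priority first; FIFO order among packets of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # arrival counter used as a tie-breaker

    def push(self, priority: int, packet: bytes) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), packet))

    def pop(self) -> bytes:
        _, _, packet = heapq.heappop(self._heap)
        return packet

    def __len__(self) -> int:
        return len(self._heap)


buf = PriorityBuffer()
buf.push(0, b"bulk-1")
buf.push(2, b"control")
buf.push(0, b"bulk-2")
order = [buf.pop() for _ in range(len(buf))]
print(order)  # [b'control', b'bulk-1', b'bulk-2']
```

The arrival counter matters: without it, two packets of equal priority would be ordered by comparing their contents rather than by when they arrived.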
The register manager reference design lets the sRIO host device configure and maintain the endpoint's configuration, link status, control and timeout mechanisms. The register manager also provides a port through which the user design can probe the state of the endpoint
LogiCORE provides a complete endpoint IP that has been tested by industry-leading sRIO device manufacturers. Users can obtain it through the Xilinx CORE Generator (coregen) GUI tool, which helps them configure the baud rate and the endpoint. LogiCORE supports extended features such as flow control, retransmit suppression, doorbells and message passing. Users can therefore create a flexible, scalable, customized sRIO endpoint IP optimized for their application requirements
Using the resources available in most high-performance FPGAs from Xilinx and other vendors, system designers can readily create and deploy intelligent solutions that improve their products' time to market, scalability and readiness for future development. Some system design examples using sRIO and DSP technology follow
sRIO system application examples
1. Embedded system: CPU architectures such as x86 are optimized for general-purpose applications that do not require many multiplications. DSP architectures, in contrast, are optimized for signal processing operations such as filtering, FFT, vector multiplication and search, and image or video analysis
An embedded system that uses both a CPU and a DSP can therefore exploit the strengths of both general-purpose processors and signal processors. Figure 6 shows an example of such a system, which combines FPGA, CPU and DSP architectures
Figure 6: high performance DSP subsystem based on CPU
In high-end DSPs, serial RapidIO has become the mainstream data interconnect, while the main data interconnect on x86 CPUs is PCI Express. As Figure 6 shows, with some simple configuration an FPGA can be used to scale the DSP application and/or to bridge disparate interconnect standards such as PCI Express and serial RapidIO
In this system, the root complex chipset manages the PCI Express system, and a DSP manages the sRIO system. The 32/64-bit PCIe address space (base address) can be mapped automatically to the 34/66-bit sRIO address space (base address). PCIe applications communicate with the root complex chipset through memory or I/O reads and writes. These transactions map readily onto sRIO space through I/O operations such as streaming writes (SWRITE), atomic operations (ATOMIC), reads (NREAD) and writes without or with response (NWRITE/NWRITE_R)
Building this bridging function in a Xilinx FPGA is straightforward, because the back-end interfaces of the PCI Express block and the serial RapidIO endpoint block are similar packet queue interfaces. The bridge can then convert PCIe to sRIO and sRIO to PCIe, establishing data flow between the two protocol domains
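The address mapping between the two domains can be illustrated with a simple window translation. This is a sketch under stated assumptions, not the mechanism of any particular bridge: the `Window` type, the base addresses and the window size are invented for the example, and only the 34-bit sRIO address width comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    pcie_base: int   # start of the window in PCIe address space
    size: int        # window length in bytes
    srio_base: int   # where the window lands in sRIO address space


SRIO_ADDR_BITS = 34  # smallest sRIO address width mentioned in the text


def pcie_to_srio(addr: int, win: Window, srio_bits: int = SRIO_ADDR_BITS) -> int:
    """Translate a PCIe address inside the window to an sRIO address."""
    offset = addr - win.pcie_base
    if not 0 <= offset < win.size:
        raise ValueError(f"address {addr:#x} is outside the window")
    srio_addr = win.srio_base + offset
    if srio_addr >= 1 << srio_bits:
        raise ValueError("translated address exceeds the sRIO address width")
    return srio_addr


# Hypothetical 1 MiB window; all three values are made up for illustration.
win = Window(pcie_base=0xD000_0000, size=0x10_0000, srio_base=0x2_0000_0000)
print(hex(pcie_to_srio(0xD000_1234, win)))  # 0x200001234
```

The reverse direction works the same way with the two bases swapped.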
2. DSP processing application: in applications where DSP processing is the main architectural requirement, the system can be designed as shown in Figure 7
Figure 7: devices requiring powerful DSP processing power
A Xilinx Virtex-5 FPGA can serve as a coprocessor to the other DSP devices in the system. With sRIO as the data interconnect, the whole DSP system is easy to scale. The solution is scalable, adapts to future development and can be implemented in a variety of form factors
When applications that need powerful DSP functions must also perform many fast, complex operations or data processing, those tasks can be offloaded to an x86 CPU. The Xilinx Virtex-5 FPGA bridges the PCIe subsystem and the sRIO fabric, which makes this function offload efficient
3. Baseband processing system
As 3G networks mature rapidly, OEMs will adopt devices and equipment in new form factors to address capacity and coverage problems. A DSP architecture based on sRIO and FPGAs is an excellent way to meet such challenges. Traditional DSP systems can also be re-architected around this fast, low-power FPGA-based structure to take full advantage of the FPGA's scalability
In such a system, as shown in Figure 8, the FPGA can handle the antenna traffic at line rate