Software Radio Architecture, Part 10


Software Radio Architecture: Object-Oriented Approaches to Wireless Systems Engineering
Joseph Mitola III
Copyright © 2000 John Wiley & Sons, Inc.
ISBNs: 0-471-38492-5 (Hardback); 0-471-21664-X (Electronic)

10 Digital Processing Tradeoffs

This chapter addresses digital hardware architectures for SDRs. A digital hardware design is a configuration of digital building blocks. These include ASICs, FPGAs, ADCs, DACs, digital interconnect, digital filters, DSPs, memory, bulk storage, I/O channels, and/or general-purpose processors. A digital hardware architecture may be characterized via a reference platform, the minimum set of characteristics necessary to define a consistent family of designs of SDR hardware. This chapter develops the core technical aspects of digital hardware architecture by considering the digital building blocks. These insights permit one to characterize the architecture tradeoffs. From those tradeoffs, one may derive a digital reference platform capable of embracing the necessary range of digital hardware designs. The chapter begins with an overview of digital processing metrics and then describes each of the digital building blocks from the perspective of its SDR architecture implications.

I. METRICS

Processors deliver processing capacity to the radio software. The measurement of processing capacity is problematic. Candidate metrics for processing capacity are shown in Table 10-1. Each metric has strengths and limitations. One goal of architecture analysis is to define the relationship between these metrics and achievable performance of the SDR. The point of view employed is that one must predict the performance of an unimplemented software suite on an unimplemented hardware platform. One must then manage the computational demands of the software against the benchmarked capacities of the hardware as the product is implemented.
Finally, one must determine whether an existing software personality is compatible with an existing hardware suite.

TABLE 10-1 Processing Metrics
MIPS: Millions of Instructions per Second
MOPS: Millions of Operations per Second
MFLOPS: Millions of Floating Point Operations per Second
Whetstone: Supercomputing MFLOPS Benchmark
Dhrystone: Supercomputing MIPS Benchmark
SPECmark: SpecINT, SpecFP Instruction Mix Benchmarks ('92 and '95)
Consistent use of appropriate metrics assures that these tasks can be accomplished without unpleasant surprises.

1. Differentiating the Metrics

MIPS, MOPS, and MFLOPS are differentiated by logical scope. An operation (OP) is a logical transformation of the data in a designated element of hardware in one clock cycle. Processor architectures typically include hardware elements such as arithmetic and logic units (ALUs), multipliers, address generators, data caches, and instruction caches, all operating in parallel at a synchronous clock rate. MOPS are obtained by multiplying the number of parallel hardware elements times the clock speed. If multiple operations are required to complete a machine instruction (e.g., a floating-point multiply), then MIPS = α × MOPS, with α < 1. More sophisticated processor architectures
may employ set-associative cache coherency and other schemes to yield a higher number of instruction executions for a given clock speed. In addition, there is statistical structure to the application, which will determine whether the data and instruction necessary at the next step will be in the cache (cache hit) or not (cache miss). Statistical structure is also present in the mix of input/output, data movement in memory, logical operations (e.g., masking and finding patterns), and arithmetic needed by an application. Some applications, like FFTs, are very computationally intensive, requiring a high proportion of arithmetic instructions. Others, such as supporting display windows, require more copying of data from one part of memory to another. And support of virtual memory requires the copying of pages of physical memory to hard disk or other large-capacity secondary storage. This gives the programmer the illusion that physical memory is relatively unlimited (e.g., 32 gigabytes) within a physically confined space of, say, 128 Mbytes of physical memory.

3. Standard Benchmarks

Consequently, MIPS are hard to define. Often, the popular literature attributes MIPS based on a nonstatistical transformation of MOPS into instructions that could be executed in an ideal instruction mix. This approach makes the chip look as fast as it possibly could be. Since most manufacturers do this, the SDR engineer learns that achievable performance on the given application will be significantly less than the nominal MIPS rating. The manufacturer's MIPS estimate is useful because it defines an upper bound to realizable performance. Most chips deliver 30 to 60% of such nominal MIPS as usable processing capacity in a realistic SDR mix.
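The relationships above (MOPS as parallel hardware elements times clock rate, MIPS as a fraction of MOPS, and the 30-60% usable-capacity derating) amount to a back-of-the-envelope capacity estimate. The sketch below illustrates the arithmetic; the processor figures are invented for illustration, not vendor data.

```python
def nominal_mops(parallel_elements: int, clock_mhz: float) -> float:
    """Peak MOPS: every parallel hardware element retires one
    operation per clock cycle."""
    return parallel_elements * clock_mhz

def usable_mips(mops: float, ops_per_instruction: float,
                derating: float = 0.45) -> float:
    """Usable MIPS: divide by operations per instruction (the alpha
    factor), then apply a mid-range SDR instruction-mix derating
    (the chapter cites 30 to 60% of nominal)."""
    return mops / ops_per_instruction * derating

# Hypothetical chip: 6 parallel elements at 100 MHz -> 600 nominal MOPS
mops = nominal_mops(parallel_elements=6, clock_mhz=100.0)
print(usable_mips(mops, ops_per_instruction=2.0))   # 135.0 usable MIPS
```

The point of the derating parameter is that the nominal rating is an upper bound, not a planning figure.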
In the 1970s, scientists and engineers concerned with quantifying the effectiveness of supercomputers developed the Whetstone, Dhrystone, and other benchmarks consisting of standard problem sets against which each new generation of supercomputer could be assessed. These benchmarks focused on the central processor unit (CPU) and on the match between the CPU and the memory architecture in keeping data available for the CPU. But they did not address many of the aspects of computing that became important to prospective buyers of workstations and PCs. The speed with which the display is updated is a key parameter of graphics applications, for example. The SPECmarks evolved during the 1990s to better address the concerns of the early-adopter buying public. Consequently, SPECmarks are informative, but these also are not the ideal SDR metric in that they do not generally reflect the mix of instructions employed by SDR applications. Turletti [293], however, has benchmarked a complete GSM base station using SPECmarks, as discussed further below.

4. SDR Benchmarks

At this point, the reader may be expecting some new "SDR benchmark" to be presented as the ultimate weapon in choosing among new DSP chips. Unfortunately, one cannot define such a benchmark. First
of all, the radio performance depends on the interaction among the ASICs, DSP, digital interconnect, memory, mass storage, and the data-use structure of the radio application. These interactions are more fully addressed in Chapter 13 on performance management. It is indeed possible to reliably estimate the performance that will be achieved on the never-before-implemented SDR application. But the way to do this is not to blindly rely on a benchmark. Instead, one must analyze the hardware and software architecture (using the tools described later). One may then accurately capture the functional and statistical structure of the interactions among hardware and software. This systems analysis proceeds in the following steps:

1. Identify the processing resources.
2. Characterize the processing capacity of each class of digital hardware.
3. Characterize the processing demands of the software objects.
4. Determine how the capacity of the hardware supports the processing demands of the software by mapping the software objects onto the significant hardware partitions.

There is a trap in identifying the hardware processor classes. ASICs and DSPs are easily identified as processing modules. But one must traverse each signal processing path through the system to identify buses, shared memory, disks, general-purpose CPUs, and any other component that is on the path from source to destination (outside the system). Each such path is a processing thread. Each such processor has its own processing demand and priority structure against which the needs of the thread will be met. One then abstracts the block diagram into a set of critical resources, as illustrated in Figure 10-1 (Identify processing resources). This chapter begins the process of characterizing the capacity of SDR hardware. It summarizes the tradeoffs among classes of processor, functional architecture, and special instruction sets.
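The four steps above can be sketched as a toy feasibility check: benchmark each hardware partition, sum the demands of the software objects mapped onto it, and flag any overcommitted resource. All names, capacities, and demands below are hypothetical, with MOPS figures standing in for real benchmarks.

```python
# Steps 1-2: processing resources and their benchmarked capacity (MOPS)
resources = {
    "ASIC_filter": 4000.0,
    "DSP_pool":    1200.0,
    "bus_host":     300.0,
}
# Step 3: processing demand of each software object (MOPS)
demands = {
    "ddc":   3500.0,   # digital downconversion
    "demod":  900.0,   # demodulation
    "oam":    150.0,   # operations, administration, and maintenance
}
# Step 4: candidate mapping of software objects onto hardware partitions
mapping = {"ddc": "ASIC_filter", "demod": "DSP_pool", "oam": "bus_host"}

def check_mapping(resources, demands, mapping, headroom=0.8):
    """Sum demand per resource; True means the partition fits within
    an assumed 80% utilization margin."""
    load = {r: 0.0 for r in resources}
    for obj, res in mapping.items():
        load[res] += demands[obj]
    return {r: load[r] <= headroom * cap for r, cap in resources.items()}

print(check_mapping(resources, demands, mapping))
```

Here the filter ASIC is overcommitted (3500 MOPS of demand against 3200 usable), so this personality would need a different mapping or more capable hardware.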
Other source material describes how to program them for typical DSP applications [294]. The extensive literature available on the web pursues detailed aspects of processors further [295-298]. The popular press provides product highlights (e.g., [299-303]). This text, on the other hand, focuses on characterizing the processors with respect to the support of SDR applications. This is accomplished by the derivation of a digital processing platform model that complements the RF platform developed previously.
TABLE 10-2 Mapping of Segments to Hardware Classes
Segment | Module | Typical Performance | Illustrative Manufacturers
RF | RF/IF | HF, VHF, UHF | Watkins Johnson, Steinbrecher
IF | ADC | 1 to 70 Msa/sec | Analog Devices (AD), Pentek
IF | Digital Rx | 30.72 MHz filters | Harris Semiconductor, Graychip, Sharp
IF | Memory | 64 MB at 40 MHz | Harris, TRW
IF, BB | DSP | 4-400 MFLOPS | TI, AMD, Intel, Mercury, AD, Sky
BS, SC | Bus Host | M68k, Pentium | Motorola, Force, Intel
SC | Workstation | 50-100 SPECmark 92 | Sun, HP, DEC, Intel
Legend: BB = baseband; BS = bitstream; SC = source coding.

II. HETEROGENEOUS MULTIPROCESSING HARDWARE

Segment boundaries among antennas, RF, IF, baseband, bitstream, and source segments defined in the earlier chapters make it easy to map multiband, multimode, multiuser SDR personalities to parallel, pipelined, heterogeneous multiprocessing hardware.

A. Hardware Classes

Some design strategies map radio functions to affordable open-architecture COTS hardware. In one example, the VME or PCI chassis hosts the RF, IF, baseband, and bitstream segments as illustrated in Table 10-2. The workstation hosts the OA&M, systems management, or research tools including the user interface, development tools, networking, and source coding/decoding. Each module shown in the table represents a class of hardware. The parameters of these modules that assure that a software personality will work properly are defined in the digital processing reference platform. Consider the roles of these hardware classes. The bus host serves as systems control processor. The DSPs support the real-time channel-processing stream, sometimes configured as one DSP per N subscriber channels, where N typically ranges from 1 to 16. The path from the ADC to the first filtering/decimation stage may use a dedicated point-to-point mezzanine interconnect such as DT Connect™ (Data Translation). Customized FibreChannel and Transputer links have also been used.
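The one-DSP-per-N-subscriber-channels arrangement implies a simple pool-sizing rule. The fragment below illustrates it with invented channel counts; it is a sizing sketch, not a statement about any particular product.

```python
import math

def dsps_needed(channels: int, n_per_dsp: int) -> int:
    """DSP pool size when each DSP serves up to n_per_dsp subscriber
    channels (the text cites N typically between 1 and 16)."""
    return math.ceil(channels / n_per_dsp)

print(dsps_needed(50, 16))   # 4 DSPs cover 50 channels at N = 16
print(dsps_needed(50, 4))    # 13 DSPs at N = 4
```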
Synchronization of the block-by-block transfers across this bus with the point-by-point operations of the first filtering and decimation stage introduces inefficiencies that reduce throughput. Fan-out from IF processing to multiple baseband-processing DSPs also may be accomplished via a dedicated point-to-point path such as a mezzanine bus. Alternatively, an open-architecture high-data-rate bus might be used. Instead of configuring such a heterogeneous multiprocessor at the board level, one might use a preconfigured system. Mercury™, for example, has offered a mix of SHARC 21060 [304] (Analog Devices), PowerPC RISC, and
Intel i860 chips with Raceway interconnect [305-307]. Raceway I had nominally three paths at 160 MByte/sec interconnect capacity. Arrays of WE32s were used in AT&T's DSP-3 system. Arrays of i860s were available from Sky Computer [308], CSPI [309], and others. Of particular note is UNISYS' militarized TOUCHSTONE processor, which was also based on the i860 [310]. Although the i860 is no longer a supported Intel product, the architectures are illustrative. System-on-a-chip architectures also employ ASIC functions, shared memory, programmable logic arrays, and/or DSP cores. The physical packaging of these functions may be organized in point-to-point connections, buses, pipelines, or meshes. In each case, digital interconnect intervenes between functional building blocks and memory. Threads are traced from RF stimuli to analog and digital responses. Often in handsets, there is no ADC or DAC. Instead, RF ASICs perform channel modem functions to yield an alternative functional flow. Figure 10-2 (Alternative processing modules and interconnect) contrasts these complementary views of interconnect and other hardware classes. The boundaries of the digital flow are the external interface components. These include the display drivers, audio ASICs, and I/O boards that access the PSTN. Tradeoffs among internal interconnect are addressed in the next section.

B. Digital Interconnect

Digital interconnect in systems-on-a-chip architectures is an emerging area. Over time, standards may emerge because of the need to integrate IP from a mix of suppliers on a single chip. Macroscale digital interconnect has a longer
history of product evolution, and that is the focus of this discussion. These macroscale architectures may serve as precursors to future nanoscale on-chip interconnect. Illustrative approaches to digital interconnect for open-architecture processing nodes are the dedicated interconnect, the wideband bus, and shared memory (Figure 10-3, Illustrative classes of digital interconnect).

1. Dedicated Interconnect

Dedicated interconnect is typically available from subsystem suppliers like Pentek [311]. Pentek provides 70 MHz ADC boards and Harris or Graychip digital receiver boards. Its MIX™ bus interconnects these cards efficiently. In addition, if the set of boards and interconnect does not work, the vendor resolves the issues. This approach leverages COTS products, with low cost and low risk. For applications with relatively small numbers of IF channels, it represents a solid engineering approach.

2. Wideband Bus

The next step up in technical sophistication is the wideband bus. The SCI bus [312], for example, has been used in supercomputer systems for several years. It is becoming available in turnkey formats including interface chip sets. The gigabyte-per-second capacity of the SCI bus could continue to increase with the underlying device technology. In addition, the design scales up easily to 8 × 140 MBps channels. The MIX bus, DT Connect, Raceway, SkyChannel [313], and other lower-capacity designs may be configured in parallel to attain high aggregate rates. This requires the hardware components to be appropriately partitioned. Other high-speed bus technologies are emerging, such as the Vertical Laser at 115 GHz [314, 315].

3. Shared Memory

Shared memory can deliver the ultimate in interconnect bandwidth. Bulk memory of 64 MBytes easily has 16- to 64-bit paths. Scaling to 128 or 256 bits is feasible. Clock rates of 25 to 250 MHz are within reach.
Thus, aggregate throughputs of 3.2 to 64 gigabytes per second are becoming practical.
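The aggregate figures above follow directly from path width times clock rate. A minimal calculator makes the arithmetic explicit; the operating point shown is one assumed combination of width and clock, not a specific product.

```python
def interconnect_gbps(path_bits: int, clock_mhz: float) -> float:
    """Aggregate throughput in GB/s: (path width in bytes) x clock rate.
    Assumes one transfer per clock cycle across the full path width."""
    return path_bits / 8 * clock_mhz / 1000.0

# A 128-bit shared-memory path at 200 MHz gives 3.2 GB/s,
# the low end of the range cited above.
print(interconnect_gbps(128, 200.0))   # 3.2
```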
Most buses experience low throughput for small block sizes. Mercury characterizes the performance of its products thoroughly. The maximum sustainable transfer rate of Raceway I varies as a function of DMA block length, as illustrated in Figure 10-5 (Interconnect efficiency). Although the peak rate of 160 MB/sec is not sustainable, it is approached with block sizes above 4096 bytes. Some devices (e.g., ADCs) may have short on-board buffers, constraining blocks to smaller sizes. In addition, algorithm constraints may prescribe smaller block sizes. A 0.5 ms GSM frame, digitized at 500 k samples per second, for example, may be processed with a block size of 250 samples (500 Bytes). If presented to Raceway in that format, the sustainable throughput would fall between 80 and 120 MB/sec, as shown in the figure. If this is understood, then a constraint can be established between the algorithm and Raceway as an interconnect module. Constraint-management software can then assure that the capacity of the interconnect is not exceeded when instantiating a waveform into such hardware. In a more representative example, the entire bandwidth of the GSM allocation could be sampled at 50 M samples/sec, yielding 25.5 k samples per GSM frame, or over 50 kBytes. This data could be efficiently transferred to digital filter ASICs in 8 kByte blocks.

5. Architecture Implications

The physical format of digital interconnect (e.g., PCI, VME, etc.) need not be incorporated into an open-architecture standard for SDR. The less specific standard encourages competition and technology insertion by not unnecessarily constraining the implementations. On the other hand, such an architecture must recognize the fact that each class of physical interconnect entails implementation-specific constraints. An open architecture that supports multivendor product integration therefore must characterize those constraints to assure that software is installed on hardware with the necessary interconnect capabilities. Otherwise, interconnect capacity may become the system bottleneck that causes the node to fail or degrade unexpectedly. An architecture standard used by a large enterprise to establish product migration paths, on the other hand, should specify the digital interconnect (e.g., PCI) and its migration from one physical realization to others as technology matures.

III. APPLICATIONS-SPECIFIC INTEGRATED CIRCUITS (ASICs)

The next step in the digital flow from the ADC to the back-end processors in a base station is typically a pool of ASICs. ASICs particularly suited to software radios include digital filters, FEC, and hybrid analog-digital RF-transceiver modules with programmable capabilities. Waveform-specific ASICs are exhibiting increased programmability, mixing the capabilities of digital filters, FEC, and general-purpose processors for new classes of waveform (e.g., W-CDMA). In addition, DSP cores with custom on-chip capabilities are ASICs, but for clarity, they are addressed in the section on DSP architectures.

A. Digital Filter ASICs

Base station architectures need digital frequency translation and filtering for hundreds of simultaneous users. Minimum distortion and nonlinearities are required in the base-station receiver architecture to meet near-far requirements. Digital-filter ASICs therefore extract weak signals in the presence of strong signals. The architecture for such ASICs is illustrated in Figure 10-6.
Figure 10-6 Digital filter ASIC architecture: (a) top-level ASIC architecture; (b) digital decimating filter architecture.

The frequency and phase of the ASIC is set so that the complex multiply-accumulator chip (CMAC) translates the wideband input to a programmable baseband. For first-generation cellular applications, the decimating digital filters (DDFs) yielded 25 or 30 kHz narrowband voice channels through computationally intensive filtering. Hogenaur realized that adjustment of the integrator, comb, and decimator parameters reduces aliasing, as illustrated in Figure 10-7 (Hogenaur filter reduces aliasing) [316]. Aliasing bands are folded into baseband at the complex sampling frequency. Choice of decimation rate and comb filter parameters places a deep null in the band of interest, achieving 90 dB of dynamic range using limited-precision integer arithmetic. The Hogenaur filter thus facilitated the efficient realization of the Harris ASICs. The product line evolved to the HSP series now owned by Intersil. Oh [317] has proposed the use of interpolated second-order polynomials as an improvement over the Hogenaur filter. Graychip has also been developing filtering ASICs since the late 1980s. In addition, Zangi [318] describes a transmultiplexer architecture that yields all channels in a cell site using a Discrete Fourier Transform (DFT) stage. Zangi's transmultiplexer offers advantages for ASIC implementations. For example, with 1800 points per filter in a Digital AMPS application, Fs = 34.02 MHz, and decimation of 350, the DFT requires 1134 points for a complexity of 826 M multiplies per second. Such ASICs would simplify cell-site designs. The complexity of frequency conversion and filtering is the first-order determinant of the digital signal processing demand of the IF segment. In a typical application, a 12.5 MHz mobile cellular band is sampled at 30.72 MHz (M samples per second). Frequency translation, filtering, and decimation requiring
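The Hogenaur filter described above, today usually called a cascaded integrator-comb (CIC) decimator, can be sketched in a few lines: N integrators run at the input rate, the stream is decimated by R, and N combs of differential delay M run at the output rate. The parameters below are illustrative; a hardware ASIC would use wrap-around fixed-point registers of carefully chosen width rather than Python's unbounded integers.

```python
def cic_decimate(x, N=3, R=8, M=1):
    """Integer CIC (Hogenaur) decimator: N integrators, decimate by R,
    then N combs y[n] = x[n] - x[n-M]. DC gain is (R*M)**N."""
    for _ in range(N):                   # integrator section (input rate)
        acc, out = 0, []
        for s in x:
            acc += s
            out.append(acc)
        x = out
    x = x[R - 1::R]                      # decimate by R
    for _ in range(N):                   # comb section (output rate)
        x = [s - (x[i - M] if i >= M else 0) for i, s in enumerate(x)]
    return x

# A DC input of amplitude 1 settles to the filter gain (R*M)**N = 512
y = cic_decimate([1] * 64, N=3, R=8, M=1)
print(y[-1])   # 512
```

The appeal for ASICs is exactly what the text notes: only adders and delays are needed, with no multipliers and no filter-coefficient storage, yet deep nulls land on the aliasing bands.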
functions to component-level building blocks may be called tunneling. It requires the refinement of the layered virtual machine architecture illustrated in Figure 10-13 (Tunneling provides open-architecture access to proprietary IP). Several aspects of the tunneling facility need to be pointed out. These include the definition of interface points, the use of the tunneled component, the identification of constraints, and the resolution of conflicts. These aspects are supported by Tunnel( ) functions that tell the radio infrastructure about the interfaces to the applications objects and the capabilities of the ASIC objects, as follows. First, the tunneling points are anchored to architecture-level functional components by the <function><ASIC>Tunnel( ) expression. In this format, the name of the tunnel includes the function requesting the tunneling service and the name of the object that is the target of the tunnel. In the figure, both the Modem and the TCP protocol tunnel to the FEC ASIC. The interface from the Modem function is specified independently of the interface from the protocol stack to the FEC ASIC. If the interface to the ASIC class conforms to the architecture-level interfaces, then the resource-management function of the radio infrastructure has the information it needs to establish streams between the software objects and the ASIC. This may not always be the case. In the example, the TCP software for a specific waveform personality may use the ASIC to provide some additional
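The naming convention above, where a tunnel is identified by the requesting function plus the target ASIC, can be sketched as a small registry. The <function><ASIC>Tunnel( ) naming scheme and the Modem/TCP-to-FEC example come from the text; the registry class, its methods, and the interface strings are invented for illustration.

```python
class TunnelRegistry:
    """Hypothetical radio-infrastructure registry of tunneling points,
    keyed by the <function><ASIC>Tunnel naming convention."""

    def __init__(self):
        self._tunnels = {}

    def register(self, function: str, asic: str, interface: str):
        # Anchor a tunnel to an architecture-level function, e.g.
        # ModemFECTunnel; each function declares its own interface
        # to the ASIC independently of the others.
        self._tunnels[f"{function}{asic}Tunnel"] = (function, asic, interface)

    def route(self, function: str, asic: str):
        # Resource management looks up the tunnel before establishing
        # a stream between the software object and the ASIC.
        return self._tunnels.get(f"{function}{asic}Tunnel")

reg = TunnelRegistry()
reg.register("Modem", "FEC", "soft-decision stream")
reg.register("TCP", "FEC", "block-coded payload")
print(reg.route("Modem", "FEC"))
# ('Modem', 'FEC', 'soft-decision stream')
```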