Chuyển đổi lý thuyết P8

Chia sẻ: Tien Van Van | Ngày: | Loại File: PDF | Số trang:55

0
51
lượt xem
4
download

Chuyển đổi lý thuyết P8

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

ATM Switching with Non-Blocking Multiple-Queueing Networks We have seen in the previous chapter how a non-blocking switch based on a single queueing strategy (input, output, or shared queueing) can be implemented and what traffic performance can be expected. Here we would like to investigate how two of the three different queueing strategies can be combined in the design of a non-blocking ATM switch. The general structure of a non-blocking switch with size N × M is represented in Figure 8.1. Each input port controller (IPC) and output port controller (OPC) are provided with a FIFO buffer of size B i and...

Chủ đề:
Lưu

Nội dung Text: Chuyển đổi lý thuyết P8

  1. Switching Theory: Architecture and Performance in Broadband ATM Networks Achille Pattavina Copyright © 1998 John Wiley & Sons Ltd ISBNs: 0-471-96338-0 (Hardback); 0-470-84191-5 (Electronic) Chapter 8 ATM Switching with Non-Blocking Multiple-Queueing Networks We have seen in the previous chapter how a non-blocking switch based on a single queueing strategy (input, output, or shared queueing) can be implemented and what traffic performance can be expected. Here we would like to investigate how two of the three different queueing strategies can be combined in the design of a non-blocking ATM switch. The general structure of a non-blocking switch with size N × M is represented in Figure 8.1. Each input port controller (IPC) and output port controller (OPC) are provided with a FIFO buffer of size B i and B o cells, respectively. A FIFO shared buffer with capacity NB s cells is also associated with the non-blocking interconnection network (IN). Therefore B i , B o and B s represent the input, output and shared capacity per input–output port in a squared switch. Apparently having B x = 0 for x = i, o, s corresponds to absence of input, output and shared queueing, respectively. Usually, unless required by other considerations, IPC and OPC with the same index are implemented as a single port controller (PC) interfacing an input and an output channel, so that the switch becomes squared. Unless stated otherwise, a N × N squared switch is consid- ered in the following and its size N is a power of 2. Output queueing is adopted when the interconnection network is able to transfer more than one cell to each OPC, since only one cell per slot can leave the OPC. Then the switch is said to have an (output) speed-up K meaning that up to K packets per slot can be received by each OPC. Note that the condition K ≤ min [ B o, N ] always applies. While the second bound is determined by obvious physical considerations, the former bound is readily explained con- sidering that it would make no sense to feed the output queue in a slot with a number of packets larger than the queue capacity. Therefore even the minimum speed-up K = 2 requires an output queueing capability B o ≥ 2 . The switch can also be engineered to accom- plish an input speed-up K i in that up to K i packets per slot can be transmitted by each IPC. When an input speed-up is performed ( K i ≥ 2 ) , the output speed-up factor K will be indi- cated by K o . Accordingly in the general scheme of Figure 8.1 the interconnection network is
  2. 282 ATM Switching with Non-Blocking Multiple-Queueing Networks IPC IN OPC 0 0 Bi 1 Ki Ko Bo 1 NBs 1 N-1 M-1 Ko-non-blocking Ki Ko Bi 1 network Bo 1 Figure 8.1. Model of Ko-non-blocking ATM switch labelled as a K o -non-blocking network. Nevertheless, to be more precise, the network needs only be K o -rearrangeable due to the ATM switching environment. In fact, all the I/O con- nections through the interconnection network are set up and cleared down at the same time (the slot boundaries). In general if the switch operates an input speed-up K i , then the output speed-up must take a value K o ≥ K i . In fact it would make no sense for the OPCs to be able to accept a num- ber of packets NK o smaller than the NK i packets that can be transmitted by the IPCs in a slot. Therefore, output speed-up is in general much cheaper to implement than input speed- up, since the former does not set any constraint on K i which can be set equal to one (no input speed-up). For this reason most of this section will be devoted to networks adopting output speed-up only. The availability of different queueing capabilities makes it possible to engineer the switch in such a way that packets are transferred from an upstream queue to a downstream queue, e.g. from an input queue to an output queue, only when the downstream queue does not overflow. Therefore we will study two different internal operations in the switch: • Backpressure (BP): signals are exchanged between upstream and downstream queues so that the former queues transmit downstream packets only within the queueing capability of the latter queues. • Queue loss (QL): there is no exchange of signalling information within the network, so that packets are always transmitted by an upstream queue independent of the current buffer sta- tus of the destination downstream queue. Packet storage in any downstream queue takes place as long as there are enough idle buffer positions, whereas packets are lost when the buffer is full. Thus, owing to the non-blocking feature of the interconnection network, packets can be lost for overflow at the upstream queues only in the BP mode and also at the downstream queues in the QL mode. In the analytical models we will assume that the selection of packets to be
  3. 283 backpressured in the upstream queues (BP) or to be lost (QL) in case of buffer saturation is always random among all the packets competing for the access to the same buffer. Our aim here is to investigate non-blocking ATM switching architectures combining dif- ferent queueing strategies, that is: • combined input–output queueing (IOQ), in which cells received by the switch are first stored in an input queue; after their switching in a non-blocking network provided with an out- put speed-up, cells enter an output queue; • combined shared-output queueing (SOQ), in which cells are not stored at the switch inlets and are directly switched to the output queues; an additional shared storage capability is avail- able to hold those cells addressing the same switch outlet in excess of the output speed-up or not acceptable in the output queues due to queue saturation when backpressure is applied; • combined input-shared queueing (ISQ), in which cells received by the switch are first stored in an input queue; an additional queueing capability shared by all switch inputs and outputs is available for all the cells that cannot be switched immediately to the desired switch outlet. If a self-routing multistage interconnection network is adopted, which will occur in the case of IOQ and SOQ structures, the general model of Figure 8.1 becomes the structure shown in Figure 8.2 where only output speed-up, with a factor K, is accomplished. Following one of the approaches described in Section 3.2.3, the K-non-blocking interconnection network is imple- mented as a two-block network: an N × N sorting network followed by K banyan networks, each with size N × N . The way of interconnecting the two blocks is represented as a set of N splitters 1 × N in Figure 8.2, so that the overall structure is K-non-blocking. The other imple- mentations of the non-blocking network described in Section 3.2.3 could be used as well. IPC0 1 OPC0 2 Shared 0 queue 0 0 K Bi 1 Bo 1 Sorting network NBs 1 Routing IPCN-1 network OPCN-1 N-1 N-1 N-1 Bi 1 Bo 1 Figure 8.2. Model of K-non-blocking self-routing multistage ATM switch In the following, Section 8.1 describes architectures and performance of ATM switches with combined input–output queueing. Section 8.2 and Section 8.3 do the same with com- bined shared-output and input-shared, respectively. The switch capacities of different non-
  4. 284 ATM Switching with Non-Blocking Multiple-Queueing Networks blocking switches, either with single queueing or with multiple queueing, are summarized and compared with each other in Section 8.4. Some additional remarks concerning the class of ATM switches with multiple queueing are given in Section 8.5. 8.1. Combined Input–Output Queueing By referring to the general switch model of Figure 8.1, an ATM switch with combined input– output queueing (IOQ) is characterized by B i ≥ 1 , B o ≥ 1 and B s = 0 . However, since the switch accomplishes an output speed-up K o ≥ 2 , the minimum value of the output queues is B o = 2 (see the above discussion on the relation between K o and B o ). A rather detailed description will be first given of basic IOQ architectures without input speed-up operating with both the internal protocols BP and QL. These architectures adopt the K-non-blocking self-routing multistage structure of Figure 8.2 where the shared queue is removed. A thorough performance analysis will be developed and the results discussed for this structure by preliminarily studying the case of a switch with minimum output queue size B o = K . Then a mention will be given to those architectures adopting a set of parallel non- blocking switch planes with and without input speed-up. 8.1.1. Basic architectures The basic switch implementations that we are going to describe are based on the self-routing multistage implementation of a K-non-blocking network described in Section 3.2.3, under the name of K-rearrangeable network. Therefore the interconnection network is built of a sorting Batcher network and a banyan routing network with output multiplexers, which are implemented as output queues located in the OPCs in this ATM packet switching environ- ment. As with the interconnection network with pure input queueing, the IPC must run a contention resolution algorithm to guarantee now that at most K of them transmit a packet to each OPC. Additionally here the algorithm must also avoid the overflow of the output queues when backpressure is adopted. The switch architectures with the queue loss (QL) internal protocol and without input speed-up ( K i = 1 ) will be described first, by upgrading the structures already described with pure input queueing (Section 7.1.1). Then a possible implementation of one of these architec- tures with internal backpressure between input and output queueing will be studied. 8.1.1.1. Internal queue loss The two basic architectures described with pure input queueing, that is the Three-Phase switch and the Ring-Reservation switch can be adapted to operate with an output speed-up K. In both cases no input speed-up is assumed ( K i = 1 ) . The architecture of the Three-Phase switch with combined input–output queueing is shown in Figure 8.3: it includes a K-non-blocking network, composed of a sorting network (SN), a routing banyan network (RN), and an allocation network (AN). Such structure differs
  5. Combined Input–Output Queueing 285 I0 O0 f0 g0 h0 d0 e0 PC0 a0 K K Allocation Sorting Routing network network network (AN) (SN) (RN) IN-1 ON-1 fN-1 gN-1 hN-1 dN-1 eN-1 PCN-1 aN-1 K K Figure 8.3. Architecture of the K-non-blocking Three-Phase switch with internal queue loss from that in Figure 7.3 only in the routing network, which has a size N × NK and is able to transfer up to K packets to each output queue. The coordination among port controllers so as to guarantee that at most K IPCs transmit their HOL cell to an OPC is achieved by means of a slightly modified version of the three-phase algorithm described for the Three-Phase switch with channel grouping (Section 7.1.3.1). As is usual, three types of packets are used1: • request packet (REQ), including the fields — AC (activity): identifier of a packet carrying a request ( AC = 1 ) or of an idle request packet ( AC = 0 ) ; — DA (destination address): requested switch outlet; — SA (source address): address of the IPC issuing the request packet; • acknowledgment packet (ACK), which includes the fields — SA (source address): address of the IPC issuing the request packet; — GR (grant): indication of a granted request ( GR < K ) , or of a denied request ( GR ≥ K ) ; • data packet (DATA), including the fields — AC (activity): identifier of a packet carrying a cell ( AC = 1 ) or of an idle data packet ( AC = 0 ) ; — routing tag: address of the routing network outlet feeding the addressed output queue; this field always includes the physical address DA (destination address) of the switch outlet used in the request phase; depending on the implementation of the routing net- work it can also include a routing index RI identifying one of the K links entering the addressed output queue; — cell: payload of the data packet. 1. The use of a request priority field as in a multichannel IQ switch is omitted here for simplicity, but its adoption is straightforward.
  6. 286 ATM Switching with Non-Blocking Multiple-Queueing Networks In the request phase, or Phase I, (see the example in Figure 8.4 for N = 8, K = 2 ) all IPCs issue a packet REQ that is an idle request packet ( AC = 0 ) if the input queue is empty, or requests the outlet DA addressed by its HOL packet ( AC = 1 ) . The packet also carries the address SA of the transmitting IPC. These request packets are sorted by network SN so that requests for the same switch outlet emerge on adjacent outlets of the sorting network. This kind of arrangement enables network AN to compute the content of the field GR of each packet in such a way that the first K requests for the same address are given the numbers 0, …, K – 1 , whereas GR ≥ K for other eventual requests. AC activity DA destination address SA AC SA source address SA DA GR grant GR PC PC 0 0 5 1 1 0 0 0 1 0 0 0 1 1 0 0 5 0 0 0 5 0 1 1 2 2 1 1 2 1 1 0 2 0 2 2 3 3 5 1 Network 7 1 1 Network 1 7 Network 1 3 3 4 4 5 1 SN 0 5 1 AN 0 0 SN 2 4 4 5 5 0 0 3 5 1 1 3 0 5 5 6 6 7 1 4 5 1 2 4 0 6 6 7 7 1 1 6 7 1 0 6 1 7 7 I II Request Acknowledgment Figure 8.4. Example of packet switching (Phases I and II) in the QL IOQ Three-Phase switch In the acknowledgment (ack) phase, or Phase II (see the example in Figure 8.4), packets ACK are generated by each port controller that receives field SA directly from network SN and field GR from network AN. Notice that the interconnection between networks guarantees that these two fields refer to the same request packet. Packets ACK are sorted by network SN. Since N packets ACK are received by the sorting network, all with a different SA address in the interval [ 0, N – 1 ] (each PC has issued one request packet), the packet ACK with SA = i is transmitted on output d i ( i = 0, …, N – 1 ) and is thus received by the source port controller PC i . A PC receiving GR ≤ K – 1 realizes that its request is granted; the request is not accepted if GR ≥ K . Since all different values less than K are allocated by net- work AN to the field GR of different request packets with the same DA, at most K port controllers addressing the same switch outlet have their request granted. In the data phase, or Phase III, the port controllers whose requests have been granted trans- mit their HOL cell in a packet DATA through the sorting and banyan networks, with the packet header including an activity bit and a self-routing tag. The structure of this last field depends on the type of routing network adopted. With reference to the different solutions for
  7. Combined Input–Output Queueing 287 implementing output multiplexing described in Section 3.2.3, the example in Figure 8.5 shows the data phase for the two solutions b and c'. The routing network is implemented as an 8 × 16 banyan network in the former case, thus requiring the use of a one-bit field RI follow- ing field DA, and as a set of two 8 × 8 banyan networks in the latter case with EGS interconnection from the sorting network (no field RI is here required). In the example the PCs issue six requests, of which one is not granted owing to a number of requests for outlet 5 larger than the output speed-up K = 2 . RI AC AC activity DA DA destination address PC PC 0 0 5 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 2 0 1 1 0 0 0 2 3 1 5 1 Network 0 1 1 Network 3 4 0 0 0 SN 1 1 1 RN 4 0 5 1 5 0 0 0 0 5 1 5 1 5 1 6 0 7 1 1 5 1 6 7 1 1 1 0 7 1 0 7 1 7 III Data (b) AC DA PC PC 0 5 1 0 0 0 1 1 1 0 0 0 0 1 1 1 2 1 1 0 0 2 3 5 1 Network 1 1 Network Network 3 4 0 0 SN 1 1 RN 1 RN 4 5 1 5 0 0 5 1 5 5 1 6 7 1 5 1 6 7 1 1 7 1 7 1 7 III Data (c') Figure 8.5. Example of packet switching (Phase III) in the QL IOQ Three-Phase switch About the hardware required by this IOQ Three-Phase switch N × N with QL protocol, the K-non-blocking network is assumed as the minimum-cost implementation described in Section 3.2.3 with a Batcher sorting network N × N cascaded through an EGS pattern to K N × N banyan networks. The allocation network is implemented basically as the running sum
  8. 288 ATM Switching with Non-Blocking Multiple-Queueing Networks adder network of Figure 7.14, which now includes k' = log 2K stages of adders. Some minor changes have to be applied to take into account that now an activity bit distinguishes idle packets from true request packets so that the running sum in network AN is started in cor- respondence of the first non-idle packet. With such a structure different values for the field GR in the range [ 0, 2 k' – 1 ] will be associated with the first 2 k' requests for the same switch out- let, received on adjacent inputs. The Ring-Reservation switch with pure input queueing can more easily be adapted to operate an output speed-up K. In this case the interconnection network is a K-non-blocking network, the same above adopted in the IOQ Three-Phase switch. The reservation process for the NK outlets of the routing network (K per switch outlet) simply requires that a reservation frame of NK fields makes a round along the ring crossing all the PCs. As in the case of pure input queueing, each PC reserves the first idle field of the K associated to the desired switch outlet. If all the K fields are already reserved (they have been booked by upstream PCs), the PC will attempt a new reservation in the following slot. Unlike an IQ ring reservation switch, the use of the routing network cannot be avoided here, since now a total of NK lines enter the OPCs but at most N packets are transmitted by the IPCs. By implicitly assuming the same hypotheses and following the same procedure as for the IQ multichannel Three-Phase switch described in Section 7.1.3.1 (MULTIPAC switch), the switching overhead η = T I – II ⁄ T III required to perform the three-phase algorithm in the IOQ QL switch is now computed. The duration of Phase I is given by the latency n ( n + 1 ) ⁄ 2 in the Batcher network and the transmission time 1 + n of the first two fields in packet REQ (the transmission time of the other field in packet REQ is summed up in Phase II). The duration of Phase II includes the latency n ( n + 1 ) ⁄ 2 in the Batcher network and the transmission time n + 1 + k' of packet ACK since the time to cross network AN need not be summed (see the analogous discussion for the multichannel IQ switch in Section 7.1.3.1). Hence, the duration of the first two phases for an IOQ switch is given by T I – II = n ( n + 3 ) + k' + 2 = log N ( log 2N + 3 ) + log 2K + 2 2 whereas T I – II = log N ( log 2N + 4 ) + 1 in the basic IQ Three-Phase switch. Therefore 2 adding the output speed-up to the basic IQ architecture does not increase the channel internal channel rate C ( 1 + η ) , owing to the use of idle request packets that make it useless for pack- ets ACK to cross network RN. In the ring reservation IOQ switch the minimum bit rate on the ring is clearly K times the bit rate computed for the basic IQ switch, since now the reser- vation frame is K times longer. Therefore, the minimum bit rate is equal to KNC ⁄ ( 53 ⋅ 8 ) . 8.1.1.2. Internal backpressure Basic architecture. An architecture of IOQ switch with internal backpressure (BP) that pre- vents packet loss in the output queues [Pat91] is obtained starting from the IOQ Three-Phase switch with QL protocol described in the previous section. The architecture of an N × N BP IOQ switch is represented in Figure 8.6: it includes an N × N sorting network (SN), an N × NK routing network (RN), and three networks with size N × 2N , that is a merge net- work (MN), an allocation network (AN) and a concentration network (CN).
  9. I0 O0 a0 c0 d0 e0 Combined Input–Output Queueing f0 K PC0 j0 k0 b0 Sorting Routing Network network SN RN IN-1 ON-1 fN-1 aN-1 Merge cN-1 Allocation dN-1 Concentr. eN-1 K network PCN-1 network network jN-1 MN AN CN kN-1 bN-1 aN cN dN eN with internal backpressure a2N-1 c2N-1 d2N-1 e2N-1 Figure 8.6. Architecture of the K-non-blocking Three-Phase switch 289
  10. 290 ATM Switching with Non-Blocking Multiple-Queueing Networks Since now the information about the content of each output queue is needed for the reser- vation process, a new packet type, the queue status packet, is used in combination with the three other types of packet. Therefore, the packet types are (as in the QL switch the use of a request priority field is omitted for simplicity): • request packet (REQ), including the fields — AC (activity): indicator of a packet REQ carrying a request ( AC = 1 ) or of an idle packet REQ ( AC = 0 ) ; — DA (destination address): requested switch outlet; — PI (packet indicator): identifier of the packet type, always set to 1 in a packet REQ; — CI (concentration index): field initially set to 0 that will be filled by the sorting net- work; — SA (source address): indicates the address of the IPC issuing the request packet; — GR (grant): information used to signal back to each requesting IPC if its request is granted; it is initially set to 0; • queue status packet (QUE), including the fields — AC (activity): always set to 1 in a packet QUE; — DA (destination address): output queue address; — PI (packet indicator): identifier the packet type, always set to 0 in a packet QUE; — IF (idle field): field used to synchronize packets REQ and QUE; — QS (queue status): indication of the empty positions in the output queue; • acknowledgment packet (ACK), which is generated by means of the last two fields of a request packet, and thus includes the fields — SA (source address): address of the IPC issuing the request packet; — GR (grant): indication of a granted request if GR < K , or of a denied request if GR ≥ K ; • data packet (DATA), including the fields — AC (activity): identifier of a packet carrying a cell ( AC = 1 ) or of an idle data packet ( AC = 0 ) ; — DA (destination address): switch outlet addressed by the cell; — cell: payload of the data packet. In the request phase (see the example of Figure 8.7) each PC issues a packet REQ, either idle or containing the request for a switch outlet, and a packet QUE, indicating the status of its output queue. The field QS of the packet QUE transmitted by PCi is equal to max [ 0, K – q i ] where q i is the number of empty cell positions in the output queue of the port controller after the eventual transmission in the current slot. Since one packet per slot is always transmitted by the OPC on the output channel O t , then q i ≥ 1 , that is at least one packet per slot can be received by any OPC. Packets REQ, which are issued first, are sorted by network SN and offered to merge network MN synchronously with packets QUE. MN merges the two sets of packets so that packets REQ requesting a given outlet and the packet QUE carrying the corresponding output queue status are adjacent to each other. The packets REQ requesting a specific switch outlet, whose number is k, emerge grouped on the adjacent outlets c i + 1 – c i + k , the outlet c i carrying the packet QUE associated with that outlet. This configuration enables the allocation network AN to assign a different field GR to each packet REQ whose request is granted. Let x ( y i ) denote the content of the field x
  11. Combined Input–Output Queueing 291 QS PI AC SA PI IF DA GR CI PC PC AC activity 7 0 0 DA destination address 1 0 7 1 0 1 0 1 0 0 0 1 0 1 0 1 0 0 PI packet identifier 6 0 0 0 6 1 0 5 1 1 0 0 1 5 1 1 1 5 0 1 1 CI concentration index Network SN 5 0 0 0 5 1 0 0 0 0 1 0 0 0 1 2 1 2 2 SA source address GR grant 4 0 0 0 4 1 1 0 0 1 1 1 0 0 2 7 1 3 3 IF idle field 3 1 0 0 3 1 0 2 2 1 1 1 1 2 2 1 0 0 2 4 4 QS queue status 2 0 0 0 2 1 0 7 3 1 1 1 2 7 3 1 1 3 1 5 5 Network CN SA PI AC Network AN 0 0 0 Network MN GR CI DA 1 1 0 1 1 0 0 2 1 0 0 2 4 1 6 6 0 0 0 0 0 1 1 0 0 3 1 1 0 0 1 6 2 7 7 0 0 0 0 1 5 1 0 1 0 1 0 0 0 0 0 4 1 0 0 0 1 0 1 0 1 0 0 0 5 1 1 0 0 0 0 0 5 1 0 0 0 Network SN 2 0 2 0 1 1 1 0 2 2 1 1 1 0 0 4 1 5 1 0 0 4 1 3 0 3 0 1 5 1 0 7 3 1 1 1 0 3 5 1 5 1 1 3 5 1 4 0 4 0 1 5 1 0 0 4 1 5 1 0 4 6 1 5 1 2 4 6 1 5 0 5 0 1 0 0 0 3 5 1 5 1 0 0 0 6 1 0 0 0 6 0 6 0 1 7 1 0 4 6 1 5 1 1 0 0 7 1 1 0 0 7 0 7 0 1 1 1 0 6 7 1 7 1 0 6 7 1 7 1 1 6 7 1 I II Request Acknowledgment Figure 8.7. Example of packet switching (Phases I and II) in the BP IOQ Three-Phase switch transmitted on lead y i and C ( i, k ) = { c i + 1, …, c i + k } represent the complete set of MN outlets carrying the k packets REQ with the same address DA. The field GR is computed by network AN so that GR ( d i + j ) = QS ( c i ) + j – 1 ( 1 ≤ j ≤ k, c i ∈ C ( i, k ) ) A request is granted if its associated field GR is less than K. Since the value of QS is set equal to 0 if the queue has at least K empty positions, or to the speed-up factor decreased by the number of empty positions otherwise, then such an operation of network AN guarantees that 1. no more than K requests for a given switch outlet are granted; 2. the number of granted request for the switch outlet O i in a slot never exceeds the output queueing capability of PCi in the slot. The first two fields of packets REQ and QUE are dropped by network AN which then starts the acknowledgment phase (see again the example of Figure 8.7). Field PI acts now as activity bit in the concentration network CN, which separates packets REQ from packets QUE using field CI as a routing index within CN. Note that the field CI has been filled at the output of network SN using the physical address of the SN outlet engaged by the packet REQ. There- fore CN routes the N “active” packets to the top N outlets and drops the fields PI and CI of packets REQ. So the remaining two fields SA and GR, which represent now a packet ACK, enter network SN to be now sorted based on field SA. In such a way each PC receives the
  12. 292 ATM Switching with Non-Blocking Multiple-Queueing Networks outcome of the contention phase carried by field GR: GR < K means a granted request, whereas GR ≥ K means a denied request. In the data phase the PCs that either lose contention in the request phase, or issue an idle packet REQ, transmit an idle packet DATA, whereas the contention winners transmit a packet DATA carrying their HOL cell (see the example in Figure 8.8 referred to in the solution c' for the implementation of the routing network described in Section 3.2.3). The sorting–routing structure is K-non-blocking so that all the non-idle packets DATA reach the addressed OPC. AC AC activity DA DA destination address PC PC 0 5 1 0 0 0 1 0 0 0 0 1 1 1 2 1 1 0 0 2 3 5 1 Network 0 0 Network Network 3 4 0 0 SN 1 1 RN 1 RN 4 5 1 5 0 0 5 1 5 5 1 6 7 1 5 1 6 7 0 0 7 1 7 1 7 III Data Figure 8.8. Example of packet switching (Phase III) in the BP IOQ Three-Phase switch In the example of Figures 8.7–8.8 for N = 8, K = 2 , the PCs issue six requests, of which two are denied: one addressing outlet 1 owing to queue saturation ( q 1 = 1 ) and one addressing outlet 5 owing to a number of requests larger than K. The same packet configura- tion of this example, applied to an IOQ switch without backpressure, has been shown in Figures 8.4–8.5: in that case five packets have been successfully switched rather than four in the BP switch due to absence of queue saturation control. Hardware structure. In the above IOQ Three-Phase switch with BP protocol and size N × N , the K-non-blocking network is the minimum-cost implementation described in Section 3.2.3 with a Batcher sorting network N × N cascaded through an EGS pattern onto K banyan networks N × N . As already discussed, the only self-routing tag DA suffices in such a K-non-blocking structure to have the packet self-routing. The merge network MN receives a packet bitonic sequence, that is the juxtaposition of an address-ascending sequence of packets QUE on inlets a 0 – a N – 1 and an address-descending sequence of packets REQ on inlets a N – a 2N – 1 . Therefore MN can be implemented as an Omega (or n-cube) network with log 2 ( 2N ) = n + 1 stages of 2 × 2 sorting elements. The allocation network AN is a variation of the running sum adder network already described for the channel grouping application in an IQ switch (see Section 7.1.3.1) that sums
  13. Combined Input–Output Queueing 293 on partitioned sets of its inlets. Here the network size is doubled, since also the queue status information is used in the running sum. Now the allocation network has size 2N × 2N and includes k' = log 2K stages of adders. Its structure for N = 8 and 5 ≤ K ≤ 8 is shown in Figure 8.9. The component with inlets c i – 1 and c i , outlets reset i and info i , which is referred to as an i-component, has the function of identifying the partition edges, that is the inlets with smallest and largest address carrying packets REQ with the same address DA. The network AN drops the fields AC and DA, while transferring transparently all the other fields of packets REQ and QUE except their last fields GR and QS, respectively. As long as this transparent transfer takes place, the adders are disabled, so that they start counting when they receive the first bit of GR and QS. If ( 0 ) b and ( 1 ) b denote the binary numbers 0 and 1, each with k' + 1 bits, the outlets of the i-component ( 1 ≤ i ≤ N – 1 ) assume the following status:  low if DA ( c i ) ≠ DA ( c i – 1 ) or AC ( c i – 1 ) = 0 reset i =   high if DA ( c i ) = DA ( c i – 1 ) and AC ( c i – 1 ) = 0  QS ( c i ) if PI ( c i ) = 0  info i =  ( 0 ) b if PI ( c i ) = 1, PI ( c i – 1 ) = 0   ( 1) if PI ( c i ) = 1, PI ( c i – 1 ) = 1 b The 0-component always transfers transparently on info 0 the field QS or GR received on c 0 . Figure 8.9 also shows the output of the i-components and adders at each stage, when the run- ning sum is actually performed with the same pattern of packets of Figure 8.7. The 2N × 2N concentration network CN is required to route N packets from N inlets out of the 2N inlets d 0 – d 2N – 1 to the top N outlets e 0 – e N – 1 . The outlets requested by the packets REQ represent a monotonic sequence of increasing addresses, as is guaranteed by the pattern of indices written in field CI by SN and by the topological mapping between adjacent networks. Therefore it can easily be shown that CN can be implemented as a reverse n-cube network with log 2N + 1 stages. In fact, if we look at the operations of the concentrator by means of a mirror (inlets becomes outlet and viceversa, the reverse n-cube network becomes an n-cube network) the sequence of packets to be routed to the outlets of the mirror network is compact and monotone (CM sequence) and thus an n-cube network, which is functionally equivalent to an Omega network, accomplishes this task (see Section 3.2.2). As in all the other applications of the three-phase algorithm, an internal channel rate C ( 1 + η ) bit/s higher than the external rate C bit/s must be used since the request, acknowledgment and data phases share the same hardware. The switching overhead factor η is computed assuming the same hypotheses (e.g., each sorting/switching/adding stage accounts for a 1-bit latency) and following the same procedure as for the IQ switch (Section 7.1.1.1). Therefore, we would come up with the latencies n ( n + 1 ) ⁄ 2 in SN, n + 1 in MN and k' in AN for the request phase as well as n + 1 in CN and n ( n + 1 ) ⁄ 2 in SN for the acknowledg- ment phase. Since the length of the six fields in a packet REQ sums up to 1 + n + 1 + n + n + k' + 1 bit, then the total duration of the first two phases is T I – II = n ( n + 6 ) + 2k' + 5 = log 2N ( log 2N + 6 ) + 2 log 2K + 5
  14. 294 ATM Switching with Non-Blocking Multiple-Queueing Networks QS/ GR PI DA AC c0 0 0 d0 0 1 0 0 low c1 1 1 1 d1 0 1 0 0 low c2 0 0 0 0 d2 0 0 0 1 low c3 1 1 1 1 d3 1 0 1 1 high c4 0 1 1 1 d4 0 1 1 1 high c5 1 1 2 2 d5 0 1 1 1 low c6 0 0 0 0 d6 0 0 2 1 low c7 1 1 1 1 d7 1 0 3 1 low c8 0 0 0 0 d8 0 0 4 1 low c9 0 0 0 0 d9 0 0 5 1 high c10 0 0 0 0 d10 0 1 5 1 high c11 1 1 1 1 d11 0 1 5 1 high c12 1 2 2 2 d12 0 1 5 1 low c13 0 0 0 0 d13 0 0 6 1 low c14 1 1 1 1 d14 1 0 7 1 high c15 0 1 1 1 d15 0 1 7 1 y ci-1 reset i x z=x+y = A1 = adder ad-hoc I/O function ci infoi Figure 8.9. Hardware structure of the allocation network
  15. Combined Input–Output Queueing 295 For N = 1024 and K = 4 , the overhead factor is η = T I – II ⁄ T III = 0.399 (the data phase lasts 53 ⋅ 8 bit times). Such an overhead, which is higher than in a multichannel switch (see Section 7.1.3.1), can be reduced by pipelining as much as possible the transmission of the different packets through the interconnection network. According to the procedure explained in [Pat90], a reduced duration of the first two phases is obtained, T I – II = n ( n + 11 ) ⁄ 2 + 2k' + 3 , which gives an overhead η = 0.264 for the same network with N = 1024 and K = 4 . 8.1.2. Performance analysis The analysis of a non-blocking N × N switch with input and output queueing (IOQ) is now performed under the random traffic assumption with average value of the offered load p ( 0 < p ≤ 1 ) . The analysis relies as in the case of pure input queueing on the concept of virtual queue VQ i ( i = 0, …, N – 1 ) defined as the set of HOL positions in the different input queues holding a cell addressed to outlet i. A cell with outlet address i entering the HOL posi- tion also enters the virtual queue VQ i . So, the capacity of each virtual queue is N (cells). The virtual queue VQ i feeds the output queue OQ i , whose server is the switch output O i . A graphical representation of the interaction among these queues is given in Figure 8.10. Ii Oi n j Ih n k Oj Ik j j j Virtual queue Om Input queues Output queues Figure 8.10. Queue model for the non-blocking IOQ switch The analysis always assumes a FIFO service in the input, virtual and output queue and, unless stated otherwise, an infinite switch size is considered ( N = ∞ ) . Thus the number of cells entering the virtual queues in a slot approaches infinity and the queue joined by each such cell is independently and randomly selected. Furthermore, since the arrival process from individual inlets to a virtual queue is asymptotically negligible, the interarrival time from an input queue to a virtual queue becomes sufficiently long. Therefore, the virtual queues, the output queues, as well as the input queues, are mutually independent discrete-time systems. Owing to the random traffic assumption, the analysis will be referred to the behavior of a generic “tagged” input, virtual and output queue, as representative of any other queue of the
  16. 296 ATM Switching with Non-Blocking Multiple-Queueing Networks same type. Let p i , p v and p o denote the probability that a packet is offered to the tagged input queue, that a packet leaves the tagged input queue and that a packet leaves the tagged output queue, respectively. If the external offered load is p, then p i = p . As usual, the three main performance parameters will be computed, that is the switch capacity ρ max ( ρ max ≤ 1 ) , the average packet delay T ( T ≥ 1 ) and the packet loss probabil- ity π ( 0 ≤ π ≤ 1 ) . These two latter measures will be computed as T = E [ ηi] + E [ ηv] + E [ δo] (8.1) po π = 1 – --- = 1 – ( 1 – π i ) ( 1 – π o ) = π i + π o – π i π o - (8.2) pi where η denotes a waiting time, δ a queueing time (waiting plus service) and the subscripts i, v and o indicate the input, virtual and output queue, respectively. Note that η i denotes the time it takes just to enter the HOL position in the input queue, as the waiting time before winning the output contention is represented by η v . The analysis will be first developed for the case of an output queue size equal to the speed- up factor and infinite input queues and then extended to the case of arbitrary capacities for input and output queues. 8.1.2.1. Constrained output queue capacity Our aim here is to analyze the IOQ switch with backpressure, output queue capacity equal to the switch speed-up ( B o = K ) and very large input queue size ( B i = ∞ ) , so that p v = p i [Ili90]. Note that the selected output queue capacity is the minimum possible value, since it would make no sense to enable the switching of K packets and to have an output queue capac- ity less than K. Rather than directly providing the more general analysis, this model in which the output queues has the minimum capacity compatible with the switch speed-up has the advantage of emphasizing the impact of the only speed-up factor on the performance improvements obtained over an IQ switch. Let us study first the tagged input queue, which is fed by a Bernoulli process. Therefore it can be modelled as a Geom ⁄ G ⁄ 1 queue with arrival rate p i = p and service time θ i given by the waiting time η v it takes for the tagged packet to begin service in the tagged virtual queue plus the packet transmission time, that is θ i = η v + 1 . Using a well-known result [Mei58] about this queue, we get 2 pi E [ θi ( θi – 1) ] pi ( E [ ηv ] + E [ ηv] ) E [ η i ] = --------------------------------------- = -------------------------------------------------- - - (8.3) 2 ( 1 – pi E [ θi] ) 2 ( 1 – pi E [ ηv + 1] ) in which the first and second moment of the waiting time in the virtual queue are obtained from the study of the tagged virtual and output queue. Let us introduce the following notations: A m number of packets entering the tagged virtual queue at the beginning of slot m; R m number of packets in the tagged virtual queue after the packet switch taking place in slot m;
  17. Combined Input–Output Queueing 297 V m number of packets switched from the tagged virtual queue to the tagged output queue in slot m; Q m number of packets in the tagged output queue at the end of slot m, that is after the packet switch taking place in slot m and after the eventual packet transmission to the corresponding switch outlet. Based on the operations of the switch in the BP mode, the state equations for the system can be easily written (an example is shown in Figure 8.11 for R m – 1 > 0, Q m – 1 = 3, B o = 3 ): Rm = Rm – 1 + Am – Vm V m = min { B o – max { 0, Q m – 1 – 1 } , R m – 1 + A m } (8.4) Q m = max { 0, Q m – 1 – 1 } + V m Tm-1 Rm-1 Qm-1 Virtual queue Output queue Am Vm Rm Qm Tm Figure 8.11. Example of tagged VQ-OQ switching for Bo=K These equations take into account that the packet currently transmitted by the output queue to the switch outlet occupies a position of the output queue until the end of its transmission. Then, the cumulative number T m of packets in the tagged virtual queue and tagged output queue is given by T m = R m + Q m = R m – 1 + max { 0, Q m – 1 – 1 } + A m (8.5) Note that according to this equation, if one packet enters an empty tagged virtual queue and the output queue is also empty, then T m = 1 . In fact, the packet is immediately switched from the tagged virtual queue to the tagged output queue, but the packet spends one slot in the tagged output queue (that is the packet transmission time). Since Q m = 0 implies R m = 0 , Equation 8.5 can be written as T m = max { 0, R m – 1 + Q m – 1 – 1 } + A m = max { 0, T m – 1 – 1 } + A m (8.6)
  18. 298 ATM Switching with Non-Blocking Multiple-Queueing Networks As with pure input queueing, the procedure described in [Kar87] enables us to prove that the random variable A m , also representing the number of packets becoming HOL in their input queue and addressing the tagged output queue, has a Poisson distribution as N → ∞ . Since Equation 8.6 describes the evolution of a single-server queue with deterministic server (see Equation A.1), the system composed by the cascade of the virtual queue and tagged out- put queue behaves as an M ⁄ D ⁄ 1 queue with an infinite waiting list. Note that T m represents here the total number of users in the system including the user currently being served. In order to compute the cumulative delay spent in the tagged virtual queue and tagged output queue, we can say that a packet experiences two kinds of delay: the delay for transmit- ting the packets still in the virtual queue arrived before the tagged packet, whose number is R m – 1 , and the time it takes for the tagged packet to be chosen in the set of packets with the same age, i.e. arriving in the same slot. By still relying on the approach developed in [Kar87], it is possible to show that the cumulative average delay in the tagged virtual queue and tagged output queue is given by the average delay (waiting time plus service time) of an M ⁄ D ⁄ 1 queue with arrival rate p, that is pv E [ η v ] + E [ δ o ] = ----------------------- + 1 (8.7) 2 ( 1 – pv) However, a more detailed description of the virtual queue operation is needed since the average waiting time in the input queue is a function of the first two moments of the waiting time in the virtual queue (see Equation 8.3). Since it has been proved [Ili90] that E [ R] E [ η v ] = -------------- - p 2 2 E [R ] E [ η v ] = ---------------- - p we simply have to compute the first two moments of the virtual queue content. The assump- tion B o = K greatly simplifies the computation of these two moments, since the occurrence of at least one empty position in the tagged output queue implies that the tagged virtual queue is empty. So, the only state variable T m fully describes the content of the two queues, which can be seen as a single queue with the first B o positions representing the output queue and the other N positions modelling the virtual queue (Figure 8.12). Therefore ∞ ∞ E [ R] = ∑ i ⋅ Pr [ T = B o + i ] = ∑ ( i – Bo) ti i=1 i = Bo + 1 (8.8) ∞ ∞ ∑i ∑ 2 2 2 E [R ] = ⋅ Pr [ T = B o + i ] = ( i – Bo) ti i=1 i = Bo + 1
  19. Combined Input–Output Queueing 299 VQ OQ N Bo Figure 8.12. Combined representation of tagged VQ and OQ for Bo=K which after some algebraic manipulations become Bo – 1 E [ R] = E [ T] – Bo ( 1 – t0) + ∑ ( Bo – i) ti i=1 (8.9) Bo – 1 ∑ 2 2 2 2 E [ R ] = E [ T ] – 2B o E [ T ] – B o ( 1 – t 0 ) – ( Bo – i) ti i=1 The numerical values of the first and second moment of the variable T, as well its distribution t i , are those of an M ⁄ D ⁄ 1 queue with internal server (see Appendix). The asymptotic throughput ρ max of the switch, i.e. its capacity, is obtained by computing the limiting load conditions for the input queue, that is the root of the function F ( p i ) = 1 – p i – p i E [ η v ] (see Equation 8.3). The average packet delay is obtained as the sum of the waiting times in the three queues increased by one (the service time, see Equation 8.1). No cell loss takes place as the backpressure prevents loss in the output queues, in spite of their finite capacity, and infinite input queues are lossless by definition. The switch capacity ρ max is shown in Figure 8.13 for increasing values of the speed-up K. It increases starting from the value ρ max = 0.58 for K = 1 , which means pure input queue- ing (output queues are not needed as each output port controller receives at most K = 1 packet per slot). The increase of the maximum throughput is not as fast as we would have expected, since a speed-up K = 19 is needed to have ρ max > 0.9 . The average packet delay T, given by Equations 8.1, 8.3 and 8.7, is plotted in Figure 8.14 as a function of the offered load p. Again, the curve with K = 1 refers to the case of pure input queueing, whereas the delay approaches the behavior of an M ⁄ D ⁄ 1 queue as K → ∞ . In fact if the output queue has an infinite capacity the tagged virtual queue is always empty and A m , whose distribution is Poisson, describes directly the arrival process to the output queue. In the following section we will see that a much better performance can be obtained, even with a very small speed-up value, by simply providing a large output queueing capability. 8.1.2.2. Arbitrary input and output queue capacities Now the more general analysis of the IOQ switch with arbitrary size B i of the input queue and B o of the output queue is provided [Pat93]. The study of the tagged input queue is first developed and then the behavior of the tagged virtual and output queues is analyzed.
  20. 300 ATM Switching with Non-Blocking Multiple-Queueing Networks IOQ BP - N=∞, Bo=K 1.00 Maximum throughput, ρmax 0.90 0.80 0.70 0.60 0.50 0 5 10 15 20 25 30 35 Speed-up value, K Figure 8.13. Switch capacity of a BP IOQ switch when Bo=K IOQ BP - N=∞, Bo=K 102 K=1 2 4 8 16 32 Average packet delay, T 101 M/D/1 100 0.0 0.2 0.4 0.6 0.8 1.0 Switch throughput, ρ Figure 8.14. Delay performance of a BP IOQ switch when Bo=K
Đồng bộ tài khoản