Networks and Telecommunications P36


Networks and Telecommunications: Design and Operation, Second Edition. Martin P. Clark
Copyright © 1991, 1997 John Wiley & Sons Ltd
ISBNs: 0-471-97346-7 (Hardback); 0-470-84158-3 (Electronic)

36 Maintaining the Network

No matter how much careful planning goes into the design of a network, and no matter how reliable the individual components are, corrective action will always be required in some form or another, to prevent or make good network and component failures, and maintain overall service standards. However, attitudes towards maintenance and the organization behind it vary widely, ranging from the ‘let it fail then fix it’ school of thought right through to ‘prevent faults at any cost’. This chapter describes a typical maintenance regime in its philosophical, organizational and procedural aspects.

36.1 THE OBJECTIVES OF GENERAL MAINTENANCE

As succinctly stated by ITU-T, the objective of a general maintenance organization is to minimize the occurrence of failures, and to ensure that in case of failure

• the right personnel can be sent to
• the right place with
• the right equipment at
• the right time to perform
• the right corrective actions.

36.2 MAINTENANCE PHILOSOPHY

In pursuing these objectives, the wise network operator establishes a maintenance philosophy closely linked with overall targets for network quality and for the
proportion of time that the network is intended to be fault-free (available). In this task he will take due account of network economics and the most likely causes of failure. Networks fail for all sorts of reasons; common examples are

• cable or connector damage or disturbance
• equipment overheating
• electronic component failure
• mechanical equipment jamming, or other failure
• mechanical wear
• dirty (high resistance) relay or switch contacts (a diminishing problem as electromechanical exchanges are withdrawn)
• power supply loss
• vandalism (e.g. to public payphones)
• software errors
• erroneous exchange data
• poor connections between cables or other components (e.g. dry soldered joints)
• interference (e.g. due to electromagnetic disturbances; recent standards on EMC, electromagnetic compatibility, are designed to ensure that equipment does not cause electromagnetic disturbance and is itself not unduly sensitive to such interference, i.e. is electromagnetically protected)

Each cause of failure has its own cure, but broadly speaking, there are three main approaches

• corrective maintenance
• preventive maintenance
• controlled maintenance

Corrective maintenance is carried out after the failure has been diagnosed, and it consists of the repair or replacement of faulty components.

Preventive maintenance aims to eliminate the accumulation of faults. It usually consists of routine testing and correction of working equipment (as opposed to failed equipment) to prevent degradation in performance before any failure actually occurs.

Controlled maintenance is a more systematic approach, combining both the corrective and preventive methods. The underlying philosophy of controlled maintenance is to prevent network failure. This is done by using special analysis techniques to monitor day-to-day network performance and degradation, thereby avoiding unnecessary maintenance work.
The advantage of controlled maintenance is that it concentrates on areas where the customer is likely to benefit most, and it reduces the extent of preventive maintenance and the complications of corrective maintenance. When new networks are being designed or extended, or when new capabilities are being added to existing networks, consideration needs to be given to the maintenance philosophy and to the organization, the maintenance facilities, and the test equipment that will support it.

The best controlled mix of both corrective and preventive philosophies depends on the number and nature of problems. These, in turn, depend on the overall network structure and the component equipment types. So, in the days of widespread electromechanical switches and relays, when a frequent cause of failure was mechanical wear, a good deal of time was spent on preventive type maintenance, oiling the moving parts as it were. Nowadays, when hardware faults in modern electronic equipment are relatively rare and software faults often take some time to present themselves (the exchange operating fault-free for extended periods between occurrences), a corrective philosophy is adopted. Faulty component boards are completely replaced without even attempting a diagnosis, and software faults are debugged as they arise.

36.3 MAINTENANCE ORGANIZATION

Real networks are in a constant state of change throughout their lives. To match traffic demand, new circuits are continually established between exchanges. Established circuits may need to be re-arranged as transmission systems are upgraded or taken down, and faulty equipment needs to be repaired, replaced or avoided by a diversion. For optimum efficiency the organizations set up to carry out these maintenance tasks, and the tools with which they are provided, should be planned in such a way that lifetime costs will be minimized.
There is a choice, for instance, between paying more at the start for expensive but reliable equipment, or using cheaper equipment and incurring higher ongoing running costs. Lifetime cost analysis must include

• initial cost of equipment
• cost of spares and test equipment
• ongoing running and maintenance costs
• costs associated with periods of lost service

High wages and skills shortages in recent years have weighted the scales in favour of using more reliable equipment and a smaller maintenance workforce. Indeed, in some instances the field workforce has been pared to the minimum of two people, one worker plus a stand-in to cover annual leave and periods of sickness. Some observers question the sense of this, pointing to the fact that so complex are the devices, so computerized the routine activities and so rare the faults, that the field maintenance staff often do not have the experience to cope.

For this reason a comprehensive headquarters maintenance support, technical support or back-up organization (sometimes called second line support or third line
support) is needed in addition to the direct maintenance staff, to perform the following functions.

• To provide detailed equipment and maintenance documentation.
• To provide maintenance training on new equipment and to document ‘fail-safe’ maintenance procedures (explaining the general methods to be adopted, and making sure that unintended disturbance to other customers is not caused by maintenance action).
• To develop and put in place a fault-reporting procedure.
• To repair complicated items of equipment or resolve complex software problems.
• To develop and procure the necessary test equipment.
• To maintain an appropriate store of spare parts and to call for re-design of poor equipment, taking duly into account the failure rate of each item, the number of items in operation, and the actual repair turnaround time (e.g. repair time, or delivery time for a part not held in stock), and calculating the service and revenue risk if no spare part is available.
• To develop and maintain an equipment identification and inventory scheme for tracking equipment in use and spare equipment either in stock or on order (in addition, maintenance staff in different exchanges need to be able to indicate faulty lines or circuits to one another).
• To prepare a list of contact points and telephone numbers through which the maintenance staffs in different maintenance centres may communicate.

The direct maintenance workforce is usually collocated with the exchange, some staff being switching experts while others have transport so that they can go out and deal with exterior plant problems. The staff located within the exchange provide a ‘control point’ for new circuit line-ups, and for the initial reporting and diagnosis of faults. The number of staff located at any exchange depends on the size, complexity and reliability of the exchange.
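The spares calculation mentioned in the list above (failure rate, number of items in operation, repair turnaround time, and the risk of having no spare) can be illustrated with a short sketch. It assumes that failures arrive independently at a constant rate, so that the number of failures during one turnaround period follows a Poisson distribution; that model, the function names and all the figures are assumptions made for illustration and do not come from the text.

```python
import math

def expected_failures(failure_rate_per_year, items_in_service, turnaround_days):
    """Mean number of failures arriving during one repair turnaround period."""
    return failure_rate_per_year * items_in_service * turnaround_days / 365.0

def stockout_probability(mean_failures, spares):
    """P(more failures than spares during the period), assuming Poisson arrivals."""
    cumulative = sum(
        math.exp(-mean_failures) * mean_failures ** k / math.factorial(k)
        for k in range(spares + 1)
    )
    return 1.0 - cumulative

def spares_needed(mean_failures, max_risk):
    """Smallest stock level whose stock-out probability does not exceed max_risk."""
    spares = 0
    while stockout_probability(mean_failures, spares) > max_risk:
        spares += 1
    return spares

# Hypothetical figures: 146 boards in service, 0.05 failures per board per year,
# and a 50-day repair loop, which gives a mean of one failure per loop.
mean = expected_failures(0.05, 146, 50)
print(spares_needed(mean, max_risk=0.01))  # -> 4
```

The remaining stock-out probability can then be multiplied by an estimate of the revenue lost per outage to put a figure on the risk of holding fewer spares.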
Not every exchange can justify its own on-site maintenance staff, and out-stationed staff may be posted to the exchange either on a regular preventive maintenance schedule, or simply when there is a fault.

36.4 CENTRALIZED OPERATION AND MAINTENANCE

A modern practice, aimed at reducing the maintenance workforce, is to leave exchanges unmanned. Computer technology and extended alarms allow staff in a single centralized operation and maintenance (CO&M) centre to monitor and control a number of different exchanges in real time (giving instantaneous and live control of each exchange). Figure 36.1 illustrates a typical centralized operation and maintenance scheme.
Figure 36.1 Centralized operation and maintenance (diagram labels: remote exchanges; centralized operation and maintenance centre with computer and maintenance staff; data links to monitor and control the exchanges)

Centralized operation and maintenance (CO&M) can be introduced only when the remote exchanges are computer-controlled, and are designed to be capable of self-diagnosis of faults. A datalink back to a computer at the centralized operation and maintenance centre allows the maintenance staff to monitor the exchange performance, noting any problems and applying any necessary controls. Under a CO&M scheme, the exchanges are designed with duplicated items of equipment, which remain idle until they are activated electronically to take over the function of a failed item. The exchange can thus continue to work at full load, while a member of the maintenance team is sent out to the exchange site, to repair the faulty equipment, or to replace it completely by a circuit-board change.

36.5 LINING UP ANALOGUE AND MIXED ANALOGUE/DIGITAL CIRCUITS

The transmission links between exchanges are commissioned (or lined up) using a two-stage method as follows. First, the lineplant itself is established in sections which are tested and calibrated in turn, and then connected together. A number of reference measurements are made along the entire length to check and calibrate the overall end-to-end performance. When the line system as a whole has been established, multiplexing and other terminal equipment is applied to its ends to obtain the individual circuits or groups, which may be tested individually. The individual circuit testing is necessary because, as Figure 36.3 shows, a real circuit (or group) is likely to traverse a number of line systems, which may interact adversely. So although each section in isolation may be within limits, the combination may not be so.
The calibration of each group and circuit is thus carried out on an end-to-end basis.
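One simple way in which sections that are individually within limits can combine to give an out-of-limit circuit is plain accumulation: losses expressed in dB add along concatenated sections, and so do their deviations from nominal. The sketch below shows the arithmetic; the per-section and end-to-end limits and the measured deviations are hypothetical values chosen for illustration, not figures from the text or from any transmission plan.

```python
# Hypothetical limits, chosen only to illustrate the arithmetic.
SECTION_LIMIT_DB = 0.5   # allowed deviation from nominal loss, per section
CIRCUIT_LIMIT_DB = 1.0   # allowed deviation from nominal loss, end to end

def end_to_end_deviation(section_deviations_db):
    """Deviations in dB add when sections are connected in tandem."""
    return sum(section_deviations_db)

def within_limit(deviation_db, limit_db):
    """True if the magnitude of the deviation does not exceed the limit."""
    return abs(deviation_db) <= limit_db

# Three sections, each inside its own limit...
sections_db = [0.4, 0.3, 0.5]
print(all(within_limit(d, SECTION_LIMIT_DB) for d in sections_db))   # True

# ...but the concatenated circuit is roughly 1.2 dB off nominal.
total_db = end_to_end_deviation(sections_db)
print(within_limit(total_db, CIRCUIT_LIMIT_DB))                      # False
```

Deviations of opposite sign would partly cancel instead, which is another reason why only an end-to-end measurement settles the matter.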
Figure 36.2 Maintenance workstation. The AT&T 5ESS telephone exchange now includes a video monitor that displays colour-coded diagrams, each of which indicates the status of a certain part of the system. A central office technician is checking the status of digital trunks. (Courtesy of AT&T)
Figure 36.3 shows four exchanges A, B, C and D, located in transmission centres a, b, c and d. From the diagram of Figure 36.3 we see that the exchanges are configured topologically (inset) in a fully interconnected manner. Each exchange has a circuit to every other exchange, but the total of six circuits has been achieved with the use of only three line systems: a-b, b-c and b-d. Where the transmission centres are not directly interconnected by a line system, a circuit has been provided by the concatenation of two line systems, with a jumper wire completing the connection across the intermediate transmission centre. Thus, for example, the circuit from exchange A to exchange C uses line systems a-b and b-c, and a jumper wire across transmission centre b.

The number and diversity of calibration measurements and adjustments necessary during circuit line-up, and the amount of deviation allowed in the ongoing values, depend on the type of circuit (e.g. analogue or digital), and on the use to which the circuit is being put (e.g. voice or data). High grade data circuits, for example, have more stringent line conditioning requirements than simple voice grade circuits. The measurements and adjustments ensure that the circuit conforms with the transmission plan (see Chapter 33). Thus on analogue and mixed analogue/digital line systems and circuits, line-up measurements will be made of

• overall loss in signal strength (in dB)
• amplitude loss/frequency (attenuation distortion)
• group delay (particularly if the circuit is to be used for data)
• noise, crosstalk, echo, etc.
• inter-exchange signalling tests (if appropriate)

Various equipments, including amplifiers, equalizers, filters, and echo controllers, are then adjusted to bring the line conditions within the set limits, using measuring equipment as follows

• signal tone generators (calibrated for precise frequencies and signal strengths)
• calibrated frequency and signal strength detectors
• noise meters
• equipment for inter-exchange signalling or data protocol testing

First, the end-to-end circuit loss is determined by sending a calibrated 1020 Hz (or in purely analogue networks, 800 Hz) signal of a known strength, and measuring the received strength; this allows the circuit amplification to be adjusted accordingly. Next, a range of different calibrated signal frequencies across the whole circuit bandwidth is sent, and the received signal strengths are again measured. This allows the frequency distortion equalizers to be adjusted. Then the group delay distortion is corrected using group delay equalizers. This equalization is particularly important for high speed modem data circuits, and it is achieved by measuring the phase of different frequencies relative to the 1020 Hz (or 800 Hz) signal. Following this,
psophometric noise checks are conducted, together with tests of inter-exchange signalling systems or of data protocols (e.g. X.25, frame relay or IBM’s SNA), and finally a test call is established.

Where a pure tone signal of calibrated strength needs to be injected into the digital part of a mixed analogue/digital connection or network, this can be done either using an analogue tone sender and an analogue/digital converter, or by the use of a digital reference sequence (DRS). A digital reference sequence is a digital bit pattern corresponding to a particular analogue signal frequency and strength. Such a pattern is easy to store in computer-like memory and is a very reliable means of reproducing an accurately calibrated signal.

36.6 HIGH GRADE DATA CIRCUIT LINE-UP

In high grade data circuits which use modems over analogue or mixed analogue/digital plant, a number of extra line-up measurements may be necessary, as follows

• weighted noise
• notched noise
• impulse noise
• phase hits and gain hits
• harmonic disturbance (or inter-modulation noise)
• frequency shift distortion
• jitter

Weighted noise is a measure of the noise in the middle of the channel bandwidth. Noise frequencies in this range are most likely to cause modem errors. Psophometric (European) and C-message filters (United States) are used to measure this type of noise.

Notched noise is measured by applying a pure frequency tone at one end and removing it with a notch filter at the other; the remaining noise is then measured. The notched noise itself arises from the way in which signals have been digitized or otherwise processed over the course of the link. It is thus similar to quantization distortion, which was discussed in Chapter 5.

Impulse noise is characterized by large ‘spikey’ waveforms and arises from unsuppressed power surges or mechanical switching noise. It is most common on electromechanical switched networks.
Phase hits and gain hits are intermittent but only moderate and relatively short (less than 200 ms) disturbances in the phase or amplitude of a signal. Typically, fewer than 10 should be recorded in a 15 minute test period. Special test equipment is required. More serious gain hits are called dropouts. Gain hits are most troublesome in voice use; phase hits manifest themselves as bit errors in data signals.
Harmonic disturbance may result from the intermodulation of two different signal frequencies F1 and F2 when passed through nonlinear processing devices. New stray signals of frequencies F1 + F2, F1 - F2, F1 + 2F2, etc., are produced. This type of disturbance is measured by a spectrum analyzer.

Frequency shift is also measured by a spectrum analyzer. Frequency shift is a problem for a data modem if the frequency received differs from that sent to such an extent as to be misinterpreted. It is most likely to occur when a carrier system or other frequency modulating signal processing has been used.

Jitter or phase jitter arises when the timing of the pulses on an incoming data signal varies slightly, so that the pulse pattern is not quite regular. The effects of jitter can accumulate over a number of regenerated links, and they result in received bit errors. Jitter can be reduced by reading the incoming data into a store and then reading it back out at an accurate rate, using a highly stable clock controlled by a phase-locked loop (PLL) circuit.

Figure 36.4 Detecting analogue line disturbances using constellation diagrams. (a) V.22 bis modem, perfect undistorted signal: appears as clean ‘dots’. (b) Noisy signal: cloudy pattern of dots. (c) Signal affected by phase hits: appears as circular streaks. (d) Signal affected by gain hits: appears as radial streaks
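The stray frequencies named under harmonic disturbance above (F1 + F2, F1 - F2, F1 + 2F2 and so on) can be enumerated mechanically. The following sketch lists the mixing products m·F1 + n·F2 up to a chosen order and picks out those falling inside the channel. The 300 to 3400 Hz band, the two test tones and the function names are assumptions made for the example, not values taken from the text.

```python
VOICE_BAND_HZ = (300, 3400)  # assumed nominal telephone channel

def intermod_products(f1_hz, f2_hz, max_order=3):
    """Return {frequency_hz: order} for products m*f1 + n*f2 with m and n non-zero.

    Only m >= 1 is scanned: the mirror-image products with negative m have the
    same absolute frequency, so nothing is lost.
    """
    products = {}
    for m in range(1, max_order):
        for n in range(-(max_order - m), max_order - m + 1):
            if n == 0:
                continue  # n == 0 would be a plain harmonic of f1, not a mixing product
            freq = abs(m * f1_hz + n * f2_hz)
            order = m + abs(n)
            products[freq] = min(order, products.get(freq, order))
    return products

def in_band(products, band=VOICE_BAND_HZ):
    """Sorted product frequencies that land inside the band."""
    low, high = band
    return sorted(f for f in products if low <= f <= high)

# Hypothetical test tones of 1800 Hz and 2100 Hz.
products = intermod_products(1800, 2100)
print(in_band(products))  # [300, 1500, 2400]
```

For these two tones the difference product (300 Hz) and two third-order products (1500 Hz and 2400 Hz) fall in band, while the sum product at 3900 Hz falls outside it.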
All of the above parameters should be checked when the line system or circuit is first established. Problems are likely to reflect poorly designed or poorly installed equipment, and the best remedy is prevention: check the quality of the work carried out, because correction circuits are expensive and not entirely effective.

In Chapter 9 we introduced the idea of constellation diagrams for modems. We are now in a position to illustrate their practical use for detecting noise, phase hits and gain (or amplitude) hits on analogue lines employing modems. Special test equipment may be connected at the receiving end of the line in place of the receiving modem. The test equipment displays on an oscilloscope-like screen the constellation pattern of the received data signal. Disturbances appear as shown in Figure 36.4.

As is apparent from the patterns of Figure 36.4, the problem with noise, phase and gain hits is that if they become too great, they result in the incorrect interpretation of the signal; the received bit pattern then includes errors.

36.7 LINING UP DIGITAL CIRCUITS

Digital line systems and their tributary bitstreams must conform to a different transmission plan from their analogue equivalents and are therefore lined up in a different manner. The important parameters, as we saw in Chapter 33, are

• the bit error ratio (BER)
• the network synchronization
• the quantization or quantizing distortion

In practice, the network synchronization (i.e. the jitter and clock accuracy) and the quantization distortion are set by the design of a circuit and can be improved only by re-design. The lining-up process can only address the question of error rate. The accuracy of synchronization depends on the use of highly stable clocking sources and inter-exchange synchronization links, as described in Chapter 33. Quantization distortion is a
form of noise, which affects analogue voice and data signals when they are carried over digital media using pulse code modulation (PCM) and other signal processing techniques (see Chapter 5). It can be reduced only by re-designing the circuit to avoid multiple signal conversions (e.g. analogue to digital conversion, signal compression, etc.). In the case of digital data circuits, these should never be designed to include any form of signal processing because digital data devices are intolerant of the high bit error rates that arise from quantization distortion.

Standard practice for digital circuit line-up is to perform a test of digital error performance. This may involve a lengthy stability test over several days or simply be a quick-check (15 minute) test. A pseudo-random bit pattern generator (a digital signal generator) is used to provide a test digital signal. During the test the proportion of bit errors (the bit error ratio, or BER) is measured, along with the proportion of error free seconds (EFS). Expected values are typically no more than 1 error in 10' or 10^9 for BER and at least 99.5% EFS. The errors, if excessive, may have arisen as the result of any number of different impairments. Some causes can be eliminated easily (e.g. by increasing the transmitted power to reduce the effect of noise). Other causes need more radical circuit checks.
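The error performance test described above can be sketched in a few lines. The example generates a pseudo-random test pattern with a 15-stage shift register (generator polynomial x^15 + x^14 + 1, a common choice for such patterns, but an assumption here since the text does not name one), corrupts a few known bit positions to stand in for the line under test, and then computes the BER and the proportion of error free seconds. The function names, the tiny bit rate and the error positions are all hypothetical; a real test set would run at the line rate and would synchronize its own local copy of the pattern to the received stream.

```python
def prbs15(n_bits, seed=0x7FFF):
    """Generate n_bits of a 2**15 - 1 pseudo-random sequence (x^15 + x^14 + 1)."""
    state = seed & 0x7FFF
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays at zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 14) ^ (state >> 13)) & 1
        state = ((state << 1) | new_bit) & 0x7FFF
        bits.append(new_bit)
    return bits

def flip_bits(bits, positions):
    """Return a copy of bits with the given positions inverted (simulated line errors)."""
    corrupted = list(bits)
    for position in positions:
        corrupted[position] ^= 1
    return corrupted

def error_performance(sent, received, bits_per_second):
    """Return (bit error ratio, proportion of error free seconds)."""
    errors = sum(s != r for s, r in zip(sent, received))
    ber = errors / len(sent)
    seconds = len(sent) // bits_per_second
    clean = sum(
        sent[i * bits_per_second:(i + 1) * bits_per_second]
        == received[i * bits_per_second:(i + 1) * bits_per_second]
        for i in range(seconds)
    )
    return ber, clean / seconds

# Scaled-down run: 10 'seconds' at 1000 bit/s with three injected errors,
# one in second 0 and two in second 2.
sent = prbs15(10_000)
received = flip_bits(sent, [5, 2500, 2501])
ber, efs = error_performance(sent, received, bits_per_second=1000)
print(ber, efs)  # 0.0003 0.8
```

The example also shows why the two figures are reported together: the two errors that share one second count twice towards the BER but spoil only one second for the EFS figure.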
Like their analogue equivalents, digital bit streams are lined up in two stages, first at higher order (e.g. at 140 Mbit/s, if this is the line system bit rate) and then on an end-to-end basis for each tributary stream (e.g. 2 Mbit/s, 1.5 Mbit/s). The secondary checking of lower order tributary streams is necessary for the same reasons as in the comparable analogue case exemplified by Figure 36.3.

36.8 PERFORMANCE OBJECTIVES

In recognition of the fact that practical networks can never match idealized performance objectives, ITU-T Recommendation G.102 sets out different operating limits for the target performance objective, the design objective, the commissioning objective (also known as the line-up limit) and the maintenance limit.

The performance objective is the ideal performance level for the particular application. The design objective transfers this objective into a realizable target range, within which economically designed equipment can be expected to operate, given optimum conditions of power supply, temperature, humidity, etc. The line-up limit recognizes that the optimum conditions can rarely be achieved in practice, but nonetheless sets a stringent practical range within which the equipment must operate on first establishment. Any faults identified during line-up, which cause the circuit to operate outside this range, should be cleared and the circuit re-lined, before taking it into service. The maintenance limit is the least stringent range of operating conditions, but still within a ‘tolerable’ range as far as the application is concerned. The limit is chosen so that, if exceeded, a fault is considered to exist. The fault should be cleared and the circuit re-lined-up.

As Figure 36.5 shows, each of the last three performance ranges described is a slight relaxation of its predecessor, so giving targets which are achievable in practice.
Thus even if a slight degradation in quality occurs after circuit line-up, the circuit still operates within the maintenance limit operating range.

Figure 36.5 (diagram of the operating ranges of the measured parameter value, with the ‘unusable’ range outermost)
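The nesting of the design objective, line-up limit and maintenance limit can be made concrete with a small classifier. The sketch below treats each limit as a maximum allowed deviation of one measured parameter from its nominal value; the threshold values, the choice of dB as the unit, and the names `Limits` and `classify` are hypothetical and are not taken from ITU-T Recommendation G.102.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    """Maximum deviation from nominal (in dB) for each successively relaxed range."""
    design_db: float
    line_up_db: float
    maintenance_db: float
    unusable_db: float

def classify(deviation_db, limits):
    """Name the tightest range that contains the measured deviation."""
    magnitude = abs(deviation_db)
    if magnitude <= limits.design_db:
        return "within design objective"
    if magnitude <= limits.line_up_db:
        return "within line-up limit"
    if magnitude <= limits.maintenance_db:
        return "within maintenance limit"
    if magnitude < limits.unusable_db:
        return "fault: maintenance limit exceeded"
    return "unusable"

# Hypothetical thresholds for one measured parameter.
limits = Limits(design_db=0.5, line_up_db=1.0, maintenance_db=2.0, unusable_db=6.0)
for measured in (0.3, 0.8, -1.5, 3.0, 7.0):
    print(measured, classify(measured, limits))
```

A circuit measured at 0.8 dB off nominal would pass line-up, and could then drift a further 1.2 dB before a fault is declared.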
A steady deterioration in performance of items in service can be expected as the result of operational use, occasional overload and general ageing as well as genuine faults. Often the degradation is slow enough to be imperceptible and not to warrant routine checks. For this reason, it is common practice to use automatic alarms to alert maintenance staff to the need for corrective action, so that faults can promptly be eliminated and the equipment returned to its line-up operating range. Two levels of alarm are possible, one at the maintenance limit level, and one at the ‘unusable’ performance level. Faults on the latter level clearly need more urgent attention than those on the former.

36.9 MAINTENANCE ‘ACCESS POINTS’

For the purpose of network lining-up and subsequent maintenance, it is necessary to provide a number of test access points for maintenance. Ideally, test access points are provided at a number of different points in a network to enable easy localization of faults and initial segment-by-segment circuit alignment. A very large number of test points, however, increases the overall network and equipment costs, and may in itself be a source of unreliability. Figure 36.6 shows possible test access arrangements for a simple network of two exchanges (one analogue and one digital) interconnected by a digital transmission link. Note how test access points have been provided between all the major items of equipment. This allows the causes of any faults or line-up problems to be quickly narrowed down.

Figure 36.6 Test access points. TAE, test access equipment (built into the exchange); A/D, analogue to digital conversion equipment; T, test access point (diagram labels: analogue exchange; transmission centre; digital transmission line; transmission centre; digital exchange; switch matrix; cross connect frame)

As is typically the case, both exchanges in Figure 36.6 have built-in test access equipment for diagnosing and correcting faults within the exchange itself. In addition,
five external test access points enable the exchange’s external conditions to be verified and faults in other equipment along the link to be localized, diagnosed and corrected.

36.10 LOCALIZING NETWORK FAULTS

In an efficient maintenance organization, faults are discovered before the customer finds them. In this way some of the faults can be corrected even before the customer is aware of them. Means for fault detection include

• equipment alarms
• equipment routine testing (so-called routining)
• live network monitoring
• customer fault reporting

As before, a combination of techniques is usually appropriate. The four are discussed in turn.

Equipment alarms are used to indicate an abnormal state (i.e. when the maintenance limit has been exceeded) in an equipment or an environment of crucial importance. (An environment fault might be too high a temperature: computer equipment is prone to failure under such conditions.) Alarms are usually ranked in order of importance, and are usually generated by continuous monitoring of some type of ‘heartbeat’ signal. When the heartbeat stops, it is time for action. For example, the pilot signal on analogue transmission systems is a low level, single tone frequency outside the normal bandwidth range which is continuously transmitted. Absence of the pilot signal at the receiving end is interpreted as a transmission link failure, and it is usually notified to maintenance staff as an audible and/or visible alarm.

Equipment routine testing, on the other hand, is suited to any equipment which is fault- or wear-prone (e.g. mechanical equipment). It can be carried out manually or by automatic test equipment, the choice depending on the number and type of devices, their complexity, and the periodicity of tests. For routine network testing, ITU-T has described a number of automatic transmission measuring and signalling test equipments (ATME) in its O-series of recommendations.
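The ‘heartbeat’ principle behind equipment alarms, as described above for the pilot signal, amounts to a watchdog timer: an alarm is raised once the signal has been absent for longer than some tolerated interval. The sketch below shows the idea with explicit timestamps so that it runs deterministically; the class name, the timeout value and the sample times are hypothetical, and a real pilot detector works on the received signal level rather than on discrete events.

```python
class HeartbeatMonitor:
    """Raise an alarm when no heartbeat has been seen for longer than timeout_s."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen_s = None  # no heartbeat received yet

    def heartbeat(self, now_s):
        """Record that the monitored signal was present at time now_s."""
        self.last_seen_s = now_s

    def alarm(self, now_s):
        """True if the signal has never been seen or has been absent too long."""
        if self.last_seen_s is None:
            return True
        return (now_s - self.last_seen_s) > self.timeout_s

# Hypothetical timeline, in seconds.
monitor = HeartbeatMonitor(timeout_s=2.0)
monitor.heartbeat(0.0)
monitor.heartbeat(1.0)
print(monitor.alarm(2.5))  # False: last heartbeat was 1.5 s ago
print(monitor.alarm(3.5))  # True: 2.5 s of silence exceeds the 2.0 s timeout
```

The timeout is a trade-off: too short and brief interruptions raise nuisance alarms, too long and a genuine link failure goes unreported for that much longer.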
Live network monitoring helps to pick up transient faults and then avert them; network overload can thus be foreseen, and its adverse effects corrected, as we shall see in Chapter 37. Useful means of monitoring live network performance include call completion rate and congestion statistics, and quality sampling of a small number of connections. For example, a sudden flood of calls or a rapid drop in the call completion rate may indicate the onset of congestion caused by network failure or quality degradation.

Not all faults, however, can be detected other than by the user, and so in the last resort an efficient customer or user fault reporting and handling procedure is paramount.

After detecting a fault, the next step is to localize it and diagnose its nature, all of which may be directly apparent or correctly reported, but usually more diagnosis is required. For example, when a major transmission system fails, a whole host of alarms
may go off in the maintenance centre, indicating failure not only of the main transmission system, but also of each of the derived circuits or tributaries. The alarms must be cleared in priority order. Correcting the main transmission fault first may clear the alarms on each individual tributary, but will not do so if lesser problems remain on particular tributaries. Thus, in our example, maintenance staff should first attempt to localize the main transmission failure to a given section of the link. This they can do by making use of the various test access points available to them, and by liaison with staff in other maintenance centres through which the link passes.

The example is illustrated in Figure 36.7, where the failure in one of the channels in section A-B of the link A-B-C has been detected by maintenance staff in station C. The absence of the main transmission pilot signal in the receive channel at station C has raised an alarm. Telephone liaison with the staff in station B reveals that the alarm has also been raised there. Though this alarm actually notifies the loss of a whole analogue or digital group of circuits, it is evident that this is the likely cause of the problem on the particular A-C channel. If instead the fault had lain on the transmission link between stations B and C, the alarm in station B would not have gone off. Furthermore, if the fault had been in station C’s transmit channel, the alarms would have gone off at one or both of stations A and B.

Some faults in switched networks can be extremely difficult to trace. A common problem encountered when trying to locate faults in switched networks is determining which precise links and exchanges were traversed at the time of the fault. It is like knowing that your friend is driving between London and Birmingham but not knowing which route he took. You know he has broken down, but where do you look first?
Faults of this nature can persist for many months and are finally cleared, either as the result of routine maintenance, or because routine testing of one of the individual network components revealed the fault.

Figure 36.7 Transmission failure and alarm (diagram labels: Station A; Station B; Station C; link failure; absence of ‘pilot’ raises alarm in station C)

Not all switched network faults are difficult to trace. For example, a phenomenon known to all experienced maintenance staff is the occurrence of a killer trunk. Imagine that the first choice circuit on the main London-to-Birmingham telephone route has become faulty. Imagine also that the fault results in any call attempting to use the circuit being immediately failed and released. Affected callers receive nothing at all! Not so bad, you might think: one circuit faulty in hundreds will not make much difference. However, the fault has an incredibly wide-ranging effect, nearly ‘killing’ all the traffic on the route. The reason is that, being the first choice circuit, all calls will attempt to seize it before trying other circuits. Of course, any call that attempts to do so is immediately failed and released. So nearly all calls fail! Hence
the name killer trunk. To the experienced maintenance man, however, the killer trunk phenomenon shows up like a sore thumb, because traffic records reveal

• a huge number of short holding time calls
• a very low call success (i.e. answering) rate
• very little overall traffic in erlangs
• virtually no activity on many circuits

The condition can be cleared in the first instance simply by busying out the circuit (i.e. making it unavailable for use). A repair can then be carried out. The worst effects of a killer trunk can also be prevented by making sure that circuits within the route are chosen randomly, not always scanned in the same order.

Before leaving the subject of network test points and fault localization procedures we should also mention the use of loopback techniques for detecting faults within the tail part of a circuit between the network operator’s site and the end user’s terminal. This part of the circuit can be the most inaccessible (say in unmanned premises), or it may be subject to quite adverse conditions in congested ducts, strung between telegraph poles or even draped across desks. It is just as prone to failure as any other section, and the network operator needs to be able to localize faults here as anywhere else, ideally being able to distinguish such faults from faults in the user’s terminal equipment, and preferably without even visiting the user’s location.

What makes this possible is a circuit loopback or responding equipment. A loopback simply loops the user’s receive channel directly to the transmit channel, so enabling the network operator to send a test signal both ways along the tail and to confirm its correct return.

Figure 36.8 Testing a connection tail using a loopback (diagram labels: normal connections; user’s receive channel; user’s terminal; ‘test loopback’)

Loopbacks may be either manually operated by the user (on request from the network operator) or invoked by the network operator using a
2713 Hz tone or equivalent signal to switch a remotely controlled equipment within the line termination socket at the user’s premises. As we see from Figure 36.8, the loopback enables faults to be localized beyond doubt as being either on the line or in the user’s terminal equipment, all without a visit to the user’s premises.

Loopback techniques are widely used on point-to-point data connections, where users are intolerant of any significant downtime. PTOs provide them on the tails of their private leased circuits, and other data network providers put them on their equipment circuits. Loopbacks for modems are described in ITU-T Recommendation V.54. Another way of testing lines to remote unmanned locations is to use responding equipment which answers calls made to the location, giving standard test signals and other responses in accordance with commands.

36.11 HARDWARE FAULTS

Historically the most common faults have come from mechanical equipment or hardware. Some maintenance organizations are well prepared to deal with such component failures, and general wear and tear, by preventive action or by repair, but it must be recognized that the semiconductor age has brought with it much greater levels of reliability and complexity and much greater risk associated with failure. All this has forced change on the handling of hardware faults. Nowadays, it is common to design electronic equipment to be tolerant of faults; computer processors and network exchanges have their crucial parts duplicated, the two halves sharing the traffic load and continually monitoring one another for faults. If a fault is detected in either half, then that half is shut down and the other half takes over the full traffic load until the faulty half is repaired by maintenance staff, in response to the alarm. Self-diagnosis software within the computer or exchange is used to localize the fault to a particular circuit board.
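The duplicated, load-sharing arrangement just described can be illustrated with a short sketch. The Python below is a minimal model under stated assumptions, not any manufacturer's design: the class names `Half` and `DuplicatedUnit`, the `healthy` flag standing in for the mutual monitoring, and the even split of traffic between the halves are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Half:
    """One half of a duplicated unit (hypothetical model)."""
    name: str
    healthy: bool = True      # stands in for the result of mutual monitoring
    in_service: bool = True
    load: float = 0.0         # traffic currently carried, in erlangs


@dataclass
class DuplicatedUnit:
    a: Half
    b: Half
    alarms: list = field(default_factory=list)

    def share_load(self, traffic: float) -> None:
        """Split the traffic evenly over the halves still in service."""
        active = [h for h in (self.a, self.b) if h.in_service]
        if not active:
            raise RuntimeError("both halves out of service")
        for h in (self.a, self.b):
            h.load = traffic / len(active) if h.in_service else 0.0

    def monitor(self, traffic: float) -> None:
        """Shut down any half found faulty, raise an alarm, re-share the load."""
        for h in (self.a, self.b):
            if h.in_service and not h.healthy:
                h.in_service = False
                self.alarms.append(f"fault detected in half {h.name}")
        self.share_load(traffic)

    def repair(self, half: Half, traffic: float) -> None:
        """Model maintenance staff replacing the board and restoring the half."""
        half.healthy = True
        half.in_service = True
        self.share_load(traffic)


unit = DuplicatedUnit(Half("A"), Half("B"))
unit.share_load(100.0)          # normal working: the halves share the load
unit.b.healthy = False          # simulate a board failure in half B
unit.monitor(100.0)             # half A takes the full load, alarm is raised
loads_after_fault = (unit.a.load, unit.b.load)
unit.repair(unit.b, 100.0)      # back to load sharing
loads_after_repair = (unit.a.load, unit.b.load)
```

The point of the model is only that service continues on one half while the alarm waits for maintenance staff; a real exchange would also have to decide which half is at fault when the two disagree.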
Correction is achieved merely by sliding the board out and replacing it with a new one. The processor or exchange can then be restored to normal working, and the circuit board can be repaired at leisure, sent back to the manufacturer or thrown away.

36.12 SOFTWARE FAULTS

Errors in computer programs (or software) are becoming the most common cause of faults in telecommunications equipment. Newly developed software is often littered with errors or bugs of an unpredictable nature which come to light in operation, sometimes years later, so that it is difficult to build experience. In addition, because the background of most telecommunications technicians is not in computer programming, the skills for rectification of faults are extremely rare. Software bugs may take some time to find, and they require the expert attention of specialized maintenance support staff and the original designer, being outside the competence of normal direct maintenance staff.
So, you may ask, how do we keep the service on the exchange running while we sort out the problem? The answer is by initiating a manual or automatic restart procedure which re-boots (resets) the whole exchange to allow it to carry on working. This is done by re-loading historical data and software, overwriting any corruptions which may have crept in as a result of the software bug. The software and data resident in the exchange processor at the time of failure are downloaded for a later fault diagnosis, which attempts to trace back the processing events leading up to the problem. Meanwhile the exchange is likely to run quite normally for some time, until a similar sequence of events conspires against it again. Localized restarts of minor functional units are likely to be automatically invoked by the exchange itself to minimize the off-air time, but more major (e.g. whole exchange) restarts may require manual initiation, to prevent an automatic restart from disrupting a large number of unaffected calls at an inopportune moment.

The computerization of many maintenance tasks may have greatly reduced the human numbers required, but the extra skills and knowledge demanded of those staff that remain make them a prized commodity!

36.13 CHANGE CONTROL PROCEDURE FOR HARDWARE AND SOFTWARE

As telecommunications equipment has become increasingly computerized over the last few years, so the importance of effective change control procedures for the various hardware and software releases has grown. It is normal nowadays for equipment manufacturers to develop their hardware and software further in one-year steps, offering new hardware and software releases at least once a year, if not more often. Sometimes these new releases resolve previous functional or operating problems, sometimes they bring with them new service capabilities. Nearly always, even for those releases which are backward-compatible with previous releases, some sort of system procedure is necessary.
New releases may be backward-compatible only with the immediately preceding release, so that any older releases of the software or hardware may need to be updated before the new release can be introduced. On other occasions, some form of network, equipment or topology configuration change may be necessary before the new release is installed and activated. The act of installing the new release must be a carefully planned and executed procedure.

The first step in introducing a new hardware or software release to the network should be the verification of its correct functioning and good quality. Here an offline (i.e. non-live) test network is helpful, not only in checking the correct functioning of new service capabilities but also in the regression testing of functions already used extensively within the network (which were already available in a previous release, but may not operate in quite the same way in the new release). The test network also serves to practice the installation of the new release, refining the most appropriate order of steps to be taken to minimize service disturbance to customers when the release is installed in the live network.
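Where a release is backward-compatible only with its immediate predecessor, a node running an older release has to be stepped through every intermediate release in turn. The following Python is a minimal sketch of that planning step; the release names and the `upgrade_path` helper are hypothetical and do not come from any real vendor's tooling.

```python
def upgrade_path(releases, installed, target):
    """Return the ordered releases to install to get from `installed` to `target`.

    `releases` lists release names oldest first. The sketch assumes the
    strictest case mentioned in the text: each release can only be installed
    on top of the one immediately before it, so no step may be skipped.
    """
    start = releases.index(installed)   # raises ValueError if unknown
    end = releases.index(target)
    if end < start:
        raise ValueError("target is older than the installed release")
    return releases[start + 1:end + 1]


RELEASES = ["R1", "R2", "R3", "R4"]     # hypothetical release names

path = upgrade_path(RELEASES, installed="R1", target="R4")
print(path)
```

Each step in the returned list is a separate installation that could first be rehearsed on the offline test network before being carried out on the live node.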
Having installed a new release of hardware or software it is very important to keep an inventory of the hardware and software level installed in each of the individual nodes and equipments making up a network, so that further release updates can be correctly administered and installed. Ideally, each of the nodes should be updated to run at the most recent level of hardware and software. This much reduces the problems of different software and hardware vintages having to interoperate with one another, and usually also brings better support from the manufacturer should a problem arise. (The specialists always tend to be most acquainted with the latest version, tending slowly to forget the idiosyncrasies of older versions.) Unfortunately, practical reality does not usually allow this, for various reasons:
• the equipment manufacturers’ pricing scheme may charge for software according to the number of nodes in which it is installed; if the service benefit is only warranted in a few nodes it may be uneconomic to install it in all the nodes
• a new software release may demand hardware upgrades which are not affordable
• services may operate in a different and (from a particular user perspective) unacceptable way in a new release

Experience with software demonstrates that each new release and even each new patch (a software correction intended to eliminate a problem or software error) should be treated with caution. You cannot treat new software as ‘only for the good’; it can also bring problems.
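A release inventory of the kind recommended in this section need not be elaborate. The Python sketch below assumes a simple in-memory table; the node names, hardware and release labels, and the helpers `nodes_behind` and `vintages_in_service` are all hypothetical and serve only to show the sort of questions such an inventory should answer.

```python
from collections import Counter

# Hypothetical inventory: node name -> (hardware level, software release)
INVENTORY = {
    "london-1":     ("HW-B", "R4"),
    "birmingham-1": ("HW-B", "R4"),
    "leeds-1":      ("HW-A", "R3"),
    "bristol-1":    ("HW-A", "R2"),
}

RELEASE_ORDER = ["R1", "R2", "R3", "R4"]   # oldest first (assumed labels)


def nodes_behind(inventory, latest):
    """Map each node not yet at `latest` to the number of releases it lags."""
    latest_idx = RELEASE_ORDER.index(latest)
    lag = {}
    for node, (_hw, sw) in inventory.items():
        behind = latest_idx - RELEASE_ORDER.index(sw)
        if behind > 0:
            lag[node] = behind
    return lag


def vintages_in_service(inventory):
    """Count how many nodes run each software release."""
    return Counter(sw for _hw, sw in inventory.values())


lag = nodes_behind(INVENTORY, latest="R4")
vintages = vintages_in_service(INVENTORY)
```

Here `lag` identifies the nodes that would need updating before the next release, and `vintages` shows how many different software vintages currently have to interoperate.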