Hardware and Computer Organization, Appendix A: Solutions for Odd-Numbered Problems (Part 17)

Chapter 9: Solutions for Odd-Numbered Problems

1. This is an example of the addressing mode known as "address register indirect with index and displacement." The effective address is the sum of the address value in A0, the index value in D0, and the 2's complement displacement. The displacement $84 has its sign bit set, so as an 8-bit 2's complement number it equals –$7C. Thus, the effective address is EA = $2000 + $0400 + (–$7C) = $2384.

The program is not relocatable for two reasons: (1) there is a jump to an absolute address, start, and (2) an absolute address is loaded into A0. The program could still be made relocatable by managing what gets loaded into A0 and D0, but the jump instruction forces it to be absolute.

3. The value in D0 after the highlighted instruction is $0000002A.

5.
    Address   Machine code          Instruction
    00000400  067955550000AAAA      ADDI.W #$5555,$0000AAAA
    00000408  06B9AAAA55550000FFFE  ADDI.L #$AAAA5555,$0000FFFE
    00000412  0640AAAA              ADDI.W #$AAAA,D0

7.
    ********************************************************************
    *
    * CSS 422 HW #4: Relocatable Memory test program
    *
    ********************************************************************
    * System equates
    pattern1  EQU     $AAAA       * First test pattern
    pattern2  EQU     $FFFF       * Second test pattern
    pattern3  EQU     $0001       * Third test pattern
    st_addr   EQU     $00000400   * Starting address of test
    end_addr  EQU     $0009FFF0   * Ending address of the test
    stack     EQU     $000C0000   * Location of the stack pointer
    word      EQU     2           * Length of a word, in bytes
    byte      EQU     1           * One byte long, NO MAGIC NUMBERS!
    bit       EQU     1           * Shifting by bits
    exit_pgm  EQU     $2700       * Simulator exit code
    data      EQU     $500        * Data storage region
    start     EQU     $400        * Program starts here
    new_ad    EQU     $000A0000   * Relocated program runs here
    pr_cmd    EQU     00          * Command to print message

    * Main Program
              OPT     CRE                 * Turn on cross references
              ORG     start               * Program begins here
              LEA     stack,SP            * Initialize the stack pointer
              LEA     relo,A0             * Starting address pointer
              LEA     last_addr,A1        * End pointer
              LEA     new_ad,A3           * Destination
    relo_lp   MOVE.W  (A0)+,(A3)+         * Move a word
              CMPA.L  A0,A1               * Have we moved enough?
              BPL     relo_lp
              JMP     new_ad
    relo      LEA     test_patt(PC),A3    * A3 points to the test pattern to use
              LEA     bad_cnt(PC),A4      * A4 points to bad memory counter
              LEA     bad_addr(PC),A5     * A5 points to the bad addr location
              LEA     data_read(PC),A6    * A6 points to data storage
              CLR.W   (A4)                * Clear bad address count (a word)
              MOVE.W  (A3)+,D0            * Get current pattern, point to next one
              BSR     do_test             * Run first test
              NOT.W   D0                  * Complement bits for next test
              BSR     do_test             * Run second test
              MOVE.W  (A3)+,D0            * Get next pattern
              BSR     do_test             * Run third test
              NOT.W   D0                  * Complement bits for fourth test
              BSR     do_test             * Run fourth test
              MOVE.W  (A3),D0             * Get last pattern
    shift1    BSR     do_test             * Run shift test
              ROL.W   #bit,D0             * Shift bits
              BCC     shift1              * Done yet? No, go back
              MOVE.W  (A3),D0             * Get test pattern 3 again
              NOT.W   D0                  * Complement test pattern 3
    shift2    BSR     do_test             * Run the test
              ROL.W   #bit,D0             * Shift the bits
              BCS     shift2              * Done yet? If not, go back
    message   MOVE.B  #pr_cmd,D0          * Load command to print banner
              LEA     string(PC),A1       * Point to message
              MOVE.W  str_len(PC),D1      * Message length
              TRAP    #15                 * Do it!
    done      STOP    #exit_pgm           * Quit back to simulator
    ************************************************************************
    * Subroutine: do_test
    *
    * Performs the actual memory test. Fills the memory with the test
    * pattern of interest, then reads it back and checks it.
    * Registers used: D1,A0,A1,A2
    * Return values: None
    * Registers saved: D1,A0,A1,A2
    * Input parameters:
    *   D0.W = test pattern
    *   A4.L = Points to memory location to save the count of bad addresses
    *   A5.L = Points to memory location to save the last bad address found
    *   A6.L = Points to memory location to save the data read back and
    *          the data written
    *
    * Assumptions: Saves all registers used internally
    ************************************************************************
    do_test   MOVEM.L A0-A2/D1,-(SP)      * Save registers
              LEA     st_addr,A0          * A0 points to start address
              LEA     end_addr,A1         * A1 points to last address
              MOVE.L  A0,A2               * A2 will point to memory
    fill_loop MOVE.W  D0,(A2)+            * Fill and increment pointer
              CMPA.L  A1,A2               * Are we done?
              BLE     fill_loop
              MOVE.L  A0,A2               * Reset pointer
    test_loop MOVE.W  (A2),D1             * Read value back from memory
              CMP.W   D0,D1               * Are they the same?
              BEQ     addr_ok             * OK, check next location
    not_ok    MOVE.L  A2,(A5)             * Save the address of the bad location
              ADDQ.W  #byte,(A4)          * Increment the counter
              MOVE.W  D1,(A6)+            * Save the data read back
              MOVE.W  D0,(A6)             * Save the data written
              SUBQ.L  #word,A6            * Restore A6 as a pointer
    addr_ok   ADDQ.L  #word,A2            * A2 points to next memory location
              CMPA.L  A1,A2               * Have we hit the last address yet?
              BLE     test_loop           * No, keep testing
              MOVEM.L (SP)+,D1/A0-A2      * Restore registers
              RTS                         * Go back

    * Data Space
    test_patt DC.W    pattern1,pattern2,pattern3  * Memory test patterns
    bad_cnt   DS.W    1                   * Keep track of # of bad addresses
    bad_addr  DS.L    1                   * Store last bad address found here
    data_read DS.W    1                   * What did I read back?
    data_wrt  DS.W    1                   * What did I write?
    string    DC.B    'End of test'       * Exit message
    str_len   DC.W    str_len-string
    last_addr DS.W    1
              END     start
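To see why the test uses several patterns, here is a small Python model, a sketch under stated assumptions rather than a translation of the listing. It applies the sequence that the listing's comments describe ($AAAA, its complement $5555, $FFFF, its complement $0000, then a walking one and a walking zero) to a hypothetical 8-word memory (`FaultyMemory` is an illustrative class, not part of the original program) in which word 3 has bit 4 stuck at 1.

```python
MASK16 = 0xFFFF


def rol16(value: int) -> int:
    """Rotate a 16-bit value left by one bit, like ROL.W #1."""
    return ((value << 1) | (value >> 15)) & MASK16


def test_patterns():
    """Yield the 16-bit patterns in the order the test applies them."""
    for base in (0xAAAA, 0xFFFF):
        yield base
        yield ~base & MASK16          # complemented pattern (NOT.W)
    pattern = 0x0001                  # walking one: $0001 ... $8000
    for _ in range(16):
        yield pattern
        pattern = rol16(pattern)
    pattern = ~0x0001 & MASK16        # walking zero: $FFFE ... $7FFF
    for _ in range(16):
        yield pattern
        pattern = rol16(pattern)


class FaultyMemory:
    """Hypothetical word-addressed memory with stuck-at-1 bits in one word."""

    def __init__(self, n_words: int, bad_index: int, stuck_mask: int):
        self.cells = [0] * n_words
        self.bad_index = bad_index
        self.stuck_mask = stuck_mask

    def write(self, index: int, value: int) -> None:
        if index == self.bad_index:
            value |= self.stuck_mask  # the faulty bits always read back as 1
        self.cells[index] = value & MASK16

    def read(self, index: int) -> int:
        return self.cells[index]


def run_memory_test(mem: FaultyMemory):
    """Fill, then verify, for every pattern; count mismatches like bad_cnt."""
    bad_count, last_bad = 0, None
    n_words = len(mem.cells)
    for pattern in test_patterns():
        for i in range(n_words):      # fill pass
            mem.write(i, pattern)
        for i in range(n_words):      # verify pass
            if mem.read(i) != pattern:
                bad_count += 1
                last_bad = i          # like saving the bad address in bad_addr
    return bad_count, last_bad


if __name__ == "__main__":
    print(run_memory_test(FaultyMemory(8, bad_index=3, stuck_mask=0x0010)))
```

With bit 4 stuck at 1, only patterns that have a 0 in bit 4 expose the fault: $AAAA, $0000, fifteen of the walking-one patterns, and one walking-zero pattern, for 18 mismatches at word 3. A single fixed pattern such as $FFFF would miss this fault entirely.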

Chapter 10: Solutions for Odd-Numbered Problems

1. = 15C7h. The word is aligned.

3. 0F57Ch

5.
            MOV  CX,4
            MOV  BX,10
    loop1:  inc  BX
            dec  CX
            jnz  loop1

7. = 0AF3DH

9.
            MOV  AX,8200H   ;Get segment value
            MOV  DS,AX      ;Load segment register
            MOV  SI,0000    ;Load source index register
            MOV  DI,0200H   ;Load destination index register
            MOV  CX,1000    ;Load counter
    loader: MOV  AL,[SI]    ;Get byte
            MOV  [DI],AL    ;Store byte
            INC  SI         ;Advance pointers
            INC  DI
            DEC  CX
            JNZ  loader
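The 8086 answers in this chapter rest on real-mode address translation: the 16-bit segment value is shifted left four bits and added to the 16-bit offset, giving a 20-bit physical address. A minimal Python sketch (the function name is illustrative), applied to the segment and index values loaded in problem 9:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real mode: physical = segment * 16 + offset, truncated to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF  # the 8086 has 20 address lines


if __name__ == "__main__":
    ds = 0x8200
    print(hex(physical_address(ds, 0x0000)))  # source, SI = 0000
    print(hex(physical_address(ds, 0x0200)))  # destination, DI = 0200H
```

The mask reflects the 8086's 20 address lines: an address that carries past FFFFFh wraps around to the bottom of memory.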

Chapter 11: Solutions for Odd-Numbered Problems

1. The 68K has two operating modes, user and supervisor. The ARM architecture allows for 7 operating modes. User mode is the lowest privilege level. The other modes are System, Supervisor, Abort, Fast Interrupt Request, Interrupt Request, and Undefined.

3. The biggest difference is that, with the exception of registers r13 through r15, all ARM registers are completely general-purpose. Any register may be used as part of an arithmetic operation or as an address pointer. This is in sharp contrast to the distinction that the 68K architecture makes between the address registers, A0-A6, and the data registers, D0-D7.

5.
    MOV r4,#&100
    ORR r4,r4,#3

7. = &0013E94C

9. If the Z flag = 0, the value in register r1, &DEF02340, is incremented by 4 to &DEF02344, and that value is used as an address pointer to retrieve the 16-bit data object stored at that memory location. The 16-bit value is then loaded into general-purpose register r4. If the Z flag = 1, the instruction is not executed.
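Problem 5 builds its constant in two instructions. One likely reason (an assumption, since the problem statement is not reproduced here) is the ARM rule for data-processing immediates: the operand must be an 8-bit value rotated right by an even number of bit positions, so &100 and 3 are each encodable but &103 is not. A small Python checker (function names are illustrative):

```python
MASK32 = 0xFFFFFFFF


def rotate_left32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by amount bits (0 <= amount < 32)."""
    return ((value << amount) | (value >> (32 - amount))) & MASK32


def is_arm_immediate(value: int) -> bool:
    """True if value == ROR(imm8, 2*r) for some 8-bit imm8 and r in 0..15."""
    value &= MASK32
    # Undo each possible even right-rotation and see whether only 8 bits remain.
    return any(rotate_left32(value, rot) <= 0xFF for rot in range(0, 32, 2))


if __name__ == "__main__":
    for constant in (0x100, 0x3, 0x103):
        print(hex(constant), is_arm_immediate(constant))
    # MOV r4,#&100 followed by ORR r4,r4,#3 leaves &103 in r4.
    print(hex(0x100 | 0x3))
```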

Chapter 12: Solutions for Odd-Numbered Problems

1.
    ***********************************************************************
    * Subroutine: xmitStr
    * Purpose: Transmits a string of characters to the UART
    *          serial port.
    * Input register list:
    *   A6 - Pointer to the data string to be sent.
    * Return register list:
    *   A6 - Pointer to the character after the string terminating
    *        character.
    * Register usage: All registers used by xmitStr will be saved and
    *   restored upon exit
    *
    * Assumptions:
    *   - There is at least one character to transmit.
    *   - String is terminated by $FF.
    *
    ********************************************************************
    * Data definitions
    eom       EQU     $FF                 *End of message character
    status    EQU     $2001               *Status register
    xmit      EQU     $2000               *Transmit data register
    tbmt_mask EQU     $01                 *Isolate transmit buffer empty bit
    * Subroutine starts here
    xmitStr   MOVEM.L D0/D1/A0/A1,-(SP)   *Save the registers
              LEA.L   xmit,A0             *A0 points to transmitter
              LEA.L   status,A1           *A1 points to the status reg.
    xmit_loop MOVE.B  (A1),D1             *Get status
              ANDI.B  #tbmt_mask,D1       *Isolate bit
              BEQ     xmit_loop           *Still busy, keep waiting
              MOVE.B  (A6)+,D0            *Get byte
              CMPI.B  #eom,D0             *Last byte?
              BEQ     quit                *Yes, we're done
              MOVE.B  D0,(A0)             *Ship it
              BRA     xmit_loop           *Go back
    quit      MOVEM.L (SP)+,D0/D1/A0/A1   *Restore the registers
              RTS

3. The successive approximation converter always takes the same number of clock cycles to digitize the unknown signal: one per bit of resolution. Since it has 16 bits of resolution, it takes 16 clock cycles; at a 1 MHz clock rate, that is 16 microseconds.

The single-ramp A/D converter must count up to the unknown voltage, so we need to determine how many counts it takes to reach 1.5001 volts. The range of a 16-bit converter is 0 through 65,535 ($0000 to $FFFF in hexadecimal), so for this converter the minimum voltage increment is 0.0001 volts. Reaching 1.5001 volts therefore takes 1.5001 / 0.0001 = 15,001 counts, or 15,001 clock cycles, which is 15,001 microseconds.

5a. An 11-bit 2's complement number can represent values in the range –1024 to +1023, so each change of one digital code corresponds to 0.01 volts. Any smaller change might not be detectable.

5b. Since each digital code increment represents 0.01 volts, +5.11 volts is represented as 511 (5.11 volts / 0.01 volts per count = 511 counts). In 11-bit binary, +511 is 00111111111, so the 2's complement negative value (–5.11 volts) is 11000000001.

5c. 8.96 volts corresponds to a digital value of 896 (8.96 / 0.01), or 01110000000 in 11 bits. To represent it as a 16-bit number we sign-extend it; because the value is positive, that means adding leading zeros. Thus, the result is 0000 0011 1000 0000, or 0x0380.

5d. Successive approximation is the hardware analog of a binary search algorithm, so to digitize an 11-bit value we need $\log_2 2^{11} = 11$ samples, one per bit.

5e. Since we take a sample on every rising edge of the clock and we need 11 samples, we need 11 rising edges. The clock frequency is 1 MHz, so the clock period is 1 microsecond.
Thus, it takes 11 microseconds to digitize the analog signal.

7a. A 25-microsecond sampling interval corresponds to a 40 kHz sampling frequency. To collect 4 samples per cycle, the frequency of the unknown waveform must be no greater than 10 kHz.

7b. A 14-bit conversion resolves 1 part in 16,384, so 10 V / 16,384 ≈ 0.0006 V.

7c. The S/H droops 1 volt in one millisecond, so in 25 microseconds it droops (25/1000) × 1 V = 0.025 volts. Since this is significantly greater than the 0.0006-volt resolution of the converter, the S/H would introduce an unacceptably large error, so it can't be used.
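The conversion arithmetic in problems 3 and 5 can be checked with a short Python sketch. `sar_convert` models a successive approximation converter as a binary search that settles one bit per clock, MSB first; `ramp_convert` models a single-ramp converter that counts one LSB per clock; `twos_complement_bits` formats a code as an n-bit 2's complement string. These names are illustrative, and both converter models assume an ideal unipolar converter that rounds to the nearest code (the half-LSB offset is an assumption added here to keep floating-point comparisons robust).

```python
from __future__ import annotations


def sar_convert(v_in: float, n_bits: int, lsb: float) -> tuple[int, int]:
    """Successive approximation: try each bit from MSB to LSB, one per clock.

    Returns (code, clocks).
    """
    threshold = v_in + lsb / 2           # round to the nearest code
    code, clocks = 0, 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)        # tentatively set this bit
        clocks += 1                      # one comparison per clock
        if trial * lsb <= threshold:     # DAC output not above input: keep it
            code = trial
    return code, clocks


def ramp_convert(v_in: float, lsb: float, max_code: int) -> tuple[int, int]:
    """Single ramp: count up one LSB per clock until the ramp reaches v_in."""
    threshold = v_in + lsb / 2
    code = 0
    while code < max_code and (code + 1) * lsb <= threshold:
        code += 1
    return code, code                    # the final count equals the clocks used


def twos_complement_bits(value: int, n_bits: int) -> str:
    """n-bit 2's complement bit string of a (possibly negative) integer."""
    return format(value & ((1 << n_bits) - 1), f"0{n_bits}b")


if __name__ == "__main__":
    print(sar_convert(1.5001, 16, 0.0001))      # 16 clocks, 16 us at 1 MHz
    print(ramp_convert(1.5001, 0.0001, 65535))  # 15,001 clocks, 15,001 us
    print(twos_complement_bits(511, 11), twos_complement_bits(-511, 11))
    print(f"{896 & 0xFFFF:04X}")                # 16-bit form of 8.96 V
```

The contrast is the point of problem 3: the SAR clock count depends only on the number of bits, while the ramp clock count grows with the input voltage.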
9. Solution:
A. f - Initialize hardware
B. c - Confidence check
C. e - Select channel
D. g - S/H
E. d - Digitize
F. a - Wait
G. b - Get data

Alternative solution:
A. c - Confidence check
B. f - Initialize hardware
C. e - Select channel
D. g - S/H
E. d - Digitize
F. a - Wait
G. b - Get data

Chapter 13: Solutions for Odd-Numbered Problems

1. In this particular example, Segment A would result in better pipeline efficiency, because each of its instructions is independent of the others; there are no dependencies between them. In Segment B, each instruction must complete before the next instruction has enough information to complete. Thus, the MOVE.W D1,D0 must put its result in D0 before the ADD.W instruction can begin to operate. Likewise, MULU can't begin until the ADD operation in the previous instruction has completed. Thus, in a pipelined operation, each instruction must complete before the next one can finish.

3.
a. No, because it involves a memory-to-memory transfer.
b. Yes, the addition occurs between two registers.
c. Yes, the move is a store operation that transfers a register to memory.
d. No. The AND operation takes place between an immediate value and memory.
e. Yes, the operation is an immediate load operation that transfers data from memory to a register.

5. There are several RISC characteristics illustrated here. The most important is that the ADD operation could only take place between data stored in the general-purpose registers. Also, there was no addressing mode that allowed a memory address to be specified directly; the memory addresses had to be loaded as literals into registers, and the registers were then used as memory pointers. Thus, only two addressing modes are used.

7a. Since the pipeline has seven stages and each stage requires 2 clock cycles, it takes 14 clock cycles for the first instruction to move through the pipeline. Since each clock cycle takes 10 nanoseconds, the total time for the first instruction is 140 nanoseconds.

7b. If we assume no stalls, then after the first instruction is retired, the next 9 instructions follow at intervals of 2 clock cycles, adding 9 × 20 nanoseconds = 180 nanoseconds for the basic block to execute completely. However, the pipeline stalls twice, for 4 clock cycles each time, which adds another 80 nanoseconds (2 × 4 × 10 ns), so the total elapsed time is:

ET = 140 ns + 180 ns + 80 ns = 400 ns
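The timing model used in problem 7 (fill time for the first instruction, then one instruction retired per stage time, plus stall cycles) fits in a small Python function. It assumes, as the solution does, a basic block of 10 instructions (the first plus 9 more) and two 4-cycle stalls; the function name and parameters are illustrative.

```python
from __future__ import annotations


def pipeline_time_ns(n_instructions: int, stages: int, clocks_per_stage: int,
                     clock_ns: float, stall_clocks: list[int]) -> float:
    """Elapsed time for a basic block in a simple in-order pipeline."""
    stage_ns = clocks_per_stage * clock_ns
    fill_ns = stages * stage_ns                    # first instruction traverses every stage
    stream_ns = (n_instructions - 1) * stage_ns    # then one retirement per stage time
    stall_ns = sum(stall_clocks) * clock_ns        # each bubble adds whole clock cycles
    return fill_ns + stream_ns + stall_ns


if __name__ == "__main__":
    print(pipeline_time_ns(10, 7, 2, 10, [4, 4]))  # 140 + 180 + 80 ns
```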

Chapter 14: Solutions for Odd-Numbered Problems

1.
a. The memory hierarchy is often represented as a pyramid with the CPU at the top. It illustrates that the fastest, but smallest, memory is closest to the CPU, and that as we move further from the CPU the amount of memory goes up but the speed goes down. Thus, there is an inverse relationship between the access speed of memory and its size. Also, the cost per bit decreases as you move further from the top.
b. Spatial locality refers to the fact that instructions and data tend to be grouped together: instructions are located in sequence, and data tends to be stored in clusters. For caches, this means that a cache can be much smaller than main memory and still be efficient, because if an instruction or data item is already in the cache, the instructions or data that follow it are likely to be there as well. Temporal locality refers to the fact that if an instruction or data item was recently accessed, it is likely to be accessed again soon. Thus, an item that is in the cache and was recently accessed is likely to be accessed again, which improves the efficiency of the cache.
c. With caches, we want to maximize the hit rate and minimize the miss penalty. One way to minimize the miss penalty is to refill a portion of the cache in a burst, rather than one word at a time, whenever there is a cache miss. Modern SDRAM is designed to refill the on-chip cache with a burst of data reads, thus minimizing the penalty of reloading.
d. A write-through cache always writes the data into the cache and to main memory at the same time. This avoids differences between the cache and main memory but sacrifices some performance. A write-back cache holds written data only in the cache and writes it to main memory later, for example when the bus is available. Performance improves, but main memory can temporarily hold stale data, so there is a risk of inconsistency between the cache and main memory.

3. Spatial locality can be demonstrated in three ways:
a. The compiled instructions occupy a very small region of memory, only 32 bytes in length, so they are located close to each other.
b. Since the elements of the array DataStream are accessed by dereferencing the pointer DataStream plus an offset value, count, the individual elements of the array must be adjacent to each other in memory.
c. The variables count and maxcount are local variables of the function main(). The compiler has therefore created a stack frame on the system stack just large enough to hold two integer values, so they must also be located near each other.
Temporal locality can be demonstrated as follows:
a. Since the main part of the program is a for loop, the instructions in the loop are executed 11 times in a row.
b. The variables count and maxcount are repeatedly accessed, because count is incremented and compared with maxcount each time through the loop.
c. The pointer variable DataStream is repeatedly dereferenced to place the values of count squared into successive memory locations.

5.
a. Main memory has an address range of 00000 to FFFFF, or $2^{20}$ discrete addresses, which is 1 MB of address space. If each refill line has 64 ($2^{6}$) bytes, then the number of refill lines in main memory is $2^{20} / 2^{6} = 2^{14} = 16{,}384$.
b. The cache memory is 4096 ($2^{12}$) bytes in size. Using the same method as in (a), the number of refill lines in the cache is $2^{12} / 2^{6} = 2^{6} = 64$.
c. Since this is a direct-mapped cache, there must be the same number of rows of refill lines in the cache memory as there are in main memory. Therefore, the number of rows of refill lines multiplied by the number of columns of refill lines equals 16,384, and the number of columns is $16{,}384 / 64 = 2^{14} / 2^{6} = 2^{8} = 256$ columns of refill lines in main memory.
d. Since there are 256 columns, the TAG memory must contain 8 bits to be able to address any one of the 256 columns. Thus, the tag memory requires 8 bits.
e. The organization is shown in the following diagram:
[Figure: direct-mapped cache organization. Main memory is drawn as 256 columns (Column 0 to Column FF) of 64 refill lines each (Line 0 to Line 63); in Column 0, Line 0 spans addresses 00000 to 0003F and Line 63 spans 00FC0 to 00FFF, and the last memory address is FFFFF. The cache RAM holds one refill line per row, each with its TAG.]
Memory byte address = column address + row address × 64 + byte offset, where the column address is the starting address of the column:
$\text{byte address} = \text{column} \times 4096 + \text{row} \times 64 + \text{byte offset}$

7. Effective execution time = hit rate × hit time + miss rate × miss penalty
Effective execution time $= 0.98 \times 10 + 0.02 \times 100 \times 10 = 9.8 + 20 = 29.8$ ns.

9. When the processor is initialized at start-up, all TLB entries are invalid.
The validity bit is needed to know whether a valid entry has been placed in the TLB or whether the entry is just garbage.
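The address split of problem 5 and the formula of problem 7 can be expressed in a few lines of Python. The sketch assumes the problem 5 geometry (20-bit addresses, 64-byte refill lines, 64 cache lines), so an address divides into an 8-bit tag (the column), a 6-bit row index, and a 6-bit byte offset. For problem 7 it reads the solution's 0.02 × 100 × 10 term as a miss penalty of 100 clock cycles of 10 ns (1000 ns) with a 10 ns hit time; that reading is an interpretation. All names are illustrative.

```python
from __future__ import annotations

OFFSET_BITS = 6                                   # 64-byte refill line
ROW_BITS = 6                                      # 64 refill lines in the 4 KB cache
ADDRESS_BITS = 20                                 # 00000 to FFFFF
TAG_BITS = ADDRESS_BITS - ROW_BITS - OFFSET_BITS  # 8 bits, so 256 columns


def split_address(address: int) -> tuple[int, int, int]:
    """Return (tag, row, offset) for a direct-mapped cache lookup."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    row = (address >> OFFSET_BITS) & ((1 << ROW_BITS) - 1)
    tag = address >> (OFFSET_BITS + ROW_BITS)
    return tag, row, offset


def effective_access_time(hit_rate: float, hit_ns: float,
                          miss_penalty_ns: float) -> float:
    """Hit rate x hit time + miss rate x miss penalty."""
    return hit_rate * hit_ns + (1 - hit_rate) * miss_penalty_ns


if __name__ == "__main__":
    for addr in (0x00000, 0x0003F, 0x00FC0, 0xFFFFF):
        tag, row, offset = split_address(addr)
        print(f"{addr:05X}: column {tag:02X}, line {row}, offset {offset}")
    print(round(effective_access_time(0.98, 10, 100 * 10), 3))
```

The tag is simply the column number in the diagram, which is why 8 bits of tag memory suffice.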

Chapter 15: Solutions for Odd-Numbered Problems

1. Video gamers are notorious for overclocking their CPUs to squeeze the last ounce of performance from the machine. However, overclocking generates more heat, which slows down the internal processes and also causes the processor to run closer to its design limits. Liquid cooling is more efficient at removing heat, so the CPU can run cooler even with a higher heat load.

3. For computer #1:
(1) Each instruction executes in 1 clock cycle, or 1/100 MHz = 10 nanoseconds.
(2) It must execute a total of 1000 + 200 × 100 = 21,000 instructions.
(3) Total execution time = 21,000 × 10 nanoseconds = $2.1 \times 10^{4} \times 10 \times 10^{-9}$ s $= 21 \times 10^{-5}$ s $= 210 \times 10^{-6}$ s, or 210 microseconds.

For computer #2:
(1) It must execute the same 21,000 instructions, but some take twice as long as others: 40% of the 21,000 instructions take 1 clock cycle, and 60% take 2 clock cycles.
(2) At 250 MHz, 1 clock cycle takes 4 nanoseconds and 2 clock cycles take 8 nanoseconds.
(3) Therefore, the total execution time is
$0.4 \times 21000 \times 4 \times 10^{-9} + 0.6 \times 21000 \times 8 \times 10^{-9} = (8.4 \times 10^{3})(4 \times 10^{-9}) + (12.6 \times 10^{3})(8 \times 10^{-9}) = (33.6 + 100.8) \times 10^{-6} = 134.4 \times 10^{-6}$ s, or 134.4 microseconds.

5. Cycles per instruction × seconds per clock cycle = seconds per instruction. This is the measure we want. Computer #1 requires 2 cycles per instruction, and each clock cycle takes 1 ns (1/1 GHz). Therefore, computer #1 requires 2 nanoseconds to execute one instruction. Computer #2 requires 1.2 cycles per instruction, and each clock cycle takes 2 ns (1/500 MHz). Therefore, computer #2 requires 2.4 nanoseconds to execute one instruction. Thus, the performance ratio is 2.4/2.0 = 1.2, so computer #1 has 20% better performance.

7. Analyzing this problem requires that we consider the number of memory accesses required both for the instructions and for the actual add operation. Let's use 68000 assembly language for this example.
Here's a representative code snippet:

    MOVE.L var1,D0   *6 bytes long
    ADD.L  var2,D0   *6 bytes long
    MOVE.L D0,var3   *6 bytes long
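Thus, the three instructions occupy 18 bytes, all of which must be fetched from memory. The add operation itself also moves data: two 32-bit operands are read and one 32-bit result is written, another 12 bytes. In total, 30 bytes must be read from, or written to, memory. The 8-bit-wide bus would require 30 memory accesses and the 16-bit-wide bus would require 15, so in this case the additional 15 accesses would have to be accounted for.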
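The execution-time arithmetic in problems 3 and 5 follows one formula: time = instructions × average cycles per instruction × clock period. A short Python sketch (function names and the instruction-mix representation are illustrative) applies it to both problems:

```python
from __future__ import annotations


def execution_time_us(n_instructions: int, clock_mhz: float,
                      mix: list[tuple[float, int]]) -> float:
    """Execution time in microseconds.

    mix is a list of (fraction of instructions, clock cycles per instruction).
    """
    clock_ns = 1000 / clock_mhz
    cycles = sum(fraction * n_instructions * cpi for fraction, cpi in mix)
    return cycles * clock_ns / 1000


def ns_per_instruction(cpi: float, clock_mhz: float) -> float:
    """Cycles per instruction x nanoseconds per clock cycle."""
    return cpi * 1000 / clock_mhz


if __name__ == "__main__":
    n = 1000 + 200 * 100                                      # 21,000 instructions
    print(execution_time_us(n, 100, [(1.0, 1)]))              # computer #1, problem 3
    print(execution_time_us(n, 250, [(0.4, 1), (0.6, 2)]))    # computer #2, problem 3
    t1 = ns_per_instruction(2, 1000)                          # problem 5, 1 GHz
    t2 = ns_per_instruction(1.2, 500)                         # problem 5, 500 MHz
    print(t2 / t1)                                            # performance ratio
```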

Chapter 16: Solutions for Odd-Numbered Problems

1. The fuse map is shown in the figure:
[Figure: fuse map. The true and complemented (Input/Invert) lines for inputs A, B, C, and D feed an OR gate whose output is X; the legend distinguishes intact fuses from blown fuses.]

3. A circuit such as this, with a large number of stages, can generate a long sequence of pseudorandom numbers. If there are defective elements, the sequence of numbers will quickly diverge from the sequence a fault-free circuit would produce. In a sense, this is the hardware analog of a good hashing function: any imperfection quickly generates a result that is very different from the standard.
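The circuit in problem 3 is not reproduced here. Assuming it is a linear feedback shift register (LFSR), a common pseudorandom generator, the Python sketch below illustrates the effect described. It uses a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11 (polynomial $x^{16} + x^{14} + x^{13} + x^{11} + 1$, which gives the maximal period of 65,535) and models a defective stage as a flip-flop stuck at 0. The seed, taps, fault model, and function names are illustrative choices, not taken from the original problem.

```python
from __future__ import annotations

SEED = 0xACE1  # any nonzero 16-bit starting state


def lfsr_step(state: int) -> int:
    """One shift of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (feedback << 15)


def lfsr_sequence(seed: int, steps: int,
                  stuck_at_zero_bit: int | None = None) -> list[int]:
    """States produced by the register, optionally with one stage stuck at 0."""
    state, states = seed, []
    for _ in range(steps):
        if stuck_at_zero_bit is not None:
            state &= ~(1 << stuck_at_zero_bit)   # defective flip-flop
        states.append(state)
        state = lfsr_step(state)
    return states


def period(seed: int) -> int:
    """Number of steps before a fault-free register returns to its seed."""
    state, count = lfsr_step(seed), 1
    while state != seed:
        state, count = lfsr_step(state), count + 1
    return count


if __name__ == "__main__":
    good = lfsr_sequence(SEED, 1000)
    bad = lfsr_sequence(SEED, 1000, stuck_at_zero_bit=0)
    mismatches = sum(g != b for g, b in zip(good, bad))
    print(period(SEED), mismatches)
```

Comparing the faulty sequence with the expected one, step by step, is what makes such a circuit a sensitive self-test: a single stuck stage produces many mismatches within a short run.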