Model-Based Design for Embedded Systems- P12

Model-Based Design for Embedded Systems- P12: The unparalleled flexibility of computation has been a key driver and feature bonanza in the development of a wide range of products across a broad and diverse spectrum of applications, such as automotive, aerospace, health care, and consumer electronics.


are different ways in which the cost may be calculated. Steps 6–7 in Figure 10.12 illustrate two different types of processing elements that may be used, and the interface that informs them which processing routine they should compute a cost for. The type of the processing element may be changed easily to provide the necessary balance between the speed of simulation and the required pre-simulation effort.

10.6.1.3 Mapped System

Table 10.4 describes the 48 mappings investigated. These range from 11 processing elements (PEs) down to 1 PE. Partitions are broken down by the Rx, the Tx, the RLC, and the MAC functionalities. Each mapping is categorized into one of nine separate classes based on the number of processing elements and the mix of pre-profiled and runtime processing elements. Mappings are further categorized as purely runtime processing (RTP) elements, purely profiled processing (PP) elements, or a mix (MIX).

10.6.1.4 Results

Results relating to the design effort, the processing time, the framework simulation time, and the event processing are analyzed. Five different models were used: a timed SystemC UMTS model [55], a timed METRO II UMTS model, an untimed METRO II UMTS model, a SystemC runtime processing model, and a METRO II architectural model. In specific configurations, METRO II constraints were used as opposed to explicit synchronization. The selection of constraints, the functional model configuration, the architectural model parameters, and the mapping assignment are all made through small changes to the top-level netlist. All results are gathered on a 1.8 GHz Pentium M laptop running Windows XP with 1 GB of RAM.

Figure 10.13 shows the UMTS estimated execution times (cycles) along with the average processing-element utilization. Utilization is calculated as the percentage of simulation rounds in which an architectural processing element has enabled outstanding functional model event requests for its services.
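As a rough illustration of the utilization metric just defined, the sketch below counts, for one processing element, the share of simulation rounds in which it had at least one enabled outstanding request. The function name and the per-round boolean trace are assumptions made for this example; they are not part of METRO II.

```python
from typing import Sequence


def utilization_percent(busy_rounds: Sequence[bool]) -> float:
    """Percentage of rounds in which the PE had an enabled outstanding request.

    busy_rounds[i] is True when, in simulation round i, at least one
    functional-model event request for this PE was enabled (hypothetical trace).
    """
    if not busy_rounds:
        return 0.0
    return 100.0 * sum(busy_rounds) / len(busy_rounds)


# Hypothetical 8-round trace for one PE: requests enabled in 3 of the 8 rounds.
trace = [True, False, False, True, False, False, True, False]
print(utilization_percent(trace))  # 37.5
```

Averaging this value over all PEs in a mapping would give a per-mapping figure comparable in spirit to the one plotted in Figure 10.13.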
Low utilization indicates that a processing element is idle despite available, outstanding requests. The x-axis (mapping #) is ordered by increasing execution time. The data is collected for each of the three scheduling algorithms.

Platform-Based Design and Frameworks: METROPOLIS and METRO II

TABLE 10.4 Mapping Scenarios for the UMTS Case Study

#    Type      Partition
1    1: RTP    11 Sp
2    2: PP     11 μB
3    2: PP     11 A7
4    2: PP     11 A9
5    3: RTP    4 Sp (1)
6    4: PP     4 μB (1)
7    4: PP     4 A7 (1)
8    4: PP     4 A9 (1)
9    5: MIX    2 Sp (2), 2 μB (3)
10   5: MIX    2 μB (2), 2 Sp (3)
11   5: MIX    2 Sp (2), 2 A7 (3)
12   5: MIX    2 A7 (2), 2 Sp (3)
13   5: MIX    2 Sp (2), 2 A9 (3)
14   5: MIX    2 A9 (2), 2 Sp (3)
15   6: PP     2 μB (2), 2 A7 (3)
16   6: PP     2 A7 (2), 2 μB (3)
17   6: PP     2 μB (2), 2 A9 (3)
18   6: PP     2 A9 (2), 2 μB (3)
19   6: PP     2 A7 (2), 2 A9 (3)
20   6: PP     2 A9 (2), 2 A7 (3)
21   7: MIX    Sp (4), μB (5), A7 (6), A9 (7)
22   7: MIX    Sp (4), μB (5), A9 (6), A7 (7)
23   7: MIX    Sp (4), A7 (5), μB (6), A9 (7)
24   7: MIX    Sp (4), A7 (5), A9 (6), μB (7)
25   7: MIX    Sp (4), A9 (5), A7 (6), μB (7)
26   7: MIX    Sp (4), A9 (5), μB (6), A7 (7)
27   7: MIX    μB (4), Sp (5), A7 (6), A9 (7)
28   7: MIX    μB (4), Sp (5), A9 (6), A7 (7)
29   7: MIX    μB (4), A7 (5), Sp (6), A9 (7)
30   7: MIX    μB (4), A7 (5), A9 (6), Sp (7)
31   7: MIX    μB (4), A9 (5), A7 (6), Sp (7)
32   7: MIX    μB (4), A9 (5), Sp (6), A7 (7)
33   7: MIX    A7 (4), Sp (5), μB (6), A9 (7)
34   7: MIX    A7 (4), Sp (5), A9 (6), μB (7)
35   7: MIX    A7 (4), μB (5), Sp (6), A9 (7)
36   7: MIX    A7 (4), μB (5), A9 (6), Sp (7)
37   7: MIX    A7 (4), A9 (5), μB (6), Sp (7)
38   7: MIX    A7 (4), A9 (5), Sp (6), μB (7)
39   7: MIX    A9 (4), Sp (5), μB (6), A7 (7)
40   7: MIX    A9 (4), Sp (5), A7 (6), μB (7)
41   7: MIX    A9 (4), μB (5), Sp (6), A7 (7)
42   7: MIX    A9 (4), μB (5), A7 (6), Sp (7)
43   7: MIX    A9 (4), A7 (5), μB (6), Sp (7)
44   7: MIX    A9 (4), A7 (5), Sp (6), μB (7)
45   8: RTP    1 Sp
46   9: PP     1 μB
47   9: PP     1 A7
48   9: PP     1 A9

Partitions: (1) = Rx MAC, Tx MAC, Rx RLC, Tx RLC; (2) = Rx MAC, Rx RLC; (3) = Tx MAC, Tx RLC; (4) = Rx MAC; (5) = Rx RLC; (6) = Tx MAC; (7) = Tx RLC.
Processors: Sp = Sparc, μB = MicroBlaze, A7 = ARM7, A9 = ARM9.

[FIGURE 10.13 The UMTS estimated execution time vs. utilization for various OS scheduling policies. Left axis: execution cycles (0.0E+00 to 6.0E+07). Right axis: percentage utilization per PE (0% to 110%). Series: RR Ex, PR Ex, FCFS Ex, RR Util, PR Util, FCFS Util. x-axis mapping order: 1, 4, 14, 31, 37, 26, 27, 33, 13, 5, 8, 45, 48, 12, 24, 29, 30, 43, 19, 35, 10, 21, 36, 42, 17, 2, 22, 28, 39, 41, 32, 25, 34, 44, 20, 23, 38, 11, 15, 18, 40, 3, 7, 9, 16, 6, 47, 46.]

For round-robin scheduling, the lowest and highest execution times are obtained with mapping #1 (11 Sparcs) and mapping #46 (1 μBlaze), respectively. Mapping #1 is 2167% faster than mapping #46, which shows a large range in potential performance across mappings. It is interesting to note that 23 different mappings offer better performance than the 11 μBlaze or 11 ARM7 cores (mappings #2 and #3). This illustrates that interprocessor communication is a bottleneck for many designs: despite having more concurrency, those designs cannot keep pace with smaller, more heavily loaded mappings. Among all four-processor systems, mapping #14 has the lowest execution time (two ARM9s used for the receiver and two Sparcs used for the transmitter). Mapping #31 has a similar execution time with four different processors (Rx MAC on μBlaze, Rx RLC on ARM9, Tx MAC on ARM7, and Tx RLC on Sparc). Many of the execution times are similar, and the graph shows that there are essentially four performance groupings.

The lowest utilization values for round robin occur in the 11-processor setups (an average of 15%). The highest is 100%, for all single-processor setups. The highest utilization below 100% is 39%. This gap points to inefficiency in the round-robin scheduler, and closing it may be a goal for the other scheduling algorithms. Also notice that for similar execution times, utilization can vary by as much as 28% (mappings #41 and #32, for example).

The priority-based scheduling keeps the same relative ordering among the execution times but reduces them by 13% on average. The largest reduction is 18% (mapping #22, for example) and the smallest is 9% (mapping #8, for example). The utilization numbers are also reduced, by an average of 2%. The largest reduction was 7% (in mapping #6, for example) and the smallest was 1% (in mapping #31, for example). As expected, there was no change in the utilization or execution times for mappings involving either eleven processing elements (fully concurrent) or one element (no scheduling options). The utilization drop results from high-priority, data-dependent jobs running before low-priority, data-independent jobs.

The FCFS (first-come, first-served) scheduling also does not change the relative ordering of execution times but is less successful at reducing them. The average reduction is only 7%. The maximum reduction is 11% (in mapping #24, for example) and the minimum reduction is 4% (in mapping #5, for example). However, utilization is increased by 27%.
The maximum increase was 45% (in mapping #31, for example) and the minimum was 20% (in mapping #5, for example). The FCFS increases utilization because many jobs that would be low priority often request processing in the same round as high-priority jobs. While technically they are both “first,” the priority would negate this fact. The FCFS’s round-robin tie-breaking scheme helps smaller jobs in this case.

The analysis of execution and utilization for the UMTS shows that high utilization is difficult to obtain due to the data dependencies in the application. Also, some of the partitions explored do not balance computation well among the different processing elements in the architecture. Many of the coarser mappings only make this problem worse. A solution is to further refine the functional model to extract more concurrency. From an execution-time standpoint, scheduling can improve the overall execution time but not as much as is needed to make a large majority of these mappings desirable for an actual implementation.

An accuracy comparison was performed with mappings #2, #6, and #46 (pure μBlaze mappings). These designs were created on the Xilinx ML310 development board. For mappings #2 and #46, there was only a 3.1% and
a 2% increase, respectively, in execution times in the actual designs. For mapping #6 (when scheduling affects the outcome), the increase was 16.2% (RR), 18% (PR), and 15% (FCFS). The inaccuracy for mapping #46 is due to the start-up code and I/O operations not captured by the model. Mapping #2 suffers from a slightly oversimplified point-to-point communication scheme in the model as compared to the FSL links used by the MicroBlazes. Finally, mapping #6 requires a more refined OS model to more closely match the scheduling overhead of the actual OS used. This comparison shows that METRO II simulation can closely (within 5%) reflect actual implementations, and in the cases where the differences are greater, a trade-off between the modeling detail, the simulation performance, and the accuracy can be quickly analyzed.

The untimed METRO II UMTS functional model contains 12 processes, while the architectural model may contain up to 26 processes. This is a large design, spread across 85 files and 8,300 lines of code. Changing a mapping is trivial, however: it requires only changing a few macros and recompiling two files (2.3% of total;
[FIGURE 10.14 METRO II phase runtime analysis. Panel title: Runtime spent in different phases. y-axis: percentage runtime (0% to 100%), broken down into SystemC, Phase 1, Phase 2, and Phase 3. x-axis: Classes 1 to 9, followed by the RTP, PP, and MIX class averages and the overall average.]

simulation environments.) For mixed classes, the numbers are 82%, 2.6% and 7.6%. Again the runtime processing elements dominate. It should be noted that while Ps have higher averages, the average runtime to process 7000 bytes of data was 54 seconds. The Phase 1 runtime and the SystemC overhead are the main contributors to overall runtime.

If we consider the SystemC timed functional model, the METRO II timed functional model, and the METRO II untimed functional model mapped to an architecture, the METRO II timed functional model had an average increase of 7.4% in runtime for the nine classes while the mapped version had a 54.8% reduction. This reduction is due to the fact that METRO II Phases 2 and 3 have significantly less overhead than the timer- and scheduler-based system required by the SystemC timed functional model.

Table 10.5 shows the average number of event state changes per phase and the average number of phases an event waits. On average, only 0.14 events are annotated or scheduled per round. Because of the architectural model integration with the UMTS functional model, there are a limited number of synchronization points (which satisfy a rendezvous constraint, and, hence, an event state change). As shown in Figure 10.14, Phases 2 and 3 do not account for a large portion of the runtime, so, while the event state change activity is low, it does not translate to increased runtime. Runtime is not increased directly by changing an event’s state, but rather by the total number of events in Phases 2 and 3.
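The round structure discussed above can be pictured with a toy loop. This is only a sketch under stated assumptions, not the METRO II implementation or API: it assumes Phase 1 is where processes propose events, Phase 2 annotates proposed events with time, and Phase 3 enables at most one event per processing element. The `Event` class, its fields, and the "first proposed wins" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    name: str
    resource: str           # processing element the event requests (hypothetical)
    cost: int               # cycles, assumed known up front for simplicity
    time: int = 0           # completion time, filled in by the annotation step
    rounds_waited: int = 0  # rounds spent proposed but not enabled


def run_rounds(proposed: List[Event]) -> List[Event]:
    """Repeat annotate/schedule rounds until every proposed event is enabled."""
    clock: Dict[str, int] = {}   # local time per resource
    finished: List[Event] = []
    pending = list(proposed)
    while pending:
        # Phase 1 (assumed): processes have proposed the events in `pending`.
        # Phase 2 (assumed): annotate each proposed event with a time tag.
        for ev in pending:
            ev.time = clock.get(ev.resource, 0) + ev.cost
        # Phase 3 (assumed): enable one event per resource, first proposed wins.
        winners: Dict[str, Event] = {}
        for ev in pending:
            winners.setdefault(ev.resource, ev)
        waiting: List[Event] = []
        for ev in pending:
            if winners[ev.resource] is ev:
                clock[ev.resource] = ev.time
                finished.append(ev)
            else:
                ev.rounds_waited += 1
                waiting.append(ev)
        pending = waiting
    return finished


events = [Event("a", "pe0", 10), Event("b", "pe0", 5), Event("c", "pe1", 7)]
order = run_rounds(events)
print([(e.name, e.time, e.rounds_waited) for e in order])
# [('a', 10, 0), ('c', 7, 0), ('b', 15, 1)]
```

In this toy version, an event's wait is simply the number of rounds it stays proposed without being enabled, which is the kind of quantity the "Avg Wait" column of Table 10.5 reports.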
TABLE 10.5 METRO II Phase Event Analysis

Class   Event/Ph.   Comp.   Comm.   Coord.   Avg Wait
1       0.091       0.083   0.083   0.833    3839.240
2       0.091       0.083   0.083   0.833    3839.240
3       0.169       0.125   0.042   0.833    6276.190
4       0.169       0.125   0.042   0.833    6276.190
5       0.131       0.170   0.114   0.716    5117.003
6       0.169       0.170   0.114   0.716    6276.190
7       0.150       0.101   0.088   0.811    5691.130
8       0.176       0.319   0.043   0.638    6718.550
9       0.176       0.319   0.043   0.638    6718.550
Avg     0.147       0.166   0.072   0.761    5639.143

(Comp., Comm., and Coord. are given as fractions of the Phase 1 events; Avg Wait is the average number of phases an event waits.)

Events in Classes 1 and 2 on average wait 42% less than the worst case. These classes are precisely those that provide maximum concurrency (11 processing elements). The worst case is in Classes 8 and 9 (single processing elements). As one would expect, when the scheduling overhead is lower and more processing elements are available, events wait much less for resource availability.

Finally, it should be noted that runtime processing vs. pre-profiled processing does not impact this aspect of simulation. Comparing Class 1 with Class 2, or Class 3 with Class 4, confirms this. This contrasts heavily with the runtime of the simulation (in which the PE type is a key factor). The runtime processing in the microarchitectural model is treated as a black box by METRO II, such that the internal events are unseen and do not trigger phase changes. This indicates that SystemC components can be imported quite easily into METRO II without affecting the three-phase execution semantics.

The third, fourth, and fifth columns of Table 10.5 categorize the events in Phase 1. Computational events request processing-element services directly. Communication events transfer data between FIFOs, and coordination events maintain correct simulation semantics and operation. The table indicates that events in the system are heavily related to coordination. Classes 8 and 9 have the lowest percentage of coordination events (64%), since these are single-PE systems.
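The wait-time comparison can be rechecked directly from the "Avg Wait" column of Table 10.5. The short sketch below copies those values and computes the relative reduction of the best class against the worst, plus the column mean; the variable names are illustrative only.

```python
# Average number of phases an event waits, per class, copied from Table 10.5.
avg_wait = {
    1: 3839.240, 2: 3839.240, 3: 6276.190, 4: 6276.190, 5: 5117.003,
    6: 6276.190, 7: 5691.130, 8: 6718.550, 9: 6718.550,
}

worst = max(avg_wait.values())   # Classes 8 and 9 (single PE)
best = min(avg_wait.values())    # Classes 1 and 2 (11 PEs)

# Relative reduction of the best case with respect to the worst case.
reduction_percent = 100.0 * (1.0 - best / worst)
mean_wait = sum(avg_wait.values()) / len(avg_wait)

print(f"{reduction_percent:.1f}%")  # 42.9%
print(f"{mean_wait:.3f}")           # 5639.143
```

The computed reduction of about 42.9% is in line with the figure of roughly 42% quoted in the text, and the mean reproduces the "Avg" row of the table.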
10.6.1.5 Conclusions

We illustrated how an event-based design framework, METRO II, may be used to carry out architectural modeling and design-space exploration. Experimental results show that METRO II is capable of capturing functional modeling, architectural modeling, and mapping for a UMTS case study with limited overhead as compared with a baseline SystemC model. We showed that the design effort involved in carrying out 48 separate mappings with a variety of architectural models is minimal. Within the framework, we detail
the runtime spent in the three different METRO II execution phases and provide an idea of how events move throughout the system.

Future work involves identifying and removing events not relevant for annotation or scheduling from METRO II’s second and third phases, support for a wider variety of declarative constraints, and the analysis of other applications that may be mapped onto similar architectural platforms.

10.6.2 Intelligent Buildings: Indoor Air Quality

The construction of future energy-efficient commercial buildings will make use of sophisticated control architectures that are able to sense several physical quantities, compute control laws, and apply control actions through actuators. Sensors, actuators, and computation units are physically distributed over the buildings. The control algorithm can be run on either distributed controllers or a central controller. The control performance is critically affected by both computation and communication delays, which need to be within precise bounds in order to guarantee energy savings while maintaining the comfort level. Thus, a major challenge in designing such systems is to balance the computation and communication efforts. In particular, a designer needs to decide how to map the control algorithm onto a set of controllers and needs to find an optimal communication network, meaning the communication medium and the network topology.

The goal of this case study is to model and simulate the control of the temperature in the rooms of a building at a high level of abstraction. The simulation results will be used to partition the sensor–actuator delay into computation and communication latency requirements. The communication latency requirements are then passed to an optimization tool that finds the best communication network that supports the gathering of data from the sensors and the delivery of commands to actuators.
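To make the partitioning of the sensor–actuator delay concrete, here is a minimal sketch; it is not the case study's actual procedure. It assumes a single end-to-end bound and a known computation delay, and it replaces the control simulation with a stand-in predicate. All function names and numbers are hypothetical.

```python
from typing import Callable, Iterable, Optional


def communication_budget(end_to_end_ms: float, computation_ms: float) -> float:
    """Latency left for communication once the computation delay is subtracted."""
    budget = end_to_end_ms - computation_ms
    if budget < 0:
        raise ValueError("computation delay alone exceeds the end-to-end bound")
    return budget


def largest_tolerable_delay(candidates_ms: Iterable[float],
                            meets_spec: Callable[[float], bool]) -> Optional[float]:
    """Largest candidate communication delay for which the spec still holds."""
    passing = [d for d in candidates_ms if meets_spec(d)]
    return max(passing) if passing else None


# Hypothetical numbers: 100 ms sensor-to-actuator bound, 20 ms of computation.
END_TO_END_MS = 100.0
COMPUTATION_MS = 20.0

budget = communication_budget(END_TO_END_MS, COMPUTATION_MS)


def meets_spec(comm_delay_ms: float) -> bool:
    # Stand-in for a simulation run: accept while the total delay fits the bound.
    return comm_delay_ms + COMPUTATION_MS <= END_TO_END_MS


chosen = largest_tolerable_delay([10.0, 40.0, 80.0, 120.0], meets_spec)
print(budget, chosen)  # 80.0 80.0
```

In a real flow, the predicate would be replaced by a check on simulated control performance, and the chosen value would become the latency requirement handed to the network optimization tool.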
Our design flow is shown in Figure 10.15. In Step 1, both the functionality of the system and the architecture platform are modeled. The mapping between the function and architecture models is then carried out: the controllers and the point-to-point communication between sensors, actuators, and controllers are annotated with actual computation delays and virtual communication delays. The performance of the control algorithm is evaluated for different values of the communication delays until the least constraining latency requirements are found. The communication requirements are then passed to an external network synthesis tool, the communication synthesis infrastructure (COSI) [51]. In Step 2, COSI synthesizes the communication network of the system based on the simulation results. Then, in Step 3, the abstract point-to-point communication channels are mapped to the communication network obtained by COSI.

[FIGURE 10.15 Design flow of the room temperature control system. Labels: Step 1: modeling and simulation (function model, architecture model, mapping); simulation results; Step 2: synthesis (COSI); COSI synthesis results; Step 3: refinement.]

Both the functionality and the architecture platforms of the control system are modeled in METRO II, while the environment dynamics is modeled in OpenModelica [27], an external simulation tool. OpenModelica interacts with the function model of the system. The METRO II function model of a two-room example and its interaction with OpenModelica are shown in Figure 10.16.

[FIGURE 10.16 METRO II function model and OpenModelica. Labels: Modelica model, OpenModelica, CORBA communication, METRO II, Interface to OpenModelica, S1, S2, A1, A2, Controller1, Controller2, FIFO_s1c, FIFO_a1c, FIFO_s2c, FIFO_a2c.]

The environment dynamics is described in the Modelica programming language. The Modelica language is designed to allow convenient, component-oriented modeling of complex physical systems, e.g., systems containing mechanical, electrical, electronic, hydraulic, thermal, control, electric power, or process-oriented subcomponents [46]. The Modelica model in the indoor air quality case study deals with pressure and temperature dynamics in an indoor environment. It takes into account the structure of the building, its floorplan, the sizes of the different rooms, and the placement of doors and windows. Moreover, it includes outlet vents that can inject a cold or hot air flow to cool or heat the environment; they are the actuators of the control system, but are expressed in Modelica in terms of their effect on the temperature and pressure dynamics of the system.

The METRO II model and the Modelica model are run together (co-simulation [57]). Sensors and actuators in the functional model interact with the plant to retrieve temperature values in the different rooms and to set the status (closed/open; hot/cold air flow) of the vents. These operations require synchronization and information exchange between the tools. They are managed by the environment functional module, which controls the execution of the Modelica model (starting and stopping the simulation) and can set and get the values of its parameters. From an implementation point of view, this interaction is performed by remotely calling a set of services provided by OpenModelica over a CORBA connection [18] established between the tools.

The architecture model includes generic electronic control units (ECUs) communicating with sensors and actuators. During mapping, the controllers in the function model are allocated onto ECUs. If multiple controllers are mapped onto one ECU, a METRO II scheduler is constructed to coordinate their executions.
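As an illustration of several controllers sharing one ECU, the sketch below groups a controller-to-ECU mapping by ECU and produces a round-robin execution order for each group. It is a plain Python stand-in, not a METRO II scheduler; the ECU name and helper functions are hypothetical, and only the controller names echo Figure 10.16.

```python
from collections import defaultdict
from itertools import cycle, islice
from typing import Dict, List


def group_by_ecu(mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert a controller -> ECU mapping into ECU -> list of controllers."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for controller, ecu in mapping.items():
        groups[ecu].append(controller)
    return dict(groups)


def round_robin(tasks: List[str], slots: int) -> List[str]:
    """Return the task that runs in each of `slots` consecutive time slots."""
    return list(islice(cycle(tasks), slots))


# Hypothetical mapping: both controllers of the two-room example on one ECU.
mapping = {"Controller1": "ECU0", "Controller2": "ECU0"}

groups = group_by_ecu(mapping)
schedule = round_robin(groups["ECU0"], 5)
print(schedule)
# ['Controller1', 'Controller2', 'Controller1', 'Controller2', 'Controller1']
```

Swapping `round_robin` for another policy function would leave the controller tasks themselves untouched, which is the separation the mapping step relies on.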
Various scheduling policies can be applied by designing different types of schedulers, while keeping the controller tasks intact. In our example, we use round-robin scheduling. Sensors and actuators in the function model are mapped to architectural sensors and actuators. The communication between ECUs and sensing/actuating units is modeled at an abstract level in Step 1 of the design flow. The services of sensing, computing control algorithms, and actuating are annotated with time by METRO II annotators. The end-to-end delays from sensing to actuating are computed during simulation. The simulation results are sent to COSI, which synthesizes the communication network in Step 2 of the design flow. Then the synthesis results are used to refine the abstract communication network in Step 3 of the flow.

10.7 Conclusions

We discussed the trends and challenges of system design from a broad perspective that covers both semiconductor and industrial segments that use
embedded systems. We argued for the need for a unified way of thinking about system design as the basis for a novel system science. One approach was presented, the PBD, that aims at achieving that unifying role. We discussed some of the most promising approaches for chip and embedded system design in the PBD perspective. METROPOLIS and its successor METRO II frameworks were presented. Some examples of METRO II applications to different industrial domains were then described.

While we believe we are making significant inroads, much work remains to be done to transfer the ideas and approaches that are flourishing today in research and in advanced companies to the generality of IC and embedded system designers. To be able to do so:

• We need to further advance the understanding of the relationships among parts of a heterogeneous design and its interaction with the physical environment.
• The efficiency of algorithms and tools must be improved to offer a solid foundation to the users.
• Models and use cases have to be developed.
• The scope of system-level design must be extended to include fault tolerance, security, and resiliency.
• The EDA industry has to embrace the new paradigms and venture into uncharted waters to grow beyond where it is today. It must create the necessary tools to help engineers apply the new paradigms.
• Academia must develop new curricula (e.g., [13]) that favor a broader approach to engineering while emphasizing the importance of foundational disciplines such as mathematics and physics; embedded system designers require a broad view and the capability of mastering heterogeneous technologies.
• The system and semiconductor industry must recognize the importance of investing in training and tools for their engineers to be able to bring new products and services to market.
Acknowledgments

We wish to acknowledge the support of the Gigascale System Research Center, the support of the NSF-sponsored Center for Hybrid and Embedded Software Systems, the support of the EU networks of excellence ARTIST and HYCON, and of the European community project SPEEDS. The past and the present support of General Motors, Infineon, Intel, Pirelli, ST, Telecom Italia (in particular, Marco Sgroi, Fabio Bellifemine, and Fulvio Faraci), UMC, and United Technologies Corporation (in particular, the strong interaction with Clas Jacobson, John F. Cassidy Jr., and Michael McQuade) is also gratefully acknowledged.
Platform-Based Design and Frameworks: METROPOLIS and METRO II 317 References 1. A. Agrawal. Graph rewriting and transformation (GReAT): A solution for the model integrated computing (MIC) bottleneck. In Proceedings of the 18th IEEE International Conference on Automated Software Engineering (ASE03), Montreal, Canada, 2003. 2. P. Alexander. System Level Design with Rosetta. Elsevier, San Francisco, CA, 2006. 3. K. Arnold and J. Gosling. The Java Programming Language. Addison Wesley, Reading, MA, 1996. 4. A. Bakshi, V. K. Prasanna, A. Ledeczi, V. Mathur, S. Mohanty, C. S. Raghavendra, M. Singh, A. Agrawal, J. Davis, B. Eames, S. Neema, and G. Nordstrom. MILAN: A model based integrated simulation framework for design of embedded systems. In Proceedings of the Workshop on Lan- guages, Compilers and Tools for Embedded Systems (LCTES 2001), Snowbird, UT, June 2001. 5. F. Balarin, M. Chiodo, P. Giusto, H. Hsieh, A. Jurecska, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, E. Sentovich, K. Suzuki, and B. Tabbara. Hardware-Software Co-Design of Embedded Systems: The Polis Approach. Kluwer Academic Press, Boston, MA, June 1997. 6. F. Balarin, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, G. Yang, and Y. Watanabe. Concurrent execution semantics and sequen- tial simulation algorithms for the metropolis meta-model. In Proceed- ings of the Tenth International Symposium on Hardware/Software Codesign. CODES 2002, Estes Park, CO, May 6–8, 2002, pp. 13–18. IEEE Computer Society Press, 2002. 7. F. Balarin, L. Lavagno, C. Passerone, A. Sangiovanni-Vincentelli, M. Sgroi, and Y. Watanabe. Modeling and designing heterogenous sys- tems. In J. Cortadella, A. Yakovlev, and G. Rozenberg, editors, Concur- rency and Hardware Design, pp. 228–273. Springer, Berlin, Heidelberg, 2002. LNCS2549. 8. F. Balarin, H. Hsieh, L. Lavagno, C. Passerone, A. Sangiovanni- Vincentelli, and Y. Watanabe. Metropolis: An integrated environment for electronic system design. 
IEEE Computer, 36(4): 45–52, April 2003. 9. A. Basu, M. Bozga, and J. Sifakis. Modeling heterogeneous real-time com- ponents in BIP. In Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods (SEFM06), pp. 3–12, Washington, DC, 2006. IEEE Computer Society.
318 Model-Based Design for Embedded Systems 10. G. Berry and G. Gonthier. The ESTEREL synchronous programming lan- guage: Design, semantics, implementation. Science of Computer Program- ming, 19(2):87–152, November 1992. 11. S. Bliudze and J. Sifakis. The algebra of connectors—structuring inter- actions in BIP. In Proceedings of the 7th ACM & IEEE International con- ference on Embedded Software (EMSOFT07), Salzburg, Austria, September 30–October 3, 2007. 12. C. Brooks, E. A. Lee, X. Liu, S. Neuendorffer, Y. Zhao, and H. Zheng (eds.). Heterogeneous concurrent modeling and design in Java (Vol- ume 1: Introduction to Ptolemy II). Technical Report UCB/ERL M05/21, University of California, Berkeley, CA, July 2005. 13. A. Burns and A. Sangiovanni-Vincentelli. Editorial. ACM Transactions on Embedded Computing Systems, Special Issue on Education, 4(3):472–499, August 2005. 14. San Jose Mercury News (CA). Census counts on pencils, not computers. April 4, 2008. 15. X. Chen, F. Chen, H. Hsieh, F. Balarin, and Y. Watanabe. Formal verifica- tion of embedded system designs at multiple levels of abstraction. Inter- national Workshop on High Level Design Validation and Test—HLDVT02, Cannes, France, September 2002. 16. X. Chen, H. Hsieh, F. Balarin, and Y. Watanabe. Automatic genera- tion of simulation monitors from quantitative constraint formula. Design Automation and Test in Europe, Munich, Germany, March 2003. 17. CoFluent Design. CoFluent Studio. World Wide Web, http://www. cofluentdesign.com, 2007. 18. Common object request broker architecture. OMG Available Specifica- tion 3.1, OMG, January 2008. 19. J. Cortadella, A. Kondratyev, L. Lavagno, C. Passerone, and Y. Watan- abe. Quasi-static scheduling of independent tasks for reactive systems. In Proceedings of the 23rd International Conference on Application and Theory of Petri Nets, Adelaide, South Australia, June 2002. 20. P. Cumming. The TI OMAP platform approach to SOC. In G. Martin and H. 
Chang, editors, Winning the SoC Revolution, Kluwer Academic, Norwell, MA, 2003. 21. A. Davare, D. Densmore, T. Meyerowitz, A. Pinto, A. Sangiovanni- Vincentelli, G. Yang, and Q. Zhu. A next-generation design framework for platform-based design. In Design and Verification Conference (DV- CON’07), San Jose, CA, February 2007.
Platform-Based Design and Frameworks: METROPOLIS and METRO II 319 22. A. Davare, Q. Zhu, J. Moondanos, and A. Sangiovanni-Vincentelli. JPEG encoding on the Intel MXP5800: A platform-based design case Study. In 3rd Workshop on Embedded Systems for Real-time Multimedia, New York, September 2005. 23. J. A. de Oliveira and H. van Antwerpen. The Philips Nexperia digital video platform. In G. Martin and H. Chang, editors, Winning the SoC Rev- olution, Kluwer Academic, Norwell, MA, 2003. 24. D. Densmore, A. Donlin, and A. L. Sangiovanni-Vincentelli. FPGA architecture characterization for system level performance analysis. In DATE06, Munich, Germany, March 6–10, 2006. 25. D. Densmore, R. Passerone, and A. L. Sangiovanni-Vincentelli. A platform-based taxonomy for ESL design. IEEE Design & Test of Comput- ers, 23(5):359–374, May 2006. 26. J. Eker, J. W. Janneck, E. A. Lee, J. Liu, X. Liu, J. Ludvig, S. Neuendorffer, S. Sachs, and Y. Xiong. Taming heterogeneity—the Ptolemy approach. Proceedings of the IEEE, 91(1):127–144, January 2003. 27. P. Fritzson, P. Aronsson, A. Pop, H. Lundvall, K. Nystrom, L. Saldamli, D. Broman, and A. Sandholm. Openmodelica—a free open-source envi- ronment for system modeling, simulation, and teaching. 2006 IEEE Inter- national Symposium on Computer-Aided Control Systems Design, Munich, Germany, pp. 1588–1595, October 2006. 28. G. J. Holzmann. The model checker spin. IEEE Transactions on Software Engineering, 23(5):279–258, May 1997. 29. S. Ito. Convergence and divergence in parallel for the ubiquitous era. Solid-State Circuits Conference, 2007. ASSCC ’07. IEEE Asian, Jeju, Korea, pp. 143–143, November 2007. 30. A. Jantsch. Modeling Embedded Systems and SOC’s: Concurrency and Time in Models of Computation. Morgan Kaufmann Publishers, San Francisco, CA, 2003. 31. G. Kahn. The semantics of a simple language for parallel programming. In J. L. Rosenfeld, editor, Proceedings of the IFIP Congress 74, Information Processing 74, pp. 
471–475, North Holland, Amsterdam, the Netherlands, 1974.
32. G. Karsai, J. Sztipanovits, A. Ledeczi, and T. Bapty. Model-integrated development of embedded software. Proceedings of the IEEE, 91(1):145–184, January 2003.
33. K. Keutzer, S. Malik, A. R. Newton, J. M. Rabaey, and A. Sangiovanni-Vincentelli. System-level design: Orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12):1523–1543, December 2000.
34. C. Kong and P. Alexander. The Rosetta meta-model framework. In Proceedings of the IEEE Engineering of Computer-Based Systems Symposium and Workshop, Huntsville, AL, April 7–11, 2003.
35. M. Krigsman. IT failure at Heathrow T5: What really happened. April 7, 2008. http://blogs.zdnet.com/projectfailures/?p=681.
36. A. Ledeczi, J. Davis, S. Neema, and A. Agrawal. Modeling methodology for integrated simulation of embedded systems. ACM Transactions on Modeling and Computer Simulation, 13(1):82–103, 2003.
37. A. Lee and A. Sangiovanni-Vincentelli. A framework for comparing models of computation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(12):1217–1229, December 1998.
38. X. Liu, Y. Xiong, and E. A. Lee. The Ptolemy II framework for visual languages. In Proceedings of the IEEE 2001 Symposia on Human Centric Computing Languages and Environments (HCC’01), Stresa, Italy, p. 50. IEEE Computer Society, 2001.
39. D. Mathaikutty, H. Patel, and S. Shukla. EWD: A metamodeling driven customizable multi-MoC system modeling environment. FERMAT Technical Report 2004-20, Virginia Tech, 2004.
40. D. A. Mathaikutty, H. Patel, and S. Shukla. A functional programming framework of heterogeneous model of computation for system design. In Forum on Specification and Design Languages (FDL’04), Lille, France, September 13–17, 2004.
41. D. A. Mathaikutty, H. D. Patel, S. K. Shukla, and A. Jantsch. UMoC++: A C++-based multi-MoC modeling environment. In A. Vachoux, editor, Application of Specification and Design Languages for SoCs - Selected paper from FDL 2005, Chapter 7, pp. 115–130. Springer, Berlin, 2006.
42. T. Meyerowitz, A. Sangiovanni-Vincentelli, M. Sauermann, and D. Langen. Source level timing annotation and simulation for a heterogeneous multiprocessor. In DATE08, Munich, Germany, March 10–14, 2008.
43. J. Miller and J. Mukerji, editors. MDA guide version 1.0.1. Technical Report omg/2003-06-01, OMG, 2003.
44. Mirabilis Design. Visual Sim. World Wide Web, http://www.mirabilisdesign.com, 2007.
45. MLDesign Technologies. MLDesigner. World Wide Web, http://www.mldesigner.com, 2007.
46. http://www.modelica.org/.
47. S. Neema, J. Sztipanovits, and G. Karsai. Constraint-based design-space exploration and model synthesis. In Proceedings of the Third International Conference on Embedded Software (EMSOFT03), Philadelphia, PA, October 13–15, 2003.
48. Object constraint language, version 2.0. OMG Available Specification formal/06-05-01, Object Management Group, May 2006.
49. Open SystemC Initiative. Functional Specification for SystemC 2.0, September 2001. Available at www.systemc.org.
50. H. D. Patel, S. K. Shukla, and R. A. Bergamaschi. Heterogeneous behavioral hierarchy extensions for SystemC. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(4):765–780, 2007.
51. A. Pinto, L. Carloni, and A. Sangiovanni-Vincentelli. A communication synthesis infrastructure for heterogeneous networked control systems and its application to building automation and control. In Proceedings of the Seventh International Conference on Embedded Software (EMSOFT), 2007, Salzburg, Austria, October 2007.
52. Third Generation Partnership Project. General universal mobile telecommunications system (UMTS) architecture. Technical Specification TS 23.101, 3GPP, December 2004.
53. I. Sander and A. Jantsch. System modeling and transformational design refinement in ForSyDe. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 23(1):17–32, January 2004.
54. A. Sangiovanni-Vincentelli. Defining platform-based design. EEDesign, February 2002.
55. A. Simalatsar, D. Densmore, and R. Passerone. A methodology for architecture exploration and performance analysis using system level design languages and rapid architecture profiling. In Third International IEEE Symposium on Industrial Embedded Systems (SIES), La Grande Motte, France, June 11–13, 2008.
56. S. Solden. Architectural services modeling for performance in HW-SW co-design. In Proceedings of the Workshop on Synthesis And System Integration of MIxed Technologies SASIMI2001, Nara, Japan, October 18–19, 2001, pp. 72–77, 2001.
57. Speeds methodology. White paper 1.2, SPEEDS IST European project, April 2008. Available at www.speeds.eu.com/downloads/SPEEDS_WhitePaper.pdf.
58. K. Strehl, L. Thiele, M. Gries, D. Ziegenbein, R. Ernst, and J. Teich. FunState - an internal design representation for codesign. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(4):524–544, August 2001.
59. Metropolis Project Team. The metropolis meta-model - version 0.4. Technical Report UCB/ERL M04/38, EECS Department, University of California, Berkeley, 2004.
60. VaST Systems. Comet/Meteor. World Wide Web, http://www.vastsystems.com, 2007.
61. A. Sangiovanni-Vincentelli. Quo Vadis, SLD? Reasoning about trends and challenges of system level design. Proceedings of the IEEE, 95(3):467–506, March 2007.
62. G. Yang, X. Chen, F. Balarin, H. Hsieh, and A. Sangiovanni-Vincentelli. Communication and co-simulation infrastructure for heterogeneous system integration. In Design Automation and Test in Europe 2006, Munich, Germany, March 2006.
63. G. Yang, Y. Watanabe, F. Balarin, and A. Sangiovanni-Vincentelli. Separation of concerns: Overhead in modeling and efficient simulation techniques. In Fourth ACM International Conference on Embedded Software (EMSOFT’04), Pisa, Italy, September 2004.
64. G. Yang, H. Hsieh, X. Chen, F. Balarin, and A. Sangiovanni-Vincentelli. Constraints assisted modeling and validation in metropolis framework. In Asilomar Conference on Signal, Systems and Computers, Pacific Grove, CA, October 2006.
65. J. Yoshida. Philips Semi see payoff in platform-based design. EE Times, October 2002.
66. D. Ziegenbein, R. Ernst, K. Richter, J. Teich, and L. Thiele. Combining multiple models of computation for scheduling and allocation. In Proceedings of the 6th International Workshop on Hardware/Software Codesign (CODES98), pp. 9–13, Seattle, WA, March 15–18, 1998. IEEE Computer Society, Los Alamitos, CA.
11 Reconfigurable Multicore Architectures for Streaming Applications

Gerard J. M. Smit, André B. J. Kokkeler, Gerard K. Rauwerda, and Jan W. M. Jacobs

CONTENTS
11.1 Introduction
11.1.1 Streaming Applications
11.1.2 Multicore Architectures
11.1.2.1 Heterogeneous Multicore SoC
11.1.3 Design Criteria for Streaming Applications
11.1.3.1 Predictable and Composable
11.1.3.2 Energy Efficiency
11.1.3.3 Programmability
11.1.3.4 Dependability
11.2 Classification
11.3 Sample Architectures
11.3.1 MONTIUM/ANNABELLE System-on-Chip
11.3.1.1 MONTIUM Reconfigurable Processing Core
11.3.1.2 Design Methodology
11.3.1.3 ANNABELLE Heterogeneous System-on-Chip
11.3.1.4 Average Power Consumption
11.3.1.5 Locality of Reference
11.3.1.6 Partial Dynamic Reconfiguration
11.3.2 Aspex Linedancer
11.3.2.1 ASProCore Architecture
11.3.2.2 Linedancer Hardware Architecture
11.3.2.3 Design Methodology
11.3.3 PACT-XPP
11.3.3.1 Architecture
11.3.3.2 Design Methodology
11.3.4 Tilera
11.3.4.1 Design Methodology
11.4 Conclusion
References
11.1 Introduction

This chapter addresses reconfigurable heterogeneous and homogeneous multicore system-on-chip (SoC) platforms for streaming digital signal processing applications, also called streaming DSP applications. In streaming DSP applications, computations can be specified as a data flow graph with streams of data items (the edges) flowing between computation kernels (the nodes). Most signal processing applications can be naturally expressed in this modeling style [14]. Typical examples of streaming DSP applications are wireless baseband processing, multimedia processing, medical image processing, sensor processing (e.g., for remote surveillance cameras), and phased array radars. In a heterogeneous multicore architecture, a core can be a bit-level reconfigurable unit (e.g., FPGA), a word-level reconfigurable unit, or a general-purpose programmable unit (digital signal processor (DSP) or general-purpose processor (GPP)). We assume the cores of the SoC are interconnected by a reconfigurable network-on-chip (NoC). The programmability of the individual cores enables the system to be targeted at multiple application domains.

We take a holistic approach, which means that all aspects of system design need to be addressed simultaneously in a systematic way (e.g., [24]). We believe that this is key for an efficient overall solution, because an interesting optimization in a small corner of the design might lead to inefficiencies in the overall design. For example, the design of the NoC should be coordinated with the design of the processing cores, and the design of the processing cores should be coordinated with the tile-specific compilers. Eventually, there should be a tight fit between the application requirements and the SoC and NoC capabilities.
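The data flow graph view described at the start of this section, with computation kernels as nodes and streams of data items as edges, can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function run_pipeline, the kernel names (scale, offset, clip), and the restriction to a linear chain of kernels are all assumptions made purely for illustration.

```python
from collections import deque


def run_pipeline(kernels, samples):
    """Push samples through a linear chain of kernels joined by FIFO streams.

    kernels: list of (name, function) pairs; the names are hypothetical.
    samples: iterable of input data items.
    """
    # streams[i] is the edge feeding kernels[i]; the last stream collects output.
    streams = [deque() for _ in range(len(kernels) + 1)]
    streams[0].extend(samples)

    # Fire each kernel once per round whenever its input stream holds an item,
    # so successive kernels work on different items in a pipelined fashion.
    progress = True
    while progress:
        progress = False
        for i, (_name, func) in enumerate(kernels):
            if streams[i]:
                item = streams[i].popleft()
                streams[i + 1].append(func(item))
                progress = True
    return list(streams[-1])


# Hypothetical three-kernel chain: scale, then offset, then clip at 8.
kernels = [
    ("scale", lambda x: 2 * x),
    ("offset", lambda x: x + 1),
    ("clip", lambda x: min(x, 8)),
]
output = run_pipeline(kernels, [1, 2, 3, 4, 5])
print(output)  # [3, 5, 7, 8, 8]
```

In this sketch each kernel only sees its own input and output stream, which loosely mirrors how a kernel mapped onto one core would exchange data with its neighbours over the interconnect. A real streaming platform would run the kernels concurrently rather than in a sequential loop.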

We first introduce streaming applications and multicore architectures in Sections 11.1.1 and 11.1.2; next, we present key design criteria for streaming applications in Section 11.1.3. After that, we give a multidimensional classification of architectures for streaming applications in Section 11.2. For each category, one or more sample architectures are presented in Section 11.3. We end this chapter with a conclusion.

11.1.1 Streaming Applications

The focus of this chapter is on multicore SoC architectures for streaming DSP applications where we can assume that the data streams are semi-static and have a periodic behavior. This means that for a long period of time subsequent data items of a stream follow the same route through the SoC. The common characteristics of typical streaming DSP applications are as follows:
• They are characterized by relatively simple local processing of a huge amount of data. The trend is that the energy cost of data communication dominates the energy cost of processing.
• Data arrives at nodes at a rather fixed rate, which causes periodic data transfers between successive processing blocks. The resulting communication bandwidth is application dependent, and a wide range of communication bandwidths is required. The size of the data items is application dependent (e.g., 14-bit samples for a sensor system, 64 32-bit words for HiperLAN/2 [15] OFDM symbols, or 8 × 8 × 24-bit macro blocks for a video application). The data rate is also application dependent (e.g., 100 Msamples/sec after the A/D converter for a sensor system, 200k OFDM symbols per second for HiperLAN/2, and 50 frames/sec for video).
• The data flows through the successive processes in a pipelined fashion. Processes may work in parallel on parallel processors or can be time-multiplexed on one or more processors. Therefore, streaming applications show a predictable temporal and spatial behavior.
• For our application domains, throughput guarantees (in data items per second) are typically required for communication as well as for processing. Sometimes latency requirements are also given.
• The lifetime of a communication stream is semi-static, which means a stream is fixed for a relatively long time.

11.1.2 Multicore Architectures

Flexible and efficient SoCs can be realized by integrating hardware blocks (called tiles or cores) of different granularities into heterogeneous reconfigurable SoCs. In this chapter the term “core” is used for processor-like hardware blocks and the term “tile” is used for ASICs, fine-grained reconfigurable blocks, and memory blocks.
We assume that the interconnected building blocks can be heterogeneous (see Figure 11.1), for instance, bit-level reconfigurable tiles (e.g., embedded FPGAs), word-level reconfigurable cores (e.g., domain-specific reconfigurable cores), general-purpose programmable cores (e.g., DSPs and GPPs), and memory blocks. From a systems point of view these architectures are heterogeneous multiprocessor systems on a single chip. The programmability and reconfigurability of the architecture enables the system to be targeted at multiple application domains.

Recently, a number of multicore architectures have been proposed for the streaming DSP application domain. Some examples will be discussed in Section 11.3. A multicore approach has a number of advantages:
• It is a future-proof architecture as the processing cores do not grow in complexity with technology. Instead, as technology scales, simply the number of cores on the chip grows.