The New C Standard- P15

6.8.4 Selection statements

might have to be written as:

    {
        x=y;
        if (x != 0)
            /* ... */
    }

Neither of these reasons could be said to contain an actual benefit. The cost associated with side effects in controlling expressions is the possibility that they will go unnoticed by a reader of the source (especially if scanning along the left edge looking for assignments).

The most common form of side effect in a controlling expression is assignment, in particular simple assignment. The case where the author of the code intended to type an equality operator, rather than a simple assignment operator, is a fault, and these coding guidelines are not intended to recommend against the use of constructs that are obviously faults. However, it is possible that a reader of the visible source will mistake a simple assignment for an equality operator (the token == is much more likely than = in the context of a controlling expression), and reducing the likelihood of such a mistake occurring is also a cost reduction.

This discussion has referred to controlling expressions as if these costs and benefits apply to their use in all contexts (i.e., selection and iteration statements). The following example shows that writing code to avoid the occurrence of side effects in controlling expressions contained within iteration statements requires two assignments, rather than one, to be used.

    extern int glob_1,
               glob_2;

    void f_1(void)
    {
        if (glob_1 = glob_2)
            ;
        while ((glob_1 = glob_2 + 1) != 3)
            { /* ... */ }
    }

    void f_2(void)
    {
        {
            glob_1 = glob_2; /* Single statement. */
            if (glob_1 != 0)
                ;
        }

        glob_1 = glob_2 + 1; /* Statement 1: always occurs. */
        while (glob_1 != 3)
            {
            /* ... */
            glob_1 = glob_2 + 1; /* Statement 2: occurs after every iteration. */
            }
    }

Duplicating the assignment to glob_1 creates a maintenance dependency (any changes to one statement need to be reflected in the other). The increase in cost caused by this maintenance dependency is assumed to be greater than the cost reduction achieved from reducing the likelihood of a simple assignment operator being mistaken for an equality operator.

Cg 1740.1 The simple assignment operator shall not occur in the controlling expression of an if statement.

June 24, 2009 v 1.2
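The equivalence of the two loop forms above, and the duplicated assignment that creates the maintenance dependency, can be seen in a small sketch. The function names, the loop body, and the iteration counter are assumptions added for illustration; they are not part of the original example.

```c
#include <assert.h>

static int glob_1, glob_2;

/* Form 1: assignment inside the controlling expression. */
static int count_side_effect(int start)
{
    int iterations = 0;

    glob_2 = start;
    while ((glob_1 = glob_2 + 1) != 3) {
        glob_2++;                /* hypothetical loop body */
        iterations++;
    }
    return iterations;
}

/* Form 2: assignment duplicated before the loop and at the end of the body. */
static int count_duplicated(int start)
{
    int iterations = 0;

    glob_2 = start;
    glob_1 = glob_2 + 1;         /* Statement 1: always occurs. */
    while (glob_1 != 3) {
        glob_2++;                /* same hypothetical loop body */
        iterations++;
        glob_1 = glob_2 + 1;     /* Statement 2: occurs after every iteration. */
    }
    return iterations;
}

/* Both forms should perform the same number of iterations. */
static void check_loops_agree(void)
{
    for (int start = -3; start <= 2; start++)
        assert(count_side_effect(start) == count_duplicated(start));
}
```

Both functions return the same count for any start value no greater than 2. A continue statement added to the body of the second form would skip Statement 2, which is one way the maintenance dependency could turn into a fault.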
Experience has shown that there are a variety of other constructs, appearing in a controlling expression, that developers have difficulty comprehending, or simply miscomprehend when scanning the source. However, no other constructs are discussed here. The guideline recommendation dealing with the use of the assignment operator has the benefit of simplicity and frequency of occurrence. It was difficult enough analyzing the cost/benefit case for simple assignment and others are welcome to address more complicated cases.

Experience shows that many developers use the verbal form “if expression is not true then” when thinking about the condition under which an else form is executed. This use of not can lead to double negatives when reading some expressions. For instance, possible verbal forms of expressing the conditions under which the arms of an if statement are executed include:

    if (!x)
        a(); /* Executed if not x. */
    else
        b(); /* Executed if not x is not true. */
             /* Executed if not x is equal to 0. */
             /* Executed if x is not equal to 0. */

    if (x != y)
        c(); /* Executed if x is not equal to y. */
    else
        d(); /* Executed if x is not equal to y is not true. */

The possible linguistic impact of the ! operator on expression comprehension is discussed elsewhere (sentence 1103).

Cg 1740.2 The top-level operator in the controlling expression of an if statement shall not be ! or != when that statement also contains an else arm.

If the value of the controlling expression is known at translation time, the selection statement may contain dead code and the controlling expression is redundant. These issues are discussed elsewhere (sentence 190: dead code, redundant code).

Usage
In the translated form of this book’s benchmark programs 1.3% of selection-statements and 4% of iteration-statements have a controlling expression that is a constant expression.
Use of simple, non-iterative, flow analysis enables a further 0.6% of all controlling expressions to be evaluated to a constant expression at translation time.

1741 A selection statement is a block whose scope is a strict subset of the scope of its enclosing block.

Commentary
Rationale

    enum {a, b};

    int different(void)
    {
        if (sizeof(enum {b, a}) != sizeof(int))
            return a; // a == 1
        return b; // which b?
    }

In C89, the declaration enum {b, a} persists after the if-statement terminates; but in C99, the implied block that encloses the entire if statement limits the scope of that declaration; therefore the different function returns different values in C89 and C99. The Committee views such cases as unintended artifacts of allowing declarations as operands of cast and sizeof operators; and this change is not viewed as a serious problem.

See the following C sentence (1742) for a further discussion on the rationale.
C90
See Commentary.

C++
The C++ behavior is the same as C90. See Commentary.

Coding Guidelines
Developers are more likely to be tripped up by the lifetime issues associated with compound literals than enumeration constants. For instance in:

    if (f(p=&(struct S){1, 2}))
        /* ... */
    val=p->mem_1;

the lifetime of the storage whose address is assigned to p ends when the execution of the if statement terminates. Ensuring that developers are aware of this behavior is an educational issue. However, developers intentionally relying on the pointed-to storage continuing to exist (which it is likely to, at least until storage needs to be allocated to another object) is a potential guideline issue. Until experience has been gained on how developers use compound literals, it is not known whether this issue is simply an interesting theoretical idea or a real practical problem.

1742 Each associated substatement is also a block whose scope is a strict subset of the scope of the selection statement.

Commentary
Rationale
A new feature of C99: A common coding practice is always to use compound statements for every selection and iteration statement because this guards against inadvertent problems when changes are made in the future. Because this can lead to surprising behavior in connection with certain uses of compound literals (§6.5.2.5), the concept of a block has been expanded in C99.

Given the following example involving three different compound literals:

    extern void fn(int*, int*);

    int examp(int i, int j)
    {
        int *p, *q;

        if (*(q = (int[2]){i, j}))
            fn(p = (int[5]){9, 8, 7, 6, 5}, q);
        else
            fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);

        return *p;
    }

it seemed surprising that just introducing compound statements also introduced undefined behavior:

    extern void fn(int*, int*);

    int examp(int i, int j)
    {
        int *p, *q;

        if (*(q = (int[2]){i, j})) {
            fn(p = (int[5]){9, 8, 7, 6, 5}, q);
        } else {
            fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
        }

        return *p; // undefined--no guarantee *p designates an object
    }

Therefore, the substatements associated with all selection and iteration statements are now defined to be blocks, even if they are not also compound statements. A compound statement remains a block, but is no longer the only kind of block. Furthermore, all selection and iteration statements themselves are also blocks, implying no guarantee that *q in the previous example designates an object, since the above example behaves as if written:

    extern void fn(int*, int*);

    int examp(int i, int j)
    {
        int *p, *q;

        {
            if (*(q = (int[2]){i, j})) {
                // *q is guaranteed to designate an object
                fn(p = (int[5]){9, 8, 7, 6, 5}, q);
            } else {
                // *q is guaranteed to designate an object
                fn(p = (int[5]){4, 3, 2, 1, 0}, q + 1);
            }
        }
        // *q is not guaranteed to designate an object

        return *p; // *p is not guaranteed to designate an object
    }

If compound literals are defined in selection or iteration statements, their lifetimes are limited to the implied enclosing block; therefore the definition of “block” has been moved to this section. This change is compatible with similar C++ rules.

C90
The following example illustrates the rather unusual combination of circumstances needed for the specification change, introduced in C99, to result in a change of behavior.

    extern void f(int);
    enum {a, b} glob;

    int different(void)
    {
        if (glob == a)
            /* No braces. */
            f((enum {b, a})1); /* Declare identifiers with same name and compatible type. */

        return b; /* C90: numeric value 0 (the inner b is still in scope) */
                  /* C99: numeric value 1 (the file scope b) */
    }
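One way to avoid depending on the lifetime of a compound literal is to give the storage a name in a scope that encloses every use. The following is a minimal sketch reworking the Rationale's examp function; the name examp_named, the array names, and the stand-in body of fn are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the external fn of the Rationale example (assumed body). */
static void fn(int *p, int *q)
{
    assert(p != NULL && q != NULL);
}

static int examp_named(int i, int j)
{
    /* Named objects whose lifetime lasts until the function returns. */
    int cond[2]  = { i, j };
    int first[5] = { 9, 8, 7, 6, 5 };
    int other[5] = { 4, 3, 2, 1, 0 };
    int *p;
    int *q = cond;

    if (*q) {
        fn(p = first, q);
    } else {
        fn(p = other, q + 1);
    }

    return *p;   /* p points into first or other, both still alive here */
}
```

With named arrays, adding or removing braces around the arms no longer changes whether the final dereference is defined.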
C++
The C++ behavior is the same as C90.

Coding Guidelines
Some coding guideline documents recommend that the block associated with both selection and iteration statements always be bracketed with braces (i.e., that it is always a compound statement). When the compound statement contains a single statement the use of braces is redundant and their presence decreases the amount of information visible on a display (the number of available lines is fixed and each brace usually occupies one line). However, experience has shown that in some cases the presence of these braces can:

• Provide additional visual cues that can reduce the effort needed, by readers, to comprehend a sequence of statements. However, the presence of these redundant braces reduces the total amount of information immediately visible, to a reader, on a single display (i.e., the amount of source code that can be seen without expending motor effort giving editor commands to change the display contents). The way in which these costs and benefits trade off against each other is not known.
• Help prevent faults being introduced when code is modified (i.e., where a modification results in unintended changes to the syntactic bindings of blocks to statement headers). Experience shows that nested if statements are the most common construct whose modification results in unintended changes to the syntactic bindings of blocks.

In the following example the presence of braces provides both visual cues that the else does not bind to the outer if and additional evidence (its indentation provides counter evidence because it provides an interpretation that the intent is to bind to the outer if) that it is intended to bind to the inner if.
    void f(int i)
    {
        if (i > 8)
            if (i < 20)
                i++;
        else
            i--;

        if (i > 8)
            {
            if (i < 20)
                i++;
            else
                i--;
            }
    }

Blocks occur in a number of statements. Is there a worthwhile cost/benefit in a guideline recommendation specifying that these blocks always be a compound statement?

The block associated with a switch statement is invariably a compound statement. A guideline recommendation that braces be used is very likely to be redundant in this case. Iteration statements are not as common as selection statements and much less likely to be nested (in other iteration statements) than selection statements (compare Figure 1739.2 and Figure 1763.2), and experience suggests developer comprehension of such constructs is not significantly affected by the use of braces. Experience suggests that the nested if statement is the only construct where the benefit of the use of braces is usually greater than the cost.

Cg 1742.1 The statement forming the block associated with either arm of an if statement shall not be an if statement.
6.8.4.1 The if statement

Constraints

1743 The controlling expression of an if statement shall have scalar type.

Commentary
Although the type _Bool was introduced in C99 the Committee decided that there was too much existing code to change the specification for the controlling expression type.

C++
6.4p4: “The value of a condition that is an expression is the value of the expression, implicitly converted to bool for statements other than switch; if that conversion is ill-formed, the program is ill-formed.”

If only constructs that are available in C are used the set of possible expressions is the same.

Other Languages
In many languages the controlling expression is required to have a boolean type. Languages whose design has been influenced by C often allow the controlling expression to have scalar type.

Coding Guidelines
The broader contexts in which readers need to comprehend controlling expressions are discussed elsewhere (sentence 1739, selection statement syntax). This subsection concentrates on the form of the controlling expression.

The value of a controlling expression is used to make one of two choices. Values used in this way are generally considered to have a boolean role. Some languages require the controlling expression to have a boolean type and their translators enforce this requirement. Some coding guideline documents contain recommendations that effectively try to duplicate this boolean type requirement found in other languages. Recommendations based on type not only face technical problems in their wording and implementation (caused by the implicit promotions and conversions performed in C), but also fail to address the real issues of developer comprehension and performance.
In the context of an if statement do readers of the source distinguish between expressions that have two possible values (i.e., boolean roles), and expressions that may have more than two values being used in a context where an implicit test against zero is performed? Is the consideration of boolean roles cultural baggage carried over to C by developers who have previously used them in other languages? Do readers who have only ever programmed in C make use of boolean roles, or do they think in terms of a test against zero?

In the absence of studies of developer mental representations of algorithmic and source code constructs, it is not possible to reliably answer these questions. Instead the following discussion looks at the main issues involved in making use of boolean roles and making use of the implicit test against zero special case.

A boolean role is not about the type of an expression (prior to the introduction of the type _Bool in C99, a character type was often used as a stand-in), but about the possible values an expression may have and how they are used. The following discussion applies whether a controlling expression has an integer, floating, or pointer type.

In some cases the top-level operator of a controlling expression returns a result that is either zero or one (e.g., the relational and equality operators). The visibility, in the source, of such an operator signals its boolean role to readers. However, in other cases (see Table 1763.2) developers write controlling expressions that do not contain explicit comparisons (the value of a controlling expression is implicitly compared against zero). What are the costs and benefits of omitting an explicit comparison? The following code fragment contains examples of various ways of writing a controlling expression:

    if (flag) /* 1 */
        /* ... */

    if (int_value) /* 2 */
        /* ... */

    if (flag == TRUE) /* 3 */
        /* ... */

    if (int_value != 0) /* 4 */
        /* ... */

Does the presence of an explicit visual (rather than an implicit, in the developer’s mind) comparison reduce either the cognitive effort needed to comprehend the if statement or the likelihood of readers making mistakes? Given sufficient practice readers can learn to automatically process if (x) as if it had been written as if (x != 0). The amount of practice needed to attain an automatic level of performance is unknown. Another unknown is the extent to which the token sequence != 0 acts as a visual memory aid.

When the visible form of the controlling expression is denoted by a single object (which may be an ordinary identifier, or the member of a structure, or some other construct where a value is obtained from an object) that name may provide information on the values it represents. To obtain this information readers might make use of the following:

• Software development conventions. In the software development community (and other communities) the term flag is generally understood to refer to something that can be in one of two states. For instance, the identifier mimbo_flag is likely to be interpreted as having two possible values relating to a mimbo, rather than referring to the national flag of Mimbo. Some naming conventions contain a greater degree of uncertainty than others. For instance, identifiers whose names contain the character sequence status sometimes represent more than two values.
• Natural language knowledge. Speakers of English regard some prepositions as being capable of representing two states. For instance, a cat is or is not black. This natural language usage is often adopted when selecting identifier names. For instance, is_flobber is likely to be interpreted as representing one of two states (being a, or having the attribute of, flobber or not).
• Real world knowledge. A program sometimes needs to take account of information from the real world. For instance, height above the ground is an important piece of information in an airplane flight simulator, with zero height having a special status.
• Application knowledge. The design of a program invariably makes use of knowledge about the application domain it operates within. For instance, the term ziffer may be used within the application domain that a program is intended to operate. Readers of the source will need the appropriate application knowledge to interpret the role of this identifier.
• Program implementation conventions. The design of a program involves creating and using various conventions. For instance, a program dealing with book printing may perform special processing for books that don’t contain any pages (e.g., num_pages being zero is a special case).
• Conventions and knowledge from different sources may be mixed together. For instance, the identifier name current_color suggests that it represents color information. This kind of information is not usually thought about in terms of numeric values and there are certainly more than two colors. However, assigning values to symbolic qualities is a common software development convention, as is assigning a special interpretation to certain values (e.g., using zero to represent no known color, a program implementation convention).

The likelihood of a reader assuming that an identifier name has a boolean role will depend on the cultural beliefs and conventions they share with the author of the source. There is also the possibility that rather than using the identifier name to deduce a boolean role, readers may use the context in which it occurs to infer a boolean role. This is an example of trust based usage. Requiring that values always be compared (against true/false or zero/nonzero) leads to infinite regression, as in the sequence:
    if (flag)
    if (flag == TRUE)
    if ((flag == TRUE) == TRUE)
    and so on...

At some point readers have to make a final comparison in their own mind. The inability to calculate, in an automatically enforceable way, the form a controlling expression should take to minimize readers’ cognitive effort prevents any guideline recommendations being made here.

Semantics

1744 In both forms, the first substatement is executed if the expression compares unequal to 0.

Commentary
Depending on the type of the other operand this 0 may be converted to an integer type of greater rank, a floating-point 0.0, or a null pointer constant.

C++
The C++ Standard expresses this behavior in terms of true and false (6.4.1p1). The effect is the same.

Other Languages
In languages that support a boolean type this test is usually expressed in terms of true and false.

Common Implementations
The machine code generation issues are similar to those that apply to the logical operators. The degree to which this comparison can be optimized away depends on the form of the controlling expression and the processor instruction set. If the controlling expression’s top-level operator is one that always returns a value of zero or one (e.g., an equality or relational operator), it is possible to generate machine code that performs a branch rather than returning a value that is then compared. Some processors have a single instruction that performs a comparison and branch, while others have separate instructions (the comparison instruction setting processor condition flags that are then tested by a conditional branch instruction). On some processors simply loading a value into a register also results in a comparison against zero being made, with the appropriate processor condition flags being set.
The use of conditional instructions is discussed elsewhere (sentence 1739).

The machine code for the first substatement is often placed immediately after the code to evaluate the controlling expression. However, optimizers may reorder blocks of code in an attempt to maximize instruction cache utilization.

1745 In the else form, the second substatement is executed if the expression compares equal to 0.

Commentary
Implementations are required to ensure that exactly one of the equality comparisons is true.

Coding Guidelines
Some coding guideline documents recommend that the else form always be present, even if it contains no executable statements. Such a recommendation has the benefit of ensuring that there are never any mismatching if/else pairs. However, the same effect can be achieved by requiring nested if statements to be enclosed in braces (this issue is discussed elsewhere; see sentence 1742). Adding empty else forms increases the amount of source code that may need to be read and in some cases decreases the amount of non-null source that appears on a display device. Such a guideline recommendation does not appear worthwhile.

Usage
In the visible form of the .c files 21.5% of if statements have an else form. (Counting all forms of if supported by the preprocessor, with #elif counting as both an if and an else, there is an #else form in 25.0% of cases.)
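The two semantic rules above (the first substatement runs when the expression compares unequal to 0, the second when it compares equal) apply in the type of the controlling expression itself. A minimal sketch covering an integer, a floating, and a pointer operand follows; the function names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Each function returns 1 if the first substatement runs, 0 if the else arm runs. */
static int first_arm_int(int v)
{
    if (v) return 1; else return 0;
}

static int first_arm_double(double v)
{
    if (v) return 1; else return 0;    /* compared against 0.0, no truncation to int */
}

static int first_arm_pointer(const void *v)
{
    if (v) return 1; else return 0;    /* compared against a null pointer */
}

/* Exactly one arm is selected for every value tried here. */
static void check_scalar_tests(void)
{
    int obj = 0;

    assert(first_arm_int(0) == 0);
    assert(first_arm_int(-3) == 1);
    assert(first_arm_double(0.0) == 0);
    assert(first_arm_double(0.5) == 1);
    assert(first_arm_pointer(NULL) == 0);
    assert(first_arm_pointer(&obj) == 1);
}
```

The value 0.5 selects the first arm because the comparison is made against a floating-point zero; the operand is not converted to an integer first.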
1746 If the first substatement is reached via a label, the second substatement is not executed.

Commentary
The flow of control through a sequence of statements is not influenced by how they were initially reached. The label may be reached as a result of executing a switch statement, or a goto statement. The issue of jumping into nested blocks or the body of iteration statements is discussed elsewhere (sentences 1783 and 1766).

C++
The C++ Standard does not explicitly specify the behavior for this case.

Other Languages
This statement applies to all programming languages that support jumps into more deeply nested blocks.

1747 An else is associated with the lexically nearest preceding if that is allowed by the syntax.

Commentary
As it appears in the standard the syntax for if statements is ambiguous on how an else should be associated in a nested if statement. This semantic rule resolves this ambiguity.

Other Languages
Languages that support nesting of conditional statements need a method of resolving which construct an else binds to. The rules used include the following:

• Not supporting in the language syntax unbracketed nesting (i.e., requiring braces or begin/end pairs) within the then arm. For instance, Algol 60 permits the usage IF q1 THEN a1 ELSE IF q2 THEN a2 ELSE a3, but the following is a syntax violation IF q1 THEN IF q2 THEN a1 ELSE a2 ELSE a3.
• Using a matching token to pair with the if. The keyword fi is a common choice (used by Algol 68; Ada uses end if, and the C preprocessor uses #endif). In this case the bracketing formed by the if/fi prevents any ambiguity occurring.
• Like C, using the nearest preceding rule.
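The nearest preceding rule used by C can be observed at run time. In the following sketch (function names are hypothetical) the first function is indented as if the else belonged to the outer if, while the second uses braces to make that binding real. Some translators issue a warning for the first form.

```c
#include <assert.h>

/* Indentation suggests the else belongs to the outer if; it does not. */
static int nearest_if(int i)
{
    if (i > 8)
        if (i < 20)
            i++;
    else                /* binds to "if (i < 20)" */
        i--;
    return i;
}

/* Braces make the else bind to the outer if. */
static int outer_if(int i)
{
    if (i > 8) {
        if (i < 20)
            i++;
    } else {
        i--;
    }
    return i;
}

/* The two functions differ exactly where the binding matters. */
static void check_binding(void)
{
    assert(nearest_if(5) == 5);    /* outer test fails, no else for it */
    assert(outer_if(5) == 4);      /* outer test fails, else arm runs */
    assert(nearest_if(25) == 24);  /* inner test fails, inner else runs */
    assert(outer_if(25) == 25);    /* inner test fails, nothing runs */
}
```

For values between 9 and 19 both functions increment i, so the difference only shows for inputs that fail one of the two tests.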
Coding Guidelines
If the guideline recommendation on using braces (Cg 1742.1) is followed there will only ever be one lexically preceding if that an else can be associated with.

Some coding guideline documents recommend that an if statement always have an associated else form, even if it only contains the null statement.

6.8.4.2 The switch statement

Constraints

1748 The controlling expression of a switch statement shall have integer type.

Commentary
A switch statement uses the exact value of its controlling expression and it is not possible to guarantee the exact value of an expression having a floating type (there is a degree of unpredictability in the value between different implementations). For this reason implementations are not required to support controlling expressions having a floating type.

C++
6.4.2p2:
“The condition shall be of integral type, enumeration type, or of a class type for which a single conversion function to integral or enumeration type exists (12.3).”

If only constructs that are available in C are used the set of possible expressions is the same.

Common Implementations
The base document did not support the types long and unsigned long. Support for integer types with rank greater than int was added during the early evolution of C.[1199]

Other Languages
There are some relatively modern languages (e.g., Perl) that do not support a switch statement. Java does not support controlling expressions having type long. Some languages (e.g., PHP) support controlling expressions having a string type.

Coding Guidelines
A controlling expression, in a switch statement, having a boolean role might be thought to be unusual, an if statement being considered more appropriate. However, the designer may be expecting the type of the controlling expression to evolve to a non-boolean role, or the switch statement may have once contained more case labels.

Table 1748.1: Occurrence of switch statements having a controlling expression of the given type (as a percentage of all switch statements). Based on the translated form of this book’s benchmark programs.

    Type              %      Type              %
    int              29.5    bit-field         3.1
    unsigned long    18.7    unsigned short    2.8
    enum             14.6    short             2.5
    unsigned char    12.4    long              0.9
    unsigned int     10.0    other-types       0.2
    char              5.1

[Plot not reproduced. Axes: number of switch statements against case value density and against case value span; series: no default, with default, embedded.]
Figure 1748.1: Density of case label values (calculated as (maximum case label value minus minimum case label value minus one) divided by the number of case labels associated with a switch statement) and span of case label values (calculated as (maximum case label value minus minimum case label value minus one)). Based on the translated form of this book’s benchmark programs and embedded results from Engblom[397] (which were scaled, i.e., multiplied by a constant, to allow comparison). The no default results were scaled so that the total count of switch statements matched those that included a default label.

[Plot not reproduced. Axes: number of case/default labels against number of statements.]
Figure 1748.2: Number of case/default labels having a given number of statements following them (statements from any nested switch statements did not contribute towards the count of a label). Based on the visible form of the .c files.

1749 If a switch statement has an associated case or default label within the scope of an identifier with a variably modified type, the entire switch statement shall be within the scope of that identifier.133)

Commentary
The declaration of an identifier having variably modified type can occur in one of the sequence of statements labeled by a case or default, provided it appears within a compound statement that does not contain any other case or default labels associated with that switch statement, or it appears after the last case or default label in the switch statement. In the compound statement case no case or default label will be within the scope of the identifier having the variably modified type (its lifetime terminates at the end of the compound statement). The wording of the requirement is overly strict in that it prohibits uses that might be considered well behaved. For instance:

    switch (i)
        {
        case 1:
            int x[n];
            /* ... */
            break;

        case 2:
            /* Statements that don't access x. */
        }

Attempting to create wording to support such edge cases was considered to be a risk (various ambiguities may later be found in it) that was not worth the benefit. Additional rationale for this requirement is discussed elsewhere (sentence 1788, goto past variably modified type).

C90
Support for variably modified types is new in C99.

C++
Support for variably modified types is new in C99 and is not specified in the C++ Standard.
The C++ Standard contains the additional requirement that (the wording in a subsequent example suggests that being visible rather than in scope is more accurate terminology):

6.7p3: “It is possible to transfer into a block, but not in a way that bypasses declarations with initialization. A program that jumps77) from a point where a local variable with automatic storage duration is not in scope to a point where it is in scope is ill-formed unless the variable has POD type (3.9) and is declared without an initializer (8.5).”
    void f(void)
    {
        switch (2)
            {
            int loc = 99; /* C: strictly conforming */
                          // C++: ill-formed

            case 2: return;
            }
    }

Example

    extern int glob;

    void f(int p_loc)
    {
        switch (p_loc) /* This part of the switch statement is not within the scope of a_1. */
            {
            case 1: ;
                    int a_1[glob]; /* This declaration causes a constraint violation. */

            case 2: a_1[2] = 4;
                    break;

            case 3: {
                    long a_2[glob]; /* Conforming: no case label within the scope of a_2. */
                    /* ... */
                    }
                    break;
            }
    }

1750 The expression of each case label shall be an integer constant expression and no two of the case constant expressions in the same switch statement shall have the same value after conversion.

Commentary
Two case labels having the same value is effectively equivalent to declaring two labels, within the same function, having the same name.

Coding Guidelines
Some sequences of case label values might be considered to contain suspicious entries or omissions. For instance, a single value that is significantly larger or smaller than the other values (an island), or a value missing from the middle of a contiguous sequence of values (a hole). While some static analysis tools check for such suspicious values, it is not clear to your author what, if any, guideline recommendation would be worthwhile.

1751 There may be at most one default label in a switch statement.

Commentary
A default label is the destination of a jump for some, possibly empty, set of values of the controlling expression. As such it is required to be unique (if it occurs) within a switch statement. There is a bug in the terminology used in the standard: “may” should read “shall”.
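The phrase “after conversion” in sentence 1750 matters because each case constant is converted to the promoted type of the controlling expression before values are compared. A minimal sketch, with a hypothetical function name, in which the constant -1 ends up matching the largest unsigned int value:

```c
#include <assert.h>
#include <limits.h>

/* The controlling expression has type unsigned int, so the case constant -1
   is converted to UINT_MAX before any comparison takes place. */
static int matches_minus_one(unsigned int u)
{
    switch (u) {
    case -1:
        return 1;
    default:
        return 0;
    }
}

static void check_converted_label(void)
{
    assert(matches_minus_one(UINT_MAX) == 1);
    assert(matches_minus_one(0u) == 0);
}
```

Adding a second label written as case UINT_MAX: to this switch statement would give two case constants the same value after conversion, which sentence 1750 prohibits. Some translators warn about the conversion of -1.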
Coding Guidelines
Some coding guideline documents recommend that all switch statements contain a default label. There does not appear to be an obvious benefit (as defined by these coding guideline subsections, although there may be benefits for other reasons) for such a guideline recommendation. To adhere to the guideline developers simply need to supply a default label and an associated null statement. There are a number of situations where adhering to such a guideline recommendation leads to the creation of redundant code (e.g., if all possible values are covered by the case labels, either because they handle all values that the controlling expression can take or because execution of the switch statement is conditional on an if statement that guarantees the controlling expression is within a known range).

Usage
In the visible form of the .c files, 72.8% of switch statements contain a default label.

1752 (Any enclosed switch statement may have a default label or case constant expressions with values that duplicate case constant expressions in the enclosing switch statement.)

Commentary
This specification (semantics in a Constraints clause) clarifies the interpretation to be given to the phrase “in the same switch statement” appearing earlier in this Constraints clause.

Semantics
1753 A switch statement causes control to jump to, into, or past the statement that is the switch body, depending on the value of a controlling expression, and on the presence of a default label and the values of any case labels on or in the switch body.

Commentary
This defines the term switch body. Developers also use the terminology body of the switch. It is possible to write a switch statement as an equivalent sequence of if statements.
However, experience shows that in some cases the switch statement appears to require significantly less (cognitive) effort to comprehend than a sequence of if statements.

Common Implementations
Many processors include some form of instruction (often called an indirect jump) that indexes into a table (commonly known as a jump table) to obtain a location to jump to. The extent to which it is considered to be more efficient to use such an instruction, rather than a series of if statements, varies between processors (whose behavior varies for the situation where the index is out of range of the jump table) and implementations (the sophistication of the available optimizer). The presence of a default label creates additional complications in that all values of the controlling expression, not covered by a case label, need to be explicitly handled. Spuler[1300] discusses the general issues.
Some translators implement switch statements as a series of if statements. Knowledgeable developers know that, in such implementations, placing the most frequently executed case labels before the less frequently executed ones can provide a worthwhile performance improvement. Some translators[22, 588] provide an option that allows the developer to specify whether a jump table, sequence of if statements or some other method should be used.
Optimal execution time performance is not the only factor that implementations need to consider. The storage occupied by the jump table sometimes needs to be taken into account. In a simple implementation it is proportional to the difference between the maximum and minimum values appearing in the case labels (which may not be considered an efficient use of storage if there are only a few case labels used within this range). A more sophisticated technique than using a series of if statements is to create a binary tree of case label values and jump addresses.
The value of the controlling expression is used to walk this tree to obtain the destination address. Some optimizers split the implementation into a jump table for those case label values that are contiguous and a binary tree for the outlying values.
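The split strategy described above can be mimicked in source form. The following is only a sketch under stated assumptions: the case values (10 to 13 contiguous, 100 and 5000 outlying), the names dispatch, dense_targets, and sparse_cases, and the use of returned arm numbers in place of jump addresses are all hypothetical, and a real translator would emit machine-level jumps rather than C code. A sorted array searched by bisection stands in for the binary tree.

```c
#include <stddef.h>

/* Hypothetical case values: 10..13 are contiguous, 100 and 5000 are outliers. */
enum { TABLE_MIN = 10, TABLE_MAX = 13, NO_MATCH = -1 };

/* Jump-table analogue: one slot per value in [TABLE_MIN, TABLE_MAX]. */
static const int dense_targets[TABLE_MAX - TABLE_MIN + 1] = { 0, 1, 2, 3 };

/* Sorted (value, target) pairs searched by bisection for the outlying values. */
struct case_entry { int value; int target; };
static const struct case_entry sparse_cases[] = { { 100, 4 }, { 5000, 5 } };

/* Returns the number of the selected case arm, or NO_MATCH (the default arm). */
int dispatch(int ctrl)
{
    size_t lo = 0;
    size_t hi = sizeof sparse_cases / sizeof sparse_cases[0];

    /* Range check first: an out-of-range index must never reach the table. */
    if (ctrl >= TABLE_MIN && ctrl <= TABLE_MAX)
        return dense_targets[ctrl - TABLE_MIN];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (sparse_cases[mid].value == ctrl)
            return sparse_cases[mid].target;
        if (sparse_cases[mid].value < ctrl)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NO_MATCH;
}

/* Reference version written as an ordinary switch, for comparison. */
int dispatch_switch(int ctrl)
{
    switch (ctrl) {
    case 10: return 0;
    case 11: return 1;
    case 12: return 2;
    case 13: return 3;
    case 100: return 4;
    case 5000: return 5;
    default: return NO_MATCH;
    }
}
```

The table costs one slot per value in the contiguous range, while the searched array costs one entry per outlying case label, which is the storage trade-off discussed above.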
Translator vendors targeting modern processors face an additional problem. Successful processors often contain a range of different implementations, creating a processor family (e.g., the Intel Pentium series). These different processor implementations usually have different performance characteristics, and in the case of the switch statement different levels of sophistication in branch prediction. How does a translator make the decision on whether to use a jump table or if statements when the optimal code varies between different implementations of a particular processor?
A study by Uh and Whalley[1420] compared (see Table 1753.1) the performance of a series of if statements and the equivalent jump table implementation. For three of the processors it was worth using a jump table when more than two if statements were likely to be executed. In the case of the UltraSPARC-1 the figure was more than eight if statements executed (this was put down to the lack of hardware support for branch prediction of indirect jumps).

Table 1753.1: Performance comparison (in seconds) of some implementation techniques for a series of if statements (contained in a loop that iterated 10,000,000 times) using (1) linear search (LS), or (2) indirect jump (IJ), for a variety of processors in the SPARC family. br is the average number of branches per loop iteration. Based on Uh and Whalley.[1420]

Processor Implementation   2.5br LS   4.5br LS   8.5br LS   2.5br IJ   4.5br IJ   8.5br IJ
SPARCstation-IPC               3.82       5.53       8.82       2.61       2.71       2.76
SPARCstation-5                 1.03       1.65       2.74       0.63       0.76       0.76
SPARCstation-20                0.93       1.60       2.65       0.87       0.93       0.94
UltraSPARC-1                   0.50       1.16       1.56       1.50       1.51       1.51

1754 A case or default label is accessible only within the closest enclosing switch statement.

Commentary
This requirement needs to be explicitly stated because there is no syntactic association between case labels and their controlling switch statement.
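A small sketch may help show how labels bind to the closest enclosing switch statement. The function name select_arm and the returned codes are hypothetical and exist only to make the selected arm observable.

```c
/* Hypothetical helper: returns a code identifying which arm was selected. */
int select_arm(int outer, int inner)
{
    switch (outer) {
    case 1:
        switch (inner) {
        case 1:              /* Same value as the enclosing case 1: permitted. */
            return 11;
        default:             /* Belongs to the inner switch only. */
            return 10;
        }
    case 2:                  /* Only reachable through the outer switch. */
        return 20;
    default:                 /* A second default is fine: different switch. */
        return 0;
    }
}
```

Calling select_arm(1, 2) selects the inner default label rather than the outer case 2, because the inner controlling expression can only reach labels of the inner switch.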
Coding Guidelines
The issue most likely to be associated with a nested switch statement is source layout (because the amount of indentation used is often greater than in nested if statements). However, nested switch statements are relatively uncommon. For this reason the issue of the comprehension effort needed for this form of nested construct is not discussed.

1755 The integer promotions are performed on the controlling expression.

Commentary
The rationale for performing the integer promotions is the same as that for the operands within expressions.

Common Implementations
When the controlling expression is denoted by an object having a character type the possible range of values is known to fit in a byte. Even relatively simple optimizers often check for, and make use of, this special case.

1756 The constant expression in each case label is converted to the promoted type of the controlling expression.

Commentary
Prior to this conversion the type of the constant expression associated with each case label is derived from the form of the literals and result type of the operators it contains. The relationship between the value of a case label and a controlling expression is not the same as that between the operands of an equality operator. The conversion may cause the rank of the case label value to be reduced. If the types of both expressions are unsigned it is possible for the case label value to change (e.g., a modulo reduction). Like all integer conversions undefined behavior may occur for some values and types.
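The effect of converting a case constant to the promoted type of the controlling expression can be seen in a short sketch. The function names classify and is_letter_a are hypothetical; the first relies only on the well-defined conversion of -1 to unsigned int, and some translators may issue a diagnostic for it.

```c
#include <limits.h>

/* The controlling expression has type unsigned int, so every case constant
   is converted to unsigned int before matching. */
int classify(unsigned int u)
{
    switch (u) {
    case -1:        /* Converted to unsigned int: same value as UINT_MAX. */
        return 1;
    case 0:
        return 0;
    default:
        return 2;
    }
}

/* The controlling expression has type unsigned char and is promoted to int;
   'a' already has type int in C, so no value changes here. */
int is_letter_a(unsigned char ch)
{
    switch (ch) {
    case 'a':
        return 1;
    default:
        return 0;
    }
}
```

A reader who evaluates the label -1 without applying the conversion would not expect classify(UINT_MAX) to select that arm.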
Other Languages
Many languages have a single integer type, so there is no conversion to perform for case label values. Strongly typed languages usually require that the type of the case label value be compatible with the type of the controlling expression; there are not usually any implicit conversions. Enumerated constants are often defined to be separate types that are not compatible with any integer type.

Coding Guidelines
This C sentence deals with the relationship between individual case label values and the controlling expression. The following points deal with the relationship between different case label values within a given switch statement:

• Mixing case labels whose values are represented using both character constants and integer constants is making use of representation information (in this context the macro EOF might be interpreted in its symbolic form of representing an end-of-file character, rather than an integer constant). There does not appear to be a worthwhile benefit in having a deviation that permits the use of the integer constant 0 rather than the character constant '\0', on the grounds of improved reader recognition performance. The character constant '\0' is the most commonly occurring character constant (10% of all character constants in the visible form of the .c files, even if it only represents 1% of all constant tokens denoting the value 0).

• Mixing case labels whose values are represented using both enumeration constants and some other form of constant representation (e.g., an integer constant) is making use of the underlying representation of the enumerated constants. The same is also true if enumerated constants from different enumeration types are mixed.

• Mixing integer constants represented using decimal, hexadecimal, or octal lexical forms. The issue of visually mixing integer constants having different lexical forms is discussed elsewhere.

Floating point literals are very rarely seen in case labels. The guideline recommendation dealing with exact equality comparison of floating-point values is applicable to this usage.

Example

#include <limits.h>

enum {red, green, blue};

extern int glob;

void f(unsigned char ch)
{
switch (ch)
   {
   case 'a': glob++;
             break;

   case green: glob+=2;
             break;

   case (int)7.0: glob--;
             break;

   case 99: glob -= 9;
             break;

   case ULONG_MAX: glob *= 3;
             break;
   }
}
1757 If a converted value matches that of the promoted controlling expression, control jumps to the statement following the matched case label.

Commentary
A case label can appear on any statement in the switch body.

switch (x)
   default : if (prime(x))
   case 2: case 3: case 5: case 7:
                process_prime(x);
             else
   case 4: case 6: case 8: case 10:
                process_composite(x);

There can be more practical uses for this functionality (e.g., Duff’s Device).

Coding Guidelines
Experience suggests that developers treat the case label value as being the result of evaluating the expression appearing in the source (i.e., that no conversion, driven by the type of the controlling expression, takes place). A conversion that causes a change of value is very suspicious. However, no instances of such an event occur in the Usage .c files or have been experienced by your author. Given this apparent rarity no guideline recommendation is made here.

1758 Otherwise, if there is a default label, control jumps to the labeled statement.

Commentary
A switch statement may be thought of as a series of if statements with the default label representing the final else arm (although other case labels may label the same statement as a default label).

Common Implementations
Having a default label may not alter the execution time performance of the generated machine code. All of the tests necessary to determine that the default label should be jumped to are the same as those necessary to determine that no part of the switch should be executed (if there is no default label).

1759 If no converted case constant expression matches and there is no default label, no part of the switch body is executed.

Commentary
This behavior is the same as that of a series of nested if statements. If all of their controlling expressions are false and there is no final else arm, none of the statement bodies is executed.
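The correspondence with a series of if statements can be sketched as follows. The function names tally_switch and tally_if are hypothetical, and the equivalence shown holds only because every case arm ends in a break statement (i.e., there is no fall through).

```c
/* Both functions add to *total according to the same rule. */
void tally_switch(int code, int *total)
{
    switch (code) {
    case 1: *total += 10; break;
    case 2: *total += 20; break;
    /* No default label: other values execute no part of the switch body. */
    }
}

void tally_if(int code, int *total)
{
    if (code == 1)
        *total += 10;
    else if (code == 2)
        *total += 20;
    /* No final else arm: other values execute none of the statement bodies. */
}
```

For any value of code other than 1 or 2 both functions leave *total unchanged.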
Other Languages
Some languages require that there exist a case label value, or default, that matches the value of the controlling expression. If there is no such matching value the behavior may be undefined (e.g., Pascal specifies it is a dynamic-violation) or even defined to raise an exception (e.g., Ada).

Coding Guidelines
The coding guideline issue of always having a default label is discussed elsewhere.

Implementation limits
1760 As discussed in 5.2.4.1, the implementation may limit the number of case values in a switch statement.

Commentary
This observation ought really to be a Further reference subclause.
1761 133) That is, the declaration either precedes the switch statement, or it follows the last case or default label associated with the switch that is in the block containing the declaration.

Commentary
If the declaration is not followed by any case or default labels, all references to the identifier it declares can only occur in the statements that follow it (which can only be reached via a jump to preceding case or default labels, unless a goto statement jumps to an ordinary label within the statement list).

1762 EXAMPLE In the artificial program fragment

switch (expr)
{
      int i = 4;
      f(i);
case 0:
      i = 17;
      /* falls through into default code */
default:
      printf("%d\n", i);
}

the object whose identifier is i exists with automatic storage duration (within the block) but is never initialized, and thus if the controlling expression has a nonzero value, the call to the printf function will access an indeterminate value. Similarly, the call to the function f cannot be reached.

Commentary
Objects with static storage duration are initialized on program startup.

switch (i)
   {
   static char message[] = "abc"; /* Not dependent on control flow. */
   case 0:
           f(message);
           break;
   case 1:
           /* ... */
   }

Other issues associated with constructs contained in this example are discussed elsewhere.
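The difference between the two storage durations can be combined into one sketch. The function name first_char and the object name skipped are hypothetical; the code is written so that the automatic object is assigned before it is read, since its initializer is never executed. Some translators warn that the initializing declaration is unreachable.

```c
/* The initializer of 'skipped' is never executed because control jumps over
   it, while 'message' has static storage duration and is initialized before
   program startup regardless of control flow. */
int first_char(int selector)
{
    switch (selector) {
        int skipped = 4;                 /* Initializer is jumped over. */
        static char message[] = "abc";   /* Not dependent on control flow. */
    case 0:
        skipped = 17;                    /* Assign before any use. */
        return skipped + message[0];
    default:
        return message[1];
    }
}
```

Reading skipped in the default arm would access an indeterminate value, which is why the sketch only reads message there.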
6.8.5 Iteration statements

1763
iteration-statement:
      while ( expression ) statement
      do statement while ( expression ) ;
      for ( expression_opt ; expression_opt ; expression_opt ) statement
      for ( declaration expression_opt ; expression_opt ) statement

Commentary
The terms loop header or head of the loop are sometimes used to refer to the source code location containing the controlling expression of a loop (in the case of a for statement it might be applied to all three components bracketed by parentheses).
It is often claimed that programs spend 90% of their time executing 10% of their code. This characteristic is only possible if the time is spent in a subset of the program’s iteration statements, or a small number of functions called within those statements. While there is a large body of published research on program performance, there is little evidence to back up this claim (one study[1344] found that 88% of the time was spent in 20% of the code, while analysis[1455] of some small embedded applications found that 90% of the time was spent in loops). It may be that researchers are attracted to applications which spend their time in loops because there are often opportunities for optimization. Most existing published execution time measurements are based on engineering and scientific applications; for database oriented applications[1160] and operating systems[1390] loops have not been found to be so important.
The ; specified as the last token of a do statement is not needed to reduce the difficulty of parsing C source. It is simply part of an adopted convention.

C90
Support for the form:

for ( declaration expression_opt ; expression_opt ) statement

is new in C99.

C++
The C++ Standard allows local variable declarations to appear within all conditional expressions. These can occur in if, while, and switch statements.

Other Languages
Many languages require that the lower and upper bounds of a for statement be specified, rather than a termination condition. They usually use keywords to indicate the function of the various expressions (e.g., Modula-2, Pascal):

FOR I=start TO end BY step

Some languages (e.g., BCPL, Modula-2) require step to be a translation time constant. Both Ada and Pascal require for statements to have a step size of one. Ada uses the syntax:

for counter in 1..10
loop
   ...
for counter in reverse 1..10
loop
   ...

which also acts as the definition of counter.
Cobol supports a PERFORM statement, which is effectively a while statement.

PERFORM UNTIL quantity > 1000
*   some code
END-PERFORM

The equivalent looping construct in Fortran is known as a do statement. A relatively new looping construct, at least in the Fortran Standard, is FORALL.
This is used to express a looping computation in a form that can more easily be translated for parallel execution. Some languages (e.g., Modula-2, Pascal) use the keywords repeat/until instead of do/while, while other languages (e.g., Ada) do not support an iteration statement with a test at the end of the loop.
A few languages (e.g., Icon,[236] which uses the term generators) have generalized the looping construct to provide what are commonly known as iterators. An iterator enumerates the members of a set (a mechanism for accessing each enumerated member is provided in the language), usually in some unspecified order, and has a loop termination condition.
Common Implementations
Many programs spend a significant percentage of their time executing iteration statements. The following are some of the ways in which processor and translator vendors have responded to this common usage characteristic:

• Translator vendors wanting to optimize the quality of generated machine code have a number of optimization techniques available to them. A traditional loop optimization is strength reduction[280] (which replaces costly computations by less expensive ones), while more ambitious optimizers might perform hoisting of loop invariants and loop unrolling. Loop invariants are expressions whose value does not vary during the iteration of a loop; such expressions can be hoisted to a point just outside the start of the loop. Traditionally translators have only performed loop unrolling on for statements. (Translation time information on the number of loop iterations and step size is required; this information can often be obtained from the expressions in the loop header, i.e., the loop body does not need to be analyzed.) More sophisticated optimizations include making use of data dependencies to order the accesses to storage. As might be expected with such a performance critical construct, a large number of other optimization techniques are also available.

• Processor vendors want to design processors that will execute programs as quickly as possible. Holding the executed instructions in a processor’s cache saves the overhead of fetching them from storage and most processors cache both instructions and object values. Some processors (usually DSP) have what is known as a zero overhead loop buffer (effectively a software controlled instruction cache).
The sequence of instructions in such a loop buffer can be repetitively executed with zero loop overhead (the total loop count may be encoded in the looping instruction or be contained in a register). Because of their small size (the Agere DSP16000[6] loop buffer has a limit of 31 instructions) and restrictions on instructions that may be executed (e.g., no instructions that change the flow of control) optimizers can have difficulty making good use of such buffers.[1419]
The characteristics of loop usage often mean that successive array elements are accessed on successive loop iterations (i.e., storage accesses have spatial locality). McKinley and Temam[932] give empirical results on the effect of loops on cache behavior (based on Fortran source).
Some CISC processors support a decrement/increment and branch on nonzero instruction;[323, 625] ideal for implementing loops whose termination condition is the value zero (something that can be arranged in handwritten assembler, but which rarely happens in loops written in higher-level languages; see Table 1763.1). The simplifications introduced by the RISC design philosophy did away with this kind of instruction; programs written in high-level languages did not contain enough loops of the right kind to make it cost effective to support such an instruction. However, one application domain where a significant amount of code is still written in assembler (because of the comparatively poor performance of translator generated machine code) is that addressed by DSP processors, which often contain such decrement (and/or increment) branch instructions (the SC140 DSP core[989] includes hardware loop counters that support up to four levels of loop nesting). The C compiler for the Unisys e-@ction Application Development Solutions[1424] uses the JGD processor instruction to optimize the loop iteration test.
However, this usage limits the maximum number of loop iterations to 2^35 − 2, a value that is very unlikely to be reached in a commercial program (a trade-off made by the compiler implementors between simplicity and investing effort to handle very rare situations).
Obtaining an estimate of the execution time of a sequence of statements may require estimating the number of times an iteration statement will iterate. Some implementations provide a mechanism for the developer to provide iteration count information to the translator. For instance, the translator for the TMS320C6000[1373] supports the following usage:
#pragma MUST_ITERATE (30) /* Will loop at least 30 times. */

Another approach is for the translator to deduce the information from the source.[567]
Program loops may not always be expressed using an iteration-statement (for instance, they may be created using a goto statement). Ramalingam[1158] gives an algorithm for identifying loops in almost linear time.

Example

#include <stdio.h>

int f(unsigned char i, unsigned char j)
{
do
   while (i++ < j)
      ;
while (i > j++)
   ;

if (j != 0)
   printf("Initial value of i was greater than initial value of j\n");
}

Usage
A study by Bodík, Gupta, and Soffa[130] found that 11.3% of the expressions in SPEC95 were loop invariant.

[Figure 1763.1, plot not reproduced. x-axis: iteration-statements (0 to 25); y-axis: Function definitions (1 to 10,000); series: while, for, do.]
Figure 1763.1: Number of function definitions containing a given number of iteration-statements. Based on the translated form of this book’s benchmark programs.
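Hoisting of loop invariants and strength reduction, two of the translator optimizations discussed in this subsection, can be illustrated by applying them by hand. This is a sketch only: the function names sum_plain and sum_transformed and their parameters are hypothetical, and an optimizing translator would normally perform these transformations on its internal representation rather than requiring the source to be rewritten.

```c
#include <stddef.h>

/* Straightforward version: scale * offset is recomputed on every iteration
   and i * stride is a multiplication inside the loop. */
long sum_plain(const int *data, size_t count, size_t stride, int scale, int offset)
{
    long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += data[i * stride] + scale * offset;
    return sum;
}

/* Hand-transformed version: the invariant product is hoisted out of the loop
   and the index multiplication is replaced by a running addition. */
long sum_transformed(const int *data, size_t count, size_t stride, int scale, int offset)
{
    long sum = 0;
    const int invariant = scale * offset;   /* Loop invariant, computed once. */
    size_t index = 0;                       /* Strength-reduced i * stride. */
    for (size_t i = 0; i < count; i++) {
        sum += data[index] + invariant;
        index += stride;
    }
    return sum;
}
```

Both functions access the same elements in the same order, so the transformation changes only the amount of work done per iteration.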