# The New C Standard- P16

Shared by: Thanh Cong | Date: | File type: PDF | Pages: 112


## Text content: The New C Standard- P16

### 6.10.1 Conditional inclusion

• The specification has changed between C90 and C99.

The problem with any guideline recommendation is that the total cost is likely to be greater than the total benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason no recommendation is made here. The discussion on suffixed integer constants (see sentence 835) is also applicable in the context of a conditional inclusion directive.

Example

In the following the developer may assume that unwanted higher bits in the value of C will be truncated when shifted left.

    #define C 0x1100u
    #define INT_BITS 32

    #define TOP_BYTE (C
Commentary

The guarantee on the value being nonnegative (see sentence 478) does not apply during preprocessing. For instance, preprocessing might use the EBCDIC character set and act as if the type char was signed (in EBCDIC the letters and digits have values greater than 127). In other contexts the value of a character constant containing a single character that is not a member of the basic execution character set is implementation-defined (see sentence 885).

Coding Guidelines

The discussion on the possibility of character constants having other implementation-defined values (see sentence 885) is applicable here.

1884 Preprocessing directives of the forms

    # ifdef identifier new-line group_opt
    # ifndef identifier new-line group_opt

check whether the identifier is or is not currently defined as a macro name.

Commentary

There is no #elifdef form (although over half of the uses of the #elif directive are followed by a single instance of the defined operator; see Table 1872.1).

1885 Their conditions are equivalent to #if defined identifier and #if !defined identifier respectively.

Commentary

The #ifdef and #ifndef forms are rather like the unary ++ and -- operators in that they provide a shorthand notation for commonly used functionality.

Coding Guidelines

The #ifdef forms are the most common form of conditional inclusion directive. Measurements (see Table 1872.1) also show that nearly a third of the uses of the defined operator could be replaced by one of these forms. There are advantages to using the #ifdef forms instead of the defined operator (e.g., the most common form is likely to be the most practiced form for readers, and it is easy to scan visually down the left edge of the source) and disadvantages (e.g., more effort is required to add further conditions to the single test being made). However, there does not appear to be a worthwhile cost/benefit to recommending one of the possibilities.
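The equivalence stated in sentence 1885 can be checked directly. The following is a minimal sketch, not taken from the book; the macro names FEATURE_X and FEATURE_Y are hypothetical and exist only for illustration.

```c
#include <assert.h>

#define FEATURE_X   /* hypothetical macro, defined with an empty replacement list */

/* #ifdef tests only whether the name is defined, not what it expands to. */
#ifdef FEATURE_X
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

/* The same test spelled with the defined operator. */
#if defined FEATURE_X
#define VIA_DEFINED 1
#else
#define VIA_DEFINED 0
#endif

/* #ifndef is the negated test; FEATURE_Y is never defined in this file. */
#ifndef FEATURE_Y
#define Y_ABSENT_VIA_IFNDEF 1
#else
#define Y_ABSENT_VIA_IFNDEF 0
#endif

#if !defined FEATURE_Y
#define Y_ABSENT_VIA_DEFINED 1
#else
#define Y_ABSENT_VIA_DEFINED 0
#endif

/* A second condition cannot be added to #ifdef; the defined operator is needed. */
#if defined(FEATURE_X) && !defined(FEATURE_Y)
#define X_WITHOUT_Y 1
#else
#define X_WITHOUT_Y 0
#endif

static_assert(VIA_IFDEF == VIA_DEFINED, "both spellings select the same group");
static_assert(Y_ABSENT_VIA_IFNDEF == Y_ABSENT_VIA_DEFINED, "both spellings select the same group");
```

The last test illustrates the disadvantage mentioned in the coding guidelines discussion: extending a single #ifdef test means rewriting it with the defined operator.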
1886 142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it is unsigned in translation phase 7.

Commentary

The wording was changed by the response to DR #265.

1887 143) Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

    #if 'z' - 'a' == 25
    if ('z' - 'a' == 25)

Commentary

This situation could occur, for instance, if the ASCII representation were used during the preprocessing phases and EBCDIC were used during translation phase 5 (see sentence 133).

1888 Each directive's condition is checked in order.

v 1.2 June 24, 2009
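The effect described in footnote 142 can be reproduced on hosts with a 32-bit int by scaling the constant up. This is a sketch under that assumption, not an example from the book; the names PP_SUM_WRAPS, phase7_sum_wraps and unsigned_int_is_32_bits are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Within #if, integer constants behave as if they had the widest integer
   types, so 0xFFFFFFFF is signed and adding 1 does not wrap. */
#if 0xFFFFFFFF + 1 == 0
#define PP_SUM_WRAPS 1
#else
#define PP_SUM_WRAPS 0
#endif

/* In translation phase 7 the constant takes the first type in the list that
   can represent it. With a 32-bit int that type is unsigned int, and the
   addition wraps around to zero. */
int phase7_sum_wraps(void)
{
    return 0xFFFFFFFF + 1 == 0;
}

/* Reports whether the assumption made above holds on this host. */
int unsigned_int_is_32_bits(void)
{
    return UINT_MAX == 0xFFFFFFFF;
}

static_assert(PP_SUM_WRAPS == 0, "no wrap-around inside the #if expression");
```

On a host where the assumption holds, the same spelling of the expression yields false in the #if directive and true in the function body.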
Commentary

The order is from the lowest line number to the highest line number.

Coding Guidelines

It may be possible to obtain some translation time performance advantage (at least for the original developer) by appropriately ordering the directives. Unlike developer behavior with if statements (see sentence 1739), developers do not usually aim to optimize speed of translation when deciding how to order conditional inclusion directives (experience suggests that developers often simply append new directives to the end of any existing directives).

Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make use of any previous deductions they have made on how to interpret the directives and what they represent, and the usage highlights common dependencies in the source. In the following code fragment more reader effort is required to spot similarities in the sequence in which directives are checked than if both sequences of directives had occurred in the same order.

    #ifdef MACHINE_A
    /* ... */
    #else
    #ifdef MACHINE_B
    /* ... */
    #endif
    #endif

    #ifdef MACHINE_B
    /* ... */
    #else
    #ifdef MACHINE_A
    /* ... */
    #endif
    #endif

Given the lack of attention from developers on the relative ordering of directives and the benefits of using the same ordering, where possible, a guideline recommendation appears worthwhile. However, a guideline recommendation needs to be automatically enforceable (see sentence 0) and determining when two sequences of directives have the same effect, during translation, may be infeasible because information that is not contained within the source may be required (e.g., dependencies between macro names that are likely to be defined via translator command line options).
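For comparison, a sketch of the same two tests written in one consistent order is given below. The macro MACHINE_B is defined in the source only to keep the sketch self-contained, and the values assigned to WORD_BITS and ADDR_BITS are invented for illustration.

```c
#include <assert.h>

/* Hypothetical configuration macro; in practice it is likely to be defined
   via a translator command line option. */
#define MACHINE_B

/* First sequence: MACHINE_A is tested before MACHINE_B. */
#ifdef MACHINE_A
#define WORD_BITS 16
#else
#ifdef MACHINE_B
#define WORD_BITS 32
#endif
#endif

/* Second sequence: the tests appear in the same visual order as above,
   so a reader can reuse the pattern already recognized. */
#ifdef MACHINE_A
#define ADDR_BITS 20
#else
#ifdef MACHINE_B
#define ADDR_BITS 32
#endif
#endif

static_assert(WORD_BITS == 32 && ADDR_BITS == 32,
              "both sequences selected the MACHINE_B settings");
```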
Rev 1888.1 Where possible the visual order of evaluation of expressions within different sequences of nested conditional inclusion directives shall be the same.

1889 If it evaluates to false (zero), the group that it controls is skipped:

Commentary

A parallel can be drawn with the behavior of if statements (see sentence 1744), in that if their controlling expression evaluates to zero, during program execution, any statements in the associated block are skipped.

1890 directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals;

Commentary

The preprocessor operates on a representation of the source written by the developer, not translated machine code. As such it needs to perform some processing on its input to be able to deduce when to stop skipping.
Figure 1889.1: Number of top-level source files (i.e., the contents of any included files are not counted) and (right) complete translation units (including the contents of any files #included more than once) having a given number of lines skipped during translation of this book's benchmark programs. [Plots not reproduced; both panels plot a log-scale count (top level files, translation units) against physical lines skipped (0 to 150), with separate markers for the #if part and the #else part.]

Directives need to be processed to keep track of the level of nesting of conditionals, translation phases 1–3 still need to be performed (line splicing could affect what is or is not the start of a line; see sentence 116), and characters within a comment must not be treated as directives. The intent of only requiring a minimum of directive processing, while skipping, is to enable partially written source code to be skipped and to allow preprocessors to optimize their performance in this special case, speeding up the rate at which the input is processed.

Example

    #if 1
    extern int ei;

    #elif " an unmatched quote character, undefined behavior

    extern int foo_bar;
    #endif

    #if 0
    printf("\
    #endif \n");

    #endif

    #if 0
    /*
    #endif
    */
    #endif

1891 the rest of the directives' preprocessing tokens are ignored, as are the other preprocessing tokens in the group.
Commentary

There is no requirement that any directive be properly formed, according to the preprocessor syntax (see sentence 1854). However, preprocessing tokens still need to be created before they are ignored (as part of translation phase 3; see sentence 124).
Example

In the following the #define directive is not well formed. But because this group is being skipped the translator is required to ignore this fact.

    #if 0
    #define M(e
    #endif

1892 Only the first group whose control condition evaluates to true (nonzero) is processed.

Commentary

This group is processed exactly as if it appeared in the source outside of any group.

1893 If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is processed;

Commentary

A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like the one given for if statements (see sentence 1747), is not needed because conditional inclusion is terminated by a #endif directive. As in the case of the matching #if (or similar form) directive, all preprocessing tokens in the group are treated as if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is encountered (which must match the opening directive).

Coding Guidelines

The arguments made for if statements always containing an else arm (see sentence 1745) might be thought to also apply to conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood that readers will confuse which preprocessing directive any #else associates with (although other issues, such as lack of indentation or a large number of source lines between directives, can make it difficult to visually associate matching directives).

1894 lacking a #else directive, all the groups until the #endif are skipped.144)

Commentary

The effect of this specification mimics the behavior of if statements (see sentence 1747).

1895 Forward references: macro replacement (6.10.3), source file inclusion (6.10.2), largest integer types (7.18.1.5).
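The selection and skipping rules in sentences 1889 to 1894 can be exercised together. The following is a minimal sketch; LEVEL and the other macro names are hypothetical.

```c
#include <assert.h>

#define LEVEL 2   /* hypothetical configuration value */

/* Both LEVEL >= 1 and LEVEL >= 2 are true, but only the first group whose
   condition is true is processed. */
#if LEVEL >= 1
#define SELECTED 1
#elif LEVEL >= 2
#define SELECTED 2
#else
#define SELECTED 0
#endif

/* The outer group is skipped. The nested #if, #else and #endif are still
   recognized by name, so they do not terminate the outer conditional, and
   the malformed #define inside the skipped group is ignored. */
#if 0
#if LEVEL
#define M(e
#else
#define SKIPPED_INNER 1
#endif
#define SKIPPED_OUTER 1
#else
#define OUTER_ELSE 1
#endif

#if defined(SKIPPED_INNER) || defined(SKIPPED_OUTER)
#define ANY_SKIPPED_DEFINED 1
#else
#define ANY_SKIPPED_DEFINED 0
#endif

static_assert(SELECTED == 1, "the first true group wins");
```

The macro OUTER_ELSE ends up defined because the only #else that belongs to the outer #if 0 is the last one; the earlier #else is matched with the nested #if while skipping.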
### 6.10.2 Source file inclusion

Constraints

1896 A #include directive shall identify a header or source file that can be processed by the implementation.

Commentary

There is no requirement that a header be represented using a source file. It could be represented using prebuilt information within the translator that is enabled only when the appropriate #include directive is encountered during preprocessing, but not in a group that is skipped (see sentence 2018, footnote 153). Also there is no requirement that the spelling of the header in the C source file be represented by a source file of the same spelling. The C Standard has no explicit knowledge of file systems and is silent on the issue of directory structures. Minimum required limits on the implementation processing of a header name are specified elsewhere (see sentence 1909).

Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the specified name does not exist, at least along the places searched) is a constraint violation.
Figure 1896.2: Number of preprocessing translation units (excluding system headers) containing a given number of #includes whose contents are not referenced during translation (excludes the case where the same header is #included more than once, see Figure 1896.1). Based on the translated form of this book's benchmark programs. [Plot not reproduced; axes: unnecessary headers #include'd (0 to 20) against translation units (log scale).]

Figure 1896.3: Number of .c source files containing a given number of #include directives (dashed lines represent number of unique headers). Based on the visible form of the .c files. [Plot not reproduced; axes: #includes (0 to 60) against source files (log scale).]

Experience suggests that once a #include directive appears in a source file it is rarely removed (see Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant code is discussed elsewhere (see sentence 190).

There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from any relative ordering dictated by dependencies between headers).

Table 1896.1: Occurrence of two forms of header-names (as a percentage of all #include directives), the percentage of each kind that specifies a path to the header file, and number of absolute paths specified. Based on the visible form of the .c files.

| Header Form | % Occurrence | % Uses Path | Number Absolute Paths |
|---|---|---|---|
| `<h-char-sequence>` | 75.0 | 86.4 | 0 |
| "q-char-sequence" | 25.0 | 17.2 | 0 |

Semantics

1897 A preprocessing directive of the form

    # include <h-char-sequence> new-line
Figure 1896.4: header-name rank (based on character sequences appearing in #include directives) plotted against the number of occurrences of each character sequence. Also see Figure 792.26. Fitting a power law using MLE for `<header-name>` and "header-name" gives respective exponents of -2.26 (xmin = 8) and -1.8 (xmin = 9). Based on the visible form of the .c files. [Plot not reproduced; axes: rank (1 to 1000, log scale) against occurrences of header name (log scale), with separate markers for the two header-name forms.]

searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header.

Commentary

File systems invariably provide a unique method of identifying every file they contain (e.g., a full path name). The base document recognized the disadvantages of requiring that the full path name be specified in each #include directive and permitted a substring of it to be given. The implementation-defined places are usually additional character sequences (e.g., directory names) added to the h-char-sequence (see sentence 918) in an attempt to create a full path name that refers to an existing file.

Rationale

The file search rules used for the filename in the #include directive were left as implementation-defined. The Standard intends that the rules which are eventually provided by the implementor correspond as closely as possible to the original K&R rules.
The primary reason that explicit rules were not included in the Standard is the infeasibility of describing a portable file system structure. It was considered unacceptable to include UNIX-like directory rules due to significant differences between this structure and other popular commercial file system structures.

Nested include files raise an issue of interpreting the file search rules. In UNIX C a #include directive found within an included file entails a search for the named file relative to the file system directory that holds the outer #include. Other implementations, including the earlier UNIX C described in K&R, always search relative to the same current directory. The C89 Committee decided in principle in favor of the K&R approach, but was unable to provide explicit search rules as explained above.

Other Languages

Other languages (or an extension provided by their implementations) commonly use the double-quote delimited form.

Common Implementations

The character sequence between the < and > delimiters is invariably treated as the name of a file, possibly including a path (see sentence 1909). The ordering of the search sequence used for directives having the form `<h-char-sequence>` is often different from that used for the form "q-char-sequence". For instance, in the `<h-char-sequence>` case the contents of /usr/include might be searched first, followed by the contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the directory containing the .c file might be searched first, followed by other places.
The environment in which a translator executes may also affect the sequence of places that are searched. For instance, the meaning of a relative path name (e.g., ../proj/abc.h) depends on the identity of the current directory.

gcc searches two directories, /usr/include and another directory that holds very machine specific files, such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your author's computer).

gcc supports the #include_next directive. This directive causes the search algorithm to skip some of the initial implementation-defined places that would normally be searched. The initial places that are skipped are those that were searched in locating the file containing the #include_next directive (including the place where the search succeeded).

Tzerpos and Holt[1416] describe a well-formedness theory of header inclusion that enables unnecessary #include directives to be deduced.

Coding Guidelines

The standard does not specify the order in which the implementation-defined places are searched. This is a potential coding guideline issue because it is possible that an h-char-sequence will match in more than one of the places (i.e., there is a file having the same name along several of the different possible search paths). The behavior is thus dependent on the order in which the places are searched (it is assumed that the contents of the different headers will be different).

Experience suggests that a translator locating a #included file different from the one the developer expected has one of two consequences: (1) when the contents of the file accessed are similar to the one intended (e.g., a different version of the intended file) the source file may be successfully translated, and (2) when the contents of the file accessed have no connection with the intended file the source is rarely successfully translated.
The problem might therefore be considered to be one of version management, rather than the choice of characters used in an h-char-sequence. There are a number of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:

• For the < > delimited form, implementations usually look in a predefined location first (as described in the Common Implementations section above and in the following C sentence; see sentence 1898). Ensuring that the names chosen by developers for the headers they create are different from those of system headers is an almost impossible task. While it might be possible to enumerate the set of existing file names of system headers contained in commercially important environments, members are likely to be added to this set on a regular basis. Rather than trying to avoid using file names likely to match those of system headers, developers could ensure that places containing system headers are searched last.

• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by the implementation or translator environment vendor). What constitutes a system supplied header is open to interpretation. One distinction that can be made between system and developer headers is that developers do not control the contents of system headers. Consequently, it can be argued that their contents are not subject to coding guidelines. Headers whose contents have been written by developers are subject to coding guidelines. The convention generally adopted to indicate this status is to use the double-quote delimited form of #include.

Rev 1897.1 Developer written headers in a #include directive shall not be delimited by the < and > characters.

Developers sometimes specify full path names in headers (see Table 1896.1). This is a configuration management issue and is not considered to be within the scope of these coding guidelines.
Table 1897.1: Number of various kinds of identifiers declared in the headers contained in the /usr/include directory of some translation environments. Information was automatically extracted and represents an approximate lower bound. Versions of the translation environments from approximately the same year (mid 1990s) were used. The counts for ISO C assume that the minimum set of required identifiers is declared and exclude the type generic macros.

| Information | Linux 2.0 | AIX on RS/6000 | HP/UX 9 | SunOS 4 | Solaris 2 | ISO C |
|---|---|---|---|---|---|---|
| Number of headers | 2,006 | 1,514 | 1,264 | 987 | 1,495 | 24 |
| macro definitions | 10,252 | 18,637 | 13,314 | 11,987 | 10,903 | 446 |
| identifiers with external linkage | 1,672 | 1,542 | 1,935 | 616 | 1,281 | 487 |
| identifiers with internal linkage | 80 | 34 | 2,012 | 0 | 5 | 0 |
| tag declaration | 716 | 1,088 | 899 | 1,208 | 945 | 3 |
| typedef name declared | 1,024 | 828 | 15 | 493 | 1,027 | 55 |

1898 How the places are specified or the header identified is implementation-defined.

Commentary

The differences between the environments in which translation occurs have narrowed over the years. However, even though there may be much common practice, such issues are considered to be outside the scope of the C Standard (see sentence 10).

Common Implementations

Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed by a list of alternative places. A number of techniques are used to allow developers to specify a list of alternative places to be searched for files corresponding to the headers specified in a #include directive.
For instance, the alternative places may be specified via a translator command line option (e.g., -I), in a translator configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads many default locations from the file /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs, although the path /usr/include is still hard coded in the translator sources), or an environment variable (e.g., several Microsoft Windows based translators use INCLUDE).

The directory separators used in Unix and MS-DOS slant in different directions. Many implementations, in both environments, recognize both characters as directory delimiters. One consequence of this is that escape sequences are not recognized as such (something that is unlikely to be a problem in header names).

The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior for this host is to look in a directory called h, for a file of the given name with the .h removed.

Coding Guidelines

The implementation-defined behavior associated with how the places are specified occurs outside of the source code and is the remit of configuration management guidelines. For this reason nothing further is said here.

1899 A preprocessing directive of the form

    # include "q-char-sequence" new-line

causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.

Commentary

The commonly accepted intent of this form of the #include directive is that it is used to reference source files created by developers (i.e., headers that are not provided as part of the implementation or host environment).

The only syntactic difference between q-char-sequence and h-char-sequence is that neither sequence may contain its respective delimiter (see sentence 918).

Most q-char-sequences end with one of two character sequences (i.e., .c or .h). The character sequence before these suffixes is often called the header name.
Other Languages

The use of double-quote as the delimiter is the almost universal form used in other languages (although some use the ' character because that is what is used to delimit string literals).

Coding Guidelines

The term commonly used to refer to these source files is header, the context of the conversation often being used to distinguish any other intended usage. The intent is that the contents of these source files are controlled by developers and as such they are subject to coding guidelines.

1900 The named source file is searched for in an implementation-defined manner.

Commentary

While this "implementation-defined manner" might be the same as that for the < > delimited form, the intent is for it to be sufficiently different that developers do not need to be concerned about the name of a header created by them matching one provided as part of the implementation (and therefore potentially found by the translator when searching for a matching header). For instance, your author does not know the names of most of the 304 files (e.g., compface.h) contained in /usr/include on his software development computer.

The discussion on the < > delimited form is applicable here (see sentence 1897).

Common Implementations

The search algorithm used invariably differs from that used for the < > delimited form (otherwise there would be little point in distinguishing the two cases).

The search algorithm used by some implementations is to first look in the directory containing the source file currently being translated (which may itself have been included). If that search fails, and the current source file has itself been included, the directory containing the source file that #included it is then searched, this process continuing back through any nested #include directives.
For instance, in:

file_1.c

    #include "abc.h"

file_2.c

    #include "/foo/file_1.c"

file_3.c

    #include "/another/path/file_2.c"

(assuming the translation environment supports the path names used), translating the source file file_3.c causes file_2.c to be included, which in turn includes file_1.c. The source file abc.h will be searched for in the directories /foo, /another/path and then the directory containing file_3.c.

Some implementations use the double-quote delimited form within their system headers, to change the default first location that is searched. For instance, a third-party API may contain the header abc.h, which in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search in the directory containing abc.h first, not /usr/include. This usage can help localize the files that belong to specific APIs.

Other implementations use a search algorithm that starts with the directory containing the original source file being translated.

If the source file is not found after these places have been searched, some implementations then search other places specified via any translator options. Other implementations simply follow the behavior described by the following C sentence, which has the consequence of eventually checking these other places (see sentence 1898).

1901 If this search is not supported, or if the search fails, the directive is reprocessed as if it read

    # include <h-char-sequence> new-line

with the identical contained sequence (including > characters, if any) from the original directive.
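The directory walk used in the file_1.c, file_2.c, file_3.c example can be modelled in a few lines. This is only a sketch of the behavior attributed above to some implementations, not the algorithm of any particular translator; the function names are hypothetical, '/' is assumed to be the directory separator, and a file named without a directory is assumed to live in ".".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the directory part of path (everything before the last '/') into
   dir. A path without a '/' is treated as being in ".". */
static void dir_of(const char *path, char *dir, size_t size)
{
    const char *slash = strrchr(path, '/');

    if (slash == NULL)
        snprintf(dir, size, ".");
    else if (slash == path)
        snprintf(dir, size, "/");
    else
        snprintf(dir, size, "%.*s", (int)(slash - path), path);
}

/* stack[0] is the file containing the #include "..." directive and
   stack[depth - 1] is the file originally being translated. The directories
   are written to dirs in the order they would be searched; the number of
   entries written is returned. */
size_t quoted_search_order(const char *const stack[], size_t depth,
                           char dirs[][64], size_t max_dirs)
{
    size_t n = 0;

    for (size_t i = 0; i < depth && n < max_dirs; i++)
        dir_of(stack[i], dirs[n++], sizeof dirs[0]);
    return n;
}
```

Called with the include stack { "/foo/file_1.c", "/another/path/file_2.c", "file_3.c" }, the model yields /foo, then /another/path, then the directory of file_3.c, matching the order given in the text. Any places added by translator options, and the fallback to the < > search, are outside this model.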
Commentary

The previous search can fail in the sense that it does not find a matching source file. Some existing code uses the double-quote delimited form of #include directive to include headers provided by the implementation (rather than the < > delimited form). This requirement ensures that such code continues to be conforming.

1902 144) As indicated by the syntax, a preprocessing token shall not follow a #else or #endif directive before the terminating new-line character.

Commentary

Saying in words what is specified in the syntax.

Common Implementations

Many early implementations (and some present day ones, for compatibility with existing source) treated any sequence of characters following one of these directives as a comment, e.g., #endif X == 1.

1903 However, comments may appear anywhere in a source file, including within a preprocessing directive.

Commentary

A comment is replaced by a single space character prior to preprocessing (see sentences 126 and 1858).

1904 A preprocessing directive of the form

    # include pp-tokens new-line

(that does not match one of the two previous forms) is permitted.

Commentary

This form permits the < > or double-quote delimited forms to be generated via macro expansion (see sentence 1914). However, it is rarely used (11 instances in over 60,000 #include directives in the visible source of the .c files). Whether this is because developers are unaware of its existence, or because it has little utility is not known.

1905 The preprocessing tokens after include in the directive are processed just as in normal text.

Commentary

To be exact, the preprocessing tokens after include in the directive up to the first new-line character are processed just as in normal text.
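A small sketch of the pp-tokens form follows. The macros USE_STDINT and LIMITS_HEADER are hypothetical; a standard header is used so that the sketch does not depend on any developer-written files. How the tokens between < and > are combined into a header name is implementation-defined, so this relies on behavior that is common among translators rather than guaranteed.

```c
#include <assert.h>

/* Hypothetical selector; a build would normally set it on the command line. */
#define USE_STDINT 0

#if USE_STDINT
#define LIMITS_HEADER <stdint.h>
#else
#define LIMITS_HEADER <limits.h>
#endif

/* LIMITS_HEADER is macro-expanded first; the resulting directive must then
   match one of the two other forms of #include. */
#include LIMITS_HEADER

/* CHAR_BIT is only visible if <limits.h> was included through the macro. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

static_assert(CHAR_BIT >= 8, "limits.h was reached via macro expansion");
```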
1906 (Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.)

Commentary

This C sentence provides explicit clarification that macro replacement occurs in this case (the same clarification is also given elsewhere; see sentence 1991).

1907 The directive resulting after all replacements shall match one of the two previous forms.145)

Commentary

It is not a violation of syntax if the directive does not match one of the two previous forms, because the syntax of this form has been matched. It is a violation of semantics and therefore the behavior is undefined.

1908 The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined.
Commentary

This implementation-defined behavior may take a number of forms, including:

• The ## operator can be used to glue preprocessing tokens together (see sentence 1958). However, the behavior is undefined if the resulting character sequence is not a valid preprocessing token (see sentence 1963). For instance, the five preprocessing tokens {<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token without going through intermediate stages whose behavior is undefined.

• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a string preprocessing token) need not depend on any implementation-defined behavior. The stringize operator (see sentence 1950) can be used to create a string preprocessing token.

• Other implementation-defined behaviors might include the handling of space characters. For instance, in the following:

    #define bra <
    #define ket >
    #include bra stdio.h ket

does the implementation strip off the space character at the ends of the delimited character sequence?

Coding Guidelines

Given the rarity of use of this form of #include no guideline recommendations are given here.

Example

    #define mk_sys_hdr(name) < ## name ## >

    #if BUG_FIX
    #define VERSION 2a /* works because pp-numbers include alphabetics */
    #else
    #define VERSION 2
    #endif

    #define add_quotes(a) # a
    #define mk_str(str, ver) add_quotes(str ## ver)

    #include mk_str(Version, VERSION)

1909 The implementation shall provide unique mappings for sequences consisting of one or more nondigits or digits (6.4.2.1) followed by a period (.) and a single nondigit. (The C99 wording this replaces read "one or more letters or digits (as defined in 5.2.1)" and "a single letter".)

Commentary

This C sentence and the following ones in this C paragraph are a specification of the minimum set of requirements that an implementation must meet.
For sequences outside of this set the implementation mapping may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending in .html to .htm). The handling of character sequences that resemble UCNs may also differ, e.g., "\ubada\file.txt" (Ubada is a city in Tanzania and BADA is the Hangul symbol in ISO 10646). The set of sequences requiring a unique mapping does not include those containing an arbitrary number of period characters because many operating systems do not permit them (at least one, RISCOS, does not permit any). The wording was changed by the response to DR #302 to extend the specification to be more consistent with C++.

C++

16.2p5:
The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed by a period (.) and a single nondigit.

Other Languages
Other languages are either specified to operate within the same operating system and file system limitations as C, and as such have to deal with the same issues, or require an integrated development environment to be created before they can be used.

Common Implementations
Implementations invariably pass the sequence of characters that appears between the delimiters (when searching other places a directory path may be added) as an argument in a call to fopen or an equivalent system function. The called library function will eventually call some host operating system function that interfaces to the host file system. The C translator's behavior is thus controlled by the characteristics of the host file system and how it maps character sequences to file names. The handling of the period character varies between file systems; known behaviors include:

• Unix based file systems permit more than one period in a file name.
• MS-DOS based file systems only permit a single period in a file name.
• RISCOS, an operating system for the Acorn ARM processor, does not support file names that contain a period. For this host, file names containing a period that were specified in a #include directive were mapped using a directory structure. All file names ending in the characters .h were searched for in a directory called h.

Coding Guidelines
Because an implementation is not required to provide a unique mapping for all sequences it is possible that an unintended header or source file will be accessed, or the translator will fail to identify a known header or source file. The possible consequences of an unintended access are discussed elsewhere, while failure to identify a known header or source file will cause a diagnostic to be issued.
The cost/benefit issues associated with using character sequences having a unique mapping in the different environments that the source may be translated in are outside the scope of these coding guidelines.

1910 The first character shall not be a digit.

Commentary
This requirement only applies to the first character of the sequence that implementations are required to provide a unique mapping for. The wording was changed by the response to DR #302 (it previously read "The first character shall be a letter").

C90
The requirement that the first character not be a digit is new in C99. Given that it is more restrictive than that required for existing C90 implementations (and thus existing code) it is unlikely that existing code will be affected by this requirement.

C++
This requirement is new in C99 and is not specified in the C++ Standard (the argument given in the C90 subsection above also applies to C++).

Common Implementations
Most implementations support a first character that is not a letter.

1911 The implementation may ignore the distinctions of alphabetical case and restrict the mapping to eight significant characters before the period.
Commentary
These permissions reflect known characteristics of file systems in which translators are executed.

C90
The limit specified by the C90 Standard was six significant characters. However, implementations invariably used the number of significant characters available in the host file system (i.e., they do not artificially limit the number of significant characters). It is unlikely that a header or source file will fail to be identified because of a difference in what used to be a non-significant character.

C++
The C++ Standard does not give implementations any permissions to restrict the number of significant characters before the period (16.1p5). However, the limits of the file system used during translation are likely to be the same for both C and C++ implementations and consequently no difference is listed here.

Common Implementations
All file systems place some limits on the number of characters in a source file name, for instance:

• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict the mapping to eight significant characters before any period (and a maximum of three after it).
• POSIX requires that at least 14 characters be significant in a file name (it also requires implementations to support at least 255 characters in a pathname). Many Linux file systems support up to 255 characters in a file name and 4095 characters in a pathname.

Coding Guidelines
The potential problems associated with limits on the sequences of characters that are likely to be treated as unique are a configuration management issue that is outside the scope of these coding guidelines.

1912 A #include preprocessing directive may appear in a source file that has been read because of a #include directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).

Commentary
Thus #include directives can be nested within source files whose contents have themselves been #included.
This issue is discussed elsewhere. While this permission only applies to source files, an implementation using some form of precompiled headers (which are not source files within the standard's definition of the term) that did not support this functionality would not be popular with developers.

1913 EXAMPLE 1 The most common uses of #include preprocessing directives are as in the following:

    #include <stdio.h>
    #include "myprog.h"

Other Languages
Some languages only have a single form of #include directive for all headers.

1914 EXAMPLE 2 This illustrates macro-replaced #include directives:

    #if VERSION == 1
    #define INCFILE "vers1.h"
    #elif VERSION == 2
    #define INCFILE "vers2.h" // and so on
    #else
    #define INCFILE "versN.h"
    #endif
    #include INCFILE
Commentary
This example does not illustrate any benefit compared to that obtained from placing separate #include directives in each arm of the conditional inclusion directive.

1915 Forward references: macro replacement (6.10.3).

1916 Footnote 145) Note that adjacent string literals are not concatenated into a single string literal (see the translation phases in 5.1.1.2);

Commentary
String concatenation occurs in translation phase 6 and so it is not possible to join together two existing strings to form another string within a #include directive.

1917 thus, an expansion that results in two string literals is an invalid directive.

Commentary
It is an invalid directive in that it violates a semantic requirement and thus the behavior is undefined. It is not a syntax violation.

6.10.3 Macro replacement

Constraints

1918 Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.

Commentary
This is actually a definition in a Constraints clause (it is used by two constraints in this C subsection). The check against same spelling only needs to take into account the significant characters of an identifier. Considering all white-space separations to be identical removes the need for developers to be concerned about use of different source layout (e.g., indentation) and method of spacing (e.g., space character vs. horizontal tab).

Rationale
The specification of macro definition and replacement in the Standard was based on these principles:

• Interfere with existing code as little as possible.
• Keep the preprocessing model simple and uniform.
• Allow macros to be used wherever functions can be.
• Define macro expansion such that it produces the same token sequence whether the macro calls appear in open text, in macro arguments, or in macro definitions.

Preprocessing is specified in such a way that it can be implemented either as a separate text-to-text prepass or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of tokens.

1919 An identifier currently defined as an object-like macro shall not be redefined by another #define preprocessing directive unless the second definition is an object-like macro definition and the two replacement lists are identical.
Commentary
There was an existing body of code, containing redefinitions of the same macro, when the C Standard was first written. The C committee did not want to specify that existing code containing such usage was non-conforming, but they did consider the case where the bodies of any subsequent definitions differed to be an erroneous usage.

C90
The wording in the C90 Standard was modified by the response to DR #089.

Common Implementations
Some translators permit multiple definitions of a macro, independently of the contents of the bodies. The behavior is for a new definition to cause the previous body to be pushed, in a stack-like fashion. Any subsequent #undef of the macro name pops this stacked definition and makes it the current one.

Coding Guidelines
C permits more than one definition of the same macro name, with the same body, and more than one external definition of the same object, with the same type, and the coding guideline issues are the same for both (in both cases translators are not always required to issue a diagnostic if the definitions are considered to be different). In both cases a technique for avoiding duplicate definitions, during translation but not in the visible source, is to bracket definitions with #ifndef MACRO_NAME/#endif (in the case of the file scope object a macro name needs to be created and associated with its declaration). Using this technique has the disadvantage that it prevents the translator checking that any subsequent redeclarations of an identifier are the same (unless the bracketing occurs around the only textual declaration that occurs in any source file used to build a program).
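The redefinition rule and the bracketing technique discussed above can be sketched as follows. The names BUF_SIZE and MAX_USERS, and the helper functions, are hypothetical and are not taken from the Standard. The sketch assumes a conforming C99 translator.

```c
#include <assert.h>

/* BUF_SIZE and MAX_USERS are hypothetical names used only for illustration. */

/* Permitted redefinition: same tokens in the same order; the amount of
   white-space between tokens is not significant. */
#define BUF_SIZE (4 * 256)
#define BUF_SIZE (4   *   256)

/* #define BUF_SIZE 1024     would be a constraint violation: different tokens */

/* Bracketed definitions: only the first one is seen by the translator. */
#ifndef MAX_USERS
#define MAX_USERS 16
#endif

#ifndef MAX_USERS
#define MAX_USERS 32   /* skipped, so the differing body is never diagnosed */
#endif

int buf_size(void)  { return BUF_SIZE; }
int max_users(void) { return MAX_USERS; }

void self_check(void)
{
    assert(buf_size() == 1024);
    assert(max_users() == 16);
}
```

The second bracketed definition shows the disadvantage noted in the Coding Guidelines: because it is skipped, the translator has no opportunity to report that its body differs from the first.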
1920 Likewise, an identifier currently defined as a function-like macro shall not be redefined by another #define preprocessing directive unless the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical.

Commentary
The issues are the same as for object-like macros, with the addition of checks on the parameters. Requiring that the parameters be spelled the same, rather than, for instance, that they have an identical effect, simplifies the similarity checking of two macro bodies. For instance, in:

    #define FM(foo) ((foo) + x)
    #define FM(bar) ((bar) + x)

a translator is not required to deduce that the two definitions of FM are structurally identical.

1921 There shall be white-space between the identifier and the replacement list in the definition of an object-like macro.

Commentary
In the following (assuming $ is a member of the extended character set and permitted in an identifier preprocessing token):

    #define A$ x

an object-like macro with the name A$ and the body x is defined, not a macro with the name A and the body $ x. There is no requirement that there be white-space following the ) in a function-like macro definition.

C90
The response to DR #027 added the following requirements to the C90 Standard.

DR #027
Correction
Add to subclause 6.8, page 86 (Constraints):
In the definition of an object-like macro, if the first character of a replacement list is not a character required by subclause 5.2.1, then there shall be white-space separation between the identifier and the replacement list.*
[Footnote *: This allows an implementation to choose to interpret the directive:

    #define THIS$AND$THAT(a, b) ((a) + (b))

as defining a function-like macro THIS$AND$THAT, rather than an object-like macro THIS. Whichever choice it makes, it must also issue a diagnostic.]

However, the complex interaction between this specification and UCNs was debated during the C9X review process and it was decided to simplify the requirements to the current C99 form.

    #define TEN.1 /* Define the macro TEN to have the body .1 in C90. */
                  /* A constraint violation in C99. */

C++
The C++ Standard specifies the same behavior as the C90 Standard.

Common Implementations
HP (formerly DEC) treats $ as part of the spelling of the macro name.

1922 If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments (including those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal the number of parameters in the macro definition.

Commentary
This requirement is the macro invocation equivalent of the one for function calls.

C90
If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.

The behavior of the following was discussed in DR #003q3, DR #153, and raised against C99 in DR #259 (no committee response was felt necessary).

    #define foo() A
    #define bar(B) B

    foo() // no arguments
    bar() // one empty argument?

What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99.
The two most likely C90 translator undefined behaviors are either to support them (existing source developed using such a translator may contain empty arguments in a macro invocation), or to issue a diagnostic (existing source developed using such a translator will not contain any empty arguments in a macro invocation).

C++
The C++ Standard contains the same wording as the C90 Standard. C++ translators are not required to correctly process source containing macro invocations having any empty arguments.
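The C99 treatment of an empty argument can be illustrated with a small sketch. The macros FOO and BAR and the wrapper functions are hypothetical, and the code assumes a C99 (or later) translator. A C90 translator is not required to accept the BAR() invocation.

```c
#include <assert.h>

/* FOO and BAR are hypothetical macros used only for illustration. */
#define FOO()   10         /* takes no arguments */
#define BAR(B)  (B + 1)    /* takes exactly one argument */

int call_foo(void)       { return FOO(); }   /* no arguments */
int call_bar_empty(void) { return BAR(); }   /* one empty argument: ( + 1) */
int call_bar_two(void)   { return BAR(2); }  /* expands to (2 + 1) */

void self_check(void)
{
    assert(call_foo() == 10);
    assert(call_bar_empty() == 1);  /* B replaced by no tokens, leaving unary + */
    assert(call_bar_two() == 3);
}
```

In BAR() the parameter B is replaced by no preprocessing tokens, so the count of arguments (one, empty) still equals the count of parameters.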
Common Implementations
Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a constraint violation).

1923 Otherwise, there shall be more arguments in the invocation than there are parameters in the macro definition (excluding the ...).

Commentary
Rationale: There must be at least one argument to match the ellipsis. This requirement avoids the problems that occur when the trailing arguments are included in a list of arguments to another macro or function. For example, if dprintf had been defined as

    #define dprintf(format,...) \
        dfprintf(stderr, format, __VA_ARGS__)

and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded form. While some implementations have used various notations or conventions to work around this problem, the Committee felt it better to avoid the problem altogether.

C90
Support for the form ... is new in C99.

C++
Support for the form ... is new in C99 and is not specified in the C++ Standard.

Common Implementations
gcc allowed zero arguments to match a macro parameter defined using the ... form.

Coding Guidelines
While some developers may be confused because the requirements on the number of arguments are different from functions defined using the ellipsis notation, passing too few arguments is a constraint violation (i.e., translators are required to issue a diagnostic that a developer then needs to correct).

1924 There shall exist a ) preprocessing token that terminates the invocation.

Commentary
While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing token to occur before any macro replacement of the identifiers following the matching ( preprocessing token.
For instance, in:

    #define R_PAREN )

    #define FUNC(a) a

    static int glob = (1 + FUNC(1 R_PAREN );

the invocation is terminated by the ) preprocessing token that occurs immediately before ;, not the expanded form of R_PAREN.

1925 The identifier __VA_ARGS__ shall occur only in the replacement-list of a function-like macro that uses the ellipsis notation in the parameters.
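A conforming use of __VA_ARGS__ might look like the following sketch. The macro FORMAT_TO and the functions around it are hypothetical. The format string is deliberately made part of the variable arguments, which is one common way of guaranteeing that at least one argument matches the ellipsis. A hosted C99 implementation providing snprintf is assumed.

```c
#include <assert.h>
#include <stdio.h>

/* FORMAT_TO is a hypothetical macro. __VA_ARGS__ appears only in the
   replacement list of a macro whose parameter list ends in an ellipsis. */
#define FORMAT_TO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

int format_pair(char *buf, size_t size, int a, int b)
{
    /* Three arguments match the ellipsis: the format and two values. */
    return FORMAT_TO(buf, size, "%d-%d", a, b);
}

int format_plain(char *buf, size_t size)
{
    /* One argument matches the ellipsis, so no trailing comma is produced. */
    return FORMAT_TO(buf, size, "plain");
}

void self_check(void)
{
    char buf[16];

    assert(format_pair(buf, sizeof buf, 1, 2) == 3);
    assert(buf[0] == '1' && buf[1] == '-' && buf[2] == '2' && buf[3] == '\0');
    assert(format_plain(buf, sizeof buf) == 5);
}
```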
Commentary
This requirement simplifies a translator's processing of occurrences of the identifier __VA_ARGS__. The typographical correction (replacing "arguments" by "parameters") was made by the response to DR #234.

C90
Support for __VA_ARGS__ is new in C99. Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue a diagnostic (the behavior was undefined in C90).

C++
Support for __VA_ARGS__ is new in C99 and is not specified in the C++ Standard.

Common Implementations
gcc required developers to give a name to the parameter that accepted a variable number of arguments. This parameter name appeared in the replacement list wherever the variable number of arguments was to be substituted.

Example

    /*
     * The following are constraint violations.
     */
    #define __VA_ARGS__
    #define jparks __VA_ARGS__
    #define jparks(__VA_ARGS__)
    #define jparks(__VA_ARGS__, ...) __VA_ARGS__

    #define jparks(x) x
    jparks(__VA_ARGS__)

    #define jparks(x, ...) x
    jparks(__VA_ARGS__,1)
    /*
     * The following break the spirit, if not the wording
     * of this constraint.
     */
    #define jparks(x, y) x##y
    jparks(__VA, _ARGS__)

    #define jparks(x, y, ...) x##y
    jparks(__VA, _ARGS__, 1)

1926 A parameter identifier in a function-like macro shall be uniquely declared within its scope.

Commentary
This constraint is the macro equivalent of the one given for objects with no linkage. Its scope is the list of parameters in the macro definition and the body of that definition. This scope ends at the new-line that terminates the directive. Macro parameters are also discussed elsewhere.

Semantics

1927 The identifier immediately following the define is called the macro name.

Commentary
This defines the term macro name.
This term is generically used in software engineering to refer to this kind of entity.