The New C Standard- P16


6.10.1 Conditional inclusion

• The specification has changed between C90 and C99.

The problem with any guideline recommendation is that the total cost is likely to be greater than the total benefit (a cost is likely to be incurred in many cases and a benefit obtained in very few cases). For this reason no recommendation is made here.

The discussion on suffixed integer constants is also applicable in the context of a conditional inclusion directive.

Example

In the following the developer may assume that unwanted higher bits in the value of C will be truncated when shifted left.

    #define C 0x1100u
    #define INT_BITS 32

    #define TOP_BYTE (C
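The listing above is cut off in this copy. The following is a separate, hypothetical sketch of the same pitfall; the definition of TOP_BYTE is an assumption, not the book's elided text, and a 32-bit unsigned int is assumed. Inside a #if expression unsigned constants are evaluated as if they had type uintmax_t, so bits shifted above bit 31 are kept, whereas in translation phase 7 they are discarded.

```c
/* Hypothetical sketch; assumes unsigned int is 32 bits wide. */
#define C        0x1100u
#define INT_BITS 32
#define TOP_BYTE (C << (INT_BITS - 8))   /* assumed definition */

/* Returns 1 if the #if directive sees TOP_BYTE as zero. In #if arithmetic
   C acts as a uintmax_t (at least 64 bits), so no high bits are lost. */
int top_byte_zero_in_if(void)
{
#if TOP_BYTE == 0
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if TOP_BYTE is zero in translation phase 7, where C has type
   unsigned int and the bits shifted past bit 31 are discarded. */
int top_byte_zero_in_code(void)
{
    return TOP_BYTE == 0;
}
```

On such an implementation the first function returns 0 and the second returns 1, so the same macro controls a #if differently from an if statement.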
Commentary

The guarantee that the value of a member of the basic execution character set stored in a char object is nonnegative does not apply during preprocessing. For instance, an implementation might perform preprocessing using the EBCDIC character set while acting as if the type char were signed, in which case the digit characters (0xF0 to 0xF9 in EBCDIC) would have negative values. In other contexts the value of a character constant containing a single character that is not a member of the basic execution character set is implementation-defined.

Coding Guidelines

The discussion on the possibility of character constants having other implementation-defined values is applicable here.

1884 Preprocessing directives of the forms

    # ifdef identifier new-line group_opt
    # ifndef identifier new-line group_opt

check whether the identifier is or is not currently defined as a macro name.

Commentary

There is no #elifdef form (although over half of the uses of the #elif directive are followed by a single instance of the defined operator; see Table 1872.1).

1885 Their conditions are equivalent to #if defined identifier and #if !defined identifier respectively.

Commentary

The #ifdef and #ifndef forms are rather like the unary ++ and -- operators in that they provide a shorthand notation for commonly used functionality.

Coding Guidelines

The #ifdef forms are the most common form of conditional inclusion directive. Measurements (see Table 1872.1) also show that nearly a third of the uses of the defined operator could be replaced by one of these forms. There are advantages (e.g., the most common form is likely to be the form readers have most practice with, and the left edge of the source is easy to scan visually) and disadvantages (e.g., more effort is required to add further conditions to the single test being made) to using the #ifdef forms instead of the defined operator. However, there does not appear to be a worthwhile cost/benefit in recommending one of the possibilities.
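The equivalence stated in sentence 1885 can be checked directly. In this minimal sketch FEATURE_X is a hypothetical macro (as if supplied with -DFEATURE_X) and FEATURE_Y is deliberately left undefined; each function reports which group the preprocessor kept.

```c
/* Hypothetical feature macro; FEATURE_Y is intentionally not defined. */
#define FEATURE_X

int ifdef_form(void)
{
#ifdef FEATURE_X
    return 1;
#else
    return 0;
#endif
}

int defined_form(void)
{
#if defined FEATURE_X          /* same condition as #ifdef FEATURE_X */
    return 1;
#else
    return 0;
#endif
}

int ifndef_form(void)
{
#ifndef FEATURE_X
    return 1;
#else
    return 0;
#endif
}

int not_defined_form(void)
{
#if !defined FEATURE_X         /* same condition as #ifndef FEATURE_X */
    return 1;
#else
    return 0;
#endif
}

/* C99 has no #elifdef (one was added later, in C23), so a chained test
   has to spell out the defined operator. */
int elif_form(void)
{
#if defined FEATURE_Y
    return 2;
#elif defined FEATURE_X
    return 1;
#else
    return 0;
#endif
}
```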
1886 Footnote 142) Thus on an implementation where INT_MAX is 0x7FFF and UINT_MAX is 0xFFFF, the constant 0x8000 is signed and positive within a #if expression even though it is unsigned in translation phase 7.

Commentary

The wording was changed by the response to DR #265.

1887 Footnote 143) Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

    #if 'z' - 'a' == 25
    if ('z' - 'a' == 25)

Commentary

This situation could occur, for instance, if the ASCII representation were used during the preprocessing phases and EBCDIC were used during translation phase 5.

1888 Each directive's condition is checked in order.

v 1.2 June 24, 2009
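Both footnotes can be observed on a common host. This sketch assumes a 32-bit int and an ASCII execution character set (assumptions, not requirements of the standard). It scales footnote 142 up: with a 32-bit int the hexadecimal constant 0x80000000 has type unsigned int in translation phase 7, but within a #if expression it acts as an intmax_t and is signed and positive.

```c
/* Assumes a 32-bit int and an ASCII execution character set. */

/* In a #if expression 0x80000000 acts as an intmax_t (at least 64 bits),
   so its negation is a negative value. */
int negation_negative_in_if(void)
{
#if -0x80000000 < 0
    return 1;
#else
    return 0;
#endif
}

/* In phase 7 0x80000000 is an unsigned int, so its negation is unsigned
   and can never be less than zero. */
int negation_negative_in_code(void)
{
    return -0x80000000 < 0;
}

/* Footnote 143: the two contexts agree on an ASCII host; they need not
   agree when preprocessing and execution use different character sets. */
int letters_contiguous_in_if(void)
{
#if 'z' - 'a' == 25
    return 1;
#else
    return 0;
#endif
}

int letters_contiguous_in_code(void)
{
    return 'z' - 'a' == 25;
}
```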
Commentary

The order is from the lowest line number to the highest line number.

Coding Guidelines

It may be possible to obtain some translation-time performance advantage (at least for the original developer) by appropriately ordering the directives. Unlike their behavior with if statements, developers do not usually aim to optimize speed of translation when deciding how to order conditional inclusion directives (experience suggests that developers often simply append new directives to the end of any existing directives).

Recognizing a known pattern in a sequence of directives has several benefits for readers. They can make use of any previous deductions they have made on how to interpret the directives and what they represent, and the usage highlights common dependencies in the source. In the following code fragment more reader effort is required to spot similarities in the order in which directives are checked than if both sequences of directives had occurred in the same order.

    #ifdef MACHINE_A
    /* ... */
    #else
    #ifdef MACHINE_B
    /* ... */
    #endif
    #endif

    #ifdef MACHINE_B
    /* ... */
    #else
    #ifdef MACHINE_A
    /* ... */
    #endif
    #endif

Given the lack of attention developers pay to the relative ordering of directives, and the benefits of using the same ordering, a guideline recommendation appears worthwhile where possible. However, a guideline recommendation needs to be automatically enforceable, and determining when two sequences of directives have the same effect during translation may be infeasible, because information that is not contained within the source may be required (e.g., dependencies between macro names that are likely to be defined via translator command line options).
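Why the effect of two orderings cannot be decided from the source alone can be seen by defining both macros. This minimal sketch assumes, hypothetically, that MACHINE_A and MACHINE_B both end up defined (e.g., via command line options); the two sequences from the fragment above then select different groups.

```c
/* Hypothetical: both macros defined, as if passed with -D options. */
#define MACHINE_A
#define MACHINE_B

/* Same structure as the first sequence: MACHINE_A is checked first. */
char select_a_then_b(void)
{
#ifdef MACHINE_A
    return 'A';
#else
#ifdef MACHINE_B
    return 'B';
#endif
#endif
    return '-';   /* reached only when neither macro is defined */
}

/* Same structure as the second sequence: MACHINE_B is checked first. */
char select_b_then_a(void)
{
#ifdef MACHINE_B
    return 'B';
#else
#ifdef MACHINE_A
    return 'A';
#endif
#endif
    return '-';
}
```

When at most one of the macros is defined the two functions agree; a tool inspecting only the source cannot know which case applies.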
Rev 1888.1 Where possible the visual order of evaluation of expressions within different sequences of nested conditional inclusion directives shall be the same.

1889 If it evaluates to false (zero), the group that it controls is skipped: directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals;

Commentary

A parallel can be drawn with the behavior of if statements: if their controlling expression evaluates to zero during program execution, the statements in the associated block are skipped.

1890 directives are processed only through the name that determines the directive in order to keep track of the level of nested conditionals;

Commentary

The preprocessor operates on a representation of the source written by the developer, not on translated machine code. As such, it needs to perform some processing on its input to be able to deduce when to stop skipping.
Figure 1889.1 (plot not reproduced; panels: Top level files, Translation units; x axis: Physical lines skipped; series: #if part, #else part): Number of (left) top-level source files (i.e., the contents of any included files are not counted) and (right) complete translation units (including the contents of any files #included more than once) having a given number of lines skipped during translation of this book's benchmark programs.

Directives need to be processed to keep track of the level of nesting of conditionals, and translation phases 1–3 still need to be performed (line splicing could affect what is or is not the start of a line, and characters within a comment must not be treated as directives). The intent of only requiring a minimum of directive processing while skipping is to enable partially written source code to be skipped, and to allow preprocessors to optimize their performance in this special case, speeding up the rate at which the input is processed.
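The nesting-only processing described above can be sketched as follows. This is a deliberately simplified, hypothetical routine (skip_group and directive_name are invented names, not taken from any real preprocessor): it works on an array of lines that are assumed to be already spliced, and it ignores comments, both of which a real implementation must handle in translation phases 1–3. Only the directive name is examined, so a malformed directive inside the skipped group does no harm.

```c
#include <ctype.h>
#include <string.h>

/* Copies the directive name of line (e.g. "if", "endif") into buf, or
   stores "" when the line is not a directive. Only the name is examined. */
static void directive_name(const char *line, char *buf, size_t size)
{
    size_t n = 0;

    while (isspace((unsigned char)*line))
        line++;
    if (*line != '#') {
        buf[0] = '\0';
        return;
    }
    line++;
    while (isspace((unsigned char)*line))
        line++;
    while (isalpha((unsigned char)*line) && n + 1 < size)
        buf[n++] = *line++;
    buf[n] = '\0';
}

/* Skips a group whose condition was false, starting at lines[i].
   Returns the index of the #elif, #else or #endif that ends the group at
   nesting depth zero, or count if there is none. */
size_t skip_group(const char *const lines[], size_t count, size_t i)
{
    int depth = 0;
    char name[16];

    for (; i < count; i++) {
        directive_name(lines[i], name, sizeof name);
        if (strcmp(name, "if") == 0 || strcmp(name, "ifdef") == 0 ||
            strcmp(name, "ifndef") == 0) {
            depth++;                       /* nested conditional opened */
        } else if (strcmp(name, "endif") == 0) {
            if (depth == 0)
                return i;
            depth--;                       /* nested conditional closed */
        } else if (depth == 0 &&
                   (strcmp(name, "elif") == 0 || strcmp(name, "else") == 0)) {
            return i;
        }
        /* Any other line, including a malformed #define, is ignored. */
    }
    return count;
}
```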
Example

    #if 1
    extern int ei;

    #elif " an unmatched quote character, undefined behavior

    extern int foo_bar;
    #endif

    #if 0
    printf("\
    #endif \n");

    #endif

    #if 0
    /*
    #endif
    */
    #endif

1891 the rest of the directives' preprocessing tokens are ignored, as are the other preprocessing tokens in the group.

Commentary

There is no requirement that any directive be properly formed according to the preprocessor syntax. However, preprocessing tokens still need to be created before they are ignored (as part of translation phase 3).
Example

In the following the #define directive is not well formed. But because this group is being skipped, the translator is required to ignore this fact.

    #if 0
    #define M(e
    #endif

1892 Only the first group whose control condition evaluates to true (nonzero) is processed.

Commentary

This group is processed exactly as if it appeared in the source outside of any group.

1893 If none of the conditions evaluates to true, and there is a #else directive, the group controlled by the #else is processed;

Commentary

A semantic rule to associate #else with the lexically nearest preceding #if (or similar form) directive, like the one given for if statements, is not needed because conditional inclusion is terminated by a #endif. As in the case of the matching #if (or similar form) directive, all preprocessing tokens in the group are treated as if they appeared outside of any conditional inclusion directive. Processing continues until the first #endif is encountered (which must match the opening directive).

Coding Guidelines

The arguments made for if statements always containing an else arm might be thought to also apply to conditional inclusion. However, the presence of a matching #endif directive reduces the likelihood that readers will confuse which preprocessing directive any #else associates with (although other issues, such as lack of indentation or a large number of source lines between directives, can make it difficult to visually associate matching directives).

1894 lacking a #else directive, all the groups until the #endif are skipped.144)

Commentary

The effect of this specification mimics the behavior of if statements.

1895 Forward references: macro replacement (6.10.3), source file inclusion (6.10.2), largest integer types (7.18.1.5).
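Sentences 1892 to 1894 can be illustrated with one hypothetical configuration macro, LEVEL (an invented name). In the first function two conditions are true but only the first group is kept; in the second no condition is true and, with no #else, every group is skipped; in the third the #else group is processed.

```c
/* Hypothetical configuration value. */
#define LEVEL 2

/* Only the first group whose condition is true is processed. */
int first_true_group(void)
{
#if LEVEL > 1
    return 1;      /* selected */
#elif LEVEL > 0
    return 2;      /* condition also true, but this group is skipped */
#else
    return 3;
#endif
}

/* No condition is true and there is no #else: all groups are skipped. */
int no_group_selected(void)
{
    int group = 0;
#if LEVEL > 5
    group = 1;
#elif LEVEL > 4
    group = 2;
#endif
    return group;
}

/* No condition is true, so the group controlled by #else is processed. */
int else_group(void)
{
#if LEVEL > 5
    return 1;
#else
    return 3;
#endif
}
```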
6.10.2 Source file inclusion

Constraints

1896 A #include directive shall identify a header or source file that can be processed by the implementation.

Commentary

There is no requirement that a header be represented using a source file. It could be represented using prebuilt information within the translator that is enabled only when the appropriate #include directive is encountered during preprocessing (but not in a group that is skipped). Also, there is no requirement that the spelling of the header in the C source file be represented by a source file of the same spelling. The C Standard has no explicit knowledge of file systems and is silent on the issue of directory structures. Minimum required limits on the implementation's processing of a header name are specified elsewhere.

Failure to locate a header or source file that can be processed by the implementation (e.g., a file of the specified name does not exist, at least along the places searched) is a constraint violation.
Other Languages

Most languages do not specify a #include mechanism, although many of their implementations provide one. The approach commonly used by C implementations is popular, but not universal. Some languages explicitly state that a #include directive denotes a file of the given name in the translator's host environment.

Common Implementations

For most implementations the header name maps to a file name of the same spelling. It is quite common for the translation environment to ignore the case of alphabetic letters (e.g., MS-DOS and early versions of Microsoft Windows), or to limit the number of significant characters in the file name denoted by a header name (the remaining characters being ignored). Use of the / character in specifying a full path to a file is sufficiently common that even host environments where this character is not normally a directory separator support its use in header names (many Microsoft Windows translators support this character, as well as the \ character, as a directory separator).

In the majority of implementations #include directives specify files containing source in text form. However, some implementations support what are known as precompiled headers.

It is not uncommon (over 10% of #includes in Figure 1896.1) for the same header to be #included more than once when translating a source file (implementations are required to support this usage for standard headers). The following are some of the techniques implementations use to reduce the overhead of subsequent #includes:

• A common convention is to bracket the contents of a header, starting with the preprocessing token sequence #ifndef __H_file_name__ / #define __H_file_name__ and ending with #endif. The processing of subsequent #includes of the same header is then reduced to the minimal processing needed to skip to the matching #endif.
Some implementations (e.g., gcc) go one step further: they detect headers that contain such bracketing the first time they are processed, and completely skip opening and processing the header if it is subsequently encountered again in a #include directive.

• Supporting the preprocessing directive #import.[359] This directive is equivalent to the #include directive, except that if the specified header has already been included it is not included again.

Coding Guidelines

Some coding guideline documents recommend that implementation-supplied headers appear before developer-written headers in a source file. Such recommendations overlook the possibility that a developer-written header might itself #include an implementation header.

Figure 1896.1 (plot not reproduced; y axis: Number of #includes, x axis: Times #included; series: All #includes, User #includes, Nested user #includes): Number of times the same header was #included during the translation of a single translation unit. The crosses denote all headers (i.e., all system headers are counted), triangles denote all headers delimited by quotes (i.e., likely to be user-defined headers) and bullets denote all quote-delimited headers #included nested at least three levels deep. Based on the translated form of this book's benchmark programs.
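The bracketing convention can be seen without creating any files by writing out what the translator receives after a header has been #included twice. This is a minimal sketch for a hypothetical header point.h. The guard is spelled H_POINT_H rather than __H_point_h__, because identifiers beginning with two underscores are reserved for the implementation.

```c
/* The text the translator sees after a hypothetical header point.h has
   been #included twice. */

/* First inclusion: H_POINT_H is not yet defined, so the contents are kept. */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };
#endif

/* Second inclusion: H_POINT_H is now defined, so this group is skipped and
   struct point is not defined a second time (which C99 does not allow). */
#ifndef H_POINT_H
#define H_POINT_H
struct point { int x, y; };
#endif

int point_sum(struct point p)
{
    return p.x + p.y;
}
```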
Figure 1896.2 (plot not reproduced; y axis: Translation units, x axis: Unnecessary headers #include'd): Number of preprocessing translation units (excluding system headers) containing a given number of #includes whose contents are not referenced during translation (excludes the case where the same header is #included more than once, see Figure 1896.1). Based on the translated form of this book's benchmark programs.

Figure 1896.3 (plot not reproduced; y axis: Source files, x axis: #includes; series: <header>, "header"): Number of .c source files containing a given number of #include directives (dashed lines represent number of unique headers). Based on the visible form of the .c files.

Experience suggests that once a #include directive appears in a source file it is rarely removed (see Figure 1896.2) and that new #include directives are simply added after the last one. The issue of redundant code is discussed elsewhere.

There does not appear to be a worthwhile benefit in ordering #include directives in any way (apart from any relative ordering dictated by dependencies between headers).

Table 1896.1: Occurrence of the two forms of header-names (as a percentage of all #include directives), the percentage of each kind that specifies a path to the header file, and the number of absolute paths specified. Based on the visible form of the .c files.

Header form        % Occurrence  % Uses path  Absolute paths
<h-char-sequence>          75.0         86.4               0
"q-char-sequence"          25.0         17.2               0

Semantics

1897 A preprocessing directive of the form

    # include <h-char-sequence> new-line
searches a sequence of implementation-defined places for a header identified uniquely by the specified sequence between the < and > delimiters, and causes the replacement of that directive by the entire contents of the header.

Figure 1896.4 (plot not reproduced; y axis: Occurrences of header name, x axis: Rank; series: <header>, "header"): header-name rank (based on character sequences appearing in #include directives) plotted against the number of occurrences of each character sequence. Also see Figure 792.26. Fitting a power law using MLE for <header-name> and "header-name" gives, respectively, an exponent of -2.26, xmin = 8, and -1.8, xmin = 9. Based on the visible form of the .c files.

Commentary

File systems invariably provide a unique method of identifying every file they contain (e.g., a full path name). The base document recognized the disadvantages of requiring that the full path name be specified in each #include directive and permitted a substring of it to be given. The implementation-defined places are usually additional character sequences (e.g., directory names) added to the h-char-sequence in an attempt to create a full path name that refers to an existing file.

Rationale

The file search rules used for the filename in the #include directive were left as implementation-defined. The Standard intends that the rules which are eventually provided by the implementor correspond as closely as possible to the original K&R rules.
The primary reason that explicit rules were not included in the Standard is the infeasibility of describing a portable file system structure. It was considered unacceptable to include UNIX-like directory rules due to significant differences between this structure and other popular commercial file system structures.

Nested include files raise an issue of interpreting the file search rules. In UNIX C a #include directive found within an included file entails a search for the named file relative to the file system directory that holds the outer #include. Other implementations, including the earlier UNIX C described in K&R, always search relative to the same current directory. The C89 Committee decided in principle in favor of K&R approach, but was unable to provide explicit search rules as explained above.

Other Languages

Other languages (or an extension provided by their implementations) commonly use the double-quote delimited form.

Common Implementations

The character sequence between the < and > delimiters is invariably treated as the name of a file, possibly including a path. The ordering of the search sequence used for directives having the <h-char-sequence> form is often different from that used for the "q-char-sequence" form. For instance, in the <h-char-sequence> case the contents of /usr/include might be searched first, followed by the contents of the directory containing the .c file, while in the "q-char-sequence" case the contents of the directory containing the .c file might be searched first, followed by other places.
The environment in which a translator executes may also affect the sequence of places that are searched; for instance, the effect of relative path names (e.g., ../proj/abc.h) depends on the identity of the current directory.

gcc searches two directories: /usr/include and another directory that holds very machine-specific files, such as stdarg.h (e.g., /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/include on your author's computer).

gcc supports the #include_next directive. This directive causes the search algorithm to skip some of the initial implementation-defined places that would normally be searched. The initial places that are skipped are those that were searched in locating the file containing the #include_next directive (including the place where the search succeeded).

Tzerpos and Holt[1416] describe a well-formedness theory of header inclusion that enables unnecessary #include directives to be deduced.

Coding Guidelines

The standard does not specify the order in which the implementation-defined places are searched. This is a potential coding guideline issue because an h-char-sequence may match in more than one of the places (i.e., there is a file having the same name along several of the different possible search paths). The behavior then depends on the order in which the places are searched (it is assumed that the contents of the different headers will be different).

Experience suggests that a translator locating an #included file different from the one the developer expected has one of two consequences: (1) when the contents of the file accessed are similar to the one intended (e.g., a different version of the intended file) the source file may be successfully translated, and (2) when the contents of the file accessed have no connection with the intended file the source is rarely successfully translated.
The problem might therefore be considered to be one of version management, rather than the choice of characters used in an h-char-sequence. There are a number of reasons why a solution to this issue is to not use h-char-sequences at all, including the following:

• For the < > delimited form, implementations usually look in a predefined location first (as described in the Common Implementations section above and in the following C sentence). Ensuring that the names chosen by developers for the headers they create are different from those of system headers is an almost impossible task. While it might be possible to enumerate the names of existing system header files contained in commercially important environments, members are likely to be added to this set on a regular basis. Rather than trying to avoid using file names likely to match those of system headers, developers could ensure that places containing system headers are searched last.

• The < > delimited form is often considered to denote externally supplied headers (e.g., provided by the implementation or translation environment vendor). What constitutes a system-supplied header is open to interpretation. One distinction that can be made between system and developer headers is that developers do not control the contents of system headers. Consequently, it can be argued that their contents are not subject to coding guidelines. Headers whose contents have been written by developers are subject to coding guidelines. The convention generally adopted to indicate this status is to use the double-quote delimited form of #include.

Rev 1897.1 Developer written headers in a #include directive shall not be delimited by the < and > characters.

Developers sometimes specify path names in header names (see Table 1896.1). This is a configuration management issue and is not considered to be within the scope of these coding guidelines.
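The order dependence discussed above can be modelled in a few lines. In this hypothetical sketch the file system is replaced by an in-memory table of directory and file name pairs (all names are invented for illustration), and the search returns the first place holding a file of the requested name. Because two places hold a file called config.h, reordering the places changes which file is found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the host file system. */
struct entry { const char *dir; const char *file; };

static const struct entry files[] = {
    { "/usr/include",  "stdio.h"  },
    { "/usr/include",  "config.h" },   /* a system header ...            */
    { "/home/dev/inc", "config.h" },   /* ... and a developer's namesake */
};

static int exists(const char *dir, const char *file)
{
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(files[i].dir, dir) == 0 && strcmp(files[i].file, file) == 0)
            return 1;
    return 0;
}

/* Returns the first place in places[0..n-1] containing name, or NULL. */
const char *search_places(const char *name, const char *const places[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (exists(places[i], name))
            return places[i];
    return NULL;
}
```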
Table 1897.1: Number of various kinds of identifiers declared in the headers contained in the /usr/include directory of some translation environments. Information was automatically extracted and represents an approximate lower bound. Versions of the translation environments from approximately the same year (mid 1990s) were used. The counts for ISO C assume that the minimum set of required identifiers is declared and exclude the type generic macros.

Information                        Linux 2.0  AIX on RS/6000  HP/UX 9  SunOS 4  Solaris 2  ISO C
Number of headers                      2,006           1,514    1,264      987      1,495     24
Macro definitions                     10,252          18,637   13,314   11,987     10,903    446
Identifiers with external linkage      1,672           1,542    1,935      616      1,281    487
Identifiers with internal linkage         80              34     2012        0          5      0
Tag declarations                         716           1,088      899    1,208        945      3
Typedef names declared                 1,024             828       15      493      1,027     55

1898 How the places are specified or the header identified is implementation-defined.

Commentary

The differences between the environments in which translation occurs have narrowed over the years. However, even though there may be much common practice, such issues are considered to be outside the scope of the C Standard.

Common Implementations

Implementations invariably search one or more predefined locations first (e.g., /usr/include), followed by a list of alternative places. A number of techniques are used to allow developers to specify a list of alternative places to be searched for files corresponding to the headers specified in a #include directive.
For instance, the alternative places may be specified via a translator command line option (e.g., -I), in a translator configuration file (e.g., gcc version 2.91.66 hosted on RedHat Linux reads many default locations from the file /usr/lib/gcc-lib/i386-redhat-linux/egcs-2.91.66/specs, although the path /usr/include is still hard coded in the translator sources), or via an environment variable (e.g., several Microsoft Windows-based translators use INCLUDE).

The directory separators used in Unix and MS-DOS slant in different directions. Many implementations, in both environments, recognize both characters as directory delimiters. One consequence of this is that escape sequences are not recognized as such (something that is unlikely to be a problem in header names).

The RISCOS environment does not support filenames ending in .h. The implementation-defined behavior for this host is to look in a directory called h for a file of the given name with the .h removed.

Coding Guidelines

The implementation-defined behavior associated with how the places are specified occurs outside of the source code and is the remit of configuration management guidelines. For this reason nothing further is said here.

1899 A preprocessing directive of the form

    # include "q-char-sequence" new-line

causes the replacement of that directive by the entire contents of the source file identified by the specified sequence between the " delimiters.

Commentary

The commonly accepted intent of this form of the #include directive is that it is used to reference source files created by developers (i.e., headers that are not provided as part of the implementation or host environment). The only syntactic difference between q-char-sequence and h-char-sequence is the delimiter each may not contain (" for a q-char-sequence, > for an h-char-sequence).

Most q-char-sequences end with one of two character sequences (i.e., .c or .h).
The character sequence before these suffixes is often called the header name.
Other Languages

The use of double-quote as the delimiter is the almost universal form used in other languages (although some use the ' character because that is what they use to delimit string literals).

Coding Guidelines

The term commonly used to refer to these source files is header, with the context of the conversation often being used to distinguish any other intended usage. The intent is that the contents of these source files are controlled by developers, and as such they are subject to coding guidelines.

1900 The named source file is searched for in an implementation-defined manner.

Commentary

While this "implementation-defined manner" might be the same as that for the < > delimited form, the intent is for it to be sufficiently different that developers do not need to be concerned about the name of a header created by them matching one provided as part of the implementation (and therefore potentially found by the translator when searching for a matching header). For instance, your author does not know the names of most of the 304 files (e.g., compface.h) contained in /usr/include on his software development computer.

The discussion on the < > delimited form is applicable here.

Common Implementations

The search algorithm used invariably differs from that used for the < > delimited form (otherwise there would be little point in distinguishing the two cases).

The search algorithm used by some implementations is to first look in the directory containing the source file currently being translated (which may itself have been included). If that search fails, and the current source file has itself been included, the directory containing the source file that #included it is then searched, this process continuing back through any nested #include directives.
For instance, in:

    /* file_1.c */
    #include "abc.h"

    /* file_2.c */
    #include "/foo/file_1.c"

    /* file_3.c */
    #include "/another/path/file_2.c"

(assuming the translation environment supports the path names used), translating the source file file_3.c causes file_2.c to be included, which in turn includes file_1.c. The source file abc.h will be searched for in the directories /foo, /another/path and then the directory containing file_3.c.

Some implementations use the double-quote delimited form within their system headers to change the default first location that is searched. For instance, a third-party API may contain the header abc.h, which in turn needs to include ayx.h. Using the form "ayx.h" means that the implementation will search in the directory containing abc.h first, not /usr/include. This usage can help localize the files that belong to specific APIs.

Other implementations use a search algorithm that starts with the directory containing the original source file being translated.

If the source file is not found after these places have been searched, some implementations then search other places specified via any translator options. Other implementations simply follow the behavior described by the following C sentence (which has the consequence of eventually checking these other places).

1901 If this search is not supported, or if the search fails, the directive is reprocessed as if it read

    # include <h-char-sequence> new-line

with the identical contained sequence (including > characters, if any) from the original directive.
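The nested search described above, followed by the fallback required by sentence 1901, can be sketched in the same spirit. This is a hypothetical model, not any particular implementation's algorithm: the file system is an in-memory table, includers lists the directory of each currently open file (outermost first; /src is an invented directory standing for the one holding file_3.c), and the < > places are tried only when every includer directory fails.

```c
#include <stddef.h>
#include <string.h>

struct entry { const char *dir; const char *file; };

/* Hypothetical file system contents. */
static const struct entry files[] = {
    { "/src",          "abc.h"   },
    { "/another/path", "util.h"  },
    { "/usr/include",  "stdio.h" },
};

static int exists(const char *dir, const char *file)
{
    for (size_t i = 0; i < sizeof files / sizeof files[0]; i++)
        if (strcmp(files[i].dir, dir) == 0 && strcmp(files[i].file, file) == 0)
            return 1;
    return 0;
}

/* includers[0] is the directory of the file being translated and
   includers[depth - 1] that of the innermost #included file. These are
   tried from the innermost outwards; if none holds name, the directive is
   treated as #include <name> and the sys places are tried in order. */
const char *search_quoted(const char *name,
                          const char *const includers[], size_t depth,
                          const char *const sys[], size_t nsys)
{
    for (size_t i = depth; i > 0; i--)
        if (exists(includers[i - 1], name))
            return includers[i - 1];
    for (size_t i = 0; i < nsys; i++)          /* fallback: the < > form */
        if (exists(sys[i], name))
            return sys[i];
    return NULL;
}
```

With includers set to /src, /another/path and /foo, a search for abc.h tries /foo, then /another/path, then /src, matching the order given for the file_3.c example.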
Commentary

The previous search can fail in the sense that it does not find a matching source file. Some existing code uses the double-quote delimited form of #include directive to include headers provided by the implementation (rather than the < > delimited form). This requirement ensures that such code continues to be conforming.

1902 Footnote 144) As indicated by the syntax, a preprocessing token shall not follow a #else or #endif directive before the terminating new-line character.

Commentary

Saying in words what is specified in the syntax.

Common Implementations

Many early implementations (and some present-day ones, for compatibility with existing source) treated any sequence of characters following one of these directives as a comment, e.g., #endif X == 1.

1903 However, comments may appear anywhere in a source file, including within a preprocessing directive.

Commentary

A comment is replaced by a single space character prior to preprocessing.

1904 A preprocessing directive of the form

    # include pp-tokens new-line

(that does not match one of the two previous forms) is permitted.

Commentary

This form permits the < > or double-quote delimited forms to be generated via macro expansion. However, it is rarely used (11 instances in over 60,000 #include directives in the visible source of the .c files). Whether this is because developers are unaware of its existence, or because it has little utility, is not known.

1905 The preprocessing tokens after include in the directive are processed just as in normal text. (Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.)

Commentary

To be exact, the preprocessing tokens after include in the directive, up to the first new-line character, are processed just as in normal text.
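A minimal sketch of the pp-tokens form, using the stringize operator to produce a string preprocessing token, so the directive ends up matching the double-quote form without relying on how < and > tokens are combined. HEADER_NAME, STR and XSTR are hypothetical names; the extra XSTR level is needed because an operand of # is not macro-replaced before being stringized.

```c
/* Hypothetical configuration macro naming the header to include. */
#define HEADER_NAME string.h

#define STR(x)  #x       /* operand of #: stringized without replacement */
#define XSTR(x) STR(x)   /* argument is macro-replaced first */

/* Expands to: #include "string.h" */
#include XSTR(HEADER_NAME)

/* Uses a function declared by the header selected above. */
size_t name_length(const char *s)
{
    return strlen(s);
}
```

Writing STR(HEADER_NAME) directly would instead produce the string "HEADER_NAME".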
1906 (Each identifier currently defined as a macro name is replaced by its replacement list of preprocessing tokens.)

Commentary
This C sentence provides explicit clarification that macro replacement occurs in this case (the same clarification is also given for the #line directive, sentence 1991).

1907 The directive resulting after all replacements shall match one of the two previous forms.145)

Commentary
It is not a violation of syntax if the directive does not match one of the two previous forms, because the syntax of this form has been matched. It is a violation of semantics and therefore the behavior is undefined.

1908 The method by which a sequence of preprocessing tokens between a < and a > preprocessing token pair or a pair of " characters is combined into a single header name preprocessing token is implementation-defined.
Commentary
This implementation-defined behavior may take a number of forms, including:

• The ## operator can be used to glue preprocessing tokens together. However, the behavior is undefined if the resulting character sequence is not a valid preprocessing token. For instance, the five preprocessing tokens {<} {string} {.} {h} {>} cannot be glued together to form a valid preprocessing token without going through intermediate stages whose behavior is undefined.
• Creating a preprocessing token, via macro expansion, having the double-quote delimited form (i.e., a string preprocessing token) need not depend on any implementation-defined behavior. The stringize operator, #, can be used to create a string preprocessing token.
• Other implementation-defined behaviors might include the handling of space characters. For instance, in the following:

    #define bra <
    #define ket >
    #include bra stdio.h ket

  does the implementation strip off the space characters at the ends of the delimited character sequence?

Coding Guidelines
Given the rarity of use of this form of #include, no guideline recommendations are given here.

Example

    /* Undefined behavior if invoked: < ## name is not a valid preprocessing token. */
    #define mk_sys_hdr(name) < ## name ## >

    #if BUG_FIX
    #define VERSION 2a /* works because pp-numbers include alphabetics */
    #else
    #define VERSION 2
    #endif

    #define add_quotes(a) # a
    /* Operands of ## are not macro replaced, so an extra level of
       invocation is needed for VERSION to be expanded before pasting. */
    #define mk_str2(str, ver) add_quotes(str ## ver)
    #define mk_str(str, ver) mk_str2(str, ver)

    #include mk_str(Version, VERSION) /* #include "Version2a" or "Version2" */

1909 The implementation shall provide unique mappings for sequences consisting of one or more nondigits or digits (6.4.2.1) followed by a period (.) and a single nondigit.

Commentary
This C sentence and the following ones in this C paragraph are a specification of the minimum set of requirements that an implementation must meet.
For sequences outside of this set, the implementation mapping may be non-unique (like, for instance, the Microsoft Windows technique of mapping files ending in .html to .htm). The handling of character sequences that resemble UCNs may also differ, e.g., "\ubada\file.txt" (Ubada is a city in Tanzania and BADA is a Hangul syllable in ISO 10646). The standard does not require support for an arbitrary number of period characters because many operating systems do not permit them (at least one, RISCOS, does not permit any). The wording was changed by the response to DR #302 to extend the specification to be more consistent with C++.

C++
16.2p5
The implementation provides unique mappings for sequences consisting of one or more nondigits (2.10) followed by a period (.) and a single nondigit.

Other Languages
Other languages are either specified to operate within the same operating system and file system limitations as C, and so have to deal with the same issues, or require an integrated development environment to be created before they can be used.

Common Implementations
Implementations invariably pass the sequence of characters that appears between the delimiters (when searching other places, a directory path may be prepended) as an argument in a call to fopen or an equivalent system function. The called library function will eventually call some host operating system function that interfaces to the host file system. The C translator's behavior is thus controlled by the characteristics of the host file system and how it maps character sequences to file names. The handling of the period character varies between file systems; known behaviors include:

• Unix-based file systems permit more than one period in a file name.
• MS-DOS based file systems only permit a single period in a file name.
• RISCOS, an operating system for the Acorn ARM processor, does not support file names that contain a period. On this host, file names containing a period that were specified in a #include directive were mapped using a directory structure; all file names ending in the characters .h were searched for in a directory called h.

Coding Guidelines
Because an implementation is not required to provide a unique mapping for all sequences, it is possible that an unintended header or source file will be accessed, or that the translator will fail to identify a known header or source file. The possible consequences of an unintended access are discussed elsewhere, while failure to identify a known header or source file will cause a diagnostic to be issued.
The cost/benefit issues associated with using character sequences having a unique mapping in the different environments that the source may be translated in are outside the scope of these coding guidelines.

1910 The first character shall not be a digit.

Commentary
This requirement only applies to the first character of the sequence that implementations are required to provide a unique mapping for. The wording was changed by the response to DR #302.

C90
The requirement that the first character not be a digit is new in C99. Given that C90 implementations were only required to provide unique mappings for sequences of letters (so the first character could never be a digit), it is unlikely that existing code will be affected by this requirement.

C++
This requirement is new in C99 and is not specified in the C++ Standard (the argument given in the C90 subsection above also applies to C++).

Common Implementations
Most implementations support a first character that is not a letter.

1911 The implementation may ignore the distinctions of alphabetical case and restrict the mapping to eight significant characters before the period.
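Taken together, sentences 1909 to 1911 describe the only header names whose mapping a program can rely on. The following is a minimal sketch of a checker for that set; the function names in_portable_set and may_collide are hypothetical. It follows the C99 wording as amended by DR #302 (a nondigit is _ or a letter, as in 6.4.2.1), and may_collide models the case and eight-character permissions of 1911.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* nondigit and digit as in 6.4.2.1; strchr would also match the
   terminating null character, so it is excluded explicitly. */
static bool is_nondigit(char c)
{
    return c != '\0' &&
           strchr("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", c) != NULL;
}

static bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* True if name is one or more nondigits or digits, the first not a
   digit (1910), followed by a period and a single nondigit (1909). */
static bool in_portable_set(const char *name)
{
    const char *dot = strchr(name, '.');

    if (dot == NULL || dot == name || is_digit(name[0]))
        return false;
    for (const char *p = name; p < dot; p++)
        if (!is_nondigit(*p) && !is_digit(*p))
            return false;
    return is_nondigit(dot[1]) && dot[2] == '\0';
}

/* For two names already in the portable set: true if an implementation
   using the 1911 permissions (case ignored, only eight significant
   characters before the period) could map them to the same file. */
static bool may_collide(const char *a, const char *b)
{
    const char *da = strchr(a, '.');
    const char *db = strchr(b, '.');
    size_t la = (size_t)(da - a) < 8 ? (size_t)(da - a) : 8;
    size_t lb = (size_t)(db - b) < 8 ? (size_t)(db - b) : 8;

    if (la != lb)
        return false;
    for (size_t i = 0; i < la; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return false;
    return tolower((unsigned char)da[1]) == tolower((unsigned char)db[1]);
}
```

For example, "longheader1.h" and "longheader2.h" are both in the set, yet an implementation taking full advantage of 1911 could treat them as the same file.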
Commentary
These permissions reflect known characteristics of file systems in which translators are executed.

C90
The limit specified by the C90 Standard was six significant characters. However, implementations invariably used the number of significant characters available in the host file system (i.e., they did not artificially limit the number of significant characters). It is unlikely that a header or source file will fail to be identified because of a difference in what used to be a non-significant character.

C++
The C++ Standard does not give implementations any permission to restrict the number of significant characters before the period (16.2p5). However, the limits of the file system used during translation are likely to be the same for both C and C++ implementations, and consequently no difference is listed here.

Common Implementations
All file systems place some limits on the number of characters in a source file name, for instance:

• Most versions of the Microsoft DOS environment ignore the distinction of alphabetic case and restrict the mapping to eight significant characters before any period (and a maximum of three after it).
• POSIX requires that at least 14 characters be significant in a file name (it also requires implementations to support at least 255 characters in a pathname). Many Linux file systems support up to 255 characters in a file name and 4095 characters in a pathname.

Coding Guidelines
The potential problems associated with limits on the sequences of characters that are likely to be treated as unique are a configuration management issue that is outside the scope of these coding guidelines.

1912 A #include preprocessing directive may appear in a source file that has been read because of a #include directive in another file, up to an implementation-defined nesting limit (see 5.2.4.1).

Commentary
Thus #include directives can be nested within source files whose contents have themselves been #included.
This issue is discussed elsewhere. While this permission only applies to source files, an implementation using some form of precompiled headers (which are not source files within the standard's definition of the term) that did not support this functionality would not be popular with developers.

1913 EXAMPLE 1 The most common uses of #include preprocessing directives are as in the following:

    #include <stdio.h>
    #include "myprog.h"

Other Languages
Some languages only have a single form of #include directive for all headers.

1914 EXAMPLE 2 This illustrates macro-replaced #include directives:

    #if VERSION == 1
        #define INCFILE "vers1.h"
    #elif VERSION == 2
        #define INCFILE "vers2.h" // and so on
    #else
        #define INCFILE "versN.h"
    #endif
    #include INCFILE
Commentary
This example does not illustrate any benefit compared to that obtained from placing separate #include directives in each arm of the conditional inclusion directive.

1915 Forward references: macro replacement (6.10.3).

1916 145) Note that adjacent string literals are not concatenated into a single string literal (see the translation phases in 5.1.1.2);

Commentary
String concatenation occurs in translation phase 6, and so it is not possible to join together two existing strings to form another string within a #include directive.

1917 thus, an expansion that results in two string literals is an invalid directive.

Commentary
It is an invalid directive in that it violates a semantic requirement and thus the behavior is undefined. It is not a syntax violation.

6.10.3 Macro replacement

Constraints

1918 Two replacement lists are identical if and only if the preprocessing tokens in both have the same number, ordering, spelling, and white-space separation, where all white-space separations are considered identical.

Commentary
This is actually a definition in a Constraints clause (it is used by two constraints in this C subsection). The check against same spelling only needs to take into account the significant characters of an identifier. Considering all white-space separations to be identical removes the need for developers to be concerned about use of different source layout (e.g., indentation) and method of spacing (e.g., space character vs. horizontal tab).

Rationale
The specification of macro definition and replacement in the Standard was based on these principles:

• Interfere with existing code as little as possible.
• Keep the preprocessing model simple and uniform.
• Allow macros to be used wherever functions can be.
• Define macro expansion such that it produces the same token sequence whether the macro calls appear in open text, in macro arguments, or in macro definitions.

Preprocessing is specified in such a way that it can be implemented either as a separate text-to-text prepass or as a token-oriented portion of the compiler itself. Thus, the preprocessing grammar is specified in terms of tokens.

1919 An identifier currently defined as an object-like macro shall not be redefined by another #define preprocessing directive unless the second definition is an object-like macro definition and the two replacement lists are identical.
Commentary
There was an existing body of code, containing redefinitions of the same macro, when the C Standard was first written. The C committee did not want to specify that existing code containing such usage was non-conforming, but they did consider the case where the bodies of any subsequent definitions differed to be an erroneous usage (see the EXAMPLE on macro redefinition, sentence 1983).

C90
The wording in the C90 Standard was modified by the response to DR #089.

Common Implementations
Some translators permit multiple definitions of a macro, independently of the contents of the bodies. The behavior is for a new definition to cause the previous body to be pushed, in a stack-like fashion. Any subsequent #undef of the macro name pops this stacked definition and makes it the current one.

Coding Guidelines
C permits more than one definition of the same macro name, with the same body, and more than one external declaration of the same object, with the same type, and the coding guideline issues are the same for both (in both cases translators are not always required to issue a diagnostic if these differ). In both cases a technique for avoiding duplicate definitions during translation, but not in the visible source, is to bracket definitions with #ifndef MACRO_NAME/#endif (in the case of the file scope object, a macro name needs to be created and associated with its declaration). Using this technique has the disadvantage that it prevents the translator from checking that any subsequent redeclarations of an identifier are the same (unless the bracketing occurs around the only textual declaration that occurs in any source file used to build a program).
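A small sketch of the two situations discussed above, using hypothetical macro names: a benign redefinition that the translator still checks against sentence 1918, and the #ifndef bracketing technique, which silently skips a later definition whose body differs.

```c
/* A benign redefinition: same tokens, and white-space appears at the
   same places (its amount is irrelevant), so the replacement lists are
   identical.  A second definition written as (1+2) would not be
   identical, because the white-space separation differs. */
#define SUM (1 + 2)
#define SUM (1    +    2)

/* Bracketing each definition with #ifndef/#endif avoids duplicate
   definitions during translation.  The cost: the second definition
   below has a different body, but because it is skipped the
   translator never compares it with the first one. */
#ifndef MAX_ITEMS
#define MAX_ITEMS 10
#endif

#ifndef MAX_ITEMS
#define MAX_ITEMS 20 /* never processed, so no diagnostic */
#endif

static int sum_value(void) { return SUM; }
static int max_items(void) { return MAX_ITEMS; }
```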
1920 Likewise, an identifier currently defined as a function-like macro shall not be redefined by another #define preprocessing directive unless the second definition is a function-like macro definition that has the same number and spelling of parameters, and the two replacement lists are identical.

Commentary
The issues are the same as for object-like macros, with the addition of checks on the parameters. Requiring that the parameters be spelled the same, rather than, for instance, that they have an identical effect, simplifies the similarity checking of two macro bodies. For instance, in:

    #define FM(foo) ((foo) + x)
    #define FM(bar) ((bar) + x)

a translator is not required to deduce that the two definitions of FM are structurally identical.

1921 There shall be white-space between the identifier and the replacement list in the definition of an object-like macro.

Commentary
In the following (assuming $ is a member of the extended character set and permitted in an identifier preprocessing token):

    #define A$ x

an object-like macro with the name A$ and the body x is defined, not a macro with the name A and the body $ x. There is no requirement that there be white-space following the ) in a function-like macro definition.

C90
The response to DR #027 added the following requirements to the C90 Standard.

DR #027
Correction
Add to subclause 6.8, page 86 (Constraints):
In the definition of an object-like macro, if the first character of a replacement list is not a character required by subclause 5.2.1, then there shall be white-space separation between the identifier and the replacement list.*
[Footnote *: This allows an implementation to choose to interpret the directive:

    #define THIS$AND$THAT(a, b) ((a) + (b))

as defining a function-like macro THIS$AND$THAT, rather than an object-like macro THIS. Whichever choice it makes, it must also issue a diagnostic.]

However, the complex interaction between this specification and UCNs was debated during the C9X review process and it was decided to simplify the requirements to the current C99 form.

    #define TEN.1 /* Define the macro TEN to have the body .1 in C90. */
                  /* A constraint violation in C99. */

C++
The C++ Standard specifies the same behavior as the C90 Standard.

Common Implementations
HP (formerly DEC) treats $ as part of the spelling of the macro name.

1922 If the identifier-list in the macro definition does not end with an ellipsis, the number of arguments (including those arguments consisting of no preprocessing tokens) in an invocation of a function-like macro shall equal the number of parameters in the macro definition.

Commentary
This requirement is the macro invocation equivalent of the one for function calls.

C90
If (before argument substitution) any argument consists of no preprocessing tokens, the behavior is undefined.

The behavior of the following was discussed in DR #003q3, DR #153, and raised against C99 in DR #259 (no committee response was felt necessary).

    #define foo() A
    #define bar(B) B

    foo() // no arguments
    bar() // one empty argument?

What was undefined behavior in C90 (an empty argument) is now explicitly supported in C99.
The two most likely C90 translator undefined behaviors are either to support them (existing source developed using such a translator may contain empty arguments in a macro invocation), or to issue a diagnostic (existing source developed using such a translator will not contain any empty arguments in a macro invocation).

C++
The C++ Standard contains the same wording as the C90 Standard. C++ translators are not required to correctly process source containing macro invocations having any empty arguments.
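Under the C99 rules an empty argument simply contributes no preprocessing tokens to the expansion. A minimal sketch, with hypothetical macro names NO_ARGS and QUAL mirroring the foo() and bar() cases above; with a C90 translator the invocation QUAL() would have undefined behavior.

```c
/* A macro with no parameters, invoked with no arguments (the foo() case). */
#define NO_ARGS() 42

/* A one-parameter macro; QUAL() passes one empty argument (the bar() case). */
#define QUAL(q) q int

static QUAL(const) limit = NO_ARGS(); /* static const int limit = 42; */
static QUAL() counter = 0;            /* static int counter = 0;      */
```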
Common Implementations
Some C90 implementations (e.g., gcc) treated empty arguments as an argument containing no preprocessing tokens, while others (e.g., Microsoft C) treated an empty argument as being a missing argument (i.e., a constraint violation).

1923 Otherwise, there shall be more arguments in the invocation than there are parameters in the macro definition (excluding the ...).

Commentary
There must be at least one argument to match the ellipsis.

Rationale
This requirement avoids the problems that occur when the trailing arguments are included in a list of arguments to another macro or function. For example, if dprintf had been defined as

    #define dprintf(format,...) \
        dfprintf(stderr, format, __VA_ARGS__)

and it were allowed for there to be only one argument, then there would be a trailing comma in the expanded form. While some implementations have used various notations or conventions to work around this problem, the Committee felt it better to avoid the problem altogether.

C90
Support for the form ... is new in C99.

C++
Support for the form ... is new in C99 and is not specified in the C++ Standard.

Common Implementations
gcc allowed zero arguments to match a macro parameter defined using the ... form.

Coding Guidelines
While some developers may be confused because the requirements on the number of arguments are different from those for functions defined using the ellipsis notation, passing too few arguments is a constraint violation (i.e., translators are required to issue a diagnostic that a developer then needs to correct).

1924 There shall exist a ) preprocessing token that terminates the invocation.

Commentary
While this requirement is specified in the syntax, it is interpreted as requiring the ) preprocessing token to occur before any macro replacement of the identifiers following the matching ( preprocessing token.
For instance, in:

    #define R_PAREN )

    #define FUNC(a) a

    static int glob = (1 + FUNC(1 R_PAREN );

the invocation is terminated by the ) preprocessing token that occurs immediately before the ;, not by the expanded form of R_PAREN.

1925 The identifier __VA_ARGS__ shall occur only in the replacement-list of a function-like macro that uses the ellipsis notation in the parameters.
Commentary
This requirement simplifies a translator's processing of occurrences of the identifier __VA_ARGS__. The wording "in the parameters" (previously "in the arguments") is a typographical correction made by the response to DR #234.

C90
Support for __VA_ARGS__ is new in C99. Source code declaring an identifier with the spelling __VA_ARGS__ will cause a C99 translator to issue a diagnostic (the behavior was undefined in C90).

C++
Support for __VA_ARGS__ is new in C99 and is not specified in the C++ Standard.

Common Implementations
gcc required developers to give a name to the parameter that accepted a variable number of arguments. This parameter name appeared in the replacement list wherever the variable number of arguments was to be substituted.

Example

    /*
     * The following are constraint violations.
     */
    #define __VA_ARGS__
    #define jparks __VA_ARGS__
    #define jparks(__VA_ARGS__)
    #define jparks(__VA_ARGS__, ...) __VA_ARGS__

    #define jparks(x) x
    jparks(__VA_ARGS__)

    #define jparks(x, ...) x
    jparks(__VA_ARGS__,1)
    /*
     * The following break the spirit, if not the wording,
     * of this constraint.
     */
    #define jparks(x, y) x##y
    jparks(__VA, _ARGS__)

    #define jparks(x, y, ...) x##y
    jparks(__VA, _ARGS__, 1)

1926 A parameter identifier in a function-like macro shall be uniquely declared within its scope.

Commentary
This constraint is the macro equivalent of the one given for objects with no linkage. Its scope is the list of parameters in the macro definition and the body of that definition. This scope ends at the new-line that terminates the directive. Macro parameters are also discussed elsewhere.

Semantics

1927 The identifier immediately following the define is called the macro name.

Commentary
This defines the term macro name.
This term is generically used in software engineering to refer to this kind of entity.
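Returning to the ellipsis requirements of sentences 1923 and 1925, the following is a minimal sketch in the style of the Rationale's dprintf example. The names FORMAT_TO and format_pair are hypothetical, and snprintf stands in for the Rationale's dfprintf.

```c
#include <stdio.h>

/* The format is a named parameter; everything after it matches the
   ... and is substituted for __VA_ARGS__.  Because C99 requires more
   arguments than named parameters, FORMAT_TO(buf, n, "text") alone is
   a constraint violation rather than an expansion ending in a
   trailing comma. */
#define FORMAT_TO(buf, size, format, ...) \
    snprintf((buf), (size), (format), __VA_ARGS__)

/* Returns the number of characters snprintf would have written. */
static int format_pair(char *buf, size_t size, int n, const char *s)
{
    return FORMAT_TO(buf, size, "%d-%s", n, s);
}
```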