Windows Internals covering windows server 2008 and windows vista- P17

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

Thêm vào BST

Báo xấu

103
lượt xem 8
download

Download Vui lòng tải xuống để xem tài liệu đầy đủ

Windows Internals covering windows server 2008 and windows vista- P17: In this chapter, we’ll introduce the key Microsoft Windows operating system concepts and terms we’ll be using throughout this book, such as the Windows API, processes, threads, virtual memory, kernel mode and user mode, objects, handles, security, and the registry.

Chủ đề:

Bình luận(0) Đăng nhập để gửi bình luận!

Lưu

Nội dung Text: Windows Internals covering windows server 2008 and windows vista- P17

4. Section Ref 1 Pfn Ref 46 Mapped Views 4 5. User Ref 0 WaitForDel 0 Flush Count 0 6. File Object 86960228 ModWriteCount 0 System Views 0 7. Flags (8008080) File WasPurged Accessed 8. File: \Program Files\Debugging Tools for Windows (x86)\debugger.chw Next look at the file object referenced by the control area with this command: 1. lkd> dt nt!_FILE_OBJECT 0x86960228 2. +0x000 Type : 5 3. +0x002 Size : 128 4. +0x004 DeviceObject : 0x84a69a18 _DEVICE_OBJECT 5. +0x008 Vpb : 0x84a63278 _VPB 6. +0x00c FsContext : 0x9ae3e768 7. +0x010 FsContext2 : 0xad4a0c78 8. +0x014 SectionObjectPointer : 0x86724504 _SECTION_OBJECT_POINTERS 9. +0x018 PrivateCacheMap : 0x86b48460 10. +0x01c FinalStatus : 0 11. +0x020 RelatedFileObject : (null) 12. +0x024 LockOperation : 0 '' 13. ... The private cache map is at offset 0x18: 1. lkd> dt nt!_PRIVATE_CACHE_MAP 0x86b48460 2. +0x000 NodeTypeCode : 766 3. +0x000 Flags : _PRIVATE_CACHE_MAP_FLAGS 4. +0x000 UlongFlags : 0x1402fe 5. +0x004 ReadAheadMask : 0xffff 6. +0x008 FileObject : 0x86960228 _FILE_OBJECT 7. +0x010 FileOffset1 : _LARGE_INTEGER 0x146 8. +0x018 BeyondLastByte1 : _LARGE_INTEGER 0x14a 9. +0x020 FileOffset2 : _LARGE_INTEGER 0x14a 10. +0x028 BeyondLastByte2 : _LARGE_INTEGER 0x156 11. +0x030 ReadAheadOffset : [2] _LARGE_INTEGER 0x0 12. +0x040 ReadAheadLength : [2] 0 13. +0x048 ReadAheadSpinLock : 0 14. +0x04c PrivateLinks : _LIST_ENTRY [ 0x86b48420 - 0x86b48420 ] 15. +0x054 ReadAheadWorkItem : (null) Finally, you can locate the shared cache map in the SectionObjectPointer field of the file object and then view its contents: 1. lkd> dt nt!_SECTION_OBJECT_POINTERS 0x86724504 2. +0x000 DataSectionObject : 0x867548f0 3. +0x004 SharedCacheMap : 0x86b48388 4. +0x008 ImageSectionObject : (null) 790 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
5. lkd> dt nt!_SHARED_CACHE_MAP 0x86b48388 6. +0x000 NodeTypeCode : 767 7. +0x002 NodeByteSize : 320 8. +0x004 OpenCount : 1 9. +0x008 FileSize : _LARGE_INTEGER 0x125726 10. +0x010 BcbList : _LIST_ENTRY [ 0x86b48398 - 0x86b48398 ] 11. +0x018 SectionSize : _LARGE_INTEGER 0x140000 12. +0x020 ValidDataLength : _LARGE_INTEGER 0x125726 13. +0x028 ValidDataGoal : _LARGE_INTEGER 0x125726 14. +0x030 InitialVacbs : [4] (null) 15. +0x040 Vacbs : 0x867de330 -> 0x84738b30 _VACB 16. +0x044 FileObjectFastRef : _EX_FAST_REF 17. +0x048 ActiveVacb : 0x84738b30 _VACB 18. ... Alternatively, you can use the !fileobj command to look up and display much of this information automatically. For example, using this command on the same file object referenced earlier results in the following output: 1. lkd> !fileobj 0x86960228 2. \Program Files\Debugging Tools for Windows (x86)\debugger.chw 3. Device Object: 0x84a69a18 \Driver\volmgr 4. Vpb: 0x84a63278 5. Event signalled 6. Access: Read SharedRead SharedWrite 7. Flags: 0xc0042 8. Synchronous IO 9. Cache Supported 10. Handle Created 11. Fast IO Read 12. FsContext: 0x9ae3e768 FsContext2: 0xad4a0c78 13. Private Cache Map: 0x86b48460 14. CurrentByteOffset: 156 15. Cache Data: 16. Section Object Pointers: 86724504 17. Shared Cache Map: 86b48388 File Offset: 156 in VACB number 0 18. Vacb: 84738b30 19. Your data is at: b1e00156 10.5 File System interfaces The first time a file’s data is accessed for a read or write operation, the file system driver is responsible for determining whether some part of the file is mapped in the system cache. If it’s not, the file system driver must call the CcInitializeCacheMap function to set up the perfile data structures described in the preceding section. 791 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Once a file is set up for cached access, the file system driver calls one of several functions to access the data in the file. There are three primary methods for accessing cached data, each intended for a specific situation: ■ The copy method copies user data between cache buffers in system space and a process buffer in user space. ■ The mapping and pinning method uses virtual addresses to read and write data directly from and to cache buffers. ■ The physical memory access method uses physical addresses to read and write data directly from and to cache buffers. File system drivers must provide two versions of the file read operation—cached and noncached—to prevent an infinite loop when the memory manager processes a page fault. When the memory manager resolves a page fault by calling the file system to retrieve data from the file (via the device driver, of course), it must specify this noncached read operation by setting the “no cache” flag in the IRP. Figure 10-10 illustrates the typical interactions between the cache manager, the memory manager, and file system drivers in response to user read or write file I/O. The cache manager is invoked by a file system through the copy interfaces (the CcCopyRead and CcCopyWrite paths). To process a CcFastCopyRead or CcCopyRead read, for example, the cache manager creates a view in the cache to map a portion of the file being read and reads the file data into the user buffer by copying from the view. The copy operation generates page faults as it accesses each previously invalid page in the view, and in response the memory manager initiates noncached I/O into the file system driver to retrieve the data corresponding to the part of the file mapped to the page that faulted. The next three sections explain these cache access mechanisms, their purpose, and how they’re used. 10.5.1 Copying to and from the Cache 792 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Because the system cache is in system space, it is mapped into the address space of every process. As with all system space pages, however, cache pages aren’t accessible from user mode because that would be a potential security hole. (For example, a process might not have the rights to read a file whose data is currently contained in some part of the system cache.) Thus, user application file reads and writes to cached files must be serviced by kernelmode routines that copy data between the cache’s buffers in system space and the application’s buffers residing in the process address space. The functions that file system drivers can use to perform this operation are listed in Table 10-2. You can examine read activity from the cache via the performance counters or system perprocessor variables stored in the processor’s control block (KPRCB) listed in Table 10-3. 10.5.2 Caching with the Mapping and Pinning Interfaces Just as user applications read and write data in files on a disk, file system drivers need to read and write the data that describes the files themselves (the metadata, or volume structure data). Because the file system drivers run in kernel mode, however, they could, if the cache manager were properly informed, modify data directly in the system cache. To permit this optimization, the cache manager provides the functions shown in Table 10-4. These functions permit the file system drivers to find where in virtual memory the file system metadata resides, thus allowing direct modification without the use of intermediary buffers. 793 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
If a file system driver needs to read file system metadata in the cache, it calls the cache manager’s mapping interface to obtain the virtual address of the desired data. The cache manager touches all the requested pages to bring them into memory and then returns control to the file system driver. The file system driver can then access the data directly. If the file system driver needs to modify cache pages, it calls the cache manager’s pinning services, which keep the pages active in virtual memory so that they cannot be reclaimed. The pages aren’t actually locked into memory (such as when a device driver locks pages for direct memory access transfers). Most of the time, a file system driver will mark its metadata stream “no write”, which instructs the memory manager’s mapped page writer (explained in Chapter 9) to not write the pages to disk until explicitly told to do so. When the file system driver unpins (releases) them, the cache manager releases its resources so that it can lazily flush any changes to disk and release the cache view that the metadata occupied. The mapping and pinning interfaces solve one thorny problem of implementing a file system: buffer management. Without directly manipulating cached metadata, a file system must predict the maximum number of buffers it will need when updating a volume’s structure. By allowing the file system to access and update its metadata directly in the cache, the cache manager eliminates the need for buffers, simply updating the volume structure in the virtual memory the memory manager provides. The only limitation the file system encounters is the amount of available memory. You can examine pinning and mapping activity in the cache via the performance counters or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-5. 794 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
10.5.3 Caching with the Direct Memory Access Interfaces In addition to the mapping and pinning interfaces used to access metadata directly in the cache, the cache manager provides a third interface to cached data: direct memory access (DMA). The DMA functions are used to read from or write to cache pages without intervening buffers, such as when a network file system is doing a transfer over the network. The DMA interface returns to the file system the physical addresses of cached user data (rather than the virtual addresses, which the mapping and pinning interfaces return), which can then be used to transfer data directly from physical memory to a network device. Although small amounts of data (1 KB to 2 KB) can use the usual buffer-based copying interfaces, for larger transfers the DMA interface can result in significant performance improvements for a network server processing file requests from remote systems. To describe these references to physical memory, a memory descriptor list (MDL) is used. (MDLs were introduced in Chapter 9.) The four separate functions described in Table 10-6 create the cache manager’s DMA interface. 795 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
You can examine MDL activity from the cache via the performance counters or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-7. 10.6 Fast I/O Whenever possible, reads and writes to cached files are handled by a high-speed mechanism named fast I/O. Fast I/O is a means of reading or writing a cached file without going through the work of generating an IRP, as described in Chapter 7. With fast I/O, the I/O manager calls the file system driver’s fast I/O routine to see whether I/O can be satisfied directly from the cache manager without generating an IRP. Because the cache manager is architected on top of the virtual memory subsystem, file system drivers can use the cache manager to access file data simply by copying to or from pages mapped to the actual file being referenced without going through the overhead of generating an IRP. Fast I/O doesn’t always occur. For example, the first read or write to a file requires setting up the file for caching (mapping the file into the cache and setting up the cache data structures, as explained earlier in the section “Cache Data Structures”). Also, if the caller specified an asynchronous read or write, fast I/O isn’t used because the caller might be stalled during paging I/O operations required to satisfy the buffer copy to or from the system cache and thus not really providing the requested asynchronous I/O operation. But even on a synchronous I/O, the file system driver might decide that it can’t process the I/O operation by using the fast I/O mechanism, say, for example, if the file in question has a locked range of bytes (as a result of calls to the Windows LockFile and UnlockFile functions). Because the cache manager doesn’t know what parts of which files are locked, the file system driver must check the validity of the read or write, which requires generating an IRP. The decision tree for fast I/O is shown in Figure 10-11. These steps are involved in servicing a read or a write with fast I/O: 1. A thread performs a read or write operation. 2. If the file is cached and the I/O is synchronous, the request passes to the fast I/O entry point of the file system driver stack. If the file isn’t cached, the file system driver sets up the file for caching so that the next time, fast I/O can be used to satisfy a read or write request. 796 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
3. If the file system driver’s fast I/O routine determines that fast I/O is possible, it calls the cache manager’s read or write routine to access the file data directly in the cache. (If fast I/O isn’t possible, the file system driver returns to the I/O system, which then generates an IRP for the I/O and eventually calls the file system’s regular read routine.) 4. The cache manager translates the supplied file offset into a virtual address in the cache. 5. For reads, the cache manager copies the data from the cache into the buffer of the process requesting it; for writes, it copies the data from the buffer to the cache. 6. One of the following actions occurs: ❏ For reads where FILE_FLAG_RANDOM_ACCESS wasn’t specified when the file was opened, the read-ahead information in the caller’s private cache map is updated. Read-ahead may also be queued for files for which the FO_RANDOM_ACCESS flag is not specified. ❏ For writes, the dirty bit of any modified page in the cache is set so that the lazy writer will know to flush it to disk. ❏ For write-through files, any modifications are flushed to disk. The performance counters or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-8 can be used to determine the fast I/O activity on the system. 797 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
10.7 read ahead and Write behind In this section, you’ll see how the cache manager implements reading and writing file data on behalf of file system drivers. Keep in mind that the cache manager is involved in file I/O only when a file is opened without the FILE_FLAG_NO_BUFFERING flag and then read from or written to using the Windows I/O functions (for example, using the Windows ReadFile and WriteFile functions). Mapped files don’t go through the cache manager, nor do files opened with the FILE_FLAG_NO_BUFFERING flag set. Note When an application uses the FILE_FLAG_NO_BUFFERING flag to open a file, its file I/O must start at device-aligned offsets and be of sizes that are a multiple of the alignment size; its input and output buffers must also be device-aligned virtual addresses. For file systems, this usually corresponds to the sector size (512 bytes on NTFS, typically, and 2,048 bytes on CDFS). One of the benefits of the cache manager, apart from the actual caching performance, is the fact that it performs intermediate buffering to allow arbitrarily aligned and sized I/O. 10.7.1 Intelligent Read-Ahead The cache manager uses the principle of spatial locality to perform intelligent read-ahead by predicting what data the calling process is likely to read next based on the data that it is reading currently. Because the system cache is based on virtual addresses, which are contiguous for a particular file, it doesn’t matter whether they’re juxtaposed in physical memory. File read-ahead for logical block caching is more complex and requires tight cooperation between file system drivers and the block cache because that cache system is based on the relative positions of the accessed data on the disk, and, of course, files aren’t necessarily stored contiguously on disk. You can examine read-ahead activity by using the Cache: Read Aheads/sec performance counter or the CcReadAheadIos system variable. Reading the next block of a file that is being accessed sequentially provides an obvious performance improvement, with the disadvantage that it will cause head seeks. To extend readahead benefits to cases of strided data accesses (both forward and backward through a file), the cache manager maintains a history of the last two read requests in the private cache map for 798 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
the file handle being accessed, a method known as asynchronous read-ahead with history. If a pattern can be determined from the caller’s apparently random reads, the cache manager extrapolates it. For example, if the caller reads page 4000 and then page 3000, the cache manager assumes that the next page the caller will require is page 2000 and prereads it. Note Although a caller must issue a minimum of three read operations to establish a predictable sequence, only two are stored in the private cache map. To make read-ahead even more efficient, the Win32 CreateFile function provides a flag indicating forward sequential file access: FILE_FLAG_SEQUENTIAL_SCAN. If this flag is set, the cache manager doesn’t keep a read history for the caller for prediction but instead performs sequential read-ahead. However, as the file is read into the cache’s working set, the cache manager unmaps views of the file that are no longer active and, if they are unmodified, directs the memory manager to place the pages belonging to the unmapped views at the front of the standby list so that they will be quickly reused. It also reads ahead two times as much data (2 MB instead of 1 MB, for example). As the caller continues reading, the cache manager prereads additional blocks of data, always staying about one read (of the size of the current read) ahead of the caller. The cache manager’s read-ahead is asynchronous because it is performed in a thread separate from the caller’s thread and proceeds concurrently with the caller’s execution. When called to retrieve cached data, the cache manager first accesses the requested virtual page to satisfy the request and then queues an additional I/O request to retrieve additional data to a system worker thread. The worker thread then executes in the background, reading additional data in anticipation of the caller’s next read request. The preread pages are faulted into memory while the program continues executing so that when the caller requests the data it’s already in memory. For applications that have no predictable read pattern, the FILE_FLAG_RANDOM_ ACCESS flag can be specified when the CreateFile function is called. This flag instructs the cache manager not to attempt to predict where the application is reading next and thus disables read-ahead. The flag also stops the cache manager from aggressively unmapping views of the file as the file is accessed so as to minimize the mapping/unmapping activity for the file when the application revisits portions of the file. 10.7.2 Write-Back Caching and Lazy Writing The cache manager implements a write-back cache with lazy write. This means that data written to files is first stored in memory in cache pages and then written to disk later. Thus, write operations are allowed to accumulate for a short time and are then flushed to disk all at once, reducing the overall number of disk I/O operations. The cache manager must explicitly call the memory manager to flush cache pages because otherwise the memory manager writes memory contents to disk only when demand for physical memory exceeds supply, as is appropriate for volatile data. Cached file data, however, represents nonvolatile disk data. If a process modifies cached data, the user expects the contents to be reflected on disk in a timely manner. 799 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The decision about how often to flush the cache is an important one. If the cache is flushed too frequently, system performance will be slowed by unnecessary I/O. If the cache is flushed too rarely, you risk losing modified file data in the cases of a system failure (a loss especially irritating to users who know that they asked the application to save the changes) and running out of physical memory (because it’s being used by an excess of modified pages). To balance these concerns, once per second the cache manager’s lazy writer function executes on a system worker thread and queues one-eighth of the dirty pages in the system cache to be written to disk. If the rate at which dirty pages are being produced is greater than the amount the lazy writer had determined it should write, the lazy writer writes an additional number of dirty pages that it calculates are necessary to match that rate. System worker threads from the systemwide critical worker thread pool actually perform the I/O operations. Note The cache manager provides a means for file system drivers to track when and how much data has been written to a file. After the lazy writer flushes dirty pages to the disk, the cache manager notifies the file system, instructing it to update its view of the valid data length for the file. (The cache manager and file systems separately track the valid data length for a file in memory.) You can examine the activity of the lazy writer by examining the cache performance counters or per-processor variables stored in the processor’s control block (KPRCB) listed in Table 10-9. eXPeriMeNT: Watching the Cache Manager in action In this experiment, we’ll use Process Monitor to view the underlying file system activity, including cache manager read-ahead and write-behind, when Windows Explorer copies a large file (in this example, a CD-ROM image) from one local directory to another. First, configure Process Monitor’s filter to include the source and destination file paths, the Explorer.exe and System processes, and the ReadFile and WriteFile operations. In this example, the c:\source.iso file was copied to c:\programming\source.iso, so the filter is configured as follows: You should see a Process Monitor trace like the one shown here after you copy the file: 800 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The first few entries show the initial I/O processing performed by the copy engine and the first cache manager operations. Here are some of the things that you can see: ■ The initial 1-MB cached read from Explorer at the first entry. The size of this read depends on an internal matrix calculation based on the file size and can vary from 128 KB to 1 MB. Because this file was large, the copy engine chose 1 MB. ■ The 1-MB read is followed by 16 64-KB noncached reads. Noncached reads typically indicate activity due to page faults or cache manager access. A closer look at the stack trace for these events, which you can see by double-clicking an entry and choosing the Stack tab, reveals that indeed the CcCopyRead cache manager routine, which is called by the NTFS driver’s read routine, causes the memory manager to fault the source data into physical memory: ■ After these 64-KB page fault I/Os, the cache manager’s read-ahead mechanism starts reading the file, which includes the System process’s subsequent noncached 2-MB read at the 1-MB offset. Because of the file size and Explorer’s read I/O sizes, the cache manager chose 2 MB as the optimal read-ahead size. 801 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The stack trace for one of the read-ahead operations, shown next, confirms that one of the cache manager’s worker threads is performing the read-ahead. After this point, Explorer’s 1-MB reads aren’t followed by 64-KB page faults, because the read-ahead thread stays ahead of Explorer, prefetching the file data with its 2-MB noncached reads. Eventually, after reading about 4 MB of the file, Explorer starts performing writes to the destination file. These are sequential, cached 64-KB writes. After about 32 MB of reads, the first WriteFile operation from the System process occurs, shown here: The write operation’s stack trace, shown here, indicates that the memory manager’s mapped page writer thread was actually responsible for the write: 802 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
This occurs because for the first couple of megabytes of data, the cache manager hadn’t started performing write-behind, so the memory manager’s mapped page writer began flushing the modified destination file data (see Chapter 9 for more information on the mapped page writer). To get a clearer view of the cache manager operations, remove Explorer from the Process Monitor’s filter so that only the System process operations are visible, as shown next. With this view, it’s much easier to see the cache manager’s 16-MB write-behind operations (the maximum write sizes are 1 MB on client versions of Windows and 32 MB on server versions; this experiment was performed on a server system). The Time Of Day column shows that these operations occur almost exactly 1 second apart. The stack trace for one of the write-behind operations, shown here, verifies that a cache manager worker thread is performing write-behind: 803 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
As an added experiment, try repeating this process with a remote copy instead (from one Windows system to another) and by copying files of varying sizes. You’ll notice some different behaviors by the copy engine and the cache manager, both on the receiving and sending sides. Disabling Lazy Writing for a File If you create a temporary file by specifying the flag FILE_ATTRIBUTE_TEMPORARY in a call to the Windows CreateFile function, the lazy writer won’t write dirty pages to the disk unless there is a severe shortage of physical memory or the file is explicitly flushed. This characteristic of the lazy writer improves system performance—the lazy writer doesn’t immediately write data to a disk that might ultimately be discarded. Applications usually delete temporary files soon after closing them. Forcing the Cache to Write Through to Disk Because some applications can’t tolerate even momentary delays between writing a file and seeing the updates on disk, the cache manager also supports write-through caching on a per–file object basis; changes are written to disk as soon as they’re made. To turn on write-through caching, set the FILE_FLAG_WRITE_THROUGH flag in the call to the CreateFile function. Alternatively, a thread can explicitly flush an open file, by using the Windows FlushFileBuffers function, when it reaches a point at which the data needs to be written to disk. You can observe cache flush operations that are the result of write-through I/O requests or explicit calls to FlushFileBuffers via the performance counters or per-processor variables stored in the processor’s control block (KPRCB) shown in Table 10-10. 804 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
Flushing Mapped Files If the lazy writer must write data to disk from a view that’s also mapped into another process’s address space, the situation becomes a little more complicated, because the cache manager will only know about the pages it has modified. (Pages modified by another process are known only to that process because the modified bit in the page table entries for modified pages is kept in the process private page tables.) To address this situation, the memory manager informs the cache manager when a user maps a file. When such a file is flushed in the cache (for example, as a result of a call to the Windows FlushFileBuffers function), the cache manager writes the dirty pages in the cache and then checks to see whether the file is also mapped by another process. When the cache manager sees that the file is, the cache manager then flushes the entire view of the section to write out pages that the second process might have modified. If a user maps a view of a file that is also open in the cache, when the view is unmapped, the modified pages are marked as dirty so that when the lazy writer thread later flushes the view, those dirty pages will be written to disk. This procedure works as long as the sequence occurs in the following order: 1. A user unmaps the view. 2. A process flushes file buffers. If this sequence isn’t followed, you can’t predict which pages will be written to disk. eXPeriMeNT: Watching Cache Flushes You can see the cache manager map views into the system cache and flush pages to disk by running the Reliability and Performance Monitor and adding the Data Maps/sec and Lazy Write Flushes/sec counters and then copying a large file from one location to another. The generally higher line in the following screen shot shows Data Maps/sec and the other shows Lazy Write Flushes/sec. 10.7.3 Write Throttling 805 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The file system and cache manager must determine whether a cached write request will affect system performance and then schedule any delayed writes. First the file system asks the cache manager whether a certain number of bytes can be written right now without hurting performance by using the CcCanIWrite function and blocking that write if necessary. For asynchronous I/O, the file system sets up a callback with the cache manager for automatically writing the bytes when writes are again permitted by calling CcDeferWrite. Otherwise, it just blocks and waits on CcCanIWrite to continue. Once it’s notified of an impending write operation, the cache manager determines how many dirty pages are in the cache and how much physical memory is available. If few physical pages are free, the cache manager momentarily blocks the file system thread that’s requesting to write data to the cache. The cache manager’s lazy writer flushes some of the dirty pages to disk and then allows the blocked file system thread to continue. This write throttling prevents system performance from degrading because of a lack of memory when a file system or network server issues a large write operation. Note The effects of write throttling are global to the system because the resource it is based on, available physical memory, is global to the system. This means that if heavy write activity to a slow device triggers write throttling, writes to other devices will also be throttled. The dirty page threshold is the number of pages that the system cache will allow to be dirty before throttling cached writers. This value is computed at system initialization time and depends on the product type (client or server). Two other values are also computed—the top dirty page threshold and the bottom dirty page threshold. Depending on memory consumption and the rate at which dirty pages are being processed, the lazy writer calls the internal function CcAdjustThrottle, which, on server systems, performs dynamic adjustment of the current threshold based on the calculated top and bottom values. This adjustment is made to preserve the read cache in cases of a heavy write load that will inevitably overrun the cache and become throttled. Table 10-11 lists the algorithms used to calculate the dirty page thresholds. Write throttling is also useful for network redirectors transmitting data over slow communication lines. For example, suppose a local process writes a large amount of data to a remote file system over a 9600-baud line. The data isn’t written to the remote disk until the cache manager’s lazy writer flushes the cache. If the redirector has accumulated lots of dirty pages that are flushed to disk at once, the recipient could receive a network timeout before the data transfer completes. By using the CcSetDirtyPageThreshold function, the cache manager allows network redirectors to set a limit on the number of dirty cache pages they can tolerate (for each stream), thus preventing this scenario. By limiting the number of dirty pages, the redirector ensures that a cache flush operation won’t cause a network timeout. eXPeriMeNT: Viewing the Write-Throttle Parameters 806 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
The !defwrites kernel debugger command dumps the values of the kernel variables the cache manager uses, including the number of dirty pages in the file cache (CcTotalDirtyPages), when determining whether it should throttle write operations: 1. lkd> !defwrites 2. *** Cache Write Throttle Analysis *** 3. CcTotalDirtyPages: 7 ( 28 Kb) 4. CcDirtyPageThreshold: 425694 ( 1702776 Kb) 5. MmAvailablePages: 572387 ( 2289548 Kb) 6. MmThrottleTop: 450 ( 1800 Kb) 7. MmThrottleBottom: 80 ( 320 Kb) 8. MmModifiedPageListHead.Total: 10477 ( 41908 Kb) 9. Write throttles not engaged 10. lkd> !defwrites 11. *** Cache Write Throttle Analysis *** 12. CcTotalDirtyPages: 7 ( 28 Kb) 13. CcDirtyPageThreshold: 425694 ( 1702776 Kb) 14. MmAvailablePages: 572387 ( 2289548 Kb) 15. MmThrottleTop: 450 ( 1800 Kb) 16. MmThrottleBottom: 80 ( 320 Kb) 17. MmModifiedPageListHead.Total: 10477 ( 41908 Kb) 18. Write throttles not engaged This output shows that the number of dirty pages is far from the number that triggers write throttling (CcDirtyPageThreshold), so the system has not engaged in any write throttling. 10.7.4 System Threads As mentioned earlier, the cache manager performs lazy write and read-ahead I/O operations by submitting requests to the common critical system worker thread pool. However, it does limit the use of these threads to one less than the total number of critical system worker threads for small and medium memory systems (two less than the total for large memory systems). Internally, the cache manager organizes its work requests into four lists (though these are serviced by the same set of executive worker threads): ■ The express queue is used for read-ahead operations. ■ The regular queue is used for lazy write scans (for dirty data to flush), write-behinds, and lazy closes. ■ The fast teardown queue is used when the memory manager is waiting for the data section owned by the cache manager to be freed so that the file can be opened with an image section instead, which causes CcWriteBehind to flush the entire file and tear down the shared cache map. ■ The post tick queue is used for the cache manager to internally register for a notification after each “tick” of the lazy writer thread—in other words, at the end of each pass. 807 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
To keep track of the work items the worker threads need to perform, the cache manager creates its own internal per-processor look-aside list, a fixed-length list—one for each processor—of worker queue item structures. (Look-aside lists are discussed in Chapter 9.) The number of worker queue items depends on system size: 32 for small-memory systems, 64 for medium-memory systems, 128 for large-memory client systems, and 256 for large-memory server systems. For cross-processor performance, the cache manager also allocates a global look-aside list at the same sizes as just described. 10.8 Conclusion The cache manager provides a high-speed, intelligent mechanism for reducing disk I/O and increasing overall system throughput. By caching on the basis of virtual blocks, the cache manager can perform intelligent read-ahead. By relying on the global memory manager’s mapped file primitive to access file data, the cache manager can provide the special fast I/O mechanism to reduce the CPU time required for read and write operations and also leave all matters related to physical memory management to the single Windows global memory manager, thus reducing code duplication and increasing efficiency. 808 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.
11. File Systems In this chapter, we present an overview of the file system formats supported by Windows. We then describe the types of file system drivers and their basic operation, including how they interact with other system components, such as the memory manager and the cache manager. Following that is a description of how to use Process Monitor from Windows Sysinternals (at www.microsoft.com/technet/sysinternals) to troubleshoot a wide variety of file system access problems. In the balance of the chapter, we first describe the Common Log File System (CLFS), a transactional logging virtual file system implemented on the native Windows file system format, NTFS. Then we focus on the on-disk layout of NTFS and its advanced features, such as compression, recoverability, quotas, symbolic links, transactions (which use the services provided by CLFS), and encryption. To fully understand this chapter, you should be familiar with the terminology introduced in Chapter 8, including the terms volume and partition. You’ll also need to be acquainted with these additional terms: ■ Sectors are hardware-addressable blocks on a storage medium. Hard disks for x86 systems almost always define a 512-byte sector size; however, Windows also supports large sector disks, which are a new technology that allows access to even larger disks. Thus, if the sector size is the standard 512 bytes and the operating system wants to modify the 632nd byte on a disk, it must write a 512-byte block of data to the second sector on the disk. ■ File system formats define the way that file data is stored on storage media, and they affect a file system’s features. For example, a format that doesn’t allow user permissions to be associated with files and directories can’t support security. A file system format can also impose limits on the sizes of files and storage devices that the file system supports. Finally, some file system formats efficiently implement support for either large or small files or for large or small disks. NTFS and exFAT are examples of file system formats that offer a different set of features and usage scenarios. ■ Clusters are the addressable blocks that many file system formats use. Cluster size is always a multiple of the sector size, as shown in Figure 11-1. File system formats use clusters to manage disk space more efficiently; a cluster size that is larger than the sector size divides a disk into more manageable blocks. The potential trade-off of a larger cluster size is wasted disk space, or internal fragmentation, that results when file sizes aren’t perfect multiples of cluster sizes. 809 Please purchase PDF Split-Merge on www.verypdf.com to remove this watermark.