Developing a Troubleshooting Checklist, Part 2

Shared by: Tuan Bui Nghia | Date: | File type: PDF | Pages: 5



By just telnetting to TCP port 80, typing GET / HTTP/1.0, and then pressing Enter a few times, I can retrieve the default web page for the server. At a minimum, this verifies that the target host is properly connected to and communicating with the network; at best, it tells me exactly which web server software is running, as shown on the fifth line from the end: "Server: Microsoft-IIS/6.0".

Step 3: Physically Check the Firewall

One of the most common failures on a network is a physical failure. From bumped power cords to incorrectly seated network cables, you can often quickly identify and remedy the problem by paying a visit to the physical machine. In addition, many network devices, including firewalls, provide visual indicators of the status of the system. For example, if you do not see a link light on an interface, that is usually a good indicator that the network cable is not plugged in.

In some cases (for example, remote firewalls), it is simply not feasible to check the firewall yourself. If you have a trusted person at the remote site, however, you can ask that person to check the firewall on your behalf. As firewall administrators, many of us have gotten used to doing most of our work remotely from our desks. As much of an annoyance as it may be to walk over to where the firewall is to check on it, that pales in comparison to spending hours troubleshooting a connectivity problem only to find that someone bumped the power cord on the firewall.

Step 4: Check for Recent Changes

Recent changes are not always responsible for problems that occur, but they should always be examined as a potential cause. The reason is simple: today's networks are so complex that it is difficult to ensure that a change does not cause a problem for a dependent system.
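The quick banner check described at the start of this section can also be scripted. The sketch below uses only Python's standard socket module and is illustrative rather than hardened; note that administrators can suppress or falsify the Server header, so its absence proves nothing.

```python
import socket

def grab_http_response(host: str, port: int = 80, timeout: float = 5.0) -> str:
    """Send a bare 'GET / HTTP/1.0' request (the telnet trick) and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")  # the blank line ends the request
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("latin-1", errors="replace")

def server_header(raw_response: str):
    """Pull the 'Server:' header out of a raw HTTP response, if the server sent one."""
    headers = raw_response.split("\r\n\r\n", 1)[0]
    for line in headers.split("\r\n"):
        if line.lower().startswith("server:"):
            return line.split(":", 1)[1].strip()
    return None  # many hardened servers omit or falsify this header
```

On the server described above, `server_header(grab_http_response("www.example.org"))` would return a string such as "Microsoft-IIS/6.0" (the host name here is a placeholder).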
Consequently, it is critical that you have a means of tracking and monitoring the changes made in your environment so that you have something to refer back to. Good change control is more than just "busy work." It provides a methodical means of answering the questions of who, what, and when:

• Who made recent changes? At its most simplistic, this gives you the name of the person to check with regarding the changes, to determine whether they can provide insight into the problem.

• What were the changes that were made? This is the most important information your change-control process contains. It enables you to look at what was changed and decide whether the changes could be responsible for the problems. For example, if someone updated the SNMP settings but the problem appears to be traffic being blocked, a good chance exists that those changes are irrelevant to the problem that is occurring.

• When were the changes made? Changes made days or weeks ago probably are not responsible for the problems of today. If the changes were made an hour ago, however, and the problem showed up an hour ago, it is probably worth investigating them in more detail.

It is important, however, to view recent changes as a culprit with a skeptical eye. Before spending time undoing changes, examine each change in the context of the problem and make sure that it makes sense for that change to be a cause. For example, I once watched a company roll back a series of virus definition (DAT) files because they were the last change made on the network before authentication errors started occurring. Anyone who knows anything about DAT updates knows that they have pretty much nothing to do with authentication, and this case was no different. When all was said and done, the DAT updates were rolled back, the problem still existed, and the company had lost hours that could have been spent fixing the real problem, which was subsequently discovered to be a malfunctioning domain controller.

The point is, make sure the changes appear to be relevant before devoting your full attention to them. Just because there were recent changes does not mean they are responsible for the problem. This is particularly true with firewalls, where it seems that if any change has been made to a firewall within six months of a problem occurring, someone will immediately question whether the firewall is the problem, even if the problem traffic in question never goes through the firewall.

Step 5: Check the Firewall Logs for Errors

As you saw in Chapter 12, "What Is My Firewall Telling Me?," a wealth of information is available in most firewall logs and logging systems.
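The who/what/when questions above map naturally onto a minimal change record. The sketch below is illustrative only; the field names and the 24-hour relevance window are assumptions, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ChangeRecord:
    who: str        # who made the change
    what: str       # what was changed
    when: datetime  # when it was applied

def recent_changes(log, problem_started, window=timedelta(hours=24)):
    """Return changes made within `window` before the problem appeared --
    the ones worth examining first."""
    return [c for c in log if problem_started - window <= c.when <= problem_started]

# Invented example entries:
log = [
    ChangeRecord("alice", "updated SNMP community string", datetime(2004, 5, 1, 9, 0)),
    ChangeRecord("bob", "added rule permitting TCP/443 to DMZ", datetime(2004, 5, 3, 13, 0)),
]
suspects = recent_changes(log, problem_started=datetime(2004, 5, 3, 14, 0))
# Only bob's rule change falls inside the window; alice's SNMP edit is two days old.
```

Even a flat file maintained this way gives you something concrete to refer back to when a problem surfaces.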
Therefore, always review your firewall logs as a routine troubleshooting step. To assist in using the logs as a troubleshooting tool, you can increase the level of logging detail, perhaps changing to the informational or even debugging level, or selecting specific error messages to log to help isolate the issue. When examining the logs, pay particular attention to the following types of events:

• Look for state errors. State errors can indicate problems with the firewall translation tables (for example, if the Cisco Secure PIX Firewall has an incorrectly configured static translation value).

• Look for denied traffic. Denied traffic is the classic indicator of an incorrectly configured ruleset. Although virtually all firewalls include an implicit deny at the end of the ruleset, it can be helpful for troubleshooting to include an explicit deny-and-log rule to ensure that the denied traffic is logged accordingly.
• Look for configuration errors. Configuration errors are often reported in the firewall logs as error events, allowing you to rapidly identify them without needing to review the configuration line by line. A good example is a speed and duplex mismatch, which can prevent the firewall from making a reliable network connection.

• Look for hardware errors. Event logs are one of the best sources for discovering hardware-related errors, because most firewall vendors log hardware error events in the firewall logs.

Step 6: Verify the Firewall Configuration

There are two elements to verifying the firewall configuration. The first is to compare the current configuration to a known good configuration. The second is to verify that the configuration is accurate, with no typos or other errors. Every time the firewall configuration is changed (in addition to the first time the firewall is configured), a copy of the new configuration should be saved for archival purposes. This archive represents the last known working configuration. If the firewall is changed, the archive allows you to compare the current configuration against it to identify whether any changes have been made. If there have been, you can investigate further to determine whether those changes are responsible for the problems that are occurring.

Perhaps the most common source of problems with firewalls, however, is simple misconfiguration. It is all too easy to mistype a line, click the wrong element in a graphical user interface (GUI), or apply the wrong command to the firewall, causing a problem on the network that must then be troubleshot. This is particularly true of the firewall ruleset, where it is easy to enter the wrong transport protocol (TCP when you meant UDP), IP address, or port number.
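Comparing the running configuration against the archived known-good copy can be automated with a plain text diff. Here is a minimal sketch using Python's standard difflib; the configuration lines are invented for illustration and deliberately contain the kind of transport-protocol typo discussed above.

```python
import difflib

# Archived last-known-good configuration (illustrative lines only).
known_good = """\
access-list outside_in permit tcp any host 192.0.2.10 eq 80
access-list outside_in permit udp any host 192.0.2.20 eq 53
access-list outside_in deny ip any any log
""".splitlines()

# Current running configuration -- note the protocol typo on the second rule.
current = """\
access-list outside_in permit tcp any host 192.0.2.10 eq 80
access-list outside_in permit tcp any host 192.0.2.20 eq 53
access-list outside_in deny ip any any log
""".splitlines()

diff = list(difflib.unified_diff(known_good, current,
                                 fromfile="archived", tofile="running",
                                 lineterm=""))
for line in diff:
    print(line)
# The -/+ pair highlights that UDP 53 (DNS) was changed to TCP 53.
```

A diff that comes back empty tells you the configuration has not drifted, letting you move on to other causes quickly.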
A great example of this occurred when Cisco released the security advisory "Cisco IOS Interface Blocked by IPv4 Packets." As a workaround, it was recommended that, among other things, protocol 53 be blocked. Unfortunately, many network administrators saw "53" and automatically assumed DNS (TCP and UDP port 53), which resulted in rulesets that blocked TCP and UDP port 53 (causing DNS traffic to stop being passed) instead of IP protocol 53 (SWIPE), the protocol actually related to the Cisco IPv4 packet-processing denial of service.

Step 7: Verify the Firewall Ruleset

As mentioned in the previous section, the firewall ruleset deserves more scrutiny than anything else about a firewall during the troubleshooting process. After all, in most cases the firewall exists solely to filter traffic in accordance with the ruleset, which means that if there is a mistake in the ruleset, it will almost certainly manifest itself as a problem on the network.

The most common ruleset error is a simple typo. For this reason, I like to have the ruleset validated by someone other than the person who made the changes. The reason is simple: the person making the changes generally knows what the changes should be and is more apt to read what he or she thinks the ruleset is supposed to contain, not what it actually contains. Putting a fresh set of eyes on the ruleset increases the odds that someone will notice, for example, that a rule was inadvertently configured for TCP rather than UDP.

Another common ruleset error involves processing order. You need to understand the order in which your firewall processes the ruleset and then verify that you do not have a rule out of order that is causing the problem. For example, if the rules are processed top down until a match is made, and a rule that denies the traffic appears before the rule that permits it, the firewall will process the deny and then exit the ruleset because it found a match, never reaching the line that permits the traffic in question.

Step 8: Verify That Any Dependent, Non-Firewall-Specific Systems Are Not the Culprit

Something else to consider in troubleshooting is the dependent services and systems that are not firewall specific, or for which the firewall administrator might not be responsible. This includes the systems being protected by the firewall. Common services to examine are name resolution processes such as DNS and WINS. Many times, someone will attempt to access a resource by name through the firewall and, when the request fails, assume that the firewall is the problem. However, if name resolution is not working properly, the user may not be able to resolve the name of the requested resource to an IP address, which is the actual cause of the connection failure.
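Before blaming the firewall for a failed by-name connection, it helps to separate the name-resolution step from the connection step. A rough sketch with Python's standard library (host names and ports in any real use would be your own):

```python
import socket

def diagnose(hostname: str, port: int) -> str:
    """Distinguish a name-resolution failure from a connection failure."""
    try:
        addr = socket.gethostbyname(hostname)
    except socket.gaierror as exc:
        # The lookup itself failed: the firewall may well be innocent.
        return f"DNS failure resolving {hostname}: {exc}"
    try:
        with socket.create_connection((addr, port), timeout=5):
            return f"{hostname} resolved to {addr} and port {port} is reachable"
    except OSError as exc:
        # Name resolution worked; now the firewall or the server is suspect.
        return f"{hostname} resolved to {addr}, but connecting to port {port} failed: {exc}"
```

If the first step fails, no amount of staring at the firewall ruleset will fix the problem; the investigation belongs with the DNS (or WINS) infrastructure instead.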
Another common source of dependent problems is the set of systems that provide services to users through the firewall, such as web servers. These servers are frequently managed by a completely separate team that may or may not communicate the status of the servers to the firewall administrators. The server administrators may therefore take systems down for maintenance without informing the firewall team. When a user attempts to access the resource, the request naturally fails, not because of the firewall but because the server behind the firewall that provides the actual service is not online.

External authentication servers such as RADIUS, TACACS+, and Microsoft Windows domain controllers can also be a source of problems. For example, if access to a protected resource behind the firewall requires external authentication and the firewall cannot communicate with the authentication server, it may appear that the firewall is blocking traffic (and in a manner of speaking, it is), but the real problem is not the firewall but a failure of the authentication server.

Step 9: Monitor the Network Traffic

When all else has failed and you are left scratching your head about what the problem may be, it is a good time to monitor the actual network traffic and examine precisely how the systems are attempting to communicate to and through the firewall. Doing so can help identify communications problems that may not have shown up in the firewall event logs, or that showed up but did not provide enough information to determine a corrective course of action. As mentioned previously in this book, monitoring the network traffic with a tool such as Ethereal, which allows you to view the raw packets and communications between hosts, is much like having a Rosetta stone to help decipher the network languages and communications processes that hosts use to talk to each other. For example, a common ruleset error is to open TCP port 20 to FTP servers because it has been commonly reported that FTP servers use both TCP ports 20 and 21 for communications. Although this is true, most FTP clients and servers can communicate solely using TCP port 21, which can be validated by monitoring the traffic between the client and the server. Having access to this kind of information helps you identify and troubleshoot problems that do not exhibit symptoms anywhere else, whether in the firewall logs, the configuration, or the firewall ruleset.
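Several of the steps above, notably the processing-order pitfall in Step 7, hinge on first-match rule evaluation. The toy evaluator below makes the shadowing problem concrete; the rule format is invented for illustration and far simpler than any real firewall's.

```python
def evaluate(ruleset, packet):
    """Process rules top down; the first match wins, with an implicit deny at the end."""
    for action, proto, port in ruleset:
        if (proto, port) == (packet["proto"], packet["port"]):
            return action
    return "deny"  # implicit deny when nothing matches

# A deny placed above the permit shadows it: the permit is never reached.
ruleset = [
    ("deny",   "tcp", 80),   # overly broad deny listed first
    ("permit", "tcp", 80),   # intended permit -- a dead rule, never evaluated
    ("permit", "udp", 53),
]

assert evaluate(ruleset, {"proto": "tcp", "port": 80}) == "deny"    # shadowed!
assert evaluate(ruleset, {"proto": "udp", "port": 53}) == "permit"  # reached normally
```

Reordering the first two rules would make the permit take effect, which is exactly the kind of fix that reviewing processing order (or watching the traffic itself, per Step 9) reveals.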