During the course of this chapter, we'll be comparing the task of writing a firewall ruleset to computer programming in general. If you don't have any experience with computer programming, this approach might sound complicated and unsettling. The configuration of a firewall shouldn't require a degree in computing sciene or experience in programming, right?
The answer is no, it shouldn't and mostly doesn't. The language used in rulesets to configure pf is made to ressemble the human language. For instance:
block all pass out all keep state pass in proto tcp to any port www keep stateIndeed, it doesn't take a computer programmer to understand what this ruleset does or to intuitively write a ruleset to implement a similarly simple policy. Chances are good that a ruleset created like this will do precisely what the author wanted.
Unfortunately, computers do what you tell them to do instead of what you want them to do. Worse, they can't tell the difference between the two, if there is any. If the computer doesn't do precisely what you want, even though you assumed you made your instructions clear, it's up to you to identify the difference and reformulate the instructions. Since this is a common problem in programming, we can look at how programmers deal with it. It turns out that the skills and methods used to test and debug programs and rulesets are very similar. You won't need to know any programming languages to understand the implications for firewall testing and debugging.
Hence, the first step in configuring a firewall is specifying informally what it is supposed to do. What connections should it block or pass? An example would be:
Vague policies like "allow only what is strictly necessary" or "block attacks" need to be made more precise, or you won't be able to implement them or test them. Like in software development, lacking specifications rarely lead to meaningful and correct implementations ("why don't you start writing code, while I go find out what the customer wants").
You might receive a complete policy and your task is to implement it, or defining the policy might be part of your task. In either case, you'll need to have the policy in hand before you can complete implementing and testing it.
When the source code violates the formal language, the parser reports a syntax error and refuses to translate the text file. This is called a compile-time error and such errors are reliably detected and usually resolved quickly. When pfctl can't parse your ruleset file, it reports the number of the line in the file where the error occured together with a more or less accurate description of what it didn't understand. Unless the entire ruleset file could be parsed without any syntax errors, pfctl does not change the previous ruleset in the kernel. As long as the ruleset file contains one or more syntactic errors, there is no program pf can execute.
The second type of errors is called run-time error, as it occurs only when a syntactically correct program that has been successfully translated is running. With a generic programming language, this might occur when the program is dividing a number by zero, tries to access an invalid memory location or runs out of memory. Since pf rulesets are very limited compared to the functionality of a generic programming language, most of these errors cannot occur when a ruleset is executed, i.e. a ruleset can't 'crash' like a generic program. But, of course, a ruleset can produce the wrong output at run-time, by blocking or passing packets that it shouldn't according to the policy. This is sometimes called a 'logic error', an error that doesn't cause the execution of a program to get aborted, but 'merely' produces incorrect output.
So, before we can start to test whether the firewall correctly implements our policy, we first need to have a ruleset loaded successfully.
# pfctl -f /etc/pf.conf /etc/pf.conf:3: syntax errorThe message means that there is a syntactic error on line 3 of the file /etc/pf.conf and pfctl couldn't load the ruleset. The in-kernel ruleset has not been changed, it remains the same as it was before the failed attempt to load the new ruleset.
There are many different errors that pfctl can produce. The first step is to take a close look at the error message and read it carefully. Not all parts might make sense immediately, but the best chance to understanding what is going wrong is to read all parts. If the message has the format "filename:number: text", it refers to the line with that number inside the file with that name.
The next step is to look at the specific line, either using a text editor (in vi, you can jump to line 3 by entering 3G in beep mode), or like this:
# cat -n /etc/pf.conf 1 int_if = "fxp 0" 2 block all 3 pass out on $int_if inet all kep state # head -n 3 /etc/pf.conf | tail -n 1 pass out inet on $int_if all kep stateThe problem might be a simple typo, like in this case ("kep" instead "keep"). After fixing that, we try to reload the file:
# pfctl -f /etc/pf.conf /etc/pf.conf:3: syntax error # head -n 3 /etc/pf.conf | tail -n 1 pass out inet on $int_if all keep stateNow the keywords are all valid, but on closer inspection, we notice that the placement of the "inet" keyword before "on $int_if" is invalid. It also illustrates that the same line can obviously contain more than a single mistake. pfctl only reports the first problem it finds, and then aborts. If, on retry, it reports the same line again, there might be more mistakes, or the first problem wasn't corrected properly.
Misplacement of keywords is another common mistake. It can be identified by comparing the rule with the BNF syntax at the bottom of the pf.conf(5) man page, which contains:
pf-rule = action [ ( "in" | "out" ) ] [ "log" | "log-all" ] [ "quick" ] [ "on" ifspec ] [ route ] [ af ] [ protospec ] hosts [ filteropt-list ] ifspec = ( [ "!" ] interface-name ) | "{" interface-list "}" af = "inet" | "inet6"This implies that "inet" should come after "on $int_if".
We correct that and retry again:
# pfctl -f /etc/pf.conf /etc/pf.conf:3: syntax error # head -n 3 /etc/pf.conf | tail -n 1 pass out on $int_if inet all keep stateThere is nothing obviously wrong left now. But we're not seeing all the relevant details. The line depends on the definition of the macro $int_if. Could that be wrongly defined? Let's see:
# pfctl -vf /etc/pf.conf int_if = "fxp 0" block drop all ... /etc/pf.conf:3: syntax errorAfter fixing the mistyped "fxp 0" into "fxp0", we retry again:
# pfctl -f /etc/pf.confNo output means the file was loaded successfully.
In some cases, pfctl can provide a more specific error message instead of the generic "syntax error", like:
# pfctl -f /etc/pf.conf /etc/pf.conf:3: port only applies to tcp/udp /etc/pf.conf:3: skipping rule due to errors /etc/pf.conf:3: rule expands to no valid combination # head -n 3 /etc/pf.conf | tail -n 1 pass out on $int_if to port ssh keep stateThe error reported first is usually the most helpful one, and subsequent errors might be misleading. In this case, the problem is that the rule specifies a port criterion without specifying either proto udp or proto tcp.
In rare cases, pfctl is confused by unprintable characters or whitespace in the file, and the mistake is hard to spot without making those characters visible:
# pfctl -f /etc/pf.conf /etc/pf.conf:2: whitespace after \ /etc/pf.conf:2: syntax error # cat -ent /etc/pf.conf 1 block all$ 2 pass out on gem0 from any to any \ $ 3 ^Ikeep state$The problem is the blank after the backslash on the second line, before the end of the line, indicated by the dollar sign by cat -e.
Once the ruleset loads successfully, it can be a good idea to look at the result:
$ cat /etc/pf.conf block all # pass from any to any \ pass from 10.1.2.3 to any $ pfctl -f /etc/pf.conf $ pfctl -sr block drop allThe backslash at the end of the comment line actually continues the comment line.
Expansion of {} lists can have surprising results, which also shows up in the parsed ruleset:
$ cat /etc/pf.conf pass from { !10.1.2.3, !10.2.3.4 } to any $ pfctl -nvf /etc/pf.conf pass inet from ! 10.1.2.3 to any pass inet from ! 10.2.3.4 to anyHere, the problem is that "{ !10.1.2.3, !10.2.3.4 }" doesn't mean "any address except 10.1.2.3 and 10.2.3.4", the expansion will literally match any possible address.
You should reload your ruleset after making permanent changes, to make sure pfctl can load it during the next reboot. On OpenBSD, the rc(8) startup script /etc/rc first loads a small default ruleset, which blocks everything by default except for traffic required during startup (like dhcp or ntp). When it subsequently fails to load the real ruleset /etc/pf.conf, due to syntax errors introduced before the reboot without testing, the small default ruleset will remain active. Luckily, the small default ruleset allows incoming ssh, so the problem can still be fixed remotely.
There are two ways for the ruleset to fail: it might block connections which should be passed or it might pass connections which should be blocked.
Testing generally refers to empirically trying various cases of connections in a systematic way. There's an almost infinite number of different connections that a firewall might be facing, and it would be unfeasible to try every combination of source and destination addresses and ports on all interfaces. Proving the correctness of a ruleset might be possible for very simple rulesets. In practice, the better approach is to create a list of test-cases based on the policy, such that every aspect of the policy is covered. For instance, for our example policy the list of cases to try would be
This might sound odd, but the goal of each test should be to find an error in the firewall implementation, and not to NOT find an error. The overall goal of the process is to build a firewall ruleset without any errors, but if you assume that there are errors, you want to find them rather than miss them. When you assume the role of the tester, you have to adopt a destructive mindset and try to break the ruleset. Only then a failure to break it becomes a constructive argument that the ruleset is free of errors.
TCP and UDP connections can generally be tested with nc(1). nc can be used both as client and as server (using the -l option). For ICMP queries and replies, ping(8) is a simple testing client.
To test connections that should be blocked, you can use any kind of tool that attempts to communicate with the target.
Using tools from ports, like nmap(1), you can scan multiple ports, even across multiple target hosts. Make sure to read the man page when results look odd. For instance, a TCP port is reported as 'unfiltered' when nmap receives a RST from pf. Also, if the host that runs nmap is itself running pf, that might interfere with nmap's ability to do its job properly.
There are some online penetration testing services which allow you to scan yourself from the Internet. Some draw invalid conclusions depending on how you block ports with pf (block drop vs. return-rst or return-icmp), be sure you understand what they do and how they make conclusions before you get alarmed.
More advanced intrusion tools might include things like IP fragmentation or sending of invalid IP packets.
To test connections that should pass according to the policy, it's best to simply use the protocols using the common applications that legitimate users will be using. For instance, if you should pass HTTP connections to a web server, using different web browsers to fetch various content from differnt client hosts is a better test than just confirming that the TCP handshake to the server works with nc(1). Some errors are affected by factors like the hosts' operating systems, for instance you might see problems with TCP window scaling or TCP SACK only between hosts running specific operating systems.
When the expected result is 'pass', there are several ways in which an observed result can differ. The TCP handshake might fail because the peer (or the firewall) is returning an RST. The handshake might simply time out. The handshake might complete and the connection might work for a while but then stall or reset. The connection might work permanently, but throughput or latency might be different from expectations, either lower than expected or higher (in case you expect AltQ to rate-limit the connection).
Expected results can include other aspects than the block/pass decision, for instance whether packets are logged, how they are translated, how they are routed or whether they are increasing counters as expected. If you care about these aspects, they're worth including in the list of things to test together with their expected results.
Your policy might include requirements regarding performance, reaction to overloading or redundancy. These could require dedicated tests. If you set up fail-over using CARP, you probably want to test what happens in case of various kinds of failures.
When you do observe a result that differs from the expectation, systematically note what you did during the test, what result you expected, why you expected that result, what result you observed and how the observation differs from the expectation. Repeat the test to see if the observed result is consistantly reproducable or if results vary. Try varying but similar parameters (like different source or destination addresses or ports).
Once you have a reproducable problem, the next step is to debug it to understand why things don't work as expected and how to fix them. When, during the course of this, you modify the ruleset, you'll have to repeat the entire list of tests again, including the tests that didn't show a problem previously. The change you made might have inadvertedly broken something that worked before.
The same applies to other changes made to the ruleset. A formal procedure for testing can make the process less error-prone. You're probably not going to repeat the entire test procedure for every little change you add to the ruleset. Some changes are trivial and shouldn't be able to break anything. But sometimes they do, or the sum of several changes introduces an error. You can use a revision control system like cvs(1) on your ruleset file, which helps investigating past changes to the file when errors are discovered. If you know that the error was not present a week ago, but now is, looking at all changes made to the file over the last week can help spot the problem, or at least allows to revert the changes until there's time to investigate the problem.
Non-trivial rulesets are like programs, they are rarely perfect in their first version, and it takes a while until they can be trusted to be free of bugs. Unlike real programs, which are never considered completely bug-free by most programmers, rulesets are simple enough to usually mature to that point.
Before you start searching for the cause of a problem you should define what exactly is considered the problem. If you've spotted an error during testing yourself, that can be very simple. If another person is reporting the problem, it can be difficult to extract the essence from a vague problem report. The best starting point is a problem that you can reliably reproduce yourself.
Some network problems are not actually caused by pf. Before you focus on debugging your pf configuration, it is worth establishing that it is indeed pf responsible for the problem. This is simple to do and can save a lot of time searching in the wrong place. Just disable pf with pfctl -d and verify that the problem goes away. If it does, re-enable pf with pfctl -e and verify that the problem occurs again. This does not apply to certain kinds of problems, like when pf is not NAT'ing connections as desired, because when pf is disabled, obviously the problem can't go away. But when possible, try to first prove that pf must be responsible.
Similarly, if the problem is that pf is not doing something you expect it to do, the first step should be to ensure that pf is actually running and the intended ruleset is successfully loaded, with:
# pfctl -si | grep Status Status: Enabled for 4 days 13:47:32 Debug: Urgent # pfctl -sr pass quick on lo0 all pass quick on enc0 all ...
There are several ways to find out what connections are used by an application or protocol. tcpdump(8) can show packets arriving at or leaving from interfaces, both real interfaces like network interface cards and virtual interfaces like pflog(4) and pfsync(4). You can supply an expression that filters what packets are being shown, thereby excluding existing noise on the network. Attempt to communicate using the application or protocol in question, and see what packets are being sent. For example:
# tcpdump -nvvvpi fxp0 tcp and not port ssh and not port smtp 23:55:59.072513 10.1.2.3.65123 > 10.2.3.4.6667: S 4093655771:4093655771(0) win 5840 <mss 1380,sackOK,timestamp 1039287798 0,nop,wscale 0> (DF)This is a TCP SYN packet, the first packet part of the TCP handshake. The sender is 10.1.2.3 port 65123 (which looks like a random high port) and the receipient is 10.2.3.4 port 6667. A detailed description of the output format can be found in the tcpdump(8) man page. tcpdump is the most important tool used in debugging pf related problems, it's well worth getting familiar with.
Another approach is to use pf's log feature. Assuming you use the 'log' option in all 'block' rules, almost any packet blocked by pf will be logged. You can remove the 'log' option from rules that deal with known protocols, so only packets blocked on unknown ports are logged. Try to use the blocked application and check pflog, like:
# ifconfig pflog0 up # tcpdump -nettti pflog0 Nov 26 00:02:26.723219 rule 41/0(match): block in on kue0: 195.234.187.87.34482 > 62.65.145.30.6667: S 3537828346:3537828346(0) win 16384 <mss 1380,nop,nop,sackOK,[|tcp]> (DF)If you're using pflogd(8), the daemon will constantly listen on pflog0 and store the log in /var/log/pflog, which you can view with:
# tcpdump -netttr /var/log/pflogWhen dumping pf logged packets, you can use extended filtering expressions to tcpdump, for instance it can show only logged packets that were blocked incoming on interface wi0 with:
# tcpdump -netttr /var/log/pflog inbound and action block and on wi0Some protocols, like FTP, are not that easy to match, because they don't use fixed port numbers or use multiple related connections. It might not be possible to pass them through the firewall without opening up a wide range of ports. For specific protocols there are solutions, like ftp-proxy(8).
For example, your ruleset might contain the rule
block in return-rst on $ext_if proto tcp from any to $ext_if port sshBut when you try to connect to TCP port 22, the connection is accepted! It appears like the firewall is ignoring your rule. As puzzling as these cases may be when experienced the first couple of times, there's always a logical and often trivial explanation.
First, you should verify everything we just assumed so far. For instance, we assumed that pf is running and the ruleset contains the rule above. It might be unlikely that these assumptions are wrong, but they're quickly verified:
# pfctl -si | grep Status Status: Enabled for 4 days 14:03:13 Debug: Urgent # pfctl -gsr | grep 'port = ssh' @14 block return-rst in on kue0 inet proto tcp from any to 62.65.145.30 port = sshNext, we assume that a TCP connection to port 22 is passing in on kue0. You might think that's obviously true, but it's worth verifying. Start tcpdump:
# tcpdump -nvvvi kue0 tcp and port 22 and dst 62.65.145.30Then repeat the SSH connection. You should see the packets of your connection in tcpdump's output. If you don't, that might be because the connection isn't actually passing through kue0, but through another interface, which would explain why the rule isn't matching. Or you might be connecting to a different address. In short, if you don't see the SSH packets arrive, pf won't see them either, and can't possibly block them using the rule in question.
But if you do see the packets with tcpdump, pf should see and filter them as well. The next assumption is that the block rule is not just present somewhere in the ruleset (which we verified already), but is the last matching rule for these connections. If it isn't the last matching rule, obviously it doesn't make the block decision.
How can the rule not be the last matching rule? Three reasons are possible:
Manually evaluating the ruleset like this can be tedious, even though it can be done pretty quickly and reliably with more experience. If the ruleset is large, you can temporarily reduce it. Save a copy of the real ruleset and remove all rules that you think can't affect this case. Load that ruleset and repeat the test. If the connection is now blocked, the conclusion is that one of the seemingly unrelated rules you removed is responsible for either a) or c). Re-add the rules one by one and repeat the test, until you reach the responsible rule. If the connection is still passed after removal of all unrelated rules, repeat the mental evaluation of the now reduced ruleset.
Another approach is to use pf's logging to identify the cases a) and c). Add 'log' to all 'pass quick' rules before rule 14. Add 'log' to all 'pass' rules after rule 14. Start tcpdump on pflog0 and establish a new SSH connection. You'll see what rule other than rule 14 is matching the packet last. If nothing is logged, the explanation must be b).
First you should find out which of the four cases is the problem. When you try to establish a new connection, you should see the TCP SYN on the first interface using tcpdump. You should see the same TCP SYN leaving out on the second interface. If you don't, the conclusion is that pf is blocking the packet in on the first interface or out on the second.
If the SYN is not blocked, you should see a SYN+ACK arrive in on the second interface and out on the first. If not, pf is blocking the SYN+ACK on either interface.
Add 'log' to the rules which should pass the SYN and SYN+ACK on both interfaces, as well as to all block rules. Repeat the connection attempt and check pflog. It should tell you precisely which case was blocked and by what rule.
There are very few cases where pf silently drops packets not based on rules, where adding 'log' to all rules does not cause the dropped packets to get logged through pflog. The most common case is when a packet almost, but not entirely, matches a state entry.
Remember that for each packet seen, pf first does a state lookup. If a matching state entry is found, the packet is passed immediately, without evaluation of the ruleset.
A state entry contains information related to the state of one connection. Each state entry contains a unique key. This key consists of several values that are constant throughout the lifetime of a connection, these are:
When a new state entry is created by a 'keep state' rule, the entry is stored in the state tree using the state's key. An important limitation of the state tree is that all keys must be unique. That is, no two state entries can have the same key.
It might not be immediately obvious that the same two peers could not establish multiple concurrent connections involving the same addresses, protocol and ports, but this is actually a fundamental property of both TCP and UDP. In fact, the peers' TCP/IP stacks are only able to associate individual packets with their appropriate sockets by doing a similar lookup based on addresses and ports.
Even when a connection is closed, the same pair of addresses and ports cannot be reused immediately. The network might deliver a retransmitted packet of the old connection late, and if the receipient's TCP/IP stack would then falsely associate this packet with a new connection, this would disturb or even reset the new connection. For this reason, both peers are required to wait a specific period of time, called 2MSL for 'twice the maximum segment lifetime', before reusing an old pair of addresses and ports for a new connection.
You can observe this by manually establishing multiple connections between the same peer. For instance, you have a web server running on 10.1.1.1 port 80, and connect to it from client 10.2.2.2 using nc(8) twice, like this:
$ nc -v 10.1.1.1 80 & nc -v 10.1.1.1 80 Connection to 10.1.1.1 80 port [tcp/www] succeeded! Connection to 10.1.1.1 80 port [tcp/www] succeeded!While the connections are still open, you can use netstat(8) on the client or server to list the connections:
$ netstat -n | grep 10.1.1.1.80 tcp 0 0 10.2.2.6.28054 10.1.1.1.80 ESTABLISHED tcp 0 0 10.2.2.6.43204 10.1.1.1.80 ESTABLISHEDAs you can see, the client has chosen two different (random) source ports, so it doesn't violate the requirement of key uniqueness.
You can tell nc(8) to use a specific source port using -p, like:
$ nc -v -p 31234 10.1.1.1 80 & nc -v -p 31234 10.1.1.1 80 Connection to 10.1.1.1 80 port [tcp/www] succeeded! nc: bind failed: Address already in useThe TCP/IP stack of the client prevents the violation of the key uniqueness requirement. Some rare and faulty TCP/IP stacks do not respect this rule, and pf will block their connections when they violate the key uniqueness, as we'll see soon.
Let's get back to how pf does a state lookup when a packet is being filtered. The lookup consists of two steps. First, the state table is searched for a state entry with a key matching the protocol, addresses and port of the packet. This search accounts for packets flowing in either direction. For instance, assume the following packet has created a state entry:
incoming TCP from 10.2.2.2:28054 to 10.1.1.1:80A lookup for the following packets would find this state entry:
incoming TCP from 10.2.2.2:28054 to 10.1.1.1:80 outgoing TCP from 10.1.1.1:80 to 10.2.2.2:28054The state includes information about the direction (incoming or outgoing) of the initial packet that created the state. For instance, the following packets would NOT match the state entry:
outgoing TCP from 10.2.2.2:28054 to 10.1.1.1:80 incoming TCP from 10.1.1.1:80 to 10.2.2.2:28054The reason for this restriction is not obvious, but quite simple. Imagine you only have a single interface with address 10.1.1.1 where a web server is listening on port 80. When client 10.2.2.2 connects to you (using random source port 28054), the initial packet of the connection comes in on your interface and all your outgoing replies should be from 10.1.1.1:80 to 10.2.2.2:28054. You do not want to pass out packets from 10.2.2.2:28054 to 10.1.1.1:80, such packets would make no sense.
If you have a firewall with two interfaces and look at connections passing through the firewall, you'll see that every packet passing in on one interface passes out through the second. If you create state when the initial packet of the connection arrives in on the first interface, that state entry will not allow the same packet to pass out on the second interface, because the direction is wrong in the same way.
Instead, the packet is found to not match the state you already have, and the ruleset is evaluated. You'll have to explicitely allow the packet to pass out on the second interface with a rule. Usually, you'll want to use 'keep state' on that rule as well, so a second state entry is created that covers the entire connection on the second interface.
If you're wondering how it's possible to create a second state for the same connection when we've just explained how states must have unique keys, the explanation is that the state key also contains the direction of the connection, and the entire combination must be unique.
Now we can also explain the difference between floating and interface-bound states. By default, pf creates states that are not bound to any interface. That is, once you allow a connection in on one interface, packets related to the connection that match the state (including the direction restriction!) are passed on any interface. In simple setups with static routing this is only a theoretical issue. There is no reason why you should see packets of the same connection arrive in through several interfaces or why your replies should leave out through several interfaces. With dynamic routing, however, this can happen. You can choose to restrict states to specific interfaces. By using the global setting 'set state-policy if-bound' or the per-rule option 'keep state (if-bound)' you ensure that packets can match state only on the interface that created the state.
When virtual tunneling interfaces are involved, there are cases where the same connection passes through the firewall multiple times. For instance, the initial packet of a connection might first pass in through interface A, then pass in through interface B, then out through interface C and finally pass out through interface D. Usually the packet will be encapsulated on interfaces A and D and decapsulated on interfaces B and C, so pf sees packets of different protocols, and you can create four different states. Without encapsulation, when the packet is the same on all four interfaces, you may not be able to use some features like translations or sequence number modulation, because that would lead to state entries with conflicting keys. Unless you have a complex setup involving tunneling interfaces without encapsulation and see error messages like 'pf: src_tree insert failed', this should be of no concern to you.
Let's return to the state lookup done for each packet before ruleset evaluation. The search for a state entry with matching key will either find a single state entry or not find any state entry at all. If no state entry is found, the ruleset is evaluated.
When a state entry is found, a second step is performed for TCP packets before they are considered to be part of the known connection and passed without ruleset evaluation: sequence number checking.
There are many forms of TCP attacks, where an attacker is trying to manipulate a connection between two hosts. In most cases, the attacker is not located on the routing path between the hosts. That is, he can't listen in on the legitimate packets being sent between the hosts. He can, however, send packets to either host imitating packets of its peer, by spoofing (faking) his source address. The goal of the attacker might be to prevent establishment of connections or to tear down already established connections (to cause a denial of service) or to inject malicious payload into ongoing connections.
To succeed, the attacker has to correctly guess several parameters of the connection, like source and destination addresses and ports. Especially for well-known protocols, this isn't as impossible as it may appear. If the attacker knows both hosts' addresses and one port (because he's attacking a connection to a known service), he only has to guess one port. Even if the client is using a truly random source port (which isn't typical anyway), the attacker could try all 65536 possibilities in a short period of time.
The only thing that's truly hard to guess for an attacker is the right sequence number (and acknowledgement). If both peers chose their initial sequence numbers randomly (or you're modulating sequence numbers for hosts that have weak ISN generators), an attacker will not be able to guess an appropriate value at any given point during the connection.
Throughout the the lifetime of a valid TCP connection, the sequence numbers (and acknowledgements) of individual packets advance according to certain rules. For instance, once a host has sent a particular segment of data and the peer has acknowledged receiption, there is no legitimate reason for the sender to resend data for the same segment. In fact, an attempt to overwrite parts of already received data is not just invalid according to the TCP protocol, but a common attack.
pf uses these rules to deduce small windows for valid sequence numbers. Typically, pf can be sure that only about 30,000 out of 4,294,967,296 possible sequence numbers are valid at any point during a connection. Only when both a packet's sequence and acknowledgement number match these windows, pf will assume that the packet is legitimate and pass it.
When, during the state lookup, a state is found that matches the packet, the second step is to compare the packet's sequence numbers against the windows of allowed values stored in the state entry. When this second step fails, pf will produce a 'BAD state' message and drop the packet without evaluating the ruleset. There are two reasons for not evaluating the ruleset in this case: it would almost certainly be a mistake to pass the packet, and if the ruleset evaluation would result in a last-matching pass keep state rule, pf couldn't honour the decision and create a new state, as that would create a state key conflict.
In order to actually see and log 'BAD state' messages, you'll need to enable debug logging, using:
$ pfctl -xmDebug messages are sent to the console, and syslogd by default archives them in /var/log/messages. Look for messages starting with 'pf:', like:
pf: BAD state: TCP 192.168.1.10:20 192.168.1.10:20 192.168.1.200:64828 [lo=1185380879 high=1185380879 win=33304 modulator=0 wscale=1] [lo=1046638749 high=1046705357 win=33304 modulator=0 wscale=1] 4:4 A seq=1185380879 ack=1046638749 len=1448 ackskew=0 pkts=940:631 dir=out,fwd pf: State failure on: 1 |These messages always come in pairs. The first message shows the state entry at the time the packet was blocked and the sequence numbers of the packet that failed the tests. The second message lists the conditions that were violated.
At the end of the first message, you'll see whether the state was created on an incoming (dir=in) or outgoing (dir=out) packet, and whether the blocked packet was flowing in the same (dir=,fwd) or reverse (dir=,rev) direction relative to the initial state-creating packet.
A state contains three address:port pairs, two of which are always equal unless the connection is being translated by nat, rdr or binat. For outgoing connections, the source is printed on the left and the destination on the right. If the outgoing connection involves source translation, the pair in the middle shows the source after translation. For incoming connections, the connection's source is found on the right and the destination in the middle. If the incoming connection involves destination translation, the left-most pair shows the destination after translation. This format corresponds to the output of pfctl -ss, the only difference is that pfctl indicates the direction of the state using arrows instead.
Next, you see the two peers' current sequence number windows in square brackets. The '4:4' means the state is fully established (smaller values are possible during handshake, larger ones during connection closing). The 'A' indicates that the blocked packet had the ACK flag set, similar to the formatting of TCP flags in tcpdump(8) output, followed by the sequence (seq=) and acknowledgement (ack=) numbers in the blocked packet and the length (len=) of the packet's data payload. ackskew is an internal value of the state entry, only relevant when not equal zero.
The 'pkts=940:631' part means that the state entry has matches 940 packets in the direction of the initial packet and 631 packets in the opposite direction since it was created. These counters can be especially helpful in identifying the cause of problems occuring during the handshake, when either one is zero, contradicting your expectation that the state has matched packets in both directions.
The second message contains a list of one or more digits. Each digit printed represents one check that failed:
However, there is one class of problems in pf configuration that can be diagnosed based on the 'BAD state' messages produced steadily in those cases.
Use 'flags S/SA' on all 'pass proto tcp keep state' rules!All initial SYN packets (and only those packets) have flag SYN set but flag ACK not set. When all your 'keep state' rules that can apply to TCP packets are restricted these packet, only initial SYN packets can create states. Therefore, any TCP state created is created based on an initial SYN packet.
The reason for creating state only on initial SYN packets is a TCP extention called 'window scaling' defined in RFC 1323. The field of the TCP header used to advertise accepted windows became too small for today's fast links. Modern TCP/IP stacks would like to use larger window values than can be stored in the existing header field. Window scaling means that all window values advertised by one peer are to be multiplied by a certain factor by the receipient, instead of be taken literally. In order for this scheme to work, both peers must understand the extention and advertise their ability to support it during the handshake using TCP options. The TCP options are only present in the initial SYN and SYN+ACK packets of the handshake. If and only if both of those packets contain the TCP option, the negotiation is successful, and all further packets' window values are meant to be multiplied.
If pf didn't know about window scaling being used, it would take all advertised window values seen literally, and calculate its windows of acceptable sequence number ranges incorrectly. Typically, peers start to advertise smaller windows and gradually advertise larger windows during the course of a connection. Unaware of the window scaling factors, pf would at some point start to block packets because it would think one peer is overflowing the other's advertised window. The effects would be more or less subtle. Sometimes, the peers will react to the loss of the packets by going into a loss recovery mode and advertise smaller windows. When pf then passes subsequent retransmissions again, advertised windows grow again, up to the point where pf blocks packets. The effect is that connections temporarily stall and throughput is poor. It's also possible that connections stall completely and time out.
pf does know about window scaling and supports it. However, the prerequisite is that you create state on the initial SYN, so pf can associate the first two packets of the handshake with the state entry. Since the entire negotiation of the window scaling factors takes place only in these two packets, there is no reliable way to deduce the factors after the handshake.
Window scaling wasn't widely used in the past, but this is changing rapidly. Just recently, Linux started using window scaling by default. If you experience stalling connections, especially when problems are limited to certain combinations of hosts, and you see 'BAD state' messages related to these connections logged, verify that you're really creating states on the initial packet of a connection.
You can tell whether pf has detected window scaling for a connection from the output of pfctl like:
$ pfctl -vss kue0 tcp 10.1.2.3:10604 -> 10.2.3.4:80 ESTABLISHED:ESTABLISHED [3046252937 + 58296] wscale 0 [1605347005 + 16384] wscale 1If you see 'wscale x' printed in the second line (even if x is zero), pf is aware that the connection uses window scaling.
Another simple method to identify problems related to window scaling is to temporarily disable window scaling support on either peer and reproduce the problem. On OpenBSD, the use of window scaling can be controlled with sysctl(8), like:
$ sysctl net.inet.tcp.rfc1323 net.inet.tcp.rfc1323=1 $ sysctl -w sysctl net.inet.tcp.rfc1323=0 net.inet.tcp.rfc1323: 1 -> 0Similar problems occur when you create a state entry based on packets other than the initial SYN and use 'modulate state' or use translations. In both cases, the translation should occur at the beginning of the connection. If the first packet is not already translated, translation of subsequent packets will usually confuse the receipient and cause it to send replies that pf blocks with 'BAD state' messages.
Copyright (c) 2004-2006 Daniel Hartmeier <[email protected]>. Permission to use, copy, modify, and distribute this documentation for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.