Linux Kernel Netfilter Subsystem Analysis
Table of Contents
- Core Hook Infrastructure
- iptables/xtables
- nf_tables
- Conntrack Integration
- Specific Match/Target Modules
- Key Data Structures
1. Core Hook Infrastructure
Location: /Users/sphinx/github/linux/net/netfilter/core.c
nf_hook_state Structure (line 78 in /Users/sphinx/github/linux/include/linux/netfilter.h)
struct nf_hook_state {
u8 hook; // Hook number (e.g., NF_INET_PRE_ROUTING)
u8 pf; // Protocol family (e.g., NFPROTO_IPV4)
struct net_device *in; // Input device
struct net_device *out; // Output device
struct sock *sk; // Socket (if any)
struct net *net; // Network namespace
int (*okfn)(struct net *, struct sock *, struct sk_buff *); // Continue function
};
nf_hook_ops Structure (line 98 in /Users/sphinx/github/linux/include/linux/netfilter.h)
struct nf_hook_ops {
struct list_head list; // List entry for registration
struct rcu_head rcu; // RCU for safe deletion
/* User fills in from here down. */
nf_hookfn *hook; // Hook function callback
struct net_device *dev; // Device for NETDEV hooks
void *priv; // Private data passed to hook
u8 pf; // Protocol family
enum nf_hook_ops_type hook_ops_type:8; // Hook type (NF_HOOK_OP_*)
unsigned int hooknum; // Hook number (NF_INET_*)
int priority; // Hook priority (lower = earlier)
};
nf_hook_entries Structure (line 123 in /Users/sphinx/github/linux/include/linux/netfilter.h)
struct nf_hook_entries {
u16 num_hook_entries; // Number of registered hooks
struct nf_hook_entry hooks[]; // Array of hook entries
/* trailer: pointers to original orig_ops of each hook,
* followed by rcu_head and scratch space used for freeing */
};
nf_hook_entry Structure (line 113 in /Users/sphinx/github/linux/include/linux/netfilter.h)
struct nf_hook_entry {
nf_hookfn *hook; // Hook function pointer
void *priv; // Private data for the hook
};
Hook Registration Functions
nf_register_net_hook() - Line 554 in /Users/sphinx/github/linux/net/netfilter/core.c
int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)
- Purpose: Register a single netfilter hook
- Parameters:
  - net: Network namespace
  - reg: Hook operations structure
- Behavior:
  - For the NFPROTO_INET family, the hook is registered for both IPv4 and IPv6 (except NF_INET_INGRESS, which is registered once)
  - Calls __nf_register_net_hook() to perform the actual registration
  - IPv4 is registered first; if the IPv6 registration then fails, the IPv4 registration is rolled back
- Returns: 0 on success, negative errno on failure
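The all-or-nothing behaviour for NFPROTO_INET can be modelled in userspace. The sketch below is not kernel code: `struct registry`, `register_one()`, `unregister_one()` and the `fail` knob are hypothetical stand-ins, and it assumes IPv4 is registered before IPv6.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical family identifiers, used only by this sketch. */
enum family { FAM_IPV4, FAM_IPV6, FAM_COUNT };

struct registry {
    bool registered[FAM_COUNT]; /* which families currently hold the hook */
    bool fail[FAM_COUNT];       /* knob to force a registration failure */
};

/* Stand-in for the per-family registration step. */
static int register_one(struct registry *r, enum family f)
{
    if (r->fail[f])
        return -ENOMEM;
    r->registered[f] = true;
    return 0;
}

static void unregister_one(struct registry *r, enum family f)
{
    r->registered[f] = false;
}

/* Register for IPv4, then IPv6; undo IPv4 if IPv6 fails. */
static int register_inet(struct registry *r)
{
    int err = register_one(r, FAM_IPV4);

    if (err < 0)
        return err;

    err = register_one(r, FAM_IPV6);
    if (err < 0) {
        unregister_one(r, FAM_IPV4);
        return err;
    }
    return 0;
}
```

A caller therefore never observes a half-registered hook: either both families hold it or neither does.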
nf_unregister_net_hook() - Line 526 in /Users/sphinx/github/linux/net/netfilter/core.c
void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)
- Purpose: Unregister a netfilter hook
- Behavior:
  - Handles NFPROTO_INET by unregistering from both protocol families
  - Calls __nf_unregister_net_hook() for the actual removal
__nf_register_net_hook() - Line 393 in /Users/sphinx/github/linux/net/netfilter/core.c
static int __nf_register_net_hook(struct net *net, int pf,
const struct nf_hook_ops *reg)
- Purpose: Internal function to register a hook for a specific protocol family
- Behavior:
  - Validates hook parameters based on protocol family
  - Gets pointer to hook head via nf_hook_entry_head()
  - Acquires nf_hook_mutex
  - Grows hook entries array via nf_hook_entries_grow()
  - Validates and assigns new hook entries
  - Updates static keys for jump label optimization
  - Frees old entries after RCU grace period
__nf_unregister_net_hook() - Line 485 in /Users/sphinx/github/linux/net/netfilter/core.c
static void __nf_unregister_net_hook(struct net *net, int pf,
const struct nf_hook_ops *reg)
- Purpose: Internal function to unregister a hook
- Behavior:
  - Locates hook entry via nf_hook_entry_head()
  - Removes hook or replaces with dummy (accept_all) to maintain ordering
  - Shrinks hook array if possible via __nf_hook_entries_try_shrink()
  - Triggers nf_queue_nf_hook_drop() for queue cleanup
Hook Entry Growth/Sizing
nf_hook_entries_grow() - Line 100 in /Users/sphinx/github/linux/net/netfilter/core.c
static struct nf_hook_entries *
nf_hook_entries_grow(const struct nf_hook_entries *old,
const struct nf_hook_ops *reg)
- Purpose: Add a new hook to the hook entries array
- Behavior:
  - Counts existing non-dummy hooks
  - Checks BPF hook type priority constraints (line 128-130)
  - Allocates new array with size = live hooks + 1
  - Inserts new hook maintaining priority order (ascending)
  - A BPF hook at the same priority as an existing BPF hook is rejected
- Returns: New hook entries or ERR_PTR on failure
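The copy-and-insert step can be illustrated with a small userspace model. This is a sketch, not the kernel function: `struct hook_array` and `hooks_grow()` are hypothetical names, an `int` priority stands in for a full hook entry, and dummy-entry skipping and the BPF check are omitted. It assumes a new hook is placed ahead of any existing hook whose priority value is not lower than its own.

```c
#include <stdlib.h>

struct hook_array {
    size_t n;
    int prio[]; /* ascending priorities; each int stands in for a hook entry */
};

/*
 * Return a newly allocated array holding old's entries plus prio, kept in
 * ascending order. old may be NULL (no hooks yet) and is left untouched,
 * so the caller decides when to free it.
 */
static struct hook_array *hooks_grow(const struct hook_array *old, int prio)
{
    size_t old_n = old ? old->n : 0;
    size_t i = 0, n = 0;
    int inserted = 0;
    struct hook_array *grown;

    grown = malloc(sizeof(*grown) + (old_n + 1) * sizeof(int));
    if (!grown)
        return NULL;

    while (i < old_n) {
        if (inserted || prio > old->prio[i]) {
            grown->prio[n++] = old->prio[i++]; /* copy an existing hook */
        } else {
            grown->prio[n++] = prio;           /* slot the new hook in */
            inserted = 1;
        }
    }
    if (!inserted)
        grown->prio[n++] = prio;               /* highest value: append */

    grown->n = n;
    return grown;
}
```

Building a fresh array instead of editing in place is what lets readers keep traversing the old one until it is released.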
__nf_hook_entries_try_shrink() - Line 231 in /Users/sphinx/github/linux/net/netfilter/core.c
static void *__nf_hook_entries_try_shrink(struct nf_hook_entries *old,
struct nf_hook_entries __rcu **pp)
- Purpose: Shrink hook array after unregistration
- Behavior:
  - Counts dummy hooks (removed entries left as placeholders)
  - If all removed, returns NULL to clear entirely
  - Otherwise allocates smaller array and copies live hooks
  - Returns old entries to be freed after RCU grace period
Hook Traversal
nf_hook_slow() - Line 616 in /Users/sphinx/github/linux/net/netfilter/core.c
int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,
const struct nf_hook_entries *e, unsigned int s)
- Purpose: Slow path hook invocation - iterates through all hooks
- Parameters:
  - skb: Packet buffer
  - state: Hook execution state
  - e: Hook entries array
  - s: Starting index
- Behavior:
  - Iterates from index s to e->num_hook_entries
  - Calls each hook via nf_hook_entry_hookfn()
  - Processes verdict:
    - NF_ACCEPT: Continue to next hook
    - NF_DROP: Free skb, return error
    - NF_QUEUE: Queue to userspace via nf_queue()
    - NF_STOLEN: Packet consumed by hook
  - Returns 1 if packet should continue to okfn()
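The verdict loop can be sketched as a userspace model. Everything here is hypothetical scaffolding (`struct packet`, `run_hooks()`, the two sample hooks); queueing is left out, and returning `-EPERM` on drop is an assumption about the default error code.

```c
#include <errno.h>
#include <stddef.h>

/* Values chosen to mirror NF_DROP, NF_ACCEPT and NF_STOLEN. */
enum verdict { V_DROP = 0, V_ACCEPT = 1, V_STOLEN = 2 };

struct packet {
    int freed;  /* set when the traversal drops the packet */
    int visits; /* number of hooks that saw the packet */
};

typedef enum verdict (*hook_fn)(void *priv, struct packet *pkt);

struct hook_entry {
    hook_fn hook;
    void *priv;
};

/* Walk hooks [s, n). Returns 1 to continue to okfn, 0 if stolen, <0 on drop. */
static int run_hooks(struct packet *pkt, const struct hook_entry *e,
                     size_t n, size_t s)
{
    for (; s < n; s++) {
        switch (e[s].hook(e[s].priv, pkt)) {
        case V_ACCEPT:
            break;              /* try the next hook */
        case V_DROP:
            pkt->freed = 1;     /* the model's stand-in for freeing the skb */
            return -EPERM;
        case V_STOLEN:
            return 0;           /* the hook now owns the packet */
        }
    }
    return 1;
}

static enum verdict accept_hook(void *priv, struct packet *pkt)
{
    (void)priv;
    pkt->visits++;
    return V_ACCEPT;
}

static enum verdict drop_hook(void *priv, struct packet *pkt)
{
    (void)priv;
    pkt->visits++;
    return V_DROP;
}
```

Hooks after a dropping hook never run, which is why ordering by priority matters.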
nf_hook_entry_hookfn() - Line 155 in /Users/sphinx/github/linux/include/linux/netfilter.h
static inline int
nf_hook_entry_hookfn(const struct nf_hook_entry *entry, struct sk_buff *skb,
struct nf_hook_state *state)
{
return entry->hook(entry->priv, skb, state);
}
Hook Priority Ordering
Hooks are ordered by ascending priority value, so a lower number runs earlier:
- Priority is set in nf_hook_ops.priority
- New hooks are inserted ahead of hooks with a higher priority value
- Common IPv4 priorities (enum nf_ip_hook_priorities):
  - NF_IP_PRI_FIRST: INT_MIN
  - NF_IP_PRI_RAW: -300
  - NF_IP_PRI_CONNTRACK: -200 (conntrack runs before mangle, NAT, and filter)
  - NF_IP_PRI_MANGLE: -150
  - NF_IP_PRI_NAT_DST: -100
  - NF_IP_PRI_FILTER: 0 (Standard filter)
  - NF_IP_PRI_SECURITY: 50
  - NF_IP_PRI_NAT_SRC: 100
  - NF_IP_PRI_LAST: INT_MAX
Hook Head Lookup
nf_hook_entry_head() - Line 275 in /Users/sphinx/github/linux/net/netfilter/core.c
static struct nf_hook_entries __rcu **
nf_hook_entry_head(struct net *net, int pf, unsigned int hooknum,
struct net_device *dev)
- Purpose: Find the hook entries array for a given protocol family/hook
- Behavior: Switches on protocol family:
  - NFPROTO_NETDEV: Per-device ingress/egress hooks
  - NFPROTO_ARP: ARP hooks
  - NFPROTO_BRIDGE: Bridge hooks
  - NFPROTO_INET: Ingress hook only (the dual IPv4/IPv6 case is split up earlier, in nf_register_net_hook())
  - NFPROTO_IPV4: IPv4 hooks
  - NFPROTO_IPV6: IPv6 hooks
2. iptables/xtables
Location: /Users/sphinx/github/linux/net/netfilter/x_tables.c
xt_table Structure (line 213 in /Users/sphinx/github/linux/include/linux/netfilter/x_tables.h)
struct xt_table {
struct list_head list; // List of registered tables
unsigned int valid_hooks; // Bitmask of valid hook points
struct xt_table_info *private; // Table entries (kernel internal)
struct nf_hook_ops *ops; // Hook ops for registration
struct module *me; // Owner module
u_int8_t af; // Address family (NFPROTO_*)
int priority; // Priority for hook ordering
const char name[XT_TABLE_MAXNAMELEN]; // Table name
};
xt_table_info Structure (line 238 in /Users/sphinx/github/linux/include/linux/netfilter/x_tables.h)
struct xt_table_info {
unsigned int size; // Total size of entries
unsigned int number; // Number of entries
unsigned int initial_entries; // Initial entry count
unsigned int hook_entry[NF_INET_NUMHOOKS]; // Entry points per hook
unsigned int underflow[NF_INET_NUMHOOKS]; // Underflow points
unsigned int stacksize; // Jump stack size
void ***jumpstack; // Jump stack for subchain calls
unsigned char entries[]; // Rule entries blob
};
xt_match Structure (line 132 in /Users/sphinx/github/linux/include/linux/netfilter/x_tables.h)
struct xt_match {
struct list_head list; // Registration list
const char name[XT_EXTENSION_MAXNAMELEN]; // Match name
u_int8_t revision; // Revision number
bool (*match)(const struct sk_buff *skb,
struct xt_action_param *); // Match function
int (*checkentry)(const struct xt_mtchk_param *); // Validation
void (*destroy)(const struct xt_mtdtor_param *); // Cleanup
struct module *me; // Owner module
const char *table; // Required table (or NULL)
unsigned int matchsize; // Size of match data
unsigned int usersize; // Userspace size
unsigned int hooks; // Hook bitmask
unsigned short proto; // Protocol (or 0 for all)
unsigned short family; // Address family
};
xt_target Structure (line 172 in /Users/sphinx/github/linux/include/linux/netfilter/x_tables.h)
struct xt_target {
struct list_head list;
const char name[XT_EXTENSION_MAXNAMELEN];
u_int8_t revision;
// Target function: returns verdict (NF_ACCEPT, NF_DROP, etc.)
unsigned int (*target)(struct sk_buff *skb,
const struct xt_action_param *);
int (*checkentry)(const struct xt_tgchk_param *); // Validation
void (*destroy)(const struct xt_tgdtor_param *); // Cleanup
struct module *me;
const char *table;
unsigned int targetsize; // Size of target data
unsigned int usersize; // Userspace size
unsigned int hooks; // Hook bitmask
unsigned short proto;
unsigned short family;
};
Entry Structures (Userspace/Kernel ABI)
xt_entry_match (line 11 in /Users/sphinx/github/linux/include/uapi/linux/netfilter/x_tables.h)
struct xt_entry_match {
union {
struct {
__u16 match_size; // Total size of entry
char name[XT_EXTENSION_MAXNAMELEN]; // Match name
__u8 revision; // Revision
} user;
struct {
__u16 match_size;
struct xt_match *match; // Kernel internal
} kernel;
__u16 match_size; // Total length
} u;
unsigned char data[]; // Match-specific data
};
xt_entry_target (line 34 in /Users/sphinx/github/linux/include/uapi/linux/netfilter/x_tables.h)
struct xt_entry_target {
union {
struct {
__u16 target_size;
char name[XT_EXTENSION_MAXNAMELEN];
__u8 revision;
} user;
struct {
__u16 target_size;
struct xt_target *target; // Kernel internal
} kernel;
__u16 target_size;
} u;
unsigned char data[0]; // Target-specific data
};
Match/Target Registration
xt_register_match() - Line 139 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
int xt_register_match(struct xt_match *match)
{
u_int8_t af = match->family;
mutex_lock(&xt[af].mutex);
list_add(&match->list, &xt[af].match);
mutex_unlock(&xt[af].mutex);
return 0;
}
xt_register_target() - Line 89 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
int xt_register_target(struct xt_target *target)
{
u_int8_t af = target->family;
mutex_lock(&xt[af].mutex);
list_add(&target->list, &xt[af].target);
mutex_unlock(&xt[af].mutex);
return 0;
}
xt_find_match() - Line 197 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
struct xt_match *xt_find_match(u8 af, const char *name, u8 revision)
- Finds a match by name and revision
- Returns with a module reference held on success
- Falls back to NFPROTO_UNSPEC for family-independent matches
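The lookup-with-fallback rule can be shown with a small userspace model. `struct ext`, `find_in_family()` and `find_ext()` are hypothetical names for this sketch, the family constants only mirror the NFPROTO_* values, and the reference taken on the owning module is left out.

```c
#include <stddef.h>
#include <string.h>

/* Values mirror NFPROTO_UNSPEC, NFPROTO_IPV4 and NFPROTO_IPV6. */
enum { FAM_UNSPEC = 0, FAM_IPV4 = 2, FAM_IPV6 = 10 };

struct ext {
    const char *name;
    unsigned char revision;
    int family;
};

/* Exact lookup within one family's registrations. */
static const struct ext *find_in_family(const struct ext *tbl, size_t n,
                                        int family, const char *name,
                                        unsigned char revision)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].family == family && tbl[i].revision == revision &&
            strcmp(tbl[i].name, name) == 0)
            return &tbl[i];
    }
    return NULL;
}

/* Try the requested family first, then family-independent extensions. */
static const struct ext *find_ext(const struct ext *tbl, size_t n,
                                  int family, const char *name,
                                  unsigned char revision)
{
    const struct ext *m = find_in_family(tbl, n, family, name, revision);

    if (!m && family != FAM_UNSPEC)
        m = find_in_family(tbl, n, FAM_UNSPEC, name, revision);
    return m;
}
```

With this rule an extension registered once under the unspecified family is usable from both IPv4 and IPv6 tables.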
xt_find_target() - Line 246 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
static struct xt_target *xt_find_target(u8 af, const char *name, u8 revision)
- Similar to xt_find_match() but for targets
Validation Functions
xt_check_match() - Line 480 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
int xt_check_match(struct xt_mtchk_param *par,
unsigned int size, u16 proto, bool inv_proto)
- Validates match size alignment
- Checks table constraints
- Verifies protocol match
- Validates hook mask
xt_check_target() - Line ~520 in /Users/sphinx/github/linux/net/netfilter/x_tables.c
int xt_check_target(struct xt_tgchk_param *par,
unsigned int size, u16 proto, bool inv_proto)
- Similar validation for targets
Translation from iptables Syntax to Kernel Structures
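- Legacy iptables userspace builds a complete table blob and hands it to the kernel with setsockopt() (IPT_SO_SET_REPLACE), not over netlink
- The kernel copies the blob and validates it in translate_table() (in ip_tables.c), resolving each match and target by name and checking it with xt_check_match() and xt_check_target()
- At packet time, ipt_do_table() (in ip_tables.c) walks the entries for the current hook. For each entry:
  - Matches are evaluated in order, and the entry is skipped as soon as one fails
  - If every match succeeds, the target is executed
  - The verdict determines the packet's fate, or XT_CONTINUE moves on to the next entry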
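The per-entry evaluation order can be sketched in userspace. This is a simplified model, not the kernel's table walker: `struct pkt`, `struct rule`, `eval_rules()` and the two sample matches are hypothetical, and jumps between user-defined chains are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct pkt {
    unsigned char ttl;
    unsigned int mark;
};

typedef bool (*match_fn)(const struct pkt *p);

enum { VERDICT_CONTINUE = -1, VERDICT_DROP = 0, VERDICT_ACCEPT = 1 };

struct rule {
    const match_fn *matches; /* all must return true for the rule to fire */
    size_t nmatches;
    int verdict;             /* what the rule's target returns */
};

/* Walk rules in order; fall back to the policy when nothing decides. */
static int eval_rules(const struct rule *rules, size_t n,
                      const struct pkt *p, int policy)
{
    for (size_t i = 0; i < n; i++) {
        bool all = true;

        for (size_t m = 0; m < rules[i].nmatches; m++) {
            if (!rules[i].matches[m](p)) {
                all = false; /* first failing match skips the rule */
                break;
            }
        }
        if (all && rules[i].verdict != VERDICT_CONTINUE)
            return rules[i].verdict;
    }
    return policy;
}

static bool ttl_low(const struct pkt *p) { return p->ttl < 5; }
static bool marked(const struct pkt *p) { return p->mark != 0; }
```

A rule with several matches behaves as a logical AND, and a non-terminating target simply lets evaluation continue with the next rule.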
3. nf_tables
Location: /Users/sphinx/github/linux/net/netfilter/nf_tables_api.c
nft_table Structure (line 1311 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_table {
struct list_head list; // Table list
struct rhltable chains_ht; // Chain hash table
struct list_head chains; // Chain list
struct list_head sets; // Sets in table
struct list_head objects; // Stateful objects
struct list_head flowtables; // Flow tables
u64 hgenerator; // Handle generator state
u64 handle; // Table handle
u32 use; // Reference count
u16 family:6; // Address family
u16 flags:8; // Table flags
u16 genmask:2; // Generation mask
u32 nlpid; // Netlink port ID
char *name; // Table name
u16 udlen; // User data length
u8 *udata; // User data
u8 validate_state; // Validation state
};
nft_chain Structure (line 1142 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_chain {
struct nft_rule_blob __rcu *blob_gen_0; // Current generation rules
struct nft_rule_blob __rcu *blob_gen_1; // Next generation rules
struct list_head rules; // Rule list
struct list_head list; // Table chain list
struct rhlist_head rhlhead; // Hash list head
struct nft_table *table; // Parent table
u64 handle; // Chain handle
u32 use; // Reference count
u8 flags:5; // Chain flags
u8 bound:1; // Bound to rule
u8 genmask:2; // Generation mask
char *name; // Chain name
u16 udlen; // User data length
u8 *udata; // User data
struct nft_rule_blob *blob_next; // Commit phase pointer
struct nft_chain_validate_state vstate; // Validation state
};
nft_base_chain Structure (line 1244 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_base_chain {
struct nf_hook_ops ops; // Netfilter hook ops
struct list_head hook_list; // Hook list for netdev
const struct nft_chain_type *type; // Chain type
u8 policy; // Default policy
u8 flags; // Flags
struct nft_stats __percpu *stats; // Per-CPU stats
struct nft_chain chain; // Base chain
struct flow_block flow_block; // Flow block for offload
};
nft_rule Structure (line 1002 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_rule {
struct list_head list;
u64 handle:42; // Rule handle
u64 genmask:2; // Generation mask
u64 dlen:12; // Data length
u64 udata:1; // Has user data
unsigned char data[]; // Expression data
};
nft_rule_blob Structure (line 1093 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_rule_blob {
unsigned long size;
unsigned char data[]; // Rule data (nft_rule_dp array)
};
nft_rule_dp Structure (line 1073 in /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h)
struct nft_rule_dp {
u64 is_last:1; // Last rule marker
u64 dlen:12; // Data length
u64 handle:42; // Handle for tracing
unsigned char data[]; // Expression data
};
nft_do_chain() - Line 249 in /Users/sphinx/github/linux/net/netfilter/nf_tables_core.c
unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv)
{
const struct nft_chain *chain = priv, *basechain = chain;
const struct net *net = nft_net(pkt);
const struct nft_expr *expr, *last;
const struct nft_rule_dp *rule;
struct nft_regs regs;
unsigned int stackptr = 0;
struct nft_jumpstack jumpstack[NFT_JUMP_STACK_SIZE];
bool genbit = READ_ONCE(net->nft.gencursor);
struct nft_rule_blob *blob;
struct nft_traceinfo info;
/* ... rule loop elided ... */
}
- Purpose: Main packet processing function for nf_tables
- Behavior:
  - Selects generation (blob_gen_0 or blob_gen_1) based on gencursor
  - Iterates through rules in the chain
  - For each rule, evaluates expressions:
    - Fast path: nft_cmp_fast_eval, nft_cmp16_fast_eval, nft_bitwise_fast_eval, nft_payload_fast_eval
    - Slow path: expr_call_ops_eval() for complex expressions
  - Handles verdicts:
    - NFT_BREAK: Stop evaluating the current rule and move to the next rule
    - NFT_CONTINUE: Continue rule evaluation
    - NFT_JUMP: Push current position, jump to subchain
    - NFT_GOTO: Jump to subchain without saving position
    - NFT_RETURN: Return from subchain
  - Uses jumpstack for NFT_JUMP (max depth: NFT_JUMP_STACK_SIZE = 16)
  - Returns final verdict to caller
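The difference between jump and goto is easiest to see in a toy interpreter. The sketch below is a userspace model under simplifying assumptions: each rule is reduced to a single action, `struct chain`, `struct rule`, `struct frame` and `run_chain()` are hypothetical names, and dropping on stack overflow is an assumption of the model.

```c
#include <stddef.h>

#define JUMP_STACK_SIZE 16 /* same depth the text gives for NFT_JUMP_STACK_SIZE */

enum action { ACT_CONTINUE, ACT_JUMP, ACT_GOTO, ACT_RETURN, ACT_ACCEPT, ACT_DROP };

struct chain;

struct rule {
    enum action act;
    const struct chain *target; /* used by ACT_JUMP and ACT_GOTO only */
};

struct chain {
    const struct rule *rules;
    size_t nrules;
};

struct frame {
    const struct chain *chain; /* chain to resume */
    size_t next;               /* rule index to resume at */
};

/* Returns ACT_ACCEPT or ACT_DROP; policy applies when the base chain ends. */
static enum action run_chain(const struct chain *base, enum action policy)
{
    struct frame stack[JUMP_STACK_SIZE];
    size_t sp = 0, i = 0;
    const struct chain *c = base;

    for (;;) {
        if (i >= c->nrules) {         /* end of chain: implicit return */
            if (sp == 0)
                return policy;
            sp--;
            c = stack[sp].chain;
            i = stack[sp].next;
            continue;
        }

        const struct rule *r = &c->rules[i];

        switch (r->act) {
        case ACT_ACCEPT:
        case ACT_DROP:
            return r->act;
        case ACT_JUMP:                /* remember where to come back to */
            if (sp >= JUMP_STACK_SIZE)
                return ACT_DROP;
            stack[sp].chain = c;
            stack[sp].next = i + 1;
            sp++;
            c = r->target;
            i = 0;
            break;
        case ACT_GOTO:                /* no frame pushed */
            c = r->target;
            i = 0;
            break;
        case ACT_RETURN:
            i = c->nrules;            /* reuse the end-of-chain path */
            break;
        case ACT_CONTINUE:
            i++;
            break;
        }
    }
}
```

A return reached through a jump resumes the caller after the jumping rule, while the same return reached through a goto has nothing to pop and falls through to the base chain policy.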
Transaction Model
The transaction model in nf_tables allows atomic rule updates:
- Preparation Phase (NFT_TRANS_PREPARE):
  - New rules/chains are created
  - Modifications stored in nft_trans structures
  - Added to commit_list
- Commit Phase (NFT_TRANS_COMMIT):
  - Generation cursor switches (blob_gen_0 <-> blob_gen_1)
  - New structures become active
  - Old structures marked for cleanup
- Release Phase (NFT_TRANS_RELEASE):
  - Old generation structures freed after RCU grace period
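The two-slot generation switch can be pictured with a single-threaded model. This sketch is only an illustration: `struct chain_model`, `struct ruleset` and the `chain_*()` helpers are hypothetical, the cursor is stored per chain here for brevity, and the RCU synchronisation that makes the real switch safe for concurrent readers is not modelled.

```c
#include <stddef.h>

struct ruleset {
    int version;
};

struct chain_model {
    const struct ruleset *blob_gen[2]; /* stands in for blob_gen_0 / blob_gen_1 */
    unsigned int gencursor;            /* 0 or 1: which slot readers use */
};

static void chain_init(struct chain_model *c, const struct ruleset *initial)
{
    c->blob_gen[0] = initial;
    c->blob_gen[1] = NULL;
    c->gencursor = 0;
}

/* Reader side: always use the slot selected by the cursor. */
static const struct ruleset *chain_current(const struct chain_model *c)
{
    return c->blob_gen[c->gencursor & 1u];
}

/* Prepare: stage the new ruleset in the slot readers are not using. */
static void chain_prepare(struct chain_model *c, const struct ruleset *next)
{
    c->blob_gen[(c->gencursor & 1u) ^ 1u] = next;
}

/* Commit: flip the cursor and hand back the stale ruleset for release. */
static const struct ruleset *chain_commit(struct chain_model *c)
{
    unsigned int old = c->gencursor & 1u;

    c->gencursor = old ^ 1u;
    return c->blob_gen[old];
}
```

Staging never disturbs what readers see; the flip is the single point at which the new ruleset becomes visible.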
nft_trans_alloc() - Line 178 in /Users/sphinx/github/linux/net/netfilter/nf_tables_api.c
static struct nft_trans *nft_trans_alloc(const struct nft_ctx *ctx,
int msg_type, u32 size)
{
struct nft_trans *trans;
trans = kzalloc(size, GFP_KERNEL);
if (trans == NULL)
return NULL;
// Initialize transaction
INIT_LIST_HEAD(&trans->list);
trans->msg_type = msg_type;
trans->net = ctx->net;
trans->table = ctx->table;
// ...
return trans;
}
Flow Table Registration
nft_register_flowtable_ops() - Line 8920 in /Users/sphinx/github/linux/net/netfilter/nf_tables_api.c
static int nft_register_flowtable_ops(struct net *net,
struct nft_flowtable *flowtable,
struct nft_hook *hook)
- Registers flow table hooks with netfilter core
- Sets up hardware offload path if available
4. Conntrack Integration
Location: /Users/sphinx/github/linux/net/netfilter/nf_conntrack_netlink.c
ctnetlink_create_conntrack() - Line 2234 in /Users/sphinx/github/linux/net/netfilter/nf_conntrack_netlink.c
static struct nf_conn *
ctnetlink_create_conntrack(struct net *net,
const struct nf_conntrack_zone *zone,
const struct nlattr * const cda[],
struct nf_conntrack_tuple *otuple,
struct nf_conntrack_tuple *rtuple,
u8 u3)
{
struct nf_conn *ct;
int err = -EINVAL;
struct nf_conntrack_helper *helper;
struct nf_conn_tstamp *tstamp;
u64 timeout;
ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
if (IS_ERR(ct))
return ERR_PTR(-ENOMEM);
/* ... remainder elided ... */
}
- Purpose: Create a new conntrack entry via ctnetlink
- Behavior:
  - Allocates conntrack via nf_conntrack_alloc()
  - Parses helper info if present
  - Sets up NAT if configured
  - Adds extensions (acct, timestamp, labels, seqadj, synproxy)
  - Sets timeout from netlink attribute
  - Inserts into hash table via nf_conntrack_hash_check_insert()
ctnetlink_get_conntrack() - Line 1667 in /Users/sphinx/github/linux/net/netfilter/nf_conntrack_netlink.c
static int ctnetlink_get_conntrack(struct sk_buff *skb,
const struct nfnl_info *info,
const struct nlattr * const cda[])
- Purpose: Get conntrack entries via ctnetlink
- Behavior:
  - Parses tuple attributes
  - Looks up conntrack by tuple
  - Dumps conntrack state to netlink message
nf_conntrack_put() - Line 34 in /Users/sphinx/github/linux/include/linux/netfilter/nf_conntrack_common.h
static inline void nf_conntrack_put(struct nf_conntrack *nfct)
{
if (nfct && refcount_dec_and_test(&nfct->use))
nf_conntrack_destroy(nfct);
}
- Purpose: Release a reference to a conntrack entry
- Behavior: NULL-safe; drops one reference and destroys the entry when the count reaches zero. The NULL check must come before any access to nfct->use. nf_ct_put() is the struct nf_conn counterpart.
5. Specific Match/Target Modules
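xt_DSCP.c - DSCP Match
Location: /Users/sphinx/github/linux/net/netfilter/xt_DSCP.c
Note: upstream has historically kept the match functions shown in this section (dscp_mt, ttl_mt, xt_rateest_mt, tcpmss_mt) in the lowercase files xt_dscp.c, xt_hl.c, xt_rateest.c and xt_tcpmss.c, with the uppercase files holding the corresponding targets. On a case-insensitive filesystem the two names resolve to the same path.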
xt_dscp_info Structure (from /Users/sphinx/github/linux/include/uapi/linux/netfilter/xt_dscp.h)
struct xt_dscp_info {
__u8 dscp; // DSCP value to match/mark
__u8 invert; // Invert match
};
struct xt_tos_match_info {
__u8 tos_mask;
__u8 tos_value;
__u8 invert;
};
dscp_mt() - Line 24 in /Users/sphinx/github/linux/net/netfilter/xt_DSCP.c
static bool
dscp_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_dscp_info *info = par->matchinfo;
u_int8_t dscp = ipv4_get_dsfield(ip_hdr(skb)) >> XT_DSCP_SHIFT;
return (dscp == info->dscp) ^ !!info->invert;
}
- Extracts DSCP from IPv4 header
- Compares with match info
- XORs with invert flag for result
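The shift-and-compare step can be tried outside the kernel. In this sketch `struct dscp_rule` and `dscp_match()` are hypothetical names, and the shift constant is assumed to equal XT_DSCP_SHIFT (2), so that the two ECN bits are discarded.

```c
#include <stdbool.h>
#include <stdint.h>

#define DSCP_SHIFT 2 /* assumed equal to XT_DSCP_SHIFT: skip the two ECN bits */

struct dscp_rule {
    uint8_t dscp;   /* 6-bit DSCP value to compare against */
    uint8_t invert; /* non-zero flips the result */
};

/* dsfield is the raw IPv4 TOS byte (or IPv6 traffic class). */
static bool dscp_match(uint8_t dsfield, const struct dscp_rule *r)
{
    uint8_t dscp = dsfield >> DSCP_SHIFT;

    return (dscp == r->dscp) ^ !!r->invert;
}
```

For example, a TOS byte of 0xb8 carries DSCP 46 (Expedited Forwarding), and setting an ECN bit (0xb9) does not change the outcome.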
dscp_mt6() - Line 33 in /Users/sphinx/github/linux/net/netfilter/xt_DSCP.c
static bool
dscp_mt6(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_dscp_info *info = par->matchinfo;
u_int8_t dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> XT_DSCP_SHIFT;
return (dscp == info->dscp) ^ !!info->invert;
}
- Same as above but for IPv6
dscp_mt_check() - Line 42 in /Users/sphinx/github/linux/net/netfilter/xt_DSCP.c
static int dscp_mt_check(const struct xt_mtchk_param *par)
{
const struct xt_dscp_info *info = par->matchinfo;
if (info->dscp > XT_DSCP_MAX)
return -EDOM; // Domain error
return 0;
}
- Validates DSCP value is within valid range (0-63)
xt_HL.c - TTL / Hop Limit Match
Location: /Users/sphinx/github/linux/net/netfilter/xt_HL.c
Hop Limit Match for IPv4 (ttl_mt) - Line 25
static bool ttl_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct ipt_ttl_info *info = par->matchinfo;
const u8 ttl = ip_hdr(skb)->ttl;
switch (info->mode) {
case IPT_TTL_EQ: return ttl == info->ttl;
case IPT_TTL_NE: return ttl != info->ttl;
case IPT_TTL_LT: return ttl < info->ttl;
case IPT_TTL_GT: return ttl > info->ttl;
}
return false;
}
Hop Limit Match for IPv6 (hl_mt6) - Line 44
static bool hl_mt6(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct ip6t_hl_info *info = par->matchinfo;
const struct ipv6hdr *ip6h = ipv6_hdr(skb);
switch (info->mode) {
case IP6T_HL_EQ: return ip6h->hop_limit == info->hop_limit;
case IP6T_HL_NE: return ip6h->hop_limit != info->hop_limit;
case IP6T_HL_LT: return ip6h->hop_limit < info->hop_limit;
case IP6T_HL_GT: return ip6h->hop_limit > info->hop_limit;
}
return false;
}
Match Registration - Line 63
static struct xt_match hl_mt_reg[] __read_mostly = {
{
.name = "ttl",
.revision = 0,
.family = NFPROTO_IPV4,
.match = ttl_mt,
.matchsize = sizeof(struct ipt_ttl_info),
.me = THIS_MODULE,
},
{
.name = "hl",
.revision = 0,
.family = NFPROTO_IPV6,
.match = hl_mt6,
.matchsize = sizeof(struct ip6t_hl_info),
.me = THIS_MODULE,
},
};
xt_RATEEST.c - Rate Estimation
Location: /Users/sphinx/github/linux/net/netfilter/xt_RATEEST.c
Rate Estimation Match - Line 14
static bool
xt_rateest_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_rateest_match_info *info = par->matchinfo;
struct gnet_stats_rate_est64 sample = {0};
u_int32_t bps1, bps2, pps1, pps2;
bool ret = true;
gen_estimator_read(&info->est1->rate_est, &sample);
if (info->flags & XT_RATEEST_MATCH_DELTA) {
bps1 = info->bps1 >= sample.bps ? info->bps1 - sample.bps : 0;
pps1 = info->pps1 >= sample.pps ? info->pps1 - sample.pps : 0;
} else {
bps1 = sample.bps;
pps1 = sample.pps;
}
/* ... second estimator and comparison elided ... */
}
- Reads rate estimation statistics from first estimator
- Optionally computes delta (difference from stored value, clamped at 0)
xt_rateest_mt_checkentry() - Line 72
static int xt_rateest_mt_checkentry(const struct xt_mtchk_param *par)
{
struct xt_rateest_match_info *info = par->matchinfo;
struct xt_rateest *est1, *est2;
// Validates flags, mode, and name lengths
// Looks up rate estimators by name
est1 = xt_rateest_lookup(par->net, info->name1);
est2 = xt_rateest_lookup(par->net, info->name2);
info->est1 = est1;
info->est2 = est2;
return 0;
}
xt_TCPMSS.c - TCP MSS Match
Location: /Users/sphinx/github/linux/net/netfilter/xt_TCPMSS.c
TCP MSS Match - Line 24
static bool
tcpmss_mt(const struct sk_buff *skb, struct xt_action_param *par)
{
const struct xt_tcpmss_match_info *info = par->matchinfo;
const struct tcphdr *th;
struct tcphdr _tcph;
const u_int8_t *op;
u8 _opt[15 * 4 - sizeof(_tcph)];
unsigned int i, optlen;
th = skb_header_pointer(skb, par->thoff, sizeof(_tcph), &_tcph);
if (th == NULL)
goto dropit;
if (th->doff*4 < sizeof(*th))
goto dropit;
optlen = th->doff*4 - sizeof(*th);
if (!optlen)
goto out;
op = skb_header_pointer(skb, par->thoff + sizeof(*th), optlen, _opt);
if (op == NULL)
goto dropit;
for (i = 0; i < optlen; ) {
if (op[i] == TCPOPT_MSS
&& (optlen - i) >= TCPOLEN_MSS
&& op[i+1] == TCPOLEN_MSS) {
u_int16_t mssval;
mssval = (op[i+2] << 8) | op[i+3];
return (mssval >= info->mss_min &&
mssval <= info->mss_max) ^ info->invert;
}
if (op[i] < 2 || i == optlen - 1)
i++;
else
i += op[i+1] ? : 1;
}
out:
return info->invert;
dropit:
par->hotdrop = true;
return false;
}
- Parses TCP options looking for MSS option (kind=2, len=4)
- Extracts MSS value from options
- Compares against min/max range
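The option walk can be exercised on a plain byte buffer. The sketch below mirrors the loop shown above but is a standalone userspace helper: `tcp_find_mss()` and the `OPT_*` constants are names chosen for this example, and the range comparison is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_MSS     2 /* TCP option kind for Maximum Segment Size */
#define OPT_MSS_LEN 4 /* kind + length + 16-bit value */

/* Scan a TCP options area; return the MSS, or -1 if no MSS option exists. */
static int tcp_find_mss(const uint8_t *op, size_t optlen)
{
    size_t i = 0;

    while (i < optlen) {
        if (op[i] == OPT_MSS && optlen - i >= OPT_MSS_LEN &&
            op[i + 1] == OPT_MSS_LEN)
            return (op[i + 2] << 8) | op[i + 3]; /* network byte order */

        if (op[i] < 2 || i == optlen - 1)
            i++;                             /* EOL/NOP, or no length byte left */
        else
            i += op[i + 1] ? op[i + 1] : 1;  /* skip by length; never stall on 0 */
    }
    return -1;
}
```

The guard against a zero length byte matters: without it a malformed option would keep the loop on the same index forever.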
xt_MARK.c - MARK Target
Location: /Users/sphinx/github/linux/include/uapi/linux/netfilter/xt_MARK.h
struct xt_mark_tginfo2 {
__u32 mark, mask; // Mark value and mask
};
struct xt_mark_mtinfo1 {
__u32 mark, mask;
__u8 invert;
};
The implementation is in net/netfilter/xt_mark.c, which:
- For match (mark_mt): Compares skb->mark masked by info->mask against info->mark, then applies info->invert
- For target (mark_tg): Sets skb->mark = (skb->mark & ~info->mask) ^ info->mark
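The mark arithmetic is plain bit manipulation and can be checked in isolation. This sketch assumes the XOR form, `(skb_mark & ~mask) ^ mark`; whenever `mark` has no bits outside `mask`, that gives the same result as `(skb_mark & ~mask) | (mark & mask)`. The helper names `mark_set()` and `mark_match()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Target side: clear the masked bits, then XOR in the new mark. */
static uint32_t mark_set(uint32_t skb_mark, uint32_t mark, uint32_t mask)
{
    return (skb_mark & ~mask) ^ mark;
}

/* Match side: compare only the masked bits, then apply the invert flag. */
static bool mark_match(uint32_t skb_mark, uint32_t mark, uint32_t mask,
                       bool invert)
{
    return ((skb_mark & mask) == mark) ^ invert;
}
```

Starting from a mark of 0xff, setting 0x05 under mask 0x0f rewrites only the low nibble and yields 0xf5, which a match on 0x05/0x0f then accepts.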
6. Key Data Structures
struct sk_buff - Packet Buffer
Location: /Users/sphinx/github/linux/include/linux/skbuff.h
Key netfilter-relevant fields:
struct sk_buff {
/* Netfilter related fields (when CONFIG_NF_CONNTRACK): */
unsigned long _nfct; // Conntrack pointer plus ctinfo bits
/* Header offsets, relative to skb->head */
__u16 transport_header; // Offset to transport header
__u16 network_header; // Offset to network header
__u16 mac_header; // Offset to MAC header
/* Packet marks */
__u32 mark; // Skb mark (used by fwmark, etc.)
/* Traffic control */
__u16 tc_index; // Traffic control index
/* VLAN */
__be16 vlan_proto; // VLAN protocol
__u16 vlan_tci; // VLAN TCI
/* Checksum */
__wsum csum; // Checksum
__u8 ip_summed:2; // Checksum status (CHECKSUM_*)
/* Other key fields */
struct sock *sk; // Socket
struct net_device *dev; // Device (the namespace is reached via dev_net(skb->dev))
struct sk_buff *next; // Queue next
struct sk_buff *prev; // Queue prev
// ... many more fields
};
Key accessor functions:
- skb_network_header(skb) - Get network layer header
- skb_transport_header(skb) - Get transport layer header
- skb_mac_header(skb) - Get MAC header
- skb_mac_header_was_set(skb) - Check if MAC header is set
- skb_vlan_tag_get(skb) - Get VLAN tag
struct net_device - Network Device
Location: /Users/sphinx/github/linux/include/linux/netdevice.h
Key netfilter-relevant fields:
struct net_device {
char name[IFNAMSIZ]; // Device name
unsigned int flags; // IFF_* flags
netdev_features_t features; // NETIF_F_* features
/* Netfilter hooks on this device */
#ifdef CONFIG_NETFILTER_INGRESS
struct nf_hook_entries __rcu *nf_hooks_ingress;
#endif
#ifdef CONFIG_NETFILTER_EGRESS
struct nf_hook_entries __rcu *nf_hooks_egress;
#endif
/* NAPI */
struct list_head napi_list; // NAPI instances attached to this device
// ... many more fields
};
struct net - Network Namespace
Location: /Users/sphinx/github/linux/include/net/net_namespace.h
Key netfilter-relevant fields:
struct net {
/* Per-namespace netfilter hooks */
struct netns_nf nf;
/* nftables state */
struct netns_nft nft;
// ... many more fields
};
The netns_nf structure contains:
struct netns_nf {
struct nf_hook_entries __rcu *hooks_ipv4[NF_INET_NUMHOOKS];
struct nf_hook_entries __rcu *hooks_ipv6[NF_INET_NUMHOOKS];
#ifdef CONFIG_NETFILTER_FAMILY_ARP
struct nf_hook_entries __rcu *hooks_arp[NF_ARP_NUMHOOKS];
#endif
#ifdef CONFIG_NETFILTER_FAMILY_BRIDGE
struct nf_hook_entries __rcu *hooks_bridge[NF_INET_NUMHOOKS];
#endif
struct proc_dir_entry *proc_netfilter;
};
struct flowi - Flow Identity
Location: /Users/sphinx/github/linux/include/net/flow.h
struct flowi {
union {
struct flowi_common __fl_common; // Fields shared by all families
struct flowi4 ip4; // IPv4 flow key
struct flowi6 ip6; // IPv6 flow key
} u;
};
struct flowi_common {
int flowic_oif; // Output interface index
int flowic_iif; // Input interface index
__u32 flowic_mark; // Firewall mark
__u8 flowic_scope; // Scope
__u8 flowic_proto; // Protocol
__u8 flowic_flags; // FLOWI_FLAG_* flags
// ... additional fields (TOS/DSCP, secid, uid, tunnel key); the exact set varies by kernel version
};
Call Chains - Packet Flow Through Hooks
IPv4 Packet Ingress Flow
1. NIC Driver receives packet
-> netif_rx() or netif_receive_skb()
2. IP stack entry
-> ip_rcv()
-> NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, net, NULL, skb, dev, NULL, ip_rcv_finish)
3. Netfilter PREROUTING Hook (NF_INET_PRE_ROUTING)
-> For IPv4: net->nf.hooks_ipv4[NF_INET_PRE_ROUTING]
-> nf_hook_slow() iterates hooks in ascending priority
-> If the traversal returns 1, continue to ip_rcv_finish()
4. Conntrack (if enabled, priority -200)
-> nf_conntrack_in() (PREROUTING)
-> Creates/updates connection tracking entry
-> Sets skb->_nfct
5. Mangle table PREROUTING (priority -150)
-> iptable_mangle PRE_ROUTING
6. Destination NAT (if enabled, priority -100)
-> nf_nat_ipv4_pre_routing() (PREROUTING)
-> Performs DNAT translation
7. Remaining hooks in PREROUTING
8. Routing decision
-> ip_rcv_finish()
-> ip_route_input_noref() (slow path: ip_route_input_slow())
-> Determines local delivery or forward
9a. LOCAL IN (for local delivery):
-> ip_local_deliver()
-> NF_INET_LOCAL_IN hook
-> Filter INPUT chain
-> ip_local_deliver_finish()
-> Transport layer (TCP/UDP)
9b. FORWARD (for forwarding):
-> ip_forward()
-> NF_INET_FORWARD hook
-> Filter FORWARD chain
-> ip_forward_finish()
-> ip_output()
-> NF_INET_POST_ROUTING hook
nf_tables Packet Flow
1. Hook registered via nft_register_flowtable_ops()
or nf_register_net_hook()
2. Packet enters nft_do_chain()
3. For each rule in chain:
a. For each expression:
- Fast eval: nft_cmp_fast_eval, nft_bitwise_fast_eval
- Slow eval: expr->ops->eval()
b. If expr returns verdict != NFT_CONTINUE:
- NFT_BREAK: Skip to next rule
- NFT_JUMP: Push position, jump to subchain
- NFT_GOTO: Jump to subchain
- NFT_RETURN: Pop from stack or return
- Verdict (NF_ACCEPT, NF_DROP, etc.): Return
4. If no rule matches:
- Return chain policy (NF_ACCEPT or NF_DROP)
5. Update stats if enabled
Extension Registration Call Chain
1. Module init
-> xt_register_matches() or xt_register_targets()
2. xt_register_match():
mutex_lock(&xt[af].mutex)
list_add(&match->list, &xt[af].match)
mutex_unlock(&xt[af].mutex)
3. At rule insertion time:
-> xt_find_match() or xt_find_target()
-> try_module_get(match->me)
4. Match/target checked and called:
-> match->match() or target->target()
Hook Priority Summary (IPv4)
| Priority | Component |
|---|---|
| INT_MIN | NF_IP_PRI_FIRST |
| -300 | NF_IP_PRI_RAW (raw table) |
| -200 | NF_IP_PRI_CONNTRACK (Conntrack) |
| -150 | NF_IP_PRI_MANGLE (mangle table) |
| -100 | NF_IP_PRI_NAT_DST (DNAT) |
| 0 | NF_IP_PRI_FILTER (filter) |
| 50 | NF_IP_PRI_SECURITY (security table) |
| 100 | NF_IP_PRI_NAT_SRC (SNAT) |
| INT_MAX | NF_IP_PRI_LAST |
Key Files Summary
| File | Purpose |
|---|---|
| /Users/sphinx/github/linux/net/netfilter/core.c | Core hook infrastructure |
| /Users/sphinx/github/linux/net/netfilter/x_tables.c | xtables backend |
| /Users/sphinx/github/linux/net/netfilter/nf_tables_api.c | nf_tables API |
| /Users/sphinx/github/linux/net/netfilter/nf_tables_core.c | nf_tables core (nft_do_chain) |
| /Users/sphinx/github/linux/net/netfilter/nf_tables_trace.c | nf_tables tracing |
| /Users/sphinx/github/linux/net/netfilter/nf_conntrack_netlink.c | Conntrack netlink interface |
| /Users/sphinx/github/linux/net/netfilter/xt_DSCP.c | DSCP match |
| /Users/sphinx/github/linux/net/netfilter/xt_HL.c | Hop Limit match |
| /Users/sphinx/github/linux/net/netfilter/xt_RATEEST.c | Rate estimation match |
| /Users/sphinx/github/linux/net/netfilter/xt_TCPMSS.c | TCP MSS match |
| /Users/sphinx/github/linux/include/linux/netfilter.h | Core netfilter definitions |
| /Users/sphinx/github/linux/include/linux/netfilter/x_tables.h | xtables definitions |
| /Users/sphinx/github/linux/include/net/netfilter/nf_tables.h | nf_tables kernel API |
| /Users/sphinx/github/linux/include/uapi/linux/netfilter/nf_tables.h | nf_tables userspace ABI |
| /Users/sphinx/github/linux/include/linux/skbuff.h | sk_buff definition |