I made a forwarding program that forwards packets via IPIP. Both the forwarding and destination servers are Linux VMs running Ubuntu 18.04 (the VMs are managed on a Proxmox host). On the machine that packets are forwarded to (10.50.0.4), the IPIP tunnel endpoint (10.2.0.5) and the application both sit inside the same network namespace. The application binds to the IPIP tunnel's IP address, and the default route in the namespace points at the IPIP tunnel. What I am trying to accomplish is having the application send packets directly back to the clients, with the source IP spoofed as the forwarding server's IP, instead of sending them back through the IPIP tunnel. I want to do this to reduce load on the forwarding server and to lower overall latency (packets from the application would no longer have to travel back through the forwarding server).
Initially, I tried creating a veth pair and putting the peer inside the network namespace. I then created a bridge in the default namespace and assigned it an IP (10.2.0.1/16). From there, I attached the default-namespace end of the veth pair to the bridge and added an SNAT rule to the POSTROUTING chain of the iptables nat table so that traffic from 10.2.0.0/16 is sourced out as the forwarding server's IP (10.50.0.3). I set the namespace's default route to the veth peer, with the bridge IP (10.2.0.1) as the next hop. While the application was able to send outbound packets through the veth pair, and they were sourced out as the forwarding server's IP address, the application still didn't work properly. I think this is because the application doesn't support binding to multiple interfaces (the IPIP tunnel for receiving and the default-route interface, the veth peer, for sending). Unfortunately, this application is closed-source.
With the above said, I decided to try writing a C program that uses AF_PACKET sockets. The default route in the namespace is set to the IPIP tunnel. However, I still have a veth pair connected to the namespace. The receiving socket captures all packets on the IPIP tunnel (including outgoing packets) and the sending socket binds to the veth peer inside the namespace. When the receiving socket captures a packet, it checks the source IP; if that is the IP address of the IPIP tunnel, the packet is one that is being sent back out through the tunnel. In that case, I change the source IP to the forwarding server's IP and try sending the packet out the veth peer. I also block the original packets going back to the forwarding server via iptables (iptables -A OUTPUT -d <forwarding server IP> -j DROP). Here is the program's code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/ip.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <net/if.h>
#include <linux/if.h>
#include <linux/if_packet.h>
#include <linux/tcp.h>
#include <linux/udp.h>
#include <linux/icmp.h>
#include <net/ethernet.h>
#include <string.h>
#include <error.h>
#include <errno.h>
#include <inttypes.h>
#include <pthread.h>
#include <sys/sysinfo.h>
#include <sys/ioctl.h>
#include <signal.h>
#include <ctype.h>
#define REDIRECT_HEADER
#include "csum.h"
#define MAX_PCKT_LENGTH 65535
#define PACKET_MASK_ANY 0xffffffff
#define PACKET_OUTGOING 4
#define PACKET_RECV_TYPE 18
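static volatile sig_atomic_t cont = 1; // Cleared by the SIGINT handler to stop the main loop.
static unsigned char sMAC[ETH_ALEN];
static unsigned char dMAC[ETH_ALEN];

void signHdl(int tmp)
{
    (void)tmp; // Signal number is unused.
    cont = 0;
}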

// Looks up the default gateway's MAC address in the neighbour table and stores it in dMAC.
void GetGatewayMAC()
{
    char cmd[] = "ip neigh | grep \"$(ip -4 route list 0/0 | cut -d' ' -f3) \" | cut -d' ' -f5 | tr '[a-f]' '[A-F]'";
    FILE *fp = popen(cmd, "r");

    if (fp != NULL)
    {
        char line[18];

        if (fgets(line, sizeof(line), fp) != NULL)
        {
            sscanf(line, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx", &dMAC[0], &dMAC[1], &dMAC[2], &dMAC[3], &dMAC[4], &dMAC[5]);
        }

        pclose(fp);
    }
}
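
// Moves dataLen bytes at the start of arr forward by size bytes to make room for a header.
void shiftChar(char *arr, int size, int dataLen)
{
    // A single memmove handles the overlap and avoids the int16_t loop counter,
    // which overflowed for packets longer than 32767 bytes.
    memmove(arr + size, arr, dataLen);

    // Fill the gap with ASCII '0' placeholder bytes; the caller overwrites them.
    memset(arr, '0', size);
}

// Removes the first size bytes of arr by moving the following dataLen bytes to the front.
void removeChar(char *arr, int size, int dataLen)
{
    memmove(arr, arr + size, dataLen);

    // Pad the freed tail (bytes dataLen .. dataLen + size - 1) with ASCII '0' bytes.
    memset(arr + dataLen, '0', size);
}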
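
int main(int argc, char *argv[])
{
    if (argc < 3)
    {
        // errno is not set here, so print a usage message instead of calling perror().
        fprintf(stderr, "Usage: %s <interface> <tunnel IP> [type]\n", argv[0]);

        exit(1);
    }

    int sockfd, sendsockfd;
    uint8_t type = 1; // 1 = normal interface (includes Ethernet headers). 2 = IPIP tunnel (doesn't include Ethernet headers). Defaults to 1 when omitted.
    struct sockaddr_ll a, b, din;
    socklen_t dinLen = sizeof(din);

    // Zero the addresses so fields that are never assigned do not contain stack garbage.
    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));

    if (argc > 3)
    {
        type = atoi(argv[3]);
    }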

    // Receiving socket address: the interface given on the command line.
    a.sll_family = PF_PACKET;
    a.sll_ifindex = if_nametoindex(argv[1]);
    a.sll_protocol = htons(ETH_P_ALL);
    a.sll_halen = ETH_ALEN;

    // Sending socket address: the veth peer inside the namespace.
    b.sll_family = PF_PACKET;
    b.sll_ifindex = if_nametoindex("veth2");
    b.sll_protocol = htons(ETH_P_IP);
    b.sll_halen = ETH_ALEN;

    sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    sendsockfd = socket(AF_PACKET, SOCK_RAW, IPPROTO_RAW);

    if (sockfd < 0 || sendsockfd < 0)
    {
        perror("socket");

        exit(1);
    }

    int v = 0;
    v = PACKET_MASK_ANY & ~(1 << PACKET_OUTGOING) & ~(1 << PACKET_LOOPBACK);
    setsockopt(sockfd, SOL_PACKET, PACKET_RECV_TYPE, &v, sizeof(v));

    // Fetch the MAC address of veth2 to use as the source MAC.
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strcpy(ifr.ifr_name, "veth2");

    if (ioctl(sendsockfd, SIOCGIFHWADDR, &ifr) != 0)
    {
        perror("ioctl");

        exit(1);
    }

    memcpy(a.sll_addr, ifr.ifr_addr.sa_data, ETH_ALEN);
    memcpy(sMAC, a.sll_addr, ETH_ALEN);

    GetGatewayMAC();

    if (bind(sockfd, (struct sockaddr *)&a, sizeof(a)) < 0)
    {
        perror("bind");

        exit(1);
    }

    if (bind(sendsockfd, (struct sockaddr *)&b, sizeof(b)) < 0)
    {
        perror("bind");

        exit(1);
    }

    signal(SIGINT, signHdl);

    printf("Source MAC => ");

    for (uint8_t i = 0; i < ETH_ALEN; i++)
    {
        printf("%02x", sMAC[i]);

        if (i != 5)
        {
            printf(":");
        }
    }

    printf("\n");

    printf("Destination MAC => ");

    for (uint8_t i = 0; i < ETH_ALEN; i++)
    {
        printf("%02x", dMAC[i]);

        if (i != 5)
        {
            printf(":");
        }
    }

    printf("\n\n");

    while (cont)
    {
        // Leave room for the Ethernet header that is prepended to tunnel packets.
        unsigned char buffer[MAX_PCKT_LENGTH + sizeof(struct ethhdr)];
        uint16_t recv;

        // recvfrom() returns ssize_t; storing it directly in a uint16_t turns -1 into 65535 and hides errors.
        dinLen = sizeof(din);
        ssize_t rcvd = recvfrom(sockfd, buffer, MAX_PCKT_LENGTH, 0, (struct sockaddr *)&din, &dinLen);

        if (rcvd < 1)
        {
            perror("recvfrom");

            continue;
        }

        recv = (uint16_t)rcvd;

        struct ethhdr *ethhdr;
        struct iphdr *iphdr;
        struct udphdr *udphdr;

        if (type == 1)
        {
            ethhdr = (struct ethhdr *) (buffer);
            iphdr = (struct iphdr *) (buffer + sizeof(struct ethhdr));
            udphdr = (struct udphdr *) (buffer + sizeof(struct ethhdr) + (iphdr->ihl * 4));
        }
        else
        {
            iphdr = (struct iphdr *) (buffer);
            udphdr = (struct udphdr *) (buffer + (iphdr->ihl * 4));
        }

        if (type != 1)
        {
            // Make room for an Ethernet header in front of the IP packet.
            shiftChar((char *)buffer, sizeof(struct ethhdr), ntohs(iphdr->tot_len));

            ethhdr = (struct ethhdr *) (buffer);
            iphdr = (struct iphdr *) (buffer + sizeof(struct ethhdr));
            udphdr = (struct udphdr *) (buffer + sizeof(struct ethhdr) + (iphdr->ihl * 4));

            //memcpy(ethhdr->h_source, sMAC, ETH_ALEN);
            //memcpy(ethhdr->h_dest, dMAC, ETH_ALEN);

            // Source MAC: veth2 (inside the namespace).
            ethhdr->h_source[0] = 0x82;
            ethhdr->h_source[1] = 0xB3;
            ethhdr->h_source[2] = 0x6F;
            ethhdr->h_source[3] = 0x24;
            ethhdr->h_source[4] = 0x0E;
            ethhdr->h_source[5] = 0x74;

            // Destination MAC: veth1/bridge (default namespace).
            ethhdr->h_dest[0] = 0x96;
            ethhdr->h_dest[1] = 0xF0;
            ethhdr->h_dest[2] = 0xB6;
            ethhdr->h_dest[3] = 0xDC;
            ethhdr->h_dest[4] = 0xE5;
            ethhdr->h_dest[5] = 0x1A;

            ethhdr->h_proto = htons(ETH_P_IP);

            if (iphdr->saddr == inet_addr(argv[2]) && iphdr->protocol == IPPROTO_UDP)
            {
                // Change source IP
                uint32_t oldAddr = iphdr->saddr;
                iphdr->saddr = inet_addr("10.50.0.3");

                struct in_addr in;
                in.s_addr = iphdr->saddr;
                char inIP[16];
                strcpy(inIP, inet_ntoa(in));

                struct in_addr out;
                out.s_addr = iphdr->daddr;
                char outIP[16];
                strcpy(outIP, inet_ntoa(out));

                // inIP and outIP have to be filled in before they can be printed.
                printf("Sending out %d bytes from %s => %s. %d is version. %d is port.\n", recv, inIP, outIP, iphdr->version, ntohs(udphdr->dest));

                // Recalculate checksums.
                iphdr->check = csum_diff4(oldAddr, iphdr->saddr, iphdr->check);
                udphdr->check = 0;
                udphdr->check = csum_tcpudp_magic(iphdr->saddr, iphdr->daddr, ntohs(udphdr->len), IPPROTO_UDP, csum_partial(udphdr, ntohs(udphdr->len), 0));
                //udphdr->check = csum_diff4(oldAddr, iphdr->saddr, udphdr->check);

                uint16_t sent;

                // write() returns ssize_t, so check for errors before narrowing the value.
                ssize_t wrote = write(sendsockfd, buffer, ntohs(iphdr->tot_len) + sizeof(struct ethhdr));

                if (wrote < 1)
                {
                    perror("write");

                    continue;
                }

                sent = (uint16_t)wrote;

                printf("Sent %d (%zu) back %s => %s.\n\n", sent, ntohs(iphdr->tot_len) + sizeof(struct ethhdr), inIP, outIP);
            }
        }
    }

    close(sockfd);
    close(sendsockfd);

    exit(0);
}
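The csum.h header used by the program above is not shown. As a rough illustration only, the following sketch assumes that a helper such as csum_diff4 performs an RFC 1624 style incremental update when a 32-bit field changes. The names inet_checksum and checksum_replace4 are hypothetical, and all values are handled in host byte order for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian 16-bit
 * words. Returns the checksum in host byte order. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL);

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    if (len & 1)                      /* pad a trailing odd byte with zero */
        sum += (uint32_t)(data[len - 1] << 8);

    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/* RFC 1624, equation 3: HC' = ~(~HC + ~m + m').
 * Updates old_check after a 32-bit field changed from old_val to new_val
 * (for example the IPv4 source address). Host byte order throughout. */
static uint16_t checksum_replace4(uint16_t old_check, uint32_t old_val,
                                  uint32_t new_val)
{
    uint32_t sum = (uint16_t)~old_check;

    sum += (uint16_t)~(old_val >> 16);
    sum += (uint16_t)~(old_val & 0xffff);
    sum += new_val >> 16;
    sum += new_val & 0xffff;

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For a header whose source address changes from 10.2.0.5 to 10.50.0.3, checksum_replace4(old, 0x0a020005, 0x0a320003) should give the same value as recomputing the checksum over the whole modified header with the checksum field zeroed.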
Please keep in mind there is a lot of useless code in the above program since I've been trying to test multiple things. With that said, when capturing packets on the IPIP tunnel using AF_PACKET sockets, it does not include an Ethernet header. I believe this is by design since capturing on any other interface on the system does include Ethernet headers. Here is how I'm executing the program inside the namespace:
root@test03:/home/roy# ip netns exec server01 ./af_packet_ipip ipip01 10.2.0.5 2
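Since a capture on the IPIP device starts at the IP header, a frame written to the veth peer needs a link-layer header added in front of it. The following is a minimal, bounds-checked sketch of that step and not the code used in the program; prepend_eth_header, ETH_HDR_LEN and ETHERTYPE_IPV4 are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN    14      /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define ETHERTYPE_IPV4 0x0800

/* Prepends an Ethernet header to the IP packet stored at the start of buf.
 * ip_len is the packet length and cap the total buffer size.
 * Returns the new frame length, or 0 if the header does not fit. */
static size_t prepend_eth_header(uint8_t *buf, size_t ip_len, size_t cap,
                                 const uint8_t dst[6], const uint8_t src[6])
{
    assert(buf != NULL && dst != NULL && src != NULL);

    if (ip_len > cap || cap - ip_len < ETH_HDR_LEN)
        return 0;

    /* memmove is safe for the overlapping source and destination ranges. */
    memmove(buf + ETH_HDR_LEN, buf, ip_len);

    memcpy(buf, dst, 6);
    memcpy(buf + 6, src, 6);
    buf[12] = ETHERTYPE_IPV4 >> 8;    /* EtherType is big-endian on the wire */
    buf[13] = ETHERTYPE_IPV4 & 0xff;

    return ip_len + ETH_HDR_LEN;
}
```

Checking the capacity first matters because a packet close to the maximum IP length would otherwise be shifted past the end of the receive buffer.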
Here is the interface and routing information for the host machine (default namespace) and for the namespace that the IPIP tunnel and application are running inside:
root@test03:/home/roy# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether ae:21:14:4b:3a:6d brd ff:ff:ff:ff:ff:ff
inet 10.50.0.4/24 brd 10.50.0.255 scope global dynamic ens18
valid_lft 68316sec preferred_lft 68316sec
inet6 fe80::ac21:14ff:fe4b:3a6d/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:49:df:c2:99 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
7: veth1@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master idk state UP group default qlen 1000
link/ether 96:f0:b6:dc:e5:1a brd ff:ff:ff:ff:ff:ff link-netnsid 0
11: idk: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 96:f0:b6:dc:e5:1a brd ff:ff:ff:ff:ff:ff
inet 10.2.0.1/16 scope global idk
valid_lft forever preferred_lft forever
root@test03:/home/roy# ip netns exec server01 ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
5: ipip01@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 peer 10.50.0.3 link-netnsid 0
inet 10.2.0.5/16 scope global ipip01
valid_lft forever preferred_lft forever
6: veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 82:b3:6f:24:0e:74 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.2.0.4/16 scope global veth2
valid_lft forever preferred_lft forever
root@test03:/home/roy# ip route
default via 10.50.0.1 dev ens18 proto dhcp src 10.50.0.4 metric 100
10.2.0.0/16 dev idk proto kernel scope link src 10.2.0.1
10.50.0.0/24 dev ens18 proto kernel scope link src 10.50.0.4
10.50.0.1 dev ens18 proto dhcp scope link src 10.50.0.4 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
root@test03:/home/roy# ip netns exec server01 ip route
default dev ipip01 scope link
10.2.0.0/16 dev veth2 proto kernel scope link src 10.2.0.4
10.2.0.0/16 dev ipip01 proto kernel scope link src 10.2.0.5
The packets are sent back out to the bridge (idk), and here is a tcpdump capture showing this:
root@test03:/home/roy# tcpdump -i idk dst host 10.xxx.xxx.xxx -nne
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on idk, link-type EN10MB (Ethernet), capture size 262144 bytes
17:14:48.529043 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:50.224207 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:52.024256 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 147: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 105
17:14:52.234143 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:52.519119 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:54.529119 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:55.024089 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 147: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 105
17:14:55.234171 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:56.524109 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:14:57.229134 82:b3:6f:24:0e:74 > 96:f0:b6:dc:e5:1a, ethertype IPv4 (0x0800), length 51: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
These packets are being sent back to my computer (10.xxx.xxx.xxx). However, my computer does not receive these packets and I'm not able to connect to the application. Here is a packet capture using the any interface value:
root@test03:/home/roy# tcpdump -i any udp and src host 10.50.0.3 and dst host 10.xxx.xxx.xxx -nne
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
17:18:15.214170 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:15.214205 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:16.519127 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:16.519153 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:17.014107 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 149: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 105
17:18:17.014132 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 149: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 105
17:18:17.224127 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:17.224151 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:18.514150 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:18.514208 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:19.219091 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
17:18:19.219115 In 82:b3:6f:24:0e:74 ethertype IPv4 (0x0800), length 53: 10.50.0.3.27015 > 10.xxx.xxx.xxx.7130: UDP, length 9
I've been trying different source and destination MAC addresses for the Ethernet header. Currently, the source MAC address is set to the MAC address of the veth peer inside the namespace, and the destination MAC address is set to that of the veth/bridge in the default namespace. Those were the source and destination MAC addresses I saw in a packet capture taken while the default route inside the namespace was set to the veth peer with the bridge IP as the next hop. I also tried setting the source and destination MAC addresses to all 0's to see if that did anything, and I tried setting the destination MAC address to the main host gateway's MAC address. None of these worked, however.
I also tried setting POSTROUTING rules to masquerade and SNAT. Here are some I've tried:
Chain POSTROUTING (policy ACCEPT 9 packets, 640 bytes)
 pkts bytes target      prot opt in     out     source               destination
   79  5056 SNAT        all  --  *      *       10.2.0.0/16          0.0.0.0/0            to:10.50.0.3
    0     0 SNAT        all  --  *      idk     0.0.0.0/0            0.0.0.0/0            to:10.50.0.3
    0     0 SNAT        all  --  *      veth1   0.0.0.0/0            0.0.0.0/0            to:10.50.0.3
    0     0 MASQUERADE  all  --  *      *       10.50.0.3            0.0.0.0/0
None of these worked, though. I'm unsure if I'll need some sort of POSTROUTING rule in order for my program to send packets out through the veth pair/bridge (but also spoofed as the forwarding server IP).
I've confirmed the IP/UDP header's checksums are also correct on these packets.
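For reference, this kind of checksum verification can be done offline on a captured packet. The sketch below is only one way to do it, assuming the buffer starts at the IPv4 header; sum_words, fold, ipv4_header_ok and udp_checksum_ok are hypothetical helper names. A correct checksum makes the folded ones' complement sum equal 0xffff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds len bytes to a running sum as big-endian 16-bit words. */
static uint32_t sum_words(const uint8_t *p, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((p[i] << 8) | p[i + 1]);

    if (len & 1)                      /* trailing odd byte is zero-padded */
        sum += (uint32_t)(p[len - 1] << 8);

    return sum;
}

/* Folds the carries of a 32-bit sum into 16 bits. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}

/* Returns 1 if the IPv4 header checksum verifies, 0 otherwise. */
static int ipv4_header_ok(const uint8_t *pkt, size_t len)
{
    size_t ihl;

    assert(pkt != NULL);

    if (len < 20)
        return 0;

    ihl = (size_t)(pkt[0] & 0x0f) * 4;

    if (ihl < 20 || ihl > len)
        return 0;

    return fold(sum_words(pkt, ihl, 0)) == 0xffff;
}

/* Returns 1 if the UDP checksum verifies (a stored value of 0 means the
 * sender did not compute one), 0 otherwise. */
static int udp_checksum_ok(const uint8_t *pkt, size_t len)
{
    size_t ihl, udp_len;
    uint32_t sum;

    assert(pkt != NULL);

    if (len < 28)
        return 0;

    ihl = (size_t)(pkt[0] & 0x0f) * 4;

    if (ihl < 20 || ihl + 8 > len || pkt[9] != 17)   /* 17 = UDP */
        return 0;

    udp_len = (size_t)((pkt[ihl + 4] << 8) | pkt[ihl + 5]);

    if (udp_len < 8 || ihl + udp_len > len)
        return 0;

    if (pkt[ihl + 6] == 0 && pkt[ihl + 7] == 0)
        return 1;

    /* Pseudo-header: source and destination address, protocol, UDP length. */
    sum = sum_words(pkt + 12, 8, 0);
    sum += 17;
    sum += (uint32_t)udp_len;

    /* UDP header (including the stored checksum) and payload. */
    sum = sum_words(pkt + ihl, udp_len, sum);

    return fold(sum) == 0xffff;
}
```

Because the UDP checksum covers the pseudo-header, it depends on the source address, so it has to be checked against the rewritten address (10.50.0.3) rather than the original one.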
Additional notes/questions:
Once I figure out the main issue, I'm unsure what the best way to obtain the correct source and destination MAC addresses for the Ethernet header automatically is. For some reason my function to get the source MAC address from the veth peer doesn't work and I'm used to setting the destination MAC address to the gateway's MAC address (which is all 0's inside a network namespace). Any suggestions are welcomed!
The program above is only for testing, as I just want to see whether my theory will even work. If I can get this working, I want to find a faster solution than AF_PACKET sockets. To my understanding, an AF_PACKET socket receives a copy of each packet from the kernel, which results in extra load. I want to find a way to capture all outgoing packets on the IPIP tunnel and modify the original packet itself before sending it through the veth peer inside the network namespace. If you have any suggestions for this, feel free to let me know! I wanted to start looking into DPDK for this, but I don't think DPDK can attach to an IPIP tunnel, or to an interface that already has an application bound to it. To my understanding, it requires a dedicated NIC.
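On the first note, one possible way to obtain the source MAC address automatically is to read it from sysfs, where Linux exposes each interface's hardware address as /sys/class/net/<interface>/address. This is only a sketch under that assumption; parse_mac and read_iface_mac are hypothetical helper names, and the destination MAC would still have to come from somewhere else, such as the neighbour table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parses "aa:bb:cc:dd:ee:ff" (optionally followed by whitespace) into six
 * bytes. Returns 1 on success and 0 on malformed input. */
static int parse_mac(const char *text, uint8_t mac[6])
{
    unsigned int b[6];
    char extra;
    int i;

    assert(text != NULL && mac != NULL);

    /* The trailing " %c" only matches if non-whitespace garbage follows. */
    if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x %c",
               &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
        return 0;

    for (i = 0; i < 6; i++)
        mac[i] = (uint8_t)b[i];

    return 1;
}

/* Reads the MAC address of a local interface from sysfs (Linux only).
 * Returns 1 on success and 0 if the interface or file is unavailable. */
static int read_iface_mac(const char *ifname, uint8_t mac[6])
{
    char path[128];
    char line[32];
    FILE *fp;
    int ok;

    assert(ifname != NULL && mac != NULL);

    if (snprintf(path, sizeof(path), "/sys/class/net/%s/address", ifname)
        >= (int)sizeof(path))
        return 0;

    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;

    ok = fgets(line, sizeof(line), fp) != NULL && parse_mac(line, mac);
    fclose(fp);

    return ok;
}
```

Run inside the namespace, read_iface_mac("veth2", mac) would be expected to yield 82:b3:6f:24:0e:74 for the setup shown earlier.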
I was wondering if anybody knows what I am missing or doing wrong here. I'd assume I'm either missing an IPTables rule or my source/destination MAC addresses are incorrect on the Ethernet header I'm sending.
If you need any additional information, please let me know!
Any help is highly appreciated and thank you for your time.