pfSense: arpresolve: can’t allocate llinfo for X.X.X.X on emX

After some network maintenance, my virtual pfSense firewall started causing major issues, and parts of the network stopped functioning. In the end, it came down to the following error message (arpresolve: can’t allocate llinfo for X.X.X.X on emX). Because this wasn’t the first time… it was time to do a proper blog post about this issue.

In this post, I’m going to describe the issue and the solution to get your network back up and running.

Environment

Here is an overview of my current environment where the problem occurred. I created a basic image of the setup:

Basic Network Overview

Nothing spectacular: just two routers/firewalls connected by an interconnect network and running the BGP routing protocol. On both sides, a set of VLANs is connected to the router with a trunk, a so-called router-on-a-stick topology. I use BGP because of my daily job: it is more or less the preferred routing protocol for VMware NSX.



Problem

The problem started when… I upgraded my physical Cisco firewall with new firmware and IOS. The interconnect network between pfSense and the Cisco stopped working completely. At first, I wasted about 1.5 hours on the physical Cisco firewall instead of the pfSense appliance, because it looked to me as if the IOS update was causing the issues.

I was completely wrong… the Cisco Firewall was running without any issues… but pfSense had developed a new feature…

When I opened the pfSense web interface and went to “System Logs > System > General”, there was a serious error message. The pfSense kernel reported the following issue (arpresolve: can’t allocate llinfo for 192.168.80.253 on em8). Here is a screen capture of the message:

pfSense kernel message: (arpresolve: can’t allocate llinfo for X.X.X.X on emX)

pfSense was unable to learn any new ARP entries on the interconnect network, which prevented the Cisco ASA and the pfSense appliance from forming a BGP relationship.
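
To see which addresses and interfaces are affected, the kernel messages can be filtered. The following is a minimal sketch and was not part of the original troubleshooting: the function name arpresolve_pairs is made up for illustration, and it reads log text from standard input, so where that text comes from (dmesg, a saved copy of the system log) is an assumption about your setup.

```shell
#!/bin/sh
# Sketch: print the unique "IP interface" pairs named in arpresolve errors.
# Log text is read from stdin, so no log file location is assumed.
# "can.*t" matches both the plain (') and the typographic (’) apostrophe.
arpresolve_pairs() {
    sed -n 's/.*arpresolve: can.*t allocate llinfo for \([0-9.]*\) on \([A-Za-z0-9_]*\).*/\1 \2/p' | sort -u
}
```

Piping the kernel messages through the function, for example with dmesg | arpresolve_pairs, should print one line per affected address and interface.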

Even a simple ping between the two was not possible. Here is a basic diagram of the issue. All networks on both sides are functioning, but the two routers are not able to talk to each other and exchange BGP routes.

Basic Network Failure


Solution

In the end, there are two ways to solve the problem:

  • Option 01: Restart the entire pfSense appliance > Problem solved!
  • Option 02: Deactivate interface and activate interface:
    • Connect with PuTTY (or another SSH client) to the pfSense appliance.
    • Activate the shell.
    • Deactivate interface (ifconfig em8 down)
    • Activate interface (ifconfig em8 up)
    • Problem solved!
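
The steps of Option 02 can be wrapped in a small shell function. This is only a sketch: the names run and bounce_iface and the DRY_RUN switch are hypothetical helpers introduced here, and the only real commands involved are the two ifconfig calls from the list above.

```shell
#!/bin/sh
# Sketch: bounce an interface the same way as Option 02.
# With DRY_RUN=1 the commands are printed instead of executed
# (DRY_RUN is a made-up safety switch, not a pfSense feature).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bounce_iface() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: bounce_iface <interface>" >&2
        return 2
    fi
    # Take the interface down; only bring it up again if that succeeded.
    run ifconfig "$iface" down || return 1
    run ifconfig "$iface" up
}
```

Running it first with DRY_RUN=1 (for example, DRY_RUN=1 and then bounce_iface em8) shows what would be executed. Keep in mind that bouncing the interface you are connected through will cut your own SSH session.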

The root cause is not completely clear to me… In total, I have encountered this issue about ten times. This is what I have figured out so far:

  • The problem occurred for the first time after installing the package OpenBGPd on pfSense.
  • The ARP issue is only triggered when BGP neighbour states change, not always but sometimes.
  • The issue only occurs on the interconnect network… all other networks just work.

Article Update July 2019

After some frustration, I finally found a temporary workaround. After restarting the interface, I have a couple of seconds to enter a static ARP entry before the interface stops working again or no longer lets me execute the command:

  • Create two SSH sessions (one for restarting the interface and one for creating the static ARP entry).
    • Session 01:
      • Deactivate interface (ifconfig em8 down)
      • Activate interface (ifconfig em8 up)
    • Session 02:
      • Create static arp entry (arp -s hs-fw01.home-server.local 28:6f:7f:02:45:15)
  • This should fix the problem; in my case, it has been working for about a month now.
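
Because the time window is short, the two sessions could in principle be collapsed into one script that runs the commands back to back. The sketch below is untested against the actual bug and rests on assumptions: bounce_and_pin, run and DRY_RUN are hypothetical names, and whether a single script hits the window as reliably as two SSH sessions is not something I have verified.

```shell
#!/bin/sh
# Sketch: bounce the interface, then immediately add a static ARP entry.
# With DRY_RUN=1 the commands are printed instead of executed
# (DRY_RUN is a made-up safety switch, not a pfSense feature).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bounce_and_pin() {
    iface="$1"   # interface on the interconnect network, e.g. em8
    peer="$2"    # hostname or IP address of the neighbouring router
    mac="$3"     # MAC address of the neighbouring router
    if [ -z "$iface" ] || [ -z "$peer" ] || [ -z "$mac" ]; then
        echo "usage: bounce_and_pin <interface> <peer> <peer-mac>" >&2
        return 2
    fi
    run ifconfig "$iface" down || return 1
    run ifconfig "$iface" up || return 1
    # Pin the peer's MAC address right after the interface comes back up.
    run arp -s "$peer" "$mac"
}
```

With the values from this post, the call would be bounce_and_pin em8 hs-fw01.home-server.local 28:6f:7f:02:45:15. An entry added with arp -s lives in the kernel's ARP table, so it would have to be added again after a reboot.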

Anybody experiencing the same problems? Anybody who has a definitive solution? Please comment below :)!

29 comments

  1. I see this message when the dynamic IP changes for my residential Verizon FIOS service. It takes quite a while to clear this message and actually get a new IP & update my dynamic DNS entries. Sorry, I can’t speak more to your experience here, as this is quite a different setup.

  2. Andrew says:

    I’m currently experiencing this issue, unrelated to BGP.

    Network is essentially, (ISP/WAN) edge router < 192.168.20.1 > 192.168.20.x < 192.168.20.250 > pfsense firewall < 192.168.1.1 > 192.168.1.x & (via VPN) 192.168.100.x

    Periodically I get the same system -> log arpresolve can’t allocate llinfo for 192.168.20.1 on mvnetaX.

    My setup was working without any incident, then I changed the edge router & things started to freak out.

    • Andrew says:

      Small update:

      I was able to resolve this issue by changing my 192.168.20.250 interface from DHCP to static.

      The exact nature of the problem still escapes me though.

      The 192.168.20.250 interface could have its IP set via DHCP (which implies a link/route exists). But then once its IP was set, it would say that it had no access to the 192.168.20.x network. Why the link would be down after DHCP is bizarre. Also, the DHCP server could see that the device was there, but the pfSense interface couldn’t see the DHCP server/router.

      I came across some chatter, which I can’t find again, about the issue potentially boiling down to invalid MTU sizes for certain hardware manufacturers. Given I don’t need DHCP & I could take the static approach, I stopped digging into what was causing this issue. But I thought I’d come back here & report my findings.

        • Hello Andrew,

          Thanks for replying on the blog post, I just added a new section to the blog (Article Update July 2019).
          This update explains how I finally got it working without issues…. but it is still a workaround :(.

          Keep in mind it is still a workaround and not a fix :(.
          In some cases, a static ARP entry is not ideal.

          Best regards,

          Mischa Buijs

  3. ianh says:

    THANKYOU!

    We’ve been having a similar issue for some time.

    Setup is a virtual pfSense with a /29 toward a pair of routers (Huawei NE in this case). eBGP sessions from each router to pfSense would intermittently just not come up, even though ping worked. Happens on multiple of the same setups. Your post inspired me to investigate ARP further and we found the arpresolve log messages. tcpdump showed traffic for an up BGP session would use the correct source and destination MACs. Traffic for a session that was failing would show outbound from pfSense as using the MAC address of the working session. No ARP for the failed session. So the working router would accept the frame, and just send it on to the failed router, which would return traffic directly.

    This clicked – it’s eBGP’s TTL check. Simply adding ‘multihop 2’ to the openbgpd sessions brought everyone up consistently and cleanly. It’s not a great long-term fix, but it will resolve our issues until the underlying ARP thing is identified/fixed.

    As to why openbgpd/pfSense sets the next hop in the routing table for the entire /29 to ‘an up session’, I’m not sure, more research required.

    Rgds,

    I.

  4. kgodric says:

    We just had the issue happen to us. Our ASA rebooted, and the issue started. The ASA is configured correctly, yet we cannot ping it.

    Logs for pfSense show the following:
    kernel arpresolve: can’t allocate llinfo for x.x.x.x on bce2

    When I try to add the static arp I get the following:
    arp: cannot intuit interface index and type for y.y.y.y

    I have rebooted, stopped and restarted the interface, and nothing. No change. I am about to delete the interface and re-add it. My next step if that does not work is to power it off, remove power for a few min, and then plug back in and see if it works.

    In the mean time, does anyone have any idea why the kernel is borking like this? Is there a rollback that I can do to get a previous kernel? Maybe an upgrade to a new kernel? It was working for 3 months without issue before this happened. It is hard to believe that rebooting a router/firewall would cause it to stop working.

  5. Jiri says:

    Sounds familiar to me. On FreeBSD 12.1 P3 with a 10 Gbit network connected from a Huawei S5720 to ix0 or mce0 (Intel X520 or Mellanox ConnectX-4), with pf acting as NAT and firewall, connectivity sometimes breaks. Traffic is in the hundreds of Mbit/s. On Intel, the connection stops completely. On Mellanox, about 80% packet loss appears. In all cases, after an interface down/up, traffic flows without problems. I have opened a PR for Intel (FreeBSD Bugzilla – Bug 243463). For Intel, the messages were “ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1455) -- resetting”, “ix0: link state changed to DOWN”, “ix0: link state changed to UP”, but in some cases nothing appears in syslog and traffic stops. For Mellanox, the error message looks like “arpresolve: can’t allocate llinfo for xx.xxx.xxx.xxx on mce0”.

  6. Becky says:

    I am having the same issue, but the IP address in question is something external. I am not sure if this is some very bizarre bug or an attack I need to try to find.

  7. Khurram says:

    This issue occurred on my home network device today. My setup is very simple. Pfsense connected to ISP provided modem with DHCP on the WAN side. Two Vlans on the LAN, one for home one for guests. I changed my ISP’s modem exactly one month ago but last night I installed arpwatch package and today the issue occurred. My pfsense device was not getting DHCP lease from modem and the log had exact same entries.
    I have uninstalled arpwatch package and hope the issue won’t occur again.

  8. Tak says:

    Same log entries showed up in my home setup after I switched to a different ISP from 100Mbit to 1Gbit.
    My setup is rather simple, fiber media converter > 4 core thinclient with PFsense.
    With my new ISP the PFsense router was only able to get 500Mbit up & down. With the hardware provided by the ISP the WAN connection speed was about 1Gbit.

    After I found your article and the workaround of setting the interface down and up again, I went looking for a way to automatically reinitialize the WAN interface when this error occurs.
    Apparently there is a service called “dpinger” which provides gateway monitoring. When you go to pfSense > System > Routing, you can edit your gateway.
    There is an option for Gateway Action, but I’m not seeing a way to have a custom script run as said Action.
    Also, the logs of this dpinger service show errors like “WAN_DHCP : sendto error: 65” When I have time I’ll investigate this further.
    Because I’ve only 1 WAN gateway and no fail-over to another gateway or HA with CARP, I’ve decided to disable the Gateway Monitoring, disabling the dpinger service.

    Since the service is disabled I have not had an issue with my connection going down. Monitoring is done by ping with another service.
    In addition, strangely, my WAN connection speed went up to 700Mbit down and 800Mbit up! Still no 1Gbit, but it’s enough for now.

    Thanks for writing this article! It led to a different workaround that seems to be working and holding for me.

  9. Adrien Carlyle says:

    I started having issues after upgrading to gigabit cable internet and having to swap my modem.

    I was using an SG-1100, but then moved to a Qotom i5 box, and also a modified Firebox T70 running pfSense.

    I started seeing really weird issues as soon as the modem was swapped: I would be able to ping/trace, but some web pages wouldn’t load while others seemed to be fine. Eventually these ARP messages started hitting the logs. It would be fine after a full reboot of my equipment, then after some time it would become unstable and then cut out entirely.

    I ended up noticing at random, that the web UI would show the interface as “no carrier” as I’d be checking back and forth between the console and the web. In my SSH session, I ran ifconfig multiple times back to back and my interface was switching between active and no carrier. But it was holding, and the IP was rapidly being added and dropped. Ended up forcing the interface to 1000M Full and it seems to have stabilized the system.

  10. Stefan says:

    Had this problem today, for the first time after 2 years. Wasted hours finding out what the problem was.
    Thanks to this blog, I turned off the arpwatch daemon and for now, it works again. Thanks!

  11. Ketil says:

    I do not know if my problem relates to yours, but I had the same log entry several times a second. This all happened after an ISP modem reboot. This modem is in half-bridge mode.
    The funny thing is that my ISP gateway lies outside of my ISP IP subnet (IP 89.10.X.X/8 – GW 217.13.X.X). I do NOT know how this is set up, but anyway…
    On pfSense I have DHCP set up on WAN and I receive all the correct info on the interface (per the status page), but the log entries start ticking in and there is no internet service. To get this going I have to set up routing info in the “shellcmd settings” like this:

    Command | shellcmd type | Description
    route add -net 217.13.x.x/32 -iface ue0 | shellcmd | route add net
    route add default 217.13.x.x | shellcmd | route add DF-GW
    route add -net 217.13.x.x/32 -iface ue0; route add default 217.13.x.x | afterfilterchangeshellcmd | Add GW and route

    Hope this can help somebody

    Regards
    Ketil M

  12. BtheB says:

    I run version: 2.4.5-RELEASE-p1 and I see this problem with dynamic IP from Spectrum.

    1- Do you see this issue in latest version: 2.5.2 and so upgrade won’t help?

    2- Is there any way to add a static ARP entry through the GUI rather than the shell? Because if it is done in the shell it may not stick.

    3- is there any way to put a timeout for router to revert back the last change in case things go wrong? (I am remote to router).

    Thanks,

    • Mischa Buijs says:

      Hello,

      I just stopped using pfSense because of this bug. The issue is still ongoing, and the FreeBSD kernel is blamed for it.
      To resolve my issue I moved away from pfSense to VyOS.

      To answer your questions:
      1: No idea, I have not tested that exact version (I stopped using pfSense at version 2.5.0).
      2: No I have not found a way to do that.
      3: I don’t think so… the only way is a scheduled reboot, maybe.

      Regards,

      Mischa

  13. BtheB says:

    And what are the components of this arp entry?

    arp -s hs-fw01.home-server.local 28:6f:7f:02:45:15

    Error I see is something like this:
    Aug 18 17:30:16 kernel arpresolve: can’t allocate llinfo for 70.115.11.1 on igb0

  14. BtheB says:

    Thanks.

    I have upgraded to LATEST version of pfSense and now I see port change to DOWN instead of those ARP errors. I think it is the same issue but showing with different words in logs.

    Did you ever test by putting a dumb switch between pfSense and ISP modem? I am about to do that and test. Spectrum (ISP) tells me they see no problem on their modem.

    • Mischa Buijs says:

      Hello,

      No never tried that trick! Could be a way to resolve this issue but I am just not sure.
      One thing is for sure it is a very annoying bug that is still there after more than 2 years.

      Regards,

      Mischa

  15. BtheB says:

    So with switch in between ISP “wifi router” and our pfSense the “arp” error went away. I think I know why this is happening specifically in case of Spectrum but may also be same for other providers.

    Spectrum provides a DOCSIS modem that pfSense can connect to it directly OR they can also provide a second CPE they call a “Wifi Router”. So, if pfSense is connected to that “Wifi Router” of theirs and they confirmed it is in bridge mode, pfsense then gets the “arp” errors. But if pfSense is connected directly to modem a different type of error I saw which was “Changed to DOWN” for WAN interface. Now I think these two are different issues.

    We have a static IP, so when I set that on pfSense I get the ARP error. And when I don’t set the static IP on pfSense, the “Wifi Router” gives me a 192.168.1.x IP. So this means that even though they say their “Wifi Router” is in bridge mode, it truly is not. It is some semi-bridge mode and it keeps pushing its 192.168.1.x IP down pfSense’s throat even though pfSense is set to pick up an IP statically. This much I have found. I am waiting now to see whether, with the static IP and ARP errors, I will get disconnected or not. Probably this time I won’t get disconnected, since the 192.168.1.x might be ignored by pfSense.

  16. Mse Sq^2 says:

    I have the same problem with the same error which can be temporarily resolved with the same fix using OPNsense which I think definitely points to an underlying BSD bug or miscommunication between some common element to both the *senses and BSD.

    I’m running OPNsense in a routed, no-NAT, firewall-only mode behind another router with an intervening Cisco layer-3 switch and a Frontier FiOS 1G line. The disconnect seems to be occurring between the firewall and the switch, irrespective of the internet connection, which remains fine at all times.

    I just tried the fix last night after two ugly days fighting with it and eventually relying on a direct connection to the edge router bypassing the interior firewall and switch completely.

    This morning I had to use the drop/add fix again as it lost the connection overnight. I am going to try the static–ARP workaround later today.

    I have an otherwise standard setup and I’m not using BGP. I’m only losing two of four vlans. The others seem unaffected.

    If the solution turns out to be leaving the BSD – wrapper OS environment completely I think I’ll just unplug, move to the woods and become a hermit because getting up to speed on both pfsense and OPNsense took a really long time.

  17. Paulsson says:

    Interestingly, I had the same issue on my OPNsense instance. I had tried the workarounds with ifconfig down etc. before, but sometimes not even this worked. After a little research I discovered this post based on the error messages I found in my error logs. By accident, and for a reason unknown to me, this issue was resolved for me by deactivating the IPv6 DHCP client on the WAN interface, which I don’t recall configuring in the first place. It might have been activated automatically when upgrading, but I’m only speculating here. Hope this helps somebody.
