Soussan DAS Computer Consultants

Email bounce message? What, why, and how to fix!

A few years ago I sent an email message that immediately returned a bounce message. This article is going to chronicle exactly what happened, how I narrowed down the problem, and how I eventually completely fixed the issue.

A few years ago when I first started writing this article it was an ongoing issue and hadn't yet been tracked down - every couple of weeks it would pop back up again. It has been a while now and I'm 100% confident the problem is dead.

I wrote back then: "But I promise this problem will die! If I can't kill it then I suck as a computer consultant and all my clients should just go find someone else and I should go flip burgers at a local fast food restaurant."

As I love quoting "Klondike Kat always gets his mouse."

Most of this was written before I knew what was happening, and you'll hear that in paragraphs I wrote pre-killing the problem. I've added other info afterwards, but you'll figure that out. Re-writing is on my list of things to do.

I do want to apologize up front - this article will start simple but get pretty geeky. As in "If you own a propeller hat, please go get it and put it on your head now." If you are experiencing this problem and feel what I've described is beyond your abilities, I can assist remotely. Shoot me an email and I'll set up a time to talk.

Problem statement:

The exact bounced email message I received reads like this (sanitized to remove any personal information except for my own):

------
CY1NAM02FT030.mail.protection.outlook.com rejected your message to the following e-mail addresses: 'Bryan... (bryan...@contoso.com)' (bryan...@contoso.com)

CY1NAM02FT030.mail.protection.outlook.com gave this error: Service unavailable, Client host [173.14.62.105] blocked using Spamhaus. To request removal from this list see http://www.spamhaus.org/lookup.lasso (AS16012611) [CY1NAM02FT030.eop-nam02.prod.protection.outlook.com]

Your message wasn't delivered due to a permission or security issue. It may have been rejected by a moderator, the address may only accept e-mail from certain senders, or another restriction may be preventing delivery.
------

The questions are: Why is this happening, what can be done about it, and how can I prevent this from happening in the future?

Why is this happening?

The key to understanding is to read every word of the bounce message. The good news is that this message tells me exactly what happened: my mail server, which when I wrote this was at the IP address 173.14.62.105, was blocked from sending mail because it was included on a blocklist run by the good folks over at spamhaus.org - a blocklist I know well, as I set up client systems to use it all the time. There are many services out there that try to eliminate spam from your inbox. Blocklists do this by keeping lists of mail servers and, by rules set by the blocklist provider, deciding which are the naughty servers and which are the nice servers. There are MANY blocklist providers out there. Some, like Spamcop, rely on users to submit spam to them, and if enough people report spam from a server it gets listed. Some keep lists of known spammers that run their own mail servers to send out spam. Some are proprietary and have some undisclosed method of deciding who is naughty and who is nice.

If I set up your mail server, you are likely using at least 4 blocklists to reduce the amount of spam and malware that gets into your mail server. Apparently Office365 or Microsoft365 or whatever they have renamed their services to since I wrote this article follows the same rules I'd been following for years before AnyOfThose365s even existed!
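For the curious, here is a rough sketch of how a mail server typically asks a DNS-based blocklist about a connecting IP: it reverses the octets, appends the blocklist's zone, and does an ordinary DNS lookup. The zone name `zen.spamhaus.org` is my assumption for illustration (the bounce message only says "Spamhaus"), and the function names are made up for this example.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNS name that gets looked up to check an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone="zen.spamhaus.org"):
    """Return True if the blocklist answers with an address record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True   # an answer (normally somewhere in 127.0.0.0/8) means "listed"
    except socket.gaierror:
        return False  # no such name: not listed, or the lookup itself failed


# Only builds the name; no network traffic happens on this line.
print(dnsbl_query_name("173.14.62.105"))
```

Calling `is_listed("173.14.62.105")` does the real lookup. Treat the result with care: some providers refuse queries that arrive through large public DNS resolvers and may answer with special codes, which this sketch does not try to tell apart from a genuine listing.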

Every blocklist is different, so you have to ask the blocklist provider why you were listed. This particular provider gives you a way to look up your listing via the web.

By going to that lookup URL and punching in my IP address, I was able to see the reason I was blocked:

-----
IP Address 173.14.62.105 is listed in the CBL. It shows signs of being infected with a spam sending trojan, malicious link or some other form of botnet.

This IP is infected (or NATting for a computer that is infected) with the Conficker botnet.

[--stuff deleted--]

Your IP was observed making connections to TCP/IP IP address 38.229.128.116 (a conficker sinkhole) with a destination port 80, source port (for this detection) of 14792 at exactly 2017-05-05 01:24:46 (UTC). All of our detection systems use NTP for time synchronization, so the timestamp should be accurate within one second.
-----

So the answer to what happened: The folks that run the spamhaus blocklist have set up a bunch of systems to detect whether you have any systems running the Conficker worm. These are systems that, if anyone connects to them in a specific way - the way Conficker would connect to them - will put the connecting IP address on their blocklist as infected with Conficker and thus not to be trusted.

And for some as yet unknown reason a computer that is somewhere in my network touched that detector at the date / time listed above, and thus my IP address was listed in their blocklist as infected and not to be trusted.

The problem for me and likely most people connecting to the internet - my firewall uses Network Address Translation (NAT) to allow most of my normal traffic from many different computers to appear at the same public IP address. A discussion of NAT goes way beyond this article - you can read more at the link provided.

From what Spamhaus gave me, I have when it happened and how it happened but to identify who - which specific computer - is to blame I have to turn to my own internal resources. Unfortunately, my firewall logs information but it didn't go back far enough for me to determine which system did it. And in reality it could have been a whole host of different systems - I have 3 Linux based MythTV computers doing my DVR functionality, network based TV tuners that feed information into the MythTV DVR, we recently introduced some IoT (Internet Of Things) type devices - light switches, wall sockets, dimmers, hub, etc. for home automation and control, wireless access points, cell phones, our guests have cell phones and we let them use our WiFi, I often have a client's computer here undergoing surgery, a bunch of servers I keep for both research and production, my own mail server, a network vulnerability scanning virtual machine, an Owncloud virtual machine, a couple of Amazon Echo and Echo Dots, two network printers, a NAS4FREE box with gobs of storage inside, an Amazon Fire TV, couple of iPads, 3 or 4 Kindles, a couple of Rokus, and of course my own personal systems - one desktop + 3 laptops for day-to-day operations. Oh and let us not forget some of the laptops actually boot into multiple operating systems and those VMs have various flavors of Linux, Solaris, Windows, CentOS, even one with Dos 6.22 & Windows for Workgroups 3.11!

In other words, a whole lot of possible things that could have touched that sinkhole and caused my appearance on the blocklist. Okay, probably not the Dos 6.22 or the Solaris boxes ... :)

Lately there have been a lot of bits of malware that have used the various advertising networks to infect systems ... so it is even possible that a stray banner ad somehow reached out to that sinkhole's IP address at port 80 and caused a false positive. In fact, I do wonder if I can trigger a false positive with a one-line telnet command.

Maybe I have a system infected, maybe I don't. So I went through the de-listing request procedure they so nicely outline at their web site and waited to see if the problem ever came back.

Two weeks later, another email bounced:

-----
IP Address 173.14.62.105 is listed in the CBL. It shows signs of being infected with a spam sending trojan, malicious link or some other form of botnet.

This IP is infected (or NATting for a computer that is infected) with the Conficker botnet.

[--stuff deleted--]

Your IP was observed making connections to TCP/IP IP address 38.229.129.61 (a conficker sinkhole) with a destination port 80, source port (for this detection) of 20643 at exactly 2017-05-18 13:33:46 (UTC). All of our detection systems use NTP for time synchronization, so the timestamp should be accurate within one second.
-----

I have a saying ... if something happens once, it is a one-off. Twice? It is a coincidence. Three times? You have a pattern.

This was the 2nd time in two weeks. I got out my Jump to Conclusions mat and didn't wait for a third time before declaring this as a pattern.

What can be done about it?

Umm... find and clean the infected system(s)? Duh! I made the assumption that these are not false positives - that I really did have some system out there that was infected, though nothing gave me an indication that anything was wrong. Every system that has any kind of anti-malware indicated it was clean.

If you read up on Conficker, there are some detection methods. I ran through a bunch of them, but none showed me anything positive, though in all honesty I couldn't run every method on every system. I really needed a way to track which system was causing the listing before I could move forward doing anything other than throwing darts at the wall and hoping to hit the dartboard.

I had an idea - if I grabbed and saved every single packet as it comes in or out of the LAN side of the firewall, I could easily find the date / time of the conficker trigger and track back to which system sent the packet that got my network listed as infected. Except that is a whole lot of data - multi-gigabytes for watching a movie, every packet sent off-site for backup, every email in and out, every image I post up in my Flickr stream, all the Zoom meetings and remote control sessions, ... ugh. That would be a lot of data to sift through ... but it would catch my problem.

More research led me to the doorstep of the netflow based flow analyzers. You can see / read about one here. I even set up a virtual machine to capture these flows in order to track the problem down ... except my firewall only supported sending out netflow packets if I spent another $800 or so on a license to enable it for a limited period of time.

Which led me down a path ... my firewall CAN report all kinds of data, but can I put together a cheap (or free) solution that will gather data in the background and if / when this happens next give me some breadcrumbs that will lead me back to the offending system?

I found a solution that costs $0.00 in hardware and software!

Getting your geek on

This is going to get a bit technical. Actually more than a bit - if you've got a propeller hat put it on now and please keep your hands and feet inside the car at all times.

There has been a facility in the Unix world forever known as Syslog. Syslog servers receive messages from ... well ... anywhere. The senders can be internal processes that want to log a message about something, external devices that have something to say but no display to say it on, or who/whatever else needs to send a message from one point to another. And my Sonicwall firewall supports syslog:

This is not the firewall I have anymore - when I got gigabit fiber I upgraded to a TZ570 and its setting for the syslog server is here:

While I could run a syslog server on one of my Linux boxes, I opted to run it instead using a free syslog server I'd used in the past called Kiwi on one of my Windows servers with gobs of storage - you can find it here. Once installed, it was ready to receive syslog messages from anything on my local network!
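If you would rather not install anything, a bare-bones syslog catcher is not much code. The sketch below is not what Kiwi does internally; it is just a minimal illustration that listens for UDP syslog messages and appends them to one text file per day. The file naming is modeled on the log files shown later in this article, and everything else (function names, output layout) is an assumption for the example.

```python
import re
import socketserver
from datetime import datetime

SEVERITIES = ["Emerg", "Alert", "Crit", "Error", "Warning", "Notice", "Info", "Debug"]
PRI_RE = re.compile(r"^<(\d{1,3})>")


def decode_pri(message):
    """Split a leading <PRI> tag into (facility, severity name, remaining text)."""
    match = PRI_RE.match(message)
    if match is None:
        return None, None, message  # sender did not include a priority tag
    pri = int(match.group(1))       # PRI = facility * 8 + severity
    return pri // 8, SEVERITIES[pri % 8], message[match.end():]


def facility_name(facility):
    """Facilities 16 to 23 are the 'local use' ones; show them as Local0..Local7."""
    if facility is not None and 16 <= facility <= 23:
        return f"Local{facility - 16}"
    return str(facility)


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        data, _sock = self.request  # for UDP servers this is (bytes, socket)
        text = data.decode("utf-8", errors="replace").strip()
        facility, severity, body = decode_pri(text)
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        logname = f"SyslogCatchAll-{stamp[:10]}.txt"  # one file per day
        with open(logname, "a", encoding="utf-8") as fh:
            fh.write(f"{stamp} {facility_name(facility)}.{severity} "
                     f"{self.client_address[0]} {body}\n")


def run_server(host="0.0.0.0", port=514):
    """Listen until interrupted. Port 514/UDP is the traditional syslog port."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Call `run_server()` to start listening. Binding to port 514 usually needs administrator or root rights, and a real syslog server adds things this sketch skips, such as log rotation, filtering, and TCP support.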

(maybe talk a bit more about netflows? Could be a shorter path for most... then again most people's routers don't support netflows)

The Sonicwall lets you pick what kinds of messages are sent to different places - log files, email, etc. I tried a few rules (unsuccessfully) before discovering the magic logging rule I had to enable. Curiously, when I tried to send this to a log file that was emailed to me it never worked, which is why I shifted gears to try doing this with syslog instead. That is probably a way better idea long term, as it turns out I'm logging a ton of data and that isn't what the email system was designed for.

That path to the magic rule to enable is Network-->Network Access-->Network Allowed

There is a rule I learned in one of my EE college classes - understand and test the limits of your test equipment before relying on it. The lesson applies in software and networks just as much as it does in hardware. The event I want to capture only recurs every few weeks, so before relying on this setup I needed to test and validate that it would actually catch the event. If I don't, I risk having to wait for another cycle just to find out whether my test equipment is working or not. To test, I sent a packet I knew I could find easily and looked for it in my log file. I did a telnet to 8.8.8.8 port 53 from one of my MythTV boxes:

Testing firewall logging

and then looked for that session in my syslog file:

Syslog catches traffic?

The first line I ran before I did the test, the second was after I tried to telnet to 8.8.8.8. Apparently I fired off rule 98 when the connection opened and 1235 when the packet was allowed through.
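If telnet isn't installed on the box you are testing from, the same kind of "marker" connection can be made with a few lines of Python. This is only a sketch of the idea, with a function name I made up: it opens one TCP connection, closes it, and reports the local address and port it used so you know what to search for in the log.

```python
import socket


def probe(host, port, timeout=5.0):
    """Open and close one TCP connection; return the local (ip, port) used."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return sock.getsockname()[:2]
```

For example, `probe("8.8.8.8", 53)` makes the same connection as the telnet test above. On a simple network the returned port should show up as the source port in the firewall's LAN-side log entry, though anything that rewrites ports between the test machine and the firewall would change that.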

"Wow, David, that is great... but how many home users do you think have a Sonicwall firewall at home?"

I'm glad you asked!

Most routers - home or office - do have some kind of syslog facility.

A few months ago I got AT&T 1 Gbps fiber installed here at my home / office. It uses an AT&T BGW320-500 router. I'm only using it as a network router in front of the Sonicwall but I looked for a few minutes and found this:

Before this we had Comcast (AKA: Xfinity) and I vaguely remember seeing a syslog in one of their screens. (Note to self: Next time I'm in front of a Comcast router get into the admin interface and see if I can find what screen it is at).

Worst case, download the manual for your router - it should be a PDF file - and search for the word 'syslog'.

Now you don't need some fancy firewall to detect these kinds of problems.

Anyway, back to trying to catch the problem child on the network ...

With this all in place, it is a matter of waiting to see whether I get listed again. To be sure I don't fill up the syslog server, I did a quick check of how much space this log takes up per day, and it revealed a non-trivial amount of data gathered every day.
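That kind of space check is easy to automate. Here is a minimal sketch that assumes the syslog server writes one file per day with names like `SyslogCatchAll-2017-06-01.txt` (the pattern and the function names are assumptions for this example, so adjust them to whatever your server produces).

```python
import shutil
from pathlib import Path


def daily_log_sizes(folder, pattern="SyslogCatchAll-*.txt"):
    """Return {file name: size in MiB} for each daily log file, sorted by name."""
    sizes = {}
    for path in sorted(Path(folder).glob(pattern)):
        sizes[path.name] = path.stat().st_size / (1024 * 1024)
    return sizes


def days_of_room(folder, pattern="SyslogCatchAll-*.txt"):
    """Rough estimate of how many more average-sized days fit on the drive."""
    sizes = daily_log_sizes(folder, pattern)
    if not sizes:
        return None  # nothing logged yet, so there is nothing to average
    average_bytes = sum(sizes.values()) / len(sizes) * 1024 * 1024
    if average_bytes == 0:
        return None
    return shutil.disk_usage(folder).free / average_bytes
```

Run `daily_log_sizes(r"S:\temp")` now and then to see the trend. The estimate from `days_of_room` is deliberately crude: it assumes future days look like the average of the days already on disk.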

Con permiso, Capitan. The hall is rented, the orchestra engaged. It's now time to see if you can dance!

(time passes ... music plays ... insects are born and die ... hurry up and wait!)

And it happened again! I mapped some of the relevant data in the same colors between the message from the blocklist website and my logs so you can easily see the correlation along with the source IP address & MAC address in black bold italic.

It was June 1st 2017 when I received the semi-familiar information when looking up my IP address in the CBL list:

-----
IP Address 173.14.62.105 is listed in the CBL. It shows signs of being infected with a spam sending trojan, malicious link or some other form of botnet.

This IP is infected (or NATting for a computer that is infected) with the Conficker botnet.

[--stuff deleted--]

Your IP was observed making connections to TCP/IP IP address 38.229.146.78 (a conficker sinkhole) with a destination port 80, source port (for this detection) of 13388 at exactly 2017-06-01 12:48:15 (UTC). All of our detection systems use NTP for time synchronization, so the timestamp should be accurate within one second.
-----

Searching my syslog files, just showing the interesting things:

-----
S:\temp>find "38.229.146.78" syslogcatch*

---------- SYSLOGCATCHALL-2017-06-01.TXT
2017-06-01 08:47:42 Local0.Debug 192.168.20.1 id=firewall sn=0017C574D7BC time="2017-06-01 12:47:42 UTC" fw=173.14.62.105 pri=7 c=262144 m=98 msg="Connection Opened" n=5914974 src=192.168.2.145:59086:X0 dst=38.229.146.78:80:X1 proto=tcp/http sent=52

2017-06-01 08:47:42 Local0.Info 192.168.20.1 id=firewall sn=0017C574D7BC time="2017-06-01 12:47:42 UTC" fw=173.14.62.105 pri=6 c=0 m=1235 msg="Packetallowed by Access Rules" app=9 n=81921876 src=192.168.2.145:59086:X0 dst=38.229.146.78:80:X1 srcMac=60:67:20:d8:5a:e8 dstMac=00:17:c5:74:d7:bc proto=tcp/http rule="7 (LAN->WAN)"

2017-06-01 08:47:44 Local0.Debug 192.168.20.1 id=firewall sn=0017C574D7BC time="2017-06-01 12:47:44 UTC" fw=173.14.62.105 pri=7 c=1024 m=537 msg="Connection Closed" app=9 n=109629914 src=192.168.2.145:59086:X0 dst=38.229.146.78:80:X1 srcMac=60:67:20:d8:5a:e8 dstMac=00:22:2d:67:5f:36 proto=tcp/http sent=388 rcvd=489 spkt=5 rpkt=6 cdur=2550
-----

The black highlighted text shows the IP address and the MAC address of the offending device!

Now I have isolated the problem to one specific computer and can drill into it in more detail.
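The `find` command did the job, but if you have to do this more than once it helps to pull out just the fields that matter. Below is a small sketch that parses the key=value pairs in log lines shaped like the ones above and reports who talked to a given remote IP. The field names (`src`, `dst`, `srcMac`, `time`) are taken from my Sonicwall's output; other firewalls will almost certainly name them differently, and the function names are made up for this example.

```python
import re

# key=value pairs, where the value is either "quoted text" or a run of non-spaces
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line):
    """Turn one log line into a dict of its key=value pairs."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def find_talkers(lines, remote_ip):
    """Return time, source IP/port and MAC for every line whose dst is remote_ip."""
    hits = []
    for line in lines:
        fields = parse_fields(line)
        if fields.get("dst", "").split(":")[0] != remote_ip:
            continue
        # src looks like ip:port:interface; pad so short values still unpack
        parts = (fields.get("src", "") + "::").split(":")
        hits.append({
            "time": fields.get("time"),
            "src_ip": parts[0],
            "src_port": parts[1],
            "mac": fields.get("srcMac"),  # None when the line has no srcMac
        })
    return hits
```

Typical use would be something like `find_talkers(open("SyslogCatchAll-2017-06-01.txt", encoding="utf-8", errors="replace"), "38.229.146.78")`. Not every message type carries a MAC address (the "Connection Opened" line above doesn't), so expect some entries with `mac` set to `None`.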

Things I thought I should look into at the time:

(curious note to self - does Sonicwall randomize the outbound IP port on the WAN side vs. the outbound port on the LAN/DMZ side? It sure looks like it does ... need to test this with the sniffer on both sides to confirm... This makes the port number way less useful. Would this show both port numbers if it was collected by a netflow? Things to look at maybe before publishing)

(more notes to self - I could add calendar and distance between infections - spidey sense says exactly every 2 weeks? Try confirming the pattern, might be a scheduled task / cron job on a system... or could be one that wasn't powered on for that time period. Need to check once it is identified....)
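Checking the "every 2 weeks" hunch is simple arithmetic on the three detection timestamps from the CBL lookups quoted above. A quick sketch (the function name is made up for this example):

```python
from datetime import datetime

# Detection times (UTC) copied from the three CBL lookups in this article
LISTINGS_UTC = [
    "2017-05-05 01:24:46",
    "2017-05-18 13:33:46",
    "2017-06-01 12:48:15",
]


def gaps_between(timestamps):
    """Return the time difference between each consecutive pair of timestamps."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


for gap in gaps_between(LISTINGS_UTC):
    print(f"{gap.total_seconds() / 86400:.2f} days")
```

Both gaps come out a little under 14 days and they are not identical, so it looks roughly periodic rather than a job firing at a fixed time of day. Three data points are not much to go on, though.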

Anyway, the good news is now I have something I can look at on the local side!

The device wouldn't echo pings ... which made me suspect one of the IoT devices. To shorten an exciting (!) story: I did a MAC address lookup (this is an Intel device ... so ... that really narrows it down, doesn't it?), traced the MAC address through the switches to find what port it was on, which ended up being a wireless access point, used that specific access point to localize it, and then manually checked a few systems. And yes, it was exciting - I confess I get excited when hunting down strange problems.

It was a client laptop computer that had some not urgent data I was asked to extract.

It was running Microsoft Security Essentials which didn't come up with anything on its own ... when asked to full scan the system, these two popped out:

Was this the malware? I wasn't sure at the time - but I'm not happy that MS Security Essentials (AKA: Windows Defender in its previous life) wouldn't catch it when it was first downloaded. We have to wait as the recurrence seems to happen about every 2 weeks or so. Box was scanned with Trend & another product I don't remember, nothing of interest was found.

(time passes)

Two weeks went by without an event. That system was eventually virtualized into a new Windows 10 system's virtual machine and one day maybe 6 months later when that VM was running my public IP address was again listed.

This VM was set up just to copy old data in the rare instance that the data is somehow not already on our network, since the original system was shut down. Given the cost in time of digging into the dark corners of all the millions of places evil people hide malware, I didn't think it was cost effective to try to hunt this down and kill it. We kept it isolated from network communications and the problem hasn't recurred since those few times in 2017. I have written before about hunting malware and the proof that it is no longer cost effective to try to clean a system, given you can never be 100% sure you've found and killed all the malware bits and the buddies they downloaded at the same time. Pull your data, wipe, and re-install - or never log into your bank on that system again.

Other ways this could have been found

This is by far not the only method for locating such problem points. Another option would be capturing ALL data going in and out of your network with Wireshark. I've probably called out Wireshark in 1/3 of the articles I write, as I'm often called into companies where strange things are happening and there is a tendency to 'Blame The Network', and Wireshark is often the tool I use to discover what box is causing all the problems.

If you are curious, here and here are samples of simple and complex problems solved with Wireshark.

I've often used Wireshark with a round robin data file set captured to a VERY BIG hard drive to capture all data in and out of a network device to locate the source of some intermittent problems that only happen when nobody is looking at them.
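For reference, a round robin capture like that is usually done with `dumpcap`, the command-line capture tool that ships with Wireshark, using its ring buffer options. The sketch below just builds and launches that command from Python; the interface name, file sizes, and function names are placeholders I picked for the example, not settings from any real job.

```python
import subprocess


def ring_capture_command(interface, out_file, file_mb=100, file_count=50):
    """Build a dumpcap command that cycles through a fixed set of capture files."""
    return [
        "dumpcap",
        "-i", interface,                     # capture interface name or number
        "-b", f"filesize:{file_mb * 1000}",  # start a new file at about this many kB
        "-b", f"files:{file_count}",         # keep this many files, overwrite the oldest
        "-w", out_file,                      # base name; dumpcap adds a number and timestamp
    ]


def start_capture(interface, out_file, **limits):
    """Start dumpcap in the background and return the process handle."""
    return subprocess.Popen(ring_capture_command(interface, out_file, **limits))
```

With the defaults that is roughly 5 GB of rolling history. Size `file_mb` and `file_count` to the VERY BIG hard drive you have and to how far back you need to look once the problem shows up.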

How can I prevent this from happening in the future?

At the risk of stating something blindingly obvious, don't get infected with malware. Easier said than done - especially if this ends up being a web page's ad from an ad network giving you a false positive.

You can certainly put in place even more technology to try to detect / block these kinds of items but I've said it before "The bad guys are getting better at being badder faster than the good guys are getting better at being gooder". When it comes to tracing down the source, your only real option is to implement some kind of logging setup at the ingress / egress point of your network. For most of us that should be your firewall and / or router.

As you can see from the example I traced through above, the firewall's long term stored syslog data allowed me to quickly zero in on which IP & MAC address sent out that evil conficker registration. The MAC address is extremely important in networks where DHCP is utilized as a system at one address one day can be at a completely different IP address another day. Without knowing the MAC address, you might spend / waste a whole lot of time and resources looking at a pristine system.
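Once you have a MAC address, the first three bytes (the vendor prefix, or OUI) are a cheap first clue about what kind of device you are chasing. Here is a small sketch; the one-entry vendor table only records what my own lookup told me, and a real tool would consult the full IEEE registry. The function names are made up for this example.

```python
HEX_DIGITS = set("0123456789ABCDEF")

# Tiny illustrative table; the single entry is the vendor found in this article.
KNOWN_VENDORS = {"60:67:20": "Intel"}


def oui_prefix(mac):
    """Return the vendor prefix of a MAC address as 'XX:XX:XX'."""
    digits = "".join(ch for ch in mac.upper() if ch.isalnum())
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in (0, 2, 4))


def is_locally_administered(mac):
    """True when the 'locally administered' bit is set in the first byte."""
    first_byte = int(oui_prefix(mac)[:2], 16)
    return bool(first_byte & 0x02)


mac = "60:67:20:d8:5a:e8"
print(oui_prefix(mac), KNOWN_VENDORS.get(oui_prefix(mac), "unknown vendor"))
```

The second function matters because a locally administered address is not a factory-assigned one, so the prefix says nothing about the vendor. Devices that randomize their MAC addresses fall into that bucket, which is worth knowing before you go hunting for hardware that matches a lookup result.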

Email bounce messages are frustrating and getting on block lists can be a killer to a business that relies on email.

The key is to set this up BEFORE you need it. This kind of forensic evidence is essential. If you need to do this because you have an infection and this setup is beyond your capabilities, I can probably do all this for you remotely. Check the Contact Us link up top.

If you found this helpful or not, please send me a brief email -- one line will more than do. Or more! I love hearing tidbits from users I've helped. Maybe share a line of what you searched for or how you found this article.

I can be reached at:

das (at-sign) dascomputerconsultants (dot) com

Enjoy!
David Soussan