Event logs
I always check the event logs first. Remember, I was looking for something about FrontPage, IIS, and publishing a website to the server, and this is what I found:
Event Type: Error
Event Source: Server ActiveSync
Event Category: None
Event ID: 3015
Date: 10/26/2012
Time: 5:08:32 AM
User: N/A
Computer: SBSDUAL833A
Description:
IP-based AUTD failed to initialize because the processing of
notifications could not be setup. Error code [0x80004005]. Verify that
no other applications are currently bound to UDP port [2883], or try
specifying a different port number.
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Event Type: Error
Event Source: Server ActiveSync
Event Category: None
Event ID: 3024
Date: 10/26/2012
Time: 5:08:35 AM
User: N/A
Computer: SBSDUAL833A
Description:
IP-based AUTD failed to initialize. Error code: [0x80004005].
For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.
Both event IDs, 3015 and 3024, were repeating every 5 or 10 minutes, over and over again, all day and all night long.
Server ActiveSync is how Exchange communicates with
mobile devices to tell them email has arrived.
The first event copied above makes WHAT is happening fairly obvious: some other service is sitting on UDP port 2883, which happens to be the magic UDP port that Server ActiveSync wants to grab, so ActiveSync can't get it. That is why my phone wasn't telling me when an email came in. But what could possibly be holding that port?
I'd completely forgotten that I can't really use TCPView on the servers because of how many lines it wants to display and how long the screen takes to update, and I was reminded of this problem when I tried it. It plotted so many lines that it couldn't keep up with the updates. I know this sounds unrelated, but you'll see in just a minute how it is all connected.
So back to the old standby, netstat!
That told me process ID 1692 was sitting on port 2883, and Task Manager showed that process ID 1692 was DNS.EXE!
(The keenly observant among you will notice that dns.exe has a process ID (PID) of 8052, not 1692. That is because I grabbed this screenshot after I'd already stopped and started the DNS Server service. Little did I know I'd write an article about this, so I neglected to capture the screen before the restart. I could have fudged the screenshot, but ... )
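If you'd rather not eyeball netstat's output, here is a minimal Python sketch that scans saved `netstat -ano` output for the PID bound to a given port. It assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State, PID, with no State column on UDP rows). The function name `pid_on_port` and the sample lines are my own illustration, not output captured from this server. On the server itself, `netstat -ano | findstr :2883` is the quick-and-dirty version.

```python
from typing import Optional

def pid_on_port(netstat_text: str, proto: str, port: int) -> Optional[int]:
    """Return the PID bound to proto/port in `netstat -ano` output, or None."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # Data rows start with TCP or UDP; headers and blank lines are skipped.
        if not fields or fields[0].upper() != proto.upper():
            continue
        # The local port follows the last colon, e.g. 0.0.0.0:2883.
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port.isdigit() and int(local_port) == port:
            # The PID is the last column, with or without a State column.
            return int(fields[-1])
    return None

# Illustrative sample only, not real output from my server.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:53             0.0.0.0:0              LISTENING       1692
  UDP    0.0.0.0:2883           *:*                                    1692
  UDP    0.0.0.0:53             *:*                                    1692
"""

print(pid_on_port(SAMPLE, "UDP", 2883))  # 1692
```

From there, `tasklist /FI "PID eq 1692"` maps the PID to an executable name without opening Task Manager.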
Port 2883? Since when does DNS use port 2883? It is supposed to sit on TCP 53 and UDP 53, and maybe chat with other DNS servers to keep things in sync. At least that's what I'd read in the past and seen when I've sniffed traffic.
Curious, I asked netstat what other ports process 1692, my DNS.EXE server service, was listening on, and I was shocked. The list went on for pages and pages before it finally reached the bottom.
Yes, it was all over the place.
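Scrolling through pages of output gets old fast. As a rough sketch, here is a hypothetical Python helper, `ports_for_pid`, that lists every local port a given PID holds in `netstat -ano` output and counts them per protocol. It assumes the PID is the last column on each TCP/UDP row; the sample lines (including PID 1544) are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

def ports_for_pid(netstat_text: str, pid: int) -> List[Tuple[str, int]]:
    """Return sorted (protocol, local port) pairs owned by pid in `netstat -ano` output."""
    found = set()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only TCP/UDP data rows matter; headers and blank lines are skipped.
        if len(fields) < 4 or fields[0].upper() not in ("TCP", "UDP"):
            continue
        # The PID is the last column on both TCP and UDP rows.
        if not fields[-1].isdigit() or int(fields[-1]) != pid:
            continue
        port = fields[1].rsplit(":", 1)[-1]
        if port.isdigit():
            found.add((fields[0].upper(), int(port)))
    return sorted(found)

# Illustrative sample only, not real output from my server.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:53             0.0.0.0:0              LISTENING       1692
  UDP    0.0.0.0:53             *:*                                    1692
  UDP    0.0.0.0:1043           *:*                                    1692
  UDP    0.0.0.0:2883           *:*                                    1692
  UDP    0.0.0.0:4010           *:*                                    1692
  UDP    0.0.0.0:1434           *:*                                    1544
"""

ports = ports_for_pid(SAMPLE, 1692)
print(Counter(proto for proto, _ in ports))  # Counter({'UDP': 4, 'TCP': 1})
```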
I stopped the DNS Server service, let Server ActiveSync start up again (for the curious, on SBS 2003 and Server 2003 it is serviced by w3wp.exe on UDP port 2883), started DNS back up, and all of a sudden my phone dinged again when an email arrived. A quick look in Event Viewer confirmed it:
Yes, the process did initialize itself!
Drilling into the root cause
Sometimes just fixing a problem now is good enough. When there is a fire burning, you put it out NOW! But if you only fix the symptom and never find the root cause, chances are the problem will happen again, just as a fire will start again if you never deal with whatever started it. I've met a lot of "system administrators" whose fix-it tool of choice is to reboot and hope it doesn't happen again.
I'm not one of them! In fact, I have a saying - if
something very unlikely and strange happens once, it is a "one-off" and
you can usually ignore it. Twice, it could be a coincidence. Three
times, it is a pattern. Pattern failures that take a long time to recur are incredibly frustrating, as you can't really say whether something you did fixed the problem until a lot of time has passed without it recurring. They are puzzles, and I like puzzles, and this looked like it was going to be a good one. So searching for the root cause was the next task! I'd seen this enough times to have a nice clue; the questions were why it happened and what to do about it.
Some Google searches for "server 2003 dns listening" pointed me to various articles:
http://technet.microsoft.com/en-us/security/bulletin/ms08-037
http://support.microsoft.com/kb/953230
http://support.microsoft.com/kb/956188/en-us
The third link struck gold in its Detailed Cause section:
"The implementation of the DNS server security update
reserves a set of ports when randomizing queries. This design decision
was made to address performance concerns for DNS servers that handle and
originate a significantly larger number of queries compared to
Windows-based clients. The set of reserved ports by the DNS Server is
referred to from here onward as a "socket pool." The default size of the
socket pool on Windows-based servers is 2,500 sockets."
OK, so DNS.EXE is going to latch onto 2,500 random port numbers between 1025 and 65535... hmmm. And what prevents it from stepping on some other service's port?
Some quick math: (2500 / (65535 - 1024)) * 100 = 3.875% chance, on any given boot, that DNS will step on a port some other service wants, assuming that service wants only one port and that DNS started before it did.
As "Mr. Magical" Marshall Brodien used to say, "It's always easy, once you know the secret!"
Curious, I looked at another 2003 server (not SBS), and it was doing the same thing, except that its random ports started much higher, at around 49,000 or so.
Later, I also found an article discussing which ports need to be reserved on an SBS 2003 server:
http://support.microsoft.com/kb/956189
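I haven't reproduced that article's steps here. As a loose illustration only: assuming reservations are written as "low-high" strings (which is how I understand the Tcpip `ReservedPorts` registry value stores them; verify against the KB before relying on it), a few lines of Python can check whether a port such as 2883 is covered. `parse_ranges`, `is_reserved`, and the example list are hypothetical.

```python
from typing import Iterable, List, Tuple

def parse_ranges(entries: Iterable[str]) -> List[Tuple[int, int]]:
    """Turn strings like '1433-1434' into inclusive (low, high) tuples.
    In this sketch a bare number such as '2883' counts as a one-port range."""
    ranges = []
    for entry in entries:
        low, _, high = entry.strip().partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges

def is_reserved(port: int, entries: Iterable[str]) -> bool:
    """True if port falls inside any of the reserved ranges."""
    return any(low <= port <= high for low, high in parse_ranges(entries))

reservations = ["1433-1434", "2883-2883"]  # hypothetical example list
print(is_reserved(2883, reservations))  # True
print(is_reserved(4000, reservations))  # False
```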
Problem Summary:
By design, DNS grabs a bunch of randomly numbered ports. These can include ports that other services are pre-programmed to use: if DNS starts before one of those services, there is a 3.875% chance that the service will not run correctly because DNS hijacked its port.
What to do about this...
So my real questions:
1) How can I prevent the random port grabbing from stepping on my
server's toes?
2) Can I make TCPView usable again?
Reading more of the articles, I found I can tweak the pool size:
The default size of the socket pool on Windows-based
servers is 2,500 sockets. This size is configurable by modifying the
SocketPoolSize registry entry in the following subkey in the registry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\DNS\Parameters\SocketPoolSize
Lowering that value would make TCPView usable again. The lower I make it, though, the more I sacrifice the added randomness Microsoft implemented to defend against the DNS cache poisoning vulnerability.
The other thing I can do is tweak the ephemeral port
allocation range... and still more reading:
"After you install security update 953230 on Windows
Server 2003 and down-level platforms, the following conditions are true:
* If the value of the MaxUserPort registry entry is set, the ports are
allocated randomly from the [1024, MaxUserPort] range.
* If the value of the MaxUserPort registry entry is not set, the ports
are allocated randomly from the [49152, 65535] range."
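Those two rules boil down to a tiny function. This Python sketch is my own paraphrase of the quoted KB text (the names are hypothetical, and 65534 is just an example MaxUserPort value, not the one on my server). It shows why a fixed low port like 2883 is only at risk when MaxUserPort is set.

```python
from typing import Optional, Tuple

def ephemeral_range(max_user_port: Optional[int]) -> Tuple[int, int]:
    """Inclusive range Server 2003 draws random ports from after update 953230,
    following the two rules quoted from the KB article."""
    if max_user_port is None:
        return (49152, 65535)  # MaxUserPort not set
    return (1024, max_user_port)  # MaxUserPort set

def port_exposed(port: int, max_user_port: Optional[int]) -> bool:
    """True if a fixed service port falls inside that random range."""
    low, high = ephemeral_range(max_user_port)
    return low <= port <= high

print(port_exposed(2883, None))   # False: 2883 is below 49152
print(port_exposed(2883, 65534))  # True with this hypothetical MaxUserPort
```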
The MaxUserPort entry is at:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\MaxUserPort
Server 2008 uses the range 49152-65535 whether or not the update is installed. You can view the range with:
netsh int ipv4 show dynamicport [tcp|udp]
and change it with, for example:
netsh int ipv4 set dynamicport udp start=49152 num=16384
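To script that check, here is a minimal Python sketch that turns the output of `netsh int ipv4 show dynamicport udp` into an inclusive port range. The sample text mirrors the layout I'd expect (a "Start Port" line and a "Number of Ports" line); treat that layout, and the `parse_dynamicport` name, as assumptions and compare against your own server's output.

```python
import re
from typing import Tuple

# Assumed layout of `netsh int ipv4 show dynamicport udp` output.
SAMPLE = """
Protocol udp Dynamic Port Range
---------------------------------
Start Port      : 49152
Number of Ports : 16384
"""

def _number_after(label: str, text: str) -> int:
    """Return the integer printed after 'label :' in netsh output."""
    match = re.search(rf"{re.escape(label)}\s*:\s*(\d+)", text)
    if match is None:
        raise ValueError(f"{label!r} not found in netsh output")
    return int(match.group(1))

def parse_dynamicport(text: str) -> Tuple[int, int]:
    """Return the (first, last) ports of the dynamic range, inclusive."""
    start = _number_after("Start Port", text)
    count = _number_after("Number of Ports", text)
    return start, start + count - 1

print(parse_dynamicport(SAMPLE))  # (49152, 65535)
```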
So I guessed that MaxUserPort was defined on the SBS 2003 server and not on the standalone 2003 server.
I did not have SocketPoolSize defined on either server, so I created the DWORD value and set it to 50. Now TCPView works again!
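For anyone who prefers the command line to regedit, here is a hedged sketch of that change. It builds the `reg add` command plus a DNS Server service restart as argument lists in Python, so they can be reviewed before anything runs. The helper name is hypothetical, and the restart is my assumption about what it takes for DNS to pick up the new value.

```python
from typing import List

DNS_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\DNS\Parameters"

def socket_pool_commands(size: int) -> List[List[str]]:
    """Build command lines that set SocketPoolSize and restart the DNS service."""
    if size < 0:
        raise ValueError("SocketPoolSize cannot be negative")
    return [
        ["reg", "add", DNS_PARAMS, "/v", "SocketPoolSize",
         "/t", "REG_DWORD", "/d", str(size), "/f"],
        ["net", "stop", "DNS"],   # assumed: restart so DNS rereads its parameters
        ["net", "start", "DNS"],
    ]

for cmd in socket_pool_commands(50):
    print(" ".join(cmd))
    # On the server itself you could run: subprocess.run(cmd, check=True)
```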
As for the MaxUserPort parameter...
My SBS 2003 server did have it defined, which explains why DNS would sometimes (roughly a 3.9% chance) step on ActiveSync.
My standalone Server 2003 box did not have it defined, so it was OK.
Why would SBS 2003 define that parameter? To give the system more than 16,000 ephemeral ports? I'm not sure. I found references saying I shouldn't delete the key, but no real reason why. I really respect Susan Bradley, the SBS Diva, and her knowledge, but on this one I'm going to disagree for now: it is "Delete the key." None of my clients run their SBS server with more than 20 people, and if roughly 16,000 ephemeral ports aren't enough for that many users, I'd suspect they've been hacked. So for now, I'm going to delete the key from the SBS server and see what happens.
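Here is a hedged sketch of the commands involved, again built in Python so they can be reviewed before anything runs: export a backup of the Tcpip Parameters key, then delete the MaxUserPort value. The helper name and backup filename are hypothetical. As far as I know, Tcpip parameter changes like this one only take effect after a reboot.

```python
from typing import List

TCPIP_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

def delete_max_user_port_commands(
    backup_file: str = "tcpip-params-backup.reg",  # hypothetical filename
) -> List[List[str]]:
    """Back up the Tcpip Parameters key, then remove the MaxUserPort value."""
    return [
        ["reg", "export", TCPIP_PARAMS, backup_file],
        ["reg", "delete", TCPIP_PARAMS, "/v", "MaxUserPort", "/f"],
    ]

for cmd in delete_max_user_port_commands():
    print(" ".join(cmd))
    # On the server itself you could run: subprocess.run(cmd, check=True)
```

If the change ever needs undoing, `reg import` on the exported file puts the old value back.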
Stay tuned! I might have to change my mind if this
proves fatal!
More to come...
UPDATE 8/2014:
It has now been 22 months since I fixed this. Not once
has the problem recurred. I've seen no bad effects from the changes.
This exact fix has been applied to numerous client computers, none of
which have reported any ill effects. I'd
have to say this problem is now solved!
If you found this helpful, please send me a brief email; one line will more than do. If I see that people need, want, and/or use this kind of information, that will encourage me to keep creating this kind of content. Whereas if I never hear from anyone, then why bother?
I can be reached at:
das (at-sign) dascomputerconsultants (dot) com
Enjoy!
David Soussan
(C) 2012, 2014 DAS Computer Consultants,
LTD. All Rights Reserved.