My client has updated this particular project every year since I first created it back in 2005. Every year when new data becomes available, custom-written VBA code cleans out the existing tables and re-populates them with data gathered in the current year. It does this by connecting to a SQL database, reading one table a record at a time, and splitting each record into various tables inside of Access. The details, I think, would make a fascinating application discussion, but that isn't why you are reading this. Suffice it to say, from 2005 through 2008 the "Clean and Import" function ran quickly and without issues.
Until this year (2009).
And now for something completely different
Two weeks ago (4/2009) I got a call that said the clean and import function was hung up and not working. I was in the middle of coding up the 100-some-odd new Crystal Reports for this year, so rather than stop and diagnose, I ran the function from my desk -- which took all of about 20 seconds total -- and sent them a snapshot of this year's data, leaving the diagnosis of what was happening in their world for another day. While doing development, I typically
develop, test, and run the whole application on my desktop. Occasionally
I'll push it to the server and run it over my LAN, but most of the time
I get the best performance running locally so that's how I run it. Last
week, I remotely accessed a problem system and watched it completely
hang up. Or at least that is what it looked like.
Thus began the diagnosis of "what is wrong with my database?" -- or at least that was the client's perspective: that I had a problem with my database hanging up and no longer importing records.
With the function running slowly on the client's workstation, I hit Ctrl+Break to drop into VBA on the line it appeared to be hung on, and found that it wasn't hung at all -- just moving as if its feet were walking in deep mud.
I instrumented the code block with display counters and a timer, and copied the database project to the local workstation. When run locally, the "Clean" portion of the "clean and import" function finished in 7 seconds on the client's desktop. With the database copied up to their server and run from the same desktop, Clean took 260 seconds. So I replicated the test -- same code, same database -- only I copied it up to my server, and when run from my workstation, Clean finished in 14 seconds.
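If you want to instrument your own loops the same way, the timing rig can be as simple as this sketch (Timer is VBA's seconds-since-midnight clock; the display call is whatever you like -- mine writes to a status form, but Debug.Print works fine):

Dim StartTime As Single
Dim DelCount As Long

StartTime = Timer               ' seconds since midnight
' ... the loop being measured goes here, bumping DelCount ...
Debug.Print "Deleted " & DelCount & " records in " & _
    Format(Timer - StartTime, "0.0") & " seconds"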
To put things into perspective, this is a 2x hit in performance in my environment -- which is a 100 Mb/s network, with the database sitting on the other side of a firewall on my DMZ. Oh, and that server is an SBS 2003 server that is also an Exchange 2003 server. The server is a many-year-old Dell PowerEdge 2400 with dual 833 MHz Pentium III processors. That server is also running SQL 2000, is a web server for a whole host of sites including the page you are reading right now, and has a few other functions in life.
In other words, it is some rather long-in-the-tooth hardware with lots of other stuff going on and all data filtered through a firewall, so given that I only see a 2x hit in performance, I think it is doing quite well for itself.
Which isn't justifying the speed hit -- but pointing out that if
anything my client should see better performance than I see as their
hardware is newer and they are directly on the same network segment with
no firewall between their client and server systems. In fact,
they used to see more than acceptable speed in years past. Which should
lead everyone to the exact same question:
When something that used to work fine stops working today, what changed
between then and now?
A lot, actually.
They had two IT folks last year (2008), lost one and got two more. The new
network person has since re-built their main data servers, their
Exchange 2003 server, implemented numerous group policies, built new
images for deployment to many workstations, some of which are now on
Vista, moved their ACT! database off to its own server, pulled their web
site to a stand-alone server on their DMZ, and started a pilot group of
Office 2007 with selected staff members.
I think they still have the same network switches, firewall, router,
network cabling, and power cables for the PCs. Though those are a year
older.
In other words, almost everything has changed. With so much
different, the problem could literally be anywhere.
I love diagnosing these kinds of problems! It is a treasure hunt, and
I learn a lot while doing it.
The slow-running code
Here is one of the loops that runs slowly. In VBA I created a
recordset for the records to be deleted and then:
Do While Not rsClients.EOF
    rsClients.Delete
    rsClients.MoveNext   ' Delete leaves the deleted record current; move off it
    DelCount = DelCount + 1
    If MeasuringZapTime Then StatusUpdate ("Client Delcount = " & DelCount & " ")
Loop
Yes, I know, there are more efficient ways to do that. I could have done this instead:

DoCmd.SetWarnings False
DoCmd.RunSQL "DELETE * FROM tblClients WHERE <fill in my conditions>"
DoCmd.SetWarnings True

But the point of showing this isn't a lesson in VBA; it is to show that the slow loop isn't doing anything incredibly complicated. It is just deleting a bunch of records. One at a time.
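For completeness -- and this is a general DAO note, not what I shipped to the client -- CurrentDb.Execute does the same delete without toggling warnings at all, and its dbFailOnError option turns a silent failure into a trappable error:

CurrentDb.Execute "DELETE * FROM tblClients WHERE <fill in my conditions>", dbFailOnError

Either way, one SQL statement beats walking the recordset a record at a time.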
Searching for the solution
I did what most good IT, networking, developer, and support folks do
when they hit a new problem... I jumped onto the web, searched for quite a while, and came away with a whole collection of possibilities for
this slow behavior. Grouped into piles, there were:
Network:
- Cabling problems
- Noise
- CRC errors
- Retransmissions
- Overloaded network
- Chattering auto-negotiations between switch and NIC
- Collisions
Desktop issues:
- Viruses, malware, spyware
- Application repair / reinstall
- Operating system level problems
- TCP stack tuning & optimization
- Dirty registry needs cleaning
- OS service packs / hotfixes
- Application service packs / hotfixes
- ...
Server issues:
- Viruses, malware, spyware
- CPU loading by other tasks
- Disk contention / thrashing
- Buffer allocation
- Disk queues
- Shadow copy versioning
- Old hardware
- Conflicting new features like DFS
- Service pack level
- AV software
- "You need more RAM"
- "You need to buy new servers"
- ...
Of course, my code can also be at fault:
- Inefficient database access methods
- Memory leaks
- "You should do this with SQL Server and not Microsoft Access"
- "You should not use an .MDB file over a network"
- ...
There are many reasons your database might be slow over a network, and a whole pile of them are covered by all the other web sites and pages you've no doubt found already. It can also be a combination of the items listed above.
My client had none of those problems. What I found was something that impacts many different databases -- this problem isn't limited to Access 97, Access 2000, Access 2003, and Access 2007.
I have the answer! At least for this particular case -- and I have diagnosed and resolved many other serious slow-database cases. If your company is experiencing a slow database problem, I'd love to help out. The real work isn't in applying any one fix; it is in correctly diagnosing the root cause of the slow database issue, and thus making the fix correctly without breaking anything else or making a bunch of unnecessary changes while searching for the problem. The list of possible root causes spans the gamut of desktop, server, hardware, software, drivers, and all the networking components in between, so if you know one or two of those areas but not all of them, you can look for days and never find the real issue. I was able to speed up my client's database operations by non-trivial amounts. One particular operation I timed at 142 seconds in their environment now takes 14 seconds. And remember -- this was the same code that had worked fine and fast for years. That is 10 times faster! Your mileage may vary. I can do this for you, too! And if the problem isn't the one I found, chances are I can isolate the root cause and make things work like they should, or point you in the right direction so performance is what it should be.
A totally different problem with similar symptoms
A different client had a massive data import function, written by another company, that took 5 hours to run. They called me. That import function had been written years ago by a person who knew enough about databases to hack together a solution, but not enough to make something that works as it should. What was really interesting about this one was the gradual slowdown: as the import progressed, each successive block of 400 records took longer to import than the previous 400 records! It got even worse once you crossed roughly 22,000 records. I instrumented the code, ran an import, and gathered data on each block of 400 records; graphed, the measured slowdown on actual imports was unmistakable.
The first 400 records took less than 1 second to import, as did the next 400. Records 10,000 through 10,400 took 5 seconds. Records 30,000 through 30,400 took 20 seconds. Records 45,000 through 45,400 took 45 seconds! Add it all up, and this particular "small" 50K-record dataset I was using to measure performance took 1.5 hours to import. The whole job had 5 of these datasets to import before they could process any of the data. And if something was wrong with the data, they'd have to do it all over again.
In this case, after evaluating the existing import code, I recommended scrapping it completely and rewriting it from scratch. The client was skeptical -- this had been running fine for years, it was just slow -- can't I just make it faster? Trust me...
A week or so later, the client saw the new code I wrote for them run for the first time. She hit the button to start the import, and after a few seconds asked, "Is there any way you can put up a status window so we'll know how much longer it will be until the import is done?" She had just finished that sentence when the window popped up saying the import was done. "That couldn't have worked. It was too fast." But when she checked it, everything was perfect. "Wow -- if it is that fast I guess we don't need a status window!" Now they import 50,000 records in under 30 seconds! To accomplish this, I rewrote the import code completely, using an entirely different method than the original engineer did. As an aside, this particular problem wasn't a network, server, desktop, configuration, or user error. It was poorly written VB code and a failure to understand how to properly interface Excel and Access. This client's ROI on the changes made to his system was about a week of importing; after that, it is all profit. In fact, his exact words were "Wow -- I was going to hire another person, but now I don't have to."
Click here to find out how you can contact me, engage my services, and make your world a whole lot better than it is today. Most of my articles are freebies, and some of them are quite popular. Search "Install Server 2003 Intel desktop board" or "Volume shadow copy broken" in Google -- those articles help out more than 400 people every month.
A slow database has many potential causes, each with its own solution. The first problem described above has a fix that is potentially a lot more dangerous to put out there as a do-it-yourself guide, so I'm not publishing the details. If I did, I know lots of folks would just try it to see if it worked, and maybe not change things back if that wasn't their problem. Then they'd find other issues caused by the side effects of that fix, and before you know it people would be telling me that my fix was really their problem. The first rule is "Do no harm," and spray-and-pray is not how I operate. The second example required re-writing the VB code, also not something for the DIY crowd unless you write software for a living.
Correctly diagnosing the root cause of slowdowns is not a simple DIY operation, and since there are so many possible causes for a slowdown, I can't recommend anyone randomly throw solutions at the problem hoping to find the right one. You can certainly throw darts at the dartboard, and you might hit the target -- or you might make something else a whole lot worse and not realize it until it is too late. Given that the problem could be in so many places, each possible source requires its own tests to confirm or rule it out. I've been doing this kind of thing long enough that with one network sniff I can see a whole lot, and in some cases know exactly where to look next. Trying solutions at random without diagnosing is like taking your car in to be fixed with the problem "Won't start" and getting a bill for $8,000. "Well, first we replaced the engine and it still wouldn't start. Then we replaced the battery and it wouldn't start. When we replaced the battery cable, it started just fine. Sorry your bill was about $7,950 more than it should have been. You can pay the cashier and she'll give you your keys. Thank you, come again!"
If you are really set on trying to figure this out yourself, I
suggest you watch the webcast I did for Microsoft
on debugging with a network sniffer! It is free! The
webcast can be found here. Inside is a real-world slow database access case that had a bunch of groups pointing fingers at each other. A couple of the proposed solutions would have involved upgrading equipment, and yes, the problem would have looked like it had gone away -- but that would only have delayed how long before the problem appeared again. That case, while very interesting, is not any of the problems described above.
Here is a VERY quick setup and test you can do: put
Wireshark on the server side, with just your client and server in the capture -- nothing else (a capture filter such as "host 192.168.1.20", substituting your client's address, keeps the trace to that one conversation). Start your client up and get it right to the edge of where the slow problem will appear. Then start the capture on the server, do whatever on the client system demonstrates the slowness, then stop the Wireshark capture. Look at the data!
Are you seeing a lot of data transferred -- as in "too much for the operation I asked the system to do"? Then look at your queries, index files, etc. for something that's not right. Are you seeing the client send a query, then a long pause before the server answers with results? You've got a problem on the server side. Are you seeing the client send a query, the server answer right away, and then a long pause before the client asks for another blob of data? You are looking at a client-side problem.
There are so many moving parts in this scenario that you have to start by binary searching for the problem -- cut the world into two halves and find out which half contains your problem. Then cut that half in half, and repeat until the large problem is narrowed down to one particular area. If everything I've written sounds like gibberish, then
call me! If you've tried other fixes and your database
is still slow, contact me with some of the details. There is no downside -
if I can't help, there is no charge. I can do all the work remotely, but
if you are in southeast Michigan I can do the work at your site as well.
Hope to hear from you soon!
I'm also experimenting with the whole social media thing. If you liked this article and have a Facebook account, please show a little love and like it.
David Soussan
(C) 2009, 2010, 2014 DAS Computer Consultants, LTD. All Rights Reserved.