My iPhone can normally go a whole night and still
have plenty of juice the next morning. Thursday morning, 11/29/2012
at 10 AM EST I was in the car driving from Birmingham MI to the
northwest suburbs of Chicago. A client with 300-some-odd systems, a
rack of servers, terminal servers, etc. contacted me on Monday as
they appeared completely infected with some botnet type malware they
couldn't get rid of. I worked with them remotely on Tuesday and was
to be on site ASAP, which happened to be Thursday afternoon.
There is likely another story to be written about that
incident.
While on site, I noticed my iPhone was very low on
power. It was fine when I left and 6 hours later it was at 100% being
kept charged while in the car. But a few hours on site and I'm down to
50% battery and wondering what is going on. Plugged back into the wall,
when we finally quit around 2:00 AM Friday (yes, it was a VERY long
day...) the phone started at 100% charge and while going to my client's
house for the night within an hour was down to 80%. And this was sitting
in my pocket not doing a whole heck of a lot.
I did the usual - close all apps, reboot the phone, turn
off bluetooth and wi-fi and the GPS map and other things known to suck
batteries, and in the morning I was down to practically nothing.
Given the dragons we were battling, the phone issues
would have to wait until I returned home. In the meantime, the AC outlet
or a nearby USB charging port was my phone's best friend.
Botnet dragons slain, I got home late on Saturday night and
Sunday morning started poking into what was up with the iPhone.
And that is when things started to really get
interesting.
The Art Of Debugging
100:
Big systems are ... well... big! They have lots of parts, any one of
which can be the source of any number of problems. When you have a big
system problem, the key is to divide it into smaller subsystems and try
to eliminate some of them with tests. For example:
If you come home, open the front door, and flip the wall switch to
the floor lamp on and it doesn't light up, what could be the problem?
Bad bulb, loose bulb, missing bulb, broken lamp switch, unplugged lamp,
bad wiring, switch turned off at the lamp, no power to the circuit, bad
circuit breaker, no power to the house, no power to the neighborhood,
...
You probably do some mental diagnosis without even thinking about it
- if when you were driving down the street you saw nobody's street
lights were on and the whole neighborhood was unusually dark, you
probably already figured out there is a black-out in the neighborhood.
No need to check the bulb, lamp's plug, switch, breaker panel, etc. as
you know the house doesn't have power.
If the neighborhood had lights but when you walked into the house
your light wouldn't turn on, you'd go for another light in the same or
nearby room. If that lights, you know the house has power - if it
doesn't, you'd suspect something that is common to both lights.
You take a big problem, cut it into smaller problems, group them
together into clusters of symptoms with common causes, then come up with
a test that can rule in or out entire groups of problems. It is the same
concept for the medical field, automotive diagnostics, computer systems,
the Mars rover, and hundreds of other disciplines.
It is all about solving puzzles. And I like solving puzzles!
The Art Of Debugging
100 - plan B:
When something that was working perfectly all of a sudden stops
working perfectly, the first question you should ask is "What changed on
or about the time things stopped working?"
Most of the time, I get the answer "Nothing changed..." but often
times something did that seemed unrelated and so it was ignored, but in
the post mortem analysis the clues were there and the problem really
wanted to be found, but nobody knew there was any connection. I mention
that here just because it is often times a much quicker path, undoing
what changed recently and see if the problem goes away.
In my case, nothing changed with my iPhone short of loading up some
new episodes of the Security Now podcast for the drive.
Diagnosing the iPhone
battery life problem
The first thing I did was take a baseline - I know intuitively the
iPhone battery is draining fast, but if I don't measure it to see how
fast than how will I know if I've fixed it or not?
We call this taking a baseline measurement. Actually, a baseline is
typically "What is it when it is good?" and if I'd had another iPhone 4S
of similar size and age, I could have measured that .. but I knew mine
was draining quickly, so my question was how quickly is quickly? That
way when I try something, I can take another measurement and know with
relative certainty if I've fixed the problem, if it is a little better,
or if it is still laughing at me as I haven't found it yet.
Here was my baseline: From 100%, minimal use, sitting on the counter
in the kitchen and not using it for 2 hours, what is my battery at?
If you look at that screen and realize I started with 100% power 2
hours prior, then I lost 33% in 2 hours ... which means my phone would
be dead in 6-7 hours, which corresponds with what my intuitive sense of
"My phone's juice is leaking all over the floor!" was telling me.
With my previous iPhone 3GS, at one point it had stopped showing its
photos on some of the PCs I use when plugged in. During that time, I
learned besides the warm and cold boots often times something gets
horked up inside and you have to do a full restore to bring it back to
normal operation. So I plugged it in and did a full backup, then a full
restore.
Given the rough 16% drain per hours, I relatively quickly figured out
after the restore was done that my phone was still leaking its juice at
an incredible rate.
Divide and conquer
From my engineering background and general knowledge having worked
with electronics since age 7, I knew that the screen's backlight was
probably the #1 battery drain. #2 would be the radios (cellular,
bluetooth, wi-fi), and #3 would be interacting with the user causing the
CPU to have to think.
There is also a principal that says when doing tests, you should make
one change at a time and take your measurements. If you make 5 changes
at once and the problem goes away, which one or combination solved the
problem? That is a good general rule to follow when you are starting
out.
The screen was dark during most of the first tests, so that shouldn't
be it unless there was a hardware problem inside the unit. Actually,
"Unless there is a hardware problem inside the unit" is pretty much
possibly the cause of anything, so I'm not going to say that anymore -
you can just infer that is always a possibility, and indeed I had a
strange feeling that would be where I'd end up.
So the next test involved verifying I'd shut all apps down, and I
also turned off the wi-fi and bluetooth.
Same problem. Recharge, and the next test turned off all the radios -
I put the phone in airplane mode.
And after an hour sitting, I still had 99% battery!
Radio Transmissions
are very expensive!
Believe it or not, it takes very little energy to receive radio
transmissions. In the old days, we made a
crystal radio
out of a diode, earphone, and a length of wire tossed over a tree.
Transmitting a radio signal is a completely different matter. That
takes juice!
Did my phone have a bad transmitter? Was it stuck transmitting
continuously and thus sucking my battery dry? Or was a receiving antenna
down in my area and my poor phone was upping the transmit power to hit a
tower much further away? Probably not - the problem was with me at two
locations in Illinois as well as back home in Michigan, so for three
towers to be down near my three locations was a low enough probability
event that I could rule it out.
It was able to make a call just fine, so it wasn't likely a stuck
transmitter. But clearly the radio was closer to the root of my problem.
eMail and push
The iPhone can either poll for mail periodically or you can tell your
mail server (if it supports this feature) to maintain a connection to
your phone and push mail to it as soon as it arrives. I've run with push
turned on for years, and yes it does drain the battery quicker than
polling - but not as quickly as I was seeing. So my next stop was to ask
my server what it was doing with my phone, perhaps get a clue where to
look next.
Apple licensed a technology called ActiveSync from Microsoft, and
that single software license is probably why the Blackberry is all but
dead in the marketplace. RIM (Research In Motion) licensed their
software to let your blackberry talk to your Exchange server and get
email long before the iPhone was around, and since blackberry was the
only game in town you paid for it - and paid a lot! If you had one
blackberry, your cost was around $600 just for the base installation.
Every user added more to those costs.
Microsoft's ActiveSync is a web based interface that is pretty easy
to figure out, integrates with Exchange, and runs right from IIS
(Internet Information Server) on the same server as Exchange, much like
Outlook Web Access. Except it throws out just the data and lets the
phone ask for other things. This is a greatly simplified view, but good
enough for the purposes of this discussion.
I'd poked around in the log files before - IIS leaves lots of traces
of what happened, and when investigating systems that were web hacked,
or seeing what juicy bits the log files have, or seeing how ActiveSync
works in order to test and debug it manually, I had more than a passing
familiarly of analyzing the logs.
So my first stop down this path was to see just what was going on
with ActiveSync - was I having a problem there? Here is the directory
along with the log files:
The files are named "exYYMMDD.log" so the log for 2012-11-25 would be
named "ex121125.log"
Do you see any pattern? The size of the files on a daily basis is
5-10 MB. On 11/29, the log is 60 MB! 11/30, 219 MB!
Showing everything inside the files is less interesting than some
simple analysis of the logs - I filtered out just one of the entries
that happens regularly to see how often that request is made from the
phone to the server. That query on a good day looks something like this:
Date
Time
Server IP
Cmd URL (LOTS of stuff deleted off to the right)
From this, you can see the phone is making queries back to the server
5-8 times per hour. More if I'm hitting the check now button, but
consider that normal. Now look at the same log only from 12/2:
In case you don't want to count, that is 19 queries in one MINUTE!
Some more analysis... I counted up the queries on various days both
good and bad:
11/25: 150
11/26: 182
11/27: 265
11/28: 234
11/29: 5671
11/30: 22,289
12/01: 18,802
12/02: 14,916
Curious - it started on the day I was driving to
Illinois! I didn't think I had the problem when I was in Michigan... I'd
like to think that if my battery was draining that quickly I would have
noticed it. I was working from home Monday and Tuesday, and Wednesday I
was at a Microsoft seminar all day downtown - and I thought I was
draining a little quicker as I remember plugging in, but that could be
because I was inside a big building and transmitting out to the towers
takes more power to get out of the building than in open air.
So when on 11/29 did the problem actually start? Going
back to the logs and filtering:
These time stamps are 5 hours ahead of the time here
(GMT vs. EST), so I have to subtract 5 hours to correlate the time to
when something really happened.
So 0:14 (12:14 AM) - 5 hours = 7:14 AM on 11/29/2012.
Hmmm... I happen to have left the house for a meeting
right around 7 AM. What was happening in the logs around when I left
that could have caused this? Now I have to look inside the logs without
filtering on just that one GET command... which I did, but didn't find
any smoking gun. Then again, I wasn't looking for great details.
You are almost a computer crime forensic
investigator at this point. Except instead of looking for what bad
person did the evidence of what / when they did it, you are looking
for what a bad computer did and what / when / why it did it.
Show Me The Solution
Already!
I don't know why the iPhone was doing what it was doing. I don't have
inside access to the configuration files and stuff in that hand held
computer - the iPhone is a powerful computer that
happens to also make phone calls. So unless a tech support person from
Apple calls and gets the backup file from me and does some analysis on
it to find out why it was asking for data constantly, we will probably never know why
the iPhone was so chatty.
Good thing I'm on an unlimited data plan! And caught this early!
I'd be royally pissed off if I also had to pay hard cash for the
data that my phone decided it wanted to suck off the net
continuously.
But the answer was to delete the Exchange account and re-add the same
account, which you can see me doing in the log:
Before the blank lines, you can see the queries running away, then I
delete the account and delete the notifications, and then re-add the
account. This happened around 4:40 PM, and all has been great since.
In fact, here is another screen shot showing my usage, how long on
stand by, and my battery percentage:
This shows the phone has been on standby for longer than before
(2 hours 5 minutes earlier, 2 hours 37 minutes now) and has 95% of
its battery left. Curiously, and something I didn't notice before,
the usage previously shows 2:03 ... even though the phone was
sitting idle. Perhaps that is an indicator that you've got a
run-away Active Sync happening and need to delete and re-add your
email account?
Hopefully I never need to find out again, but if you've come across
this and it is the answer to your problem, please send me an email with
some details!
Epilogue:
If you read all the stuff above this section, you hopefully figured
out that I know my way around systems and can drill into them when they
are misbehaving.
What scares me are all the companies and non-technical people
that now rely completely on technology and expect it to just work.
And when it doesn't just work, where can they go for help?
The web is certainly a wonderful resource, but how many
non-technical people know how to search out solutions to these kinds
of problems? I didn't find anything pointing me where I ended up
(and yes, I did more than a few searches in multiple search engines
and came up dry except for 853 PAGES with 15 posts per page about
the iPhone 4S's battery life and people complaining, throwing
solutions around, etc. You can read it at
https://discussions.apple.com/thread/3391947?start=0&tstart=0)
So what is a normal person to do? Not much. Sadly, right now you
are at the mercy of the rest of the technical community.
But worse than that - some of us in the tech world talk about
"cloud" this and that, saying you should all be moving to the cloud.
Can you imagine trying to discover this problem if all your data was
in some other company's hands? You'd call into their technical
support number, who mostly work off scripts and databases that all
work users through all the user's problems (and most of them are
user problems), but in this case they'd have to running around
changing settings and turning off Push and checking all the
application settings and doing a whole lot of nothing even remotely
related to the root cause of the problem.
I do think that we will be using cloud services in the future,
much like today we don't generate our own electricity.
But today, to put all those eggs into the cloud basket? Not if
you are my client! I want your systems to work better and more
reliably than that.
But the cloud is the subject of another article.
Final Words...
If you found this helpful or not, please send me a brief email -- one
line will more than do. If I saved you a bunch of time (and thus $$),
and you wanted to show appreciation, sending a little love via PayPal or
an Amazon gift card is also very much appreciated!
I can be reached at:
das (at-sign) dascomputerconsultants (dot) com
Enjoy!
David Soussan
(C) 2012 DAS Computer Consultants, LTD. All Rights Reserved.
|