Sampling of Soussan’s “Toolbox”
Throughout Soussan’s engineering career, he’s “touched so
many tools that now it’s at such a critical mass, I can move
pretty seamlessly from one program to the next.” Nonetheless,
here is a small sampling of what is in his toolbox:
- Borland C++ Builder & many other C compilers
- Borland Delphi
- Microsoft Visual Basic (6, 4, & Office VBA, Visual Studio 2005,
  Visual Studio 2010, Visual Studio 2013)
- Windows NT (3.5 through XP & Vista, NT server 4.0, Server 2000,
  Server 2003, Server 2008, SBS Server 2003, Server 2008,
  Server 2012)
- Windows CE.NET
- Solaris 7 & 8, administration & C, C++, Informix ESQL/C,
  Shell programming
- Active Server Pages (ASP), ASP.net
- OS-9, OS-9000, 68xxx and ARM processors
- 68HC11 & other 68xxx Assembly
- Motorola DSP56002
- Intel assembly from 8080 through Pentium architecture
- Local & Wide area networking (physical connections,
  routing, DNS, mail pointers, etc.)
- Wireless (Point to point, point to multipoint, microwave)
- Microsoft Small Business Server 2003
- Microsoft Exchange 2003, Exchange 2007, Exchange 2010
- Microsoft SQL Server 2000, 2005, 2008
- Crystal Reports - written > 1000 different reports
- Network protocol analysis
- Security scanning & network cleaning
- Malware analysis & remediation - but you shouldn't clean
  anymore! See article in Cool Stuff link.
There are some very detailed 'how-to' articles I threw
together in the 'Cool Stuff' link. They guide you through some
very complex problems and cut to the solution for things I've
seen others struggle with. They don't look as nice as these
pages, but they are all meat with no potatoes. Plus they
generate a lot of traffic, so I'll probably focus my spare time
on writing more of those and fewer case studies.
What follows are all high level problems and solutions that
might be of interest.
Case Study: VI. One of the many problems with the World's
Largest Intelligent Transportation System [ITS] (1996-2004)
Background
An Intelligent Transportation System, or ITS as it is referred
to in the industry, is a system to bring travel and traffic data
to a central location, analyze it, and disseminate that
information back out to the motoring public. Usually this is
done with a combination of technologies, such as road sensors,
video cameras, electronic message signs, radio broadcasts (HAR:
Highway Advisory Radio), a telephone automated response system
(HAT: Highway Advisory Telephone), etc. The company contracted
to do this for the state was having lots of difficulties getting
the project working. What started off as a 6 month contract
turned into an 8 year career spanning re-design through
maintenance.
There are many cases revolving around this project.
Challenge (2006)
During an expansion attempt, the new maintenance company was
trying to get video and camera control signals to work over
radio based Ethernet equipment. After many weeks and many
false starts, they got video working but couldn't get the
camera control to work right.
It started off working, then over time a slowly creeping delay
would make camera control worse and worse. Eventually, you would
hit a command like panning the camera right and a good 30
seconds later the camera would start panning right. You would
let go of the joystick and the camera would continue panning
right until 30 seconds had elapsed, then the camera would stop
moving. I was no longer working regularly on the project, but
having just a little bit of experience with the system I got the
call for help.
Problem details
The camera control data stream is a 4800 baud RS422 continuous
stream of characters. When nothing is happening, sync sequences
are continuously sent and the camera visually displays whether
it sees those synchronizing sequences.
Unfortunately, Ethernet and other packet based data
transmission methods aren't designed to send a continuous and
uninterrupted stream of data. They build up characters (bytes)
into larger quantities called packets and send those a big chunk
at a time. This design minimizes the cost of the overhead
associated with sending packets, which is good. It also breaks
up the timing of continuous data into bursts, which is bad if
you need continuous streams of data.
There are many ways around this, some of which can be done
within the firmware of various data conversion devices. When I
arrived on site, they were experimenting with protocol
converters and serial-to-Ethernet converters, which added ~$850
to the cost of each fielded camera, not to mention the added
points of failure and maintenance required.
Soussan’s Solution
I was able to confirm the creeping data delay pretty quickly,
then managed to make things run a little better by changing some
of the firmware and configuration settings inside the
serial-to-Ethernet converters. But I still wasn't happy with the
results as control delays, while significantly better, were
still slowly creeping higher. Plus, I hate throwing hardware at
a problem that really isn't a hardware problem.
Having reverse engineered the camera control protocol years ago,
I already had a PC with serial ports that interpreted those
camera commands. Plus, it was already both on the network and
had a spare serial output which the video CODECs could use for
camera control. That approach had been tried before the external
Ethernet-to-serial boxes and had performed even worse, so the
client company had abandoned it.
I resurrected it.
With about 25 lines of Visual Basic code, I was able to send
both the synchronization sequences and camera control commands
out to the CODECs via the serial port for just the cameras that
were using the radio based Ethernet. Not always sending the
sync sequences eliminated the continuous data that was
buffering up and causing control delays.
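To see why a continuous stream can turn into a slowly creeping
delay, here is a minimal sketch in Python. It is only an
illustration: the 10 bits per character framing, the 479
characters per second drain rate, and the 50 characters per
second trickle are assumed numbers, not measurements from the
site, and the real converters may well have buffered differently.

```python
def backlog_delay(seconds, produced_per_s, drained_per_s):
    """Queueing delay (in seconds) of a FIFO buffer after `seconds`.

    Hypothetical model: each second the source adds `produced_per_s`
    characters and the link removes up to `drained_per_s` characters.
    """
    backlog = 0.0
    for _ in range(seconds):
        backlog = max(0.0, backlog + produced_per_s - drained_per_s)
    # A new command waits behind everything already in the buffer.
    return backlog / drained_per_s


# 4800 baud is about 480 characters per second, assuming 10 bits
# per character (start bit, 8 data bits, stop bit).
STREAM_CHARS_PER_S = 480
# Assumed: the radio link drains just slightly slower than that.
LINK_CHARS_PER_S = 479

# Continuous sync sequences: the buffer never gets a chance to empty.
continuous_delay = backlog_delay(4 * 3600, STREAM_CHARS_PER_S, LINK_CHARS_PER_S)

# Sync sent only occasionally (assumed 50 characters per second).
throttled_delay = backlog_delay(4 * 3600, 50, LINK_CHARS_PER_S)

print(f"continuous: {continuous_delay:.1f} s, throttled: {throttled_delay:.1f} s")
```

With these made-up numbers, a shortfall of a single character per
second is enough to build up about 30 seconds of delay in four
hours, while the throttled stream never queues at all.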
And just to put some icing on the cake, each camera link now
cost $850 less in hardware and needed 4 fewer power supplies for
those boxes, fewer cables & connections, etc. Multiply that by the 20
or so new cameras, and add in the labor to configure it all,
test, and maintain for the next 10-20 years... I'm no
accountant, but that is some pretty good ROI for a few days of
work.
Case Study: V. One of the many problems with the World's
Largest Intelligent Transportation System [ITS] (1996-2004)
Challenge (1998)
The design called for video, data, and camera control to be
available from another location which would be implemented by an
"intertie" between the two sites. The original engineers
designed the intertie as a 19.2 Kb/s serial data link over
copper, fiber and lastly microwave. Creating the software that
would send data over the link was estimated at 6 months and
would involve a lot of custom protocol design work to translate
the database access into a serial stream to send over the link.
Video was on its own microwave subcarrier with the serial data
stream for camera control riding alongside in yet another 19.2
Kb/s link.
Problem details
What was wrong with this design centered around all the custom
work to get the traffic data flowing through a serial link. When
calculated out, the bandwidth provided by the serial link would
allow updates only once every 5 minutes. Worse than that would be
designing and coding how to transfer the data over a serial link.
Soussan’s Solution
Looking over specifications for the existing equipment, another
engineer and I allocated some existing bandwidth from the video
encoder / decoder to provide T1 bandwidth on one of the links.
The microwave had more than enough bandwidth to support that
same signal, so it was modified as well. Lastly, two Ascend
Pipeline routers were acquired to accomplish the Ethernet to T1
and back routing. All the small serial data streams were
coalesced into one big T1.
The entire setup was prototyped during a week of holiday
shutdown, with all the communications equipment between the two
sites simulated by a T1 crossover cable.
Now instead of 19.2 Kb/s of bandwidth between the two sites,
there was 1.5 Mb/s. Plus, there was very standard
TCP/IP connectivity as the two sites were different networks
connected by a private link. This allowed standard Visual Basic
and ODBC connectivity between two databases instead of custom
serial port communications. This cut the development effort down
to 1-2 months once the link was proven. Camera control was sent
over a TCP socket to a different VB application.
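For a rough feel of the difference, the arithmetic can be
sketched in a few lines of Python. The assumptions here are mine:
that a full update just barely fits in the 5 minute window on the
serial link, that protocol overhead can be ignored on both links,
and that the T1 runs at its standard 1.544 Mb/s line rate.

```python
SERIAL_BPS = 19_200          # original intertie design
T1_BPS = 1_544_000           # standard T1 line rate
UPDATE_INTERVAL_S = 5 * 60   # "5 minute updates" on the serial link


def payload_bits(link_bps, seconds):
    """Most data a link can carry in `seconds`, ignoring overhead."""
    return link_bps * seconds


def transfer_seconds(bits, link_bps):
    """Time to move `bits` over a link, ignoring overhead."""
    return bits / link_bps


# Upper bound on the size of one update, inferred from the 5 minute figure.
update_bits = payload_bits(SERIAL_BPS, UPDATE_INTERVAL_S)

t1_seconds = transfer_seconds(update_bits, T1_BPS)
speedup = T1_BPS / SERIAL_BPS

print(f"one update: {update_bits} bits")
print(f"over T1: {t1_seconds:.1f} s, about {speedup:.0f}x faster")
```

Under those assumptions, an update that filled the serial link for
5 minutes would cross the T1 in under 4 seconds.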
"Design for Manufacturing" applies often in the world of
physical products. It is sadly absent from many software and
systems designs. Intimate understanding of multiple technologies
allowed the redesign to simultaneously cut months out of the
schedule and increase bandwidth on the link roughly 80-fold.
The installation was relatively seamless. The blue cable
representing all the T1 stuff between the two sites was replaced
with the equipment at each site, one connection problem was
resolved, and the entire link came up functional the same day.
Case Study: IV. One of the many problems with the World's
Largest Intelligent Transportation System [ITS] (1996-2004)
Challenge (1999)
The communications for expressway segments are broken down into
regions. When first started, the system works fine. Then about
24 hours later communications with the message signs in one
region become intermittent, then stop working completely. About
8 hours after that, most of the sites generating traffic data
are also reporting errors.
Problem details
Detailed analysis of message timing showed an initial round trip
time to the field devices and back of 0.25 seconds. 24 hours
later, that time had increased to 0.4 seconds. It would max out
at just over 0.6 seconds, which exceeded the system's design
limits for communications round trip times. Increasing the time
the system waits for responses was not an option, as there were
too many devices to communicate with every minute if the timeouts
were set too high.
To diagnose, the system was allowed to degrade to its longest
delay possible, then round-trip travel times were measured at
various points in the system. The root cause ended up being the
Cylink (now PCom) 900 MHz Spread Spectrum radio, which was the
device used to get data from the tower to the individual field
sites.
We contacted Cylink, who already knew of the bug in the
product. It was a 'creeping buffer delay' which degraded over
time, and the product, which was still being manufactured but no
longer enhanced, wasn't going to be fixed. Ever.
"Buy our new product, it doesn't have that problem" was their
solution.
There were 180 or so radios in the field. At $2000 each, that
would cost $360,000 to fix just for the parts, not counting the
labor. Or discovering what other strange anomalies the new
product had and how to fix or work around those.
Soussan’s Solution
With more analysis, the problem was isolated to the "master"
radio, which is the radio at the tower that communicates with
all the "slave" radios at the field sites. With the system fully
operational but in the high delay failing state, various tests
were performed to see if the delay could be eliminated from the
communications path.
We found completely resetting the master radio by removing and
reapplying power would eliminate the delay. It would still creep
up over the next 24-36 hours, but the initial delay was reset
back to its 'best case' operating conditions.
A trip to Radio Shack yielded a digital light timer which was
set to "turn off" the lights at 4:00 AM, then turn them back on
at 4:01 AM every day. That timer was attached to the power
circuit of the master Cylink radio. The timer is made by
Intermatic and OEMmed to Radio Shack. The same solution has
solved other problems where a quick reset on a regular schedule
is required.
Now, every 24 hours the delays are reset back to zero, start
creeping back up, and are brought back to zero before the delays
are high enough to cause problems.
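The effect of the daily power cycle can be sketched with a toy
model in Python. The straight-line creep between the 0.25 second
and 0.4 second measurements, the hourly sampling, and the 0.6
second ceiling (the text says "just over 0.6") are my
simplifications; the radio's real behavior was never
characterized that precisely.

```python
BASE_S = 0.25                          # round trip right after power-up
CREEP_PER_HOUR_S = (0.4 - 0.25) / 24   # assumed linear creep
MAX_S = 0.6                            # approximate ceiling of the delay


def round_trip_s(hours_since_power_up):
    """Modeled round trip time for a given radio uptime."""
    return min(MAX_S, BASE_S + CREEP_PER_HOUR_S * hours_since_power_up)


def worst_round_trip_s(total_hours, reset_every_hours=None):
    """Worst hourly round trip seen over `total_hours` of operation."""
    worst = 0.0
    for hour in range(total_hours + 1):
        if reset_every_hours is None:
            uptime = hour
        else:
            # The light timer cuts power briefly, restarting the creep.
            uptime = hour % reset_every_hours
        worst = max(worst, round_trip_s(uptime))
    return worst


no_reset = worst_round_trip_s(72)
daily_reset = worst_round_trip_s(72, reset_every_hours=24)

print(f"no reset: {no_reset:.3f} s, daily reset: {daily_reset:.3f} s")
```

In this model the unattended radio climbs to the ceiling within
three days, while the daily reset keeps the worst case under the
0.4 second mark.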
Total cost of this solution was $19.95. Plus sales tax. Which is
a whole lot cheaper than replacing all the radios for $360K plus
labor!
Case Study: III. Building the World's Largest Intelligent
Transportation System [ITS] (1996-2004)
Background
Same as Case Study VI, as this is the same project.
Challenge (1997)
Company1 couldn't figure out why one data path across ~9 miles of
a fiber ring and 16 cabinets along the expressway wasn't working
reliably. Other engineers assigned to the problem came up with
no explanation. A non-technical CEO of company2, interested in
buying that division of company1, needed to understand the level
of problems present in the project implementation schedule. This
was more a people & communications challenge than a technical
challenge.
Soussan’s Solution
A little diagnosis and research showed that the equipment chosen
was never designed to be connected up into a ring configuration.
Instead, it was designed for a central point and two spokes
outward, with the tips of the spokes never touching each other.
The reason the ring configuration was working at all was there
were some intermittent breaks in the ring, thus making it look
like two spokes of a wheel. When the ring was good, the equipment didn't
work right; when the ring was broken, the equipment worked fine.
By changing out the cabinet equipment for devices designed to be
connected in a ring manner and fixing the intermittent fiber problems
(cleaning, connectors, and some fiber breaks),
that particular communications system came up functional. The
original designers didn't specify the correct equipment for the
design they had created. Nobody else working on the project
could figure this out.
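The ring versus spokes distinction is easy to check on paper. As
a minimal sketch in Python, treat the central point and the 16
cabinets as nodes and test whether the fiber links form a closed
loop. The node numbering and the location of the break are made
up for illustration; only the count of 16 cabinets comes from the
project.

```python
def has_loop(node_count, links):
    """Return True if the links form at least one closed loop.

    Uses a small union-find: a link joining two nodes that are
    already connected closes a loop.
    """
    parent = list(range(node_count))

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True
        parent[root_a] = root_b
    return False


CABINETS = 16
NODES = CABINETS + 1  # node 0 is the central point (assumed layout)

# Full ring: 0 - 1 - 2 - ... - 16 - back to 0.
ring = [(i, (i + 1) % NODES) for i in range(NODES)]

# Same fiber with one break (hypothetically between cabinets 8 and 9):
# this is just two spokes running out from the central point.
broken_ring = [link for link in ring if link != (8, 9)]

print(has_loop(NODES, ring))         # True
print(has_loop(NODES, broken_ring))  # False
```

The intact ring is the only case with a loop, which matches the
symptom: the spoke-style equipment only behaved when a fiber
break removed the loop.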
Company1 was selling the division charged with creating the ITS
system. A potential buyer interviewed everyone involved on the
project; among their concerns were meeting the scheduled
delivery dates, costs associated with the project, and future
warranty liabilities once the system was accepted. With the CEO
conducting the interviews personally, the challenge became
explaining, in terms upper management understands, the depth and
scope of very highly technical issues and how they would impact
the future liabilities about to be purchased.
It was mid-spring and I'd been hired as a consultant--actually
working on the project, not just advising--for 6
months. The potential new CEO asked "So, how is the project coming along?
Are you going to make the scheduled deliverable in December?"
"I've got a paycheck of mine against a paycheck of yours that
says this team is going to miss that date and going to miss it
badly. In fact, if you doubled the staff and
everything else goes perfectly, you'll be lucky to come in at
just a year over schedule."
The CEO looked like I'd just hit him over the head with a
two-by-four. Over the next half hour, we explored my statement
in detail. I went over the problem just discovered and fixed,
why it was a design level problem, and why design problems cost
100 times as much to fix once they are fielded, all
de-technobabbled. Then I projected out the other data paths that
weren't yet working, where "why they weren't working" was still
unknown, and what kinds of skills it takes to debug those kinds
of problems.
Worst of all, the entire project was currently prototyped in the
field, whereas it should have been prototyped in the lab.
Seven years later, over dinner with the CEO, who did buy the
company but avoided buying the significant liabilities of this
project, I asked, "When you interviewed
everyone, I know I shocked you with my view of the schedule.
What did the rest of the engineers have to say about that same
question?"
"They hemmed and hawed, said things were mostly OK, and they'd
probably not make December but would be a couple of months late.
Nobody said what you said or as strongly, but you were right."
Case Study: II. The Best Design Never Implemented (1990)
Challenge
Company needed to update the software in thousands of emission
analyzers in the field. The first updates were handled by field
technicians visiting every unit, opening it up, plugging in a case
with a secured hard drive inside, and waiting 20 minutes for the
software update to complete. This was cost prohibitive as a long
term solution.
Soussan’s Solution
These testers all had internal 9600 bits per second modems
installed for sending data to the various state agencies
interested in the smog test results. During an interview with
Motorola, David was asked "What was the best design you ever had
that wasn't implemented?" and this was his answer:
Back in the 1980s there was a commercial for a hair care product
that went something like "... you'll love it so much you'll tell
two friends. And they'll tell two friends. And they'll tell two
friends. And so on, and so on, and so on." each time showing all
those people in smaller square boxes on the TV.
The idea was to divide up all the phone numbers into local
calling zone groups, make lists, then "We update two testers ...
and they update two testers... and so on, and so on, ..."
Given that the dial and transfer time to update one tester was
about two hours and there was an 8 hour window when the testers
would answer an incoming phone call, I calculated that, best case,
32,768 testers could be updated in 4 nights. After each update,
they would "phone home" to an 800 number to check themselves off
the list. After a week, any failures could be re-seeded, retried,
and visited only if necessary, thus saving thousands of field
tech visits.
The company could have sent $5 to each service station to cover
any phone charges and still come out way ahead.
In the end, the company wanted to maintain all the dialing and
control locally, so the second solution discussed in the case
below was implemented. But still, it is one of the best ideas
that never got implemented. Keep in mind, this was all
pre-public internet.
Case Study: I. Automated Auto Emissions Testers (1989)
Challenge
Company was committed to providing state governments aggregated
data from all fielded auto emission testers in four different
states so states could prove compliance with federal air
quality regulations and receive federal highway funds. The
company had stopped providing the data because the existing PDP-11
based system for reading data tapes did not work properly.
States were about to take the company to court for major
damages per contractual commitments. The internal company IT
department said they could do software & hardware for
one state for $800K in hardware, four software engineers
and six months. Each additional state would cost an additional,
unspecified amount.
Soussan’s Solution
Company VP of engineering called Soussan and asked, “What
can you do?” After a couple of days of analysis and design,
Soussan offered, “I can do that project in three months with
one other software engineer and $50 K in hardware.” Soussan
got the project. End results were on time, under budget and
the output complied 100% with each state’s data requirements,
averting the lawsuits.
In the old system, the cycle time for loading tapes was three
weeks per state per month. The new system that Soussan set
up was 80386 PC based, written in C using a custom time-slice
multi-processor that used multi-port serial hardware to
operate eight tape drives simultaneously and independently.
After the system was designed and tested, the usage was a bit
awkward as the user was constantly going to a tape drive then
to keyboard when loading/unloading a tape. Custom hardware
modifications to the tape drive allowed the PC to sense the
tape’s motor, and software intelligence made the tape loading
process into a race between the operator and eight tape drives.
The new system allowed one person in six hours to completely
read all tapes for one state.
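The original was written in C and its internals aren't described
here, so the following Python sketch only illustrates the general
idea of time slicing on one CPU. It reads "time slice" as a
cooperative round-robin scheduler, which is an assumption, and
the function names and record counts are made up.

```python
def read_tape(drive_id, record_count):
    """Simulated tape reader: yields once per record read."""
    for record in range(record_count):
        yield (drive_id, record)


def round_robin(tasks):
    """Give each task one slice in turn until all are finished."""
    order = []
    active = list(tasks)
    while active:
        still_running = []
        for task in active:
            try:
                order.append(next(task))  # run one slice of this drive
                still_running.append(task)
            except StopIteration:
                pass                      # this tape is done
        active = still_running
    return order


# Eight drives, two records each, serviced in interleaved slices.
drives = [read_tape(drive_id, 2) for drive_id in range(8)]
order = round_robin(drives)

print(order[:8])
```

Every drive gets serviced once before any drive gets a second
turn, so no single tape can starve the other seven.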
On a different project, fielded testers required site visits
by technicians in order to update software. With 10,000+ testers
fielded, this was a considerable expense.
To reduce these expenses, David led the team in the creation
of a distributed processing client server automatic dial and
update system. Each PC in the engineering department would
take a phone number from a central server, attempt to connect
to the remote tester, upload new software, verify that the
upload worked, and report the status back to the server. The
testers automatically woke up every night and accepted incoming
calls during a pre-set time window; the engineering department’s
machines woke up and automatically updated them. After a set
number of failed attempts, field service was dispatched
to the non-updated sites to update them manually and work out why
they could not connect. Ninety-five percent of software updates
no longer required a site visit. Of the ~4000 testers sold
to check stations in one state, 3800 were updated in one week’s
time for the cost of some long-distance charges.
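The control flow described above can be sketched roughly as
follows, in Python rather than whatever the original system used.
The retry limit of 3, the function names, and the phone numbers
are all hypothetical, and the dialing itself is replaced by a
stand-in function.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical; the real limit isn't stated


def run_updates(phone_numbers, try_update, max_attempts=MAX_ATTEMPTS):
    """Work through a central list of numbers, retrying failures.

    `try_update(number)` stands in for dialing the tester, uploading
    the software, and verifying it; it returns True on success.
    """
    pending = deque((number, 0) for number in phone_numbers)
    updated, needs_visit = [], []
    while pending:
        number, attempts = pending.popleft()
        if try_update(number):
            updated.append(number)          # report success to the server
        elif attempts + 1 >= max_attempts:
            needs_visit.append(number)      # hand off to field service
        else:
            pending.append((number, attempts + 1))  # retry later
    return updated, needs_visit


# Stand-in for the modem call: one tester never answers.
unreachable = {"555-0103"}
updated, needs_visit = run_updates(
    ["555-0101", "555-0102", "555-0103"],
    lambda number: number not in unreachable,
)

print(updated, needs_visit)
```

In the real system the work was spread over many engineering PCs
pulling from one server; this single loop only shows the
bookkeeping of who got updated and who needs a visit.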
This was pre-public internet, when the best modem speed was
9600 baud!