Why everyone hates Qwest

The date: 31 January 2005
The time: 11:00 Hours
The event: Our T1 starts to act flaky and then dies.
The resolution: 19:00 Hours

It’s not that telco circuits never die; it’s just that you don’t expect them to.  That’s why the telephone industry is famous for having five nines of reliability (99.999% uptime, or about five minutes of downtime per year).  But that doesn’t really tell the whole story.  Why did a simple circuit fault at a CO (central office) take eight hours to resolve?
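For anyone who wants to check the five-nines arithmetic, here is a minimal Python sketch.  The function names are my own, and it assumes an average year of 365.25 days; the eight-hour outage figure comes from the 11:00 to 19:00 timeline above.

```python
# Back-of-the-envelope availability math (illustrative names, not from any library).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumption: average year, 525,960 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def achieved_availability_percent(downtime_minutes: float) -> float:
    """Availability over one year, given total downtime in minutes."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


five_nines_budget = allowed_downtime_minutes(99.999)  # roughly 5.26 minutes
outage_minutes = 8 * 60                               # 11:00 to 19:00

print(f"Five-nines budget: {five_nines_budget:.2f} minutes per year")
print(f"This outage alone: {achieved_availability_percent(outage_minutes):.2f}% for the year")
print(f"Years of budget spent: {outage_minutes / five_nines_budget:.0f}")
```

By this rough measure, a single eight-hour outage uses up about nine decades of a five-nines downtime budget, and that is if nothing else goes wrong for the rest of the year.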

OK, Qwest isn’t the only party involved in this debacle, but they are the ones that dragged it out.  My contract is actually with Sprint for a data circuit that is used to provide TCP/IP services to my office.  When the circuit died, we checked our router to see what was up.  The router was showing an alarm on the DSU/CSU card, so my network administrator and I were pretty confident that it wasn’t a problem with the router itself.  To confirm that, we pinged it from the internal network and it responded in its normal sub-millisecond response time.  However, when we tried to access anything outside of our network on the public TCP/IP network (Internet), there were lots of dropped packets, which pretty much confirmed that the line had died.  Time to call Sprint.

“Hello, thank you, where can we reach you?  We will check the circuit.  May we do invasive testing of your router?”

“You bet.  I’ve got 17 users with their hair on fire, and no email and no web access.  You bet.”

“OK, we’ll call you back as soon as a service technician picks up the ticket.”

Twelve hundred: About an hour later I get a call back that someone has the ticket and is working on my issue.  “We will call you back in an hour with a status update.”

Thirteen hundred: “We have finished testing and have turned the issue over to the physical network group.” To the non-geeks, that means that there isn’t a network programming issue; instead, there is a broken wire, fiber optic cable, or piece of hardware somewhere among the parts that make this whole deal work.  “We will call back in one hour with an update.”

Fourteen hundred: “We have tested our circuits and are referring the issue to local Telco (Qwest) for resolution.” This is sometimes referred to as the last mile, those wires that are owned by the state-regulated monopoly.

This means that Sprint has cleared everything that they are physically or programmatically responsible for.  Whatever is wrong with the circuit is in the hardware or programming that Sprint leases from Qwest and that Qwest is responsible for maintaining.

Sixteen thirty: Having heard nothing, I call Sprint back with the service ticket number.  Their response: “Telco will have a tech on site at 18:00.” I inform the building management office to alert security that Qwest will need access to the demark (the point where the utility’s wires enter the building and become the customer’s problem).

“Dear: I’m going to be late.  Don’t hold dinner.”

Seventeen forty-five: I go to the demark and meet the service technician.  Seems like a nice guy.  He is poking around in the service cabinet where the MUX (the multiplexer that splits the fiber into lots of individual circuits) is located.  “I’ve checked these and they look good from here.  We’re going to try a loopback (short the circuit and see if the signal bounces back to the origin, in this case the CO).” He plugs in the loopback plug and is talking to someone at the CO.  “What do you mean you can’t see it? OK, I’ll wait.”

Tick-tock.  Tick-tock.  Tick-tock.

Eighteen hundred: I go back up to my office to get some things done.  I turn off my machines and decide I’ll hang out a little bit longer.  I go back downstairs.

I ask, “Do these cabinets always show major alarms like this?”

“Yeah.  The minor alarm is probably that the door to the cabinet is open, and the major alarm is probably that only one rectifier (of two in the cabinet) is working to keep the batteries charged.  Or, it could be all these cards that are plugged in and configured but not connected.  I’m not sure.  I’d have to do further testing.  But your circuit doesn’t show an alarm, so it’s probably OK.”

“You mean you don’t maintain the equipment so that you could just open the door, look for a red light, and if nothing is lit close the door and be on your way?”

“Nope.”

Tick-tock.  Tick-tock.  Tick-tock.

Eighteen thirty: I go back upstairs to my office to get my stuff and go home.  When I pass the demark fifteen minutes later, the door is closed and Mr. Qwest is gone.  Well, I guess the problem wasn’t really here.

Twenty-two hundred: After dinner and a walk around Lake Como with a friend, I call Sprint to inquire about the status.

“Qwest closed the ticket around Nineteen hundred.  There was something wrong with a piece of equipment at the CO.  Your circuit should be working.”

Hell.  It’s too late to take another hour and a half out of my day to drive back to the office and restart the router.  I’ll have my network admin bounce it in the morning.

In the morning, everything was golden and we had service.

So what’s the moral?

Well, in my opinion, if the equipment at the CO is in as crappy a condition as it was at the demark, I can understand why it took them so bloody long to diagnose the problem.  In my computer room, I can walk in, look around and see what equipment is having issues, because the manufacturers put these little colored lights on the front of it that turn amber or red when that particular piece of gear is having a problem.  The telephone equipment manufacturers are no exception to this practice, and in fact they go much further.  They put little lights that say minor, major and critical on the front of their gear.

Obviously the telcos have mismanaged their equipment so badly that everything is in major alarm all the time.  There is so much gear in the racks that is misconfigured or disconnected that the little lights have no purpose.  When there really is an issue, they have no clue where to look except to physically retest each part of a particular circuit by hand.  What a shame.  What a waste.  And we get to pay the price.

For their part, Sprint really did a pretty good job: they communicated, and for the most part they were timely.  But Qwest didn’t communicate at all, and took way too long to find what they should have been able to walk up to and diagnose.  That’s why we hate Qwest.