
September 15th, 2009

'How I tweeted my way out of spinal surgery'

Posted by Michael Krigsman @ 9:00 am

Categories: CIO issues, CRM, Collective intelligence, End-user impact, Enterprise 2.0, Financial impact, IT issues, Politics, Project failures

Tags: Patient, Hospital, Twitter Inc., Health Care, Surgery, Sarah Cortes, Packer Hospital, Transparency, Healthcare, E-mail

The post describes a failure that is significant in light of the ongoing national debate surrounding health care reform and economics. Beyond health care, the role of social networking makes this failure a valuable case study for the enterprise.

Technology consultant and blogger Sarah Cortes went by ambulance to Robert Packer Hospital, a facility in rural Pennsylvania, after she suffered a serious spinal fracture. The story takes an unusual turn because Cortes says Twitter helped her escape from the clutches of hospital staff who, she claims, tried to intimidate and coerce her into accepting unnecessary spinal surgery.

On her blog, Cortes writes that Packer, “tried numerous maneuvers over 48 hours to hold me there against my will.” She continues [bullet formatting added]:

[The] tactics included:

  • Threats that my insurance would not pay any expenses if I did not accept their treatment. My bill was already in the many thousands of dollars, they informed me.
  • Intimidation that if I did not stop resisting their treatment I could be paralyzed
  • Impeding my communication with Boston doctors by needlessly limiting my phone access. Thank God for Twitter and iphones.

Cortes believes Packer wanted to perform the surgery to help boost its accreditation statistics. From Cortes’ blog:

Read the rest of this entry »

April 14th, 2009

7 (nasty) truths about IT spending

Posted by Michael Krigsman @ 6:55 am

Categories: CIO issues, Financial impact, IT issues, Project strategy, Uncategorized

Tags: Information Technology, IT-spending, Susan Cramm, Projects, Project Failure Rate, Help Desk, Strategy, It Operations, Management, Michael Krigsman

An article in the March issue of Harvard Business Review, written by former CIO Susan Cramm, discusses harsh realities associated with out-of-control IT costs. Although cost containment is integral to reducing failed IT projects, the article suggests a certain Draconian inflexibility that just doesn’t make sense.

The article includes a sidebar called “The Seven Truths,” reflecting Susan’s position that, “companies overspend on IT because they are unwilling to say no to frontline managers.” Here are the seven truths (reformatted from original):

  1. Enhancements often don’t deliver results commensurate with their costs. Establish a fixed budget for IT enhancements for each function or division, in line with the goals they are expected to achieve. Do not extend funds. When they run out, they run out.
  2. Projects are often too big and take too long, partly because unnecessary functionality is built into applications. Require leaders to commit to delivering measurable value for application functions before granting them project approval and before allowing them to maintain funding at each stage. Tie executive compensation to realization of value.
Read the rest of this entry »

August 8th, 2008

IT failures roundup: Emergency services around the world

Posted by Michael Krigsman @ 5:42 am

Categories: Availability and reliability, End-user impact, Government projects, News roundup, Project failures

Tags: emergency service, information technology, computer, productivity, michael krigsman

Today’s roundup focuses on public emergency services. Although these systems should never fail, here are several that did.

The impact of most IT failures is limited to inconvenience, delays, and higher costs. However, computer problems affecting police, fire, and medical response can directly cause loss of life and property.

Virginia 911. A boy almost drowned when the fire department didn’t respond following a faulty Verizon system upgrade. From the Washington Post:

Peter Lucht, a Verizon spokesman, said the disturbances in Prince William “stemmed from an unusual combination of factors” and were “not something that we usually see.” The company has worked closely with the county “to stabilize the system and ensure that [problems] won’t be repeated,” he said.

The county purchased its 911 system from Verizon in 2002, and it was installed in 2003, and fire officials said there were no problems until the May 28 upgrade. There have been no problems since July 12, they said.

Maine 911. Seven “system shutdowns” prevented callers from reaching emergency services. From the Kennebec Journal:

Between April and June, emergency communications systems at the Cumberland County regional communications center, the state dispatch center in Gray and the Penobscot County regional center had problems receiving 911 calls. The most severe problems were at Cumberland County’s system in Windham where a series of seven system shutdowns in April and May left callers getting no answer, in one case for as much as an hour.

In response, FairPoint installed a switch allowing dispatch centers to immediately transfer calls to a backup center when there is a problem, and designated a separate telephone line to alert staff when there is a problem. The equipment was installed in the six dispatching centers in the state that had the type of equipment affected by the malfunctions.

London ambulance services. Failed computers caused London ambulance workers to fall back on pencil and paper call tracking. From the Evening Standard:

Ambulance staff were forced to record emergency calls with pen and paper and find addresses using A to Z after their computerised system crashed early yesterday morning and a computer failure at the Royal Free Hampstead NHS Trust means patients face delays and records have been lost.

The Royal Free introduced the system in June to reduce paperwork but since then it has crashed, leaving those waiting for operations and blood tests badly affected. When the same system was introduced at Bart’s and The London NHS Trust, cancer patients missed critical appointments because their records were lost.

A spokesman for the Royal Free said: “With change on this scale, it is inevitable that it will take time for staff to familiarise themselves fully with all the functions that they need to use but overall we are pleased with the progress being made.

“It is recognised that while staff are getting used to the system a small number of our patients may have to wait longer than expected in clinic.”

Australian ambulance services. System failure forced Queensland, Australia emergency services to record and track calls using whiteboards. From IT Failures blog:

Thorough investigations have been conducted as to the cause of each, with the following outcomes:

  • Two of the outages have been attributed directly to human error. The first was an SQL update which had unexpected results, the second was a server which was inadvertently shut down.
  • The third outage has been traced back to a fault with the replication software ‘Replistor’, which resulted in the primary database server rebooting.

[No humorous picture today, because there's absolutely nothing funny about these failures.]

July 3rd, 2008

Computer 'mystery outages' plague Australian Emergency Services

Posted by Michael Krigsman @ 4:50 am

Categories: CIO issues, Government projects, IT issues, Project failures, Project management

Tags: TriTech Software Systems, Outage, Computer, Manufacturing, Quality, Business Operations, Michael Krigsman

A series of three “mystery outages” has forced the Department of Emergency Services (DES) in Queensland, Australia to suspend rolling out a new, $6 million computer-aided dispatch system.

Responding to the outages, Queensland Emergency Services Minister Neil Roberts commented:

The one just yesterday is unexplained at this stage, however we’re confident we’re going to be able to find out the reason for that.

We’re seeking advice from the United States suppliers and until we get a clear indication of what the actual issue was, I’ve instructed the Department to put on hold the further roll-out of this system….

After an outage last June, a Department press release cut the system some slack:

Late yesterday the system experienced an outage for approximately 90 minutes where it was necessary for operators to resort back to a manual system. The cause of this outage was related to a maintenance issue, not the system.

Emergency response personnel are dissatisfied with the problems, as demonstrated by this quote in AustralianIT:

Fire officers have been highly critical of the new system, which they say has only been “half implemented” by the department making it ineffective.

“The system is meant to locate the closest vehicle to an incident and dispatch that vehicle. But they’re yet to install the automatic vehicle locaters in our trucks so it can do this,” an officer said.
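To make the officer's point concrete, here is a minimal sketch of closest-vehicle selection. It assumes each truck reports a latitude/longitude fix, which is the job of the automatic vehicle locators that had not yet been installed. The `Vehicle` type, the unit names, and the coordinates are hypothetical, and this is not a description of how the Queensland system works. Production dispatch software would more likely rank by road travel time than by straight-line distance.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vehicle:
    unit_id: str
    lat: Optional[float]  # None when the truck has no locator fitted
    lon: Optional[float]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def closest_vehicle(vehicles, incident_lat, incident_lon):
    """Return the located vehicle nearest the incident, or None if none report a position."""
    located = [v for v in vehicles if v.lat is not None and v.lon is not None]
    if not located:
        return None
    return min(
        located,
        key=lambda v: haversine_km(v.lat, v.lon, incident_lat, incident_lon),
    )


# Hypothetical fleet: one truck has no locator, so it can never be chosen.
fleet = [
    Vehicle("pump-1", -27.47, 153.03),
    Vehicle("pump-2", -27.60, 153.10),
    Vehicle("pump-3", None, None),
]
chosen = closest_vehicle(fleet, -27.58, 153.09)
print(chosen.unit_id)
```

A truck without a position fix simply drops out of the candidate list, which is why a dispatch system without locators in the vehicles cannot do the job the officer describes.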

The Department of Emergency Services has investigated all three outages, according to a spokesman:

Thorough investigations have been conducted as to the cause of each, with the following outcomes:

  • Two of the outages have been attributed directly to human error. The first was an SQL update which had unexpected results, the second was a server which was inadvertently shut down.
  • The third outage has been traced back to a fault with the replication software ‘Replistor’, which resulted in the primary database server rebooting.

The United States supplier referred to in the DES media release was in fact Stratus, as the primary database server had rebooted during the third outage, and we had asked Stratus to investigate the logs and provide a report.

Based on these facts, it appears the DES should evaluate whatever procedures are required to ensure that similar human errors do not cause additional failures. At the same time, the Department is to be applauded for its openness in presenting facts to the public.
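One procedure of this kind is to run manual data fixes inside a transaction and commit only if the number of affected rows matches what the operator expected. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The `units` table, its columns, and the `guarded_update` helper are hypothetical, and nothing is known here about the actual DES database or the update that went wrong.

```python
import sqlite3


def guarded_update(conn, sql, params, expected_rows):
    """Apply an UPDATE, but roll back unless exactly expected_rows rows changed."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()
        return False
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE units (id INTEGER PRIMARY KEY, station TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO units (station, status) VALUES (?, ?)",
    [("north", "ready"), ("north", "ready"), ("south", "ready")],
)
conn.commit()

# The operator means to change one unit but forgets to narrow the WHERE clause.
ok = guarded_update(
    conn,
    "UPDATE units SET status = ? WHERE station = ?",
    ("offline", "north"),
    expected_rows=1,
)
offline = conn.execute(
    "SELECT COUNT(*) FROM units WHERE status = 'offline'"
).fetchone()[0]
print(ok, offline)
```

Because the statement touches two rows where one was expected, the change is rolled back and no unit is marked offline.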

April 8th, 2008

"10 secrets of bad CIOs"

Posted by Michael Krigsman @ 6:48 pm

Categories: CIO issues, IT issues

Tags: CIO, John Halamka, Blogging, Telecommunications, Internet, Michael Krigsman

John Halamka is CIO of the CareGroup Healthcare System, a practicing emergency room physician, and proprietor of the Geekdoctor blog; the dude knows his stuff. I therefore read The 10 secrets of bad CIOs, his ComputerWorld opinion piece, with real interest.

Here are John’s ten secrets of bad CIOs:

  1. Start each meeting with a chip on your shoulder. If a CIO presupposes that every request will be unreasonable and every interaction unpleasant, then every meeting will be unproductive.
  2. Set priorities yourself. Good intentions won’t prevent mismatches between customer expectations and IT resource allocation.
  3. Protect your staff at the expense of the organization. I work hard to prevent my lean and mean staff from becoming bony and angry. But I can’t just say no to customers, so I work with them to balance resources, scope and timing.
  4. Put yourself first. Being a CIO is a lifestyle, not a job.
  5. Indulge in tantrums. Walking into the CEO’s office and saying that you will quit unless your budget is increased does not win the war.
  6. Hide your mistakes. Transparency may be challenging in the short term, but it always improves the situation in the long term.
  7. Burn bridges. It’s a small world, and it’s best to be cordial and professional in every encounter.
  8. Don’t give your stakeholders a voice. A CIO can earn a lot of respect just by listening.
  9. Cling to obsolete technologies. The CIO should never be the roadblock to adopting new technologies and ideas.
  10. Think inside the box. Although exploring new ideas will not always result in a breakthrough, it’s the only way to innovate.

The list is non-technical — it’s all about human relations, management, and communication. That’s the most important lesson of all.

Yo’ John, I live in Brookline, about 10 minutes from your office. Let’s get together - I want to hear more.

February 15th, 2008

Technorati fails, fixes, and fully discloses

Posted by Michael Krigsman @ 5:55 am

Categories: Availability and reliability, End-user impact, Enterprise 2.0, Failure 2.0, IT issues, Project failures, SaaS, PaaS, and SOA, Vendor relationships

Tags: Technorati, Blog, Blogging, Internet, Michael Krigsman

Technorati, a major blog indexer, recently shut down their spiders to fix severe data problems. From the blog post explaining the incident (emphasis added):

[A] small percentage of recently created blogs were having their data scrambled. An example of this appears in this blog post. The spidering outages allowed us time to investigate, diagnose and make corrections that prevented further data corruption.

Technorati handles a large volume of data every day; isolating and devising remedies for these kinds of issues that affect a small percentage of the data flow is tricky. However, we think we’re recovering now and the backlog of data processing is getting worked through.

Just to peek into the works a little bit, many distributed data systems rely on centrally dispensing identifiers for data elements and Technorati has such a beast. What was found were cases of blogs new to our system (from within the last 3 weeks) losing their identifiers and those identifiers getting re-associated to other new blogs. No blogs that existed in our system before Dec. 18th (the vast majority) were impacted at all. The outward manifestations visible were posts for blogs with a shared ID mingled (a mashup the authors naturally were unhappy with) and mis-associated blog claims (“And you may tell yourself, this is not my beautiful blog”).

This was an unprecedented case for us; while it had been occurring in about 8% of those blogs (created on or after December 18) for about 2 days (beginning on Tuesday, January 8th) we had until that time never encountered this phenomenon. An intensive investigation was launched, reconstructing operational timelines and correlating facts. What we found was that this stemmed from a failure incident with the primary system for identifier dispensing, another failure in the secondary system that took its place and then a corrupted data set mistakenly taking over that one, ouch! The first two blows appeared to be handled routinely but the third time was cursed; propagation of corrupted data was not detected for about 48 hours between Tuesday when it started and Thursday when we pulled the emergency brakes on the spiders.

So we’re recovering now, most of the data is being restored to its previous state and we have had a number of internal postmortem discussions about earlier fault detection and recovery.
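The failure mode Technorati describes, identifiers from a central dispenser ending up attached to more than one blog, can be illustrated with a small sketch. Technorati did not publish its design, so everything here is an assumption: the `IdDispenser` and `Registry` classes are hypothetical, and a replacement dispenser restored from a stale snapshot is only one way a "corrupted data set" could lead to reused identifiers. The point is that a registry refusing to rebind an identifier surfaces the problem on the first collision instead of 48 hours later.

```python
class IdDispenser:
    """Hands out sequential integer identifiers starting after last_issued."""

    def __init__(self, last_issued=0):
        self.last_issued = last_issued

    def next_id(self):
        self.last_issued += 1
        return self.last_issued


class Registry:
    """Maps identifiers to blog URLs and refuses to rebind an identifier."""

    def __init__(self):
        self.owner_by_id = {}

    def register(self, blog_id, url):
        current = self.owner_by_id.get(blog_id)
        if current is not None and current != url:
            raise ValueError(f"id {blog_id} already belongs to {current}")
        self.owner_by_id[blog_id] = url


registry = Registry()
primary = IdDispenser()
for url in ["a.example", "b.example", "c.example"]:
    registry.register(primary.next_id(), url)

# Failover to a replacement restored from a stale snapshot (counter at 1, not 3).
stale_replacement = IdDispenser(last_issued=1)
collisions = []
for url in ["d.example", "e.example"]:
    new_id = stale_replacement.next_id()
    try:
        registry.register(new_id, url)
    except ValueError as err:
        collisions.append(str(err))

print(collisions)
```

Both identifiers issued by the stale replacement collide with existing blogs, and the original owners keep their data.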

THE PROJECT FAILURES ANALYSIS

Technical failures often have two components: the failure itself and management’s subsequent handling of the incident. Although uncontrolled technical failures can occur under the best of circumstances, end-user satisfaction is usually a function of management rather than technology.

In this case, Technorati handled the incident well. The company:

  • Acknowledged the full scope of the problem
  • Took immediate corrective action once they realized the problem existed
  • Provided context regarding why the problem was hard to solve
  • Protected the company’s credibility (I call this “intelligent CYA”)
  • Described symptoms the customer might experience, in jargon-free terms
  • Presented their problem resolution strategy
  • Demonstrated responsible and professional analysis

Technorati’s short blog post explained an arcane problem, helped calm jumpy users, covered the company’s collective butt, and showed the place is run by pros. I’m impressed.

February 11th, 2008

Customer blames bankruptcy on IBM IT failure

Posted by Michael Krigsman @ 5:19 am

Categories: CIO issues, Financial impact, IT issues, Project failures, Training, Vendor relationships

Tags: Information Technology, Problem, ERP System, ERP, Bankruptcy, IBM Corp., ALF, Freightliner, Enterprise Resource Planning (ERP), Enterprise Software

American LaFrance (ALF), the “leading brand of custom-made fire fighting, fire rescue vehicles, ambulances, and heavy-duty work refuse vehicles,” has declared bankruptcy, blaming IBM and a failed ERP implementation.

According to filings in the District of Delaware bankruptcy court (PACER case no. 08-10178), problems occurred when ALF was spun out as an independent company from Freightliner, the previous owner. During the transition, ALF outsourced “accounting, inventory, payroll, and manufacturing process services” to Freightliner. As part of the transition, ALF developed a “standalone” ERP system designed to support the firm after the Freightliner separation was completed.

The bankruptcy filings describe the painful cutover from Freightliner:

Almost immediately upon the changeover to the ERP System, ALF recognized serious deficiencies with the system that had a crippling impact on ALF’s operations. Some of the problems that ALF encountered in implementing the ERP System included, among others: (i) inability to reconcile data between the Freightliner system and the ERP System; (ii) incorrect or incomplete inventory, purchasing and customer data due to either problems with the Freightliner system or the conversion of the data to the ERP System; (iii) inaccurate or incomplete vehicle configurations loaded in the ERP System; (iv) insufficient training on the ERP System; and (v) missing financial information including accounts payable detail, incomplete or inaccurate accounts receivable data, and inaccurate beginning general ledger balances.

For the next several months following the changeover, ALF attempted to solve the plethora of problems with the ERP system. Despite such efforts, as a direct result of the problems with the ERP System, ALF became unable to complete the manufacture of many pre-ordered vehicles.

The manufacture of highly-customized Emergency Vehicles requires the availability of a large number of inventory SKUs at key points in the production process. The conversion from the Freightliner system to the ERP System resulted in the inability to account for inventory on a reliable basis. This, in turn, severely limited ALF’s ability to deliver completed products to its customers. Consequently, ALF’s inability to deliver vehicles had an immediate impact on ALF’s cash flow and created a liquidity crisis.
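The first deficiency in the filing, the inability to reconcile data between the two systems, is the sort of problem a pre-cutover reconciliation check is meant to expose. The following is a minimal sketch under stated assumptions: inventory from each system is exported as a mapping of SKU to quantity, and the SKU names and the `reconcile` function are hypothetical. The filings say nothing about what checks ALF or IBM actually ran.

```python
def reconcile(legacy, migrated):
    """Compare two {sku: quantity} snapshots and report every discrepancy."""
    missing = sorted(set(legacy) - set(migrated))
    unexpected = sorted(set(migrated) - set(legacy))
    mismatched = {
        sku: (legacy[sku], migrated[sku])
        for sku in sorted(set(legacy) & set(migrated))
        if legacy[sku] != migrated[sku]
    }
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Hypothetical inventory snapshots taken from each system at the same moment.
legacy_inventory = {"PUMP-100": 4, "HOSE-220": 12, "AXLE-310": 2}
erp_inventory = {"PUMP-100": 4, "HOSE-220": 10, "LADDER-405": 1}

report = reconcile(legacy_inventory, erp_inventory)
print(report)
```

Running a comparison like this repeatedly while both systems operate in parallel gives a concrete go/no-go signal: the cutover waits until the report comes back empty.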

ALF claims that IBM is responsible for the IT problems that precipitated the bankruptcy:

ALF is currently analyzing potential causes of action against IBM based upon services provided by IBM in connection with the problem-riddled transition to the ERP System.

The documents describe IBM Corp. (for the “customer agreement”) and IBM Global Services (for “systems applications project assistance”) as having open contracts with ALF. IBM is listed as a $5.5 million creditor, although ALF disputes the invoices.

THE PROJECT FAILURES ANALYSIS

In my reading of the documents, which only present ALF’s side of the story, it could be said that both ALF and IBM dropped the ball during the transition from Freightliner. Here are my conclusions:

  • IBM did not manage the project properly. Given ALF’s dependence on Freightliner, “serious deficiencies” in production software should have been identified prior to the cutover, for example by testing and running the systems in parallel. IBM managed development, which typically includes extensive testing before deployment.
  • ALF did not manage the project properly. IBM’s role does not minimize ALF’s ultimate responsibility for managing this mission-critical IT project. ALF’s management was probably distracted by the deteriorating Freightliner relationship, by a major facilities relocation that didn’t go well, and by generally poor market conditions.
  • The ERP problems were managerial, not technical, in nature. The list of ERP and data problems cited in the filings suggests poor project management, rather than technical issues, was at the root of the difficulties. Since the division of labor between ALF and IBM is not made clear in the filings, it’s impossible to discern where responsibility lies.
  • General market conditions made things worse. While all this was happening, the market for ALF’s products tanked:

[T]he Emergency Vehicle industry is currently depressed. Many competitive manufacturers are experiencing financial difficulties and several have ceased operations.

  • All these issues created customer service problems, multiplying the negative effects of the market downturn. For example, the Bellingham Herald reported:

The city is trying to get a refund of the more than $362,000 it spent on an American LaFrance pumper that has had electrical problems 10 times [since 2005].

    In addition, FireRescue1, an industry news source, states:

Several departments that have ordered apparatus have suffered lengthy delays in delivery.

“I think one of their problems may have been that they underestimated the problems with moving a plant and production and actively pursuing business for new apparatus,” [Bill Peters, who runs New Jersey-based Fire Apparatus Consulting Services] said.

“Perhaps they bit off more than they could chew, especially with the building of a new factory. It might have been wise not to take as many orders and not to have backed themselves up so much.”

This risky, high-stakes project was primarily business in nature, despite the heavily technical components. Project failures often arise when non-technical senior management don’t fully understand the business ramifications of technical decisions made by IT. Poor communication and lack of understanding between IT and business management remain a serious problem contributing to many IT failures. My ongoing interview series, NakedIT: Conversations with Innovators, explores this issue in depth.

The combination of so many negative conditions ultimately created a situation where the company could not recover, leading to the bankruptcy. ALF was founded in 1832, so it’s a shame to see this happen. Unfortunately, many of ALF’s vendors will probably suffer as the company goes through bankruptcy.

(To research this post, I studied the bankruptcy documents, left messages for ALF’s proposed Chief Restructuring Officer and its IT manager, spoke with two attorneys connected with the case, and got unpleasantly barked at by a third. All facts, conclusions, and interpretations in this post are based on information obtained from publicly-available filings.)

Update 2/22/08: IBM ignored a request for comment on the story. Larry Dignan wrote a great follow-on post.

October 25th, 2007

IT success: CA fire-fighting infrastructure

Posted by Michael Krigsman @ 6:54 pm

Categories: Government projects, Project management, Project success

Tags: Bandwidth, NASA, Agency, Information Technology, Computer Associates International Inc., Satellite, Cal Fire, Resource Ordering System, Incinet Application, CAD

As fires continue to rage in Southern California, emergency responders and local residents alike are supported by rapidly-deployed IT systems.

Government Computer News reports on the important role IT infrastructure plays in helping fire-fighting and communications:

1. Website bandwidth

The state’s computerized reaction to public demand for information about the disaster swamped the Fire Incident Web site of the California Department of Forestry and Fire Protection (Cal Fire).

Cal Fire’s chief information officer, Ron Ralph, said in a telephone interview that the increase in traffic to the agency’s Web site had overwhelmed the bandwidth allocated for it. The department posted a notice that “due to extremely high traffic volume, the Cal Fire Incident Web site is not functioning. This temporary page contains the latest information available.”

“We were bursting up to 12 to 14 megabits of traffic,” Ralph added.

Ralph contacted AT&T to seek additional bandwidth for Cal Fire’s site. “AT&T doubled our bandwidth within 12 hours,” he said. “I thought that was pretty impressive.”

2. Satellite communications

San Diego-based Tachyon helped the fire control teams by lending two portable satellite ground stations for temporary use at fire command centers, Ralph said. Cal Fire’s CIO office also is working with AT&T to arrange for loaned satellite uplink and downlink gear.

3. Software infrastructure

Ralph said his operation relies on an Oracle database management system on the back end and Citrix for application delivery. The system connects with some 4,500 users via a client server network that uses a virtual private network.

Cal Fire’s resources include a suite of applications referred to as the computer-aided dispatch (CAD) system, which processes 9-1-1 calls and shunts them to the appropriate agency staff members. “The CAD system is fully redundant and fielded at 22 locations around the state,” Ralph said.

When fire incidents grow into large problems, Cal Fire switches to its Resource Ordering System (ROS) link.

ROS is a nationwide federal system with hardware based in Kansas City, Mo. Cal Fire uses redundant communications paths to connect with ROS, which in turn is fully interoperable with the CAD suite, Ralph said.

During large fires, Cal Fire relies on ROS to keep track of the various assets, such as bulldozers, helicopters and fixed-wing aircraft, deployed to fight the conflagrations.

IT personnel at the on-site centers rely on an application called Incinet to keep track of various functions, activities, resource needs and other relevant data, Ralph said. The Incinet application communicates with Cal Fire via satellite uplinks or other means, Ralph said.

4. Satellite imagery

FEMA’s lengthy description of the many federal activities it is coordinating in the fire response omitted mention of ongoing work by NASA to provide imagery to fire fighting agencies.

A NASA official said in a telephone interview that her agency also is providing near-real time video footage to firefighting agencies. She added that NASA’s Dryden Flight Research Center at Edwards Air Force Base is evaluating the possibility of fielding an unmanned aerial vehicle to gather aerial photos of the region.

To see additional NASA fire images, click here.

InformationWeek reports on how San Diego residents are using Enterprise 2.0 tools, such as Google and Twitter, to maintain communications during the fire emergency.

For much more information about Twitter’s role in the crisis, see this post on the ZDNet government and IT blog.

August 28th, 2007

101 project management tips

Posted by Michael Krigsman @ 7:25 pm

Categories: Project management

Tags: Project Management, Milestone, Ethics, Michael Krigsman

Inside CRM has a list of 101 tips for being a better manager. Here are my picks for the ten best:

Only promise what you can realistically deliver. Don’t create deadlines that you know you can’t meet. By only promising what you know you can do, you’ll be able to finish on time.

Create milestones. Creating milestones for you and your team will help you keep track of your progress and also give you a sense of accomplishment as you reach each milestone.

Make sure expectations are clear. Be sure that each member of your team knows what their specific responsibilities are. This will save time and prevent tasks from being overlooked.

Give credit when it’s due. Don’t take credit for your employees’ ideas or hog their limelight. This action not only fosters resentment but also makes you seem untrustworthy.

Set up a realistic budget. While it’s good to be optimistic, don’t plan for more spending than you know you can afford. Make sure you plan for emergencies and contingencies as well.

Save costs where they matter the most. Don’t just pinch pennies for the present. Make sure your savings will pay off in the long run. Compromising on quality might cost you later on in repairs and replacements.

Adopt a predictive managerial style. Don’t wait for things to happen to make a move. Anticipate problems and provide contingency plans.

Test your contingency plans. Waiting for disaster to strike is a dangerous way to find out if your emergency plans will hold. Test them out from time to time to fine-tune them and make sure they’re still relevant.

Stand up for employees. If other departments or managers are bearing down hard on your employees, stand up for them.

Remember that ethics matter above all. Be honest and reliable in all of your business and personal relationships.

Some of this stuff is a bit corny, but if you follow these guidelines, I guarantee your projects will be more successful.

[via Reforming Project Management]

August 25th, 2007

Liberated Syndication: down, down, down

Posted by Michael Krigsman @ 9:18 pm

Categories: Availability and reliability, Enterprise 2.0, Project failures

Tags: Registrar, DNS, Server, E-mail, Computer, Michael Krigsman

According to their website, “Liberated Syndication is a premiere media distribution service built from the ground up with DIY content creators in mind.” Unfortunately, a DNS configuration error caused the site to be unreachable by customers for an extended period of time.

Here’s the email they sent out to those unhappy customers:

---------- Forwarded message ----------
From: support@libsyn.com
Date: Sat, 25 Aug 2007 14:49:51 -0400
Subject: Cannot find libsyn.com
To: XXXXXXXX@XXXXX.com
Dear Libsyn users-

I would first like to apologize for this email coming as late as it is. On Thursday, 8/23 at approx 2:30pm EST there was a misconfiguration of DNS records sent out to the registrar which holds libsyn.com. This mistake quickly propagated around the internet in a matter of minutes, and despite our rapid response to correct the problem, there was a window where name servers around the globe stored and are currently saving the incorrect information. Initially we felt we had caught the error in time and that the effects would be minimal.

We posted to the normal channels- support.libsyn.com about the issue.

Several hours later, as we started to see reports of the outage (as users couldn't email us, cause the domain name was not resolving) we jumped back into the DNS issue and got on the phone w/ our registrar to see what they could do. Our registrar could not offer any solution to the problem other than waiting for servers around the world to correct themselves. They estimated it would take 24 hours or less. We are now approaching 48 hours and a portion of our users, and our users' audience are still unable to resolve any .libsyn.com domains.

We continue to go back and forth with our registrar to see if there is ANYTHING we can do to speed up the global re-caching of our name servers to the proper settings. They keep telling us there is nothing anyone can do to force local, or regional DNS caches to expire.

There's a few layers of caching that occurs to make the Internet work, it seems.

1) source name servers- these are the ones we control. We can make changes and they are done instantly cause we control them

2) global/regional name servers- this is where our registrar pushes out the information we tell them. These we don't control and update slower than our source

3) local/ISP name servers- this is where your home computer gets its info from. They are run by the internet service providers (like Comcast or Verizon for example). These we don't control either and update at their own rate.

4) home computer- finally, your computer keeps its own cache so it doesn't have to hit your ISP's name servers everytime you hit a website you visit often. We definitely can't control these. We suggest a reboot of your computer to possibly kick-start your computer's dns cache.

We apologize for the repercussions of this error. We are doing everything in our power to bring full service back to all users.

Since the standard network communication channels http://support.libsyn.com and email (support@libsyn.com) were not able to be seen either, we have opened up some emergency channels which we will use in the future if there are issues regarding libsyn.com.

Wizzard.tv blog: http://www.wizzard.tv/blog
Yahoo libsyn users group: http://tech.groups.yahoo.com/group/libsynusers

A few days ago, I described the need for careful testing to avoid making users want to curse you. In that spirit, here is free, unsolicited advice to anyone making DNS configuration changes: double-check your work before clicking that all-important CONFIRM button.
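The reason a DNS mistake lingers, as the caching layers in the email suggest, is that each resolver keeps an answer until the record's time-to-live (TTL) expires, no matter what has since changed at the source. The sketch below simulates one such cache. It is illustrative only: the `CachingResolver` class, the `libsyn.example` name, the addresses (taken from reserved documentation ranges), and the 48-hour TTL are all assumptions, and the email does not state what TTL the bad records carried.

```python
class CachingResolver:
    """Caches answers from an upstream lookup until each record's TTL expires."""

    def __init__(self, upstream):
        self.upstream = upstream  # callable: name -> (address, ttl_seconds)
        self.cache = {}           # name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, upstream is not consulted
        address, ttl = self.upstream(name)
        self.cache[name] = (address, now + ttl)
        return address


# Hypothetical authoritative data: a bad record with a 48-hour TTL, later corrected.
records = {"libsyn.example": ("203.0.113.9", 48 * 3600)}  # wrong address
resolver = CachingResolver(lambda name: records[name])

first = resolver.resolve("libsyn.example", now=0)           # caches the bad answer
records["libsyn.example"] = ("198.51.100.7", 3600)          # fix at the source
during = resolver.resolve("libsyn.example", now=24 * 3600)  # still inside the TTL
after = resolver.resolve("libsyn.example", now=48 * 3600)   # TTL expired, refetch
print(first, during, after)
```

The corrected record only becomes visible once the cached copy expires. A common precaution is to lower the TTL well ahead of a planned DNS change, so that any mistake ages out of caches quickly.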

By the way, when your site is unavailable, especially to paying customers, the CEO should send out the dreaded email personally. An anonymous letter from support, as in this case, is pretty lame.

[Thank you to an anonymous reader for forwarding the email to me.]

Michael Krigsman is CEO of Asuret, Inc., a software and consulting company dedicated to reducing software implementation failures.
