Due to my shameful 2-11 record in fantasy football this season, I welcomed the league to my house on Sunday for the game (it’s been a tradition for the last-place finisher to host the Super Bowl every year). Our league started 15 years ago with a bunch of recently transplanted Boeing engineers straight out of college. Most of us have left the company or been laid off, but a few of the original crew are still there. In talking to one of them this weekend, it’s clear that the battery problems with the 787 are a serious headache for Boeing, but many of the plane’s troubles were things that a lot of folks (even going back to when I was there and it was still the 7E7) fully anticipated. And it sounds like some are starting to voice that a little more.
I left the company in 2000, but even prior to that, the vision of a global outsourcing model to build the next big Boeing plane had already been articulated. Some of the older folks in my group (I was a flight control test engineer) were nervous. Others were beyond nervous and predicting doom. One of them fired off a long, angry email to the all-company email list. He was reprimanded, but not fired. I’d love to read that email today, as I’d expect it would be like reading the scrolls of Nostradamus.
Outsourcing has occurred in a lot of different work environments. And in some of them, it’s arguably achieved its objectives – lower production costs with equal or near-equal output. But within the tech world, nearly all the attempts at outsourcing I’ve seen have been disasters. I’m not talking about just tech support or research or some specialized one-off skill; I’m talking about efforts to design, build, and test a large-scale development project with different project groups located around the globe. The logistical difficulties and communication issues involved quickly overwhelm your ability to move at the pace you need to.
Let me give a simple example more related to the world I currently inhabit, the world of software services and big data. This’ll be familiar to more people than the innards of a jumbo jet, but I promise I’ll get back to Boeing afterwards.
Let’s say you’re a software tester on a small team. It’s Monday morning and you’re supposed to get a web application build from the developer to install on your test web server box. You get in at 8am and a link to the build is in your inbox from late Friday night. You begin to install it but it throws some kind of configuration error. You suspect the developer didn’t package it up right, so you grab some coffee for a bit and wait for the developer to get in. When he gets in at 9am, you stroll over to his desk and tell him. He makes a quick fix, you get it installed, and you start up your testing.
By 10am, you’ve done a good amount of poking around in the application, which is supposed to read from a database, do some calculations, and show some fancy graphs related to the data. Because you’re a diligent and prepared tester, you have a whole long list of test cases that you came up with in anticipation of the handoff. The first thing you notice is that when some fields in the database are zero, the graph isn’t displaying the data in a way that makes sense. At the 10am stand-up meeting, you bring this up to the program manager. The program manager looks at it and you convince him it’s wrong, but the developer isn’t convinced. A whiteboard is summoned and the program manager convinces the developer he needs to make a change. You file a bug, and the developer starts to fix the code.
In the meantime, you continue testing. You discover that in order to get to certain views in the web application with particularly sensitive data, you need to pass in some credentials. The program manager isn’t sure what they should be, and the developer was testing against a mock local version of the DB, so he never had to worry about this. The database admin is still hungover from the concert he went to last night, but he looks up the credentials and gets them to you. They work, but it’s 12:30pm and you need to grab some lunch.
After lunch, you start checking out how the application is showing the sensitive data (financial records from various countries). You notice that in a few of the countries, the numbers are blank, but you’re pretty sure the database has data. You ask the database admin for help, and he gives you direct read-only access to the database so that you can look everything up yourself and he can go back to playing foosball.
You discover that the data in the database looks totally fine, but the database is set up in a weird way in order to accommodate data coming from different places. You end up having to track down a different developer who wrote the ETLs (a fancy acronym for programs that extract data from one source, transform it, and load it into a database of some kind), and she realizes after some debugging that she’s not doing the encoding right, which is messing up how the other developer’s code reads it. Her code was being tested separately, but that tester had little understanding of how encoding works and never knew it was wrong. You file a bug and the developer starts coding the fix (it’s a simple fix) so it can go out with Wednesday’s release into the production system. It’s 4:30pm now and you’ve had a pretty good Monday.
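If you’ve never been bitten by one of these encoding bugs, here’s a hypothetical Python sketch of the general shape of the problem (the country names and the Latin-1/UTF-8 mix-up are my invention, not the actual systems involved): the ETL writes its extract in one encoding, the application reads it back in another, and the mangled keys quietly stop matching anything.

```python
# Hypothetical sketch of an ETL encoding bug: the writer and the reader
# disagree about the encoding, so non-ASCII keys get silently mangled.
raw = "São Paulo;123.45\nMéxico;678.90\n"

# The ETL writes the extract using Latin-1...
encoded = raw.encode("latin-1")

# ...while the application reads it back assuming UTF-8,
# substituting a replacement character for any bytes that don't decode.
decoded = encoded.decode("utf-8", errors="replace")
records = dict(line.split(";") for line in decoded.strip().splitlines())

# The corrupted keys no longer match what the app looks up,
# so those countries render as blanks.
print(records.get("São Paulo"))  # prints None: the key is now 'S\ufffdo Paulo'
```

Nobody sees an error anywhere; the data just silently fails to show up, which is exactly why it takes a debugging session to find.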
Now let’s imagine that same scenario in an environment where the development work has been outsourced to India and the data warehousing work is being done in Russia. You and the program manager are still in Seattle.
You get to your desk Monday morning and you have the link to the web application build in your inbox, sent two hours ago from India, where it’s Monday evening now. Normally, the program manager and the offshore dev lead will talk at the end of the Indian workday and the beginning of yours, but by the time you discover that the build is broken, it’s almost 9pm in India. So you spend your Monday catching up on some test documentation, surfing Reddit, and honing your foosball skills.
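The time-zone arithmetic behind that lost day is worth spelling out. A quick sketch (assuming an 8am-5pm day in Seattle at UTC-8 and a 9am-6pm day in Bangalore at UTC+5:30; the exact office hours are my assumption) shows the two teams’ workdays share zero overlapping office hours:

```python
from datetime import datetime, timedelta, timezone

def workday_utc(offset_hours, start_hour, end_hour):
    """Return (start, end) of a local workday as UTC datetimes."""
    tz = timezone(timedelta(hours=offset_hours))
    day = datetime(2013, 1, 28, tzinfo=tz)  # any shared calendar date works
    return (day.replace(hour=start_hour).astimezone(timezone.utc),
            day.replace(hour=end_hour).astimezone(timezone.utc))

# Assumed office hours: 8am-5pm in Seattle (UTC-8, ignoring DST),
# 9am-6pm in Bangalore (UTC+5:30).
seattle = workday_utc(-8, 8, 17)
bangalore = workday_utc(5.5, 9, 18)

# Overlap of the two intervals, clamped at zero.
overlap = min(seattle[1], bangalore[1]) - max(seattle[0], bangalore[0])
print(max(overlap, timedelta(0)))  # prints 0:00:00
```

Every question, fix, and handoff has to cross that gap, so anything that isn’t answered in the one scheduled call waits a full day.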
Tuesday morning in India, the developer corrects his mistake and fires off an updated build, which you’ll get in about 10 hours when it’s 8am in Seattle. You get in, install it, and you’re off and running. You were prepared for the test effort, maybe even more so now because of your extra day to prepare, so you quickly notice the error in the way the graph displays zero values and bring it up to the program manager. The program manager is convinced it’s a bug and puts some notes in the bug report you filed against the developer.
It’s now Tuesday at 1pm and you discover that you’re unable to see the sensitive data pages without additional credentials. The database is now being administered in Russia, where it’s the middle of the night. The program manager looks through his email to see if he can dig up the password you need. He can’t, so you file a work ticket in their system and take a second look at some test cases for the rest of the afternoon.
Wednesday morning in India, the developer looks at the bug report you filed, but doesn’t quite understand how the program manager wants it changed. He talks to his dev lead down the hall, who isn’t even convinced it’s a real bug. He tells the developer that he’ll talk to the program manager at the end of the day when it’s morning in Seattle.
In Russia, they receive the change ticket to provide your database credentials, but it’s a busy day there. All the new ETLs are going into production, so even though your change ticket is marked as a “Blocker”, it’s only blocking testing, so it doesn’t get acted on until late in the day. And then, because it involves providing credentials to a sensitive database, it has to go through additional approval that won’t happen until the next day.
Wednesday morning at 6am in Seattle, the program manager gets on the phone with the Indian dev lead and tries to explain how the graphs should be displayed. Without a whiteboard and with the language barrier, this is difficult, but eventually the program manager gets the point across. By then the developer has already left for the day, though, so the change won’t get implemented until Thursday.
You get in at 8am and discover that you’re still blocked from being able to do anything. You sigh, sit back in your chair, and wonder about all the other things you could’ve done today if you just had the balls to pretend to be sick. You spend enough time at the foosball table on Wednesday that you start feeling confident enough to take on Steve from marketing. He rolls you, 5-1.
On Thursday, the Indian developer fixes the graph bug, leaving you with an hour or two of regression testing since his change affects nearly every graph in the system. You also receive your test credentials from Russia. So you’re feeling pretty good about being able to make some progress as you settle in at your desk Thursday morning. You finish up the regression testing by about 11am and log into the secure section to look at the parts of the application you’ve been blocked from. You notice the problem where there’s blank data in places where you’re certain the data exists. You try to use the same credentials to look directly into the database, but it won’t let you. You file another ticket against the admins in Russia. You spend the rest of the day watching YouTube videos and playing Angry Birds on your phone.
By Friday morning, you have your updated read-only access directly to the database and you confirm what you suspected: the data is correct in the database but being displayed as empty values in the application. You suspect there might be a problem with the ETLs, and to your delight, the Indian programmer who wrote them works a late shift and is still online. It’s Friday night in India, so you feel kind of bad making him work a little late, but you really want to get this figured out. He quickly discovers the problem with the encoding. But since they went to production with this broken code on Wednesday, he’s screwed. He can fix your test database, but not until he fixes production first.
So you continue to test whatever you can, but you’re looking at the same things 3 or 4 times and getting bored. After lunch, you’ve given up. You notice that the cute new office admin is hanging out at the foosball table, so you head in there and play a few games. You finally get a rematch with Steve and you take him to 4-4 before he shoots a laser past your hapless goalie from his back line. But you’re happy with your improvement and the admin reminds you that everyone’s going to happy hour, as it’s the outsourcing consultant’s last day before moving on to his next job. So you cut out a little early, do some shots with the admin and her boyfriend, and then head home for the weekend having made less progress in a week than a tester in a co-located work environment makes in a day.
Ok, so this is a bit of a jokey example, but anyone who’s worked in an environment like this can tell you how close to home it hits. And one thing I want to make clear is that the problems with tech outsourcing often have nothing to do with the quality of the offshore employees. In the times I’ve had to work with offshore workers, they’ve all been very good. The problem is one of logistics and communication. Any project that involves a large number of integration points is going to require a lot of coordination and communication. And just as in my example, even small failures can turn an hour of delay into a day or more. This is where there’s a lot of similarity between large-scale IT endeavors and building an airplane.
Boeing had teams all over the globe, each doing their own design for their own parts. There was a belief that if you worked hard to define and understand the integration points, you could make this chaos work. But ever since the 777, modern airplanes have essentially been large-scale flying computer systems. And this plan worked as poorly as it does when the development of any other large-scale computer system is spread out across the globe. A tech project requires people to be nimble, to aggressively take charge, and sometimes to step into different roles. To use a soccer metaphor, sometimes you need a midfielder to play striker for a bit, or more people to come back into the box to defend a corner kick. It requires everyone to be flexible and work together. But offshored development teams turn the playing field into a foosball table, with everyone stuck in their rows, kicking at the ball with limited control over where they go and what they do.
The rest of the story at Boeing is becoming well known. Delays started to mount as the complexity of testing all these integration points began to manifest. The normal rigorous oversight was mostly bypassed as the FAA threw up its hands trying to do its usual fault analyses on foreign suppliers. And years after the first Dreamliner was rolled out for the world to see, they’re all currently sitting dormant around the world as engineers scramble to figure out why the lithium-ion batteries are catching fire.
A few years back, I was a test manager at a company that was in the process of outsourcing a lot of its development and testing work. I scheduled a lunch with a consultant who said she had expertise in making offshore workgroups more agile. During the lunch, I asked her if she knew of any companies that have been successful with their offshoring projects. She stopped for a second, thought about it, and said, “they must be successful, everyone’s doing it”. Hopefully not for much longer.