Software Projects - why are they hell?

Talk to people in the IT industry (those that are still left) and ask them about the project they are working on; ninety-nine times out of a hundred they will describe it as "the project from hell" or "insane" or "Project California (you can check out, but you can never leave)". Why is this perception of projects so common? Major software projects have been a fact of life since the 1960s; why are we still getting it so wrong?

Methods of delivering successful software projects have come and gone. I contend that successful projects have little to do with methodology, and everything to do with people. This is becoming recognised in the literature (e.g. Death March, Peopleware, Rapid Development), and in the development of Extreme Programming and the less-extreme Agile methodology proposals. It doesn't matter what methodology is used, what tools, what development environment, what the environmental conditions are like: with the right team of the right size, you can deliver successfully.

During the early years of my employment, I avoided any engagement in projects. This was not due to any aversion (that came later) or plan; I simply never had to deal with one. This idyllic state of affairs came to an end when I became involved in the fallout from a badly sold project (in which I had been part of the sales team). From then on, I spent most of my life with projects of one shape or another - and it was mostly unpleasant.

My intention is to describe a number of projects I've been involved in, as objectively as possible, draw out some common threads, and then present an undisciplined rant on why IT projects are such a mess.

Selling the wrong product

Let's start at the beginning; the sales process. The first disaster project I became involved in was caused partly by selling the wrong solution (and it was known to be wrong at the time of the sale), and partly by the use of an SI company that had no previous knowledge of the product that was sold. A lethal combination.

The customer wanted, effectively, a newsroom system: the ability to enter copy into a word-processing system (usually transcribed and translated from the news sources), send the material for editing and revision, and then publish it as hard copy. This requirement came out at about the same time as IBM PCs were just arriving on the IT scene. I was working as a technical pre-sales consultant on a mainframe operating system called TP-6 (one of three, count 'em, three, operating systems that ran on the same mainframe hardware) sold by Sweetspring.

Sophisticated word-processing programs were thin on the ground at the time. With hindsight, the only Sweetspring product that could possibly have fulfilled the requirements was emacs, but even emacs would struggle to satisfy the need for tracking revisions. A further problem with emacs was that it was not available on TP-6, and we weren't interested in selling the OS on which emacs was available.

However, the major reason why we continued with TP-6 was that a word-processing program did actually exist on it. It was written by an independent consultant who, at the time of the bid, was working in Sweden. I was dispatched to Stockholm to review the capabilities of the product. If I knew then what I know now, I wouldn't even have bothered to go: an untried product, written by one guy, where we didn't really understand what the customer wanted - clearly a disaster in the making. However, I didn't even think about it. I had a nice time in Sweden, saw the product, reported back what everyone wanted to hear (good to go) and away we went.

An alternative, which was discussed internally, was to use PCs as the terminals, with some (unspecified) word-processing software. While this might solve the problem of the word-processing functionality, and move the massive CPU load to the PCs, the challenges of automatically shipping the files between the PC and the host seemed difficult to solve (this was in the days before Ethernet and network file systems were widely available). The other difficulty was that we were too far down the line in promoting TP-6 to switch horses now.

The scene was set for the three necessary components of the disaster: a company focused on selling a technology, rather than finding the solution to the customer's problem; a system integration company with no knowledge of the operating system to build the rather large amount of custom code that was required; and an untried word-processing program which ran on TP-6, authored by a very small company of one.

As a pre-sales team, all we cared about was selling the solution. And, astonishingly enough, we did. I still remember the celebrations we had (although only vaguely). Soon after this, I moved to another part of Sweetspring, and lost touch with the implementation. I did get called in occasionally to provide the TP-6 expertise I still had, but it was infrequent. I was aware that things were not proceeding smoothly, but knew none of the detail.

It wasn't until Sweetspring appointed a senior manager to investigate the causes of the project disaster that I became aware of the true horror. Due to my early involvement in the sale, I was asked to participate in the review - I was one of the few people involved in the sales campaign who were still with the company.

The problems with the project were manifold: the hardware was undersized for the application that had been built - the SI had severely underestimated the memory occupancy. This was of particular importance for TP-6; its key feature was that it offered no swap space; every process had to reside in main memory. CPU usage was high. The word-processing application was buggy, and half of the features promised for it had never arrived.

I offered my opinions to the manager, and also helped him out with analysis of the performance figures we were seeing. A consultant from the OS development group was also drafted in to analyse the performance of the OS and application. I drifted away from the investigation, back to my proper job.

What happened after the project investigation, I cannot remember. I suspect a financial deal was struck, as that seems to be the norm in this kind of situation.

Many years later, by coincidence, I met an employee of the customer who had worked there during the implementation of this system. It seemed that the system had been delivered with much of the functionality we had promised missing or limited. I'd classify that as a project failure.

Product Development

The next project concerned me more directly. As part of the Digital Mapping product I was working on, we wanted to include a database component. We already had a gazetteer (for which I was responsible), but the desire was for something more general. A system that would support objects with attributes, network relationships, hierarchical relationships - in short GIS data. My personal preference was for us to support the Oracle relational database, but in this I was alone amongst the team. The team wanted to build a database system internally. The system was effectively designed by one guy, who came up with something fairly baroque and nothing like SQL. The development language was to be the same as the existing product set, FORTRAN 77.

Astonishingly, funding approval was given for the project. Now I look back on it, I see that the technical merits (or otherwise) of the project probably had little to do with the decision to proceed. Some unused budget existed, people needed to show they were legitimately occupied, spurious progress must be made - all the reasons described in Death March. Although, to be fair, this project was never a death march, more a leisurely stroll to non-existence.

In order to build this beast, we had to hire a couple of contractors. To manage the vast team (around 6 people) my boss decided that I would act as project manager. It was my first exposure to this particular discipline. Project management seemed to consist of making plans, finding the actuals were more than the estimates, re-planning, and then repeating the whole process. By dint of this approach, we remained on schedule, since the schedule kept changing. Quite how my boss managed to sell this to his masters, I don't know (but I could have used the knowledge later on). At least I did get a chance to code two pieces of functionality, which was the most enjoyable part of the whole experience.

One aspect that concerned me early on was our capability for the development of a scalable solution. For example, the indexing scheme was based on a simple ordered list. I asked if anyone was experienced in creating more performant indexing schemes (e.g. B-tree or radix), but no-one (including me) was. We therefore carried on with this simple indexing scheme.
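To make the scalability concern concrete, here is a minimal Python sketch of an index kept as a simple ordered list. It is an illustration only: the real product was written in FORTRAN 77, its actual design is not described here, and the class and method names below are hypothetical. Lookups can use binary search, but every insert has to shift all the entries that follow it, which is where such a scheme starts to hurt as the data grows.

```python
import bisect


class OrderedListIndex:
    """Hypothetical index kept as one sorted list of keys."""

    def __init__(self):
        self._keys = []     # keys, always kept in sorted order
        self._records = []  # record ids, parallel to _keys

    def insert(self, key, record_id):
        # Binary search finds the slot in O(log n), but list.insert
        # then shifts every later entry, so each insert costs O(n).
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._records.insert(pos, record_id)

    def lookup(self, key):
        # Record ids of all entries whose key equals `key`.
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._records[lo:hi]

    def range(self, low, high):
        # Record ids with low <= key <= high, in key order.
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._records[lo:hi]


index = OrderedListIndex()
for name, record_id in [("Leeds", 3), ("Bath", 1), ("York", 4), ("Derby", 2)]:
    index.insert(name, record_id)

print(index.lookup("Derby"))
print(index.range("Bath", "Leeds"))
```

A B-tree avoids the linear insert cost by splitting the keys across fixed-size pages, so both lookups and inserts stay logarithmic; that is the kind of experience the team lacked.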

Development was completed late, and we did ship the product to one customer, who didn't really know what they were getting. As far as I know, no-one else took it.

We had built a system that ended up in the worst position you can have: an installed base of one. All the support overhead, no money.

The offshoot system

I moved to a sales support position in a new company, but that still didn't help me escape from projects. Within six months I had been assigned to work on a large GIS implementation for a national telecommunications provider (let's call them ET). The project was ambitious - the creation of a national database containing all their field assets. The rather radical approach they had taken was to just digitise the gross field assets e.g. junction boxes and ducts. Individual wires and the network connectivity were only recorded at the database level; no graphics. To present a graphical view of the cable connectivity, a schematic drawing tool was used to generate cable diagrams on the fly, based on the end point geography and cable connectivity. An excellent idea, only let down by the incapacity of the chosen tool to handle all the special cases.

Naturally, this was not the only problem with the project. As the GIS software provider, we were a sub-contractor in the project, which was run by a large SI company. ET kept changing their minds; development was taking far longer, and costing far more, than originally planned (despite the importation of cheap (I assume) Indian programmers); our GIS software had significant bugs; and so on.

Due to the delay in the main project, we (as the GIS provider) were asked to create a fibre planning system (FPS), based on the same schema as the main project. I was volunteered to project manage the team and also lead it from a technical perspective. By this time, I'd been with the company a year, so the technical side didn't worry me half so much as the project management side. The small project team were fairly new joiners, so I was the most experienced in developing applications with our GIS product.

We had three months to deliver this project. Luckily, we were given a huge leg-up by the ability to re-use the existing ET database schema. I've witnessed the advantages of building on an existing foundation in a number of projects, and the benefit can be enormous. It stems from two aspects; a number of fundamental decisions do not need to be made (they are givens), and a lot of the difficult problems have been solved (often at vast expense) in the earlier project. We were also able to take advantage of the development of a drawing engine I'd built for another customer; it only required minor extensions to fully support the FPS drawing requirements.

To avoid a massive time crunch at the end of the project, I started working 12-hour days at once, and Saturdays. Apart from 'setting an example' for the other team members, it meant I had broken the back of my tasks after two months, and could focus on providing more assistance to the other team members. The team, though small, ran the gamut from extremely talented to complete waste of space. Due to the passenger, I had to draft in another individual (also not tremendously experienced) to compensate. I guess I was lucky to have this luxury.

We did deliver on-time (probably not on budget though). The project became the subject of a paper within the SI, comparing the successful development of FPS with the (still un-delivered) main ET project. It boiled down to small team size, re-use of existing development work, and a good working relationship with the customer.

Powertool - Western Electricity

The next project was also a small team effort, but it involved an unhealthily close relationship with another company's graphical product, written in C.

We were engaged in selling our GIS to an electricity company (Western Electricity - WE), who were interested in a design tool for new build electrical networks. They also needed an interface into the standard design package Power (name changed to protect the guilty), which checked for a valid set of components and tolerances for the new networks. For us, this would be a major development, but an off-the-shelf solution existed, developed by a small software company in C. The drawback of this approach was that it stored everything in flat files, and WE wanted to have all their new networks in a single database, which would allow them to migrate the network data to their soon-to-be-procured enterprise GIS.

Our GIS product offered an API, allowing C programs to invoke virtually all the significant functionality, including the storage and retrieval of geographic entities from a relational database. It was obviously a match made in heaven for WE.

Teams from both companies were small: they had three developers, plus a Technical Manager while we had two, myself and our best C developer. I performed much of the design work, and we expended considerable effort in ensuring that the GIS database contained topologically correct objects. For example, polygons in the design tool were just strings of coordinates, while within our system, polygons required a formation process to ensure topology was correct. The rationale for this exactitude was forward planning for the upcoming enterprise GIS database within WE. The project management of this team wasn't an issue - it was the relationship with the software company.
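The gap between "a string of coordinates" and a topologically formed polygon can be illustrated with a small Python sketch. This is not the formation process our GIS product used (that is not described here); it is an assumed, simplified version with a hypothetical function name, covering only three basic checks: removing repeated vertices, closing the ring, and normalising the winding order. It does not test for self-intersection, which a real formation process would also need to handle.

```python
def form_polygon(coords):
    """Turn a raw list of (x, y) tuples into a closed, counter-clockwise ring."""
    # Drop vertices that merely repeat the previous one.
    ring = []
    for point in coords:
        if not ring or point != ring[-1]:
            ring.append(point)
    # Remove an explicit closing vertex so only the corners remain.
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring.pop()
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    # Shoelace formula: twice the signed area, positive when the
    # vertices run counter-clockwise.
    area2 = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        area2 += x1 * y2 - x2 * y1
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    if area2 < 0:
        ring.reverse()  # normalise clockwise input to counter-clockwise
    # Close the ring explicitly by repeating the first vertex.
    return ring + [ring[0]]


# A clockwise unit square, as a design tool might hand it over.
print(form_polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
```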

At first our relationship with our partner company was cordial, but as we got closer to the delivery date, and the C application was crashing mysteriously, our code was being blamed. We had to go as far as putting fences around each function in the run-time stack to prove that our code was not trampling over memory. The crashes disappeared as mysteriously as they had arrived; we never found out what the true cause of the problem was.

The team delivered on time and pretty much on budget. Easy to do with the small number of staff we had. What I did learn from this was that expending significant effort in planning for an uncertain future is wasted effort. WE never did buy an enterprise GIS system, so the care taken over topological correctness in our database came to naught.

US Pest OSP-FM

The US Pest OSP-FM project was the first in which I took a purely project management role. US Pest were one of the Baby Bells spun out of AT&T. The project was underway in the USA, in a place we'll call Campbell, Nostate. The original PM had been bulleted, and my boss prevailed on me to go to the US for three months to see the thing through. I agreed, but had no real clue what I was letting myself in for. The project was being run using a methodology developed by the company which had taken over our GIS unit, so I had to pick up the terminology real fast when I arrived in Campbell.

The project size was the largest I'd ever worked on, around 20 people. However, we were swamped by the US Pest staff. That was my main task, keeping US Pest off the project team's back. The project was behind schedule, caused principally by spending far too long in the functional definition phase. We had wonderful functional and detailed specifications, but not very much code. There were three months left to the scheduled delivery date - and we had to hit this milestone, otherwise penalties would be incurred.

In order to meet the deadline, everyone on the team had to work additional hours (no surprise there, then). The defect rate was high, especially in the most complex areas of the application (again, no surprise). The crunch came when the customer asked us to delay shipment of the release due to the "high number of defects". This demand caused much consultation of the contract, and unluckily for the customer, there was no provision for this - they had to go through the acceptance cycle as defined in the contract. We certainly weren't going to agree to a free-of-charge change request. As it happened, the defect rate began to drop, and the application was just about robust enough to deliver - well, it was definitely a beta at least. Shortly after this, we had recruited a local PM, so I returned to the UK. I never thought I'd savour the smell of diesel at Heathrow airport...

During the delivery of the main project, I also picked up a spin-off. US Pest asked us to create a planning system, based on the same database schema and symbology as the main project. Does this sound familiar? Since there was no bandwidth left in the Campbell office, we had to bring in a couple of developers from the UK in order to build a proof-of-concept. Once this was successfully completed, and our proposal for full development accepted, the team returned to the UK to build the application proper. It was back to a small team, just three, plus myself as a part-time project manager. In fact the team reduced to two, as one of the developers was stolen for another project. Even so, we delivered on time and under budget. Once again, a small team, combined with an offshoot project, had delivered successfully.

Worldcon OSP-FM

This was, without doubt, the worst IT project I had ever been involved with. Our company had been purchased by Worldcon, which had the happy side-effect of producing a profit from my hitherto valueless share options. Little did I know how much those share options were to cost...

Since Worldcon were a telecom company, with significant cable assets, they required an outside plant management application. As this was the game we were in, it seemed a good idea for our new parent to use our software. However, Worldcon had other ideas. Their preferred platform was a PC drawing package, on which all their plant/cable diagrams were drawn and maintained. Our system was UNIX-based, intimately connected to an RDBMS, and offered Windows-based client applications for query and update. In addition, we provided a number of ActiveX-type controls to allow the embedding of our functionality into customised Windows front ends.

They did, however, understand the benefits of maintaining an asset database, as opposed to a simple graphic record of where their plant was located. Hence we proposed a solution which gave them the best of both worlds; a geographic plant database held in Oracle, complete with all attribution, but with a design and maintenance tool based on their favourite CAD package.

In order for such a scheme to work, we had to build controls into the PC package which invoked the appropriate commands in our server, and also enable the drawing of graphic components in the PC package drawing window. This implementation required a level of integration between a PC package and our product which had not been contemplated before. A bastard child from hell was born, fathered by the Product Marketing manager. (I'm still not sure who the mother was, and I don't think I want to know).

The project was run out of our Canadian office. In the early stages, I was only aware of the project by dint of engagements which required technical architecture skills, and by the progress updates given at the regular services meetings. It became clear that things were not going well; due to the technical difficulty and aggressive time-line of the project, very soon the developers were putting in long days and weekends, which left no slack and no room for that final push at the end of the development cycle. Things worsened; the Programme Manager and the Development Manager were unable, or unwilling, to pull the project back on track. Given the level of commitment offered by the company at the beginning of the project, they had my sympathy. I'd also heard, from my boss, that the senior management responsible for this project in our parent company were vicious bastards of the first water. No excuses were to be accepted. It was therefore inevitable that the senior project staff would have to take responsibility for the failure to deliver, a new team be put in charge, and the time-line be revised to something that was vaguely possible to meet. This played out with the sacking of the Programme Manager and Development Manager, putting me in as the new Programme Manager and the most senior developer as the Development Manager.

I spent a week in our Canadian head-office, working through a revised plan with the development team. Armed with this, I visited the customer project team, who were based in an unnamed southern US state. Let's call the city Melrose. The Worldcon team seemed to understand the trauma we had been through, and accepted the revised project plan. They were also happy with my commitment to meet them in person every fortnight (my family were less enamoured with the idea).

Progress inched forward, until we were getting close to a delivery, which would involve acceptance testing. At this point I discovered that (a) we, the supplier, were writing the acceptance tests (I knew this, but had conveniently ignored the fact), and (b) only a subset of the functionality we were supposed to deliver was in fact tested for. Since we were using these tests to ensure coverage of our beta code, they had been giving us a false sense of success. Many areas we had not even coded yet. I flagged this to my management, with my recommendation that we should re-plan yet again. This suggestion was deemed unacceptable, and I was instructed to inform the customer team of the situation, but say that since the acceptance tests constituted the contractual acceptance of the solution, we would not be developing the 'missing' functionality. Naturally, the customer team did not accept this position. At the project level, we were at an impasse. We therefore downed tools and awaited the senior management explosion.
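The trap described above, where passing acceptance tests say nothing about functionality that no test touches, can be caught with a very simple cross-check of the specification against the test suite. The Python sketch below is purely illustrative: the function name, feature names and test identifiers are all hypothetical, and nothing like it existed on the project.

```python
def coverage_gap(specified, tested_by):
    """Return the specified features that no acceptance test exercises."""
    covered = set()
    for features in tested_by.values():
        covered.update(features)
    return sorted(set(specified) - covered)


# Hypothetical features from a functional specification.
specified = ["place cable", "split duct", "trace circuit", "print work order"]

# Hypothetical acceptance tests and the features each one exercises.
tested_by = {
    "AT-01": ["place cable"],
    "AT-02": ["place cable", "trace circuit"],
}

# Every test can pass while these features remain unbuilt.
print(coverage_gap(specified, tested_by))
```

A check like this only works if someone other than the supplier owns either the feature list or the tests; when one party writes both, the gap is easy to overlook.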

Papers were written justifying our position. Conference calls with head office and many emails followed. Eventually, my boss, and my boss's boss flew to Melrose to thrash out an agreement. I still recall the horror of that phone call after the meeting with the customer finished; my boss informed me that we were on the hook for everything we had originally promised. That meant the project restarted, and we had to build everything as defined in the functional specifications. Another four or five months on the schedule. At that point, we were already about six months behind the original delivery date, so were now looking at a 12 month slippage from the original delivery date.

As the project manager, I naturally had to visit the customer once the project was back on the rails. They didn't appear to hold any grudges, but were naturally concerned about the new project delivery dates. They also accepted that it was their responsibility to write the acceptance tests. At least that moved one unpleasant job from us to them. As supplier, we still had to agree the tests that they wrote.

Meanwhile, staffing was becoming a problem. A couple of key project members were from an external organisation, and wanted to leave the project. In addition, even the permanent members of staff were getting scratchy from having the project extended into the future. I needed ways to minimise the impact of staff leaving, while still keeping the new end date as a realistic target.

After sounding out the project team, everyone seemed supportive of one last push to finish the project. In order to achieve the transfer of knowledge required to allow us to detach ourselves, the best place to complete the project appeared to be the customer site. This would ensure communications remained tight, knowledge transfer would be expedited, and our customer could see progress (or lack of it) directly for themselves.

I proposed this approach, which found favour with the customer. The key project team members relocated from their home offices (UK and Canada) to the fine southern city of Melrose, USA.

Around this time, my boss left the company. It therefore fell to me to deal with the senior management within our customer. One individual was the nastiest piece of work I had ever come across, and I didn't have the interpersonal skills to deal with him properly. I think I dug my final hole when I told him that our staff didn't want to work on the project anymore. Funnily enough, he took exception to that.

At this point I left the company myself, in an extremely stupid move, but that's another story.

I heard later (as I still kept in contact with people from my old company) that the project had been delivered, and that they'd even started making money from the change requests. It also turned out that Worldcon now felt that they didn't need all that drawing functionality offered by the PC drawing tool - they would be quite happy with the drawing facilities of the GIS product. That's the way it goes...

Internet Banking Project

My next development project was observed from a different viewpoint - that of one of the designers. I had joined a small consulting company, and my third engagement (see how quickly I picked up the lingo) was with a large bank - let's call them Mafia Bank - on the Mafia Online Banking (MOB) project.

I joined the project shortly before the first release was due to roll into a pilot phase. The project was running nearly a year late, overspent by several million pounds, had been through at least two SI companies and now had the misfortune to have me assigned to it.

MOB was a J2EE-based Internet channel for the Mafia's business customers. Security was obviously a key design goal. Access to the existing Mafia backend systems was required. These backend systems ran on long-in-the-tooth mainframes which underwent change with the rapidity of a slug on largactil. The first release did not offer a great deal of functionality, as the bulk of the development time was spent in getting the basic infrastructure to work.

The project was large, probably over 50 people. The team was in three separate geographic locations: developers in one (where I was based), the business design team in a northern town, and the management in London. Distance between the groups was more than just physical; there seemed to be armed warfare between the business designers and the development team. The business designers were using UML to the max. The original intent of the team, or so the scuttlebutt went, was that once all the RUP deliverables were complete, the coding would be a simple, mechanical process. I kind of believed this after I was part of a conversation with the business design group in which programmers were likened to monkeys...

My role in the project was to start the design process for some of the new functionality in the planned second release. This meant that I spent a large amount of time in the presence of the business designers, attempting to understand the business requirements, the capabilities of the bank's existing systems and scoring the various options for delivery of a particular product.

As I was working on the second release, the problems of the first release were not crucial to me. However, I was in the same office, and it was easy to tell that the first release was not going terribly well. The second SI company had promised to build the product within a very aggressive time-scale. Build time had been insufficient and integration difficult; during system testing the defect rate rose rapidly. The first release did roll out into pilot, but with a very limited number of customers (fewer than 10, I believe).

In the meantime, I was still working on the second release, which included some fairly sophisticated functionality. Just when things were getting interesting, the bank decided to cancel the second release. I had been on the project for over a year by then, and so was not unhappy at moving to something else. However, the consultancy market was drying up, contractor rates were dropping rapidly and my company felt it would be difficult to place me elsewhere. A deal on the day rate was struck, and the day after my leaving celebration, I was told that I was to be retained on the project, but working on a new version of MOB for a different part of the bank - the Retail division (ROB).

ROB was to be built in a very different way to MOB. A small development team was selected, and the design team was three, which included an empowered business representative. The design team was located in the development offices. Existing MOB elements would be reused, and development of ROB was to be time-boxed to ensure something was delivered within a reasonable time-frame. The downside to me was that the work I was required to perform was more project management than design. I was therefore delighted when MOB restarted and I was able to return to a design function. In the meantime, ROB carried on quite happily without me, and delivered their first release successfully. Due to my early involvement, I was able to take part in the celebrations without too much guilt.

Of course, all bad things must come to an end, and nearly two years after I joined the project, MOB was finally axed by the bank. Virtually all the contractors and consultants were let go, with a small team retained to provide maintenance cover.

About two years after I left the project, an article on ROB appeared in the press. The revisionist nature of the content was astonishing. In a reversal of the facts, MOB was stated to be a new development, based on ROB. The entire history of the project had been re-written. I had always been cynical about project writeups, but this proved beyond doubt that there are lies, damned lies, and glowing self-publicity pieces in the computer press.

Conclusion

This article didn't quite follow the path I thought it would when I started it (but it has taken me nearly 8 months to write).

It seems that virtually all new projects, of a sufficiently large scale, are doomed to fail (and by fail I mean exceed the allocated budget, exceed the allocated time, or be functionally incomplete). However, if it's possible to re-use the experience and some of the (hard) developments from the first project, a second, or offshoot, project can be an overwhelming success. It seems that the failure is a necessary part of the process; that the reduction in scope cannot be contemplated until the first project has proved beyond doubt that vaulting ambition, which o'erleaps itself, is still rife.

I've talked to a number of people who were involved in the scope definition of that all important first project. In all cases, they said "if I knew then what I know now, I would have significantly reduced the scope". Of course, with hindsight everything is clear - making a stand at the initial project scoping for something much smaller than that required is not an easy task.

So, small teams, a lightweight methodology (it doesn't really matter what you choose, as long as the whole team buys into it), and realistic goals: that's the recipe for success. Sounds simple, doesn't it?

The challenge we face is that many projects are not do-able by a small team. The decision is: do you impose inefficient methodologies to ensure cookie-cutter programmers can be used, or do you restate the problem so that it is possible to break the problem down into areas that can be tackled by a small programming team? I know which I'd choose, but I've never had the chance to give it a try - probably just as well.

It remains uncertain how IT projects will fare in the near to medium term. Significant projects continue to be outsourced to countries where IT salaries are significantly cheaper than in the West, and delivery (so far) is no worse. Will there be scope for present-day UK employees to experience the pit of despair (but also the money) that my generation found so uplifting?

My opinion, for what it's worth, is that small-scale projects will continue to operate in the UK (and other countries in the West). The economies of scale necessary to make outsourcing financially attractive simply are not there for anything less than a major project. However, as we know, the wheel turns; what may happen in the long term is anyone's guess.