28 July 2011

Reinventing and Glue

Engineers are prone to something called NIH (Not Invented Here) because building things is what engineers like to do: figure out how to do something and implement it.  They'd like to "reinvent the world".  That's hyperbole, of course, but it's a tendency that their managers try to get them to resist: if somebody has already done it, the current team doesn't need to do any new work, and chances are good that the previous, ostensibly successful implementers did a better job than the current team would.  By and large, this is good advice.  But not always.

A lot of the time, the old implementation doesn't do quite what the new one needs.  To repurpose the old work, we have to invent a new interface so that it fits into the new work.  In software, we call that "glue code", or just "glue".  It's rare that glue is simple.  Even if the old work was well designed and modularized, the glue is at least a few lines of code.  It's usually quite a lot more than that, and surprisingly often it turns out to be more than the code that's being reused.  The code that's being reused doesn't have to be tested, says management.  Perhaps, but the glue has to be tested just as hard as any new code, and since the fit isn't perfect, the old code has to be tested again too.

Suppose the old component is 2000 lines long.  If the glue is anything up to, say, 300 lines or so and the function really is similar, it's probably worth trying to make the glue work.  But if the glue is more than a thousand lines, you're really getting into diminishing returns.  It's very likely that reinventing the thing from scratch will work better for the new application, and even if the rewrite is 2000 lines long, you've eliminated the need for glue, saving a thousand lines and a bunch of testing.
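To make "glue" concrete, here is a minimal sketch in Python.  Everything in it is hypothetical and invented for illustration: legacy_lookup stands in for the old, already-tested component, and find_price is the glue that adapts it to what the new code wants.

```python
# Hypothetical old component: searches a list of (key, value) pairs
# and signals "not found" with the sentinel value -1.
def legacy_lookup(records, key):
    for k, v in records:
        if k == key:
            return v
    return -1


# Glue: the new code keeps its data in a dict and expects None
# when nothing is found, so both conventions must be translated.
def find_price(catalog, name):
    records = list(catalog.items())      # dict -> list of pairs
    result = legacy_lookup(records, name)
    # Imperfect fit: a genuine stored value of -1 is indistinguishable
    # from "not found", so it would be reported as missing here.
    return None if result == -1 else result


if __name__ == "__main__":
    catalog = {"apple": 3, "pear": 5}
    print(find_price(catalog, "apple"))
    print(find_price(catalog, "plum"))
```

Even in this toy case the glue is about as long as the code it wraps, and the sentinel mismatch is exactly the kind of imperfect fit that forces both the glue and the old code to be tested together.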

There are other cases too: sometimes the old implementation wasn't all that well done.  Fred Brooks, in The Mythical Man-Month, talks about the second-system effect.  When doing the first implementation, it was all the team could do to get it to work at all, and they were likely changing algorithms and interfaces all along.  It's quite likely the first implementation is kludgey and not too good.  The second time, they know better, but now they have a whole bunch of new ideas and they get a whole lot of bloat.  On the third try, they're getting closer to the Goldilocks point: neither too big nor too little, refined, shaken down, with appropriate algorithms and well-thought-out interfaces.  If the code you're trying to reuse is a first or second implementation, it's very likely you're just propagating a bad thing.

Finally, there are a lot of things that are more appropriately reimplemented.  For example, simple searches or insertions that occur at user time.  These are such simple algorithms that nearly everyone who has written code has implemented them several times, and won't get them wrong.  A new implementation will fit the new codebase perfectly and avoid any mingled source-tree complications.  Even if it uses totally naive algorithms, it's often better to be simple than optimal, and it's likely that there's no actual algorithmic advantage to be gained (e.g., a bubble sort is so much simpler that it can actually be faster than N log N sorts if there are fewer than a dozen or so things to be sorted).
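As an illustration of how little code such a reimplementation takes, here is a minimal bubble sort sketch in Python.  The function name and the early-exit check are my own choices, not taken from any particular codebase, and the sketch makes no claim about where the speed crossover actually falls.

```python
def bubble_sort(items):
    """Return a new list with the items in ascending order.

    Naive O(n^2) bubble sort; intended only for very short lists.
    """
    result = list(items)              # copy so the caller's list is untouched
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if result[i] > result[i + 1]:
                # Adjacent pair out of order: swap it.
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:               # no swaps means already sorted
            break
    return result


if __name__ == "__main__":
    print(bubble_sort([5, 2, 9, 1, 7]))
```

There is nothing to glue here: the function takes and returns exactly what the surrounding code uses, and it is short enough to verify by reading it.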
