Why? DRY!

I like to talk. A lot. And about just about everything. As my career develops, I find myself coaching more and that means repeating speeches about the subjects that matter the most to me. Since one of my favorite software engineering practices is Don’t Repeat Yourself (or DRY), I thought I should put it to work in my life as well as my code. I started this blog to have somewhere to point people and avoid repeating myself.

The basic principle of DRY in software is that a given piece of information should be represented only once in the system. The most obvious application of this rule is that “copy and paste” should be avoided in favor of extracting methods and classes. Such duplication is rampant in legacy codebases and in those developed on short schedules. The reason to avoid it is that duplication makes change difficult and risky. For example, suppose a buggy piece of code is replicated in three places in a codebase and QA finds the bug in one of them. There are really three bugs in the application, but only one is known. The developer fixing it will either fix just that one, leaving the other bugs in the system, or notice the others and have to fix them as well. Furthermore, duplication makes refactoring harder: renaming a variable or method means touching every copy.
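To make the “extract a method” idea concrete, here is a minimal sketch in Python. The username rule and every name in it (`MIN_USERNAME_LENGTH`, `is_valid_username`, `register_user`, `rename_user`) are hypothetical, chosen only to illustrate the before-and-after shape of removing copy-and-paste.

```python
# A minimal sketch of "extract method" to remove copy-and-paste duplication.
# All names here are hypothetical, for illustration only.

# Before: the same validation rule is pasted into two call sites. A bug in
# the rule (or a change to it) has to be found and fixed in every copy.
def register_user_duplicated(username):
    if len(username) < 3 or not username.isalnum():
        raise ValueError("invalid username")
    return {"username": username}

def rename_user_duplicated(user, new_name):
    if len(new_name) < 3 or not new_name.isalnum():
        raise ValueError("invalid username")
    return {**user, "username": new_name}

# After: the rule is represented exactly once and every caller reuses it.
MIN_USERNAME_LENGTH = 3

def is_valid_username(username):
    """Return True if username is long enough and purely alphanumeric."""
    return len(username) >= MIN_USERNAME_LENGTH and username.isalnum()

def _require_valid(username):
    # Single place that turns the rule into an error for callers.
    if not is_valid_username(username):
        raise ValueError("invalid username")

def register_user(username):
    _require_valid(username)
    return {"username": username}

def rename_user(user, new_name):
    _require_valid(new_name)
    return {**user, "username": new_name}
```

If the minimum length or the allowed characters ever change, only `is_valid_username` needs to be touched; in the duplicated version, every pasted copy would have to be found first.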

Another dimension of DRY crosses the boundaries between code, documentation and tests. I believe that one of the purposes of a comprehensive automated test suite is to serve as documentation of behavior and developer intent. Often, these same ideas are also captured in a design document or some other non-compiled written piece. The issue here is similar to the copy-and-paste problem: one version gets updated and the others are forgotten. Most often, the code is updated and the documents are not. When tests are used in place of documentation, they must be kept in sync with the code or they fail. A problem that arises with this technique is that customers and other non-technical folks don’t like to read tests. Two solutions I’ve used are behavior-driven tools like JBehave and scripts that parse tests to generate documentation.
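JBehave is a Java tool and is not shown here. Instead, this is a minimal Python sketch of the second approach: tests whose names state the rules in plain words, plus a small script that turns those names into a readable list. The `shipping_cost` rule, the `ShippingCostSpec` class and the `describe` helper are all hypothetical.

```python
import unittest

# The behavior under test (hypothetical, for illustration only).
def shipping_cost(order_total):
    """Orders of 50 or more ship free; smaller orders pay a flat 5."""
    return 0 if order_total >= 50 else 5


class ShippingCostSpec(unittest.TestCase):
    # Each test name states one rule in plain words, so the suite doubles
    # as documentation that fails loudly if it drifts from the code.
    def test_orders_of_fifty_or_more_ship_free(self):
        self.assertEqual(shipping_cost(50), 0)

    def test_orders_under_fifty_pay_a_flat_fee_of_five(self):
        self.assertEqual(shipping_cost(49), 5)


def describe(test_case_class):
    """Turn test method names into sentences for non-technical readers."""
    # getTestCaseNames returns the test_* method names in sorted order.
    names = unittest.TestLoader().getTestCaseNames(test_case_class)
    return [name[len("test_"):].replace("_", " ").capitalize() for name in names]


if __name__ == "__main__":
    for sentence in describe(ShippingCostSpec):
        print("- " + sentence)
```

Because the generated text comes from the tests themselves, the “documentation” cannot say something the passing suite does not check.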

DRY is not just about textual replication. The really nefarious DRY violations are bigger, harder to fix and very expensive: design duplications and data duplications. Design duplications show up when similar (or even identical) concepts are implemented differently within an application. For example, in a data-driven application, a view that shows a specific table or logical piece of data should be implemented only once. It can be parameterized if different instances need slightly different behavior, but the base behavior should not be replicated. Replicating it means that any change (for example, adding a column) requires many touches to the software, increasing development time, increasing the risk of bugs and boring developers. Data duplications are similar. With some exceptions for performance, a piece of data should be represented exactly once in the database. Aggregations or calculations should be performed by the application as needed. Storing them instead again opens the door to bugs and requires the application to know when to refresh the stored calculation. This often leads to bugs or to violations of other design principles (e.g. the Single Responsibility Principle).
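Both ideas can be sketched briefly in Python, under some simplifying assumptions: a plain-text `render_table` function stands in for a real UI view, and the `LineItem`/`Order` classes stand in for database-backed records. All of these names are hypothetical. The view is written once and parameterized by its columns; the order total is derived from the line items on demand instead of being stored as a second copy.

```python
from dataclasses import dataclass, field
from typing import List

# Design: one table view, parameterized by the columns to show, instead of
# a separate hand-written view per screen. Adding a column is one change.
def render_table(rows, columns):
    """Render rows (dicts) as plain-text lines containing only the given columns."""
    header = " | ".join(columns)
    body = [" | ".join(str(row[c]) for c in columns) for row in rows]
    return [header] + body


# Data: the order total is derived from the line items when needed rather
# than stored alongside them, so there is no second copy to keep in sync.
@dataclass
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class Order:
    items: List[LineItem] = field(default_factory=list)

    @property
    def total_cents(self):
        # Computed on every access; nothing to refresh when items change.
        return sum(i.quantity * i.unit_price_cents for i in self.items)
```

For example, `render_table(rows, ["sku", "qty"])` and `render_table(rows, ["sku", "qty", "price"])` reuse the same base behavior for two screens, and appending a `LineItem` to `order.items` is immediately reflected in `order.total_cents`. Where a stored aggregate really is needed for performance, as the text allows, it becomes a cache that the application must deliberately keep fresh.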

To sum up, and at the risk of being a hypocrite, Don’t Repeat Yourself! Keep duplication out of your code, out of your communication, out of your data and out of your design. You’ll save effort, have fewer bugs, and have more time to do interesting work!

Published: January 12 2013
