DRY vs KISS

Published Jun 9, 2020

Keep it simple and stupid

Two Coding Principles: DRY and KISS

There are two important coding principles that are very useful for improving code quality: DRY (Don’t Repeat Yourself) and KISS (Keep It Simple, Stupid).

Here is my simple and stupid summary of the two principles:

  • DRY says if you have similar pieces of code in many places, extract and reuse them.
  • KISS says to keep your code as plain as possible. Other people and your future self will thank you when they need to maintain it.

These are great coding guidelines. However, what I find is that they are not always compatible with each other, and I often need to pick one and give up the other.

The Trade-Off

If you look closely into these two principles, you’ll quickly realize they are indeed in conflict: If you need to extract common parts of your code into a separate function or module, you are making it more complicated, because when you need to understand the code, you have to go through other functions.

For example, consider the following two functions:

def inc_and_sum(list_of_numbers):
    list_plus_one = [n + 1 for n in list_of_numbers]
    return sum(list_plus_one)

def inc_and_square_sum(list_of_numbers):
    list_plus_one = [n + 1 for n in list_of_numbers]
    return sum([n * n for n in list_plus_one])

Extract list_plus_one into a sub-function and reuse:

def inc_and_sum(list_of_numbers):
    list_plus_one = _inc_each_element_by_one(list_of_numbers)
    return sum(list_plus_one)

def inc_and_square_sum(list_of_numbers):
    list_plus_one = _inc_each_element_by_one(list_of_numbers)
    return sum([n * n for n in list_plus_one])

def _inc_each_element_by_one(list_of_numbers):
    return [n + 1 for n in list_of_numbers]

Before extracting _inc_each_element_by_one, we can easily understand what inc_and_sum does without leaving the function. After the extraction, we’ll have to go to the _inc_each_element_by_one sub-function to understand what it does. The naming helps in this contrived example, but in many situations, you do need to at least go through the comments of the sub-function to understand what it does.

So DRY is anti-KISS to a certain degree. Then how do we decide when to follow it instead of KISS?

To answer this question, we have to understand why DRY is useful. On the surface, it just says don’t repeat yourself, but it doesn’t say why repeating yourself is bad. Copy-and-paste is not that hard after all…

Here are two important reasons:

  1. The repeated code might need some changes in the future, and you’ll need to change it in all places if it happens. This is difficult and prone to errors when you have a big codebase.
  2. You want to test the repeated code, and you don’t want to write testing code multiple times.

If these two reasons apply, the benefits of DRY could outweigh its costs. However, the line could be blurry in the real world, which makes it non-trivial to make a good decision. See below for two examples.

Real-World Example One

Here is a front-end example. Consider the following react component:

<Row>
    <Col span={8}>
        <Form.Item label="Hours">
            <Input
                defaultValue={1}
                onBlur={updateHours}
            />
        </Form.Item>
    </Col>
    <Col span={8}>
        <Form.Item label="Price">
            <Input
                defaultValue={10}
                onBlur={updatePrice}
            />
        </Form.Item>
    </Col>
    <Col span={8}>
        <Form.Item label="Notes">
            <Input
                defaultValue=""
                onBlur={updateNotes}
            />
        </Form.Item>
    </Col>
</Row>

The three Col components in this example are very similar to each other, and it’s really tempting to summon DRY:

interface Props {
    label: string;
    defaultValue: string | number;
    onBlur: (event: React.FocusEvent<HTMLInputElement>) => void;
}

function InputColumn(props: Props) {
    return (
        <Col span={8}>
            <Form.Item label={props.label}>
                <Input
                    defaultValue={props.defaultValue}
                    onBlur={props.onBlur}
                />
            </Form.Item>
        </Col>
    );
}

The main component then simplifies to:

<Row>
    <InputColumn
        label="Hours"
        defaultValue={1}
        onBlur={updateHours}
    />
    <InputColumn
        label="Price"
        defaultValue={10}
        onBlur={updatePrice}
    />
    <InputColumn
        label="Notes"
        defaultValue=""
        onBlur={updateNotes}
    />
</Row>

Much cleaner, right? Yes, it is, but there are troubles lurking in the dark and they could bite you later. Let’s check the two reasons why DRY is beneficial:

  1. Do we need the common code to be in sync when changes happen in the future? In this case, the answer is often no. Let’s say we need to add a min={1} property and a max={24} property to the Input component for Hours. It is a mess to update InputColumn: We have to add two optional properties min and max for InputColumn, which are only used for the Hours column. If we need to add more properties to the Price and Notes columns, the InputColumn component could explode very quickly with many optional properties.
  2. Do we need to test InputColumn? Probably not. It’s a combo of UI components with no business logic.

Tempting as it is to deduplicate the code here, we should just keep it as is. KISS outweighs DRY by a great margin here.

In reality, I deduplicated the code and had to revert it later when more properties were needed for each of the three columns.

Real-World Example Two

Here is another example. We train and run over 2,000 prediction models in production every hour, and the code for model training and running shares a similar initialization structure:

def train():
    logging.basicConfig(level=logging.INFO)
    network_lib.config_requests(os.environ['KEY_FOR_SOME_SECRET_TOKEN'])

    parser = argparse.ArgumentParser()
    parser.add_argument('location_id', type=int)
    args = parser.parse_args()

    model = _trained_prediction_model(args.location_id)
    _save_model(model)

def run():
    logging.basicConfig(level=logging.INFO)
    network_lib.config_requests(os.environ['KEY_FOR_SOME_SECRET_TOKEN'])

    parser = argparse.ArgumentParser()
    parser.add_argument('location_id', type=int)
    args = parser.parse_args()

    model = _get_prediction_model_or_none(args.location_id)
    if model is None:
        prediction_logger.warn("Model is not ready for location {}.".format(args.location_id))
    else:
        model.run()

Note that the first five lines of the two functions are exactly the same. DRY! The alarm in your mind goes off immediately.

However, before jumping to extract the common code, let’s think again. In the original code, the setup steps are simple and clear (configure logging, initialize networking, parse arguments). On the other hand, if we put the setup code into a sub-function, people have to leave the main function to understand what happens in the setup phase.

Now try to put it to our litmus test:

  1. Is the common code going to change in sync? Maybe, maybe not. For example, I might need to add an argument of a target time for each prediction run, but it’s not needed for model training. If the argument-parsing code were extracted into a sub-function, we’ll have to put that code back to each main function again.
  2. Do we need to test the common code? No. It’s just setup code, no testing needed.

This is an example that’s less obvious than the front-end one, but when in doubt, I’d side with KISS to keep the code simple and stupid.

It’s a Journey

It is common for a brand new engineer to copy-paste code around then make minor changes to make it work. I certainly did. Then until some day we learn that this is bad for code maintenance, and start to make friends with DRY.

However, it is also common for engineers at this stage to overuse DRY. The reason why we need DRY is often lost in coding.

It is important to always check whether we need to keep duplicate code in sync, and whether we need to test them separately. If not, or when in doubt, I’d leave it be and keep it simple, stupid. You can always refactor in the future when you change the duplicate code and find that they do need to stay in sync. The judgement of whether to apply DRY comes with both experience and foresight.

To this day, I still have a strong urge to go with DRY when I see duplicate code. I have to admit, there is a certain joy to it. But I am trying to give it more thoughts now, and start to get smiles when I need to make changes to duplicate but simple and stupid code.

  • coding
  • dry-principle
  • kiss-principle