When To Refactor Your Code?

How to make refactoring part of your Definition of Done

Photo by rtisanyb on Unsplash

Writing code is an iterative process. The first iteration is usually not the best result. Grooming and polishing ✨ are needed before the code is ready to share with the world (and your future self).

There is a saying in software development that illustrates the importance of polishing code:

Make it work
Make it pretty
Make it fast

The first step is clear: your code should solve a particular problem.
But what if your program’s input is huge and requires clever code optimizations to reach the required performance?

You might be tempted to skip step two and jump straight to step three. If you do, you’ll find that improving the performance of unpolished, difficult-to-read spaghetti🍝 code is much harder than improving the performance of polished, neatly organized, and readable code.

But what is polished and readable code? How much polishing is enough? And how do you prevent spending too much time on polishing, leaving too little time for solving real problems?

Keep reading to find an answer to these questions!

How functions grow

All software that is useful grows. In fact, the more useful a piece of software is, the faster it grows. There is not much developers can do about this growth but they can influence the way in which a codebase grows.

To see how software grows, and in which way it grows, let’s look at how a single Python function grows.

The function we’re going to look at measures the volume of a Python codebase. For example, say you want to know the size of a very popular Python repository on GitHub: FastAPI. The function below (count_lines) takes a path as an argument and counts all .py files and the total number of lines of these files:

import os

def count_lines(path):
    file_count = 0
    line_count = 0
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.lower().endswith('.py'):
                file_count += 1
                with open(os.path.join(root, file), 'r') as codefile:
                    line_count += len(codefile.readlines())
    print(f'Found {line_count} lines in {file_count} files.')

The function above is only 10 lines of code and is not too difficult to understand. If you run it over the FastAPI repository it will print:

Found 94291 lines in 1167 files.

Wow, that’s a lot of code in a lot of files.
Of course, not every line in a Python file is a line of code: some lines are blank or contain only a comment.

Let’s improve our function to only count lines with code:

def count_lines(path):
    file_count = 0
    line_count = 0
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.lower().endswith('.py'):
                file_count += 1
                with open(os.path.join(root, file), 'r') as codefile:
                    lines = codefile.readlines()
                file_line_count = 0
                for line in lines:
                    stripped_line = line.strip()
                    if stripped_line == '' or stripped_line[0] == '#':
                        continue
                    file_line_count += 1
                line_count += file_line_count
    print(f'Found {line_count} lines of code in {file_count} files.')

The count_lines function is now 17 lines of code, and the output is:

Found 81673 lines of code in 1167 files.

OK, fewer lines but still a pretty large codebase.
Looking at the FastAPI repository we can see that it contains directories with Python files for documentation examples (docs_src), and a large amount of test code appears to be generated.
Let’s add a parameter to our function that can be used to exclude these directories:

def count_lines(path, exclude_dirs):
    file_count = 0
    line_count = 0
    for root, dirs, files in os.walk(path):
        rel_path = os.path.relpath(root, path)
        if rel_path.startswith(tuple(exclude_dirs)):
            continue
        for file in files:
            if file.lower().endswith('.py'):
                file_count += 1
                with open(os.path.join(root, file), 'r') as codefile:
                    lines = codefile.readlines()
                file_line_count = 0
                for line in lines:
                    stripped_line = line.strip()
                    if stripped_line == '' or stripped_line[0] == '#':
                        continue
                    file_line_count += 1
                line_count += file_line_count
    print(f'Found {line_count} lines of code in {file_count} files.')

Our function has now doubled in size (from 10 to 20 lines of code), but the output is a lot better:

Found 8321 lines of code in 49 files.
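One caveat with the prefix check above: startswith compares raw strings, so excluding tests would also skip a sibling directory such as tests_old, and os.walk still visits every excluded subdirectory before the continue discards it. A minimal sketch of an alternative, assuming the excluded directories are given as paths relative to the root, prunes dirs in place so that os.walk never descends into them. The helper name walk_python_files is made up for this illustration:

```python
import os


def walk_python_files(path, exclude_dirs):
    """Yield paths of .py files under path, skipping excluded directories."""
    # Normalize so that 'tests/' and 'tests' are treated the same.
    excluded = {os.path.normpath(d) for d in exclude_dirs}
    for root, dirs, files in os.walk(path):
        rel_root = os.path.relpath(root, path)
        # Assigning to dirs[:] changes the list in place, which tells
        # os.walk (in its default top-down mode) not to descend into
        # the directories that were removed.
        dirs[:] = [
            d for d in dirs
            if os.path.normpath(os.path.join(rel_root, d)) not in excluded
        ]
        for file in files:
            if file.lower().endswith('.py'):
                yield os.path.join(root, file)
```

Because the comparison is on whole normalized paths rather than string prefixes, excluding tests leaves tests_old untouched.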

Besides the total size of a codebase, it’s always interesting to know what the larger files in a codebase are. Let’s extend the reporting part of our function:

def count_lines(path, exclude_dirs):
    measurements = []
    for root, dirs, files in os.walk(path):
        rel_path = os.path.relpath(root, path)
        if rel_path.startswith(tuple(exclude_dirs)):
            continue
        print(rel_path)
        for file in files:
            if file.lower().endswith('.py'):
                with open(os.path.join(root, file), 'r') as codefile:
                    lines = codefile.readlines()
                line_count = 0
                for line in lines:
                    stripped_line = line.strip()
                    if stripped_line == '' or stripped_line[0] == '#':
                        continue
                    line_count += 1
                measurements.append((line_count, os.path.join(rel_path, file)))
    total_lines = sum([m[0] for m in measurements])
    print(f'Found {total_lines} lines of code in {len(measurements)} files.')
    measurements.sort(reverse=True)
    for m in measurements[:10]:
        print(str(m[0]).rjust(4) + ' ' + m[1])

This version of the count_lines function has 23 lines of code and outputs the following:

Found 8321 lines of code in 49 files.
1283 fastapi/routing.py
 875 fastapi/applications.py
 726 fastapi/dependencies/utils.py
 708 fastapi/params.py
 584 .github/actions/people/app/main.py
 530 fastapi/param_functions.py
 514 fastapi/_compat.py
 480 fastapi/openapi/utils.py
 426 fastapi/openapi/models.py
 332 .github/actions/notify-translations/app/main.py

Apparently fastapi/routing.py is the largest code file in the FastAPI repository, and probably one of the most important.

Legacy code and maintainability

The example above illustrates how a piece of code grows (from 10, to 17, to 20, to 23 lines of code) through small improvements and new functionality.

Larger functions, and more code in general, become a bigger problem as time passes. Code is read much more often than it is written. Every time an improvement or feature request comes in, a developer (you?) must find their way through the codebase to the place where a change is needed, understand how the existing code works, make the change, and test it afterward. This is the maintainability problem. In extreme cases, your codebase can become a “legacy” codebase: extremely costly and extremely frustrating to change.

FastAPI is a very popular (over 61k stars on GitHub) and very actively developed codebase. It is not a legacy codebase. However, it also contains a fair share of functions that grew out of hand, like the 177-line get_openapi_path function.

The joy of refactoring

Refactoring is fun. When our code works, it’s time to make it pretty.
Pure refactoring techniques — that only change the code structure, not its behavior — go a long way and modern IDEs have great built-in support for them.

Let’s polish ✨ our 23-line function by applying a straightforward extract-method refactoring, breaking it down into three shorter functions:

measure: 12 lines of code
count_lines_of_code: 8 lines of code
generate_report: 8 lines of code

Notice how the refactoring not only improves the naming of our functions but also makes it easier to unit-test the functions in isolation.

The refactored code is listed below:

def measure(path, exclude_dirs):
    measurements = []
    for root, dirs, files in os.walk(path):
        rel_path = os.path.relpath(root, path)
        if rel_path.startswith(tuple(exclude_dirs)):
            continue
        for file in files:
            if file.lower().endswith('.py'):
                with open(os.path.join(root, file), 'r') as codefile: 
                    loc = count_lines_of_code(codefile.readlines())
                measurements.append((loc, os.path.join(rel_path, file)))
    print(generate_report(measurements))

def count_lines_of_code(lines):
    result = 0
    for line in lines:
        stripped_line = line.strip()
        if stripped_line == '' or stripped_line[0] == '#':
            continue
        result += 1
    return result

def generate_report(measurements):
    result = ''
    total_lines = sum(m[0] for m in measurements)
    # Build the summary into the returned string instead of printing it here,
    # so the function has no side effects and is easy to unit-test.
    result += f'Found {total_lines} lines of code in {len(measurements)} files.\n'
    measurements.sort(reverse=True)
    for m in measurements[:10]:
        result += str(m[0]).rjust(4) + ' ' + m[1] + '\n'
    return result
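Because count_lines_of_code takes a plain list of strings, it can be tested without touching the file system. Below is a minimal sketch using the standard unittest module. The test class and the sample lines are made up for this illustration, and the function is repeated so the example runs on its own:

```python
import unittest


def count_lines_of_code(lines):
    result = 0
    for line in lines:
        stripped_line = line.strip()
        if stripped_line == '' or stripped_line[0] == '#':
            continue
        result += 1
    return result


class CountLinesOfCodeTest(unittest.TestCase):
    def test_skips_blank_and_comment_lines(self):
        lines = [
            '# a comment\n',
            '\n',
            'x = 1\n',
            '    # an indented comment\n',
            'print(x)\n',
        ]
        self.assertEqual(count_lines_of_code(lines), 2)

    def test_counts_line_with_trailing_comment_as_code(self):
        self.assertEqual(count_lines_of_code(['x = 1  # set x\n']), 1)

    def test_empty_input(self):
        self.assertEqual(count_lines_of_code([]), 0)
```

Saved in a test module, these tests can be run with python -m unittest. Writing the same tests against the original 23-line function would have required a directory of real files.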

Your Definition of Done: When to Refactor?

A Definition of Done provides developers with a clear set of guidelines that define when a piece of code they’ve been working on is “Done” (i.e., ready for inclusion in the mainline of the product). Especially in development teams, a Definition of Done sets the bar for new code entering the codebase.

Examples of items on the Definition of Done list are:

Code is peer-reviewed
Code-style checks pass
Unit-tests pass
Unit-test coverage is higher than 80%

Adding “Code is refactored” to this list helps prevent the codebase from turning into a legacy codebase. But how much refactoring is enough?
In the example above, making a small function slightly bigger does not mean it should be immediately refactored. Premature refactoring (making it pretty) can stand in the way of getting things done (making it work). But at some point, the size of the function becomes a risk to the correctness and future maintenance of the code. This risk can actually be estimated:

Functions with 1–15 lines of code: no risk (easy code)
Functions with 16–30 lines of code: low risk (verbose code)
Functions with 31–60 lines of code: medium risk (hard-to-maintain code)
Functions with more than 60 lines of code: high risk (unmaintainable code)
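The thresholds above are simple enough to express as a lookup. A small sketch follows; the function name risk_category is made up for this illustration and is not part of any tool:

```python
def risk_category(lines_of_code):
    """Map a function's length to the risk categories listed above."""
    if lines_of_code < 1:
        raise ValueError('a function has at least one line of code')
    if lines_of_code <= 15:
        return 'no risk (easy code)'
    if lines_of_code <= 30:
        return 'low risk (verbose code)'
    if lines_of_code <= 60:
        return 'medium risk (hard-to-maintain code)'
    return 'high risk (unmaintainable code)'


# The sizes of the count_lines versions in this article, plus the
# 177-line FastAPI function mentioned earlier.
for size in (10, 17, 20, 23, 177):
    print(size, risk_category(size))
```

On this scale the 10-line first version of count_lines is easy code, the 17-, 20-, and 23-line versions are verbose, and the 177-line get_openapi_path function falls in the high-risk bucket.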

Based on the list above, you might be tempted to set the limit for functions at 15 lines of code and demand that all new code be refactored until it satisfies this guideline. Unfortunately, as with so many other engineering guidelines, perfect can be the enemy of good. Just as 100% unit-test coverage is impractical, a hard limit on function size can give developers “writer’s block” and eventually lead them to reject the guideline.

So the question remains: when to refactor?

Software that tells you when to refactor: CodeLimit

CodeLimit is a tool for developers with one goal: it tells the developer when it’s time to refactor.
You can compare the concept of a Code Limit with a Speed Limit, or really any kind of limit, where the risk increases in proportion to the measurement. These limits keep things safe for yourself and others.

CodeLimit measures the lines of code for each function in your codebase and assigns each function to a category:

Functions with 1–15 lines of code: easy code
Functions with 16–30 lines of code: verbose code
Functions with 31–60 lines of code: hard-to-maintain code
Functions with more than 60 lines of code: unmaintainable code
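To get a feel for what such a measurement involves, here is a rough sketch using Python’s standard ast module. It is not CodeLimit’s implementation, which this article does not describe. The helper name function_lengths is made up, and the sketch counts every line a function spans (including blank lines, comments, and docstrings), so its numbers may differ from the tool’s:

```python
import ast


def function_lengths(source):
    """Return (name, line_span) for every function in the given source."""
    tree = ast.parse(source)
    lengths = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            lengths.append((node.name, node.end_lineno - node.lineno + 1))
    return lengths


sample = '''
def short():
    return 1

def longer(x):
    if x:
        return x
    return 0
'''
print(function_lengths(sample))
```

Each measured length can then be assigned to one of the four categories above.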

A color coding from green to red is used for each category.
Below is a screenshot of CodeLimit with the measurements for FastAPI:

CodeLimit showing measurements for FastAPI

As you can see above FastAPI has 4 functions in the unmaintainable category.

CodeLimit also tells you when refactoring is not needed:

CodeLimit showing measurements for CodeLimit

How to run CodeLimit

CodeLimit is easy to run in your development environment and requires no configuration if you have a typical Python project setup. Currently, CodeLimit only supports Python, but wider language support will follow soon.

The best way to run CodeLimit is as a pre-commit hook, so it alerts you during development when it’s time to refactor:

-   repo: https://github.com/getcodelimit/codelimit
    rev: v0.3.0
    hooks:
    - id: codelimit

CodeLimit is intended to be used alongside formatters, linters, and other hooks that improve the consistency and quality of your code, such as Black, Ruff, and MyPy. As an example, see the .pre-commit-config.yaml from CodeLimit itself.

If you want to explore how function length looks for your codebase, CodeLimit also provides a standalone TUI (Text-based User Interface) that works in any terminal on any platform. To install the standalone version of CodeLimit for your default Python installation run:

python -m pip install codelimit

Answering the question: When to Refactor?

To quote Martin Fowler: “Any fool can write code that a computer can understand. Good programmers write code that humans can understand.”
Refactoring is a technique that improves the maintainability of your code, not only for others but also for your future self.

Knowing when to refactor and when refactoring is not needed is difficult. Time pressure can cause code to be shipped too early, slowly turning the codebase into a legacy codebase.
Focusing too much on polishing, on the other hand, can lead to frustrating code reviews.

CodeLimit encourages and challenges developers to apply just the right amount of refactoring. It will nudge you when it’s time to refactor and can be part of the team’s Definition of Done. Unlike many existing code quality tools, which measure a plethora of things, CodeLimit does one thing: tell you when it’s time to refactor.

Get started today with CodeLimit: https://github.com/getcodelimit/codelimit