Darek Łuczyński
Author: Based on the book by Robert C. Martin (Uncle Bob)
Title: Clean Code - A handbook of Agile Software Craftmanship
Specifying requirements in such detail that a machine can execute them is programming. Such a specification is code.
Ref.: Clean Code, R.C. Martin
(computing, derogatory) Software code that has had many editors with conflicting styles, making it nearly impossible to maintain. That software has dirty code and we should not use it.
There are a few occasions where its okay to write dirty code:
[…]programming is a craft more than it is a science. To write clean code, you must first write dirty code and then clean it. […] Most freshman programmers don’t follow this advice. They believe that the primary goal is to get the program working. Once it’s “working,” they move on to the next task, leaving the “working” program in whatever state they finally got it to “work.” Most seasoned programmers know that this is professional suicide.
― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship
“Code is read much more often than it is written. Code should always be written in a way that promotes readability.”
― Guido van Rossum, founder of Python, Python Enhancement Proposal (PEP 8)
# see ZEN of Python in python console
import this
“Indeed, the ratio of time spent reading versus writing is well over 10 to 1. We are constantly reading old code as part of the effort to write new code. …[Therefore,] making it easy to read makes it easier to write.”
― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship
Ref.: https://dmitripavlutin.com/coding-like-shakespeare-practical-function-naming-conventions/
The Cost of Owning a Mess is that:
any change is not trivial.
We’ve all said we’d go back and clean it up later. Those days we didn’t know LeBlanc’s law:
Later equals never.
Ref.: https://www.quora.com/What-resources-could-I-read-about-Leblancs-law
- elegant,
- efficient,
- complete error handling,
- does one thing well,
- simple and direct,
- well-written prose,
- written by someone who cares,
- reduced duplications (copy & paste),
- high expressiveness (including meaningful names),
- early building of simple abstractions, abstract method or class,
- test driven development (unit tests related),
It’s been measured by WTF quantity per minute:
The Boy Scouts of America have a simple rule that we can apply to our profession:
“Leave the campground cleaner than you found it.“
The cleanup doesn’t have to be something big. Change one variable name for the better, break up one function that’s a little too large, eliminate one small bit of duplication.
Raw | camelCase | PascalCase | kebab-case | snake_case |
---|---|---|---|---|
fruits in basket | fruitsInBasket | FruitsInBasket | fruits-in-basket | fruits_in_basket |
has error | hasError | HasError | has-error | has_error |
is visible | isVisible | IsVisible | is-visible | is_visible |
Names should not be encoded with type or scope information. Prefixes such as ‘m_’, ‘c’, ‘s’ or ‘i’ are useless in today’s environments.
Majority of languages f.ex. Java, C# including CAPL are strongly typed, and editing environments can detect a type error long before you can run a compile. So nowadays HN and other forms of type encoding are simply handicaps.
Encodings are also a possibility that the encoding system will mislead the reader.
PhoneNumber phoneString;
// name not changed when type changed!
This is probably one of the oldest rules in software development. In general it is a bad idea to have raw numbers in your code. Shall be hidden behind well-named constants.
Some constants are so easy to recognize that they don’t always need a named constant to hide behind so long as they are used in conjunction with very self-explanatory code.
double milesWalked = feetWalked/5280.0; //well known, easy to recognize... maybe for an american
int dailyPay = hourlyRate * 8;
double circumference = radius * Math.PI * 2;
The term “Magic Number” does not apply only to numbers. It applies to any token that has a value that is not self-describing.
assertEquals(7777, Employee.find(“John Doe”).employeeNumber());
There are two magic numbers in this assertion. The first is obviously “7777”. The second magic number is “John Doe”.
It also turns out that “John Doe” represents hourly paid, test employee in that test database. So this test should really read:
assertEquals(HOURLY_EMPLOYEE_ID, Employee.find(HOURLY_EMPLOYEE_NAME).employeeNumber());
Classes, structures and objects should have noun or noun phrase names like: Customer, WikiPage, Account, and AddressParser. A class name should not be a verb.
Methods should have verb or verb phrase names like: postPayment, deletePage, or save. Accessors, mutators, predicates should be prefixed with ‘get, set, is‘.
Choosing good names for one argument function can go together with the arguments. Function and argument should form a very nice verb/noun pair.
//is very evocative. Whatever this “name” thing is, it is being “written.”
write(name)
//even better name might be below, which tells us that the “name” thing is a “field.”
writeField(name)
Names in software are 90% of what make software readable. You need to take the time to choose them wisely and keep them relevant.
public int x() {
int q = 0;
int z = 0;
for (int kk = 0; kk < 10; kk++) {
if (l[z] == 10)
{
q += 10 + (1[z + 1] + 1[z + 2]);
z += 1;
}
else if (l[z] + l[z + 1] == 10)
{
q += 10 + 1 [z + 2];
z += 2;
} else {
q += l[z] + l[z + 1];
z += 2;
}
}
return q
}
The name of a variable, function, or class, should tell you why it exists, what it does, and how it is used. So take care with your names.
public int score() {
int score = 0;
int frame = 0;
for (int frameNumber = 0; frameNumber < 10; frameNumber++) {
if (isStrike(frame)) {
score += 10 + nextTwoBallsForStrikeiframe);
frame += 1;
} else if (isSpareiframe)) {
score += 10 + nextBallForSpareiframe);
frame += 2;
} else {
score += twoBallsInFrame(frame);
frame += 2;
}
}
return score;
}
Names are easier to understand if they are based on existing convention or usage. In Java, for example, functions that convert objects to string representations are often named:
toString();
Teams will often invent their own standard system of names for a particular project. Your code should use the terms from this language extensively.
More you can use names that are overloaded with special meanings that are relevant to your project, the easier it will be for readers to know what your code is talking about.
Choose names that make the workings of a function or variable unambiguous. Names may seem long, however it’s explanatory value outweighs the length cost.
Avoid Disinformation
// The word list means something specific to programmers.
// account list is actually a dictionary, but by the name indicates on the list
var accountList = new Dictionary<string, int>();
// account list is truely a list
var accountList = new List<int>();
// It is not ewasy to distinguish both variables
var yzControllerForEfficientHandlingOfStrings = new YZControllerForEfficientHandlingOfStrings();
var xyzControllerForEfficientStorageOfStrings = new XYZControllerForEfficientStorageOfStrings();
Example of disinformative names would be the use of lower-case l or uppercase O as variable names, especially in combination. The problem is that they look almost entirely like the constants one and zero.
int a = l;
if ( O == l )
a = O1;
else
l = 01;
// bad naming
int XPstn = 1;
// better naming
int xPosition = 1;
// good loop counter
customers[i];
customers[j];
customers[k];
// bad loop counter
// Seems like number '1'
customers[l];
Choose clarity over entertainment value.
// bad naming
void hollyHandGranade();
// good naming
void deleteAllItems();
If names must be different, then they should also mean something different.
Make Meaningful Distinctions
The length of a name should be related to the length of the scope. You can use very short variable names for tiny scopes, but for big scopes you should use longer names.
def make_md_from_from_sfa_tokens(tokens) -> str:
md = '' # short scope
for token in tokens:
local_md = make_md_description_from_sfa_token(token)
md += local_md + '\r\n'
return md
Single-letter names and numeric constants have a particular problem in that they are not easy to locate across a body of text.
Single responsibility rule
- FUNCTIONS SHOULD DO ONE THING
- THEY SHOULD DO IT WELL
- THEY SHOULD DO IT ONLY
// Bad example, fuction pay() does 3 operations
public void pay() {
for (Employee e : employees) { // review data of each emploee
if (e.isPayday()) { // checks if emploee is allowd to pay
Money pay = e.calculatePay(); // then finally calculates the paymen and pays
e.deliverPay(pay);
}
}
}
Functions that do one thing cannot be reasonably divided into sections.
// Better example
public void pay() {
for (Employee e : employees){
paylfNecessary(e);
}
}
private void payIfNecessary(Employee e) {
if (e.isPaydayi)){
calculateAndDeliverPay(e);
}
}
private void calculateAndDeliverPay(Employee e) {
Money pay = e .calculatePayi);
e.deliverPay(pay);
}
Functions should not be 100 lines long. Functions should hardly ever be 20 lines long.
Would you expect this to add five days to the date? Or is it weeks, or hours? You can’t tell from the call what the function does.
// bad example
// Does it add 5 days, weeks or hours to the date?
Date newDate = date.add(5);
If the function adds five days to the date and changes the date:
//better solution, if it incraeses by 5 days the better solution coold be
date.increaseByDays(5);
This is one of the most important rules in this book, it’s been called DRY principle (Don’t Repeat Yourself).
OO itself is a strategy for organizing modules and eliminating duplication.
Find and eliminate duplication wherever you can.
More subtle duplications are the modules that have similar algorithms, but that don’t share similar lines of code.
No argument is best, followed by one, two, and three. More than three is very questionable and should be avoided. They take a lot of conceptual power. Arguments are even harder from a testing point of view, because demands to write many test cases to ensure that all the various combinations of arguments work properly. If there are no arguments, this is trivial.
When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own.
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
Reducing the number of arguments by creating objects out of them may seem like cheating, but it’s not. When groups of variables are passed together, they are likely part of a concept.
Variables and function should be defined close to where they are used. Local variables should be declared just above their first usage and should have a small vertical scope.
Dependent functions should be defined just below their first usage. If one function calls another, they should be vertically close, and the caller should be above the callee. This gives the program a naturalflow.
The code should be read like a top-down, every function to be followed by those at the next level of abstraction so that we can read the program, descending one level of abstraction at a time as we read down the list of functions. I call this The Stepdown Rule.
Dead code is code that isn’t executed. The problem with dead code is that after awhile it starts to smell. This is because dead code is not completely updated when designs change.
It still compiles, but it does not follow newer conventions or rules. It was written at a time when the system was different. When you find dead code, delete it from the system. Do not afraid, your source code control system still remembers it.
Negatives are harder to understand than positives. So, when possible, conditionals should be expressed as positives.
// Is worse, negative conditional
if (!buffer.shouldNotCompact())
// Is better, positive conditional
if (buffer.shouldCompact())
Constant such as a default or configuration value that is known and expected at a high level of abstraction, do not bury it in a low-level function. Expose it as an argument to that low-level function called from the high-level function.
A comment is redundant if it describes something that adequately describes itself. Comments should say things that the code cannot say for itself.
i++;// increment i`
Sometimes people add a comment to the start of a module every time they edit it. These comments accumulate as a kind of journal, or log, of every change that has ever been made. Nowadays, we have source code control systems, these long journals make harder to read the code.
/*
* Changes (from 11-Oct-2001)
* --------------------------
* 11-Oct-2001 : Re-organised the class and moved it to new package
* com.jrefinery.date (DG);
* 05-Nov-2001 : Added a getDescription() method, and eliminated NotableDate
* class (DG);
* 12-Nov-2001 : IBD requires setDescription() method, now that NotableDate
* class is gone (DG); Changed getPreviousDayOfWeek(),
* getFollowingDayOfWeek() and getNearestDayOfWeek() to correct
* bugs (DG);
* 05-Dec-2001 : Fixed bug in SpreadsheetDate class (DG);
* 29-May-2002 : Moved the month constants into a separate interface
* (MonthConstants) (DG);
* 27-Aug-2002 : Fixed bug in addMonths() method, thanks to Nlevka Petr (DG);
* 03-Oct-2002 : Fixed errors reported by Checkstyle (DG);
* 13-Mar-2003 : Implemented Serializable (DG);
* 29-May-2003 : Fixed bug in addMonths method (DG);
* 04-Sep-2003 : Implemented Comparable. Updated the isInRange javadocs (DG);
* 05-Jan-2005 : Fixed bug in addYears() method (1096282) (DG);
*/
Sometimes you see comments that are nothing but noise. They restate the obvious and provide no new information.
/**
* Default constructor.
*/
protected AnnualDateRule() {
}
/** The day of the month. */
private int dayOfMonth;
/**
* Returns the day of the month.
*
* @return the day of the month.
*/
//Give me a break!
public int getDayOfMonth() {
return dayOfMonth;
}
These comments are so noisy that we learn to ignore them, our eyes tend to skip them over.
Consider the following stretch of code:
// does the module from the global list /mod/ depend on the
// subsystem we are part of
if (smodule
.getDependSubsystems()
.contains(subSysMod.getSubSystem())
){}
This could be rephrased without the comment as:
ArrayList moduleDependees = smodule.getDependSubsystems();
String ourSubSystem = subSysMod.getSubSystem();
if (moduleDependees
.contains(ourSubSystem)
){}
Sometimes cleanerCode != smarterCode
Sometimes programmers like to mark a particular position in a source file.
// Actions //////////////////////////////////
Right after eliminating long tail of slashes, sometimes it makes sense to use such marker to gather certain functions together. But in general they are clutter that should be eliminated. Use them occasionally, and only when the benefit is significant. If too many, our eyes learn to ignore them.
Although this might make sense for long functions with deeply nested structures, it serves only to clutter the kind of small and encapsulated functions that we prefer. So if you find yourself wanting to mark your closing braces, try to shorten your functions instead.
while ((line = in.readLine()) != null) {
lineCount++;
charCount += line.length();
String words[] = line.split("\\W");
wordCount += words.length;
} //while
/* Added by Rick */
Source code control systems are very good at remembering who added what, when. There is no need to pollute the code with little bylines.
The comment below was extracted from a module designed to test that a function could encode and decode base64 coding.
It is good enough to have RFC number, someone reading this code has no need for the arcane information contained in the comment.
/*
RFC 2045 - Multipurpose Internet Mail Extensions (MIME)
Part One: Format of Internet Message Bodies
section 6.8. Base64 Content-Transfer-Encoding
The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8-bit byte, and the eighth bit will be the low-order bit in
the first 8-bit byte, and so on.
*/
It is just plain silly to have a rule that says that every function must have a doc, or every variable must have a comment. Comments like this just clutter up the code, propagate lies.
/**
*
* @param title The title of the CD
* @param author The author of the CD
* @param tracks The number of tracks on the CD
* @param durationInMinutes The duration of the CD in minutes
*/
public void addCD(String title, String author, int tracks, int durationInMinutes) {
CD cd = new CD();
cd.title = title;
cd.author = author;
cd.tracks = tracks;
cd.duration = duration;
cdList.add(cd);
}
The connection between a comment and the code it describes should be obvious. If you are going to the trouble to write a comment, then at least you’d like the reader to be able to look at the comment and the code and understand what the comment is talking about.
/*
* start with an array that is big enough to hold all the pixels
* (plus filter bytes), and an extra 200 bytes for header info
*/
this.pngBytes = new byte[((this.width + 1) * this.height * 3) + 200];
What is filter, what is its the relation to ‘+ 1‘ or ‘* 3‘ or both of them? Is the pixel a ‘byte‘? Why header info takes ‘200 bytes‘?
Don’t do this! Others who see that commented-out code won’t have the courage to delete it. They’ll think it is there for a reason and is too important to delete. That code sits there and rots, getting less and less relevant with every passing day. It calls functions that no longer exist. It uses variables whose names have changed.
InputStreamResponse response = new InputStreamResponse();
response.setBody(formatter.getResultStream(), formatter.getByteCount());
// InputStream resultsStream = formatter.getResultStream();
// StreamReader reader = new StreamReader(resultsStream);
// response.setContent(reader.read(formatter.getByteCount()));
Source code control systems will remember the code for us. We don’t have to comment it out any more. Just delete the code.
public class FitNesseServer implements SocketServer { private FitNesseContext
context; public FitNesseServerCFitNesseContext context) { this.context =
context; } public void serveCSocket s) { serveCs, 10000); } public void
serveCSocket s, long requestTimeout) { try { FitNesseExpediter sender = new
FitNesseExpediterCs, context); sender.setRequestParsingTimeLimitCrequestTimeout); sender.startC); }
catchCException e) { e.printStackTraceC); } } }
A source file is a hierarchy rather like an outline. To make this hierarchy of scopes visible, we indent the lines of source code in proportion to their position in the hiearchy. This allows them to quickly hop over scopes. Without indentation, programs would be virtually unreadable by humans. However it is good enough css/js code for the minification!
Comparing 7 different projects, files that are typically ~40 lines long, with an upper limit of ~500. Small files are usually easier to understand than large files are. Desirable is building an app having from 200 to 500 lines per file, however, this should not be a hard rule.
Figure: File length distributions LOG scale (box height = sigma)
Nearly all code is read left to right and top to bottom. Each line represents an expression or a clause, and each group of lines represents a complete thought. Those thoughts should be separated from each other with blank lines.
def is_str_instance(v):
return isinstance(v, str)
def hex_str_to_int(input: str) -> int:
return int(input, 16)
def int_to_hex(input: int) -> hex:
return hex(int)
Below figure suggests that we should strive to keep lines short. The old Hollerith limit was 80, currently lines edging out to 100 or even 120 are fine. Beyond that is probably just careless. But monitors are too wide for that nowadays, and programmers can shrink the font so small that they can get 200 characters across the screen. Don’t do that.
Figure. Java line width distribution
We use horizontal white space to associate things that are strongly related and disassociate things that are more weakly related.
private void measureLine(String line) {
lineCount++;
int lineSize = line.length();
totalChars += lineSize;
lineWidthHistogram.addLine(lineSize, lineCount);
recordWidestLine(lineSize);
}
Horizontal alignment was common for assembly language programmers. That this kind of alignment is not useful. The alignment seems to emphasize the wrong things and leads my eye away from the true intent. You are tempted to read down the list of variable names without looking at their types.
public class FitNesseExpediter implements ResponseSender
{
private Socket socket;
private InputStream input;
private OutputStream output;
private Request request;
private Response response;
private FitNesseContext context;
protected long requestParsingTimeLimit;
private long requestProgress;
private long requestParsingDeadline;
private boolean hasError;
public FitNesseExpediter(Socket s,
FitNesseContext context) throws Exception
{
this.context = context;
socket = s;
input = s.getInputStream();
output = s.getOutputStream();
requestParsingTimeLimit = 10000;
}
}
Automatic reformatting tools usually eliminate this kind of alignment.