Author: Based on the book by Robert C. Martin (Uncle Bob)
Title: Clean Code - A handbook of Agile Software Craftmanship
Definition of programming and code
Specifying requirements in such detail that a machine can execute them is
programming. Such a specification is code.
Ref.: Clean Code, R.C. Martin
Dirty code characteristics
dirty code:
(computing, derogatory) Software code that has had many editors with conflicting styles,
making it nearly impossible to maintain. That software has dirty code and we should not use
it.
[…]programming is a craft more than it is a science. To
write clean code, you must
first write dirty code and then clean it. […] Most freshman programmers
don’t follow this advice.
They believe that the primary goal is to get the program working. Once it’s
“working,”they move on to the next task, leaving the “working” program in whatever
state they finally got it to “work.”
Most seasoned programmers know that this is professional suicide.
― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship
Why keep the code clean
“Code is read much more often than it is written. Code should
always be written in a way that promotes readability.”
― Guido van Rossum, founder of Python, Python Enhancement Proposal (PEP 8)
# see ZEN of Python in python console
import this
“Indeed, the ratio of time spent reading versus writing is well over 10 to
1. We are
constantly reading old code as part of the effort to write new code. …[Therefore,] making it
easy to read makes it easier to write.”
― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship
- high expressiveness (including
meaningful names),
- early building of simple abstractions,
abstract method or class,
- test driven development (unit tests
related),
Code quality measurement
It’s been measured by WTF quantity per minute:
The Boy Scout Rule
The Boy Scouts of America have a simple rule that we can apply to our
profession:
“Leave the campground cleaner than you found it.“
The cleanup doesn’t have to be something big. Change one variable name for the better, break up
one function that’s a little too large, eliminate one small bit of duplication.
Names
Naming Conventions
Raw
camelCase
PascalCase
kebab-case
snake_case
fruits in basket
fruitsInBasket
FruitsInBasket
fruits-in-basket
fruits_in_basket
has error
hasError
HasError
has-error
has_error
is visible
isVisible
IsVisible
is-visible
is_visible
N6: Avoid Encodings
Names should not be encoded with type or scope information. Prefixes such as ‘m_’, ‘c’, ‘s’ or ‘i’
are useless in today’s environments.
Hungarian Notation (HN)
Majority of languages f.ex. Java, C# including CAPL are strongly typed, and editing environments
can detect a type error long before you can run a compile. So nowadays HN and other forms of
type encoding are simply handicaps.
Encodings are also a possibility that the encoding system will mislead the
reader.
PhoneNumber phoneString;
// name not changed when type changed!
G25: Replace Magic Numbers with Named
Constants
This is probably one of the oldest rules in software development. In general it is a bad
idea to have raw numbers in your code. Shall be hidden behind well-named constants.
Some constants are so easy to recognize that they don’t always need a named constant
to hide behind so long as they are used in conjunction with very self-explanatory code.
double milesWalked = feetWalked/5280.0; //well known, easy to recognize... maybe for an american
int dailyPay = hourlyRate * 8;
double circumference = radius * Math.PI * 2;
G25: Replace Magic Numbers with Named
Constants
The term “Magic Number” does not apply only to numbers. It applies to any token
that has a value that is not self-describing.
Classes, structures and objects should have noun or noun phrase
names like: Customer, WikiPage,
Account, and AddressParser. A class name should not be a verb.
Method Names
Methods should have verb or verb phrase names like:
postPayment, deletePage, or save.
Accessors, mutators, predicates should be prefixed with ‘get, set,
is‘.
Choosing good names for one argument function can go together with the
arguments. Function and argument should form a very nice verb/noun pair.
//is very evocative. Whatever this “name” thing is, it is being “written.”
write(name)
//even better name might be below, which tells us that the “name” thing is a “field.”
writeField(name)
N1: Choose Descriptive Names
Names in software are 90% of what make software readable. You need to take the time to choose
them wisely and keep them relevant.
public int x() {
int q = 0;
int z = 0;
for (int kk = 0; kk < 10; kk++) {
if (l[z] == 10)
{
q += 10 + (1[z + 1] + 1[z + 2]);
z += 1;
}
else if (l[z] + l[z + 1] == 10)
{
q += 10 + 1 [z + 2];
z += 2;
} else {
q += l[z] + l[z + 1];
z += 2;
}
}
return q
}
N1: Choose Descriptive Names
Use Intention-Revealing Names
The name of a variable, function, or class, should tell you why it exists, what it does,
and how it is used. So take care with your names.
public int score() {
int score = 0;
int frame = 0;
for (int frameNumber = 0; frameNumber < 10; frameNumber++) {
if (isStrike(frame)) {
score += 10 + nextTwoBallsForStrikeiframe);
frame += 1;
} else if (isSpareiframe)) {
score += 10 + nextBallForSpareiframe);
frame += 2;
} else {
score += twoBallsInFrame(frame);
frame += 2;
}
}
return score;
}
N3: Use Standard Nomenclature Where Possible
Names are easier to understand if they are based on existing convention or usage.
In Java, for example, functions that convert objects to string representations are often named:
toString();
Teams will often invent their own standard system of names for a particular project.
Your code should use the terms from this language extensively.
More you can use names that are overloaded with special meanings that are relevant
to your project, the easier it will be for readers to know what your code is talking about.
N4: Unambiguous Names
Choose names that make the workings of a function or variable unambiguous.
Names may seem long, however it’s explanatory value outweighs the length cost.
Avoid Disinformation
Do not refer to specific phrases, unless you know it’s truly represents it.
// The word list means something specific to programmers.
// account list is actually a dictionary, but by the name indicates on the list
var accountList = new Dictionary<string, int>();
// account list is truely a list
var accountList = new List<int>();
Beware of using names which vary in small ways.
// It is not ewasy to distinguish both variables
var yzControllerForEfficientHandlingOfStrings = new YZControllerForEfficientHandlingOfStrings();
var xyzControllerForEfficientStorageOfStrings = new XYZControllerForEfficientStorageOfStrings();
N4: Unambiguous Names
Example of disinformative names would be the use of lower-case l or uppercase
O as variable names,
especially in combination. The problem is that they look almost entirely like the constants one
and zero.
int a = l;
if ( O == l )
a = O1;
else
l = 01;
N4: Unambiguous Names
Use Pronounceable Names. This matters because programming is a social
activity. Hungarian notation and prefixes make names unpronouceable.
// bad naming
int XPstn = 1;
// better naming
int xPosition = 1;
Avoid Mental Mapping. This is a problem with single-letter variable
names.
// good loop counter
customers[i];
customers[j];
customers[k];
// bad loop counter
// Seems like number '1'
customers[l];
N4: Unambiguous Name
Don’t Be Cute
Choose clarity over entertainment value.
// bad naming
void hollyHandGranade();
// good naming
void deleteAllItems();
Pick One Word per Concept
Pick one word for one abstract concept and stick with it. For instance, it’s confusing
to have fetch, retrieve, and get as equivalent methods of different
classes.
N4: Unambiguous Name
If names must be different, then they should also mean something different.
Make Meaningful Distinctions
Number-series naming (a1, a2, .. aN) It is the opposite of
intentional naming. Such names are noninformative.
Noise words like Info, Data, the, a, an, variable
Noise words are redundant. The word variable should never appear in a variable name.
N5: Use Long Names
Use Long Names for Long Scopes
The length of a name should be related to the length of the scope. You can use very short
variable names for tiny scopes, but for big scopes you should use longer names.
def make_md_from_from_sfa_tokens(tokens) -> str:
md = '' # short scope
for token in tokens:
local_md = make_md_description_from_sfa_token(token)
md += local_md + '\r\n'
return md
Use Searchable Names
Single-letter names and numeric constants have a particular problem in that they are not
easy to locate across a body of text.
Code excercises
Code excercises
Functions
G30: Functions Should Do One Thing
Single responsibility rule
FUNCTIONS SHOULD DO ONE THING
THEY SHOULD DO IT WELL
THEY SHOULD DO IT ONLY
Sections within Functions
// Bad example, fuction pay() does 3 operations
public void pay() {
for (Employee e : employees) { // review data of each emploee
if (e.isPayday()) { // checks if emploee is allowd to pay
Money pay = e.calculatePay(); // then finally calculates the paymen and pays
e.deliverPay(pay);
}
}
}
G30: Functions Should Do One Thing
Functions that do one thing cannot be reasonably divided into sections.
// Better example
public void pay() {
for (Employee e : employees){
paylfNecessary(e);
}
}
private void payIfNecessary(Employee e) {
if (e.isPaydayi)){
calculateAndDeliverPay(e);
}
}
private void calculateAndDeliverPay(Employee e) {
Money pay = e .calculatePayi);
e.deliverPay(pay);
}
Small Functions
Functions should not be 100 lines long. Functions should hardly ever be 20
lines long.
G20: Function Names Should Say What They Do
Would you expect this to add five days to the date? Or is it weeks, or hours?
You can’t tell from the call what the function does.
// bad example
// Does it add 5 days, weeks or hours to the date?
Date newDate = date.add(5);
If the function adds five days to the date and changes the date:
//better solution, if it incraeses by 5 days the better solution coold be
date.increaseByDays(5);
G5: Duplication
This is one of the most important rules in this book, it’s been called DRY principle (Don’t
Repeat Yourself).
OO itself is a strategy for organizing modules and eliminating
duplication. Find and eliminate duplication wherever
you can.
More subtle duplications are the modules that have similar algorithms, but that don’t share
similar lines of code.
F1: Too Many Arguments
No argument is best, followed by one, two, and three. More than three is very
questionable and should be avoided. They take a lot of conceptual power. Arguments are
even harder from a testing point of view, because demands to write many test cases
to ensure that all the various combinations of arguments work properly. If there are no
arguments, this is trivial.
When a function seems to need more than two or three arguments, it is likely that some of those
arguments ought to be wrapped into a class of their own.
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
Reducing the number of arguments by creating objects out of them may seem like cheating, but it’s
not. When groups of variables are passed together, they are likely part of a concept.
G10: Vertical Separation
Variables and function should be defined close to where they are used.
Local variables
should be declared just above their first usage and should have a small vertical scope.
Dependent functions should be defined just below their first usage. If
one function calls another,
they should be vertically close, and the caller should be above the callee. This gives
the program a naturalflow.
G10: Vertical Separation
Reading Code from Top to Bottom: The Stepdown Rule
The code should be read like a top-down, every function to be followed by those at the next
level of abstraction so that we can read the program, descending one level of abstraction
at a time as we read down the list of functions. I call this The Stepdown Rule.
F4: Dead Function / G9: Dead Code
Dead code is code that isn’t executed. The problem with dead code is that after
awhile it starts to smell.
This is because dead code is not completely updated when designs change.
It still compiles, but it does not follow newer conventions or rules. It was written at a time
when the system was different.
When you find dead code, delete it from the system. Do not afraid, your source
code control system still remembers it.
G29: Avoid Negative Conditionals
Negatives are harder to understand than positives. So, when possible, conditionals
should be expressed as positives.
// Is worse, negative conditional
if (!buffer.shouldNotCompact())
// Is better, positive conditional
if (buffer.shouldCompact())
G35: Keep Configurable Data at High Levels
Constant such as a default or configuration value that is known
and expected
at a high level of abstraction, do not bury it in a low-level function.
Expose it as an argument to that low-level function called from the high-level function.
Comments
C3: Redundant Comment
A comment is redundant if it describes something that adequately describes itself.
Comments should say things that the code cannot say for itself.
i++;// increment i`
C3: Redundant Comment
Journal Comments
Sometimes people add a comment to the start of a module every time they edit it. These comments
accumulate as a kind of journal, or log, of every change that has ever been made. Nowadays, we
have source code control systems, these long journals make harder to read the code.
/*
* Changes (from 11-Oct-2001)
* --------------------------
* 11-Oct-2001 : Re-organised the class and moved it to new package
* com.jrefinery.date (DG);
* 05-Nov-2001 : Added a getDescription() method, and eliminated NotableDate
* class (DG);
* 12-Nov-2001 : IBD requires setDescription() method, now that NotableDate
* class is gone (DG); Changed getPreviousDayOfWeek(),
* getFollowingDayOfWeek() and getNearestDayOfWeek() to correct
* bugs (DG);
* 05-Dec-2001 : Fixed bug in SpreadsheetDate class (DG);
* 29-May-2002 : Moved the month constants into a separate interface
* (MonthConstants) (DG);
* 27-Aug-2002 : Fixed bug in addMonths() method, thanks to Nlevka Petr (DG);
* 03-Oct-2002 : Fixed errors reported by Checkstyle (DG);
* 13-Mar-2003 : Implemented Serializable (DG);
* 29-May-2003 : Fixed bug in addMonths method (DG);
* 04-Sep-2003 : Implemented Comparable. Updated the isInRange javadocs (DG);
* 05-Jan-2005 : Fixed bug in addYears() method (1096282) (DG);
*/
C3: Redundant Comment
Noise Comments
Sometimes you see comments that are nothing but noise. They restate the obvious and provide no
new information.
/**
* Default constructor.
*/
protected AnnualDateRule() {
}
/** The day of the month. */
private int dayOfMonth;
/**
* Returns the day of the month.
*
* @return the day of the month.
*/
//Give me a break!
public int getDayOfMonth() {
return dayOfMonth;
}
These comments are so noisy that we learn to ignore them, our eyes tend to
skip them over.
C3: Redundant Comment
Don’t Use a Comment When You Can Use a Function or a Variable
Consider the following stretch of code:
// does the module from the global list /mod/ depend on the
// subsystem we are part of
if (smodule
.getDependSubsystems()
.contains(subSysMod.getSubSystem())
){}
Sometimes programmers like to mark a particular position in a source file.
// Actions //////////////////////////////////
Right after eliminating long tail of slashes, sometimes it makes sense to use such marker to
gather certain functions together. But in general they are clutter that should be eliminated.
Use them occasionally, and only when the benefit is significant. If too many, our eyes learn to
ignore them.
C3: Redundant Comment
Closing Brace Comments
Although this might make sense for long functions with deeply nested structures, it serves only
to clutter the kind of small and encapsulated functions that we prefer. So if you find yourself
wanting to mark your closing braces, try to shorten your functions instead.
Source code control systems are very good at remembering who added what, when. There is no need
to pollute the code with little bylines.
C3: Redundant Comment
Too Much Information
The comment below was extracted from a module designed to test that a function could encode and
decode base64 coding.
It is good enough to have RFC number, someone reading this code has no need for the arcane
information contained in the comment.
/*
RFC 2045 - Multipurpose Internet Mail Extensions (MIME)
Part One: Format of Internet Message Bodies
section 6.8. Base64 Content-Transfer-Encoding
The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups.
These 24 bits are then treated as 4 concatenated 6-bit groups, each
of which is translated into a single digit in the base64 alphabet.
When encoding a bit stream via the base64 encoding, the bit stream
must be presumed to be ordered with the most-significant-bit first.
That is, the first bit in the stream will be the high-order bit in
the first 8-bit byte, and the eighth bit will be the low-order bit in
the first 8-bit byte, and so on.
*/
C3: Redundant Comment
Mandated Comments
It is just plain silly to have a rule that says that every function must have a doc, or every
variable must have a comment. Comments like this just clutter up the code, propagate lies.
/**
*
* @param title The title of the CD
* @param author The author of the CD
* @param tracks The number of tracks on the CD
* @param durationInMinutes The duration of the CD in minutes
*/
public void addCD(String title, String author, int tracks, int durationInMinutes) {
CD cd = new CD();
cd.title = title;
cd.author = author;
cd.tracks = tracks;
cd.duration = duration;
cdList.add(cd);
}
C4: Poorly Written Comment
Inobvious Connection
The connection between a comment and the code it describes should be obvious. If you are
going to the trouble to write a comment, then at least you’d like the reader to be able to look
at the comment and the code and understand what the comment is talking about.
/*
* start with an array that is big enough to hold all the pixels
* (plus filter bytes), and an extra 200 bytes for header info
*/
this.pngBytes = new byte[((this.width + 1) * this.height * 3) + 200];
What is filter, what is its the relation to ‘+ 1‘ or ‘* 3‘ or
both of them? Is the pixel a ‘byte‘? Why header info takes ‘200
bytes‘?
C5: Commented-Out Code
Commented-Out Code
Don’t do this! Others who see that commented-out code won’t have the courage
to delete it. They’ll think it is there for a reason and is too important to delete. That code
sits there and rots, getting less and less relevant with every passing day. It calls functions
that no longer exist. It uses variables whose names have changed.
InputStreamResponse response = new InputStreamResponse();
response.setBody(formatter.getResultStream(), formatter.getByteCount());
// InputStream resultsStream = formatter.getResultStream();
// StreamReader reader = new StreamReader(resultsStream);
// response.setContent(reader.read(formatter.getByteCount()));
Source code control systems will remember the code for us. We don’t have to comment it out any
more. Just delete the code.
Code excercises
Foramtting
Blocks and Indenting
public class FitNesseServer implements SocketServer { private FitNesseContext
context; public FitNesseServerCFitNesseContext context) { this.context =
context; } public void serveCSocket s) { serveCs, 10000); } public void
serveCSocket s, long requestTimeout) { try { FitNesseExpediter sender = new
FitNesseExpediterCs, context); sender.setRequestParsingTimeLimitCrequestTimeout); sender.startC); }
catchCException e) { e.printStackTraceC); } } }
Code formatting is about communication
A source file is a hierarchy rather like an outline. To make this hierarchy of scopes visible,
we indent the lines of source code in proportion to their position in the hiearchy.
This allows them to quickly hop over scopes. Without indentation, programs would be virtually
unreadable by humans.
However it is good enough css/js code for the minification!
The indent level of a function should not be greater than one or
two, which makes the functions easier to read and understand.
Vertical Formatting
Comparing 7 different projects, files that are typically ~40 lines long, with an upper limit of
~500.
Small files are usually easier to understand than large files are. Desirable is building an app
having
from 200 to 500 lines per file, however, this should not be a hard rule.
Nearly all code is read left to right and top to bottom. Each line represents an expression or
a clause, and each group of lines represents a complete thought. Those thoughts should be
separated from each other with blank lines.
Below figure suggests that we should strive to keep lines short. The old Hollerith limit was 80,
currently lines edging out to 100 or even 120 are fine. Beyond that is probably just careless.
But monitors are too wide for that nowadays, and programmers can shrink the font so small that
they can get 200 characters across the screen. Don’t do that.
Figure. Java line width distribution
Horizontal Openness and Density
We use horizontal white space to associate things that are strongly related and disassociate
things that are more weakly related.
No spaces between function name and opening parenthesis
Horizontal Alignment
Horizontal alignment was common for assembly language programmers. That this kind of alignment is
not useful.
The alignment seems to emphasize the wrong things and leads my eye away from the true intent.
You are tempted
to read down the list of variable names without looking at their types.