How to write text that is easy to understand

Posted by on 2011 Sep 02, Fri in Technology / programming, Thoughts, Education

Writing requirements for practical assignments is not easy. Anyone can do it, but it takes some skill to make them interesting, challenging and fun to tinker with.

I always try to make my specs special by adding easter eggs, pictures or hidden references to jokes related to the material. I was very glad to find out that a student from an earlier group was interested in writing assignments for the new groups. When I received the assignment, I compared our writing styles and noticed some interesting differences. Here are my findings.

The requirements for good... requirements are:

no ambiguity - there should be one way to interpret the text - the right way;
no redundancy - too much text makes the task look complicated. That's good if you want to scare your students;
clear wording - otherwise the assignment becomes difficult, not because of its complexity, but because of its convoluted presentation.

This story focuses on the last requirement.

Here's the original text; imagine it is an assignment you were given at university.

ADVANCED DESKTOP WIKIPEDIA SEARCHER:

The simplest way to search the Wikipedia is to go to its home page and
type the keywords into the search box. This often takes too much
time/many actions, this is why there are a number of browser plug-ins
or stand-alone command-line programs which allow to search the
Wikipedia in a less cumbersome manner.

Your mission is to develop a stand-alone application which takes the
following input:

the keywords to search for;

a number M (less than the number N of keywords);

a boolean value InOrder;

If M is not given it should default to N. If InOrder is not given, it
should default to False.

The application should produce links to at most the first K pages
matching the search criterion (K may be hard-coded, but had better be
a configurable value). The following conditions apply:

only the pages with paragraphs which contain M out of N keywords
match the search criterion;

if InOrder == True only the pages with paragraphs which contain the
keywords in the order in which they were specified match the search
criterion;

the search should only be done within the body of the page,
excluding the control elements.

The links may be accompanied by the corresponding matching paragraphs;
the number of paragraphs to show (which is greater or equal to zero)
is at the student's discretion.

Hint: Crawling the Wikipedia to find the pages matching the search
criterion is just too brutal; consider filtering the results produced
by the Wikipedia search engine.

Answer these questions:

How many keywords must be found?
How many pages must be retrieved?
What is the meaning of M?

How many times did you have to read the text in order to give correct answers? Your mileage may vary, but my guess is: "too many". I wouldn't be surprised if you pressed Alt+Tab several times while thinking about it.

Technical text typically needs to be read several times, but some texts need fewer passes than others, and we should strive to minimize this number. This saves time, reduces frustration and enhances the fun element.

When you read a text you want to understand (i.e. rote learning is out of the question), you divide it into logical elements and link them to each other. In effect, you build a mind-map that depicts the relationships between all the elements. If some nodes or relationships are missing, comprehension is below 100%.

Well-written text is easy to parse because the mind-map can be built incrementally, by attaching new material to existing nodes. You never need to rearrange the nodes, or to alter relationships by removing them or redirecting them to other nodes.

Example of a mindmap that grows by altering or deleting previously created nodes

A loose node is a piece of information that doesn't attach to its context. Since it was mentioned in the text, you expect it to connect to something; failing to find that connection is frustrating and forces you to read again (or to press Alt+Tab, which is easier and more rewarding).

Sporadic growth is another bad sign: the mind-map grows in several directions at once, so your focus keeps jumping from one node to another.

Example of a mindmap that grows in different directions
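To make the metaphor a little more concrete, here is a toy Python model of it. This is an illustrative assumption, not a formal theory of reading: each sentence either introduces a concept (add), relates two concepts (link), or contradicts an earlier relationship (redirect). The model records how many sentences each node stayed loose before its first link, which nodes are still loose at the end, and how much rework the reader had to do. The class, its methods and the sequence of reading events are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MindMap:
    """Toy model of a reader's mind-map (hypothetical, for illustration)."""
    edges: dict[str, set[str]] = field(default_factory=dict)  # node -> neighbours
    introduced_at: dict[str, int] = field(default_factory=dict)  # node -> step it appeared
    loose_for: dict[str, int] = field(default_factory=dict)  # node -> steps before first link
    rework: int = 0  # relationships the reader had to undo
    step: int = 0  # number of sentences read so far

    def _ensure(self, node: str) -> None:
        if node not in self.edges:
            self.edges[node] = set()
            self.introduced_at[node] = self.step

    def add(self, node: str) -> None:
        """A sentence introduces a concept without relating it to anything."""
        self.step += 1
        self._ensure(node)

    def link(self, a: str, b: str) -> None:
        """A sentence relates two concepts."""
        self.step += 1
        for node in (a, b):
            self._ensure(node)
            if node not in self.loose_for:  # first link: record loose time
                self.loose_for[node] = self.step - self.introduced_at[node]
        self.edges[a].add(b)
        self.edges[b].add(a)

    def redirect(self, node: str, old: str, new: str) -> None:
        """A sentence contradicts an earlier link: undo it, then relink."""
        self.edges[node].discard(old)
        self.edges[old].discard(node)
        self.rework += 1
        self.link(node, new)

    def loose_nodes(self) -> list[str]:
        return sorted(n for n, nbrs in self.edges.items() if not nbrs)


reader = MindMap()
reader.add("M")                     # step 1: "a number M", role unknown
reader.link("keywords", "N")        # step 2
reader.add("InOrder")               # step 3
reader.link("M", "pages")           # step 4: a premature guess about M
reader.redirect("M", "pages", "N")  # step 5: the text says otherwise

print(reader.loose_nodes())   # ['InOrder', 'pages']
print(reader.loose_for["M"])  # 3
print(reader.rework)          # 1
```

In this model, good text keeps loose_for small and rework at zero; the abandoned "pages" node is the kind of debris that a wrong guess leaves behind.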

Think about M. You know that (1) it is a number, (2) it is smaller than N, and (3) it defaults to N. But what does it mean?

This is very similar to programming. Best practice is to declare a variable close to where you actually use it; as soon as you're done with it, you can put it out of your mind and focus on something else.
It is worth pointing out that some languages (Pascal comes to mind) force you to declare all the variables at the beginning of the function. This burdens the programmer, who has to keep each variable in her head for a long time as a "loose node": the connection can only be made once she sees where the variable is used.
Languages such as C or Python let you create variables on the spot, so they don't enter your mind-map until they are needed. From this perspective, C is even better: you can forget about a variable as soon as you've closed the brace of its block, whereas a Python variable stays in scope until the end of the function, so you can never be sure whether you're done with it until you get there.
While we're at it, this is one of the reasons global variables make things complicated: they force you to keep them in mind all the time, which makes the program's state difficult to track.
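A short Python sketch of the scoping behaviour described above; the function names are hypothetical. A name assigned inside a loop remains visible after the loop ends, whereas in Python 3 a comprehension variable is local to the comprehension, and del lets the writer signal explicitly that a variable is no longer needed.

```python
def last_square(items):
    for item in items:
        squared = item * item
    # No block scope: 'item' and 'squared' are still readable after the loop,
    # holding the values from the final iteration. If 'items' is empty,
    # neither name was assigned and this line raises UnboundLocalError.
    return item, squared


def sum_of_squares(items):
    # In Python 3 the variable 'x' lives only inside the generator
    # expression; it cannot be referred to after this line.
    return sum(x * x for x in items)


def describe(items):
    count = len(items)
    summary = f"{count} items"
    # End the lifetime of 'count' explicitly: reading it after this line
    # would raise UnboundLocalError, so a reader knows it can be forgotten.
    del count
    return summary


print(last_square([2, 3]))        # (3, 9)
print(sum_of_squares([1, 2, 3]))  # 14
print(describe(["a", "b"]))       # 2 items
```

Splitting a long function into small helpers has a similar effect: their temporaries disappear when the helper returns, so they never linger in the reader's mind-map.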

So, which parts of the assignment can be improved?

variables are introduced without an explanation of their role (e.g. M and N);
the logic that describes the meaning of a variable is scattered across several sections;

It took me a long time to wrap my head around the text and work out the essence of M. Rewriting it was quite frustrating; it felt just like maintaining someone else's code. You'd rather write it from scratch, if only you knew the requirements...

Here's another interesting angle on the same problem: check out rule #4 in Raymond Chen's article. Those toy-makers are guilty of "sporadic growth": they force you to go back and add nodes to a part of the mind-map that should have been "finalized" a few steps earlier.

Here is what the text looks like after tweaking:

N keywords to search for (at least one);

a number M, not greater than N (if not specified, M is the same as N);

a boolean value InOrder (False by default);

The application should produce links to at most the first K pages matching the search criterion (K may be hard-coded, but had better be a configurable value). The following conditions apply ....
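For readers who prefer code to prose, here is one possible reading of the matching rules as a Python sketch. It bakes in several assumptions that the text leaves open: a paragraph matches if it contains at least min_matches (M) of the keywords; words are compared case-insensitively as whole \w+ tokens, so multi-word keywords are not supported; and in_order (InOrder) is satisfied when at least min_matches keywords occur in the paragraph in the order they were given, computed as a longest common subsequence. Fetching pages and stripping control elements are out of scope: pages is assumed to be a list of already-extracted (title, paragraphs) pairs, for instance the results of Wikipedia's own search, as the hint suggests. The names min_matches, in_order and max_pages (K) are hypothetical stand-ins for the single-letter names.

```python
import re


def tokenize(text):
    """Lower-cased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def longest_in_order(keywords, tokens):
    """Length of the longest common subsequence of keywords and tokens,
    i.e. the largest number of keywords that occur in the given order."""
    prev = [0] * (len(tokens) + 1)
    for keyword in keywords:
        cur = [0] * (len(tokens) + 1)
        for j, token in enumerate(tokens, start=1):
            if token == keyword:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def paragraph_matches(paragraph, keywords, min_matches, in_order=False):
    tokens = tokenize(paragraph)
    wanted = [k.lower() for k in keywords]
    if in_order:
        found = longest_in_order(wanted, tokens)
    else:
        found = len(set(wanted) & set(tokens))  # distinct keywords present
    return found >= min_matches


def filter_pages(pages, keywords, min_matches=None, in_order=False, max_pages=10):
    """Return at most max_pages (title, matching_paragraphs) pairs."""
    if min_matches is None:
        min_matches = len(keywords)  # M defaults to N
    hits = []
    for title, paragraphs in pages:
        if len(hits) >= max_pages:
            break
        matching = [p for p in paragraphs
                    if paragraph_matches(p, keywords, min_matches, in_order)]
        if matching:
            hits.append((title, matching))
    return hits


pages = [
    ("Page A", ["Python is a programming language.",
                "It was created by Guido van Rossum."]),
    ("Page B", ["A language for programming robots, written in Python."]),
]
keywords = ["python", "programming", "language"]

print(filter_pages(pages, keywords))                 # both pages match
print(filter_pages(pages, keywords, in_order=True))  # only Page A matches
```

Page B contains all three keywords, but in reverse order, so the ordering constraint filters it out. A real implementation would still have to decide how to treat repeated keywords and keywords containing punctuation.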

Another important factor is the choice of variable names. I will elaborate on this in a follow-up story; until then, here are raw excerpts from an email with comments about the first version of the assignment:

I would also change the variable names to something more
descriptive. When I read M and N, I don't know what they mean. I learn
about N in the next line (where you introduce M), and I only
understand what M means when you talk about "M matches out of N
keywords". M is declared at the 30th% of the text, M is explained at
the 70th% of the text. This means that for a long time, the reader is
clueless about the role of M. Once they do find out, they better read
the text again, this time knowing what M is supposed to mean - then
interpreting the text and verifying if it makes sense.

When I read some text, I build a mindmap in my.. mind :-) Formulations
such as the one you've used force me to extend the mindmap by creating
a new node, but it has no connections to other entities. Is it a
child? A parent? A sibling? Which other nodes is it connected to? The
longer the node is 'alone', the more frustrating it is. The more nodes
of this kind you have, the more difficult it is to comprehend the
text.

If you used other names, for example matchCount - I could figure it
out almost without reading the text. Only now, after analyzing it, I
think you used M because it stands for 'match'; but if that were the
case, you'd probably use P for "pages" ["the first K pages..."].

Varnames such as N, M and K really remind me of my math days, that's
quite traumatic :-)

The version of the assignment that went into production can be found on Inforail.

It is difficult to write the ideal text that parsers will find irresistibly delicious, but you'll be better off if you optimize for the following:

Avoid sporadic growth: grow in one direction at a time
Avoid loose nodes
Update by adding; avoid deletions and rearranging

Some teachers do this naturally; others talk and write in ways that are hard to follow. My guess is that if we converted lecture transcripts into evolving mind-maps (so that we could visualize their growth), we would see that great lectures follow the suggestions above, while the not-so-great ones sit closer to the other end of the spectrum.

Notes:

For an in-depth analysis of writing proper software requirements, check out Karl Wiegers's "Software Requirements". Most of his suggestions apply to requirements in general, whether they are about planning a camping trip, software, or your future.
Yes, I am aware that this depends on the compiler, and that things may have changed in Pascal since I last touched it.
It turns out that M did not stand for "match", it stood for "M".
Here's how Jef Raskin squeezed one of the key ideas of this article into a footnote in "The Humane Interface".
