Right now I'm fixing some code, originally written by someone long departed from our development group, which decodes encoded words in email message headers. Encoded words make it possible to include non-ASCII text in headers. This sample contains two encoded words:
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
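To make the sample concrete, here is a minimal Python sketch that pulls each encoded word apart and decodes it. Each word has the shape =?charset?encoding?encoded-text?=. The names ENCODED_WORD, split_encoded_word and decode_encoded_word are made up for this illustration and are not taken from the code being fixed; the sketch also assumes well-formed input and simply joins the decoded words.

```python
import base64
import quopri
import re

# One well-formed encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")


def split_encoded_word(word):
    """Return (charset, encoding, encoded_text) for one encoded word."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError(f"not an encoded word: {word!r}")
    charset, encoding, text = m.groups()
    return charset, encoding.upper(), text


def decode_encoded_word(word):
    """Decode one encoded word to a Python string."""
    charset, encoding, text = split_encoded_word(word)
    if encoding == "B":
        raw = base64.b64decode(text)
    else:
        # "Q" is close to quoted-printable; header=True maps "_" to a space.
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    return raw.decode(charset)


sample = ("=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?= "
          "=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=")

# White space between adjacent encoded words is only a separator, so the
# decoded pieces are concatenated without it.
decoded = "".join(decode_encoded_word(w) for w in sample.split())
print(decoded)
```

Run against the sample, this prints "If you can read this you understand the example."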
The three parts of an encoded word are the character set, the encoding (B for base64, or Q for a variant of quoted-printable), and the encoded text, as defined in RFC 2047. The RFC defines various restrictions on how you can use encoded words, for example:
- An 'encoded word' may not be more than 75 characters long.
- ... unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an 'encoded word'.
- ... an 'encoded word' that appears in a header field defined as '*text' MUST be separated from any adjacent 'encoded word' or 'text' by 'linear-white-space'.
- A multi-octet character may not be split across adjacent 'encoded word's.
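The four restrictions listed above can be checked mechanically. The following Python sketch is one hypothetical way to do it: find_violations and the violation labels are invented for this example, the header value is assumed to be an unstructured '*text' field such as Subject, and the multi-octet check is only a heuristic (it flags any word whose bytes do not decode on their own in the declared character set).

```python
import base64
import quopri
import re

# Deliberately loose pattern: it also matches candidates that break the rules
# (white space inside, no separator between neighbours) so they can be reported.
CANDIDATE = re.compile(r"=\?([^?]*)\?([BbQq])\?([^?]*)\?=")


def find_violations(value):
    """Return (kind, word) pairs, one per rule an encoded word breaks."""
    problems = []
    for m in CANDIDATE.finditer(value):
        word = m.group(0)
        charset, encoding, text = m.groups()

        # Rule 1: at most 75 characters, delimiters included.
        if len(word) > 75:
            problems.append(("too-long", word))

        # Rule 2: no unencoded SPACE, HTAB or other white space inside.
        if re.search(r"\s", word):
            problems.append(("inner-space", word))

        # Rule 3: white space (or the edge of the value) on both sides.
        before = value[m.start() - 1] if m.start() > 0 else " "
        after = value[m.end()] if m.end() < len(value) else " "
        if not (before.isspace() and after.isspace()):
            problems.append(("not-separated", word))

        # Rule 4 (heuristic): a word that does not decode by itself may
        # contain part of a multi-octet character split across words.
        try:
            if encoding.upper() == "B":
                raw = base64.b64decode(text)
            else:
                raw = quopri.decodestring(text.encode("ascii"), header=True)
            raw.decode(charset)
        except (ValueError, LookupError):
            problems.append(("undecodable", word))
    return problems
```

A checker like this only reports problems; whether to reject, repair, or tolerate a flagged word is a separate decision.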
And so on. Unfortunately, many mail agents do not use encoded words correctly, even though the RFC is reasonably clear. However, we can't reject invalid encoded words, because end-users would blame our software, not the originating mail agents, if they heard garbled message subjects being voiced via text-to-speech. So although we always use encoded words correctly when sending messages, we accept incorrect usage of encoded words when decoding them. You might think that this is an application of the often-quoted Postel Robustness Principle, as described in RFC 793:
2.10. Robustness Principle
TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.
This statement is based upon a terrible misunderstanding of Postel's robustness principle. I knew Jon Postel. He was quite unhappy with how his robustness principle was abused to cover up non-compliant behavior, and to criticize compliant software.
Jon's principle could perhaps be more accurately stated as "in general, only a subset of a protocol is actually used in real life. So, you should be conservative and only generate that subset. However, you should also be liberal and accept everything that the protocol permits, even if it appears that nobody will ever use it."
So the next time that someone uses the Robustness Principle to argue that your code should accept any old rubbish as input, you can point out what Postel really meant.