HL7 Messages and Base64 Encoding

HL7 standards are widely used across the world to exchange clinical data. Rules for transmitting data are based on the use of readable ASCII characters, so messages can be easily read. Each message comprises a sequence of segments. The first segment of a message always begins with three ASCII characters, the letters ‘MSH’ for Message Header. Other segment types begin with their own identifier, for example ‘PID’ for Patient Identification. Elements within a segment are delimited by special characters: the pipe ‘|’ separates fields, the caret ‘^’ separates components, the ampersand ‘&’ separates subcomponents, and the tilde ‘~’ marks a field repetition.
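As a rough illustration of this structure, the following Python sketch splits a message into segments and then splits a field by the delimiters listed above. The sample message, its field values, and the function names are invented for this example. It assumes the default delimiters and deliberately ignores escape sequences and the special handling that the MSH segment needs in a real parser.

```python
# Hypothetical sample message; each segment ends with a carriage return.
SAMPLE = (
    "MSH|^~\\&|SENDER|FACILITY|RECEIVER|FACILITY|202401011200||ORU^R01|MSG0001|P|2.3\r"
    "PID|1||12345^^^HOSP||DOE^JOHN~DOE^JOHNNY\r"
)


def split_message(message):
    """Return a list of (segment_id, fields) tuples, one per segment."""
    segments = [s for s in message.split("\r") if s]
    # Naive split: a real parser treats the MSH delimiter fields specially.
    return [(s[:3], s.split("|")) for s in segments]


def split_field(field):
    """Split a field into repetitions, then components, then subcomponents."""
    return [
        [component.split("&") for component in repetition.split("^")]
        for repetition in field.split("~")
    ]


parsed = split_message(SAMPLE)
pid_id, pid_fields = parsed[1]
patient_names = split_field(pid_fields[5])  # two repetitions of the name field
```

Here `patient_names` holds one entry per repetition, each entry being a list of components that are themselves lists of subcomponents.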

An HL7 message can also carry embedded content: binary files, such as images, or readable text, such as an RTF-encoded report.

The important thing to remember here is that the HL7 protocol has been developed using readable text.


When a standard such as HL7 uses a text-based (ASCII) protocol, a problem arises whenever the data itself contains a character that the parser recognizes as a control character. For example, if we want to use the ampersand character (&) in a message, as in “chest XR PA&LAT” or “Gilbert & Son”, the ‘&’ character must be converted, because within the HL7 standard it is a control character.

The problem is exacerbated when a whole document is embedded as an RTF file. An RTF file consists of readable ASCII characters and ASCII control characters, such as <CR> for carriage return or <LF> for line feed.

The <CR> found in the RTF file is also used in HL7 to mark the end of a segment. Likewise, the tilde (~), which reports often use to express an approximation, is a control character in HL7.

We can see that serious errors can occur without careful attention. Although the parser can recognize characters, it cannot judge what the sender intended them to mean.
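The effect can be reproduced with a deliberately naive parser. The sketch below is only an illustration: the OBX segment layout, the report text, and the function name are hypothetical, and a real HL7 parser is more elaborate. Still, the same ambiguity applies whenever report text is inserted without encoding.

```python
def naive_parse(message):
    """Split a message into segments on <CR>, then into fields on the pipe."""
    return [segment.split("|") for segment in message.split("\r") if segment]


# Hypothetical report text containing '&', '~' and an embedded carriage return.
report_text = "Chest XR PA&LAT\rNodule ~2 cm"

# The report is inserted as-is into the sixth field of a single segment.
message = "OBX|1|TX|REPORT||" + report_text + "\r"

segments = naive_parse(message)
segment_count = len(segments)                # the sender intended one segment
subcomponents = segments[0][5].split("&")    # '&' is read as a separator
repetitions = segments[1][0].split("~")      # '~' is read as a repetition
```

The embedded carriage return cuts the segment in two, and the second half no longer starts with a valid segment identifier. The '&' and '~' characters are read as structure rather than as part of the report.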


There are two solutions to this problem. The first is to replace every character in the message text that HL7 uses as a control character with its escape sequence. For example, if we want to insert “chest XR PA&LAT”, the sequence will be “chest XR PA\T\LAT”.
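A minimal Python sketch of this first approach follows. It assumes the message uses the default HL7 delimiters and maps each one to its standard escape sequence (\F\, \S\, \T\, \R\, and \E\ for the escape character itself). The function names are chosen for this example only, and line breaks inside the text are not handled.

```python
import re

# Default HL7 delimiters mapped to their escape sequences.
# Assumption: the message declares the default encoding characters (|^~\&).
_ESCAPES = {
    "\\": "\\E\\",  # the escape character itself
    "|": "\\F\\",   # field separator
    "^": "\\S\\",   # component separator
    "&": "\\T\\",   # subcomponent separator
    "~": "\\R\\",   # repetition separator
}
_UNESCAPES = {sequence[1]: char for char, sequence in _ESCAPES.items()}
_TABLE = str.maketrans(_ESCAPES)


def hl7_escape(text):
    """Replace delimiter characters with escape sequences in a single pass."""
    return text.translate(_TABLE)


def hl7_unescape(text):
    """Reverse hl7_escape for the five sequences handled above."""
    return re.sub(r"\\([EFSTR])\\", lambda m: _UNESCAPES[m.group(1)], text)


escaped = hl7_escape("chest XR PA&LAT")
restored = hl7_unescape(escaped)
```

Using a translation table makes the replacement a single pass, so the backslashes introduced by one escape sequence are not escaped a second time.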

The second, when an RTF file must be inserted, is to encode the text as BASE64, which removes all special characters from the ASCII sequence of the RTF file. Conversion to BASE64 also makes it possible to insert binary files, such as images or WAV or MP3 sound files, among others, into the message, and it preserves the integrity of the data being transferred.
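The second approach can be sketched with Python's standard base64 module. The RTF fragment and the function names below are invented for illustration. The point is that the encoded text uses only letters, digits, '+', '/' and '=', none of which is an HL7 delimiter, and that decoding returns the original bytes unchanged.

```python
import base64

# Characters that would conflict with HL7 parsing (default delimiters, CR, LF).
HL7_SPECIAL = set("|^~\\&\r\n")


def encode_for_hl7(payload: bytes) -> str:
    """Encode arbitrary bytes (RTF, image, audio) as a single Base64 string."""
    return base64.b64encode(payload).decode("ascii")


def decode_from_hl7(encoded: str) -> bytes:
    """Recover the original bytes from the Base64 text."""
    return base64.b64decode(encoded)


# Hypothetical RTF fragment with backslashes, '&', '~' and a CR/LF pair.
rtf_report = b"{\\rtf1\\ansi Chest XR PA&LAT: ~2 cm nodule\\par\r\n}"

encoded = encode_for_hl7(rtf_report)
conflicts = set(encoded) & HL7_SPECIAL   # expected to be empty
round_trip = decode_from_hl7(encoded)    # expected to equal rtf_report
```

The receiving system must know that the field is Base64 encoded and decode it before displaying the report.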

Here is an example of a BASE64 encoded string, which represents an RTF report:

Note of Caution

If, while analysing the content of a message, the parser encounters a character that it recognizes as a control character, it will reject the message. This might be difficult to detect.

HL7 is an exchange standard for transferring important health information and results between two applications or systems that must share a patient’s health data. It goes without saying that the exchange process must not alter, in any way, the data included in a report. The parties involved in the data exchange must therefore recognize their obligations and ensure that the reports they send are properly encoded. At this time, in such cases, Imagem must parse RTF reports to remove characters that conflict with HL7. Since this involves a structural alteration that may lead to erroneous reports, Imagem shall not be liable in such cases. Imagem encourages managers and decision makers to ask their suppliers to correct this deficiency, which Imagem considers critical. Imagem does not condone this bad practice, which it considers an error that needs to be corrected.