© Northern Lighthouse Ltd   -   Last updated 4 Apr 2008

Introduction to BUFR

Disclaimer

The reader is referred to WMO official documentation as the source of definitions of terms related to BUFR and CREX. The information about BUFR and CREX in this file is based on our interpretation of WMO documents, and in case of discrepancies, WMO documentation should be considered the valid one.

The aim of this introduction is to give a short description, using simple words, for those of you who are not familiar with BUFR. We would like to be able to show that, whatever you have been told, BUFR is understandable. If you like our introduction, or if you do not like it, or if you have suggestions about how it could be improved, please send us your comments.

Use our Glossary if you hit an unfamiliar term


1. BUFR background

BUFR stands for Binary Universal Form for the Representation of meteorological data.

1.1 Mess of messages

The meteorological world is full of information exchange. All kinds of weather messages are continuously created, transmitted and stored all over the globe. These messages have very different contents: manual observations on the land and on the oceans, upper air soundings, satellite and radar observations, observations from aeroplanes, drifting buoys, constant level balloons, automatic weather stations etc.

All these different observation reports contain different kinds of data, and traditionally there has been a huge variety of different ways to code them. Too many different ways!

1.2 Simplifying solution

BUFR was developed to unify the way of coding weather information for transmission and/or storing. That is what the middle letters stand for: Universal Form! A single form to handle all kinds of meteorological data! What a blessing for programmers working in modern computerised meteorological centres: no need to code and maintain dozens of different coding packages, a single software package can handle all kinds of meteorological messages.

Although BUFR was initially defined for use in meteorology, it can also be used to code e.g. oceanographic or hydrological data. There is nothing in BUFR that limits its use to meteorological data.

1.3 Binary

The first letter B stands for Binary. It means that BUFR messages cannot be read or written by humans, and we need a computer program for encoding and decoding our data. Today this is not a major problem as most communication between meteorological centres goes through communication computers. But in some parts of the WWW (World Weather Watch) network it is still a problem and a new coding method, called CREX, has been defined for that reason.

On the other hand, most BUFR-coding software packages provide tools for turning any BUFR message into human readable listings, which are far more readable than any of the messages coded according to old character-based codes. One had to be an expert on meteorological codes to know what all the old character coded messages meant.

1.4 Table driven

BUFR code is table driven, i.e. a lot of BUFR coding logic is stored in external tables and only a part of the required logic is hardcoded into computer programs.

There is no need for a general BUFR-coding library to know about the data contents of messages, whether it is temperatures or winds or something else. BUFR messages contain metadata that tell what the actual data are and how they should be interpreted.

For the kernel coding library, a BUFR message is just a long stream of bits with a certain predefined, but not constant, structure. Using the metadata and external tables the coding library is able to find out what parameters are found in the message and how to decode their values.

Of course, application programs that use or create BUFR messages need to know about the meteorological contents. But an application program does not need to know about the stream of bits. It just uses the API (Application Program Interface) of a BUFR library, and lets the library do all the difficult work.

1.5 Versatile

The mechanism of using external tables makes BUFR very versatile. Although BUFR was designed for meteorological data, a universal BUFR coding library can be used for other sciences, or disciplines, as they are called in BUFR terminology, without any changes in the library itself. Of course, in order to use BUFR in another discipline one would need tables appropriate for that discipline, and application programs would be different.

1.6 Descriptors

The main BUFR table, called Table B, contains descriptors for all parameters used within a field of science, originally within meteorology. A descriptor describes the name and the unit of a parameter and also how its value is to be packed into a bit representation (see examples in 2.4).

These descriptors are keys to the parameter values within a BUFR message. As a user you do not need to know anything about bits or bytes, but you must know the descriptor numbers for the parameters you are interested in. Those numbers can be found in Table B.


2. BUFR coding principles

In this chapter we discuss the basics of the BUFR coding mechanism.

2.1 Values in a bitstream

In a BUFR message data values are stored as a continuous bit stream of coded positive integer values. Each data value occupies an optimum number of bits, depending on the precision and range of the data parameter.

The number of bits each data value occupies, i.e. its width, and the way to interpret those bits is defined in coding rules stored in Table B.

The location of the bits occupied by an individual data value in this stream depends on the widths of all preceding data values.
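
The dependence of bit positions on the preceding widths can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not code from any BUFR library: the helper names bit_offsets and read_bits and the sample widths are hypothetical.

```python
def bit_offsets(widths):
    """Return the starting bit position of each value, given all widths in order."""
    offsets = []
    pos = 0
    for width in widths:
        offsets.append(pos)   # a value starts where the previous one ended
        pos += width
    return offsets


def read_bits(data: bytes, offset: int, width: int) -> int:
    """Read `width` bits starting at bit `offset`, most significant bit first."""
    as_int = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    shift = total_bits - offset - width
    return (as_int >> shift) & ((1 << width) - 1)


# Hypothetical widths (in bits) of three consecutive data values.
widths = [12, 14, 16]
offsets = bit_offsets(widths)
```

Changing the width of the first value would shift the position of every value after it, which is why a decoder has to know all the widths before it can find a single value.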

2.2 External coding rules

Most coding rules are stored in external BUFR tables (files). To encode or decode BUFR messages the coding software needs to have access to the table files.

The descriptors are the keys to all information stored in BUFR tables B, C and D.

2.3 Descriptors: links between coding rules and the bitstream

In addition to the bitstream of data values, a BUFR message also contains a sequence of descriptors needed to interpret the contents of the data bitstream.

Descriptor values (or "names") are composed of 3 numbers F, XX and YYY. Within Cipher documentation, descriptors are written either as a 6-digit number FXXYYY or, for readability, as an 8-character string F'XX'YYY.

As an example, element descriptor 0'12'004 describes 2 meter temperature and sequence descriptor 3'01'011 is a shorthand notation for date (year + month + day).
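
The two notations can be converted into each other mechanically. The following Python sketch shows one way to do it; the function names split_descriptor and format_descriptor are our own and do not belong to any particular BUFR package.

```python
def split_descriptor(descriptor: int):
    """Split a descriptor given as the integer FXXYYY into (F, XX, YYY)."""
    f, rest = divmod(descriptor, 100000)
    xx, yyy = divmod(rest, 1000)
    return f, xx, yyy


def format_descriptor(descriptor: int) -> str:
    """Format a descriptor in the readable 8-character form F'XX'YYY."""
    f, xx, yyy = split_descriptor(descriptor)
    return f"{f}'{xx:02d}'{yyy:03d}"


temperature = format_descriptor(12004)    # leading zero of 012004 is implied
date = format_descriptor(301011)
```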

Descriptors are explained more in depth in the next chapter.

2.4 Table B entry examples

2.4.1 Temperature

BUFR Table B (in the format used in Cipher documentation) contains the following entry for descriptor 0'12'004 :

   012004 Dry-bulb temperature at 2m
   K
   1 0 12

The first number is the descriptor as an integer value, followed by the element name. The second line contains the unit of the element (K for Kelvin).

The third line contains the coding rules. The first value in this line is the scale (1 in this example), the second one is the reference value (=0) and the third one is the data width (=12 bits). The relation between the coded value and the actual value is given by the formula

   coded_value = original_value * 10^scale - reference_value

Scale is used to multiply decimal numbers into integer values (scale > 0) or to reduce precision of large values (scale < 0). Thus, in a way, scale tells the length of the decimal fraction used.

Reference value is used to ensure that the encoded value is never negative.

Reference value (together with scale) defines the smallest possible value for the parameter, while data width (together with scale and reference value) defines the largest possible value.

Thus 2m temperature is stored with one decimal fraction, the minimum value is 0.0K and the maximum value is 409.4K. Note that the highest value 409.5K is not available because a field with all bits as 1's is reserved to code a missing value.
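
The formula and the missing-value rule can be tried out with a small Python sketch. It is a minimal illustration, not the implementation of any BUFR library: the function names encode and decode are our own, and we assume that a missing value is passed in as None.

```python
def encode(value, scale, reference, width):
    """Return the integer stored in the bitstream for one data value."""
    missing = (1 << width) - 1                 # all bits set to 1
    if value is None:
        return missing
    # coded_value = original_value * 10^scale - reference_value
    scaled = value * 10 ** scale if scale >= 0 else value / 10 ** (-scale)
    coded = round(scaled) - reference
    if not 0 <= coded < missing:
        raise ValueError("value does not fit into the field")
    return coded


def decode(coded, scale, reference, width):
    """Invert encode(); the all-ones pattern decodes to None (missing)."""
    if coded == (1 << width) - 1:
        return None
    number = coded + reference
    return number / 10 ** scale if scale >= 0 else number * 10 ** (-scale)


# Table B rules for 0'12'004: scale 1, reference value 0, width 12 bits.
coded = encode(293.1, 1, 0, 12)
restored = decode(coded, 1, 0, 12)
```

With these rules 293.1 K is stored as the integer 2931, while trying to encode 409.5 K raises an error because the bit pattern 4095 is reserved for a missing value.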

2.4.2 Pressure

BUFR Table B contains the following entry for descriptor 0'10'051:

   010051 Pressure reduced to mean sea level
   Pa
   -1 0 14

i.e. the unit is pascal (Pa), the stored value discards the last digit (scale -1), and the smallest possible value is 0. Because 14 bits are used in the data bit stream, the maximum value is 163820 Pa (i.e. 1638.2 hPa).

2.4.3 Longitude

Descriptor 0'06'002 describes longitude:

   006002 Longitude (coarse accuracy)
   Degree
   2 -18000 16

i.e. longitude is given in degrees, the stored value retains 2 decimals (scale 2), the smallest possible value is -180.00, and the 16 bits used in the data bit stream set the maximum value to 475.34. This comfortably covers the range -180.00 to +180.00 set by WMO rules.
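
The value ranges quoted in the three examples can be checked with a short Python sketch. The function name value_range is hypothetical; the only rules assumed are the coding formula given in 2.4.1 and the reservation of the all-ones bit pattern for a missing value.

```python
def value_range(scale, reference, width):
    """Smallest and largest value that a Table B entry can represent."""
    max_coded = (1 << width) - 2        # the all-ones pattern means "missing"

    def actual(coded):
        number = coded + reference
        return number / 10 ** scale if scale >= 0 else number * 10 ** (-scale)

    return actual(0), actual(max_coded)


# (scale, reference value, data width) copied from the entries above.
ENTRIES = {
    "012004": (1, 0, 12),        # temperature, K
    "010051": (-1, 0, 14),       # pressure, Pa
    "006002": (2, -18000, 16),   # longitude, degrees
}

ranges = {descriptor: value_range(*rule) for descriptor, rule in ENTRIES.items()}
```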

2.5 Simple message example

This is the description part of a simplified message which contains only temperature, pressure, location and time:

   005002  Latitude (coarse accuracy)
   006002  Longitude (coarse accuracy)
   004003  Day
   004004  Hour
   004005  Minute
   012004  2 meter temperature
   010051  Pressure reduced to msl

Note that in real messages we would do better by using sequence descriptors for location and time (see 3.3).
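
To show how such a description drives the coding, here is a Python sketch that packs one report into a bit string and unpacks it again. It is a toy under stated assumptions: the names TABLE_B, pack and unpack are ours, the bit stream is kept as a string of '0' and '1' characters for readability, and the coding rules for latitude, day, hour and minute are what we believe the WMO Table B contains (they are not listed in this introduction, so check them against the official tables).

```python
# Descriptor -> (scale, reference value, width in bits).
TABLE_B = {
    "005002": (2, -9000, 15),    # Latitude (coarse accuracy), assumed entry
    "006002": (2, -18000, 16),   # Longitude (coarse accuracy), see 2.4.3
    "004003": (0, 0, 6),         # Day, assumed entry
    "004004": (0, 0, 5),         # Hour, assumed entry
    "004005": (0, 0, 6),         # Minute, assumed entry
    "012004": (1, 0, 12),        # 2 meter temperature, see 2.4.1
    "010051": (-1, 0, 14),       # Pressure reduced to msl, see 2.4.2
}


def pack(descriptors, values):
    """Encode the values one after another into a string of bits."""
    bits = ""
    for descriptor, value in zip(descriptors, values):
        scale, reference, width = TABLE_B[descriptor]
        scaled = value * 10 ** scale if scale >= 0 else value / 10 ** (-scale)
        bits += format(round(scaled) - reference, f"0{width}b")
    return bits


def unpack(descriptors, bits):
    """Walk through the bit string, using the widths to locate each value."""
    values, pos = [], 0
    for descriptor in descriptors:
        scale, reference, width = TABLE_B[descriptor]
        number = int(bits[pos:pos + width], 2) + reference
        values.append(number / 10 ** scale if scale >= 0 else number * 10 ** (-scale))
        pos += width
    return values


DESCRIPTORS = ["005002", "006002", "004003", "004004", "004005", "012004", "010051"]
report = [60.17, 24.94, 4, 12, 0, 275.3, 101320]   # made-up observation
bits = pack(DESCRIPTORS, report)
```

Note that neither function knows anything about meteorology: everything they need comes from the descriptor list and the table.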

3. BUFR descriptors

In this chapter we discuss BUFR descriptors more thoroughly.

3.1 Descriptor format

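Within Cipher documentation descriptors are written as 8-character strings F'XX'YYY (containing 1+2+3=6 digits), where:

   F    is the descriptor type
   XX   is the class
   YYY  is the entry within the class

The digit F divides descriptors into 4 types:

   F=0  element descriptors (see 3.2)
   F=1  replication descriptors (see 3.4)
   F=2  operator descriptors (see 3.5)
   F=3  sequence descriptors (see 3.3)

Classes are used for grouping similar parameters together. For instance, XX=12 is reserved for different temperature parameters.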

3.2 Element descriptors (F=0)

These are the basic key descriptors between data values and coded values. Element descriptor decoding rules are stored in BUFR Table B.

3.3 Sequence descriptors (F=3)

These are shorthand notations for a sequence of descriptors. BUFR Table D contains rules that describe how a single sequence descriptor is expanded into a sequence of descriptors, which again may contain other sequence descriptors which must be expanded into a sequence of descriptors... This continues until the final sequence contains only element and operator descriptors.

As an example, we could describe the same structure of the example in 2.5 by using sequence descriptor 3'01'025. According to Table D, sequence descriptor 3'01'025 finally expands into the following element descriptors:

   005002  Latitude (coarse accuracy)
   006002  Longitude (coarse accuracy)
   004003  Day
   004004  Hour
   004005  Minute

So we could describe the structure of the example in 2.5 by using:

   301025  (lat + lon + day + hour + minute)
   012004  2 meter temperature
   010051  pressure reduced to msl
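
The recursive expansion can be sketched in Python as follows. The TABLE_D dictionary is a toy table holding only the entries needed here; the nested entries 301023 and 301012 reflect our understanding of the WMO Table D and are assumptions as far as this introduction is concerned. The function name expand is our own.

```python
# Toy Table D: sequence descriptor -> list of descriptors it stands for.
TABLE_D = {
    "301025": ["301023", "004003", "301012"],   # lat + lon + day + hour + minute
    "301023": ["005002", "006002"],             # lat + lon (assumed entry)
    "301012": ["004004", "004005"],             # hour + minute (assumed entry)
}


def expand(descriptors):
    """Replace sequence descriptors (F=3) until none are left."""
    expanded = []
    for descriptor in descriptors:
        if descriptor[0] == "3":
            # A sequence may itself contain sequences, hence the recursion.
            expanded.extend(expand(TABLE_D[descriptor]))
        else:
            expanded.append(descriptor)
    return expanded


short_form = ["301025", "012004", "010051"]
long_form = expand(short_form)
```

Expanding the three-descriptor form gives back the seven descriptors listed in 2.5.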

3.4 Replication descriptors (F=1)

These are shorthand notations for replicating a sequence of descriptors. Descriptor 1'XX'YYY indicates that the next XX descriptors will be repeated YYY times in the expanded descriptor sequence. YYY is called the replication count.

If YYY=0 then the replication count is coded into the data section of the message. This is called delayed replication.
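
Replication with a fixed count can be expanded without looking at the data, as the following Python sketch shows. The function name expand_replication and the sample descriptor list are hypothetical; delayed replication is deliberately left out, because its count can only be read from the data section.

```python
def expand_replication(descriptors):
    """Expand replication descriptors 1'XX'YYY that have a fixed count."""
    expanded = []
    i = 0
    while i < len(descriptors):
        descriptor = descriptors[i]
        if descriptor[0] == "1":
            span = int(descriptor[1:3])     # XX: how many descriptors to repeat
            count = int(descriptor[3:6])    # YYY: how many times to repeat them
            if count == 0:
                raise NotImplementedError(
                    "delayed replication: the count is stored in the data section")
            group = expand_replication(descriptors[i + 1:i + 1 + span])
            expanded.extend(group * count)
            i += 1 + span
        else:
            expanded.append(descriptor)
            i += 1
    return expanded


# Day, followed by (temperature, pressure) repeated 3 times.
compact = ["004003", "102003", "012004", "010051"]
full = expand_replication(compact)
```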

3.5 Operator descriptors (F=2)

These are special descriptors used to alter normal coding rules or to add some extra information e.g. quality control information.


4. BUFR message structure

In this chapter we discuss the physical structure of a BUFR message.

You can use the Cipher BUFRtool utility program (downloadable from the Northern Lighthouse website) to study the structures of BUFR messages.

4.1 Message sections

A BUFR message starts with characters "BUFR", ends with characters "7777" and is binary, i.e. non-human-readable, in between.

For a computer program this binary part is most readable: it has a predefined structure of separate sections. Each section starts with length information, making it possible to find the start of the next section which starts with length information, making it possible to find... I think you got the idea ;-)

You can use the utility program Cipher BUFRtool (task msgexam) to print out the different sections of a BUFR message.
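
The idea of hopping from one section to the next can also be sketched in Python. The sketch rests on our reading of the WMO layout for BUFR edition 2 and later, which is not spelled out in this introduction: section 0 is assumed to be 8 octets long ("BUFR", a 3-octet total length and a 1-octet edition number), and every following section is assumed to begin with its own length as a 3-octet big-endian integer. The function names are our own, and the test message is synthetic filler, not a decodable BUFR message.

```python
def section_lengths(message: bytes):
    """Return the lengths of the sections between section 0 and "7777"."""
    if message[:4] != b"BUFR" or message[-4:] != b"7777":
        raise ValueError("not a complete BUFR message")
    lengths = []
    pos = 8                         # assumed length of section 0
    end = len(message) - 4          # the end section is the 4 characters "7777"
    while pos < end:
        length = int.from_bytes(message[pos:pos + 3], "big")
        if length < 3 or pos + length > end:
            raise ValueError("corrupt section length")
        lengths.append(length)
        pos += length               # jump to the start of the next section
    return lengths


def fake_section(payload_size: int) -> bytes:
    """Build a dummy section: 3-octet length followed by zero-filled payload."""
    length = 3 + payload_size
    return length.to_bytes(3, "big") + bytes(payload_size)


body = fake_section(19) + fake_section(6) + fake_section(10)
total = 8 + len(body) + 4
message = b"BUFR" + total.to_bytes(3, "big") + bytes([4]) + body + b"7777"
lengths = section_lengths(message)
```

A real decoder would additionally look at a flag in section 1 to find out whether the optional section 2 is present; the sketch only measures the sections and does not name them.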

4.2 Multisubset message

If several observation reports share the same structure, i.e. they have the same descriptors in section 3, then it is possible to build a message that has the descriptors given only once but with a data section containing several observation reports. In this case each report is said to be a subset of the whole message.

4.3 Compressed multisubset message

If all the observation reports share exactly the same message structure (i.e. no delayed replication) then it is possible to compress them, i.e. use another type of coding rules. Compression saves even more space and makes the decoding more efficient.


5. Local BUFR extensions

Data processing centres can expand BUFR capabilities by defining their own local extensions. In this chapter we discuss these extensions.

5.1 Local section 2

Section 2 is optional, and if it exists, it is always locally defined.

Data processing centres can define their own section of metadata as section 2, for local use. For instance, a centre may want to include metadata relevant for its archive system, e.g. database keys that help to make the data retrieval more efficient.

Section 2 is meant for local use; if messages are intended for external distribution, they should either not have a section 2, or if they do, the section 2 should contain only information for local use at the originating centre.

5.2 Local extension of section 1

Section 1 can be extended for local use. Up to BUFR edition 3, the first 17 octets are defined by WMO, but octets 18 onwards can be used to store local extra metadata. In BUFR edition 4, the first 22 octets are defined by WMO, and octets from 23 onwards can be used for local metadata if needed.

5.3 Local descriptors

The descriptors defined by WMO cater for a large range of parameters. However, it may happen that a centre needs to encode data for which no suitable descriptors have been defined by WMO yet. Centres can define their own descriptors for their own needs. Local element descriptors are stored in a local table B and local sequence descriptors in a local table D.

Element and sequence descriptor classes XX=0,...,47 and entries YYY=0,...,191 in those classes are reserved for WMO definition. The rest can be freely defined and used locally.
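
The rule above translates directly into a small test. The following Python sketch is our own illustration; the function name is_local is hypothetical.

```python
def is_local(descriptor: str) -> bool:
    """True if an element or sequence descriptor is outside the WMO range."""
    f = int(descriptor[0])
    xx = int(descriptor[1:3])
    yyy = int(descriptor[3:6])
    if f not in (0, 3):
        raise ValueError("the rule covers element (F=0) and sequence (F=3) descriptors")
    # Classes 0..47 with entries 0..191 are reserved for WMO definition.
    return xx > 47 or yyy > 191


standard = is_local("012004")   # entry 004 in class 12
local = is_local("012199")      # entry 199 in class 12
```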

If messages containing local descriptors are distributed externally, recipients need to have access to the local tables used to create those messages, in order to be able to decode them. For this reason, when messages are intended for external distribution, the use of local element descriptors is not recommended, unless there is no standard alternative. The section below describes a method that allows the decoding of the standard part of a message containing non-standard element descriptors. However, the use of local sequence descriptors prevents the use of that method, and therefore the use of local sequence descriptors in general is not recommended.

5.4 Safeguarding local descriptors

As noted in 5.3, messages intended for external distribution should not contain local descriptors if there is a standard alternative. However, sometimes there is no standard alternative. In these cases, if recipient centres do not have access to the local tables used to produce the messages, they cannot successfully decode the whole message. But there is a way, described below, to allow the recipients to decode at least the standard part of the message, even if they do not have the local tables used to produce the message.

It is possible to safeguard the non-local part of the message by preceding each local descriptor with a special operator descriptor 2'06'YYY. This operator indicates that the data value corresponding to the following local descriptor occupies YYY bits in the data section. This information allows the decoding program to skip the correct number of bits, and then continue with the rest of the message.

Here is a simplified and unusual ;-) example taken from file observer.bfr in Cipher BUFRtool. The last parameter 0'12'004 can always be decoded even if a site does not have the required local Table B to decode shoe size, mood and body temperature:

   3'01'011   expands into date
   2'06'004   local field of 4 bits follows
   0'02'202   Shoe size
   2'06'003   local field of 3 bits follows
   0'20'202   Mood
   2'06'006   local field of 6 bits follows
   0'12'199   Body temperature
   0'12'004   Dry-bulb temperature at 2 m
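
How a recipient without the local table could still reach the last parameter is sketched below in Python. This is a simplified illustration, not the logic of Cipher or of any other library: the names STANDARD_TABLE_B and decode_standard_part are ours, 3'01'011 is written out already expanded into year, month and day, the coding rules for those three elements are what we believe the WMO Table B contains, and the data values are made up.

```python
# Standard Table B entries only: descriptor -> (name, scale, reference, width).
STANDARD_TABLE_B = {
    "004001": ("Year", 0, 0, 12),       # assumed entry
    "004002": ("Month", 0, 0, 4),       # assumed entry
    "004003": ("Day", 0, 0, 6),         # assumed entry
    "012004": ("Dry-bulb temperature at 2 m", 1, 0, 12),
}


def decode_standard_part(descriptors, bits):
    """Decode the standard elements and skip safeguarded local fields."""
    decoded, pos, skip_width = {}, 0, None
    for descriptor in descriptors:
        if descriptor.startswith("206"):
            skip_width = int(descriptor[3:6])   # width of the next local field
            continue
        if skip_width is not None:
            pos += skip_width                   # jump over the unknown value
            skip_width = None
            continue
        name, scale, reference, width = STANDARD_TABLE_B[descriptor]
        coded = int(bits[pos:pos + width], 2)
        decoded[name] = (coded + reference) / 10 ** scale
        pos += width
    return decoded


DESCRIPTORS = ["004001", "004002", "004003",
               "206004", "002202",
               "206003", "020202",
               "206006", "012199",
               "012004"]
# Made-up coded values and their widths, in the same order as the data fields.
FIELDS = [(2008, 12), (4, 4), (4, 6), (9, 4), (5, 3), (37, 6), (2753, 12)]
bits = "".join(format(value, f"0{width}b") for value, width in FIELDS)
result = decode_standard_part(DESCRIPTORS, bits)
```

A site that does have the local Table B would of course decode the local fields instead of skipping them.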

5.5 Example BUFR files

BUFR files observer.bfr and sauna.bfr contain examples of local descriptor usage. Use the Cipher BUFRtool utility program to examine these demo files. The local Table B used in these examples is called B_v1d0s0.255 and can be found in the tables subdirectory of the Cipher BUFRtool utility program.

