© Northern Lighthouse Ltd - Last updated 31 Aug 2011
The reader is referred to official WMO documentation as the source of definitions of terms related to BUFR and CREX. The information about BUFR and CREX in this file is based on our interpretation of WMO documents; in case of discrepancies, the WMO documentation takes precedence.
The aim of this introduction is to give a short description, using simple words, for those of you who are not familiar with BUFR. We would like to be able to show that, whatever you have been told, BUFR is understandable. If you like our introduction, or if you do not like it, or if you have suggestions about how it could be improved, please send us your comments.
Use our Glossary if you hit an unfamiliar term.
BUFR stands for Binary Universal Form for the Representation of meteorological data.
The meteorological world is full of information exchange. All kinds of weather messages are continuously created, transmitted and stored all over the globe. These messages have very different contents: manual observations on the land and on the oceans, upper air soundings, satellite and radar observations, observations from aeroplanes, drifting buoys, constant level balloons, automatic weather stations etc.
All these observation reports contain different kinds of data, and traditionally there has been a huge variety of ways to code them. Too many different ways!
BUFR was developed to unify the way of coding weather information for transmission and/or storage. That is what the middle letters stand for: Universal Form for the Representation.
Although BUFR was initially defined for use in meteorology, it can also be used to code e.g. oceanographic or hydrological data. There is nothing in BUFR that limits its use to meteorological data.
The first letter B stands for Binary. It means that BUFR messages cannot be read or written by humans, and we need a computer program for encoding and decoding our data. Today this is not a major problem as most communication between meteorological centres goes through communication computers. But in some parts of the WWW (World Weather Watch) network it is still a problem and a new coding method, called CREX, has been defined for that reason.
On the other hand, most BUFR-coding software packages provide tools for turning any BUFR message into human readable listings, which are far more readable than any of the messages coded according to old character-based codes. One had to be an expert on meteorological codes to know what all the old character coded messages meant.
BUFR code is
There is no need for a general BUFR-coding library to know about the data contents of messages, whether it is temperatures or winds or something else. BUFR messages contain metadata that tell what the actual data are and how they should be interpreted.
For the kernel coding library, a BUFR message is just a long stream of bits with a certain predefined, but not constant, structure. Using the metadata and external tables the coding library is able to find out what parameters are found in the message and how to decode their values.
Of course, application programs that use or create BUFR messages need to know about the meteorological contents. But an application program does not need to know about the stream of bits. It just uses the API (Application Program Interface) of a BUFR library, and lets the library do all the difficult work.
The mechanism of using external tables makes BUFR very versatile. Although BUFR was
designed for meteorological data, a universal BUFR coding library can be used for other
sciences, or
The main BUFR table, called Table B, contains
These descriptors are keys to the parameter values within a BUFR message. As a user you do
not need to know anything about bits or bytes, but you must know the descriptor numbers for
the parameters you are interested in. Those numbers can be found in Table B.
In this chapter we discuss the basics of the BUFR coding mechanism.
In a BUFR message data values are stored as a continuous bit stream of coded positive integer values. Each data value occupies an optimum number of bits, depending on the precision and range of the data parameter.
The number of bits each data value occupies, i.e. its
The location of the bits occupied by an individual data value in this stream depends on the widths of all preceding data values.
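To make the idea of a continuous bit stream concrete, here is a minimal Python sketch of packing and unpacking unsigned integers of different widths. The function names and the three example values and widths are our own illustrations, not part of any BUFR library.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into bytes, most significant bit first."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = (-nbits) % 8  # zero bits appended to reach a whole octet
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def unpack_fields(data, widths):
    """Read consecutive unsigned integers of the given widths from bytes."""
    total = len(data) * 8
    acc = int.from_bytes(data, "big")
    values = []
    offset = 0  # bit position of the next field, set by all preceding widths
    for width in widths:
        shift = total - offset - width
        values.append((acc >> shift) & ((1 << width) - 1))
        offset += width
    return values


fields = [(2731, 12), (10132, 14), (20495, 16)]
packed = pack_fields(fields)
print(len(packed), unpack_fields(packed, [12, 14, 16]))
# prints: 6 [2731, 10132, 20495]
```

Note that the reader must know every width in advance: there are no separators in the stream, so one wrong width shifts all the values that follow.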
Most coding rules are stored in external BUFR tables (files). To encode or decode BUFR messages the coding software needs to have access to the table files.
A key to all information stored in BUFR tables B, C and D are the
In addition to the bitstream of data values, a BUFR message contains also a sequence of
Descriptor values (or "names") are composed of 3 numbers
As an example, element descriptor
Descriptors are explained more in depth in the next chapter.
BUFR Table B (in the format used in Cipher documentation) contains the following entry
for descriptor
012004 Dry-bulb temperature at 2m
K
1 0 12
The first number is the descriptor as an integer value, followed by the element name.
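The second line contains the unit of the element (K, i.e. kelvin, in this case).
The third line contains the coding rules. The first value in this line is the scale, the second is the reference value and the third is the data width in bits. The coded value is computed as: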
coded_value = original_value * 10^scale - reference_value
Scale is used to multiply decimal numbers into integer values (scale > 0) or to reduce
the precision of large values (scale < 0). In a way, scale gives the number of
decimal places that are kept.
Thus 2m temperature is stored with one decimal place, the minimum value is 0.0 K and the maximum value is 409.4 K. Note that the next value up, 409.5 K, is not available because a field with all bits set to 1 is reserved to code a missing value.
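The coding rule can be tried out with a few lines of Python. This is only a sketch: the function names are ours, and we read the third line of the entry, 1 0 12, as scale 1, reference value 0 and a width of 12 bits, which is consistent with the 0.0 K to 409.4 K range stated above.

```python
def encode(value, scale, reference, width):
    """Return the coded integer, or the all-ones pattern for a missing value."""
    missing = (1 << width) - 1
    if value is None:
        return missing
    coded = round(value * 10 ** scale) - reference
    if not 0 <= coded < missing:
        raise ValueError(f"{value} is outside the codable range")
    return coded


def decode(coded, scale, reference, width):
    """Invert encode(); the all-ones pattern decodes to None (missing)."""
    if coded == (1 << width) - 1:
        return None
    return (coded + reference) / 10 ** scale


# 2 m temperature, descriptor 012004: scale 1, reference 0, 12 bits
print(encode(285.3, 1, 0, 12))   # prints 2853
print(decode(2853, 1, 0, 12))    # prints 285.3
print(encode(None, 1, 0, 12))    # prints 4095 (all twelve bits set)
```

Trying to encode 409.5 K raises an error here, because its coded value 4095 would collide with the missing-value pattern.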
BUFR Table B contains the following entry for descriptor
010051 Pressure reduced to mean sea level
Pa
-1 0 14
Descriptor
006002 Longitude (coarse accuracy)
Degree
2 -18000 16
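The three Table B entries shown so far can be compared by computing the range and resolution each one allows. The sketch below again reads the three numbers of an entry as scale, reference value and bit width; `value_range` is a hypothetical helper name, and exact fractions are used only to avoid floating-point noise in the result.

```python
from fractions import Fraction


def value_range(scale, reference, width):
    """Return (smallest value, largest value, step) for a Table B entry."""
    step = Fraction(10) ** -scale
    largest_code = (1 << width) - 2  # all ones is reserved for "missing"
    return (
        float(reference * step),
        float((largest_code + reference) * step),
        float(step),
    )


print(value_range(1, 0, 12))        # 012004: (0.0, 409.4, 0.1)
print(value_range(-1, 0, 14))       # 010051: (0.0, 163820.0, 10.0)
print(value_range(2, -18000, 16))   # 006002: (-180.0, 475.34, 0.01)
```

A negative scale, as for pressure, means the value is stored in steps of 10 Pa. The negative reference value for longitude shifts the range so that -180.00 degrees is coded as 0, which keeps all coded values positive.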
This is the description part of a simplified message which contains only temperature, pressure, location and time:
005002 Latitude (coarse accuracy)
006002 Longitude (coarse accuracy)
004003 Day
004004 Hour
004005 Minute
012004 2 meter temperature
010051 Pressure reduced to msl
Note that in real messages we would do better by using sequence descriptors for location and time
(see 3.3).
In this chapter we discuss BUFR descriptors more thoroughly.
Within Cipher documentation descriptors are written as 8 character strings
The digit
Classes are used for grouping similar parameters together. For instance,
These are the basic key descriptors between data values and coded values. Element descriptor decoding rules are stored in BUFR Table B.
These are shorthand notations for a sequence of descriptors. BUFR Table D contains rules that describe how a single sequence descriptor is expanded into a sequence of descriptors, which again may contain other sequence descriptors which must be expanded into a sequence of descriptors... This continues until the final sequence contains only element and operator descriptors.
As an example, we could describe the same structure of the example in 2.5 by using sequence
descriptor
005002 Latitude (coarse accuracy)
006002 Longitude (coarse accuracy)
004003 Day
004004 Hour
004005 Minute
So we could describe the structure of the example in 2.5 by using:
301025 (lat + lon + day + hour + minute)
012004 2 meter temperature
010051 pressure reduced to msl
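The expansion of sequence descriptors is naturally recursive. Here is a small Python sketch of it, using the simplified expansion of 301025 listed above as the only real Table D entry. The entry 363192 is a made-up descriptor added purely to show nesting, and replication and operator descriptors are ignored.

```python
TABLE_D = {
    # simplified expansion as listed in this introduction
    "301025": ["005002", "006002", "004003", "004004", "004005"],
    # hypothetical sequence, for illustration only: it contains another sequence
    "363192": ["301025", "012004", "010051"],
}


def expand(descriptors, table_d):
    """Expand sequence descriptors until none of them is left."""
    result = []
    for descriptor in descriptors:
        if descriptor in table_d:
            result.extend(expand(table_d[descriptor], table_d))
        else:
            result.append(descriptor)
    return result


print(expand(["301025", "012004", "010051"], TABLE_D))
# prints: ['005002', '006002', '004003', '004004', '004005', '012004', '010051']
```

Expanding the hypothetical `363192` gives exactly the same seven descriptors, which shows why a sequence may safely contain other sequences.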
These are shorthand notations for replicating a sequence of descriptors.
Descriptor
If
These are special descriptors used to alter normal coding rules or to add some
extra information e.g. quality control information.
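The four kinds of descriptors discussed in this chapter can be told apart by the leading digit. The sketch below assumes the usual WMO naming of the three parts as F, X and Y, with F = 0 for element, 1 for replication, 2 for operator and 3 for sequence descriptors. The function names are ours.

```python
KINDS = {0: "element", 1: "replication", 2: "operator", 3: "sequence"}


def split_descriptor(text):
    """Split '012004' or "0'12'004" into the integers (F, X, Y)."""
    digits = text.replace("'", "")
    if len(digits) != 6 or not digits.isdigit():
        raise ValueError(f"not a descriptor: {text!r}")
    return int(digits[0]), int(digits[1:3]), int(digits[3:])


def describe(text):
    """Return the 8-character form F'XX'YYY and the kind of descriptor."""
    f, x, y = split_descriptor(text)
    if f not in KINDS:
        raise ValueError(f"F must be 0..3, got {f}")
    return f"{f}'{x:02d}'{y:03d}", KINDS[f]


print(describe("012004"))    # prints ("0'12'004", 'element')
print(describe("301025"))    # prints ("3'01'025", 'sequence')
print(describe("2'06'004"))  # prints ("2'06'004", 'operator')
```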
In this chapter we discuss the physical structure of a BUFR message.
You can use the Cipher BUFRtool utility program (downloadable from the Northern Lighthouse website) to study the structures of BUFR messages.
A BUFR message starts with characters "
For a computer program this binary part is most readable: it has a predefined structure of separate sections. Each section starts with length information, making it possible to find the start of the next section which starts with length information, making it possible to find... I think you got the idea ;-)
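Hopping from section to section takes only a few lines of code. The Python sketch below assumes that each section begins with a 3-octet big-endian length that counts the length field itself, which is our understanding of the WMO layout for the sections following the opening indicator section. The toy buffer is not a valid BUFR message; it only demonstrates the length-prefix mechanism, and both function names are ours.

```python
def walk_sections(message, offset, count):
    """Yield (offset, length) for `count` consecutive length-prefixed sections."""
    for _ in range(count):
        length = int.from_bytes(message[offset:offset + 3], "big")
        if length < 3 or offset + length > len(message):
            raise ValueError("corrupt section length")
        yield offset, length
        offset += length  # the next section starts right after this one


def toy_section(payload):
    """Build a section: 3-octet length (including itself) followed by payload."""
    return (len(payload) + 3).to_bytes(3, "big") + payload


toy = toy_section(b"meta") + toy_section(b"descriptors") + toy_section(b"data!")
print(list(walk_sections(toy, 0, 3)))
# prints: [(0, 7), (7, 14), (21, 8)]
```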
You can use the utility program Cipher BUFRtool (task
If several observation reports share the same structure, i.e. they have the same descriptors
in section 3, then it is possible to build a message that has the descriptors given only once but
with a data section containing several observation reports. In this case each report is said to
be a
If all the observation reports share exactly the same message structure (i.e. no delayed
replication) then it is possible to
Data processing centres can expand BUFR capabilities by defining their own local extensions. In this chapter we discuss these extensions.
Section 2 is optional, and if it exists, it is always locally defined.
Data processing centres can define their own section of metadata as section 2, for local use. For instance, a centre may want to include metadata relevant for its archive system, e.g. database keys that help to make the data retrieval more efficient.
Section 2 is meant for local use; if messages are intended for external distribution, they should either not have a section 2, or if they do, the section 2 should contain only information for local use at the originating centre.
Section 1 can be extended for local use. Up to BUFR edition 3, the first 17 octets are defined by WMO, but octets 18 onwards can be used to store local extra metadata. In BUFR edition 4, the first 22 octets are defined by WMO, and octets from 23 onwards can be used for local metadata if needed.
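A decoder that wants to separate the WMO-defined octets of section 1 from a local extension only needs the edition number. A minimal sketch, using the octet counts given above (17 up to edition 3, 22 in edition 4); `split_section1` is a hypothetical helper, and the byte values in the example are dummies.

```python
def split_section1(section1, edition):
    """Split section 1 into (WMO-defined octets, local extension octets)."""
    standard_length = 22 if edition >= 4 else 17
    return section1[:standard_length], section1[standard_length:]


# dummy 20-octet section 1 from an edition 3 message: octets 18 to 20 are local
standard, local = split_section1(bytes(range(20)), edition=3)
print(len(standard), list(local))   # prints: 17 [17, 18, 19]
```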
The descriptors defined by WMO cater for a large range of parameters. However, a centre may need to encode data for which WMO has not yet defined suitable descriptors. Centres can define their own descriptors for their own needs. Local element descriptors are stored in a local table B and local sequence descriptors in a local table D.
Element and sequence descriptor classes
If messages containing local descriptors are distributed externally, recipients need to have access to the local tables used to create those messages, in order to be able to decode them. For this reason, when messages are intended for external distribution, the use of local element descriptors is not recommended, unless there is no standard alternative. The section below describes a method that allows the decoding of the standard part of a message containing non-standard element descriptors. However, the use of local sequence descriptors prevents the use of that method, and therefore the use of local sequence descriptors in general is not recommended.
As noted in 5.3, messages intended for external distribution should not contain local descriptors if there is a standard alternative. However, sometimes there is no standard alternative. In these cases, if recipient centres do not have access to the local tables used to produce the messages, they cannot successfully decode the whole message. But there is a way, described below, to allow the recipients to decode at least the standard part of the message, even if they do not have the local tables used to produce the message.
It is possible to safeguard the non-local part of the message by preceding each local descriptor
with a special operator descriptor
Here is a simplified and unusual ;-) example taken from file
3'01'011 expands into date
2'06'004 local field of 4 bits follows
0'02'202 Shoe size
2'06'003 local field of 3 bits follows
0'20'202 Mood
2'06'006 local field of 6 bits follows
0'12'199 Body temperature
0'12'004 Dry-bulb temperature at 2 m
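The sketch below shows how a recipient without the local tables could still locate the standard field in the example above. It is an illustration under our own assumptions: `field_layout` is a hypothetical function, the standard table holds only the 12-bit width of 012004 used earlier in this introduction, and the date sequence 3'01'011 is left out because its expansion is not listed here.

```python
STANDARD_TABLE_B = {"012004": 12}  # descriptor -> data width in bits


def field_layout(descriptors, table_b):
    """Return (descriptor, bit offset, width, decodable) for each data field."""
    fields = []
    offset = 0
    announced = None  # width given by a preceding 2'06'YYY operator
    for descriptor in descriptors:
        if descriptor.startswith("206"):
            announced = int(descriptor[3:])
            continue
        if descriptor in table_b:
            width, known = table_b[descriptor], True
        elif announced is not None:
            width, known = announced, False  # skip these bits without decoding
        else:
            raise KeyError(f"unknown descriptor {descriptor}, no width announced")
        fields.append((descriptor, offset, width, known))
        offset += width
        announced = None
    return fields


example = ["206004", "002202", "206003", "020202", "206006", "012199", "012004"]
for field in field_layout(example, STANDARD_TABLE_B):
    print(field)
# ('002202', 0, 4, False)
# ('020202', 4, 3, False)
# ('012199', 7, 6, False)
# ('012004', 13, 12, True)
```

Without the three operator descriptors, the recipient would not know that 13 bits must be skipped, and the standard temperature field could not be found.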