Northern Lighthouse
© Northern Lighthouse Ltd - Last updated 4 Apr 2008
| Introduction to CREX |
The reader is referred to WMO official documentation as the source of definitions of terms related to BUFR and CREX. The information about BUFR and CREX in this file is based on our interpretation of WMO documents, and in case of discrepancies, WMO documentation should be considered the valid one.
The aim of this introduction is to give a short description, using simple words, for those of you who are not familiar with CREX. If you like our introduction, or if you do not like it, or if you have suggestions about how it could be improved, please send us your comments.
Use our Glossary if you hit an unfamiliar term.
| 1. CREX background |
CREX stands for Character form for the Representation and EXchange of data.
The meteorological world is full of information exchange. All kinds of weather messages are continuously created, transmitted and stored all over the globe. These messages have very different contents: manual observations on the land and on the oceans, upper air soundings, satellite and radar observations, observations from aeroplanes, drifting buoys, constant level balloons, automatic weather stations etc.
All these different observation reports contain different kinds of data and traditionally there has been a huge variety of different ways to code them. Too many different ways!
During the late 1980s a general table driven coding method, BUFR, was developed to unify the way weather information is coded for transmission and/or storage. In BUFR the two middle letters stand for Universal Form! A single form to handle all kinds of meteorological data! What a blessing for programmers working in modern computerised meteorological centres: no need to code and maintain dozens of different coding packages, a single software package can handle all kinds of meteorological messages.
But the meteorological world was slow to accept BUFR, partly because it was binary and thus not human readable as such. In some parts of the WWW (World Weather Watch) network, the transmission of binary messages is still a problem and CREX was defined to complement BUFR in areas where BUFR was not feasible.
CREX was developed during the late 1990's. CREX is, like BUFR, a TDCF (table driven code form), and it is in many ways similar to BUFR. However, unlike BUFR, CREX is character-based. The first letter in CREX stands for Character. It means that CREX messages can be read by trained humans, in the same way as the traditional character-based codes (e.g. SYNOP, TEMP,...).
CREX code is table driven, i.e. a lot of the coding logic is stored in external tables and only a part of the required logic is hardcoded into computer programs.
There is no need for a general CREX-coding library to know about the data contents of messages, whether it is temperatures or winds or something else. CREX messages contain metadata that tell what the actual data are and how they should be interpreted.
For the kernel coding library, a CREX message is just a stream of characters with a certain predefined, but not constant, structure. Using the metadata and external tables the coding library is able to find out what parameters are found in the message and how to decode their values.
Of course, application programs that use or create CREX messages need to know about the meteorological contents. But an application program does not need to know about the details of how the data values are encoded; it uses the API (Application Program Interface) of a CREX library, and lets the library do all the difficult work.
The mechanism of using external tables makes CREX and BUFR very versatile. Although CREX was designed for meteorological data, the same CREX coding libraries can be used for other sciences, or disciplines as they are called in CREX and BUFR terminology, without any changes in the library itself. Of course, to use CREX in another discipline one needs tables appropriate for that discipline, and the application programs would be different.
The main CREX table, called Table B, contains element descriptors for all parameters used within a field of science, originally within meteorology. One descriptor describes the name and the unit of one parameter and also how its value is to be coded (see examples in 2.4).
These descriptors are keys to the parameter values within a CREX message. Developers of applications that handle CREX messages need to know the descriptors corresponding to the parameters in which they are interested.
| 2. CREX coding principles |
In this chapter we discuss the basics of the CREX coding mechanism.
In a CREX message, numeric data values are stored as integer values. Each data value occupies a predefined number of digits, depending on the precision and range of the data parameter. Text data occupies a predefined number of characters. Data values are separated by one or more spaces.
The number of characters or digits in each data value and the way to interpret the digits is defined in coding rules stored in Table B.
Most coding rules are stored in external CREX tables (files). To encode or decode CREX messages the coding software needs to have access to the table files.
- Table A is used to classify message types.
- Table B contains information on how different meteorological parameters are encoded into integers in a message (vice versa for decoding) plus their names and units.
- Table C contains special coding rules which need to be hardcoded into CREX software.
- Table D contains shorthand notations to shorten the description of long message structures. Only some of the CREX sequences are defined in CREX Table D. Most of CREX sequences are actually defined in BUFR Table D, which is common to BUFR and CREX.
- Flag & Code Table describes translations of coded numeric values into original non-numeric values. This table is shared with BUFR.
The descriptors are the key to all information stored in CREX tables B, C and D.
In addition to the character stream of data values, a CREX message also contains a sequence of descriptors needed to interpret the contents of the data character stream.
Descriptor values (or "names") are composed of three values F, XX and YYY, where XX consists of two digits, YYY consists of three digits, and F is one letter (B, C, D or R). Within Cipher documentation, CREX descriptors are written either as a string of six alphanumeric characters FXXYYY or, for readability, as an 8-character string F'XX'YYY.
As an example, element descriptor B'12'004 describes 2-meter temperature and sequence descriptor D'01'011 is a shorthand notation for date (year+month+day).
Descriptors are explained more in depth in the next chapter.
CREX Table B (in the format used in Cipher documentation) contains the following entry for descriptor B12004:
    B12004 Dry-bulb temperature at 2m
    °C
    1 3
The first number is the descriptor name, followed by the parameter name. The second line contains the unit of the parameter (°C for degrees Celsius).
The third line contains the coding rules. Values on the line are for scale (1 in this example), and data width (3 digits in this example). The relation between the coded value and the actual value is given by the formula
    coded_value = original_value * 10^scale
Scale is used to multiply decimal numbers into integer values (scale > 0) or to reduce the precision of large values (scale < 0). Thus, in a way, scale tells the length of the decimal fraction used.
Data width (together with scale) defines the range of values that can be represented. Leading zeros are used to fill small values. Positive values are written without a sign, and negative values are written with a minus sign preceding the digits (the minus sign is not counted in the width). As an example, value +23.4°C is coded as 234, and -1.2°C as -012.
Missing values are coded by filling all character positions with the character "/". For instance, a missing temperature would be written as ///.
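The encoding rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not the code of any real CREX library: the function name `encode_value` is ours, and the rounding mode (Python's built-in `round`) is an assumption, since this introduction does not say how values are rounded.

```python
def encode_value(value, scale, width):
    """Encode one numeric value as CREX characters (illustrative sketch).

    value: the original value, or None for a missing value
    scale, width: coding rules taken from the Table B entry
    """
    if value is None:
        # Missing values fill every character position with "/".
        return "/" * width
    # coded_value = original_value * 10^scale, stored as an integer.
    # Assumption: Python's round() is used; the rounding rule is ours.
    coded = round(value * 10 ** scale)
    if abs(coded) >= 10 ** width:
        raise ValueError(f"{value} does not fit into {width} digits")
    # The minus sign is not counted in the width; pad with leading zeros.
    sign = "-" if coded < 0 else ""
    return sign + str(abs(coded)).zfill(width)


print(encode_value(23.4, 1, 3))   # temperature +23.4 °C
print(encode_value(-1.2, 1, 3))   # temperature -1.2 °C
print(encode_value(None, 1, 3))   # missing temperature
```

With the B12004 rules (scale 1, width 3) the three calls reproduce the examples in the text: 234, -012 and ///.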
CREX Table B contains the following entry for descriptor B10051:
    B10051 Pressure reduced to mean sea level
    Pa
    -1 5
i.e. the unit is Pascal (Pa), and the stored value discards the last digit (scale -1). Five digits are used to represent a value. For instance, 09876 represents 98760 Pa, i.e. 987.6 hPa, and a missing value is written as /////.
Descriptor B06002 describes longitude:
    B06002 Longitude (coarse accuracy)
    Degree
    2 5
i.e. longitude is given in degrees, the stored value retains two decimals, and five digits are used to represent the value. As an example, 123.4567°E would be coded as 12346, and 270°E should be coded as -09000 (longitude range is restricted to -180°...+180° by WMO).
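Going in the other direction, a decoder divides the coded integer by 10^scale. The sketch below shows this together with the longitude wrap-around used in the 270°E example. Both function names are ours and purely illustrative. Note also that the wrap-around maps exactly 180°E to -180°, which is our choice and not something stated by WMO.

```python
def decode_value(text, scale):
    """Decode one CREX data value (illustrative sketch).

    Returns None for a missing value (all "/"), otherwise the original value.
    """
    if set(text) == {"/"}:
        return None
    coded = int(text)  # int() accepts leading zeros and a minus sign
    if scale <= 0:
        # Negative scale: the discarded digits come back as zeros.
        return coded * 10 ** (-scale)
    return coded / 10 ** scale


def normalize_longitude(degrees_east):
    """Map a longitude given as 0...360 °E into the range -180...+180."""
    return (degrees_east + 180.0) % 360.0 - 180.0


print(decode_value("09876", -1))    # pressure in Pa
print(decode_value("-09000", 2))    # longitude in degrees
print(normalize_longitude(270.0))   # 270 °E expressed in -180...+180
```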
This is the description part of a simplified message which contains only temperature, pressure, location and time:
    B05002 Latitude (coarse accuracy)
    B06002 Longitude (coarse accuracy)
    B04003 Day
    B04004 Hour
    B04005 Minute
    B12004 2 meter temperature
    B10051 Pressure reduced to msl
Note that in real messages we would do better by using sequence descriptors for location and time (see 3.3).
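To show what "table driven" means in practice, here is a minimal Python sketch that decodes a line of data values using nothing but the descriptor list and a small in-memory Table B. The entries for B12004, B10051 and B06002 are the ones quoted above. The scale and width for B05002, B04003, B04004 and B04005 are our assumptions and should be checked against the WMO tables. The data values themselves are invented for illustration.

```python
# Miniature Table B: descriptor -> (name, unit, scale, width).
TABLE_B = {
    "B05002": ("Latitude (coarse accuracy)", "Degree", 2, 4),   # assumed
    "B06002": ("Longitude (coarse accuracy)", "Degree", 2, 5),
    "B04003": ("Day", "d", 0, 2),                               # assumed
    "B04004": ("Hour", "h", 0, 2),                              # assumed
    "B04005": ("Minute", "min", 0, 2),                          # assumed
    "B12004": ("Dry-bulb temperature at 2m", "°C", 1, 3),
    "B10051": ("Pressure reduced to mean sea level", "Pa", -1, 5),
}


def decode_report(descriptors, data_text, table_b=TABLE_B):
    """Pair each descriptor with its data value and decode it."""
    tokens = data_text.split()  # values are separated by one or more spaces
    if len(tokens) != len(descriptors):
        raise ValueError("number of values does not match the description")
    report = {}
    for descriptor, token in zip(descriptors, tokens):
        _name, _unit, scale, width = table_b[descriptor]
        if len(token.lstrip("-")) != width:
            raise ValueError(f"{descriptor}: expected {width} characters")
        if set(token) == {"/"}:
            report[descriptor] = None            # missing value
        elif scale <= 0:
            report[descriptor] = int(token) * 10 ** (-scale)
        else:
            report[descriptor] = int(token) / 10 ** scale
    return report


description = ["B05002", "B06002", "B04003", "B04004",
               "B04005", "B12004", "B10051"]
report = decode_report(description, "6025 02495 04 12 00 234 09876")
for descriptor, value in report.items():
    name, unit = TABLE_B[descriptor][:2]
    print(f"{descriptor} {name}: {value} {unit}")
```

Notice that `decode_report` knows nothing about temperatures or pressures. Supporting a new parameter only requires a new table entry.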
| 3. CREX descriptors |
In this chapter we discuss CREX descriptors more thoroughly.
Descriptors are written as 6 character strings FXXYYY, where:
- F (one letter) indicates the type of the descriptor (B, C, D or R)
- XX (two digits) indicate a class within a type
- YYY (three digits) indicate an entry within a class
The character F divides descriptors into 4 types:
- element descriptors (F=B)
- replication descriptors (F=R)
- operator descriptors (F=C)
- sequence descriptors (F=D)
Classes are used for grouping similar parameters together. For instance, XX=12 is reserved for different temperature parameters.
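As a small illustration, a descriptor written either as FXXYYY or as F'XX'YYY can be split into its three parts like this. The names `Descriptor`, `TYPE_NAMES` and `parse_descriptor` are ours and not part of any CREX library.

```python
import re
from typing import NamedTuple


class Descriptor(NamedTuple):
    f: str    # type letter: B, C, D or R
    xx: int   # class within the type
    yyy: int  # entry within the class


TYPE_NAMES = {
    "B": "element descriptor",
    "R": "replication descriptor",
    "C": "operator descriptor",
    "D": "sequence descriptor",
}

# Accepts FXXYYY or F'XX'YYY; the backreference \2 requires the second
# quote to be present exactly when the first one is.
_DESCRIPTOR = re.compile(r"([BCDR])('?)(\d{2})\2(\d{3})")


def parse_descriptor(text):
    """Split a descriptor string into its F, XX and YYY parts."""
    match = _DESCRIPTOR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a CREX descriptor: {text!r}")
    return Descriptor(match.group(1), int(match.group(3)), int(match.group(4)))


d = parse_descriptor("B'12'004")
print(d, "is a", TYPE_NAMES[d.f])
```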
Element descriptors (F=B) are the basic keys linking data values to coded values. Element descriptor decoding rules are stored in CREX Table B.
Sequence descriptors (F=D) are shorthand notations for sequences of descriptors. CREX Table D and BUFR Table D contain rules that describe how a single sequence descriptor is expanded into a sequence of descriptors, which again may contain other sequence descriptors which must be expanded into a sequence of descriptors... This continues until the final sequence contains only element and operator descriptors.
As an example, we could describe the same structure as in the example in 2.5 by using sequence descriptor D01025. According to Table D, sequence descriptor D01025 finally expands into the following element descriptors:
    B05002 Latitude (coarse accuracy)
    B06002 Longitude (coarse accuracy)
    B04003 Day
    B04004 Hour
    B04005 Minute
So we could describe the structure in example 2.5 by using:
    D01025 (lat + lon + day + hour + minute)
    B12004 2 meter temperature
    B10051 pressure reduced to msl
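The repeated expansion is naturally written as a recursive function. The sketch below uses a miniature Table D. The intermediate entries D01023 and D01012 are our assumption about how D01025 is built up. The text above only states the final result, so check the WMO tables before relying on them.

```python
# Miniature Table D: sequence descriptor -> list of descriptors.
# Assumption: the intermediate sequences below are illustrative only.
TABLE_D = {
    "D01025": ["D01023", "B04003", "D01012"],
    "D01023": ["B05002", "B06002"],   # latitude + longitude
    "D01012": ["B04004", "B04005"],   # hour + minute
}


def expand_sequences(descriptors, table_d=TABLE_D):
    """Replace every sequence descriptor by its expansion, recursively."""
    expanded = []
    for descriptor in descriptors:
        if descriptor.startswith("D"):
            if descriptor not in table_d:
                raise KeyError(f"{descriptor} not found in Table D")
            # The expansion may itself contain sequence descriptors.
            expanded.extend(expand_sequences(table_d[descriptor], table_d))
        else:
            expanded.append(descriptor)
    return expanded


print(expand_sequences(["D01025", "B12004", "B10051"]))
```

The result is the same list of seven element descriptors as in the example in 2.5.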
Replication descriptors (F=R) are shorthand notations for replicating a sequence of descriptors. Descriptor R'XX'YYY indicates that the next XX descriptors will be repeated YYY times in the expanded descriptor sequence. YYY is called the replication count.
If YYY=0 then the replication count is coded into the data section of the message. This is called delayed replication.
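The following Python sketch expands replication descriptors in a list of descriptors. It is deliberately simplified: the function name is ours, nested replications and sequence descriptors inside the replicated group are not handled, and the delayed replication counts, which a real decoder reads from the data section while decoding, are simply passed in by the caller.

```python
def expand_replication(descriptors, delayed_counts=()):
    """Expand R'XX'YYY descriptors (simplified, no nesting)."""
    counts = iter(delayed_counts)
    expanded = []
    i = 0
    while i < len(descriptors):
        descriptor = descriptors[i]
        if not descriptor.startswith("R"):
            expanded.append(descriptor)
            i += 1
            continue
        span = int(descriptor[1:3])    # XX: how many descriptors to repeat
        count = int(descriptor[3:6])   # YYY: how many times to repeat them
        if count == 0:
            # Delayed replication: the count comes from the data section.
            count = next(counts, None)
            if count is None:
                raise ValueError("missing delayed replication count")
        group = descriptors[i + 1 : i + 1 + span]
        if len(group) != span:
            raise ValueError(f"{descriptor}: fewer than {span} descriptors follow")
        expanded.extend(group * count)
        i += 1 + span
    return expanded


# Hour once, then (temperature, pressure) three times.
print(expand_replication(["B04004", "R02003", "B12004", "B10051"]))
# Delayed replication with a count of 2 supplied by the caller.
print(expand_replication(["R01000", "B12004"], delayed_counts=[2]))
```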
Operator descriptors (F=C) are special descriptors used to alter normal coding rules.
| 4. CREX message structure |
In this chapter we discuss the physical structure of a CREX message.
You can use the Cipher CREXtool demo program (downloadable from the Northern Lighthouse website) to study the structures of real CREX messages.
A CREX message starts with the characters "CREX", and ends with the characters "7777". In between there are two (or sometimes three) other sections. Sections are separated by a character pair "++".
- Section 0, the indicator section, is simply "CREX".
- Section 1, the data description section, contains the descriptors, i.e. it gives the structure of the observation report.
- Section 2, the data section, contains coded data values, to be interpreted according to the structure defined in section 1. Note: it is possible to precede each coded data value with a check digit; section 1 indicates whether check digits are used in the message or not.
- Section 3 is an optional local section. It can be missing. If present, it is defined by the generating centre for local use (more about local extensions in chapter 5).
- Section 4, the end section, is always "7777". It provides a checkpoint for the decoding program.
You can use the demo program Cipher CREXtool (task msgexam) to print out the different sections of a CREX message.
If several observation reports share the same structure, i.e. they have the same descriptors in section 1, then it is possible to build a message that has the descriptors given only once but with a data section containing several observation reports. In this case each report is said to be a subset of the whole message.
Subsets are separated by an end-of-subset mark, which is a single "+".
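The section and subset separators make it easy to take a message apart before any real decoding starts. The Python sketch below does just that. Note that the toy message is heavily simplified: a real section 1 contains more than a bare list of descriptors, and the data values are invented. The function name and the dictionary keys are ours.

```python
TOY_MESSAGE = """CREX++
D01025 B12004 B10051++
6025 02495 04 12 00 234 09876+
5993 01030 04 12 00 -012 /////++
7777"""


def split_message(text):
    """Split a CREX message into sections and subsets (illustrative)."""
    parts = [part.strip() for part in text.strip().split("++")]
    if parts[0] != "CREX":
        raise ValueError('message does not start with "CREX"')
    if parts[-1] != "7777":
        raise ValueError('message does not end with "7777"')
    if len(parts) not in (4, 5):
        raise ValueError("expected two or three sections between CREX and 7777")
    # After splitting on "++", every remaining "+" is an end-of-subset mark.
    subsets = [chunk.split() for chunk in parts[2].split("+") if chunk.strip()]
    return {
        "description": parts[1].split(),                  # section 1
        "subsets": subsets,                               # section 2
        "local": parts[3] if len(parts) == 5 else None,   # optional section 3
    }


message = split_message(TOY_MESSAGE)
print(message["description"])
for number, subset in enumerate(message["subsets"], start=1):
    print(f"subset {number}: {subset}")
```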
| 5. Local CREX extensions |
It is possible to expand CREX capabilities locally, within a single data processing centre, in a manner similar to that described in our Introduction to BUFR. But as the usage of local extensions is strongly discouraged, we skip them for the time being.