The design of your mesonet should reflect why you are building it (your objective) and the deliverables you intend to provide to your various stakeholders. To meet your objective and provide your stakeholders with high-quality, accurate mesonet deliverables (including data products), you will need to devise a system for the management, review, processing, and distribution of your data.
 
Your mesonet may consist of hundreds of automated weather stations that are collecting data that need to be housed—somewhere and somehow. Typically, the “owner” of the raw data files (and processed data) is a central data processing facility hosted by the mesonet operator. For example, for a mesonet operated by a university, the data files may reside on servers that are physically located on the university campus.
Depending on the scope of the mesonet, file servers and offsite data hosting for backup may be used. Most entities are state sponsored and require direct access to data products. However, certain data products are created for specific applications and are not intended for a general audience. Such a data product may be geographically dependent or intended for a specific entity (for example, emergency managers or a public health department).
 
The metadata for a mesonet are the data that describe and give historical information about the measured data that have been collected by the network of automated weather stations. The metadata may include the following types of information:
Metadata are important because they give context for measurements and instill confidence in the veracity of the measurements and data products. You could deploy the highest quality hardware, but unless you know how the measurements are obtained and managed, it is difficult to trust the resulting data. Unfortunately, all too often in environmental research, data are used without knowledge of their source or of how they were obtained, which is inconsistent with the scientific method.
In a dynamic system such as a mesonet, metadata are subject to change during the life of the automated weather station. Therefore, maintaining the station history and tracking the changes are important for measuring climate variables. If a climate signature change at a station is recorded, you can use your metadata of the site—documenting the site, instrumentation changes, and program changes—to help determine if the change is real or if it is an artifact of intervention.
Methods to record metadata include a standard paper form and mobile platforms with custom software.
The once-standard field notebooks are being routinely replaced by weatherproof handheld computers to document this information. At the Hubbard Brook Experimental Forest in New Hampshire, technicians track known events on handheld devices with electronic forms that have pull-down menus to ensure uniformity. When the technician returns from the field, the digital notes are downloaded and automatically synchronized with the sensor data using the date and time stamp.
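The timestamp-based synchronization described above can be sketched in a few lines of Python. This is only an illustration and not the software used at Hubbard Brook: the function name `annotate`, the two-minute tolerance, and the sample values are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def annotate(observations, field_notes, tolerance=timedelta(minutes=2)):
    """Attach each field note to the observations recorded within `tolerance` of it."""
    annotated = []
    for obs_time, value in observations:
        notes = [text for note_time, text in field_notes
                 if abs(obs_time - note_time) <= tolerance]
        annotated.append({"time": obs_time, "value": value, "notes": notes})
    return annotated

# Hypothetical air temperature observations (deg C) at five-minute intervals.
observations = [
    (datetime(2024, 6, 1, 10, 0), 21.4),
    (datetime(2024, 6, 1, 10, 5), 21.6),
    (datetime(2024, 6, 1, 10, 10), 25.9),
    (datetime(2024, 6, 1, 10, 15), 21.7),
]

# Hypothetical note entered by a technician on a handheld device.
field_notes = [
    (datetime(2024, 6, 1, 10, 9), "Radiation shield removed for cleaning"),
]

annotated = annotate(observations, field_notes)
for row in annotated:
    print(row["time"].isoformat(), row["value"], row["notes"])
```

With the note attached, the spike at 10:10 can be traced to a known event rather than being treated as an unexplained anomaly.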
Standards for documenting metadata are described here:
In recent years, various metadata standards have been developed for environmental data and can be applied to sensors that produce streaming data. SensorML (Sensor Model Language), EML (Ecological Metadata Language), and WaterML (Water Markup Language) are all common metadata standards that use Extensible Markup Language (XML). XML is a flexible and widely used standard for encoding information in a format that is both human and machine readable, which facilitates its use in Internet applications.
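To illustrate why XML is both human and machine readable, the following sketch builds a small station metadata record with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`station`, `location`, `sensor`, and so on) are hypothetical and do not follow the SensorML, EML, or WaterML schemas; a real deployment would use the vocabulary of whichever standard it adopts.

```python
import xml.etree.ElementTree as ET

def build_station_metadata(station_id, latitude, longitude, sensors):
    """Build a generic XML metadata record (hypothetical schema, not SensorML)."""
    root = ET.Element("station", attrib={"id": station_id})
    location = ET.SubElement(root, "location")
    ET.SubElement(location, "latitude").text = str(latitude)
    ET.SubElement(location, "longitude").text = str(longitude)
    sensors_el = ET.SubElement(root, "sensors")
    for sensor in sensors:
        el = ET.SubElement(sensors_el, "sensor",
                           attrib={"variable": sensor["variable"]})
        ET.SubElement(el, "model").text = sensor["model"]
        ET.SubElement(el, "installed").text = sensor["installed"]
    return root

# Hypothetical station and installation date.
root = build_station_metadata(
    "STN001", 35.18, -97.44,
    [{"variable": "relative_humidity", "model": "HMP45C",
      "installed": "2019-04-12"}],
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# The same text can be parsed back by any XML-aware program.
parsed = ET.fromstring(xml_text)
print(parsed.find("sensors/sensor/model").text)
```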
The credibility of your measurement data and metadata is paramount to the accuracy of the mesonet products you provide. Consequently, it is essential to protect your data from unauthorized access and malicious tampering.
Starting with the data logger at the automated weather station, there are measures you can take to protect your data. For example, you can encode the data that are stored and transmitted over communication links to a central location so that smaller data packets are used, which shortens transmission time and reduces power consumption. This encoded format is not typically known and can’t be viewed without the proper decoding software, although compact encoding by itself is not the same as encryption. After the data reach their destination (a central location, such as a university), the required IT security protocols are used to store the data onsite.
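The space savings from compact encoding can be shown with Python's standard `struct` module. This sketch is not the binary format of any particular data logger: the record layout, the scaling to hundredths, and the function names are assumptions chosen for illustration. It also performs no encryption; the packet is merely unreadable without knowledge of the layout.

```python
import struct

# Assumed layout (little-endian): uint32 epoch seconds, int16 air temperature,
# uint16 relative humidity, uint16 wind speed, each scaled to hundredths.
RECORD_FORMAT = "<IhHH"

def encode_record(epoch_seconds, temp_c, rh_percent, wind_ms):
    """Pack one observation into a fixed-size binary packet."""
    return struct.pack(RECORD_FORMAT, epoch_seconds,
                       round(temp_c * 100),
                       round(rh_percent * 100),
                       round(wind_ms * 100))

def decode_record(packet):
    """Reverse encode_record; requires knowing RECORD_FORMAT."""
    epoch_seconds, temp, rh, wind = struct.unpack(RECORD_FORMAT, packet)
    return epoch_seconds, temp / 100, rh / 100, wind / 100

packet = encode_record(1717236000, 21.37, 64.2, 3.5)
as_text = "1717236000,21.37,64.20,3.50".encode("ascii")

print(len(packet), "bytes binary versus", len(as_text), "bytes as text")
print(decode_record(packet))
```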
 
To ensure the continuity of your mesonet data operations, you can purchase external web services to store your data at additional offsite locations. (This is known as data storage redundancy.) The additional external storage could back up and protect source code, databases, and synchronization algorithms—in addition to other sources such as satellite data, river data, and lightning data. Putting all the software code pieces back together (even though they are backed up), however, could be an extensive exercise. An easier alternative may be to use a mirrored data site that would take care of the internal data communication paths (such as IP addresses and modem configurations). A mirror site is a website or a set of files on a computer server that has been copied to another computer server so that the same website or data files are available from multiple locations. A mirror site has its own URL but is otherwise identical to the principal site.
Don't underestimate the resources required to maintain the data side of a mesonet. It takes considerable effort to maintain all the communication links and equipment used, from the sensors and stations to the data servers, software, databases, computer models, and data products. System upgrades are typically deployed when two criteria are met: the current technology has become obsolete, and funding is available.
 
While conducting regular maintenance of your mesonet’s stations is critical, it is equally important to have procedures in place to monitor your mesonet data and raise flags for abnormal measurement values. Any mechanical or electrical instrument in the field will eventually fail, and sensors often degrade over time. Failures seldom occur during or immediately preceding regular station maintenance, so developing automated procedures to monitor your measurements can trigger a quicker resolution of those issues.
Mesonet operators must be willing to spend considerable time reviewing the data from their stations to ensure that the data products they deliver to stakeholders are accurate and of high quality. Operators have an obligation to ensure that “bad data”—especially without supporting metadata—does not cause problems for stakeholders and the potentially life-affecting decisions they need to make based on that data.
In particular, operators should flag potentially bad data during the quality assurance process and keep suspect data out of the products they deliver. Operators can compare the data of one station to that of a nearby station to see if there is a discrepancy, investigate why there might be differences, and annotate the measurements using metadata.
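A comparison against nearby stations can be automated along the following lines. This is a simplified sketch, not the spatial test of any specific mesonet: the use of the neighbor median, the 5-degree threshold, and the flag names are assumptions. A flagged discrepancy is a prompt for investigation, since it may reflect a real local effect.

```python
from statistics import median

def spatial_check(value, neighbor_values, max_difference):
    """Compare one station's reading with the median of nearby stations."""
    if not neighbor_values:
        return "unchecked"  # no neighbors available for comparison
    difference = abs(value - median(neighbor_values))
    return "suspect" if difference > max_difference else "ok"

# Hypothetical air temperatures (deg C) reported at the same time.
neighbors = [21.0, 21.4, 20.8]
print(spatial_check(30.2, neighbors, max_difference=5.0))
print(spatial_check(21.9, neighbors, max_difference=5.0))
```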
Campbell et al. (2013) outline a core list of best practices for quality control of streaming environmental sensor data:
- Automate QA/QC procedures.
- Maintain an appropriate level of human inspection.
- Replicate sensors.
- Schedule maintenance and repairs to minimize data loss.
- Have ready access to replacement parts.
- Record the date and time of known events that may affect measurements.
- Implement an automated alert system to warn about potential sensor network issues.
- Retain the original unmanipulated data.
- Ensure that the data are collected sequentially.
- Perform range checks on numerical data.
- Perform domain checks on categorical data.
- Perform slope and persistence checks on continuous data.
- Compare the data with data from related sensors.
- Correct the data or fill gaps, if that is prudent.
- Use flags to convey information about the data.
- Estimate uncertainty in the value, if that is feasible.
- Provide complete metadata.
- Document all QA/QC procedures that were applied.
- Document all data processing (e.g., correction for sensor drift).
- Retain all versions of the input data, workflows, QC programs, and models used (data provenance).
The two levels at which you should review the quality of the measurement data from your automated weather stations are the following:
Your measurement data can be quality checked in many ways to ensure the proper operation of your mesonet stations. The following are some example methods you can program your software to use:
Climate testing, step tests, temporal tests, spatial tests, and sensor-specific testing are also data filters used to identify suspect data.
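Range, step, and persistence checks of the kind named above can each be written as a short function. The sketch below is illustrative only: the thresholds, window length, and function names are assumptions, and an operational system would tune them per sensor and per variable. Each function returns a list of pass/fail results and leaves the input values unchanged.

```python
def range_check(values, low, high):
    """Pass values that lie within the physically plausible range."""
    return [low <= v <= high for v in values]

def step_check(values, max_step):
    """Fail a value that jumps more than max_step from the previous value."""
    if not values:
        return []
    passes = [True]  # the first value has no predecessor to compare with
    for previous, current in zip(values, values[1:]):
        passes.append(abs(current - previous) <= max_step)
    return passes

def persistence_check(values, window, min_change):
    """Fail every value in any window whose total variation is below min_change."""
    passes = [True] * len(values)
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        if max(chunk) - min(chunk) < min_change:
            for i in range(end - window, end):
                passes[i] = False
    return passes

# Hypothetical air temperatures (deg C): one spike, then a stuck sensor.
temps = [20.1, 20.3, 72.0, 20.6, 20.6, 20.6, 20.6]

print(range_check(temps, low=-40.0, high=50.0))
print(step_check(temps, max_step=5.0))
print(persistence_check(temps, window=4, min_change=0.1))
```

Note that the step check fails both the spike and the value that follows it, because the return to normal is also a large jump; a human reviewer or an additional rule would be needed to separate the two.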
Caution: Based on the practical experience of mesonet operators whose networks have collected billions of data points, you are strongly urged to NEVER delete any data—even if it has been flagged as suspect data. All raw data—even questionable data—from a station should be kept. You may discover that you have a sensor with an intermittent problem. Rather than deleting the data points, flag them. Change the quality flag as many times as you desire, but refrain from changing the raw data. By following this practice, you will be able to track a potential sensor problem.
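One way to follow this practice in software is to store the raw value in a field that cannot be reassigned and to keep quality flags in a separate, append-only history. The class below is a minimal sketch under that assumption; the class name, flag names, and reasons are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Observation:
    """One measurement whose raw value is fixed; only its flags may change."""
    timestamp: str
    raw_value: float
    flag_history: list = field(default_factory=list)

    @property
    def flag(self):
        """Most recent quality flag, or 'unchecked' if none was assigned."""
        return self.flag_history[-1][0] if self.flag_history else "unchecked"

    def set_flag(self, flag, reason):
        """Record a new flag; earlier flags stay in the history."""
        self.flag_history.append((flag, reason))

obs = Observation("2024-06-01T10:10:00Z", 107.3)
obs.set_flag("suspect", "failed range check")
obs.set_flag("good", "reviewed: known sensor behavior at saturation")

print(obs.raw_value, obs.flag, len(obs.flag_history))
```

Because the dataclass is frozen, an attempt to assign a new `raw_value` raises an error, while the flag can be revised as often as needed and every revision remains traceable.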
Networks may choose to measure, store, and output raw data with minimal quality review taking place at the measurement site. A minimal amount of real-time processing for known sensor characteristics, however, can be included in the data logger program. (For example, an HMP45C Temperature and Relative Humidity Probe that reports relative humidity values from 100 to 107% is known to be at 100%.)
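The relative humidity example can be expressed as a small correction function. The sketch is written in Python for readability, whereas an actual data logger would use its own programming language; the function name and the choice to leave readings above 107% untouched (so that later quality checks can flag them) are assumptions.

```python
def clamp_relative_humidity(rh_percent, saturation_max=107.0):
    """Report readings slightly above 100% as 100%; leave all others unchanged."""
    if 100.0 < rh_percent <= saturation_max:
        return 100.0
    return rh_percent

for reading in (85.2, 100.0, 103.5, 107.0, 112.0):
    print(reading, "->", clamp_relative_humidity(reading))
```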
A trained staff member should review the raw mesonet data several times per day. This does not mean reviewing all raw observations, which could be on the order of tens of thousands of data points, but rather the processed data that were flagged or failed by the automated quality review process. From these data, the staff member could then take the following actions:
The following is a suggested list of tasks for a technician to conduct on a site visit:
The Oklahoma Mesonet offers this perspective on site visits:
On-site intercomparisons are not used to correct data. Instead, they provide performance checkpoints during the sensor’s operational life cycle. As a result of these visits, technicians may detect sensor problems that are difficult to ascertain by other components of the QA system. On the other hand, a site visit may reveal that a perceived QA problem is attributable to a real local effect. During the time that a technician is present at a site, a special data flag is activated in the site datalogger. The flag is reported with the regular observations and is archived for future reference. Should questions arise concerning any aspect of data quality, it is possible to check the archives of QA flags to see if the presence of a technician could be a proximate cause of the suspicious data.
During scheduled physical site visits, a technician may perform maintenance tasks ranging from validating instrument performance to maintaining vegetation surrounding the station. The presence of a technician could influence measurements made by the station during the service period. For example, a technician may be validating a rain gage by pouring a known amount of water into the funnel, which then causes false tips in the stored data set. A possible solution to this interference would be to install a door switch on the environmental enclosure housing the data logger. Upon arrival, the technician could open the door, which would signal the data logger to flag all measured data until the door is closed again. The data would still be recorded, but it would be noted that the data are not valid due to a known site visit.
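The door-switch idea can be sketched as follows. The record layout, field names, and flag values are assumptions for illustration; in practice the flag would be set by the data logger program at the station. All records are kept, and only the unflagged ones contribute to the rainfall total.

```python
def flag_site_visit(records):
    """Add a flag to each record based on the enclosure door switch state."""
    flagged = []
    for record in records:
        flag = "site_visit" if record["door_open"] else "ok"
        flagged.append({**record, "flag": flag})
    return flagged

def rainfall_total(flagged_records):
    """Sum rainfall only from records not affected by a site visit."""
    return sum(r["rain_mm"] for r in flagged_records if r["flag"] == "ok")

# Hypothetical records: the technician tests the rain gage while the door is open.
records = [
    {"time": "10:00", "rain_mm": 0.0,   "door_open": False},
    {"time": "10:05", "rain_mm": 0.254, "door_open": True},
    {"time": "10:10", "rain_mm": 0.254, "door_open": True},
    {"time": "10:15", "rain_mm": 0.0,   "door_open": False},
    {"time": "10:20", "rain_mm": 0.254, "door_open": False},
]

flagged = flag_site_visit(records)
print(len(flagged), "records kept; valid rainfall:", rainfall_total(flagged), "mm")
```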
After maintaining the physical structures of mesonet stations, proper IT and data management procedures constitute the most important aspect for a mesonet’s continual operation. A mesonet produces thousands of data points daily, and quality computing infrastructure and database management expertise are vital to process all the measurement values and keep the mesonet online. In addition to managing their measurement data, many mesonets use their database and software to manage operations affecting metadata, maintenance routines, calibration information, manual quality review flags, and inventory management.
Most current mesonets opt to store all their raw measurement data and then post-process the data for calculations and quality checks using automated programs and manual processing. Currently, a commercially available master software package does not exist to handle these processes for mesonets. Consequently, mesonet operators have had to build databases and software to handle all their data and convert them to the formats needed by their stakeholders. To code their database management systems, mesonet operators have used tools such as C++ and MySQL.
There are certain data file formats that work well for stakeholders with TV meteorology computer systems but are not suitable for other stakeholders. When a television station upgrades its server software, changes may be necessary when providing these stakeholders with their data products. Sometimes multiple data formats may need to be provided. Your system administrator may prefer one data format, but in reality, you may need to provide these stakeholders with four or five data formats. This situation is further complicated when you flag a suspicious variable that needs to be updated in multiple data products.
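One way to reduce the burden of updating a flag in several data products is to generate every format from a single set of records, so that a change is made once and propagates on the next export. The sketch below assumes CSV and JSON as two of the formats and uses hypothetical field names; actual stakeholder formats will differ.

```python
import csv
import io
import json

FIELDS = ["station", "time", "air_temp_c", "flag"]

def to_csv(records):
    """Render the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_json(records):
    """Render the same records as JSON text."""
    return json.dumps(records, indent=2)

# Single source of truth for the data products (hypothetical values).
records = [
    {"station": "STN001", "time": "2024-06-01T10:00:00Z",
     "air_temp_c": 21.4, "flag": "ok"},
]

# A reviewer changes the flag once; both products pick it up when regenerated.
records[0]["flag"] = "suspect"
csv_product = to_csv(records)
json_product = to_json(records)
print(csv_product)
print(json_product)
```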
Your processed data will need to be distributed to your stakeholders in various formats, depending on the agreed-upon data products that your mesonet is providing.
Data distribution can be achieved through either of two methods:
Popular data distribution mechanisms include FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol), and LDM (Local Data Manager), each of which communicates over designated network ports.
RSS (Rich Site Summary or Really Simple Syndication) feeds are another method of data distribution, but these feeds are more difficult to set up than the other distribution mechanisms.