Comments on the FDSN Proposed Data Quality Metrics:
1. In several places the time window is stated as
[t0,t1]
but the description then goes on to define this as a half-open
(or half-closed) interval that does not include t1.
The proper international notation (ISO 31-11) for this half-open interval is:
[t0,t1)
See:
https://en.wikipedia.org/wiki/ISO_31-11
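As an illustration of why the notation matters, here is a minimal Python sketch of a half-open window test. The function name `in_window` and the use of seconds as plain floats are assumptions made for this example, not part of the proposal.

```python
def in_window(t, t0, t1):
    """Return True if time t lies in the half-open interval [t0, t1)."""
    return t0 <= t < t1

# Two adjacent one-day windows, in seconds. The shared boundary time
# belongs to the second window only, so no sample is counted twice.
day1 = (0.0, 86400.0)
day2 = (86400.0, 172800.0)
boundary = 86400.0

in_day1 = in_window(boundary, *day1)   # False
in_day2 = in_window(boundary, *day2)   # True
```

With a closed interval [t0,t1], the boundary time would fall in both windows.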
2. I believe that the metric
ms_timing_correction
should be defined as:
Number of records fitting time window [t0,t1) in which the MiniSEED
"time correction" field is non-zero.
The reason for this is that a time correction can be made to the record
by having a non-zero value in the timing correction field, but NOT added
into the "Record start time", in which case bit 1 of the Activity Flag
will be set to 0. I believe my proposed definition handles both cases.
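A minimal Python sketch of the proposed definition, compared with a count based only on bit 1 of the Activity Flag. The `Record` class and its field names are hypothetical, and "fitting the time window" is assumed here to mean that the record start time lies in [t0,t1); the proposal may define it differently.

```python
from dataclasses import dataclass

@dataclass
class Record:
    start: float          # record start time in seconds (hypothetical representation)
    time_correction: int  # FSDH "time correction" field, in units of 0.0001 s
    activity_flags: int   # FSDH activity flags; bit 1 set = correction applied

def ms_timing_correction(records, t0, t1):
    """Count records starting in [t0, t1) whose time correction field is non-zero."""
    return sum(1 for r in records if t0 <= r.start < t1 and r.time_correction != 0)

def count_by_activity_flag(records, t0, t1):
    """Count records starting in [t0, t1) that have bit 1 of the Activity Flag set."""
    return sum(1 for r in records if t0 <= r.start < t1 and r.activity_flags & 0b10)

# Made-up records: the third has a correction that was NOT applied (bit 1 = 0).
records = [
    Record(10.0, 0, 0b00),
    Record(20.0, 5, 0b10),
    Record(30.0, -3, 0b00),
    Record(100.0, 7, 0b10),   # starts at t1, so it is outside [0, 100)
]

by_field = ms_timing_correction(records, 0.0, 100.0)    # 2
by_flag = count_by_activity_flag(records, 0.0, 100.0)   # 1
```

The flag-based count misses the record whose correction is present in the field but has not been added to the record start time.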
3. I suspect that computing metrics for gaps and overlaps using a time
tolerance epsilon of 0 is overkill and may not convey useful information
depending on the timestamp resolution. "Standard" MiniSEED had a time
resolution of 0.0001 seconds, but the addition of blockette 1001
adds resolution to 1 microsecond, 0.000001 seconds. Since ms_timing_quality
uses values from the blockette 1001, I assume that the timestamp
of the time series will also use the microsecond resolution (usec99) of
the blockette 1001 if it is available.
The number of gaps and overlaps will vary depending on the precision
of the timestamp. Do users really care if there is dithering in the
1 microsecond timestamp that may occur when maintaining a phase-locked loop?
Even in a closely temperature-controlled environment I see the microseconds
field changing by ±2 microseconds during a 24 hour period.
Data streams that have timestamps of 1 microsecond resolution
will certainly have a larger number of reported gaps and overlaps
than streams that use the standard 0.0001 MiniSEED clock resolution.
However, if we do NOT use an epsilon of 0, we need to standardize
on what value to use, and the value may be sample-rate dependent.
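To make the dependence on epsilon concrete, here is a minimal Python sketch. It assumes a gap or overlap is judged by comparing each record's start time with the expected start time derived from the previous record (start + nsamples/rate); the proposal's exact definition may differ. The function name, the tuple layout, and the timestamps are all made up for illustration.

```python
def count_gaps_overlaps(records, eps_us=0):
    """Count gaps and overlaps between consecutive records.

    records: time-ordered list of (start_us, nsamples, rate_hz),
             with start times in integer microseconds.
    eps_us:  time tolerance in microseconds.
    """
    gaps = overlaps = 0
    for (start, n, rate), (next_start, _, _) in zip(records, records[1:]):
        expected = start + round(n * 1_000_000 / rate)
        delta = next_start - expected
        if delta > eps_us:
            gaps += 1
        elif delta < -eps_us:
            overlaps += 1
    return gaps, overlaps

# Four 1-second records at 100 Hz whose start times dither by a few microseconds.
records = [
    (0, 100, 100),
    (1_000_002, 100, 100),
    (1_999_999, 100, 100),
    (3_000_000, 100, 100),
]

strict = count_gaps_overlaps(records, eps_us=0)      # (2, 1)
relaxed = count_gaps_overlaps(records, eps_us=100)   # (0, 0)
```

With epsilon = 0 every boundary in this continuous stream is reported as a gap or an overlap; with epsilon = 100 microseconds (the standard 0.0001 s MiniSEED resolution) none is.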
4. Many of the metrics are defined as "the number of records"
that match some criteria. Is this a meaningful value, given that the
record sizes of data streams are not identical? Common MiniSEED record
sizes are 512, 1024, 2048, and 4096 bytes, and even with a
given record size, the number of samples may vary by a factor of
6 or more depending on the type of compression. Would we do better
to compute these metrics based on the number of seconds of data
or the number of data samples rather than the number of MiniSEED records?
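A small Python sketch of the difference. The function names and the sample counts per record are hypothetical, chosen only to show how the same flagged data yields different record counts under different packing.

```python
def flagged_counts(records):
    """records: list of (nsamples, flagged) pairs.

    Returns (number of flagged records, number of flagged samples).
    """
    n_records = sum(1 for _, flagged in records if flagged)
    n_samples = sum(n for n, flagged in records if flagged)
    return n_records, n_samples

def flagged_seconds(records, rate_hz):
    """Duration of flagged data in seconds at a constant sample rate."""
    return flagged_counts(records)[1] / rate_hz

# The same 3600 flagged samples at 100 Hz, packed two different ways.
small_records = [(400, True)] * 9    # e.g. short records, 400 samples each
large_records = [(3600, True)]       # e.g. one long record

small = flagged_counts(small_records)   # (9, 3600)
large = flagged_counts(large_records)   # (1, 3600)
```

The record counts differ by a factor of 9, while the sample count (3600) and the duration (36 s at 100 Hz) are the same for both streams.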
- Doug N
On 11/24/2015 11:16 AM, Dan Auerbach wrote:
Not sure I follow all of the potential subtleties, but I support the
idea that the Quality Code of a time-series for which metrics have
been computed should be known. While an end-user/consumer may not care
(much) about the Quality Code if they are using the metrics to
evaluate data quality, for Data Centers that perform QC, and produce
Q data, it is certainly helpful to know whether metrics have been
computed using Q versus R.
Best,
Dan
Dan Auerbach, Application Developer
Project IDA Data Coordinating Center, Rm 2120
Institute of Geophysics & Planetary Physics, MS 0225
Scripps Institution of Oceanography, UC San Diego
La Jolla, CA 92093
858-822-0797
On Nov 24, 2015, at 10:30 AM, Florian Haslinger <florian.haslinger<at>sed.ethz.ch> wrote:
Hi all,
it took me a little while to figure out (I think) what Rick meant:
the Quality Code (D/R/Q/M) would allow one to distinguish between the
same time-series streams that are quality checked vs those that are
not. In that line, any stream for which the quality metrics under
discussion were computed would qualify as ‘checked’ (‘Q’), correct?
(or would it be ‘M’ - because the data center had added the quality
metrics?)
Could this interpretation be extended such that a Quality Code of Q
(or M, see above) would mean that the FDSN agreed quality metrics
are available on that data stream (segment)?
One further comment / question to the proposal: In the definition
of ‘Continuous time series’ (p7 bottom)
a) there is a typo 'ε is the time tolerance is s'
should probably read 'ε is the time tolerance in s'
b) the addition of the ‘0’ default tolerance might be impractical?
Given that we usually qualify digitizers to have a timing accuracy
of ~10e-4 / 10e-5, this in principle translates into a possible
deviation of the actual from the nominal (epsilon not zero) - even
though it may be practically quite impossible to determine
precisely. But if one takes the ‘default 0’ seriously, almost all
adjacent time series should have a ‘Gap’ between them?
Further, there seems to be no way of defining a (non-default) value for epsilon?
kind regards,
florian
On 24 Nov 2015, at 18:31, Rick Benson <rick<at>iris.washington.edu> wrote:
Hello WG-II
I have only 1 comment/suggestion, namely adding one additional
field to properly distinguish “unique” time series records
in the SEED domain.
A “time series” is defined on the opening page as being:
“to belong to a data stream uniquely identified by a SEED
network code, stations code, channel code and location code.”
However, I think that the Quality code should be included from
field 2 of the FSDH that’s been around since April 2004, so that
there is NO ambiguity.
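The ambiguity can be sketched in a few lines of Python. The `StreamKey` type, the station codes, the metric name `num_gaps`, and the values are all hypothetical; the point is only that two versions of the same stream (R and Q) collide unless the quality code is part of the identifier.

```python
from collections import namedtuple

# Identifier with the quality code as an extra field (hypothetical layout).
StreamKey = namedtuple("StreamKey", "network station location channel quality")

metrics = {}
metrics[StreamKey("XX", "TEST", "00", "BHZ", "R")] = {"num_gaps": 3}
metrics[StreamKey("XX", "TEST", "00", "BHZ", "Q")] = {"num_gaps": 0}

# Dropping the quality code leaves a single key, so one entry overwrites the other.
without_quality = {key[:4]: value for key, value in metrics.items()}

n_with = len(metrics)              # 2
n_without = len(without_quality)   # 1
```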
Thank you,
Rick
Dear WG-II members,
I believe this email (early November) never reached you. Please find it again.
Please find the proposal for the definition of FDSN waveform quality metrics as suggested during
the Prague meeting. After exploring the systems in place and/or in development at IRIS DMC and
ORFEUS EIDA there are a number of basic metrics in common. The attached document describes the
proposed metrics, where the green highlighted text refers to what are, in my opinion, differences between
the two systems and/or definitions that require agreement.
I believe the 2 systems are pretty close but some details must be defined slightly better and agreed upon.
Looking forward to your feedback.
Cheers,
Reinoud
<Proposal definition QC metrics.pdf>
----------------------
FDSN Working Group II (http://www.fdsn.org/message-center/topic/fdsn-wg2-data/)
Sent via IRIS Message Center (http://www.fdsn.org/message-center/)
Update subscription preferences at http://www.fdsn.org/account/profile/
================
Rick Benson
Director of Data Management
IRIS DMC
(206)547-0393 ext. 119 (office)
rick<at>iris.washington.edu
--
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 221 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)