Hi Doug,
Thanks for your comments. Below I address some of your points/questions from my perspective.
On Oct 18, 2013, at 1:31 PM, Doug Neuhauser <doug<at>seismo.berkeley.edu> wrote:
Comments on FDSN Web Service Specification version 1.1 RC
Doug Neuhauser, doug<at>seismo.berkeley.edu
2013/10/18
------------------------------------------------------------------------------
1. For time series and catalog searches, I think it would be preferable
to specify a half-open time interval:
[start_time ... end_time)
which means:
start_time <= interval < end_time
rather than
start_time < interval <= end_time
Otherwise, there is no good way to specify non-overlapping time intervals.
I agree that such functionality is important. Because this would affect every service, I suggest we discuss it in the context of the specification after 1.1.
The use case of non-overlapping intervals for the purposes of dovetailing the results makes sense. At the DMC, and I would guess more broadly, most requests are independent of other requests and are not destined to be combined with other results. In these cases an inclusive end time makes sense, so we really have two conflicting use cases.
Instead of changing the meaning of the endtime parameter, we might consider adding an alternate end time parameter defined as non-inclusive. For example, "endtimeexclusive" could be such an alternate end time specifier, used to define half-open intervals.
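To make the two use cases concrete, here is a minimal Python sketch of how a sample that falls exactly on the boundary between two adjacent request windows is treated. The function names are illustrative only, and treating the start time as inclusive in both cases is an assumption made for this example, not a statement about the specification.

```python
from datetime import datetime, timedelta

def in_inclusive(t, start, end):
    """Inclusive end time (assumed here): start <= t <= end."""
    return start <= t <= end

def in_half_open(t, start, end):
    """Exclusive end time, as a half-open interval: start <= t < end."""
    return start <= t < end

def count_matches(t, windows, contains):
    """Count how many request windows would return the sample at time t."""
    return sum(1 for start, end in windows if contains(t, start, end))

day = timedelta(days=1)
t0 = datetime(2013, 10, 18)

# Two adjacent one-day requests that share a boundary.
windows = [(t0, t0 + day), (t0 + day, t0 + 2 * day)]
boundary = t0 + day

inclusive_hits = count_matches(boundary, windows, in_inclusive)
half_open_hits = count_matches(boundary, windows, in_half_open)

print(inclusive_hits)  # 2: the boundary sample is returned by both requests
print(half_open_hits)  # 1: the boundary sample is returned exactly once
```

With an exclusive end time the two requests can be concatenated without de-duplication, which is the dovetailing case; for a single independent request the inclusive form returns the sample at the requested end time.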
2. For time specifications, I would recommend allowing ISO 8601 ordinal
dates, eg:
YYYY-DDD
YYYY-DDDTHH:MM:SS
YYYY-DDDTHH:MM:SS.ssssss
in addition to the ISO 8601 calendar dates.
These two time conventions are easily converted on either the server or the client end. In my experience, leaving the conversion to the client means that errors are discovered more quickly and have less significant impact. In this case I do not think the "syntactic sugar" is worth it. It would be interesting to hear perspectives from others on the working group.
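As an illustration of how small the client-side conversion is, here is a minimal Python sketch that turns the three ordinal forms listed above into calendar dates using only the standard library. The function name and the choice of output format are assumptions made for this example.

```python
from datetime import datetime

# Ordinal (day-of-year) layouts, tried from most to least specific.
ORDINAL_FORMATS = (
    "%Y-%jT%H:%M:%S.%f",
    "%Y-%jT%H:%M:%S",
    "%Y-%j",
)

def ordinal_to_calendar(text):
    """Convert an ISO 8601 ordinal date string to its calendar-date form."""
    for fmt in ORDINAL_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout did not match, try the next one
        if "T" not in text:
            return parsed.strftime("%Y-%m-%d")  # date only
        return parsed.isoformat()
    raise ValueError("not a recognized ordinal date: %r" % text)

print(ordinal_to_calendar("2013-291"))           # 2013-10-18
print(ordinal_to_calendar("2013-291T13:31:00"))  # 2013-10-18T13:31:00
```

A malformed value raises ValueError on the client before any request is sent, which is the kind of early failure I have in mind.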
3. Table 1 is incomplete.
"longestonly", "quality", "minimumlength" used in fdsn-dataselect do
not appear in Table 1.
Therefore, their types, allowable values, and defaults are never defined.
Thanks for the careful read; I will remedy that in the specification.
4. Text output (for fdsnws-event and fdsnws-station) is not well defined.
a. How do you include a vertical bar | in a field?
You do not; any vertical bars that exist in the metadata fields should be replaced with something else.
b. Is it permissible for any field (or all fields) to be empty?
For the DMC's services, yes.
c. In fdsnws-event, if format=text is specified, does this imply that
the following are invalid (or ignored)?
includeallorigins
includeallmagnitudes
includearrivals
For the DMC's services, yes.
I would not oppose adding a bit of clarification to the specification regarding these issues if no one objects.
While we should clarify format details that will cause incompatibilities, I recommend against trying to work out a very strict formalism or extensions to allow the text output to serve more use cases; the text output works very well for a large number of common use cases where simpler is better. If we need something more flexible and formal, containing information between the levels provided by the XML and text, I would recommend we spend our effort defining or adopting a JSON-based output.
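The text-format answers above (no vertical bars inside a field, empty fields allowed) can be sketched in a few lines of Python. The helper names, the choice of "/" as the replacement character, and the sample field values are all assumptions made for this example; the specification does not define them.

```python
def sanitize_field(value, replacement="/"):
    """Replace any vertical bar so it cannot be mistaken for a delimiter."""
    return value.replace("|", replacement)

def format_row(fields):
    """Join already-stringified fields into one '|'-delimited text line."""
    return "|".join(sanitize_field(field) for field in fields)

def parse_row(line):
    """Split a text line into fields; empty fields come back as ''."""
    return line.rstrip("\n").split("|")

# Hypothetical row with an empty third field and a bar in the last field.
row = format_row(["XX", "TEST", "", "Site A|B"])
fields = parse_row(row)

print(row)     # XX|TEST||Site A/B
print(fields)  # ['XX', 'TEST', '', 'Site A/B']
```

Because the delimiter never appears inside a field, a plain split is enough on the client side, which is the simplicity I would like the text output to keep.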
5. The meaning and use of "includeavailability" in fdsnws-station
is not well defined.
a. Does this mean include detailed waveform availability for each
channel in the specified time, or just a single min and max time for
any waveform in that time interval?
In StationXML the data availability extension can carry details of specific segments, extents (min/max), or both. The level of detail included is intentionally left up to the data center, because each data center has different capabilities for reporting these details.
b. Can the waveform starttime be before the time interval,
and can the waveform endtime be outside of the time interval?
Yes?
b. Do you output this info only if "level=channel" is specified?
Again, seems like a detail that might depend on data center capability and desires. StationXML is capable of reporting data availability at Channel, Station and Network levels.
c. Do you output this information for every discrete segment of
time series data for the channel?
Up to the data center.
There is currently no mechanism to request "degree of detail" for data availability information or "level at which to report". To properly address these questions I think we need either more parameters or more values for the existing 'includeavailability' parameter.
To sum up, the current specification leaves both the degree of detail and the level at which data availability is reported up to the data center. You are free to suggest extensions for consideration that would allow more explicit requests for availability.
At the DMC we currently only include data availability information at the channel level and then only the extents. Our ability to report data availability will evolve over time and we hope to provide more detailed information in the future. So we will be interested in refining this part of the specification, and would be happy to work with others on the best approach.
d. How do you represent this info when "format=text" is specified?
You do not.
6. What does "matchtimeseries" specify?
Need better definition of "where selection matches time series data
availability". Does this mean that you output a metadata segment
iff there is any time series data in that interval, or that you limit
the metadata interval to match each discrete time series segment?
Neither. The intent of matchtimeseries is: In addition to matching metadata based on selection criteria, the results will be further limited to metadata for which the criteria also match the existence of available time series data.
The idea is to prepare for a time series data request. For example, a client can request metadata for a number of channels intersecting a one hour range and, by specifying matchtimeseries=TRUE, has a reasonable expectation that time series data is available for the networks/stations/channels in the metadata returned.
I will clarify the original intent in the specification.
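As a rough sketch of that use case, the following Python builds a station metadata request for a one hour range with matchtimeseries enabled. The host name, the network and channel codes, and the helper name are placeholders invented for this example; only the parameter names come from the specification.

```python
from urllib.parse import urlencode

# Hypothetical service root; substitute a real data center's base URL.
STATION_QUERY = "http://service.example.org/fdsnws/station/1/query"

def station_query_url(network, channel, start, end, match_timeseries=True):
    """Build a channel-level metadata query, optionally limited to
    metadata for which time series data is reported as available."""
    params = {
        "network": network,
        "channel": channel,
        "starttime": start,
        "endtime": end,
        "level": "channel",
        "matchtimeseries": "TRUE" if match_timeseries else "FALSE",
    }
    return STATION_QUERY + "?" + urlencode(params)

url = station_query_url(
    "XX", "BHZ", "2013-10-18T00:00:00", "2013-10-18T01:00:00"
)
print(url)
```

The channels listed in the response could then be passed to a time series request for the same range, with a reasonable expectation that data will be returned.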
Chad
- Doug N
--
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 215 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)