Hi all,
Change proposal #6 to the 2016-3-30 straw man (iteration 1) is attached:
Change CRC to represent encoded rather than decoded data.
Please use this thread to provide your feedback on this proposal by
Wednesday, August 24th.
thanks,
Chad
----------------------
Posted to multiple topics:
FDSN Working Group II
(http://www.fdsn.org/message-center/topic/fdsn-wg2-data/)
FDSN Working Group III
(http://www.fdsn.org/message-center/topic/fdsn-wg3-products/)
Sent via IRIS Message Center (http://www.fdsn.org/message-center/)
Update subscription preferences at http://www.fdsn.org/account/profile/
Hi,
Yes, I have software that uses the last sample check,
I am inclined to agree with this. Back in the old days, the first and
last sample values were stored in the record, presumably so that after a
transmission error you could both check that the decompression was done
correctly and potentially decompress backwards in time to the point of
the error. In practice I think errors in the compression itself are near
zero, so this amounts to just a check on bit errors in transmission. In
that case, there is no benefit to computing the CRC over the decompressed
data, and a substantial speed advantage to computing it over the encoded
data, since a record can then be verified without decompressing it.
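To illustrate the difference between the two checks described above, here is a minimal Python sketch. It uses a toy first-difference encoding as a hypothetical stand-in for Steim compression (real Steim frames are more involved) and the standard CRC-32 from zlib (the thread does not say which CRC polynomial the proposal uses). The last-sample check has to decode the whole payload before it can compare anything, while the CRC is computed directly over the encoded bytes.

```python
import struct
import zlib

# Toy stand-in for Steim compression (hypothetical): the first sample followed
# by first differences, each stored as a big-endian 32-bit integer.
def encode_differences(samples):
    diffs = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]
    return struct.pack(f">{len(diffs)}i", *diffs)

def decode_differences(payload):
    # Integrate the differences back into sample values.
    diffs = struct.unpack(f">{len(payload) // 4}i", payload)
    samples, total = [], 0
    for d in diffs:
        total += d
        samples.append(total)
    return samples

def last_sample_ok(payload, stored_last):
    # Existing-style check: requires decoding every sample first.
    return decode_differences(payload)[-1] == stored_last

def encoded_crc_ok(payload, stored_crc):
    # Proposed-style check: CRC over the encoded bytes, no decoding needed.
    return zlib.crc32(payload) == stored_crc

samples = [100, 103, 99, 120, 118]
payload = encode_differences(samples)
stored_last, stored_crc = samples[-1], zlib.crc32(payload)

# Flip one bit in the second encoded difference (bytes 4-7).
corrupted = bytearray(payload)
corrupted[5] ^= 0x01
corrupted = bytes(corrupted)
```

Both checks catch this single-bit flip; the difference is that only the CRC can do so without running the decoder.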
However, I question whether this is even needed as part of the file
format. It adds complexity to data loggers, perhaps small, but not
zero. It makes sense as part of a transmission protocol or a file
system, but that is a separate issue. Perhaps it is cheap insurance,
but unless it is actively used by receiving software, it doesn't
really help. Perhaps the questions to ask are: does anyone use the
existing last sample check, and has anyone actually encountered real
miniSEED packets with errors? If this type of error doesn't actually
happen, then perhaps we should not add a fix for it.
I am not totally opposed to having a CRC in the format, but I feel that
its benefits vs. costs should be considered. There is value in
simplicity.
thanks
Philip
On Thu, Aug 11, 2016 at 3:48 PM, Chad Trabant <chad<at>iris.washington.edu> wrote:
On 08/11/2016 02:29 PM, Philip Crotwell wrote:
Perhaps a question to be asked is does anyone use the
existing last sample check and has anyone actually encountered real
miniseed packets with errors? If this type of error doesn't actually
happen, then perhaps we should not add in a fix for it.
Yes, I often use it.
Yes, I have found errors.
I use the "last sample vs. last decompressed sample" check
often, and it occasionally detects bad packets.
Most of the time the cause is a datalogger crash that writes a bad packet
to disk. However, with other dataloggers using SeedLink across
TCP/IP radio links, I have seen MiniSEED packet corruption, both in
the data AND in the headers. So TCP's checksum does not catch all multi-bit errors.
- Doug N
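The observation that TCP does not flag every multi-bit error is consistent with how TCP's checksum works: it is a 16-bit ones'-complement sum (RFC 1071), so, for example, it cannot detect two 16-bit words that arrive swapped. This Python sketch compares it with zlib's CRC-32, which is guaranteed to detect any error burst no longer than 32 bits. It is an illustration only, not a model of any particular radio link.

```python
import zlib

def internet_checksum(data):
    # 16-bit ones'-complement checksum in the style of RFC 1071 (as used by TCP).
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

original = bytes.fromhex("0102030405060708")
# A multi-bit error: the first two 16-bit words arrive swapped.
swapped = original[2:4] + original[0:2] + original[4:]

print("ones'-complement checksum detects it:",
      internet_checksum(original) != internet_checksum(swapped))
print("CRC-32 detects it:", zlib.crc32(original) != zlib.crc32(swapped))
```

Because addition is commutative, any reordering of 16-bit words leaves the ones'-complement sum unchanged, whereas a CRC depends on the position of every bit.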
------------------------------------------------------------------------
Doug Neuhauser University of California, Berkeley
doug<at>seismo.berkeley.edu Berkeley Seismological Laboratory
Office: 510-642-0931 221 McCone Hall # 4760
Fax: 510-643-5811 Berkeley, CA 94720-4760
Remote: 530-752-5615 (Wed,Fri)
On Aug 12, 2016, at 9:54 AM, Philip Crotwell <crotwell<at>seis.sc.edu> wrote:
OK, then I am in favor of including the CRC.
However, thinking more about this, I do not think it makes any
sense to protect only the data with the CRC. After all, an error in the
headers could be more damaging than one in the data. So perhaps it is
better to have the CRC cover the entire record, with the CRC
bytes assumed to be zero for the purposes of the calculation.
Philip
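A minimal Python sketch of this suggestion: a CRC over the entire record, computed with the CRC field treated as zero. The 4-byte field offset (28), the little-endian storage and the use of zlib's CRC-32 are all hypothetical choices for illustration; the thread does not fix any of them.

```python
import struct
import zlib

CRC_OFFSET = 28  # hypothetical position of a 4-byte CRC field in the header

def record_crc(record, crc_offset=CRC_OFFSET):
    # CRC-32 over the whole record (header and data) with the CRC field zeroed.
    zeroed = bytearray(record)
    zeroed[crc_offset:crc_offset + 4] = bytes(4)
    return zlib.crc32(zeroed)

def set_crc(record, crc_offset=CRC_OFFSET):
    # Writer side: compute the CRC and store it in place (little-endian, assumed).
    struct.pack_into("<I", record, crc_offset, record_crc(record, crc_offset))

def crc_matches(record, crc_offset=CRC_OFFSET):
    # Reader side: recompute with the field zeroed and compare to the stored value.
    (stored,) = struct.unpack_from("<I", record, crc_offset)
    return stored == record_crc(record, crc_offset)

record = bytearray(b"HEADER--" + bytes(range(56)))  # 64-byte toy record
set_crc(record)

header_error = bytearray(record)
header_error[3] ^= 0x40  # corrupt a header byte, outside the data section
```

Because the CRC field is zeroed during the calculation, the field can sit anywhere in the record without restricting what the CRC covers, and a header error is caught just like a data error.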
On Thu, Aug 11, 2016 at 7:19 PM, Doug Neuhauser
<doug<at>seismo.berkeley.edu> wrote:
Hi,
I agree with the rationale for this proposal. Even though we lose the ability to validate that the data were properly decoded, that validation is not well defined anyway, given byte order differences: a CRC over decoded samples depends on the byte order in which they are represented.
A further consideration is whether we should move the CRC near the top of the header and include as much of the header as possible in the CRC, along with the data.
Chad
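The byte-order point can be seen directly: a CRC over decoded samples depends on how those samples are serialized before the CRC is computed. In this Python sketch (with zlib's CRC-32 as an assumed stand-in for whatever CRC the proposal specifies), the same decoded values in general give different CRCs in big- and little-endian order, whereas a CRC over the encoded record bytes has no such ambiguity.

```python
import struct
import zlib

def crc_of_decoded(samples, byte_order):
    # Serialize decoded 32-bit samples in the given byte order ('>' big, '<' little),
    # then CRC the resulting bytes.
    return zlib.crc32(struct.pack(f"{byte_order}{len(samples)}i", *samples))

decoded = [1, -2, 300, 40000]
big = crc_of_decoded(decoded, ">")
little = crc_of_decoded(decoded, "<")
print(f"big-endian CRC: {big:#010x}, little-endian CRC: {little:#010x}")
```

A specification that kept the CRC over decoded data would therefore also have to define a canonical serialization of the samples.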
I have one thought about why CRC might usefully only be over data:
But the main point is that because the location of the checksum in the
header does not determine what can be part of the checksum, I would
argue that everything should be included (why not?), and we do not
need to be overly concerned about the location of the CRC in the
header. For efficiency reasons, you might actually prefer it to be
after the data, to allow it to be computed and stored in a streaming mode.
I am not sure that complication is worth it, but moving the CRC to the
final 4 bytes of the record might be worth thinking about.
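The streaming argument can be sketched in Python as follows: if the CRC occupies the final 4 bytes, a writer can update a running CRC as each chunk is produced and append the result at the end, with no need to seek back into the header. This relies on the optional starting-value argument of zlib.crc32; the trailer layout (little-endian, last 4 bytes) is a hypothetical choice.

```python
import struct
import zlib

def write_with_trailing_crc(chunks):
    # Update a running CRC as each chunk is produced (single pass), then append
    # it as the final 4 bytes (little-endian, a hypothetical layout).
    out = bytearray()
    crc = 0
    for chunk in chunks:
        out += chunk
        crc = zlib.crc32(chunk, crc)
    out += struct.pack("<I", crc)
    return bytes(out)

def trailing_crc_matches(record):
    # Reader side: CRC everything except the last 4 bytes and compare.
    body, trailer = record[:-4], record[-4:]
    (stored,) = struct.unpack("<I", trailer)
    return zlib.crc32(body) == stored

record = write_with_trailing_crc([b"header", b"payload-1", b"payload-2"])
```

The running CRC equals the CRC of the concatenated chunks, so the reader needs no knowledge of how the writer split the record.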
It may also be wise to allow the CRC not to be set for cases where
computational speed is more important and errors are unlikely, such as
a read-modify-write cycle on local disk. Setting the CRC to zero is probably
sufficient, but there is a possibility that a real CRC could actually
end up being zero; at one chance in 2^32, that is likely small enough
not to worry about.
We already have ~100TB of archived data: ~2^46 bytes. If they were
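Two Python sketches for this last point. The first shows the "zero means not set" convention discussed here, including the case where a genuine CRC of zero is indistinguishable from an unset one. The second is a back-of-the-envelope estimate of how often that happens, assuming CRC values are uniformly distributed over 2^32 possibilities, an archive of ~100 TB (the figure quoted above) and, hypothetically, 512-byte records. zlib's CRC-32 again stands in for whatever CRC the proposal specifies.

```python
import zlib

CRC_NOT_SET = 0  # convention discussed in the thread: 0 means "no CRC computed"

def check_record(body, stored_crc):
    # A genuine CRC that happens to equal 0 is also reported as "unchecked".
    if stored_crc == CRC_NOT_SET:
        return "unchecked"
    return "ok" if zlib.crc32(body) == stored_crc else "corrupt"

def expected_zero_crc_records(archive_bytes, record_length):
    # Number of records times the 1-in-2**32 chance that a record's CRC is 0.
    return (archive_bytes / record_length) / 2**32

estimate = expected_zero_crc_records(100e12, 512)
print(f"~{estimate:.0f} records expected to carry a genuine CRC of 0")
```

With 512-byte records this comes to roughly 45 records, each of which would simply go unchecked rather than be rejected; larger records reduce the count proportionally.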