current networks and eight-bit data

Michael Sperberg-McQueen
Glenn Adams, in his very pertinent remarks, argues that current networks
do support eight-bit data transfer reliably.  He points out that he is
omitting Bitnet (and presumably also EARN and NetNorth) from
consideration; less obvious is that we appear to be speaking of
different kinds of reliability.

On the first question (what is a network?), we run into a basic policy
question.  How much attention shall we pay to the technically advanced
and capacious portions of our infrastructure, and how much to the
technologically more conservative portions?  If eight-bit data really
does travel reliably on the Internet (but that's not my experience --
see below) but not on Bitnet/NetNorth and EARN, how much attention need
we pay to the less-reliable system?  Can we ignore it to simplify some
technical questions, at the risk of making life more complicated for
some users?  Or must we pay attention to it, at the risk of tying
ourselves firmly to a long series of what now look like mistakes in
the history of data processing?

This is not a simple question.  But I (personally) believe that we
cannot ignore the current state of affairs on Bitnet and EARN, at the
Bitnet/EARN-to-Janet gateway, and at other gateways between ASCII- or
ISO-based and EBCDIC-based networks, merely because they have problems
which do not exist on some smaller subset of our international
networks.  We ought not to ignore Bitnet or Janet, because a
significant fraction of our constituency works with them as their
primary network communication tool.  We ought not to ignore the
current state of PCs and Macs, because they are widely used too.  Nor
should Unix systems be ignored; they, too, are widely used.  But I
have never had the
pleasure of seeing any of these systems come out of the box supporting
*any* standard 8-bit character set, let alone reliable interchange to
systems which may use a different character set.  ISO character set
standards do often ignore the existence of non-conforming systems --
that is one contributing cause of the pervasive feeling of unreality one
gets while reading the relevant standards.

Since it is frequently those with the least access to the expertise of
network gurus who have the least sophisticated systems, to ignore the
7-bit base of most current systems would be to abandon those who most
need help.  (It is arguable that the help for 7-bit environments ought
to be separated out from the Guidelines per se and put into a separate
document with special recommendations for the current state of
affairs.  Maybe it should.  That would at least allow the Guidelines
themselves to be a little less dated by current technical
deficiencies.)
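
(To see what a 7-bit path does to 8-bit data, here is a minimal
sketch, in Python notation purely for illustration; the byte values
are from ISO 8859-1:)

    # A link that passes only seven bits per byte silently drops the
    # high bit.  Under ISO 8859-1, 0xE9 is e-acute; stripped to
    # seven bits it becomes 0x69, a plain "i".
    data = "résumé".encode("iso8859-1")       # b'r\xe9sum\xe9'
    stripped = bytes(b & 0x7F for b in data)  # high bit dropped in transit
    print(stripped.decode("ascii"))           # prints "risumi"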

But let's continue to the second question.  What is meant by reliable
data interchange?  I mean this:

    Data interchange (of 7-bit, 8-bit, n-bit data) is reliable
    over a given link if, on their arrival, all characters are
    interpreted on the target system as they were interpreted on
    the source system, without special or heroic measures being
    necessary on the part of those engaging in the interchange.
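
In code terms the criterion can be sketched like this (a hypothetical
check, again in Python notation; nothing of the sort exists in the
software under discussion):

    # Interchange of a byte stream is "reliable" in the above sense
    # if interpreting it under the source system's character set and
    # under the target system's character set yields the same
    # characters.
    def interchange_is_reliable(data, source_charset, target_charset):
        return data.decode(source_charset) == data.decode(target_charset)

    # interchange_is_reliable(b"\xe9", "iso8859-1", "iso8859-2") -> True
    #   (0xE9 is e-acute in both character sets)
    # interchange_is_reliable(b"\xf1", "iso8859-1", "iso8859-2") -> False
    #   (0xF1 is n-tilde in 8859-1 but n-acute in 8859-2)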

I haven't found this level of support in the TCP/IP implementations I use
today, and I'm told by network gurus that the specifications for Telnet
require support for ASCII and do *not* define a full eight-bit data
path.  FTP does support eight-bit data (and wider, if I understand it
correctly), if you remember to ask for it, but since it has no support
for character-set labeling, there is no way short of writing special
software to ensure that when an ISO 8859-1 machine sends data to an ISO
8859-2 machine, it is received correctly.  (Yes, the number of bits is
correct, the binary value of each byte is correct, but it's not the same
*character*.)  Writing your own telecommunications software, enjoyable
though it may be for some of us, should probably be classed as a special
or heroic measure.
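
To make the failure concrete (a minimal sketch; the codec names are
Python's, the byte values are from the published 8859 code tables):

    # FTP delivers the bytes intact but carries no label saying which
    # characters they stand for.  Byte 0xA3 is POUND SIGN in ISO
    # 8859-1 but L WITH STROKE in ISO 8859-2; 0xB1 is PLUS-MINUS
    # SIGN in 8859-1 but a-with-ogonek in 8859-2.
    payload = bytes([0xA3, 0xB1])
    print(payload.decode("iso8859-1"))   # what the sending system meant
    print(payload.decode("iso8859-2"))   # what the receiving system sees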

Some people in the wider network and dp community are aware of these
problems (I should mention the dedicated work of the Kermit people in
adding 8859 support to Kermit, and of course the work of the standards
committees, the IBM user groups SHARE and SEAS, and others), so maybe we
will see solutions eventually.  But glancing around at existing software
and networks, who would feel like assuming that a perfect solution made
available today could be universally implemented in under ten years'
time?

The sad fact of the matter is that 7-bit data streams appear *not* to
be part of the past for many of us, nor just a deplorable fact of the
present.  For a great many people, I submit, 7-bit data streams,
national character sets, and the like are part of the future and will be
for some time.  Unless we feel like telling them "you should get a
*real* system, one that supports [fill in blank]," it is premature to
assume a well-behaved network environment.

As a technical footnote, I should observe that Bitnet does support
eight-bit data transfer from node to node in the sense apparently
intended by Glenn Adams -- many people successfully ship load modules
over the net without mishap.  There are problems, but they occur outside
the network proper (mostly when translating from the EBCDIC form used
for network transfer to a local ASCII form, or vice versa).  Problems of
character re-interpretation after crossing national boundaries, of
course, remain and will remain until most of our file systems and most
of our networks have the capacity to support character-set labeling
of files, and most of our systems actually use that capacity.
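
(The kind of translation mishap I mean, in miniature: a sketch using
two of IBM's EBCDIC code pages, with Python's names for them.)

    # The same byte means different characters under different EBCDIC
    # translation tables.  Byte 0x4A is CENT SIGN in code page 037
    # (US) but LEFT SQUARE BRACKET in code page 500 (International);
    # translate a file with the wrong table and every "[" comes out
    # as a cent sign.
    byte = b"\x4a"
    print(byte.decode("cp037"))   # cent sign
    print(byte.decode("cp500"))   # left square bracket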

-Michael Sperberg-McQueen
 ACH / ACL / ALLC Text Encoding Initiative
 University of Illinois at Chicago