INFO: Structure of .ARC files
From: Doug Wokoun (aa384@cleveland.Freenet.Edu)
Date: 07/03/90-12:00:51 PM Z
(ARCINF.TXT)
ARC-FILE.INF, created by Keith
Petersen, W8SDZ, 21-Sep-86, extracted
from UNARC.INF by Robert A. Freed.
From: Robert A. Freed
Subject: Technical Information for ARC files
Date: June 24, 1986
Note: In the following discussion,
UNARC refers to my CP/M-80 program for
extracting files from MSDOS ARCs. The
definitions of the ARC file format are
based on MSDOS ARC512.EXE.
ARCHIVE FILE FORMAT
-------------------
Component files are stored
sequentially within an archive. Each
entry is preceded by a 29-byte header,
which contains the directory
information. There is no wasted space
between entries. (This is in contrast
to the centralized directory used by
Novosielski libraries. Although random
access to subfiles within an archive
can be noticeably slower than with
libraries, archives do have the
advantage of not requiring
pre-allocation of directory space.)
Archive entries are normally
maintained in sorted name order. The
format of the 29-byte archive header
is as follows:
Byte 1: 1A Hex.
  This marks the start of an archive
  header. If this byte is not found
  when expected, UNARC will scan
  forward in the file (up to 64K bytes)
  in an attempt to find it (followed by
  a valid compression version). If a
  valid header is found in this manner,
  a warning message is issued and
  archive file processing continues.
  Otherwise, the file is assumed to be
  an invalid archive and processing is
  aborted. (This is compatible with
  MS-DOS ARC version 5.12.) Note that a
  special exception is made at the
  beginning of an archive file, to
  accommodate "self-unpacking" archives
  (see below).
Byte 2: Compression version, as follows:
  0 = end of file marker (remaining
      bytes not present)
  1 = unpacked (obsolete)
  2 = unpacked
  3 = packed
  4 = squeezed (after packing)
  5 = crunched (obsolete)
  6 = crunched (after packing)
      (obsolete)
  7 = crunched (after packing, using
      faster hash algorithm) (obsolete)
  8 = crunched (after packing, using
      dynamic LZW variations)
Bytes 3-15: ASCII file name,
nul-terminated.
(All of the following numeric values
are stored low-byte first.)
Bytes 16-19: Compressed file size in
bytes.
Bytes 20-21: File date, in 16-bit
  MS-DOS format:
  Bits 15:9 = year - 1980
  Bits 8:5 = month of year
  Bits 4:0 = day of month
  (All zero means no date.)
Bytes 22-23: File time, in 16-bit
  MS-DOS format:
  Bits 15:11 = hour (24-hour clock)
  Bits 10:5 = minute
  Bits 4:0 = second/2 (not displayed
             by UNARC)
Bytes 24-25: Cyclic redundancy check
(CRC) value (see below).
Bytes 26-29: Original (uncompressed)
  file length in bytes.
  (This field is not present for
  version 1 entries, byte 2 = 1. I.e.,
  in this case the header is only 25
  bytes long. Because version 1 files
  are uncompressed, the value normally
  found in this field may be obtained
  from bytes 16-19.)
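
As an illustration only, the header layout above might be decoded
along these lines in Python. The names ArcHeader, parse_header and
decode_dos_datetime are invented for this sketch and do not come from
ARC or UNARC; it also assumes the archive is already held in memory
as a bytes object.

```python
import struct
from typing import NamedTuple, Optional

class ArcHeader(NamedTuple):   # hypothetical name, not from ARC/UNARC
    version: int               # byte 2, compression version
    name: str                  # bytes 3-15, up to the first NUL
    packed_size: int           # bytes 16-19
    date: int                  # bytes 20-21, MS-DOS date word
    time: int                  # bytes 22-23, MS-DOS time word
    crc: int                   # bytes 24-25
    length: int                # bytes 26-29 (packed_size for version 1)
    header_size: int           # 29, or 25 for version 1

def parse_header(buf: bytes, pos: int = 0) -> Optional[ArcHeader]:
    """Decode one header at buf[pos:]; None means end-of-file marker."""
    if buf[pos] != 0x1A:
        raise ValueError("archive header marker (1A hex) not found")
    version = buf[pos + 1]
    if version == 0:
        return None            # remaining bytes are not present
    # "<" = low-byte first, no padding; 13s = name, I = 32 bits, H = 16 bits
    raw_name, packed, date, time, crc = struct.unpack_from(
        "<13sIHHH", buf, pos + 2)
    name = raw_name.split(b"\0", 1)[0].decode("ascii")
    if version == 1:
        # 25-byte header: the original length equals the stored size
        return ArcHeader(version, name, packed, date, time, crc, packed, 25)
    (length,) = struct.unpack_from("<I", buf, pos + 25)
    return ArcHeader(version, name, packed, date, time, crc, length, 29)

def decode_dos_datetime(date: int, time: int):
    """Split the 16-bit date and time words into their bit fields."""
    if date == 0:
        ymd = None             # all zero means no date
    else:
        ymd = (1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F)
    hms = (time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)
    return ymd, hms
```

Because entries are stored back to back, the next header would be
expected at pos + header_size + packed_size.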
SELF-UNPACKING ARCHIVES
-----------------------
A "self-unpacking" archive is one
which can be renamed to a .COM file
and executed as a program. An example
of such a file is the MS-DOS program
ARC512.COM, which is a standard
archive file preceded by a three-byte
jump instruction. The first entry in
this file is a simple "bootstrap"
program in uncompressed form, which
loads the subfile ARC.EXE (also
uncompressed) into memory and passes
control to it. In anticipation of a
similar scheme for future distribution
of UNARC, the program permits up to
three bytes to precede the first
header in an archive file (with no
error message).
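
The search for a header described above (a silent allowance of up to
three leading bytes at the start of the file, otherwise a scan of up
to 64K bytes with a warning) could be sketched as follows. This is
not UNARC's code: the function and constant names are hypothetical,
and "valid compression version" is assumed here to mean any value
from 0 through 8, the versions listed in this document.

```python
MARKER = 0x1A
MAX_VERSION = 8          # assumption: highest version listed above
SCAN_LIMIT = 64 * 1024   # "up to 64K bytes"
MAX_PREFIX = 3           # bytes tolerated before the first header

def looks_like_header(buf: bytes, pos: int) -> bool:
    """True if buf[pos] is the marker, followed by a known version."""
    return (pos + 1 < len(buf)
            and buf[pos] == MARKER
            and buf[pos + 1] <= MAX_VERSION)

def find_header(buf: bytes, pos: int):
    """Return (offset, warn) for the next header at or after pos.

    warn is True when bytes were skipped and a warning would be due;
    offset is None when no header turns up within the scan limit.
    """
    for skipped in range(SCAN_LIMIT + 1):
        if looks_like_header(buf, pos + skipped):
            # At the very start, up to three leading bytes pass silently.
            quiet = skipped == 0 or (pos == 0 and skipped <= MAX_PREFIX)
            return pos + skipped, not quiet
        if pos + skipped >= len(buf):
            break
    return None, False
```

For example, a file that begins with a three-byte jump instruction
followed by a normal header gives an offset of 3 and no warning.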
CRC COMPUTATION
---------------
Archive files use a 16-bit cyclic
redundancy check (CRC) for error
control. The particular CRC polynomial
used is x^16 + x^15 + x^2 + 1, which
is commonly known as "CRC-16" and is
used in many data transmission
protocols (e.g. DEC DDCMP and IBM
BSC), as well as by most floppy disk
controllers. Note that this differs
from the CCITT polynomial (x^16 + x^12
+ x^5 + 1), which is used by the
XMODEM-CRC protocol and the public
domain CHEK program (although these do
not adhere strictly to the CCITT
standard). The MS-DOS ARC program does
perform a mathematically sound and
accurate CRC calculation. (We mention
this because it contrasts with some
unfortunately popular public domain
programs we have witnessed, which from
time immemorial have based their
calculation on an obscure magazine
article which contained a
typographical error!)
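
A minimal bit-by-bit sketch of a CRC with this polynomial is given
below. The text above names only the polynomial; the bit order (least
significant bit first, hence the reversed constant A001 hex) and the
initial value of zero are assumptions here, chosen to match the
parameter set commonly catalogued as "CRC-16/ARC". The function name
is hypothetical.

```python
POLY_REFLECTED = 0xA001   # x^16 + x^15 + x^2 + 1, bit order reversed

def crc16(data: bytes, crc: int = 0) -> int:
    """CRC-16 over data, LSB first; pass a previous result to continue."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc

# Standard check string for this parameter set
print(hex(crc16(b"123456789")))   # 0xbb3d
```

A table-driven version would be faster; the loop form is used only
because it shows the polynomial division most directly.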
Additional note (while we are on the
subject of CRC's): The validity of
using a 16-bit CRC for checking an
entire file is somewhat questionable.
Many people quote the statistics
related to these functions (e.g. "all
two-bit errors, all single burst
errors of 16 or fewer bits, 99.997% of
all single 17-bit burst errors,
etc."), without realizing that these
claims are valid only if the total
number of bits checked is less than
32767 (which is why they are used in
small-packet data transmission
protocols). I.e., for file sizes in
excess of about 4K bytes, a 16-bit CRC
is not really as good as what is often
claimed. This is not to say that it is
bad, but there are more reliable
methods available (e.g. the 32-bit
AUTODIN-II polynomial). (End of
lecture!)
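
The length limit can be demonstrated directly. The CRC-16 polynomial
factors as (x + 1)(x^15 + x + 1) and therefore divides x^32767 + 1,
so two flipped bits exactly 32767 bit positions apart cancel each
other. The sketch below builds such a pair in a 4096-byte block. It
assumes the same conventions as the usual ARC-style CRC-16 (least
significant bit first, initial value zero); the function names are
hypothetical.

```python
def crc16(data: bytes) -> int:
    """CRC-16 (x^16 + x^15 + x^2 + 1), LSB first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def flip_two_bits(data: bytes, last_byte: int) -> bytes:
    """Flip the first bit of data and the top bit of data[last_byte]."""
    damaged = bytearray(data)
    damaged[0] ^= 0x01           # bit position 0 (LSB-first order)
    damaged[last_byte] ^= 0x80   # bit position last_byte * 8 + 7
    return bytes(damaged)

original = bytes(range(256)) * 16          # 4096 bytes of sample data
far = flip_two_bits(original, 4095)        # bits 32767 positions apart
near = flip_two_bits(original, 4094)       # bits 32759 positions apart

print(crc16(far) == crc16(original))       # True: error goes undetected
print(crc16(near) == crc16(original))      # False: error is caught
```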
Bob Freed
62 Miller Road
Newton Centre, MA 02159
Telephone (617) 332-3533