Background
DNC fileservers are required to quickly retrieve CNC programs from
archives containing many thousands of files, and transmit these files to
the requesting CNC machines. As DNC fileservers evolved through the
1980s, the underlying operating systems offered varying degrees of file
system performance, sometimes resulting in unacceptably slow file
retrieval.
In our experience, the worst-performing file system is the "FAT-16"
file system used by MS-DOS. The MS-DOS file system imposes enormous
access-time penalties as the number of files in a directory increases,
sometimes to the point of unusability.
The purpose of this Technical Note is to describe the high-performance
file management capabilities of the UltraServer, and to recommend a
strategy for employing these capabilities.
UltraServer File Management Capabilities
Part 1: NTFS for files stored on the server
If the user chooses to warehouse CNC part program files on the DNC
UltraServer platform itself, the system is capable of handling large
numbers of files in a single directory. A complete description of NTFS'
ability to handle large numbers of files is beyond the scope of this
technical note, but a comparison to FAT-16 may prove instructive.
Under FAT-16, each directory record can contain up to 512 filenames.
When the 513th file is added to a FAT-16 directory, a new directory
record must be linked to the first, and the entry is written there. When
the system must search for the 513th file, it must first scan all 512
entries in the initial record, then retrieve the linked record and scan
it as well.
This process is repeated for every 512 files added to a directory.
Since each linked directory extension must be retrieved in sequence,
access becomes progressively slower as the number of files (and thus
directory extensions) increases. For example, if a FAT-16 directory were
loaded with 50,000 files, there would be 98 directory records (the
initial record plus 97 extensions) to search when the system was looking
for a file.
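The arithmetic behind this example can be checked with a short Python
sketch. It assumes the simplified model used in this note (512 names per
directory record, records scanned strictly in sequence); the function
names are illustrative and are not part of any UltraServer software.

```python
import math

ENTRIES_PER_RECORD = 512  # figure used in this note's FAT-16 model


def directory_records(file_count, per_record=ENTRIES_PER_RECORD):
    """Number of directory records needed to hold file_count names."""
    return math.ceil(file_count / per_record)


def records_scanned(position, per_record=ENTRIES_PER_RECORD):
    """Records read to reach the file at a 1-based position."""
    return math.ceil(position / per_record)


print(directory_records(50_000))  # 98
print(records_scanned(513))       # 2
```

Under this model, a lookup for the last of 50,000 files reads all 98
records, while a lookup for one of the first 512 files reads only one.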
Unlike FAT-16, NTFS doesn't simply store directory entries in linked
lists of names. It uses a high-performance self-sorting data structure
known as a "B-Tree". This structure both speeds file lookups and
distributes large structures evenly. So by design, NTFS makes for much
quicker retrieval of files from large directories.
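The benefit of a sorted structure can be illustrated with a rough Python
sketch. This is not NTFS's on-disk format: a sorted list searched by
binary search stands in for the ordered lookup a B-Tree provides, and
the filenames are made up for the illustration.

```python
def linear_scan_steps(names, target):
    """Entries examined by a front-to-back scan of a plain list."""
    for steps, name in enumerate(names, start=1):
        if name == target:
            return steps
    return len(names)


def sorted_lookup_steps(sorted_names, target):
    """Probes made by a binary search over a sorted list."""
    lo, hi, steps = 0, len(sorted_names), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_names[mid] == target:
            return steps
        if sorted_names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


# Hypothetical archive of 50,000 zero-padded numeric filenames.
names = [f"{n:08d}.DRL" for n in range(50_000)]
print(linear_scan_steps(names, names[-1]))    # 50000
print(sorted_lookup_steps(names, names[-1]))  # at most 16
```

The linear scan grows in step with the directory size, while the sorted
lookup grows only with its logarithm.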
UltraServer File Management Capabilities
Part 2: File distribution
It is not always practical or desirable to place one's main file
archive on the UltraServer. In fact, in cases where a customer has a
reliable office network (Novell, NT Advanced Server, etc.),
FASTechnologies recommends using that file server to hold the CNC file
archive, because these mission-critical office servers tend to be backed
up on a regular basis. In this case, the UltraServer's file system
access will be across an Ethernet network, which may result in poor file
lookup performance when large numbers of files are stored in a single
directory.
The UltraServer offers an elegant solution to this problem, namely
computed file paths. Using a "Swapper" script, the UltraServer can
automatically prepend a filename-derived path to operator-supplied
filenames. The files on the office fileserver may then
be evenly distributed across small directories, resulting in very rapid
file retrieval. An example case is shown below:
Assumptions:
1) Filenames are numeric.
2) The files will all be stored in a directory named "\CNC", but will
be distributed into one hundred subdirectories named "00" to "99".
Thus the directories would be...
\CNC\00
\CNC\01
\CNC\02
...and so on.
With this scheme, if new files coming from the CAM department are
numbered sequentially, then the new files can be evenly distributed
across the one hundred subdirectories. For example, a new file named
"12345678.DRL" would be written as "\CNC\78\12345678.DRL". The next
file, named "12345679.DRL", would be saved as "\CNC\79\12345679.DRL".
Using a FASTechnologies-supplied "Swapper" script, the UltraServer
will automatically compute these paths. So, a drill operator could
simply request the file "12345678.DRL", but the UltraServer would
compute the path, and transparently look for "\CNC\78\12345678.DRL".
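The path computation itself is simple. The sketch below expresses it in
Python purely for illustration; it is not Swapper script syntax, and the
function name, the "root" and "digits" parameters, and the zero-padding
of short names are assumptions made for this example.

```python
import ntpath


def computed_path(filename, root="\\CNC", digits=2):
    """Prepend a directory taken from the last `digits` digits of the name."""
    stem, _ext = ntpath.splitext(filename)
    if not stem.isdigit():
        raise ValueError(f"expected a numeric filename, got {filename!r}")
    # Pad short names so that "5.DRL" lands in "05" (an assumption).
    bucket = stem[-digits:].zfill(digits)
    return ntpath.join(root, bucket, filename)


print(computed_path("12345678.DRL"))  # \CNC\78\12345678.DRL
print(computed_path("12345679.DRL"))  # \CNC\79\12345679.DRL
```

Because the directory is derived from the filename alone, the operator
never needs to know which subdirectory holds a given file.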
This scheme can be easily reduced or expanded to use ten directories
or one thousand directories, depending on the need. For details on how
to set up your UltraServer to employ this scheme, contact
FASTechnologies Technical Support.
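The effect of choosing ten, one hundred, or one thousand directories can
be estimated in advance. The following Python sketch counts how many
files land in each subdirectory when file numbers are sequential; the
starting number and file count are hypothetical values chosen for the
illustration, and the function names are not part of the UltraServer.

```python
from collections import Counter


def bucket(file_number, digits):
    """Subdirectory name: the last `digits` digits, zero-padded."""
    return str(file_number % 10**digits).zfill(digits)


def spread(first, count, digits):
    """Files per subdirectory for `count` sequentially numbered files."""
    return Counter(bucket(n, digits) for n in range(first, first + count))


# 1, 2, or 3 trailing digits give 10, 100, or 1,000 subdirectories.
for digits in (1, 2, 3):
    counts = spread(12_345_000, 100_000, digits)
    print(len(counts), min(counts.values()), max(counts.values()))
# 10 10000 10000
# 100 1000 1000
# 1000 100 100
```

With sequential numbering the subdirectories fill evenly, so the choice
of digit count directly sets the number of files per directory.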
A case-study example
In an unusually demanding case, a FASTechnologies customer was
concerned about the UltraServer's ability to retrieve files from a
growing archive which currently holds 65,000 files. His intent was to
store these files directly on an NTFS volume on the UltraServer
platform.
To stress-test this case, we set up an UltraServer on a relatively
slow Pentium-133 (much slower than any standard UltraServer platform).
We then created a
database of 100,000 CNC programs. Using the scheme described above,
these were distributed across 1,000 numerically-named subdirectories,
namely "\CNC\000" up to "\CNC\999". This required over one gigabyte of
storage space.
With this enormous structure in place, we measured the average access
time for the UltraServer to retrieve files. The measured time began when
a request was received, and ended when the requested file had been
opened, fully read, and closed. (The transmission time over the DNC link
was not included, as it is independent of file access time.) We ran
this test for 10,000 randomly selected file names.
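A measurement of this kind can be approximated with a short Python
sketch. This is not the harness used for the test described here: it
times only the open, read, and close of each file, it does not include
request handling, and the function names and the fixed random seed are
assumptions made for the example.

```python
import random
import time


def time_to_acquire(path):
    """Seconds taken to open, fully read, and close one file."""
    start = time.perf_counter()
    with open(path, "rb") as handle:
        handle.read()
    # The file is closed on leaving the "with" block, before the clock stops.
    return time.perf_counter() - start


def measure(paths, samples, seed=0):
    """Average and worst-case access time over randomly chosen files."""
    rng = random.Random(seed)
    timings = [time_to_acquire(rng.choice(paths)) for _ in range(samples)]
    return sum(timings) / len(timings), max(timings)
```

Calling measure() with a list of archive paths and a sample count of
10,000 returns the average and worst-case times in seconds.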
The results of this test were impressive. Across 10,000 file
accesses, the average "time to acquire" (which includes the entire
directory-lookup overhead) was thirty milliseconds! The worst-case
access time was seventy milliseconds. This result conclusively
demonstrates that, using simple file distribution techniques on its NTFS
platform, the UltraServer can easily deal with enormous numbers of
files, with no measurable degradation in performance.