Some Other Tools in UNIX: Compression

Compression

The need to compress arises from efficiency considerations. Compressed information makes better use of storage, and during network transfer it makes better use of bandwidth. Because most encodings of information are highly redundant, files generally use more bits than the minimum required to encode that information. Let us consider text files. Text files are typically ASCII files in which every character occupies 8 bits. With a different coding scheme, fewer than 8 bits per character may suffice. For instance, a frequency-based scheme such as Huffman encoding gives frequent characters short codes and rare characters long ones, and for typical text it arrives at an average code length of around 5 bits per character. In other words, if we compress the information, we need to send fewer bits over the network. For transmission one may use a bit stream of compressed bits.
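The gap between 8 bits and what is actually needed can be estimated from character frequencies. The sketch below is an illustration only and is not a Huffman encoder: it computes the Shannon entropy of a short sample string, which is a lower bound on the average code length that any character-by-character code (Huffman included) can reach. The sample text and the variable names are assumptions made for this example.

```shell
#!/bin/sh
# Sample text (an assumption for illustration; substitute any text of interest).
sample='it is more efficient to store compressed information'

# Count how often each character occurs, then sum -p * log2(p) over all
# distinct characters to obtain the entropy in bits per character.
bits_per_char=$(printf '%s\n' "$sample" | awk '
    {
        for (i = 1; i <= length($0); i++) {
            count[substr($0, i, 1)]++
            total++
        }
    }
    END {
        entropy = 0
        for (c in count) {
            p = count[c] / total
            entropy -= p * log(p) / log(2)
        }
        printf "%.2f\n", entropy
    }')

echo "Estimated information content: $bits_per_char bits per character (stored as 8)"
```

A short sample uses only a few distinct characters, so the figure it reports will be lower than the one for a large body of text.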
By itself, tar preserves the ASCII code and does not compress information. Unix provides a set of utilities for compressing and encoding files, among them the compress and uuencode commands. The command structure for the compress and uncompress commands is as follows:
compress options filename
uncompress options filename
Executing compress on a file named filename produces a file with a .Z extension, filename.Z, which replaces the original. Executing uncompress with filename.Z as its argument recovers the original file filename.
The example below shows the use of the compress and uncompress commands; the compressed file carries the .Z extension.
bhatt@SE-0 [T] >>cp cfiles.tar test; compress test; ls
M ReadMe cfiles.tar test.Z
bhatt@SE-0 [T] >>uncompress test.Z; ls
M ReadMe cfiles.tar test
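A related utility is uuencode. Strictly speaking, it does not compress: it encodes a file, typically a binary one, as printable ASCII text, and the encoded file is about a third larger than the original. It is common to speak of uuencoding a file and later using uudecode to get the original file back. Let us uuencode our test file. The example is shown below: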
bhatt@SE-0 [T] >>uuencode test test > test.uu ; ls; rm test ; \
ls ; uudecode test.uu ; rm test.uu; ls
M ReadMe cfiles.tar test test.uu
M ReadMe cfiles.tar test.uu
M ReadMe cfiles.tar test
Note that in using the uuencode command we have repeated the input file name in the argument list. This is because the command records the second argument in the first (header) line of the encoded file, which lets uudecode regenerate the file under its original name. Stated another way, the first argument gives the input file name, while the second establishes the file name used on output. One of the most common uses of uuencode and uudecode is to send binary files. Traditional Internet mail expects ASCII text. Thus, to send a binary file it is best to uuencode it at the source and then uudecode it at the destination.
The way to use uuencode/uudecode is as follows:
uuencode my_tar.tar my_tar.tar > my_tar.uu
uudecode my_tar.uu
There is another way to deal with Internet-based exchanges: the MIME (base 64) format. MIME and S/MIME (secure MIME) are formats defined by the Internet Engineering Task Force. MIME is meant to carry non-ASCII content over the net as attachments to mail, which makes it well suited to transmitting PostScript, graphics, images, audio or video files.
A uuencoded file must end with a line containing the word end; without it the file is considered to end improperly. A program called uudeview is very useful for decoding and viewing both uuencoded files and files in the base 64 format.
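That these encodings trade size for safe transport can be checked directly. The sketch below assumes that the base64 utility (as shipped with GNU coreutils) is available, since uuencode is often not installed by default. The file names and the 3000-byte size are arbitrary choices for this illustration.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Create 3000 bytes of binary data (arbitrary test content).
head -c 3000 /dev/urandom > "$workdir/data.bin"

# Encode to printable ASCII, then decode again.
base64 "$workdir/data.bin" > "$workdir/data.b64"
base64 -d "$workdir/data.b64" > "$workdir/data.out"

orig_size=$(( $(wc -c < "$workdir/data.bin") ))
enc_size=$(( $(wc -c < "$workdir/data.b64") ))
echo "original: $orig_size bytes, encoded: $enc_size bytes"

# cmp -s is silent and succeeds only when the two files are identical.
if cmp -s "$workdir/data.bin" "$workdir/data.out"; then
    echo "round trip OK"
fi
```

The encoded file comes out larger than the original, which is the price paid for restricting the output to printable characters.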
Zip and unzip: Various Unix flavors, as well as MS environments, provide a zip command to compress files. A compressed file may later be restored with the unzip command. In the GNU environment the corresponding commands are gzip (to compress) and gunzip (to uncompress). Below is a simple example which shows the use of these commands:
bhatt@SE-0 [T] >>gzip test; ls; gunzip test.gz; ls;
M ReadMe cfiles.tar test.gz
M ReadMe cfiles.tar test
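To see both the saving and the fact that nothing is lost, one can compare sizes and checksums around a gzip and gunzip round trip. This is a minimal sketch: the sample file is generated on the spot and its contents are an assumption made for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Build a highly redundant text file (hypothetical sample data).
i=0
while [ "$i" -lt 500 ]; do
    echo "the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$workdir/test"

before=$(cksum < "$workdir/test")
orig_size=$(( $(wc -c < "$workdir/test") ))

# gzip replaces test with test.gz.
gzip "$workdir/test"
gz_size=$(( $(wc -c < "$workdir/test.gz") ))
echo "original: $orig_size bytes, compressed: $gz_size bytes"

# gunzip restores test and removes test.gz.
gunzip "$workdir/test.gz"
after=$(cksum < "$workdir/test")

if [ "$before" = "$after" ]; then
    echo "contents unchanged after round trip"
fi
```

Repetitive input such as this compresses far better than ordinary files do, so the ratio printed here should not be taken as typical.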
In the MS environment one may use PKZIP to compress files into the .ZIP format and PKUNZIP to decompress them.
One of the best known compression schemes is LZW (the letters are the initials of its two inventors, Lempel and Ziv, and of Welch, who refined the scheme). It is a general-purpose method: it underlies the compress command and is used in the .gif image format. A discussion of this scheme is beyond the scope of this book.
Network file transfers: The most frequent mode of file transfer over the net is the file transfer protocol, or FTP. To transfer files to or from a host we use the following command.
ftp <host-name>
This command may be replaced by using an open command to establish a connection with the host for file transfer. One may first give the ftp command followed by open as shown below:
ftp
open <host-name>
The first ftp command puts the user in file transfer mode. The open command then opens a connection and establishes a session with a remote host. Corresponding to open, the close command closes the currently open connection or FTP session. Most FTP clients leave the user in FTP mode when a session with a remote host is closed, so the user may initiate another FTP session immediately. This is useful when a user wishes to connect to several machines in one sitting. One may use the bye command to exit FTP mode. Usually, the command ftp ftp connects a user to the local server.
With anonymous or guest logins, it is a good idea to give one's e-mail contact address as the password. Sometimes a short prompt asks the user for it. Below we show an example usage:
user anonymous e-mail-address
Binary files must be downloaded in binary mode, which is selected with the binary command. ASCII files too can be downloaded with binary mode enabled. FTP starts in ASCII mode by default. The most commonly used ftp commands are get and put. See the example usage of the get command.
get <rfile> <lfile>
This command gets the remote file named rfile and stores it under the local file name lfile. Within an FTP session, the hash command helps to follow the progress of a transfer: a # is displayed for every block transferred (uploaded or downloaded). A typical get command is shown below.
ftp> hash
ftp> binary
ftp> get someFileName
During multiple file downloads one may wish to turn off interactive mode (which requires the user to answer y/n for each file) by using the prompt command. It is a toggle: each use switches prompting on or off, as shown in the example below:
ftp> prompt
ftp> mget filesFromAdirectory
By default, the mget and mput commands ask about each file in turn, which lets the user select which of the files are to be transferred. One may also write shell scripts that drive the ftp command. These may be written so as to avoid the y/n prompt that normally shows up for each file transfer under the mget or mput commands.
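A scripted session might be sketched as follows. Everything specific here is an assumption for illustration: the host is taken from a hypothetical FTP_HOST environment variable, the e-mail address and the *.tar pattern are placeholders, and the -n option (which classic ftp clients use to suppress automatic login so that the user command in the script takes effect) should be checked against the local ftp manual page. When FTP_HOST is not set, the script only prints the commands it would send.

```shell
#!/bin/sh
# Hypothetical settings; adjust for a real server.
host="${FTP_HOST:-}"
email="user@example.org"

# Write the ftp commands to a temporary file.
# No pathname expansion happens inside a here-document, so *.tar stays literal.
cmds=$(mktemp)
cat > "$cmds" <<EOF
user anonymous $email
binary
hash
prompt
mget *.tar
bye
EOF

if [ -n "$host" ] && command -v ftp >/dev/null 2>&1; then
    # Feed the prepared commands to the ftp client on standard input.
    ftp -n "$host" < "$cmds"
else
    echo "FTP_HOST not set or ftp not installed; commands that would be sent:"
    cat "$cmds"
fi
```

The prompt line turns interactive mode off before mget runs, so the transfer proceeds without a y/n question for each file.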
Unlike tar, most ftp clients do not support downloading files recursively from subdirectories. However, this can be achieved in a few steps. First, one may use the tar command to make an archive. Next, one can use the ftp command to effect the file transfer. In this way all the files under a directory can be transferred. In the outline below, we additionally compress the tarred files.
1. Make a tar file: create xxx.tar file
2. Compress: generate xxx.tar.Z file
3. Issue the ftp command: ftp
Below we have an example of such a usage:
1. Step 1: $ tar -cf graphics.tar /pub/graphics
This step takes all files in /pub/graphics and its subdirectories and creates a tar file named graphics.tar.
2. Step 2: $ compress graphics.tar
This step will create the file graphics.tar.Z.
3. Step 3: uncompress graphics.tar.Z to get graphics.tar
4. Step 4: tar -xf graphics.tar will extract the archived files.
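The four steps can be tried end to end on one machine. The sketch below makes several substitutions that should be noted: gzip and gunzip stand in for compress and uncompress (compress is often not installed, and the result is a .gz rather than a .Z file), a local cp stands in for the ftp transfer, and the directory tree is a small hypothetical one created in a scratch area rather than /pub/graphics.

```shell
#!/bin/sh
src=$(mktemp -d)     # stands in for the sending machine
dest=$(mktemp -d)    # stands in for the receiving machine

# Hypothetical directory tree with one subdirectory.
mkdir -p "$src/graphics/icons"
echo "top-level file" > "$src/graphics/readme.txt"
echo "nested file"    > "$src/graphics/icons/logo.txt"

# Step 1: archive the directory and everything below it.
tar -cf "$src/graphics.tar" -C "$src" graphics

# Step 2: compress the archive (produces graphics.tar.gz).
gzip "$src/graphics.tar"

# The transfer itself (ftp in binary mode) is simulated with a local copy.
cp "$src/graphics.tar.gz" "$dest/"

# Step 3: uncompress on the receiving side.
gunzip "$dest/graphics.tar.gz"

# Step 4: list the archive contents, then extract them.
tar -tf "$dest/graphics.tar"
tar -xf "$dest/graphics.tar" -C "$dest"
```

Using -C with a relative name keeps absolute paths out of the archive, so the files unpack under whatever directory the receiver chooses.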