Introduction
Why a VB.NET implementation of LZW compression?
I am working with a team of VB.NET developers on a large project, and it is best for us to have all of the components in native VB to simplify maintainence and upgrades. I also thought this might be a good introduction to LZW for people who don't know C.
VB.Net实现LZW压缩与解压的方法Credit where Credit is Due
The source code is a nearly direct port of the LZW implementation by Mark Nelson on his web site on C. I even retained many of his comments. Be sure to look at his code here and view his C implementation here.
His original article was published in the October, 1989 issue of Dr. Dobb's Journal, which implies that Mark is very likely at least as ancient as I am.
Demo Code
Running the Code
Start a new Windows Forms project in VB.NET. Add clsLZW.vb to the project. Draw a button in the center of the main form, view the form's code, and paste the following code just above the form's End Class
statement:
Private Sub Button1_Click(ByVal sender As System.Object, _
ByVal e As System.EventArgs) _
Handles Button1.Click
' test the LZW program:
' Place a bunch of files in c:\testdir
' Create c:\testdir\lzw and c:\testdir\postlzw
' Run this program
' test it with a command file such as:
' c:
' cd \testdir
' for %%1 in (*.*) do fc /b %%1
' postlzw\%%1 >> results.txt
Button1.Enabled = False
Dim di As New IO.DirectoryInfo("c:\testdir")
For Each fi As IO.FileInfo In di.GetFiles
' print filename
Dim g As Graphics = Me.CreateGraphics
g.FillRectangle(New SolidBrush(Me.BackColor), _
0, 0, Me.Width, 100)
g.DrawString(fi.Name, Me.Font, Brushes.Black, 0, 0)
g.Dispose()
' compress...
Dim lzw1 As New clsLZW
lzw1.brInput = _
New IO.BinaryReader(IO.File.Open(fi.FullName, _
IO.FileMode.Open))
lzw1.bwOutput = _
New IO.BinaryWriter(IO.File.Open("c:\testdir\lzw\" & _
fi.Name & ".lzw14", _
IO.FileMode.OpenOrCreate, IO.FileAccess.Write))
lzw1.compress()
lzw1.brInput.Close()
lzw1.bwOutput.Close()
' decompress
Dim lzw2 As New clsLZW
lzw2.brInput = _
New IO.BinaryReader(IO.File.Open("c:\testdir\lzw\" _
& fi.Name & ".lzw14", IO.FileMode.Open))
lzw2.bwOutput = _
New IO.BinaryWriter(IO.File.Open("c:\testdir\postlzw\" _
& fi.Name, IO.FileMode.OpenOrCreate, _
IO.FileAccess.Write))
lzw2.expand()
lzw2.brInput.Close()
lzw2.bwOutput.Close()
Next
Button1.Enabled = True
End Sub
Testing the Results
To test, create the file structure specified by the comments in the above Sub
. You should choose a variety of files.
The batch command 'For %%1...' uses the FileCompare utility (fc) to verify that the uncompressed files match the originals.
If you choose to run it outside of the batch or command file, and at a command prompt, change the three %% to % so it will work.
For %1 in (*.*) do fc /b %1 postlzw\%1 >> results.txt
Usage
Refer to the sample code for usage. I have not tested multiple compressions per instantiation of the class, so it is best to create a new instance for each compression or decompression that you wish to complete.
I had to use this to build a multi-file archive, so the calling program maintains control of the streams. Multiple files can be written to the same stream without closing it, however, the input stream is exausted until the end of the file is reached. This can be changed with minor modifications to prevent the need to write temporary files.
Best wishes and good luck with your VB coding!
源代码如下:
Option Strict On
Option Explicit On
Public Class clsLZW
' see http://marknelson.us/1989/10/01/lzw-data-compression/
' this is a VB.NET conversion port of mark's C program.
' Please refer to that program prior to modifying this one.
Private BITS As Integer = 14
Private HASHING_SHIFT As Integer = 4
Private MAX_VALUE As Integer = (1 << BITS) - 1
Private MAX_CODE As Integer = MAX_VALUE - 1
'Private Const TABLE_SIZE As Integer = 5021 ' 12 bits
'Private Const TABLE_SIZE As Integer = 9029 ' 13 bits
Private Const TABLE_SIZE As Integer = 18041 ' 14 bits
Private EOF As Integer = -1
Public brInput As IO.BinaryReader = Nothing
Public bwOutput As IO.BinaryWriter = Nothing
Private iaCode_Value(TABLE_SIZE) As Integer
Private iaPrefix_Code(TABLE_SIZE) As Integer
Private baAppend_Character(TABLE_SIZE) As Byte
'** This is the compression routine. The code should be a fairly close
'** match to the algorithm accompanying the article.
Public Sub compress()
Dim iNextCode As Integer = 0
Dim iCharacter As Integer = 0
Dim iStringCode As Integer = 0
Dim iIndex As Integer = 0
iNextCode = 256 ' Next code is the next available string code
For i As Integer = 0 To TABLE_SIZE - 1 ' Clear out the string table before starting
iaCode_Value(i) = -1
Next i
' Get the first iCharacter. Assuming it to be 0 - 255
' Hence only valid for ASCII text files */
iStringCode = ReadByte()
'** This is the main loop where it all happens. This loop runs util all of
'** the Input has been exhausted. Note that it stops adding codes to the
'** table after all of the possible codes have been defined.
iCharacter = ReadByte()
While iCharacter <> -1
iIndex = find_match(iStringCode, iCharacter) ' See if the string is in */
If (iaCode_Value(iIndex) <> -1) Then ' the table. If it is, */
iStringCode = iaCode_Value(iIndex) ' get the code value. If */
Else ' the string is not in the table, try to add it. */
If (iNextCode <= MAX_CODE) Then
iaCode_Value(iIndex) = iNextCode
iNextCode += 1
iaPrefix_Code(iIndex) = iStringCode
baAppend_Character(iIndex) = CByte(iCharacter)
End If
output_code(iStringCode) ' When a string is found */
iStringCode = iCharacter ' that is not in the table */
End If ' after adding the new one */
iCharacter = ReadByte()
End While
' End of the main loop.
output_code(iStringCode) ' Output the last code
output_code(MAX_VALUE) ' Output the end of buffer code */
output_code(0) ' This code flushes the Output buffer*/
End Sub
' This is the hashing routine. It tries to find a match for the prefix+char
' string in the string table. If it finds it, the iIndex is returned. If
' the string is not found, the first available iIndex in the string table is
' returned instead.
Private Function find_match(ByVal iHashPrefix As Integer, ByVal iHashCharacter As Integer) As Integer
Dim iIndex As Integer = 0VB.Net实现LZW压缩与解压的方法
Dim iOffset As Integer = 0
iIndex = CInt((iHashCharacter << HASHING_SHIFT) Xor iHashPrefix)
If (iIndex = 0) Then
iOffset = 1
Else
iOffset = TABLE_SIZE - iIndex
End If
While (True)
If iaCode_Value(iIndex) = -1 Then
Return iIndex
End If
If (iaPrefix_Code(iIndex) = iHashPrefix) And (baAppend_Character(iIndex) = iHashCharacter) Then
Return iIndex
End If
iIndex -= iOffset
If (iIndex < 0) Then
iIndex += TABLE_SIZE
End If
End While
End Function
' The following routine is used to output variable length
' codes. It is written strictly for clarity, and is not
' particularly efficient.
Private Sub output_code(ByVal code As Integer)
Static output_bit_count As Integer = 0
Static output_bit_buffer As Long = 0
output_bit_buffer = output_bit_buffer Or (code << (32 - BITS - output_bit_count))
output_bit_count += BITS
While output_bit_count >= 8
WriteByte(CByte((output_bit_buffer >> 24) And 255))
output_bit_buffer <<= 8
output_bit_count -= 8
End While
End Sub
' This is the expansion routine. It takes an LZW format file, and expands
' it to an bwOutput file. The code here should be a fairly close match to
' the algorithm in the accompanying article.
Public Sub expand()
Dim baDecode_Stack(TABLE_SIZE) As Byte
Dim iNextCode As Integer
Dim iNewCode As Integer
Dim iOldCode As Integer
Dim bCharacter As Byte
Dim iCurrCode As Integer
Dim i As Integer
'This is the next available code to define.
iNextCode = 256
' Read in the first code, initialize the
' character variable, and send the first
' code to the output file.
iOldCode = input_code()
bCharacter = CType(iOldCode, Byte)
WriteByte(CByte(iOldCode))
' This is the main expansion loop. It reads in characters from the LZW file
' until it sees the special code used to inidicate the end of the data.
iNewCode = input_code()
While (iNewCode <> MAX_VALUE)
If iNewCode >= iNextCode Then
' This code checks for the special STRING+CHARACTER+STRING+CHARACTER+STRING
' case which generates an undefined code. It handles it by decoding
' the last code, and adding a single character to the end of the decode string.
baDecode_Stack(0) = bCharacter
i = 1
iCurrCode = iOldCode
Else
' Otherwise we do a straight decode of the new code.
i = 0
iCurrCode = iNewCode
End If
While iCurrCode > 255
' This routine simply decodes a string from the string table, storing
' it in a buffer. The buffer can then be output in reverse order by
' the expansion program.
baDecode_Stack(i) = baAppend_Character(iCurrCode)
i = i + 1
If i >= MAX_CODE Then
Throw New ApplicationException("Fatal error during iCurrCode expansion.")
End If
iCurrCode = iaPrefix_Code(iCurrCode)
End While
baDecode_Stack(i) = CType(iCurrCode, Byte)
bCharacter = baDecode_Stack(i)
'Now we output the decoded string in reverse order.
While i >= 0
WriteByte(baDecode_Stack(i))
i = i - 1
End While
' Finally, if possible, add a new code to the string table.
If (iNextCode <= MAX_CODE) Then
iaPrefix_Code(iNextCode) = iOldCode
baAppend_Character(iNextCode) = bCharacter
iNextCode += 1
End If
iOldCode = iNewCode
iNewCode = input_code()
End While
End Sub
' The following routine is used to input variable length
' codes. It is written strictly for clarity, and is not
' particularly efficient.
Private Function input_code() As Integer
Dim return_value As Long
Static input_bit_count As Integer = 0
Static input_bit_buffer As Long = 0
Static Mask32 As Long = CLng(2 ^ 32) - 1
While input_bit_count <= 24
input_bit_buffer = (input_bit_buffer Or _
ReadByte() << (24 - input_bit_count)) And Mask32
input_bit_count += 8
End While
return_value = (input_bit_buffer >> 32 - BITS) And Mask32
input_bit_buffer = (input_bit_buffer << BITS) And Mask32
input_bit_count -= BITS
Return CInt(return_value)
End Function
Private Sub WriteByte(ByVal b As Byte)
bwOutput.Write(b)
End Sub
Private Function ReadByte() As Integer
Dim ba(1) As Byte
Dim iResult As Integer
iResult = brInput.Read(ba, 0, 1)
If iResult = 0 Then
Return -1
End If
Return ba(0)
End Function
End Class