Run-length encoding of binary sequences followed by two independent compressions

Technical Field
The present invention relates to compression of sequences of symbols - an encoding/decoding method.
Background Art
The following terminology will be used:
• "codec/coder" : method of encoding/decoding information aiming at shortening
• If $S$ is a sequence of symbols from an alphabet $A$, then the "finite scheme of sequence $S$" is $FS(S) = (A, P(S))$, where $P(S)$ is the set of probabilities of occurrence of each symbol from $A$ in $S$.
• If $S$ is a sequence of symbols from an alphabet $A$, then the "entropy of sequence $S$", or $H(S)$, is the entropy of $FS(S) = (A, P(S))$. [4, §2.3]
• If $S$ is a sequence of symbols from an alphabet $A$, then an "entropy codec/entropy coder" is a method and/or means to encode a symbol $a$ from the sequence $S$, using only $FS(S)$, with an average size close to $-\log_2(p(a))$ bits. As usual, $p(a)$ is the probability of occurrence of $a$ in $S$ (for example, Huffman coding [4, §2.1] or arithmetic coding [4, §4]; US patent 4,122,440).
• If $S$ is a sequence of symbols from an alphabet $A$, then $|S|$ denotes the number of letters from $A$ in $S$.
$|S| \cdot H(S)$ is the size of $S$ compressed with an entropy coder. Run-length encoding is old and simple, but used only in very special cases. For example, see [4, page 2, Pattern-Finding Approaches]: "For instance, fax machines send simple black and white images. These are easily compressed with a solution known as run length encoding, which counts the number of times a black or white pixel is repeated. ..." and "... Run length encoding is the most common example of solutions based on identifying patterns. In most of the cases, patterns are just too complicated for a computer to find regularly." O'Brien found and patented RLE followed by LZ77 (US patent 4,988,998, "Data compression system for successively applying at least two data compression methods to an input data stream"). See [5, §3.3] for the usage of RLE in JPEG.
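For illustration only (not part of the claimed method), the quantities $H(S)$ and $|S| \cdot H(S)$ can be computed as in the following sketch; the function name is ours:

```python
import math
from collections import Counter

def entropy(seq):
    """Entropy H(S) in bits per symbol, computed from the finite scheme FS(S)."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

S = "AABABBBAAB"
print(entropy(S))           # H(S), bits per symbol
print(len(S) * entropy(S))  # |S|*H(S): the best size a pure entropy coder can reach
```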
The enclosed method uses RLE to construct a better entropy compression system, as compared to existing entropy coders. This means that if $S$ is a sequence of symbols from an alphabet $A$ ($A$ and $S$ fix the statistical structure, i.e. the finite scheme), then the length of $S$ encoded with the enclosed method is less than or equal to $|S| \cdot H(S)$ (the best possible achievement of a pure entropy coder).
Disclosure of Invention
Let two sets of numbers $\{c_k\}_{k \ge 1}$ and $\{d_l\}_{l \ge 1}$ be given, where $c_k \ge 0$ and $d_l \ge 0$.
The following denotations will be used:
• $I = \sum_k k \, c_k$ and $O = \sum_l l \, d_l$
• $C = \sum_k c_k$ and $D = \sum_l d_l$
• $p_k = c_k / C$ and $q_l = d_l / D$
If a binary source $BS$ is given, then $c_k$, $d_l$, $C$, $D$, ... can be interpreted as:
• $c_k$ is the number of 1-runs with size $k$
• $d_l$ is the number of 0-runs with size $l$
• $I$ is the number of ones in $BS$
• $O$ is the number of zeros in $BS$
• $C$ is the number of runs with ones
• $D$ is the number of runs with zeros
• $c_k + d_k$ is the number of runs with size $k$
After every run with ones there is a run with zeros (and vice versa), except for the last run in the sequence, so $|C - D| \le 1$ is always true. If $C$ and $D$ are both even (or both odd), then $C = D$. If $C$ is even and $D$ is odd (or vice versa), then $|C - D| = 1$. But in the latter case the last run (a few bits) can be skipped (stored and loaded separately). So, from now on we will assume that $C$ is equal to $D$.
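A minimal sketch of the run counting just described (the helper name is ours); it checks that $|C - D| \le 1$ always holds:

```python
from itertools import groupby

def run_counts(bits):
    """Count the 1-runs (C) and 0-runs (D) of a binary sequence."""
    runs = [(bit, len(list(group))) for bit, group in groupby(bits)]
    C = sum(1 for bit, _ in runs if bit == 1)
    D = sum(1 for bit, _ in runs if bit == 0)
    return C, D

C, D = run_counts([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0])
assert abs(C - D) <= 1
# If C != D, the last run is skipped (stored separately), so C == D can be assumed.
```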
Let the output sequence of RLE applied on $BS = \{b_1, b_2, b_3, \ldots, b_{|BS|}\}$ be denoted with $RLS$. If $b_1 = 0$ then $RLS$ is $0, x_0, y_0, x_1, y_1, \ldots$. If $b_1 = 1$ then $RLS$ is $1, y_0, x_0, y_1, x_1, \ldots$. Here $x_i$ is the length of the $i$-th run with zeros in $BS$ and $y_j$ is the length of the $j$-th run with ones in $BS$. $RLS$ without the first symbol (which is $b_1$) can be divided into two sequences $RLS_0$ and $RLS_1$, where:
- $RLS_0 = x_0, x_1, \ldots$, where $x_i$ is the length of the $i$-th run with zeros in $BS$.
- $RLS_1 = y_0, y_1, \ldots$, where $y_j$ is the length of the $j$-th run with ones in $BS$.
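The construction of $RLS_0$ and $RLS_1$ can be sketched as follows (a simplified software model; the function names are ours):

```python
from itertools import groupby

def rls_split(bits):
    """Split a binary sequence into (b1, RLS0, RLS1): the first bit plus the
    lengths of its 0-runs and of its 1-runs."""
    rls0, rls1 = [], []
    for bit, group in groupby(bits):
        (rls1 if bit == 1 else rls0).append(len(list(group)))
    return bits[0], rls0, rls1

def rls_join(first_bit, rls0, rls1):
    """Inverse of rls_split: rebuild the original binary sequence."""
    out, bit, i = [], first_bit, [0, 0]
    while i[0] < len(rls0) or i[1] < len(rls1):
        lengths = rls1 if bit == 1 else rls0
        out += [bit] * lengths[i[bit]]
        i[bit] += 1
        bit ^= 1  # runs of zeros and ones alternate
    return out

bits = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
assert rls_join(*rls_split(bits)) == bits
```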
The next statements are obvious:
• $p_k = c_k / C$ is the probability of occurrence of 1-runs with size $k$ in $RLS_1$
• $q_l = d_l / D$ is the probability of occurrence of 0-runs with size $l$ in $RLS_0$
• $(c_k + d_k) / (C + D)$ is the probability of occurrence of runs with size $k$ in $RLS$
Three finite schemes can be formed:
• $FS(RLS_1) = (\{k\}, \{p_k\})$
• $FS(RLS_0) = (\{l\}, \{q_l\})$
• $FS(RLS) = (\{k\}, \{(c_k + d_k) / (C + D)\})$
Example 1: ... The size in bits of $RLS$ is 21 (without the skipped last 0). The entropy is $H(BS) = 0.99836$. The overall size of $BS$ encoded with an entropy coder is ..., and the sizes of encoded $RLS_0$ and $RLS_1$ are ...
Example 1 is a regular case and is the object of the invention, as indicated in claim 1: to compress binary sequences better than other entropy coders do. The BS22RLS and RLS22RLS inequalities, which will be proven below, explain the reasons.
Lemma: Let $x_1, x_2, \ldots$ and $y_1, y_2, \ldots$ be arbitrary positive numbers with $\sum_i x_i = \sum_i y_i$. Then $-\sum_i x_i \log_2 x_i \le -\sum_i x_i \log_2 y_i$, with equality if and only if $x_i = y_i$ for all $i$. The lemma was proven in [3, Lemma 1.4.1, page 16].
BS22RLS Inequality: $C \cdot H(RLS_1) + D \cdot H(RLS_0) \le |BS| \cdot H(BS)$, where $p = I / |BS|$, $q = O / |BS|$ and $H(BS) = -p \log_2 p - q \log_2 q$.
Proof: Let us use the lemma two times, having in mind that $\sum_k p_k = \sum_k q \, p^{k-1} = 1$ and $\sum_l q_l = \sum_l p \, q^{l-1} = 1$:
1) Substitute $x_k = p_k$ and $y_k = q \, p^{k-1}$: then $H(RLS_1) \le -\sum_k p_k \log_2 (q \, p^{k-1})$, i.e. $C \cdot H(RLS_1) \le -C \log_2 q - (I - C) \log_2 p$.
2) Substitute $x_l = q_l$ and $y_l = p \, q^{l-1}$: then $H(RLS_0) \le -\sum_l q_l \log_2 (p \, q^{l-1})$, i.e. $D \cdot H(RLS_0) \le -D \log_2 p - (O - D) \log_2 q$.
Summing 1) and 2), and because $C = D$: $C \cdot H(RLS_1) + D \cdot H(RLS_0) \le -I \log_2 p - O \log_2 q = |BS| \cdot H(BS)$.
The BS22RLS inequality is proved.
Equality in BS22RLS is reached when $p = q = 0.5$. It is a corollary of [3, Lemma 1.4.1].
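The BS22RLS inequality, as reconstructed above, can be checked numerically. The following sketch (our code; the sequence is padded so that $C = D$ holds without skipping a run) compares both sides on a random binary source:

```python
import math, random
from collections import Counter
from itertools import groupby

def entropy(seq):
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

random.seed(1)
# Starting with 0 and ending with 1 forces C == D, since run types alternate.
bs = [0] + [1 if random.random() < 0.7 else 0 for _ in range(100_000)] + [1]

rls0 = [len(list(g)) for bit, g in groupby(bs) if bit == 0]
rls1 = [len(list(g)) for bit, g in groupby(bs) if bit == 1]
C, D = len(rls1), len(rls0)

lhs = C * entropy(rls1) + D * entropy(rls0)  # size after two independent compressions
rhs = len(bs) * entropy(bs)                  # |BS| * H(BS): pure entropy coder bound
assert lhs <= rhs + 1e-9
```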
If a binary source $BS$ is given, then symbolize ...
Using BS22RLS, it is possible to design multiple hybrid methods.
The following RLS22RLS inequality explains why it is better to use two independent compressions ($RLS_0$ and $RLS_1$) than just one ($RLS$):
$C \cdot H(RLS_1) + D \cdot H(RLS_0) \le (C + D) \cdot H(RLS)$
Proof: Because the function $x \log_2(x)$ is continuous and convex [2, page 4 or page 6], for every $k$:
$\frac{c_k + d_k}{C + D} \log_2 \frac{c_k + d_k}{C + D} \le \frac{C}{C + D} \cdot \frac{c_k}{C} \log_2 \frac{c_k}{C} + \frac{D}{C + D} \cdot \frac{d_k}{D} \log_2 \frac{d_k}{D}$
Multiplying by $-(C + D)$ and summing over $k$ gives the RLS22RLS inequality.
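The RLS22RLS inequality can likewise be verified numerically (our sketch; it holds for any binary sequence, since it relies only on the convexity argument above):

```python
import math
from collections import Counter
from itertools import groupby

def entropy(seq):
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

bs = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
runs = [(bit, len(list(g))) for bit, g in groupby(bs)]
rls  = [length for _, length in runs]                 # one merged finite scheme
rls0 = [length for bit, length in runs if bit == 0]
rls1 = [length for bit, length in runs if bit == 1]
C, D = len(rls1), len(rls0)

split  = C * entropy(rls1) + D * entropy(rls0)  # two independent compressions
merged = (C + D) * entropy(rls)                 # a single compression of RLS
assert split <= merged + 1e-9
```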
Brief Description of Drawings
An example for carrying out the invention is shown in the attached drawings and is described in detail as follows:
Fig. 1 shows a simplified hardware implementation of an encoder according to claim 1. It consists of:
• "RLE" - run length encoder, its input receives the bits of a binary sequence and its 170 outputs are 0 or 1 runs.
• "Switch" - It has one input and two outputs ("RunO" and "Runl "). The input is going directly to the active output. The output "RunO" can be activated by activating "Select run 0". The output "Runl " can be activated by activating "Select run 1 ". It is necessary to activate an output before starting the encoding process. The first bit of
175 encoded binary sequence can be used to activate an output. The first bit is needed for initialization of the decoder as well.
• "Entropy coder" (e.g. Huffinan[4, §2.1] or Arithmetic [4, §4]) consists of two finite schemes ("FSO" and "FSl") and only one of them is active at a given moment. The "Entropy coder" encodes the input symbol depending on the active finite scheme.
180 "FSO" can be activated by activating "Select FSO". "FSl" can be activated by activating "Select FSl". Activation of an finite scheme must be done before receiving a symbol. "FSO" and/or "FSl" are found earlier or are updated after every symbol (adaptive compression^, §5]).
• Line "Run" is used to move current run from "RLE" to the "Switch"
185 • Line "RunO" is used to move runs with zeros from "Switch" to the "Entropy coder".
The line is responsible to activate the "Select FSO" and "Select run 1 " before sending the run to the "Entropy coder".
• Line "Runl " is used to move runs with ones from "Switch" to the "Entropy coder". Also the line is responsible to activate the "Select FSl" and "Select run 0" before
190 sending the run to the "Entropy coder".
• "First bit": The first bit of encoded binary sequence and it is used to initialize the device.
• " S " : the input of the device.
• "E": the output of the device. 195
Fig. 2 shows a simplified hardware implementation of a decoder according to claim 1. It consists of:
• "RLE" - run length decoder, its input receives 0 or 1 runs. Its outputs are bits of a 200 binary sequence.
• "Switch" - It has one input and two outputs ("RunO" and "Runl "). The input is going
directly to the active output. The output "RunO" can be activated by activating "Select run 0". The output "Runl" can be activated by activating "Select run 1". It is necessary to activate an output before starting the decoding process. The first bit of 205 decoded sequence can be used to activate an output.
• "Entropy coder" (for example Huffman[4, §2.1] or Arithmetic [4, §4]) - consists of two finite schemes ("FSO" and "FSl") and only one of them is active at a given moment. The "Entropy coder" decodes the input symbol depending on the active finite scheme. "FSO" can be activated by activating "Select FSO". "FSl" can be
210 activated by activating "Select FSl ". Activation of an finite scheme must be done before receiving a symbol. "FSO" and/or "FSl" are found earlier or are updated after every symbol (adaptive compression^, §5]).
• Line "Run" is used to move current run from active output of "Switch" to "RLE".
• Line "RunO" is used to move runs with zeros from "Switch" to the "Run". The line is 215 responsible to activate the "Select FSO" and "Select run 1 " before sending the run to the "Run".
• Line "Runl " is used to move runs with ones from "Switch" to the "Run". The line is responsible to activate the "Select FSl" and "Select run 0" before sending the run to the "Run".
220 • "First bit": The first bit of encoded binary sequence and it is used to initialize the device.
• "S": the input of the device.
• "E": the output of the device.
The required explanation of the invention by means of two drawings is attached.
Modes for Carrying Out the Invention (Advanced Entropy Coders)
An advantageous embodiment of the invention is indicated in claim 2. The further development according to claim 2: it is possible to compress any sequence better than other entropy coders do by compressing three sequences, one of which is binary.
Let a sequence $S$ of source symbols be given. The number of its symbols is $|S|$ and the symbols are from an alphabet $A$. Let $S = \{s_1, s_2, \ldots, s_{|S|}\}$; then the set $S_B = \{b_1, b_2, \ldots, b_{|S|}\}$ will denote the sequence of first bits¹, where $b_i$ is the first bit of $s_i$. $S_B$ can be seen as a binary source and as a random variable associated with $S$. Substitute $(X, Y)$ with $S$ and $X$ with $S_B$ in the main equation for conditional uncertainty $H(X, Y) = H(X) + H(Y|X)$ [3, Theorem 1.4.4]. Then $H(S) = H(S_B) + H(Y \mid S_B)$, where:
• $p$ is the probability of 1 in $S_B$
¹ Can be the last bit or some other bit.
• If $S_1$ is the sequence of all elements from $S$ starting with 1, and $S_0$ is the sequence of all elements from $S$ starting with 0, then the next equality is true (base equality 1): $H(S) = H(S_B) + p \cdot H(S_1) + (1 - p) \cdot H(S_0)$.
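Assuming base equality 1 has the conditional-entropy form reconstructed above, it can be verified on any concrete sequence. The following sketch (our code, using the data of Example 2 below) checks it:

```python
import math
from collections import Counter

def entropy(seq):
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

S   = [1, 1, 3, 2, 1, 0, 2, 3, 1, 2]    # 2-bit symbols from A = {0, 1, 2, 3}
S_B = [s >> 1 for s in S]                # first (most significant) bits
S_1 = [s & 1 for s in S if s >> 1 == 1]  # second bits of symbols starting with 1
S_0 = [s & 1 for s in S if s >> 1 == 0]  # second bits of symbols starting with 0

p = sum(S_B) / len(S_B)
lhs = entropy(S)
rhs = entropy(S_B) + p * entropy(S_1) + (1 - p) * entropy(S_0)
assert abs(lhs - rhs) < 1e-9
```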
Example 2: If $A = \{0, 1, 2, 3\}$ and $S = \{1, 1, 3, 2, 1, 0, 2, 3, 1, 2\}$, then:
$S_B = \{0, 0, 1, 1, 0, 0, 1, 1, 0, 1\}$ - the first bits of the binary representation of the elements of $S$.
$S_1 = \{1, 0, 0, 1, 0\}$ - the second bits of the binary representation of the elements of $S$, but only if the first bit is 1.
$S_0 = \{1, 1, 1, 0, 1\}$ - the second bits of the binary representation of the elements of $S$, but only if the first bit is 0.
Now $p = 0.5$, $p_0 = 0.1$, $p_1 = 0.4$, $p_2 = 0.3$, $p_3 = 0.2$.
Some calculations follow.
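These are recomputed here from the example data (our arithmetic); they verify base equality 1:

```latex
\begin{align*}
H(S)   &= -(0.1\log_2 0.1 + 0.4\log_2 0.4 + 0.3\log_2 0.3 + 0.2\log_2 0.2) \approx 1.84644\\
H(S_B) &= -(0.5\log_2 0.5 + 0.5\log_2 0.5) = 1\\
H(S_1) &= -(0.4\log_2 0.4 + 0.6\log_2 0.6) \approx 0.97095\\
H(S_0) &= -(0.8\log_2 0.8 + 0.2\log_2 0.2) \approx 0.72193\\
H(S_B) &+ p\,H(S_1) + (1-p)\,H(S_0) = 1 + 0.5\cdot 0.97095 + 0.5\cdot 0.72193 \approx 1.84644 = H(S)
\end{align*}
```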
An advantageous embodiment of the invention is indicated in claim 3. The further development according to claim 3: it is possible to compress any sequence better than other entropy coders do by compressing several binary sequences.
Proof: Base equality 1 can be applied to $S_1$ and $S_0$ also, and so on.
Proof: Because ... for every $u$.
In Example 2, $S_1$ and $S_0$ can be compressed further with BS22RLS, but there is no further compression for $S_B$ because $p = q = 0.5$.
Industrial Applicability
The invention can be used in digital communication, digital television, digital photography, and computers - especially in JPEG and MPEG: "The JPEG algorithm, for instance, can use either Huffman coding or arithmetic coding to compress the coefficients" [4, §4.4]; "Lossy JPEG compression can be described in six main steps: ... 5. Run length coding - in order to make the best possible use of the long series of zeros ... 6. Variable length coding (Huffman coding) ..." [5, §3.3].
References:
[1] Claude E. Shannon, Warren Weaver, The Mathematical Theory of Communication, University of Illinois Press (1998)
[2] A. I. Khinchin, Mathematical Foundations of Information Theory, Dover Publications, Inc., New York (1957)
[3] Robert B. Ash, Information Theory, Dover Publications, Inc., New York (1990)
[4] Peter Wayner, Compression Algorithms for Real Programmers, Morgan Kaufmann / Academic Press, a Harcourt Science and Technology Company (2000)
[5] H. Benoit, Digital Television: MPEG-1, MPEG-2 and Principles of the DVB System, Focal Press (Second edition 2002)