huffman coding

Work in Progress

Concept

Huffman coding

  • lossless compression
  • high frequency symbols are given the shortest length code
  • prefix code, symbols can have different length codes
  • need to provide dictionary (overhead)
  • used in:
    • PNG
    • JPEG (end)

Algorithm

  1. sort the symbols by frequency
  2. join the two smallest groups together, add the total frequency
    • a group is either:
      • a symbol
      • a pair of groups
  3. repeat step 2 until there is only one group encompassing all the symbols
  4. form the huffman tree
  5. form the dictionary
abaabc
  • frequency table
a3b2c1N1/3N2/6
  • huffman tree
N2a0N1b0c11

huffman trees defer from prefix trees since only the leaf nodes

  • dictionary

the code is a prefix code, so no prefix of any code is another code that encodes a symbol

  abaabc -> 0 10 0 0 10 11
6×8 bits    9 bits + dict

Application

Encoding text (short)

this is an example for huffman encoding
a3b2c1N1/3N2/6
space6
N4
A3
E3
F3
I3
H2
M2
O2
S2
C1
D1
G1
L1
P1
R1
T1
U1
X1

Encoding text (long)

data compression is the process of encoding information using fewer bits than the original representation compression can be either lossy or lossless
a3b2c1N1/3N2/6
space21
S16
E15
O14
I12
N12
R10
T9
A7
C5
H4
L4
P4
F3
G3
M3
B2
D2
U1
W1
Y1