Huffman Coding

 

Huffman coding is a lossless data compression algorithm named after its inventor, David A. Huffman. It produces a variable-length prefix code: more frequently occurring symbols get shorter codes, less frequently occurring symbols get longer codes, and no code is a prefix of any other, so the encoded bits can be decoded unambiguously. When symbol frequencies are uneven, this results in a smaller overall size than a fixed-length encoding.

How it Works

The Huffman coding algorithm starts by building a frequency table that records how many times each symbol occurs in the dataset. It then creates one leaf node per symbol, weighted by that symbol's frequency. At each step it removes the two nodes with the lowest frequencies and joins them under a new parent node whose frequency is the sum of the two. The parent goes back into the pool of nodes, and the process repeats until only one node is left, which is the root of the tree.
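The tree construction can be sketched in Python with a min-heap, repeatedly merging the two lowest-frequency nodes. This is a minimal illustration, not a reference implementation: the function name `build_huffman_tree` and the nested-tuple node layout are assumptions made for this example.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_tree(data):
    """Return the root of a Huffman tree for `data`, or None if it is empty.

    Node layout (an assumption of this sketch):
      leaf          -> (weight, symbol)
      internal node -> (weight, left_child, right_child)
    """
    freq = Counter(data)  # the frequency table
    tie = count()         # unique tie-breaker so heapq never compares nodes
    heap = [(w, next(tie), (w, sym)) for sym, w in freq.items()]
    heapq.heapify(heap)
    if not heap:
        return None

    while len(heap) > 1:
        # Take the two lowest-frequency nodes and join them under a parent.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        parent = (w1 + w2, left, right)
        heapq.heappush(heap, (w1 + w2, next(tie), parent))

    return heap[0][2]


root = build_huffman_tree("abracadabra")
print(root[0])  # 11, the root weight equals the number of input symbols
```

A heap keeps each "find the two smallest" step at logarithmic cost, so the whole construction takes O(n log n) time for n distinct symbols.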

Codes are read off the tree by labeling each edge to a left child with 0 and each edge to a right child with 1. The code for a symbol is the sequence of labels on the path from the root to that symbol's leaf. Because symbols appear only at leaves, no code can be the beginning of another code. Symbols with higher frequencies are merged later and end up closer to the root, so they receive shorter codes, which results in a smaller compressed size.
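Reading codes off a tree is a short recursive walk. The sketch below assumes a simplified tree representation chosen for this example only: a leaf is a one-character string and an internal node is a `(left, right)` pair. The name `assign_codes` and the sample tree are hypothetical.

```python
def assign_codes(node, prefix="", table=None):
    """Walk the tree, appending '0' for a left edge and '1' for a right edge."""
    if table is None:
        table = {}
    if isinstance(node, str):
        # Leaf: the path taken so far is this symbol's code.
        # A tree with a single symbol still needs one bit, so fall back to "0".
        table[node] = prefix or "0"
    else:
        left, right = node
        assign_codes(left, prefix + "0", table)
        assign_codes(right, prefix + "1", table)
    return table


# Hypothetical tree: 'x' is the most frequent symbol, so it sits nearest the root.
tree = ("x", ("y", "z"))
print(assign_codes(tree))  # {'x': '0', 'y': '10', 'z': '11'}
```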

Example

Suppose we have a dataset that contains the following symbols: "A", "B", "C", "D", and "E", and the frequency of each symbol is as follows: "A" (45), "B" (13), "C" (12), "D" (16), and "E" (9). The frequency table would look like this:

Symbol Frequency
A 45
B 13
C 12
D 16
E 9

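The algorithm builds the binary tree by repeatedly merging the two lowest-frequency nodes. It first merges E (9) and C (12) into a node of weight 21, then B (13) and D (16) into a node of weight 29, then those two nodes into a node of weight 50, and finally A (45) with that node to form the root (95). Placing the lower-frequency node of each pair on the left, the tree looks like this:

          (95)
         /    \
     A(45)    (50)
             /    \
         (21)      (29)
         /  \      /  \
      E(9) C(12) B(13) D(16)

Finally, the algorithm assigns a binary code to each symbol by reading 0 for each left edge and 1 for each right edge on the path from the root:

Symbol Binary Code
A 0
B 110
C 101
D 111
E 100

No code is a prefix of another. With these codes the 95 symbols take 45 × 1 + 50 × 3 = 195 bits, compared with 285 bits for a fixed 3-bit code.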
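The example can be checked with a few lines of Python. This sketch assumes the code table A=0, B=110, C=101, D=111, E=100, which is one valid Huffman assignment for the frequencies above (the exact bits depend on which child is placed on the left). The helper names are illustrative only.

```python
FREQ = {"A": 45, "B": 13, "C": 12, "D": 16, "E": 9}
CODES = {"A": "0", "B": "110", "C": "101", "D": "111", "E": "100"}  # assumed table


def is_prefix_free(codes):
    """True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, any prefix relationship shows up between neighbours.
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


def encode(text, codes):
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    reverse = {code: sym for sym, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in reverse:  # a complete code word has been read
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


total_bits = sum(FREQ[s] * len(CODES[s]) for s in FREQ)
fixed_bits = sum(FREQ.values()) * 3  # 3 bits are enough for 5 symbols

print(is_prefix_free(CODES))     # True
print(total_bits, fixed_bits)    # 195 285
print(decode(encode("ABACADAE", CODES), CODES))  # ABACADAE
```

The prefix check matters in practice: a table containing both 11 and 1100, for instance, could not be decoded bit by bit, because the decoder would emit a symbol as soon as it read 11.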

Applications and Real-World Examples

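Huffman coding is widely used in file compression, image and video compression, and communication networks, usually as the final entropy-coding stage after another transform. For example, the JPEG image format uses Huffman coding to compress its quantized coefficients, and the DEFLATE method behind ZIP, gzip, and PNG combines LZ77 with Huffman coding. (GIF, by contrast, uses LZW rather than Huffman coding.) The MP3 audio format also uses Huffman coding to reduce the size of the audio data.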

Alternatives and New Developments

Alternatives to Huffman coding include arithmetic coding and Shannon-Fano coding, which also assign codes based on symbol frequencies, and run-length encoding, which instead exploits repeated runs of the same symbol. Recent developments in data compression include the use of neural networks for compression and lossy algorithms that use machine learning to remove less significant data from the dataset.

Conclusion

Huffman coding is a widely used lossless compression algorithm that assigns shorter codes to more frequently occurring symbols in a dataset. It appears in image, audio, and video compression and in communication networks. The algorithm is relatively simple to implement and can significantly reduce the size of the data when symbol frequencies are uneven.
