Reduced to a Bag of Words

From Algolit

Revision as of 20:59, 28 February 2019 by An

The bag-of-words model is a simplifying representation of text used in natural language processing. In this model, a text is represented as the collection of its words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of its unique words, each paired with the number of times it occurs in the text: quite literally, a bag of words.
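As an illustration, the transformation described above can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the work: the function name `bag_of_words`, the choice to lowercase the text, and the rule that a word is any unbroken run of letters are all assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each unique word to its number of occurrences."""
    # Assumption: lowercase everything so 'The' and 'the' count as one word.
    lowered = text.lower()
    # Assumption: a word is a run of letters (accented letters included);
    # digits, punctuation and whitespace are discarded.
    words = re.findall(r"[^\W\d_]+", lowered)
    # Counter keeps the counts but forgets the order in which words appeared.
    return Counter(words)


bag = bag_of_words("The book of tomorrow is not the book of today.")
for word, count in bag.most_common():
    print(word, count)
```

The sample sentence is made up for the example. Grammar, punctuation and word order are gone from the result; only the words and their counts remain.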

This model is often used to identify the subject of a text by picking out its most frequent or important words, or to measure the similarity between texts by comparing their bags of words. For this work, the article 'Le Livre de Demain' by engineer G. Vander Haeghen, published in 1907 in the 'Bulletin de l'Institut International de Bibliographie', has been literally reduced to a bag of words to take away.
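Comparing two bags of words can be sketched as follows. The text does not name a particular similarity measure; cosine similarity is used here as one common choice, and the function name `cosine_similarity` and the two short sample phrases are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(bag_a, bag_b):
    """Return a score between 0.0 (no shared words) and 1.0 (same proportions)."""
    # Only words present in both bags contribute to the dot product.
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[word] * bag_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    # An empty bag has no direction to compare; treat it as sharing nothing.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample bags, built from whitespace-separated words.
bag_a = Counter("the book of tomorrow".split())
bag_b = Counter("the book of today".split())

print(bag_a.most_common(2))              # the most frequent words of one bag
print(cosine_similarity(bag_a, bag_b))   # how alike the two bags are
```

Because the measure looks only at word counts, two texts that use the same words in a different order score as identical.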