Reduced to a Bag of Words
From Algolit
Latest revision as of 20:59, 28 February 2019
The bag-of-words model is a simplifying representation of text used in natural language processing. In this model, a text is represented as the collection of its words, disregarding grammar, punctuation and even word order. The model transforms the text into a list of its unique words and the number of times each is used in the text: quite literally, a bag of words.
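As a rough illustration of the transformation described above, here is a minimal Python sketch. The function name `bag_of_words`, the lowercasing step and the sample sentence are assumptions made for this example; they are not taken from the work itself.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each unique word to its number of occurrences."""
    # Assumption: lowercase the text and treat every run of letters as a word.
    # The pattern [^\W\d_]+ matches Unicode letters only, so punctuation and
    # digits are dropped and accented characters are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    # Counting discards word order: only the words and their counts remain.
    return Counter(words)


# Hypothetical sample sentence, used only for illustration.
bag = bag_of_words("The book of tomorrow is not the book of today.")
print(bag.most_common(3))
```

Because only the counts are kept, two texts containing the same words in a different order produce the same bag.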
This model is often used to identify the subject of a text by picking out its most frequent or important words, or to measure the similarity between texts by comparing their bags of words. For this work, the article 'Le Livre de Demain' by engineer G. Vander Haeghen, published in 1907 in the 'Bulletin de l'Institut International de Bibliographie', has been literally reduced to a bag of words to take away.
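The text does not say which measure is used to compare bags of words. One common choice is cosine similarity, sketched below under that assumption. The helper names `bag_of_words` and `cosine_similarity` and the sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count the lowercase words of a text, ignoring order and punctuation."""
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two bags seen as word-count vectors."""
    # Only words present in both bags contribute to the dot product.
    shared = set(bag_a) & set(bag_b)
    dot = sum(bag_a[word] * bag_b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in bag_a.values()))
    norm_b = math.sqrt(sum(count * count for count in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty bag has no direction; treat it as sharing nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts, used only for illustration.
first = bag_of_words("the book of tomorrow")
second = bag_of_words("tomorrow the book")
third = bag_of_words("a bag to take away")

print(cosine_similarity(first, second))  # high: three of four words are shared
print(cosine_similarity(first, third))   # 0.0: no words in common
```

A score close to 1 means the two texts use much the same words in similar proportions; a score of 0 means they share no words at all.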