Algebra with Words
From Algolit
Revision as of 15:59, 1 March 2019
by Algolit
Word embeddings are language modelling techniques that, through multiple mathematical operations of counting and ordering, plot words into a multi-dimensional vector space. Once embedded, words are transformed from distinct symbols into mathematical objects that can be multiplied, divided, added or subtracted.
As the words are distributed along the many diagonal lines of the vector space, their new geometrical placements are no longer visible. What is gained, however, are multiple, simultaneous ways of ordering. Algebraic operations make the relations between vectors graspable again.
This exploration uses gensim (https://radimrehurek.com/gensim/index.html), an open source vector space and topic modelling toolkit implemented in Python, to manipulate text according to the mathematical relationships that emerge between words once they have been plotted in a vector space.
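As a rough illustration of what adding and subtracting words can mean, the sketch below uses only the Python standard library and a handful of hand-made, three-dimensional vectors. The vectors, the vocabulary and the helper names (`add`, `subtract`, `cosine`, `nearest`) are all hypothetical and invented for this example; they do not come from the installation or from any trained model, where embeddings are learned from a corpus and usually have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# A real model learns these numbers from a large body of text.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def add(a, b):
    """Add two vectors component by component."""
    return [x + y for x, y in zip(a, b)]


def subtract(a, b):
    """Subtract vector b from vector a component by component."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(target, exclude=()):
    """Return the vocabulary word whose vector is most similar to target."""
    candidates = [word for word in vectors if word not in exclude]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# "king" - "man" + "woman": an equation written with words.
result = add(subtract(vectors["king"], vectors["man"]), vectors["woman"])

# The words used in the equation are excluded from the search.
answer = nearest(result, exclude={"king", "man", "woman"})
print(answer)
```

With a trained model, gensim offers a comparable query through the `most_similar` method of its keyed vectors, which accepts `positive` and `negative` lists of words; the toy version above only mimics that idea on a five-word vocabulary.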
Concept & interface: Cristina Cochior
Technique: word embeddings, word2vec
Original model: Radim Rehurek and Petr Sojka