Algoliterary Encounters: Difference between revisions

Revision as of 10:54, 24 October 2017

Start of the Algoliterary Encounters catalog.

CHARNN text generator
human & view & power in 5 landscapes - Five word2vec graphs, each of them containing the words 'human', 'view' and 'power'.

(Before: talking_about_machine_learning - exploring the vocabulary of machine learning textbooks in 7 stages with word2vec)

@@ Line 22: / Line 22: @@
 ==== Datasets ====
-* [[The datasets speak]]
+* [[Many many words]]
+* [[The Enron email archive]]
+* [[Common Crawl]] (used by GloVe): selection of URLs (Constant, Maison du Livre...)
+* [[Google News]] (used by word2vec)
+* [[Learning from Deep Learning]] (from lib.gen.rus.ec) (.txt)
+* [[HG Wells personal dataset]] (from Gutenberg.org) (.txt)
+* Jules Verne (FR), Shakespeare (FR) -> download from Gutenberg & clean up
+* [[AnarchFem]] (from aaaaarg.fail) (.txt)
 ==== From words to numbers ====