Algoliterary Encounters
Start of the Algoliterary Encounters catalog.
General Introduction
Algoliterary works
- Oulipo scripts
- i-could-have-written-that
- Obama, model for a politician
- ClueBotNG, a special Algolit edition
Algoliterary explorations
A few outputs to see how it works
- CHARNN text generator
- You shall know a word by the company it keeps - Five word2vec graphs, each of them containing the words 'human', 'view' and 'power'.
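The five graphs are static images in the catalog; as an indication of how such a view can be produced, the sketch below projects the neighbourhoods of 'human', 'view' and 'power' to two dimensions with PCA and plots them. It is a minimal sketch, not the script behind the exhibited graphs: the model file name is a placeholder, and it assumes gensim (>= 4.0), scikit-learn and matplotlib.
<syntaxhighlight lang="python">
# Minimal sketch, not the exhibition's plotting script: project the
# neighbourhoods of three seed words to 2D and draw one labelled scatter plot.
from gensim.models import KeyedVectors
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Placeholder path: any word2vec vectors saved as gensim KeyedVectors.
wv = KeyedVectors.load("vectors.kv")

# Collect each seed word together with its ten nearest neighbours.
words = []
for seed in ["human", "view", "power"]:
    words.append(seed)
    words.extend(w for w, _ in wv.most_similar(seed, topn=10))

# Reduce the high-dimensional vectors to two plottable coordinates.
coords = PCA(n_components=2).fit_transform([wv[w] for w in words])

plt.figure(figsize=(8, 8))
for (x, y), word in zip(coords, words):
    plt.scatter(x, y, s=10, color="black")
    plt.annotate(word, (x, y), fontsize=8)
plt.title("'human', 'view', 'power' and their nearest neighbours")
plt.savefig("word2vec_graph.png")
</syntaxhighlight>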
Parts of the NN process
Datasets
- The Enron email archive
- Common Crawl (used by GloVe): selection of URLs (Constant, Maison du Livre...)
- Google News (used by word2vec)
- Frankenstein
- Learning from Deep Learning (from lib.gen.rus.ec) (.txt)
- HG Wells personal dataset (from Gutenberg.org) (.txt)
- Jules Verne (FR), Shakespeare (FR) -> download from Gutenberg & clean up (see the sketch after this list)
- AnarchFem (from aaaaarg.fail) (.txt)
- WikiHarass
- Tristes Tropiques
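The Jules Verne and Shakespeare entries above come with a preparation note rather than a ready-made file. As a rough indication of that step, here is a minimal sketch assuming Project Gutenberg's plain-text format with its usual *** START / END *** markers; the URL is a placeholder, not one of the catalog's actual sources, and the exact marker wording differs per edition.
<syntaxhighlight lang="python">
# Minimal sketch of "download from Gutenberg & clean up" (not the workshop's
# own script): fetch a plain-text ebook and strip the Project Gutenberg
# header and footer so only the work itself remains.
import urllib.request

def fetch(url):
    """Download a plain-text ebook and return it as a string."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def strip_gutenberg_boilerplate(text):
    """Keep the lines between the '*** START OF ...' and '*** END OF ...'
    markers; if the markers are absent, return the text unchanged."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if "*** START OF" in line:
            start = i + 1
        elif "*** END OF" in line:
            end = i
            break
    return "\n".join(lines[start:end]).strip()

if __name__ == "__main__":
    # Placeholder ebook id; replace with the Verne or Shakespeare edition used.
    url = "https://www.gutenberg.org/cache/epub/XXXXX/pgXXXXX.txt"
    with open("dataset.txt", "w", encoding="utf-8") as f:
        f.write(strip_gutenberg_boilerplate(fetch(url)))
</syntaxhighlight>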
From words to numbers
Different views on the data
Creating word embeddings using word2vec
- word2vec applications - could this serve as an introduction to word2vec? (see the sketch after this list)
- word2vec_basic.py - in piles of paper
- softmax annotated
- chatbot for word mathematics
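These four entries document word2vec from different angles: its applications, the annotated word2vec_basic.py with its softmax, and word arithmetic in a chatbot. As a compact point of reference, the sketch below trains a toy model with the gensim library, an assumption on our side since the printed scripts are based on word2vec_basic.py, and asks it the two kinds of questions that recur in this section: nearest neighbours and word mathematics.
<syntaxhighlight lang="python">
# Minimal sketch, not the exhibition's code: train word2vec on a toy corpus
# with gensim (>= 4.0) and query it for neighbours and word arithmetic.
from gensim.models import Word2Vec

# Toy corpus: each document is a list of lowercase tokens. In the catalog the
# corpora are the datasets listed above (Frankenstein, H.G. Wells, WikiHarass...).
corpus = [
    "the human view of power is a human view".split(),
    "power shapes the view we have of the human".split(),
    "a machine learns a view of power from human words".split(),
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,  # dimensionality of the embedding space
    window=3,        # context window around each target word
    min_count=1,     # keep every word, the corpus is tiny
    sg=1,            # skip-gram, as in word2vec_basic.py
    seed=1,
)

# Nearest neighbours: the relation plotted in the five word2vec graphs.
print(model.wv.most_similar("human", topn=3))

# "Word mathematics": vector arithmetic over the embeddings,
# e.g. power - human + machine ~ ?
print(model.wv.most_similar(positive=["power", "machine"], negative=["human"], topn=3))
</syntaxhighlight>
On a corpus this small the answers are of course noise; the point is the shape of the queries, not the result.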
Autonomous machine as inspection
Algoliterary Toolkit
- cgi interface template
- text-punctuation-clean-up.py
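text-punctuation-clean-up.py is listed by name only; as an assumption about what such a tool does (the real script may differ), here is a minimal clean-up sketch: lowercase the text, drop punctuation and collapse whitespace so it can be tokenised for the models above.
<syntaxhighlight lang="python">
# Minimal sketch of a punctuation clean-up step; an assumption about what
# text-punctuation-clean-up.py does, not the script itself.
import re
import string
import sys

def clean(text):
    """Lowercase, remove punctuation and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

if __name__ == "__main__":
    # Usage: python text-punctuation-clean-up.py input.txt > output.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        print(clean(f.read()))
</syntaxhighlight>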
Bibliography
- Algoliterary Bibliography - Reading Room texts