
Algoliterary Encounters

From Algolit

 
__NOTOC__
 
== About ==
Start of the Algoliterary Encounters catalog.
* [[An Algoliterary Journey]]
* [[Program]]
==Algoliterary works==
 
A selection of works by members of Algolit presented in other contexts before.
* [[i-could-have-written-that]]
* [[The Weekly Address, A model for a politician]]
* [[In the company of CluebotNG]]
* [[Oulipo recipes]]
  
 
==Algoliterary explorations==
 
This chapter presents part of the research of Algolit over the past year.
=== What the Machine Writes: a closer look at the output ===
Two neural networks are presented more closely: what content do they produce?
* [[CHARNN text generator]]
* [[You shall know a word by the company it keeps]]
  
=== How the Machine Reads: Dissecting Neural Networks ===
 
==== Datasets ====
 
Working with neural networks involves collecting large amounts of textual data. We compared a 'regular'-sized dataset with the collection of words of the Library of St-Gilles.
* [[Many many words]]
  
* [[The Enron email archive]]
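One simple way to compare the size of two text collections is to count their tokens and their distinct words. A minimal Python sketch (using an invented sample sentence, not the actual St-Gilles collection):

```python
def dataset_size(text):
    """Return (token count, vocabulary size) for a text."""
    words = text.lower().split()
    return len(words), len(set(words))

sample = "the quick brown fox jumps over the lazy dog the end"
tokens, vocab_size = dataset_size(sample)
print(tokens, vocab_size)  # 11 9
```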
=====Public datasets=====
The most commonly used public datasets are gathered at [https://aws.amazon.com/public-datasets/ Amazon]. We looked closely at the following two:
* [[Common Crawl]]
* [[WikiHarass]]
 
=====Algoliterary datasets=====
Working with literary texts allows for poetic beauty in the reading/writing of the algorithms. This is a small collection used for experiments.
* [[The data (e)speaks]]
 
* [[Frankenstein]]
* [[Learning from Deep Learning]]
* [[nearbySaussure]]
* [[astroBlackness]]
 
  
 
==== From words to numbers ====
 
As machine learning is based on statistics and mathematics, text can only be processed once its words have been transformed into numbers. In the following section we present three techniques to do so.
* [[A Bag of Words]]
* [[A One Hot Vector]]
* [[About Word embeddings|Exploring Multidimensional Landscapes: Word Embeddings]]
* [[Crowd Embeddings|Word Embeddings Casestudy: Crowd embeddings]]
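The two simplest of these transformations can be sketched in a few lines of Python. This is a toy illustration with an invented mini-corpus, not code from the works listed above:

```python
# Turning words into numbers: two of the simplest encodings.
sentences = ["the cat sat on the mat", "the dog sat"]

# Build a vocabulary: every distinct word gets a fixed index.
vocab = sorted({w for s in sentences for w in s.split()})
index = {w: i for i, w in enumerate(vocab)}

# Bag of words: one count per vocabulary word; word order is discarded.
def bag_of_words(sentence):
    counts = [0] * len(vocab)
    for w in sentence.split():
        counts[index[w]] += 1
    return counts

# One-hot vector: a single word becomes all zeros with a single 1.
def one_hot(word):
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

print(vocab)                       # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(bag_of_words(sentences[0]))  # [1, 0, 1, 1, 1, 2]
print(one_hot("cat"))              # [1, 0, 0, 0, 0, 0]
```

Word embeddings, the third technique, replace these sparse vectors with short dense ones whose values are learned from the contexts in which words appear.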
  
===== Different visualisations of word embeddings =====
 
* [[Word embedding Projector]]
 
 
 
* [[The GloVe Reader]]
 
===== Inspecting the technique behind word embeddings =====
* [[word2vec_basic.py]]
* [[Reverse Algebra]]
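The technique behind these pieces is word-vector arithmetic: because word2vec places related words near each other, expressions like "king - man + woman" land near "queen". A toy sketch with invented 3-dimensional vectors (real word2vec vectors have hundreds of dimensions):

```python
import math

# Hypothetical toy vectors, not real word2vec output.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.5, 0.1, 0.1],
    "woman": [0.5, 0.1, 0.9],
    "queen": [0.9, 0.8, 0.9],
    "apple": [0.1, 0.9, 0.4],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# king - man + woman, computed component by component
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# Which remaining word is closest to the result?
best = max((w for w in vectors if w not in ("king", "man", "woman")),
           key=lambda w: cosine(vectors[w], target))
print(best)  # queen
```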
 
  
=== How a Machine Might Speak ===
If a computer model for language comprehension could speak, what would it say?
 
* [[We Are A Sentiment Thermometer]]
 
== Sources ==
The scripts we used and a selection of texts that kept us company.
* [[Algoliterary Toolkit]]
* [[Algoliterary Bibliography]]

[[Category:Algoliterary-Encounters]]
 

Latest revision as of 13:50, 2 November 2017
