Data Workers
About
This exhibition shows a selection of algoliterary works made by members of Algolit, an artistic research group with a focus on F/LOSS code and texts, based in Brussels. While artificial intelligences are being created to serve, entertain, record, and know us, they are usually hidden behind interfaces. In these works of fiction, the algorithmic storytellers leave the invisible underworld to become interlocutors. The works give voice to the robots: algorithmic models that read data, turn words into numbers, make calculations that define patterns, and are thereafter able to endlessly process new texts. This exhibition is an attempt to grasp and multiply voices that are absent from our representations of the world. It allows the robots to enter into dialogue with us, humans. It allows us to understand their reasoning, to demystify their behaviour and to encounter their personalities, without having to study intensively for years. It is also a tribute to the many machines that Paul Otlet and Henri La Fontaine imagined for their Mundaneum, showing their potential but also their limits.
Stations
The origins of the Mundaneum go back to the late nineteenth century. The project was created by two young Belgian jurists, Paul Otlet (1868-1944), the father of documentation, and Henri La Fontaine (1854-1943), Nobel Peace Prize winner. It aimed at gathering all the world’s knowledge and filing it using the Universal Decimal Classification (UDC) system that they had created. At first it was an international institutions bureau dedicated to knowledge and fraternity. In the twentieth century the Mundaneum became a universal centre of documentation. Its collections are made up of thousands of books, newspapers, journals, documents, posters, glass plates, postcards and other bibliographic cards. These were put together and kept in various buildings in Brussels, including the Palais du Cinquantenaire. The archive only moved to Mons in 1998.
Based on the Mundaneum, the two men designed a World City for which Le Corbusier made scale models and plans. The aim of the World City was to gather, at world level, the institutions of intellectual work: libraries, museums and universities. This project was never realised. The Mundaneum project soon faced the scale of the technical development of its era. It suffered from its own utopia. The Mundaneum is the result of a visionary dream, and it attained mythical dimensions at the time. When looking at the concrete archive that was developed, however, that collection is very fragmented and incomplete. The same can be said of artificial intelligences today. When reading about them, the visionary dream has been there since the beginning of their development in the 1950s.
Nowadays the promise has attained mythical dimensions. When looking at the concrete applications, the collection is truly innovative and fascinating, but rather fragmented and incomplete.
Algoliterator
Writers
Data workers need data to work with. The data used in the context of this exhibition is written language. Where does it come from? Who is writing? Machine learning relies on many types of writing. We could say that every human being who has access to the internet becomes an algorithm writer each time they interact with it, by adding reviews, writing Wikipedia articles, or writing emails. Machine learning algorithms are not critics: they take whatever they are given, no matter the writing style, no matter the CV of the author, no matter the spelling mistakes. In fact, mistakes make it better: the more variety, the better it can anticipate. Sometimes the authors are not particularly aware of what happens to their oeuvre: offline material, such as printed literature, is digitized too and turned into prediction fodder. Some writing is in English, some in French, and some in Python. The latter is done by writers with intent, the programmers who scrawled the code we're now discussing. The algorithm can be a writer too: some neural networks write their own rules. And for the rest, the code that is still wrestling with the subtleties of human language, there are human editors who take over. Poets, playwrights and novelists start their exciting new careers as ventriloquists for AI assistants.
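To make that notion of anticipation tangible, here is a minimal sketch, not one of the exhibited works, of a character-level model in Python: it takes whatever text it is given, spelling mistakes included, remembers which character tends to follow which short sequence, and then continues any seed it is handed. The filename any_text.txt is only a placeholder for whatever corpus one feeds it.

import random
from collections import defaultdict

def build_model(text, order=3):
    """Map every sequence of `order` characters to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def anticipate(model, seed, length=200, order=3):
    """Continue the seed by repeatedly picking a character that followed the same context before."""
    out = seed
    for _ in range(length):
        options = model.get(out[-order:])
        if not options:
            break
        out += random.choice(options)
    return out

# Any writing will do: reviews, Wikipedia articles, emails, mistakes included.
with open("any_text.txt", encoding="utf-8") as f:  # placeholder corpus
    corpus = f.read()
print(anticipate(build_model(corpus), corpus[:3]))

The more varied the corpus, the more continuations the model has to choose from, which is the sense in which mistakes and variety make it "better".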
Works:
Cleaners
Algolit chooses to work with texts that are free of copyright. This means that they are published under a Creative Commons 4.0 license (which is rare), or that they are in the public domain because the author died more than 70 years ago. This condition has the great advantage that we can use texts without asking permission or giving explanations, that we often discover unprecedented pearls, and that we help to make datasets available to others online. The disadvantage is that we cannot use epubs or other contemporary text formats, and that we often have to clean up the documents ourselves. We are not alone in this.

Books are scanned at high resolution, page by page. This is intense human work, and it is often the reason why archives and libraries transfer their collections to a commercial company like Google. The photos are converted into text via OCR (Optical Character Recognition), software that recognizes letters but often makes mistakes. Improving the texts afterwards is again intense human work. It is work for freelancers via low-paid platforms like Mechanical Turk, or for volunteers, such as the community around the Gutenberg Proofreaders, who do fantastic work. Whoever does it, and wherever it is done, cleaning up texts is a huge job for which there is no structural automation yet.
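As an illustration of the kind of rules of thumb this cleaning involves, here is a minimal sketch in Python. It is not part of an Algolit pipeline, scanned_book.txt is only a placeholder, and the rules cover just a few recurring OCR artefacts, such as words hyphenated across line breaks.

import re

def clean_ocr_text(raw):
    """A few of the manual clean-up steps, written down as simple rules of thumb."""
    # Rejoin words that were hyphenated across a line break: "docu-\nment" -> "document"
    text = re.sub(r"-\n(\w)", r"\1", raw)
    # Collapse single line breaks inside paragraphs, but keep blank lines between paragraphs
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Normalise runs of spaces left over from the page layout
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

with open("scanned_book.txt", encoding="utf-8") as f:  # placeholder for an OCR'd public-domain text
    print(clean_ocr_text(f.read())[:500])

Each rule is a guess about what the scanner and the OCR did to the page; the long tail of errors still asks for human eyes.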
Works:
Cleaning for Poems