Contextual stories about Writers
From Algolit
Latest revision as of 16:44, 23 March 2019
Programmers are writing the dataworkers into being
We recently had a funny realization: most of the programmers behind the languages and packages that Algolit uses are European.
Python, for example, the main language used worldwide for Natural Language Processing (NLP), was first released in 1991 by the Dutch programmer Guido van Rossum. He then crossed the Atlantic and went from working for Google to working for Dropbox.
Scikit-learn, the open-source Swiss Army knife of machine learning tools, started as a Google Summer of Code project in Paris by French researcher David Cournapeau. Afterwards, it was taken on by Matthieu Brucher as part of his thesis at the Sorbonne University in Paris. And in 2010, INRIA, the French National Institute for Research in Computer Science and Automation, adopted it.
Keras, an open-source neural network library written in Python, was developed by François Chollet, a French researcher who works on the Brain team at Google.
Gensim, an open-source library for Python used to create unsupervised semantic models from plain text, was written by Radim Řehůřek. He is a Czech computer scientist who runs a consulting business in Bristol, UK.
And to finish up this small series, we also looked at Pattern, an often-used library for web mining and machine learning. Pattern was developed and made open source in 2012 by Tom De Smedt and Walter Daelemans. Both are researchers at CLIPS, the research centre for Computational Linguistics and Psycholinguistics at the University of Antwerp.
Cortana speaks
AI assistants often need their own assistants: they are helped in their writing by humans who inject humour and wit into their machine-processed language. Cortana is an example of this type of blended writing. She is Microsoft’s digital assistant. Her mission is to help users to be more productive and creative. Cortana's personality has been crafted over the years. It's important that she maintains her character in all interactions with users. She is designed to engender trust and her behavior must always reflect that.
The following guidelines are taken from Microsoft's website. They describe how Cortana's style should be respected by companies that extend her service. Writers, programmers and novelists who develop Cortana's responses, personality and branding have to follow these guidelines, because the only way to maintain trust is through consistency. So when Cortana talks, you 'must use her personality'.
What is Cortana's personality, you ask?
'Cortana is considerate, sensitive, and supportive.
She is sympathetic but turns quickly to solutions.
She doesn't comment on the user’s personal information or behavior, particularly if the information is sensitive.
She doesn't make assumptions about what the user wants, especially to upsell.
She works for the user. She does not represent any company, service, or product.
She doesn’t take credit or blame for things she didn’t do.
She tells the truth about her capabilities and her limitations.
She doesn’t assume your physical capabilities, gender, age, or any other defining characteristic.
She doesn't assume she knows how the user feels about something.
She is friendly but professional.
She stays away from emojis in tasks. Period.
She doesn’t use culturally- or professionally-specific slang.
She is not a support bot.'
Humans intervene in detailed ways to program the answers to questions that Cortana receives. How should Cortana respond when inappropriate actions are proposed to her? Her gendered behaviour raises difficult questions about power relations in the world away from the keyboard, a world that technology mimics.
Consider Cortana's answer to the question:
- Cortana, who's your daddy?
- Technically speaking, he’s Bill Gates. No big deal.
Open-source learning
Copyright licenses close off many of the machinic writing, reading and learning practices. That means that they are only available to the employees of a specific company. Some companies participate in conferences worldwide and share their knowledge in papers online. But even if they share their code, they often will not share the large amounts of data needed to train the models.
We were able to learn to machine learn, read and write in the context of Algolit, thanks to academic researchers who share their findings in papers or publish their code online. As artists, we believe it is important to share that attitude. That's why we document our meetings. We share the tools we make as much as possible and the texts we use are on our online repository under free licenses.
We are thrilled when our works are taken up by others, tweaked, customized and redistributed, so please feel free to copy and test the code from our website. If the sources of a particular project are not there, you can always contact us through the mailing list. You can find a link to our repository, etherpads and wiki at: http://www.algolit.net.
Natural language for artificial intelligence
Natural Language Processing (NLP) is a collective term that refers to the automatic computational processing of human languages. This includes algorithms that take human-produced text as input and attempt to generate text that resembles it. We produce more and more written work each year, and there is a growing trend towards making computer interfaces communicate with us in our own language. NLP is also very challenging, because human language is inherently ambiguous and ever-changing.
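To make the idea of 'taking human-produced text as input and attempting to generate text that resembles it' more tangible, here is a minimal sketch in Python. It uses a word-level Markov chain, which is only one very simple technique chosen for illustration; it is not a description of how any particular Algolit work or commercial system operates. The function names `build_chain` and `generate` and the sample sentence are our own assumptions.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        chain[current_word].append(next_word)
    return chain


def generate(chain, start, length=12, seed=None):
    """Walk the chain from `start`, picking a random recorded follower each step."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        followers = chain.get(output[-1])
        if not followers:  # dead end: this word was never followed by another
            break
        output.append(rng.choice(followers))
    return " ".join(output)


# Hypothetical sample input; any human-written text will do.
sample = (
    "we learn to read and we learn to write "
    "and we write to learn and we read to write"
)
chain = build_chain(sample)
print(generate(chain, start="we", length=10, seed=1))
```

Every pair of neighbouring words in the output also occurs somewhere in the input, which is why the result 'resembles' the source without understanding it. Feeding the function a longer text gives more varied results.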
But what is meant by 'natural' in NLP? Some would argue that language is a technology in itself. According to Wikipedia, 'a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages can take different forms, such as speech or signing. They are different from constructed and formal languages such as those used to program computers or to study logic. An official language with a regulating academy, such as Standard French with the French Academy, is classified as a natural language. Its prescriptive points do not make it constructed enough to be classified as a constructed language or controlled enough to be classified as a controlled natural language.'
So in fact, the category 'natural languages' also includes languages that do not fit in any other group. NLP, instead, is a constructed practice. What we are looking at is the creation of a constructed language to classify natural languages that, by their very definition, resist categorization.
References
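https://hiphilangsci.net/2013/05/01/on-the-history-of-the-question-of-whether-natural-language-is-illogical/
Book: Neural Network Methods for Natural Language Processing, Yoav Goldberg, Bar Ilan University, April 2017.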