Contextual stories about Learners

Naive Bayes & Viagra

Naive Bayes is a famous learner that performs well with little data. We apply it all the time. Christian and Griffiths state in their book, Algorithms To Live By, that 'our days are full of small data'. Imagine, for example, that you're standing at a bus stop in a foreign city. The other person standing there has been waiting for seven minutes. What do you do? Do you decide to wait? And if so, for how long? When will you try other options? Another example: imagine a friend asking for advice about a relationship. He has been with his new partner for a month. Should he invite the partner to join him at a family wedding?

Having pre-existing beliefs is crucial for Naive Bayes to work. The basic idea is that you calculate probabilities based on prior knowledge, combined with the evidence of a specific situation.
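
A minimal sketch of that idea in Python, with invented numbers: suppose that one in fifty incoming mails is spam, and that the word 'viagra' shows up in 20% of spam but in only 0.1% of legitimate mail. Bayes' rule combines the prior belief with the evidence of the word into an updated belief.

```python
# Hypothetical numbers, chosen only to illustrate the calculation.
p_spam = 1 / 50                # prior belief: P(spam)
p_word_given_spam = 0.20       # evidence: P('viagra' | spam)
p_word_given_ham = 0.001       # evidence: P('viagra' | legitimate mail)

# Bayes' rule: P(spam | 'viagra') = P('viagra' | spam) * P(spam) / P('viagra')
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(round(p_spam_given_word, 3))  # ~0.803: a weak prior, strongly revised by one word
```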

The theorem was formulated during the 1740s by Thomas Bayes, a reverend and amateur mathematician. He dedicated his life to solving the question of how to win the lottery. But Bayes' rule was only made famous and known as it is today by the mathematician Pierre-Simon Laplace in France a bit later in the same century. For a long time after Laplace's death, the theory sank into oblivion until it was dug up again during the Second World War in an effort to break the Enigma code.

Most people today have come into contact with Naive Bayes through their email spam folders. Naive Bayes is a widely used algorithm for spam detection. It is a coincidence that Viagra, the erectile dysfunction drug, was approved by the US Food & Drug Administration in 1997, around the same time that about 10 million users worldwide had signed up for free webmail accounts. The selling companies were among the first to use email as a medium for advertising: it was an intimate space, at the time reserved for private communication, for an intimate product. In 2001, the first SpamAssassin programme relying on Naive Bayes was uploaded to SourceForge, cutting down on guerrilla email marketing.
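
Extending that single-word calculation to whole mails gives a Naive Bayes classifier. Here is a minimal sketch, assuming the scikit-learn library and a handful of invented mails:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# A tiny, hand-made corpus: the labels are the prior knowledge.
mails = [
    'cheap viagra buy now',           # spam
    'viagra special offer today',     # spam
    'meeting agenda for tomorrow',    # legitimate
    'lunch with the team on friday',  # legitimate
]
labels = ['spam', 'spam', 'ham', 'ham']

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(mails)   # word counts per mail

classifier = MultinomialNB()          # 'naive': words are treated as independent
classifier.fit(X, labels)

test = vectorizer.transform(['buy cheap viagra today'])
print(classifier.predict(test))       # ['spam']
```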

Reference

Machine Learners, by Adrian Mackenzie, MIT Press, Cambridge, MA, November 2017.

Naive Bayes & Enigma

This story about Naive Bayes is taken from the book 'The Theory That Would Not Die', written by Sharon Bertsch McGrayne. Among other things, she describes how Naive Bayes was soon forgotten after the death of Pierre-Simon Laplace, its inventor. The mathematician was said to have failed to credit the works of others, and widely circulated charges damaged his reputation. Only after 150 years was the accusation refuted.

Fast forward to 1939, when Bayes' rule was still virtually taboo, dead and buried in the field of statistics. When France was occupied in 1940 by Germany, which then controlled Europe's factories and farms, Winston Churchill's biggest worry was the U-boat peril. U-boat operations were tightly controlled by German headquarters in France. Each submarine received its orders as coded radio messages once it was well out in the Atlantic. The messages were encrypted by word-scrambling machines, called Enigma machines. Enigma looked like a complicated typewriter. It was invented by the German firm Scherbius & Ritter after the First World War, when the need for message-encoding machines had become painfully obvious.

Interestingly, and luckily for Naive Bayes and the world, at that time the British government and educational systems saw applied mathematics and statistics as largely irrelevant to practical problem-solving. So the British agency charged with cracking German military codes mainly hired men with linguistic skills. Statistical data was seen as bothersome because of its detail-oriented nature. So wartime data was often analysed not by statisticians, but by biologists, physicists, and theoretical mathematicians. None of them knew that Bayes' rule was considered unscientific in the field of statistics. Their ignorance proved fortunate.

It was the now famous Alan Turing – a mathematician, computer scientist, logician, cryptanalyst, philosopher and theoretical biologist – who used the probability system of Bayes' rule to design the 'bombe'. This was a high-speed electromechanical machine for testing every possible arrangement that an Enigma machine would produce. In order to crack the naval codes of the U-boats, Turing simplified the 'bombe' system using Bayesian methods. It turned the UK headquarters into a code-breaking factory. The story is well illustrated in The Imitation Game, a film by Morten Tyldum dating from 2014.

A story about sweet peas

Throughout history, some models have been invented by people with ideologies that are not to our liking. The idea of regression stems from Sir Francis Galton, an influential nineteenth-century scientist. He spent his life studying the problem of heredity – understanding how strongly the characteristics of one generation of living beings manifested themselves in the following generation. He established the field of eugenics, defining it as 'the study of agencies under social control that may improve or impair the racial qualities of future generations, either physically or mentally'. On Wikipedia, Galton is cited as a prime example of scientific racism.

Galton initially approached the problem of heredity by examining characteristics of the sweet pea plant. He chose this plant because the species can self-fertilize: daughter plants inherit genetic variations from mother plants without a contribution from a second parent. This characteristic eliminates the need to deal with multiple sources.

Galton's research was appreciated by many intellectuals of his time. In 1869, in Hereditary Genius, Galton claimed that genius is mainly a matter of ancestry, and he believed that there was a biological explanation for social inequality across races. Galton even influenced his half-cousin Charles Darwin with his ideas. After reading Galton's paper, Darwin stated, 'You have made a convert of an opponent in one sense for I have always maintained that, excepting fools, men did not differ much in intellect, only in zeal and hard work'. Luckily, the modern study of heredity managed to eliminate the myth of race-based genetic difference, something Galton tried hard to maintain.

Galton's major contribution to the field was linear regression analysis, laying the groundwork for much of modern statistics. While we engage with the field of machine learning, Algolit tries not to forget that ordering systems hold power, and that this power has not always been used to the benefit of everyone. Machine learning has inherited many aspects of statistical research, some less agreeable than others. We need to be attentive, because these world views do seep into the algorithmic models that create new orders.
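
A minimal sketch of such a regression analysis, with invented seed weights rather than Galton's own measurements: fit a straight line that predicts the weight of a daughter seed from the weight of its mother seed.

```python
import numpy as np

# Hypothetical mother/daughter seed weights (arbitrary units), not Galton's data.
mother = np.array([15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0])
daughter = np.array([15.8, 16.1, 16.7, 17.0, 17.4, 17.6, 18.1])

# Least-squares fit of: daughter = slope * mother + intercept
slope, intercept = np.polyfit(mother, daughter, 1)
print(round(slope, 2), round(intercept, 2))
# A slope below 1.0 is the 'regression towards the mean' that gave the method its name.
```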

References

http://galton.org/letters/darwin/correspondence.htm

https://www.tandfonline.com/doi/full/10.1080/10691898.2001.11910537

http://www.paramoulipist.be/?p=1693

Perceptron

We find ourselves in a moment in time in which neural networks are sparking a lot of attention. But they have been in the spotlight before. The study of neural networks goes back to the 1940s, when the first neuron metaphor emerged. The neuron is not the only biological reference in the field of machine learning: think of the words 'corpus' or 'training'. The artificial neuron was constructed in close analogy to its biological counterpart.

Psychologist Frank Rosenblatt was inspired by fellow psychologist Donald Hebb's work on the role of neurons in human learning. Hebb stated that 'cells that fire together wire together'. His theory now lies at the basis of associative human learning, but also of unsupervised neural network learning. It moved Rosenblatt to expand on the idea of the artificial neuron.

In 1962, he created the Perceptron, a model that learns through the weighting of inputs. It was set aside by the next generation of researchers because it can only handle binary classification. This means that the data has to be linearly separable, as with, for example, men and women, or black and white. It is clear that this type of data is very rare in the real world. When the so-called first AI winter arrived in the 1970s and the funding decreased, the Perceptron was also neglected. For ten years it stayed dormant. When spring set in at the end of the 1980s, a new generation of researchers picked it up again and used it to construct neural networks, which contain multiple layers of Perceptrons. That is how neural networks saw the light of day. One could say that the current machine learning season is particularly warm, but it takes another winter to know a summer.
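
A minimal sketch of Rosenblatt's learning rule, with a hypothetical, linearly separable toy dataset: whenever an example is misclassified, the weights of its inputs are nudged in the right direction.

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=0.1):
    """Learn a weight vector and bias from examples labelled -1 or +1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
                w += lr * yi * xi              # adjust the weighting of the inputs
                b += lr * yi
    return w, b

# Toy, linearly separable points: class +1 lies above a line, class -1 below it.
X = np.array([[1, 2], [2, 3], [3, 3], [2, 0.5], [3, 1], [4, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # reproduces y once a separating line has been found
```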

BERT

Some online articles say that the year 2018 marked a turning point for the field of Natural Language Processing (NLP). A series of deep-learning models achieved state-of-the-art results on tasks like question answering and sentiment classification. Google’s BERT algorithm entered the machine learning competitions of that year as a sort of 'one model to rule them all'. It showed superior performance over a wide variety of tasks.

BERT is pre-trained; its weights are learned in advance through two unsupervised tasks. This means that BERT doesn’t need to be trained from scratch for each new task; you only have to fine-tune its weights. It also means that a programmer who wants to use BERT no longer knows what parameters it has been tuned with, nor what data it has seen to learn its performance.

BERT stands for Bidirectional Encoder Representations from Transformers. This means that BERT allows for bidirectional training: the model learns the context of a word based on all of its surroundings, to the left and to the right. As such, it can differentiate between 'I accessed the bank account' and 'I accessed the bank of the river'.
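
A minimal sketch of that bidirectional reading, assuming the Hugging Face transformers library and the publicly released bert-base-uncased model: the vector BERT produces for the word 'bank' differs between the two sentences.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumes the 'transformers' library and the public 'bert-base-uncased' checkpoint.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

def bank_vector(sentence):
    """Return BERT's contextual vector for the token 'bank' in a sentence."""
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
    return outputs.last_hidden_state[0, tokens.index('bank')]

v1 = bank_vector('I accessed the bank account')
v2 = bank_vector('I accessed the bank of the river')

# Well below 1.0: the same word gets a different representation in each context.
print(torch.cosine_similarity(v1, v2, dim=0).item())
```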

Some facts:

- BERT_large, with 345 million parameters, is the largest model of its kind. It is demonstrably superior on small-scale tasks to BERT_base, which uses the same architecture with 'only' 110 million parameters.

- To run BERT you need to use TPUs. These are Google's own processors (Tensor Processing Units), especially engineered for TensorFlow, the deep-learning platform. TPU rental rates range from $8/hr to $394/hr. Algolit doesn't want to work with off-the-shelf packages; we are interested in opening up the black box. In that case, BERT requires quite some savings in order to be used.