Mona Lisa’s video

For centuries, people have wondered about Mona Lisa’s smile. Now they can stop wondering and just watch her videos.

A group of AI researchers published a paper titled “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”, where they describe a new algorithm to generate videos of people’s heads (talking head models). Methods to produce talking head models using generative adversarial networks (GANs) had already been published before. A GAN is essentially two neural networks combined into one system: a generator trained to produce samples, and a discriminator trained to tell real examples from generated ones.
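To make that two-network idea concrete, here is a minimal sketch of a GAN training loop. It is my own toy example on synthetic 2-D points, not the architecture from the paper; the layer sizes, learning rates, and the fake “real data” distribution are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2

# Generator: maps random noise to candidate samples.
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
# Discriminator: outputs the probability that a sample is real.
discriminator = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    # "Real" samples: points from a fixed Gaussian, standing in for real photos.
    real = torch.randn(64, data_dim) * 0.5 + 2.0
    fake = generator(torch.randn(64, latent_dim))

    # Train the discriminator to label real samples 1 and generated samples 0.
    d_loss = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator to fool the discriminator into outputting 1.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```

The adversarial part is exactly this tug of war: the discriminator gets better at spotting fakes, which forces the generator to produce ever more convincing samples.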

However, the existing GAN-based methods required long videos or large sets of photographs of each talking head for training. They also relied on various warping techniques; for an overview, read the introduction of the “Facial Animation System Based on Image Warping Algorithm” study.

The paper above describes a new way of producing talking heads from just a few training examples, possibly only a single photograph. Instead of warping, frames are synthesized directly. This approach is called few-shot learning and relies on models pre-trained on a large number of videos of various people in different situations. A critical part of that training is the identification of face landmarks, such as the eyes, nose, mouth, and chin.
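As an illustration of what landmark extraction looks like in practice, here is a short sketch using the dlib library and its standard pretrained 68-point predictor. This is not the exact landmark tracker from the paper, the image file name is a placeholder, and the predictor file must be downloaded separately.

```python
import dlib

# Pretrained components: a face detector and a 68-point landmark predictor
# (the .dat model file is dlib's standard download, not bundled with the library).
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = dlib.load_rgb_image("mona_lisa.jpg")  # placeholder file name
for face in detector(image):
    landmarks = predictor(image, face)
    # Collect the (x, y) coordinates of the eyes, nose, mouth, and jaw line.
    points = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(68)]
    print(points[:5])
```

A talking-head system conditions its synthesis on such landmark coordinates, which is why a single still photograph plus someone else’s landmark sequence is enough to drive an animation.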

The results of the new research are summarized in a five-minute video that shows how the properly trained GAN can produce short talking head videos from still images. A talking head created from the Mona Lisa painting was particularly impressive: it was driven by three different human models, and the differences in the resulting facial expressions are easily recognizable. The process of synthesizing video of one person from the face landmarks of a different person is called puppeteering.

Mona Lisa trained using three different training videos to give her three distinct personalities. Courtesy of Zakharov et al.

Talking head videos could be combined with the latest NLP improvements that I described in an earlier post. This would create highly realistic fake videos and text. If you were concerned about the proliferation of deepfakes before reading this post, this will only heighten your fears. And how do the authors of the few-shot adversarial learning algorithm described above respond to those concerns? The statement below the YouTube video linked above reads: “Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities.” Noble enough. But considering that the researchers are from Russia, with its proven track record of meddling in recent US/EU elections, it is not far-fetched to assume that high-quality deepfakes will be common soon.

How soon? Let’s look at the progress of images generated by GANs over the past five years.

GAN results throughout the years. Please note that none of the above images shows a real person. Courtesy of Gidi Shperber.

I’ll let you extrapolate the progress into the future.

Dangers of NLP

Natural language processing (NLP) continues its rapid advance, leading some people to fear its latest results.

The research organization OpenAI published a blog post titled “Better Language Models and Their Implications” summarizing its progress on “predicting the next word, given all of the previous words within some text”. OpenAI calls its latest model GPT-2. Some samples of the generated text are of very high quality, while others show that further improvements are needed. What makes GPT-2 unique is that it was not trained on a domain-specific dataset, as is typical for most NLP models. Instead, GPT-2 was trained on text from outbound links posted on Reddit, filtered by the karma ratings those links received. OpenAI scraped 40GB of text from the Internet to use as training and testing data.
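To make the “predict the next word” objective concrete, here is a toy sketch of my own, nothing like GPT-2’s actual transformer: it predicts the next word from counts of word pairs in a tiny corpus, conditioning only on the previous word, whereas GPT-2 conditions on all previous words.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny corpus.
corpus = "the cat sat on the mat and the cat slept".split()
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word):
    """Return the most frequent next word seen after 'word' in the corpus."""
    counts = next_word_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # 'cat' -- the most frequent continuation of 'the'
```

GPT-2 does the same job at a vastly larger scale, replacing the count table with a neural network trained on 40GB of text.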

Human text (first two lines) and GPT-2-generated text. Sample published in the OpenAI blog post “Better Language Models and Their Implications”.

OpenAI was founded by some of the biggest names in the tech industry (Elon Musk and Sam Altman) to “freely collaborate” with others. But it is not only OpenAI that follows this ethos; the whole AI community has been built on the premise of sharing research and patents. The AI community was therefore surprised when OpenAI decided not to publish its full GPT-2 model. OpenAI claims that if the full GPT-2 were published, it could easily be misused, for example to create high-quality fake news. Instead, OpenAI released smaller (and less accurate) versions of GPT-2 to the public.
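Those released smaller versions are easy to experiment with. Here is a minimal sketch, assuming the Hugging Face transformers package and its hosted “gpt2” small checkpoint (my own choice of tooling, not part of OpenAI’s release), that loads the public model and samples a continuation:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the publicly released small GPT-2 checkpoint and its tokenizer.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The AI community was surprised to learn that"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation token by token using top-k sampling.
output_ids = model.generate(input_ids, max_length=60, do_sample=True, top_k=40)
print(tokenizer.decode(output_ids[0]))
```

The quality gap between this small checkpoint and the withheld full model is exactly what the release debate was about.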

However, the original paper “Language Models are Unsupervised Multitask Learners” from OpenAI may contain enough details to replicate the full GPT-2 model, given enough time and money. It is estimated that training GPT-2 would cost about $40K in computing resources.

An AI student from Germany claims he did just that, replicating GPT-2 with data he scraped using the same methodology as OpenAI. He wrote a blog post explaining that he will release the full model in a few weeks – unless somebody provides arguments to sway him from publishing. One may dismiss it as a publicity stunt until one notices the GitHub repo containing the whole NLP pipeline. It includes Python code for loading the data and for training TensorFlow models of various sizes and quality. The author also publishes trained smaller models that are not as accurate as the large one. He is kind enough to include examples of his model’s output for the same samples published by OpenAI, and he claims that, unlike OpenAI’s blog, he does not cherry-pick examples.

Text generated by the non-OpenAI model described above, using the same leading lines.

ML: MSDN and Xavier

Microsoft’s MSDN magazine for its development community has been publishing quite a few introductory articles on Machine Learning (ML) over the past few months. January’s issue emphasizes ML with another series of articles, albeit of varying quality. I liked this quote from the editorial “Advancing AI“: ”ML is a huge graph that requires you to repeatedly examine topics, learning a bit more each time and understanding how topics are interrelated”. An interesting article, “Introduction to PyTorch on Windows” by James McCaffrey, further highlights recent rumblings from Microsoft about switching from developing and using its own CNTK library to the open-source PyTorch, developed mainly by Facebook. In terms of GitHub activity, PyTorch statistics are roughly half those of TensorFlow, and both dwarf the CNTK numbers. From this perspective, Microsoft made the right choice in abandoning CNTK. As an example of PyTorch staying current, its latest version introduces two execution modes for Python-based ML, one of which enables just-in-time (JIT) compilation to improve PyTorch’s suitability for production environments. Another article, “Self-Organizing Maps Using C#”, covers an ML technique that is not well known and whose practical usefulness seems questionable. The third article, “Leveraging the Beliefs-Desires-Intentions Agent Architecture”, is poorly written and shamelessly plugs the author’s travel agency in the provided sample app.
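For a feel of what that JIT story looks like in code, here is a minimal sketch (the tiny model and tensor shapes are my own placeholders, not from the MSDN article) that traces an eager-mode PyTorch model into TorchScript so it can be saved and run outside Python:

```python
import torch
import torch.nn as nn

# A tiny eager-mode model; the architecture is just a placeholder.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
example_input = torch.randn(1, 4)

# torch.jit.trace records the operations executed on the example input
# and compiles them into a TorchScript graph.
traced = torch.jit.trace(model, example_input)

# The traced module can be saved and later loaded from a C++ runtime,
# which is the main point of the JIT/production workflow.
traced.save("model_traced.pt")
print(traced(example_input))
```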

For most problems in ML, learning is achieved by applying non-linear activation functions to the outputs of each layer. This allows the algorithms to discover non-linear patterns in the data and predict unseen data with greater accuracy. The long-trusted sigmoid activation function is being replaced by other activation functions, and weight initialization is being improved as well. In fact, the “Understanding the difficulty of training deep feedforward neural networks” study details replacing the standard random initialization of Deep Learning (DL) networks with better initialization methods. DL networks are neural networks with a large number of hidden layers. The study gets rather mathy halfway through, but most of the article is digestible. The resulting initialization algorithm is now called Xavier initialization after the first author of the paper and is supported by all leading ML frameworks.
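For the curious, here is a small sketch of the idea behind the commonly used Xavier/Glorot uniform variant: weights are drawn from a uniform range scaled by the number of inputs and outputs of the layer so that signal variance is roughly preserved across layers. The layer sizes are arbitrary examples of my own.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=np.random.default_rng()):
    """Xavier/Glorot uniform initialization:
    W ~ U[-limit, limit] with limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Example: initialize a 784 -> 256 hidden layer.
W = xavier_uniform(784, 256)
print(W.shape, W.min(), W.max())

# The same initializer is built into the major frameworks, e.g.
# torch.nn.init.xavier_uniform_ in PyTorch or
# tf.keras.initializers.GlorotUniform in TensorFlow.
```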