Mona Lisa’s Video

For centuries, people have wondered about Mona Lisa’s smile. Now they can stop wondering and just watch her videos.

A group of AI researchers published a paper titled “Few-Shot Adversarial Learning of Realistic Neural Talking Head Models”, where they describe a new algorithm for generating videos of people’s heads (talking head models). Methods to produce talking head models using generative adversarial networks (GANs) had been published before. GANs are essentially two neural networks combined into one system: one network is trained to produce samples, while the second is trained to distinguish real samples from generated ones.
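To make that two-network setup concrete, here is a minimal GAN training loop in PyTorch. This is my own sketch with made-up layer sizes, not the architecture from the paper:

```python
# Minimal GAN sketch: generator G maps random noise to samples,
# discriminator D scores samples as real (1) or fake (0).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real: torch.Tensor) -> None:
    n = real.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Train D: score real samples as 1 and generated samples as 0.
    fake = G(torch.randn(n, 64)).detach()
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train G: produce samples that D scores as real.
    fake = G(torch.randn(n, 64))
    g_loss = bce(D(fake), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```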

However, the existing GAN-based methods required long videos or large sets of photographs of each talking head for training. They also used various warping techniques; for an overview, read the introduction of the “Facial Animation System Based on Image Warping Algorithm” study.

The above paper describes a new way of producing talking heads using just a few training examples, possibly only a single photograph. Instead of warping, a direct synthesis method is used. This approach is called few-shot learning and relies on models pre-trained on a large number of videos of various people in different situations. In those models, a critical part of the training relies on the identification of face landmarks, such as the eyes, nose, mouth, and chin.
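As an illustration of the landmark step, the snippet below extracts 68 facial landmarks using the off-the-shelf dlib library. This is my example, not the authors’ code; the predictor file must be downloaded separately, and the image path is a placeholder:

```python
# Detect a face and extract its 68 landmark points (eyes, nose, mouth, jawline).
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("portrait.jpg")  # placeholder input image
for face in detector(img):
    shape = predictor(img, face)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]
    print(f"found {len(landmarks)} landmarks")
```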

The results of the new research are summarized in a five-minute video that shows how a properly trained GAN can produce short talking-head videos from still images. A talking head created from the Mona Lisa painting was particularly impressive: it was driven by three different human models, and the differences between the three resulting facial expressions are easily recognizable. The process of synthesizing video of one person based on the face landmarks of a different person is called puppeteering.

Mona Lisa animated using three different training videos to give her three distinct personalities. Courtesy of Zakharov et al.

Talking-head videos could be combined with the latest NLP improvements that I described in an earlier post. This would create highly realistic fake videos and text. If you were concerned about the proliferation of deepfakes before reading this post, this will only heighten your fears. And how are the authors of the few-shot adversarial learning algorithm described above responding to those concerns? The statement below their YouTube video (linked above) reads: “Shifting a part of human life-like communication to the virtual and augmented worlds will have several positive effects. It will lead to a reduction in long-distance travel and short-distance commute. It will democratize education, and improve the quality of life for people with disabilities.” Noble enough. But considering that the researchers are from Russia, and Russia’s proven track record of meddling in recent US/EU elections, it is not far-fetched to assume that high-quality deepfakes will be common soon.

How soon? Let’s look at the progress of images generated by GANs over the past five years.

GAN results through the years. Please note that none of the above images shows a real person. Courtesy of Gidi Shperber.

I’ll let you extrapolate the progress into the future.

Dangers of NLP

Natural language processing (NLP) continues its rapid advance, leading some people to fear its latest results.

The research organization OpenAI published a blog post titled “Better Language Models and Their Implications” summarizing its progress on “predicting the next word, given all of the previous words within some text”. OpenAI calls its latest model GPT-2. Some samples of the generated text are of very high quality, while others show that further improvement is needed. What makes GPT-2 unique is that it was not trained on domain-specific datasets, as is typical for most NLP models. Rather, GPT-2 was trained on text from links popular with Reddit readers, as measured by the karma ratings of the outbound links. OpenAI scraped 40GB of text from the Internet to use as training and testing data.
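To make “predicting the next word” concrete, here is a minimal sketch using the small GPT-2 model that OpenAI did release, via the Hugging Face transformers library (my example, not OpenAI’s code):

```python
# Greedy next-token prediction with the released small GPT-2.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The Mona Lisa is", return_tensors="pt")
with torch.no_grad():
    logits = model(ids).logits          # a score for every vocabulary token
next_id = int(logits[0, -1].argmax())   # most likely continuation
print(tokenizer.decode([next_id]))
```

Repeatedly appending the predicted token and re-running the model yields whole generated passages like the sample below.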

Human-written text (first two lines) and GPT-2-generated text. Sample published in the OpenAI blog post “Better Language Models and Their Implications”.

OpenAI was founded by some of the biggest names in the tech industry (Elon Musk and Sam Altman) to “freely collaborate” with others. But it is not only OpenAI that follows this ethos; the whole AI community has been built on the premise of sharing research and patents. The AI community was therefore surprised when OpenAI decided not to publish its full GPT-2 model. OpenAI claims that if GPT-2 were published, it could easily be misused, for example to create high-quality fake news. OpenAI did release smaller (and less accurate) versions of GPT-2 to the public.

However, the original paper “Language Models are Unsupervised Multitask Learners” from OpenAI may contain enough detail to replicate the full GPT-2 model, given enough time and money. It is estimated that training GPT-2 costs about $40K in computing resources.

An AI student from Germany claims he did just that: he replicated GPT-2 with data he scraped using the same methodology as OpenAI. He wrote a blog post explaining that he will release the full model in a few weeks, unless somebody provides arguments that sway him from publishing. One might dismiss this as a publicity stunt until one notices the GitHub repo containing the whole NLP pipeline. It includes Python code for loading the data and for training TensorFlow models at various quality levels. The author has also published trained smaller models that are not as accurate as the large model. He is kind enough to include examples of his model’s predictions for the same samples published by OpenAI, and he claims that, unlike OpenAI’s blog, he does not cherry-pick examples.

Text generated by the non-OpenAI model described above, using the same leading lines.

Anki: Best Practices

This post concludes the series of posts on Anki and follows the post introducing Anki and another one about the theory behind Anki.

I have been using Anki for five years and have achieved high consistency in completing daily reviews. Here are my best practices:

  1. Be selective about cards you create in Anki: My goal is to learn the information on my Anki cards for a very long time, possibly for life.
  2. Do not import cards created by others: A big part of Anki is the mental connection to the cards you create because you invest time in their creation.
  3. Make cards atomic: When cards contain too much information, you will likely have problems remembering them. Anki tracks the history of your answers and flags cards that you repeatedly answer incorrectly as leeches. Anki suspends these cards, removing them from your reviews.
  4. Do not give in to leeches: Decide whether you still wish to remember the information on a leech (see point 1 above). If you do, make the card easier to remember, possibly by rewriting it to be more atomic using cloze deletion (see the example after this list).
  5. Spend time maintaining your Anki database: As you review cards, mark the ones that have issues, such as misspelled words or confusing formatting. Review the marked cards at least weekly to correct them.
  6. Vary the environment where you study: Anki has applications for computers (Windows, MacOS, Linux) and mobile devices (iOS, Android), making it easy to review your cards anywhere. The more diverse the environments in which you study, the easier it is to recall the information in real life when you need it. Anki synchronizes your notes across your devices, so do not worry about losing your progress.
  7. If you want to explore more suggestions, there is a famous list of 20 rules for effective learning. The list was written for SuperMemo, a product similar to Anki, but the rules are generic and apply to Anki as well.
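Here is the cloze-deletion example promised in point 4. The card below packs too much into one question; rewriting it with Anki’s {{cN::…}} cloze syntax produces atomic cards (the facts are just an illustration):

```
Before (one overloaded card):
  Q: Who painted The Starry Night, when, and where is it displayed?
  A: Vincent van Gogh, in 1889; Museum of Modern Art, New York

After (atomic cloze notes; Anki makes a separate card for each {{cN::...}}):
  The Starry Night was painted by {{c1::Vincent van Gogh}} in {{c2::1889}}.
  The Starry Night is displayed at the {{c1::Museum of Modern Art}} in New York.
```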

It is easy to modify and extend Anki’s functionality using add-ons written in Python (a minimal add-on sketch follows the list below). You can view a list of all available add-ons here. The following is a list of add-ons that I use:

  • AutoDefine: Automatically retrieves the definition of an English word from the Merriam-Webster dictionary and populates the Anki card, optionally offering to include a corresponding image retrieved from Google. To learn how to use this add-on, please watch this tutorial. The author of this add-on is my son, who introduced me to Anki years ago.
  • Syntax Highlighting for Code: Inserts syntax-highlighted code snippets into your notes. Syntax of many programming languages is supported.
  • Hierarchical Tags: Using tags in Anki allows users to divide notes according to the topics they belong to. Install this add-on to organize those tags into a hierarchy.
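Writing your own add-on is straightforward. The sketch below follows the example in Anki’s add-on documentation: it adds a Tools-menu item that shows how many cards your collection contains (API names match the 2.1-era documentation; newer releases may differ):

```python
# A minimal Anki add-on: add a Tools-menu entry showing the card count.
from aqt import mw
from aqt.utils import showInfo
from aqt.qt import QAction

def show_card_count() -> None:
    # mw.col is the currently open collection.
    showInfo(f"Cards in collection: {mw.col.cardCount()}")

action = QAction("Card Count", mw)
action.triggered.connect(show_card_count)
mw.form.menuTools.addAction(action)
```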

Anki is a wonderful tool that you can use to reap the benefits of the best memory-augmenting techniques available. However, it is not the most user-friendly software. One area that could be improved is configuration: there are many complicated manual settings that users can adjust to modify Anki’s default behavior. These settings could be replaced by a machine learning algorithm based on the history of how the user answered questions.

In closing, I hope that this series was useful to you. If you have any comments or questions about Anki, please write them in the comment field below. You can also search the manual or visit the vibrant Reddit community. Happy learning!

Theory Behind Anki

This post covers the theory behind Anki and follows the previous post introducing Anki. The last post in this series describes best practices for using Anki.

One of the best meta-analyses reviewing approaches to learning is Improving Students’ Learning With Effective Learning Techniques by Dunlosky et al. If I were to summarize its 50+ pages in one paragraph, it would be: the most utility for improved learning is provided by distributed practice and practice testing, followed by interleaved practice, elaborative interrogation, and self-explanation. In contrast, popular techniques such as highlighting and rereading provide only low utility for learning. The meta-analysis was done in the context of students’ learning, but the top techniques were shown to benefit learners across varying ages and abilities. This makes them applicable to lifelong learning.

Anki implements the two most useful techniques directly and provides a way to apply the third. Let’s examine these three techniques in more detail:

  • practice testing (a.k.a. active recall): memory needs to be actively stimulated during the learning process, which is what Anki does by showing questions that the user must answer.
  • distributed practice (a.k.a. spaced learning): learning is broken up into many short sessions spread over a long period. Anki schedules each flashcard for review using a spacing algorithm (a simplified sketch of the scheduler follows this list).
  • interleaved practice: mixing multiple subjects or topics while studying improves learning. Anki cards are organized into decks, and Anki encourages users to limit the number of decks. Since decks are reviewed in succession, ideally all your cards live in a single deck so that knowledge from different subjects is reviewed at the same time.
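Anki’s scheduler is derived from the SM-2 algorithm. Here is a simplified sketch of the idea (my illustration, not Anki’s exact implementation): each successful review multiplies the interval by an “ease” factor, while a failed review restarts the cycle:

```python
def review(interval_days, ease, quality):
    """One SM-2-style review step. quality: 0-5 self-grade; below 3 is a failure."""
    if quality < 3:
        return 1.0, ease  # failed recall: see the card again tomorrow
    # SM-2 ease update; ease is floored at 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return interval_days * ease, ease

# A card answered well again and again gets ever-increasing intervals:
interval, ease = 1.0, 2.5
for _ in range(6):
    interval, ease = review(interval, ease, quality=4)
print(f"next review in ~{interval:.0f} days")  # ~244 days after six good reviews
```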

Another great source on the theory behind Anki is a lengthy article by Michael Nielsen titled “Augmenting Long-Term Memory,” in which the author lists many historical underpinnings of memory augmentation. It is surprising that scientists studied memory decay as early as 1885; see “Memory: A Contribution to Experimental Psychology” by Hermann Ebbinghaus. For a modern take on memory decay, read “A Trainable Spaced Repetition Model for Language Learning,” which describes an algorithm used in Duolingo, an excellent app for learning foreign languages.

At the end of his article, Michael Nielsen goes through a rough estimate of the effort needed to remember one piece of information. It assumes that the user takes just a few seconds to recall a fact listed on an Anki card when it is scheduled. If her effort is spaced over twenty years in ever-increasing intervals, the total time spent on the card is about 5 minutes. This sounds like a decent trade-off: spending a total of 5 minutes over twenty years will result in remembering one particular flashcard. In other words, it will take an average of only 7 minutes daily to remember 10,000 flashcards for twenty years!
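A quick back-of-the-envelope check of that arithmetic:

```python
minutes_per_card = 5          # total review time per card over twenty years
cards = 10_000
years = 20

total_minutes = minutes_per_card * cards      # 50,000 minutes
per_day = total_minutes / (years * 365)
print(f"{per_day:.1f} minutes per day")       # ≈ 6.8, i.e. about 7 minutes daily
```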

Anki: Introduction

This is the first article in a series of posts about Anki, a spaced learning system to improve users’ long-term memory. The next article in this series will cover the theoretical background of spaced learning, and the final article will concentrate on best practices for using Anki.

I have been using Anki software for years to learn information that is important for me to remember for a long time. Such information can be defined very broadly and can include almost anything, from a penny test to random forest and The Starry Night.

I have almost ten thousand individual pieces of information in Anki and review them daily in about half an hour. I often hear comments like “There is no way you can review this amount of information daily!”, so let me explain.

Anki is a system for storing flashcards, where individual cards are scheduled for review at intervals determined by an algorithm. The user reads a question on one side of the flashcard and answers it to herself. She then compares her answer to the correct answer displayed on the back of the card. If she answers the card correctly, it will be offered for review at ever-increasing intervals. If she misses the question, the card is scheduled for the next day to restart the learning cycle. Cards that the user has answered correctly many times are called mature cards and can be scheduled a year or more ahead. Because each day the user reviews just a subset of the information contained on all her flashcards, all scheduled cards can be reviewed daily in 30 minutes or so.

Let’s look at what one card might look like. Assume I want to remember what The Starry Night painting by Vincent van Gogh looks like. Here are screenshots of both sides of the card.

Flashcard front asking the user to visualize the painting.
And the answer provided on the back of the card.

Similar questions can be created to inquire about the painter and the title of the painting. Indeed, questions can be created about anything.

To recap what I covered today: if you want to remember something, you no longer need to rely on your memory alone. You can selectively choose what to remember, and for a modest investment of 30 minutes or so daily, you can remember it for decades.

MBA Programs

A Master of Business Administration (MBA) degree can be a valuable career enhancement for people who are already in, or are thinking about switching to, management positions. MBA courses cover various areas of business such as accounting, finance, marketing, and operations. There are various types of MBA programs, including part-time, Executive MBA (EMBA), and online programs. EMBA programs are abridged versions of MBA programs tailored to people with work experience.

US News & World Report publishes rankings of the best schools for MBAs (led by Wharton) and EMBAs (led by Chicago Booth). The very best MBA degrees cost about $200K, which is steep, but may be a better deal than a $60K degree from an unranked school.

People who have already received MBA degrees are certainly very satisfied with the programs they completed. A great majority of them rate the value of the MBA degree as outstanding or excellent, would pursue the degree again knowing what they know now, and would hire alumni from their alma mater.

How do you stack up against students in top schools? Wharton EMBA students have a median GMAT score of 700, an average age of 35 (with 14% over 40), and 11 years of work experience.

Of course, first you need to be accepted to an MBA program. All business schools use the GMAT to screen applicants. The GMAT tests reasoning and writing skills using an adaptive computer program, and tests are administered in test centers around the world. You can take a mini-quiz online to quickly assess your test-taking chops. I was disappointed to get 2 out of 8 questions wrong, including a math question 🙂

What is a School Worth Anyway?

I read the Los Angeles Times article “What Students Know that Experts Don’t”, which claims that completing a college education is not about gaining knowledge, but about signaling that graduates are hardworking and conform to the rules.

The article has implications for the recruiting and interviewing I do for my company. Due to time constraints, particularly during recruiting trips, I scan applicants’ resumes, and if the GPA is below 3.0, my first question is often why. These may be bright kids, passionate about programming or some other useful skill, and my follow-up questions will attempt to tease out how good and passionate they are. But a low GPA might anchor me against hiring them.

The author of the article is Bryan Caplan, a professor of economics at George Mason University. I plan on reading a book he wrote on this subject titled “The Case Against Education: Why The Education System Is a Waste of Time and Money” and will report on it when I do.


This week, the problem of the value of college degrees was brought into sharp focus. The FBI charged parents, administrators at top colleges, and coaches with a scheme to help wealthy parents get their children into elite universities. It is not clear what will happen to the students who benefited from their parents’ actions. Some students allegedly did not know about the scheme, and if that is true, it is hard to imagine how these kids feel now. Even if they are not expelled from their schools, they are tainted by the actions their parents took.

Lessons From re:Work

Recently I discovered re:Work, a website where Google shares its knowledge about human resources and management. In this post I highlight a few of the articles that I found inspiring.


The thought-provoking article “Changing the Change Rules at Google” covers a new approach to reorgs. As a large, fast-moving company, Google goes through frequent reorgs that are not always handled well; in some instances, Google found that half the affected people did not understand the reorg. The new ChangeRules approach is driven by four broad questions: Why? What? Who? How? These questions aim to establish the goal of the change and how best to implement it; it may even turn out that the reorg is not needed. One aspect I found surprising is getting stakeholders’ input early, including input from the employees affected by the reorg.


The project I am working on at work is a large undertaking that requires a major shift in the way teams work. People need to nurture their innovation abilities, atrophied by many years of working on legacy projects that had to be maintained without significant changes. The article “Hacking Your Innovation Mindset” provides tips from the eponymous Stanford d.school class on improving innovation. Three main abilities are promoted during the class:

  • Learn to navigate ambiguity. I love this quote: “It is not only about problem solving, it is about problem finding”.
  • Practice mindful observation.
  • Experiment with your ideas.

After completing the Stanford class, the students were significantly more confident about showing their work to others before it was finished to their satisfaction. Showing unfinished work to solicit feedback is one of the major signs of innovative work.

For those wanting to learn more about this fascinating subject, read “Entrepreneur Behaviors, Opportunity Recognition, and the Origins of Innovative Ventures”. The study is based on interviews with dozens of innovative entrepreneurs and hundreds of executives, and it posits that innovation differs from management in the patterns by which information is acquired. To my surprise, it is not personal traits, including risk-taking, that explain the difference between being innovative and non-innovative.


Project Chameleon is a project in one of Google’s divisions that provides an internal marketplace where employees are matched to projects and managers based on their skills and preferences. The project has been running for two years; 80% of the people who participate are satisfied, and 90% of employees and managers get one of their top three choices. This project has the potential to provide benefits at other large companies as well.

WebAssembly

Until recently, JavaScript was the only language supported by all web browsers. But JavaScript is difficult to compile efficiently, and JavaScript applications typically run much slower than native applications.

Then programmers from the four main browser vendors designed a new language, a sort of machine code for the web, and called it WebAssembly. The new language was first announced in the 2017 paper “Bringing the Web up to Speed with WebAssembly”, a specification published jointly by Google, Microsoft, Mozilla, and Apple. The specification details the formal syntax and explains the design choices made. A major design goal of WebAssembly has been high performance without sacrificing safety or portability. To provide security, each WebAssembly module has a single block of memory (an array of bytes), disjoint from the code space and other internal memory spaces, so it is not possible for compiled programs to corrupt their execution environment.
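To make that memory model concrete, here is a conceptual Python sketch (my illustration, not the specification or any real runtime) of how a runtime confines a module to one bounds-checked linear memory:

```python
# WebAssembly-style linear memory: one byte array, disjoint from the host;
# every access is bounds-checked, and an out-of-bounds access traps
# instead of corrupting the execution environment.
class LinearMemory:
    PAGE_SIZE = 65536  # WebAssembly memory grows in 64 KiB pages

    def __init__(self, pages):
        self.data = bytearray(pages * self.PAGE_SIZE)

    def _check(self, addr, size):
        if addr < 0 or addr + size > len(self.data):
            raise RuntimeError("trap: out-of-bounds memory access")

    def load_i32(self, addr):
        self._check(addr, 4)
        return int.from_bytes(self.data[addr:addr + 4], "little")

    def store_i32(self, addr, value):
        self._check(addr, 4)
        self.data[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

mem = LinearMemory(pages=1)
mem.store_i32(0, 42)
print(mem.load_i32(0))    # 42
try:
    mem.load_i32(70_000)  # beyond the single 64 KiB page
except RuntimeError as e:
    print(e)              # trap: out-of-bounds memory access
```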

This week, the study “Analyzing the Performance of WebAssembly vs. Native Code” was published, which appears to throw cold water on claims that WebAssembly is only about 10% slower than native code; the study shows numbers closer to 50-90% slower. However, in order to run the large SPEC CPU suite of performance tests, the authors significantly modified how WebAssembly is run in a Unix environment by adding support for a file system and I/O. It is not clear whether their implementation is sound.

While we are discussing WebAssembly, you might be wondering about the level of WebAssembly support across web browsers. For this, head over to CanIUse.com, a website that details web technology support on a per-browser basis. Searching for “WebAssembly” yields the following picture, indicating that all major browsers support WebAssembly.

Screenshot from CanIUse.com searching for WebAssembly support. All major browsers support it, with the exception of the outdated Internet Explorer. Box sizes are relative to browser popularity; numbers in boxes indicate browser versions.

I work in CAD software development, so I read with great interest about Autodesk’s recently completed rewrite of its behemoth AutoCAD software to run in web browsers, using the LLVM framework with Emscripten on the back end to produce WebAssembly. Considering that AutoCAD comprises millions of lines of code, it was no small feat.

Autodesk’s AutoCAD Web browser based user interface.

I suspect that I will come back to this topic in a future blog post here at CodingRestart. Until then, happy coding!

Kate Matsudaira

I read an interesting article, “Design Patterns for Managing Up”, that discusses four challenging situations at work and approaches to resolving them. It turns out that the author is technologist Kate Matsudaira, who previously worked at Microsoft, Amazon, and several startups. Kate is currently a director of engineering at Google.

Kate Matsudaira interviewed at Velocity conference.

Her personal blog is full of great articles and is well worth reading, because Kate has a unique combination of solid technical knowledge and great management skills. I particularly liked her articles on resolving conflicts and managing people and teams. Kate’s personal story is inspiring. She also shares a very comprehensive list of interview questions for CS jobs. From this list I liked: converting a BST to an ordered array (see the sketch below); when is a hash map a poor data choice; and, given the variables time, budget, customer happiness, and best practices, explain which are the most important in a project.
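As an example, here is how I might answer the first of those questions (my own sketch, not Kate’s solution): an in-order traversal of a binary search tree visits the keys in sorted order, which yields the ordered array directly.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    value: int
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def bst_to_array(root: Optional[Node]) -> List[int]:
    """In-order traversal: left subtree, node, right subtree."""
    if root is None:
        return []
    return bst_to_array(root.left) + [root.value] + bst_to_array(root.right)

# The BST with root 4, children 2 and 5, and leaves 1 and 3:
root = Node(4, Node(2, Node(1), Node(3)), Node(5))
assert bst_to_array(root) == [1, 2, 3, 4, 5]
```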

There are several talks by Kate available online, concentrating on personal growth at work and leadership. I suggest you start with the “Leveling Up” talk, which is motivating and fun. Speaking of leadership, Lighthouse has a nice interview with her.

Kate’s Twitter account seems to be updated regularly and you may want to start 2019 with the planning tools she links to.

I love it when the Internet takes me on a tour like this: I start from an article by an unknown person on Hacker News, like the article, find more articles by the author, read her life story on her blog, and see on her LinkedIn profile that she went to Harvey Mudd College. Coming back to a game theme from Kate’s talk, it’s like playing different levels in a game :-)