DALL·E: Creating Images from Text

DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
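For intuition, here's the core recipe the paper describes, in miniature: the caption is encoded as text tokens, the image as a grid of discrete codes from a separately trained dVAE, and a single transformer models the concatenated stream autoregressively. The sketch below is purely illustrative — all names and sizes are made up, not the real model:

```python
# Toy sketch of the DALL·E training objective: predict the next token in a
# stream of [text tokens, image tokens]. Vocab sizes and dimensions are
# invented for illustration.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192   # illustrative vocabulary sizes
SEQ_LEN = 256 + 1024                    # caption tokens + 32x32 image codes

class TinyDalle(nn.Module):
    def __init__(self, d_model=256, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, d_model)
        self.pos = nn.Embedding(SEQ_LEN, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):  # tokens: (batch, seq)
        T = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(T, device=tokens.device))
        # causal mask so each position only attends to earlier tokens
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=tokens.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))  # next-token logits
```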

If you ask DALL·E to generate images for “an armchair in the shape of an avocado”, you get this:

And to think I was questioning the power of computers earlier today 😅

DALL·E: Creating Images from Text →

🤓 Clever name too: Dalí + WALL·E = DALL·E

Remove background noise during a conference call with Krisp

The past few weeks I’ve been using Krisp during Zoom/Meet/Skype/etc. calls. It is a virtual microphone that uses artificial intelligence to filter out background noise. Once installed, you select the physical microphone for it to filter, and then choose the Krisp Virtual Microphone inside Skype/Zoom/Meet.
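Krisp hasn’t published the details of its model, but noise suppressors in this family typically work on a spectrogram: a network predicts a per-frequency “keep” mask, which is applied before transforming back to audio. A toy sketch of that general pipeline, with `predict_mask` standing in for a trained network:

```python
# Generic mask-based noise suppression sketch (not Krisp's actual model).
import numpy as np
from scipy.signal import stft, istft

def denoise(audio, sr, predict_mask):
    # predict_mask stands in for a trained network mapping a magnitude
    # spectrogram to values in [0, 1] (1 = speech, 0 = background noise).
    _, _, spec = stft(audio, fs=sr, nperseg=512)
    mask = predict_mask(np.abs(spec))      # same shape as spec
    _, clean = istft(spec * mask, fs=sr, nperseg=512)
    return clean

# Trivial stand-in "model": keep only bins well above the overall noise floor.
noise_gate = lambda mag: (mag > np.median(mag)).astype(float)
```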

While I was a bit sceptical at first, here are a few scenarios where it has already helped me out:

  • My kids playing with their LEGOs in the same room.
  • Traffic passing by (We live next to a busy road — until we move to our new house that is).
  • A fan whirring in the background.
  • Me typing while we converse.

Krisp →
(referral link which will get you 1 month of free use)

Cleverly Cropping Images on Twitter using AI

To crop uploaded images, Twitter doesn’t simply crop them around the center. After initially relying on face detection, they switched to an AI-based approach back in 2018 to crop uploaded images more cleverly.

Previously, we used face detection to focus the view on the most prominent face we could find. While this is not an unreasonable heuristic, the approach has obvious limitations since not all images contain faces.

A better way to crop is to focus on “salient” image regions. Academics have studied and measured saliency by using eye trackers, which record the pixels people fixated with their eyes. In general, people tend to pay more attention to faces, text, animals, but also other objects and regions of high contrast. This data can be used to train neural networks and other algorithms to predict what people might want to look at. The basic idea is to use these predictions to center a crop around the most interesting region.
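In miniature, that last step looks something like the helper below: take a saliency map (here a plain array standing in for the network’s prediction) and center a fixed-size crop on its peak, clamped to the image bounds. Purely illustrative, and it assumes the crop fits inside the image:

```python
import numpy as np

def salient_crop(image, saliency, crop_h, crop_w):
    # image: (H, W, 3); saliency: (H, W), higher = more interesting
    y, x = np.unravel_index(np.argmax(saliency), saliency.shape)
    top  = int(np.clip(y - crop_h // 2, 0, image.shape[0] - crop_h))
    left = int(np.clip(x - crop_w // 2, 0, image.shape[1] - crop_w))
    return image[top:top + crop_h, left:left + crop_w]
```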

💡 Note that depending on how many images you upload, Twitter will use a different aspect ratio.

What I find weird is that this clever cropping only works on their website, and not in embeds or other clients. Take this tweet for example, embedded below:

When viewed on the Twitter website it does use the clever cropping:

Now, it wouldn’t surprise me if Twitter hides this extra information from third-party clients, given that they basically imposed a no-fly zone on those back in the day.

Speedy Neural Networks for Smart Auto-Cropping of Images →

TossingBot – Teaching Robots to Throw Arbitrary Objects

TossingBot is a robotic arm that picks up items and tosses them into boxes outside its reach. It achieves double the speed and dexterity of other state-of-the-art picking systems, at 500+ mean picks per hour, and is better at throwing than most of the engineers on the team. The key to TossingBot is a self-improving artificial intelligence algorithm that combines physics and deep learning. This enables it to learn quickly — starting with just grasping objects in the morning, and eventually learning to toss them by the evening. The system is also general — capable of handling objects it has never seen before, and adapting to new target landing locations.
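That physics-plus-deep-learning combination is what the paper calls residual physics: a ballistic model supplies a first-guess release velocity, and the network only has to learn a small correction for everything the physics misses (grasp offsets, aerodynamics, odd shapes). A simplified sketch, with `residual_net` standing in for the trained model:

```python
import numpy as np

G = 9.81  # gravity, m/s^2

def ballistic_speed(dx, dz, angle=np.radians(45.0)):
    # Closed-form release speed to hit a target dx meters away and dz meters
    # up, thrown at a fixed release angle (plain projectile motion; assumes
    # the target is reachable at this angle).
    denom = 2 * np.cos(angle) ** 2 * (dx * np.tan(angle) - dz)
    return dx * np.sqrt(G / denom)

def throw_speed(dx, dz, residual_net, visual_features):
    # residual_net stands in for the learned model: it sees the grasped
    # object (visual features) and outputs a correction to the estimate.
    return ballistic_speed(dx, dz) + residual_net(visual_features)
```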

Here’s a more in-depth video:

During its first rodeo, the mechanical arm didn’t know what to do with the pile of objects it was presented with. After 14 hours of trial and error and analyzing them with its overhead cameras, it was finally able to toss the right item into the right container 85 percent of the time.

TossingBot: Learning to Throw Arbitrary Objects with Residual Physics →

(Via Sara, a former student of mine)

AutoDraw: Fast Drawing for Everyone

After QuickDraw a few months ago – in which an AI guesses what you are doodling – Google now brings us AutoDraw.

AutoDraw is a new kind of drawing tool. It pairs machine learning with drawings from talented artists to help everyone create anything visual, fast.

AutoDraw’s suggestion tool uses the same technology as QuickDraw to guess what you’re trying to draw. Right now, it can guess hundreds of drawings and we look forward to adding more over time.
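In miniature, the guessing step can be thought of as a doodle classifier: rasterize the strokes and hand the bitmap to a trained model over a fixed set of categories. The helpers below are toy stand-ins, not Google’s actual pipeline:

```python
import numpy as np

def rasterize(strokes, size=28):
    # strokes: list of (xs, ys) polylines, coordinates normalized to [0, 1]
    canvas = np.zeros((size, size))
    for xs, ys in strokes:
        for x, y in zip(xs, ys):
            canvas[int(y * (size - 1)), int(x * (size - 1))] = 1.0
    return canvas

def guess(strokes, classifier, labels):
    # classifier stands in for a model trained on the doodle categories
    probs = classifier(rasterize(strokes))
    return labels[int(np.argmax(probs))]
```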

AutoDraw →

The Potential Future of Selfie Photography

Adobe Research:

What happens when we tap into the power of artificial intelligence and deep learning to transform bad portrait shots into good ones – all on a smartphone? By combining perspective effect editing, automatic, software-only photo masking, and photo style transfer technology, we’re able to transform a typical selfie into a flattering portrait with a pleasing depth-of-field effect that can also replicate the style of another portrait photo.

Amazing!

Sidenote: Replicating the style of one photo onto another was recently described and demoed in a paper entitled “Deep Photo Style Transfer”. It has some stunning results:

On the left you see the source photo, in the middle the style to apply, and on the right the result.
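For context, the paper builds on neural style transfer, which optimizes the output image so that its deep features match the content photo while their Gram-matrix statistics match the style photo (the paper adds a photorealism regularizer on top). A bare-bones sketch of those two losses:

```python
import torch

def gram(features):  # features: (C, H, W) activations from some network layer
    C, H, W = features.shape
    f = features.reshape(C, H * W)
    return f @ f.T / (C * H * W)   # channel-correlation "style" statistics

def transfer_loss(out_feats, content_feats, style_feats, style_weight=1e3):
    content_loss = torch.mean((out_feats - content_feats) ** 2)
    style_loss = torch.mean((gram(out_feats) - gram(style_feats)) ** 2)
    return content_loss + style_weight * style_loss
```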

Deep Photo Style Transfer Paper →
Deep Photo Style Transfer Code and Data →

Using Machine Learning to Predict Parking Difficulty

There’s no monitoring of parking meters, video feeds, etc. Looking at users’ behavior is the way to do it:

Google determined that if users circled around a location like in the picture above, it usually suggested that parking might be difficult. To recognize this behavior, they took the difference between when users should have arrived at the location and when they actually arrived, with a large difference indicating more difficulty in finding a parking spot.
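The signal itself is simple enough to express in a few lines. A sketch in pandas, with hypothetical column names and an illustrative threshold:

```python
import pandas as pd

trips = pd.DataFrame({
    "eta_minutes":     [12.0, 8.0, 20.0],   # when routing says you'd arrive
    "arrival_minutes": [13.0, 16.5, 21.0],  # when you actually arrived
})
trips["circling_minutes"] = trips["arrival_minutes"] - trips["eta_minutes"]
trips["parking_hard"] = trips["circling_minutes"] > 5   # illustrative cutoff
```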

Using Machine Learning to Predict Parking Difficulty →

Saving bandwidth through machine learning

When requesting a high-resolution image in the Android Google+ app, you no longer get the full version sent over. Instead, you get a low-resolution version, which is then upscaled by RAISR:

RAISR, which was introduced in November, uses machine learning to produce great quality versions of low-resolution images, allowing you to see beautiful photos as the photographers intended them to be seen. By using RAISR to display some of the large images on Google+, we’ve been able to use up to 75 percent less bandwidth per image we’ve applied it to.

Wow!
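Roughly, the RAISR recipe is: upscale cheaply first, then sharpen each pixel with a small filter picked from a lookup table learned offline from low-res/high-res pairs, keyed by local gradient statistics. The sketch below simplifies the hashing to edge orientation only, and assumes `filters` is that pre-learned table:

```python
import numpy as np
from scipy import ndimage

def raisr_like_upscale(img, filters, scale=2, n_buckets=24):
    cheap = ndimage.zoom(img, scale, order=1)   # cheap bilinear upsample
    gy, gx = np.gradient(cheap)
    angle = np.arctan2(gy, gx)                  # local edge orientation
    bucket = ((angle + np.pi) / (2 * np.pi) * n_buckets).astype(int) % n_buckets
    out = np.empty_like(cheap)
    for b in range(n_buckets):                  # one learned filter per bucket
        out[bucket == b] = ndimage.convolve(cheap, filters[b])[bucket == b]
    return out
```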

Google Blog: Saving you bandwidth through machine learning →
Enhance! RAISR Sharp Images with Machine Learning →

TensorKart: self-driving MarioKart with TensorFlow

During the winter break Kevin Hughes decided to try and train an artificial neural network to play MarioKart 64:

After playing way too much MarioKart and writing an emulator plugin in C, I managed to get some decent results. Getting to this point wasn’t easy and I’d like to share my process and what I learned along the way.
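The core of a project like this is behavioral cloning: record (screenshot, controller state) pairs while a human plays, then train a network to regress the joystick values from the frames. A minimal stand-in sketch (his actual code, linked below, uses TensorFlow):

```python
import torch
import torch.nn as nn

# Tiny convnet mapping a game frame to analog stick (x, y); layer sizes
# are made up for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 2),
)

def train_step(frames, sticks, optimizer):
    # frames: (B, 3, H, W) screenshots; sticks: (B, 2) recorded inputs
    loss = nn.functional.mse_loss(model(frames), sticks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```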

TensorKart: self-driving MarioKart with TensorFlow →
TensorKart Source (GitHub) →