DALL·E is a 12-billion parameter version of GPT-3 trained to generate images from text descriptions, using a dataset of text–image pairs. We’ve found that it has a diverse set of capabilities, including creating anthropomorphized versions of animals and objects, combining unrelated concepts in plausible ways, rendering text, and applying transformations to existing images.
If you ask DALL·E to generate images for “an armchair in the shape of an avocado”, you get this:
For the past few weeks I’ve been using Krisp during Zoom/Meet/Skype/etc. calls. It is a virtual microphone that uses artificial intelligence to filter out background noise. Once installed, you select the physical microphone it should filter, and then inside Skype/Zoom/Meet you choose the Krisp Virtual Microphone as your input.
While I was a bit sceptical at first, here are a few scenarios where it’s already helped me out:
My kids playing with their LEGOs in the same room.
Traffic passing by (we live next to a busy road, until we move to our new house, that is).
A fan whirring in the background.
Me, typing, while we converse.
Krisp → (referral link which will get you 1 month of free use)
To crop uploaded images, Twitter doesn’t simply cut them off from the center. Having first relied on face detection, they switched, back in 2018 already, to an AI-based approach that crops uploaded images more cleverly.
Previously, we used face detection to focus the view on the most prominent face we could find. While this is not an unreasonable heuristic, the approach has obvious limitations since not all images contain faces.
A better way to crop is to focus on “salient” image regions. Academics have studied and measured saliency by using eye trackers, which record the pixels people fixated with their eyes. In general, people tend to pay more attention to faces, text, animals, but also other objects and regions of high contrast. This data can be used to train neural networks and other algorithms to predict what people might want to look at. The basic idea is to use these predictions to center a crop around the most interesting region.
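In code, the final step of that pipeline is refreshingly simple: once a trained model has produced a saliency map, the crop is centered on the region it scores highest. A rough sketch of that idea in Python (the saliency model itself is the hard, learned part and is only assumed here; Twitter’s real system likely scores whole candidate crops rather than a single pixel):

```python
import numpy as np

def crop_around_saliency(image, saliency_map, crop_w, crop_h):
    """Center a fixed-size crop on the most salient point of an image.

    `image` is an (H, W, 3) array, `saliency_map` an (H, W) array of
    per-pixel scores from some trained saliency model (assumed to exist).
    """
    h, w = saliency_map.shape
    # The pixel the model predicts people are most likely to look at.
    y, x = np.unravel_index(np.argmax(saliency_map), saliency_map.shape)
    # Clamp the crop window so it stays inside the image bounds.
    left = min(max(x - crop_w // 2, 0), w - crop_w)
    top = min(max(y - crop_h // 2, 0), h - crop_h)
    return image[top:top + crop_h, left:left + crop_w]
```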
TossingBot is a robotic arm that picks up items and tosses them into boxes outside its reach range. It has double the speed and dexterity of other state-of-the-art picking systems, achieving 500+ mean picks per hour, and is better at throwing than most of the engineers on the team. The key to TossingBot is a self-improving artificial intelligence algorithm that uses a combination of physics and deep learning. This enables it to learn quickly: starting with just grasping objects in the morning, it eventually learns to toss them by the evening. The system is also general, capable of handling objects it has never seen before and adapting to new target landing locations.
Here’s a more in-depth video:
During its first rodeo, the mechanical arm didn’t know what to do with the pile of objects it was presented with. After 14 hours of trial and error and analyzing them with its overhead cameras, it was finally able to toss the right item into the right container 85 percent of the time.
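One way to read “a combination of both physics and deep learning” is that a simple ballistics formula gives a baseline throwing velocity, and a network learned from those hours of trial and error corrects it for the quirks of each object and grasp. A toy sketch of that idea (the residual model and the observation it takes are placeholders, not Google’s actual code):

```python
import math

GRAVITY = 9.81  # m/s^2

def ballistic_release_velocity(distance, release_angle_rad):
    """Physics-only estimate: the release speed needed to land a projectile
    `distance` metres away at the given release angle, ignoring drag, spin,
    and how the object actually leaves the gripper."""
    return math.sqrt(distance * GRAVITY / math.sin(2 * release_angle_rad))

def throwing_velocity(distance, release_angle_rad, residual_model, observation):
    """Combine the analytic estimate with a learned correction.

    `residual_model` stands in for a network trained from trial-and-error
    throws; it predicts how far off the naive physics estimate is for this
    particular object and grasp."""
    v_physics = ballistic_release_velocity(distance, release_angle_rad)
    return v_physics + residual_model(observation)  # learned residual (assumed)
```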
What happens when we tap into the power of artificial intelligence and deep learning to transform bad portrait shots into good ones – all on a smartphone? By combining perspective effect editing, automatic, software-only photo masking, and photo style transfer technology, we’re able to transform a typical selfie into a flattering portrait with a pleasing depth-of-field effect that can also replicate the style of another portrait photo.
Sidenote: Transferring the style of one photo onto another was recently described and demoed in a paper entitled “Deep Photo Style Transfer”. It has some stunning results:
On the left you see the source photo, in the middle the style to apply, and on the right the result.
No monitoring of parking meters, video feeds, etc. Looking at users’ behavior is the way to do it:
Google determined that if users circled around a location like in the picture above, it usually suggested that parking might be difficult. To recognize this behavior, they took the difference between when they should have arrived at the location versus when they actually arrived there, with a large difference indicating more difficulty in finding a parking spot.
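Boiled down to code, the per-trip signal they describe is just the gap between the predicted and the actual arrival time; a large gap, aggregated over many drivers for the same place and time, hints that parking there is hard. A minimal sketch (names are made up for illustration):

```python
from datetime import datetime

def circling_time_seconds(expected_arrival: datetime, actual_arrival: datetime) -> float:
    """Seconds spent circling: how much later the user actually stopped at
    the destination than the route prediction said they would arrive."""
    return max((actual_arrival - expected_arrival).total_seconds(), 0.0)

# Aggregating this value over many trips to the same area at the same time
# of day is what would feed the actual parking-difficulty model.
```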
When requesting a high-resolution image with the Android Google+ app, one no longer gets the full version sent over. Instead, one gets a low-resolution version which is then processed by RAISR:
RAISR, which was introduced in November, uses machine learning to produce great quality versions of low-resolution images, allowing you to see beautiful photos as the photographers intended them to be seen. By using RAISR to display some of the large images on Google+, we’ve been able to use up to 75 percent less bandwidth per image we’ve applied it to.
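Conceptually RAISR does a cheap upscale first and then applies small learned filters to restore detail, choosing a filter per patch based on the local image structure. A heavily simplified sketch of that idea (the learned filter bank and the patch-hashing function are the interesting parts and are only stubbed here):

```python
import numpy as np
from scipy import ndimage

def raisr_like_upscale(low_res, scale, filter_bank, hash_patch, k=5):
    """Toy RAISR-style upscaler for a 2D (grayscale) image.

    `filter_bank` maps hash buckets to small (k x k) kernels learned offline
    from low-res/high-res image pairs; `hash_patch` buckets a patch by its
    local structure (edge direction/strength). Both are assumed to exist."""
    # Step 1: cheap upscale (bilinear here).
    upscaled = ndimage.zoom(low_res, scale, order=1)
    out = upscaled.copy()
    r = k // 2
    for y in range(r, upscaled.shape[0] - r):
        for x in range(r, upscaled.shape[1] - r):
            patch = upscaled[y - r:y + r + 1, x - r:x + r + 1]
            # Step 2: pick the learned kernel for this kind of patch and apply it.
            kernel = filter_bank[hash_patch(patch)]
            out[y, x] = float(np.sum(patch * kernel))
    return out
```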
During the winter break Kevin Hughes decided to try to train an artificial neural network to play MarioKart 64:
After playing way too much MarioKart and writing an emulator plugin in C, I managed to get some decent results. Getting to this point wasn’t easy and I’d like to share my process and what I learned along the way.
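The usual recipe for a project like this is behaviour cloning: record screenshots and controller inputs while a human plays, then train a convolutional network to predict the steering input from the frame. A minimal Keras-style sketch of that recipe (not Kevin’s actual code; the frame size and layer choices are illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_driving_model(frame_shape=(66, 200, 3)):
    """Map a game frame to a steering value in [-1, 1]."""
    model = keras.Sequential([
        layers.Input(shape=frame_shape),
        layers.Conv2D(24, 5, strides=2, activation="relu"),
        layers.Conv2D(36, 5, strides=2, activation="relu"),
        layers.Conv2D(48, 5, strides=2, activation="relu"),
        layers.Conv2D(64, 3, activation="relu"),
        layers.Flatten(),
        layers.Dense(100, activation="relu"),
        layers.Dense(50, activation="relu"),
        layers.Dense(1, activation="tanh"),  # steering output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# frames: (N, 66, 200, 3) screenshots, steering: (N,) recorded joystick values
# model = build_driving_model()
# model.fit(frames, steering, epochs=10, batch_size=32)
```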