Realtime Face and Hand Tracking in the browser with TensorFlow

The MediaPipe and Tensorflow.js teams have released facemesh and handpose:

The facemesh package infers approximate 3D facial surface geometry from an image or video stream, requiring only a single camera input without the need for a depth sensor. This geometry locates features such as the eyes, nose, and lips within the face, including details such as lip contours and the facial silhouette.

The handpose package detects hands in an input image or video stream, and returns twenty-one 3-dimensional landmarks locating features within each hand. Such landmarks include the locations of each finger joint and the palm.

Once you have one of the packages installed, it’s really easy to use. Here’s an example using facemesh:

import * as facemesh from '@tensorflow-models/facemesh';

// Load the MediaPipe facemesh model assets.
const model = await facemesh.load();
 
// Pass in a video stream to the model to obtain 
// an array of detected faces from the MediaPipe graph.
const video = document.querySelector("video");
const faces = await model.estimateFaces(video);
 
// Each face object contains a `scaledMesh` property,
// which is an array of 468 landmarks.
faces.forEach(face => console.log(face.scaledMesh));

The output will be a prediction object:

{
    faceInViewConfidence: 1,
    boundingBox: {
        topLeft: [232.28, 145.26], // [x, y]
        bottomRight: [449.75, 308.36],
    },
    mesh: [
        [92.07, 119.49, -17.54], // [x, y, z]
        [91.97, 102.52, -30.54],
        ...
    ],
    scaledMesh: [
        [322.32, 297.58, -17.54],
        [322.18, 263.95, -30.54]
    ],
    annotations: {
        silhouette: [
            [326.19, 124.72, -3.82],
            [351.06, 126.30, -3.00],
            ...
        ],
        ...
    }
}

Both packages run entirely within the browser so data never leaves the user’s device.

Be sure to check the demos as they’re quite nice. I did notice that the handpose demo only shows one hand, even though the library can detect more than one.

Face and hand tracking in the browser with MediaPipe and TensorFlow.js →
facemesh Demo →
handpose Demo →

Published by Bramus!

Bramus is a frontend web developer from Belgium, working as a Chrome Developer Relations Engineer at Google. From the moment he discovered view-source at the age of 14 (way back in 1997), he fell in love with the web and has been tinkering with it ever since (more …)

Join the Conversation

1 Comment

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.