The MediaPipe and TensorFlow.js teams have released facemesh and handpose:
The facemesh package infers approximate 3D facial surface geometry from an image or video stream, requiring only a single camera input without the need for a depth sensor. This geometry locates features such as the eyes, nose, and lips within the face, including details such as lip contours and the facial silhouette.
The handpose package detects hands in an input image or video stream, and returns twenty-one 3-dimensional landmarks locating features within each hand. Such landmarks include the locations of each finger joint and the palm.
Once you have one of the packages installed, it’s really easy to use. Here’s an example using facemesh:
import * as facemesh from '@tensorflow-models/facemesh';
// Load the MediaPipe facemesh model assets.
const model = await facemesh.load();
// Pass in a video stream to the model to obtain
// an array of detected faces from the MediaPipe graph.
const video = document.querySelector("video");
const faces = await model.estimateFaces(video);
// Each face object contains a `scaledMesh` property,
// which is an array of 468 landmarks.
faces.forEach(face => console.log(face.scaledMesh));
estimateFaces returns an array of prediction objects; each one looks like this:
{
  faceInViewConfidence: 1,
  boundingBox: {
    topLeft: [232.28, 145.26], // [x, y]
    bottomRight: [449.75, 308.36],
  },
  mesh: [
    [92.07, 119.49, -17.54], // [x, y, z]
    [91.97, 102.52, -30.54],
    ...
  ],
  scaledMesh: [
    [322.32, 297.58, -17.54],
    [322.18, 263.95, -30.54],
    ...
  ],
  annotations: {
    silhouette: [
      [326.19, 124.72, -3.82],
      [351.06, 126.30, -3.00],
      ...
    ],
    ...
  }
}
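
To get a feel for the coordinates, you can plot the scaledMesh points straight onto a canvas overlaid on the video. This is just a minimal sketch of my own — the canvas element and its sizing aren't part of the package:

// Assumes a <canvas> overlay sized to match the video element.
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

ctx.clearRect(0, 0, canvas.width, canvas.height);
ctx.fillStyle = "red";

faces.forEach(face => {
  // scaledMesh coordinates are in the pixel space of the input video;
  // z is depth, which a flat overlay can ignore.
  face.scaledMesh.forEach(([x, y, z]) => {
    ctx.fillRect(x - 1, y - 1, 2, 2);
  });
});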
Both packages run entirely within the browser, so data never leaves the user’s device.
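
The handpose package follows the same pattern. Here's a minimal sketch along the lines of the facemesh example above — load the model, then call estimateHands on a video element (see the package README for the exact options):

import * as handpose from '@tensorflow-models/handpose';

// Load the MediaPipe handpose model assets.
const model = await handpose.load();

// Pass in a video stream to obtain an array of detected hands.
const video = document.querySelector("video");
const hands = await model.estimateHands(video);

// Each hand object contains a `landmarks` property:
// an array of 21 [x, y, z] keypoints (finger joints and the palm base).
hands.forEach(hand => console.log(hand.landmarks));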
Be sure to check the demos as they’re quite nice. I did notice that the handpose demo only shows one hand, even though the library can detect more than one.
Face and hand tracking in the browser with MediaPipe and TensorFlow.js →
facemesh Demo →
handpose Demo →