Hand Gesture Recognition (MediaPipe)
The Hand Gesture Recognition model is a two-stage model consisting of the MediaPipe HandLandmarker and a Hand Classification model.
MediaPipe HandLandmarker uses the Hands Tracking model under the hood. The Hands Tracking model is a convolutional neural network that outputs 21 three-dimensional hand landmarks when it detects a hand in the input image. This model is pre-trained and is used without any fine-tuning in the Hand Gesture Recognition pipeline.
The Hand Classification model is a fully connected neural network that takes the 21 three-dimensional hand landmarks as input and outputs a classification score for each class. The model itself consists of a pre-trained hand embedding model and a fine-tunable classification head. The number of layers in the classification head is configurable.
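To make the shape of the classification head concrete, the following is a minimal pure-Python sketch of a fully connected head with a configurable number of layers. It is not the model's actual implementation: the function names, the ReLU hidden activation, the softmax output, the random initialization, and the example layer sizes are all assumptions chosen for illustration. The 128-dimensional input matches the embedding size given in the preprocessing description.

```python
import math
import random

EMBEDDING_DIM = 128  # embedding size stated for the pre-trained embedding model


def dense(x, weights, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(x):
    # Assumed hidden activation; the source does not name one.
    return [max(0.0, v) for v in x]


def softmax(logits):
    # Numerically stable softmax that turns logits into scores summing to 1.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_head(layer_sizes, seed=0):
    """Build one (weights, bias) pair per consecutive pair of layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
        layers.append((w, [0.0] * n_out))
    return layers


def classify(embedding, layers):
    """Run the embedding through the head and return one score per class."""
    x = embedding
    for w, b in layers[:-1]:
        x = relu(dense(x, w, b))
    w, b = layers[-1]
    return softmax(dense(x, w, b))


# Hypothetical head: 128-d embedding -> 32 hidden units -> 4 gesture classes.
head = init_head([EMBEDDING_DIM, 32, 4])
scores = classify([0.0] * EMBEDDING_DIM, head)
```

Changing the list passed to `init_head` (for example `[128, 4]` or `[128, 64, 32, 4]`) changes the depth of the head, which mirrors the configurable layer count described above.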
This model can be used in a notebook. Click Open notebook to use the model in Colab.
Training images run through two preprocessing steps. First, the images are run through MediaPipe HandLandmarker in order to obtain the hand landmarks for each image. Images without detected hands are dropped. Next, the hand landmarks go through a pre-trained embedding model which outputs a 128-dimensional embedding representation used as input to fine-tune the classification head.
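The first preprocessing step can be sketched as follows. This is an illustrative outline only, assuming that landmark detection has already been run and that each sample is represented as a pair of landmarks (or `None` when no hand was detected) and a label. The names `flatten_landmarks` and `build_training_set` are hypothetical, and the pre-trained embedding model is not reproduced here.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

NUM_LANDMARKS = 21  # landmarks per detected hand, as stated in the text
Landmark = Tuple[float, float, float]  # (x, y, z)


def flatten_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten 21 (x, y, z) landmarks into a 63-value feature vector."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    return [coord for point in landmarks for coord in point]


def build_training_set(
    samples: Iterable[Tuple[Optional[Sequence[Landmark]], str]],
) -> Tuple[List[List[float]], List[str]]:
    """Keep only samples with a detected hand and flatten their landmarks.

    Each sample is (landmarks, label); landmarks is None when the landmark
    detector found no hand in the image, and such samples are dropped.
    """
    features: List[List[float]] = []
    labels: List[str] = []
    for landmarks, label in samples:
        if landmarks is None:
            continue  # image without a detected hand
        features.append(flatten_landmarks(landmarks))
        labels.append(label)
    return features, labels
```

In the real pipeline, the flattened landmarks would then be passed through the pre-trained embedding model to produce the 128-dimensional input for the classification head.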
The model outputs confidence scores for each gesture class.
For best results, use MediaPipe Tasks GestureRecognizer to deploy the output TFLite model on-device as it ensures the same preprocessing logic between training and inference. Use MediaPipe Studio to evaluate the model through a live demo.
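As a rough sketch of inference with MediaPipe Tasks GestureRecognizer in Python, the example below loads an exported model and returns the highest-scoring gesture per detected hand. The helper names `top_gesture` and `recognize_image`, the `min_score` threshold, and the file paths passed by the caller are assumptions for illustration. The `Category` namedtuple is only a stand-in that mimics the `category_name` and `score` fields of MediaPipe's result categories so that the helper can be tried without MediaPipe installed.

```python
from collections import namedtuple

# Stand-in with the same fields the helper reads from MediaPipe categories.
Category = namedtuple("Category", ["category_name", "score"])


def top_gesture(gestures, min_score=0.5):
    """Return (name, score) of the best gesture per hand, or None.

    `gestures` is a list with one entry per detected hand; each entry is a
    list of objects exposing `category_name` and `score`. `min_score` is an
    assumed confidence cutoff, not a value taken from the model card.
    """
    results = []
    for hand in gestures:
        if not hand:
            results.append(None)
            continue
        best = max(hand, key=lambda c: c.score)
        if best.score >= min_score:
            results.append((best.category_name, best.score))
        else:
            results.append(None)
    return results


def recognize_image(model_path, image_path, min_score=0.5):
    """Run GestureRecognizer on one image file and summarize the result."""
    # Imported lazily so top_gesture stays usable without MediaPipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.GestureRecognizerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path)
    )
    with vision.GestureRecognizer.create_from_options(options) as recognizer:
        image = mp.Image.create_from_file(image_path)
        result = recognizer.recognize(image)
    return top_gesture(result.gestures, min_score=min_score)
```

A call might look like `recognize_image("gesture_recognizer.task", "hand.jpg")`, where both paths are placeholders for the exported model file and a test image.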
Resource ID | Release date | Release stage | Description |
---|---|---|---|
mediapipe/gesture-recognizer-001 | 2024-04-01 | General Availability | Fine-tuning and on-device serving |