Human pose estimation is one of the key problems in computer vision and has been studied for well over 15 years. Its importance stems from the abundance of applications that can benefit from such a technology. For example, human pose estimation allows for higher-level reasoning in the context of human-computer interaction and activity recognition. It has some compelling applications and is heavily used in action recognition, animation, gaming, etc. It is also one of the basic building blocks of markerless motion capture (MoCap) technology, which is useful for applications ranging from character animation to the clinical analysis of gait pathologies.
Computer vision:
Computer vision is the field of computer science that focuses on replicating parts of the complexity of the human visual system, enabling computers to identify and process objects in images and videos much as humans do.
Pose estimation:
Human pose estimation is an important problem in the field of computer vision. Imagine being able to track a person’s every small movement and perform a biomechanical analysis in real time. Such a technology would have far-reaching implications, with applications including video surveillance, assisted living, advanced driver assistance systems and sports analysis. Formally speaking, pose estimation is the task of predicting the positions of a person’s body parts or joints from an image or a video.
Human pose estimation refers to the task of recognizing postures by localizing body keypoints in images. It has been studied for decades and is a vital prerequisite step for many computer vision tasks, such as human action recognition, tracking, human-computer interaction and video surveillance.

Types of pose estimation:
Several approaches to human pose estimation have been introduced over the years. The earliest (and slowest) methods typically estimated the pose of a single person in an image that contained only one person to begin with. These methods often identify the individual parts first and then form connections between them to create the pose.
Naturally, these methods are not particularly useful in many real-life scenarios where images contain multiple people. Multi-Person pose estimation is more difficult than the single person case as the location and the number of people in an image are unknown. Typically, we can tackle the above issue using one of two approaches:
The simpler approach is to run a person detector first, then estimate the parts and compute the pose for each detected person. This is known as the top-down approach. The other approach is to detect all parts in the image (i.e., the parts of every person) and then associate, or group, the parts belonging to distinct people. This is known as the bottom-up approach.
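The grouping step of the bottom-up approach can be viewed as a matching problem between detections of connected part types. The minimal sketch below assumes part detections are already available and links neck detections to right-shoulder detections greedily. As a hypothetical stand-in for a learned association score (OpenPose, for example, learns part affinity fields for this), it uses the distance between a shoulder and the position expected from a fixed neck-to-shoulder offset; the offset, the `max_dist` cutoff and all coordinates are illustrative assumptions.

```python
import math


def greedy_pair(necks, shoulders, expected_offset, max_dist):
    """Greedily pair neck detections with right-shoulder detections.

    The association score is the distance between a shoulder candidate and the
    position expected from the neck plus a fixed offset: a hypothetical
    stand-in for a learned score. Returns sorted (neck_index, shoulder_index) pairs.
    """
    candidates = []
    for i, (nx, ny) in enumerate(necks):
        ex, ey = nx + expected_offset[0], ny + expected_offset[1]
        for j, (sx, sy) in enumerate(shoulders):
            d = math.hypot(sx - ex, sy - ey)
            if d <= max_dist:  # discard implausible connections outright
                candidates.append((d, i, j))
    candidates.sort()  # most plausible (smallest distance) first
    used_necks, used_shoulders, pairs = set(), set(), []
    for _, i, j in candidates:
        # each detection may belong to at most one person
        if i not in used_necks and j not in used_shoulders:
            pairs.append((i, j))
            used_necks.add(i)
            used_shoulders.add(j)
    return sorted(pairs)


# Two people in the image; the shoulders are listed in a different order than the necks.
necks = [(100, 50), (300, 60)]
shoulders = [(280, 80), (80, 70), (500, 500)]  # the last one is a false detection
print(greedy_pair(necks, shoulders, expected_offset=(-20, 20), max_dist=15))  # [(0, 1), (1, 0)]
```

Repeating this for every valid pair of part types and chaining the results yields one skeleton per person. Greedy matching is a simplification; an optimal bipartite assignment could be used instead at higher cost.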
Over the past few decades, much effort has been devoted to building robust human pose estimation models in both controlled and uncontrolled settings. Typical methods include pictorial structures models, hierarchical models and non-tree models.
Pictorial structures model:
The pictorial structures model constructs a classical tree-structured graphical model by exploiting spatial correlations between body parts and kinematic priors that couple connected limbs.
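The tree structure is what keeps this model tractable: the best pose minimizes a sum of per-part appearance costs plus per-edge deformation costs, and on a tree that minimum can be found exactly by dynamic programming from the leaves up to the root. The sketch below assumes each part has a small, discrete set of candidate locations with precomputed appearance costs, and uses a quadratic "spring" around an ideal parent-to-child offset as the deformation cost. The part names, coordinates, costs and `weight` are made up for illustration.

```python
def best_pose(parts, parent, candidates, unary, offset, weight):
    """Exact minimum-energy pose for a tree-structured pictorial structures model.

    Energy (lower is better), summed over parts i and tree edges (child c, parent p):
        E = sum_i unary[i][l_i] + sum_(c,p) weight * ||(l_c - l_p) - offset[c]||^2
    parts: names ordered so each parent comes before its children (parts[0] is the root).
    """
    children = {p: [] for p in parts}
    for c, p in parent.items():
        children[p].append(c)

    def spring(c, b, a):
        # quadratic deformation cost: child c at candidate b, its parent at candidate a
        (cx, cy), (px, py) = candidates[c][b], candidates[parent[c]][a]
        dx, dy = cx - px - offset[c][0], cy - py - offset[c][1]
        return weight * (dx * dx + dy * dy)

    subtree = {}  # subtree[p][a]: min energy of p's subtree with p at candidate a
    choice = {}   # choice[c][a]: best candidate for child c given its parent at a
    for p in reversed(parts):  # leaves first, root last
        costs = list(unary[p])
        for c in children[p]:
            choice[c] = []
            for a in range(len(costs)):
                options = [subtree[c][b] + spring(c, b, a) for b in range(len(candidates[c]))]
                b_best = min(range(len(options)), key=options.__getitem__)
                choice[c].append(b_best)
                costs[a] += options[b_best]
        subtree[p] = costs

    root = parts[0]
    a_root = min(range(len(subtree[root])), key=subtree[root].__getitem__)
    pose = {root: a_root}
    for p in parts:  # backtrack from the root down to the leaves
        for c in children[p]:
            pose[c] = choice[c][pose[p]]
    return subtree[root][a_root], {k: candidates[k][v] for k, v in pose.items()}


parts = ["torso", "head", "arm"]
parent = {"head": "torso", "arm": "torso"}
candidates = {
    "torso": [(50, 50), (60, 50)],
    "head": [(50, 30), (60, 30), (90, 10)],
    "arm": [(30, 50), (40, 50)],
}
unary = {  # appearance costs; the head at (90, 10) looks best in isolation
    "torso": [1.0, 1.5],
    "head": [2.0, 2.0, 0.5],
    "arm": [1.0, 1.0],
}
offset = {"head": (0, -20), "arm": (-20, 0)}  # ideal displacement from the parent
energy, pose = best_pose(parts, parent, candidates, unary, offset, weight=0.01)
print(energy, pose)  # 4.0 {'torso': (50, 50), 'head': (50, 30), 'arm': (30, 50)}
```

The spurious head candidate has the best appearance cost but sits far from where the torso predicts a head, so the kinematic prior rejects it. Setting `weight=0.0` removes that prior and the spurious candidate wins.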
Hierarchical models:
The hierarchical models represent the relationships between parts at different scales in a hierarchical tree structure, which lets them capture high-order relationships among parts and characterize an exponential number of plausible poses.
Non-tree models:
Non-tree models augment the tree structure with additional edges that form loops, which lets them better capture symmetry, occlusion and long-range relationships.
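The difference between tree and non-tree models is structural: a tree over n parts has exactly n - 1 edges and no cycles, so adding even one extra edge, such as a hypothetical edge coupling the left and right forearms to encode symmetry, closes a loop. The sketch below uses a union-find check to report which edges close loops; the part names and edges are illustrative. Once loops are present, the exact dynamic-programming inference that works on trees no longer applies directly, which is why loopy models generally need approximate or more expensive inference.

```python
def loop_closing_edges(parts, edges):
    """Return the edges that close a loop (an empty list for a tree or forest).

    Uses union-find: an edge whose endpoints are already connected creates a cycle.
    """
    root = {p: p for p in parts}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving keeps lookups cheap
            x = root[x]
        return x

    closing = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            closing.append((u, v))
        else:
            root[ru] = rv
    return closing


parts = ["torso", "l_upper_arm", "l_forearm", "r_upper_arm", "r_forearm"]
tree_edges = [("torso", "l_upper_arm"), ("l_upper_arm", "l_forearm"),
              ("torso", "r_upper_arm"), ("r_upper_arm", "r_forearm")]
symmetry_edge = ("l_forearm", "r_forearm")  # hypothetical extra edge coupling both forearms

print(loop_closing_edges(parts, tree_edges))                    # []
print(loop_closing_edges(parts, tree_edges + [symmetry_edge]))  # [('l_forearm', 'r_forearm')]
```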
A human pose skeleton represents the orientation of a person in a graphical format. Essentially, it is a set of coordinates that can be connected to describe the pose of the person. Each coordinate in the skeleton is known as a part (or a joint, or a keypoint). A valid connection between two parts is known as a pair. Note that not all part combinations give rise to valid pairs.
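In code, a skeleton is typically just two lists: the keypoints and the pairs that connect them. The sketch below uses a small, hypothetical upper-body skeleton (the names and pairs are illustrative and not taken from any particular dataset) to show how a pair list encodes which part combinations are valid, and how limb lengths can be computed while tolerating keypoints that were not detected. It assumes Python 3.8+ for `math.dist`.

```python
import math

# A hypothetical upper-body skeleton; real datasets define their own keypoints and pairs.
KEYPOINTS = ["head", "neck",
             "right_shoulder", "right_elbow", "right_wrist",
             "left_shoulder", "left_elbow", "left_wrist"]
PAIRS = [("head", "neck"),
         ("neck", "right_shoulder"), ("right_shoulder", "right_elbow"), ("right_elbow", "right_wrist"),
         ("neck", "left_shoulder"), ("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]
assert all(a in KEYPOINTS and b in KEYPOINTS for a, b in PAIRS)  # pairs only use known parts
VALID = {frozenset(p) for p in PAIRS}  # order-independent lookup


def is_valid_pair(a, b):
    """True if parts a and b are directly connected in the skeleton."""
    return frozenset((a, b)) in VALID


def limb_lengths(pose):
    """pose maps keypoint name -> (x, y); missing (e.g. occluded) keypoints are simply absent.

    Returns {pair: length} for every pair whose two parts were both detected.
    """
    return {(a, b): math.dist(pose[a], pose[b])
            for a, b in PAIRS if a in pose and b in pose}


pose = {"head": (0, 0), "neck": (0, 3), "right_shoulder": (4, 3), "right_elbow": (4, 8)}
print(is_valid_pair("neck", "head"))         # True
print(is_valid_pair("head", "right_wrist"))  # False
print(limb_lengths(pose))
```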
Applications:
Pose estimation has applications in myriad fields, some of which are listed below.
Activity Recognition:
Tracking variations in a person’s pose over a period of time can also be used for activity, gesture and gait recognition. Use cases include:
- Applications that detect whether a person has fallen down or is sick.
- Applications that can autonomously teach proper workout regimes and sports techniques.
- Applications that can understand full-body sign language (e.g., airport runway signals and traffic police signals).
- Applications that can enhance security and surveillance.
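As a concrete example of the workout use case, the angle at a joint can be computed from three keypoints and tracked over frames. The sketch below assumes 2D (x, y) shoulder, elbow and wrist keypoints are already available for each frame, and counts arm-curl repetitions using two hypothetical thresholds (70° and 150°); real thresholds would need tuning for the exercise and camera setup.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos))


def count_reps(angles, bent=70.0, straight=150.0):
    """Count curl repetitions from a per-frame sequence of elbow angles.

    One rep = the angle drops below `bent`, then rises back above `straight`.
    Using two thresholds (hysteresis) keeps jitter around a single threshold
    from being counted as extra reps. Both thresholds are hypothetical.
    """
    reps, is_bent = 0, False
    for angle in angles:
        if not is_bent and angle < bent:
            is_bent = True
        elif is_bent and angle > straight:
            is_bent = False
            reps += 1
    return reps


# Elbow angle from (shoulder, elbow, wrist) keypoints in one frame:
print(joint_angle((0, 0), (0, 10), (10, 10)))  # 90.0
print(count_reps([170, 120, 60, 65, 160, 170, 50, 155]))  # 2
```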
Motion Capture and Augmented Reality:
An interesting application of human pose estimation is in computer-generated imagery (CGI). Graphics, styles, fancy enhancements, equipment and artwork can be superimposed on a person once their pose has been estimated. By tracking the variations in this pose, the rendered graphics can “naturally fit” the person as they move.
A good example of what is possible is animated emoji that follow a user’s facial expressions. Although such systems track only the structure of a face, the idea can be extended to the keypoints of a person’s whole body. The same concepts can be leveraged to render Augmented Reality (AR) elements that mimic the movements of a person.
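Anchoring an overlay to a tracked person often comes down to estimating a transform from a few keypoints. A minimal sketch, assuming a 2D overlay designed around two reference keypoints (here, hypothetically, the two shoulders): the similarity transform (rotation, uniform scale and translation) that maps the template shoulders onto the detected ones is then applied to every point of the asset. Two point correspondences determine a 2D similarity transform exactly, and complex arithmetic keeps the math short. The template and detected coordinates are made up.

```python
def similarity_from_two_points(p1, p2, q1, q2):
    """Return a function mapping template coordinates to image coordinates.

    The 2D similarity transform (rotation + uniform scale + translation) is
    written with complex numbers as z -> s * z + t and is fixed exactly by
    the two correspondences p1 -> q1 and p2 -> q2 (p1 must differ from p2).
    """
    p1, p2, q1, q2 = (complex(*pt) for pt in (p1, p2, q1, q2))
    s = (q2 - q1) / (p2 - p1)  # rotation and scale in one complex factor
    t = q1 - s * p1            # translation

    def apply(pt):
        z = s * complex(*pt) + t
        return (z.real, z.imag)

    return apply


# Hypothetical template: the overlay was drawn with the shoulders at (0, 0) and (2, 0).
# Detected shoulders in the current frame (image coordinates):
to_image = similarity_from_two_points((0, 0), (2, 0), (10, 10), (10, 14))
overlay_corners = [(0, -1), (2, -1), (2, 1), (0, 1)]
print([to_image(c) for c in overlay_corners])
```

Recomputing the transform every frame makes the overlay rotate, scale and move with the person. With more than two keypoints, a least-squares fit would be more robust to noisy detections.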
Training Robots:
Instead of manually programming robots to follow trajectories, robots can be made to follow the trajectories of a human pose skeleton performing an action. A human instructor can effectively teach the robot certain actions simply by demonstrating them. The robot can then calculate how to move its joints to perform the same action.
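The retargeting step can be as simple as converting keypoints into joint angles. The sketch below assumes a hypothetical planar robot arm with two revolute joints (shoulder and elbow) and made-up joint limits; it reads the angles off 2D shoulder, elbow and wrist keypoints and clamps them to what the robot can physically reach. A practical system would typically work in 3D, account for differing limb proportions and smooth the trajectory over time.

```python
import math


def arm_joint_angles(shoulder, elbow, wrist, limits=((-90.0, 90.0), (0.0, 150.0))):
    """Convert 2D shoulder/elbow/wrist keypoints into joint commands for a
    hypothetical planar two-joint robot arm.

    Returns (shoulder_deg, elbow_deg): the upper-arm direction and the forearm
    direction relative to the upper arm, each clamped to the (hypothetical)
    joint limits. Angles are measured in the keypoint frame (x right, y up assumed).
    """
    upper = math.atan2(elbow[1] - shoulder[1], elbow[0] - shoulder[0])
    fore = math.atan2(wrist[1] - elbow[1], wrist[0] - elbow[0])
    shoulder_deg = math.degrees(upper)
    elbow_deg = (math.degrees(fore - upper) + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)

    def clamp(x, lo, hi):
        return max(lo, min(hi, x))

    (s_lo, s_hi), (e_lo, e_hi) = limits
    return clamp(shoulder_deg, s_lo, s_hi), clamp(elbow_deg, e_lo, e_hi)


# One frame of a demonstration: arm straight out, forearm bent up by 90 degrees.
print(arm_joint_angles((0, 0), (1, 0), (1, 1)))
```

Applying this to every frame of a demonstration gives a joint-space trajectory the robot can replay.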
Motion Tracking for Consoles:
Pose estimation can also be used to track the motion of human subjects for interactive gaming. Most famously, Microsoft’s Kinect used 3D pose estimation (based on infrared depth-sensor data) to track the motion of human players and use it to render the actions of virtual characters.
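With a depth sensor, a 2D keypoint location can be lifted to 3D by back-projecting it through a pinhole camera model: X = (u - cx) Z / fx and Y = (v - cy) Z / fy. The sketch below assumes the depth image is aligned with the keypoint pixel coordinates and uses made-up intrinsics (`fx`, `fy`, `cx`, `cy`), not the Kinect’s actual calibration. It illustrates only the geometric lifting step, not how Kinect itself inferred body parts.

```python
import math
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (the values used below are made up)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth Z to a 3D point (X, Y, Z) in camera coordinates."""
    x = (u - K.cx) * depth / K.fx
    y = (v - K.cy) * depth / K.fy
    return (x, y, depth)


K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
# Hypothetical 2D keypoints (pixels) with depth (metres) read from an aligned depth image.
shoulder = backproject(420, 240, 2.0, K)
elbow = backproject(420, 340, 2.0, K)
print(shoulder, elbow, math.dist(shoulder, elbow))  # last value: upper-arm length in metres
```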

Pose Estimation Challenges:
Despite many years of research, pose estimation remains a very difficult and still largely unsolved problem. Among the most significant challenges are the variability of human visual appearance in images, variability in lighting conditions, variability in human physique, partial occlusions due to self-articulation and the layering of objects in the scene, the complexity of the human skeletal structure, and the high dimensionality of the pose.
Great strides have been made in the field of human pose estimation, enabling us to better serve the myriad applications it makes possible. Moreover, research in related areas such as pose tracking can further broaden its practical use across many domains.