The Kinect body tracking pipeline

Oliver Williams, Mihai Budiu
Microsoft Research, Silicon Valley
With slides contributed by Johnny Lee, Jamie Shotton
NASA Ames, February 14, 2011

Outline

- Hardware overview
- The body tracking pipeline
- Learning a classifier from large data
- Conclusions

What is Kinect?

~2000 people
Caveat: we only have knowledge about a small part of this process.

Input device

The Innards
Source: iFixit

The vision system

- RGB camera
- IR camera
- IR laser projector
Source: iFixit

RGB Camera
- Used for face recognition
- Face recognition requires training
- Needs good illumination

The audio sensors

- 4 channel multi-array microphone
- Time-locked with console to remove game audio

Prime Sense Chip
- Xbox Hardware Engineering dramatically improved upon Prime Sense reference design performance
- Micron scale tolerances on large components
- Manufacturing process to yield ~1 device / 1.5 seconds

Projected IR pattern

Depth computation

Depth map

Kinect video output

- 30 Hz frame rate
- 57° field-of-view
- 8-bit VGA RGB, 640 x 480
- 11-bit monochrome, 320 x 240

Xbox 360 Hardware
- Triple Core PowerPC 970, 3.2 GHz
- Hyperthreaded, 2 threads/core
- 500 MHz ATI graphics card
- DirectX 9.5
- 512 MB RAM
- 2005 performance envelope
- Must handle real-time vision AND a modern game

THE BODY TRACKING PIPELINE

Generic Extensible Architecture
[Diagram: raw sensor data feeds Expert 1, Expert 2, and Expert 3 (stateless); each produces skeleton estimates; a probabilistic, stateful Arbiter fuses the hypotheses into the final estimate]

One Expert: Pipeline Stages
[Diagram: Sensor -> Depth map -> Background segmentation -> Player separation -> Body Part Classifier -> Body Part Identification -> Skeleton]

Sample test frames

Constraints
- No calibration
  - no start/recovery pose
  - no background calibration
  - no body calibration
- Minimal CPU usage
- Illumination-independent

The test matrix
- body size
- hair
- FOV
- body type
- clothes
- angle
- pets
- furniture

Preprocessing
- Identify ground plane
- Separate background (couch)
- Identify players via clustering

Two trackers
- Hands + head tracking
- Body tracking
(not exposed through SDK)

The body tracking problem
- Input: Depth map
- Classifier: Runs on GPU @ 320x240
- Output: Body parts

Training the classifier
- Start from ground-truth data: depth paired with body parts
- Train classifier to work across
  - pose
  - scene position
  - height, body shape

Getting the Ground Truth (1)
- Use synthetic data (3D avatar model)
- Inject noise
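As a rough illustration of the "inject noise" step, the sketch below perturbs a clean synthetic depth map with Gaussian depth jitter and random pixel dropouts. The noise model, the parameter values, and the function name `inject_noise` are assumptions made for illustration; the slides do not say which noise was actually applied to the rendered avatars.

```python
import random

def inject_noise(depth, sigma_mm=5.0, dropout=0.02, seed=0):
    """Return a noisy copy of a depth map given as a list of rows (millimetres).

    Hypothetical noise model: Gaussian jitter on every valid pixel plus
    random dropouts (value 0 marks "no reading").
    """
    rng = random.Random(seed)
    noisy = []
    for row in depth:
        out = []
        for d in row:
            if d == 0 or rng.random() < dropout:
                out.append(0)  # keep or create an invalid pixel
            else:
                out.append(max(1, round(d + rng.gauss(0.0, sigma_mm))))
        noisy.append(out)
    return noisy

clean = [[2000] * 8 for _ in range(6)]  # flat synthetic surface 2 m away
noisy = inject_noise(clean)
```

Fixing the seed keeps the generated training frames reproducible, which helps when comparing classifiers trained on the same synthetic set.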

Getting the Ground Truth (2)
Motion Capture:
- Unrealistic environments
- Unrealistic clothing
- Low throughput

Getting the Ground Truth (3)
Manual Tagging:
- Requires training many people
- Potentially expensive
- Tagging tool influences biases in data
- Quality control is an issue
- 1000 hrs @ 20 contractors ~= 20 years

Getting the Ground Truth (4)
Amazon Mechanical Turk:
- Build web based tool
- Tagging tool is 2D only
- Quality control can be done with redundant HITs
- 2000 frames/hr @ $0.04/HIT -> 6 yrs @ $80/hr
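The hourly figure on this slide follows from the per-HIT price if each tagged frame counts as one HIT. That ratio is an assumption; the slide does not state it. A quick check, using integer cents to avoid floating-point rounding:

```python
# Assumption: one HIT corresponds to one tagged frame.
FRAMES_PER_HOUR = 2000
CENTS_PER_HIT = 4  # $0.04 per HIT

cost_per_hour_dollars = FRAMES_PER_HOUR * CENTS_PER_HIT / 100
print(cost_per_hour_dollars)  # 80.0
```

The 6-year duration depends on the total number of frames to tag, which the slide does not give, so it is not recomputed here.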

Classifying pixels
- Compute P(c_i | w_i)
  - pixels i = (x, y)
  - body part c_i
  - image window w_i
- Example image windows: window moves with classifier
- Learn classifier P(c_i | w_i) from training data
  - randomized decision forests

Features
- d_I(x) -- depth of pixel x in image I
- θ = (u, v) -- parameter describing offsets u and v

From body parts to joint positions
- Compute 3D centroids for all parts
- Generates (position, confidence)/part
- Multiple proposals for each body part
- Done on GPU

From joints positions to skeleton
- Tree model of skeleton topology
- Has cost terms for:
  - Distances between connected parts (relative to body size)
  - Bone proximity to body parts
  - Motion terms for smoothness

Where is the skeleton?
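The formula on the Features slide did not survive extraction. The sketch below uses the depth-comparison feature as published by Shotton et al. (CVPR 2011), f_θ(I, x) = d_I(x + u/d_I(x)) - d_I(x + v/d_I(x)), which fits the symbols the slide defines; whether the shipped classifier uses exactly this form is an assumption. The second half turns per-pixel part labels into one confidence-weighted 3D centroid per part. The function names, the `BACKGROUND` constant, and the plain weighted mean (a single proposal per part rather than several) are illustrative choices, not details from the talk.

```python
BACKGROUND = 1e6  # large depth returned for probes off the image or on invalid pixels

def depth_at(depth, x, y):
    """Depth at integer pixel (x, y); BACKGROUND if outside the image or invalid."""
    if 0 <= y < len(depth) and 0 <= x < len(depth[0]) and depth[y][x] > 0:
        return depth[y][x]
    return BACKGROUND

def depth_feature(depth, x, y, u, v):
    """f_theta(I, x) = d_I(x + u / d_I(x)) - d_I(x + v / d_I(x)).

    u and v are 2D offsets. Dividing them by the depth at (x, y) shrinks
    the probe pattern for distant bodies, making the feature depth invariant.
    """
    d = depth_at(depth, x, y)
    ux, uy = x + round(u[0] / d), y + round(u[1] / d)
    vx, vy = x + round(v[0] / d), y + round(v[1] / d)
    return depth_at(depth, ux, uy) - depth_at(depth, vx, vy)

def part_centroids(points):
    """points: iterable of (part, confidence, (X, Y, Z)), one per labelled pixel.

    Returns {part: ((X, Y, Z), total_confidence)} using a confidence-weighted mean.
    """
    sums = {}
    for part, w, (px, py, pz) in points:
        sx, sy, sz, sw = sums.get(part, (0.0, 0.0, 0.0, 0.0))
        sums[part] = (sx + w * px, sy + w * py, sz + w * pz, sw + w)
    return {p: ((sx / sw, sy / sw, sz / sw), sw)
            for p, (sx, sy, sz, sw) in sums.items() if sw > 0}

# Tiny depth map in metres; 0 marks a pixel with no reading.
depth = [[0, 2.0, 2.0],
         [0, 2.0, 2.5],
         [0, 0,   0]]
feature = depth_feature(depth, 1, 1, (2.0, 0.0), (0.0, -2.0))

labelled = [("head", 1.0, (0.0, 1.0, 2.0)),
            ("head", 3.0, (4.0, 1.0, 2.0)),
            ("hand", 0.5, (1.0, 0.0, 2.0))]
centroids = part_centroids(labelled)
```

A decision forest would evaluate many such features per pixel, each with its own (u, v) and threshold; only the feature and the centroid step are shown here.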

LEARNING THE BODY PARTS CLASSIFIER FROM A MOUNTAIN OF DATA

Learn from Data
[Diagram: Training examples -> Machine learning -> Classifier]

Cluster-based training
[Diagram: Training examples -> Machine learning (DryadLINQ, Dryad) -> Classifier]
- > Millions of input frames
- > 10^20 objects manipulated
- Sparse, multi-dimensional data
- Complex datatypes (images, video, matrices, etc.)

Data-Parallel Computation

Application   SQL                  Sawzall, Java        SQL         LINQ, SQL
Language      -                    Sawzall, FlumeJava   Pig, Hive   DryadLINQ, Scope
Execution     Parallel Databases   MapReduce            Hadoop      Dryad
Storage       -                    GFS, BigTable        HDFS, S3    Cosmos, Azure, SQL Server

Dryad = 2-D Piping
Unix Pipes: 1-D
grep | sed | sort | awk | perl

Dryad: 2-D
grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
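One way to picture the superscripts: each stage of the pipe is replicated over that many partitions, and data is redistributed whenever two adjacent stages have different widths. The sketch below simulates this in a single process with small widths. `run_stage`, `repartition`, and the round-robin redistribution are illustrative assumptions and say nothing about how Dryad actually schedules work or moves data.

```python
def repartition(partitions, width):
    """Redistribute all items round-robin into `width` partitions."""
    out = [[] for _ in range(width)]
    items = [item for part in partitions for item in part]
    for i, item in enumerate(items):
        out[i % width].append(item)
    return out

def run_stage(fn, partitions, width):
    """Run one pipeline stage: `width` copies of fn, one per partition."""
    return [fn(part) for part in repartition(partitions, width)]

# Stand-ins for grep and sed, each operating on a list of lines.
def grep_error(lines):
    return [s for s in lines if "error" in s]

def upper(lines):
    return [s.upper() for s in lines]

lines = [f"line {i} {'error' if i % 3 == 0 else 'ok'}" for i in range(12)]

# 1-D pipe: grep | sed
one_d = upper(grep_error(lines))

# 2-D pipe: grep^4 | sed^2
stage1 = run_stage(grep_error, [lines], 4)
stage2 = run_stage(upper, stage1, 2)
two_d = [s for part in stage2 for s in part]
```

The two pipes produce the same lines, possibly in a different order. That holds only because both stand-in stages treat each line independently; a stage such as sort would need a merge step after the parallel copies.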

Virtualized 2-D Pipelines
- 2D DAG
- multi-machine
- virtualized

Fault Tolerance

LINQ => DryadLINQ
Dryad

LINQ = .NET + Queries

    Collection<T> collection;
    bool IsLegal(Key k);
    string Hash(Key k);

    var results = from c in collection
                  where IsLegal(c.key)
                  select new { hash = Hash(c.key), c.value };

DryadLINQ Data Model
- .NET objects
- Partition
- Collection

DryadLINQ = LINQ + Dryad
[Diagram: the same query over collection is turned into a query plan (Dryad job); C# vertex code runs on each partition of the data and produces results]

Language Summary
- Where
- Select
- GroupBy
- OrderBy
- Aggregate
- Join

Highly efficient parallelization
[Plot: machine vs. time]
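For readers less familiar with LINQ, the query on the slides is a filter followed by a projection. A rough Python equivalent is sketched below. `Record`, `is_legal`, and `hash_key` are hypothetical stand-ins for the slide's `Key`, `IsLegal`, and `Hash`, whose real definitions are not given. Because both operators act on one record at a time, running the query on each partition and concatenating the outputs gives the same result as running it on the whole collection, which is the property that lets such a query be split into independent vertices.

```python
from collections import namedtuple
import hashlib

Record = namedtuple("Record", ["key", "value"])

def is_legal(key):
    """Hypothetical predicate standing in for IsLegal(Key)."""
    return key.isalnum()

def hash_key(key):
    """Hypothetical stand-in for Hash(Key): hex SHA-1 digest of the key."""
    return hashlib.sha1(key.encode("utf-8")).hexdigest()

def query(collection):
    """Filter on is_legal, then project to (hash, value) pairs."""
    return [(hash_key(c.key), c.value) for c in collection if is_legal(c.key)]

collection = [Record("a1", 10), Record("??", 20), Record("b2", 30), Record("c3", 40)]
partitions = [collection[:2], collection[2:]]

whole = query(collection)
per_partition = [row for part in partitions for row in query(part)]
```

Operators such as GroupBy, OrderBy, and Join do not have this record-at-a-time property and need data exchanged between partitions.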

CONCLUSIONS

Huge Commercial Success

Tremendous Interest from Developers

Consumer Technologies Push The Envelope
- Price: $6000
- Price: $150

Unique Opportunity for Technology Transfer

I can finally explain to my son what I do for a living
