Summary: Lesson 1 - Camera Systems and Computer Vision

  • Module: Module 2 - Sensors and Perception for Humanoid Robots
  • Lesson: 01-camera-systems.md
  • Target Audience: CS students with Python + Module 1 (ROS2) knowledge
  • Estimated Time: 30-45 minutes
  • Difficulty: Beginner

Learning Outcomes

By the end of this lesson, students will be able to:

  1. Understand the five foundational camera parameters (pixels, resolution, frame rate, field of view, image encoding)
  2. Differentiate between monocular, stereo, and RGB-D cameras based on trade-offs
  3. Apply ROS2 sensor_msgs/Image and CameraInfo message structures to process camera data
  4. Analyze camera placement strategies (head, wrist, chest) and their impact on humanoid capabilities
  5. Evaluate trade-offs in multi-camera system design for competing requirements

Key Concepts Covered

Camera Types (Section 3.1)

  • Monocular: Simple, low-cost, no depth (2D only)
  • Stereo: Passive depth via triangulation, 0.5-10m range
  • RGB-D: Active depth (IR/ToF), 0.3-10m range, limited outdoor use
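
The triangulation behind passive stereo depth can be sketched with the standard rectified-pair relation Z = f · B / d (focal length in pixels, baseline in meters, disparity in pixels). This is a minimal sketch under those assumptions; the function name and the sample numbers are illustrative and not taken from the lesson.

```python
def stereo_depth_m(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 500 px focal length, 10 cm baseline.
for d in (100.0, 25.0, 5.0):
    depth = stereo_depth_m(500.0, 0.10, d)
    print(f"disparity {d:5.1f} px -> depth {depth:.2f} m")
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points than for near ones, which is one reason stereo systems quote a bounded working range.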

Camera Parameters (Section 3.2)

  • Resolution: 640×480 (VGA) to 4K; higher resolution increases computation cost
  • Field of View: 30-180°; narrows as focal length increases
  • Frame Rate: 10-60+ FPS; higher rates increase bandwidth
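
To make the resolution and frame-rate trade-offs concrete, the raw data rate of an uncompressed stream can be estimated as width × height × bytes per pixel × FPS. The sketch below assumes 3 bytes per pixel (an 8-bit, three-channel encoding such as rgb8) and ignores message headers and row padding; the function name is hypothetical.

```python
def raw_data_rate_mb_s(width: int, height: int, fps: float, bytes_per_pixel: int = 3) -> float:
    """Uncompressed image stream rate in megabytes per second (1 MB = 1e6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


resolutions = {"VGA": (640, 480), "720p": (1280, 720), "4K": (3840, 2160)}
for name, (w, h) in resolutions.items():
    print(f"{name:>4} @ 30 FPS: {raw_data_rate_mb_s(w, h, 30):7.1f} MB/s")
```

Running it shows how quickly the rate grows with pixel count, which is the cost that every downstream vision node has to absorb.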

Camera Placement (Section 3.3)

  • Head-mounted: Navigation, situational awareness, pan/tilt capability
  • Wrist-mounted: Manipulation, visual servoing, eye-in-hand
  • Chest-mounted: Stable SLAM reference, compromise viewpoint

ROS2 Integration (Sections 3.4-3.5)

  • sensor_msgs/Image: header, dimensions, encoding, step, raw data
  • sensor_msgs/CameraInfo: K matrix (intrinsics), distortion coefficients
  • Publisher-Subscriber Pattern: Camera driver → multiple vision nodes
  • QoS: Best-effort reliability with a moderate queue depth, so real-time subscribers drop stale frames rather than wait for retransmission
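
The role of the step field in sensor_msgs/Image is easiest to see when indexing a single pixel in the raw data buffer. The sketch below uses a plain dataclass as a stand-in for the message (same field names, with header and is_bigendian omitted) so it runs without a ROS2 installation; ImageLike and pixel_rgb8 are hypothetical names, and only the rgb8 encoding is handled.

```python
from dataclasses import dataclass


@dataclass
class ImageLike:
    """Stand-in mirroring the sensor_msgs/Image fields used here."""
    height: int     # number of rows
    width: int      # number of columns
    encoding: str   # e.g. "rgb8"
    step: int       # bytes per row, which may include padding
    data: bytes     # raw pixel buffer of size step * height


def pixel_rgb8(msg: ImageLike, row: int, col: int) -> tuple[int, int, int]:
    """Return the (R, G, B) value at (row, col) of an rgb8 image."""
    if msg.encoding != "rgb8":
        raise ValueError(f"unsupported encoding: {msg.encoding}")
    if not (0 <= row < msg.height and 0 <= col < msg.width):
        raise IndexError("pixel coordinates out of range")
    # Rows start every `step` bytes; each rgb8 pixel occupies 3 bytes.
    offset = row * msg.step + col * 3
    r, g, b = msg.data[offset:offset + 3]
    return r, g, b


# A 2x2 image whose rows are padded from 6 to 8 bytes.
raw = bytes([1, 2, 3, 4, 5, 6, 0, 0,
             7, 8, 9, 10, 11, 12, 0, 0])
img = ImageLike(height=2, width=2, encoding="rgb8", step=8, data=raw)
print(pixel_rgb8(img, 1, 1))
```

Using step rather than width × 3 for the row stride keeps the lookup correct when a driver pads rows for alignment.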

Real-World Examples

Boston Dynamics Atlas

  • Stereo cameras in head (752×480 @ 30fps, 10cm baseline)
  • RGB cameras in wrists (1280×720 @ 10-15fps)
  • Task specialization: depth for locomotion, color for manipulation

Tesla Optimus

  • Eight monocular cameras (1280×960 @ 36fps)
  • Vision-only approach (no lidar)
  • Neural network depth estimation from monocular cues

Code Examples

  1. CameraSubscriber: Subscribe to /camera/image_raw and log metadata
  2. CameraInfoSubscriber: Extract K matrix and calculate FOV from focal length

Both examples demonstrate:

  • Type hints on all function signatures
  • ROS2 node inheritance pattern
  • Callback-based message processing
  • Proper use of sensor_msgs types
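
The calculation at the heart of the CameraInfoSubscriber example can be sketched without ROS2. The version below assumes the intrinsic matrix arrives as a row-major list of nine values, an ideal pinhole model with no lens distortion, and a principal point near the image center; the helper names and the sample matrix are hypothetical rather than the lesson's own code.

```python
import math
from typing import Sequence


def intrinsics_from_k(k: Sequence[float]) -> tuple[float, float, float, float]:
    """Return (fx, fy, cx, cy) from a row-major 3x3 intrinsic matrix."""
    if len(k) != 9:
        raise ValueError("K must contain exactly 9 values")
    # Row-major layout: [fx, 0, cx, 0, fy, cy, 0, 0, 1]
    return k[0], k[4], k[2], k[5]


def fov_deg(size_px: int, focal_px: float) -> float:
    """Field of view along one image axis: 2 * atan(size / (2 * f))."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


# Hypothetical calibration for a 640x480 camera.
k = [525.0, 0.0, 320.0,
     0.0, 525.0, 240.0,
     0.0, 0.0, 1.0]
fx, fy, cx, cy = intrinsics_from_k(k)
print(f"horizontal FOV: {fov_deg(640, fx):.1f} deg")
print(f"vertical FOV:   {fov_deg(480, fy):.1f} deg")
```

In a node, the same two helpers would be called from the CameraInfo callback with the message's intrinsic array, width, and height. Doubling the focal length in the call narrows the result, which is the FOV and focal length relationship the lesson describes.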

Practice Exercises

  1. Multi-Camera System Design: Design a vision system for a delivery robot (navigation + object recognition + human interaction)
  2. Live Topic Inspection: Use ros2 CLI tools to inspect camera topics and extract parameters
  3. AI Colearning Prompts:
    • FOV/focal length relationship analogy
    • Stereo synchronization requirements

Common Pitfalls (Expert Insights)

  1. Resolution Sweet Spot: 640×480 @ 15-30fps is often a better choice than 4K for real-time humanoid applications
  2. RGB vs BGR: OpenCV uses BGR by default; rgb8 messages need conversion, otherwise the red and blue channels are swapped
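
The RGB vs BGR pitfall comes down to byte order within each pixel. As an illustration, the sketch below swaps the first and third channel of a tightly packed rgb8 buffer using only the standard library; it assumes no row padding (step equals width × 3), and rgb8_to_bgr8 is a hypothetical helper name.

```python
def rgb8_to_bgr8(data: bytes) -> bytes:
    """Swap the R and B bytes of every pixel in a tightly packed rgb8 buffer."""
    if len(data) % 3 != 0:
        raise ValueError("buffer length must be a multiple of 3")
    out = bytearray(data)
    out[0::3] = data[2::3]  # blue moves to the first byte of each pixel
    out[2::3] = data[0::3]  # red moves to the third byte of each pixel
    return bytes(out)


# One pure-red pixel followed by one pure-green pixel.
rgb = bytes([255, 0, 0, 0, 255, 0])
print(list(rgb8_to_bgr8(rgb)))
```

In a ROS2 node the conversion is more commonly left to cv_bridge, for example by requesting desired_encoding="bgr8" from imgmsg_to_cv2, but the effect on the bytes is the same swap.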

Assessment Criteria

Students demonstrate mastery when they can:

  • Explain camera type differences and justify selection for specific humanoid tasks
  • Subscribe to ROS2 camera topics and process image/calibration data
  • Design multi-camera configurations with trade-off justification
  • Calculate FOV from intrinsic matrix parameters
  • Identify appropriate camera placements for manipulation vs navigation

Prerequisites

  • Module 1: ROS2 Basics (nodes, topics, publishers, subscribers, message types)
  • Python 3.11+ with type hints
  • Basic linear algebra (vectors, matrices for K matrix understanding)

Next Steps

  • Lesson 2: Depth Sensing Technologies (LiDAR, structured light, ToF)
  • Connection: Cameras provide 2D semantic info; depth sensors add 3D spatial awareness

Metadata

  • Generated by: Agent Pipeline (9-agent system)
  • Created: 2025-12-08
  • Tags: ros2, sensors, camera, computer-vision, humanoid-robotics
  • Cognitive Load: Moderate (6 new concepts, builds on Module 1)
  • Word Count: ~3,200 words (including code + callouts)
  • Sections: 7 (What Is, Why Matters, Key Principles, Callouts, Code Examples, Summary, Next Steps)

Validation Status

  • ✅ Technical Review: PASS WITH MINOR REVISIONS (all fixed)
  • ✅ Structure & Style: PASS (all 7 sections, proper callouts)
  • ✅ Frontmatter: COMPLETE (13 fields generated)
  • ✅ Code Quality: PASS (type hints, docstrings, ROS2 patterns)
  • ✅ Case Studies: 2 detailed examples (Atlas, Optimus)
  • ✅ Callouts: 2 AI Colearning, 2 Expert Insights, 2 Practice Exercises