Sherbot Holmes
By Eirik Møller Nilsen and Eivind Nordahl Andersen
1) Topic C – Animatronics
We chose Animatronics because it connects creative character design with practical robotics and modern AI.
Inspired by compliant mechanisms and soft robotics from the animation industry, we aim to explore how a physical “creature” can feel alive when it understands its surroundings and responds naturally.
This topic provides a holistic learning arena — from perception (video analysis) to decision-making (AI) and safe, compliant actuation. It also has high demo value: a robot that follows a target, recognizes objects, and responds to speech is both technically challenging and engaging to present.
Our motivation lies in achieving measurable, iterative improvements. Vision can be optimized for accuracy and robustness, motion control for smooth tracking, and the voice interface for reliable understanding.
2) Goals
Project Goal
The goal of this project is to design and build a small mobile animatronic robot capable of visually detecting, locating, and guiding users toward specific objects or areas.
The robot should demonstrate a natural and lifelike interaction loop where it can:
- Find and approach a relevant target, such as an empty chair, a trash bin, or a specific lecture room sign (“Smalltalk” or “Simula”).
- Respond to user input either through voice commands (“Find the red chair”, “Show me the Simula room”) or through predefined physical buttons.
  - Each button corresponds to a specific task (e.g., Locate chair, Locate trashcan, Locate room sign).
  - The button fallback ensures reliable operation even if voice recognition fails.
The overall goal is to achieve robust, repeatable, and expressive behavior that feels intuitive and engaging during live demonstrations — focusing on the quality of perception–decision–motion integration rather than raw computational performance.
Intended Outcomes
By the end of the project, the robot should be able to:
- Perceive its surroundings using an edge device with a camera, microphone, and proximity sensors (plus a speaker for audio feedback).
- Interpret a command (spoken or button-based) to identify what to look for.
- Detect and locate the specified target in real time (> 5 fps).
- Move toward the target and stop at a safe distance (~0.5 m).
- Perform a simple expressive gesture upon task activation and completion (a minimal control-loop sketch follows this list).
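Taken together, these outcomes amount to a perceive, decide, act loop. The sketch below is a minimal, hypothetical version of that loop: `detect_target`, `read_distance_m`, `drive`, `stop`, and `gesture` are assumed helper functions standing in for the vision, proximity, and motor layers described later in this document.

```python
# Minimal perceive -> decide -> act loop sketch. All five helpers passed in
# are assumptions standing in for the perception and actuation layers.
import time

STOP_DISTANCE_M = 0.5    # stop roughly half a metre from the target
FRAME_WIDTH_PX = 640     # assumed camera resolution

def control_loop(detect_target, read_distance_m, drive, stop, gesture):
    """Repeat perceive -> decide -> act until the target is reached."""
    while True:
        box = detect_target()               # None or (cx, cy) in pixels
        if box is None:
            drive(turn=0.3, forward=0.0)    # search: rotate slowly in place
        else:
            cx, _ = box
            # Steer proportionally to how far the target sits from image centre.
            error = (cx - FRAME_WIDTH_PX / 2) / (FRAME_WIDTH_PX / 2)
            if read_distance_m() <= STOP_DISTANCE_M:
                stop()
                gesture()                   # expressive gesture on completion
                break
            drive(turn=0.8 * error, forward=0.4)
        time.sleep(0.2)                     # ~5 Hz, matching the > 5 fps target
```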
Inspiration and Design Character
The design draws inspiration from Alonso Martinez’s expressive 3D-printed robots[^1], but differs in a key aspect: it emphasizes autonomous perception and interaction rather than purely expressive animation.
The goal is to blend soft-robotic motion with intelligent behavior, creating an approachable and “alive” demonstration platform for embodied AI.
[^1]: Alonso Martinez's 3D-Printed Animated Robots! — Adam Savage's Tested: https://www.youtube.com/watch?v=0vfuOW1tsX0&t=300s/
3) Sketch
Project Overview
The system combines computer vision, speech or button-based interaction, and motor control into one integrated loop.
It is composed of two main subsystems, plus the user-interaction inputs:
1. Perception & Intelligence
- Uses the camera to capture the environment and identify objects of interest.
- Uses the microphone for voice commands.
- Runs a fine-tuned, compact edge-device vision model (YOLO11s)[^2][^3] for detecting relevant items such as chairs, trash bins, or room labels (see the detection sketch after the references below).
- Interprets simple commands:
  - “Find an empty chair”
  - “Show me Smalltalk”
  - “Find the trashcan”
- Communicates target coordinates and control signals wirelessly (via WiFi or Bluetooth Low Energy) to the robot’s microcontroller.
[^2]: You Only Look Once: Unified, Real-Time Object Detection — Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: https://arxiv.org/abs/1506.02640
[^3]: Ultralytics YOLO11 — Model documentation: https://docs.ultralytics.com/models/yolo11/
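As a reference for the detection step, here is a minimal sketch using the Ultralytics Python API and OpenCV. The weights file name `sherbot_yolo11s.pt` is a placeholder for our fine-tuned model, and exposing the Pi camera as a standard `/dev/video0` device is an assumption (the `picamera2` library is an alternative capture path).

```python
# Detection sketch: find the best-scoring box for a requested target label.
# "sherbot_yolo11s.pt" is a hypothetical name for our fine-tuned weights.
import cv2
from ultralytics import YOLO

model = YOLO("sherbot_yolo11s.pt")    # fine-tuned YOLO11s weights
cap = cv2.VideoCapture(0)             # assumes the Pi camera appears as /dev/video0

def find_target(target_label: str):
    """Return the pixel centre of the best detection of target_label, or None."""
    ok, frame = cap.read()
    if not ok:
        return None
    results = model(frame, verbose=False)[0]
    best = None
    for box in results.boxes:
        label = model.names[int(box.cls)]
        conf = float(box.conf)
        if label == target_label and (best is None or conf > best[0]):
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            best = (conf, ((x1 + x2) / 2, (y1 + y2) / 2))
    return best[1] if best else None
```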
2. Actuation & Mobility (Microcontroller Layer)
- Controls four Dynamixel smart servos configured in wheel mode for differential drive and balance (see the wheel-mode sketch after this list).
- Executes simple behaviors:
  - Navigate forward, backward, turn left/right, or stop.
  - Perform expressive gestures (LED blink, spin, chime) once a target is reached.
- Houses the control electronics, power source, and communication module in a compact 3D-printed chassis.
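A minimal sketch of the wheel-mode setup and differential drive, using the Dynamixel SDK for Python. The serial port, baud rate, and servo IDs (left: 1 and 2, right: 3 and 4) are assumptions; the control-table addresses follow the AX-12A documentation.

```python
# Differential-drive sketch for AX-12A servos in wheel mode, using the
# Dynamixel SDK (pip install dynamixel-sdk). Port, baud rate and servo IDs
# are assumptions for illustration.
from dynamixel_sdk import PortHandler, PacketHandler

ADDR_CW_LIMIT, ADDR_CCW_LIMIT, ADDR_MOVING_SPEED = 6, 8, 32  # AX-12A control table
LEFT_IDS, RIGHT_IDS = (1, 2), (3, 4)

port = PortHandler("/dev/ttyUSB0")
packet = PacketHandler(1.0)           # AX-12A uses protocol 1.0
port.openPort()
port.setBaudRate(1000000)

def init_wheel_mode():
    # Setting both angle limits to 0 puts an AX-12A into continuous-rotation (wheel) mode.
    for dxl_id in LEFT_IDS + RIGHT_IDS:
        packet.write2ByteTxRx(port, dxl_id, ADDR_CW_LIMIT, 0)
        packet.write2ByteTxRx(port, dxl_id, ADDR_CCW_LIMIT, 0)

def set_speed(dxl_id, speed):
    # Moving-speed register: 0-1023 drives CCW, 1024-2047 drives CW.
    value = speed if speed >= 0 else 1024 + abs(speed)
    packet.write2ByteTxRx(port, dxl_id, ADDR_MOVING_SPEED, int(value))

def drive(left_speed, right_speed):
    # Right-side servos are mounted mirrored, so their sign is flipped.
    for dxl_id in LEFT_IDS:
        set_speed(dxl_id, left_speed)
    for dxl_id in RIGHT_IDS:
        set_speed(dxl_id, -right_speed)

# Example: init_wheel_mode(); drive(300, 300) to roll forward at moderate speed.
```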
3. Interaction Inputs
- Voice control (primary): User issues natural commands via speech.
- Physical buttons (fallback):
  - Multiple buttons are assigned to specific pre-defined tasks (e.g., Chair, Trashcan, Simula, Smalltalk).
  - Each button directly informs the perception system which target to look for — ensuring functionality even in noisy environments or without microphone access (a button-mapping sketch follows this list).
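A sketch of the button fallback using `gpiozero` on the Raspberry Pi; the GPIO pin numbers and target labels are illustrative assumptions.

```python
# Button-fallback sketch: each GPIO button selects a pre-defined target.
# Pin numbers and labels are assumptions for illustration.
from gpiozero import Button
from signal import pause

BUTTON_TO_TARGET = {
    17: "chair",
    27: "trashcan",
    22: "simula_sign",
    23: "smalltalk_sign",
}

def select_target(label):
    # In the full system this would hand the label to the perception loop.
    print(f"Target selected: {label}")

buttons = []
for pin, label in BUTTON_TO_TARGET.items():
    btn = Button(pin)
    btn.when_pressed = (lambda l=label: select_target(l))
    buttons.append(btn)               # keep references so callbacks stay alive

pause()                               # wait for button presses
```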
Physical and Aesthetic Design
The robot’s outer shell and motion elements will be 3D-printed, inspired by compliant-mechanism and soft-robotic principles.
This gives the robot smooth, lifelike movement and an approachable character suitable for interactive demonstrations.
Intended Demonstration
In the final demonstration, the robot will be asked to:
“Find an empty chair” — or, if voice fails, the Chair button is pressed.
The robot will visually search, approach, and stop near the correct object, signaling success with an expressive gesture.
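For this small vocabulary, mapping the spoken request onto a target label can be a simple keyword match over the recognized text. The sketch below assumes the `SpeechRecognition` package with its Google web recognizer (which needs a network connection); an offline recognizer such as Vosk could be swapped in, and a `None` result triggers the button fallback.

```python
# Voice-command sketch: recognize speech, then keyword-match to a target label.
# The SpeechRecognition package and its Google recognizer are assumptions.
import speech_recognition as sr

KEYWORD_TO_TARGET = {
    "chair": "chair",
    "trash": "trashcan",
    "simula": "simula_sign",
    "smalltalk": "smalltalk_sign",
}

def listen_for_target():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source, phrase_time_limit=5)
    try:
        text = recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None                   # fall back to the physical buttons
    for keyword, target in KEYWORD_TO_TARGET.items():
        if keyword in text:
            return target
    return None
```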
Summary of Key Integrations

| Layer | Components | Communication | Function |
|-------|------------|---------------|----------|
| Perception | Camera, mic, AI | WiFi / BLE / On-device | Detects, interprets, decides |
| Control | Microcontroller | WiFi / BLE / On-device | Receives motion/gesture commands |
| Actuation | 4× Dynamixel Servos | Serial / TTL | Drives movement and expression |
| User Input | Voice or 4 Physical Buttons | Local | Command selection |
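The Perception to Control link in the table could carry a small JSON payload; below is a minimal sketch over UDP/WiFi. The robot address, port, and message fields are assumptions for illustration, and the same payload could be sent over a BLE characteristic instead.

```python
# Minimal sketch of the perception -> control message over WiFi (UDP + JSON).
# Address, port and field names are assumptions for illustration.
import json
import socket

ROBOT_ADDR = ("192.168.4.1", 9999)    # assumed address of the robot's controller

def send_target_update(sock, target, cx, cy, distance_m):
    msg = {
        "target": target,             # e.g. "chair"
        "centre_px": [cx, cy],        # bounding-box centre in the camera frame
        "distance_m": distance_m,     # latest proximity reading, if available
    }
    sock.sendto(json.dumps(msg).encode("utf-8"), ROBOT_ADDR)

# Usage example:
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send_target_update(sock, "chair", 320, 240, 1.2)
```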
Illustration
First demo design

Sketch with annotations

Demo: Real-Time CV — Bounding Boxes on Target Objects

4) Bill of Materials (BOM)
| Component | Description / Function | Qty | Notes |
|---|---|---|---|
| Microcontroller | Main control unit for motors and BLE/WiFi communication | 1 | Raspberry Pi 5 (WiFi) |
| Microphone | Small microphone for microcontroller | 1 | |
| Camera Module | Camera for interpreting surroundings | 1 | Pi camera module v3 |
| Small speaker | Communication to users | 1 | |
| Dynamixel Smart Servos | Drive + feedback; run in Wheel Mode for base | 4 | AX-12A |
| Battery Pack | Power source for logic and motors | 1 | |
| Edge Device | Either microcontroller or smartphone | 1 | Handles object detection & voice command recognition |
| Chassis (3D-printed) | Structural frame for electronics | 1 | Custom PLA design |
| Wheels | Mobility system | 4 | 3D printed or RC-type |
| Proximity Sensors | For obstacle avoidance | 3 | Ultrasonic or IR sensors |
| Optional Components | LED, buttons, cables etc. | – | Status and feedback |
5) Plan
We divide the work into three main iterations, each with clear milestones, responsibilities, and deliverables.
Progress will be documented through logs, photos, sketches, and short demo videos.
| Week | Milestone / Goal | Description | Responsible | Deliverables |
|---|---|---|---|---|
| 42 | Concept & Design | Define functional scope, design layout. | Eirik & Eivind | Initial sketch, component list (Assignment 4). |
| 43–46 | Hardware Assembly | 3D-print chassis, mount components, verify motor control manually. | Eirik & Eivind | Working 4WD base, photos, wiring diagram. |
| 44–46 | Voice & Perception Integration | Implement command recognition and YOLO detection; send BLE commands. | Eirik (voice) / Eivind (vision) | BLE verified, YOLO functional, demo video (Assignment 6). |
| 47 | Autonomous Navigation | Integrate movement logic and gesture behavior. | Eirik & Eivind | “Find and approach” demo (Assignment 6). |
| 48 | Documentation & Demo | Summarize results, reflect on challenges, finalize report. | Eirik & Eivind | Final demo and report. |
Documentation Notes
During each phase, we will:
- Take photos and videos of key milestones and test setups.
- Capture screenshots of detection results and BLE logs.
- Write short development notes on what worked, what failed, and what was improved.
- Maintain all material for the final reflection and documentation.