Data Super Hero
This is the third iteration of this prototype.
High-Level Goal
Create a fully autonomous device that transfers data from any removable storage media to cloud storage (e.g. OneDrive) with zero user interaction, other than physically plugging in the drive.
The system must:
- Work offline (drive ingestion must not require internet access)
- Handle multiple drive types
- Preserve full file structure
- Generate a structured notes file
- Upload to cloud asynchronously
- Expose real-time system status via a LAN dashboard
Core Design Principles
- Zero-input operation
  - No UI, no commands, no buttons
  - Plug in a drive → data transfer begins automatically
- Offline-first
  - Drive ingestion must never depend on internet access
  - Cloud upload happens opportunistically when connectivity exists
- Separation of concerns
  - Drive ingestion and cloud uploading are handled by separate services
  - This maximizes throughput and avoids blocking
- Data safety over speed
  - No destructive operations on source drives
  - Clear locking and completion states
  - No race conditions between copy and upload
- Debuggable and observable
  - Continuous logs
  - LAN status dashboard
  - Resettable state for testing
Target Hardware
Prototype
- Raspberry Pi Zero 2 W
- MicroSD card (system + temporary storage)
- USB OTG hub
Final Version
- Larger Raspberry Pi (4 or 5)
- External SSD used as the device's local (ingest) storage
- Same software architecture
Supported Media Types
The system must automatically handle:
- USB flash drives
- External HDDs
- External SSDs
- Floppy drives (via USB floppy adapter)
No assumptions are made about:
- File system type
- Drive size
- Drive name
- Read/write speed
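Because nothing is assumed about file system, size, or name, drive discovery has to rely on what the kernel reports. One possible approach, sketched below, is to poll `lsblk` and keep only volumes on USB-attached disks. This assumes the util-linux `lsblk` tool is installed and supports `--json` with the `NAME`, `TYPE`, `FSTYPE`, and `TRAN` columns. The function names `usb_volumes` and `scan` are hypothetical and do not come from the design.

```python
import json
import subprocess
from typing import List

# Assumption: util-linux lsblk is available on the Pi.
LSBLK_CMD = ["lsblk", "--json", "--output", "NAME,TYPE,FSTYPE,TRAN"]


def usb_volumes(lsblk_json: str) -> List[dict]:
    """Return mountable volumes on USB-attached disks from `lsblk --json` output."""
    volumes = []
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        if disk.get("tran") != "usb":
            continue  # skip the Pi's own SD card and other non-USB devices
        # Unpartitioned media (e.g. floppies) carry the file system on the disk itself.
        parts = disk.get("children") or [disk]
        for part in parts:
            if part.get("fstype"):
                volumes.append(
                    {"device": "/dev/" + part["name"], "fstype": part["fstype"]}
                )
    return volumes


def scan() -> List[dict]:
    """Run lsblk once and return the USB volumes it currently reports."""
    result = subprocess.run(LSBLK_CMD, check=True, capture_output=True, text=True)
    return usb_volumes(result.stdout)
```

Keeping the parser separate from the `lsblk` call makes it testable with captured output, without any hardware attached.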
System Architecture Overview
+------------------------------------+
|            Raspberry Pi            |
|                                    |
|   ┌─────────────┐                  |
|   │ Drive Watch │◄── USB           |
|   └─────────────┘                  |
|          │                         |
|          ▼                         |
|   ┌─────────────┐                  |
|   │  Ingestion  │                  |
|   │   Service   │                  |
|   └─────────────┘                  |
|          │                         |
|          ▼                         |
|   Local Storage (SSD/SD)           |
|          │                         |
|          ▼                         |
|   ┌─────────────┐                  |
|   │   Upload    │◄── Internet      |
|   │   Service   │                  |
|   └─────────────┘                  |
|          │                         |
|          ▼                         |
|   OneDrive / Cloud Provider        |
|                                    |
|   ┌─────────────┐                  |
|   │ LAN Web UI  │                  |
|   └─────────────┘                  |
+------------------------------------+
Boot Behavior
On system boot:
- Initialize debug mode
  - If DEBUG_RESET = true:
    - Clear all drive locks
    - Reset incomplete transfer states
    - Reset LAN dashboard state
- Start services
  - Drive detection / ingestion loop
  - Upload worker loop
  - LAN status web server
  - Debug logging service
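The boot-time reset could look roughly like the sketch below. It assumes that `DEBUG_RESET` is read from an environment variable, that drive folders live under a root containing `DRIVE_*/status.json`, and that an interrupted transfer is recorded with the state value `"in_progress"`. The source of the flag, the `"in_progress"` value, and the function names are illustrative assumptions, not part of the design.

```python
import json
import os
from pathlib import Path
from typing import List


def debug_reset_enabled() -> bool:
    """Read DEBUG_RESET from the environment (assumed source of the flag)."""
    return os.environ.get("DEBUG_RESET", "false").strip().lower() == "true"


def reset_state(ingest_root: Path) -> List[str]:
    """Clear locks and reset interrupted transfers; return the folders changed."""
    touched = []
    for status_path in sorted(ingest_root.glob("DRIVE_*/status.json")):
        status = json.loads(status_path.read_text())
        original = dict(status)
        status["locked_by"] = None
        # Assumption: "in_progress" marks a transfer that was cut off mid-way.
        for key in ("ingest", "upload"):
            if status.get(key) == "in_progress":
                status[key] = "pending"
        if status != original:
            status_path.write_text(json.dumps(status, indent=2))
            touched.append(status_path.parent.name)
    return touched
```

At boot, the services would call `reset_state(...)` only when `debug_reset_enabled()` returns true, before any worker loop starts.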
Drive Ingestion Service (Script #1)
Purpose
Handle physical drives as fast and safely as possible.
Behavior
- Runs continuously after boot
- Waits for a new drive to be connected
- On detection:
  - Identify the device
  - Mount it read-only if possible
  - Generate a unique drive ID (timestamp + hash)
- Create a corresponding folder in internal storage:
  /ingest/
  └── DRIVE_<id>/
      ├── data/
      ├── notes.txt
      ├── file_tree.txt
      └── status.json
- Copy entire file structure exactly as-is into data/
- Generate:
  - file_tree.txt: full directory structure snapshot
  - notes.txt: contains
    - Drive ID
    - Timestamp
    - Placeholder section for human notes
- Mark the drive folder as INGEST_COMPLETE
- Safely unmount the physical drive
- Release the drive for removal
⚠️ This service never uploads and never waits for internet
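The copy-and-annotate part of this service can be sketched as follows. The sketch assumes the drive is already mounted read-only at `mount_point`; mounting and unmounting are not shown. The ID format, the `"in_progress"` state value, the mapping of INGEST_COMPLETE to `"ingest": "complete"`, and the function names are illustrative assumptions.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path
from typing import Optional


def make_drive_id(device: str, now: Optional[float] = None) -> str:
    """Build an ID from a UTC timestamp plus a short hash of device and time."""
    now = time.time() if now is None else now
    stamp = time.strftime("%Y%m%dT%H%M%S", time.gmtime(now))
    digest = hashlib.sha256(f"{device}:{now}".encode()).hexdigest()[:8]
    return f"{stamp}_{digest}"


def ingest(mount_point: Path, ingest_root: Path, device: str) -> Path:
    """Copy a mounted drive into ingest_root/DRIVE_<id>/ and record its state."""
    folder = ingest_root / f"DRIVE_{make_drive_id(device)}"
    data = folder / "data"
    folder.mkdir(parents=True)
    status_path = folder / "status.json"
    # Hold the ingest lock for the whole copy so the uploader skips this folder.
    status = {"ingest": "in_progress", "upload": "pending", "locked_by": "ingest"}
    status_path.write_text(json.dumps(status, indent=2))

    # The source is only read; symlinks are copied as links, not followed.
    shutil.copytree(mount_point, data, symlinks=True)

    tree = sorted(
        p.relative_to(data).as_posix() + ("/" if p.is_dir() else "")
        for p in data.rglob("*")
    )
    (folder / "file_tree.txt").write_text("\n".join(tree) + "\n")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S UTC", time.gmtime())
    (folder / "notes.txt").write_text(
        f"Drive ID: {folder.name}\nTimestamp: {stamp}\n\nNotes:\n"
    )

    # Mark completion and release the lock only once everything is on disk.
    status.update(ingest="complete", locked_by=None)
    status_path.write_text(json.dumps(status, indent=2))
    return folder
```

Writing `status.json` first and flipping it to complete last means a power loss mid-copy leaves a folder that is visibly unfinished rather than one that looks ready for upload.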
Upload Service (Script #2)
Purpose
Move completed drive folders to cloud storage as bandwidth allows.
Behavior
- Runs independently from ingestion
- Continuously scans for folders that are:
  - Marked INGEST_COMPLETE
  - Not locked by ingestion
- For each eligible drive folder:
  - Acquire upload lock
  - Upload entire folder to OneDrive
  - Preserve directory structure
  - Ensure notes file is included
- On successful upload:
  - Mark as UPLOAD_COMPLETE
  - Optionally archive or delete local copy
- If no internet:
  - Idle
  - Retry periodically
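One pass of the upload worker might be structured as in the sketch below. The actual cloud transfer is injected as a callable named `upload`, which is a hypothetical placeholder because the design does not specify a OneDrive client. The sketch also assumes that INGEST_COMPLETE and UPLOAD_COMPLETE correspond to `"complete"` values in `status.json`, and that network failures surface as `OSError`.

```python
import json
from pathlib import Path
from typing import Callable, List


def eligible_folders(ingest_root: Path) -> List[Path]:
    """Return drive folders that are fully ingested, not yet uploaded, and unlocked."""
    found = []
    for status_path in sorted(ingest_root.glob("DRIVE_*/status.json")):
        status = json.loads(status_path.read_text())
        if (
            status.get("ingest") == "complete"
            and status.get("upload") == "pending"
            and status.get("locked_by") is None
        ):
            found.append(status_path.parent)
    return found


def upload_pass(ingest_root: Path, upload: Callable[[Path], None]) -> List[str]:
    """Run one scan and return the names of folders uploaded successfully."""
    done = []
    for folder in eligible_folders(ingest_root):
        status_path = folder / "status.json"
        status = json.loads(status_path.read_text())
        # Plain read-modify-write: only safe with a single upload worker.
        status["locked_by"] = "upload"
        status_path.write_text(json.dumps(status, indent=2))
        try:
            upload(folder)
            status["upload"] = "complete"
            done.append(folder.name)
        except OSError:
            # Treated as "no internet": stay pending and retry on a later pass.
            pass
        finally:
            status["locked_by"] = None
            status_path.write_text(json.dumps(status, indent=2))
    return done
```

A worker loop would call `upload_pass` repeatedly and sleep between passes, which covers both the idle and the retry behavior.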
Locking & State Management
Each drive folder records its state explicitly in status.json:
{
  "ingest": "complete",
  "upload": "pending",
  "locked_by": null
}
- Ingestion service sets and clears ingest locks
- Upload service respects ingest locks
- Debug mode can reset all locks at boot
This prevents:
- Partial uploads
- Corrupted transfers
- Race conditions
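Updating `locked_by` with a plain read-modify-write can still race when two services touch the same folder. A minimal sketch of one way to close that gap is shown below. It uses a separate `.lock` file created with `O_EXCL` as the atomic guard and rewrites `status.json` through a temporary file. The `.lock` file and the function names are assumptions added for illustration and are not part of the design.

```python
import json
import os
from pathlib import Path


def _update_status(folder: Path, **changes) -> None:
    """Rewrite status.json via a temp file so readers never see a partial write."""
    path = folder / "status.json"
    status = json.loads(path.read_text())
    status.update(changes)
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(status, indent=2))
    os.replace(tmp, path)  # atomic rename within the same filesystem


def acquire_lock(folder: Path, owner: str) -> bool:
    """Try to take the folder lock; return False if another service holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one caller wins.
        fd = os.open(folder / ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(owner)
    _update_status(folder, locked_by=owner)
    return True


def release_lock(folder: Path, **changes) -> None:
    """Record final state changes, clear locked_by, then drop the lock file."""
    _update_status(folder, locked_by=None, **changes)
    (folder / ".lock").unlink()
```

For example, the upload service could call `acquire_lock(folder, "upload")` and later `release_lock(folder, upload="complete")`. With this scheme, the debug reset at boot would also need to delete stale `.lock` files.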
Notes Workflow (Human)
- User plugs in drive
- Waits for ingestion to complete
- Later (from any device):
  - Opens OneDrive
  - Edits notes.txt
  - Adds contextual notes about the drive contents
No interaction with the Pi required.
LAN Status Website
Purpose
Provide real-time observability.
Features
- Accessible on local network
- Displays:
  - Connected drives
  - Active ingestion
  - Upload queue
  - Upload progress
  - Errors
  - Storage usage
  - Internet status
- Debug mode:
  - Clears displayed history on boot
No authentication required (LAN-only).
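A dashboard backend can be very small if it only reports what is already on disk. The sketch below serves a JSON snapshot built from each folder's `status.json` using only the Python standard library. The `/ingest` root, the port, the JSON field names, and the function names are assumptions. It covers only a subset of the listed features: drive states, upload queue, and storage usage.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

INGEST_ROOT = Path("/ingest")  # assumed location of the drive folders


def collect_status(ingest_root: Path) -> dict:
    """Summarise every drive folder's status.json plus disk usage."""
    drives = {}
    for status_path in sorted(ingest_root.glob("DRIVE_*/status.json")):
        drives[status_path.parent.name] = json.loads(status_path.read_text())
    usage = shutil.disk_usage(ingest_root)
    return {
        "drives": drives,
        "upload_queue": [
            name for name, s in drives.items() if s.get("upload") == "pending"
        ],
        "storage": {"total": usage.total, "free": usage.free},
    }


class StatusHandler(BaseHTTPRequestHandler):
    ingest_root = INGEST_ROOT

    def do_GET(self):
        # Every path returns the same snapshot; no authentication (LAN-only).
        body = json.dumps(collect_status(self.ingest_root)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Bind on all interfaces so other machines on the LAN can reach it."""
    return ThreadingHTTPServer(("0.0.0.0", port), StatusHandler)
```

Calling `make_server().serve_forever()` starts the service. Because the snapshot is rebuilt on each request, the page reflects the current state without any shared memory between services.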
Debug & Logging System
- Continuous debug output including:
  - Drive detection events
  - Mount/unmount
  - Copy progress
  - Upload state
  - Errors and retries
- Logs written to:
  - Console
  - Rotating log files
- Verbosity controlled by a debug flag
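The logging setup described above maps directly onto Python's standard `logging` module. The following is a minimal sketch. The logger name, the log format, the 1 MB rotation size, and the backup count of 5 are arbitrary assumptions.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def setup_logging(log_path: str, debug: bool) -> logging.Logger:
    """Log to the console and to a rotating file; `debug` controls verbosity."""
    logger = logging.getLogger("datasuperhero")  # assumed logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    # Drop handlers from any earlier call so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler(sys.stdout)
    # Assumed limits: rotate at about 1 MB and keep 5 old files.
    rotating = RotatingFileHandler(log_path, maxBytes=1_000_000, backupCount=5)
    for handler in (console, rotating):
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger
```

Each service would call `setup_logging` once at startup and then emit events such as `logger.info("drive detected")` or `logger.debug("copy progress")`.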