Data Super Hero

This is the third iteration of this prototype.

High-Level Goal

Create a fully autonomous device that transfers data from any removable storage media to cloud storage (e.g. OneDrive) with zero user interaction, other than physically plugging in the drive.

The system must:

  • Work offline (drive ingestion never requires internet access)
  • Handle multiple drive types
  • Preserve full file structure
  • Generate a structured notes file
  • Upload to cloud asynchronously
  • Expose real-time system status via a LAN dashboard

Core Design Principles

  1. Zero-input operation

    • No UI, no commands, no buttons
    • Plug in a drive → data transfer begins automatically
  2. Offline-first

    • Drive ingestion must never depend on internet access
    • Cloud upload happens opportunistically when connectivity exists
  3. Separation of concerns

    • Drive ingestion and cloud uploading are handled by separate services
    • A slow or unavailable upload therefore never blocks drive ingestion
  4. Data safety over speed

    • No destructive operations on source drives
    • Clear locking and completion states
    • No race conditions between copy and upload
  5. Debuggable and observable

    • Continuous logs
    • LAN status dashboard
    • Resettable state for testing

Target Hardware

Prototype

  • Raspberry Pi Zero 2 W
  • MicroSD card (system + temporary storage)
  • USB OTG hub

Final Version

  • Larger Raspberry Pi (4 or 5)
  • External SSD for internal storage
  • Same software architecture

Supported Media Types

The system must automatically handle:

  • USB flash drives
  • External HDDs
  • External SSDs
  • Floppy drives (via USB floppy adapter)

No assumptions are made about:

  • File system type
  • Drive size
  • Drive name
  • Read/write speed

System Architecture Overview

+-----------------------------+
| Raspberry Pi                |
|                             |
|  ┌─────────────┐            |
|  | Drive Watch |◄── USB     |
|  └─────────────┘            |
|         │                   |
|         ▼                   |
|  ┌─────────────┐            |
|  | Ingestion   |            |
|  | Service     |            |
|  └─────────────┘            |
|         │                   |
|         ▼                   |
|  Local Storage (SSD/SD)     |
|         │                   |
|  ┌─────────────┐            |
|  | Upload      |◄── Internet|
|  | Service     |            |
|  └─────────────┘            |
|         │                   |
|         ▼                   |
|  OneDrive / Cloud Provider  |
|                             |
|  ┌─────────────┐            |
|  | LAN Web UI  |            |
|  └─────────────┘            |
+-----------------------------+

Boot Behavior

On system boot:

  1. Initialize debug mode

    • If DEBUG_RESET = true:

      • Clear all drive locks
      • Reset incomplete transfer states
      • Reset LAN dashboard state
  2. Start services

    • Drive detection / ingestion loop
    • Upload worker loop
    • LAN status web server
    • Debug logging service
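The debug reset step above could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes DEBUG_RESET is supplied as an environment variable, that interrupted transfers are recorded with a hypothetical "in_progress" value, and that resetting means returning them to "pending". The function names are illustrative.

```python
import json
from pathlib import Path

# Hypothetical marker for a transfer that was interrupted part-way through.
IN_PROGRESS = "in_progress"


def reset_state(ingest_root: Path) -> int:
    """Clear stale locks and interrupted states; return how many folders changed."""
    changed = 0
    for status_path in sorted(ingest_root.glob("DRIVE_*/status.json")):
        status = json.loads(status_path.read_text())
        # Copy the status with the lock cleared.
        cleaned = dict(status, locked_by=None)
        for key in ("ingest", "upload"):
            if cleaned.get(key) == IN_PROGRESS:
                cleaned[key] = "pending"  # assumption: retry from scratch
        if cleaned != status:
            status_path.write_text(json.dumps(cleaned, indent=2))
            changed += 1
    return changed


def maybe_reset(ingest_root: Path, environ: dict) -> int:
    """Run the reset only when DEBUG_RESET is set to "true" (assumed env var)."""
    if environ.get("DEBUG_RESET", "").lower() == "true":
        return reset_state(ingest_root)
    return 0
```

At boot this might be called as maybe_reset(Path("/ingest"), os.environ) before any service starts. Note that an interrupted ingest reset this way can only be redone once the source drive is plugged in again.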

Drive Ingestion Service (Script #1)

Purpose

Copy physical drives to internal storage as quickly as possible without compromising data safety.

Behavior

  1. Runs continuously after boot

  2. Waits for a new drive to be connected

  3. On detection:

    • Identify the device
    • Mount it read-only if possible
    • Generate a unique drive ID (timestamp + hash)
  4. Create a corresponding folder in internal storage:

    /ingest/
      └── DRIVE_<id>/
          ├── data/
          ├── notes.txt
          ├── file_tree.txt
          └── status.json
    
  5. Copy entire file structure exactly as-is into data/

  6. Generate:

    • file_tree.txt: full directory structure snapshot
    • notes.txt: contains

      • Drive ID
      • Timestamp
      • Placeholder section for human notes
  7. Mark the drive folder as:

    • INGEST_COMPLETE (recorded in status.json)
  8. Safely unmount the physical drive

  9. Release the drive for removal

⚠️ This service never uploads and never waits for an internet connection.
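Steps 3 to 7 above could look roughly like the sketch below. It assumes the drive is already mounted (read-only) at mount_point and leaves device detection, mounting, and unmounting out. The exact ID format, the device_hint parameter, the "in_progress" state value, and the function names are illustrative assumptions, not taken from the actual scripts.

```python
import hashlib
import json
import os
import shutil
from datetime import datetime, timezone
from pathlib import Path


def make_drive_id(device_hint: str, now: datetime) -> str:
    """Build a timestamp + short hash ID (format is an assumption)."""
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    digest = hashlib.sha256(f"{device_hint}|{stamp}".encode()).hexdigest()[:8]
    return f"{stamp}_{digest}"


def write_file_tree(data_dir: Path, out_path: Path) -> None:
    """Write one relative path per line; directories end with '/'."""
    lines = []
    for root, dirs, files in os.walk(data_dir):
        dirs.sort()  # sorting in place makes the traversal order stable
        rel = Path(root).relative_to(data_dir)
        if rel != Path("."):
            lines.append(rel.as_posix() + "/")
        for name in sorted(files):
            lines.append((rel / name).as_posix())
    out_path.write_text("\n".join(lines) + "\n")


def ingest(mount_point: Path, ingest_root: Path, device_hint: str) -> Path:
    """Copy a mounted drive into DRIVE_<id>/ and mark it complete."""
    now = datetime.now(timezone.utc)
    drive_id = make_drive_id(device_hint, now)
    folder = ingest_root / f"DRIVE_{drive_id}"
    folder.mkdir(parents=True)

    status_path = folder / "status.json"
    status = {"ingest": "in_progress", "upload": "pending", "locked_by": "ingest"}
    status_path.write_text(json.dumps(status, indent=2))

    # copytree only reads from the source, so the drive is never modified.
    shutil.copytree(mount_point, folder / "data", symlinks=True)
    write_file_tree(folder / "data", folder / "file_tree.txt")
    (folder / "notes.txt").write_text(
        f"Drive ID: {drive_id}\nTimestamp: {now.isoformat()}\n\nNotes:\n\n"
    )

    # Only now is the folder visible to the upload service as complete.
    status.update(ingest="complete", locked_by=None)
    status_path.write_text(json.dumps(status, indent=2))
    return folder
```

Writing the "complete" state last is the design choice that matters here: a folder that is still being copied never looks eligible for upload.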


Upload Service (Script #2)

Purpose

Move completed drive folders to cloud storage as bandwidth allows.

Behavior

  1. Runs independently from ingestion

  2. Continuously scans for folders marked:

    • INGEST_COMPLETE
    • Not locked by ingestion
  3. For each eligible drive folder:

    • Acquire upload lock
    • Upload entire folder to OneDrive
    • Preserve directory structure
    • Ensure notes file is included
  4. On successful upload:

    • Mark as UPLOAD_COMPLETE
    • Optionally archive or delete local copy
  5. If no internet:

    • Idle
    • Retry periodically
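One pass of the upload loop described above might be sketched like this. The actual transfer to OneDrive is deliberately left as an injected callable (upload_folder), as is the connectivity check (is_online); both are hypothetical placeholders, since the document does not say which client or tool performs the upload. The "complete" and "pending" values follow the status.json example; everything else is an assumption.

```python
import json
from pathlib import Path
from typing import Callable


def _load(folder: Path) -> dict:
    return json.loads((folder / "status.json").read_text())


def _save(folder: Path, status: dict) -> None:
    (folder / "status.json").write_text(json.dumps(status, indent=2))


def eligible_folders(ingest_root: Path) -> list[Path]:
    """Folders that are fully ingested, not yet uploaded, and unlocked."""
    found = []
    for folder in sorted(ingest_root.glob("DRIVE_*")):
        try:
            status = _load(folder)
        except (OSError, ValueError):
            continue  # status.json missing or unreadable: not ready yet
        if (
            status.get("ingest") == "complete"
            and status.get("upload") == "pending"
            and status.get("locked_by") is None
        ):
            found.append(folder)
    return found


def upload_pass(
    ingest_root: Path,
    upload_folder: Callable[[Path], bool],
    is_online: Callable[[], bool],
) -> list[str]:
    """Try to upload every eligible folder once; return the names that succeeded."""
    uploaded: list[str] = []
    if not is_online():
        return uploaded  # idle; the caller retries later
    for folder in eligible_folders(ingest_root):
        status = _load(folder)
        status["locked_by"] = "upload"
        _save(folder, status)
        ok = False
        try:
            ok = upload_folder(folder)
        finally:
            # Always release the lock, even if the upload raised.
            status["upload"] = "complete" if ok else "pending"
            status["locked_by"] = None
            _save(folder, status)
        if ok:
            uploaded.append(folder.name)
    return uploaded
```

A worker would call upload_pass in a loop with a sleep between passes, which covers the "idle and retry periodically" behavior without any separate offline mode.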

Locking & State Management

Each drive folder contains explicit state:

status.json
{
  "ingest": "complete",
  "upload": "pending",
  "locked_by": null
}
  • Ingestion service sets and clears ingest locks
  • Upload service respects ingest locks
  • Debug mode can reset all locks at boot

This prevents:

  • Partial uploads
  • Corrupted transfers
  • Race conditions
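Because two services read and write the same status.json, the way the file is written matters. The sketch below shows one possible approach, under assumptions that go beyond the document: status.json is replaced atomically via a temporary file, and a hypothetical ".lock" file created with O_EXCL decides which service wins when both try to lock at once. The function names are illustrative.

```python
import json
import os
import tempfile
from pathlib import Path


def write_status(folder: Path, status: dict) -> None:
    """Replace status.json atomically so readers never see a partial file."""
    fd, tmp_name = tempfile.mkstemp(dir=folder, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(status, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_name, folder / "status.json")
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def try_acquire(folder: Path, owner: str) -> bool:
    """Take the folder lock for `owner`; return False if someone else holds it."""
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(folder / ".lock", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    status = json.loads((folder / "status.json").read_text())
    status["locked_by"] = owner
    write_status(folder, status)
    return True


def release(folder: Path) -> None:
    """Clear locked_by in status.json, then remove the lock file."""
    status = json.loads((folder / "status.json").read_text())
    status["locked_by"] = None
    write_status(folder, status)
    os.unlink(folder / ".lock")
```

With this split, the "locked_by" field stays a human-readable record for the dashboard, while the lock file is what actually prevents both services from proceeding at the same time. A debug reset at boot would need to delete leftover ".lock" files as well.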

Notes Workflow (Human)

  1. User plugs in drive
  2. Waits for ingestion to complete
  3. Later (from any device):

    • Opens OneDrive
    • Edits notes.txt
    • Adds contextual notes about the drive contents

No interaction with the Pi required.


LAN Status Website

Purpose

Provide real-time observability.

Features

  • Accessible on local network
  • Displays:

    • Connected drives
    • Active ingestion
    • Upload queue
    • Upload progress
    • Errors
    • Storage usage
    • Internet status
  • Debug mode:

    • Clears displayed history on boot

No authentication required (LAN-only).
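A very small version of the dashboard backend could be built with the Python standard library alone, as sketched below. It serves a JSON snapshot instead of a full page and covers only drive states, the upload queue, and storage usage; the port, the JSON field names, and the function names are assumptions for illustration.

```python
import json
import shutil
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def collect_status(ingest_root: Path) -> dict:
    """Gather a snapshot of drive states, the upload queue, and disk usage."""
    drives = {}
    for status_path in sorted(ingest_root.glob("DRIVE_*/status.json")):
        drives[status_path.parent.name] = json.loads(status_path.read_text())
    usage = shutil.disk_usage(ingest_root)
    return {
        "drives": drives,
        "upload_queue": [
            name
            for name, status in drives.items()
            if status.get("ingest") == "complete"
            and status.get("upload") != "complete"
        ],
        "storage": {"total_bytes": usage.total, "free_bytes": usage.free},
    }


class StatusHandler(BaseHTTPRequestHandler):
    ingest_root = Path("/ingest")  # overridden per server in make_server

    def do_GET(self):
        # Every request recomputes the snapshot, so the data is always current.
        body = json.dumps(collect_status(self.ingest_root)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(ingest_root: Path, port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) a LAN-facing status server."""
    handler = type("Handler", (StatusHandler,), {"ingest_root": ingest_root})
    return ThreadingHTTPServer(("0.0.0.0", port), handler)
```

Calling make_server(Path("/ingest")).serve_forever() would start it. Reading state from disk on each request also means the dashboard shows no stale history after a debug reset.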


Debug & Logging System

  • Continuous debug output including:

    • Drive detection events
    • Mount/unmount
    • Copy progress
    • Upload state
    • Errors and retries
  • Logs written to:

    • Console
    • Rotating log files
  • Verbosity controlled by a debug flag
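The logging setup listed above maps fairly directly onto Python's standard logging module. The following is a minimal sketch; the logger name, log file name, size limit, and backup count are placeholder assumptions.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler
from pathlib import Path


def setup_logging(log_dir: Path, debug: bool) -> logging.Logger:
    """Log to the console and to a rotating file; `debug` raises verbosity."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("data_super_hero")  # hypothetical logger name
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger

    # Remove handlers from any earlier call so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler(sys.stderr)
    rotating = RotatingFileHandler(
        log_dir / "service.log",  # hypothetical file name
        maxBytes=1_000_000,       # rotate at roughly 1 MB
        backupCount=5,            # keep five old files
    )
    for handler in (console, rotating):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Capping the log size matters on the prototype in particular, where logs share the MicroSD card with temporary drive data.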

