The Onion Routed File Share
Yorhel 9acdc6d703 Add beautiful overview diagram thing 3 days ago

Project overview

TorFS is a system for distributed file sharing, search and discovery on top of Tor.

General goals:

  • Easy sharing of complete directories
  • Decentralized file browsing and search interface
  • Curated: People share what they want, in their own file hierarchy, and Hubs can filter out what they don’t want.
  • Resilient: Being able to download a file from multiple sources

Project status

At this point, TorFS is nothing more than an assorted collection of notes and ideas. There are no detailed specifications and no working implementations.

High-level architecture

I’m going to re-use a lot of existing technologies here to make this project easy to implement and easy to use. This architecture is probably going to sound stupidly simple: That’s the intention.

The TorFS network would consist of the following entities:

  • Share: This is an onion service offering files and metadata over HTTP.
  • Hub: An onion service indexing multiple Shares and providing a friendly browse/search web UI and API.

A Share would create and manage an index of its own files and expose this metadata as a file through its HTTP service. A Share then registers itself with one or more Hubs. The Hubs fetch this metadata and update their internal indices. Users looking for files would use a Hub for discovery, and then download the files directly from the Share(s) that have it.
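The flow described above can be sketched as a small in-memory model. This is only an illustration: the class names, the `exampleshare.onion` address and the substring search are assumptions made for the example, and the HTTP transport between Hub and Share is replaced by a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    path: str   # path relative to the Share's root
    size: int   # size in bytes


@dataclass
class Share:
    """Stands in for an onion service that serves files and a metadata file."""
    address: str  # hypothetical .onion address
    files: list[FileEntry] = field(default_factory=list)

    def metadata(self) -> list[FileEntry]:
        # A real Share would expose this as a file over HTTP.
        return list(self.files)


@dataclass
class Hub:
    """Indexes the metadata of every Share that registered with it."""
    shares: dict[str, Share] = field(default_factory=dict)
    index: dict[str, list[FileEntry]] = field(default_factory=dict)

    def register(self, share: Share) -> None:
        self.shares[share.address] = share

    def refresh(self) -> None:
        # Fetch the metadata of each known Share and rebuild the index.
        self.index = {addr: s.metadata() for addr, s in self.shares.items()}

    def search(self, term: str) -> list[tuple[str, str]]:
        # Return (share address, path) pairs whose path contains the term.
        term = term.lower()
        return [
            (addr, entry.path)
            for addr, entries in self.index.items()
            for entry in entries
            if term in entry.path.lower()
        ]


share = Share("exampleshare.onion", [FileEntry("docs/tor-design.pdf", 123456)])
hub = Hub()
hub.register(share)
hub.refresh()
results = hub.search("design")
```

A user would take the addresses in `results` and download the file directly from the listed Share, without further involvement of the Hub.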

Since Hubs are managed by people, they can curate what is being indexed and what isn’t, thus filtering out malicious and illegal content according to policies set by the operator.

Share metadata should include a Merkle-tree based hash for each shared file, to offer secure failover when the same file is available from multiple Shares.

TorFS sub-projects

To make all this work, the following sub-projects will be needed:

Specification

Document the architecture and protocols used to make all the other TorFS components work together. This includes:

  • Description of the metadata format
  • API specification and requirements for Share implementations
  • API specification for Hub implementations

torfs-share

A simple tool to create and manage a Share. I envision this would be a tool, written in Rust (or a similarly efficient and easy to deploy language), that can index a directory, create/update the necessary metadata and register with Hubs. A Share can run in two modes:

  • Server: A special TorFS web server that shares a given directory. The web server would automatically manage and update the metadata.
  • CLI: A tool that can create the necessary metadata from the command line or as a cron job. A regular web server (thttpd/Apache/nginx) can be used to publish the files.

I suspect these two modes can be implemented in a single tool without too much bloat. The downside of the CLI mode is that the metadata will be saved alongside the published files, whereas server mode can store this data elsewhere.
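As a rough sketch of what CLI mode might do, the following walks a directory and writes a metadata file next to the published files. The file name `torfs-metadata.json`, the JSON layout and the field names are hypothetical; the actual metadata format is still to be specified. File hashes are left out to keep the example short.

```python
import json
import os
from datetime import datetime, timezone


def build_metadata(root: str) -> list[dict]:
    """Walk root and describe every regular file below it."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue  # skip broken symlinks and special files
            st = os.stat(full)
            entries.append({
                # Paths are relative to the shared root, with "/" separators.
                "path": os.path.relpath(full, root).replace(os.sep, "/"),
                "size": st.st_size,
                # Modification time as an RFC 3339 timestamp in UTC.
                "modified": datetime.fromtimestamp(st.st_mtime, timezone.utc)
                            .strftime("%Y-%m-%dT%H:%M:%SZ"),
            })
    return entries


def write_metadata(root: str, filename: str = "torfs-metadata.json") -> str:
    """Write the index next to the published files, as CLI mode would."""
    target = os.path.join(root, filename)
    # Do not list the metadata file itself as a shared file.
    entries = [e for e in build_metadata(root) if e["path"] != filename]
    with open(target, "w", encoding="utf-8") as fh:
        json.dump({"files": entries}, fh, indent=2)
    return target
```

Running `write_metadata` from a cron job would keep the index current, and any regular web server could then serve both the files and the metadata file.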

torfs-hub

This may get complex, but it is by no means insurmountable. Some challenges include:

  • Dealing with information on hundreds of millions of files while still providing a fast querying interface
  • Active monitoring of known Shares for reliability and file updates
  • Having proper moderation tools to implement anti-spam, anti-abuse and custom filtering rules
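One possible starting point for the Hub's index is a relational table with an index on the file hash, so that all sources of a file can be looked up quickly. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The schema, the function names and the hash blocklist (standing in for a moderation rule) are assumptions for illustration, and nothing here has been tested at the scale mentioned above.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an empty in-memory index of shared files."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files ("
        " share TEXT NOT NULL,"
        " path TEXT NOT NULL,"
        " size INTEGER NOT NULL,"
        " hash TEXT NOT NULL,"
        " PRIMARY KEY (share, path))"
    )
    # Lookups by hash are how alternative sources are found.
    db.execute("CREATE INDEX files_by_hash ON files (hash)")
    return db


def ingest(db, share, entries, blocked_hashes=frozenset()):
    """Replace the indexed files of one Share, skipping blocked hashes."""
    with db:  # one transaction per Share update
        db.execute("DELETE FROM files WHERE share = ?", (share,))
        db.executemany(
            "INSERT INTO files (share, path, size, hash) VALUES (?, ?, ?, ?)",
            [(share, e["path"], e["size"], e["hash"])
             for e in entries if e["hash"] not in blocked_hashes],
        )


def sources(db, file_hash):
    """Return every (share, path) that offers a file with this hash."""
    rows = db.execute(
        "SELECT share, path FROM files WHERE hash = ? ORDER BY share, path",
        (file_hash,),
    )
    return rows.fetchall()
```

Replacing all rows of a Share in a single transaction keeps the index consistent when the Hub re-fetches that Share's metadata during monitoring.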

Clients

Since a Hub exposes a web interface to discover files and since all the downloads happen over HTTP, a web browser is sufficient to download from the TorFS network. But a standard web browser is not a very good download manager:

  • No automatic resumption(?) when downloading from a slow and unreliable Share.
  • No way to download a single file from multiple Shares (either in parallel or as fallback).
  • No integrity checks.

Special TorFS client implementations will be needed in order to have reliable downloading from multiple Shares. These could come in various forms:

  • A browser plugin(?)
  • A CLI tool to download a specific file or directory
  • A GUI download manager
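The core of such a client, fallback across Shares plus an integrity check, can be sketched as follows. To keep the example self-contained, each source is a plain callable that returns the file's bytes; in a real client it would perform an HTTP request through Tor. The function name is hypothetical, and a whole-file SHA-256 stands in for the Merkle-tree hash as a simplification.

```python
import hashlib
from typing import Callable, Iterable


def download_with_fallback(
    sources: Iterable[Callable[[], bytes]],
    expected_sha256: str,
) -> bytes:
    """Try each source in order and return the first copy that verifies."""
    failures = 0
    for fetch in sources:
        try:
            data = fetch()
        except OSError:
            # Network-level failure: move on to the next Share.
            failures += 1
            continue
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # The Share answered, but the content does not match the hash.
        failures += 1
    raise RuntimeError(f"no source delivered a valid copy ({failures} failed)")
```

With a Merkle-tree hash, the same loop could verify and retry individual chunks instead of whole files, so that a corrupted source is detected before the full download completes.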

A client could also provide an alternative interface to the search and discovery functionality of Hubs, so there could be an integrated TorFS GUI download tool that does not require the use of a web browser.

Comparison with other projects

DC++: This is my main source of inspiration. The DC network has “hubs”, which are essentially curated lists of users who share files. Hubs facilitate presence notifications and file search. Users themselves offer their shared files to other users in the network. An essential part of the DC network is that all shared files are hashed, which allows for fast discovery of multiple users sharing the same file, and thus for faster and more resilient file downloads. The use of Merkle-tree hashing gives clients the opportunity to verify smaller chunks of downloaded files, to detect and handle corrupted sources early on. DC also offers chat functionality, but that is out of the scope of TorFS. DC does not protect the privacy of its users.

“Hidden Wikis” are one approach in Tor to aid discoverability. These are okay for finding web services, but do not really provide a good platform for browsing, finding and publishing files.

There are also file upload services and dropbox-like projects for Tor. These are, at the moment, centralized and isolated islands of files. TorFS could serve as an index on top of such services.

Tor Search Engines: Ahmia, Torch, Not Evil. These are close to what I wish to accomplish, but they’re not distributed, do not support easy browsing of files, and lack file hashes to make it easy to find alternative sources for the same content. Adding a new site to the search index is a manual process; I hope to automate this with tooling. These search engines do offer full-text search and index dynamic web pages, both of which are out of the scope of TorFS.