# Hobbesgram — OS/2 File Archive

**v1.12** — A flat-file PHP file-sharing archive in the style of the original
Hobbes OS/2 archive at hobbes.nmsu.edu. No database required.


## Requirements

- PHP 8.1+ (PHP 8.4 recommended)
- Apache 2.4 with `mod_rewrite` enabled
- PHP-FPM (shared-hosting compatible)
- Writable `data/` directory


## Installation

1. Upload all files to your web root (`htdocs/`).

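2. Ensure the `data/` directory and all its subdirectories are writable
   by the web server. Mode 755 grants write access to the owner only, which
   is sufficient on most shared hosts because PHP runs under your own account:

   ```sh
   chmod -R 755 data/
   ```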

3. Visit `http://yoursite.com/setup` to create the admin account and
   seed the default category tree. The setup page is only reachable
   until the first user exists.

4. Delete `setup.php` and `check.php` from the server once setup is done.
   `check.php` is a diagnostic tool that reveals server information.

5. Configure your site name, tagline, color theme, and landing page text
   via **Admin → Settings**, **Admin → CSS**, and **Admin → Landing Page**.

6. *(Optional)* Place files in `data/pool/` via FTP for batch import
   through the Pool approval interface.

7. *(Optional)* Configure Archive.org S3 credentials in **Admin → Settings**
   to enable mirroring files to the Internet Archive.


## Directory Structure

```
htdocs/
  index.php               Single entry point and URL router
  config.php              Paths, file size limits, allowed extensions,
                          role list, OS/2 UA patterns, default settings,
                          THEME_PRESETS constant (5 built-in color themes)
  check.php               Diagnostic page (delete after setup)
  setup.php               First-run wizard (self-disables after use)
  php.ini                 Upload/memory limits for PHP-FPM environments
  .htaccess               mod_rewrite rules; CGIPassAuth for wget/curl auth
  .user.ini               PHP limits override for shared hosting
  README.md               This file
  CHANGELOG.md            Version history
  hobbes.txt              Example hobbes.txt format reference (large batch)
  pmmail.txt              Example pmmail.txt format reference (single entry)

  includes/
    functions.php         Utility functions: roles, CSRF, flash messages,
                          pagination, date formatting, filename safety,
                          OS/2 UA detection, category tree helpers,
                          category path building, file icon map,
                          file_public_meta() for mirror/catalog output
    storage.php           Atomic JSON read/write (temp + rename, no flock).
                          CRUD helpers for users, files, categories,
                          invites, pool listing, session management,
                          download log (dllog_append / dllog_load)
    auth.php              Custom PHP session save handler (no flock),
                          current_user(), require_role(), can() checks,
                          HTTP Basic Auth support (auth_try_basic)
    search.php            Inverted keyword index: index, remove, query,
                          rebuild functions
    markdown.php          Minimal Markdown-to-HTML parser (no infinite loops)

  pages/                  One PHP file per route
    home.php              Landing page (renders Markdown landing content)
    browse.php            Category browser with file listing and DL counts
    file.php              File detail page; Archive.org link; wget hint
    file/edit.php         Edit metadata / move to category (editor+);
                          Rename file on disk (admin only);
                          Delete file from disk (admin only)
    download.php          File streaming with download counter increment;
                          logs wget/curl downloads to dllog
    download_meta.php     Public JSON metadata sidecar for a file
                          (served at /download/{path}.json)
    catalog.php           Public JSON catalog of all approved files
                          (served at /catalog.json)
    mirror.php            Public mirror info page with wget/curl examples
                          and bulk-mirror shell script (/mirror)
    search.php            Keyword search results with DL counts
    upload.php            Web file upload form (contributor+);
                          category field locked when a limit is active
    pool.php              Approval queue: web uploads (inline edit+approve)
                          + FTP single files + FTP folder batch import
                          (editor+)
    login.php             Authentication form
    logout.php            Session teardown
    register.php          Account creation (open or invite-only)
    invite.php            Invite code generation and listing
    profile.php           User dashboard: upload history, invite codes,
                          personal display theme override
    setup.php             First-run account and category seeding
    admin/
      index.php           Admin dashboard (stats, quick links)
      settings.php        Site name, tagline, open registration,
                          global contributor upload category,
                          Archive.org S3 credentials
      css.php             Color palette editor; 5 named theme presets;
                          site-default preset selector
      users.php           User list, role changes, account management,
                          Cat. Access column, Limits button
      user_limits.php     Per-user upload category restriction
                          (/admin/user-limits/{username})
      categories.php      Category tree editor (add, rename, nest, delete);
                          rename does not change the slug
      landing.php         Landing page Markdown content editor
      splash.php          Splash screen content editor (non-OS/2 visitors)
      meta_merge.php      Bulk metadata import from hobbes.txt / pmmail.txt
                          or a .zip bundle of .txt files
      bulk_delete.php     Bulk file deletion by category (admin only)
      repair.php          Archive integrity check: orphaned files,
                          empty-browse categories, duplicate categories
      reports.php         Quality reports: duplicate files (size+MD5),
                          same filename in multiple locations,
                          files missing descriptions
      mirror.php          Mirror files to Archive.org (per-file and batch);
                          wget/curl download log viewer

  templates/
    header.php            HTML head, CSS custom properties (--c-*),
                          navigation, category sidebar, flash messages;
                          applies per-user theme override if set
    footer.php            Page footer

  data/                   All persistent state (never served directly)
    .htaccess             Denies all HTTP access to data/ and subdirs
    settings/
      settings.json       Site settings, CSS palette, active theme preset,
                          Archive.org credentials
    categories/
      categories.json     Category tree (id, name, slug, parent, desc)
    users/                One .json file per registered user
    files/                One .json metadata record per uploaded file
    uploads/              Uploaded files organized by category slug path
                          e.g. uploads/multimedia/images/icons/file.zip
    pool/                 FTP staging folder for pending imports
    invites/              One .json file per invite code
    index/
      search.json         Inverted search index (keyword -> [file ids])
      dllog.json          wget/curl download log (capped at 5,000 entries)
    sessions/             PHP session files (custom handler, no flock)
    merges/               Temporary meta-merge sessions (auto-expire 2 h)
```
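The `uploads/` layout mirrors each category's slug path (for example `uploads/multimedia/images/icons/file.zip`). Below is a minimal sketch of deriving that path from `categories.json`. It assumes every record stores its parent's id, with `null` at the top level. `category_slug_path()` is a hypothetical name and is not the actual helper in `functions.php`.

```php
<?php
// Sketch: build a category's slug path by walking parent links upward.
// Assumes records shaped like ['id' => ..., 'slug' => ..., 'parent' => ...]
// with parent = null at the top level; the real categories.json may differ.
function category_slug_path(array $categories, ?string $id): string
{
    // Index categories by id so each parent lookup is O(1).
    $byId = [];
    foreach ($categories as $cat) {
        $byId[$cat['id']] = $cat;
    }

    $segments = [];
    $seen = [];
    while ($id !== null && isset($byId[$id])) {
        if (isset($seen[$id])) {
            // A corrupted tree could contain a parent cycle; stop instead of looping.
            throw new RuntimeException("Category cycle at $id");
        }
        $seen[$id] = true;
        array_unshift($segments, $byId[$id]['slug']);
        $id = $byId[$id]['parent'];
    }
    return implode('/', $segments);
}
```

For the tree multimedia → images → icons, passing the id of `icons` returns `multimedia/images/icons`. Prefixing `uploads/` and appending the filename then gives the storage path.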


## User Roles

| Role | Description |
|------|-------------|
| **guest** | Visitors with an OS/2 User-Agent string get full browse and download access automatically. No account required. |
| **contributor** | Registered user. Can upload files (pending editor approval) and generate invite codes. |
| **editor** | Can approve or reject uploads, edit any file's metadata (title, description, author, etc.), move files between categories, import from the FTP pool, and run Meta Merge. |
| **admin** | Full access. All editor capabilities plus: user management, site settings, CSS theming, bulk file deletion, single-file deletion, file renaming, quality reports, and Archive.org mirroring. |
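Each role in the table includes the capabilities of the roles above it, so an "editor or above" check can be implemented as a rank comparison. This is a minimal sketch that assumes exactly that ordering. `ROLE_RANK` and `role_at_least()` are hypothetical names; the archive's real checks are `can()` and `require_role()` in `auth.php`.

```php
<?php
// Sketch: the four roles form a strict hierarchy, so an "X or above" check
// reduces to comparing ranks. role_at_least() is hypothetical, not the
// actual can()/require_role() in includes/auth.php.
const ROLE_RANK = ['guest' => 0, 'contributor' => 1, 'editor' => 2, 'admin' => 3];

function role_at_least(string $role, string $required): bool
{
    $have = ROLE_RANK[$role] ?? -1;              // unknown role: no privileges
    $need = ROLE_RANK[$required] ?? PHP_INT_MAX; // unknown requirement: deny
    return $have >= $need;
}
```

A page such as the Pool would then begin with a guard like `role_at_least($user['role'], 'editor')`.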


## Access Control

- **OS/2 browser detection** is based on the HTTP User-Agent string.
  Recognized patterns: `OS/2`, `Warp`, `WebExplorer`, `Warpzilla`, `Lynx.*OS`,
  `SPRY`, `PMX`. Any client whose User-Agent matches one of these patterns
  receives guest browse/download access without logging in.

- **Non-OS/2 visitors** see the landing page and splash screen only.
  They must create an account (via invite or open registration) to browse
  or download.

- **Open registration** can be toggled in Admin → Settings. When off, new
  accounts require an invite code.

- **Invite codes** are generated by contributors and above. Each code sets the
  invited user's starting role.

- **CSRF tokens** protect all state-changing POST requests.
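The sketch below covers two of the checks in this list. It joins the User-Agent patterns into a single regular expression and uses the common one-token-per-session CSRF scheme. Case-sensitive matching, the constant name, and all function names are assumptions. `$session` stands in for `$_SESSION`, which lets the sketch run without an active session.

```php
<?php
// Sketch: the listed User-Agent patterns as one regex (case-sensitive here,
// an assumption), plus a per-session CSRF token compared in constant time.
const OS2_UA_PATTERN = '/OS\/2|Warp|WebExplorer|Warpzilla|Lynx.*OS|SPRY|PMX/';

function is_os2_user_agent(string $ua): bool
{
    return preg_match(OS2_UA_PATTERN, $ua) === 1;
}

// Issue one random token per session and reuse it for every form.
function csrf_token(array &$session): string
{
    if (empty($session['csrf'])) {
        $session['csrf'] = bin2hex(random_bytes(32));
    }
    return $session['csrf'];
}

// hash_equals() avoids leaking the token through comparison timing.
function csrf_check(array $session, ?string $submitted): bool
{
    return isset($session['csrf']) && is_string($submitted)
        && hash_equals($session['csrf'], $submitted);
}
```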


## wget / curl Access

Registered users can download files from the command line using HTTP Basic Auth:

```sh
wget --user=USERNAME --password=PASSWORD "https://yoursite.com/download/path/to/file.zip"
curl -u USERNAME:PASSWORD "https://yoursite.com/download/path/to/file.zip" -O
```

The file detail page shows a pre-filled `wget` command for logged-in users.

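When a download URL is requested without credentials, the server returns
`401 Unauthorized` with a `WWW-Authenticate: Basic` header. `wget` relies on
this challenge because, by default, it sends the username and password only
after the server asks for them. `curl -u` sends them with the first request.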
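Here is a minimal sketch of the server side of this exchange. PHP usually populates `PHP_AUTH_USER` and `PHP_AUTH_PW`. Behind FastCGI, the raw `Authorization` header may appear only as `HTTP_AUTHORIZATION` or `REDIRECT_HTTP_AUTHORIZATION`, so the sketch decodes the header as a fallback. `basic_credentials()` and `send_basic_challenge()` are hypothetical names and are not the archive's `auth_try_basic()`.

```php
<?php
// Sketch: read Basic Auth credentials from $_SERVER-style data, falling back
// to decoding the raw Authorization header. Returns [user, pass] or null.
function basic_credentials(array $server): ?array
{
    if (isset($server['PHP_AUTH_USER'])) {
        return [$server['PHP_AUTH_USER'], $server['PHP_AUTH_PW'] ?? ''];
    }
    $header = $server['HTTP_AUTHORIZATION'] ?? $server['REDIRECT_HTTP_AUTHORIZATION'] ?? '';
    if (stripos($header, 'Basic ') !== 0) {
        return null;
    }
    $decoded = base64_decode(substr($header, 6), true); // strict base64
    if ($decoded === false || !str_contains($decoded, ':')) {
        return null;
    }
    // Split on the first colon only; passwords may contain colons.
    return explode(':', $decoded, 2);
}

// With no credentials, challenge the client so wget retries with them.
function send_basic_challenge(string $realm): void
{
    header('WWW-Authenticate: Basic realm="' . $realm . '"');
    http_response_code(401);
}
```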

Admin users can view wget/curl download activity in
**Admin → Mirror to Archive.org → Download Log tab**.


## File Uploads

**Web uploads** (contributor+):
Upload via `/upload`. The file is stored in `data/uploads/` and marked
pending. An editor or admin must approve it in the Pool.

**FTP single file import** (editor+):
Drop a file into `data/pool/` via FTP. It appears in `/pool` with a metadata
entry form. Supply title, description, author, and category, then click Import.

A companion `.meta.json` file can pre-fill the form:
```json
{ "title": "...", "desc": "...", "author": "...", "version": "..." }
```
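This is a minimal sketch of pre-filling the import form from the companion file. The README does not specify how the companion file is named, so the sketch assumes `<file>.meta.json` sits beside the pooled file (for example `app.zip.meta.json`). `pool_prefill()` is a hypothetical name. Unknown keys and non-string values are ignored.

```php
<?php
// Sketch: load optional pre-fill values for a pooled file. The
// "<file>.meta.json" naming is an assumption, not a documented convention.
function pool_prefill(string $poolFile): array
{
    $form = ['title' => '', 'desc' => '', 'author' => '', 'version' => ''];
    $metaPath = $poolFile . '.meta.json';
    if (!is_file($metaPath)) {
        return $form;
    }
    $meta = json_decode((string) file_get_contents($metaPath), true);
    if (!is_array($meta)) {
        return $form; // malformed JSON: fall back to an empty form
    }
    // Copy only the known fields, and only when they hold strings.
    foreach (array_keys($form) as $key) {
        if (isset($meta[$key]) && is_string($meta[$key])) {
            $form[$key] = $meta[$key];
        }
    }
    return $form;
}
```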

**FTP folder batch import** (editor+):
Drop an entire directory tree into `data/pool/`. The folder and its
subdirectories are mapped to a new (or existing) category hierarchy. Each
file gets a title derived from its filename; required fields default to
"Unknown" and can be edited after import.
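The sketch below shows one way a file inside a pooled folder could be mapped to category slugs and a default title. The slug rule (lower-case, with runs of other characters turned into hyphens), the title rule, and the assumption that author and description are among the required fields are all illustrative. `slugify()` and `pool_batch_entry()` are hypothetical names.

```php
<?php
// Sketch: folders become category slugs, the filename becomes a title.
// Slug and title rules are assumptions, not the importer's actual logic.
function slugify(string $s): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($s)), '-');
}

function pool_batch_entry(string $relativePath): array
{
    $parts = explode('/', trim($relativePath, '/'));
    $filename = array_pop($parts);
    // Title: filename without extension, with _ - . runs turned into spaces.
    $title = trim(preg_replace('/[_\-.]+/', ' ', pathinfo($filename, PATHINFO_FILENAME)));
    return [
        'category_slugs' => array_map('slugify', $parts),
        'filename' => $filename,
        'title' => $title !== '' ? $title : $filename,
        'author' => 'Unknown', // required fields default to "Unknown"
        'desc' => 'Unknown',
    ];
}
```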


## Meta Merge (`/admin/meta-merge`)

Bulk import of metadata from plain-text files. Upload a `.txt` file (or a
`.zip` bundle containing multiple `.txt` files) in either supported format.
The system parses it, matches entries to existing archive files by category
path and filename, and presents a review page before writing any changes.

**Supported formats:**

`hobbes.txt` — one or more blocks separated by dashed lines:
```
----------------------------------------
DIR:  pub/multimedia/images/icons
FILE: 1700ico2.zip
DESC:
Multi-line description of the file.
----------------------------------------
```

`pmmail.txt` — labelled key: value fields:
```
Archive Filename: pmmail-3-25-00-1993.wpi
Short Description: Email client for OS/2.
Long Description: PMMail is an enhanced TCP/IP email client...
Proposed directory for placement: /pub/os2/apps/internet/mail/reader/pmm
Your name: Neil Waldhauer
Program URL: http://pmmail.os2voice.org/
Operating System/Version: OS/2, ArcaOS and eComStation
Additional requirements: See the readme.
```
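This is a minimal parser sketch for the `hobbes.txt` layout shown above. It assumes that `DESC:` is always the last field in a block, so everything after it up to the next dashed line counts as the description. `parse_hobbes_txt()` is a hypothetical name. `pmmail.txt` would need a separate, simpler `Key: value` reader.

```php
<?php
// Sketch: split on dashed separator lines, then read DIR, FILE and a
// multi-line DESC from each block. Assumes DESC is the final field.
function parse_hobbes_txt(string $text): array
{
    $text = str_replace("\r\n", "\n", $text);
    $entries = [];
    foreach (preg_split('/^-{3,}[ \t]*$/m', $text) as $block) {
        $entry = ['dir' => '', 'file' => '', 'desc' => ''];
        $descLines = null; // becomes an array once "DESC:" is seen
        foreach (explode("\n", $block) as $line) {
            if ($descLines !== null) {
                $descLines[] = $line; // everything after DESC: is description
            } elseif (preg_match('/^(DIR|FILE|DESC):\s*(.*)$/', $line, $m)) {
                if ($m[1] === 'DESC') {
                    $descLines = $m[2] === '' ? [] : [$m[2]];
                } else {
                    $entry[strtolower($m[1])] = trim($m[2]);
                }
            }
        }
        $entry['desc'] = trim(implode("\n", $descLines ?? []));
        if ($entry['file'] !== '') {
            $entries[] = $entry; // skip the empty text around separators
        }
    }
    return $entries;
}
```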

Path matching: the `DIR` / *Proposed directory* value is stripped of leading
`pub/` or `hobbes/pub/` prefixes, then each path segment is slugified and
compared against the category path of each file in the archive.
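The path-matching rule can be sketched as a normalization function. The exact slug rule is an assumption: lower-case, with runs of non-alphanumerics turned into hyphens. `normalize_merge_path()` is a hypothetical name.

```php
<?php
// Sketch of the rule above: drop a leading "hobbes/pub/" or "pub/" prefix,
// then slugify each segment. The slug rule is an assumption.
function slugify(string $s): string
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($s)), '-');
}

function normalize_merge_path(string $dir): string
{
    $dir = trim($dir, "/ \t");
    foreach (['hobbes/pub/', 'pub/'] as $prefix) {
        if (str_starts_with(strtolower($dir), $prefix)) {
            $dir = substr($dir, strlen($prefix));
            break;
        }
    }
    // Ignore empty segments produced by doubled slashes.
    $segments = array_filter(explode('/', $dir), fn ($s) => $s !== '');
    return implode('/', array_map('slugify', $segments));
}
```

Under these assumptions, the pmmail example's `/pub/os2/apps/internet/mail/reader/pmm` normalizes to `os2/apps/internet/mail/reader/pmm`.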

Merge behavior: by default, only empty or "Unknown" fields are updated. Tick
"Overwrite existing fields" to replace existing values with those from the `.txt` file.
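This is a minimal sketch of that merge rule. It treats `null`, blank strings, and "Unknown" (in any case) as empty. It also never overwrites a field with a blank incoming value. That second behavior is an assumption, not something the README states. The function names are hypothetical.

```php
<?php
// Sketch: fill empty/"Unknown" fields only, unless $overwrite is set.
function is_placeholder(?string $v): bool
{
    return $v === null || trim($v) === '' || strcasecmp(trim($v), 'Unknown') === 0;
}

function merge_metadata(array $current, array $incoming, bool $overwrite): array
{
    $merged = $current;
    foreach ($incoming as $field => $value) {
        if ($value === null || trim($value) === '') {
            continue; // assumption: a blank txt value never replaces data
        }
        if ($overwrite || is_placeholder($current[$field] ?? null)) {
            $merged[$field] = $value;
        }
    }
    return $merged;
}
```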

For unmatched entries, the review page offers filename-only match suggestions,
manual file-ID entry, or Skip. A before/after diff is shown for every matched
entry before anything is written.


## Bulk Delete (`/admin/bulk-delete`)

Admin-only tool for removing multiple files from a category at once.

- Select a category from the dropdown; the page lists all files in that
  category (approved and pending).
- Per-page options: 25 / 50 / 100 / All. A warning is shown if "All" is
  selected and the category contains more than 250 files.
- "Select ALL N files in this category (all pages)" marks every file in the
  category for deletion regardless of current pagination.
- Deletion removes the physical file, the metadata JSON, and all search index
  entries. Empty category directories are cleaned up.
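The empty-directory cleanup can be sketched as an upward walk. After a file is deleted, each parent directory is removed while it is empty, and the walk stops at the uploads root without removing it. `prune_empty_dirs()` is a hypothetical helper name.

```php
<?php
// Sketch: remove now-empty parent directories, walking up toward (but never
// removing) the uploads root. prune_empty_dirs() is hypothetical.
function prune_empty_dirs(string $dir, string $root): void
{
    $root = rtrim($root, '/');
    while ($dir !== $root && str_starts_with($dir, $root . '/') && is_dir($dir)) {
        $entries = scandir($dir);
        // scandir() lists "." and "..", so more than two entries means non-empty.
        if ($entries === false || count($entries) > 2 || !rmdir($dir)) {
            return;
        }
        $dir = dirname($dir);
    }
}
```

A caller would pass the directory of the deleted file, for example `prune_empty_dirs(dirname($deletedPath), $uploadsRoot)`.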

Single-file delete is also available to admins from the file edit page
(`/file/edit/{id}`) via the Danger Zone section at the bottom of the form.


## File Editing / Recategorisation

Editors and admins can edit any approved file's metadata from the file detail
page (Edit Metadata button) or directly at `/file/edit/{id}`.

When the category is changed, the physical file is moved on disk and
`stored_name` is updated. If a file with the same name already exists in the
target category, the move is blocked and an error is shown.
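This is a minimal sketch of a move that refuses to overwrite an existing file. It assumes `stored_name` is a path relative to the uploads root. `move_to_category()` is a hypothetical name. The check-then-rename sequence leaves a small race window when two editors move files at the same moment.

```php
<?php
// Sketch: move a stored file into another category directory and return the
// new stored_name. Assumes stored_name is relative to the uploads root.
function move_to_category(string $uploadsRoot, string $storedName, string $newCategoryPath): string
{
    $source = $uploadsRoot . '/' . $storedName;
    $newStored = $newCategoryPath . '/' . basename($storedName);
    $target = $uploadsRoot . '/' . $newStored;

    if (file_exists($target)) {
        // Same filename already in the target category: block the move.
        throw new RuntimeException(basename($storedName) . " already exists in $newCategoryPath");
    }
    $targetDir = dirname($target);
    if (!is_dir($targetDir) && !mkdir($targetDir, 0755, true)) {
        throw new RuntimeException("Cannot create $targetDir");
    }
    if (!rename($source, $target)) {
        throw new RuntimeException("Move failed for $storedName");
    }
    return $newStored; // caller saves this as the new stored_name
}
```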

**Admins** can also rename the physical file on disk from the Danger Zone
section of the edit page. The rename validates the extension (must match the
original), checks for name collisions in the current directory, and updates
both `original_name` and `stored_name` in the metadata.
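The rename checks can be sketched as a validator that returns an error message or `null`. This sketch also rejects path components and leading dots, and it compares extensions case-insensitively. Both behaviors are assumptions. `validate_rename()` is a hypothetical name.

```php
<?php
// Sketch: bare filename, same extension as before, no name collision.
// Returns an error message, or null if the rename may proceed.
function validate_rename(string $dir, string $oldName, string $newName): ?string
{
    if ($newName === '' || $newName !== basename($newName) || str_starts_with($newName, '.')) {
        return 'Invalid filename';
    }
    $oldExt = strtolower(pathinfo($oldName, PATHINFO_EXTENSION));
    $newExt = strtolower(pathinfo($newName, PATHINFO_EXTENSION));
    if ($oldExt !== $newExt) {
        return "The extension must stay .$oldExt";
    }
    if ($newName !== $oldName && file_exists($dir . '/' . $newName)) {
        return 'A file with that name already exists';
    }
    return null;
}
```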


## Archive Repair (`/admin/repair`)

Scans the archive for integrity problems and reports:

- **Orphaned files** — physical files on disk with no matching metadata record.
- **Empty-browse categories** — categories with no approved files and no
  sub-categories (candidates for pruning).
- **Duplicate categories** — category names that appear more than once under
  the same parent.

No changes are made automatically; the report is read-only.
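The orphaned-file check is essentially a set difference between the files on disk and the files referenced by metadata. This minimal sketch assumes each record's `stored_name` holds a path relative to the uploads root. `find_orphans()` is a hypothetical name.

```php
<?php
// Sketch: list files under uploads/ whose relative path matches no record.
function find_orphans(string $uploadsRoot, array $records): array
{
    $base = rtrim($uploadsRoot, '/');
    $known = [];
    foreach ($records as $r) {
        $known[$r['stored_name']] = true;
    }

    $orphans = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($base, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile()) {
            continue;
        }
        $relative = substr($file->getPathname(), strlen($base) + 1);
        if (!isset($known[$relative])) {
            $orphans[] = $relative;
        }
    }
    sort($orphans);
    return $orphans; // report only: nothing is deleted or modified
}
```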


## Quality Reports (`/admin/reports`)

Three optional reports for archive hygiene:

- **Duplicate Files** — groups files that share the same size and MD5 hash.
  Useful for finding accidental re-uploads across different categories.
- **Same Filename** — groups files that share the same filename
  (case-insensitive) regardless of location. Not necessarily duplicates, but
  worth reviewing.
- **Missing Descriptions** — lists approved files with no description, or
  whose description is a placeholder value ("Unknown", "N/A", etc.).

All reports include direct links to each file's Edit Metadata page.
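The two grouping reports can be sketched as follows. Files are first bucketed by size, which is cheap to obtain, so only files whose sizes collide are hashed with MD5. The record fields (`id` and a `stored_name` relative to the uploads root) and the function names are assumptions.

```php
<?php
// Sketch: Duplicate Files (size, then MD5) and Same Filename groupings.
function duplicate_groups(array $records, string $uploadsRoot): array
{
    $bySize = [];
    foreach ($records as $r) {
        $path = $uploadsRoot . '/' . $r['stored_name'];
        if (is_file($path)) {
            $bySize[filesize($path)][] = ['id' => $r['id'], 'path' => $path];
        }
    }

    $groups = [];
    foreach ($bySize as $size => $candidates) {
        if (count($candidates) < 2) {
            continue; // a unique size cannot have a duplicate
        }
        $byHash = [];
        foreach ($candidates as $c) {
            $md5 = md5_file($c['path']);
            if ($md5 !== false) {
                $byHash[$md5][] = $c['id'];
            }
        }
        foreach ($byHash as $md5 => $ids) {
            if (count($ids) > 1) {
                $groups[] = ['size' => $size, 'md5' => (string) $md5, 'ids' => $ids];
            }
        }
    }
    return $groups;
}

// Same Filename: group by lower-cased basename, keep names seen more than once.
function same_filename_groups(array $records): array
{
    $byName = [];
    foreach ($records as $r) {
        $byName[strtolower(basename($r['stored_name']))][] = $r['id'];
    }
    return array_filter($byName, fn ($ids) => count($ids) > 1);
}
```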


## Archive.org Mirroring (`/admin/mirror`)

Files can be mirrored to the Internet Archive for long-term preservation.

**Setup:** Enter your Archive.org S3 credentials in **Admin → Settings →
Archive.org Mirror Credentials**. Get your keys at `archive.org/account/s3.php`.

**Per-file upload:** Click "Upload" next to any file in the Not Yet Mirrored
list. The item identifier is set to `hobbesgram-{file_id}` and all available
metadata is attached as Archive.org headers.
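The Internet Archive's S3-compatible interface accepts an HTTP `PUT` to `https://s3.us.archive.org/{identifier}/{filename}`. The request carries an `authorization: LOW access:secret` header, and `x-archive-meta-*` headers supply the item metadata. This sketch only builds the request and does not send it. The mapping of archive fields to IA fields, the `software` mediatype, and the function name are assumptions, and non-ASCII values would need extra encoding.

```php
<?php
// Sketch: build (not send) an IA S3 upload request for one file.
// Field mapping and mediatype are assumptions; values are assumed ASCII.
function ia_upload_request(array $file, string $accessKey, string $secretKey): array
{
    $identifier = 'hobbesgram-' . $file['id'];
    $url = 'https://s3.us.archive.org/' . $identifier . '/' . rawurlencode($file['original_name']);

    $headers = [
        'authorization' => "LOW $accessKey:$secretKey",
        'x-amz-auto-make-bucket' => '1', // create the item on first upload
        'x-archive-meta-mediatype' => 'software',
    ];
    $map = ['title' => 'title', 'desc' => 'description', 'author' => 'creator'];
    foreach ($map as $field => $iaField) {
        if (!empty($file[$field])) {
            // HTTP header values must be single-line; collapse whitespace.
            $headers["x-archive-meta-$iaField"] = preg_replace('/\s+/', ' ', $file[$field]);
        }
    }
    return ['method' => 'PUT', 'url' => $url, 'headers' => $headers];
}
```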

**Batch upload:** Select a batch size (5–50) and click "Start Batch Upload"
to upload the next N unmirrored files in sequence.

Once mirrored, `archiveorg_id` is saved to the file's metadata and a link
to the Archive.org item appears on the file detail page.


## CSS Theming

The entire color scheme is controlled from **Admin → CSS**. Colors are stored
as CSS custom properties (`--c-*`) applied at render time; no external CSS
files are needed.

**Five built-in presets:**

| Preset | Description |
|--------|-------------|
| OS/2 Classic | Grey desktop with navy accents (default) |
| Dark Mode | Dark grey background with blue highlights |
| Green Terminal | Black background with green-on-black text |
| Hobbes OG | Deep blue palette echoing the original hobbes.nmsu.edu |
| Amber | Black background with amber terminal text |

**Site-default preset:** Admins select the active preset in Admin → CSS.
The selection is saved as `active_preset` in `settings.json`.

**Per-user theme:** Registered users can override the site default from their
Profile page, choosing any of the five presets or reverting to the site default.
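This is a minimal sketch of resolving the effective palette and emitting it as `--c-*` custom properties. It uses the user's preset if that preset exists and falls back to the site default otherwise. The preset keys and color names used here are placeholders, not the contents of `THEME_PRESETS`. The strict name and hex-color filter is an assumption about validation, and it keeps stored values from breaking out of the `<style>` block.

```php
<?php
// Sketch: user override, else site default; render as :root custom properties.
function effective_preset(array $presets, string $siteDefault, ?string $userChoice): array
{
    if ($userChoice !== null && isset($presets[$userChoice])) {
        return $presets[$userChoice];
    }
    return $presets[$siteDefault] ?? [];
}

function theme_css(array $palette): string
{
    $lines = [];
    foreach ($palette as $name => $color) {
        // Only simple names and #rgb / #rgba / #rrggbb / #rrggbbaa colors pass.
        if (preg_match('/^[a-z0-9-]+$/', (string) $name)
            && is_string($color)
            && preg_match('/^#(?:[0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$/i', $color)) {
            $lines[] = "  --c-$name: $color;";
        }
    }
    return ":root {\n" . implode("\n", $lines) . "\n}";
}
```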


## Atomic Writes and Shared Hosting

All JSON writes use a temp file + atomic `rename()` pattern. `flock()` is
never used. This is safe on NFS mounts and shared hosting filesystems where
`flock()` can block indefinitely.

The same pattern is used for all `storage.php` writes (settings, users, files,
categories, invites), the custom PHP session save handler in `auth.php`, and
meta merge session files in `data/merges/`.
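Here is a minimal sketch of the temp-file plus `rename()` pattern. The temp file is created in the target's own directory so that `rename()` never crosses filesystems. On POSIX systems, a same-filesystem rename atomically replaces the target. `json_write_atomic()` and `json_read()` are hypothetical names and are not the actual `storage.php` functions.

```php
<?php
// Sketch: write JSON to a temp file beside the target, then rename() it over
// the target so readers see the old or the new file, never a partial write.
function json_write_atomic(string $path, array $data): void
{
    $json = json_encode($data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    $tmp = tempnam(dirname($path), '.tmp-');
    if ($tmp === false) {
        throw new RuntimeException("Cannot create temp file for $path");
    }
    if (file_put_contents($tmp, $json) !== strlen($json) || !rename($tmp, $path)) {
        if (is_file($tmp)) {
            unlink($tmp); // do not leave stray temp files behind
        }
        throw new RuntimeException("Atomic write failed for $path");
    }
}

function json_read(string $path): ?array
{
    if (!is_file($path)) {
        return null;
    }
    $decoded = json_decode((string) file_get_contents($path), true);
    return is_array($decoded) ? $decoded : null;
}
```

This pattern prevents torn files but not lost updates. If two requests read, modify, and write the same JSON file concurrently, the later `rename()` wins and the earlier change is discarded.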

The `.htaccess` in `data/` denies direct HTTP access to all data files. If
your host does not support `.htaccess`, move the `data/` directory above the
web root and update `DATA_DIR` in `config.php`.


## Allowed File Types

| Category | Extensions |
|----------|-----------|
| OS/2 programs | `zip wpi exe cmd bat inf rpm tar gz bz2 lzh arj 7z cab img iso` |
| Media | `jpg jpeg png gif bmp ico wav mp3 mid midi au aiff avi mov mp4 mpeg mpg` |
| Documents | `txt nfo diz doc html htm pdf rtf me` |

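Maximum upload size: 200 MB. To change it, edit `config.php` and raise
`upload_max_filesize` and `post_max_size` in `php.ini` / `.user.ini`; the
lowest of these limits applies.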
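This is a minimal sketch of an upload check against the table above. The constant names are hypothetical stand-ins for the values in `config.php`, and treating "200 MB" as 200 × 1024 × 1024 bytes is an assumption. Only the final extension is checked, so `pkg.tar.gz` is accepted as `gz`.

```php
<?php
// Sketch: extension allow-list and size limit. Constant names are hypothetical.
const ALLOWED_EXTENSIONS = [
    'zip', 'wpi', 'exe', 'cmd', 'bat', 'inf', 'rpm', 'tar', 'gz', 'bz2', 'lzh', 'arj', '7z', 'cab', 'img', 'iso',
    'jpg', 'jpeg', 'png', 'gif', 'bmp', 'ico', 'wav', 'mp3', 'mid', 'midi', 'au', 'aiff', 'avi', 'mov', 'mp4', 'mpeg', 'mpg',
    'txt', 'nfo', 'diz', 'doc', 'html', 'htm', 'pdf', 'rtf', 'me',
];
const MAX_UPLOAD_BYTES = 200 * 1024 * 1024; // assumes "200 MB" means MiB

// Returns an error message, or null if the upload is acceptable.
function upload_error(string $filename, int $size): ?string
{
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    if (!in_array($ext, ALLOWED_EXTENSIONS, true)) {
        return "File type .$ext is not allowed";
    }
    if ($size > MAX_UPLOAD_BYTES) {
        return 'File exceeds the 200 MB limit';
    }
    return null;
}
```

Uploads larger than `upload_max_filesize` never reach a check like this. PHP rejects them first and reports `UPLOAD_ERR_INI_SIZE` in the `error` field of the `$_FILES` entry.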


## Search

Keyword search uses an inverted index stored in `data/index/search.json`.
The index is updated automatically when files are approved or their metadata
is edited. To rebuild the full index from scratch, use
**Admin → Rebuild Search Index**.

Search behavior:
- Queries are tokenized (3+ characters, non-stop words, case-insensitive)
- AND search first (all keywords must match)
- Falls back to OR search if AND yields no results
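The behavior above can be sketched with a small inverted index that maps each keyword to a list of file ids. A query first intersects the keywords' id lists and falls back to their union when the intersection is empty. The stop-word list, the tokenizer's character class, and all function names are assumptions.

```php
<?php
// Sketch: inverted index (keyword => [file ids]) with AND-then-OR querying.
const STOP_WORDS = ['the', 'and', 'for', 'with', 'from'];

function tokenize(string $text): array
{
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $words = array_filter($words, fn ($w) => strlen($w) >= 3 && !in_array($w, STOP_WORDS, true));
    return array_values(array_unique($words));
}

function index_file(array &$index, string $fileId, string $text): void
{
    foreach (tokenize($text) as $word) {
        if (!in_array($fileId, $index[$word] ?? [], true)) {
            $index[$word][] = $fileId;
        }
    }
}

function search(array $index, string $query): array
{
    $lists = array_map(fn ($w) => $index[$w] ?? [], tokenize($query));
    if ($lists === []) {
        return [];
    }
    // AND: ids present in every keyword's list.
    $hits = count($lists) === 1 ? $lists[0] : array_intersect(...$lists);
    if ($hits === []) {
        // OR fallback: ids present in any keyword's list.
        $hits = array_merge(...$lists);
    }
    return array_values(array_unique($hits));
}
```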