BXC - Benford Fraud Analysis Tool

Current Version: 2.0.1

BXC - Benford Analysis Tool

A forensic digital analysis tool for detecting data manipulation using Benford's Law.

Overview

BXC (Benford X-C) is a command-line tool that analyzes numerical data to determine if it follows Benford's Law, a mathematical principle that describes the expected frequency distribution of leading digits in naturally occurring datasets. Deviations from this distribution can indicate data manipulation, fraud, or synthetic data generation.

Features

  • First digit or all digits analysis - Choose between analyzing only the first digit or all digits in your dataset
  • Real-time animated visualization - Watch digit frequencies update as data is processed with cumulative display
  • Animated GIF export - Generate animated GIFs from analysis results for presentations and reports
  • Chi-squared statistical testing - Automatic calculation with 95% confidence level interpretation
  • Multiple data sources - Analyze local files or download from URLs (HTTP/FTP)
  • CSV column extraction - Analyze specific columns from multi-column datasets
  • Comprehensive reporting - Generates value logs, percentage logs, and ASCII charts
  • Interactive and batch modes - Use command-line flags or interactive prompts
  • Custom metadata - Add titles, descriptions, and source information to analyses
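
As a rough illustration of the two analysis modes, the sketch below counts either the leading digit or every digit of each raw value. It is written in Python for brevity and is not taken from BXC's FreeBASIC source. The function names are hypothetical, and skipping zeros in all-digits mode is an assumption, since this README does not say how BXC treats them.

```python
from collections import Counter

NONZERO = "123456789"

def leading_digit(value):
    """Return the first non-zero digit found in a text value, or None."""
    for ch in value:
        if ch in NONZERO:
            return int(ch)
    return None

def all_digits(value):
    """Return every non-zero digit in the value (assumed reading of 'all')."""
    return [int(ch) for ch in value if ch in NONZERO]

def count_digits(values, mode="1"):
    """Tally digits across values; mode '1' = first digit, 'all' = all digits."""
    counts = Counter()
    for v in values:
        if mode == "1":
            d = leading_digit(v)
            if d is not None:  # non-numeric fields such as "n/a" are skipped
                counts[d] += 1
        else:
            counts.update(all_digits(v))
    return counts

sample = ["1024.50", "0.0375", "-982", "n/a", "17"]
print(dict(count_digits(sample, mode="1")))
print(dict(count_digits(sample, mode="all")))
```

Signs, decimal points, and leading zeros are ignored, so "0.0375" contributes a leading digit of 3.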

    Installation

    From DEB Package

    bash
    sudo dpkg -i bxc_2.0.0_amd64.deb
    

    From Source

    Requires the FreeBASIC Compiler (fbc) to compile from source:
    bash
    fbc bxc.bas
    

    Usage

    Basic Syntax

    bash
    bxc -f [file] -d [1|all] -l [length] -c [column] [options]
    

    Required Flags

  • -f [file] - Data file to analyze (local file or URL)
  • -d [1|all] - Analyze first digit (1) or all digits (all)
  • -l [number] - Sample pool length (typically 10000 for statistical significance)
  • -c [number] - Column number (0 for single column data)
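
To make the -c and -l flags more concrete, here is a minimal Python sketch of column selection and sample pooling. It rests on several assumptions not confirmed by this README: that column numbers are 1-based (with 0 meaning the whole line), that fields are comma-separated, and that the sample pool length groups consecutive records. The helper names `read_column` and `pools` are hypothetical, and BXC itself may do this differently (the Requirements list mentions `cut`).

```python
import csv
import io

def read_column(text, column):
    """Yield one field per row; column 0 means single-column data."""
    if column == 0:
        for line in text.splitlines():
            if line.strip():
                yield line.strip()
        return
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= column:  # rows that are too short are skipped
            yield row[column - 1].strip()

def pools(values, length):
    """Group values into consecutive pools of `length` (the last may be shorter)."""
    pool = []
    for v in values:
        pool.append(v)
        if len(pool) == length:
            yield pool
            pool = []
    if pool:
        yield pool

data = "1,120.50\n2,87.10\n3,1999.99\n"
amounts = list(read_column(data, 2))
print(amounts)
print(list(pools(amounts, 2)))
```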

    Optional Flags

  • -a [interval] - Enable animated graph display (updates every N records, default: 100)
  • -g - Generate animated GIF from animation (requires -a flag)
  • -t [text] - Title for the analysis
  • -s [text] - Data source description
  • -i [text] - Additional information/description
  • -h, --help - Display help message

    Examples

    Basic first digit analysis:
    bash
    bxc -f financial_data.dat -d 1 -l 10000 -c 0
    
    Multi-column CSV with animation:
    bash
    bxc -f transactions.csv -d 1 -l 10000 -c 2 -a 50
    
    All digits analysis:
    bash
    bxc -f dataset.dat -d all -l 5000 -c 0
    
    Analyze data from URL:
    bash
    bxc -f http://example.com/data.csv -d 1 -l 10000 -c 1
    
    Interactive mode (the program prompts for all parameters):
    bash
    bxc

    Generate animated GIF with custom metadata:
    bash
    bxc -f sales_data.csv -d 1 -l 10000 -c 2 -a 50 -g -t "Q4 Sales Analysis" -s "Company XYZ"
    

    Understanding the Output

    Animated Display

    When using the -a flag, you'll see a live updating display showing cumulative analysis:
    
    ========================================================================
    Benford X-C Live Analysis - Animated View (Cumulative)
    ========================================================================
    Records processed: 15342 | Total digits analyzed: 8450
    ------------------------------------------------------------------------
    
    Digit  Actual   Expected  Deviation  Chart
    -----  -------  --------  ---------  ---------------------------------
      1    30.12%    30.10%   +0.02%     ███████████████
      2    17.58%    17.60%   -0.02%     ████████
      3    12.51%    12.50%   +0.01%     ██████
      ...
    
    The cumulative display shows how data progressively converges (or diverges) from Benford's Law, making it easier to detect fraud patterns.
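
The cumulative idea can be sketched in a few lines of Python: counters are never reset, and a snapshot of the running percentages is taken every N records. This is an illustration only, not BXC's code; `cumulative_snapshots` is a hypothetical name, and the expected values are computed from the standard Benford formula log10(1 + 1/d).

```python
import math

# Expected first-digit percentages under Benford's Law.
EXPECTED = {d: 100 * math.log10(1 + 1 / d) for d in range(1, 10)}

def cumulative_snapshots(first_digits, interval):
    """Yield (records_seen, percentages) every `interval` records, cumulatively."""
    counts = {d: 0 for d in range(1, 10)}
    total = 0
    for digit in first_digits:
        counts[digit] += 1
        total += 1
        if total % interval == 0:
            yield total, {d: 100 * counts[d] / total for d in counts}

digits = [1, 1, 2, 3, 1, 9, 2, 1]
for total, actual in cumulative_snapshots(digits, interval=4):
    deviation = actual[1] - EXPECTED[1]
    print(f"{total} records: digit 1 at {actual[1]:.2f}% ({deviation:+.2f}%)")
```

With real data, the deviation column would be expected to shrink as more records accumulate if the data follows Benford's Law.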

    Final Report

    At the end of analysis, you'll receive:
  • Final Statistics - Average percentages across all samples
  • Chi-Squared Test Result - Statistical significance test
  • Interpretation - Whether data fits Benford's Law
    Example output:
    Chi-Squared Statistic: 8.2347
    
    Result: Data FITS Benford's Law (95% confidence)
            No significant deviation detected.
    

    Output Files

  • Values Log: [filename]_[mode]-[sample]-.log - Raw digit counts
  • Percentage Log: [filename]_[mode]-[sample]_.log - Percentage distributions
  • ASCII Chart: chart_[filename]_[mode]-[sample]_.log.txt - Visual representation
  • Animated GIF: benford_animation_[timestamp].gif - Animated visualization (when using -g flag)

    Benford's Law Reference

    Expected first digit frequencies:

    | Digit | Expected % |
    |-------|------------|
    | 1     | 30.1%      |
    | 2     | 17.6%      |
    | 3     | 12.5%      |
    | 4     | 9.7%       |
    | 5     | 7.9%       |
    | 6     | 6.7%       |
    | 7     | 5.8%       |
    | 8     | 5.1%       |
    | 9     | 4.6%       |
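
These percentages follow from the standard Benford formula P(d) = log10(1 + 1/d). A short Python check, independent of BXC itself (the function name is only illustrative):

```python
import math

def benford_expected(digit):
    """Expected percentage of values whose first digit is `digit` (1-9)."""
    return 100 * math.log10(1 + 1 / digit)

for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1f}%")
```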

    Chi-Squared Interpretation

  • χ² < 15.51 - Data fits Benford's Law (no significant deviation at the 95% confidence level, 8 degrees of freedom)
  • 15.51 ≤ χ² < 20 - Moderate concern, investigate further
  • χ² ≥ 20 - High concern, likely fraud or synthetic data
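
The three bands above can be expressed as a small lookup. This Python sketch only mirrors the thresholds listed in this README; the function name and return strings are hypothetical and not BXC output.

```python
def interpret(chi_sq):
    """Map a chi-squared statistic to the README's three interpretation bands."""
    if chi_sq < 15.51:
        return "fits"
    if chi_sq < 20:
        return "moderate concern"
    return "high concern"

print(interpret(8.2347))
```

A result in the upper bands is a statistical flag for further investigation rather than proof of manipulation.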

    Use Cases

  • Financial Fraud Detection - Analyze accounting records, invoices, expenses
  • Election Data Verification - Detect potential vote manipulation
  • Scientific Data Validation - Verify experimental or survey data authenticity
  • Tax Compliance - Audit financial statements and tax returns
  • Insurance Claims - Identify potentially fraudulent claim patterns
  • Audit and Compliance - General purpose data integrity verification

    Technical Details

    Requirements

  • Linux/Unix operating system (x86-64)
  • Standard GNU utilities (cut, wget, head, tail)
  • Terminal with ANSI escape code support (for animation)
  • ImageMagick (for GIF generation, optional)

    Sample Size Recommendations

    Benford's Law analysis requires adequate sample sizes:
  • Minimum: 1,000 records
  • Recommended: 10,000+ records
  • Optimal: 50,000+ records

    Performance

  • Processes ~125,000 records per second
  • Minimal overhead with animation enabled (~5%)
  • Efficient memory usage for large datasets

    About Benford Bench Project

    BXC is part of the Benford Bench project (benfordbench.org), operational since 2016. The project was created to crowdsource fraud identification and reporting in big data through Benford's Law analysis.

    Project Contributors

  • Jason Page (Original Author)
  • Morris Chukhman
  • Padraig O'Hara
  • Kevin Perez
  • Michael Fiedler

    License

    This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

    Support

    For issues, questions, or contributions:
  • Visit: https://benfordbench.org
  • Report issues on the project repository

    Version

    Current version: 2.0.0. See CHANGELOG.md for version history and updates.

    What's New in 2.0.0

  • Animated GIF Export - Generate shareable animated GIFs of your analysis
  • Cumulative Animation - Watch data converge to Benford's Law in real-time
  • Enhanced Display - 2 decimal precision and fixed-width columns for cleaner output
  • Custom Metadata - Add titles, descriptions, and source information
  • Improved Labels - Clearer terminology throughout the interface
For complete details, see CHANGELOG.md.

Download Options

Free Download: Source code and changelog are freely available below.
Compiled Versions: Support development with a donation via PayPal to receive compiled binaries.

Changelog

All notable changes to the BXC (Benford X-C) project will be documented in this file. The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[2.0.0] - 2025-11-24

Added

#### New Features
  • Animated GIF Export (-g flag)
      - Generates animated GIF from animation frames
      - 0.1 second per frame (10 fps) for smooth playback
      - Automatic frame capture during animation
      - ImageMagick-based conversion to GIF
      - Temporary frame storage with automatic cleanup
      - Perfect for sharing analysis results and presentations
  • Custom Metadata Flags
      - -t [text] - Add custom title to analysis
      - -s [text] - Specify data source description
      - -i [text] - Include additional information/notes
      - Enhances documentation and shareability of results
  • Cumulative Animation Display
      - Shows progression from start of data processing
      - Displays convergence towards Benford's Law
      - Real-time visualization of data accumulation
      - More meaningful for fraud detection than segment-based display
      - Clear indication of natural vs synthetic data patterns

#### Output Improvements
  • 2 Decimal Point Precision
      - All percentage values formatted to exactly 2 decimal places
      - Consistent formatting across all output modes
      - Cleaner, more professional appearance
      - Reduced screen space usage for better chart visibility
  • Fixed-Width Column Alignment
      - Perfect alignment of all columns in animated charts
      - Right-aligned percentages with space padding
      - Prevents visual "shifting" during animation
      - Easier to track changes across frames
  • Enhanced Display Labels
      - Changed "Processing" to "Records processed" for clarity
      - Changed "Current" to "Total digits analyzed"
      - Removed confusing "Sample size" from animation view
      - Added "(Cumulative)" indicator to animation header
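
For context on the 0.1 second frame timing: ImageMagick's `-delay` option is given in hundredths of a second by default, so 0.1 s corresponds to `-delay 10`. The Python sketch below only builds such a command line. The frame file names and the exact invocation are assumptions for illustration, since the changelog does not show the command BXC actually runs, and newer ImageMagick releases use `magick` in place of `convert`.

```python
def gif_command(frame_paths, output, seconds_per_frame=0.1):
    """Build an ImageMagick argv that joins frames into a looping GIF."""
    delay = round(seconds_per_frame * 100)  # -delay is in 1/100 s ticks
    return ["convert", "-delay", str(delay), "-loop", "0", *frame_paths, output]

# Hypothetical frame names; BXC's temporary frame files may be named differently.
cmd = gif_command(["frame_0001.png", "frame_0002.png"], "benford.gif")
print(" ".join(cmd))
# To run it where ImageMagick is installed:
#   import subprocess; subprocess.run(cmd, check=True)
```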

    Changed

    #### Animation Behavior
  • Animation now shows cumulative data from beginning
  • Counters never reset during animation (separate from segment reporting)
  • Progressive convergence visualization
  • Better fraud detection indicators

    #### Technical Improvements
  • Added cumulative counter tracking (cum_c, cum_c1-cum_c9)
  • Proper integer type declarations for all counters
  • Fixed parameter passing to animation functions
  • Improved frame capture with proper type checking

    Fixed

  • Fixed column misalignment in animated charts
  • Fixed "Records processed: 0" display bug
  • Fixed parameter type mismatches in animation functions
  • Removed first frame with zero records from animation

    Dependencies

  • Added ImageMagick requirement for GIF generation
  • All other dependencies remain the same

    Performance

  • Frame capture adds ~5% overhead when GIF generation enabled
  • No performance impact when GIF generation not used
  • Efficient temporary file management

[1.0.0] - 2025-11-23

    Added

    #### New Features
  • Animated Graph Display (-a flag)
      - Real-time visualization of digit frequency analysis
      - Configurable update interval (default: 100 records)
      - Shows actual vs expected Benford percentages
      - Displays deviation from expected values
      - Live progress tracking with record counts
  • Chi-Squared Statistical Test
      - Automatic calculation of chi-squared statistic
      - 95% confidence level interpretation
      - Clear indication of Benford's Law compliance
      - Fraud detection indicators
  • Expected Benford Values Integration
      - Built-in Benford's Law percentages for first digit
      - Automatic deviation calculation
      - Color-coded terminal display
      - Visual bar charts in output
  • Comprehensive Help System
      - Detailed -h and --help flag support
      - Command-line usage examples
      - Flag descriptions and requirements
      - Interactive mode documentation
  • ASCII Chart Generation
      - Visual bar chart output files
      - Sample-by-sample visualization
      - Easy-to-read text format
      - Preview display in terminal

    #### User Experience
  • Interactive mode with intelligent prompts
  • File listing for easier file selection
  • Configuration summary before processing
  • Progress indicators during analysis
  • Clear error messages with guidance

    #### Output Enhancements
  • Final statistics report with averages
  • Chi-squared test results
  • Deviation analysis for each digit
  • Multiple output file formats
  • Chart preview in terminal

    Changed

    #### Performance Improvements
  • Native File I/O - Replaced shell-based file operations with native BASIC I/O
      - ~10-100x faster for large datasets
      - Reduced system call overhead
      - Better memory efficiency
  • Optimized String Operations
      - Consolidated duplicate calculations
      - Reduced string concatenation overhead
      - Better variable reuse patterns
  • Efficient Data Cleaning
      - Removes non-numeric characters before analysis
      - Handles malformed input gracefully
      - Improved parsing logic
  • Eliminated Redundant Shell Calls
      - Removed unnecessary shell "echo > file" commands
      - Optimized column extraction
      - Reduced external process spawning

    #### Code Quality
  • Modular Subroutines
      - parse_flag() - Clean command-line argument parsing
      - draw_animated_chart() - Animation display logic
      - print_digit_row() - Formatted row output
      - generate_final_report() - Statistics and reporting
      - write_chart_line() - ASCII chart generation
  • Improved Variable Naming
      - More descriptive variable names
      - Clear purpose indication
      - Reduced cognitive load
  • Better Code Organization
      - Logical flow from initialization to completion
      - Clear separation of concerns
      - Enhanced comments and documentation

    #### Error Handling
  • File existence verification before processing
  • Column validation with fallback options
  • Graceful fallback to interactive mode
  • URL download error handling
  • Better handling of empty or malformed data

    Performance Benchmarks

    Test Configuration: 1,000,000 records
  • Original version: ~45 seconds
  • Improved version: ~8 seconds
  • Speedup: 5.6x

    With Animation Enabled: 100,000 records
  • Processing time: ~12 seconds
  • Animation overhead: ~5%
  • Update interval 100: Optimal performance

    Technical Details

    #### Compatibility
  • Maintains 100% backward compatibility with original command-line interface
  • All new features are opt-in via flags
  • Default behavior matches original version
  • Interactive mode preserved for legacy workflows

    #### Requirements
  • FreeBASIC Compiler (fbc)
  • Unix/Linux x86-64 system
  • Standard GNU utilities (cut, wget, head, tail)
  • Terminal with ANSI escape code support (for animation)

    #### Statistical Method
  • Degrees of freedom: 8 (9 digits - 1)
  • Critical value at 95% confidence: 15.51
  • Chi-squared formula: Σ((observed - expected)² / expected)

    Security

  • Input sanitization for file operations
  • Safe handling of URL downloads
  • Protection against command injection
  • Validated column number inputs

    Documentation

  • Comprehensive README.md
  • Detailed IMPROVEMENTS.md
  • Command-line help system
  • Usage examples and best practices

[0.9.0] - Original Release

    Initial Features

  • First digit Benford analysis
  • All digits analysis mode
  • CSV column extraction
  • URL download support (HTTP/FTP)
  • Multi-column data support
  • Sample pool configuration
  • Value and percentage logging
  • Interactive mode
  • Command-line flag support

    Original Capabilities

  • Basic digit frequency counting
  • Percentage calculation
  • Log file generation
  • Simple text output
  • Data file processing
  • Column-based analysis

    Future Roadmap

    Planned Features (3.0.0)

  • JSON/CSV structured output export
  • Additional chart formats (PNG/SVG via Gnuplot or similar)
  • Multi-file batch processing
  • Second and third digit position analysis
  • Database integration (PostgreSQL, MySQL)
  • Web dashboard with real-time visualization
  • Configurable expected distributions
  • Parallel processing for large files

    Under Consideration

  • Machine learning integration for anomaly detection
  • REST API for web service integration
  • Docker containerization
  • Cloud storage integration (S3, GCS)
  • Excel file support (.xlsx, .xls)
  • PDF report generation
  • Email notification system
  • Scheduled analysis automation

    Credits

    Original Author

  • Jason S. Page - Creator and original developer

    Benford Bench Project Team

  • Morris Chukhman
  • Padraig O'Hara
  • Kevin Perez
  • Michael Fiedler

    Project

  • Benford Bench - benfordbench.org (Since 2016)
  • Mission: Crowdsource fraud identification and reporting in big data

License

GNU General Public License v3.0 or later

For detailed improvement descriptions, see IMPROVEMENTS.md.
For usage instructions, see README.md.