# Features
This reference explains how the major subsystems work and what to expect from each one.
## Document Processing Pipeline
LegalEase processes documents through a multi-stage pipeline to make them searchable and reviewable.
### Processing Flow
- Upload - Files are uploaded to Firebase Storage under `users/{userId}/documents/{documentId}/`
- Trigger - A Firestore trigger detects the new document and starts processing
- Extraction - Docling extracts text, structure, tables, and bounding boxes. OCR runs automatically for scanned pages.
- Chunking - Content is split into hierarchical chunks:
  - `summary` - Coarse overviews for jumping into large files
  - `section` - Medium sections (~500 tokens)
  - `paragraph` - Smaller blocks for precise matching
- Embedding - Gemini generates dense vector embeddings for each chunk
- Indexing - Vectors are stored in Qdrant with metadata for hybrid search
- Page Renders - PDF pages are rendered as images for the viewer
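The chunking step can be sketched as a pure function. The granularity names and the ~500-token section size come from the list above; the splitting heuristics, the `chunkDocument` name, and the whitespace-based token estimate are illustrative assumptions, not the actual implementation (summary chunks, which would come from an LLM pass, are omitted here):

```typescript
// Hypothetical sketch of the hierarchical chunking step.
// Granularity names match the pipeline docs; the splitting
// heuristics are illustrative only.

type Granularity = "summary" | "section" | "paragraph";

interface Chunk {
  granularity: Granularity;
  text: string;
}

// Split extracted text into paragraph chunks, then group consecutive
// paragraphs into ~500-token sections (tokens approximated here as
// whitespace-separated words).
function chunkDocument(text: string, sectionTokenBudget = 500): Chunk[] {
  const paragraphs = text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = paragraphs.map((p) => ({
    granularity: "paragraph" as const,
    text: p,
  }));

  let current: string[] = [];
  let tokens = 0;
  for (const p of paragraphs) {
    const pTokens = p.split(/\s+/).length;
    if (tokens + pTokens > sectionTokenBudget && current.length > 0) {
      chunks.push({ granularity: "section", text: current.join("\n\n") });
      current = [];
      tokens = 0;
    }
    current.push(p);
    tokens += pTokens;
  }
  if (current.length > 0) {
    chunks.push({ granularity: "section", text: current.join("\n\n") });
  }
  return chunks;
}
```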
## Document Viewer
The built-in viewer provides:
- Page-by-page navigation with thumbnails
- Search hit highlighting using extracted bounding boxes
- Text layer overlay for copy/paste
- Entity sidebar showing extracted people, organizations, dates
- Chunk metadata for debugging
### Supported Formats
| Format | Support |
|---|---|
| PDF | Full extraction with OCR |
| DOCX | Text and structure extraction |
| Images (PNG, JPG) | OCR extraction |
| HTML/Markdown | Direct text extraction |
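A minimal sketch of how uploads might be routed by format, mirroring the table above. `routeExtraction` and the route names are hypothetical; the real pipeline delegates extraction to Docling:

```typescript
// Hypothetical dispatch matching the supported-formats table.
// Route names are illustrative, not the actual pipeline API.

type ExtractionRoute = "ocr" | "structure" | "text";

function routeExtraction(filename: string): ExtractionRoute | null {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  if (ext === "pdf") return "ocr";        // full extraction, OCR for scanned pages
  if (ext === "docx") return "structure"; // text and structure extraction
  if (["png", "jpg", "jpeg"].includes(ext)) return "ocr";
  if (["html", "md", "markdown"].includes(ext)) return "text";
  return null; // unsupported format
}
```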
## Transcription Pipeline
Audio and video files are transcribed using frontier AI models with speaker diarization.
### Providers
LegalEase supports two transcription providers:
#### Gemini 2.5 Flash (default)
- Works with Firebase Storage emulator
- Automatic speaker diarization via prompt engineering
- Speaker name inference from conversational context
- Supports files up to 9.5 hours
- Structured JSON output with timestamps
#### Google Speech-to-Text (Chirp 3)
- Requires production GCS (not compatible with emulator)
- Native speaker diarization
- Higher accuracy for some audio types
- Supports files up to 8 hours via BatchRecognize
### Processing Flow
- Upload - Audio/video uploaded to Firebase Storage
- Trigger - Firestore trigger starts transcription job
- Provider Selection - Based on the `TRANSCRIPTION_PROVIDER` setting
- Transcription - Provider generates segments with:
  - Start/end timestamps
  - Speaker identification
  - Transcript text
  - Confidence scores (Chirp only)
- Speaker Inference - Names inferred from conversation
- Summarization - Gemini generates summary, key moments, entities
- Waveform - Audio peaks extracted for visual player (in progress)
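The provider-selection step above might look like the following sketch; the `selectProvider` name is hypothetical, and the fallback is an assumption based on Gemini being the documented default:

```typescript
// Sketch of the provider-selection step. TRANSCRIPTION_PROVIDER is the
// documented setting; the function name and fallback behavior here are
// illustrative assumptions.

type ProviderName = "gemini" | "chirp";

function selectProvider(env: Record<string, string | undefined>): ProviderName {
  const requested = env.TRANSCRIPTION_PROVIDER?.toLowerCase();
  if (requested === "chirp") return "chirp";
  return "gemini"; // Gemini 2.5 Flash is the documented default
}
```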
### Output Format
```typescript
{
  fullText: string,
  segments: [{
    id: string,
    start: number,   // seconds
    end: number,
    text: string,
    speaker: string  // "Speaker 1", "Speaker 2", etc.
  }],
  speakers: [{
    id: string,
    inferredName?: string  // "John", "Jane", etc.
  }],
  duration: number,
  language: string,
  summarization: {
    summary: string,
    keyMoments: [...],
    actionItems: [...],
    topics: [...],
    entities: {...}
  }
}
```
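Given this shape, a viewer can resolve a display name for each segment by joining `segments` to `speakers`; `displayName` below is a hypothetical helper, not a documented API:

```typescript
// Resolve a display name for a segment from the transcript output shape
// shown above, falling back to the generic speaker label when no name
// was inferred. Illustrative helper only.

interface Segment {
  id: string;
  start: number;
  end: number;
  text: string;
  speaker: string;
}

interface Speaker {
  id: string;
  inferredName?: string;
}

function displayName(segment: Segment, speakers: Speaker[]): string {
  const match = speakers.find((s) => s.id === segment.speaker);
  return match?.inferredName ?? segment.speaker;
}
```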
## Summarization
All transcripts are automatically analyzed using Gemini 2.5 Flash.
### Output Components
#### Executive Summary
- 1-2 paragraph overview of the conversation
- Focus on key facts and outcomes
#### Key Moments
- Timestamped highlights with importance ratings (high/medium/low)
- Click to jump directly to that point in the audio
#### Action Items
- Follow-up tasks mentioned or implied
- Extracted automatically from conversation
#### Topics
- Main subjects discussed
- Useful for categorization and filtering
#### Entities
- People - Names mentioned in the conversation
- Organizations - Companies, agencies, firms
- Locations - Places referenced
- Dates - Dates, deadlines, timeframes mentioned
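Key moments carry timestamps in seconds (matching the segment schema), so a jump-to-moment UI needs a formatter; the helper below is illustrative, not part of the documented API:

```typescript
// Format a key-moment timestamp (seconds) as m:ss or h:mm:ss for the
// jump-to-moment UI. Illustrative helper only.

function formatTimestamp(seconds: number): string {
  const whole = Math.floor(seconds);
  const h = Math.floor(whole / 3600);
  const m = Math.floor((whole % 3600) / 60);
  const s = whole % 60;
  const mm = String(m).padStart(2, "0");
  const ss = String(s).padStart(2, "0");
  return h > 0 ? `${h}:${mm}:${ss}` : `${m}:${ss}`;
}
```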
## Hybrid Search
LegalEase combines semantic and keyword search using Qdrant vector database.
### How It Works
- Query Processing
  - Gemini generates a dense vector embedding
  - BM25 generates a sparse keyword vector
- Search Execution
  - Both vectors query Qdrant simultaneously
  - Results from each method are retrieved
- Fusion
  - Reciprocal Rank Fusion (RRF) combines the result lists
  - Balances semantic understanding with keyword precision
- Filtering
  - Results filtered by case, document type, and date range
  - Permissions applied based on user/team
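The fusion step can be sketched directly. `k = 60` is the conventional RRF constant; the function name and the exact constant LegalEase uses are assumptions:

```typescript
// Reciprocal Rank Fusion over a dense (semantic) and a sparse (BM25)
// ranked result list. Each document scores 1 / (k + rank) per list it
// appears in; higher combined score wins.

function rrfFuse(dense: string[], sparse: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [dense, sparse]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Documents ranked well by both methods beat documents ranked first by only one, which is how RRF balances semantic recall against keyword precision.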
### Search Modes
| Mode | Description |
|---|---|
| Hybrid | Best of both - semantic understanding + keyword precision |
| Semantic | Conceptual matching - finds related content even without exact words |
| Keyword | Traditional BM25 - exact term matching |
### Indexed Content
- Document chunks (all granularities)
- Transcript segments
- Summaries and key moments
- Entity mentions
## Firebase Integration
LegalEase leverages Firebase for a serverless, real-time architecture.
### Services Used
#### Cloud Firestore
- Primary database for cases, documents, transcripts
- Real-time listeners for instant UI updates
- Automatic offline support
#### Firebase Storage
- File storage for uploads
- Secure, authenticated access
- Resumable uploads for large files
#### Cloud Functions
- Genkit-based AI flows
- Firestore triggers for background processing
- Scales automatically
#### Firebase Auth
- Google sign-in
- Email/password authentication
- Session management
### Real-Time Updates
All data uses Firestore real-time listeners:
- Document processing status updates instantly
- Transcript completion triggers immediate UI refresh
- Team members see collaborators' changes in real time
## AI Provider Architecture
LegalEase uses a provider abstraction for flexibility:
```
functions/src/transcription/
├── provider.ts       # Interface definition
├── types.ts          # Shared types
├── registry.ts       # Provider registration
├── index.ts          # Public API
└── providers/
    ├── gemini.ts     # Gemini 2.5 Flash
    └── chirp.ts      # Google Speech-to-Text
```
### Adding New Providers
1. Implement the `TranscriptionProvider` interface
2. Register it in `registry.ts`
3. The provider becomes available via the `TRANSCRIPTION_PROVIDER` env var
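The registration pattern can be sketched as follows; the `TranscriptionProvider` shape here is a guess based on the documented output format, not the actual interface in `provider.ts`:

```typescript
// Minimal sketch of the provider registry pattern. The interface shape
// is an assumption based on the documented transcript output; the real
// interface lives in provider.ts.

interface TranscriptionProvider {
  name: string;
  transcribe(audioUri: string): Promise<{ fullText: string }>;
}

const registry = new Map<string, TranscriptionProvider>();

function registerProvider(provider: TranscriptionProvider): void {
  registry.set(provider.name, provider);
}

// Look up a provider by name (e.g. the TRANSCRIPTION_PROVIDER value);
// fail loudly on an unknown name rather than silently falling back.
function getProvider(name: string): TranscriptionProvider {
  const provider = registry.get(name);
  if (!provider) throw new Error(`Unknown transcription provider: ${name}`);
  return provider;
}
```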
This pattern extends to future AI providers (OpenAI, Anthropic, local models).