# Features
This reference explains how the major subsystems work and what to expect from each one.
## Document Processing Pipeline
LegalEase processes documents through a multi-stage pipeline to make them searchable and reviewable.
### Processing Flow
- Upload - Files are uploaded to Firebase Storage under `users/{userId}/documents/{documentId}/`
- Trigger - A Firestore trigger detects the new document and starts processing
- Extraction - Docling extracts text, structure, tables, and bounding boxes. OCR runs automatically for scanned pages.
- Chunking - Content is split into hierarchical chunks:
  - `summary` - Coarse overviews for jumping into large files
  - `section` - Medium sections (~500 tokens)
  - `paragraph` - Smaller blocks for precise matching
- Embedding - Gemini generates dense vector embeddings for each chunk
- Indexing - Vectors are stored in Qdrant with metadata for hybrid search
- Page Renders - PDF pages are rendered as images for the viewer
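The chunking step can be sketched as a pure function. The granularity names and the ~500-token section size come from the list above; the splitting heuristics, the `chunkDocument` name, and the whitespace-based token estimate are illustrative assumptions, not the actual implementation (summary chunks, which would come from an LLM pass, are omitted here):

```typescript
// Hypothetical sketch of the hierarchical chunking step.
// Granularity names match the pipeline docs; the splitting
// heuristics are illustrative only.

type Granularity = "summary" | "section" | "paragraph";

interface Chunk {
  granularity: Granularity;
  text: string;
}

// Split extracted text into paragraph chunks, then group consecutive
// paragraphs into ~500-token sections (tokens approximated here as
// whitespace-separated words).
function chunkDocument(text: string, sectionTokenBudget = 500): Chunk[] {
  const paragraphs = text
    .split(/\n{2,}/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = paragraphs.map((p) => ({
    granularity: "paragraph" as const,
    text: p,
  }));

  let current: string[] = [];
  let tokens = 0;
  for (const p of paragraphs) {
    const pTokens = p.split(/\s+/).length;
    if (tokens + pTokens > sectionTokenBudget && current.length > 0) {
      chunks.push({ granularity: "section", text: current.join("\n\n") });
      current = [];
      tokens = 0;
    }
    current.push(p);
    tokens += pTokens;
  }
  if (current.length > 0) {
    chunks.push({ granularity: "section", text: current.join("\n\n") });
  }
  return chunks;
}
```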
## Document Viewer
The built-in viewer provides:
- Page-by-page navigation with thumbnails
- Search hit highlighting using extracted bounding boxes
- Text layer overlay for copy/paste
- Entity sidebar showing extracted people, organizations, dates
- Chunk metadata for debugging
### Supported Formats
| Format | Support |
|---|---|
| PDF | Full extraction with OCR |
| DOCX | Text and structure extraction |
| Images (PNG, JPG) | OCR extraction |
| HTML/Markdown | Direct text extraction |
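A minimal sketch of how uploads might be routed by format, mirroring the table above. `routeExtraction` and the route names are hypothetical; the real pipeline delegates extraction to Docling:

```typescript
// Hypothetical dispatch matching the supported-formats table.
// Route names are illustrative, not the actual pipeline API.

type ExtractionRoute = "ocr" | "structure" | "text";

function routeExtraction(filename: string): ExtractionRoute | null {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  if (ext === "pdf") return "ocr";        // full extraction, OCR for scanned pages
  if (ext === "docx") return "structure"; // text and structure extraction
  if (["png", "jpg", "jpeg"].includes(ext)) return "ocr";
  if (["html", "md", "markdown"].includes(ext)) return "text";
  return null; // unsupported format
}
```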
## Transcription Pipeline
Audio and video files are transcribed using frontier AI models with speaker diarization.
### Providers
LegalEase supports two transcription providers:
#### Gemini 2.5 Flash (default)
- Works with Firebase Storage emulator
- Automatic speaker diarization via prompt engineering
- Speaker name inference from conversational context
- Supports files up to 9.5 hours
- Structured JSON output with timestamps
#### Google Speech-to-Text (Chirp 3)
- Requires production GCS (not compatible with emulator)
- Native speaker diarization
- Higher accuracy for some audio types
- Supports files up to 8 hours via BatchRecognize
### Processing Flow
- Upload - Audio/video uploaded to Firebase Storage
- Trigger - Firestore trigger starts transcription job
- Provider Selection - Based on the `TRANSCRIPTION_PROVIDER` setting
- Transcription - Provider generates segments with:
  - Start/end timestamps
  - Speaker identification
  - Transcript text
  - Confidence scores (Chirp only)
- Speaker Inference - Names inferred from conversation
- Summarization - Gemini generates summary, key moments, entities
- Waveform - Audio peaks extracted for visual player (in progress)
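The provider-selection step above might look like the following sketch; the `selectProvider` name is hypothetical, and the fallback is an assumption based on Gemini being the documented default:

```typescript
// Sketch of the provider-selection step. TRANSCRIPTION_PROVIDER is the
// documented setting; the function name and fallback behavior here are
// illustrative assumptions.

type ProviderName = "gemini" | "chirp";

function selectProvider(env: Record<string, string | undefined>): ProviderName {
  const requested = env.TRANSCRIPTION_PROVIDER?.toLowerCase();
  if (requested === "chirp") return "chirp";
  return "gemini"; // Gemini 2.5 Flash is the documented default
}
```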
### Output Format
```typescript
{
  fullText: string,
  segments: [{
    id: string,
    start: number,   // seconds
    end: number,
    text: string,
    speaker: string  // "Speaker 1", "Speaker 2", etc.
  }],
  speakers: [{
    id: string,
    inferredName?: string  // "John", "Jane", etc.
  }],
  duration: number,
  language: string,
  summarization: {
    summary: string,
    keyMoments: [...],
    actionItems: [...],
    topics: [...],
    entities: {...}
  }
}
```
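Given this shape, a viewer can resolve a display name for each segment by joining `segments` to `speakers`; `displayName` below is a hypothetical helper, not a documented API:

```typescript
// Resolve a display name for a segment from the transcript output shape
// shown above, falling back to the generic speaker label when no name
// was inferred. Illustrative helper only.

interface Segment {
  id: string;
  start: number;
  end: number;
  text: string;
  speaker: string;
}

interface Speaker {
  id: string;
  inferredName?: string;
}

function displayName(segment: Segment, speakers: Speaker[]): string {
  const match = speakers.find((s) => s.id === segment.speaker);
  return match?.inferredName ?? segment.speaker;
}
```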
## Summarization
All transcripts are automatically analyzed using Gemini 2.5 Flash.
### Output Components
#### Executive Summary
- 1-2 paragraph overview of the conversation
- Focus on key facts and outcomes
#### Key Moments
- Timestamped highlights with importance ratings (high/medium/low)
- Click to jump directly to that point in the audio
#### Action Items
- Follow-up tasks mentioned or implied
- Extracted automatically from conversation
#### Topics
- Main subjects discussed
- Useful for categorization and filtering
#### Entities
- People - Names mentioned in the conversation
- Organizations - Companies, agencies, firms
- Locations - Places referenced
- Dates - Dates, deadlines, timeframes mentioned
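Key moments carry timestamps in seconds (matching the segment schema), so a jump-to-moment UI needs a formatter; the helper below is illustrative, not part of the documented API:

```typescript
// Format a key-moment timestamp (seconds) as m:ss or h:mm:ss for the
// jump-to-moment UI. Illustrative helper only.

function formatTimestamp(seconds: number): string {
  const whole = Math.floor(seconds);
  const h = Math.floor(whole / 3600);
  const m = Math.floor((whole % 3600) / 60);
  const s = whole % 60;
  const mm = String(m).padStart(2, "0");
  const ss = String(s).padStart(2, "0");
  return h > 0 ? `${h}:${mm}:${ss}` : `${m}:${ss}`;
}
```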
## Hybrid Search
LegalEase combines semantic and keyword search using Qdrant vector database.
### How It Works
- Query Processing
  - Gemini generates a dense vector embedding
  - BM25 generates a sparse keyword vector
- Search Execution
  - Both vectors query Qdrant simultaneously
  - Results from each method are retrieved
- Fusion
  - Reciprocal Rank Fusion (RRF) combines the result lists
  - Balances semantic understanding with keyword precision
- Filtering
  - Results filtered by case, document type, and date range
  - Permissions applied based on user/team
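The fusion step can be sketched directly. `k = 60` is the conventional RRF constant; the function name and the exact constant LegalEase uses are assumptions:

```typescript
// Reciprocal Rank Fusion over a dense (semantic) and a sparse (BM25)
// ranked result list. Each document scores 1 / (k + rank) per list it
// appears in; higher combined score wins.

function rrfFuse(dense: string[], sparse: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [dense, sparse]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Documents ranked well by both methods beat documents ranked first by only one, which is how RRF balances semantic recall against keyword precision.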
### Search Modes
| Mode | Description |
|---|---|
| Hybrid | Best of both - semantic understanding + keyword precision |
| Semantic | Conceptual matching - finds related content even without exact words |
| Keyword | Traditional BM25 - exact term matching |
### Indexed Content
- Document chunks (all granularities)
- Transcript segments
- Summaries and key moments
- Entity mentions
## Firebase Integration
LegalEase leverages Firebase for a serverless, real-time architecture.
### Services Used
#### Cloud Firestore
- Primary database for cases, documents, transcripts
- Real-time listeners for instant UI updates
- Automatic offline support
#### Firebase Storage
- File storage for uploads
- Secure, authenticated access
- Resumable uploads for large files
#### Cloud Functions
- Genkit-based AI flows
- Firestore triggers for background processing
- Scales automatically
#### Firebase Auth
- Google sign-in
- Email/password authentication
- Session management
### Real-Time Updates
All data uses Firestore real-time listeners:
- Document processing status updates instantly
- Transcript completion triggers immediate UI refresh
- Team members see collaborators' changes in real time
## AI Provider Architecture
LegalEase uses a provider abstraction for flexibility:
```
functions/src/transcription/
├── provider.ts       # Interface definition
├── types.ts          # Shared types
├── registry.ts       # Provider registration
├── index.ts          # Public API
└── providers/
    ├── gemini.ts     # Gemini 2.5 Flash
    └── chirp.ts      # Google Speech-to-Text
```
### Adding New Providers
1. Implement the `TranscriptionProvider` interface
2. Register it in `registry.ts`
3. The provider becomes available via the `TRANSCRIPTION_PROVIDER` env var
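The registration pattern can be sketched as follows; the `TranscriptionProvider` shape here is a guess based on the documented output format, not the actual interface in `provider.ts`:

```typescript
// Minimal sketch of the provider registry pattern. The interface shape
// is an assumption based on the documented transcript output; the real
// interface lives in provider.ts.

interface TranscriptionProvider {
  name: string;
  transcribe(audioUri: string): Promise<{ fullText: string }>;
}

const registry = new Map<string, TranscriptionProvider>();

function registerProvider(provider: TranscriptionProvider): void {
  registry.set(provider.name, provider);
}

// Look up a provider by name (e.g. the TRANSCRIPTION_PROVIDER value);
// fail loudly on an unknown name rather than silently falling back.
function getProvider(name: string): TranscriptionProvider {
  const provider = registry.get(name);
  if (!provider) throw new Error(`Unknown transcription provider: ${name}`);
  return provider;
}
```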
This pattern extends to future AI providers (OpenAI, Anthropic, local models).