Changelog

Transcript Review & Summaries

End-to-end transcript workflow with Ollama summaries.

Transcript review now goes beyond raw text dumps.

What you can do

  • Search within a transcript, filter by speaker, and toggle segment visibility.
  • Generate an Ollama-backed summary that includes an executive overview, key moments, timeline, speaker statistics, action items, and topic tags.
  • Regenerate summaries on demand or run batch jobs across multiple transcripts.
  • Mark important segments as key moments; metadata is stored in the database for later retrieval.
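Search and speaker filtering can be pictured as a single pass over timestamped segments. The sketch below is a minimal illustration that assumes a hypothetical in-memory `Segment` record with a speaker label, start and end seconds, text, and a key-moment flag. It does not reflect LegalEase's actual data model or storage.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    """A timestamped transcript segment (hypothetical shape)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str
    key_moment: bool = False


def find_segments(
    segments: Iterable[Segment],
    query: str = "",
    speakers: Optional[set[str]] = None,
    key_moments_only: bool = False,
) -> list[Segment]:
    """Return segments matching a case-insensitive substring query,
    optionally restricted to a set of speakers and/or to key moments."""
    needle = query.casefold()
    matches = []
    for seg in segments:
        if speakers is not None and seg.speaker not in speakers:
            continue  # speaker filter
        if key_moments_only and not seg.key_moment:
            continue  # "key moments only" toggle
        if needle and needle not in seg.text.casefold():
            continue  # text search; an empty query matches everything
        matches.append(seg)
    return matches


transcript = [
    Segment("SPEAKER_00", 0.0, 4.2, "Please state your name for the record."),
    Segment("SPEAKER_01", 4.5, 6.0, "Jane Doe.", key_moment=True),
    Segment("SPEAKER_00", 6.3, 9.8, "Thank you. Let the record reflect that."),
]
print([s.text for s in find_segments(transcript, query="RECORD", speakers={"SPEAKER_00"})])
```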

Integration perks

  • Transcript segments share the main Qdrant collection, so a single query can surface both document and transcript hits.
  • Download caption files (SRT/VTT) or a polished DOCX with speaker formatting for further editing.

This release lays the groundwork for deeper analytics while keeping everything self-hosted.

Hybrid Search Filters

Case scoping, chunk selection, and transparent scoring.

The search interface now mirrors the flexibility of the backend hybrid engine.

Highlights

  • Filter by case IDs, document IDs, or both to narrow the scope of each query.
  • Select which chunk granularities to include (summary, section, microblock, transcript segment) so you can focus on either high-level summaries or precise snippets.
  • Switch between hybrid, dense-only, and keyword-only retrieval strategies from the UI without modifying backend settings.
  • View _score_debug metadata on each result to understand how BM25 and dense scores were fused.
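The filter controls ultimately reduce to a request body. The sketch below is minimal and uses hypothetical field names and option values (`case_ids`, `document_ids`, `chunk_types`, `strategy`, `limit`). The changelog lists these options but does not document the API's actual schema, and how case and document filters combine is left to the backend.

```python
import json
from typing import Optional

# Hypothetical identifiers: the changelog names these options, not their wire format.
CHUNK_TYPES = {"summary", "section", "microblock", "transcript_segment"}
STRATEGIES = {"hybrid", "dense", "keyword"}


def build_search_payload(
    query: str,
    case_ids: Optional[list[str]] = None,
    document_ids: Optional[list[str]] = None,
    chunk_types: Optional[list[str]] = None,
    strategy: str = "hybrid",
    limit: int = 20,
) -> dict:
    """Assemble a search request body from the UI filter state (hypothetical schema)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    unknown = set(chunk_types or []) - CHUNK_TYPES
    if unknown:
        raise ValueError(f"unknown chunk types: {sorted(unknown)}")
    payload: dict = {"query": query, "strategy": strategy, "limit": limit}
    # Only include scopes the user actually set; an omitted key means "no restriction".
    if case_ids:
        payload["case_ids"] = list(case_ids)
    if document_ids:
        payload["document_ids"] = list(document_ids)
    if chunk_types:
        payload["chunk_types"] = list(chunk_types)
    return payload


if __name__ == "__main__":
    body = build_search_payload(
        "breach of fiduciary duty",
        case_ids=["case-042"],
        chunk_types=["section", "microblock"],
    )
    print(json.dumps(body, indent=2))  # e.g. paste into a cURL -d '...' argument to replay
```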

Developer conveniences

  • Debounced input prevents unnecessary API calls while typing.
  • The search panel logs the final request payload to the browser console in development mode, making it easy to replay queries via cURL or the API docs.
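Debouncing means a burst of keystrokes produces a single request once typing pauses. The deterministic simulation below models a trailing-edge debounce in plain Python. It assumes timestamps in milliseconds and an illustrative 300 ms window. The search panel's real implementation and delay are not specified in this changelog.

```python
def debounced_requests(
    keystrokes: list[tuple[int, str]], wait_ms: int
) -> list[tuple[int, str]]:
    """Simulate a trailing-edge debounce.

    `keystrokes` holds (timestamp_ms, input_value) events. A request fires only
    after the input has been idle for `wait_ms`, carrying the latest value.
    Returns the (fire_time_ms, value) pairs that would reach the API.
    """
    events = sorted(keystrokes, key=lambda e: e[0])
    requests = []
    for i, (t, value) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else None
        # Each keystroke restarts the timer; it only expires if nothing arrives within wait_ms.
        if next_t is None or next_t - t >= wait_ms:
            requests.append((t + wait_ms, value))
    return requests


keystrokes_log = [(0, "c"), (90, "co"), (180, "con"), (270, "cont"), (1000, "contr")]
print(debounced_requests(keystrokes_log, wait_ms=300))
```

Five keystrokes yield two requests: one for `cont` after the first pause and one for `contr` after the last keystroke.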

PDF Viewer Improvements

Better navigation, highlighting, and MinIO integration.

The built-in PDF viewer has been refreshed to work hand-in-hand with the new document pipeline.

What's new

  • Bounding boxes from Docling are now honoured in the UI, so clicking a search result jumps to the exact location within the PDF.
  • Page thumbnails, zoom, and keyboard navigation stay in sync, avoiding the desynchronisation issues present in earlier builds.
  • MinIO-backed page renders load progressively, keeping large documents responsive even over a remote connection.
  • Toolbar controls were simplified; print, download, and jump-to-result actions now live in a single toolbar.
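Jumping to a Docling bounding box requires mapping PDF coordinates onto the rendered page. This minimal sketch assumes that boxes arrive in PDF points (1/72 inch) with a bottom-left origin. It also assumes the viewer renders one point as 96/72 CSS pixels at 100% zoom, a common convention that PDF.js uses, for example. Check which coordinate origin your Docling version reports. `BBox` and `pdf_bbox_to_viewer` are hypothetical names.

```python
from dataclasses import dataclass

POINTS_TO_CSS_PX = 96 / 72  # assumed: 1 PDF point (1/72 in) in CSS px (1/96 in) at 100% zoom


@dataclass
class BBox:
    """Box in PDF points, bottom-left origin (so top > bottom)."""
    left: float
    top: float
    right: float
    bottom: float


def pdf_bbox_to_viewer(
    bbox: BBox, page_height_pt: float, zoom: float = 1.0
) -> tuple[float, float, float, float]:
    """Convert a bottom-left-origin box into a top-left-origin CSS-pixel
    rectangle (x, y, width, height) suitable for a highlight overlay."""
    scale = POINTS_TO_CSS_PX * zoom
    # Flip the y axis: in PDF space y grows upwards from the bottom of the page.
    y_from_top_pt = page_height_pt - bbox.top
    x = bbox.left * scale
    y = y_from_top_pt * scale
    width = (bbox.right - bbox.left) * scale
    height = (bbox.top - bbox.bottom) * scale
    return x, y, width, height


# US Letter page (792 pt tall); a line of text near the top margin, at 150% zoom.
print(pdf_bbox_to_viewer(BBox(left=72, top=720, right=540, bottom=700), page_height_pt=792, zoom=1.5))
```

The returned `y` doubles as a scroll offset within the page, which is what a jump-to-result action needs.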

These changes make reviewing lengthy filings and exhibits significantly smoother without relying on external PDF tooling.

Hybrid Search Foundation

Docling chunking and Qdrant hybrid retrieval are now live.

LegalEase’s core search stack is in place:

  • Documents are parsed by Docling, chunked into multi-scale segments, and indexed with both dense embeddings and BM25 sparse vectors.
  • Qdrant handles named vectors so hybrid queries can combine multiple granularities (summary/section/microblock) in a single request.
  • Reciprocal Rank Fusion (RRF) is the default fusion strategy, balancing keyword hits against semantic context.
  • The search API returns detailed metadata (scores, chunk type, page number) so the UI and external clients can explain why a result ranked.
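Reciprocal Rank Fusion scores each result by its rank in every list rather than by its raw score. This means BM25 scores and cosine similarities never need to be put on a common scale. Qdrant performs this fusion server-side, and the sketch below only illustrates the formula score(d) = Σ 1 / (k + rank_i(d)). It uses k = 60, the value commonly used in the literature, as an assumption rather than LegalEase's configured value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs with RRF.

    Ranks are 1-based; a chunk missing from a list contributes nothing for it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; tie-break on ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc-a#p3", "doc-b#p1", "doc-c#p7"]
dense_hits = ["doc-c#p7", "doc-a#p3", "doc-d#p2"]
for chunk_id, score in reciprocal_rank_fusion([bm25_hits, dense_hits]):
    print(f"{chunk_id}: {score:.5f}")
```

Chunks that appear in both lists (`doc-a#p3`, `doc-c#p7`) rise to the top, even though neither list ranks them identically.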

This foundation underpins every other workflow: transcripts, forensic exports, and future analytics all reuse the same infrastructure.

Audio Transcription

WhisperX support with heuristic fallbacks and export formats.
Transcription Pipeline Updates (2024-09-10)

The transcription pipeline now runs end-to-end inside LegalEase. Upload audio or video, monitor the Celery task's progress, and receive transcripts with speaker labels and timestamped segments.

Highlights

  • WhisperX is the primary engine (CUDA and ROCm friendly). When it is not available, LegalEase falls back to the official Whisper API or lightweight heuristics.
  • Speaker diarisation uses Pyannote when an HF_TOKEN is configured and gracefully falls back to pause-based speaker detection otherwise.
  • Download transcripts as DOCX, SRT, VTT, TXT, or JSON from the dashboard.
  • Celery task status endpoints expose progress so you can build custom indicators in downstream tools.
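The engine fallback can be expressed as a priority list with an availability check at each step. This minimal sketch assumes the order WhisperX, then the Whisper API, then heuristics. It treats the API as opt-in because it is the only path that sends audio off-site. `EngineSpec`, `EngineUnavailable`, and `transcribe_with_fallback` are hypothetical names. The stub engines stand in for real WhisperX or Whisper API calls, which are not shown.

```python
from dataclasses import dataclass
from typing import Callable


class EngineUnavailable(Exception):
    """Raised by an engine that cannot run here (missing package, no GPU, no API key)."""


@dataclass
class EngineSpec:
    name: str
    run: Callable[[str], dict]  # audio path -> transcript dict
    remote: bool = False        # True if audio would leave your infrastructure


def transcribe_with_fallback(
    audio_path: str, engines: list[EngineSpec], allow_remote: bool = False
) -> dict:
    """Try engines in priority order; return the first result, tagged with its engine."""
    skipped: list[str] = []
    for spec in engines:
        if spec.remote and not allow_remote:
            skipped.append(f"{spec.name}: remote engines not enabled")
            continue
        try:
            result = spec.run(audio_path)
        except EngineUnavailable as exc:
            skipped.append(f"{spec.name}: {exc}")
            continue
        return {**result, "engine": spec.name}
    raise RuntimeError("no transcription engine succeeded: " + "; ".join(skipped))


# Stub engines standing in for real WhisperX / Whisper API / heuristic back ends.
def whisperx_stub(path: str) -> dict:
    raise EngineUnavailable("whisperx not installed")


def whisper_api_stub(path: str) -> dict:
    return {"segments": [], "note": "from remote API"}


def heuristic_stub(path: str) -> dict:
    return {"segments": [], "note": "from local heuristics"}


CHAIN = [
    EngineSpec("whisperx", whisperx_stub),
    EngineSpec("whisper_api", whisper_api_stub, remote=True),
    EngineSpec("heuristic", heuristic_stub),
]
print(transcribe_with_fallback("hearing.wav", CHAIN)["engine"])
```

Tagging each result with the engine that produced it makes it easy to surface lower-fidelity heuristic output differently in the dashboard.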

Notes

Actual throughput depends heavily on your hardware. Expect real-time or faster processing on modern GPUs and slower processing in CPU-only environments. No audio leaves your infrastructure unless you opt into the Whisper API fallback.

Speaker Diarization

Configurable diarisation ranges and smoother speaker changes.
Speaker Labelling Improvements (2024-09-12)

We refined how LegalEase labels speakers inside transcripts:

  • Configure minimum and maximum speaker counts when uploading a recording. This helps Pyannote converge faster on the right number of participants.
  • When Pyannote is unavailable, a heuristic diariser detects speaker switches based on pauses and merges overly short segments to reduce flicker.
  • The transcript viewer now colour-codes speakers consistently and lets you rename them inline; updates apply to all occurrences.
  • Speaker statistics (total talk time and contribution percentage) are calculated alongside summary generation.
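The pause heuristic and the speaker statistics can both be sketched over plain (start, end) speech segments. This minimal illustration assumes unlabeled segments and round-robin turn-taking. Its thresholds (1.0 s pause, 0.5 s minimum turn) are illustrative rather than LegalEase's defaults. `pause_diarise`, `speaker_stats`, and `Turn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def pause_diarise(
    segments: list[tuple[float, float]],
    pause_s: float = 1.0,
    min_turn_s: float = 0.5,
    num_speakers: int = 2,
) -> list[Turn]:
    """Assign speaker labels to unlabeled (start, end) speech segments.

    A gap of at least `pause_s` is treated as a speaker change, and speakers are
    assumed to take turns in round-robin order (a deliberate simplification).
    Turns shorter than `min_turn_s` are folded into the preceding turn to reduce flicker.
    """
    turns: list[Turn] = []
    speaker_idx = 0
    for start, end in sorted(segments):
        if turns and start - turns[-1].end >= pause_s:
            speaker_idx = (speaker_idx + 1) % num_speakers
        label = f"SPEAKER_{speaker_idx:02d}"
        if turns and turns[-1].speaker == label:
            turns[-1].end = max(turns[-1].end, end)  # same speaker keeps talking
        else:
            turns.append(Turn(label, start, end))

    merged: list[Turn] = []
    for turn in turns:
        too_short = turn.end - turn.start < min_turn_s
        if merged and (too_short or merged[-1].speaker == turn.speaker):
            # Absorb a blip, or join adjacent turns that now share a speaker.
            merged[-1].end = max(merged[-1].end, turn.end)
        else:
            merged.append(turn)
    return merged


def speaker_stats(turns: list[Turn]) -> dict[str, dict[str, float]]:
    """Total talk time per speaker and its share of all talk time, in percent."""
    totals: dict[str, float] = {}
    for t in turns:
        totals[t.speaker] = totals.get(t.speaker, 0.0) + (t.end - t.start)
    grand_total = sum(totals.values())
    return {
        speaker: {
            "talk_time_s": secs,
            "percent": 100.0 * secs / grand_total if grand_total else 0.0,
        }
        for speaker, secs in totals.items()
    }


speech = [(0.0, 2.0), (2.2, 4.0), (5.5, 8.0), (9.5, 9.7), (11.0, 14.0)]
labelled = pause_diarise(speech)
print([(t.speaker, t.start, t.end) for t in labelled])
print(speaker_stats(labelled))
```

The 0.2 s blip at 9.5 s is folded into the neighbouring turn instead of producing a one-word speaker change.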

These tweaks make multi-party hearings and interviews easier to review even without cloud-based diarisation services.

Built with Nuxt UI • LegalEase AI © 2025