Completely Offline
Vexis runs entirely on your device — laptop, tablet, or phone. No internet connection required. Client communications, case strategy, and confidential documents never touch a server.
A multiplication-free, 1.58-bit inference engine running 10B-parameter models on mid-range phones — under 1.3 GB RAM, weather-app energy, zero cloud dependency.
2,400+ legal professionals on the waitlist · Launching Q3 2026
Built for legal professionals
Every feature was designed with legal confidentiality, accuracy, and workflow in mind — not adapted from a general-purpose chatbot.
All processing happens on your device, whether laptop, tablet, or phone, with no internet connection required. Client communications, case strategy, and confidential documents never touch a server.
Draft petitions, plaints, writ petitions, legal notices, and replies in minutes. Vexis understands jurisdiction-specific formats and suggests precise legal language tailored to your facts.
Ask Vexis to find relevant precedents, summarise landmark judgements, or explain statutory provisions in plain language — instantly, without billing research time to a database subscription.
Attorney-client privilege cannot be waived to an AI that never receives your data. Vexis processes everything locally — no transmission means no disclosure risk, no third-party terms to worry about.
Vexis fits naturally into the way you already work — no new workflow, no learning curve, no cloud account.
Open Vexis on your device and load your case files, client instructions, or statutory references. Everything stays local — nothing is uploaded, synced, or stored externally.
ExecuTorch dispatches POPCNT/XOR kernels to the NPU or a SIMD-capable CPU. Vulkan compute shaders accelerate Adreno GPUs by 2–3×. KeyDiff evicts stale KV entries between every token.
Vexis returns a structured draft or analysis for your review. Iterate with follow-up instructions, then export to Word or PDF. Your edits stay on your device — always under your control.
From solo practitioners to large chambers — here are the tasks legal professionals do with Vexis every day.
Generate a full first draft of a writ petition, civil plaint, or criminal complaint based on your facts. Vexis structures the document, inserts the correct legal provisions, and uses jurisdiction-appropriate language.
"Draft a writ petition under Article 226 for wrongful termination of a government employee — include grounds of natural justice violation."
Paste a contract and ask Vexis to identify unfair clauses, missing standard protections, limitation issues, or jurisdiction problems. Get a structured risk summary in seconds, fully offline.
"Review this non-disclosure agreement and flag any clauses that are overly broad or unenforceable under Indian contract law."
Draft Section 80 notices, demand letters, cease-and-desist letters, and formal replies without starting from a blank page. Vexis populates the statutory language and formats it correctly for service.
"Draft a legal notice under Section 138 of the Negotiable Instruments Act for a dishonoured cheque of ₹4,50,000."
Build a detailed bail application with tailored grounds — antecedents, custodial necessity, flight risk arguments — formatted for the relevant court. Prepared in minutes, reviewed in seconds.
"Prepare a bail application for a first-time offender charged under IPC 420, emphasising roots in the community and cooperation with investigation."
Paste a judgement — however long — and ask Vexis to extract the ratio decidendi, obiter dicta, key holdings, and relevant facts. Ideal for quickly getting across an unfamiliar area of law before a hearing.
"Summarise the ratio in Maneka Gandhi v. Union of India and explain its relevance to Article 21 jurisprudence."
Structure your arguments, organise precedents, anticipate counter-arguments, and draft a written brief — all grounded in the facts and law you provide. Vexis never invents citations it can't verify.
"Prepare arguments for a Section 9 Arbitration application to obtain an interim injunction restraining sale of disputed property."
Generate affidavits for court filings, statutory declarations for regulatory submissions, or sworn statements — in the correct format, with appropriate jurat language.
"Draft an affidavit of service confirming personal service of summons on the defendant on a given date and location."
Turn complex legal analysis into a clear, plain-language advice note your client will actually understand. Vexis can shift register from technical legal writing to plain English and back in the same session.
"Explain the consequences of this penalty clause to a client with no legal background, in under 200 words."
Paste a section of legislation and ask Vexis to interpret its scope, identify ambiguities, map its interaction with related provisions, or explain its practical effect in a specific factual scenario.
"Explain how Section 29A of the Insolvency and Bankruptcy Code applies to a promoter who is also a creditor of the corporate debtor."
Privacy-first AI demands uncompromising leadership. Meet the team architecting the future of confidential intelligence.
Sets the product vision: a 10B-parameter model running on a phone with weather-app energy. Leads strategy, investor engagement, and the cross-functional push from MVP to SDK launch in eight months.
Architect of the BitNet-KAN backbone and the SAR verification loop. Leads research across ternary quantization, KAN integration, NanoQuant compression, and KeyDiff-based KV eviction strategies.
Owns the silicon-to-software seam. Drives ExecuTorch delegate work, bitnet.cpp SIMD kernel tuning across NEON and AVX2, Vulkan compute shaders for Adreno, and HW-NAS co-optimization with target chipset families.
Vexis is currently in private beta for legal professionals. Join the waitlist to be among the first lawyers, barristers, and law firms with access to fully private, on-device legal AI — launching Q3 2026.
We'll be in touch before the public launch with your early access link and the benchmark deck.
Every weight in the BitNet b1.58 backbone is ternary: −1, 0, or +1. A weight of +1 is a copy, 0 is a skip, −1 is a negation. The CPU's multiply unit idles entirely; matrix multiplication becomes POPCNT (population count of set bits) and XOR — operations that draw a fraction of the energy of a standard FMA pipeline. This is why Vexis hits a 1.5–3 W active power envelope on mid-range silicon.
Two compression layers. NanoQuant decomposes weight matrices as W ≈ α·(B₁ · B₂ᵀ) — binary factors with a learned scale vector — bringing 10B ternary weights from ~1.975 GB down to ~820 MB at 0.082 bytes per parameter. TurboQuant then applies 3-bit per-head quantization to the KV cache, and KeyDiff evicts low-relevance entries token-by-token. Total budget: ~1.17 GB with 130 MB headroom.
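The memory figures follow directly from the stated bits- and bytes-per-parameter; a quick sanity check of the budget:

```python
PARAMS = 10_000_000_000                   # 10B-parameter backbone

ternary_gb = PARAMS * 1.58 / 8 / 1e9      # 1.58 bits/weight  -> ~1.975 GB
nanoquant_gb = PARAMS * 0.082 / 1e9       # 0.082 bytes/param -> ~0.82 GB
headroom_gb = 1.3 - 1.17                  # 1.3 GB device target minus total usage

print(f"{ternary_gb:.3f} GB -> {nanoquant_gb:.2f} GB, "
      f"headroom {headroom_gb * 1000:.0f} MB")
```

The remaining ~350 MB of the 1.17 GB total covers the quantized KV cache, activations, and runtime overhead, which is what TurboQuant's 3-bit cache and KeyDiff eviction keep bounded.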
Mid-range Android devices sustain 5–7 tok/s, rising to 7–10 tok/s on stronger chipsets; flagship devices reach 12–18 tok/s with 8K context. ExecuTorch dispatches to the NPU via QNN, ANE, or APU delegates; SIMD kernels cover ARM NEON, x86 AVX2, and RISC-V Vector. Vulkan compute shaders accelerate Adreno GPUs by 2–3×.
Self-Assessment & Retry. A distilled 300M verifier classifies each response into one of five domains — Code, Math, Legal, Financial, or General — then generates 3–5 rubric questions and grades the candidate output against them. Failed checks trigger a retry at temperature 0.2 (max 3 attempts). It costs 120–200 ms per response but lifts TruthfulQA by +3.8% over the FP16 baseline. Trustworthiness over raw speed is a deliberate product decision.
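The control flow of the loop is simple. A minimal sketch — the function names, the first-pass temperature, and the verifier interface are assumptions for illustration, not the production API:

```python
def sar_respond(prompt, generate, verify, max_attempts=3, retry_temp=0.2):
    """Self-Assessment & Retry: generate, grade, retry failures at low temperature.

    `generate(prompt, temperature)` stands in for the 10B model;
    `verify(prompt, candidate)` stands in for the distilled 300M verifier
    that builds 3-5 rubric questions and grades the candidate against them.
    """
    temperature = 0.7  # assumed default sampling temperature for the first pass
    candidate = None
    for _ in range(max_attempts):
        candidate = generate(prompt, temperature=temperature)
        if verify(prompt, candidate):
            return candidate          # passed its own rubric
        temperature = retry_temp      # retries run conservatively at 0.2
    return candidate                  # best effort after max attempts
```

The 120–200 ms cost quoted above is the verifier pass itself; retries add a full generation on top, which is why they are capped at three.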
Three sovereignty levels. L0 (Air-Gap) ships zero network traffic — TC-38 verified via Wireshark, suitable for healthcare and enterprise. L1 (Signal-Only) transmits binary quality signals only, no text. L2 (Opt-in Snippet) sends NER PII-scrubbed snippets through a manual review dashboard. Default consumer mode is L1. Memory is local to your device across three tiers: profile.txt, memory.txt, and daily_log.txt.
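As a data-handling rule, the three levels reduce to one question: does any text ever leave the device? A sketch of that policy — the enum and function names here are mine, not the SDK's:

```python
from enum import Enum

class Sovereignty(Enum):
    L0_AIR_GAP = 0         # zero network traffic (Wireshark-verified)
    L1_SIGNAL_ONLY = 1     # binary quality signals only -- the consumer default
    L2_OPT_IN_SNIPPET = 2  # NER PII-scrubbed snippets, via manual review

DEFAULT = Sovereignty.L1_SIGNAL_ONLY

def may_transmit_text(level: Sovereignty) -> bool:
    # Only L2 ever sends text off-device, and only after scrubbing and opt-in.
    return level is Sovereignty.L2_OPT_IN_SNIPPET
```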
BitNet b1.58 alone holds ~96.2% of FP16 accuracy. Kolmogorov-Arnold Network layers — replacing the transformer's MLP feed-forward blocks — close most of the remaining gap by placing learnable univariate spline activations on edges rather than fixed activations on nodes. The combination hits ~98.5% of FP16 accuracy. We use the KAT variant with grouped KANs and rational bases for NPU efficiency, with Wav-KAN (wavelet) as a 1.2–1.8× speed alternative.
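The edge-activation idea is compact: instead of a fixed nonlinearity on each node, every edge carries its own small learnable univariate function. A toy rational edge in the KAT spirit — the coefficients are placeholders for learned parameters, and the safe-denominator form is one common pole-free variant, not necessarily the exact basis Vexis uses:

```python
def rational_edge(x: float, p: list, q: list) -> float:
    """phi(x) = P(x) / (1 + |Q(x)|): a learnable univariate rational activation.

    p: numerator coefficients, lowest degree first.
    q: denominator coefficients for degrees 1..len(q); the 1 + |.| form
    guarantees the denominator never vanishes.
    """
    num = sum(c * x**i for i, c in enumerate(p))
    den = 1.0 + abs(sum(c * x**(i + 1) for i, c in enumerate(q)))
    return num / den

# With p=[0, 1] and q=[], phi is the identity: phi(2.0) = 2.0
```

Rational bases matter for the NPU because they evaluate with a handful of adds and one divide, unlike per-edge spline lookups.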
Yes. Sub-layer LoRA rank-8 adapters with Straight-Through Estimator gradients let you fine-tune the ternary backbone on-device or server-side. Each transformer block keeps a frozen ternary base plus a trainable FP16 adapter — ~120 MB total for the full 10B model. Training reaches 98% of full fine-tune accuracy at rank-8, completing in under 30 minutes on-device.
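Structurally, each adapted layer computes y = W_q·x + α·B(A·x), where W_q is the frozen ternary base and A, B are the trainable rank-r FP16 factors; the STE only matters on the backward pass, where the ternary quantizer is treated as identity. A minimal forward sketch, assuming that standard LoRA layout:

```python
def lora_forward(x, W_q, A, B, alpha=1.0):
    """y = W_q x + alpha * B (A x).

    W_q: frozen ternary base, rows over {-1, 0, +1}.
    A (r x d_in) and B (d_out x r): trainable FP16 adapter, r = 8 in Vexis
    (rank 1 in the toy example below).
    """
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]
    return [base + alpha * low
            for base, low in zip(matvec(W_q, x), matvec(B, matvec(A, x)))]

# Base output [-1, 3] plus rank-1 adapter correction [1, 1]:
y = lora_forward([2.0, 3.0], W_q=[[1, -1], [0, 1]], A=[[1.0, 0.0]], B=[[0.5], [0.5]])
print(y)  # [0.0, 4.0]
```

Because only A and B receive gradients, the ternary base never changes on-device, which is what keeps the trainable state down to ~120 MB.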
The public API exposes VexisEngine.init(), chat(), setMemoryTier(), setSovereigntyLevel(), and registerAdapter(). Internal primitives — inference dispatch, the SAR verifier, KeyDiff eviction — are encapsulated. Ships with developer documentation and integration guides as part of the Phase 6 launch package.
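The documented surface suggests a call sequence like the following. This is a stand-in stub, not the real SDK: the method names come from the list above, but the signatures, argument types, defaults, and return values are all assumptions:

```python
class VexisEngine:
    """Stub mirroring the public surface; internals (dispatch, SAR, KeyDiff) stay hidden."""

    def init(self, model_path: str):
        self.model_path = model_path
        self.level, self.tier, self.adapters = "L1", "memory.txt", []  # assumed defaults
        return self

    def setSovereigntyLevel(self, level: str): self.level = level
    def setMemoryTier(self, tier: str): self.tier = tier
    def registerAdapter(self, adapter_id: str): self.adapters.append(adapter_id)

    def chat(self, prompt: str) -> str:
        # The real engine runs on-device inference; the stub just echoes.
        return f"[{self.level}] draft for: {prompt}"

engine = VexisEngine().init("vexis-10b.bin")       # hypothetical model path
engine.setSovereigntyLevel("L0")                   # air-gap mode for privileged matters
engine.registerAdapter("litigation-rank8")         # hypothetical adapter id
reply = engine.chat("Draft a Section 80 notice to a municipal corporation")
```

Keeping the inference dispatch and verifier behind this surface means integrators configure privacy and memory policy without ever touching the model runtime.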