When a legal AI vendor says 'we take security seriously,' that's marketing. When they hand you a completed SOC 2 Type II report, data processing agreement, and written privilege preservation analysis without being asked — that's evidence.

Most law firms and legal departments ask the same 10 surface-level security questions and call it due diligence. They check the SOC 2 box and move on. The 52 questions in this framework cover the areas that actually matter for legal: privilege preservation, data residency, model training exclusions, and the specific risks that make legal AI different from every other enterprise AI deployment.


Infrastructure and Encryption (12 Questions)

Start with the foundation.

1. Where is our data stored? List all data centers, regions, and cloud providers.
2. Is data encrypted at rest? What encryption standard (AES-256 minimum)?
3. Is data encrypted in transit? What protocol (TLS 1.2+ minimum)?
4. Who holds the encryption keys? Can we bring our own keys (BYOK)?
5. Do you operate a single-tenant or multi-tenant architecture? If multi-tenant, how is our data logically separated?
6. What is your disaster recovery plan? RTO and RPO targets?
7. Where are backups stored, and how are they encrypted?
8. What was your actual uptime over the past 12 months (not SLA target)?
9. Have you experienced any data breaches in the past 3 years? If so, describe the incident, impact, and remediation.
10. What third-party subprocessors have access to our data? List all of them.
11. Do you conduct annual penetration testing by an independent third party? Share the most recent executive summary.
12. What is your vulnerability management process and patch timeline for critical vulnerabilities?

These questions establish whether the vendor's infrastructure meets the baseline for handling confidential legal data. Any resistance to answering them is a disqualifying signal.
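A vendor's answer to question 3 can be spot-checked rather than taken on faith. The following is a minimal sketch, not a full assessment: the function names are illustrative, it uses only Python's standard `ssl` and `socket` modules, and it reports only the protocol your own client negotiated, which does not prove that weaker protocols are disabled on the server.

```python
import socket
import ssl

# Protocol names as reported by ssl.SSLSocket.version(), weakest to strongest.
_TLS_ORDER = ["SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"]


def meets_minimum(negotiated: str, minimum: str = "TLSv1.2") -> bool:
    """Return True if the negotiated protocol is at least `minimum`."""
    if negotiated not in _TLS_ORDER:
        return False  # unknown or missing protocol: treat as failing
    return _TLS_ORDER.index(negotiated) >= _TLS_ORDER.index(minimum)


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the protocol version actually negotiated."""
    context = ssl.create_default_context()  # verifies the certificate chain and hostname
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version() or "unknown"
```

To use it, pass the hostname the vendor gives you for its API or web application, for example `meets_minimum(negotiated_tls_version("vendor.example.com"))`, where the hostname is a placeholder. On recent Python versions the default context may already refuse older protocols, in which case a non-compliant server shows up as an `ssl.SSLError` during the handshake rather than as a low version string.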

Data Privacy and Model Training (10 Questions)

This is where legal AI security diverges from general enterprise security.

13. Is our data used to train, fine-tune, or improve your AI models? Provide a written commitment.
14. Can we opt out of all model training with our data? Is this contractually guaranteed?
15. What data do you collect about our usage patterns, queries, and interactions?
16. Who at your organization can access our data, under what circumstances, and with what authorization?
17. What is your data retention policy? How long is our data stored after we stop using the platform?
18. What is the data deletion process upon contract termination? Timeline and verification method?
19. Do you comply with GDPR, CCPA, and other applicable data protection regulations? Provide documentation.
20. How do you handle data subject access requests related to our data?
21. If your AI processes data in a different jurisdiction than where it's stored, what cross-border data transfer mechanisms are in place?
22. Can we specify data residency requirements (e.g., US-only processing and storage)?

Question 13 is the most important in this section. If a vendor uses your confidential legal data to improve their models, even anonymized, you have a potential privilege waiver and confidentiality breach. Get the answer in writing, in the contract, not just in a sales conversation.

Attorney-Client Privilege Preservation (8 Questions)

This section is unique to legal AI and most vendors aren't prepared for it.

23. Has your organization obtained a legal opinion on whether attorney-client privilege is maintained when privileged information is processed through your system? Provide it.
24. Does your architecture support ethical walls (information barriers) between different client matters?
25. Can we configure access controls so that only authorized attorneys see specific matter data?
26. If a third party (including your employees) can access our data for support or maintenance, does that constitute a privilege waiver? What's your legal analysis?
27. In the event of a subpoena or legal hold directed at your company, how do you handle our data? What notification do you provide?
28. Does your system log which users accessed which data and when? Can we audit these logs?
29. How do you handle inadvertent disclosure of privileged information within your platform?
30. Can we segregate privileged and non-privileged data within your system?

Judge Rakoff's February 2026 ruling in United States v. Heppner, which held that documents created using a free AI chatbot weren't protected by attorney-client privilege, made these questions urgent. If your vendor can't demonstrate privilege preservation with legal analysis, not just marketing assurances, you're taking a risk the ethics rules don't support.

Access Controls and Authentication (10 Questions)

31. Do you support SSO integration with our identity provider (Okta, Azure AD, etc.)?
32. Do you support multi-factor authentication? Is it required or optional?
33. What role-based access control (RBAC) options are available? Can we define custom roles?
34. Can we enforce IP allowlisting to restrict access to our corporate network?
35. How are user accounts deprovisioned when someone leaves our organization? Is this automated via SCIM?
36. What is your password policy? Do you support our organization's password requirements?
37. Do you maintain audit logs of all user actions? How long are logs retained? Can we export them?
38. Can we configure automatic session timeouts?
39. How do you handle API authentication and authorization for integrations?
40. Do you support conditional access policies (device compliance, location-based restrictions)?

For law firms, questions 33 and 37 are critical. RBAC ensures that associates on Matter A can't see data from Matter B. Audit logs prove it. Without both, you can't demonstrate the information governance controls that clients increasingly require in their outside counsel guidelines.
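To make the pairing of questions 33 and 37 concrete, here is a minimal sketch of matter-level access control where every access attempt, allowed or denied, is written to an audit log. All class, method, and identifier names are hypothetical and chosen for illustration; a real vendor's model will be more elaborate, but this is the behavior worth asking them to demonstrate.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    timestamp: datetime
    user: str
    matter: str
    allowed: bool


@dataclass
class MatterAccessControl:
    # Maps each matter ID to the set of users assigned to it.
    assignments: dict[str, set[str]] = field(default_factory=dict)
    audit_log: list[AuditEntry] = field(default_factory=list)

    def assign(self, user: str, matter: str) -> None:
        self.assignments.setdefault(matter, set()).add(user)

    def can_access(self, user: str, matter: str) -> bool:
        """Check access and record the attempt, whether or not it succeeds."""
        allowed = user in self.assignments.get(matter, set())
        self.audit_log.append(
            AuditEntry(datetime.now(timezone.utc), user, matter, allowed)
        )
        return allowed

    def denied_attempts(self) -> list[AuditEntry]:
        """Return the logged attempts that were refused."""
        return [entry for entry in self.audit_log if not entry.allowed]


acl = MatterAccessControl()
acl.assign("associate_a", "matter-A")
acl.can_access("associate_a", "matter-A")  # assigned to this matter
acl.can_access("associate_a", "matter-B")  # not assigned; refused and logged
```

The design point is that the denial and the log entry come from the same code path. A system that enforces the wall but records only successful access cannot show a client that a cross-matter attempt was ever blocked.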

Compliance Certifications and Incident Response (12 Questions)

41. Do you have a current SOC 2 Type II report? Provide the most recent one.
42. Do you hold ISO 27001 certification?
43. Are you FedRAMP authorized (if applicable for government-adjacent work)?
44. What compliance frameworks do you align with beyond SOC 2 (NIST CSF, CIS Controls, etc.)?
45. What is your incident response plan? Provide documentation.
46. What is your notification timeline for security incidents affecting our data? (Contractual commitment, not just policy.)
47. Do you carry cyber insurance? What are the coverage limits?
48. How do you monitor for unauthorized access or anomalous activity?
49. Do you conduct regular security awareness training for all employees?
50. What is your secure software development lifecycle (SSDLC) process?
51. How do you ensure the security of your AI model supply chain (pre-trained models, training data sources)?
52. Do you have a responsible disclosure/bug bounty program?

Don't accept a vendor's word that they're SOC 2 compliant. Ask for the actual report. SOC 2 is an attestation issued by an independent auditor, not a certification, so the report itself is the evidence. A SOC 2 Type I report means the vendor had controls in place at a point in time. A SOC 2 Type II report means those controls were tested over a period (usually 6 to 12 months). Type II is the minimum acceptable standard for legal AI vendors handling confidential data.

The Bottom Line: This framework asks 52 questions across five categories (infrastructure and encryption, data privacy and model training, privilege preservation, access controls, and compliance and incident response). Three matter most for legal. Does your data train their models? (Get it in writing.) Is privilege preserved? (Get a legal opinion, not marketing.) What happens in a subpoena? (Get the notification process documented.)

Send this questionnaire before you sign anything. The vendors who answer completely and quickly are the ones worth working with. The ones who stall or give vague answers are telling you everything you need to know.
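Tracking which answers have actually come back is easier with a small script than with a spreadsheet tab nobody updates. The sketch below is one way to do it. The question ranges come from the section headings above; mapping the three priority questions to numbers 13, 23, and 27 is this sketch's own reading of the summary, and the function names are illustrative.

```python
# Question numbers per category, taken from the section headings.
CATEGORIES = {
    "Infrastructure and Encryption": range(1, 13),
    "Data Privacy and Model Training": range(13, 23),
    "Attorney-Client Privilege Preservation": range(23, 31),
    "Access Controls and Authentication": range(31, 41),
    "Compliance Certifications and Incident Response": range(41, 53),
}

# Assumed mapping: model training (13), privilege opinion (23), subpoena handling (27).
PRIORITY_QUESTIONS = {13, 23, 27}


def outstanding(answered: set[int]) -> dict[str, list[int]]:
    """Return unanswered question numbers, grouped by category."""
    gaps = {}
    for name, numbers in CATEGORIES.items():
        missing = [n for n in numbers if n not in answered]
        if missing:
            gaps[name] = missing
    return gaps


def priority_gaps(answered: set[int]) -> list[int]:
    """Return the priority questions that still lack a written answer."""
    return sorted(PRIORITY_QUESTIONS - answered)


# The category sizes add up to the 52 questions in the framework.
total_questions = sum(len(numbers) for numbers in CATEGORIES.values())
```

A vendor who has answered everything except question 27 would show up as `outstanding(...)` returning a single entry under the privilege category and `priority_gaps(...)` returning `[27]`, which is a reasonable point at which to hold the signature.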

AI-Assisted Research. This piece was researched and written with AI assistance, reviewed and edited by Manu Ayala.