“Bartleby, never mind then about revealing your history; but let me entreat you, as a friend, to comply as far as may be with the usages of this office. Say now you will help to examine papers to-morrow or next day: in short, say now that in a day or two you will begin to be a little reasonable:—say so, Bartleby.”
“At present I would prefer not to be a little reasonable,” was his mildly cadaverous reply.
Bartleby the Scrivener, Herman Melville
The basic idea of the Turing Test is this: is AI indistinguishable from a human, all else being equal? It’s not really a test of whether a machine is sentient, which would likely be impossible to meaningfully measure; like the definition of practicing law, the definition of sentience just gets fuzzier and more abstract the closer you look at it.
Computing has the Turing Test (and associated ideas like the Chinese Room) to determine whether or not AI is so human-like it could be considered sentient. Arguably, the Turing Test was passed back in 1991 by a mindless chat algorithm. But for legal, we’re left to wonder - at what point is AI “good enough?”
Granted, we’ve done studies, run tests, and seen AI mishaps, all while moving the goalposts around the field:
ChatGPT can get passing scores on law school exams, in 2022;
ChatGPT can’t pass the bar exam yet, in 2022;
ChatGPT can pass the bar exam, in 2023;
Maybe ChatGPT really wasn’t so good on the bar exam as advertised, in 2023;
Or maybe the bar exam is not the best way of predicting lawyer performance, in 2024;
Steven Schwartz misunderstands (an old version of) ChatGPT’s purpose and constraints, gets sanctioned;
Joshua Browder misjudges many things, including the desire of state bars to not have AI (or businesses that aren’t law firms) do legal things;
There’s more to this list but I’m going to spare you.
What’s still unclear is whether or not we’d allow, as a profession, someone to use AI to get legal help. If we’re not at that point yet, then at what point would we be ok with that? And to figure out whether or not we’re at that point, I think we need some type of Turing-style test that we can point to. This is kind of the point with the bar exam and lawyers - like it or loathe it, it’s a metric that we can point to and say “this person should be allowed to practice law because they got a score of ___.” Otherwise we’re just moving goalposts around.
Here’s a proposed test for whether or not AI is ready to help people in the real world, with real-world legal problems:
Is legal help from an AI model as good as, or better than, the help someone would otherwise typically get from another person?
Because alliteration is nice, we’ll call this test the “Bartleby Baseline.”
Some key terms here:
We’re focusing on “legal help,” not advice/information. Help is a more global idea. Focusing instead on legal advice vs. information risks getting us into a weird debate over metaphysics or ecumenicism or something like that. Non-lawyer humans give information (or advice maybe), lawyer humans give advice, AI gives both.
Another person, not a lawyer (although lawyers are people too). I don’t think our standard should be AI vs. Lawyer, mainly because, well, most people aren’t using lawyers to solve legal problems. More on this in a second.
What we’re not measuring is whether AI can give legal help that’s indistinguishable from help given by a human. Non-lawyers, like court staff, aren’t supposed to give “advice,” while AI doesn’t have that restriction in practical terms. So it’s not a fair comparison due to this constraint.
What’s the point of all this?
At some point, we’re going to have to just come out and admit that the Bartleby Baseline has been met - that AI is giving people good legal help, and that we have to accept that as something AI should be allowed to do.
There are a lot of lawyers who, if asked whether AI should be allowed to give people legal help, would say “no.” They hold on to a dichotomy of AI vs. Lawyer, and believe that if the AI isn’t 1000% perfect in every aspect, it won’t be as good as a lawyer. A point of personal privilege here: I’m a lawyer, and, speaking from my own experience (with myself and other lawyers), I’m not very smart or even in the same galaxy as 1000% perfect. Not many lawyers are.
In reality, however, the majority of people simply aren’t going to lawyers for legal help. From a study by The Hague Institute for Innovation of Law (HiiL), done pre-ChatGPT in 2021:
Less than 1/3 of people are getting their legal information/advice from a lawyer. We have to be clear-eyed about what the reality is if we ever want to get anywhere.
This is why I don’t think we should be measuring the efficacy of AI vs. Lawyer. This is not a measure of an AI’s legal reasoning, like LegalBench, or writing yet another chapter in the long-running debate of whether AI will replace lawyers. Rather, I believe the real question is this: is legal help from an AI model as good as or better than the help someone would otherwise typically get?
One big problem: to truly answer this, we’d need to benchmark the help that people are actually getting.
Where are people going for legal help?
As I mentioned above, less than 1/3 of people use a lawyer as a source of legal advice or information.
If we go by the numbers put out by the Legal Services Corporation back in 2017, the number goes lower for people classified as “low income” - they reported seeking the help of a legal professional for only 20% of their legal problems. In 2021, that number was largely unchanged at 19%.1
What kind of help are they getting?
Evaluating the help people get from sources like family and friends would be hugely difficult. Let’s look to the internet instead, since, according to HiiL, 1/3 of people use the internet as a source of help. Some examples:
r/LegalAdvice:
Some lawyers see this as a “hive of scum and villainy,” but it’s the go-to place for people helping other people with their legal problems. The unfortunate acronym IANAL (I Am Not A Lawyer) appears often.
Avvo:
A more moderated version of r/LegalAdvice, where real lawyers compete in sweaty desperation to raise their Avvo scores by answering legal questions with the most generic advice possible, so that more potential clients click on their profiles.
Attorney web sites:
Back in the before times, before ChatGPT put all lawyers out of business, a lot of small law firms were duped into believing they could get new clients by using the magic and arcane art known as Search Engine Optimization, or SEO for short. This resulted in a lot of lawyers writing a lot of blog posts about anything and everything related to their law practice, in the hopes that someone searching the internet for “Best DUI Attorney Fort Worth” would read their blog post and decide to hire them. I’m exaggerating a bit here, but when I was running a small firm in private practice, we actually did get a number of clients who found blog posts we’d written about things like the risks to commercial driver’s licenses from criminal cases.
But while SEO may remain a dark art, and may be even less relevant in our age of AI Overviews, it did result in a huge amount of content where lawyers explain the law, all nicely meta-tagged for particular jurisdictions.
LawHelp sites:
Nearly all U.S. states and territories have a legal help portal site, such as LawHelpNY.org or GeorgiaLegalAid.org, that is typically run and maintained by that state’s legal aid organizations. These sites combine self-help resources with a smattering of online legal forms, and sometimes even a “LiveHelp Chat” feature. Disclaimer here - I (currently) work for a company that creates and runs a large number of these.
An example:
Here’s what a typical request and response from a LiveHelp chat program on a LawHelp site looks like:
Visitor: I [have this problem, such as job loss and inability to make child support payments]
Volunteer: I am sorry that you are dealing with this. [Here are some links to relevant resources].
Visitor: thanks
But if we feed the same type of thing into ChatGPT o3 (with web search), like “I lost my job and can’t make child support payments”:
In short: Florida law lets you ask for a lower child-support amount when you lose your job, but the court (or the Department of Revenue’s Child Support Program) will not reduce what you owe until you formally file for a modification and prove your loss of income is substantial, involuntary, and likely to last. Because your job loss has already lasted six months, you meet the “permanent” criterion in most cases, and if the new guideline amount would differ by at least 15 % or $50 a month the court can grant relief. Until a new order is entered, the old amount keeps running and any unpaid balance becomes enforceable arrears, so moving quickly is essential. Below are the concrete steps, resources, and cautions specific to Hillsborough County/Tampa and Florida generally.
1. Act right away—even partial payments help
What to do: Keep paying what you realistically can.
Why it matters: Florida automatically records arrears; interest accrues and can trigger license or registration suspensions once you fall behind. Florida Dept. of Revenue
What to do: Tell the other parent, in writing, that you are filing for a modification.
Why it matters: Open communication sometimes prevents enforcement actions while your case is under review (though this is not legally guaranteed).
2. How Florida decides whether to lower support
Statutory threshold
A court can modify an order when the difference between the current amount and the new guideline figure is ≥ 15% or $50 (if the order is < 3 years old) and the change is substantial, involuntary, and permanent. Florida Senate; Florida Dept. of Revenue
Permanence & involuntariness
Extended unemployment through no fault of your own is specifically listed as an “involuntary” change. Florida Dept. of Revenue
Voluntary quits, terminations for misconduct, or a self-chosen lower-paying job do not qualify. Florida Dept. of Revenue
3. Two ways to request the change
A. Administrative review (free)
Log in to Florida Child Support eServices and ask for a “review for modification.” Florida Dept. of Revenue
Upload proof of job loss (termination letter, unemployment award, recent tax return).
DOR staff recalculate guidelines, draft a new proposed order, and serve both parents. If either parent objects, the case is sent to circuit court.
Typical timeline: about six months from request to new order. Florida Dept. of Revenue
B. Circuit-court petition (faster but has a filing fee)
Download Florida Supreme Court Form 12.905(b) Supplemental Petition for Modification of Child Support. flcourts.org
Complete the financial affidavit that must accompany every petition. Florida Senate
File in the Hillsborough County Clerk’s office (Family Law).
Serve the other parent, attend any mediation or hearing, and bring documentation of income efforts.
Important: A modification is only retroactive to the date you file; it cannot erase arrears that build up earlier. flcourts.org; Florida Senate
4. What happens if you do nothing
5. Income and job-search supports
Reemployment Assistance: Apply online through Florida’s Reconnect portal as soon as possible; eligibility generally extends up to 12 weeks of benefits and re-employment services. Florida Jobs
WorkSource Tampa Bay offers résumé help, training grants, and job fairs (free).
While you receive UI, those benefits …
(I cut off the response because, frankly, it’s pretty long. Real happy for you though, or sorry that happened.)
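As an aside, the numeric part of the modification threshold the response cites (a difference of at least 15% or $50 a month) is concrete enough to sketch in a few lines. This is illustrative only; as the response itself notes, actual eligibility also turns on the change being substantial, involuntary, and permanent:

```python
def meets_numeric_threshold(current_monthly: float, new_guideline_monthly: float) -> bool:
    """Sketch of the 15%-or-$50 difference test quoted above.

    Illustrative only: real eligibility also requires the change in
    circumstances to be substantial, involuntary, and permanent.
    """
    diff = abs(current_monthly - new_guideline_monthly)
    # Either an absolute gap of $50/month or a 15% relative gap clears the bar.
    return diff >= 50 or diff >= 0.15 * current_monthly
```

So an order dropping from $500 to $400 a month would clear the threshold, while one dropping from $500 to $480 would not.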
Obviously the LiveHelp example chat is way shorter than the response from ChatGPT. But there are some other notable differences:
Legal advice vs. links to resources:
LiveHelp programs are staffed by non-lawyer volunteers, who are specifically trained and instructed to not give legal advice, but instead to use canned responses and then give links to relevant resources.
ChatGPT doesn’t have that instruction (I suppose you could call that background prompting), and is instead able to be more creative with its response. Maybe “creative” is the wrong term - comprehensive might be better. For example, ChatGPT is thinking about consequences to not making child support payments like the state Department of Transportation suspending a CDL, and giving me a heads up about that.
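The canned-response approach can be caricatured as a keyword lookup. This is a hypothetical sketch, not how any real LiveHelp tool is built, and the topics and link placeholders are invented for illustration:

```python
# Hypothetical sketch of a canned-response flow; topics and links are invented.
CANNED_RESOURCES = {
    "child support": "<link to child support modification resources>",
    "eviction": "<link to tenant rights resources>",
}

def livehelp_reply(message: str) -> str:
    """Match the visitor's message to a known topic and return a canned reply."""
    for topic, link in CANNED_RESOURCES.items():
        if topic in message.lower():
            return f"I am sorry that you are dealing with this. {link}"
    # Fallback when no topic matches.
    return "I am sorry that you are dealing with this. Here is our help page."
```

Sympathy line, link, done. A generative model, by contrast, composes the whole response from whatever it finds, which is where both the comprehensiveness and the risk come from.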
Context:
LiveHelp chats exist inside a legal context - they’re on a website that’s pretty much all about legal issues, and the volunteers can assume that if someone is asking them a question it has to do with the legal aspects of not being able to make child support payments.
ChatGPT is context-less; for this example I didn’t do any prompting at all, just fed in the statement. It was able to discern that there’s a legal context and provide advice about it, rather than saying something like “wow that’s crazy, here’s a link to Indeed.com, you deadbeat.”
Chain of thought vs. canned response:
What’s really interesting to me is that instead of just giving me a canned response so that it won’t be perceived as “practicing law” (whatever that means), ChatGPT actually goes out to the internet, finds relevant information based on my location, and synthesizes it into something it thinks I’d be able to read and digest.
Here’s ChatGPT’s thinking process (I redacted the list of sites it searched):
I need to gather relevant information to help with child support modification in Florida. Key sources include Florida Department of Revenue guidelines, family law forms, child support modification thresholds, and relevant state and federal rules. I'll also need info on self-help resources, legal aid in Tampa, Florida unemployment benefits, and job search assistance programs. Once I gather this, I’ll compile at least 10 citations for clarity and support.
Searched the web…
I have gathered at least 10 sources. Now, I need to summarize the steps for dealing with the inability to pay child support. These steps should include filing a modification request, applying for unemployment, communicating with the agency, making partial payments, requesting an abatement, maintaining thorough documentation, and considering consultation with legal aid. Let's put all these steps together in a succinct and clear response for the user.
I’ll organize the response into key sections to make everything clearer:
Immediate actions such as contacting the Florida Child Support Program and continuing to pay what I can.
Steps for filing a modification request, including …
and so forth.
I’d argue that the ChatGPT response is as good or better than the typical LiveHelp response. Maybe we’ve already blown by the Bartleby Baseline (alliteration is great, right?).
Problems with this approach:
As far as standards go, it may not be that the Bartleby Baseline is hard for AI to meet; rather, it’s a hard standard for us to measure, at least in terms of knowing what help people are actually getting out there if not from a lawyer. That’s an us problem, not an AI problem.
There are other practical considerations I’m not getting into here. A non-exhaustive list:
There’s no guarantee that what you ask an AI model will stay confidential. AI companies don’t have a duty of confidentiality to their users, the way lawyers do. But we’re not comparing AI to lawyers here - does your friend or family member have a duty of confidentiality? Well, no, but by the same token (pun intended), they also aren’t billion-dollar companies that mine user data for profit.
Lawyers can be sued for bad advice. Again, we’re not talking about AI vs. Lawyers, but it’s still worth considering that AI, at least so far, hasn’t been successfully sued for giving bad advice, although a couple of cases are testing this theory. Granted, lawyers are hardly ever successfully sued for bad advice either, mainly because causation is really hard to prove.
Probably others I’m not thinking about.
What comes next?
Next, I’d like to work out a way of actually scoring things for the Bartleby Baseline. That way we can maybe answer the question: have we passed it already? For starters, Margaret Hagan has a framework here that would serve as a set of ranking factors.
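To make “scoring” concrete, here is one purely hypothetical shape it could take: rate each response on a handful of factors and compare averages. The factor names below are placeholders I made up, not Margaret Hagan’s actual framework:

```python
from statistics import mean

# Placeholder ranking factors -- invented for illustration, not Hagan's framework.
FACTORS = ["accuracy", "actionability", "jurisdiction_fit", "readability"]

def score(ratings: dict[str, float]) -> float:
    """Average a response's 1-5 ratings across all factors."""
    return mean(ratings[f] for f in FACTORS)

def passes_bartleby_baseline(ai: dict[str, float], typical_help: dict[str, float]) -> bool:
    """AI passes if it scores at least as well as the help people typically get."""
    return score(ai) >= score(typical_help)
```

The hard part, of course, isn’t the arithmetic; it’s getting honest ratings for the “typical help” side of the comparison.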
Maybe we need a different metric? If you have thoughts about how to measure AI for giving legal help I’d love to hear them. I think that debate is more productive than saying “I’d prefer not to.”
1. There is a difference in those metrics though - the HiiL study was across all income brackets, while the LSC study was only of low-income Americans.