Writing a JSON Parser, part 3
Let's continue writing a JSON parser...
Handling Errors
While the book covers the basics of error handling, it wasn't until I read Andrew Gallant's article that I started to appreciate what proper error handling would take in practice. Up until then, I just called unwrap() on my Options and Results and avoided giving my programs malformed input.
Not this time! For my JSON parser, I want to handle errors the way I might in a real application. I haven't had much practice, so it's a great excuse.
What kind of errors?
In some sense, the only error that a JSON parser can have is malformed input. You asked it to parse some json, but the "json" you supposedly gave it was no good. But how was it broken? Was it missing a colon somewhere? Did it have an extra trailing comma (which for some reason isn't allowed in json)?
Thinking through things a little more, here are the errors that I can imagine off the top of my head:
- Some unknown identifier: fasle instead of false
- Some invalid token: : after a :
- Some unfinished object, array, or string: {"foo": , [1,2,3 , or "foo
- An object has two members with the same name: { "foo": 1, "foo": 2 }
- Too many values at the root: {"foo": 123 } null
- A number that couldn't be parsed: 127.0.0.1
- Input data that is not valid utf8.
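To make that list concrete, here is a minimal sketch of how it might map onto a Rust enum, one variant per bullet. The name ErrorKind, the variant names, and the message wording are all placeholders I made up for illustration, not a settled design.

```rust
use std::fmt;

/// One variant per failure from the list above.
/// Every name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub enum ErrorKind {
    /// e.g. `fasle` instead of `false`
    UnknownIdentifier(String),
    /// e.g. a `:` right after another `:`
    UnexpectedToken(String),
    /// an object, array, or string that never closes
    UnexpectedEndOfInput,
    /// two members of one object share this name
    DuplicateMember(String),
    /// a second value after the root value
    TrailingValue,
    /// e.g. `127.0.0.1`
    InvalidNumber(String),
    /// the input bytes are not valid UTF-8
    InvalidUtf8,
}

impl fmt::Display for ErrorKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ErrorKind::UnknownIdentifier(name) => write!(f, "unknown identifier `{}`", name),
            ErrorKind::UnexpectedToken(tok) => write!(f, "unexpected token `{}`", tok),
            ErrorKind::UnexpectedEndOfInput => write!(f, "unexpected end of input"),
            ErrorKind::DuplicateMember(name) => write!(f, "duplicate member `{}`", name),
            ErrorKind::TrailingValue => write!(f, "more than one value at the root"),
            ErrorKind::InvalidNumber(text) => write!(f, "invalid number `{}`", text),
            ErrorKind::InvalidUtf8 => write!(f, "input is not valid UTF-8"),
        }
    }
}
```

Keeping the kind separate from the position should make it easy to wrap it later in a struct that also carries a line and column.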
How to fix them?
For each of these errors, I would like to report the line and column number of the character in the input string where the error occurred. It would also be nice if I could print a snippet of the input json to show where the error occurred, in context.
I don't want to simply show the whole line where the error occurred, because json can be minified to exist on one line. I don't know yet the best way to do this, but I might just take the range of 30 or so characters (or up to a line break) on either side of the error and display that. I don't want to print more than just one line of the input for each error.
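Here is a rough sketch of both pieces, assuming the parser tracks errors as a byte offset into the input string and that the offset always lands on a character boundary. The function names line_col and snippet and the radius parameter are my own placeholders.

```rust
/// 1-based line and column of the character at byte `offset` in `input`.
/// Columns count characters, not bytes.
pub fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in input.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

/// The character at `offset` plus up to `radius` characters on either side,
/// never crossing a line break. Also returns how many characters of the
/// snippet come before the error, which is where a caret would go.
/// Assumes `offset` is a char boundary within `input`; slicing panics otherwise.
pub fn snippet(input: &str, offset: usize, radius: usize) -> (&str, usize) {
    // Walk backwards from the error until a newline or the radius is hit.
    let mut start = offset;
    let mut before = 0;
    for (i, ch) in input[..offset].char_indices().rev() {
        if ch == '\n' || before == radius {
            break;
        }
        start = i;
        before += 1;
    }
    // Walk forwards: n == 0 is the offending character itself.
    let mut end = offset;
    for (n, (i, ch)) in input[offset..].char_indices().enumerate() {
        if ch == '\n' || n > radius {
            break;
        }
        end = offset + i + ch.len_utf8();
    }
    (&input[start..end], before)
}
```

Printing the snippet and then a second line with that many spaces followed by a ^ would point at the offending character, even when the whole document is minified onto one line.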
Reporting multiple errors at once
The simplest thing to do when I encounter an error is to print information about the error and then abort the program. But in an ideal world, I'd be able to tell the user every error that they have in their json blob so that they can fix all of them in one go. Otherwise, they might be playing whack-a-mole without knowing how many more errors they have to fix.
I'm not sure the best way to handle this just yet, but I have some ideas.
The first idea is to continue on as if the erroneous character or token didn't exist. That would handle typos like the double colons in { "foo" :: "bar" }. On the other hand, it would result in several errors if the issue were that a character was missing: { "foo" : this string is missing a quotation mark" }.
Another idea is to replace the value that is currently being parsed (assuming the error occurs in the parser) with some other valid (but inaccurate) value.
For example, "foo" :: "bar" is supposed to be an object's member, so I can replace it with either some made-up member (like "not":"real"), or maybe omit the member entirely.
I can't easily reason through whether this approach would really work. I don't think it's correct to insert fake values. But I also don't know how easy it is to omit them from where they're expected, either.
Another approach on my mind is to try to "ignore" as much of the rest of the json as I can until what I'm parsing starts to make sense again. If I'm parsing a member and something goes wrong, then ignore everything until I get to a token that, were I to start parsing from that token, would give me a valid member, and similarly for parsing values.
I'm not so confident in that approach either.
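For what it's worth, the skipping idea could look roughly like the sketch below, assuming the input has already been turned into a flat list of tokens. The Token enum and the function skip_to_next_member are hypothetical, and this only covers the "I was parsing a member" case.

```rust
/// A hypothetical token type; a real lexer's may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    LeftBrace,
    RightBrace,
    LeftBracket,
    RightBracket,
    Colon,
    Comma,
    String(String),
    Number(f64),
    True,
    False,
    Null,
}

/// After an error while parsing a member, skip ahead from `pos` to a place
/// where parsing the enclosing object could plausibly resume: just past a
/// `,` at the current nesting depth, or at the closer that ends the object.
/// Returns `tokens.len()` if no such place exists.
pub fn skip_to_next_member(tokens: &[Token], mut pos: usize) -> usize {
    // Depth of objects/arrays opened since the error, so that commas
    // inside a nested value are not mistaken for member separators.
    let mut depth = 0usize;
    while pos < tokens.len() {
        match &tokens[pos] {
            Token::LeftBrace | Token::LeftBracket => depth += 1,
            Token::RightBrace | Token::RightBracket => {
                if depth == 0 {
                    // Closer for the enclosing container: let the caller handle it.
                    return pos;
                }
                depth -= 1;
            }
            Token::Comma if depth == 0 => return pos + 1,
            _ => {}
        }
        pos += 1;
    }
    pos
}
```

With the tokens for { "foo" :: "bar", "baz": 1 } and an error at the second colon, this lands on the "baz" string, so only one error gets reported for the broken member. It does nothing clever about mismatched brackets or a missing quotation mark, which is part of why I'm unsure about it.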
Unfortunately I don't have my copy of Crafting Interpreters on me, which I'm pretty sure talks about this exact issue.
Oh well. Perhaps I'll notice another approach as I go. Otherwise I might just stick to reporting one error at a time for now.
Ergonomic errors
I will probably start by writing my own error type. But then as I need to write more From implementations for conversions, I may adopt thiserror. Also, while I'm thinking about error handling I've been meaning to familiarize myself with anyhow.
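As a starting point, a hand-written error type with its From implementations might look something like this. It is only a sketch: JsonError, its two variants, and the helper parse_number are names I invented, and it uses nothing beyond the standard library.

```rust
use std::fmt;
use std::num::ParseFloatError;
use std::str::Utf8Error;

/// Hypothetical error type wrapping two standard-library errors.
#[derive(Debug)]
pub enum JsonError {
    InvalidUtf8(Utf8Error),
    InvalidNumber(ParseFloatError),
}

impl fmt::Display for JsonError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            JsonError::InvalidUtf8(e) => write!(f, "input is not valid UTF-8: {}", e),
            JsonError::InvalidNumber(e) => write!(f, "invalid number: {}", e),
        }
    }
}

impl std::error::Error for JsonError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            JsonError::InvalidUtf8(e) => Some(e),
            JsonError::InvalidNumber(e) => Some(e),
        }
    }
}

// These From impls are what let `?` convert the underlying errors.
impl From<Utf8Error> for JsonError {
    fn from(e: Utf8Error) -> Self {
        JsonError::InvalidUtf8(e)
    }
}

impl From<ParseFloatError> for JsonError {
    fn from(e: ParseFloatError) -> Self {
        JsonError::InvalidNumber(e)
    }
}

/// Hypothetical helper: decode bytes and parse them as one number.
/// Note that Rust's f64 parser accepts some text that JSON does not
/// (such as "inf"), so a real parser would need stricter checks.
pub fn parse_number(bytes: &[u8]) -> Result<f64, JsonError> {
    let text = std::str::from_utf8(bytes)?;
    let value = text.trim().parse::<f64>()?;
    Ok(value)
}
```

Each new wrapped error costs a variant, a Display arm, a source arm, and a From impl, which is the boilerplate that thiserror's derive macro is meant to generate.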