Steps to consider when standardizing a JSON-Like Query Language

Let's explore the key technical considerations when creating this new standard.

Alexey Palazhchenko


The query language initially developed by MongoDB has become widely adopted, with several other databases mimicking its syntax and structure. FerretDB, Azure Cosmos DB for MongoDB, and Amazon DocumentDB are just a few examples of products and services that implement MongoDB's API to attract developers who are familiar with its powerful, developer-friendly query capabilities. However, this diversity of implementations has brought fragmentation, where each service supports a slightly different feature set, creating challenges for cross-compatibility and portability. We wrote about the situation in our previous blog post.

Below are the key considerations for defining a standard for this query language. They are suggestions only: the actual decisions belong to the standardization body, that is, the vendors taking part in the standardization process.

Initial scope

First of all, we should define the initial scope: what should be a part of the standard, what could be delegated to extensions, and what could be implementation-defined.

For example, we want to use JSON-like objects (documents), arrays, and scalar types. Even at that level, we already have a problem: JSON objects are unordered and don't support duplicate keys, while MongoDB BSON documents are ordered and can (technically) contain duplicate field names. This is a valid BSON document with duplicate, empty, but ordered keys:

{"": true,"": false}


Should we explicitly support that for strict compatibility with MongoDB? Should we disallow that for clarity and simplicity? Should we leave it implementation-dependent, ensuring compatibility with existing solutions?
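To make the trade-off concrete, here is a small sketch of how one common JSON parser treats that document. It uses Python's standard `json` module only as an illustration; other parsers may behave differently, which is part of the problem.

```python
import json

raw = '{"": true,"": false}'

# A plain dict keeps only the last value for a repeated key,
# so the first field is silently lost.
as_dict = json.loads(raw)

# object_pairs_hook receives every key/value pair in source order,
# so both the duplicates and their ordering survive.
as_pairs = json.loads(raw, object_pairs_hook=list)

print(as_dict)   # {'': False}
print(as_pairs)  # [('', True), ('', False)]
```

Whichever answer the standard picks, implementations would need a document representation closer to the second form if duplicates and ordering must be preserved.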

Then there are scalar types. JSON supports only a single number type, while BSON specifies four: float64, int32, int64, decimal128. Should we support all of them? Should we support weird values like negative zero and NaN? Should we support various NaN payloads?
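The following sketch shows why negative zero and NaN payloads are more than a curiosity. It inspects float64 bit patterns with Python's `struct` module; the helper names `bits_from_float` and `float_from_bits` are made up for this example.

```python
import json
import math
import struct


def bits_from_float(x: float) -> int:
    """Return the IEEE 754 binary64 bit pattern of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def float_from_bits(bits: int) -> float:
    """Build a float64 from a 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Negative zero compares equal to zero but is stored differently.
print(-0.0 == 0.0)                  # True
print(hex(bits_from_float(0.0)))    # 0x0
print(hex(bits_from_float(-0.0)))   # 0x8000000000000000

# Two quiet NaN bit patterns that differ only in their payload bits.
nan_a = float_from_bits(0x7FF8000000000000)
nan_b = float_from_bits(0x7FF8000000000001)
print(math.isnan(nan_a), math.isnan(nan_b))  # True True
print(nan_a == nan_b)                        # False: NaN never equals anything

# Strict JSON has no literal for NaN at all.
try:
    json.dumps(nan_a, allow_nan=False)
except ValueError as exc:
    print("not representable in strict JSON:", exc)
```

A standard would have to say whether such values may be stored, whether their exact bits must round-trip, and how they compare and sort.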

What other scalar data types should we support? Should we support regular expressions? Binary data? "JavaScript code with scope"? Apparently, that's a valid BSON type.

Then, we move to basic comparisons, which are quite complicated in MongoDB. For example, is [null] equal to, less than, or greater than []? Should the answer be the same in all contexts: in comparison with the implicit equality operator, with the $eq find filter operator, with the $eq aggregation pipeline operator, or during sorting? In MongoDB, the answer is non-obvious and depends on the context. Should we do the same?
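To illustrate how the same pair of values can compare differently depending on context, here is a sketch with two comparison rules. Both rules, the type ranking, and the function names are hypothetical and chosen only for this example; they are not a description of MongoDB's actual behavior.

```python
from functools import cmp_to_key


def _rank(v):
    # Hypothetical type ranks for this sketch only: null < number < array.
    if v is None:
        return 0
    if isinstance(v, (int, float)):
        return 1
    if isinstance(v, list):
        return 2
    raise TypeError(f"unsupported value: {v!r}")


def compare_whole(a, b):
    """Rule 1: compare values as wholes; arrays element by element."""
    ra, rb = _rank(a), _rank(b)
    if ra != rb:
        return -1 if ra < rb else 1
    if ra == 0:
        return 0
    if ra == 1:
        return (a > b) - (a < b)
    for x, y in zip(a, b):
        c = compare_whole(x, y)
        if c != 0:
            return c
    # All shared positions are equal: the shorter array comes first.
    return (len(a) > len(b)) - (len(a) < len(b))


def _sort_key(v):
    if isinstance(v, list):
        if not v:
            return (0, None)  # hypothetical: an empty array sorts lowest
        return (1, min(v, key=cmp_to_key(compare_whole)))
    return (1, v)


def compare_for_sort(a, b):
    """Rule 2: an array is represented by its smallest element."""
    (ta, ka), (tb, kb) = _sort_key(a), _sort_key(b)
    if ta != tb:
        return -1 if ta < tb else 1
    if ta == 0:
        return 0
    return compare_whole(ka, kb)


print(compare_whole([None], None))     # 1: the array is greater than null
print(compare_for_sort([None], None))  # 0: they tie when sorting
print(compare_whole([None], []))       # 1
print(compare_for_sort([None], []))    # 1
```

Under these made-up rules, `[null]` is greater than `null` in one context and ties with it in another. A standard has to either pick a single rule or spell out which rule applies where.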

After that, we could finally start defining the behavior of various commands, arguments, and operators. It is tempting to just say that "OpenDocDB-compatible databases should support the 'find' command with the '$eq' filter," but we can't skip all the previous steps. And then there are a lot of questions about non-query commands. For example, should session commands be part of the standard? What about administrative and maintenance commands?
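A minimal sketch shows why even "find with $eq" depends on the earlier decisions. The `run_find` function below is hypothetical and evaluates only a tiny subset of a find command against in-memory data; it ignores array matching, nested paths, and every other operator. The equality rule is passed in as a parameter and defaults to Python's `==`, which is an assumption of this example.

```python
def run_find(command, collections, equals=lambda a, b: a == b):
    """Evaluate {"find": name, "filter": {...}} over lists of dicts."""
    docs = collections[command["find"]]
    flt = command.get("filter", {})

    def matches(doc):
        for field, cond in flt.items():
            # Accept {"field": {"$eq": v}} and the implicit form {"field": v}.
            if isinstance(cond, dict) and set(cond) == {"$eq"}:
                expected = cond["$eq"]
            else:
                expected = cond
            if field not in doc or not equals(doc[field], expected):
                return False
        return True

    return [d for d in docs if matches(d)]


data = {"users": [{"name": "a", "age": 1}, {"name": "b", "age": 2}]}

print(run_find({"find": "users", "filter": {"age": {"$eq": 2}}}, data))
# [{'name': 'b', 'age': 2}]

# With Python's default equality, True == 1, so a boolean filter
# matches a numeric field. Whether it should is a data-model decision.
print(run_find({"find": "users", "filter": {"age": True}}, data))
# [{'name': 'a', 'age': 1}]
```

The second query is the interesting one: its result is decided entirely by the equality rule, not by the definition of the command.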

Finally, we should decide whether to define the wire protocol and various encodings (BSON, Extended JSON). Maybe it would be enough to define the data model in abstract terms and leave protocols up to implementations, allowing them to accept requests over HTTP, for example.
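As an illustration of what an encoding has to carry, the sketch below maps Python values to the type wrappers used by canonical Extended JSON (`$numberInt`, `$numberLong`, `$numberDouble`, `$numberDecimal`). The function name `to_extended`, the choice of int32 versus int64 by value range, and the use of `repr` for formatting doubles are assumptions of this example, not part of any specification.

```python
import json
import math
from decimal import Decimal

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


def to_extended(value):
    """Wrap numbers so that their type survives a trip through plain JSON."""
    if value is None or isinstance(value, (bool, str)):
        return value
    if isinstance(value, int):
        # Assumption: pick the narrowest integer type that fits.
        if INT32_MIN <= value <= INT32_MAX:
            return {"$numberInt": str(value)}
        if INT64_MIN <= value <= INT64_MAX:
            return {"$numberLong": str(value)}
        raise OverflowError(f"integer out of int64 range: {value}")
    if isinstance(value, float):
        if math.isnan(value):
            text = "NaN"
        elif math.isinf(value):
            text = "Infinity" if value > 0 else "-Infinity"
        else:
            text = repr(value)
        return {"$numberDouble": text}
    if isinstance(value, Decimal):
        return {"$numberDecimal": str(value)}
    if isinstance(value, list):
        return [to_extended(v) for v in value]
    if isinstance(value, dict):
        return {k: to_extended(v) for k, v in value.items()}
    raise TypeError(f"unsupported value: {value!r}")


doc = {"count": 42, "big": 2**40, "ratio": float("nan")}
print(json.dumps(to_extended(doc)))
```

Note that this dict-based version cannot represent duplicate keys, so the encoding question is tied back to the data-model questions above.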

And that's only the initial scope!

Future iterations

The standard should not stifle innovation. Instead, it should create a foundation upon which all compliant databases can be built. We should define how the standard can be extended and changed. What versioning scheme should we use? Should we reserve some namespaces for core features to avoid conflicts with future or vendor-specific extensions?

Finally, for any standard to be successful, there needs to be an active community and governance model that drives adoption and keeps it relevant.

Conclusion

A standardized JSON-like query language has the potential to reduce fragmentation, make it easier for developers to switch between databases, and foster an innovative ecosystem of tools and services. By carefully defining the core features, ensuring extensibility, and establishing a robust governance model, we can create a foundation that promotes compatibility and innovation in the document database landscape.

If you’re interested in shaping the future of document databases, consider joining discussions around the open standard. Let’s work together to create a common foundation that drives the next generation of innovation in database technology.