29/04/2019

Write a Postgres proxy. Day 2. Getting the queries.

RediSQL, SQL steroids for Redis. Is a very fast in-memory SQL engine. Its main features are:
  1. Speed, up to 130,000 insert per second
  2. Familiarity, it support standard SQL, no weird dialects
  3. Simplicity, it is very easy to operate and to use with binding for any language.
Code on github: RedBeardLab/rediSQL
 

In this series of post we are writing a postgres proxy that accept connections made using the postgres (PG) protocol and forward them to RediSQL.

Motivation and introduction of the project are here.

Intro

This day we will isolate simple queries from Postgres and we will send them to RediSQL. We will focus on getting the text and send it to RediSQL and return very basic “tags” to the PG client.

Theory

One of the great thing of SQL is that its queries are just text that don’t need formatting or to obey to some weird identation rules (YAML anybody?).

Indeed PG accept queries as simple strings, parse them (return an error if the syntax is wrong), and then execute the query.

This helps a lot while creating a proxy, all we need to do is to get the text that represent the query and send it to the server.

In this second day we will focus on how to get the text from a query and how to return to the clients.

We will not cover SELECTqueries, but only statements like INSERT, DELETEor CREATE TABLE.

The Query message

When a PG client send a query it uses a specific message, the Query message.

The format of the query message is quite simple, it starts with the byte Q then an Int32that indicate the total length of the message, and finally a string that contains the query itself. The string is NULL \x00 terminated.

Simpler that this would be impossible.

In order to isolate the query we do the simplest possible thing.

We just isolate everything, but the last byte, after the 5th byte in the message.

        if self.state == "readyForQuery" and data[0] == Query:
            lenght = int.from_bytes(data[1:5], "big")
            strLenght = lenght - 4
            query = data[5:-1].decode("utf-8")

Remember, 1 byte for the Q that indicate the type of message, 4 more bytes for the length and the last byte is the NULL byte that act as terminator.

At this point we have isolate our query, but we still need one more step.

Even if our proxy didn’t forward the query to any SQL server yet, we still want to acknowledge to the client that we have read its query and that we execute it.

CommandComplete

In order to indicate that our command was successfully executed, we need to send back the CommandComplete message.

CommandComplete follow the usual convention and it is identified by the letter C as first byte, then an Int32to indicate the total length of the message and finally a String that contains a “tag”.

We construct the CommandComplete message with the “tag” (a string) as input.

def CommandComplete(tag):
    lenghtTag = len(tag)
    lengthMessage = lenghtTag + 1 + 4 # one for the \00 and 4 for the Int32
    # assuming that tag is smaller than 256 bytes
    bytesLenght = b'\x00\x00\x00' + bytes([lengthMessage])
    bytesBody = bytes(tag, "utf-8") + b'\x00'
    return b'\x43' + bytesLenght + bytesBody

The “tag” indicate what action was completed and how many rows were affected, it is exactly the one that you see after each command while using psql.

At the moment, we are not doing any action, so we simply return as “tag” the first token of the query. This is wrong in the general case, but a reasonable approximation in most cases.

Finally, we need to send another message, the ReadyForQuery message. Remember that this message indicate that the server is ready to accept a new query and it must be send before the client will send a new query.

Send data to RediSQL

At this point all we need to do is to send the query to RediSQL.

All we need to do is to use the standard redis client for python and simply execute the command against a default RediSQL database.

The very first step is to create a connection to redis, and this can be done when a client connect to our proxy.

At this point we just send the query to RediSQL.

    def _execute_query(self, query):
        result = self.redis.execute_command("REDISQL.EXEC", "DB", query)
        firstToken = query.split(' ')[0]
        if firstToken.upper() == "INSERT":
            numberInserted = result[1]
            return "INSERT 0 " + str(numberInserted)
        return firstToken

RediSQL returns already the number of row touched during a query, and we can use this information to return a more informative tag that simply the first token of the query itself.

Errors made

The biggest time sink of today was to don’t delete the last byte when isolating the query.

At the beginning, when isolating the query, we were just keeping everything from the 5th bytes forward, including the NULL terminator.

Sending a NULL terminator was causing RediSQL to crash, and indeed I open up an issue about it.

It was quite nasty to debug because we couldn’t see any difference between the query that was coming from psql and the one that we manually tried.

Eventually, printing out also the length of the queries, it became clear that the queries from psql where one byte longer that the queries we try manually, the difference was the NULL terminator.

Conclusion

The next day we will manage simple SELECT statements.

If you enjoy the post follow me on twitter or subscribe to the mail list 🙂

All post of this serie: Writing a Postgres proxy.

Newsletter

We publish new content each week, subscribe to don't miss any article.

Leave a Reply

Your email address will not be published. Required fields are marked *