- Speed, up to 130,000 insert per second
- Familiarity, it support standard SQL, no weird dialects
- Simplicity, it is very easy to operate and to use with binding for any language.
In this series of post we are writing a postgres proxy that accept connections made using the postgres (PG) protocol and forward them to RediSQL.
Motivation and introduction of the project are here.
Intro
This day we will isolate simple queries from Postgres and we will send them to RediSQL. We will focus on getting the text and send it to RediSQL and return very basic “tags” to the PG client.
Theory
One of the great thing of SQL is that its queries are just text that don’t need formatting or to obey to some weird identation rules (YAML anybody?).
Indeed PG accept queries as simple strings, parse them (return an error if the syntax is wrong), and then execute the query.
This helps a lot while creating a proxy, all we need to do is to get the text that represent the query and send it to the server.
In this second day we will focus on how to get the text from a query and how to return to the clients.
We will not cover SELECT
queries, but only statements like INSERT
, DELETE
or CREATE TABLE
.
The Query message
When a PG client send a query it uses a specific message, the Query
message.
The format of the query message is quite simple, it starts with the byte Q
then an Int32
that indicate the total length of the message, and finally a string that contains the query itself. The string is NULL \x00
terminated.
Simpler that this would be impossible.
In order to isolate the query we do the simplest possible thing.
We just isolate everything, but the last byte, after the 5th byte in the message.
if self.state == "readyForQuery" and data[0] == Query:
lenght = int.from_bytes(data[1:5], "big")
strLenght = lenght - 4
query = data[5:-1].decode("utf-8")
Remember, 1 byte for the Q
that indicate the type of message, 4 more bytes for the length and the last byte is the NULL byte that act as terminator.
At this point we have isolate our query, but we still need one more step.
Even if our proxy didn’t forward the query to any SQL server yet, we still want to acknowledge to the client that we have read its query and that we execute it.
CommandComplete
In order to indicate that our command was successfully executed, we need to send back the CommandComplete
message.
CommandComplete
follow the usual convention and it is identified by the letter C
as first byte, then an Int32
to indicate the total length of the message and finally a String
that contains a “tag”.
We construct the CommandComplete
message with the “tag” (a string) as input.
def CommandComplete(tag):
lenghtTag = len(tag)
lengthMessage = lenghtTag + 1 + 4 # one for the \00 and 4 for the Int32
# assuming that tag is smaller than 256 bytes
bytesLenght = b'\x00\x00\x00' + bytes([lengthMessage])
bytesBody = bytes(tag, "utf-8") + b'\x00'
return b'\x43' + bytesLenght + bytesBody
The “tag” indicate what action was completed and how many rows were affected, it is exactly the one that you see after each command while using psql
.
At the moment, we are not doing any action, so we simply return as “tag” the first token of the query. This is wrong in the general case, but a reasonable approximation in most cases.
Finally, we need to send another message, the ReadyForQuery
message. Remember that this message indicate that the server is ready to accept a new query and it must be send before the client will send a new query.
Send data to RediSQL
At this point all we need to do is to send the query to RediSQL.
All we need to do is to use the standard redis client for python and simply execute the command against a default RediSQL database.
The very first step is to create a connection to redis, and this can be done when a client connect to our proxy.
At this point we just send the query to RediSQL.
def _execute_query(self, query):
result = self.redis.execute_command("REDISQL.EXEC", "DB", query)
firstToken = query.split(' ')[0]
if firstToken.upper() == "INSERT":
numberInserted = result[1]
return "INSERT 0 " + str(numberInserted)
return firstToken
RediSQL returns already the number of row touched during a query, and we can use this information to return a more informative tag that simply the first token of the query itself.
Errors made
The biggest time sink of today was to don’t delete the last byte when isolating the query.
At the beginning, when isolating the query, we were just keeping everything from the 5th bytes forward, including the NULL terminator.
Sending a NULL terminator was causing RediSQL to crash, and indeed I open up an issue about it.
It was quite nasty to debug because we couldn’t see any difference between the query that was coming from psql
and the one that we manually tried.
Eventually, printing out also the length of the queries, it became clear that the queries from psql
where one byte longer that the queries we try manually, the difference was the NULL terminator.
Conclusion
The next day we will manage simple SELECT
statements.
If you enjoy the post follow me on twitter or subscribe to the mail list 🙂
All post of this serie: Writing a Postgres proxy.