Query String Hell

zzzzBov

You've been handling query strings wrong. It's OK, so has everybody else. When I say "everybody else" I mean that I've yet to find an accurately conforming library in any language that I use. I think it's finally time to cover all of the complicated parts of query strings in hopes that someday, someone will get them right.

In the beginning

Let's start from the beginning. Query strings are the part of the URL that contains key-value pairs of data to pass to server-side scripts. Query strings can also be used as a means to encode key-value pairs into a single string value.

In URLs, query strings follow the ? character, and come before the optional document fragment, which begins with #. In JavaScript, the query string of the current page can be accessed using location.search.

Query strings generally follow a very simple convention. Each key and value is separated by an equals sign (=), while each key-value pair is separated by an ampersand (&).

For example, a key of foo with a value of bar would be represented as:

?foo=bar

If an additional key of fizz and a value of buzz were to be added to this, the representation could be:

?foo=bar&fizz=buzz

Alternatively, the representation could be:

?fizz=buzz&foo=bar

You see, in query strings...

Order doesn't matter.

The order of the key-value pairs doesn't matter. Query strings are not ordered pairs. Query strings are key-value pairs. There is no guarantee of order, and one should not be expected. Relying on the order of key-value pairs will likely introduce bugs and significant frustration.
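To make the point concrete, here is a minimal Python sketch. The helper name parse_pairs is hypothetical, and it assumes unique keys with no special characters. Both orderings parse to the same data:

```python
from typing import Dict


def parse_pairs(query: str) -> Dict[str, str]:
    """Parse a simple query string (unique keys, nothing encoded) into a dict."""
    if query.startswith("?"):
        query = query[1:]  # drop the leading question mark
    pairs = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")  # split on the first "="
        pairs[key] = value
    return pairs


first = parse_pairs("?foo=bar&fizz=buzz")
second = parse_pairs("?fizz=buzz&foo=bar")
print(first == second)  # True
```

Dictionary equality in Python ignores insertion order, which matches how query strings should be compared.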

Special Characters

The next important issue to cover with query strings is what happens when you want a key or value to contain a special character. It's very straightforward to have a key of foo and a value of bar, but it's not immediately obvious what's supposed to happen when the key contains an equals sign or the value contains an ampersand.

For example, a key of foo= and a value of bar&baz=fizz would turn into foo==bar&baz=fizz unless something special happens to the key and value to prevent them from being interpreted as the delimiting characters.

This is where URL encoding comes in. URL encoding (also known as Percent Encoding) is used to encode special characters such that they retain their value when used within a query string. Encoding for a query string uses the following rules:

  • Alphanumeric characters (a-z, A-Z, 0-9), period (.), hyphen (-), tilde (~), and underscore (_) are left as-is.
  • Space ( ) is optionally encoded as either a plus sign (+) or %20.
  • All other characters are encoded as hex representations of the form %HH, with any non-ASCII characters encoded as UTF-8 first.

This means that the previous example where the key was foo= and the value was bar&baz=fizz would be represented by:

?foo%3D=bar%26baz%3Dfizz
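As a rough illustration, the encoding rules above can be reproduced with Python's standard urllib.parse module. The helper encode_pair is a hypothetical name for this sketch. Passing safe="" is an assumption made here so that every reserved character, including /, gets escaped:

```python
from urllib.parse import quote, quote_plus, unquote_plus


def encode_pair(key: str, value: str) -> str:
    """Percent-encode a key and a value, then join them with "="."""
    return quote(key, safe="") + "=" + quote(value, safe="")


print(encode_pair("foo=", "bar&baz=fizz"))  # foo%3D=bar%26baz%3Dfizz

# A space may be written either way, and both decode to the same value.
print(quote("a b", safe=""))  # a%20b
print(quote_plus("a b"))      # a+b
print(unquote_plus("a+b") == unquote_plus("a%20b"))  # True

# Non-ASCII characters are encoded as UTF-8 bytes first.
print(quote("é", safe=""))    # %C3%A9
```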

Alternative representation

When discussing query strings, it can often be useful to display data in an alternative representation to show equivalence. JSON generally works well for representing query strings in an alternative understandable manner. The previous example could be represented in JSON as:

{
    "foo=": "bar&baz=fizz"
}

For many examples in the rest of this post, I will use JSON to represent the data being stored within query strings.

Don't forget about semicolons

Because query strings use & characters to separate key-value pairs, HTML authors often forget to escape those & characters as the HTML entity &amp;. To avoid the extra characters, query strings are allowed to use semicolons (;) in place of ampersands (&). This means that:

?foo=bar&fizz=buzz

may optionally be represented as:

?foo=bar;fizz=buzz

Unfortunately, this feature is very poorly supported to the point that it's not worthwhile to use ; characters in place of &, as most servers will not correctly parse the query string.
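For a parser that does want to accept both separators, a minimal Python sketch could split on either character. The function name split_pairs is hypothetical, and the sketch assumes that any literal ; or & inside a key or value has already been percent-encoded:

```python
import re
from typing import List


def split_pairs(query: str) -> List[str]:
    """Split a query string into raw key-value pairs on either "&" or ";"."""
    return re.split(r"[&;]", query)


print(split_pairs("foo=bar&fizz=buzz"))  # ['foo=bar', 'fizz=buzz']
print(split_pairs("foo=bar;fizz=buzz"))  # ['foo=bar', 'fizz=buzz']
```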

No questions

Another part that is optional (in some contexts; I will elaborate on this later) is the initial question mark (?). The question mark is used to signify the start of the query string within a URL. For places where a query string is being used as a stand-alone string, the question mark can often be left off without issue.

?foo=bar

is representationally equivalent to:

foo=bar

No values

If you've followed me so far, congratulations. There's a lot to know about query strings and you're doing quite well. We're only just getting started.

You know how query strings are key-value pairs? Well, what happens when the value is empty? Specifically, what happens when the value is the empty string? In JSON this is easy to represent as:

{
    "foo": ""
}

Query strings don't lend themselves to delineating where keys and values begin and end. The query string representation of the previous object is simply:

?foo=

Note that nothing follows the equals sign. That's because the value is a string of length zero. If we were to add another value to the object, in JSON it would look like:

{
    "foo": "",
    "fizz": "buzz"
}

and it would be represented as a query string as:

?foo=&fizz=buzz

or of course

?fizz=buzz&foo=

because, as a reminder, order doesn't matter.

Null values

It's one thing to have a key which contains a value of empty string. It's another to have a key which does not have a value at all. In most programming languages, null is used to represent the concept of not having a value. In query strings, keys are separated from their values with an equals sign (=). If no value exists to be separated from, no equals sign is necessary. This means that:

{
    "foo": null
}

may be accurately represented as:

?foo

Note the subtle difference between ?foo= and ?foo. The former has a value of "" (the empty string), while the latter has a value of null.

Adding another key-value pair leads to a JSON representation of:

{
    "foo": null,
    "fizz": "buzz"
}

and a query string representation of:

?foo&fizz=buzz

or

?fizz=buzz&foo
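The distinction between an empty value and a null value can be sketched in Python by checking whether an equals sign was present at all. The helper parse_pair is a hypothetical name, and None stands in for null. For comparison, the sketch also calls the standard library's parse_qsl, which reports both forms identically:

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl


def parse_pair(pair: str) -> Tuple[str, Optional[str]]:
    """Split one raw pair; the value is None when no "=" is present."""
    key, separator, value = pair.partition("=")
    return key, (value if separator else None)


print(parse_pair("foo="))       # ('foo', '')
print(parse_pair("foo"))        # ('foo', None)
print(parse_pair("fizz=buzz"))  # ('fizz', 'buzz')

# The standard library parser does not distinguish the two cases.
print(parse_qsl("foo=", keep_blank_values=True))  # [('foo', '')]
print(parse_qsl("foo", keep_blank_values=True))   # [('foo', '')]
```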

No keys

Just like values, keys may also be empty. As the key is the part before the equals sign in each key-value pair, it makes sense that the following JSON object of:

{
    "": "bar"
}

would be represented as:

?=bar

Unlike values, it's not possible to have a key of null. If a key exists, it has a value; otherwise the key doesn't exist and there is no associated value.

No keys, No values

It's time to mix and match. Not only can you have empty keys and empty values, you can have empty keys with empty values.

{
    "": ""
}

becomes:

?=

And adding another key-value pair turns:

{
    "": "",
    "fizz": "buzz"
}

into:

?=&fizz=buzz
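Going the other direction, a small Python sketch of a serializer shows how these cases fall out of one rule: write the key, and write = plus the value only when the value is not null. The name serialize is hypothetical, and the sketch assumes keys and values have already been percent-encoded:

```python
from typing import Iterable, Optional, Tuple


def serialize(pairs: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build a query string from (key, value) pairs; None means "no value"."""
    parts = [key if value is None else f"{key}={value}" for key, value in pairs]
    if not parts:
        return ""  # no pairs at all, so no query string
    return "?" + "&".join(parts)


print(serialize([("", "bar")]))                   # ?=bar
print(serialize([("", "")]))                      # ?=
print(serialize([("", ""), ("fizz", "buzz")]))    # ?=&fizz=buzz
print(serialize([("foo", None), ("fizz", "buzz")]))  # ?foo&fizz=buzz
```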

No keys, Null values

While we're on the topic of mixing and matching, we might as well take it one step further and consider the case where an empty key has a null value.

What happens?

If the JSON representation is:

{
    "": null
}

What's the query string representation?

Well, since it's using the empty key, there is no content before the equals sign, and since it's using the null value, there's no equals sign, so that means the query string representation is:

?

Yep, that's it. Just ?. This is the important case where the initial ? is not optional. A query string of "" (the empty string) has no keys and no values, while a query string of ? has a key of "" and a value of null.

To verify this behavior we can use a bit of induction.

Let's start with a simple set of two key-value pairs:

{
    "foo": "bar",
    "fizz": "buzz"
}

The query string representation is:

?foo=bar&fizz=buzz

If we remove the second key-value pair, we're left with:

{
    "foo": "bar"
}

Which produces a query string of:

?foo=bar

Note that the trailing &fizz=buzz was removed.

If we now consider the case where the first key is empty, and the value is null, the JSON representation is:

{
    "": null,
    "fizz": "buzz"
}

and the query string representation is:

?&fizz=buzz

Note that the & comes immediately after the ? which indicates that there is an initial key, which is empty as there are no characters before the &. Additionally the value is null as there is no = present to indicate any value.

If we then remove the second value, the JSON representation becomes:

{
    "": null
}

and the query string representation becomes:

?
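The same reasoning can be checked with a short Python sketch. The name parse_query is hypothetical and None stands in for null. The only special case is the truly empty string. Everything else, including a lone ?, is split into pairs in the usual way:

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def parse_query(query: str) -> List[Pair]:
    """Parse a query string into (key, value) pairs, keeping null values."""
    if query == "":
        return []  # truly empty: no keys and no values
    if query.startswith("?"):
        query = query[1:]
    pairs = []
    for raw in query.split("&"):
        key, separator, value = raw.partition("=")
        pairs.append((key, value if separator else None))
    return pairs


print(parse_query(""))             # []
print(parse_query("?"))            # [('', None)]
print(parse_query("?&fizz=buzz"))  # [('', None), ('fizz', 'buzz')]
print(parse_query("?foo=bar"))     # [('foo', 'bar')]
```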

Same key, over and over

What I consider the most commonly overlooked issue in query strings is the case where the same key is used multiple times. So far I've only covered cases where each key is unique, however there is no such restriction on query strings. The result of duplicate keys is that the value is an array containing all of the values paired with the key.

This means that:

?foo=bar&foo=baz

is not only valid, it should be parsed as:

{
    "foo": [
        "bar",
        "baz"
    ]
}

or

{
    "foo": [
        "baz",
        "bar"
    ]
}

Remember, order doesn't matter. If the order of values matters, a different encoding which preserves order should be used, such as CSV.
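A minimal Python sketch of this behavior collects every value for a key into a list. The names collect_values and same_data are hypothetical. Storing a list even for keys that appear once is a design choice made here for uniformity, not something the rules above require:

```python
from collections import defaultdict
from typing import Dict, List


def collect_values(query: str) -> Dict[str, List[str]]:
    """Group the values of repeated keys into lists."""
    if query.startswith("?"):
        query = query[1:]
    grouped = defaultdict(list)
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        grouped[key].append(value)
    return dict(grouped)


def same_data(a: Dict[str, List[str]], b: Dict[str, List[str]]) -> bool:
    """Compare two parsed results while ignoring the order of values."""
    return a.keys() == b.keys() and all(sorted(a[k]) == sorted(b[k]) for k in a)


print(collect_values("?foo=bar&foo=baz"))  # {'foo': ['bar', 'baz']}
print(same_data(collect_values("?foo=bar&foo=baz"),
                collect_values("?foo=baz&foo=bar")))  # True
```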

All together now

So, to recap:

  1. Keys and values are separated with =
  2. key-value pairs are separated with & or ;
  3. Order doesn't matter
  4. Special characters are URL encoded
  5. The initial ? is optional if the query string contains a key with a length greater than zero, or multiple keys
  6. Values may be empty strings or null
  7. Keys may be empty strings, but not null
  8. ? has a key of "" and a value of null
  9. The same key may be used multiple times to represent an array of values
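
Putting the nine rules together, a parser might look something like the following Python sketch. It is one possible reading of the rules in this post, not a reference implementation: parse_query_string is a hypothetical name, None stands in for null, and every key maps to a list of values.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional
from urllib.parse import unquote_plus


def parse_query_string(query: str) -> Dict[str, List[Optional[str]]]:
    """Parse a query string following the nine rules listed above."""
    if query == "":
        return {}  # rule 8: only the empty string has no keys at all
    if query.startswith("?"):
        query = query[1:]  # rule 5: the leading "?" is otherwise optional
    result = defaultdict(list)
    for pair in re.split(r"[&;]", query):  # rule 2: "&" or ";" separates pairs
        raw_key, separator, raw_value = pair.partition("=")  # rule 1
        key = unquote_plus(raw_key)  # rule 4: decode %HH and "+"
        # rules 6 and 7: no "=" means a null value; keys are never null
        value = unquote_plus(raw_value) if separator else None
        result[key].append(value)  # rule 9: repeated keys accumulate
    return dict(result)  # rule 3: compare results without relying on order


print(parse_query_string("?foo%3D=bar%26baz%3Dfizz"))  # {'foo=': ['bar&baz=fizz']}
print(parse_query_string("foo=bar;foo=baz&fizz"))      # {'foo': ['bar', 'baz'], 'fizz': [None]}
print(parse_query_string("?"))                         # {'': [None]}
print(parse_query_string(""))                          # {}
```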

So, how did I do? If you have any questions, concerns, comments, or corrections, feel free to leave me comments down below!