
Full Control over HTTP Request Headers in Python

Using the requests and HTTPX libraries
Cover Image Backdrop by Renè Müller on Unsplash

Malicious servers, and even some CDNs, use HTTP headers to fingerprint clients and will reject requests based on the presence of headers, their order, or even the capitalization of header names. Many of these features shouldn’t matter according to RFC 2616, which is probably why it is not straightforward to get full control over HTTP headers using Python’s requests and httpx libraries.
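
To make the idea concrete, here is a minimal sketch of how a server could derive such a fingerprint from nothing but the header names. The function name header_fingerprint and the two sample requests are made up for illustration, and real fingerprinting systems are likely more elaborate.

```python
import hashlib


def header_fingerprint(raw_request: bytes) -> str:
    """Hash the header names of a raw HTTP/1.1 request, keeping order and case."""
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")[1:]  # drop the request line
    names = [line.split(b":", 1)[0] for line in lines if b":" in line]
    return hashlib.sha256(b",".join(names)).hexdigest()[:16]


# hypothetical requests: same headers and values, different order and case
browser_like = b"GET / HTTP/1.1\r\nHost: test.home\r\nUser-Agent: x\r\nAccept: */*\r\n\r\n"
script_like = b"GET / HTTP/1.1\r\nHost: test.home\r\nAccept: */*\r\nuser-agent: x\r\n\r\n"

print(header_fingerprint(browser_like) == header_fingerprint(script_like))  # False
```

Both requests carry the same information, yet they hash differently, which is why order and capitalization need to be controllable on the client side.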

In this blog post I’ll show how to control the following four aspects of HTTP headers:

  • removing any header, including those that should normally be sent, like User-Agent,
  • ordering headers in a particular order,
  • defining the case of HTTP headers (e.g., USER-AGENT instead of User-Agent), and
  • allowing duplicate headers (for instance two Cache-Control headers).

It’s possible to achieve all four points with the more modern httpx, with one exception: changing the position of the Host header or removing it altogether. For that, you need to monkey patch two functions in h11, the HTTP/1.1 library that httpx relies on.

The very popular requests library can’t send duplicate headers because of its underlying OrderedDict data structure, but you can control the remaining three aspects.
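
The limitation comes straight from how a mapping behaves. The following standard-library sketch (the header values are placeholders) shows what happens when a key is repeated in an OrderedDict, and how update() treats keys that already exist:

```python
from collections import OrderedDict

# A repeated key overwrites the earlier value instead of adding a second entry.
headers = OrderedDict([
    ("Cache-Control", "no-cache"),
    ("Accept", "*/*"),
    ("Cache-Control", "no-store"),
])
print(list(headers.items()))
# [('Cache-Control', 'no-store'), ('Accept', '*/*')]

# update() changes the value of an existing key but keeps its original position;
# only keys that are new get appended at the end.
headers.update([("Accept", "text/html"), ("Host", "test.home")])
print(list(headers))
# ['Cache-Control', 'Accept', 'Host']
```

The second half matters for ordering: the position of a header is fixed by the first time its key is inserted, not by the last time it is assigned.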

library    remove   order   case   duplicate
requests   ✔        ✔       ✔      ✘
httpx      ✔*       ✔*      ✔      ✔

* works except for Host, which requires monkey patching.

Requests

Here is how to send only the specified headers, in the specified order and with the specified capitalization. Duplicate header names are not possible. Note that you need to create a session first and clear its headers. Otherwise, some headers will not respect the specified order.

import requests
from urllib3.util import SKIP_HEADER
from collections import OrderedDict


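# the desired headers, in the order they should be sent;
# duplicate keys are not possible
headers = OrderedDict([
    ("Host", "www.should-come-first.home"),
    ("Accept", "*/*"),
    ("User-Agent", "Should come last"),
    ("Cache-Control", "no-cache")
])

# values that suppress the default headers
defaults = OrderedDict({
    "Host": SKIP_HEADER,
    "User-Agent": SKIP_HEADER,
    "Accept-Encoding": SKIP_HEADER,
    "Accept": None,
    "Connection": None
})

# setdefault() only appends the suppressors for headers not set above;
# calling update() on the defaults instead would keep the defaults' key order
for name, value in defaults.items():
    headers.setdefault(name, value)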

# you need to create a session first and clear the headers
s = requests.Session()
s.headers = {}
r = s.get("http://test.home", headers=headers)

HTTPX

Here is how to fully control headers in HTTPX. Note that the two ugly monkey patches are only necessary if you want to omit the Host header (patch _validate) or not place Host first (patch write_headers).

import h11
import httpx
from h11._util import validate


# this monkey patch is only needed when you want to remove
# the 'Host' header; it replaces a private h11 method, so it
# may break with other h11 versions
def _validate(self):
    """remove validation for missing Host header"""
    validate(h11._events.request_target_re, self.target, "Illegal target characters")


h11._events.Request._validate = _validate

# this monkey patch is only needed when the 'Host' header
# should not be the first header
def write_headers(headers, write):
    """put the Host header at the specified place, not at the top"""
    raw_items = headers._full_items
    for raw_name, name, value in raw_items:
        write(b"%s: %s\r\n" % (raw_name, value))
    write(b"\r\n")


h11._writers.write_headers = write_headers

# using a client instance is the recommended way to use httpx
with httpx.Client() as client:
    headers = httpx.Headers(
        [
            # put your headers here
            ("B", "1"),
            ("A", "3"),
            ("a", "4"),
            ("R", "5"),
            ("Host", "test.home"),
            ("Cache-Control", "no-cache"),
            ("Cache-Control", "no-store"),
        ]
    )
    request = client.build_request("GET", "http://test.home:8008")
    request.headers = headers
    r = client.send(request)
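
To check what actually goes over the wire, one option is a throwaway local listener that records the raw bytes of a single request. The sketch below uses only the standard library. The helper name capture_one_request is made up, and http.client serves as a stand-in client so the example runs without third-party packages; in practice you would point the requests or httpx call at the printed port instead.

```python
import http.client
import socket
import threading


def capture_one_request(result: list) -> tuple:
    """Listen on a free local port and store the first request's raw header bytes."""
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve() -> None:
        conn, _ = server.accept()
        with conn, server:
            data = b""
            # read until the blank line that ends the header block
            while b"\r\n\r\n" not in data:
                chunk = conn.recv(4096)
                if not chunk:
                    break
                data += chunk
            result.append(data)
            conn.sendall(
                b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n"
            )

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, port


captured: list = []
thread, port = capture_one_request(captured)

# stand-in client; replace with the requests or httpx call under test
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.putrequest("GET", "/", skip_host=True, skip_accept_encoding=True)
conn.putheader("X-First", "1")
conn.putheader("HOST", "test.home")
conn.endheaders()
conn.getresponse().read()
conn.close()
thread.join(timeout=5)

# the raw request, exactly as the server received it
print(captured[0].decode("latin-1"))
```

Reading the raw bytes avoids relying on what the client library reports about its own headers, since defaults can still be added or reordered further down the stack.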