Evade Worker Timeout When There Is No Internet Connection

UPDATE: I found the great requests-async library, which I think can solve the problem. I will write a post about it soon :)

Music workers (a Starlette app running via Gunicorn with Uvicorn workers) were hitting worker timeouts every time I or somebody else so much as coughed near the router, and that made the whole microservice unstable as hell. I decided it was time to improve it on many levels and fix this problem in the 2.0.1 branch (right in the middle of a session, I am a fucking genius :D). Boy oh boy, that was a journey. I finally made it, and here I'll write about what I tried (most of which didn't work) and why it didn't work in my case.


Find The Cause of Timeouts

As far as I know, a coroutine in Python 3.x should not block for long, because every other task scheduled on the same asyncio event loop has to wait until it yields (I am new to asynchronous programming, so I hope I am not wrong). I ask the Spotify Web API for the currently playing song every second using AsyncIOScheduler from apscheduler. The whole job (a request to api.spotify.com, parsing the response, and publishing a custom song JSON to the "song" Redis channel) should take a second or less. Now imagine that when the Internet is down, it spends ABOUT 5 SECONDS trying to resolve api.spotify.com! Problem found: the DNS query.
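To make the event-loop point concrete, here is a minimal, self-contained illustration (not the code from the Music service). `time.sleep()` stands in for a blocking call such as the DNS lookup inside a synchronous HTTP request, and `heartbeat` and `blocking_job` are hypothetical names. A healthy loop lets the heartbeat tick every 0.05 s; once the blocking call starts, nothing else on the loop runs until it finishes.

```python
import asyncio
import time


async def heartbeat(gaps: list, ticks: int, interval: float = 0.05) -> None:
    # Record how long each tick actually took; on a free loop this stays near `interval`.
    last = time.monotonic()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def blocking_job(duration: float) -> None:
    # time.sleep never yields to the event loop, just like a blocking DNS lookup
    # inside a synchronous request would not.
    time.sleep(duration)


async def main() -> float:
    gaps: list = []
    beat = asyncio.create_task(heartbeat(gaps, ticks=10))
    await asyncio.sleep(0.1)   # let the heartbeat run normally for a moment
    await blocking_job(0.5)    # every task on the loop is frozen for 0.5 s
    await beat
    return max(gaps)           # longest observed gap between heartbeats
```

Running `asyncio.run(main())` returns a longest gap of at least 0.5 s, even though the heartbeat asked to wake up every 0.05 s.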

Down The Rabbit Hole

I tried the obvious thing first, setting a timeout on the request:

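from requests import Response, exceptions

# SESSION is a requests.Session; url and headers are built elsewhere.
try:
    response: Response = SESSION.get(url, headers=headers, timeout=0.5)
except exceptions.Timeout:
    ...  # handle the timeout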

It still took 5 seconds after turning off the Internet. I thought I might be missing something: the timeout was explicitly set, yet it had no effect. I searched StackOverflow and the requests documentation and found that you can pass a tuple to set the connect and read timeouts separately. Maybe this was what I needed.

So I tried it:

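from requests import Response, exceptions

try:
    # (connect timeout, read timeout) in seconds
    response: Response = SESSION.get(url, headers=headers, timeout=(0.5, 0.25))
except exceptions.Timeout:
    ...  # connecting or reading took too long
except exceptions.ConnectionError:
    ...  # could not connect at all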

Nothing changed: 5 seconds, as usual. But what could be the reason? I suspected it had something to do with DNS, so I searched for similar issues with the DNS keyword in them. In the requests GitHub Issues I found issue #2347. TURNS OUT THE TIMEOUT DOES NOT COVER DNS QUERIES! Why the requests documentation doesn't mention this, I don't know; maybe my use case is ultra-specific or something.

Okay, so how do I put a timeout on a DNS query? Did you know that socket.gethostbyname() has no timeout parameter, and neither does socket.getaddrinfo(), which is what urllib3 actually calls to resolve the hostname in your request's URL? :) That was a surprise for me too. Then I found out that there is a way to do it with signals:

  1. Set a SIGALRM handler.
  2. Schedule SIGALRM to fire after the timeout.
  3. Run socket.gethostbyname('google.com').
  4. If the handler is called, raise a timeout error; if the name resolves, the Internet is working and we just cancel the alarm.

Great plan! I found an existing implementation and tried to make something similar in my app. Later I found out (I wish I had known sooner) that signal handlers can only be installed from the main thread, and Python always runs them there. I tried a bunch of timeout libraries from pip, but none of them worked, because the song-updating function ran as an apscheduler job outside the main thread.
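The main-thread restriction is easy to reproduce. In this hypothetical snippet (Unix only, because of SIGALRM), a worker thread tries to install a handler, the same thing a signal-based timeout helper has to do, and gets a ValueError instead.

```python
import signal
import threading


def try_install_handler(result: list) -> None:
    # Attempt to install a SIGALRM handler from whatever thread we are running in.
    try:
        signal.signal(signal.SIGALRM, lambda signum, frame: None)
        result.append("installed")
    except ValueError as exc:
        result.append(f"ValueError: {exc}")


outcome: list = []
worker = threading.Thread(target=try_install_handler, args=(outcome,))
worker.start()
worker.join()
print(outcome[0])
```

That is exactly the situation of a job that runs outside the main thread.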

Just as I was about to give up for the day, I thought that maybe I could use a dirty hack: pinging Cloudflare DNS (1.1.1.1) to check whether we are connected:

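import os

from requests import Response, exceptions

try:
    # One ping to Cloudflare DNS with a 1-second deadline; a non-zero exit code means no reply.
    if os.system("/bin/ping -c1 -w1 1.1.1.1 > /dev/null 2>&1") != 0:
        raise exceptions.ConnectionError

    response: Response = SESSION.get(url, headers=headers, timeout=(0.5, 0.25))
except exceptions.Timeout:
    ...  # handle the timeout
except exceptions.ConnectionError:
    ...  # handle being offline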
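If you do keep the ping check, a slightly safer variant (a sketch, not what the post uses) is `subprocess.run()` with an argument list: it avoids going through a shell and puts a hard timeout on the child process. It assumes Linux iputils `ping`, where `-c` is the packet count and `-w` the deadline in seconds, matching the flags in the `os.system` call; `ping_ok` is a hypothetical name.

```python
import subprocess


def ping_ok(host: str = "1.1.1.1", deadline_s: int = 1) -> bool:
    """Send one ICMP echo with the system ping binary; True if a reply came back."""
    try:
        completed = subprocess.run(
            ["ping", "-c", "1", "-w", str(deadline_s), host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=deadline_s + 1,  # hard stop in case ping itself hangs
            check=False,             # inspect the exit code ourselves
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # No ping binary on PATH, or it overran the hard stop: treat as offline.
        return False
    return completed.returncode == 0
```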

It worked! A cheap and dirty trick, but with it in place there were no more worker timeouts. That was such a great moment. I love this part of programming :)

I Can Do Better

Shelling out to ping from Python isn't the best solution, I thought, so I started researching how to check connectivity the Pythonic way™. I found out that opening a TCP connection with a timeout and closing it right after is the way to go:

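import socket

from requests import Response, exceptions

try:
    # Open and immediately close a TCP connection to Cloudflare DNS (port 53).
    socket.create_connection(("1.1.1.1", 53), timeout=0.25).close()
    response: Response = SESSION.get(url, headers=headers, timeout=(0.5, 0.25))
except exceptions.Timeout:
    ...  # handle the timeout
except (socket.error, socket.timeout, exceptions.ConnectionError):
    ...  # no connectivity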
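Wrapped into a small reusable helper, the same check could look like the sketch below; `is_connected` and its defaults are hypothetical, mirroring the values in the snippet. Using the socket as a context manager guarantees it gets closed, and catching `OSError` covers both `socket.timeout` and `ConnectionRefusedError`, since both are subclasses of it.

```python
import socket


def is_connected(host: str = "1.1.1.1", port: int = 53, timeout: float = 0.25) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout seconds."""
    try:
        # create_connection applies `timeout` to the connect attempt;
        # leaving the with-block closes the socket right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and unreachable networks all land here.
        return False
```

One way to use it is to call `is_connected()` at the start of the scheduled job and skip that tick when it returns False, before any request (and any DNS lookup) is attempted.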

It is a quick (about 12 ms, I checked) and reliable way to check the Internet connection in almost one line (if we don't count the exception handling). Because 1.1.1.1 is an IP address, no DNS lookup is needed, so the 0.25-second timeout actually bounds the check. Now the Music microservice can survive an Internet outage. It was a great place to start, and there is a lot more to do for 2.0.1 :)

Happy Experimenting!
