Managing Haskell wreq efficiently
If you have ever needed to make HTTP requests from your Haskell code, chances are that you have used the wreq library. If your work was a one-off job, or you have been using it only for one or two requests or infrequently, you might not have noticed that wreq needs managing once you are making a lot of network requests.
In my workplace deployments, backend services make lots of HTTP requests. Specifically, they make multiple HTTP requests to the same server, and there is a group of servers that they talk to. For example, we make requests to Google, AWS, Azure and Digital Ocean cloud services, and to each we make multiple requests. I have noticed that if you do not use an HTTP session manager when making network requests in this pattern with wreq, it:
- tends to use significant memory, probably because so many TCP connections are kept open
- tends to be slower, because it sets up and tears down an entire TCP connection for every request
- can even lead to TCP socket leaks (read more)
Usually browsers and other popular HTTP clients handle this automatically
by keeping TCP connections open and re-using them. But in wreq you have to be
explicit about it.
Using wreq’s session manager
wreq has a Network.Wreq.Session module, which exposes an HTTP session manager.
The API is straightforward, and is used like this:
import Data.ByteString.Lazy (ByteString)
import Network.Wreq
import Network.Wreq.Session (Session)
import qualified Network.Wreq.Session as Sess

main :: IO ()
main = do
  sess <- Sess.newSession
  resp <- mkGetRequest sess
  resp2 <- mkAnotherRequest sess
  ...

-- Sess.get returns the whole response (status, headers, body), not just the body
mkGetRequest :: Session -> IO (Response ByteString)
mkGetRequest sess = do
  resp <- Sess.get sess "http://httpbin.org/get"
  return resp

mkAnotherRequest :: Session -> IO (Response ByteString)
mkAnotherRequest sess = do
  Sess.get sess "http://httpbin.org/get"
...
The wreq documentation recommends using the session manager if you’re making
multiple requests to the same server, so that it can re-use TCP connections.
But this documentation is tucked away in the Session module, separate from the
main modules, which makes it easy to overlook.
Also, the newSession API creates a manager that manages cookies as well. That
is, any cookie set by a server is sent back on later requests made with the
same manager (the way browsers behave). This is not really desirable in backend
systems unless you’re dealing with a user session. wreq exposes another
API called newAPISession. Its usage is exactly the same as newSession, but
it is just an HTTP manager that does not manage any cookies.
import Network.Wreq
import qualified Network.Wreq.Session as Sess

main = do
  sess <- Sess.newAPISession
  ...
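To see the difference between the two kinds of session, here is a small sketch that asks httpbin to set a cookie and then asks which cookies it received. The helper name cookiesSeenBy and the cookie name/value are my own choices for illustration, and the sketch assumes httpbin.org is reachable and that its /cookies/set and /cookies endpoints behave as documented. I would expect the newSession run to report the cookie and the newAPISession run not to, but treat that as something to check rather than a guarantee.

```haskell
import Control.Lens ((^.))
import qualified Data.ByteString.Lazy.Char8 as BL
import Network.Wreq (responseBody)
import Network.Wreq.Session (Session)
import qualified Network.Wreq.Session as Sess

-- Hypothetical helper: ask httpbin to set a cookie, then ask it which
-- cookies it received on the next request made with the same session.
cookiesSeenBy :: Session -> IO BL.ByteString
cookiesSeenBy sess = do
  _ <- Sess.get sess "http://httpbin.org/cookies/set?name=hi"
  r <- Sess.get sess "http://httpbin.org/cookies"
  return (r ^. responseBody)

main :: IO ()
main = do
  withCookies <- Sess.newSession     -- keeps a cookie jar
  apiOnly     <- Sess.newAPISession  -- no cookie handling
  putStrLn "newSession:"
  BL.putStrLn =<< cookiesSeenBy withCookies
  putStrLn "newAPISession:"
  BL.putStrLn =<< cookiesSeenBy apiOnly
```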
Underneath, wreq uses the HTTP Manager from the http-client package for
sessions. You can use the Manager directly from the http-client package as well.
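As a rough sketch of that last option, one way to share a Manager without going through the Session module is to create it with http-client and hand it to wreq through the manager field of Options. This assumes the http-client-tls and lens packages are available alongside wreq; the URL is just the httpbin endpoint used earlier.

```haskell
import Control.Lens ((&), (.~), (^.))
import Network.HTTP.Client (newManager)
import Network.HTTP.Client.TLS (tlsManagerSettings)
import Network.Wreq (defaults, getWith, manager, responseStatus, statusCode)

main :: IO ()
main = do
  -- Create one Manager up front and share it across requests.
  mgr <- newManager tlsManagerSettings
  -- Point wreq's Options at the existing Manager (Right) instead of
  -- manager settings (Left).
  let opts = defaults & manager .~ Right mgr
  r1 <- getWith opts "http://httpbin.org/get"
  r2 <- getWith opts "http://httpbin.org/get"
  print (r1 ^. responseStatus . statusCode, r2 ^. responseStatus . statusCode)
```

Both requests go through the same Manager, so it is in a position to re-use the connection for the second one.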
Tidying things up
Finally, you would not want to define functions that take the Session
explicitly as an argument. You should have a Reader monad constraint on your
functions and make the HTTP session manager part of your environment. Something like:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader
import Data.ByteString.Lazy (ByteString)
import Network.Wreq (Response)
import Network.Wreq.Session (Session)
import qualified Network.Wreq.Session as Sess

type App r a = ReaderT r IO a

main :: IO ()
main = do
  sess <- Sess.newAPISession
  res <- flip runReaderT sess $ do
    -- the session now comes from the environment, not from an argument
    resp <- mkGetRequest
    resp2 <- mkAnotherRequest
    ...
  print res

mkGetRequest :: (MonadReader Session m, MonadIO m) => m (Response ByteString)
mkGetRequest = do
  sess <- ask
  resp <- liftIO $ Sess.get sess "http://httpbin.org/get"
  return resp

mkAnotherRequest :: (MonadReader Session m, MonadIO m) => m (Response ByteString)
mkAnotherRequest = do
  sess <- ask
  liftIO $ Sess.get sess "http://httpbin.org/get"
...
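In a real service the environment usually holds more than the session. The sketch below is one possible shape for that, not a prescribed design: the Env record, its field names, the fullUrl helper and getPath are all hypothetical names invented for this illustration, and the base URL is simply the httpbin host used above.

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (MonadReader, asks, runReaderT)
import Data.ByteString.Lazy (ByteString)
import Network.Wreq (Response)
import Network.Wreq.Session (Session)
import qualified Network.Wreq.Session as Sess

-- Hypothetical application environment; field names are illustrative.
data Env = Env
  { envSession :: Session  -- shared HTTP session
  , envBaseUrl :: String   -- hypothetical configuration value
  }

-- Pure helper: build a full URL from the environment's base URL.
fullUrl :: Env -> String -> String
fullUrl env path = envBaseUrl env ++ path

-- GET a path relative to the base URL, using the session from the environment.
getPath :: (MonadReader Env m, MonadIO m) => String -> m (Response ByteString)
getPath path = do
  sess <- asks envSession
  url  <- asks (`fullUrl` path)
  liftIO (Sess.get sess url)

main :: IO ()
main = do
  sess <- Sess.newAPISession
  let env = Env sess "http://httpbin.org"
  _ <- runReaderT (getPath "/get") env
  putStrLn "request completed"
```

Functions like getPath only state what they need from the environment, so the same code can run under any monad stack that provides an Env and IO.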
wreq has another problem as well. It throws an exception when the server
responds with a non-2xx status code. It also throws exceptions if the network
connection fails. In production code we need to handle this behaviour of wreq
as well to make it safer. But I’ll probably discuss that in another post.