How to fetch remote data with the App Engine

You can use App Engine to retrieve web pages or other data from remote servers and process them in your Python program.

This capability is necessarily limited, because the potential for abuse is very high. You therefore cannot call arbitrary Internet services; you can only retrieve “web pages”, that is, whatever a web server offers on port 80 or 443.
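
To make the restriction concrete, here is a minimal sketch of a pre-check you could run before attempting a fetch. It assumes only what is stated above (plain web URLs on port 80 or 443); the names `ALLOWED_PORTS` and `looks_fetchable` are made up for this example and are not part of App Engine, and the real service may apply different rules.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

# Default port per scheme, following the restriction described above (assumption).
ALLOWED_PORTS = {"http": 80, "https": 443}


def looks_fetchable(url):
    """Return True if url is an http(s) URL on its default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_PORTS or not parts.hostname:
        return False
    try:
        port = parts.port  # None when the URL names no explicit port
    except ValueError:
        return False  # malformed port such as ":abc"
    return port is None or port == ALLOWED_PORTS[scheme]
```

A URL that fails this check is not worth sending to the fetch service at all, so you can report the problem to the user right away.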

The urlfetch module implements this. A simple use looks like this:

    from google.appengine.api import urlfetch

    response = urlfetch.fetch(url)
    content = response.content

If everything goes well, this is enough, and so far it is very convenient. But in most real-world scenarios you need to handle all kinds of error conditions, as in this skeleton:

    # Assumes: from google.appengine.api import urlfetch
    try:
        response = urlfetch.fetch(jadurl)
        if response.status_code == 200:
            # Success: keep headers and body for later processing
            self.xxheaders = response.headers
            self.xxcontent = response.content
            self.xxFetched = True
        elif response.status_code == 404:
            self.errortext = "File not found"
            self.xxcontent = "not found"
        else:
            self.errortext = "Bad Response Code"
            self.xxcontent = response.status_code
    except urlfetch.InvalidURLError:
        self.errortext = "Invalid URL"
    except urlfetch.DownloadError:
        self.errortext = "Error downloading file"
    except urlfetch.ResponseTooLargeError:
        self.errortext = "File too large"
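
The same logic can be pulled out of the class into a small function that is easy to test outside App Engine. This is only a sketch: `safe_fetch` and `FetchResult` are names invented for this example, the fetch function is passed in as a parameter instead of being imported, and the set of exceptions to catch is also a parameter.

```python
from collections import namedtuple

# Hypothetical result type for this sketch, not part of App Engine.
FetchResult = namedtuple("FetchResult", "ok content headers error")


def safe_fetch(url, fetch, fetch_errors=(Exception,)):
    """Fetch url with the given callable and never raise on fetch errors.

    fetch must behave like urlfetch.fetch: it takes a URL and returns an
    object with status_code, content and headers attributes.
    fetch_errors is a tuple of exception classes treated as fetch failures.
    """
    try:
        response = fetch(url)
    except fetch_errors as exc:
        return FetchResult(False, None, None,
                           "Fetch failed: %s" % type(exc).__name__)
    if response.status_code == 200:
        return FetchResult(True, response.content, response.headers, None)
    if response.status_code == 404:
        return FetchResult(False, None, response.headers, "File not found")
    return FetchResult(False, None, response.headers,
                       "Bad response code: %d" % response.status_code)
```

On App Engine you would presumably call it as `safe_fetch(jadurl, urlfetch.fetch, (urlfetch.Error,))`, assuming `urlfetch.Error` is the common base class of the exceptions shown in the skeleton.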

HTTP-Headers

You can also access the HTTP headers. For example, to read the MIME type:

    response.headers['content-type']
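
Note that the content-type value often carries parameters such as a charset after the MIME type. A minimal sketch for extracting just the type follows; `mime_type` is a helper name made up for this example.

```python
def mime_type(content_type):
    """Return the bare MIME type from a Content-Type header value.

    Parameters after the first ';' (for example the charset) are dropped,
    and the result is lowercased because MIME types are case-insensitive.
    """
    return content_type.split(";", 1)[0].strip().lower()
```

For example, `mime_type("text/html; charset=UTF-8")` gives `"text/html"`. With a fetched response you would pass in `response.headers['content-type']`.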

In my particular use case I am also interested in whether a header was sent twice. It seems that you cannot get this information.

SSL

You can also access SSL-protected pages with “https://” URLs. Unfortunately, according to the documentation as of version 1.1, the certificate is not checked. This means the added security is not really there, which restricts the range of possible applications.
