You can use App Engine to retrieve web pages or other data from remote servers and process them in your Python program.
This is necessarily limited, because the potential for abuse is very high. You cannot call arbitrary Internet services; you can only retrieve "web pages", that is, whatever a web server offers on port 80 or 443.
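As a rough illustration of that restriction, here is a minimal sketch that checks a URL before it is handed to the fetch call. The helper name `is_fetchable_url` and the `DEFAULT_PORTS` table are made up for this example, and the rule it encodes is only the one stated above (HTTP or HTTPS on port 80 or 443). It uses the Python 3 standard library and is not part of App Engine.

```python
from urllib.parse import urlsplit

# Default port for each scheme; the text above names 80 and 443 as the allowed ports.
DEFAULT_PORTS = {"http": 80, "https": 443}


def is_fetchable_url(url):
    """Return True if url matches the restriction described above (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        return False
    try:
        port = parts.port  # None when the URL carries no explicit port
    except ValueError:
        return False  # port is not a number or is out of range
    if port is None:
        port = DEFAULT_PORTS[scheme]
    return port in (80, 443)
```

A check like this only saves a round trip for URLs that are obviously out of scope; the service itself still has the final say.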
The urlfetch module implements this. A simple usage is:
    from google.appengine.api import urlfetch

    response = urlfetch.fetch(url)
    content = response.content
If everything goes well, this is enough, and so far it is very convenient. But in most real-world scenarios you need to handle all kinds of error conditions, as in this skeleton:
    # Fragment from inside a method; jadurl is the URL to fetch.
    try:
        response = urlfetch.fetch(jadurl)
        if response.status_code == 200:
            self.xxheaders = response.headers
            self.xxcontent = response.content
            self.xxFetched = True
        elif response.status_code == 404:
            self.errortext = "File not found"
            self.xxcontent = "not found"
        else:
            self.errortext = "Bad Response Code"
            self.xxcontent = response.status_code
    except urlfetch.InvalidURLError:
        self.errortext = "Invalid URL"
    except urlfetch.DownloadError:
        self.errortext = "Error downloading file"
    except urlfetch.ResponseTooLargeError:
        self.errortext = "File too large"
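The same pattern can be pulled into a small helper so that it is testable outside App Engine. The following is only a sketch: `FetchResult` and `safe_fetch` are names invented here, and the fetch function and the exception classes are passed in as arguments instead of being imported, so nothing in it depends on the SDK.

```python
from collections import namedtuple

# Hypothetical result type: `ok` tells the caller whether `content` is usable.
FetchResult = namedtuple("FetchResult", "ok content headers error")


def safe_fetch(url, fetch, known_errors):
    """Call fetch(url) and turn the outcome into a FetchResult.

    fetch        -- a callable such as urlfetch.fetch (passed in, not imported)
    known_errors -- list of (exception class, message) pairs to translate
    """
    try:
        response = fetch(url)
    except Exception as exc:
        for cls, message in known_errors:
            if isinstance(exc, cls):
                return FetchResult(False, None, None, message)
        raise  # errors that are not listed propagate unchanged
    if response.status_code == 200:
        return FetchResult(True, response.content, response.headers, None)
    if response.status_code == 404:
        return FetchResult(False, None, None, "File not found")
    return FetchResult(False, None, None,
                       "Bad response code %d" % response.status_code)
```

Inside App Engine you would presumably call it as `safe_fetch(jadurl, urlfetch.fetch, [(urlfetch.InvalidURLError, "Invalid URL"), (urlfetch.DownloadError, "Error downloading file"), (urlfetch.ResponseTooLargeError, "File too large")])`.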
HTTP-Headers
You can also access the HTTP headers. For example, to read the MIME type:
    response.headers['content-type']
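The header value often carries more than the bare MIME type, for example `text/html; charset=UTF-8`. Here is a minimal sketch for taking it apart in plain Python. The function name `split_content_type` is invented for this example, and the parsing is deliberately simplified: it does not handle semicolons inside quoted parameter values.

```python
def split_content_type(header_value):
    """Split a Content-Type value into (mime_type, params). Hypothetical helper."""
    parts = [p.strip() for p in header_value.split(";")]
    mime_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            # Parameter names are case-insensitive; values keep their case.
            params[key.strip().lower()] = value.strip().strip('"')
    return mime_type, params
```

With a response at hand this would be called as `split_content_type(response.headers['content-type'])`.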
In my particular use case I am also interested in whether a header is sent twice. It seems that you cannot get this information.
SSL
You can also access SSL-protected pages with "https://" URLs. Unfortunately, according to the documentation as of version 1.1, the certificate is not checked. This means the added security is not really there, which restricts the range of possible applications.
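To make concrete what "not checked" means, here is a small sketch using Python's standard `ssl` module. This is not urlfetch and not App Engine code; it only contrasts a client configuration that verifies certificates with one that does not. The variable names are chosen for this example.

```python
import ssl

# Default client context: verifies the certificate chain and the host name.
verified = ssl.create_default_context()

# Context with the checks switched off, comparable to what the text describes.
# check_hostname has to be disabled before verify_mode can be set to CERT_NONE.
unverified = ssl.create_default_context()
unverified.check_hostname = False
unverified.verify_mode = ssl.CERT_NONE
```

With the second kind of configuration the connection is still encrypted, but you have no assurance about who is on the other end.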