23 Feb 2010

A Python script to upload certificate proxies to MyProxy server in a batch

A list of temporary certificates need to be uploaded to a MyProxy server for training purposes. It is troublesome to do it one by one. Luckily, the Globus myproxy-init command allows specifying passwords through stdin. This python script makes use of it to generate the passwords, save in a text file, and run the myproxy-init command with the password file redirected to the stdin.
Viola! keep this note for myself.


import os

count = 30
uname_prefix = 'username%02d'
upass_prefix = 'userpass%02d'
certpass = 'certpass'
file_prefix = 'certs/User'
cert_file = file_prefix + 'cert%02d.pem'
key_file = file_prefix + 'key%02d.pem'
hours = 24

for i in range(1, count + 1):
uname = uname_prefix % i
upass = upass_prefix % i
certfname = cert_file % i
keyfname = key_file % i
fo = open('passes','wt')
fo.write('%s\n%s\n'%(certpass,upass))
fo.close()
s = 'myproxy-init -s myproxy.ngs.ac.uk -c %d -l %s -C %s -y %s -S <passes'
os.system(s % (hours,uname,certfname,keyfname))

15 Apr 2009

From IP to geolocation (country, city, latitude, longitude) using Python on App Engine

As a social website, it is often necessary to detect where the user is located. The simplest way of doing this is to query a geolocation service for the user's IP address.
There're plenty of geolocation API services that support such queries. However, some of these services do not have enough couverage, others may not be reliable enough. In this case, we can't rely on a single service.  In this example, we use three APIs with the best one visited first.  The other two are reserved to fall back on in case the first is broken.


def detectUserLocationByIP(ip):
try:
gs = urllib.urlopen('http://blogama.org/ip_query.php?ip=%s&output=xml' % ip)
txt = gs.read()
except:
try:
gs = urllib.urlopen('http://www.seomoz.org/ip2location/look.php?ip=%s' % ip)
txt = gs.read()
except:
try:
gs = urllib.urlopen('http://api.hostip.info/?ip=%s' % ip)
txt = gs.read()
except:
logging.error('GeoIP servers not available')
return {}
try:
if txt.find('<Response>') > 0:
countrys = re.findall(r'<CountryName>([\w ]+)<',txt)[0]
citys = re.findall(r'<City>([\w ]+)<',txt)[0]
lats,lons = re.findall(r'<Latitude>([\d\-\.]+)</Latitude>\s*<Longitude>([\d\-\.]+)<',txt)[0]
elif txt.find('GLatLng') > 0:
citys,countrys = re.findall('<br />\s*([^<]+)<br />\s*([^<]+)<',txt)[0]
lats,lons = re.findall('LatLng\(([-\d\.]+),([-\d\.]+)',txt)[0]
elif txt.find('<gml:coordinates>') > 0:
citys = re.findall('<Hostip>\s*<gml:name>(\w+)</gml:name>',txt)[0]
countrys = re.findall('<countryName>([\w ,\.]+)</countryName>',txt)[0]
lats,lons = re.findall('gml:coordinates>([-\d\.]+),([-\d\.]+)<',txt)[0]
else:
logging.error('error parsing IP result %s'%txt)
return {}
return {'Country':countrys,'City':citys,'Latitude':lats,'Longitude':lons}
except:
logging.error('Error parsing IP result %s'%txt)
return {}

Country,City,Latitude,Longitude = detectUserLocationByIP(ip).values()

29 Mar 2009

YUI Editor with named anchor tags on IE7

It is strange that YUI editor does not work well with named anchor tags on the same page in IE7.  It took me quite a while to test it by removing elements here and there and found that the named anchor elements caused the problem.  While other browsers do not complain about it, it should be the IE7 compatibility issue.  Finally I found out that the named anchor element must have something inside even an empty string will do, just like <script> So I replaced <a name="somewhere"/> element with <a name="somewhere"></a>, and it worked perfect. It's difficult because the debugger only reported a runtime error from a YUI component which is rather misleading.

JavaScript parseInt with leading zeros

The JavaScript parseInt function converts a string-typed number to an integer. An interesting feature of this function is that it silently handles hex and octal numbers according to the prefix. This can cause unexpected results sometimes. For example,

parseInt('8') will return int 8, but
parseInt('08') will return int 0, because the leading 0 and 8 are regarded as the prefix of an octal number.

Fortunately, the function accepts a second parameter radix, and thus:
parseInt('08',10) will produce the correct integer 8.

It may be a good practice to always add the radix parameter to avoid the unexpected trouble.

24 Mar 2009

A tutorial on bulkloading data onto App Engine Datastore

A recent application migration to App Engine involves uploading a lot of old data from the SQL Server database into the datastore. The Google documentation only gives some simple examples, and there're two bulkloading tools available causing more trouble than help. I followed with bulkloader first and installed SQLite for it, but failed to make it work. Later on, I tried the bulkload_client tool and succeeded with it.


It is straightforward following the examples to do simple data uploading. You can just dump your database tables into CSV files, write a uploader class in Python to map all fields of a CSV row to datastore.Entity properties, and it should work. However, it's more troublesome if your data contain unicode strings, or if your model requires relationship, or referencing in Datastore terms.


1. A simple example:


Suppose we have a database table MyUser containing an ID as integer, a name as string and a mobile phone number as db.PhoneNumber type. And we have the entity class built like this:


class MyUser(db.Model):
id = db.IntegerProperty()
name = db.StringProperty()
mobile = db.PhoneNumberProperty()


The CSV file contains a number of rows with each row representing one user, and containing three fields separated with commas. For example:


1,David Johnson,31,07897601212
2,Mary Mann,24,0786321234


To upload this data, we need to write a Loader class which, like a Web request handler, stays on the server
and can be activated by a request from the client. It can be as simple as a couple of lines code.


from google.appengine.ext import bulkload
from google.appengine.api import datastore_types

class MyUserLoader(bulkload.Loader):
def __init__(self):
bulkload.Loader.__init__(self, 'MyUser', [('name',str),('mobile',datastore_types.PhoneNumber)])

if __name__=='__main__':
bulkload.main(MyUserLoader)


Our class MyUserLoader is a derived class from bulkload.Loader and the __init__ method (constructor) is responsible for specifying what data to upload into what entity class, in this case 'MyUser'. We call bulkload.main in the main program entry and pass our loader class name as its parameter. The most interesting part is the list that maps a data row in the CSV file. Each field in the list is a tuple of two elements, and the first element is the name of the property that will be put in the entity, and the second element tells the data type, it's actually a function that transforms the string from CSV to the value accepted by that property. In this example, 'name' is a string, so we use str to indicate the Python builtin function
str(x) will be used where x is the first string separated by commas in a CSV line. The second tuple specifies
the property named 'age' accepts an integer which will be converted from the second string in a CSV line
using int(x) function. The third tuple 'mobile' is of type PhoneNumber, and for loader class, we must use
datastore_types.PhoneNumber for db.PhoneNumber type.
Now we can save this class as file myloader.py, upload it to the production server if not running on local machine.


Next, we need to add this to app.yaml file:


-- url: /loadusers
script: myloader.py
login: admin


Note that if you run it on local development machine, comment off the last line login:admin so that you don't
need a credential to run the bulkloader. If on GAE, you can browse to /loadusers, and copy the cookie string
and insert into the bulkload_client command line argument.


Finally, we can run the bulkload_client tool on the command line:


python bulkload_client.py --filename myusers.csv --url http://localhost:8080/loadusers

Uploading will be done with 10 rows each round.


2. Working with datetime type


Now let's add a datetime property to our entity class:


updateTime = db.DateTimeProperty()


It's a bit tricky with date time data values. They can be of different format as output in the CSV file.
For example, we have an ISO format datetime string like this: 2009-01-30 12:24:01. And we must
specify the format to make it parse correctly. We can use Python datetime function strptime.
And let's modify the myloader.py file to add the following:


from datetime import datetime

[('updateTime', lambda x:datetime.strptime(x,'%Y-%m-%d %H:%M:%S')]


We used a lambda inline function to do the conversion from string to datetime object, instead, we can also write a function like this:


def times2val(dt):
return datetime.strptime(dt, ','%Y-%m-%d %H:%M:%S')


Then in the list, we write:


[('updateTime', times2val)]

3. No empty values allowed but None


One problem that I met is that some fields are empty, but the property does not allow empty fields, for example, the PhoneNumber property.



When we have an empty string value for the mobile field, we'll get an exception. Although this property does not accept empty strings as input, it accepts None value. So, to solve this problem, we can define a function to return None if an empty string is given.



[('mobile', lambda x:x != '' or None)]

4. Use different encoding


We have Asian-language names encoded as UTF-8 in the CSV file, and when we upload it, it reports exception using the default encoding. So we must specify the encoding.


We need to do this for both StringProperty and TextProperty properties.


name = db.StringProperty()
bio = db.TextProperty()

lambda s:unicode(s, 'utf-8') or None
lambda s:datastore_types.Text(s, encoding='UTF-8')

5. Build relationship with referencing with a key


Suppose we have another entity called MyPost which has a Reference property to the MyUser entity.


class MyPost(db.Model):
postid = db.IntegerProperty()
title = db.StringProperty()
author = db.ReferenceProperty(MyUser)
content = db.TextProperty()


How should we write a function for reference? First of all, we must make sure that the MyUser entities are
already loaded before we upload the MyPost entities, and we can query the MyUser entities by user id.


u = MyUser.all().filter('uid =',id)


This query is based on index of uid, and we can make it even faster by adding a key for the entities. So let's modify the MyUser loader and load the data again. (We must delete the old data before uploading again, otherwise, the entities will be duplicated.)


Unfortunately, we do not have a straightforward method of specifying a key_name in the loader class. However, we can override a method called HandleEntity in the loader class. A entity parameter is passed over to this method, and the entity contains the fields populated for one row, and this is the place where we can post-process the fields before submitting the entity to datastore. And here can we add the key_name. To do that, we should construct a datastore.Entity instance and add the current entity to it.


def HandleEntity(self, entity):
keyname = 'u%d' % entity['uid']
new_entity = datastore.Entity('MyUser', name=keyname)
new_entity.update( entity )
return new_entity


We set the user id as the key_name, and prepended a letter 'u' because key_names can't start with a number.


Having a key_name, we can use Model.get_by_key_name method to rapidly fetch an entity.


def usertype(uid):
u = MyUser.get_by_key_name('u%d' % uid)
if u is not None:
return u.key()
return None

def tenc(s):
return datastore_types.Text(s, encoding='UTF-8') or None

fields = [('postid', int),('title', str),('author',usertype),('content',tenc)]

class MyPostLoader(bulkload.Loader):
def __init__(self):
bulkload.Loader.__init__(self, 'MyPost', fields)

if __name__=='__main__':
mulkload.main(MyPostLoader)


In app.yaml:


-- url: /loadposts
script: mypostloader.py
login: admin


There seems to be no way of accessing other field values inside the type function. For example, a multiple field primary key cannot be processed in this way. It should be preprocessed so that the key is already combined into one string.

17 Mar 2009

GAE-based ForumBlog: suitable for closed groups

Started for an alumni group of around 150 members, the social network tool is now deployed and put into beta testing. The main features include:

A single forum as a bulletin board;
Posts listed as personal weblogs in profile page;
Automatic detection of members location by IP;
Plotting members on Google map;
Shared address book with Sirname first letter indexing;
Shared photo albums;
Email notifications sent to admins group for new posts etc;
Other features to be added.

The obvious use case is for a small group to exchange information internally. I plan to make it open-source in the hope that it may be helpful to some other people. It is simple and easy to set up and use, and most importantly, it is free software and free hosting by Google as well.

10 Mar 2009

Choosing a JavaScript HTML editor

The beiwai82 BlogBBS needs a browser-based WYSIWYG editor.  Having googled for a while, I found quite a few choices.  Since I'm using jQuery as the JavaScript library, I prefer a jQuery plugin editor, but unfortunately, the one which is called jquery.rte is too simple to meet the requirements.  A powerful non-jQuery editor is TinyMCE.  Inspite of the name, it is not tiny at all, but rather heavy with a lot of plugins and toolbar buttons.  Obviously TinyMCE is powerful enough.  However, the problem is it is difficult to customise the toolbar.  The ready-made choices provide either too little or too much functionality.  In the end, I found Yahoo! UI is exactly the correct choice I expected.  Its advantages include:
  • YUI is a load on-demand library;
  • The simple editor is just taylored for blog writers;
  • YUI is in Google JavaScript libray;
Combining with Google Ajax loader, the code looks a bit messy:


<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("yui","2.7.0");
google.setOnLoadCallback(function() {
init();
});
function init() {
var loader = new YAHOO.util.YUILoader({
require: ["editor"],
loadOptional: true,
onSuccess: function() {
var myEditor = new YAHOO.widget.SimpleEditor('textbox', {
 height: '400px', width: '100%', 
handleSubmit: true,
 dompath: true});
myEditor.render();
});
loader.insert();
}
</script>