Validate Python string translation in Transifex

Transifex already supported validating translations of old-style Python format strings, e.g.,

"A sample string with a %(keyword)s argument." % {'keyword': 'key word'}

The validation checks that all the positional and keyword arguments of the source string are present in the translation string, and that the translation string does not contain any extra argument that is not in the source string. You can have a look at the validator code here.
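As a rough illustration of that check, here is a simplified sketch. It is not the actual Transifex validator: the pattern and the names SPECIFIER_RE, extract_specifiers and validate_translation are made up for this example, and the regex only covers the common specifier forms.

```python
import re
from collections import Counter

# Simplified pattern for old-style specifiers such as %s, %d or %(keyword)s.
# Assumption: common cases only, not every printf corner case.
SPECIFIER_RE = re.compile(
    r'%(?:\((?P<key>\w+)\))?'          # optional mapping key: (keyword)
    r'[#0\- +]*'                       # optional flags
    r'(?:\d+|\*)?(?:\.(?:\d+|\*))?'    # optional width and precision
    r'[hlL]?'                          # optional length modifier
    r'(?P<type>[diouxXeEfFgGcrs%])'    # conversion type
)


def extract_specifiers(text):
    """Return all format specifiers in text, ignoring the literal '%%'."""
    return [m.group(0) for m in SPECIFIER_RE.finditer(text)
            if m.group('type') != '%']


def validate_translation(source, translation):
    """Return (missing, extra) specifiers of translation relative to source."""
    src = Counter(extract_specifiers(source))
    trans = Counter(extract_specifiers(translation))
    missing = sorted((src - trans).elements())
    extra = sorted((trans - src).elements())
    return missing, extra


source = "A sample string with a %(keyword)s argument."
print(validate_translation(source, "Un exemple avec un argument %(keyword)s."))
print(validate_translation(source, "Un exemple avec %(keywrd)s et %d."))
```

A translation passes when both lists come back empty. The second call reports the misspelled keyword as missing and the two unexpected specifiers as extra.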

However, the existing validator is not able to check for replacement fields in new style Python format strings, e.g.

"This is a sample string with different replacement fields: {0} {1} {foo[bar]:^30}".format(
    "arg0", "arg1", foo={"bar": "a kwarg"})

I tried to devise a regex to extract the replacement fields in the Python format string based on the grammar defined here.

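# Regex to find replacement fields in a new-style Python format string.
# The parts that were cut off (the attribute/index tail of field_name and the
# bodies of format_spec and replacement_field) are filled in from the
# format-string grammar, so treat them as a best-effort reconstruction.

import re

field_name = r'(?P<field_name>(?P<arg_name>\w+)?(?:\.\w+|\[[^\]]+\])*)'
conversion = r'(?P<conversion>r|s)'
align = r'(?:(?P<fill>[^}{]?)(?P<align>[<>^=]))'
sign = r'(?P<sign>[\+\- ])'
width = r'(?P<width>\d+)'
precision = r'(?P<precision>\d+)'
type_ = r'(?P<type_>[bcdeEfFgGnosxX%])'
format_spec = (
    r'(?P<format_spec>%(align)s?%(sign)s?#?0?%(width)s?,?'
    r'(?:\.%(precision)s)?%(type)s?'
    r')' % {
        'align': align,
        'sign': sign,
        'width': width,
        'precision': precision,
        'type': type_,
    })
replacement_field = (
    r'\{%(field_name)s(?:!%(conversion)s)?(?::%(format_spec)s)?'
    r'\}' % {
        'field_name': field_name,
        'conversion': conversion,
        'format_spec': format_spec,
    })

# The original pattern continued with a further alternative after '|';
# only the replacement-field branch is compiled here.
printf_re = re.compile('(?:' + replacement_field + ')')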

Well, with the above, I was able to parse almost all the cases discussed here except for this one:

import datetime
d = datetime.datetime(2010, 7, 4, 12, 15, 58)
s = '{:%Y-%m-%d %H:%M:%S}'.format(d)

I was not sure how to fit this case into my regex. After some discussion in #python on IRC, I learned about some limitations of regular expressions, such as the fact that they are not Turing complete. People suggested that I use a parser instead.
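One way to see why a fixed pattern struggles here: the text after the colon is handed to the object's __format__ method, so its syntax depends on the type being formatted, and it may even contain nested replacement fields. A small sketch (the Temperature class is hypothetical and exists only for this illustration):

```python
import datetime


class Temperature:
    """Hypothetical type with its own mini-language for format specs."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # 'F' means Fahrenheit here; anything else falls back to Celsius.
        if spec == 'F':
            return '%.1f F' % (self.celsius * 9 / 5 + 32)
        return '%.1f C' % self.celsius


d = datetime.datetime(2010, 7, 4, 12, 15, 58)
date_text = '{:%Y-%m-%d %H:%M:%S}'.format(d)      # spec goes to datetime.__format__
temp_text = '{:F}'.format(Temperature(100))       # spec goes to Temperature.__format__
nested_text = '{:{width}}'.format('x', width=5)   # the spec itself contains a field
print(date_text, temp_text, repr(nested_text))
```

None of these specs follow the standard [[fill]align][sign]... layout, yet all of them are valid format strings.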

Being a strong supporter of “Never reinvent the wheel”, I gave finding an existing solution another shot, and I was lucky to come across the _formatter_parser() method of Python string objects. It correctly found all the replacement fields in Python format strings and returned an iterable of tuples (literal_text, field_name, format_spec, conversion). All I needed then was to convert this info into a list of the replacement fields in a format string. The simple script below is all I needed to extract the replacement fields from a format string in Python:

replacement_fields = []
s = "{foo:^+30f} bar {0} foo {} {time:%Y-%m-%d %H:%M:%S}"

for literal_text, field_name, format_spec, conversion in \
        s._formatter_parser():
    # field_name is None for literal text that is not followed by a field
    if field_name is not None:
        replacement_field = field_name
        if conversion is not None:
            replacement_field += '!' + conversion
        if format_spec:
            replacement_field += ':' + format_spec
        replacement_field = '{' + replacement_field + '}'
        replacement_fields.append(replacement_field)
print replacement_fields
['{foo:^+30f}', '{0}', '{}', '{time:%Y-%m-%d %H:%M:%S}']

That’s all. Simple and easy, isn’t it?
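For what it's worth, _formatter_parser() is a private method of Python 2 strings. The public string.Formatter().parse() yields the same kind of tuples and also exists in Python 3. Below is a hedged sketch of how the extracted fields could feed a validator; the helper names extract_replacement_fields and compare_fields are invented for illustration, and this is not the Transifex implementation.

```python
from collections import Counter
from string import Formatter


def extract_replacement_fields(format_string):
    """Rebuild each replacement field of a new-style format string."""
    fields = []
    for _literal, field_name, format_spec, conversion in \
            Formatter().parse(format_string):
        if field_name is None:   # literal text only, no field
            continue
        field = field_name
        if conversion:
            field += '!' + conversion
        if format_spec:
            field += ':' + format_spec
        fields.append('{' + field + '}')
    return fields


def compare_fields(source, translation):
    """Return (missing, extra) replacement fields of the translation."""
    src = Counter(extract_replacement_fields(source))
    trans = Counter(extract_replacement_fields(translation))
    return sorted((src - trans).elements()), sorted((trans - src).elements())


source = "{foo:^+30f} bar {0} foo {} {time:%Y-%m-%d %H:%M:%S}"
print(extract_replacement_fields(source))
print(compare_fields("Hello {name}, you have {count:d} messages",
                     "Bonjour {nom}, vous avez {count:d} messages"))
```

Comparing the fields as multisets lets translators reorder them freely while still catching renamed, dropped or added fields.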

FUDCON KL 2012 Day 2

Day 2 of FUDCON KL started with a talk on the Fedora book by Joshua Wulf (for me, Sitapati Prabhu). This idea offers a very intuitive way for anyone (especially newbies) to start contributing to Fedora documentation. Contributors need to know some basics of DocBook, but I guess that’s not tough.

There were many interesting talks during the day on topics like Ask Fedora, Transifex, Fedora Tour, etc. Soumya’s talk on Fedora in Education was really inspiring. He shared how he started out as a contributor and talked about DGPLUG’s Bijra project. This was followed by another talk by Soumya, on Ask Fedora, which explained to the attendees what Ask Fedora is, why it was needed and how it can be used. Soumya also explained that Ask Fedora runs on Askbot (an open source Q&A forum) and encouraged people to contribute to Askbot.

After lunch, it was time for the talk by Mahay and me on Effective localization Crowdsourcing (using Transifex). Mahay started the talk by explaining localization and internationalization and their importance to the attendees. This helped set the scene for the entire talk. Then I spoke about the various gotchas in the traditional localization workflow and how Fedora tried to get rid of them using Transifex. I explained what Transifex is, what it does and why it’s so awesome. I also covered various super cool features in Transifex like crowdsourcing, project management, release management, Translation Memory, glossary, etc., and told the attendees how to contribute to Transifex.

Well, that was not all. It was followed by another session on how to internationalize and localize software. I took the example of a simple Django app and explained how to internationalize it (using gettext) and extract the source POT file. Then I showed how to localize it using Transifex. I gave the attendees a walkthrough, from creating a Transifex account to creating a project, resources and releases, forming teams, translating, and finally downloading the translations and deploying them in their app. I also mentioned other i18n methods available for different languages and directed people to the necessary resources. With this, I finally concluded my talk.

After the talk, it was time for some tea and then lightning talks. Christoph Wickert’s talks on LXDE and Clouds were super cool. Michel also spoke about the ROX DE and ROX-Filer. We had some post-session discussions and took some group pics.

Today was the day for FUDPub. It was supposed to start at 8 PM at Sri Petaling Hotel. We reached the hotel and had some rest. Then we moved downstairs to join FUDPub. It was a hell of a lot of fun out there.

Dear Turkish translators

Transifex usually defines plural rules for languages according to So, the plural rule for the Turkish language in Transifex is other → everything. However, lately there have been some requests that Turkish should have two plural forms:

nplurals=2; plural=(n>1)

The requests have been with reference to

Here is a quote from a user at

Turkish behaves like Akan for example. The rule should be:

One: 0, 1 Other: 2-999

It is only when including a count that there are no plural forms. For example:

“You posted a photo”, “You posted several photos”

is correct in Turkish, as is:

“You posted 1 photo”, “You posted 6 photo”.

So, dear Turkish translators, please share your opinion on this issue. It will help a lot in resolving the question at Transifex and fixing plural translations in Turkish.
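To make the two candidate rules concrete, here is a minimal sketch of how each one maps a count to a plural form index. The function names are made up for this illustration; only the expression plural=(n>1) comes from the request quoted above.

```python
def plural_index_single(n):
    """Current rule: one form ("other") for every count."""
    return 0


def plural_index_two_forms(n):
    """Requested rule: nplurals=2; plural=(n>1)."""
    return int(n > 1)


# Print the form index each rule selects for a few counts.
for n in (0, 1, 2, 6, 999):
    print(n, plural_index_single(n), plural_index_two_forms(n))
```

Under the requested rule, counts 0 and 1 select form 0 and everything from 2 upwards selects form 1, which matches the "One: 0, 1 Other: 2-999" description in the quote.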

#Transifex now supports comments in Apple .strings i18n files

#Transifex now supports comments in Apple .strings i18n files. Only a /* foo */ style comment on the line immediately preceding a key-value pair in the source file is saved as the comment for that key. The example below illustrates this:

/*Comment for key1*/
"key1" = "value 1";

/* This comment will not be
included in key2*/

/* comment for
key2*/
"key2" = "value 2";

/* this comment will not be included in key3*/

"key3" = "value 3";

Well, I’m pretty sure that the above snippet explains which comments from source Apple .strings file are saved by Transifex. You can see the comment for a source string in its “Details” section in Lotte.
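The rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the parser Transifex uses: ENTRY_RE and parse_strings are hypothetical names, and the sample text is modeled on the snippet above, with the comment for key2 closed on the line right above its pair.

```python
import re

# Assumed rule: an optional /* ... */ comment whose closing */ ends the line
# directly above the pair, followed by "key" = "value";
ENTRY_RE = re.compile(
    r'(?:/\*(?P<comment>(?:(?!\*/).)*)\*/[ \t]*\n)?'
    r'[ \t]*"(?P<key>(?:[^"\\]|\\.)*)"\s*=\s*"(?P<value>(?:[^"\\]|\\.)*)"\s*;',
    re.DOTALL,
)


def parse_strings(text):
    """Map each key to (value, comment); comment is None when not attached."""
    entries = {}
    for m in ENTRY_RE.finditer(text):
        comment = m.group('comment')
        if comment is not None:
            comment = ' '.join(comment.split())  # collapse line breaks
        entries[m.group('key')] = (m.group('value'), comment)
    return entries


sample = '''/*Comment for key1*/
"key1" = "value 1";

/* This comment will not be
included in key2*/

/* comment for
key2*/
"key2" = "value 2";

/* this comment will not be included in key3*/

"key3" = "value 3";
'''
for key, (value, comment) in parse_strings(sample).items():
    print(key, value, comment)
```

A blank line between a comment and the pair breaks the attachment, which is why key3 ends up without a comment in this sketch.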

Comment for a source string imported from a source Apple .strings file