Friday, May 25, 2007

Oracle tip - using CLOB columns

Recently I had to investigate the usability of CLOB (Character Large OBject) columns in Oracle database tables. The requirement was to store relatively large string values in a table column. I discuss my experience with the task here.

When creating a LOB (either CLOB or BLOB) column, Oracle offers two storage options to choose from.

  1. In-line storage
    Data is stored within the table row. For values larger than 3964 bytes (the limit given in the Oracle reference), Oracle stores the data outside the row, in a separate LOB segment (which can be placed in a different tablespace), and keeps a pointer to that location within the row.
    The clause ENABLE STORAGE IN ROW selects this option; it is also the default.

  2. Out-of-line storage
    Data is always stored outside the row, in the LOB segment, and only the location pointer (the LOB locator) is stored in the table row.
    Specifying DISABLE STORAGE IN ROW in the LOB storage clause selects this option.
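
To make the two clauses concrete, here is a small Python sketch that only builds the CREATE TABLE text for each option; nothing is executed against a database. The helper `lob_table_ddl`, the table `notes` and the column `body` are hypothetical names for illustration. The `LOB (...) STORE AS (...)` clause with ENABLE/DISABLE STORAGE IN ROW and CHUNK is standard Oracle DDL.

```python
from typing import Optional


def lob_table_ddl(table: str, lob_column: str, in_row: bool,
                  chunk: Optional[int] = None) -> str:
    """Build a CREATE TABLE statement (hypothetical helper) with one CLOB
    column and an explicit LOB storage clause choosing the storage option."""
    # ENABLE STORAGE IN ROW = option 1 (in-line); DISABLE = option 2 (out-of-line).
    params = "ENABLE STORAGE IN ROW" if in_row else "DISABLE STORAGE IN ROW"
    if chunk is not None:
        # CHUNK is in bytes; Oracle rounds it up to a multiple of the block size.
        params += f" CHUNK {chunk}"
    # No trailing semicolon: database drivers usually expect the bare statement.
    return (
        f"CREATE TABLE {table} (\n"
        f"  id NUMBER PRIMARY KEY,\n"
        f"  {lob_column} CLOB\n"
        f")\n"
        f"LOB ({lob_column}) STORE AS ({params})"
    )


if __name__ == "__main__":
    print(lob_table_ddl("notes", "body", in_row=True))
    print()
    print(lob_table_ddl("notes", "body", in_row=False, chunk=8192))
```

For option 1 the generated statement ends with `LOB (body) STORE AS (ENABLE STORAGE IN ROW)`; omitting the clause entirely gives the same in-line behaviour, since it is the default.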

Both options have pros and cons. I looked at them from different angles to choose the right one for my purpose.

Performance: Option 1 gives better response times when saving and retrieving a row. It is even better when the value is smaller than the in-line limit, because the whole value is then read and written together with the row. So, if the application gives more importance to performance, option 1 is the one to choose.

Storage: The same value occupies a different amount of space under each option. I learned this through experience: option 2 uses more space than option 1 for the same value. So option 2 is only a comfortable choice if the application's database can be allocated plenty of storage space.
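
A likely contributor to this difference is that an out-of-line LOB value is stored in whole chunks, at least one chunk per value, while an in-line value below the limit takes roughly its own size in the row. The Python sketch below is a deliberately simplified model of that idea, not Oracle's real accounting: it assumes an 8 KB chunk and the 3964-byte in-row limit, and it ignores locator, LOB index and row overheads. The function names are hypothetical.

```python
import math

IN_ROW_LIMIT = 3964  # bytes; the in-row limit quoted above
CHUNK = 8192         # bytes; assumed chunk size (one 8 KB block)


def out_of_line_bytes(value_bytes: int, chunk: int = CHUNK) -> int:
    """Modelled space for a value stored out of line: whole chunks, at least one."""
    return max(1, math.ceil(value_bytes / chunk)) * chunk


def in_line_bytes(value_bytes: int, limit: int = IN_ROW_LIMIT,
                  chunk: int = CHUNK) -> int:
    """Modelled space with in-line storage enabled: the value's own size if it
    fits in the row, otherwise the same chunked storage as out-of-line."""
    if value_bytes <= limit:
        return value_bytes
    return out_of_line_bytes(value_bytes, chunk)


if __name__ == "__main__":
    for size in (500, 3900, 5000):
        print(f"{size:>5} bytes: in-line {in_line_bytes(size):>5}, "
              f"out-of-line {out_of_line_bytes(size):>5}")
```

Under these assumptions a 3900-byte value needs 3900 bytes in line but a full 8192-byte chunk out of line, while a 5000-byte value ends up chunked either way.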

There are other storage parameters, such as CHUNK, which affect performance and storage, but I don't want to discuss them here; the Oracle reference covers them well.

In my situation, I had to choose a method where performance mattered most and space usage had to stay minimal. The average size of the sample values, almost 4000 bytes, was also a factor. So I decided to go for option 1, which seemed the most appropriate.

However, I want to share one fact I learned from experience. When VARCHAR2 and in-line CLOB columns are used to store the same value in a table, the latter occupies much more physical storage space than the former (as much as double). We shouldn't choose CLOB just because it offers more room. In conclusion, I would say VARCHAR2 should be considered from every angle before thinking about CLOBs.

Thursday, May 24, 2007

Dinner with Henrik

Our Service Manager, Henrik Celinder, has been here with us for more than a week. Following IFS culture, we went on a team outing, a dinner on the 22nd of May. It was another fun team event that we enjoyed. This time the restaurant was 'Cafe Asiana' on Hill Street, Dehiwala. Even though the food and the service were not top-notch, we had a good time with Henrik.

I have shared some photos on my photo site.

We discussed a lot of things, including handball, a game especially popular in Europe, and different cuisines such as Greek and Lebanese.

Thursday, May 17, 2007

Cost effective localization

In today's world of software applications, localization plays a vital role. Many websites localize their content to reach their target audiences better. Our global company website, www.ifsworld.com, for example, maintains separate sites for many countries. All the sites share a common design skeleton, but their content is localized into the local language to suit the people of each country. The same goes for stand-alone applications delivered to different countries: they are localized too.

Translation is one of the major aspects of localization. A user interacts better with a software application that feels native, that is, one that communicates with the user in his or her mother tongue and in familiar ways. To make end users happier, developers put more effort into the design and implementation of the software. Many things can hinder this; cost is the worst and most annoying constraint, and it can affect software quality.

With respect to localization, the cost of translation can significantly affect the price of the delivered software. To be cost-effective, a company should adopt proper methodologies to reduce the money spent on translation. Reducing the level of localization is not a good idea. It is possible, however, to minimize the number of words to be translated in the application (translators charge per word), as long as this does not lower the level of localization.

During the development of a software application, the attributes that need to be translated are first collected. They are then analyzed for duplicates to get the distinct values. Finally, translators are consulted to localize each of these attributes. These are the typical tasks involved in localizing an application through translation. However, this method only works when the collection of attributes is small and there is a way to find the duplicates.
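
For a small system, the collect-and-deduplicate step can be as simple as the Python sketch below. Treating texts that differ only in letter case or spacing as duplicates is an assumption made for illustration; a real project would define its own matching rule. `distinct_attributes` is a hypothetical name.

```python
def distinct_attributes(attributes: list[str]) -> list[str]:
    """Return distinct attribute texts in first-seen order.

    Assumed rule: texts that differ only in letter case or whitespace
    count as duplicates.
    """
    seen: set[str] = set()
    distinct: list[str] = []
    for text in attributes:
        cleaned = " ".join(text.split())  # collapse and trim whitespace
        key = cleaned.casefold()          # ignore letter case when comparing
        if key not in seen:
            seen.add(key)
            distinct.append(cleaned)
    return distinct


if __name__ == "__main__":
    collected = ["Customer", "customer ", "Order No", "Customer", "Order  No"]
    print(distinct_attributes(collected))  # ['Customer', 'Order No']
```

Only the distinct texts would then go to the translators. This works as long as all the attributes can be gathered and compared in one place.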

But in an enterprise system, a huge number of attributes are involved. Enterprise systems are generally component-based, and each component is independent of the others, so duplicates cannot be identified the way they are in smaller systems. This requires properly organized data and a separate, cost-effective system to handle the translations.

Such a system is built around the notion of a 'term', defined as a single translatable item in the system. A term can have several forms of translation, such as short, long or full, and each form has a translation in every language used in the system. The system is then built in four steps:

  1. A scanning system builds a repository of all the attributes.
  2. A term base is organized with terms derived from the attributes. This is actually a generalization process: attributes are generalized to create terms.
  3. Each and every attribute in the repository is bound to a suitable term in the term base. The binding also determines which form of the term's translation the attribute requires. The end result is many attributes pointing to one term, which is the whole purpose.
  4. All the bound terms are sent to the translators.

When this is done in a typical enterprise system, the ratio of attributes to terms becomes around 5:1, so only about one item in five has to be translated. This cuts translation cost in the system by roughly 80% or more.
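
The term-base idea can be sketched in Python as follows: each attribute is bound to a (term, form) pair, translations are stored once per term, form and language, and the attribute-to-term ratio shows how much translation work is avoided. The data model, attribute names, term IDs and sample Swedish texts are all hypothetical; this is an illustration of the idea, not IFS's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A single translatable item; translations keyed by (form, language)."""
    term_id: str
    translations: dict[tuple[str, str], str] = field(default_factory=dict)


# attribute -> (term_id, form); many attributes may point to the same term
Bindings = dict[str, tuple[str, str]]


def translate(attribute: str, language: str, bindings: Bindings,
              term_base: dict[str, Term]) -> str:
    """Follow an attribute's binding to the translated text of its term form."""
    term_id, form = bindings[attribute]
    return term_base[term_id].translations[(form, language)]


def reuse_stats(bindings: Bindings) -> tuple[float, float]:
    """Return (attributes per term, fraction of translation items avoided)."""
    n_attributes = len(bindings)
    n_terms = len({term_id for term_id, _form in bindings.values()})
    return n_attributes / n_terms, 1 - n_terms / n_attributes


if __name__ == "__main__":
    term_base = {
        "CUSTOMER_NO": Term("CUSTOMER_NO", {("short", "sv"): "Kundnr",
                                            ("long", "sv"): "Kundnummer"}),
        "ORDER_DATE": Term("ORDER_DATE", {("long", "sv"): "Orderdatum"}),
    }
    entities = ["CustomerOrder", "Invoice", "Payment", "Quotation", "Shipment"]
    bindings: Bindings = {f"{e}.customer_no": ("CUSTOMER_NO", "short")
                          for e in entities}
    bindings.update({f"{e}.order_date": ("ORDER_DATE", "long")
                     for e in entities})

    print(translate("Invoice.customer_no", "sv", bindings, term_base))  # Kundnr
    ratio, saved = reuse_stats(bindings)
    # prints: 5.0 attributes per term, 80% fewer items to translate
    print(f"{ratio:.1f} attributes per term, {saved:.0%} fewer items to translate")
```

Ten attributes bound to two terms give the 5:1 ratio, and one minus one fifth is the 80% saving.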

So localization should be not only effective but also cost-effective for the product to survive. That's why cost-effective localization has become an integral part of enterprise systems nowadays.

Thursday, May 10, 2007

Web 2.0 awards - 2007

seomoz.org says it has reviewed and ranked hundreds of Web 2.0 sites and awarded the best of them in 41 categories, with first, second and third places in each category. It also hosts interviews with the founders of some of the winners.

Digg (Social News), LinkedIn (Professional Networking), del.icio.us (Social Tagging) and Technorati (Blog Guides) are among the winners.

Check out the list:
SEOmoz's Web 2.0 Awards