
Key: KTS-560
Type: Bug
Status: Closed
Resolution: Won't Fix
Priority: Priority Five: Fluff
Assignee: Bryn Divey
Reporter: Jose Nevado
Votes: 0
Watchers: 0
KnowledgeTree Community Edition

Accentuated characters do not show correctly

Created: 03/Mar/06 10:24 AM   Updated: 13/Jul/06 04:16 PM
Component/s: User interface: Layout, CSS, icon themes
Affects Version/s: 3.0
Fix Version/s: 3.1

Original Estimate: Unknown Remaining Estimate: Unknown Time Spent: Unknown
File Attachments: None
Image Attachments:

1. browse1.jpg
(37 kb)

2. browse2.jpg
(27 kb)

3. workflow1.jpg
(92 kb)

4. workflow2.jpg
(37 kb)

5. workflow3.jpg
(57 kb)
Environment: Windows 2000 SP4, PHP 4.4.0, Apache 2.0.54, MySQL 4.1.7


Description
Up to 3.0RC1, words containing accented characters (for instance, the names of the users) were displayed correctly on the screen. Now they are not.

Daniel Chalef - [09/Mar/06 08:36 AM ]
Could you please provide us with some screenshots of where this happens. Please ensure that you include the version number at the bottom of the screen.

Please provide us with the information from the "Admin>Miscellaneous>Server Information" section.

Jose Nevado - [10/Mar/06 09:03 AM ]
I'd like to attach screenshots showing the old situation, but I'm afraid I can't since I "renamed" all the usernames including accents.

However, there were other places where the problem appeared; for instance, folders with accents showing correctly in kt 2.0.2 appeared as garbled in 3.0. I've also "corrected" this, so no screenshots can be attached.

I said "corrected" because now I'm experiencing the opposite situation.

In 2.0.2:

Accentuated folders/files and usernames were shown correctly both in the KT and Windows environments (for instance, in emails).

In 3.0

Accentuated folders/files and usernames were shown correctly in the windows environment but incorrectly in KT.

Then, still in 3.0

I renamed (within KT) both usernames and folders (not documents!) using accents, so, now:

Accentuated folders and usernames are shown correctly in KT but incorrectly in the windows environment!!!

That is, the situation is the opposite to the first one I described. Now KT looks pretty, but windows shows garbage.

This problem also appears in workflows. States or transitions with accentuated characters show the same symptoms. For instance, I have created a workflow with accentuated state names. Within KT, everything is OK. But when a notification email is sent, the message contains garbage in every place where an accent has been used.

To show this I've included 5 images.

browse1.jpg + browse2.jpg - Depict a folder with accentuated characters in both the KT and Windows environments.
workflow1.jpg + workflow2.jpg + workflow3.jpg - Show a workflow definition (including accentuated states), a document (whose title is also accentuated) and an email notification.

To find the "errors" look for the red circles in the images.

Thanks.

Brad Shuttleworth - [10/Mar/06 09:05 AM ]
The incorrect encoding in email was fixed as part of KTS-592.

I'm not entirely sure what's wrong with the other files - the storage on the filesystem isn't shown to the end user, so that shouldn't be a problem. As far as I can tell the characters _are_ showing with accents on the actual web-pages, but I may be wrong. What is the difference between these page-views, and the ones shown pre-3.0.0?

Jose Nevado - [10/Mar/06 09:07 AM ]
Ooops!

I forgot to include the server information. I couldn't find the page you told me to check, but the closest one was:

Administration » Miscellaneous » Support and System information

So here's the info:

PHP Version: 4.4.0

System: Windows NT JUVENTUD 5.0 build 2195
Build Date: Jul 11 2005 16:08:47
Server API: Apache 2.0 Handler
Virtual Directory Support: enabled
Configuration File (php.ini) Path: C:\php4\php.ini
PHP API: 20020918
PHP Extension: 20020429
Zend Extension: 20050606
Debug Build: no
Zend Memory Manager: enabled
Thread Safety: enabled
Registered PHP Streams: php, http, ftp, compress.zlib

This program makes use of the Zend Scripting Language Engine:
Zend Engine v1.3.0, Copyright (c) 1998-2004 Zend Technologies

Jose Nevado - [10/Mar/06 09:22 AM ]
Brad,

thanks for the quick answer. Thanks also for the comment about the patch concerning emails.

The point about accentuated documents/folders/usernames is migration.

I'm testing the 3.0 release on a tiny subset of our actual dms, which is still running the 2.0.2 version.

This means that renaming some user names and folders was an easy task; I did it myself in a few minutes. Renaming just three or four folders and a couple of user names was enough to check what happened.

But if I migrate the WHOLE dms, then 3.0 will show (in the web pages) dozens of folders with mangled names. The same will happen with the 30 or so usernames. Of course, renaming all these folders and usernames to make 3.0 show the correct accentuation will be a rather big effort.

I insist: when using 2.0.2 I could use accentuated characters everywhere and the names were the same in both Windows and KT. With 3.0, something has happened and, if I don't rename everything, KT shows mangled names.

That's the point! Of course, if I were starting to use KT NOW from scratch I wouldn't be worried about the Windows representation of folder / document names. It wouldn't matter how they were named there, provided that KT showed them correctly...

Thanks again.

Neil Blakey-Milner - [10/Mar/06 10:00 AM ]
Hi Jose,

Okay, I understand the problem now. In KT2, we left it up to PHP to set the character set, which means that most people probably ended up entering things in ISO8859-1.

Now that we've changed everything to UTF-8, this sort of thing will be a problem. Strings that were written to the database in ISO8859-1 and are then treated byte-for-byte as UTF-8 will not display correctly. This is also why the folders and documents have different names on the filesystem now.
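
A minimal Python sketch of the two symptoms reported in this thread, assuming the 2.0.x data was stored as ISO-8859-1 and that Windows reads file names as windows-1252. The sample name and the variable names are hypothetical and only serve as illustration; this is not KnowledgeTree code.

```python
# Hypothetical sample name containing one accented character.
name = "Administración"

# Case 1: a name stored by 2.0.x as ISO-8859-1 bytes, then treated as
# UTF-8 by 3.0. The lone 0xF3 byte is not valid UTF-8, so the accented
# letter is lost and a replacement character is shown instead.
latin1_bytes = name.encode("iso-8859-1")
shown_by_kt3 = latin1_bytes.decode("utf-8", errors="replace")

# Case 2: a name entered in 3.0 is stored as UTF-8 bytes. A program that
# assumes windows-1252 (assumed here for the Windows side) shows each of
# the two UTF-8 bytes of the accented letter as a separate character.
utf8_bytes = name.encode("utf-8")
shown_by_windows = utf8_bytes.decode("cp1252")

print(shown_by_kt3)
print(shown_by_windows)
```

Both outputs are garbled in different ways, which matches the "opposite situation" described earlier in the thread.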

We're not going to try to convert UTF-8 to a specific character set to store things on the filesystem - for one thing it will break all current KT 3.0.0 installs.

Basically, what's needed is a script to:

a) Read in every string from the database, do a character set conversion to UTF-8, and write it back to the database. I imagine we'll focus on document names, folder names, user and group names, and workflow names.
b) Read every file and folder name on the file system, do a character set conversion to UTF-8, and rename the files and folders on the filesystem.
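
The two steps above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the KnowledgeTree upgrade script: the legacy character set is assumed to be windows-1252 (it may differ per installation), and the names `LEGACY_CHARSET`, `to_utf8` and `rename_tree` are hypothetical. The database step is not shown; it would apply `to_utf8` to each affected column value.

```python
import os

# Assumption: the 2.0.x data is windows-1252. Adjust per installation.
LEGACY_CHARSET = "cp1252"


def to_utf8(raw: bytes, legacy: str = LEGACY_CHARSET) -> bytes:
    """Return raw re-encoded as UTF-8, leaving valid UTF-8 untouched."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (or plain ASCII)
    except UnicodeDecodeError:
        # Bytes undefined in the legacy charset become U+FFFD.
        return raw.decode(legacy, errors="replace").encode("utf-8")


def rename_tree(root: bytes, legacy: str = LEGACY_CHARSET, dry_run: bool = True):
    """Collect (and optionally perform) renames of legacy-encoded names.

    root must be a bytes path so that os.walk yields raw byte names.
    Walking bottom-up means children are handled before their parent
    directory is renamed.
    """
    renames = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            fixed = to_utf8(name, legacy)
            if fixed != name:
                src = os.path.join(dirpath, name)
                dst = os.path.join(dirpath, fixed)
                renames.append((src, dst))
                if not dry_run:
                    os.rename(src, dst)
    return renames
```

The "try UTF-8 first" check is a heuristic: a legacy string whose bytes happen to form valid UTF-8 would be left unchanged, so a dry run and a manual review of the rename list seem advisable before changing anything.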

In terms of files and folders, it may be better to convert to the new storage provider that doesn't emulate folder and file names onto the filesystem (to avoid problems encoding the file and folder name to the host operating system's filesystem).

To write and test the script, you will probably need to send us an example KnowledgeTree instance (DB and documents archive) generated from 2.0.2 that exhibits these problems. We can then run the upgrade and then get the script to fix the problem.

You can send that to me directly on nbm@ktdms.com if you want to keep the contents relatively private.

Regards,

Neil

Jose Nevado - [10/Mar/06 11:06 AM ]
Neil,

thanks a lot for your interest.

I will prepare what you are asking for. It will consist of:

- A folder hierarchy containing a few folders and documents, some of them named using accentuated characters. I'll zip this to reduce the amount of data to send to your email account.
- A dump of the MySQL database in sql format.

Let me know if you need anything else.

However, it will take some time (right now my boss is pressing me about other issues :-( ). I expect to have the data ready by the middle of next week.

And, yes, I'll use your email account to send you the information.

Thx.

Jose Nevado - [13/Mar/06 12:44 PM ]
Neil,

I've just sent you an email with the data you requested.

Hope it's enough. Otherwise, let me know.

Thanks for your interest.

Jose.

Brad Shuttleworth - [27/Mar/06 01:12 PM ]
Hi Jose,

We're still looking at this, but it's a rather complicated upgrade/translation process.

Jose Nevado - [28/Mar/06 08:28 AM ]
Thanks a lot, Brad. I'll wait.

Jose.

Zakariah - [24/Apr/06 01:19 AM ]
As already noted, the correct fix is to convert things in the database to UTF-8.
However, I found a quick fix that is adequate for my purposes, until an upgrade comes.
I simply changed the code to declare that the data being downloaded is in the charset "windows-1252", which is the registered name of the charset that MySQL calls "latin1" and uses if the charset has not been otherwise declared. If some of your users do not use clients supporting windows-1252, iso-8859-1 is similar enough to be helpful, although not identical.
The place to change the charset declaration for document titles is in /lib/templating/kt3template.inc.php, where it has

header('Content-type: text/html; charset=UTF-8');

Just change UTF-8 to windows-1252.
This fixed the problem for accented characters in document titles. I have not dealt with such characters in user names or file names.
I look forward to a better fix when it comes.
--Zakariah
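
For readers wondering how close "similar enough" is, here is a small Python check of where windows-1252 and ISO-8859-1 differ. It relies only on Python's built-in codecs; the names `accented`, `decode_byte` and `differing` are hypothetical and chosen for this illustration.

```python
# Common Western European accented letters encode to the same bytes
# in both character sets.
accented = "áéíóúñüÁÉÍÓÚÑÜ"
same_for_accents = accented.encode("cp1252") == accented.encode("iso-8859-1")


def decode_byte(value: int, charset: str) -> str:
    """Decode a single byte, mapping undefined bytes to U+FFFD."""
    return bytes([value]).decode(charset, errors="replace")


# Collect every byte value that the two character sets decode differently.
differing = [
    b for b in range(256)
    if decode_byte(b, "cp1252") != decode_byte(b, "iso-8859-1")
]

# One example from the differing range: the euro sign in windows-1252.
euro = b"\x80".decode("cp1252")

print(same_for_accents)
print([hex(b) for b in differing])
```

The differences are confined to the byte range 0x80 to 0x9F, where windows-1252 places characters such as the euro sign and curly quotes while ISO-8859-1 has control codes, so plain accented letters are unaffected by the choice between the two.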

Brad Shuttleworth - [13/Jul/06 04:07 PM ]
3.1 has had a massive overhaul of UTF-8 and foreign character handling. This should be fixed in the upcoming release.

Neil Blakey-Milner - [13/Jul/06 04:16 PM ]
The real problem here is the 2.0.8 to 3.0.0 upgrade. Unfortunately we don't have the capacity to convert from the potentially unknown character set in 2.0.x to the UTF-8 character set stored in 3.0 and above.