Unicode
From Linuxintro
See also
- Unicode character table: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256&utf8=0x&unicodeinhtml=hex&htmlent=1
- Joel on Unicode: http://www.joelonsoftware.com/articles/Unicode.html
Latest revision as of 20:00, 1 January 2012
Understanding
Every text file has an encoding. To display the file correctly, you must know whether each character is stored in one byte, in two bytes, or in a varying number of bytes. Unicode aims to assign a number (a code point) to every character of every writing system in the world.
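To illustrate why the encoding must be known, here is a minimal Python sketch (Python is not part of the original page, and the variable names are only illustrative). It interprets the same two bytes under two different encodings and gets two different results:

```python
# The same two bytes, interpreted under two different encodings.
raw = b"\xc3\xb6"

as_utf8 = raw.decode("utf-8")         # one character: ö (U+00F6)
as_latin1 = raw.decode("iso-8859-1")  # two characters: Ã (U+00C3) and ¶ (U+00B6)

# Number of characters each interpretation yields
print(len(as_utf8), len(as_latin1))   # prints: 1 2
```

The bytes themselves carry no hint about which reading is the right one; that information has to come from outside the file.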
Here is some practice: Store a file containing
hellö world
in file.txt. Do:
tweedleburg:~ # cat >file.txt
hellö world
tweedleburg:~ # cat file.txt
hellö world
tweedleburg:~ # hexdump -C file.txt
00000000  68 65 6c 6c c3 b6 20 77  6f 72 6c 64 0a           |hell.. world.|
0000000d
This means every "normal" (ASCII) character has been stored in 1 byte, while the umlaut ö has been stored in 2 bytes (c3 b6). That is Unicode's UTF-8 encoding, in which the number of bytes per character varies.
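The byte counts from the hexdump can be reproduced with a short Python sketch (an illustration only, not part of the original page; the variable names are arbitrary):

```python
# "hellö world" plus the trailing newline that cat stored in the file
text = "hell\u00f6 world\n"
data = text.encode("utf-8")

print(len(text))                          # characters: 12
print(len(data))                          # bytes: 13 (0x0d, as in the hexdump)
print(" ".join(f"{b:02x}" for b in data)) # 68 65 6c 6c c3 b6 20 77 6f 72 6c 64 0a

# Bytes needed for an ASCII letter and for the umlaut
for ch in "h\u00f6":
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")))  # U+0068 1, then U+00F6 2
```

The file has one byte more than it has characters because ö is the only character that needs two bytes.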
Doing
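Convert the names of files to UTF-8 (convmv renames files and leaves their contents untouched):
convmv -f iso-8859-1 -t utf8 -r --notest <file>
Convert the contents of a file to UTF-8 in place:
recode latin1..u8 <file>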
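As an illustration of what converting a file's contents involves, here is a minimal Python sketch. The helper name `convert_to_utf8` and its parameters are hypothetical, and the source encoding is assumed to be ISO-8859-1. It shows the principle only and does not describe how recode works internally:

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="iso-8859-1"):
    """Rewrite the file at `path` in place as UTF-8 (hypothetical helper)."""
    p = Path(path)
    # Decode the raw bytes using the encoding the file is assumed to be in
    text = p.read_bytes().decode(source_encoding)
    # Write the same characters back, this time encoded as UTF-8
    p.write_bytes(text.encode("utf-8"))
```

Calling `convert_to_utf8("file.txt")` would turn a Latin-1 byte f6 (ö) into the UTF-8 bytes c3 b6. Run it only once per file: ISO-8859-1 accepts any byte sequence, so a second run would not fail but would silently double-encode the text.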
Unicode text editor:
- yudit
Configuration
For php: set the key default_charset in /etc/php5/apache2/php.ini.
For squirrelmail: set default_charset to UTF8 in both config.php and config_default.php.