Bite Size Standards offers concise web development tutorials, tips, and tricks written by designers and developers who are passionate about web standards.
A way of converting sequences of ones and zeros to characters, or the other way around, is called a character encoding. The files sent on the Web are still just sequences of ones and zeros, but the Web browser needs to know what characters those ones and zeros represent. So it needs to know which encoding to use.
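Here is a small Python sketch of that idea (Python is only the example language here). The same text becomes different byte sequences under two different encodings. Decoding bytes with the wrong encoding usually does not raise an error. It silently produces the wrong characters, which is why the browser has to be told which encoding applies.

```python
# The same characters map to different byte sequences under different encodings.
text = "café"

utf8_bytes = text.encode("utf-8")         # 'é' becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # 'é' becomes one byte: 0xE9

print(utf8_bytes.hex(" "))    # 63 61 66 c3 a9
print(latin1_bytes.hex(" "))  # 63 61 66 e9

# Reading UTF-8 bytes as ISO-8859-1 does not fail; it yields the wrong characters.
print(utf8_bytes.decode("iso-8859-1"))  # cafÃ©
# Reading them with the encoding they were written in restores the text.
print(utf8_bytes.decode("utf-8"))       # café
```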
The easiest method is to use the <meta> tag. To specify the document as UTF-8 (remember, you have to actually save your document as UTF-8 for this to work), use the following code:
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
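The <meta> declaration only helps if the file on disk really is UTF-8. The following minimal Python sketch encodes a page to UTF-8 explicitly before writing it, so the bytes match the declaration. The page content, the file name index.html and the helper save_as_utf8 are hypothetical. Writing bytes directly avoids depending on the platform's default text encoding, which is not always UTF-8.

```python
import tempfile
from pathlib import Path

# Hypothetical page; its <meta> uses the full "text/html; charset=utf-8" value.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>hello world</title>
</head>
<body><p>hello world, здравствуй, мир!</p></body>
</html>
"""

def save_as_utf8(path: Path, page: str) -> bytes:
    """Hypothetical helper: encode the page as UTF-8 and write the raw bytes."""
    data = page.encode("utf-8")
    path.write_bytes(data)
    return data

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "index.html"  # hypothetical file name
    written = save_as_utf8(target, PAGE)
    # The bytes on disk decode cleanly as UTF-8, so the declaration is truthful.
    round_trip = target.read_bytes().decode("utf-8")

# The Cyrillic letters take two bytes each, so there are more bytes than characters.
print(len(PAGE), "characters ->", len(written), "bytes")
```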
UTF-8 (8-bit Unicode Transformation Format) is a variable-length character encoding for Unicode created by Ken Thompson and Rob Pike. It can represent every character in the Unicode standard, yet it is backward compatible with ASCII. For this reason, it is steadily becoming the preferred encoding for email, web pages, and other places where characters are stored or streamed. In short, it supports far more characters than single-byte encodings such as ISO-8859-1. This enables you to use Japanese (???????), Russian (????????????! ???) and English (hello world) in the same document.
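You can check the variable-length and ASCII-compatible properties described above directly. This Python sketch prints how many bytes UTF-8 uses for a few sample characters. It confirms that plain ASCII text is byte-for-byte identical in ASCII and UTF-8. It also shows that a mixed English/Russian/Japanese string, which fits easily in UTF-8, cannot be encoded in ISO-8859-1. The sample strings are arbitrary choices for illustration.

```python
# Bytes needed by UTF-8 for a few sample characters (1 to 4 bytes each).
samples = ["A", "é", "з", "世", "😀"]
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch}: {len(encoded)} byte(s): {encoded.hex(' ')}")

# ASCII compatibility: plain ASCII text produces identical bytes in both encodings.
ascii_text = "hello world"
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True

# One UTF-8 string can mix English, Russian and Japanese ("世界" means "world").
mixed = "hello world / здравствуй, мир! / 世界"
utf8_mixed = mixed.encode("utf-8")

# A single-byte code page such as ISO-8859-1 cannot hold the same text.
try:
    mixed.encode("iso-8859-1")
except UnicodeEncodeError as err:
    print("ISO-8859-1 cannot encode", repr(err.object[err.start]))
```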
James AkaXakA (http://akaxaka.gameover.com/), 14 April 2006, 10:11
It’s good to know that instead of a meta-tag, you can also simply have the server serve your HTML as UTF-8.
That’s what we do at BiteSizeStandards at least ;)
Hayo Bethlehem (http://hayobethlehem.nl), 14 April 2006, 10:14
Yup, which, in fact, is a much more stable way of doing it. Maybe you should write a bite about it? :)
Henrik Lied (http://fourmargins.com), 17 April 2006, 14:18
Character encoding should be specified on the server side, before the page hits the web browser. :)
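The server-side approach suggested in these comments can be sketched with Python's standard http.server module. This is a minimal illustration under stated assumptions, not how any particular site is configured. PAGE, UTF8Handler, make_server and the local address are hypothetical. The key line is the Content-Type header with a charset parameter. The browser receives it before it reads any of the document itself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = "<p>hello world, здравствуй, мир!</p>"  # hypothetical page body

class UTF8Handler(BaseHTTPRequestHandler):
    """Serves PAGE encoded as UTF-8 and declares that in the HTTP headers."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        # The charset parameter tells the browser how to decode the body.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def make_server(port=8000):
    # Hypothetical local address; port 0 asks the OS for any free port.
    return HTTPServer(("127.0.0.1", port), UTF8Handler)
```

Calling make_server().serve_forever() and visiting http://127.0.0.1:8000/ should show the page. On a production server such as Apache, you would normally set the same header through configuration instead, for example with the AddDefaultCharset directive.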
Robert Wellock (http://www.xhtmlcoder.com/beck/), 19 April 2006, 03:14
For XHTML documents it is preferable to use the server's HTTP Content-Type header or the encoding attribute of the XML declaration. The reason is obvious: by the time the parser reaches the <meta> element it is really too late, unless of course the document is only being served as text/html.
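The XML declaration's encoding attribute can be illustrated with Python's standard xml.etree.ElementTree parser. This is an XML parser, not a browser, so it only sketches the principle. The same paragraph is saved once as UTF-8 and once as ISO-8859-1, each with a matching declaration. Because the parser reads the declaration before any element, both byte sequences decode to the same text. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same content saved in two encodings, each with a matching XML declaration.
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'.encode("utf-8")
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><p>café</p>'.encode("iso-8859-1")

# Given raw bytes, the parser uses the declared encoding to decode the document.
for raw in (utf8_doc, latin1_doc):
    print(ET.fromstring(raw).text)  # café
```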
RQ (http://rq.online.lt/), 20 April 2006, 02:19
Just a small correction: “????????????! ???” should actually say “????????????, ???!”. :)
Evgeniy, 20 April 2006, 02:33
Though I’m not sure about the Japanese characters, you have a typo in the Russian version. Here’s the right one (in UTF-8 encoding):
D0B7 D0B4 D180 D0B0 D0B2 D181 D182 D0B2 D183 D0B9 2C 20 D0BC D0B8 D180 21
This naturally reads as “здравствуй, мир!”. Notice the typo: the “?” letter instead of the right one, “?”. The punctuation has also been changed so that it reads correctly.
Found your site’s link at WaSP and really like it, thanks for your bites! :)
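Evgeniy's hex dump can be checked with a few lines of Python. The sketch turns the quoted hex digits back into bytes and decodes them as UTF-8. Each Cyrillic letter takes two bytes, while the comma, space and exclamation mark take one byte each, as in ASCII.

```python
# Hex dump of the UTF-8 bytes quoted in the comment above.
hex_dump = "D0B7 D0B4 D180 D0B0 D0B2 D181 D182 D0B2 D183 D0B9 2C 20 D0BC D0B8 D180 21"

raw = bytes.fromhex(hex_dump)  # spaces between byte pairs are ignored
text = raw.decode("utf-8")

print(len(raw), len(text))  # 29 16
print(text)                 # здравствуй, мир!
```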