TechTalkz.com

Character Set Question

PHP


29-11-2007, 01:30 AM   #1
Zach
Character Set Question

Adrian Nievergelt wrote:

"...The only problem with UTF-8 is that some operating systems .... have
no or hardly sufficient support. About every modern system can handle
unicode though..."

Question:

is iso-8859-1 unicode?
is utf-8 unicode?

What is unicode?

Zach.
29-11-2007, 02:33 AM   #2
Michael Fesser
Re: Character Set Question

.oO(Zach)

>Adrian Nievergelt wrote:
>
>"...The only problem with UTF-8 is that some operating systems .... have
>no or hardly sufficient support. About every modern system can handle
>unicode though..."
>
>Question:
>
>is iso-8859-1 unicode?


No.
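To see why ISO-8859-1 isn't Unicode, here is a small Python sketch (Python is used only for illustration; the helper name `latin1_bytes` and the sample strings are made up). ISO-8859-1 is a single-byte charset with room for just 256 characters. Those happen to be the first 256 Unicode code points, but anything beyond them cannot be encoded at all.

```python
# Hypothetical helper for illustration: encode text as ISO-8859-1 (Latin-1).
def latin1_bytes(text):
    """Return the ISO-8859-1 bytes for text, or None if it can't be encoded."""
    try:
        return text.encode("iso-8859-1")
    except UnicodeEncodeError:
        # The text contains a character outside ISO-8859-1's 256 characters.
        return None

print(latin1_bytes("café"))  # b'caf\xe9': é fits in a single byte (0xE9)
print(latin1_bytes("10 €"))  # None: the euro sign is not in ISO-8859-1
print(latin1_bytes("日本"))  # None: neither are CJK characters

# Each of the 256 byte values decodes to the code point with the same number.
assert all(ord(bytes([b]).decode("iso-8859-1")) == b for b in range(256))
```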

>is utf-8 unicode?


No. But UTF-8 is an encoding for Unicode, in which each character is
encoded as a sequence of 1 to 4 bytes.
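A quick way to see the "1 to 4 bytes" in practice is to encode a few characters in Python (an illustrative sketch; the sample characters and the helper `utf8_lengths` are chosen just for this example, and `bytes.hex(' ')` needs Python 3.8 or later).

```python
# Sample characters, one for each UTF-8 length from 1 to 4 bytes.
samples = ["A", "é", "€", "日", "𝄞"]

def utf8_lengths(chars):
    """Return how many bytes UTF-8 needs for each character."""
    return [len(ch.encode("utf-8")) for ch in chars]

for ch in samples:
    data = ch.encode("utf-8")
    # Show the code point, the byte count and the actual bytes in hex.
    print(f"U+{ord(ch):04X}  {len(data)} byte(s)  {data.hex(' ')}")

print(utf8_lengths(samples))  # [1, 2, 3, 3, 4]
```

ASCII letters take one byte, accented Latin letters two, the euro sign and most CJK characters three, and characters beyond U+FFFF (such as the musical G clef, U+1D11E) four.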

>What is unicode?




Micha
29-11-2007, 04:36 AM   #3
Zach
Re: Character Set Question

"UTF-8 is not catered for properly by 'some operating systems'"
"Every system can handle Unicode"
"ISO-8859-1 isn't Unicode"
"UTF-8 isn't Unicode"
"UTF-8 is an encoding for Unicode"
+ ---------------------------------
Add this together and the outcome is
.oO(Mich)

Zach.


29-11-2007, 11:30 AM   #4
Michael Fesser
Re: Character Set Question

.oO(Zach)

> "UTF-8 is not catered for properly by 'some operating systems'"
> "Every system can handle Unicode"
> "ISO-8859-1 isn't Unicode"
> "UTF-8 isn't Unicode"
> "UTF-8 is an encoding for Unicode"
> + ---------------------------------
> Add this together and the outcome is


Is what?

It's really not that complicated. Frankly, I don't care about systems
that can't handle Unicode; even the old NN4 (Netscape Navigator 4) can
handle most of it. So I use it in all of my recent web projects without
exception: from the database to my scripts to the final HTML pages, it's
all UTF-8. That really makes things much easier (for example, no more
ugly HTML character references, except for a few special characters
like < and &).
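The point about character references can be illustrated with Python's standard `html` module (a sketch only; Micha's projects were in PHP, and the sample text is made up). With a UTF-8 page, only the characters that are special in HTML need escaping. If the same text has to be squeezed into a charset that lacks some of its characters (plain ASCII here), each of those turns into a numeric character reference.

```python
import html

text = "Café: 10 € & <b>more</b>"  # made-up sample text

# UTF-8 page: escape only HTML's special characters (&, <, >);
# é and € can stay in the page as they are.
utf8_page = html.escape(text, quote=False)
print(utf8_page)   # Café: 10 € &amp; &lt;b&gt;more&lt;/b&gt;

# ASCII-only page: every non-ASCII character becomes a numeric reference.
ascii_page = utf8_page.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_page)  # Caf&#233;: 10 &#8364; &amp; &lt;b&gt;more&lt;/b&gt;
```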

A few words on the last two points from the list above: simply put,
Unicode itself just assigns a number (a code point) to every character
that's part of the standard. So far nearly 100,000(!) characters have
been assigned, and the code space has room for more than a million
(1,114,112 code points, U+0000 to U+10FFFF). But of course you then need
a way to transfer all these different numbers/code points to a client
(a browser, for example) efficiently.
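The "character = number" idea maps directly onto Python's `ord()` and `chr()` (an illustrative sketch; `CODE_SPACE_SIZE` is a name made up for this example, and `sys.maxunicode` equals 0x10FFFF on Python 3.3 and later).

```python
import sys

# Unicode assigns every character a number, its code point.
for ch in ["A", "é", "€"]:
    print(ch, f"U+{ord(ch):04X}", ord(ch))  # e.g. € U+20AC 8364

# chr() maps a code point back to its character.
assert chr(0x20AC) == "€"

# The code space runs from U+0000 to U+10FFFF.
CODE_SPACE_SIZE = sys.maxunicode + 1
print(CODE_SPACE_SIZE)  # 1114112

# Numbers beyond U+10FFFF are not valid code points.
try:
    chr(0x110000)
except ValueError as err:
    print("rejected:", err)
```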

That's where the different encodings come into play. UTF-32, for
example, uses 32 bits (4 bytes) for every character. This has the
advantage that every character in a string has the same size, but of
course it wastes a lot of memory. UTF-8, by contrast, uses a variable
length per character. The entire ASCII character set (U+0000 to U+007F)
is encoded in a single byte; all other characters require two to four
bytes. It can still represent every character in the Unicode code space.
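The size trade-off between UTF-32 and UTF-8 is easy to measure in Python (a sketch with made-up sample strings; the `-le` codec variant is used so that Python does not prepend a 4-byte byte order mark to the UTF-32 output).

```python
def encoded_sizes(text):
    """Byte sizes of text in fixed-width UTF-32 and variable-width UTF-8."""
    return {
        "code points": len(text),
        # "utf-32-le" writes exactly 4 bytes per code point, with no BOM.
        "UTF-32": len(text.encode("utf-32-le")),
        "UTF-8": len(text.encode("utf-8")),
    }

for sample in ["Hello, world", "Grüße", "日本語"]:
    print(sample, encoded_sizes(sample))
```

Plain ASCII text is four times larger in UTF-32 (48 vs. 12 bytes for "Hello, world"); "Grüße" needs 20 vs. 7 bytes; even for CJK text, UTF-8 still uses 3 bytes per character against UTF-32's 4 (9 vs. 12 bytes for "日本語").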

So Unicode itself is one thing; the encoding used to transfer it is another.
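The separation between a character's code point and its encoded bytes can be shown with a single character in Python (an illustrative sketch; cp1252 is picked here merely as an example of a wrong charset a client might assume).

```python
text = "€"  # a single character

# The code point belongs to the character itself (that's Unicode) ...
assert ord(text) == 0x20AC

# ... while the bytes depend on the encoding chosen for transfer.
for enc in ["utf-8", "utf-16-be", "utf-32-be"]:
    data = text.encode(enc)
    print(f"{enc:10} {data.hex(' ')}")
    assert data.decode(enc) == text  # same encoding: same character back

# Decoding UTF-8 bytes with the wrong charset gives garbage ("mojibake").
garbled = text.encode("utf-8").decode("cp1252")
print(garbled)  # â‚¬
```

This is why a page has to declare which encoding it uses: the bytes alone don't say which characters they stand for.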

Micha
29-11-2007, 03:28 PM   #5
Zach
Re: Character Set Question

Micha,

Thank you for the explanation!

Zach

All times are GMT +5.5.

Copyright © 2005-2010, TechTalkz.com. All Rights Reserved