
July 14th, 2008

Sneak peek at the new Unicode-friendly PHP6

Posted by Larry Dignan @ 3:12 am

Categories: General, Software Infrastructure, Web Technology

Tags: Unicode, PHP, Internationalization, Transliteration, Zmievski, Larry Dignan

The following is a guest post from Andrew Mager, associate technical producer at ZDNet. This dispatch is from the Bay Area PHP Meetup. He can be found at Andrewmager.com.

When Andrei Zmievski isn’t busy building infrastructure for a social gaming startup, or processing photos from his Nikon D3, he is compiling C code that will define PHP 6.

[Image: php1.jpg]

He stopped by the old CNET Networks building last week to speak to the Bay Area PHP Meetup audience.

PHP 6 will have Unicode support everywhere: in the engine, in extensions, and in the API. It is going to be native and complete, with no hacks, no external libraries, and no language bias. English is just another language, not the primary one.

Unicode is a system that provides a unique number for every character. The current version defines more than 99,000 characters, and the code space has room for more than 1 million (1,114,112 code points).
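To make the "unique number for every character" idea concrete, here is a minimal sketch in Python (not PHP 6, whose syntax appears in this post only as images). The sample characters and the helper name describe are illustrative assumptions.

```python
# Every character maps to exactly one numeric code point.
samples = ["A", "é", "日", "😀"]
code_points = {ch: ord(ch) for ch in samples}


def describe(ch):
    """Return the conventional U+XXXX label for a single character (hypothetical helper)."""
    return "U+%04X" % ord(ch)


# The Unicode code space runs from U+0000 to U+10FFFF inclusive.
CODE_SPACE_SIZE = 0x10FFFF + 1

for ch in samples:
    print(ch, describe(ch))
```

The mapping works in both directions: chr(0x65E5) gives back the character 日.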

Complete Unicode support should help prevent mojibake: the incorrect, unreadable characters that appear when software fails to decode text according to its actual character encoding.

[Image: php2.jpg]

We’ve all seen it, and it’s ugly.
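Mojibake is easy to reproduce. The sketch below, in Python rather than PHP 6, encodes a string as UTF-8 and then decodes the same bytes with the wrong codec. The sample string is an assumption chosen for illustration.

```python
original = "Zürich café"

# The bytes as they would be stored or sent over the wire.
utf8_bytes = original.encode("utf-8")

# Decoding with the wrong codec silently produces garbage (mojibake).
garbled = utf8_bytes.decode("latin-1")

# Decoding with the codec that was actually used restores the text.
restored = utf8_bytes.decode("utf-8")

print(garbled)
print(restored)
```

Each accented letter occupies two bytes in UTF-8, so the wrong decoder shows two unrelated characters in its place.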

PHP6 supports Unicode composition, so you can create new characters as languages evolve.

[Image: php31.jpg]
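As a rough illustration of composition, the following Python sketch (standard library only, not the PHP 6 API) combines a base letter with a combining accent and normalizes the pair into a single precomposed character. The example string is an assumption.

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, one visible character.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# NFD normalization goes the other way and splits it apart again.
round_trip = unicodedata.normalize("NFD", composed)

print(len(decomposed), len(composed))
```

Both forms look identical on screen, which is why comparisons and searches usually normalize text first.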

Unicode simplifies development, but doesn’t solve all of the internationalization problems.

Internationalization is the design and development of an application so that it carries no built-in cultural assumptions and is efficient to localize. Time formats, currencies, and sort orders all vary around the world.

PHP6 will have two string types: Unicode and Binary.
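Python 3 makes a similar split between text and binary strings, so it can serve as an analogy here. This is a sketch of the idea only and says nothing about how PHP 6 names or implements its types.

```python
# A text (Unicode) string: its length is counted in characters.
text = "日本語"

# A binary string: its length is counted in bytes.
data = text.encode("utf-8")

text_length = len(text)
byte_length = len(data)

# Moving between the two types always names an encoding explicitly.
decoded = data.decode("utf-8")

print(text_length, byte_length)
```

Keeping the two types separate means that image data or network packets are never mistaken for text, and text is never measured in bytes by accident.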

Unicode identifiers are allowed:

[Image: php4.jpg]
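The original screenshot is not reproduced in this text. As a comparable illustration in another language, Python 3 also accepts non-ASCII letters in identifiers. The function and variable names below are made up for the example.

```python
# Function and parameter names written in Cyrillic ("area", "width", "height").
def площадь(ширина, высота):
    return ширина * высота


# A variable name written in German ("size").
größe = площадь(3, 4)

# Bound to an ASCII name as well, for readers whose editors cannot type the above.
result = größe
print(result)
```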

Functions will understand how to read Unicode text.

Streams have built-in support for converting between Unicode strings and other encodings on the fly, for example when a file is opened with fopen('textfile.txt', 'r') and then read with fread() on the returned handle.
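The same idea, conversion happening inside the stream layer, can be sketched in Python with in-memory streams. This is an analogy under stated assumptions, not the PHP 6 stream API: the sample text and the choice of ISO-8859-1 as the source encoding are illustrative.

```python
import io

# Pretend this is a legacy file stored as ISO-8859-1 bytes.
legacy_bytes = "naïve café\n".encode("iso-8859-1")

# Wrapping the byte stream decodes to Unicode text as it is read.
reader = io.TextIOWrapper(io.BytesIO(legacy_bytes), encoding="iso-8859-1")
text = reader.read()

# Wrapping an output stream encodes to UTF-8 as the text is written.
out = io.BytesIO()
writer = io.TextIOWrapper(out, encoding="utf-8", newline="\n")
writer.write(text)
writer.flush()
utf8_bytes = out.getvalue()

print(len(legacy_bytes), len(utf8_bytes))
```

The application code only ever handles Unicode text; the encodings are named once, where the streams are opened.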

In version 6, PHP will be much easier to use across different languages. For instance, this is how simple it is to grab a Chinese news feed, parse the first five stories, clean them up, and convert them to JSON.

[Image: php5.jpg]
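The PHP 6 code from the talk is not reproduced in this text. The following Python sketch shows the same pipeline (parse a feed, keep the first five stories, trim them, emit JSON) under several assumptions: the feed is a tiny inline RSS document rather than a live URL, and the titles, links, and the function name first_stories are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Build a small stand-in feed with six stories (hypothetical content).
TITLES = ["第一条", "第二条", "第三条", "第四条", "第五条", "第六条"]
ITEMS = "".join(
    "<item><title> %s </title><link>http://example.com/%d</link></item>" % (title, n)
    for n, title in enumerate(TITLES, 1)
)
FEED_BYTES = (
    '<rss version="2.0"><channel><title>新闻</title>%s</channel></rss>' % ITEMS
).encode("utf-8")


def first_stories(feed_bytes, limit=5):
    """Parse RSS bytes and return up to `limit` stories as cleaned-up dicts."""
    root = ET.fromstring(feed_bytes)
    stories = []
    for item in root.iter("item"):
        if len(stories) == limit:
            break
        stories.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return stories


stories = first_stories(FEED_BYTES)

# ensure_ascii=False keeps the Chinese text readable instead of \uXXXX escapes.
as_json = json.dumps(stories, ensure_ascii=False)
print(as_json)
```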

Then you can easily display it on the web in any format you want.

TextIterator is a new feature of PHP 6: you can iterate over code points, characters, graphemes, words, lines, and sentences, either forwards or backwards. It makes truncating text much easier, because a cut never lands in the middle of a character.
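To show why the unit of iteration matters when truncating, here is a deliberately simplified Python sketch. It is not the TextIterator API: the helpers clusters and truncate are hypothetical, and the grouping rule (attach combining marks to the preceding base character) covers only part of what full grapheme segmentation handles.

```python
import unicodedata


def clusters(text):
    """Group each base character with the combining marks that follow it."""
    groups = []
    for ch in text:
        if groups and unicodedata.combining(ch):
            groups[-1] += ch      # attach the mark to the previous character
        else:
            groups.append(ch)     # start a new user-visible character
    return groups


def truncate(text, limit):
    """Keep at most `limit` user-visible characters."""
    return "".join(clusters(text)[:limit])


# "cafés" written with a separate combining acute accent: 6 code points, 5 characters.
word = "cafe\u0301s"

by_code_point = word[:4]          # cuts between "e" and its accent
by_cluster = truncate(word, 4)    # keeps the accent with its letter
backwards = list(reversed(clusters(word)))

print(len(word), len(clusters(word)))
```

Slicing by code point drops the accent, while slicing by cluster keeps the fourth character intact.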

Transliteration lets you take names written in Japanese and convert them to Latin script so that you can pronounce them. It only takes two lines of code in PHP 6:

[Image: php6.jpg]
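The two lines of PHP 6 are not reproduced in this text. To illustrate the concept only, here is a toy table-driven transliterator in Python. The table KANA_TO_LATIN covers just a handful of hiragana and is an assumption made for the example; real transliteration engines rely on complete rule sets and also handle kanji, which this sketch does not.

```python
# Toy mapping for a few hiragana syllables (far from complete).
KANA_TO_LATIN = {
    "や": "ya", "ま": "ma", "だ": "da",
    "た": "ta", "ろ": "ro", "う": "u",
}


def transliterate(text):
    """Replace each known kana with its Latin spelling; pass anything else through."""
    return "".join(KANA_TO_LATIN.get(ch, ch) for ch in text)


family_name = transliterate("やまだ")
given_name = transliterate("たろう")
print(family_name, given_name)
```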

pecl/intl will be bundled in PHP 5.3. It’s complementary to the Unicode support.

Other cool features of PHP 6 include number collation, number formatting, message formatting, bundled APC, closures, traits, a 64-bit integer type, a new MySQL driver, and general cleanup.

Zmievski says he hopes the new version will be ready by March 2009. Here is a link to the full presentation.

