Let's learn the Java programming language in easy steps. This Java tutorial provides you with complete knowledge about Java technology.

Saturday 17 December 2016

Unicode

 


Unicode is a universal international standard for character encoding that is capable of representing most of the world's written languages. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode standard has been adopted by industry leaders such as Apple, HP, Microsoft, IBM, Oracle, SAP, and Sun. Unicode is required by modern standards such as XML, Java, and JavaScript.

Unicode is supported in many operating systems, all modern browsers and many other products.


Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Developed in conjunction with the Universal Coded Character Set standard and published as the Unicode Standard, the latest version of Unicode contains a repertoire of more than 128,000 characters covering 135 modern and historic scripts, as well as multiple symbol sets.


Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16. UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters.
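To make the size difference concrete, here is a minimal sketch (the class name EncodingSizes and the sample string are just for illustration) that encodes the same text with both encodings and compares the byte counts:

import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        String text = "Hi€";   // 'H' and 'i' are ASCII, '€' is not

        // UTF-8: 1 byte each for 'H' and 'i', 3 bytes for '€' -> 5 bytes
        System.out.println("UTF-8 : " + text.getBytes(StandardCharsets.UTF_8).length);

        // UTF-16 (big endian, no byte order mark): 2 bytes per character -> 6 bytes
        System.out.println("UTF-16: " + text.getBytes(StandardCharsets.UTF_16BE).length);
    }
}

The ASCII letters take one byte each in UTF-8 but two in UTF-16, while the euro sign takes three bytes in UTF-8 and two in UTF-16.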



Why Java uses the Unicode system

In Unicode a character holds 2 bytes, so Java also uses 2 bytes for its char type.

lowest value: \u0000
highest value: \uFFFF
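A small sketch (the class name CharRange is just for illustration) showing these bounds in Java, which the Character class also exposes as constants:

public class CharRange {
    public static void main(String[] args) {
        char lowest  = '\u0000';   // lowest possible char value
        char highest = '\uFFFF';   // highest possible char value

        System.out.println((int) lowest);               // 0
        System.out.println((int) highest);              // 65535
        System.out.println((int) Character.MIN_VALUE);  // 0, same as '\u0000'
        System.out.println((int) Character.MAX_VALUE);  // 65535, same as '\uFFFF'
    }
}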


UTF-8

UTF stands for Unicode Transformation Format. UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode. The encoding is variable-length and uses 8-bit code units. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks found in the alternative UTF-16 and UTF-32 encodings. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format.
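The variable length is easy to see by encoding single characters and counting the bytes; a minimal sketch (the class name Utf8Lengths and the sample characters are illustrative):

import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        // UTF-8 uses more bytes as the code point gets larger
        String[] samples = { "A", "é", "€", "😀" };

        for (String s : samples) {
            int bytes = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(s + " -> " + bytes + " byte(s)");
        }
        // Output: A -> 1, é -> 2, € -> 3, 😀 -> 4
    }
}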


UTF-16

UTF-16 is a character encoding capable of encoding all 1,112,064 valid code points in Unicode. The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 developed from an earlier fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set), once it became clear to the Unicode community that 16 bits were not sufficient.
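In Java, a supplementary code point (one outside the 16-bit range) is stored as a surrogate pair of two char values; a small sketch (the class name SurrogatePairs and the chosen code point U+1F600 are illustrative):

public class SurrogatePairs {
    public static void main(String[] args) {
        // U+1F600 is above U+FFFF, so it needs two 16-bit code units
        String smiley = new String(Character.toChars(0x1F600));

        System.out.println(smiley.length());                             // 2 char values
        System.out.println(smiley.codePointCount(0, smiley.length()));   // 1 code point
        System.out.println(Character.isHighSurrogate(smiley.charAt(0))); // true
    }
}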

UTF-32

UTF-32 stands for Unicode Transformation Format in 32 bits. It is a protocol to encode Unicode code points that uses exactly 32 bits per code point. UTF-32 is a fixed-length encoding, in contrast to all other Unicode Transformation Formats, which are variable-length encodings. Each 32-bit value in UTF-32 represents one Unicode code point and is exactly equal to that code point's numeric value.
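Java has no 32-bit char type, but the code point values themselves are exactly what UTF-32 stores; a minimal sketch (the class name CodePointValues and the sample string are illustrative):

public class CodePointValues {
    public static void main(String[] args) {
        String text = "A€😀";

        // Each UTF-32 code unit is simply the code point's numeric value
        text.codePoints()
            .forEach(cp -> System.out.printf("U+%04X = %d%n", cp, cp));
        // Output: U+0041 = 65, U+20AC = 8364, U+1F600 = 128512
    }
}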


What is ASCII?

ASCII (American Standard Code for Information Interchange) is pronounced ask-ee. ASCII is a code for representing English characters as numbers, with each letter assigned a number from 0 to 127. For example, the ASCII code for uppercase M is 77. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another.

ASCII is a standard that assigns letters, numbers, and other characters to the 128 slots available in its 7-bit code; extended 8-bit variants use all 256 slots of a byte. The ASCII decimal (dec) number is created from binary, which is the language of all computers. For example, the lowercase 'h' character (char) has a decimal value of 104, which is '01101000' in binary.
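A small sketch (the class name AsciiValues is illustrative) that prints the values mentioned above by casting char to int:

public class AsciiValues {
    public static void main(String[] args) {
        // For characters in the 0-127 range, the char value matches ASCII
        System.out.println((int) 'M');                    // 77
        System.out.println((int) 'h');                    // 104

        // Prints 1101000, i.e. 01101000 when padded to 8 bits
        System.out.println(Integer.toBinaryString('h'));
    }
}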

