A brief (very loose) interpretation of the history of Unicode
In the beginning there were a bunch if people in the basement of some computer lab somewhere who needed to type words on a computer screen. The realized that with a 7 bit value they would have 128 options in which they could put the entire English alphabet. This included upper and lower case, numbers, punctuation, and whitespace characters. They called it ASCII (American Standard Code for Information Interchange) and they saw it was good.
One day they realized that they wanted to
But then the heard the cry's of their friends in countries that spoke languages like Japanese, Chinese, and Hebrew who wanted to
Notes about Unicode and SQL Sever
- There are lots of variants/encoding for Unicode based on my experimentation it appears that SQL server supports UTF-8 encoding with SQL_Latin1_General_CP1_CI_AS collation.
- You need to use a Unicode data types (nchar, nvarchar, ntext)
- You need to prefix your string with the UPPERCASE letter N (ex. N'Some Unicode')
- All of the lower 256 ASCII characters are compatible with the Unicode (its backwards compatible)
- If you try to cast a Unicode char (not part of the 256 ASCII characters) as ASCII you will get a '?' and there is not way to tell if its not actually a question mark.
- Unicode data takes up twice the space that ASCII data takes up. ( varchar(8000) = nvarchar(4000) )
- Some of applications cannot import Unicode data files
- The way your text editor knows that a file has Unicode data is that the first character has special value (that varies depending of the type of encoding).
- Microsoft says that Unicode data processes faster.
Use this code snippet to have 'fun' with unicode
Comments, corrections, clarifications are always appreciated in the comments.