In high school, when I started programming, there was Turbo Pascal 7 from Borland. We were taught about two data types: Char and String. A Char occupied 1 byte in memory. A String could be seen as a sequence of chars, an array of chars, or just a text literal. For the visual representation of a string, code page 437 (also known as DOS-US) was simply used: the lower 128 symbols are the ASCII symbols, except for the first 32, which are not printable in ASCII, and the upper 128 symbols are a fancy selection. A string could hold at most 255 characters. To turn a code point into its visual representation (a 1-byte integer value into the corresponding 1-byte character), the built-in function CHR was used. To turn a visual representation back into a code point, the built-in function ORD was used. Strings were indexed starting at 1; the byte at index 0 held the length of the string. That was it.
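The layout described above can be sketched outside Pascal as well. The following is only an illustration in Python, not Turbo Pascal code: the helper names (`to_short_string`, `from_short_string`, `chr437`, `ord437`) are made up for this example, and it relies on Python's built-in `cp437` codec to stand in for the DOS-US code page.

```python
def to_short_string(text: str) -> bytes:
    """Build a length-prefixed byte string: byte 0 is the length, bytes 1..n are the characters."""
    data = text.encode("cp437")
    if len(data) > 255:
        # A single length byte cannot describe more than 255 characters.
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def from_short_string(raw: bytes) -> str:
    """Read the length from index 0, then decode that many bytes starting at index 1."""
    length = raw[0]
    return raw[1:1 + length].decode("cp437")


def chr437(code: int) -> str:
    """Rough analogue of CHR: 1-byte code -> character, according to code page 437."""
    return bytes([code]).decode("cp437")


def ord437(ch: str) -> int:
    """Rough analogue of ORD: character -> 1-byte code, according to code page 437."""
    return ch.encode("cp437")[0]


raw = to_short_string("Hello")
print(raw[0])          # the length byte
print(chr437(raw[1]))  # the first character lives at index 1
```

Note that the first character sits at index 1 precisely because index 0 is taken by the length byte, which is also why such a string cannot grow past 255 characters.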
Later this kind of string became ShortString, and String became an alias for AnsiString. Fortunately Char remained unchanged, though it was now also called AnsiChar. The visual representation of the lower 128 symbols is that of the ASCII symbols. The visual representation of the upper 128 symbols, however, depends on the code page in use. On Windows the most popular is Windows-1252 (West European Latin). Other Windows code pages are: Windows-1250 (Central and East European Latin), Windows-1251 (Cyrillic), Windows-1253 (Greek), Windows-1254 (Turkish), Windows-1255 (Hebrew), Windows-1256 (Arabic), Windows-1257 (Baltic), Windows-1258 (Vietnamese) and Windows-874 (Thai). AnsiString brought cool new features: a size of up to 2 GB, reference counting, dynamic memory management, and null termination. Because of all these, it was often more convenient to pass strings around than byte arrays.
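To make the code page dependence concrete, here is a small Python sketch (again an illustration, not Delphi code). It decodes one and the same byte under each of the Windows code pages listed above, using Python's built-in codecs; `CODE_PAGES` and `render_byte` are names invented for this example.

```python
# Python codec names for the Windows code pages mentioned in the text.
CODE_PAGES = [
    "cp1250", "cp1251", "cp1252", "cp1253", "cp1254",
    "cp1255", "cp1256", "cp1257", "cp1258", "cp874",
]


def render_byte(value: int) -> dict:
    """Return the character that a single byte stands for under each code page."""
    result = {}
    for name in CODE_PAGES:
        # Some code pages leave a few byte values unassigned; show those as U+FFFD.
        result[name] = bytes([value]).decode(name, errors="replace")
    return result


# A byte below 128 is plain ASCII and looks the same everywhere.
print(render_byte(0x41))

# A byte in the upper half changes meaning with the code page.
for name, ch in render_byte(0xE0).items():
    print(name, ch)
```

The byte never changes; only the table used to interpret it does, which is why text saved under one code page can look wrong when opened under another.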
In recent versions of Delphi (2009 and later), things got ugly. The Char type changed to WideChar, which takes 2 bytes in memory. The String type became UnicodeString, a sequence of WideChars (not to be confused with the older, COM-oriented WideString). A 2-byte character can represent any code point of the Unicode Basic Multilingual Plane. The trouble is that most text files on the hard drive carry no specified encoding and no Byte Order Mark (BOM), so in most cases they are assumed to be Windows-1252. Reading such a file and displaying it in a Unicode program is harder. Even worse, the CHR function no longer works for AnsiChar, and it is pretty useless for WideChar too (as far as the Delphi help explains it). What saves the day is typecasting an integer value to AnsiChar, typecasting a WideChar value to AnsiChar with some data loss, or even the implicit conversion from AnsiChar to WideChar. What is strange is that VCL components still use the Font and Code Page properties that were used before Unicode was introduced.
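The file-reading problem can be sketched as follows. This is a minimal Python illustration under stated assumptions, not what Delphi's own file routines do: the function names `decode_text` and `narrow` are hypothetical, only the UTF-8 and UTF-16 BOMs are checked, and the Windows-1252 fallback is the guess described above rather than a guarantee.

```python
import codecs


def decode_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode file contents: honour a BOM if present, otherwise assume the fallback code page."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: this is only a guess, and a wrong guess yields wrong characters.
    return raw.decode(fallback, errors="replace")


def narrow(text: str, code_page: str = "cp1252") -> bytes:
    """Go back from Unicode text to 1-byte characters; anything unmappable becomes '?'."""
    return text.encode(code_page, errors="replace")


print(decode_text(b"caf\xe9"))   # no BOM, read as Windows-1252
print(narrow("\u20ac"))          # the euro sign has a slot in Windows-1252
print(narrow("\u0416"))          # a Cyrillic letter does not, so data is lost
```

Widening a 1-byte character to Unicode is always possible once the code page is known, while narrowing is lossy for every character the target code page lacks. This mirrors the AnsiChar and WideChar typecasts mentioned above.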









