Iconv - library



Preamble

2011-08-04: Converting text from one character set to another can be difficult to understand for some (what is a character set, to begin with?), but the 'unix' world has addressed this problem thoroughly. A GNU project, libiconv, exists to handle it. Microsoft (MS) addresses the issue through UNICODE, which works well on most MS Windows machines, but is not completely cross-platform.

Just visiting the Unicode Consortium site in various browsers, on various platforms, will often show you the difference. But in general the web, one way or another, now supports most of the world's languages, or more precisely their character sets, and displays them in a 'native' form.

I wanted to compile this library for WIN32, since several other projects I compile need it. I chose the latest stable source at the time, http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.13.1.tar.gz, circa 30-Jun-2009. I note this source is also now available through an anonymous checkout:

git clone git://git.savannah.gnu.org/libiconv.git [target_directory]

Over the years, I have developed a set of perl scripts, 'amsrcs', which do SOME of what the unix auto-tools do. That is, they read the 'configure.ac' file, and then the Makefile.am (or .in) set, to build up the project for MSVC. This 'amsrcs' script set outputs a MSVC6 build file set: a <project>.dsw, pointing to a number of DSP project files. This can be loaded into just about any version of MSVC, and converted to the format it uses.

Of course these UNIX type sources include (a) a file generated during the auto-tool processing, 'config.h', and (b) several other 'standard' UNIX include headers. So for 'windows' I have hand prepared many of these, so as to avoid, if possible, having to 'amend' the source. Often these are just 'stubs', providing just 'enough' definitions to get the source to compile.
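To give an idea of what such a 'stub' can look like, here is a minimal sketch of a stand-in for the POSIX header unistd.h. It is NOT the file shipped in the zips; the guard name STUB_UNISTD_H and the choice of headers are my assumptions for illustration. It simply pulls in the MSVC headers that declare the nearest equivalents, and compiles to nothing on other platforms.

```c
/* unistd.h - hypothetical minimal stand-in for the POSIX header on WIN32.
 * An illustration only, not the stub supplied with this port. */
#ifndef STUB_UNISTD_H
#define STUB_UNISTD_H 1

#ifdef _WIN32
#include <io.h>       /* _access, _close, _read, _write */
#include <process.h>  /* _getpid */
#include <direct.h>   /* _chdir, _getcwd */
#endif

#endif /* STUB_UNISTD_H */
```

Placed in a folder that is on the include path of the MSVC projects only, a file like this satisfies an #include <unistd.h> without touching the original source.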

These hand-prepared headers are included in the source zips provided. Feel free to replace or amend them. They are just my 'estimation' of what is needed to successfully compile the source. The MS runtime library chosen for this port is multithreaded static, that is /MT (release) and /MTd (debug). Always take care NOT to mix runtimes.
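One way to see which runtime a given project is really being built with is to look at the macros the MSVC compiler predefines: _MT for the multithreaded runtimes, _DLL for the DLL runtimes (/MD, /MDd), and _DEBUG for the debug variants. The sketch below is only an illustration; the function name crt_flavour is hypothetical and not part of libiconv or of this port.

```c
/* Report which C runtime was selected when this file was compiled.
 * _MSC_VER, _MT, _DLL and _DEBUG are predefined by MSVC itself;
 * crt_flavour is a hypothetical helper name used for illustration. */
const char *crt_flavour(void)
{
#if !defined(_MSC_VER)
    return "not MSVC";
#elif !defined(_MT)
    return "single-threaded static (/ML or /MLd, older MSVC only)";
#elif defined(_DLL) && defined(_DEBUG)
    return "/MDd";
#elif defined(_DLL)
    return "/MD";
#elif defined(_DEBUG)
    return "/MTd";
#else
    return "/MT";
#endif
}
```

Printing this string from both the application and a small test built with the library settings is a quick check that the two agree before linking against the static libraries in 'msvc/bin'.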

As always, source and binaries are supplied as is. No warranty of fitness for purpose is implied or intended. To the extent possible this WIN32 port is released under GPL version 2, or later, at your choice.


Downloads

WARNING: Take care with downloading, and using binaries from the web

Some downloads:
libiconf-src-01.zip - Full, modified SOURCE, excluding the 'msvc' folder
libiconf-win-01.zip - The 'msvc' folder, including the MSVC build files, and /MT static libraries in 'msvc/bin'
libiconv-1.13.1.tar.gz - Original source

Date	Link	Size (bytes)	MD5
2011/08/04	libiconf-src-01.zip	5,127,253	6bea26eea7c9639437c922d2f741b035
2011/08/04	libiconf-win-01.zip	1,503,179	0178ba54035de74e8733adc754626c60
2010/09/21	libiconv-1.13.1.tar.gz	4,716,070	7ab33ebd26687c744a37264a330bbe9a

Usage:
- Download and unzip the full source, preserving directories, into a folder of your choice. I suggest something like C:\Projects\libiconv-1.13.1, so the folder name shows the version.
- In this folder, create a folder 'msvc', like libiconv-1.13.1> md msvc, and then cd msvc
- Download the 'win' zip, and unzip it, preserving directories, in this 'msvc' folder
- Load libiconv.sln (or libiconv.dsw) into your version of MSVC. Allow it to convert to its own format.
- Run MSVC to build all targets.


Encodings

According to its web site, libiconv supports ALL of the following encodings, some through UNICODE support:

European languages: ASCII, ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16}, KOI8-R, KOI8-U, KOI8-RU, CP{ 1250, 1251, 1252, 1253, 1254, 1257 }, CP{ 850, 866, 1131 }, Mac{ Roman, CentralEurope, Iceland, Croatian, Romania}, Mac{ Cyrillic, Ukraine, Greek, Turkish}, Macintosh
Semitic languages: ISO-8859-{6,8}, CP{ 1255, 1256 }, CP862, Mac{ Hebrew, Arabic }
Japanese: EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP, ISO-2022-JP-2, ISO-2022-JP-1
Chinese: EUC-CN, HZ, GBK, CP936, GB18030, EUC-TW, BIG5, CP950, BIG5-HKSCS, BIG5-HKSCS:2001, BIG5-HKSCS:1999, ISO-2022-CN, ISO-2022-CN-EXT
Korean: EUC-KR, CP949, ISO-2022-KR, JOHAB
Armenian: ARMSCII-8
Georgian: Georgian-Academy, Georgian-PS
Tajik: KOI8-T
Kazakh: PT154, RK1048
Thai: ISO-8859-11, TIS-620, CP874, MacThai
Laotian: MuleLao-1, CP1133
Vietnamese: VISCII, TCVN, CP1258
Platform specifics: HP-ROMAN8, NEXTSTEP
Full Unicode: UTF-8, UCS-2, UCS-2BE, UCS-2LE, UCS-4, UCS-4BE, UCS-4LE, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE, UTF-32LE, UTF-7, C99, JAVA
Full Unicode, in terms of uint16_t or uint32_t (with machine dependent endianness and alignment): UCS-2-INTERNAL, UCS-4-INTERNAL
Locale dependent, in terms of `char' or `wchar_t' (with machine dependent endianness and alignment, and with OS and locale dependent semantics): char, wchar_t - The empty encoding name "" is equivalent to "char": it denotes the locale dependent character encoding.

When configured with the option --enable-extra-encodings, it also provides support for a few extra encodings:

European languages: CP{ 437, 737, 775, 852, 853, 855, 857, 858, 860, 861, 863, 865, 869, 1125 }
Semitic languages: CP864
Japanese: EUC-JISX0213, Shift_JISX0213, ISO-2022-JP-3
Chinese: BIG5-2003 (experimental)
Turkmen: TDS565
Platform specifics: ATARIST, RISCOS-LATIN1
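Since the extra encodings depend on how the library was configured, it can be useful to ask a given build whether it knows an encoding name. A minimal sketch, using only the standard iconv_open() and iconv_close() calls; the helper name encoding_supported is my own, and testing against "UTF-8" as the target is just a convenient assumption:

```c
#include <iconv.h>

/* Return 1 if this iconv build can open a conversion from `name`
 * to UTF-8, and 0 if iconv_open() rejects the name.
 * encoding_supported is a hypothetical helper, not a libiconv function. */
int encoding_supported(const char *name)
{
    iconv_t cd = iconv_open("UTF-8", name);   /* arguments: tocode, fromcode */
    if (cd == (iconv_t)-1)
        return 0;                             /* unknown or unsupported name */
    iconv_close(cd);
    return 1;
}
```

For example, encoding_supported("CP437") would be expected to return 1 only on a build that includes the extra encodings listed above.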

It can convert from any of these encodings to any other, through Unicode conversion.
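As a rough illustration of such a conversion, here is a minimal sketch built on the three standard calls iconv_open(), iconv() and iconv_close(). The function name convert_string and its buffer handling are assumptions of mine, not part of libiconv. It terminates the output with a single NUL byte, so it suits byte-oriented targets such as UTF-8 rather than UTF-16 or UCS-4. Note also that some libiconv builds declare the second argument of iconv() as const char **, in which case the type of src needs adjusting.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from encoding `from` to
 * encoding `to`. Writes at most out_size-1 bytes plus a terminating
 * NUL into `out`. Returns the number of bytes written (excluding the
 * NUL), or (size_t)-1 on any failure.
 * convert_string is a hypothetical helper, not a libiconv function. */
size_t convert_string(const char *from, const char *to,
                      const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *src = (char *)in;   /* iconv() takes char **, but only reads the input */
    char *dst = out;
    size_t src_left = strlen(in);
    size_t dst_left;
    size_t rc;

    if (out_size == 0)
        return (size_t)-1;
    dst_left = out_size - 1;  /* keep one byte for the terminating NUL */

    cd = iconv_open(to, from);            /* arguments: tocode, fromcode */
    if (cd == (iconv_t)-1)
        return (size_t)-1;                /* one of the names is not supported */

    rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;    /* invalid input, or the output buffer was too small */
    *dst = '\0';
    return (size_t)(dst - out);
}
```

A call such as convert_string("ISO-8859-1", "UTF-8", text, buffer, sizeof buffer) would then fill buffer with the UTF-8 form of a Latin-1 string.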
