[sisyphus] IQ: enca -- charset guesser
Michael Shigorin
mike at lic145.kiev.ua
Thu Mar 28 23:52:47 MSK 2002
Hello.
While cleaning out ~/Download (and adding the sixtieth line to
~/ALT/TODO :), I came across a curious little thing -- enca:
---
Name        : enca                         Relocations: /usr
Version     : 0.9.3                             Vendor: Trific soft.
Release     : 1                             Build Date: Thu Mar 28 22:10:23 2002
Install date: Thu Mar 28 22:18:09 2002      Build Host: work.fair.net
Group       : Applications/Text             Source RPM: enca-0.9.3-1.src.rpm
Size        : 173998                           License: GNU GPL v2
Packager    : David Necas (Yeti) <yeti at physics.muni.cz>
URL         : http://physics.muni.cz/~yeti/software/enca.shtml
Summary     : A program that guesses encoding of text files.
Description :
Enca (Extremely Naive Charset Analyser) is a simple utility guessing
encoding of text files and optionally converting them to some other
encoding using either a built-in convertor, a system conversion library
or an external conversion program. Currently, it has support for Czech,
Slovak, Russian and some multibyte encodings (mostly variants of Unicode)
independent on language.
Install Enca if you need to cope with text files of dubious origin
and unknown encoding and convert them to some reasonable encoding.
---
I suspect that in the post-Master era I will be building this
for Sisyphus (I liked it that much); for now, here is an
informal suggestion to everyone interested: take a look and
assess how it could be wired into existing software. Offhand,
I currently have a similar converter hooked into mc view as the
last default (in fact, I mentioned it at some point). It is
fairly smart:
---
Usage: enca [-L LANGUAGE] [OPTION]... [FILE]...
Guess encoding of text files and convert them if required.
Output type selectors:
  -d, --details               print detailed information about how the guess was made
  -e, --enca-name             print enca's encoding name (passed to convertors)
  -f, --human-readable        print full (descriptive) encoding name (default)
  -i, --iconv-name            print how iconv calls the encoding
  -r, --rfc1345-name          print RFC 1345 (or otherwise canonized) encoding name
  -s, --cstocs-name           print how cstocs calls the encoding
  -n, --name=WORD             print required name (enca-name, human-readable, etc.)
  -x, --convert-to=ENC        convert file to some other encoding ENC
Guessing parameters:
  -L, --language=LANG         set language of FILEs---obligatory, when cannot be
                              determined from locale settings
  -m, --no-short-message      turn off short message mode, reset defaults
  -M, --short-message         turn on short message (ambiguous) mode
  -R, --max-chars=NUM         set maximum number of bytes read from input file
  -S, --significant=NUM       set required number of significant characters
  -T, --threshold=FLOAT       set threshold (the smallest allowed ratio between the
                              most probable encoding and the second most probable)
  -u, --multibyte             try multibyte encodings too (default)
  -U, --no-multibyte          don't try multibyte encodings (somewhat faster)
Conversion parameters:
  -E, --external-convertor-program=PATH
                              set external convertor program name (default: )
  -C, --try-convertors=LIST   convertors to be tried (associative)
                              (default: built-in,iconv)
General options:
  -p, --with-filename         print the file name for each result
  -P, --no-filename           suppress the prefixing filename on output
  -V, --verbose               increase verbosity level
Listings:
  -G, --license               print full enca license (GNU GPL v2) and terminate
  -h, --help                  print this help and terminate
  -l, --list=WORD             print required list (built-in-encodings, convertors,
                              encodings, languages, lists, names, surfaces)
                              and terminate
  -v, --version               print version and build information and terminate
With no FILE, read standard input and possibly write converted stream to
standard output. Exit status is 0 if all files were successfully proceeded,
1 if some were not recognized or converted, 2 in troubles.
Report bugs to <yeti at physics.muni.cz> (please include `enca' in subject).
---
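For anyone thinking about wiring this into existing software, here is a rough Python sketch of calling enca from another program. It uses only options and exit statuses that appear in the help text above (-L, -P, -i, -x). The helper names (build_enca_command, describe_exit_status, guess_encoding) and the language value "russian" are my own assumptions for illustration; `enca -l languages` should show what your build actually accepts.

```python
import subprocess

# Exit statuses as described at the end of the help text.
EXIT_MEANING = {
    0: "all files processed successfully",
    1: "some files were not recognized or converted",
    2: "trouble (enca itself reported a problem)",
}


def build_enca_command(path, language, convert_to=None):
    """Build an argv list for enca using only options from the help text.

    Hypothetical helper: the name and defaults are illustrative.
    """
    # -L sets the language, -P suppresses the file name prefix on output.
    cmd = ["enca", "-L", language, "-P"]
    if convert_to is None:
        cmd.append("-i")            # print how iconv calls the encoding
    else:
        cmd += ["-x", convert_to]   # convert the file to encoding ENC
    cmd.append(path)
    return cmd


def describe_exit_status(code):
    """Map an enca exit status to the meaning given in the help text."""
    return EXIT_MEANING.get(code, "unexpected exit status")


def guess_encoding(path, language):
    """Return the iconv name enca prints, or None if the guess failed.

    Raises FileNotFoundError if enca is not installed.
    """
    result = subprocess.run(
        build_enca_command(path, language),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    # "russian" is an assumed language name, not taken from the help text.
    print(guess_encoding("letter.txt", "russian"))
```

Keeping the command construction separate from the subprocess call makes it possible to check the argument list without having enca installed.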
Ukrainian and Belarusian are not supported at the moment,
but judging by the description, adding them is a purely technical matter.
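To give a feel for what "a technical matter" could mean, here is a toy Python sketch of a letter-frequency guesser with a ratio threshold, in the spirit of the -T option described in the help text. This is NOT enca's actual algorithm, which the description does not spell out. The function name, the letter set and the default threshold are all assumptions of mine. Under this sketch, another language would amount to another letter set and candidate list.

```python
# Illustrative assumption: ten frequent lowercase Russian letters.
# This table is not taken from enca.
RUSSIAN_COMMON = set("оеаинтсрвл")


def guess_by_letters(data, candidates, common_letters, threshold=1.5):
    """Toy guesser: pick the candidate whose decoding has the most common letters.

    Returns None when nothing matches, or when the best score is not at
    least `threshold` times the second-best score (an ambiguous guess).
    """
    scores = []
    for name in candidates:
        # errors="replace" keeps undecodable bytes from raising.
        text = data.decode(name, errors="replace")
        score = sum(1 for ch in text if ch in common_letters)
        scores.append((score, name))
    if not scores:
        return None
    scores.sort(reverse=True)
    best_score, best_name = scores[0]
    if best_score == 0:
        return None
    if len(scores) > 1:
        second_score = scores[1][0]
        # Ratio test between the most probable and second most probable.
        if second_score > 0 and best_score / second_score < threshold:
            return None
    return best_name


if __name__ == "__main__":
    sample = "проверка определения кодировки текста".encode("koi8_r")
    print(guess_by_letters(sample, ["koi8_r", "cp1251"], RUSSIAN_COMMON))
```

The trick works for this pair because lowercase letters in KOI8-R occupy the byte range where CP1251 has uppercase letters, and vice versa, so the wrong decoding produces almost no frequent lowercase letters.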
--
---- WBR, Michael Shigorin <mike at altlinux.ru>
------ http://visa.chem.univ.kiev.ua/~mike/