About
EncodingSleuth Text is a simple and powerful Java API your programs can
use to determine whether a byte stream or file contains encoded text,
and to identify the best charset/decoder to use to decode those bytes
into text.
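This section does not show EncodingSleuth Text's own API, so the following is only a minimal sketch that uses the standard `java.nio.charset` classes, not the library itself. It shows one building block of the idea: a charset is a plausible candidate only if a strict decoder accepts the bytes. "Strict" here means a decoder that reports malformed or unmappable input instead of silently substituting replacement characters. The class `CharsetCandidates` and the method `decodableCharsets` are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of EncodingSleuth Text. */
final class CharsetCandidates {
    private CharsetCandidates() {}

    /**
     * Returns the names of the candidate charsets that decode {@code bytes}
     * without any malformed or unmappable input.
     */
    static List<String> decodableCharsets(byte[] bytes, List<Charset> candidates) {
        List<String> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            // A fresh decoder per charset; REPORT makes decode() throw
            // instead of substituting U+FFFD for bad input.
            CharsetDecoder decoder = cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(bytes));
                ok.add(cs.name());
            } catch (CharacterCodingException e) {
                // This charset cannot decode the bytes, so it is dropped.
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        List<Charset> candidates = List.of(
                StandardCharsets.UTF_8,
                StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1);
        System.out.println(decodableCharsets(data, candidates)); // [UTF-8, ISO-8859-1]
    }
}
```

US-ASCII rejects the byte 0xC3, but both UTF-8 and ISO-8859-1 accept these bytes. Validity alone therefore does not identify the best decoder; the candidates still have to be ranked.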
EncodingSleuth Text version 1 has these features:
- It works with Unicode versions 4.1.0, 5.0.0, and 5.1.0 (all three
  versions are included). It can also work with custom Unicode versions
  that you provide.
- It works with a list of potential charsets configured by your program,
  improving efficiency by letting you eliminate charsets you will never
  need.
- It provides several detectors that analyze different aspects of each
  potential decoding in order to rank the decodings from best to worst.
  Your program may use any or all of the detectors, or even create custom
  detectors, to achieve the best results for your application.
- It provides your program with a scored list of potential decoders for
  your data. The list is ordered by the relative "goodness" of each
  decoder for decoding the bytes; the best decoder is at the top of the
  list.
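The detector-and-ranking design described in the list can be sketched as follows. This is an illustrative sketch, not EncodingSleuth Text's actual API: the `Detector` interface, `ScoredCharset`, `DecoderRanker`, and the two toy detectors `LETTER_RATIO` and `CONTROL_PENALTY` are all hypothetical. Each configured candidate charset is decoded strictly, invalid decodings are dropped, every detector scores the decoded text, and the summed scores order the list with the best decoder first.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical detector contract: a higher score means a more plausible decoding. */
interface Detector {
    double score(String decoded);
}

/** One ranked candidate: a charset and its combined detector score. */
final class ScoredCharset {
    final Charset charset;
    final double score;

    ScoredCharset(Charset charset, double score) {
        this.charset = charset;
        this.score = score;
    }

    @Override
    public String toString() {
        return charset.name() + "=" + String.format(Locale.ROOT, "%.3f", score);
    }
}

final class DecoderRanker {
    private DecoderRanker() {}

    /** Toy detector: fraction of UTF-16 chars that are letters or whitespace. */
    static final Detector LETTER_RATIO = s -> {
        if (s.isEmpty()) return 0.0;
        long good = s.chars()
                .filter(c -> Character.isLetter(c) || Character.isWhitespace(c))
                .count();
        return (double) good / s.length();
    };

    /** Toy detector: penalty for control chars other than tab, LF, and CR. */
    static final Detector CONTROL_PENALTY = s -> {
        if (s.isEmpty()) return 0.0;
        long bad = s.chars()
                .filter(c -> Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r')
                .count();
        return -(double) bad / s.length();
    };

    /** Returns the valid decodings of {@code bytes}, best (highest score) first. */
    static List<ScoredCharset> rank(byte[] bytes, List<Charset> candidates, List<Detector> detectors) {
        List<ScoredCharset> result = new ArrayList<>();
        for (Charset cs : candidates) {
            String decoded;
            try {
                decoded = cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes))
                        .toString();
            } catch (CharacterCodingException e) {
                continue; // not a valid decoding, so it is left out of the list
            }
            double total = 0.0;
            for (Detector d : detectors) {
                total += d.score(decoded); // every detector is weighted equally
            }
            result.add(new ScoredCharset(cs, total));
        }
        result.sort(Comparator.comparingDouble((ScoredCharset sc) -> sc.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        byte[] data = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        List<ScoredCharset> ranked = rank(data,
                List.of(StandardCharsets.UTF_8, StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1),
                List.of(LETTER_RATIO, CONTROL_PENALTY));
        System.out.println(ranked); // [UTF-8=1.000, ISO-8859-1=0.833]
    }
}
```

ISO-8859-1 turns the two UTF-8 bytes of "é" into "Ã©", and "©" is not a letter, so that decoding scores lower than UTF-8. Summing the scores weights all detectors equally; a real configuration might weight them differently. Counting UTF-16 chars rather than code points is a further simplification made for the sketch.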
For details, see Theory of Operation.