- Lynx CHARTRANS
- Features (in addition to those which Lynx 2.7.1 already has):
- - Can (attempt to) translate from any document charset to any display
- character set, *IF* the document charset is known by a translation
- table (compiled in at installation).
- - New method of defining character sets, used for the input (document)
- charset as well as the display character set: translation tables are
- compiled in from separate files (one per charset). One table is
- designated as the default and can be used for fallback translation to
- 7-bit replacements for display.
- - New method for specifying translations of SGML entities.
- - Unicode (UTF-8) support: can (attempt to) decode and translate UTF-8 to
- the display character set, or pass UTF-8 through to the display (if the
- terminal or console understands UTF-8). [Raw display of UTF-8 has so far
- only been tested with Slang, and does not always position everything
- correctly on screen.]
- - Support for CHARSET attribute on A tag (and sometimes LINK), as in HTML
- i18n RFC 2070 and W3C HTML 4.0 drafts. A link can suggest the target's
- charset in this way.
- - Support for ACCEPT-CHARSET attribute of FORM tags.
- - EXPERIMENTAL, currently enabled only for Linux console:
- can (attempt to) automatically switch terminal mode and load new
- code pages on change of display character set.
- - some minor changes: invalid characters were sometimes displayed in hex
- notation as Uxxxx (which helps debugging, and is arguably no worse than
- showing the wrong character without warning); they are now suppressed
- to reduce garbage on screen.
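The translation path described above can be sketched in C. Characters are decoded to Unicode, then mapped to the display character set. A character with no direct equivalent falls back to a 7-bit replacement, and one with no replacement is not displayed at all. This is a hypothetical illustration, not the chartrans code itself. The function names, the tiny fallback table and the choice of ISO-8859-1 as the display set are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not the actual chartrans code. */

/* Decode one UTF-8 sequence from s (len bytes available).
   Returns the code point and stores the sequence length in *used, or
   returns -1 for an invalid, overlong, surrogate or truncated sequence. */
static long utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    static const long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    long cp;
    size_t n, i;

    if (len == 0)
        return -1;
    if (s[0] < 0x80) {
        *used = 1;
        return s[0];
    }
    if ((s[0] & 0xE0) == 0xC0)      { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return -1;                  /* stray continuation or bad lead byte */
    if (len < n)
        return -1;                  /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;                  /* overlong, out of range or surrogate */
    *used = n;
    return cp;
}

/* Illustrative 7-bit replacements for characters ISO-8859-1 lacks. */
struct fallback { long ucs; const char *repl; };
static const struct fallback fallbacks[] = {
    { 0x2013, "-" },    /* EN DASH */
    { 0x2018, "'" },    /* LEFT SINGLE QUOTATION MARK */
    { 0x2019, "'" },    /* RIGHT SINGLE QUOTATION MARK */
    { 0x201C, "\"" },   /* LEFT DOUBLE QUOTATION MARK */
    { 0x201D, "\"" },   /* RIGHT DOUBLE QUOTATION MARK */
    { 0x2026, "..." },  /* HORIZONTAL ELLIPSIS */
};

/* Translate one code point for an ISO-8859-1 display.  Writes a
   NUL-terminated string into out and returns 1, or returns 0 if there is
   no representation (the character is then simply not displayed). */
static int translate_for_latin1(long cp, char *out, size_t outsz)
{
    size_t i;

    if ((cp >= 0x20 && cp < 0x7F) || (cp >= 0xA0 && cp <= 0xFF)) {
        if (outsz < 2)
            return 0;
        out[0] = (char)cp;          /* Latin-1 bytes equal U+0000..U+00FF */
        out[1] = '\0';
        return 1;
    }
    for (i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; i++) {
        if (fallbacks[i].ucs == cp) {
            if (strlen(fallbacks[i].repl) >= outsz)
                return 0;
            strcpy(out, fallbacks[i].repl);
            return 1;
        }
    }
    return 0;
}
```

A real implementation would be table-driven for every display character set rather than hard-coding Latin-1. The sketch only shows the order of decisions: direct mapping, then 7-bit fallback, then suppression.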
- Additions/changes to user interface:
- - many new Display Character Sets are available on O)ptions screen.
- (One can use arrow keys, HOME, END etc. for cycling through the list
- or use selection from popup box, as for other options.)
- - new command-line flags:
- -assume_charset=...        assume this as the charset for documents that
- don't specify a charset parameter in HTTP headers
- -assume_local_charset=...  assume this as the charset of local files
- -assume_unrec_charset=...  assume this as the charset in case a charset
- parameter is not recognized
- These are also available as ASSUME_CHARSET etc. in lynx.cfg.
- In "Advanced User" mode, ASSUME_CHARSET can be changed during a session
- from the Options Screen.
- - The "Raw" toggle (from -raw flag, '@' key, or Options screen)
- o toggles the assumption "Default remote charset is same as Display
- Character Set" on or off.
- Toggling of the assumed charset is between Display Character Set and
- the specified ASSUME_CHARSET or, if they are the same, between the
- specified ASSUME_CHARSET and ISO-8859-1.
- o The default for raw mode now depends on the Display Character Set as
- well as on the specified ASSUME_CHARSET value.
- o should work as before for CJK charsets (turning CJK-mode on or off).
- o If the effective ASSUME_CHARSET and the Display Character Set are
- unchanged from the ISO-8859-1 default, toggling "Raw" may have some
- additional effect for characters that can't be translated.
- (Try the "Transparent" Display Character Set for more "rawness".)
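The charset assumptions and the "Raw" toggle described above amount to a small piece of decision logic. The following C sketch is hypothetical. The struct, the function names and the list of recognized charsets are assumptions, and so is the order of the checks; all are inferred from the flag descriptions rather than taken from the Lynx source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; charset names are assumed to be normalized
   to lower case before these functions are called. */

/* Illustrative list of charsets that have translation tables. */
static const char *const known_charsets[] = {
    "iso-8859-1", "iso-8859-2", "koi8-r", "utf-8", NULL
};

static int is_known_charset(const char *cs)
{
    size_t i;

    for (i = 0; known_charsets[i] != NULL; i++)
        if (strcmp(known_charsets[i], cs) == 0)
            return 1;
    return 0;
}

/* Values of -assume_charset, -assume_local_charset and
   -assume_unrec_charset (or the corresponding lynx.cfg settings). */
struct assume_opts {
    const char *assume_charset;
    const char *assume_local_charset;
    const char *assume_unrec_charset;
};

/* Pick the charset to assume for a document.  header_cs is the charset
   parameter from the HTTP headers, or NULL if there was none. */
static const char *effective_charset(const struct assume_opts *o,
                                     int is_local_file, const char *header_cs)
{
    if (is_local_file)
        return o->assume_local_charset;
    if (header_cs == NULL)
        return o->assume_charset;
    if (!is_known_charset(header_cs))
        return o->assume_unrec_charset;
    return header_cs;
}

/* One press of the "Raw" toggle: switch between the Display Character Set
   and ASSUME_CHARSET or, if those are the same, between ASSUME_CHARSET
   and ISO-8859-1.  (If both are already ISO-8859-1 the charset does not
   change; the extra effects mentioned above are not modeled here.) */
static const char *toggle_raw(const char *current,
                              const char *display_cs, const char *assume_cs)
{
    const char *a = display_cs, *b = assume_cs;

    if (strcmp(display_cs, assume_cs) == 0) {
        a = assume_cs;
        b = "iso-8859-1";
    }
    return strcmp(current, a) == 0 ? b : a;
}
```

For example, with a KOI8-R display and ASSUME_CHARSET iso-8859-1, repeated toggles alternate between koi8-r and iso-8859-1.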
- Requirements: same as for Lynx in general :)
- The chartrans code is now merged with Wayne Buttle's changes for
- 32-bit MS Windows and DOS/DJGPP, with Thomas Dickey's and Jim Spath's
- emerging auto-configure mechanism, and with BUGFIXES from Foteos
- Macrides. See the accompanying file CHANGES for the current
- status.
- A warning:
- In some cases undisplayable bytes may still be sent to the terminal,
- where they are then interpreted as control characters. There is no
- protection against this if strange things are defined in the table files.
- HOW TO INSTALL:
- (4) before compiling:
- Check the top-level makefile or Makefile and userdefs.h as usual.
- NOTE that there is a new "#define" for MAX_CHARSETS in userdefs.h,
- near the end (in "Section 3.").
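MAX_CHARSETS is presumably a compile-time upper bound on how many character sets can be registered. Below is a minimal sketch of what such a limit guards against, assuming a fixed-size registration array. The array, the function and the value 4 are hypothetical; the real value is whatever userdefs.h sets.

```c
/* Hypothetical sketch: the value 4 is for illustration only; the real
   value is set in userdefs.h. */
#ifndef MAX_CHARSETS
#define MAX_CHARSETS 4
#endif

static const char *charset_names[MAX_CHARSETS];
static int charset_count = 0;

/* Add a charset to the table; returns its index, or -1 once
   MAX_CHARSETS charsets have already been registered. */
static int register_charset(const char *name)
{
    if (charset_count >= MAX_CHARSETS)
        return -1;
    charset_names[charset_count] = name;
    return charset_count++;
}
```

If more translation tables are compiled in than the limit allows, the extra ones cannot be registered. That is presumably why the value may need raising when tables are added.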
- (5) Building Lynx:
- Compiling the chartrans code is now integrated into the normal
- installation procedures for UNIX (configure script) and other
- platforms.
- What's supposed to happen (in addition to the usual things when
- building Lynx): in the new subdirectory src/chrtrans, make should
- first compile the auxiliary program `makeuctb', then invoke that
- program to create xxxxx_yyy.h files from the provided xxxxx_yyy.tab
- translation table files. (See README.* files in src/chrtrans for
- more info.)
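makeuctb reads the .tab files and writes C headers. The sketch below is rough and rests on an assumption: that mapping lines have the form `0xA1 U+0104  # comment'. The real format is described in the README.* files in src/chrtrans. Under that assumption, the core of such a converter might look like this. The function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of a makeuctb-like conversion step; the input
   format "0xNN U+XXXX  # comment" is an assumption for this example. */

/* Parse one mapping line.  Returns 1 and fills *byte and *ucs on
   success; returns 0 for blank lines, comments or anything else. */
static int parse_tab_line(const char *line, unsigned *byte, unsigned long *ucs)
{
    unsigned b;
    unsigned long u;

    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;
    /* %x accepts an optional 0x prefix, like strtoul with base 16 */
    if (sscanf(line, "%x U+%lx", &b, &u) != 2 || b > 0xFF)
        return 0;
    *byte = b;
    *ucs = u;
    return 1;
}

/* Format one initializer line for the generated .h file. */
static int emit_entry(char *out, size_t outsz, unsigned byte, unsigned long ucs)
{
    return snprintf(out, outsz, "    { 0x%02x, 0x%04lx },", byte, ucs);
}
```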
- If all goes well, just invoking make from the top-level Lynx dir
- as usual should do everything automatically. If not, the makefiles
- may need some tweaking... or:
- (6) Some things to look at if compilation fails:
- In src/chrtrans/UCkd.h there is a typedef for an unsigned 16-bit
- numeric type which may need to be changed for your system.
- See the comment near the top of that file.
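Here is a C sketch of such a typedef, with a compile-time check that fails the build if the chosen type is not exactly 16 bits wide. The name u16 and the check are illustrative assumptions; adapt whatever UCkd.h actually declares.

```c
#include <limits.h>

/* Hypothetical: the typedef name and base type are illustrative.
   On most systems unsigned short is 16 bits; elsewhere pick another. */
typedef unsigned short u16;

/* Compile-time check (works in C89): the array size becomes -1, which
   is a compile error, unless u16 is exactly 16 bits wide. */
typedef char u16_must_be_16_bits[(sizeof(u16) * CHAR_BIT == 16) ? 1 : -1];
```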
- For recompiling Lynx, `make clean' should not be necessary if only
- files in src/chrtrans have been changed. On the other hand, a
- `make clean' in the top-level directory may not propagate to the
- src/chrtrans directory (depending on how things are going with
- auto-config), so you may have to cd to that directory and run
- `make clean' there to really clean up.
- (7) To customize (add/change translation tables etc.):
- See README.* files in src/chrtrans.
- Make the necessary changes there, then recompile.
- (A general `make clean' should not be necessary, but make sure
- the ...uni.h file in src/chrtrans gets regenerated.)
- Note that definitions of new character entities (e.g., if you want
- Lynx to recognize &Zcaron;) are not covered by these table files;
- they have to be listed in entities.h.
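Such an entity list could be a table of names and Unicode values, kept in sorted order so it can be searched with bsearch. The layout below is a hypothetical sketch, for illustration only, and not the actual format of entities.h. The example entry &Zcaron; maps to U+017D.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout; entities.h in the Lynx source may differ. */
struct entity { const char *name; long ucs; };

/* Must stay sorted in strcmp order (upper case before lower case)
   for bsearch to work. */
static const struct entity entities[] = {
    { "Zcaron", 0x017D },
    { "amp",    0x0026 },
    { "eacute", 0x00E9 },
    { "nbsp",   0x00A0 },
    { "zcaron", 0x017E },
};

static int cmp_entity(const void *key, const void *elem)
{
    return strcmp((const char *)key, ((const struct entity *)elem)->name);
}

/* Returns the code point for an entity name (without '&' and ';'),
   or -1 if the name is not in the table. */
static long lookup_entity(const char *name)
{
    const struct entity *e = bsearch(name, entities,
                                     sizeof entities / sizeof entities[0],
                                     sizeof entities[0], cmp_entity);
    return e ? e->ucs : -1;
}
```

With a sorted table, an entry added out of order silently breaks lookups. This is one reason to keep any such list in a single, well-defined place.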
- _If you are on a Linux system_ and using Lynx on the console (i.e.
- not xterm, not a dialup *into* the Linux box), you can compile
- with -DEXP_CHARTRANS_AUTOSWITCH. This is very useful for testing
- the various Display Character Sets: Lynx will try to automatically
- change the console state. You need to have the Linux kbd package
- installed, with a working `setfont' command executable by the user,
- and the right font files - check the source in src/UCAuto.c for
- the files used and/or to change them!
- NOTE that with this enabled,
- - Lynx currently will not clean up the console state at exit;
- it will probably be left as set for the last Display Character Set used.
- - Loading a font is global across _all_ virtual text consoles, so
- using Lynx (compiled with this flag) may change the appearance of
- text on other consoles (if that text contains characters
- beyond US-ASCII).
- (8) Some suggested Web pages for testing:
- <URL: http://www.tezcat.com/~kweide/lynx-chartrans/test/>
- <URL: http://www.isoc.org:8080/>,
- especially
- <URL: http://www.isoc.org:8080/liste_ml.htm>.
- <URL: http://www.accentsoft.com/un/un-all.htm>
- (9) Please report bugs, unexpected behavior, etc.
- to <lynx-dev@nongnu.org>.
- Suggestions for improvement would be welcome, as well as
- contributed translation tables (for stuff that is not available
- at ftp://dkuug.dk or ftp://ftp.unicode.org).
- KW 1997-11-06