a2x, manpages, html entities

f.holop
hi there,

the port i am working on uses a2x to turn text files into man pages.
i have noticed that the generated man pages are full of html entity
codes for characters like "'" and others.

for example:

Don't

is turned into:

Don&#8217;t


the files were generated with:

/usr/local/bin/a2x.py -d manpage  -f manpage file.txt


how can i make this go away?  i see other ports using a2x
but in more creative ways like a2x + xmlto.

is using only a2x for generating man pages broken?

-f
--
suddenly, nothing happened!  but, it happened suddenly.

Re: a2x, manpages, html entities

f.holop
hmm, on Wed, Oct 19, 2011 at 12:18:55AM +0200, frantisek holop said that

> hi there,
>
> the port i am working on uses a2x to turn text files into man pages.
> i have noticed that the generated man pages are full of html entity
> codes for characters like "'" and others.
>
> for example:
>
> Don't
>
> is turned into:
>
> Don&#8217;t
>
>
> the files were generated with:
>
> /usr/local/bin/a2x.py -d manpage  -f manpage file.txt
>
>
> how can i make this go away?  i see other ports using a2x
> but in more creative ways like a2x + xmlto.
>
> is using only a2x for generating man pages broken?

i have also tried the flow:

$ asciidoc -d manpage -b docbook -o file.xml - < file.txt
$ xmlto man file.xml

but the html entities are still there in file.8
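
To see which references survive in a generated page, a small check along the following lines could help. This is only a sketch: the function name find_entities and the regular expression are illustrative assumptions, not part of a2x or xmlto, and it only looks for numeric character references such as &#8217; (decimal or hexadecimal).

```python
import re

# Numeric character references: decimal (&#8217;) or hexadecimal (&#x2019;).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def find_entities(text):
    """Return (line number, reference, code point) for each numeric reference.

    Hypothetical helper for illustration; line numbers start at 1.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in NUMERIC_REF.finditer(line):
            hex_digits, dec_digits = match.groups()
            code = int(hex_digits, 16) if hex_digits else int(dec_digits)
            hits.append((line_no, match.group(0), code))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): python find_entities.py file.8
    with open(sys.argv[1], encoding="utf-8") as page:
        for line_no, ref, code in find_entities(page.read()):
            print(f"{line_no}: {ref} (U+{code:04X})")
```

An empty result would mean the page contains no numeric references; it says nothing about whether the rest of the man(7) markup is valid.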

in the debian generated man page for the same port i see:

.\" -----------------------------------------------------------------
.\" * Define some portability stuff
.\" -----------------------------------------------------------------
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.\" http://bugs.debian.org/507673
.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '

and "Don't" looks like this:

Don\(cqt

-f
--
the borg assimilated my race & all i got was this t-shirt

Re: a2x, manpages, html entities

Ingo Schwarze
Hi Frantisek,

frantisek holop wrote on Wed, Oct 19, 2011 at 12:44:23AM +0200:

> the port i am working on uses a2x to turn text files into man pages.

Definitely a very bad idea.

I just installed the asciidoc port to have a brief look at it.

 * That tool starts from an input format that provides some physical
   markup facilities, almost no semantic markup facilities, and by using
   a very heterogeneous syntax, makes it hard to distinguish markup from
   normal text.

 * It converts that input format to man(7) output - remember that
   man(7) can already be considered a legacy format.

 * The quality of the man(7) output is low; it looks rather ugly.

 * For no good reason, the man(7) code is intermixed with several
   low-level roff(7) requests, thus of limited portability.

 * The thing is slow as hell.
   On my notebook, formatting the asciidoc(1) manual takes:
    - 10 milliseconds with mandoc(1)
    - 50 milliseconds with groff(1)
    - 3500 milliseconds with a2x(1)

 * Documentation is scarce and vague.
   Right, there is a web site with a cheatsheet, an FAQ,
   and scattered other stuff.

> i have noticed that the generated man pages are full of html entity
> codes for characters like "'" and others.
>
> for example:
>
> Don't
>
> is turned into:
>
> Don&#8217;t

Sure, looks like a bug in a2x(1), that's certainly invalid man(7).
It's probably due to the fact that going from txt to man format,
a2x(1) takes the detour via docbook.

> the files were generated with:
>
> /usr/local/bin/a2x.py -d manpage  -f manpage file.txt
>
> how can i make this go away?

I'd say don't waste your time on docbook-related issues.
Docbook is notorious for being complicated, bug-ridden,
and for producing low-quality output.

The output from that toolchain will look bad and break often
in any case.  Just live with it.  "Don&#8217;t" looks bad,
no doubt, but people will be able to understand it.

> in the debian generated man page for the same port i see:
>
> .\" -----------------------------------------------------------------
> .\" * Define some portability stuff
> .\" -----------------------------------------------------------------
> .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> .\" http://bugs.debian.org/507673
> .\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
> .\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> .ie \n(.g .ds Aq \(aq
> .el       .ds Aq '

No idea where that is coming from, too little context.

> and "Don't" looks like this:
>
> Don\(cqt

Huh?  Above you say \(aq, here \(cq, now which one is it?
Anyway, the groff-1.21 we have in ports deals well with both of
them, as does mandoc(1).
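
If a port nevertheless has to ship such a page, one possible workaround is to post-process the a2x output and rewrite the leftover references as roff special characters. The following is a minimal sketch under stated assumptions, not a fix for the underlying bug: the function name entities_to_roff and the small mapping table are made up for illustration, and the table covers only a few quote characters using the escape names \(aq, \(oq, \(cq, \(lq and \(rq. References it does not know are left untouched.

```python
import re

# Assumed mapping from Unicode code points to roff special-character escapes.
# Only a handful of quote characters are covered; extend as needed.
ROFF_ESCAPES = {
    0x0027: r"\(aq",  # apostrophe
    0x2018: r"\(oq",  # left single quotation mark
    0x2019: r"\(cq",  # right single quotation mark
    0x201C: r"\(lq",  # left double quotation mark
    0x201D: r"\(rq",  # right double quotation mark
}

# Numeric character references: decimal (&#8217;) or hexadecimal (&#x2019;).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def entities_to_roff(text):
    """Replace known numeric references with roff escapes (hypothetical helper)."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Unknown code points are returned unchanged rather than guessed at.
        return ROFF_ESCAPES.get(code, match.group(0))

    return NUMERIC_REF.sub(replace, text)
```

With this table, entities_to_roff("Don&#8217;t") yields Don\(cqt, the same form as in the Debian page quoted above.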

Yours,
  Ingo