
Re: Tags (was: RE: New Threads (Was...))



Gregory Leblanc wrote:

> 
> Let me rephrase then.  There's no need to spell out every single tag that is
> in the "permitted" group.  We simply say "here are the tags that you MUST
> have, here are the tags that you CANNOT have.  Feel free to use any tags not
> listed here however the DTD allows them to be used.  As an additional note,
> here are the tags that our search engine is "smart" about."  How's that
> sound?
> 

OK

> > As a way of starting this tags discussion off, I
> > have a text file with a hierarchical list of
> > DocBook tags automatically generated from DocBook
> > itself. The list has the form
> 
> Woo hoo, that's got to be HUGE.  ohmygosh, 65K.  Not that long, I guess,
> much shorter than parts of the SHR.  :-)

It is big, but it can be manipulated with Unix
text processing tools relatively easily.

> 
> [shhhnip]
> > It's 3345 lines long, and each line consists of
> > parent_tag child_tag
> >
> > For instance, above, if Article is open, Abstract
> > ... FormalPara and the following are legal
> > children. I haven't got the unreverted tags yet,
> > but it would be a similar, but shorter file.
> 
> Unreverted tags?  I'm not familiar with the usage of that word in this
> context (or, I just don't get it).

Reversion is going back to standard formatting.
For instance, if you have a Warning tag, you may
want the warning to display in 20 pt red, but
after the warning, you want the format to revert
to what it was.

What I haven't figured out is where the layout
tells you that after the first sect2, only
sect2s are allowed. In a sect1, before the first
sect2, all sorts of tags are allowed. But the
first sect2 can only be followed by more
sect2s or the end tag for the sect1.
> 

> 
> Hmm, two ways to treat this.  The file that you sent me isn't really tags,
> it's complete structures.  Are we going to restrict based on structures (a
> huge set), or tags (a much smaller set)?

We don't need to be concerned with the fact that
the file is large. We can sort and uniq it to
get a tag list, and write a utility that reads the
large file and ensures that our structure subset
is followed.
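
As a rough sketch of both steps, assuming the file really is one
whitespace-separated "parent_tag child_tag" pair per line. The function
names and the sample pairs below are made up for illustration and are not
taken from the real file:

```python
def read_pairs(lines):
    """Parse 'parent_tag child_tag' lines into a set of (parent, child) pairs."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 2:  # skip blank or malformed lines
            pairs.add((fields[0], fields[1]))
    return pairs


def tag_list(pairs):
    """Flat, de-duplicated tag list (the 'sort and uniq' step)."""
    return sorted({tag for pair in pairs for tag in pair})


def violations(doc_pairs, allowed_pairs):
    """Parent/child pairs used in a document that our subset does not allow."""
    return sorted(doc_pairs - allowed_pairs)


# Hypothetical sample data standing in for the full list.
full = read_pairs([
    "Article Abstract",
    "Article FormalPara",
    "Article Sect1",
    "Sect1 Para",
    "Sect1 Sect2",
])
# Hypothetical subset: everything except FormalPara directly under Article.
subset = full - {("Article", "FormalPara")}
```

Note that a pair list like this only checks which child may appear under
which parent; it says nothing about the order of the children.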

The large file is amenable to management with Unix
text processing tools like sed, awk, and grep. See
later about DocBook updates.


> 
> I'm not so concerned about mistakes, but more about completeness, and the
> ability to not get stuck with bad design.  We do need to be able to adapt
> this to DocBook 4.0 and 5.0, as well as XML at some point.

The way to get completeness is to start off with a
complete list of DocBook tags and pare it down
until what remains is our set. I'm planning to
produce fgrep, sed, and awk scripts that automate
this process so that when new versions of DocBook
come out, we can do most of it automatically. We
can use diff to highlight the new stuff.
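
The same idea in a few lines of Python, as a sketch only: it assumes each
DocBook version has already been reduced to a set of (parent, child) pairs,
and the tag names and function names are hypothetical placeholders rather
than anything from the planned scripts:

```python
def prune(pairs, banned_tags):
    """Drop every pair that mentions a banned tag, as parent or as child."""
    return {
        (parent, child)
        for (parent, child) in pairs
        if parent not in banned_tags and child not in banned_tags
    }


def new_in(old_pairs, new_pairs):
    """Pairs present in the newer list but not the older one (what diff would flag)."""
    return sorted(new_pairs - old_pairs)


# Hypothetical data for two DocBook versions.
old_version = {("Article", "Abstract"), ("Article", "Sect1"), ("Sect1", "Para")}
new_version = old_version | {("Sect1", "Sidebar")}

our_set = prune(new_version, {"Abstract"})   # re-apply our ban list to the new version
to_review = new_in(old_version, new_version)  # only these need a human decision
```

Keeping the ban list separate from the generated pair list is what lets the
pruning be re-run unchanged against a new DocBook release.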


> 
> Sounds cool.  Got any ideas on how to build such a thing?  Maybe somebody
> can build a starting list of keywords.  Later,

Yeah. It needs to be a database with an internet
interface. We populate it with a seed set of
keywords and let it grow from there. So it needs
an author interface that has two aspects: one a
browser/search interface, and the other an add
interface. We need to get a volunteer who likes
both Apache and the Linux database. I assume we're
using Apache?

Eventually we hook it to the LDP search interface.
That way our public can look up keywords and ask
for a search based on one or several they like.
It's very similar to library search engines and
the database is very similar to those used by
on-line businesses.
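
To make the browse/search and add aspects concrete, here is a minimal
sketch. It assumes SQLite via Python's standard sqlite3 module purely for
illustration (no database has been chosen), and the table, function, and
document names are all hypothetical:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the keyword store; one row per keyword/document link."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS doc_keyword ("
        " keyword TEXT NOT NULL, doc TEXT NOT NULL,"
        " PRIMARY KEY (keyword, doc))"
    )
    return conn


def add_keyword(conn, keyword, doc):
    """The 'add' interface: attach a keyword to a document, ignoring duplicates."""
    conn.execute(
        "INSERT OR IGNORE INTO doc_keyword (keyword, doc) VALUES (?, ?)",
        (keyword.lower(), doc),
    )
    conn.commit()


def browse(conn):
    """The 'browse' interface: every known keyword, alphabetically."""
    rows = conn.execute("SELECT DISTINCT keyword FROM doc_keyword ORDER BY keyword")
    return [row[0] for row in rows]


def search(conn, keywords):
    """Documents tagged with every one of the given keywords."""
    wanted = sorted({k.lower() for k in keywords})
    if not wanted:
        return []
    marks = ",".join("?" for _ in wanted)
    rows = conn.execute(
        f"SELECT doc FROM doc_keyword WHERE keyword IN ({marks})"
        " GROUP BY doc HAVING COUNT(DISTINCT keyword) = ? ORDER BY doc",
        (*wanted, len(wanted)),
    )
    return [row[0] for row in rows]
```

Usage would look like `add_keyword(conn, "modem", "doc-a")` followed by
`search(conn, ["modem"])`; a web front end would simply call these from its
request handlers.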

Gary

