Design (AKA Random Rants)

If this sounds like a gtk-doc defamation, I apologize. However, if I thought everything was all right with gtk-doc, I would never have started writing yagdoc, so this is essentially a list of things to do differently. The overall structure is very similar to gtk-doc (one reason for this is of course the intended gtk-doc compatibility):

Scan header files for declarations.
Scan source code for documentation.
Generate output.

Output here means DocBook XML source, which has to be processed further to obtain something human-friendly. The possibility of generating some kinds of output directly from the parsed declarations and documentation should also be considered.

Unhardcode Ugliness

Does the list of Gtk+ signal argument names belong to gtk-doc? Or the map of event signals to the particular GdkEvent subtypes they use? Is it necessary to add an exception to exclude gnome_keyring_item_info_get_type() to gtk-doc code?

The goal is not to deprive Gtk+ of nicely presented signal arguments and their types. But why could other libraries not have their signal arguments and types nicely presented too? And why should the addition of new signals with arguments to Gtk+ require gtk-doc updates?

Step 1 is of course to avoid non-generic information creep. Step 2 is then to implement mechanisms that enable all projects to easily (pre|post)process the information and insert hooks where necessary.

Don't Distress Developers

The worst maintenance nightmare is gettext translations; gtk-doc documentation is a close second. It is necessary to strictly separate computer-generated and manually written data.

Files written by yagdoc have to be disposable: no need to store them in a version control system and get conflicts when someone else generates something else, no need to distribute them (with the possible exception of the final format).
All files have to be written to the build directory (to support VPATH builds); the source directory should be left untouched, with the exception of bootstrapping.
Bootstrapping and the first build after a checkout should be smooth, without repetitions and manual interventions.

Having lots of optionally existing files, which something sometimes attempts to create if they don't exist (having to guess in which directory), makes writing sensible Makefile rules hard. Keep the number of auxiliary files small, and if they appear in Makefile dependencies, ensure they exist after bootstrapping.

In particular (some of the following is implemented in recent gtk-doc too, often by me):

Templates are Evil and they will not be implemented. If there is a need for separate documentation writer and developer roles for API references, something else has to be invented. It is not clear what.
Sections are slavery. The default behaviour will be essentially gtk-doc's --rebuild-sections mode; more control over sorting declarations into sections will be possible with a set of regular-expression-based rules. To split garray.h declarations into GArray, GPtrArray and GByteArray documentation, about 2 × 3 = 6 rules will likely be needed.
Types are redundant; the equivalent of gtk-doc's --rebuild-types will be the only behaviour.
Overrides are something I'd rather see done in the headers using the yagdoc preprocessing mechanism.
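
The regular-expression rules for sorting declarations into sections could look roughly like the following sketch. The rule format, the SECTION_RULES name and the assign_section helper are assumptions made for illustration, not an existing yagdoc interface; the "garray" default stands in for whatever section the header would get without any rules.

```python
import re

# Hypothetical rule table: (pattern, section) pairs, tried in order.
# Two rules per section (one for the type, one for its functions) give
# the 2 x 3 = 6 rules estimated for garray.h.
SECTION_RULES = [
    (r"^GArray$", "GArray"),
    (r"^g_array_", "GArray"),
    (r"^GPtrArray$", "GPtrArray"),
    (r"^g_ptr_array_", "GPtrArray"),
    (r"^GByteArray$", "GByteArray"),
    (r"^g_byte_array_", "GByteArray"),
]

_COMPILED = [(re.compile(pattern), section) for pattern, section in SECTION_RULES]


def assign_section(symbol, default="garray"):
    """Return the section of the first matching rule, else the default."""
    for regex, section in _COMPILED:
        if regex.search(symbol):
            return section
    return default
```

Symbols that match no rule simply stay in the default section, so a project with one section per header needs no rules at all.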

Learn the Language

After several years of development and many bug reports, gtk-doc still has difficulties with basic C syntax: nested structures, sensitivity to line breaks and other ignorables, recognizing unsigned long as a type, forward declarations of enums, and so on. A different approach is evidently necessary.

While the standard Gnome/Gtk+/GLib code idioms must be recognized and supported, it should not really matter at which point one breaks lines in function prototypes. Regular expressions alone are not sufficient to parse a reasonably large subset of C; a standard recursive parser should be used instead. (Another facet of this issue is how the documentation of complex nested structures should be written and presented.)

In addition, the user should be able to easily teach yagdoc his local variations and conventions, such as constructs that should be ignored or constructs that stand for const or extern.
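
As a minimal sketch of the first step only (tokenizing, not the parser itself), the code below shows how line breaks stop mattering once the input is a token list, and how local conventions can be plain data. The names IGNORED, ALIASES and tokenize are hypothetical, as is the MYLIB_API macro; the GLib macros are used only as familiar examples. Comments, string literals and multi-character operators are deliberately not handled here.

```python
import re

# Assumed, user-editable conventions (illustrative contents only).
IGNORED = {"G_GNUC_PURE", "G_GNUC_CONST"}  # constructs dropped entirely
ALIASES = {"G_CONST_RETURN": "const", "MYLIB_API": "extern"}  # stand-ins

# A token is an identifier, a run of digits, or one punctuation character.
# Whitespace, including newlines, never produces a token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(source):
    """Split C source into tokens, applying the local conventions."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in IGNORED:
            continue
        tokens.append(ALIASES.get(tok, tok))
    return tokens
```

With these tables, a prototype broken across lines and decorated with project macros yields the same token list as its plain one-line form, so the parser that follows never sees the difference.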

Remove Repetition

We have the luxury of object serialization Just Working in Python. We also have the actual token lists and parse trees. Analysing the same text fragments again and again in different places – with possibly slightly different regular expressions – as happens in gtk-doc is then avoidable and has to be avoided.

A small downside of the use of serialization is that the rough equivalent of foo-decl.txt (containing more information than that though) will not be human readable. It is not meant to be human-edited and automated changes should be done by extending the build process with Python code. So the only uses are: tools that read it and humans wishing to look at it. Tools written in Python are encouraged to just deserialize it, other tools and humans can be served by a “pretty printer” that extracts information in text form. In fact, various dumpers are already being written for visualization of the in-core representation.
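
A rough sketch of this arrangement, using the standard pickle module, follows. The dictionary layout of a declaration and the save, load and dump_text names are assumptions for illustration; the real in-core representation is richer than this.

```python
import pickle


def save(decls, path):
    """Serialize the parsed declarations to a disposable build file."""
    with open(path, "wb") as f:
        pickle.dump(decls, f)


def load(path):
    """What a Python tool would do: just deserialize."""
    with open(path, "rb") as f:
        return pickle.load(f)


def dump_text(decls):
    """Pretty printer serving humans and non-Python tools."""
    lines = []
    for d in decls:
        lines.append(f"{d['kind']} {d['name']}: {' '.join(d['tokens'])}")
    return "\n".join(lines)
```

The token list is stored once and reused by every later stage, which is the point of the section: nothing has to re-read or re-match the original text.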

Commit to Configurability

The mechanism of defining variables in the Makefile, including another Makefile that tries to do something reasonable with them, and passing dozens of options to the various tools has its limitations. It also makes the output silently dependent on the Makefile (i.e. this dependency does not appear in the Makefile rules). Many people give up and customize gtk-doc.make directly.

While there's nothing wrong with such customization, we should be able to do better, and since the border between configuration and extension is fuzzy, use one mechanism for both.

Most configuration should be done in a configuration file. One file. One file consisting of Python code that can be directly imported by yagdoc and that can perform anything from setting simple variables to overriding and extending the stages of the generation process.
Setting variables in the Makefile still makes sense, namely for configure-determined parameters. It would be possible to turn the yagdoc configuration file into a configure template (.in), but that would be ugly. A better approach is to generate a file that is imported by the main configuration file. It is also possible to extract variable values from the Makefile in yagdoc.
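
One possible way to read such a file is sketched below with the standard runpy module. The DEFAULTS table, the option names in it and the load_config helper are all hypothetical; the overrides argument stands in for configure-determined values handed over from the build system.

```python
import runpy

# Assumed defaults; the option names are illustrative only.
DEFAULTS = {
    "module": None,
    "source_dirs": ["."],
    "ignored_macros": [],
}


def load_config(path, overrides=None):
    """Execute the Python file at path and merge its public names."""
    config = dict(DEFAULTS)
    # Names in overrides are visible to the configuration file as globals.
    namespace = runpy.run_path(path, init_globals=dict(overrides or {}))
    for name, value in namespace.items():
        if not name.startswith("_"):
            config[name] = value
    return config
```

Because the file is ordinary Python, the same mechanism serves a line like `module = 'foo'` and a function definition that replaces a whole generation stage.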

Encourage Extensions

All assumptions (as opposed to hard facts, for instance about the C language) should be put into data instead of hardcoding them deep in some four-line regular expressions. And such data should be modifiable in the configuration file.

Documentation generation stages should be factored into overridable parts and/or allow user hooks at suitable places.

Many extensions have the form of new source code markup or documentation tags that are then processed somehow. While the processing will typically require writing some real code, the recognition code should not require any changes. For instance, noting that symbols are deprecated according to their occurrence inside certain preprocessor conditionals is a mechanism that is equally hard to implement specifically for deprecation and for general preprocessor blocks.
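
The deprecation example can be sketched as follows: the recognition code records every enclosing preprocessor conditional, and a small table of assumptions decides what any of them means. All names here (enclosing_conditions, CONDITION_TAGS, tags_for, FOO_DISABLE_DEPRECATED) are hypothetical, the declaration regex is a crude stand-in for the real parser, and #else and #elif are ignored for brevity.

```python
import re

DIRECTIVE_RE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|endif)\b\s*(.*)$")
# Crude stand-in for the parser: a lowercase identifier followed by "(".
DECL_RE = re.compile(r"\b([a-z_][a-z0-9_]*)\s*\(")


def enclosing_conditions(header_text):
    """Map each declared function name to the conditionals enclosing it."""
    stack, result = [], {}
    for line in header_text.splitlines():
        m = DIRECTIVE_RE.match(line)
        if m:
            keyword, rest = m.groups()
            if keyword == "endif":
                stack.pop()  # assumes balanced conditionals
            else:
                stack.append((keyword, rest.strip()))
            continue
        d = DECL_RE.search(line)
        if d:
            result[d.group(1)] = tuple(stack)
    return result


# Assumed configuration data: which conditionals imply which tag.
CONDITION_TAGS = {("ifndef", "FOO_DISABLE_DEPRECATED"): "deprecated"}


def tags_for(conditions):
    """Translate recorded conditionals into tags using the data table."""
    return {CONDITION_TAGS[c] for c in conditions if c in CONDITION_TAGS}
```

Supporting another convention, say marking symbols as platform-specific, then means adding an entry to CONDITION_TAGS in the configuration file, with no change to the recognition code.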

Fail Friendly

Every failure that is not due to yagdoc bugs (or system failures) has its origin in the source files. We might not be able to pinpoint the primary cause, but we can always point to the line where we noticed things are not right. Not emitting error messages in some standard format (for instance GCC-like) is a mortal sin.
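
A GCC-like message has the form file:line:column: error: text, which editors and IDEs already know how to jump to. A minimal sketch, assuming a hypothetical SourceError exception that carries the location where the problem was noticed:

```python
import sys


class SourceError(Exception):
    """Hypothetical error type carrying the place a problem was noticed."""

    def __init__(self, filename, line, message, column=None):
        super().__init__(message)
        self.filename = filename
        self.line = line
        self.column = column
        self.message = message

    def __str__(self):
        where = f"{self.filename}:{self.line}"
        if self.column is not None:
            where += f":{self.column}"
        return f"{where}: error: {self.message}"


def report(error, stream=sys.stderr):
    """Print the diagnostic in GCC-like form."""
    print(error, file=stream)
```

Raising SourceError("foo.h", 42, "unterminated struct declaration", column=7) and passing it to report prints `foo.h:42:7: error: unterminated struct declaration`.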

Also, an “advantage” of Python over Perl is that essentially everything that goes wrong raises an exception, and uncaught exceptions are fatal. The code just has to be right and handle bogus input with grace.

The parse-or-perish approach perhaps requires more effort, and sometimes unfortunately cooperation from the user. On the other hand, he will not need to wonder why his declarations are not picked up. Either they are, or they produce errors.

Interesting Ideas

See Gtk-doc Future for some interesting ideas. The key word is some: there are features that yagdoc naturally implements because that's the way it should work from the beginning, some interesting extension suggestions, and cases of – probably bad weed you'd rather quit smoking.