Signatures, an ancient evil.

Perhaps you are aware that the archive has a profound weakness in its database schema, one that badly violates the rules of normalization. Signatures are not normalized per user but stored post by post. Worse, they are actually inseparable from the post.

The reason is that the delimiter is not safe. A post may contain text that could plausibly be part of a signature even though it isn't, and a signature may contain text that could plausibly belong to the post.

ACTUAL POST START

My post text.
--
Fake signature

ACTUAL POST END
--
ACTUAL SIG START

Fake post
--
Signature

ACTUAL SIG END


Since, as in our example, "--" is the only real delimiter, there is no way of knowing where the post ends and the signature begins. The "Fake signature" might already be part of the sig, everything up to "Signature" might be part of the post, or the post might have no signature at all.
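To make the ambiguity concrete, here is a small Python sketch that lists every possible reading of a stored post. It assumes the delimiter is a line consisting of exactly "--" (real-world separators vary, e.g. "-- " with a trailing space) and that post and signature were joined with newlines; `candidate_splits` is a hypothetical helper, not part of the archive's code.

```python
from typing import List, Optional, Tuple


def candidate_splits(text: str, delimiter: str = "--") -> List[Tuple[str, Optional[str]]]:
    """Return every (post, signature) reading of a stored post.

    Assumption: a delimiter is a line that equals `delimiter` exactly.
    """
    lines = text.split("\n")
    # Reading 0: the post has no signature at all.
    splits: List[Tuple[str, Optional[str]]] = [(text, None)]
    for i, line in enumerate(lines):
        if line == delimiter:
            # Everything above this delimiter line is post, everything below is signature.
            post = "\n".join(lines[:i])
            sig = "\n".join(lines[i + 1:])
            splits.append((post, sig))
    return splits
```

For the example above (markers and blank lines omitted), this yields four readings: one with no signature, plus one split at each of the three "--" lines. Nothing in the stored text itself says which reading is correct.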

Rather than risk losing data, I have opted to do what even the original forum seems to do: store the delimited version and pretend all signatures are typed in by the poster. I'm not kidding.

The real problem, naturally, begins when you want to search the database for a word contained in a signature. There are minor issues as well, such as making the edit line show up between post and signature (not too difficult, but stupid), but those are not the main concern.

In the interest of normalization, I plan to install a guessing system that tries to identify a signature by its reuse. It is not a guaranteed process, but it should work most of the time. Signatures will be checksummed and aggregated in a special table; the program will then use this aggregation to determine the signature part of each post.
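One way the guessing system might look, as a rough Python sketch under several assumptions not stated above: the delimiter is a line equal to "--", checksums are SHA-1 digests of the stripped candidate signature, counts are kept per author (a `Counter` stands in for the special table), and a candidate must have been seen in at least `min_uses` posts to count as a signature. All names (`build_table`, `guess_split`, `min_uses`) are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

DELIMITER = "--"  # assumption: signature separator is a line equal to "--"


def candidate_signatures(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (post, signature) for each delimiter line, earliest (longest sig) first."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line == DELIMITER:
            yield "\n".join(lines[:i]), "\n".join(lines[i + 1:])


def checksum(sig: str) -> str:
    # SHA-1 is an arbitrary choice here; any stable digest would do.
    return hashlib.sha1(sig.strip().encode("utf-8")).hexdigest()


def build_table(posts: Iterable[Tuple[str, str]]) -> Counter:
    """Count in how many posts each (author, signature checksum) pair appears."""
    table: Counter = Counter()
    for author, text in posts:
        # Count each distinct candidate once per post.
        digests = {checksum(sig) for _, sig in candidate_signatures(text)}
        for digest in digests:
            table[(author, digest)] += 1
    return table


def guess_split(author: str, text: str, table: Counter,
                min_uses: int = 2) -> Tuple[str, Optional[str]]:
    """Pick the most reused candidate signature; ties go to the longer one."""
    best: Optional[Tuple[str, str]] = None
    best_count = min_uses - 1
    for post, sig in candidate_signatures(text):
        count = table[(author, checksum(sig))]  # Counter returns 0 for unseen keys
        if count > best_count:
            best, best_count = (post, sig), count
    # No candidate was reused often enough: treat the whole text as post.
    return best if best is not None else (text, None)
```

Candidates that occur only once (such as a "Fake signature" accidentally typed into one post) stay below the threshold, while a signature repeated across an author's posts wins. The per-author grouping mirrors the idea of normalizing signatures over users, but it is a design choice of this sketch, not something the archive already does.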

This will likely be postponed until next summer, but I want to jot it down here lest I forget.