Submitted by Arancaytar on
Looking at the parser cache status, you can see that the archives are now almost completely parsed and stored in the database.
Fewer than 100 posts are still missing - at least of the ones I don't know anything about.
I'm going through the archives right now, trying to match the topic length set in the topic index to the number of posts actually in the post index. Fortunately my database schema violates 3NF by storing the topic length separately - the anomalies resulting from this are the only way I can check whether the archive is complete.
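The completeness check described above could be sketched roughly as follows. This is only an illustration, not the actual archive code: the table names `topics` and `posts`, their columns, and the helper `make_demo_db` are all assumptions made up for the example, and SQLite stands in for whatever database the archive really uses.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical schema, demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE topics (id INTEGER PRIMARY KEY, length INTEGER NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            topic_id INTEGER NOT NULL REFERENCES topics(id)
        );
        """
    )
    # Topic 1 is complete; topic 2 has 1 of 3 posts; topic 3 has 0 of 2 posts.
    conn.executemany("INSERT INTO topics VALUES (?, ?)", [(1, 2), (2, 3), (3, 2)])
    conn.executemany(
        "INSERT INTO posts (topic_id) VALUES (?)", [(1,), (1,), (2,)]
    )
    conn.commit()
    return conn


def find_incomplete_topics(conn):
    """Return (topic_id, stored_length, parsed_count) for every topic whose
    parsed post count is lower than the length stored in the topic index."""
    query = """
        SELECT t.id, t.length, COUNT(p.id) AS parsed
        FROM topics AS t
        LEFT JOIN posts AS p ON p.topic_id = t.id
        GROUP BY t.id, t.length
        HAVING COUNT(p.id) < t.length
        ORDER BY t.id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for topic_id, length, parsed in find_incomplete_topics(make_demo_db()):
        print(f"topic {topic_id}: {parsed} of {length} posts parsed")
```

The `LEFT JOIN` matters here: with a plain inner join, a topic with no parsed posts at all would drop out of the result instead of showing up as an anomaly.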
When I come across a topic whose parsed post count is lower than its length (it invariably is), it matches one of four categories:
1. There's a page that wasn't parsed yet. Easy; just visit it in the browser.
2. It was parsed before the last bugfix in the parser. Its parsed posts need to be deleted, then it must be parsed again.
3. Its topic length is wrong because several posts were deleted. The topic length must be adjusted in the database.
4. And this is the most common: Someone whose account was eaten by UBB posted in this topic, showing as a guest post. These must be inserted in the error index and removed from public view until my parser can handle guest posts.
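The database fixes for categories 2 and 3 might look something like the sketch below. Again, this is a guess at the shape of the operation rather than the real code: the `topics` and `posts` tables, their columns, and both function names are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema, assumed for illustration only.
SCHEMA = """
CREATE TABLE topics (id INTEGER PRIMARY KEY, length INTEGER NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    topic_id INTEGER NOT NULL REFERENCES topics(id)
);
"""


def clear_parsed_posts(conn, topic_id):
    """Category 2: drop a topic's stale parsed posts so it can be parsed again.
    Returns the number of posts removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM posts WHERE topic_id = ?", (topic_id,))
    return cur.rowcount


def sync_topic_length(conn, topic_id):
    """Category 3: set the stored topic length to the number of posts actually
    present. Only appropriate once the gap is known to come from deleted posts.
    Returns the new length, or None if the topic does not exist."""
    with conn:
        conn.execute(
            "UPDATE topics SET length = "
            "(SELECT COUNT(*) FROM posts WHERE topic_id = ?) WHERE id = ?",
            (topic_id, topic_id),
        )
    row = conn.execute(
        "SELECT length FROM topics WHERE id = ?", (topic_id,)
    ).fetchone()
    return row[0] if row is not None else None
```

Since adjusting the length erases the very anomaly that flags a topic, it only makes sense after the topic has been checked by hand.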
So I'm going through my list of anomalies and identifying the problem that causes each. The list is still long at about 89 topics, but it's steadily shrinking...