Pied Piper Project 3: Revenge of the Boards

Author / Topic: Pied Piper Project 3: Revenge of the Boards
Law Bringer
Member # 2984
Profile Homepage #0
(The subtitle continues the Star Wars theme of "PPP2: Boards Strike Back", in case anyone is wondering. Some people are dense.)

The script is partly done. One of the most crucial parts was finding a way to browse through a forum and find out which threads actually exist. Simply starting at 1 and counting up to about 3500 would make the script copy thousands of "thread doesn't exist" pages, because there are gaps where topics were deleted.

So I've finished a function that gets all index pages of a forum (all 51 of the ones on General, for starters), and finds the thread numbers and corresponding titles.
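
For anyone curious what that step might look like, here is a minimal sketch in Python. It is not the actual script (the thread never says what that is written in). The link shape `f=<forum>;t=<topic>` and the function name `extract_topics` are assumptions made up for illustration, and the real board markup may well differ.

```python
import re
from html import unescape

# Hypothetical link shape: <a href="...f=1;t=002744">Dungeon Siege</a>
TOPIC_LINK = re.compile(
    r'<a\s+href="[^"]*?f=(?P<forum>\d+);t=(?P<topic>\d+)[^"]*"[^>]*>(?P<title>.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def extract_topics(index_html):
    """Return (topic_id, forum_id, title) tuples found on one index page."""
    seen = set()
    rows = []
    for m in TOPIC_LINK.finditer(index_html):
        key = (int(m.group("forum")), int(m.group("topic")))
        if key in seen:
            # Later links to the same topic (e.g. page links) are skipped;
            # the first link is assumed to carry the title.
            continue
        seen.add(key)
        # Drop any inline tags inside the link text and decode entities.
        title = unescape(re.sub(r"<[^>]+>", "", m.group("title"))).strip()
        rows.append((key[1], key[0], title))
    return rows
```

Running this over every index page of a forum and collecting the rows would give the kind of table shown below.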

Here's a taste of what this looks like in a database:

+----------+----------+-------------------------------------------------+
| topic_id | forum_id | title                                           |
+----------+----------+-------------------------------------------------+
|     2596 |        1 | The Ultimate "Favourite Game" Poll              |
|     2744 |        1 | Dungeon Siege                                   |
|     2736 |        1 | What the heck?                                  |
|     2745 |        1 | Time for a change?                              |
|     2741 |        1 | SW has a new untitled Guru                      |
|     2738 |        1 | Did you miss me?                                |
|     2746 |        1 | Campaign Against Illiteracy: Chapter I - Piracy |
|     2743 |        1 | My thoughts on hex editing and keygens...       |
|     2728 |        1 | limewire pro should i get it or not             |
|     2732 |        1 | If you could meet one person off this board..   |
+----------+----------+-------------------------------------------------+
In the second step, the script will try to copy all these threads to html pages in numbered folders.

Later on, when I have more time, I will also look into inserting these posts into the database as well (which I started at some point, but didn't manage), so they can be full-text searched. For now, the html pages will have to do; after all, the script can parse those html pages later, they only need to be copied for now.

As of this moment, people DO NOT NEED to manually save threads.

Since good old Murphy is always ready to make things interesting, I have the most stressful week of this year ahead of me, and the script is not yet quite done, it is entirely possible this will change.

If at any point I am unable to save the threads because of technical or time-related problems, I will give notice in time.

In time means Wednesday evening, GMT (early midday in America). By that time, I will either have copied all threads that need to be copied, or I will have posted here and said I couldn't and they need to be saved manually.

--------------------
Encyclopaedia Ermariana | Forum Archives | Forum Statistics | RSS [Topic / Forum]
My Blog | Polaris | I eat novels for breakfast.
Polaris is dead, long live Polaris.
Look on my works, ye mighty, and despair.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Off With Their Heads
Member # 4045
Profile Homepage #1
I thought you were a great sucker or something. If all else fails, couldn't you just do what you did last time and use site-sucking software to download the boards again?

--------------------
Arancaytar: Every time you ask people to compare TM and Kel, you endanger the poor, fluffy kittens.
Smoo: Get ready to face the walls!
Ephesos: In conclusion, yarr.

Kelandon's Pink and Pretty Page!!: the authorized location for all things by me
The Archive of all released BoE scenarios ever
Posts: 7968 | Registered: Saturday, February 28 2004 08:00
...b10010b...
Member # 869
Profile Homepage #2
Last time someone did that, the server mistook it for a DoS attack and automatically IP-banned him for an hour.

[ Monday, January 09, 2006 12:00: Message edited by: Thuryl ]

--------------------
The Empire Always Loses: This Time For Sure!
Posts: 9973 | Registered: Saturday, March 30 2002 08:00
Law Bringer
Member # 2984
Profile Homepage #3
Before I do that, I'd rather save it manually. It was a bugger getting it to work, and I still haven't managed to get that archive properly indexed.

Basically, the files look like this:

www.ironycentral.com/.../index.9fagg8g787.html
www.ironycentral.com/.../index.bux938xns3.html
www.ironycentral.com/.../index.65fghcdnje.html
www.ironycentral.com/.../index.8r8z8ke9c8.html

The program apparently uses an alphanumeric hash to save all these pages that have the same filename (index.php), and it would take days to put it into a format like

/1/1775/page1.html

etc. It is a very useful program when you want to save a lot of static pages with distinct filenames, but it doesn't work as well on a forum where every page is served from the one index file.

Edit: Thuryl, that was Djur, not me. He tried to use wget but apparently set the time limit between requests too low. I've used wget myself since, and there was no problem.
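
To illustrate the "time limit between requests" point, here is a small Python sketch of a throttle that enforces a minimum gap between fetches. The class name `Throttle` and the interval are assumptions; nobody in this thread knows what delay the server actually tolerates.

```python
import time

class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without real delays
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` right before each page request keeps the crawler from hammering the server. wget's own `--wait` option serves the same purpose.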

[ Monday, January 09, 2006 12:06: Message edited by: The Piper of Hamlin ]

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Law Bringer
Member # 2984
Profile Homepage #4
The coding marathon is coming along nicely. I just found out why single-paged topics cannot be saved with the script - I try to count pages via the "This topic comprises..." line above the posts table, and this is not displayed if the topic only has one page.

In short, I'm a bit of a moron. But I'll fix it (not the moron bit, the script.) First I need to get myself a pizza though.
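
A sketch of the fix described above, in Python: when the "This topic comprises..." line is missing, treat the topic as having one page. The wording after "comprises" is a guess, and `count_pages` is a made-up name, so take this as an illustration rather than the real script.

```python
import re

# Assumed wording, based on the "This topic comprises..." line quoted above;
# the exact text after "comprises" is a guess.
PAGES_LINE = re.compile(r"This topic comprises\s+(\d+)\s+pages", re.IGNORECASE)

def count_pages(page_html):
    """Return the number of pages in a topic, defaulting to 1.

    Single-page topics do not show the line at all, so a missing match
    means one page rather than an error.
    """
    m = PAGES_LINE.search(page_html)
    return int(m.group(1)) if m else 1
```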

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Law Bringer
Member # 2984
Profile Homepage #5
Eureka. Or Heureka, it depends on where you're from.

I am as of this moment merrily archiving away at well over 10 posts a second (5000 posts already archived while I was writing this), and therefore expect to be done within an hour. Except I don't have that much time.

So I'll stick around to see if my script runs into any errors in the next couple of minutes, and then leave it in peace while it spiders the entire General forum all on its own.

Don't worry about posting new threads or replying to old ones. All threads that are available have already been listed, meaning that new threads will not screw it up by messing with the pages. Also, the newest threads have already been archived (163 of them, and 6300 posts while I'm writing), so most likely your posts won't be in the backup from now on. This is hardly anything to worry about, since anything you post in from now on is not going to the trash pit anyway. But I just thought I'd point it out.

6519 and counting, 168 threads. I love my job. Except it isn't. But I wish it was.

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Master
Member # 4614
Profile Homepage #6
You are da man.

--------------------
-ben4808
Posts: 3360 | Registered: Friday, June 25 2004 07:00
Law Bringer
Member # 2984
Profile Homepage #7
I am still a great sucker. ^_^

16045, more than a third done. If there's still a special case somewhere in the remaining threads that I haven't taken into account, then, well, Murphy wins. I'm going to bed, I have a presentation to prepare and two weeks' worth of Java homework to check.

Edit: Even if there *is* a special case and the archiving process breaks down as soon as I switch off the screen, sitting here won't be of any use. It won't make the error go away, and I'm too bloody tired to code by now.

[ Monday, January 09, 2006 16:05: Message edited by: The Piper of Hamlin ]

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Off With Their Heads
Member # 4045
Profile Homepage #8
quote:
Originally written by The Piper of Hamlin:

I am still a great sucker. ^_^
Yes. Yes, you are.

Posts: 7968 | Registered: Saturday, February 28 2004 08:00
Law Bringer
Member # 2984
Profile Homepage #9
General is done, Tech Support is nearly done, and there does not appear to be a problem with these threads.

There may still be errors I don't see, and now is the last chance to find them.

To anyone who has time and cares, please go to

http://pied-piper.ermarian.net/flute/index

and

http://pied-piper.ermarian.net/flute/threads

and verify that the index pages of the forums (first link) and the threads themselves (second link) have been saved correctly.

The threads are saved in a folder corresponding to the forum number (1 for General), then a folder corresponding to the thread number (3110 for this thread). Pages are saved as HTML files named page1.html, page2.html, and so on.
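
The folder layout just described can be sketched in Python as follows. The function name `page_path` and the default root folder `threads` are assumptions for illustration only.

```python
from pathlib import PurePosixPath

def page_path(forum_id, topic_id, page, root="threads"):
    """Build <root>/<forum>/<topic>/page<N>.html for one saved page.

    `root` is a hypothetical base folder; pages are numbered from 1.
    """
    if page < 1:
        raise ValueError("pages are numbered from 1")
    return PurePosixPath(root) / str(forum_id) / str(topic_id) / f"page{page}.html"
```

So the first page of this thread would live at `threads/1/3110/page1.html` under that assumed root.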

All threads that are listed in the index pages should also be saved completely. If not, we're in trouble.

Later today I will make a rudimentary thread index available that will query the database I have now. For now, anyone who has a Linux machine or some kind of command-line MySQL client can log into the database as follows:

Server: mysql2.websitesource.net
Database: Burschk_piper
User: Burschk_reader
PW: czif46

The privileges are supposedly set to read only, but if I made a mistake or there's a security hole, please don't exploit it. :rolleyes:

Edit: There's only one table in the database. I figure that if you're smart enough to log in, you can also find the table (show tables;) and find out what fields it has.

Edit:

Status

1 - "General" done.
2 - "Tech Support" done.
3 - "Announcements" done.
4 - "Moderator Board" inaccessible (duh).
5 - "The Exile Trilogy" done.
6 - "Nethergate" done.
7 - "Blades of Exile" in progress.
8 - does not exist
9 - does not exist
10 - "Geneforge" done.
11 - does not exist
12 - does not exist
13 - "Richard White Games" done.
14 - "SubTerra" done.
15 - "Blades of Avernum" done.
16 - "Geneforge 2" done.
17 - "The Avernum Trilogy" done.
18 - "Blades of Avernum Editor" done.
19 - "Geneforge Series" done.
20 - "Avernum 4" done.


All done.

If I missed anything or something is unclear, please tell me. I'll update this list as I get more forums done.

Edit:

+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
| Forum | Name                     | Posts | Average length | Average page count | Topics saved | Topics total | Progress |
+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
|     1 | General                  | 43805 |          34.71 |               1.94 |         1262 |         1262 |  100.00% |
|     2 | Tech Support             |  2643 |           5.26 |               1.01 |          502 |          502 |  100.00% |
|     3 | Announcements            |     7 |           1.00 |               1.00 |            7 |            7 |  100.00% |
|     5 | The Exile Trilogy        |  2765 |           9.19 |               1.06 |          301 |          301 |  100.00% |
|     6 | Nethergate               |  1093 |           8.54 |               1.05 |          128 |          128 |  100.00% |
|     7 | Blades of Exile          |  2594 |          10.02 |               1.10 |          259 |          259 |  100.00% |
|    10 | Geneforge                |   744 |           8.18 |               1.05 |           91 |           91 |  100.00% |
|    13 | Richard White Games      |  2264 |          22.87 |               1.54 |           99 |           99 |  100.00% |
|    14 | SubTerra                 |   750 |          10.71 |               1.11 |           70 |           70 |  100.00% |
|    15 | Blades of Avernum        | 18718 |          11.86 |               1.15 |         1578 |         1578 |  100.00% |
|    16 | Geneforge 2              |  2111 |           9.42 |               1.14 |          224 |          224 |  100.00% |
|    17 | The Avernum Trilogy      |  5300 |           9.43 |               1.07 |          562 |          562 |  100.00% |
|    18 | Blades of Avernum Editor |  1213 |          24.26 |               1.50 |           50 |          825 |    6.06% |
+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
Blades of Avernum is the board with the most topics, although General beats it in terms of posts. This indexing might take a while yet.
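
For reading the table: the figures are consistent with "Average length" being posts divided by topics saved, and "Progress" being topics saved divided by topics total. That is inferred from the numbers, not stated anywhere, so treat this Python sketch (with the made-up name `forum_stats`) as a guess at how the columns are derived.

```python
def forum_stats(posts, topics_saved, topics_total):
    """Recompute the derived columns of the progress table from raw counts.

    Returns (average posts per saved topic, progress in percent),
    both rounded to two decimals like the table.
    """
    avg_length = posts / topics_saved if topics_saved else 0.0
    progress = 100.0 * topics_saved / topics_total if topics_total else 0.0
    return round(avg_length, 2), round(progress, 2)
```

For the Blades of Avernum Editor row, `forum_stats(1213, 50, 825)` returns `(24.26, 6.06)`, which matches the table.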

[ Monday, January 16, 2006 06:23: Message edited by: Arancaytar ]

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
By Committee
Member # 4233
Profile #10
quote:
Originally written by The Piper of Hamlin:

"This topic comprises..."
Kudos on using "comprises" correctly! About 0.005% of the population of the U.S. appears to be able to do so. :)
Posts: 2242 | Registered: Saturday, April 10 2004 07:00
Law Bringer
Member # 2984
Profile Homepage #11
Actually, I was quoting it as it appears on a thread page on this forum.

If I ever used it myself, I probably would use it correctly. But I've never, ever used it. I'd probably say "consists of" or "has" or whatever else seems appropriate. In fact, the only time I've ever seen the word comprises in any context is on this board.

As a side note, after mentioning the word "comprises" here, I had to rework my program to check that the word occurs at the top of the page rather than anywhere on it. Until then, I could be sure it would not appear anywhere else in a thread.
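
One way to restrict the check to the top of the page, sketched in Python. The marker `<!-- posts -->` for where the posts table begins is purely hypothetical, as is the function name; the real script presumably keys on whatever the board's markup offers.

```python
import re

PAGES_LINE = re.compile(r"This topic comprises\s+(\d+)\s+pages", re.IGNORECASE)
POSTS_START = "<!-- posts -->"  # hypothetical marker for where the posts table begins

def count_pages_in_header(page_html):
    """Count pages using only the text above the posts.

    Anything quoted inside a post is ignored. If the marker is absent,
    partition() returns the whole page as the header, so the search
    falls back to the entire page.
    """
    header, _, _ = page_html.partition(POSTS_START)
    m = PAGES_LINE.search(header)
    return int(m.group(1)) if m else 1
```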

+-------+---------------------+-------+----------------+--------------------+--------------+--------------+
| Forum | Name                | Posts | Average length | Average page count | Topics saved | Topics total |
+-------+---------------------+-------+----------------+--------------------+--------------+--------------+
|     1 | General             | 43805 |          34.71 |               1.94 |         1262 |         1262 |
|     2 | Tech Support        |  2643 |           5.26 |               1.01 |          502 |          502 |
|     3 | Announcements       |     7 |           1.00 |               1.00 |            7 |            7 |
|     5 | Exile Trilogy       |  2765 |           9.19 |               1.06 |          301 |          301 |
|     6 | Nethergate          |  1093 |           8.54 |               1.05 |          128 |          128 |
|     7 | Blades of Exile     |  2594 |          10.02 |               1.10 |          259 |          259 |
|    10 | Geneforge           |   744 |           8.18 |               1.05 |           91 |           91 |
|    13 | Richard White Games |  2264 |          22.87 |               1.54 |           99 |           99 |
|    14 | SubTerra            |   750 |          10.71 |               1.11 |           70 |           70 |
|    15 | Blades of Avernum   |  3407 |          11.71 |               1.14 |          291 |         1578 |
+-------+---------------------+-------+----------------+--------------------+--------------+--------------+
This might take a while.

[ Tuesday, January 10, 2006 06:12: Message edited by: The Piper of Hamlin ]

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Cartographer
Member # 1851
Profile Homepage #12
Well, this is beautiful. Everything is going nicely, and would appear perfect if not for Stareye being a bit of a rat behind everyone's backs, scurrying about and deleting topics without giving much thought to how important they may yet be. If I understood correctly.

Of course, not that there was anything that important in them topics, because all the good topics are long gone and no new as good will ever appear because there are new members who invariably are less excellent than original, older members, so it is in fact impossible for anything that magnificent to come about ever again, and anyway, older members are turning 'hip' and will reflect the style of new members in order to fit in I suppose, although that really should be the other way around. This is often portrayed as a good thing, but change is what change is. Something different. And the past has been wiped away, its lingering cut off by new things and new matters and everything just goes onward and though some try to remember the way life used to be, they're never really in the majority. Not from their heart.

...

I mean, my point being... A good job nearly done, and hearty thanks for Aran? Yes, I'm pretty sure that was what I was trying to say, before I got carried away. Sorry. It's this heart of mine. Romantic daydreams of passionate beginnings and exciting affairs... All long gone. [sighs] Those were the days.

--------------------
"I'm not crazy!"
"Well, whatever. Maybe you just ate something really questionable, or perhaps someone hit you on the head with something large, blunt and heavy just now. By the way..." Gil nudged Grul pointedly.

Ooh! Homepage - Blog - Geneforge, +2, +3 - My Elfwood Gallery and DevArt page
So many strange ones around. Don't you think?
Posts: 1308 | Registered: Sunday, September 8 2002 07:00
Law Bringer
Member # 2984
Profile Homepage #13
We thought they'd never end, we'd sing and dance for ever and a day!

Each of the years I've been here has had a distinct feel - 2003, back when Scorp, Djur and the lot were around, 2004, with its great BoA craze, 2005 with the moderator elections...

Goodness, 2005 seems so awfully distant now, already 10 days past.

And a board will always resist change. Old members don't die off as inevitably as old people die in real life, and different members almost inevitably come from outside (or has anyone here had their children join Spiderweb yet?) and are therefore suspicious. Why, way back when, if Jeff would announce a board purge, we had to save those threads by hand, and uphill both ways! And we liked it! <_<

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Off With Their Heads
Member # 4045
Profile Homepage #14
quote:
Originally written by The Piper of Hamlin:

4 - "Moderator Board" inaccessible (duh).
Just for fun, I manually saved the Mod Board just now. Perhaps someday some of the posts will be declassified, and then I can release them. For the time being, they'll sit on my computer.

Posts: 7968 | Registered: Saturday, February 28 2004 08:00
BANNED
Member # 4
Profile Homepage #15
The Enemy of My Buddy is My Friend?

--------------------
*
Posts: 6936 | Registered: Tuesday, September 18 2001 07:00
Shaper
Member # 5450
Profile Homepage #16
If you are talking about the member MyBuddy, then yes.

--------------------
I'll put a Spring in your step.
:ph34r:
Posts: 2396 | Registered: Saturday, January 29 2005 08:00
Law Bringer
Member # 2984
Profile Homepage #17
This is related how? :P

Anyway, nearly there. Only Avernum 4 (#20) remains:

+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
| Forum | Name                     | Posts | Average length | Average page count | Topics saved | Topics total | Progress |
+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
|     1 | General                  | 43805 |          34.71 |               1.94 |         1262 |         1262 |  100.00% |
|     2 | Tech Support             |  2643 |           5.26 |               1.01 |          502 |          502 |  100.00% |
|     3 | Announcements            |     7 |           1.00 |               1.00 |            7 |            7 |  100.00% |
|     5 | The Exile Trilogy        |  2765 |           9.19 |               1.06 |          301 |          301 |  100.00% |
|     6 | Nethergate               |  1093 |           8.54 |               1.05 |          128 |          128 |  100.00% |
|     7 | Blades of Exile          |  2594 |          10.02 |               1.10 |          259 |          259 |  100.00% |
|    10 | Geneforge                |   744 |           8.18 |               1.05 |           91 |           91 |  100.00% |
|    13 | Richard White Games      |  2264 |          22.87 |               1.54 |           99 |           99 |  100.00% |
|    14 | SubTerra                 |   375 |          10.71 |               1.11 |           35 |           35 |  100.00% |
|    15 | Blades of Avernum        | 18718 |          11.86 |               1.15 |         1578 |         1578 |  100.00% |
|    16 | Geneforge 2              |  2111 |           9.42 |               1.14 |          224 |          224 |  100.00% |
|    17 | The Avernum Trilogy      |  5300 |           9.43 |               1.07 |          562 |          562 |  100.00% |
|    18 | Blades of Avernum Editor |  8127 |           9.85 |               1.10 |          825 |          825 |  100.00% |
|    19 | Geneforge Series         |  8402 |           9.16 |               1.09 |          917 |          917 |  100.00% |
|    20 | Avernum 4                |    59 |          29.50 |               2.00 |            2 |          187 |    1.07% |
+-------+--------------------------+-------+----------------+--------------------+--------------+--------------+----------+
Meanwhile, this lovely interface has been shoddily implemented to allow you to more or less easily browse the archive. Basically, look for threads on the forum page, then mouse over the link, watch the destination in the status bar, then copy the thread number into the form in the control panel. It's awkward, but under the circumstances I didn't have time for anything more fancy.

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Law Bringer
Member # 2984
Profile Homepage #18
Sorry for the double post, but...

IT IS DONE!

The next step is to zip all this stuff to make it available for download and offline viewing, later perhaps to enter it into a database to make it more interactive and searchable.
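
The zipping step could look roughly like this in Python, using the standard `zipfile` module. The thread does not say which tool is actually used, and `zip_archive` is a made-up name, so this is only a sketch of the idea.

```python
import zipfile
from pathlib import Path

def zip_archive(source_dir, zip_path):
    """Pack every file under source_dir into zip_path, keeping relative paths.

    Returns the number of files written. The zip file should be created
    outside source_dir so that it does not try to include itself.
    """
    source = Path(source_dir)
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Store e.g. 1/3110/page1.html rather than an absolute path.
                zf.write(path, arcname=path.relative_to(source).as_posix())
                count += 1
    return count
```

Keeping the relative folder structure inside the archive means the saved pages stay browsable offline once it is unpacked.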

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Law Bringer
Member # 2984
Profile Homepage #19
Sorry for the triple post, but I'm making another announcement so it wouldn't hurt to bump the thread.

PPP3 is now completely available for offline viewing at leisure, in zip format. This download is almost 75 MB.
The index pages of the forums, used to determine what threads to download, are here. This is about 2 MB.

[ Wednesday, January 11, 2006 12:45: Message edited by: Arancaytar ]

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Shaper
Member # 5450
Profile Homepage #20
Yay.

I might download the earlier PPP's for 'offline viewing' first.

Well done Aran.

--------------------
I'll put a Spring in your step.
:ph34r:
Posts: 2396 | Registered: Saturday, January 29 2005 08:00
Master
Member # 4614
Profile Homepage #21
You are really da man. I hope you get a 6-digit programming job when you grow up. ;)

--------------------
-ben4808
Posts: 3360 | Registered: Friday, June 25 2004 07:00
...b10010b...
Member # 869
Profile Homepage #22
I wasn't aware that Aran suffered from polydactyly.

--------------------
The Empire Always Loses: This Time For Sure!
Posts: 9973 | Registered: Saturday, March 30 2002 08:00
Law Bringer
Member # 335
Profile Homepage #23
Ha.

An extra digit gives you a real leg (or finger) up on the competition in the world of piano. Unfortunately, most standard keyboard layouts aren't designed to take advantage of this.

—Alorael, who hopes Aran gets a six-digit salary from a company that has its finger on the pulse of progress and knows who to tap for the best odds at clawing its way into the bold new future. That was a stretch.
Posts: 14579 | Registered: Saturday, December 1 2001 08:00
Law Bringer
Member # 2984
Profile Homepage #24
Hey, I could try to go into politics and end up with a salary greater than that with no coding at all, just spouting lots of bull. :P

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
