Help with Unix needed...

Law Bringer
Member # 2984
Post #0
This is for the Pied Piper Project, so it's at least tangentially related to Spiderweb.

[if you know what the project is, you can skip what follows]
---
The PPP started when the board was pruned about a year and a half ago for disk space reasons. Over a few days (with about a week's warning before the purge) we managed to save almost every thread on the General board. When, a few months later, another purge became necessary, we did the same for the other boards. The data is still saved, and I have most of it (all of PPP1, and a lot of PPP2, although I'd have to check that).
---
[start again here]

The data is unstructured and unformatted, unfortunately. Most people saved threads by title rather than by number, which made it very hard to index the HTML files. But with a little time and some leet Unix shell commands for renaming large numbers of files, it is now mostly structured. It can be downloaded here:

PPP1, PPP2, or both.

The next step would be to replace the page links in each thread with internal links, so that the archive can be browsed without accidentally going to the SW forums and being told the thread no longer exists. There are a thousand HTML pages in General alone, so changing them manually would be dumb. So I'm going to try it with a shell command.

---

The thread link takes this form in every page:

http://www.ironycentral.com/cgi-bin/ubb/ubb/ultimatebb.php?ubb=get_topic;f=A;t=BBBBBBBB;p=C

where "A" is the forum number, "BBBBBBBB" the thread number, and "C" the page number.

I would like to have this link refer, instead, to:

pageC.html

Since all pages of a thread are in the same folder, the rest of the URL isn't necessary.
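A minimal sketch of the link rewrite as POSIX shell functions, assuming the forum (f), thread (t) and page (p) values are always digits and that the links appear exactly in the form above. `rewrite_links` and `rewrite_links_in_files` are hypothetical names. The loop writes to a temporary file and renames it instead of using `sed -i`, because GNU sed and BSD (Mac) sed disagree on that option.

```shell
#!/bin/sh
# Turn links of the form
#   http://www.ironycentral.com/cgi-bin/ubb/ubb/ultimatebb.php?ubb=get_topic;f=A;t=B;p=C
# into the local link "pageC.html".
# Assumption: A, B and C are all digits.
rewrite_links() {
  sed -E 's#http://www[.]ironycentral[.]com/cgi-bin/ubb/ubb/ultimatebb[.]php[?]ubb=get_topic;f=[0-9]+;t=[0-9]+;p=([0-9]+)#page\1.html#g'
}

# Apply rewrite_links to every file named on the command line, in place.
# A temp file plus mv avoids sed -i, whose syntax differs between GNU and BSD.
rewrite_links_in_files() {
  for f in "$@"; do
    rewrite_links < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
  done
}
```

Run it from inside one thread's folder, e.g. `rewrite_links_in_files *.html`. The files are overwritten, so keep a backup of the archive.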

That's the first problem; the second one seems a bit trickier. Some browsers convert all HTML tags to uppercase when saving a page. This is quite annoying when trying to replace strings. Would it be possible to convert all characters that lie between '<' and '>' to lowercase without affecting anything else?

--------------------
Encyclopaedia Ermariana | Forum Archives | Forum Statistics | RSS [Topic / Forum]
My Blog | Polaris | I eat novels for breakfast.
Polaris is dead, long live Polaris.
Look on my works, ye mighty, and despair.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Master
Member # 4614
Post #1
It seems like a job for regular expressions, which I don't have much practice with, unfortunately.

--------------------
-ben4808
Posts: 3360 | Registered: Friday, June 25 2004 07:00
Law Bringer
Member # 2984
Post #2
Yes, mostly regular expressions. But the lowercase conversion actually would require a certain command I don't know yet (if it exists). sed has a command for changing characters in a line if a certain string is found, but that would only work if one html tag was on each line. As it is, this would lowercase any text on the same line, which wouldn't be that good.
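For what it's worth, GNU sed can lowercase just the matched text rather than the whole line: in the replacement, `\L` lowercases everything after it and `&` stands for the whole match. This is a sketch that assumes GNU sed (BSD/macOS sed lacks `\L`) and that each tag opens and closes on the same line. The function name is hypothetical. Attribute values inside the tags (URLs, alt text) get lowercased too.

```shell
#!/bin/sh
# Lowercase every <...> tag on each line and leave the text between tags alone.
# GNU sed only: \L (lowercase the rest of the replacement) is a GNU extension.
lowercase_tags_gnu() {
  sed -E 's/<[^>]*>/\L&/g'
}
```

Usage: `lowercase_tags_gnu < page1.html > page1.new`.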

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Warrior
Member # 5886
Post #3
Perhaps use a strong text-manipulating scripting language such as Perl? Perl is extremely powerful because it lets you pick out exactly which part of a regular-expression match gets replaced. Give me a specific sample of what you are trying to do, and I'll try to cook something up. If you'd like to learn it yourself, you could go here:
www.perl.com

[ Sunday, December 11, 2005 11:58: Message edited by: Aranea Hirsuta ]

--------------------
We don't make the white chalky excrement that splats down and ruins your car's paint job. We make it stronger.
Posts: 52 | Registered: Friday, June 3 2005 07:00
Law Bringer
Member # 2984
Post #4
What did you post eight " :eek: "s for?

Other than that, yes I'd guessed Perl would have a good solution; unfortunately I'm far too short on time to even learn the basics of a new language right now. Perhaps it'll have to wait until after the exams. ;)

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Warrior
Member # 5886
Post #5
I got the link parts working. Why is Windows so horrible about supplying convenient environment variables (like PWD)? Ugggg.

There must be somewhere in the *.html files that controls the default path tag (http://www.ironycentral.com/cgi-bin/ubb/ubb/ultimatebb.php) when it is not specified in an absolute sense in the file.
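One likely candidate is the HTML `<base href=...>` element, which sets the URL that relative links in a page are resolved against; without it, they resolve against the page's own location. Whether the saved pages actually contain one is an assumption, and absolute links like the UBB URL above are unaffected by it. A sketch for finding and stripping such elements (hypothetical function names; single-line tags only):

```shell
#!/bin/sh
# Print the names of the given files that contain a <base ...> element, any letter case.
find_base_tags() {
  grep -il '<base[[:space:]>]' -- "$@"
}

# Delete <base ...> elements from a stream. The bracketed letters make the
# match case-insensitive; a tag split across lines is not handled.
strip_base_tags() {
  sed -E 's/<[Bb][Aa][Ss][Ee]([[:space:]][^>]*)?>//g'
}
```

Usage: `find_base_tags *.html` to see which pages are affected, then filter those through `strip_base_tags`.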

Another problem is that getting it to work on Macs will be tricky, because I use Windows.

About the part concerning the forced capitalization of html tags, for which specific browsers is this a problem?

[ Sunday, December 11, 2005 17:26: Message edited by: Aranea Hirsuta ]

Posts: 52 | Registered: Friday, June 3 2005 07:00
Law Bringer
Member # 2984
Post #6
It's not a problem when displaying as much as it will be annoying when I try to parse the files and enter the posts into a MySQL database - although of course it will be possible to do that case-insensitively.

So I suppose that part is not essential...

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Apprentice
Member # 6641
Post #7
One method that should work on most *nix machines:

myshell# tr '[:upper:]' '[:lower:]' < oldfile > newfile

This will change the case throughout the file. (Quoting the character classes stops the shell from treating them as filename patterns.)
Posts: 1 | Registered: Wednesday, January 4 2006 08:00
Law Bringer
Member # 2984
Post #8
But that still leaves the problem that all text is converted to lower case, although only the html tags are supposed to...

Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Infiltrator
Member # 6136
Post #9
Hey, what's Unix??
Is it related to Linux?? :confused:
Posts: 446 | Registered: Friday, July 22 2005 07:00
Off With Their Heads
Member # 4045
Post #10
Chicho: Wikipedia is your friend.

[ Friday, January 06, 2006 13:10: Message edited by: Kelandon ]

--------------------
Arancaytar: Every time you ask people to compare TM and Kel, you endanger the poor, fluffy kittens.
Smoo: Get ready to face the walls!
Ephesos: In conclusion, yarr.

Kelandon's Pink and Pretty Page!!: the authorized location for all things by me
The Archive of all released BoE scenarios ever
Posts: 7968 | Registered: Saturday, February 28 2004 08:00
Master
Member # 4614
Post #11
quote:
Originally written by It's Ah-run-KYE-tar:

But that still leaves the problem that all text is converted to lower case, although only the html tags are supposed to...
I'm thinking you could disassemble all the html tags into an array, change the contents of that array to lowercase, and then reassemble the array into the text at the proper position as previously saved.

It could probably be done with the right combination of intrinsic functions. :P
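Here is a sketch of that idea in portable POSIX awk, so it should also run with the stock awk on a Mac. Instead of building a whole array, it works line by line: each `<...>` match is lowercased with `tolower` and put back in place between the untouched text around it. The function name is hypothetical. It assumes tags don't span lines and that a literal `<` in post text is escaped as `&lt;`, as it should be in valid HTML.

```shell
#!/bin/sh
# Lowercase only the <...> tags on each line, using POSIX awk.
lowercase_tags() {
  awk '{
    out = ""        # text already processed
    rest = $0       # text still to scan
    while (match(rest, /<[^>]*>/)) {
      # keep the text before the tag unchanged, then append the lowercased tag
      out = out substr(rest, 1, RSTART - 1) tolower(substr(rest, RSTART, RLENGTH))
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest  # whatever follows the last tag is left as-is
  }'
}
```

Usage: `lowercase_tags < page1.html > page1.new`. As with any tag-wide lowercasing, attribute values such as URLs are lowercased too.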

Posts: 3360 | Registered: Friday, June 25 2004 07:00