Help with Unix needed...
Author | Topic: Help with Unix needed... |
---|---|
Law Bringer
Member # 2984
|
written Saturday, December 10 2005 07:29
This is for the Pied Piper Project, so it's at least tangentially related to Spiderweb. [If you know what the project is, you can skip what follows.]

---

The PPP started when the board was pruned about a year and a half ago for disk space reasons. Over a few days (with about a week's warning before the purge) we managed to save almost every thread on the General board. When, a few months later, another purge became necessary, we did the same for the other boards. The data is still saved, and I have most of it (all of PPP1, and a lot of PPP2, although I'd have to check that).

---

[Start again here.] The data is unstructured and unformatted, unfortunately. Most people saved threads by title rather than by number, which made it very hard to index the HTML files. But with a little time and some leet Unix shell commands for renaming large numbers of files, it is now mostly structured. It can be downloaded here: PPP1, PPP2, or both.

The next step would be to replace the page links in each thread with internal links, allowing the archive to be browsed without accidentally going to the SW forums and being told the thread no longer exists. There are 1000 HTML pages in General alone, so changing them manually would be dumb. So I'm going to try it with a shell command.

---

The thread link takes this form in every page:

http://www.ironycentral.com/cgi-bin/ubb/ubb/ultimatebb.php?ubb=get_topic;f=A;t=BBBBBBBB;p=C

where "A" is the forum number, "BBBBBBBB" is the thread number, and "C" is the page number. I would like this link to refer, instead, to:

pageC.html

Since each thread's pages are in the same folder, the other stuff isn't necessary.

That's the first problem, but the second one seems a bit more tricky. Some browsers transform all HTML tags into uppercase when saving a page. This is quite annoying when trying to replace strings; I wonder if it would be possible to transform all characters that lie between '<' and '>' into lowercase without affecting anything else?
-------------------- Encyclopaedia Ermariana • Forum Archives • Forum Statistics • RSS [Topic / Forum] My Blog • Polaris • I eat novels for breakfast. Polaris is dead, long live Polaris. Look on my works, ye mighty, and despair. Posts: 8752 | Registered: Wednesday, May 14 2003 07:00 |
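The link replacement described in the post above can be sketched as a single regular-expression substitution. This is only a sketch in Python, not the shell command the poster had in mind. It assumes that A, B and C are always runs of digits and that the saved pages contain the URL exactly as quoted; the name `rewrite_links` is made up for illustration.

```python
import re

# Matches the thread URL quoted in the post. Assumption: the forum number (f),
# thread number (t) and page number (p) consist of digits only.
THREAD_LINK = re.compile(
    r"http://www\.ironycentral\.com/cgi-bin/ubb/ubb/ultimatebb\.php"
    r"\?ubb=get_topic;f=\d+;t=\d+;p=(\d+)"
)


def rewrite_links(html):
    """Replace each matching thread URL with the relative link pageC.html."""
    # \1 is the captured page number C.
    return THREAD_LINK.sub(r"page\1.html", html)


sample = (
    '<a href="http://www.ironycentral.com/cgi-bin/ubb/ubb/'
    'ultimatebb.php?ubb=get_topic;f=1;t=00001234;p=2">2</a>'
)
print(rewrite_links(sample))  # <a href="page2.html">2</a>
```

Note that this rewrites every link of that form, including any that point into other threads; limiting it to the current thread would mean also checking the `t=` value.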
Master
Member # 4614
|
written Saturday, December 10 2005 10:04
It seems like a job for regular expressions, which I don't have much practice with, unfortunately. -------------------- -ben4808 Posts: 3360 | Registered: Friday, June 25 2004 07:00 |
Law Bringer
Member # 2984
|
written Saturday, December 10 2005 15:24
Yes, mostly regular expressions. But the lowercase conversion would actually require a command I don't know yet (if it exists). sed has a command for changing characters in a line if a certain string is found, but that would only work if each HTML tag were on its own line. As it is, this would also lowercase any other text on the same line, which wouldn't be that good. Posts: 8752 | Registered: Wednesday, May 14 2003 07:00 |
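The line-based limitation can be sidestepped by matching each tag instead of each line. A minimal sketch in Python (not sed), assuming a tag is anything from a '<' up to the next '>'; `lowercase_tags` is an illustrative name, not an existing tool.

```python
import re

# Assumption: a tag runs from '<' to the next '>'. [^>] also matches
# newlines, so tags that span several lines are handled too.
TAG = re.compile(r"<[^>]*>")


def lowercase_tags(html):
    """Lowercase everything between '<' and '>', leaving other text alone."""
    return TAG.sub(lambda match: match.group(0).lower(), html)


line = '<P>Some TEXT with <B>Bold Words</B> and a <A HREF="PAGE2.HTML">Link</A></P>'
print(lowercase_tags(line))
# <p>Some TEXT with <b>Bold Words</b> and a <a href="page2.html">Link</a></p>
```

As the output shows, this does exactly what was asked and therefore lowercases attribute values as well, which matters if a link target is case-sensitive. A literal '<' in the post text would also be mistaken for the start of a tag.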
Warrior
Member # 5886
|
written Sunday, December 11 2005 04:02
Perhaps use a strong text-manipulation scripting language such as Perl? Perl is extremely powerful because it allows you to select what you replace within the regular expression. Give me a specific sample of what you are trying to do, and I'll try to cook something up. You could go here to learn it yourself if you like: www.perl.com [ Sunday, December 11, 2005 11:58: Message edited by: Aranea Hirsuta ] -------------------- We don't make the white chalky excrement that splats down and ruins your car's paint job. We make it stronger. Posts: 52 | Registered: Friday, June 3 2005 07:00 |
Law Bringer
Member # 2984
|
written Sunday, December 11 2005 08:23
What did you post eight " :eek: "s for? Other than that, yes, I'd guessed Perl would have a good solution; unfortunately I'm far too short on time to learn even the basics of a new language right now. Perhaps it'll have to wait until after the exams. ;) Posts: 8752 | Registered: Wednesday, May 14 2003 07:00 |
Warrior
Member # 5886
|
written Sunday, December 11 2005 09:13
I got the link parts working. Why is Windows so horrible about supplying convenient environment variables (like PWD)? ugggg There must be something in the *.html files that sets the default path (http://www.ironycentral.com/cgi-bin/ubb/ubb/ultimatebb.php) used when a link is not given as an absolute URL in the file. Another issue is that getting it to work on Macs is going to be a problem, because I use Windows. As for the forced capitalization of HTML tags, for which specific browsers is this a problem? [ Sunday, December 11, 2005 17:26: Message edited by: Aranea Hirsuta ] Posts: 52 | Registered: Friday, June 3 2005 07:00 |
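One candidate for the "default path" guessed at above is the HTML `<base href="...">` element, which sets the URL that relative links are resolved against. The thread does not say whether the saved pages actually contain one, so the following is only a sketch for checking a page. The function name `find_base_href` is hypothetical, and the pattern assumes the attribute value contains no spaces.

```python
import re

# Case-insensitive because some browsers save tags in uppercase.
# Assumption: the href value has no whitespace and may or may not be quoted.
BASE_HREF = re.compile(r"<base\s[^>]*?href\s*=\s*[\"']?([^\"'\s>]+)", re.IGNORECASE)


def find_base_href(html):
    """Return the URL of the first <base href=...> tag, or None if absent."""
    match = BASE_HREF.search(html)
    return match.group(1) if match else None


page = '<HEAD><BASE HREF="http://www.ironycentral.com/cgi-bin/ubb/ubb/"></HEAD>'
print(find_base_href(page))
```

If a page does turn out to have such a tag, relative links like pageC.html would resolve against it rather than against the local folder, so it would need to be removed or rewritten as well.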
Law Bringer
Member # 2984
|
written Monday, December 12 2005 05:35
It's not so much a problem for display as an annoyance when I try to parse the files and enter the posts into a MySQL database, although of course it will be possible to do that case-insensitively. So I suppose that part is not essential... Posts: 8752 | Registered: Wednesday, May 14 2003 07:00 |
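Case-insensitive parsing, as mentioned above, avoids the need to normalise the tags first. A small sketch in Python, assuming double-quoted href values; `extract_links` is an illustrative name and the database step is left out entirely.

```python
import re

# re.IGNORECASE lets one pattern match <a href=...> and <A HREF=...> alike.
# Assumption: attribute values are enclosed in double quotes.
ANCHOR_HREF = re.compile(r'<a\s[^>]*?href="([^"]*)"', re.IGNORECASE)


def extract_links(html):
    """Return the href values of all anchor tags, whatever their case."""
    return ANCHOR_HREF.findall(html)


saved = '<A HREF="page1.html">1</A> <a href="page2.html">2</a>'
print(extract_links(saved))  # ['page1.html', 'page2.html']
```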
Apprentice
Member # 6641
|
written Wednesday, January 4 2006 12:32
One method that should work on most *nix machines:

myshell# tr '[:upper:]' '[:lower:]' < oldfile > newfile

This will convert the entire file to lowercase. Posts: 1 | Registered: Wednesday, January 4 2006 08:00 |
Law Bringer
Member # 2984
|
written Wednesday, January 4 2006 12:50
But that still leaves the problem that all text is converted to lowercase, although only the HTML tags are supposed to be... Posts: 8752 | Registered: Wednesday, May 14 2003 07:00 |
Infiltrator
Member # 6136
|
written Friday, January 6 2006 12:57
Hey, what's Unix?? Is it related to Linux?? :confused: Posts: 446 | Registered: Friday, July 22 2005 07:00 |
Off With Their Heads
Member # 4045
|
written Friday, January 6 2006 13:09
Chicho: Wikipedia is your friend. [ Friday, January 06, 2006 13:10: Message edited by: Kelandon ] -------------------- Arancaytar: Every time you ask people to compare TM and Kel, you endanger the poor, fluffy kittens. Smoo: Get ready to face the walls! Ephesos: In conclusion, yarr. Kelandon's Pink and Pretty Page!!: the authorized location for all things by me The Archive of all released BoE scenarios ever Posts: 7968 | Registered: Saturday, February 28 2004 08:00 |
Master
Member # 4614
|
written Friday, January 6 2006 20:55
quote:I'm thinking you could disassemble all the html tags into an array, change the contents of that array to lowercase, and then reassemble the array into the text at the proper position as previously saved. It could probably be done with the right combination of intrinsic functions. :P -------------------- -ben4808 Posts: 3360 | Registered: Friday, June 25 2004 07:00 |
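The disassemble-and-reassemble idea above can be sketched as follows. This is a hedged Python illustration, not anything posted in the thread: it splits the page into a list of alternating text and tag pieces, lowercases only the tag pieces, and joins the list back together. As an extra assumption beyond the suggestion, quoted attribute values are left untouched so that link targets keep their case. Both function names are made up.

```python
import re

# Splitting on a capturing group keeps the tags in the resulting list:
# even indices hold plain text, odd indices hold tags.
TAG_SPLIT = re.compile(r"(<[^>]*>)")
# Inside a tag: a double-quoted value, a single-quoted value, or other text.
QUOTED_OR_PLAIN = re.compile(r'"[^"]*"|\'[^\']*\'|[^"\']+')


def lowercase_tag(tag):
    """Lowercase a tag's name and attribute names, keeping quoted values."""
    def fix(match):
        piece = match.group(0)
        return piece if piece[0] in "\"'" else piece.lower()
    return QUOTED_OR_PLAIN.sub(fix, tag)


def lowercase_tags_by_parts(html):
    """Split into text and tags, lowercase the tags, then reassemble."""
    parts = TAG_SPLIT.split(html)
    for i in range(1, len(parts), 2):  # odd indices are the tags
        parts[i] = lowercase_tag(parts[i])
    return "".join(parts)


print(lowercase_tags_by_parts('<A HREF="Page2.HTML">Next Page</A>'))
# <a href="Page2.HTML">Next Page</a>
```

Because the pieces stay in their original order in the list, no positions need to be saved separately; joining the list restores the document.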