Consider This Catharsis

Pages

AuthorTopic: Consider This Catharsis
Guardian
Member # 6670
Profile Homepage #0
Meh; the last time I did something like this a few people partook.
quote:
A palindrome is a word that is spelt the same forwards and backwards; aba and a are palindromes, while ab is not. A substring of a character string is just the given string with zero or more characters left out - for example, abc is a substring of axbxzcd, while abcx is not. Every nonempty character string has a substring that is a palindrome, since each character on its own is a palindrome.

Design an efficient algorithm which, given a nonempty character string S, computes the length of a longest substring of S that is a palindrome.

Blah blah correctness blah blah complexity blah blah blah.
--------------------
quote:
For the next part, just pretend that I've already found the correct recurrence formula.

Posts: 1509 | Registered: Tuesday, January 10 2006 08:00
Law Bringer
Member # 2984
Profile Homepage #1
This is brainstorming, not a formal description:

If we do not need to work in place, I'd first make a copy of the string that is stripped of case and non-alpha characters. This is only a single pass over the string - O(n) - and takes at most as much memory as the string. This would greatly simply the analysis as we don't need to account for spaces or case any more.

I would then try to find potential center positions first - two characters that are consecutive, or one that is sandwiched between the same characters.

- For each character from 0 to length of string
--- Is next character equal to this character or to previous character?
--- Store the current index in a collection of potential palindromes.

This step can be done in a single pass over the string and is O(n).

(We need to store the information whether this is an even or odd length palindrome, and I can think of no elegant solution but only a special case checking. The following assumes an odd-length palindrome for simplicity.)

Next, we eliminate these candidates by extending each and finding out how long they get.

- Start i at 1
- While there are still values in the collection
--- Increment i
--- For each value in the collection
----- Compare characters at distance i in both directions from the value
----- If they're not equal (or one of the positions compared lies outside string boundaries), eliminate this value
- The longest palindrome (still assuming odd length) is (i-1)*2+1.

Because this algorithm identifies all candidates first and eliminates this tree of possibilities in Level Order, it is most efficient on a string with few such candidates or even just a single long palindrome. In that ideal case it could be practically linear. The worst case scenario is a string of equal letters. In this case, all letters are potential candidates and will be examined until each reach the edge and only the center is left, causing it to become O(n^2).

An alternative solution would be to eliminate each candidate completely in their own order, starting at the ones closest to the center (most room to expand). This would be faster in some cases but slower in others - I'm not sure how common each of these cases far.

--------------------------------

Gah. Tired. I hope I haven't just excreted complete nonsense.

[ Tuesday, October 23, 2007 15:02: Message edited by: Xian ]

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Post Navel Trauma ^_^
Member # 67
Profile Homepage #2
He's making your life difficult - the substring doesn't have to be consecutive, so AxByA should return ABA.

--------------------
Barcoorah: I even did it to a big dorset ram.

New Mac BoE
Posts: 1798 | Registered: Thursday, October 4 2001 07:00
Law Bringer
Member # 2984
Profile Homepage #3
WTF?

Ouch, forget it then...

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Nuke and Pave
Member # 24
Profile Homepage #4
Why do I have a feeling that this problem is NP-complete? :)

I haven't done any NP-completeness proofs in several years, so I'll think about it after work, but it is looking similar to some other innocent-looking problems that turn out to have only exponential solutions. (The obvious algorithm for this one is to look at all substrings that are skipping a single letter, then run recursively until you get down to strings of length 1, which gives you a O(n!) running time.)

--------------------
Be careful with a word, as you would with a sword,
For it too has the power to kill.
However well placed word, unlike a well placed sword,
Can also have the power to heal.
Posts: 2649 | Registered: Wednesday, October 3 2001 07:00
Law Bringer
Member # 335
Profile Homepage #5
I am not a programmer, so here's my terrible solution. It very quickly gets out of hand for long strings with many non-unique characters, but it should eventually spit out a solution.

Define variables longestPalindrome and currentPalindrome.

Identify characters that appear more than once in the string. If there are no such characters, add 1 to currentPalindrome and compare currentPalindrome to longestPalindrome. If the string is an empty set, don't increment currentPalindrome. If longest is shorter, let longest = current.

For every character that is non-unique, find the first and last instance of the character, then call the same function with the string of characters between those two characters as its argument. Add 2 to currentPalindrome.

—Alorael, who thinks this will get the right number if it doesn't crash the computer with an immense string. Using this kind of algorithm on the Bible would be a bad idea.
Posts: 14579 | Registered: Saturday, December 1 2001 08:00
Nuke and Pave
Member # 24
Profile Homepage #6
EDIT: Never mind.

[ Tuesday, October 23, 2007 16:54: Message edited by: Zeviz ]

--------------------
Be careful with a word, as you would with a sword,
For it too has the power to kill.
However well placed word, unlike a well placed sword,
Can also have the power to heal.
Posts: 2649 | Registered: Wednesday, October 3 2001 07:00
Shaper
Member # 32
Profile #7
While perhaps not as efficient, I suggest looking for smaller palindromes and

working upwards from those. If we cannot find a sub-string palindrome of length n,

then there will not be one of length n+1 (because we can create a palindrome of

length n by eliminating a character from the larger palindrome).

std::string Text;

bool PDRMS = true;
int LENGTH = 1;

////////////////////[HEIGHT,WIDTH]///////////////////////
Matrix<bool> INDX [Text.length(),Text.length()+1] = false;

bool TEST_UPDATE(int Row_Indx){
string T1 = "";
for(int i = 1; i < INDX.WIDTH; i++){
if(INDX[Row_Indx,i]){T1 += Text[i];}
}

for(int j = 0; j < T1.length()/2; j++){
if(T1[0] != T1[T1.length()-(T1+1)]){
return false;
}
}

return true;
}

bool EXPAND(int Row_Indx){
int L_INDX = 0;
int R_INDX = 0;
int MOD = 1;
for(int i = 1; i < INDX.WIDTH; i++){
if(INDX[Row_Indx,i]){
if(L_INDX == 0){L_INDX = i}
R_INDX = i;
}
}

while((L_INDX-MOD) > 0 && (R_INDX+MOD) < INDX.WIDTH){

//Try to Add onto the Left
INDX[Row_INDX,L_INDX - MOD] = 1;
T1 = TEST_UPDATE(Row_Indx);
if(T1){return T1;}
INDX[ROW_INDX,L_INDX - MOD] = 0;

//Try to Add onto the Right
INDX[Row_INDX,R_INDX + MOD] = 1;
T1 = TEST_UPDATE(Row_Indx);
if(T1){return T1;}
INDX[ROW_INDX,R_INDX + MOD] = 0;

MOD++;
}
return false;
}

bool FIND_PDRMS(int LENGTH){
bool FOUND = false;
for(int i = 0; i < INDX.Height; i++){
if(INDX[i,0]){
INDX[i,0] = EXPAND(i);
FOUND = FOUND || E_INDX[i,0];
}
}

return FOUND;
}

void Initialize_Array(){
for(int i = 0; i < O_INDX.Height; i++){
INDX[i,0] = 1;
INDX[i,i+1] = 1;
}
}


while(PDRMS){
Initialize Array();
PDRMS = FIND_PDRM(LENGTH+1);
if(PDRMS){LENGTH++;}
}
std::cout << "Longest Palindrome has length: " << LENGTH << std::endl;


--------------------
Lt. Sullust
Quaere verum
Posts: 2462 | Registered: Wednesday, October 3 2001 07:00
Law Bringer
Member # 2984
Profile Homepage #8
I think better in terms of numerical operations than character comparisons. Perhaps we can reduce the problem (requiring at most quadratic time, which is still less than the current solutions) by first comparing all characters with each other and building a boolean matrix:

I'll vainly demonstrate on my name:

arancaytar

a 1010010010
r 0100000001
a 1010010010
n 0001000000
c 0000100000
a 1010010010
y 0000001000
t 0000000100
a 1010010010
r 0100000001
I believe that in this matrix, we can, navigating through rows and columns, find the longest possible chain. I still need to figure out how, but I posted this here to give you all a little nudge in a direction that might show promise.

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Law Bringer
Member # 2984
Profile Homepage #9
Edit:

To make clear how the matrix visualizes the solutions, here is another image that shows the matches we are looking at with X instead of 1:

dintiradan

d 1000000X00 dinid
i 100X00000
n X0000001
t 1000000
i 100000
r 10000
a 1010
d 100
a 10
n 1

dintiradan

d 1000000X00 ditid
i 100X00000
n 10000001
t X000000
i 100000
r 10000
a 1010
d 100
a 10
n 1

dintiradan

d 1000000100
i 100100000
n 1000000X nadan
t 1000000
i 100000
r 10000
a 10X0
d X00
a 10
n 1
Obviously, whenever we pick a certain 1 as a starting place, we then look at the triangle below and to the left of this position. There is still the problem of identifying the most promising place - look at the current example, where merely looking at the outmost matches wouldn't have helped us find which solution was better. If it were "Dintiraden", the "nadan" solution would not exist, but we couldn't have known it from the outmost matches:

dintiraden

d 1000000100
i 100100000
n 10000001
t 1000000
i 100000
r 10000
a 1000
d 100
e 10
n 1
Perhaps we should instead start at the other end, where we know we must end up? Start on the main diagonal or all those 1 positions that lie adjacent.

In this case, whenever we consider a certain position, what is left is not a triangle, but a rectangle - above and to the right of the position we're looking at.

dintiraden

d 0000100
i 0100000
n X
You then have to pick another match out of the triangle that's left, and repeat the process.

This process might still take polynomial time, though - if the problem is provably NP-complete, there's no way around that.

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
By Committee
Member # 4233
Profile #10
Caveat: I am not a programmer

If there are no characters in the string, it would return 0.

You would then need to search for all pairings of identical characters within the string. If there are no pairs, then the function returns a maximum palindrome length of 1.

After that, once the algorithm has identified pairs, we would make the function recursive, such that the function would repeat within each substring between the identified pair.

Is there some other way to go about this? What does NP-complete mean?

--------------------
In today’s America, there are more World of Warcraft players than farmers.
Posts: 2242 | Registered: Saturday, April 10 2004 07:00
Law Bringer
Member # 2984
Profile Homepage #11
Wait wait wait.

Do you know how we solved a labyrinth in FORTRAN, entirely without backtracking? We passed over the entire field, beginning at the entrance, and set each field unexplored field (value "0") adjacent to an explored field (value >0) equal to the value of the explored field + 1. This way, each field of the labyrinth had the minimum number of steps required to reach it. A long path might still take almost n^4 calculations (Longest theoretical path is n^2 steps, and each step requires a loop over the n^2 board), but we got rid of the backtracking and recursion.

Perhaps we can "accumulate" a solution here too? By incrementing all the values in the rectangle shown in the last post by 1... the only problem is that the rectangles overlap...

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Shaper
Member # 32
Profile #12
I'll need to think about it a bit more, but I'm pretty sure my solution runs in polynomial time. Albeit with a fairly large exponent...

--------------------
Lt. Sullust
Quaere verum
Posts: 2462 | Registered: Wednesday, October 3 2001 07:00
Nuke and Pave
Member # 24
Profile Homepage #13
Drew, "NP" is a class of problems that are assumed to have No Polynomial-time solutions. NP-Complete problems are the hardest problems in that group. The "complete" part of description means that if you can solve one of these problems in polynomial time, you can solve all of them, and by extention all of NP problems.

"Polynomial time" means that for a problem of size X, it will take X, or X^2, or X^3, or some other fixed power of X steps to solve the problem. For example:

- Checking if a list of size X contains letter 'a' takes at most X steps.

- An obvious algorithm for finding biggest number in a list of size X takes X^2 steps. (For each number in the list, go through the whole list looking for a bigger number.)

Aran, that labirinth solution was a simple Breadth-First Search of a graph. I don't think you can reduce this problem to a BFS.

The solutions proposed so far have exponential worst-case running time (when run on a string of same letters), but polynomial average-case running time. This sounds very familiar, but I don't remember the details. I need to re-read my algorithms textbook.

--------------------
Be careful with a word, as you would with a sword,
For it too has the power to kill.
However well placed word, unlike a well placed sword,
Can also have the power to heal.
Posts: 2649 | Registered: Wednesday, October 3 2001 07:00
Shaper
Member # 32
Profile #14
In retrospect, my solution fails for "ababababa", as it will only extract either a string of b's or a string of a's...

--------------------
Lt. Sullust
Quaere verum
Posts: 2462 | Registered: Wednesday, October 3 2001 07:00
Electric Sheep One
Member # 3431
Profile #15
I am interested in understanding better the complexity class NP. I had always supposed that NP stood for 'non-polynomial', meaning problems for which no algorithm running in polynomial time exists. I have just learned (via the usual oracle call) that NP is 'non-deterministic polynomial', and refers to the 'non-deterministic Turing machine', which seems to be essentially the Turing machine that a god of luck would purpose-build for any problem.

As pre-payment for any information on the topic, I offer my own modest but non-trivial insight in the general theory of problem solving.

Some problems are in the class NC, which stands for No-one Cares. Worse are problems in the class NC-Complete, which no-one will ever care about, because if they cared about any one of them, they would have to care about all the others too.

The insight is that you should avoid working on these problems.
*(laugh here)
You should not laugh.
*(and here)

--------------------
We're not doing cool. We're doing pretty.
Posts: 3335 | Registered: Thursday, September 4 2003 07:00
Guardian
Member # 6670
Profile Homepage #16
Erm, not quite.

P stands for Polynomial time. This means that the running time is a polynomial function based on the input size. For example, finding the largest item of a list of size 'n' can be solved in 'n' iterations (not n^2, as Zeviz said; just devote some memory to remembering the current maximum).

NP stands for Nondeterministic Polynomial time. This means that, given a solution, we can verify it in polynomial time. For instance, consider the general boolean constraint problem: you have an arbitrary number of boolean statements of arbitrary length, sharing an arbitrary number of variables. There's no known algorithm to solve this general case in polynomial time, but if someone hands us a solution, we can check to see if it is correct in polynomial time (just see if every statement evaluates to 'true' with the given variables).

Note: All problems in P are in NP (P is a subset of NP). Trivial to show: we can get a solution in polynomial time, and assuming the algorithm is correct, the solution is correct.

One of the biggest problems in theoretical computing science is whether or not P=NP. That is, does every problem where we can verify a solution in polynomial time have a method of obtaining that solution in polynomial time?

Which brings us to NPC: NP-Complete. As Zeviz said, these are the problems that have been shown to be as 'tough' as any other problem in NP. Of course, if P=NP, then NPC is a subset of P. However, if P!=NP, then NPC=NP-P.

(There's also Co-NP, which is the set of problems where we can verify if a problem has no solution in polynomial time.)

--------------------
In other news, I just reinvented the wheel.
Posts: 1509 | Registered: Tuesday, January 10 2006 08:00
Law Bringer
Member # 335
Profile Homepage #17
Drew's solution is the same as mine. I wonder if there's something about learning to program that teaches you to think in a more programmish way.

—Alorael, who thinks Spiderweb is exactly the right place to throw NC-Complete problems. The only better forum for such issues is xkcd's.
Posts: 14579 | Registered: Saturday, December 1 2001 08:00
Electric Sheep One
Member # 3431
Profile #18
Wikipedia told me that verifiability in polynomial time with a deterministic Turing machine is logically equivalent to solvability in polynomial time with a non-deterministic model. Hence the otherwise awkward name 'non-deterministic polynomial'. Evidently the non-deterministic solvability concept was the first one introduced, historically.

But this is part of my question. Are non-deterministic Turing machines of any real interest, or were they just a notion people passed through on the way to realizing that verification was what they really wanted to discuss?

--------------------
We're not doing cool. We're doing pretty.
Posts: 3335 | Registered: Thursday, September 4 2003 07:00
Nuke and Pave
Member # 24
Profile Homepage #19
quote:
Originally written by Student of Trinity:

...
But this is part of my question. Are non-deterministic Turing machines of any real interest, or were they just a notion people passed through on the way to realizing that verification was what they really wanted to discuss?

As far as I know, they are to quantum computing what regular Turing Machines are to regular computing, but I might be wrong on that.

(As an amusing side-note, there are people already writing graduate dissertations about solving problems on quantum computers, despite the fact that such computers themselves are still over a decade away, by the most optimistic estimates.)

--------------------
Be careful with a word, as you would with a sword,
For it too has the power to kill.
However well placed word, unlike a well placed sword,
Can also have the power to heal.
Posts: 2649 | Registered: Wednesday, October 3 2001 07:00
Post Navel Trauma ^_^
Member # 67
Profile Homepage #20
Quantum computers aren't much like nondeterministic turing machines. For a start, they probably can't solve NP-complete problems in polynomial time. The conception of a quantum computer trying every possibility and picking out the right answer is more about bad science journalism than reality.

I'm not sure what nondeterministic turing machines are used for, but if you go down a level or two, nondeterministic finite state automata are useful for thinking about regular expressions. I imagine that nondeterministic turing machines are useful for similar things, where you're looking to see if a solution exists, but don't want your analysis to be bogged down with details about backtracking and search order and whatnot.

[ Wednesday, October 24, 2007 13:10: Message edited by: Khoth ]

--------------------
Barcoorah: I even did it to a big dorset ram.

New Mac BoE
Posts: 1798 | Registered: Thursday, October 4 2001 07:00
Law Bringer
Member # 2984
Profile Homepage #21
I'm still having trouble accepting the claim of NP-completeness. Here's my next idea that I think runs in polynomial time. It's not a good polynomial (possibly n^3 or even n^4), but the NP property of the algorithm has to rest in a single level - if we can show that a step is repeated n^j times, and is n^k big, we are still in Polynomial complexity in total.

------

1. Generate my triangle matrix outlined above - O(n^2)

2. Examine the diagonal i=j+a (i is column, j is row). a is 1 in this first cycle. Complexity O(n) as there are less than n such diagonals.

2.1 Examine each value M(i,j+a)=1 in the diagonal. Call this point M(x,y). Complexity O(n) as the diagonal is less than n long.

2.1.1 Examine the triangle of values described by i>j and i<x and j>y. (In graphical terms, this is the triangle between the diagonal and the point to the lower left of M(x,y) ).

In the beginning this triangle is empty and the step can be skipped. At the end, this triangle contains at most half the matrix and thus takes O(n^2) to search.

2.1.1.1 Find the largest value inside the triangle.

2.1.2 Set M(x,y) to this largest value + 1.

2.2 Repeat step 2 with a = a+1.

3. Search through the matrix to find the largest value. This has complexity O(n^2) again.

4. Return the largest value, and have some coffee. Aaah.

I'm telling you, it's polynomial in the worst case! :P

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
Shaper
Member # 32
Profile #22
I've done a revamp of the code listed above. It should run correctly now, though at the sacrifice of some efficiency. I've gotten the following run times:

STRING TIME(seconds)
a 0.00
aa 0.00
aaa 0.00
aaaa 0.00
aaaaa 0.01
aaaaaa 0.19
aaaaaaa 8.90
aaaaaaaa Pending...
The results for the final test are still pending. The rapid increase in time, while not completely eliminating the polynomial scale, certainly seems to indicate exponential.

Perhaps I'll trying coding up Aran's solution to see how it compares...

--------------------
Lt. Sullust
Quaere verum
Posts: 2462 | Registered: Wednesday, October 3 2001 07:00
Law Bringer
Member # 2984
Profile Homepage #23
I've implemented it (in PHP :P )

I'm seeing a rapid increase in time as letters are added.

But it's not immediately obvious whether it's a polynomial or a slow exponential, until I'm too tired to do the eye-balling. A sample of 281 letters had 14 seconds. 50 letters had 0.01 seconds. 100 letters had 0.25 seconds. 150 letters had 1.3 seconds.

Or, more succinctly:

IMAGE(http://ermarian.net/html/php/string/graph.php)

[ Wednesday, October 24, 2007 16:00: Message edited by: Xian ]

--------------------
The Noble and Ancient Order of Polaris - We're Not Yet Dead.
EncyclopediaArchivesStatsRSS (This Topic / Forum) • BlogNaNoWriMo
Did-chat thentagoespyet jumund fori is jus, hat onlime gly nertan ne gethen Firyoubbit 'obio.'
Decorum deserves a whole line of my signature, and an entry in your bookmarks.
Posts: 8752 | Registered: Wednesday, May 14 2003 07:00
By Committee
Member # 4233
Profile #24
quote:
Originally written by Hierogrammate:

Drew's solution is the same as mine. I wonder if there's something about learning to program that teaches you to think in a more programmish way.

—Alorael, who thinks Spiderweb is exactly the right place to throw NC-Complete problems. The only better forum for such issues is xkcd's.

Sorry - I realized that just as I finished, but I was dashing out the door at the time. All due props to Alo for the non-programmer idea.

--------------------
In today’s America, there are more World of Warcraft players than farmers.
Posts: 2242 | Registered: Saturday, April 10 2004 07:00

Pages