Author Topic: and CEP?  (Read 3848 times)

Legacy_Guest_nosecone2010_*

  • Newbie
  • *
  • Posts: 6
  • Karma: +0/-0
and CEP?
« Reply #15 on: July 14, 2010, 05:11:07 pm »


In my experience, the search facility on this site is not very clever: it only indexes thread titles, not post text. So giving threads meaningful titles is quite important. But it's not enough.

Try searching for CEP in, say, the Bioware Off-topic forum and see what happens ...
               
               

               


Modified by nosecone2010, 14 July 2010 - 04:17.
                     
                  


            

Legacy__six

  • Hero Member
  • *****
  • Posts: 1436
  • Karma: +0/-0
and CEP?
« Reply #16 on: July 14, 2010, 05:53:43 pm »


Well, assuming the current setup isn't going to change: if I was running CEP I'd set up a sticky pointing users to CEP's own existing separate forums, where they'd be more in control and wouldn't have to sift through non-CEP-related posts (and which would free up this category for people who need assistance with their content creation efforts). That might even help centralize CEP's posting and save them work compared to their current setup with the two separate forums - though obviously that solution gives them less visibility than they got from BioWare's promotion in the past.
               
               

               


Modified by _six, 14 July 2010 - 05:38.
                     
                  


            

Legacy_Jenna WSI

  • Hero Member
  • *****
  • Posts: 1837
  • Karma: +0/-0
and CEP?
« Reply #17 on: July 14, 2010, 06:27:15 pm »


I vote for CEP to have its own forum section.
               
               

               
            

Legacy_SHOVA

  • Hero Member
  • *****
  • Posts: 893
  • Karma: +0/-0
and CEP?
« Reply #18 on: July 14, 2010, 06:34:01 pm »


I vote for CEP to have its own sticky here in this section, but not its own section. CEP is community-made content. Yes, it is a great package that lots of people use, but at the end of the day, after everyone has gone to bed, it is still community-made content.
               
               

               
            

Legacy_420

  • Sr. Member
  • ****
  • Posts: 370
  • Karma: +0/-0
and CEP?
« Reply #19 on: July 15, 2010, 03:54:59 am »


               

Chris Priestly wrote...

I included it in the title so fans from the legacy boards would know where to find the discussion on these boards.



:devil:

Thanks Chris!

-420
               
               

               
            

Legacy_Kephisto

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
and CEP?
« Reply #20 on: July 15, 2010, 07:24:26 am »


               

Barry_1066 wrote...

CEP posts will start showing here as we try and rebuild the wealth of material from the site going down.

Hey, warm greetings to Barry_1066, 420, and the CEP Team! Likewise to fellow members of the community!

I've posted news in the legacy forums and perhaps it's relevant here. nwn.bioware.com/forums/viewtopic.html

** begin post

As you read this I’m backing up the Neverwinter Nights forums. All of them. Every post of every topic in every forum. Naturally, there’s good and bad news as to how I’m doing it. 

The good : So far 8,000 topics and most of the postings in them are already saved. 

The bad : That leaves something like 90,146 topics to go.

Due to the way BioWare's forums work I can't just back up one forum at a time and then move on to the next. The process also creates errors every several hundred topics or so, which means that out of a topic's multi-page discussion, one or two pages might link to another topic entirely. That might change once the process is complete.

For instance, the backup for “Sticky : Community Expansion Pack Support and FAQ” is 11 pages long with 161 replies, but pages 5 and 8 don’t link properly. The rest of that topic works fine. In theory that’s close to an 80% retention of the original topic. I hope even with such errors that’s far better than losing the entire topic.
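That retention figure checks out with a quick back-of-the-envelope calculation (treating each of the 11 pages as equal weight is my simplification):

```python
# Backup of the CEP FAQ topic described above: 11 pages total,
# of which pages 5 and 8 failed to link properly.
total_pages = 11
broken_pages = 2

retention = (total_pages - broken_pages) / total_pages
print(f"{retention:.1%}")  # prints 81.8%, i.e. "close to 80%"
```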

The process creates a separate HTML file for each forum page, which means that once the process is complete just about anyone can take all the files of any topic and upload them to a website. For the most part they’ll look just like the originals - minus the black background and a few icons.

We can also take the HTML files of the forum index pages, which list several topics at a time, so at a glance we can see the various topics and how many views and replies they had relative to others. Since they're HTML files we can also edit them at will, perhaps even combining related topics even if the originals were spread across several forums.

With luck everything should be backed up by Monday.

** end post
               
               

               


Modified by Kephisto, 15 July 2010 - 06:25.
                     
                  


            

Legacy_pkpeachykeen

  • Jr. Member
  • **
  • Posts: 89
  • Karma: +0/-0
and CEP?
« Reply #21 on: July 15, 2010, 07:11:17 pm »


               

Kephisto wrote...

As you read this I’m backing up the Neverwinter Nights forums. All of them. Every post of every topic in every forum. Naturally, there’s good and bad news as to how I’m doing it. 

The good : So far 8,000 topics and most of the postings in them are already saved. 

The bad : That leaves something like 90,146 topics to go.


That explains why mine is going so slow.
I'm doing the same thing: roughly 16,000 pages of posts backed up at the moment, and 200 reparsed into a generic XML schema for quick import into any CMS or database system.
Still working on my BioBoard to XML parser and the IRC-bot frontend for that, but the mirror is building nicely.

Any chance of some co-op on this? No reason for both of us to be building mirrors of the same thing. If you were to focus on mirroring it, I could focus on parsing it into a searchable database, for example. Dunno, but it would probably be better to work together than to do the same thing twice.
               
               

               
            

Legacy_chico400

  • Newbie
  • *
  • Posts: 37
  • Karma: +0/-0
and CEP?
« Reply #22 on: July 15, 2010, 07:36:09 pm »


Hey, my friend Peachy! Nice to see you here. This forum is better now that you're in the place.
               
               

               
            

Legacy_Kephisto

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
and CEP?
« Reply #23 on: July 15, 2010, 11:06:29 pm »


               

pkpeachykeen wrote...

That explains why mine is going so slow.
I'm doing the same thing, roughly 16,000 pages of posts backed up at the moment and 200 reparsed in a generic XML schema, for quick import into any CMS or database system.
Still working on my BioBoard to XML parser and the IRC-bot frontend for that, but the mirror is building nicely.

Any chance for some coop on this? No reason to both be building mirrors of the same thing. If you were to focus on mirroring it, I could focus on parsing it into a searchable database, for example. Dunno, but it would probably be better to work together instead of doing the same thing twice.

That’s great news, pkpeachykeen.

Your process is more advanced than mine. I’m up for cooperating on this, but unfortunately the way I’m doing it is really basic. I’m just downloading the entire forum directory, so one moment it’s gathering topics from the Scripting forums, the next it’s backing up the Server Admin topics, then it switches to CEP, then to General Discussion, etc. I can’t really control it.


The Neverwinter Nights forums are massive. So far I have 4.43 GB of data and I’m guessing there’s another 15 GB or so to go. It wouldn’t surprise me if it was another 50 GB. So it’s probably a good thing that both of us are working on it separately, because your process might back up topics mine hasn’t, and vice versa.


At the moment I’m at 33,514 downloaded pages and the counter estimates 125,733 to go. I’ve noticed many topics have navigational phrases and instructions in another language, so it seems I’m backing up topics multiple times, once in English and again in other languages. So just how big this project turns out to be is anyone’s guess.

Here’s hoping we finish before the deadline.
               
               

               


Modified by Kephisto, 15 July 2010 - 10:08.
                     
                  


            

Legacy_pkpeachykeen

  • Jr. Member
  • **
  • Posts: 89
  • Karma: +0/-0
and CEP?
« Reply #24 on: July 16, 2010, 12:50:25 am »


               

Kephisto wrote...

That’s great news, pkpeachykeen.

Your process is more advanced than mine. I’m up for cooperating on this but unfortunately the way I’m doing it is really basic. I’m just downloading the entire forum directory, so one moment its gathering topics from the Scripting forums, the next its backing up the Server Admin topics, then switches to CEP then to General Discussion, etc.. I can’t really control it.


Same here, I'm using WinHTTrack to spider and mirror it. So far it's going well, if slowly. I haven't gotten many more pages done; it keeps hanging on random ones. I have around 3 GB of data down now, not sure how many topics that is.
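Site-mirroring tools like WinHTTrack work by mapping each fetched URL to a local HTML file so links can be rewritten to point at the copies. A minimal sketch of that URL-to-path mapping (the directory layout and the sample URL shape are illustrative assumptions, not WinHTTrack's actual scheme):

```python
from urllib.parse import urlparse

def local_path_for(url: str, root: str = "mirror") -> str:
    """Map a forum URL to a local file path, roughly the way a site
    mirror lays out its files. Purely illustrative."""
    parsed = urlparse(url)
    path = parsed.path.strip("/") or "index.html"
    if not path.endswith(".html"):
        path += ".html"
    # Fold the query string (topic id, page number) into the filename
    # so different pages of one topic don't collide on disk.
    if parsed.query:
        safe_query = parsed.query.replace("=", "-").replace("&", "_")
        path = path[: -len(".html")] + "_" + safe_query + ".html"
    return f"{root}/{parsed.netloc}/{path}"

print(local_path_for("http://nwn.bioware.com/forums/viewtopic.html?topic=123&forum=42"))
# prints mirror/nwn.bioware.com/forums/viewtopic_topic-123_forum-42.html
```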

At the moment I’m at 33,514 downloaded pages and the counter estimates 125,733 to go. I’ve noticed many topics have navigational phrases and instructions in another language, so it seems I’m backing up topics multiple times, one in English and again in other languages. So just how big this project turns out to be is anyone’s guess.

Here’s hoping we finish before the deadline.

I noticed the same thing. Luckily, the classes/HTML code is the same.

I've been playing with this for most of today, and I put together a basic C# application that links an HTML and an XML parser. I've been able to parse the pages I'm downloading with almost a 98% success rate at the moment. I've been testing with a small set (137 random pages) and have been able to parse all of them successfully so far.

It's not terribly smart, but it should be able to coalesce topics into a single XML file (it uses a CRC32 checksum of the topic's title to generate the filename, so pages with the same topic title should generate the same filename).
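The CRC32-of-title naming can be sketched in a few lines. This is Python standing in for the C# application (Python's zlib provides CRC32; the `topic_` filename pattern is my guess based on the sample links further down, not the real parser's pattern):

```python
import zlib

def topic_filename(title: str) -> str:
    """Derive a stable filename from a topic title, so every page of
    the same topic coalesces into one XML file. Illustrative sketch."""
    checksum = zlib.crc32(title.encode("utf-8")) & 0xFFFFFFFF
    return f"topic_{checksum}.xml"

# Same title always yields the same name, so page 1 and page 7 of a
# thread land in the same file; different titles land in different ones.
assert topic_filename("and CEP?") == topic_filename("and CEP?")
assert topic_filename("and CEP?") != topic_filename("another topic")
```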

It takes the page, pulls the title, then goes through the body and copies out each post, including author, body text and signature. All are put in a basic XML file using the following schema:


    thread title
        me
        post text!
        This is a biomirror page
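The forum display has stripped the angle brackets from that schema, leaving only the placeholder text, so the element names below are guesses. A minimal sketch of emitting a file with that shape using Python's ElementTree (the real tool is C#):

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction: element and attribute names are guesses;
# only the placeholder text ("thread title", "me", ...) survived above.
thread = ET.Element("thread", title="thread title")
post = ET.SubElement(thread, "post")
ET.SubElement(post, "author").text = "me"
ET.SubElement(post, "body").text = "post text!"
ET.SubElement(post, "signature").text = "This is a biomirror page"

xml_bytes = ET.tostring(thread, encoding="utf-8")
print(xml_bytes.decode("utf-8"))
```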
I then whipped up an XSL stylesheet and linked them. It's currently able to parse Bioboards pages (as saved from any browser or most spiders) into something like this:

http://cx029a.dnsdoj..._1118418858.xml
http://cx029a.dnsdoj...c_874755937.xml
http://cx029a.dnsdoj..._2789098980.xml

It's not pretty or smart, and it needs some work, but it is relatively fast (it can go through a hundred pages in a few seconds). If you take a look at the source, it's really simple to read, and you could parse it with PHP, Java or C# and stuff it into any kind of database in a few minutes.
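Linking the XSL stylesheet to the XML output is done with a processing instruction at the top of each file; the stylesheet filename and the root element here are placeholders:

```xml
<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="biomirror.xsl"?>
<!-- the browser fetches biomirror.xsl and renders the document through it -->
<thread title="thread title">
  ...
</thread>
```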
               
               

               


Modified by pkpeachykeen, 15 July 2010 - 11:54.
                     
                  


            

Legacy_Kephisto

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
and CEP?
« Reply #25 on: July 16, 2010, 02:55:04 am »


Sounds like you’re making progress, pkpeachykeen. Preparing the backups for others to view, edit, or otherwise upload to a server and use is the next step. Seems you’ve already started on that with the C# application and stylesheets. That should make a lot of people happy. Good job.

As for me, I plan to keep the data in the same file format and visual design as the original.
               
               

               


Modified by Kephisto, 16 July 2010 - 01:56.
                     
                  


            

Legacy_pkpeachykeen

  • Jr. Member
  • **
  • Posts: 89
  • Karma: +0/-0
and CEP?
« Reply #26 on: July 16, 2010, 05:16:29 am »


               

Kephisto wrote...

Sounds like you’re making progress, pkpeachykeen. Preparing the backups for others to view, edit, or otherwise upload to a server and use, is the next step. Seems you’ve already started with that by making that C# application and stylesheets. That should make a lot of people happy. Good job.


Yup. At the very least, I'll be putting it on my (limited) home server and serving what I can from there. I may get something better, too, if I can find any hosting.

Also, I've redone a lot of that since posting, making it more readable, with more info and much better formatting. It's acting odd in Firefox (apparently the XSL parser is very strict and my code may be off), but in IE and Chrome it should look fine. The layout is incredibly lightweight, too (one 300x80-pixel banner and two 1x64-pixel gradients, for a total of ~40 KB of resources).



As for me, I plan to keep the data in the same file format and visual design as the original.


That's actually what I'm doing, partially. My parser is designed to work from the original format and design and make a copy, so...
It may be helpful to compare or cross-reference our mirrors when they're done, though, to make sure we got as much as possible. Just to make sure nothing important got missed.
Good luck with your mirror, though. Mine's progressing nicely, but I'm getting awful download speeds (a flat 100 KB/s all day; it hasn't cleared that).
               
               

               


Modified by pkpeachykeen, 16 July 2010 - 04:18.
                     
                  


            

Legacy_Kephisto

  • Newbie
  • *
  • Posts: 9
  • Karma: +0/-0
and CEP?
« Reply #27 on: July 16, 2010, 06:48:34 am »


Cross-referencing or comparing our mirrors (when they're done) sounds like a plan, pkpeachykeen. It's funny: the more files I download, the more related links it finds, which ups the count of files remaining. Now I'm at 42,498 files downloaded and 149,727 remaining. It's a good thing I selected just the forums directory and not the whole server.

So from here on I'll try not to mention how many files are remaining. The way things are going it might as well be 5 terabytes! With the deadline approaching I'm expecting the power to go out, the ISP to fail or a hard drive to crash. So far none of that has happened. So far.
Good luck on your end. If you see smoke rising in the distance, it's just my computer and ISP begging me for mercy.
               
               

               


Modified by Kephisto, 16 July 2010 - 05:51.
                     
                  


            

Legacy_SuperFly_2000

  • Hero Member
  • *****
  • Posts: 1292
  • Karma: +0/-0
and CEP?
« Reply #28 on: July 16, 2010, 08:46:39 am »


But guys... the old forums will stay where they are, just in archived form, and they will be locked of course. But it will all be there, searchable and so on... at least that's what I figured (?)
               
               

               
            

Legacy_ChaosInTwilight

  • Full Member
  • ***
  • Posts: 126
  • Karma: +0/-0
and CEP?
« Reply #29 on: July 16, 2010, 08:51:12 am »


               

Kephisto wrote...

With the deadline approaching I'm expecting the power to go out, the ISP to fail or a hard drive to crash. So far none of that has happened. So far.


Who gave you the impression that it'll vanish?

I'm pretty sure Woo, and probably Priestly, would like to give 'em a bloody nose for misleading you.
               
               

               


Modified by ChaosInTwilight, 16 July 2010 - 07:51.