Author Topic: The Vault Preservation Project  (Read 6068 times)

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #75 on: October 03, 2012, 06:07:52 pm »


               <smiling benignly...>

@ Lovelamb: WooHOO! *dances like a goat in spring* My Favorite new evil diva! Thank you!
I also seriously want to see my stories in action and try to balance all those shining white do-gooders with a proper pinch of... <smut>
I was going to say soot, bird. I'm talking here. <flirting>
In front of Pen? Heavens forfend! <her knives *are* still sharp, i'd think>
which brings me to...

@ Pen: Dearheart! How *are* you?! What you been up to? *makes sure the exit is clear*
Where's my feathered cloak? <what he said, giggles>

<...at the ladies>
               
               

               


                     Edited by Rolo Kipp, 03 October 2012 - 05:09.
                     
                  


            

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #76 on: October 03, 2012, 06:43:47 pm »


               <thinking long and...>

At this point I'd like to state an opinion. <pontificating again, old man?>
Er, no. I just want it very clear this is only my opinion :-P <heh>
There is a fabulous wealth of modules on the Vault and I love the Vault and I hope people continue to make their modules available on the Vault. <but?>
But, I really think the Nexus and (to somewhat lesser degree) the ModDB do a better job (and getting better) of collecting modules. This opinion is colored, or rather *not* colored, from experience. I haven't tried to upload anything to either place, so I don't know that side of things.

What I'm getting around to (I *like* beating around bushes!) is that there's no place like the Vault for classic modules, but the newer options seem to me to be far more viable for new releases.

If we do the VPP right, my opinion could very well change, but I thought I'd mention it here, anyway :-P  Bottom line is that if I was to release a mod... <you mean finish it first, right?>
...I'd post it to the Vault first, but I'd also post it on the Nexus.

Then I'd look at the D/L figures and see where people are grabbing it. =) Analyse, modify, test. Repeat.

<...hard>
               
               

               
            

Legacy_acomputerdood

  • Sr. Member
  • ****
  • Posts: 378
  • Karma: +0/-0
The Vault Preservation Project
« Reply #77 on: October 03, 2012, 07:58:50 pm »


               

Rolo Kipp wrote...
@ meaglyn: But that is what I want! :-P The key value metadata, that is... preferably in CSV or Excel format. 

Would you be willing to share with ACD and incorporate that? He's sent me one updated version, why not another =)


it's probably not worth trying to combine the two systems.  i doubt meaglyn wrote hers in perl, so then either i'd be stuck trying to port her code or she would have to finish up her tool to grab everything.

i'd wager it won't be much harder for her to iterate through each project directory that's already been archived and build her metadata off of that.  probably slightly easier if i've already pulled down and broken out all the pieces needed.
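
For anyone following along, a minimal sketch of that post-processing pass in Python (not the Perl actually used here). It assumes a hypothetical archive layout with one sub-directory per project, and the record fields "project" and "files" are made up for illustration.

```python
from pathlib import Path


def build_metadata(archive_root):
    """Return one metadata dict per project directory under archive_root."""
    records = []
    for project_dir in sorted(Path(archive_root).iterdir()):
        # Skip stray files sitting next to the project directories.
        if not project_dir.is_dir():
            continue
        # List only the files directly inside the project directory.
        files = sorted(p.name for p in project_dir.iterdir() if p.is_file())
        records.append({"project": project_dir.name, "files": files})
    return records
```

Because it only reads what is already on disk, a pass like this can be re-run whenever the wanted fields change.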

i'll PM her the email i sent to you explaining everything if she wants to pick up the post-processing.

Getting the metadata into an easily imported format would make things vastly easier. I'd then use that CSV file to generate the projects. Then all I need to do is link up the files/screenies and comments.


is there a standard format the metadata needs to be in to import into your database?  or will your database read in the metadata based on the format the projects are in?  i guess whoever develops the system first gets to determine the format.

Actually, comments could be collected in a keyed file, also. Drupal gives each comment its own node and links the nodes to the project. So I'd just need a field for each comment with the unique identifier for the project... I think... :-P


yeah, completely lost me there.  hopefully somebody else knows what rolo's rambling about and can make it happen.
               
               

               
            

Legacy_meaglyn

  • Hero Member
  • *****
  • Posts: 1451
  • Karma: +0/-0
The Vault Preservation Project
« Reply #78 on: October 03, 2012, 08:31:23 pm »


               

it's probably not worth trying to combine the two systems.  i doubt meaglyn wrote hers in perl, so then either i'd be stuck trying to port her code or she would have to finish up her tool to grab everything.


It's actually "his". The picture just fit the story I'm working on. Needs a half-elf female and I was using Raptre Thanlis for all my testing so that seemed like the right picture at the time given the few default choices. Just haven't gotten around to finding and loading a custom picture...

Anyway, it's partly perl. But I ended up using jsoup and java to parse the HTML because it was good at handling
incomplete html. Many of the other HTML parsers I found wanted the whole page but there's so much noise there
I just used perl to strip down to only the main part. 
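
To illustrate the point about tolerant parsing, a minimal sketch in Python rather than jsoup/Java (a swapped-in technique, not the actual tool). It uses the standard library's html.parser, which does not need a complete page, to pull text and links out of a stripped-down fragment. The names FragmentText and parse_fragment, and the sample fragment below, are made up for illustration.

```python
from html.parser import HTMLParser


class FragmentText(HTMLParser):
    """Collect text chunks and link targets from a partial HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember where each anchor points; other tags are ignored.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep only non-empty text, trimmed of surrounding whitespace.
        text = data.strip()
        if text:
            self.chunks.append(text)


def parse_fragment(html):
    parser = FragmentText()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the fragment
    return parser.chunks, parser.links
```

Usage, with a fragment whose tags are never closed:

```python
chunks, links = parse_fragment(
    '<div class="desc"><p>A haunted keep <a href="files/keep.zip">Download'
)
```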

It should not be too difficult to rework what I have to post-process the downloaded stuff produced by your script.
I'll watch for your pm. It'll be cleaner and easier to maintain to do it post anyway.

Rolo, we should talk about what format you want the final output in. For now I'll continue working to an intermediate
format which we can either modify to suit or just translate to a "final" format when we know what that is.

Cheers,

Meaglyn
               
               

               


                     Edited by meaglyn, 03 October 2012 - 07:31.
                     
                  


            

Legacy_acomputerdood

  • Sr. Member
  • ****
  • Posts: 378
  • Karma: +0/-0
The Vault Preservation Project
« Reply #79 on: October 03, 2012, 08:38:38 pm »


               

meaglyn wrote...

It's actually "his"


nope, according to Internet Rule #35771, somebody with a female avatar shall be known and treated as a girl.

sorry
               
               

               
            

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #80 on: October 03, 2012, 09:05:38 pm »


               <looking optimistically...>

@ Meaglyn: Yes, we should :-) I'll have a lot more time for this on Sunday, though. I'm sneaking NwN bits in between paying work and can't dig into it right now.

I'll most likely be using the Migrate module for Drupal and will look at that for the format needed:

The migrate module provides a flexible framework for migrating content into Drupal from other sources (e.g., when converting a web site from another CMS to Drupal). Out-of-the-box, support for creating core Drupal objects such as nodes, users, files, terms, and comments are included - it can easily be extended for migrating other kinds of content.

Primarily, I'm probably looking at a Comma Separated Value file with the first row being the column (field) names. The fields needed vary with project type.
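
A rough Python sketch of what such a file could look like when the fields differ per project type. The header row is the union of every field seen, and missing values are left empty. The function name write_projects_csv and the field names in the usage example are hypothetical, not a format anyone has agreed on.

```python
import csv
import io


def write_projects_csv(records, out):
    """Write dict records as CSV with a header row; return the header used."""
    # Union of all keys, in first-seen order, becomes the header row.
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in fields that a given project type does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames
```

Usage, with two made-up project types:

```python
buf = io.StringIO()
write_projects_csv(
    [
        {"title": "Keep", "type": "module", "min_level": "5"},
        {"title": "Oak", "type": "hak", "tileset": "forest"},
    ],
    buf,
)
print(buf.getvalue())
```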

@ ACD: I just want the comment metadata to include a field with the name of whatever project it belongs to ;-P But I'm picking nits :-P
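
In Python terms, the comment file could be sketched like this: every comment row carries the name of its project, so the importer can link them up afterwards. The column names "project" and "body" and the function group_comments are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict


def group_comments(csv_text):
    """Group comment bodies by the project named in each row."""
    by_project = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_project[row["project"]].append(row["body"])
    return dict(by_project)
```

Usage, with made-up rows:

```python
comments = group_comments(
    "project,author,body\n"
    "Keep,ann,Great\n"
    "Oak,bob,Nice trees\n"
    "Keep,cy,Too hard\n"
)
```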

<...harried>
               
               

               
            

Legacy_Bannor Bloodfist

  • Hero Member
  • *****
  • Posts: 1578
  • Karma: +0/-0
The Vault Preservation Project
« Reply #81 on: October 03, 2012, 10:18:39 pm »


               Note: I hate this site.  Dang thing has, 3 times in a row, completely lost a post I made on this.

Rolo Kipp wrote...

<thinking long and...>

At this point I'd like to state an opinion. <pontificating again, old man?>
Er, no. I just want it very clear this is only my opinion :-P <heh>
There is a fabulous wealth of modules on the Vault and I love the Vault and I hope people continue to make their modules available on the Vault. <but?>
But, I really think the Nexus and (to somewhat lesser degree) the ModDB do a better job (and getting better) of collecting modules. This opinion is colored, or rather *not* colored, from experience. I haven't tried to upload anything to either place, so I don't know that side of things.


Great idea, but absolutely not workable.

Nexus absolutely forbids posting/re-posting of some other author's works.  Check their EULA for that, but I think it is fairly prominent in other locations as well.

As a team working to provide a "safe haven/backup" of the vault, that is one job.  Re-publishing to another site, well, that opens that nasty can of worms regarding copyrights and we all know that the worms involved have a tendency to exponentially multiply once various opinions get involved.  There are copyrights, fair usage rights, etc, none of which are lost/broken by posting something onto a site, regardless of the EULA of that particular site.  Yet to be tested in court, but easily found in written copyright laws.

Sure wish we could just get permission from all the authors with a single mass email attempt or something, but we all know that is not enough of an attempt to contact folks.  And we all know that the email addresses on 90% of the existing vault content are no longer valid, yet many of those various authors are still around in some fashion, sometimes with new nicks, sometimes with old nicks, sometimes just watching, and sometimes still contributing in some fashion.

Regardless of copyright issues, we would have to do our due diligence to attempt to contact the authors of the various projects, and then wait 5 years (is it 5 or 10?)  before any given project could be considered abandonware and thus free from copyright?  And according to Wikipedia, using a partial quote of the first page of data regarding the abandonware issue: "In most cases, software classed as abandonware is not in the public domain, as it has never had its original copyright revoked and some company or individual still owns exclusive rights. Therefore, sharing of such software is usually considered copyright infringement, though in practice copyright holders rarely enforce their abandonware copyrights."

Surely, this is not something that this project was intended to do.

<snip>
If we do the VPP right, my opinion could very well change, but I thought I'd mention it here, anyway :-P  Bottom line is that if I was to release a mod... <you mean finish it first, right?>
...I'd post it to the Vault first, but I'd also post it on the Nexus.
Then I'd look at the D/L figures and see where people are grabbing it. =) Analyse, modify, test. Repeat.
<...hard>


A great idea to cross-post any NEW works of your own to both locations.  
               
               

               


                     Edited by Bannor Bloodfist, 03 October 2012 - 09:19.
                     
                  


            

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #82 on: October 03, 2012, 10:41:25 pm »


               <grumping...>

I seem to be on a roll for not saying what I mean. :-P

Rolo thought he said...
there's no place like the Vault for classic modules, but the newer options seem to me to be far more viable for new releases.


I meant the last bit - that people should multiple-post new stuff and that the old stuff should be let sit on the Vault (and the VPP).

<...just 'cause he can>
               
               

               
            

Legacy_Bannor Bloodfist

  • Hero Member
  • *****
  • Posts: 1578
  • Karma: +0/-0
The Vault Preservation Project
« Reply #83 on: October 03, 2012, 11:07:02 pm »


               Yep, and that aspect I directly agreed with above... last line of text in the post.
               
               

               
            

Legacy_Tarot Redhand

  • Hero Member
  • *****
  • Posts: 4165
  • Karma: +0/-0
The Vault Preservation Project
« Reply #84 on: October 04, 2012, 12:20:57 am »


               In my wanderings around the net I have stumbled on something that might be of use to this project. It is called Spider.NET.1.4. To quote the help file :-

Quote

Spider is a .NET application which crawls websites and saves content and links to a Microsoft SQL Server Database.

End Quote

I would love to say it is a wonderful program but <sheepish grin> I have run into a couple of little problems. The first of which is I can't remember which precise website I found it on (Codeplex, SourceForge or Planet Source Code). The other is it requires a database called spider to be created before it can be used and as I haven't used sql for <mumbles under breath> years, I don't know how to do that.

In the hopes that someone else can get it working and then evaluate its usefulness or otherwise I have uploaded it to my dosbox/public folder. The complete package is in a plain ol' zip file to be found here.

TR
               
               

               


                     Edited by Tarot Redhand, 03 October 2012 - 11:26.
                     
                  


            

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #85 on: October 04, 2012, 12:25:07 am »


               <pulls out his...>

Hmmm... on codeplex?

Edit: At first scan, it does generically what ACD has put together to do specifically. That is, I don't think there'd be any advantage to spider.net over my experience with wget. Both simply grab too much stuff. What ACD & Meaglyn are doing is processing the crawl and only pulling in the project stuff.

But good looking out, mate! :-)

<...spider net and tries to catch one>
               
               

               


                     Edited by Rolo Kipp, 03 October 2012 - 11:45.
                     
                  


            

Legacy_Tarot Redhand

  • Hero Member
  • *****
  • Posts: 4165
  • Karma: +0/-0
The Vault Preservation Project
« Reply #86 on: October 04, 2012, 12:28:29 am »


               That was my main thought but I couldn't be quite sure. I do know there is a readme in there with an email if the author needs to be contacted.

Yes, just followed your link, Rolo, and that is the one.

TR
               
               

               


                     Edited by Tarot Redhand, 03 October 2012 - 11:32.
                     
                  


            

Legacy_kamal_

  • Sr. Member
  • ****
  • Posts: 347
  • Karma: +0/-0
The Vault Preservation Project
« Reply #87 on: October 04, 2012, 12:59:13 am »


               

Lovelamb wrote...

Sir, did you say the Vault is now read-only? I've devoted over a year of my recent life to working on an evil module that I doubt the Nexus, with their strict rules, would accept... (Should I kill myself for being late?)

I would like to help with backing up the Vault, though I might need an explanation as to how to upload the content to your site. I can save all the web pages and related files for now. You can sign me up for the first 10 pages (or 250 modules) on the module list. I'm not sure how the modules are ordered, hope everyone sees the same list. I have the disk space, but my upload speed isn't very high.

off topic, but the Nexus doesn't seem to have a problem with evil content. My evil campaign for nwn2 is up there. You can sell children into slavery and kill completely innocent people, among other not nice things.
               
               

               
            

Legacy_meaglyn

  • Hero Member
  • *****
  • Posts: 1451
  • Karma: +0/-0
The Vault Preservation Project
« Reply #88 on: October 04, 2012, 06:23:41 pm »


               

Rolo Kipp wrote...

<looking optimistically...>

@ Meaglyn: Yes, we should :-) I'll have a lot more time for this on Sunday, though. I'm sneaking NwN bit in between paying work and can't dig into it right now.

I'll most likely be using the Migrate module for drupal and will look at that for the format needed:

The migrate module provides a flexible framework for migrating content into Drupal from other sources (e.g., when converting a web site from another CMS to Drupal). Out-of-the-box, support for creating core Drupal objects such as nodes, users, files, terms, and comments are included - it can easily be extended for migrating other kinds of content.


I'll take a look at that.

Primarily, I'm probably looking at a Comma Separated Value file with the first row being the column (field) names. The fields needed vary with project type.


CSV will be a little tricky what with the free text description and comment fields.
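
For what it's worth, here is a small Python sketch of the tricky case: free text with commas, quotes, and line breaks. It assumes both the writer and the reader are real CSV libraries (here Python's csv module) rather than a plain split on commas. Whether a given importer copes the same way is something to check, not something this shows. The name roundtrip is made up.

```python
import csv
import io


def roundtrip(rows):
    """Write rows as CSV text, then read them back with the same dialect."""
    # newline="" keeps embedded line breaks exactly as written.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))
```

Usage, with a description that contains all three troublemakers:

```python
rows = [
    ["title", "description"],
    ["Keep", 'A "haunted" keep, with ghosts.\nBring torches.'],
]
same = roundtrip(rows) == rows
```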

A key/value dictionary allows the fields to be sort of self describing and we can use the same code for all the project types in theory.  I'll see what the migrator can use...
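
A minimal Python sketch of that idea, assuming JSON as the intermediate format (one object per line). That is an illustration only, not a decision anyone has made here. The function names and the fields in the usage example are hypothetical.

```python
import json


def dump_records(records):
    """Serialize dict records as one JSON object per line."""
    # Each record carries its own field names, so project types can differ.
    return "\n".join(json.dumps(rec, sort_keys=True) for rec in records)


def load_records(text):
    """Parse the one-object-per-line text back into dict records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Usage, with two made-up project types handled by the same code:

```python
records = [
    {"type": "module", "title": "Keep", "description": "Line one.\nLine two."},
    {"type": "hak", "title": "Oak", "tileset": "forest"},
]
text = dump_records(records)
restored = load_records(text)
```

Line breaks inside the free text are escaped by the JSON encoder, so each record still sits on a single line.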

Another good thing about doing it as a post-processor is we've got all the original data and can simply
re-process things as we refine what it needs to look like.


Cheers,
Meaglyn
               
               

               
            

Legacy_Rolo Kipp

  • Hero Member
  • *****
  • Posts: 4349
  • Karma: +0/-0
The Vault Preservation Project
« Reply #89 on: October 04, 2012, 06:43:28 pm »


               <reading a bit...>

Well, I'm not married to the CSV (that was a leftover from a migration I did about 18 months ago, btw).

Here's another Migrate quote :-)

Features:

  • An object-oriented architecture, allowing default behavior to be extended and/or overridden.
  • Built-in support for PDO (DBTNG), XML, CSV, JSON, and native MSSQL and Oracle API sources; extendable for other sources.
  • Built-in support for node, user, taxonomy term, comment, and file destinations; extendable for other destinations entities and fields.
  • Map tables maintain relationships between source data and the resulting Drupal objects.
  • Import operations can be rolled back, allowing simple trial-and-error development of migration processes.
  • Tools for managing dependencies between migrated content.
  • Automated management of memory usage and framework for performance logging.
  • A web UI supporting collaboration between stakeholders and implementors.

Perhaps xml or json would be less tricky?

<...on the side>
               
               

               


                     Edited by Rolo Kipp, 04 October 2012 - 05:45.