Is OldWitt in danger of being taken down soon?

Started by GV, March 23, 2021, 03:46:23 PM

Baron Alexandreu Davinescu

They all seem to be located in one directory actually, so it should be simple to just delete it.  That's assuming I can scrape while logged-in at all, of course.  Kind of amazed I grabbed 5 gigs of data without getting blocked already, actually.

GV, I can upload this to Drive when it's done if you want, but don't be disappointed when you can't view it.  It'll never be conveniently viewable from Drive, since it's a full HTML backup and Drive doesn't support interlinking between files like that.  So you will be able to go through and view individual threads by manually selecting them by numbered directory, but not just through normal navigation.  In fact, until I go through and batch edit it to change the links so that they don't specifically reference the full location on my own computer, it won't work for you as normal even if you download it (since your computer would follow a link to D:/PC/FolderName/BlahBlah/thread/736737.html, but wouldn't find anything there).  I think I can write that code without too much trouble, though, and I'll be sure to make a backup on the Drive before I start fiddling with it.  I think your days of worrying about losing this data are over.  You'll be left with nothing to do (although again it would be good to be able to host both Republic's Witt and this one on the same server for posterity).
Alexandreu Davinescu, Baron Davinescu del Vilatx Freiric del Vilatx Freiric es Guaír del Sabor Talossan
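For readers wondering what that batch edit might look like, here is a minimal sketch, not the code actually used. It assumes the saved pages contain links that begin with the placeholder prefix from the post above (`D:/PC/FolderName/BlahBlah/`) and that each page's own location relative to the backup root is known. The function name `relativize` and the constant `ABSOLUTE_PREFIX` are hypothetical.

```python
import posixpath
import re

# Placeholder prefix taken from the example in the post; the real one would differ.
ABSOLUTE_PREFIX = "D:/PC/FolderName/BlahBlah/"

# Matches href="..." and src="..." attributes with double-quoted values.
LINK_ATTR = re.compile(r'(href|src)="([^"]+)"')


def relativize(html, page_path, prefix=ABSOLUTE_PREFIX):
    """Rewrite links that start with `prefix` so they are relative to the page.

    `page_path` is the page's own location relative to the backup root,
    for example "board/1.html". Links that do not start with `prefix`
    are left exactly as they are.
    """
    page_dir = posixpath.dirname(page_path) or "."

    def fix(match):
        attr, url = match.group(1), match.group(2)
        if not url.startswith(prefix):
            return match.group(0)
        target = url[len(prefix):]
        # Only the leading directories change; the file name is untouched.
        return f'{attr}="{posixpath.relpath(target, start=page_dir)}"'

    return LINK_ATTR.sub(fix, html)


if __name__ == "__main__":
    sample = '<a href="D:/PC/FolderName/BlahBlah/thread/736737.html">thread</a>'
    print(relativize(sample, "board/1.html"))
```

Because only the prefix is replaced, file names such as `736737.html` stay the same. A real run would loop over every `.html` file in the backup and should work on a copy, as the post suggests.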

                   

Baron Alexandreu Davinescu

Side note: this is one of the reasons Talossa is so fun.  You can just dive into a task that you'd never have any reason to do otherwise, and learn some new skills on the way!  I've gotten so much out of Talossa this way.

Baron Alexandreu Davinescu

Okay, all downloaded!  Zipping now and then I'll upload it.  11,710 threads in total.  It'll take probably about a half hour or so to zip, since it's 6 gigs and 64,128 files.  And then it'll take a while to upload.  As you might guess, my computer and my internet connection aren't top of the line.  But by morning, GV, you will be the proud owner of a genuine and reasonably complete backup of Wittenberg.

GV

Quote from: Sir Alexandreu Davinescu on March 25, 2021, 07:22:22 PM
Side note: this is one of the reasons Talossa is so fun.  You can just dive into a task that you'd never have any reason to do otherwise, and learn some new skills on the way!  I've gotten so much out of Talossa this way.

You and me both, Alexander.  Back in the day, I was making horrible websites, and through Talossa I've kept up my writing skills and gotten a view of the world I never, ever would have gotten otherwise.

GV

Quote from: Sir Alexandreu Davinescu on March 25, 2021, 07:37:39 PM
Okay, all downloaded!  Zipping now and then I'll upload it.  11,710 threads in total.  It'll take probably about a half hour or so to zip, since it's 6 gigs and 64,128 files.  And then it'll take a while to upload.  As you might guess, my computer and my internet connection aren't top of the line.  But by morning, GV, you will be the proud owner of a genuine and reasonably complete backup of Wittenberg.

Thank you!!  What I will do is go through and make sure all threads are represented.  This will not take too long.

As for internal file-linkage, I've always had the expectation people would have to do the searching for whatever on OldWitt by hand. 

If the batch-editing messes with the original thread numbers, don't do the batch editing.  The thread numbers are critical to future citation and research.

GV

Quote from: Sir Alexandreu Davinescu on March 25, 2021, 07:37:39 PM
Okay, all downloaded!  Zipping now and then I'll upload it.  11,710 threads in total.  It'll take probably about a half hour or so to zip, since it's 6 gigs and 64,128 files.  And then it'll take a while to upload.  As you might guess, my computer and my internet connection aren't top of the line.  But by morning, GV, you will be the proud owner of a genuine and reasonably complete backup of Wittenberg.

Tech seems to have caught up with our advanced needs in Talossa.

Baron Alexandreu Davinescu

Quote from: GV on March 25, 2021, 08:16:35 PM
Quote from: Sir Alexandreu Davinescu on March 25, 2021, 07:37:39 PM
Okay, all downloaded!  Zipping now and then I'll upload it.  11,710 threads in total.  It'll take probably about a half hour or so to zip, since it's 6 gigs and 64,128 files.  And then it'll take a while to upload.  As you might guess, my computer and my internet connection aren't top of the line.  But by morning, GV, you will be the proud owner of a genuine and reasonably complete backup of Wittenberg.

Thank you!!  What I will do is go through and make sure all threads are represented.  This will not take too long.

As for internal file-linkage, I've always had the expectation people would have to do the searching for whatever on OldWitt by hand. 

If the batch-editing messes with the original thread numbers, don't do the batch editing.  The thread numbers are critical to future citation and research.

11,710 threads is a lot to check, since you have to open each one manually.  If you are able to open and check each thread at a rate of one every five seconds without ever stopping or slowing down, that's more than sixteen straight hours of checking!  Unless there's some urgent need, I'd suggest it might save you a lot of time if you just gave me a little bit to sort it out so that it's navigable.  I don't know if I'll be able to get fancy stuff like searching working, but I bet I can get it a lot better-sorted for you than the current state.

In an hour it'll all be uploaded, compressed to 1 gig.

Baron Alexandreu Davinescu

Restarting the upload because I figured out the batch edit thing pretty quickly.  I think this should work on your computer now.  Once you unzip the 7z file, then you want to open up the Witt2 folder, then the talossa.proboards.com folder.  Inside of that folder is a file named index.html, and it should open normally with any web browser.  You can navigate through any page that the crawler could access, and the links should be purely relative to their location and operate normally.  In order to advance through multiple pages in a board with a large size, like the main one, you need to use the "next" link (or you can manually edit the URL to the intended number).  If you click the ellipsis, the dialogue box to pick a page will come up, but it won't work.  No external links were copied, so any links to other sites outside of talossa.proboards.com will be dead (but there will be an error page to point you to a live version, if one exists).  No images were downloaded, but that'll be something I can try in the future (why not, after all?)

To be clear, I've only done spot checks here and there, but I didn't find any missing pages.  If something's not working or missing, let me know and I'll see if I can figure it out.

EDIT: Uploaded and sorted.  Enjoy.

GV

Stunning.  I will take a look at everything this coming week.

Baron Alexandreu Davinescu

Working on grabbing images, but running into some problems with that amount of data.  Working on it, but expect nothing in the near term.  Seems pretty low priority anyway.

Everything work okay for you?  I know you'd been wanting this for years now, so I hope that this whole thing wasn't anticlimactic.

GV

Quote from: Sir Alexandreu Davinescu on April 05, 2021, 09:53:23 PM
Working on grabbing images, but running into some problems with that amount of data.  Working on it, but expect nothing in the near term.  Seems pretty low priority anyway.

Everything work okay for you?  I know you'd been wanting this for years now, so I hope that this whole thing wasn't anticlimactic.

Argh.  This got buried with other stuff I'm working on.  Alexander, I'll let you know on this by the end of this month.  Thanks already for an amazing amount of work on this.

What I will need to do is check to see if every thread number (save Chat) is covered.  Once I'm sure we can open everything without proprietary software, I'll call this project done.
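A coverage check like this does not have to be done by hand. The sketch below is only an illustration and makes two assumptions: that each thread is saved as `thread/<number>.html` somewhere under the unzipped folder (as in the example path earlier in this topic), and that a list of expected thread numbers is available from some other source. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming: a thread page is a file like thread/736737.html.
THREAD_FILE = re.compile(r"^(\d+)\.html$")


def found_thread_ids(root):
    """Return the set of thread numbers that have a saved page under `root`."""
    ids = set()
    for path in Path(root).rglob("*.html"):
        if path.parent.name != "thread":
            continue
        match = THREAD_FILE.match(path.name)
        if match:
            ids.add(int(match.group(1)))
    return ids


def missing_thread_ids(root, expected_ids):
    """Return the expected thread numbers with no saved page, in ascending order."""
    return sorted(set(expected_ids) - found_thread_ids(root))


if __name__ == "__main__":
    # "Witt2" is the unzipped folder named in this topic; the range is illustrative only.
    print(missing_thread_ids("Witt2", range(1, 101)))
```

The script only reads file names, so it needs nothing beyond a standard Python install and does not modify the backup.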

Baron Alexandreu Davinescu

No rush at all; I was just curious.  Take your time and whenever you get around to it, let me know if it works or if something's broken.

GV

Quote from: Sir Alexandreu Davinescu on April 05, 2021, 11:10:38 PM
No rush at all; I was just curious.  Take your time and whenever you get around to it, let me know if it works or if something's broken.

Sounds good.  Thanks again!

Baron Alexandreu Davinescu

I haven't been able to get the images yet, but I'm still working on this.  Just FYI.

GV

Quote from: Sir Alexandreu Davinescu on April 20, 2021, 07:51:56 AM
I haven't been able to get the images yet, but I'm still working on this.  Just FYI.

TY for your continuing efforts!