The end of an era. R.I.P. Diana the Valkyrie and its webmaster after 27 years in operation

Apr 18, 2024 - permalink

Back in 2021 I did a web crawl of the DtV Library, grabbing a copy of all the stories. I am more than happy to share them, but I'm a very infrequent contributor to this forum, and currently blocked from posting links.

Perhaps those of us who both care about restoring this library and have the tech knowledge to do so might be able to collaborate?

Apr 18, 2024 - permalink

Really???? Awwwww man. This stings a bit

Apr 18, 2024 - edited Apr 18, 2024 - permalink

Back in 2021 I did a web crawl of the DtV Library, grabbing a copy of all the stories. I am more than happy to share them, but I'm a very infrequent contributor to this forum, and currently blocked from posting links.

@FfejL If you upload it to mega.nz or something, you should be able to just post the link as text with it broken up a bit, like:

mega . nz/file/9v0kFbAQ#jW7Okc2XNQZYvVu-1-R8kRiVIvexKdSjr6LcdxqxEYA

 

Also, on a side note, in researching which of those old sites went away I was pleased to find that a female muscle portal that's been around about as long as DtV is still alive and kicking: Muscles of Dee Kay

It even still has the same "hot woman casually bulging her bicep" gif on the front page that's an old favorite of mine!

Apr 18, 2024 - permalink

Damn. Really? That was definitely my first foray down the rabbit hole that is female bodybuilding. Been hooked ever since.

Apr 19, 2024 - permalink

The mega URL got taken down by a copyright notice from awefilms. Not sure what their problem is, since it's content that is still publicly available on the internet, but it is what it is.

A post on another girlpower forum informed me about the effort going on here to recreate Diana's library, and the collection on Mega.

First off, there are 741 files in the collection that are just saves of Diana's 404 redirect, as you can check for yourself (since you {simoncop73} mentioned scripting, I assume you will understand this):

grep -rl ">That page can't be found<" *
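For anyone without grep handy, here is a rough Python sketch of the same check. The marker string is copied from the command above; the function name and the idea of reading each file as bytes are my own assumptions, not anything the original tools did.

```python
from pathlib import Path

# Marker text copied from the grep command above.
MARKER = b">That page can't be found<"

def find_placeholder_files(root, marker=MARKER):
    """Return the sorted paths under root whose contents include the marker."""
    hits = []
    for path in Path(root).rglob("*"):
        # Read as bytes so odd encodings in old HTML files cannot raise errors.
        if path.is_file() and marker in path.read_bytes():
            hits.append(path)
    return sorted(hits)
```

Calling find_placeholder_files(".") from the top of the unzipped collection should list roughly the same files as the grep, although this version also descends into hidden directories.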

Of those, I have 70 of the real files saved from Diana's site, and another I downloaded from the Wayback Machine some years back.

In total, I have about 320 files that are not matched at the same file location in your collection (270+ from Diana, the rest from Wayback). I say location, because the newstory/ directory includes files that are not properly categorized.

The program I had used for saving Diana's stories is wget. Frankly, it seems to be more capable than the tools you guys were using:

  • It timestamps my files, using the Last-Modified time sent by the server to set the modification date of the local saved file. If your tools have that feature, it wasn't turned on.
  • It preserves the site's directory structure well.
  • Although it has some irritating quirks that probably led me to miss a few files I assumed I'd downloaded, overall it seems to be more thorough.
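If your tools don't timestamp files themselves, the idea in the first point can be approximated after the fact. This is only a sketch, assuming you have kept the server's Last-Modified header value for each file; apply_last_modified is a hypothetical helper name, not part of wget or any other tool mentioned here.

```python
import os
from email.utils import parsedate_to_datetime

def apply_last_modified(path, last_modified_header):
    """Set a saved file's modification time from an HTTP Last-Modified value."""
    # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT".
    mtime = parsedate_to_datetime(last_modified_header).timestamp()
    # Set access time and modification time to the same value for simplicity.
    os.utime(path, (mtime, mtime))
    return mtime
```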

Unfortunately, while I used to save everything in Diana's library (just about, maybe not the foreign-language bookshelves), including stories that would later be deleted, my old collections have been lost to drive crashes and thieves. (If anything's recoverable, I don't expect it to be in the near future). When I rebuilt my collection on my current computer, I skipped most author bookshelves, limiting myself to authors whose tastes were (roughly) compatible with mine. And my ability to recover already-deleted files was limited to what was preserved by Wayback.

Also, the HTML files you saved from the Wayback Machine have been modified by Wayback. I prefer to add the suffix "id_" to the timestring directory in the Wayback URL to get the site's raw HTML file. I still have some Wayback-modified files, but I intend to replace them.
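To illustrate the "id_" trick, here is a small sketch that builds or converts Wayback URLs. The function names are made up for this example, and example.com stands in for whatever site you are retrieving.

```python
import re

def raw_wayback_url(timestamp, original_url):
    """Build a Wayback URL that asks for the capture without Wayback's rewriting."""
    return f"https://web.archive.org/web/{timestamp}id_/{original_url}"

def to_raw(wayback_url):
    """Convert an existing Wayback URL by putting "id_" right after the timestamp."""
    # Any two-letter modifier already present (such as "im_") is replaced.
    return re.sub(r"(/web/\d+)(?:[a-z]{2}_)?/", r"\1id_/", wayback_url, count=1)
```

For instance, to_raw("https://web.archive.org/web/20210101000000/http://example.com/a.htm") gives the same address with "20210101000000id_" as the timestring directory.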

Out of those 741, only 30 were stories; the rest were broken photos containing the 404 page. As I mentioned, the photos were scraped through the Wayback Machine. I checked the stories for the "page is missing" content, but did not bother for the images. Photos were a low priority.

I could've gotten the date from the timestamp, but it was not relevant for me and the date created was more useful for the scraping.

I would've encouraged you (and anyone) to post a version with your files merged and whatever fixes you consider worthwhile, but that might not be advisable now given the copyright notices. I myself fixed some of those in the meantime.

Apr 19, 2024 - permalink

The mega URL got taken down by a copyright notice from awefilms. Not sure what their problem is, since it's content that is still publicly available on the internet, but it is what it is.

Now that is absolutely a damn shame considering this post from Steve & Rowena earlier in this very forum thread.

Quoted below for posterity.

Awefilms Apr 01, 2024 - permalink

I am a relative stranger to GWM but David in our customer service dept. shared your forum post about DTV with me today. I heard a couple weeks ago about the passing of DTV and was sincerely saddened. A lot of time has flown by and things come and go, but Rowena and I have to say that if not for us coming across DTV back in 1998 on our 28k dial-up modem running on AOL, we would not have had the epiphany that made us start up Awefilms. Had we not seen there was a world community of female muscle fans like us, no matter how small in comparison to other fetishes, we would have never started our company. It is impossible to measure the enormous impact that site had on us over the years. RIP Diana. Steve & Rowena Scibelli

Apr 19, 2024 - permalink

To be fair, looking again, it apparently came from "awefilms@gmail.com". Not sure if it's really them or someone impersonating them since it's a gmail account and not their official domain.

Apr 19, 2024 - permalink

What have Awefilms got to do with DTV content? They have no claim on DTV material.

Apr 19, 2024 - permalink

Thanks for the tip, chipperpip! Here's the ZIP of everything in DtV's library circa 2021:

mega .nz/file/8KwDSayZ#fSK5la4nFvie5xN1czE8r8q3DfSz4cz4TEZ3Iuorkko

Apr 20, 2024 - permalink

Thanks for the tip, chipperpip! Here's the ZIP of everything in DtV's library circa 2021:

mega .nz/file/8KwDSayZ#fSK5la4nFvie5xN1czE8r8q3DfSz4cz4TEZ3Iuorkko

You got a strike too?

Apr 20, 2024 - permalink

Yeah, probably someone trolling.

I can repost those if anyone wants, but I'm currently in the process of trying to merge FfejL & simoncop73's versions, since although FfejL's was mostly redundant, it does have a fair number of text files with fewer encoding issues that I'm swapping in.

Apr 20, 2024 - permalink

Damn, IIRC femfortefan was still posting occasionally on his site. Wonder if anyone got caught with their pants down on this.

Femfortefan hasn't posted since 2021. He occasionally likes something on DeviantArt, but his page and website haven’t been updated for 3 years.

Apr 20, 2024 - permalink

Yup, my archive got DMCA'ed, too. I actually know the guy who runs AWEFilms, have pinged him to see what's up.

Apr 23, 2024 - permalink

Yup, my archive got DMCA'ed, too. I actually know the guy who runs AWEFilms, have pinged him to see what's up.

Is it still up?

Apr 23, 2024 - permalink

No. However, the DMCA did not come from AWEFilms, and the owner of AWEFilms is contacting Mega to let them know that. So it might get restored, we'll see.

Apr 24, 2024 - permalink

I was working on figuring out what files simoncop73 missed on Wayback, also which files are redundant. However, now that I know others like FfejL also ripped the site, the urgency is gone. Whatever temporary roadblocks may exist for the moment, I can rest easy that the great bulk of the library will be preserved.

Unfortunately, I wasn't able to download FfejL's zip before it got taken down, but that's another reason to hold off until I see what he has. However, if these zips get too far over 600MB, maybe you guys could consider splitting them? Unlike most sites, Mega downloads to the browser process, and only then is it decoded and written to disk. The first night, I didn't have the free memory to finish saving it to my drive. I ended up doing it the next day with my computer freshly started.

chipperpip: From the sound of it, you are using the simon/mkr archive as your base, and filling in with FfejL's. Is that simply because it was first, or because it is better/more complete? I've already explained that while I'm grateful that the stories are being preserved, for my preferences, that archive has limitations: error-message files, redundant files, unorganized files, Wayback-modded files, no timestamps. Of course, I don't know how FfejL's archive compares.

Apr 24, 2024 - edited 1 day ago - permalink

@BereavedPaul

Because it was the most complete, and the one I started with. Whatever issues you had with the simoncop73/mkr version, FfejL's would have mostly shared them, as shown by the redundant hash comparison I did to remove the exact duplicates between the sets, which only left about 1,300 unique files out of his original 14,500.
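For anyone wanting to repeat that kind of comparison on their own copies, a minimal sketch of a hash-based check follows. It is not the exact procedure I ran; SHA-256 and the function names are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def unique_to_second(base_dir, other_dir):
    """List files in other_dir whose content matches no file anywhere in base_dir."""
    base_hashes = {file_digest(p) for p in Path(base_dir).rglob("*") if p.is_file()}
    return sorted(
        p for p in Path(other_dir).rglob("*")
        if p.is_file() and file_digest(p) not in base_hashes
    )
```

Because it compares content rather than names, a file that was saved under a different path in the second archive still counts as a duplicate.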

I'm not really concerned about things like the Wayback Machine metadata and timestamps, as long as the content itself is viewable. I actually prefer to leave stubs of things that couldn't be retrieved in, since those make it more obvious what still needs to be recovered in the future (possibly from private collections, since I think the Wayback Machine will be mostly tapped out once I'm done).

I've finished restoring the HTML/Multimedia Stories folder as much as I was able, partially from my own decade-and-a-half-old downloads (it was one of my main concerns, since there's some pretty classic female muscle artwork in there).

I also used wbm-dl to grab my own dump of the library, which seemed to get some things the previous attempts missed.

I might give these two downloaders a try to see if there's much difference, although they both seem like more of a pain to set up.

 

Actually, mentioning the HTML section gave me a good idea:

Here's a download for my restored version of just the "html/multimedia" section of illustrated stories:

https://www.mediafire.com/file/5rcy2qgaa3bq6v...

The password is "dianathevalkyrie", index.htm is the newer index page, index0.html is the older (and arguably more organized) one from around 2016.

If anyone has any of the missing pictures/gifs, let me know and I can add them in before uploading the full version. There are a couple of gifs missing for the "Big Betty" story linked from html/jpeg/various.htm, for instance.

Apr 25, 2024 - permalink

I didn't look at that site too often but it was one of the very first of its type that I discovered when the internet was just gaining traction. My favorite story from there was called "Meg Tries It" - it's sad to see nostalgia disappear.

Well, at least we still have Zombo.

Apr 26, 2024 - permalink

@chipperpip

I'm not really concerned about things like the Wayback Machine metadata

Besides simply preferring the "real" original file, I find it a little creepy to have files on my computer that want to call Wayback scripts and other files when I view them. Of course, I would usually look at them in offline mode anyway, and Diana also had some absolute links calling her site. But if the latter concerns anyone, it's relatively easy to write a script to change them to relative links. As I noted, it's possible to get the raw file by using the "id_" suffix.
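A rough sketch of what such a link-rewriting script could look like is below. It is untested against the real library; relativize_links is a hypothetical helper, the site root is passed in as a parameter, and the regular expression only handles quoted href and src attributes.

```python
import posixpath
import re

def relativize_links(html, site_root, page_path):
    """Rewrite href/src values under site_root as paths relative to page_path.

    page_path is the page's own location inside the archive, e.g. "stories/x/page.htm".
    """
    page_dir = posixpath.dirname(page_path) or "."
    pattern = re.compile(
        r"""((?:href|src)\s*=\s*["'])"""
        + re.escape(site_root.rstrip("/"))
        + r"""(?:/([^"']*))?(?=["'])""",
        re.IGNORECASE,
    )

    def to_relative(match):
        # A link to the bare site root becomes a link to the archive's top directory.
        target = match.group(2) or "."
        return match.group(1) + posixpath.relpath(target, page_dir)

    return pattern.sub(to_relative, html)
```

Links to other hosts are left untouched, so anything still pointing at Wayback would need separate handling.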

and timestamps, as long as the content itself is viewable.

I do care, but I have to keep things in perspective. Much better to have the actual stories than a list of filenames with dates!

I actually prefer to leave stubs of things that couldn't be retrieved in, since those make it more obvious what still needs to be recovered in the future

Besides replacing those files with a list, they could be zeroed out, or substituted with much briefer content, e.g. "404".

I also used wbm-dl to grab my own dump of the library, which seemed to get some things the previous attempts missed.

It's true that there are files at Wayback that simoncop73 missed. I had saved a cdx search and was using it to compare Wayback's holdings against my collection and the Mega zip, but like I said, I've put that on hold until I see what all is in the consolidated collection.
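The comparison can be sketched roughly like this, assuming the saved cdx search is in the default plain-text form where each line is space-separated and the third field is the original URL. The function names and the idea of comparing on URL path alone are simplifications of my own.

```python
from urllib.parse import urlsplit

def cdx_paths(cdx_text):
    """Collect the URL paths listed in plain-text CDX output."""
    paths = set()
    for line in cdx_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Field 3 is the original URL; keep only its path, without the leading slash.
        path = urlsplit(fields[2]).path.lstrip("/")
        if path:
            paths.add(path)
    return paths

def missing_locally(cdx_text, local_paths):
    """Return the paths Wayback holds that are absent from the local collection."""
    return sorted(cdx_paths(cdx_text) - set(local_paths))
```

The local_paths argument would be the relative paths of the files in a collection, written with forward slashes so they line up with the URL paths.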

Apr 26, 2024 - permalink

You can just copy-paste the text into Notepad and save it as a .txt file, and you don't need to worry about anything. It's the most basic digital technology. And if it messes up the formatting, you can use LibreOffice instead.

Apr 26, 2024 - permalink

I also used wbm-dl to grab my own dump of the library,

I just looked up wbm-dl, and its documentation declares, "The files downloaded are the original ones not the Wayback Archive rewritten version," so hooray for that. It also talks about timestamps, but only seems to be talking about the WM save timestamps, not changing the time of the downloaded files.

@yotv

You can just copy-paste the text into Notepad and save it as a .txt file, and you don't need to worry about anything

That doesn't scale. Though I actually wrote a script a few years ago for stripping out the Wayback-added code, before I discovered the "id_" suffix.

2 days ago - permalink

@chipperpip

It's been a few days, and there hasn't been an updated zip or any news. Are you worried that the DMCA jackass is still lurking? Are you still in the process of trying out different Wayback rippers? Maybe you have not had the free time to sift through all the different sources, to figure out what's new and which version of files to use? Or maybe I misunderstood, and you're simply informing other people how they can do their own Wayback scrapes? If you could spare a few words to update us, I'd appreciate it.

1 day ago - edited 5 hours ago - permalink

I haven't had as much time as I'd like to go back to it after finishing the HTML folder. I'm hoping to finish the whole thing next week.

Speaking of which, I reuploaded my zip of the HTML/Multimedia stories folder in my previous post above, we'll see how that host works or if the troll is still active.

EDIT: Just a note, the troll is still active. They're claiming to be this company, "Internet Privacy Limited" now: https://find-and-update.company-information.s...

But they also gave the nonexistent email address thevalkyrie@comic.com for their contact info, which kind of confirms it's just someone fooling around, and as always DMCA claims don't seem to need much verification. I'll worry about the hosting later once I finish putting the package together.
