It appears this Community will be on the Internet Archive's "Wayback Machine" if it is not deleted before 2 April.

It appears this Community will be on the Internet Archive's "Wayback Machine" if it is not deleted before 2 April. All we have to do is nothing. Don't know about you, but I'm pretty good at that!

Originally shared by Edward Morbius

The archiving of public Google+ content to the Internet Archive by the Archive Team has begun.

What does this mean, how does this affect you, and what can you do?

TL;DR: Most public Google+ content should live on at the Internet
Archive thanks to a fanatical bunch of volunteers, and you can help.


The Internet Archive

The Internet Archive is a digital library with the stated mission of "universal access to all knowledge". Though often known for its Web archive, the Wayback Machine (https://web.archive.org/), it also preserves texts, audio, video, software, and other formats. Think of the Wayback Machine as the Web’s attic, or basement, or storage locker.
https://www.archive.org/



The Archive Team

Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it’s lost forever.

https://www.archiveteam.org/

The Archive Team works closely with, but is not affiliated with, the Internet Archive. It runs projects to save bits of Web history that appear likely to be lost. Past projects include Mozilla Addons, Tindeck, and UOL Forums (the "Brazilian AOL"), whilst present projects include Flickr and Tumblr, as well as several manual projects.

Archive Team have previously saved other social media site content, and have several on their watchlists, including larger sites such as YouTube, Facebook, CodeAcademy, LiveJournal, Reddit, Twitter, WikiLeaks, and Wikipedia. This group thinks big.



The Google+ Archive Project

In December of 2018, Archive Team became aware that Google+ was shutting down. The G+MM / Plexodus effort became aware of Archive Team in January of 2019. We’ve been sharing information and planning over the past few months, including the copious information we’ve collected on Google+ size, activity, profiles, communities, and characteristics of the site and platform.

The actual archive code lives on GitHub:
https://github.com/ArchiveTeam/googleplus-grab

The more interesting project tracker, showing updates in realtime, is:
http://tracker.archiveteam.org/googleplus/

Note that this shows only 1/50th of the total project. "Items" are sitemap subsets of 100 profiles, and 50 batches of 1,000 sitemaps at a time, each with about 680 or so items, will be processed over the course of this archival. 50 * 1,000 * 680 * 100 = 3.4 billion, or the total number of Google+ profiles (as of March, 2017). There will be 34 million items (100 profiles each), total, in the overall process.
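
The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this post; the variable names are mine, and the 680 items per sitemap is the approximate value given above, so the totals are estimates.

```python
# Figures taken from the paragraph above; names are illustrative only.
BATCHES = 50                # the tracker shows one batch (1/50th) at a time
SITEMAPS_PER_BATCH = 1_000
ITEMS_PER_SITEMAP = 680     # "about 680 or so", so treat this as approximate
PROFILES_PER_ITEM = 100

# Items visible in a single tracker batch.
items_per_batch = SITEMAPS_PER_BATCH * ITEMS_PER_SITEMAP

# Items and profiles across the whole project.
total_items = BATCHES * items_per_batch
total_profiles = total_items * PROFILES_PER_ITEM

print(f"items per tracker batch: {items_per_batch:,}")
print(f"total items:             {total_items:,}")
print(f"total profiles:          {total_profiles:,}")
```

So a single tracker view should top out at roughly 680,000 items, against about 34 million for the full run.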



How does this affect you as a Google+ user?

If you do absolutely nothing, there is a very good chance that much of your public Google+ content will be preserved by Archive Team, on the Internet Archive, and will be publicly visible there.

▪ If you do want this to happen … you’re in luck. Don’t delete your Google+ content or profile and it should be saved.

▪ If you don’t want this to happen, you can request removal of specific items through the Internet Archive’s procedure:
https://help.archive.org/hc/en-us/articles/360018138951-How-do-I-remove-an-item-page-from-the-site-

▪ If you want to help, keep reading.


Limitations

There are a few limitations to this project:

▪ Only public content that is presently available on Google+ is being included. Private posts and any previously deleted content will not be saved. (Content that was archived earlier and has since been deleted will remain available.)

▪ Full post comments may not be archived. Google+ allows up to 500 comments per post, but only presents a subset of these as static HTML. It’s not clear that long discussion threads will be preserved. Historically they have not been.

▪ Image and video content may not be preserved at full resolution. This will apply mostly to high-def image and video content, though photographers may want to be aware.

▪ Content archival is subject to the rate at which the project can proceed and any limitations imposed outside its control. From past experience, the Archive Team can suck in amazing amounts of data quickly, and general success is likely.


What can you do to help?

Contributions can be made in the way of funds or volunteering services, particularly as an archive Warrior, running an archive instance yourself.

The Internet Archive is fueled by donations, which provide servers, disk, and bandwidth to receive and share content. It costs the Archive about $2,000 to host 1 terabyte of data:
https://archive.org/donate


Donate to the Archive Team directly

For the most part, contributing to the Internet Archive is strongly encouraged, as they do the heavy lifting, but Archive Team has its own smaller contributions project:
https://opencollective.com/archiveteam


If you have the technical resources and skills, run a Warrior instance

People with access to large-scale storage and high-bandwidth network
connections are especially appreciated.

What you’ll need:

▪ A desktop, server computer, or "cloud" hosted system(s).

▪ A virtual machine host, such as VirtualBox, VMware, Docker, or Hyper-V.

▪ At least 60 GB of free disk space.

▪ Sufficient memory for the virtual machine, probably 1-2 GB.

▪ A sufficiently high-bandwidth connection. 100 Mb/s or better is recommended.

▪ Skills and understanding to run all of this.


If you don’t understand part or any of this or the referenced documentation, and cannot get up and running by yourself, we’ll manage without you. Self-supporting volunteers are appreciated.

Archive Warriors volunteer their time, resources, and services; there is no compensation. If you wish to solicit donations on your own, you may do so.

There is a set of requirements for your Internet connection itself, along with additional information, instructions, troubleshooting, and guidance, at the Archive Team Warrior Wiki page:
https://www.archiveteam.org/index.php?title=ArchiveTeam_Warrior



You can request or add specific URLs to the Internet Archive directly

It’s possible to save items directly to the Internet Archive by other mechanisms. This is independent of the Archive Team’s GooglePlus project and does not affect either the content they collect or the fetchlist compilation.

Different methods suit single items or large-scale batches (100s, 1,000s, or 1,000,000s of requests). So long as requests are legitimate, they are actively encouraged by the Archive.


Using the Wayback Machine

Single pages may be saved by navigating to https://web.archive.org/ and entering the URL into the "Save Page Now" form (should be on the right side of the page).

Using DuckDuckGo

The `!wayback` bang search will look up an archived page. If the page is not already archived, you can save it from there.

https://duckduckgo.com/bang

There is also a `!save` bang, but it is currently broken.

Using Internet Archive browser extensions

There are extensions for all major browsers as well as iOS and Android which allow interactions with the Internet Archive, including a "save page now" feature. See: "If you See Something, Save Something"
https://blog.archive.org/2017/01/25/see-something-save-something/

Using the "save" URL format

If you want to save a large number of URLs, or save them from a command line, you can use a specific URL format to do so:

https://web.archive.org/save/<URL>

Where `<URL>` is the page you want to save. For example, to save the Google+ Mass Migration Community homepage, at
https://plus.google.com/communities/112164273001338979772, you’d use:

https://web.archive.org/save/https://plus.google.com/communities/112164273001338979772

This can be scripted for both individual and large-scale batch archival. See the linked article for a simple script and use with a list of URLs to archive.
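
As a rough illustration of what such scripting might look like (this is not the script from the linked article), here is a minimal Python sketch that builds "save" URLs and requests them one at a time. The function names, the User-Agent string, and the ten-second pause are all my own assumptions; the pause is a courtesy guess, not a documented limit.

```python
import time
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def save_request_url(page_url: str) -> str:
    """Return the Wayback Machine "save" URL for page_url."""
    return SAVE_PREFIX + page_url.strip()


def archive_urls(page_urls, delay_seconds: float = 10.0) -> dict:
    """Request a save for each URL in turn and record what happened.

    Returns a dict mapping each page URL to the HTTP status code,
    or to a "failed: ..." string if the request raised an error.
    delay_seconds is an assumed courtesy pause between requests.
    """
    results = {}
    for page_url in page_urls:
        request = urllib.request.Request(
            save_request_url(page_url),
            # Hypothetical identifier; use something that describes you.
            headers={"User-Agent": "gplus-save-sketch/0.1"},
        )
        try:
            with urllib.request.urlopen(request, timeout=60) as response:
                results[page_url] = response.status
        except OSError as error:
            # URLError, HTTPError, and socket timeouts all derive from OSError.
            results[page_url] = f"failed: {error}"
        time.sleep(delay_seconds)
    return results
```

To use it, pass a list of page addresses, for example `archive_urls(["https://plus.google.com/communities/112164273001338979772"])`, or read one URL per line from a text file and pass those in.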



How can I specifically access archived content later?

If you know the URL of the item, you can request it directly from the Internet Archive. The browser extensions above can simplify this for you. There are also specific tools for querying and interacting with the Wayback Machine repository.
Again, using DuckDuckGo (especially if it is set as your browser’s default search engine), you can access pages directly using the `!wayback` bang search, entered before the URL in your browser’s navigation bar.

There is a set of Wayback Machine APIs which can test for archives of a known URL.
https://archive.org/help/wayback_api.php
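
For instance, the availability check described on that page can be queried from Python. The sketch below is based on my reading of that documentation: it assumes the endpoint is `https://archive.org/wayback/available`, takes a `url` query parameter, and returns JSON with an `archived_snapshots` object holding a `closest` entry. The function names are mine; check the documentation before relying on the response shape.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as I understand it from the API page linked above.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str) -> str:
    """Build the availability query URL for page_url."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(response_body: str):
    """Return the closest snapshot's URL from a JSON response, or None.

    Assumes the response looks like
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    and that an unarchived page yields an empty "archived_snapshots".
    """
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url: str):
    """Ask the Wayback Machine whether page_url has an archived copy."""
    with urllib.request.urlopen(availability_query(page_url), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `lookup("https://plus.google.com/communities/112164273001338979772")` should return the address of an archived copy if one exists, and `None` otherwise.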

From a given Wayback Machine page, you can generally search for all pages under some specific URL. This is of mixed use for Google+
content, for reasons expanded on in the linked article.

There are tools to assist with rebuilding websites based on Wayback Machine archives. These may be useful for G+ content:
https://help.archive.org/hc/en-us/articles/360001834411-Can-I-rebuild-my-website-using-the-Wayback-Machine-


Thanks to ArchiveTeam for taking this on, and Fusl in particular for
answering my pesky questions about process and processing.

(This post has been adapted and condensed from the linked Reddit article. I'll be updating that post with new information or corrections.)

https://old.reddit.com/r/plexodus/comments/az285j/saving_of_public_google_content_at_the_internet/
