Science/Tech

Title: Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??
Source: none
URL Source: http://none
Published: Nov 20, 2009
Author: X-15
Post Date: 2009-11-20 03:04:08 by X-15
Keywords: None
Views: 2240
Comments: 170

"Web spiders/crawlers: programs that search websites looking for specific words or patterns to compile into a database."

A popular gun website I visit had 20 running; if 4um has fewer, then I assume it has a lower profile in the eyes of FedGov.

Begin Trace Mode for Comment # 119.

#1. To: X-15, Pinguinite, christine (#0)

Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??

You'd need access to christine's server logs to get a good idea. However, there are many kinds of spiders, some quite difficult to detect.

Here is a quickie spider that I wrote. It runs on the Mac, OS X 10.5 Leopard. However, it is a standard Bash script and should work easily on Linux or Unix systems, probably in a Cygwin setup on Windows too.

The script uses Lynx, a venerable text-only browser, to fetch my Comments page to a file called lynxhtmlsource1. It then uses the stream editor Sed to parse this captured HTML file by scanning the right column for news stories, capturing the thread names and URLs at 4um to a file called lynxhtmlsource2.

It then uses Lynx to capture each thread to a separate file by thread number in a subdirectory called '4um'.

You could build a database or use text search tools like grep to mine the stored threads for info.

No doubt, various federal agencies and groups like the ADL or SPLC use scripts like this to capture many forums, then use grep and other search tools to scan each captured thread for relevant keywords and flag it for review by human beings.

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

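#!/bin/bash
# Needs bash rather than plain sh: it uses bash parameter expansion and arithmetic.

# path to the lynx binary
LYNX=/usr/bin/lynx

# host to spider, and the subdirectory that holds the fetched threads
HOST='freedom4um.com'
DIRNAME='4um'

if [ ! -d "$DIRNAME" ]; then
    mkdir "$DIRNAME"
fi

# fetch the Latest Comments page as raw HTML
"$LYNX" -source "http://$HOST/cgi-bin/latestcomments.cgi?SNSearch=1&EM=on&Fm=&To=TooConservative" > lynxhtmlsource1

# The forum software ate the HTML tags in the original sed command, so its
# patterns are lost. What follows is a reconstruction, not the original: it
# assumes thread links have the form /cgi-bin/readart.cgi?ArtNum=NNNNNN (the
# form that the original substring offset of 49 lines up with), scans the
# whole page rather than one column, and writes one absolute URL per thread.
grep -o '/cgi-bin/readart\.cgi?ArtNum=[0-9][0-9]*' lynxhtmlsource1 | sort -u |
    sed "s|^|http://$HOST|" > lynxhtmlsource2

FETCHCOUNT=0
for URL in $(cat lynxhtmlsource2); do
    # the thread number is everything after "ArtNum="
    ARTICLE="${URL##*ArtNum=}"
    # echo "#$ARTICLE URL: $URL"
    if [ ! -f "$DIRNAME/$ARTICLE" ]; then
        "$LYNX" -source "$URL" > "$DIRNAME/$ARTICLE"
        echo "fetched $ARTICLE..."
        FETCHCOUNT=$((FETCHCOUNT + 1))
    fi
done

rm -f lynxhtmlsource1 lynxhtmlsource2

if [ "$FETCHCOUNT" -ne 0 ]; then
    echo "$HOST: $FETCHCOUNT fetched"
else
    echo "no new articles on $HOST"
fi

exit 0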

[Apologies. Neil's 4um code won't let me post this spider code accurately. It's screwing up my Sed command and turning the text red.]
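To make the grep idea concrete, here is a minimal sketch that is not part of the original spider. It assumes the threads were saved one file per thread in a directory such as 4um; the function name flag_threads and any keywords passed to it are made up for illustration.

```shell
# Sketch only: list saved threads that mention any of the given keywords.
# flag_threads is a hypothetical helper, not something from the spider itself.

# usage: flag_threads DIR KEYWORD [KEYWORD...]
flag_threads() {
    dir="$1"
    shift
    # join the remaining arguments into one alternation: word1|word2|...
    pattern="$(IFS='|'; printf '%s' "$*")"
    # -r recurse, -i ignore case, -l print file names only, -E extended regex
    grep -rilE -e "$pattern" "$dir" | sort
}
```

Calling flag_threads 4um militia patriot would print the names of the saved threads that mention either word, ignoring case. Keywords are treated as regular expressions, so any special characters in them need escaping.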

TooConservative posted on 2009-11-20 6:15:42 ET


#3. To: TooConservative (#1)

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

thank you, TC.

christine posted on 2009-11-20 10:52:30 ET


#4. To: christine (#3)

BTW, by changing only a few lines in the above code, I could slowly download your entire database and reconstruct each poster's remarks. Essentially, your MySQL database could be replicated by downloading all the threads and parsing the user comments into a new MySQL database. I'm sure Neil could point this out as well, probably better than I can. This is why it is worth watching your server logs for an IP that downloads every thread, especially one that walks through the threads in the database sequentially.
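A rough sketch of that kind of log check follows. It assumes an Apache-style access log (common or combined format) and thread URLs of the form /cgi-bin/readart.cgi?ArtNum=NNNNNN; the function name heavy_readers and the threshold are illustrative, and the real log layout on the server may differ.

```shell
# Sketch only: report IPs that fetched at least MIN distinct threads.
# Assumes common/combined log format, where field 1 is the client IP
# and field 7 is the request path. heavy_readers is a hypothetical name.

# usage: heavy_readers LOGFILE MIN
heavy_readers() {
    awk -v min="$2" '
        $7 ~ /readart\.cgi\?ArtNum=[0-9]+/ {
            split($7, parts, "ArtNum=")
            art = parts[2] + 0            # numeric thread id
            key = $1 SUBSEP art
            if (!(key in seen)) {         # count each thread once per IP
                seen[key] = 1
                count[$1]++
            }
        }
        END {
            for (ip in count)
                if (count[ip] >= min)
                    print count[ip], ip
        }
    ' "$1" | sort -rn
}
```

Running heavy_readers access.log 500 would list, busiest first, every address that pulled 500 or more different threads. An address near the top that never shows up anywhere else in the log is a good candidate for a spider.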

Anyway, this seemed a good thread to point this stuff out.

TooConservative posted on 2009-11-20 11:04:44 ET


#5. To: TooConservative (#4)

I could slowly download your entire database and reconstruct each poster's remarks.

can that be done on any forum?

christine posted on 2009-11-20 11:23:55 ET


#6. To: christine (#5)

can that be done on any forum?

Yes.

You only have to parse for the HTML tags and CSS classes. Not at all difficult.
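As a small illustration of parsing on tags and classes, here is a sketch that tallies comments per poster from one saved thread. The markup it looks for (a span with class "author") is purely hypothetical, since every forum uses its own tags and class names, and list_authors is a made-up function name.

```shell
# Sketch only: count comments per poster in a saved thread.
# Assumes each poster name appears as <span class="author">NAME</span>,
# which is a hypothetical layout, not the markup of any particular forum.

# usage: list_authors FILE
list_authors() {
    grep -o '<span class="author">[^<]*</span>' "$1" |
        sed -e 's/<[^>]*>//g' |                       # strip the tags, keep the name
        awk '{ n[$0]++ } END { for (a in n) print n[a] "\t" a }' |
        sort -rn
}
```

The same three steps (pick out the element, strip the markup, aggregate) carry over to any forum once you have looked at its page source and swapped in the real tag and class.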

I once wrote a Firefox extension that allowed me to entirely replace the look and feel of TOS, add backgrounds, embed YouTube videos in place of the YouTube links, implement my own browser-based bozo filter, etc.

It's quite easy. You have to have good server log analytic software to find out if your site is being mined. Now, 4um isn't really high traffic so you can probably get a good idea by looking at IP addresses. You should watch for IP addresses that only read threads (and that read every thread) and never post. Those lurkers can just as easily be spiders for ADL, FBI, SPLC, NetNanny, Google, etc. In fact, you should assume that you are being spidered this way until you can prove otherwise.
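One way to look for read-only visitors is sketched below. It assumes an Apache-style access log in common or combined format, where field 6 holds the quoted HTTP method, and it treats any POST request as posting. The function name silent_readers is made up, and the real log format may differ.

```shell
# Sketch only: list IPs that issued GET requests but never a POST.
# Assumes common/combined log format: field 1 is the client IP and
# field 6 looks like "GET or "POST (with a leading double quote).

# usage: silent_readers LOGFILE
silent_readers() {
    awk '
        { method = substr($6, 2) }          # drop the leading quote
        method == "GET"  { reads[$1]++ }
        method == "POST" { posts[$1]++ }
        END {
            for (ip in reads)
                if (!(ip in posts))
                    print reads[ip], ip
        }
    ' "$1" | sort -rn
}
```

The output is a count and an address per line, heaviest reader first. Ordinary lurkers will show up too, so the count is only a starting point for a closer look.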

And the spidering of your threads could just as easily come from multiple IP addresses. You can deter some of this by requiring the use of cookies but a competent programmer can fake that too.

You should assume that every word you put online will be recorded. The Feds are building huge new datacenters to store the content of the entire internet and all cell phones and landlines. They may make no use of that unless and until they detect (or need to chase) a domestic threat for terrorism or hate crimes. It is at that point that you will receive a subpoena for your server logs to obtain IP addresses (if they don't already have them by being on the backbone and sniffing everything) and an included order that you will not tell anyone that they are assembling evidence. After that, the ISPs for those requested IP addresses will get national security directives and subpoenas to provide their logs to identify who was behind each IP address, and they would also be silenced.

This is the new Soviet Amerika. Welcome to the gulag, comrade.

TooConservative posted on 2009-11-20 12:16:39 ET


#22. To: TooConservative, christine, Pinguinite, all (#6) (Edited)

whois.domaintools.com/liberty-post.net

What do you think of this?

wudidiz posted on 2009-11-20 19:59:03 ET


#23. To: All (#22) (Edited)

libertypost.net/

libertypost.org/

wudidiz posted on 2009-11-20 20:00:53 ET


#42. To: All (#23)

libertypost.net/

libertypost.org/

No one wants to touch this stuff.

Disconcerting.

wudidiz posted on 2009-11-21 20:26:01 ET


#43. To: wudidiz (#42)

What is your problem anyway?

She owns both domain names and points both of them to her site. If you use libertypost.net as your URL, it mostly works but you won't have the same browser cookies (it seems) so it may look or act a little differently on a few screens. You might have to sign in again or whatnot.

Libertypost.net points to the server for libertypost.org. Libertypost.com and libertypost.us are owned by the USPS and that isn't sinister either.

Pointing multiple domains at a single server or server farm is a very common practice. This is how places like Amazon and Newegg work too.

TooConservative posted on 2009-11-22 0:20:32 ET


#75. To: TooConservative (#43)

Libertypost.com and libertypost.us are owned by the USPS and that isn't sinister either.

Huh? Who would have known that LP is owned by the Post Office. So there is no real Goldi, it's some person sitting in a post office somewhere running the show. Well that explains why "Goldi" goes "postal" at times I guess.

What sort of detective work did you do to arrive at that conclusion?

FormerLurker posted on 2009-11-23 11:40:40 ET


#79. To: FormerLurker (#75)

Huh? Who would have known that LP is owned by the Post Office. So there is no real Goldi, it's some person sitting in a post office somewhere running the show. Well that explains why "Goldi" goes "postal" at times I guess.

USPS owns libertypost.us and libertypost.com. I can't figure out why but I assume it is historical.

Goldi owns libertypost.org and libertypost.net.

What sort of detective work did you do to arrive at that conclusion?

Umm...I looked them up at GoDaddy.com?

Just enter the domain name and hit Search. When the page comes up to tell you the name is already taken, click the link for info on who owns the domain.

Quite revealing. It's easy to find this stuff out.

TooConservative posted on 2009-11-23 13:02:38 ET


#119. To: TooConservative (#79)

USPS owns libertypost.us and libertypost.com. I can't figure out why but I assume it is historical.

It all makes sense now......I always thought Goldi-Lox was a tad bit postal. : )

abraxas posted on 2009-11-24 15:04:45 ET


Replies to Comment # 119.

        There are no replies to Comment # 119.


End Trace Mode for Comment # 119.
