Science/Tech

Title: Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??
Source: none
URL Source: http://none
Published: Nov 20, 2009
Author: X-15
Post Date: 2009-11-20 03:04:08 by X-15
Keywords: None
Views: 5626
Comments: 170

"Web spiders/crawlers: programs that search websites looking for specific words or patterns to compile into a database."

A popular gun website I visit had 20 running; if 4um has fewer, then I assume it has a lower profile in the eyes of FedGov.


Begin Trace Mode for Comment # 4.

#1. To: X-15, Pinguinite, christine (#0)

Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??

You'd need access to christine's server logs to get a good idea. However, there are many kinds of spiders, some quite difficult to detect.
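As a rough sketch of what that log check could look like: the function below counts distinct self-identified crawlers in an access log. It assumes an Apache-style "combined" log format, where the user agent is the last quoted field. The function name `count_crawlers` and the `bot|spider|crawl` pattern are my own illustrative choices, not anything known about christine's server.

```shell
#!/bin/bash
# Count distinct self-identified crawlers in a combined-format access log.
# Assumption: splitting each line on double quotes puts the user agent
# in field 6 (request is field 2, referer is field 4).
count_crawlers() {
    local logfile="$1"
    awk -F'"' '{ print $6 }' "$logfile" \
        | grep -i -E 'bot|spider|crawl' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}
```

Called as `count_crawlers access.log` (a hypothetical file name), it prints a single number. It only catches spiders that announce themselves in the user agent string; one that pretends to be an ordinary browser would not be counted, which is the hard-to-detect case.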

Here is a quickie spider that I wrote. It runs on the Mac, OS X 10.5 Leopard. However, it is a standard Bash script and should work easily on Linux or Unix systems, probably in a Cygwin setup on Windows too.

The script uses Lynx, a venerable text-only browser, to fetch my Comments page to a file called htmlsource1. It then uses the stream editor Sed to parse this captured HTML file by scanning the right column for news stories, capturing the thread names and URLs at 4um to a file called htmlsource2.

It then uses Lynx to capture each thread to a separate file by thread number in a subdirectory called '4um'.

You could build a database or use text search tools like grep to mine the stored threads for info.
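A minimal sketch of the grep approach, assuming the threads have already been saved as one file each in a directory. The function name `flag_threads` and the keywords passed to it are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# List stored thread files that mention any of the given keywords.
# Usage: flag_threads DIR KEYWORD [KEYWORD...]
flag_threads() {
    local dir="$1"
    shift
    local pattern
    # Join the keywords into one alternation: word1|word2|...
    pattern=$(printf '%s|' "$@")
    pattern="${pattern%|}"
    # -r recurse, -l print file names only, -i ignore case, -E extended regex
    grep -r -l -i -E -e "$pattern" "$dir" | sort
}
```

For example, `flag_threads 4um spider crawler` would print the name of every saved thread containing either word, in any letter case. Those are the files a human reviewer would then open.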

No doubt, various federal agencies and people like ADL or SPLC use scripts like this to capture many forums and use grep and other search tools to scan each thread captured for relevant keywords to flag them for review by human beings.

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

#!/bin/bash
#
# lynx directory
LYNX=/usr/bin/lynx

# IP host
HOST='freedom4um.com'
DIRNAME='4um'

# Create the storage directory on first run.
if [ ! -d "$DIRNAME" ]; then
    mkdir "$DIRNAME"
fi

$LYNX -source "http://$HOST/cgi-bin/latestcomments.cgi?SNSearch=1&EM=on&Fm=&To=TooConservative" > lynxhtmlsource1

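# NOTE: the forum software ate the HTML tag patterns in this sed command
# (the two range addresses and the search pattern), so it will not run as shown.
sed -n "/[pattern lost]/,/[pattern lost]/ s/[pattern lost]/http://$HOST1/p" lynxhtmlsource1 > lynxhtmlsource2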

FETCHCOUNT=0
# Fetch each thread that is not already stored locally.
for URL in $(cat lynxhtmlsource2); do
    # Take the 6 characters at offset 49 of the URL as the article number.
    ARTICLE="${URL:49:6}"
    # echo "#$ARTICLE URL: $URL"
    if [ ! -f "$DIRNAME/$ARTICLE" ]; then
        $LYNX -source "$URL" > "$DIRNAME/$ARTICLE"
        echo "fetched $ARTICLE..."
        FETCHCOUNT=$((FETCHCOUNT + 1))
    fi
done

# Clean up the intermediate files.
rm -f lynxhtmlsource1 lynxhtmlsource2

if [ "$FETCHCOUNT" -ne 0 ]; then
    echo "$HOST: $FETCHCOUNT fetched"
else
    echo "no new articles on $HOST"
fi

exit 0

[Apologies. Neil's 4um code won't let me post this spider code accurately. It's screwing my Sed command and turns the text red.]
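Since the sed step above did not survive posting, here is a stand-in that does a similar job. It is not the original command, whose patterns are lost. The stand-in pulls every site-relative link (`href="/..."`) out of a saved HTML file and prefixes it with the host. The function name `extract_links` is made up for this sketch. It also assumes the thread links on the page are relative.

```shell
#!/bin/bash
# Stand-in for the mangled sed step (NOT the original command):
# print an absolute URL for every site-relative link in a saved HTML file.
# Usage: extract_links HTMLFILE HOST
extract_links() {
    local htmlfile="$1" host="$2"
    # grep -o prints each match on its own line, even several per input line.
    grep -o 'href="/[^"]*"' "$htmlfile" \
        | sed -e 's/^href="//' -e 's/"$//' -e "s|^|http://$host|" \
        | sort -u
}
```

Something like `extract_links lynxhtmlsource1 "$HOST" > lynxhtmlsource2` would slot into the script. Unlike the original, it grabs every relative link on the page rather than only the right column, so in practice you would add a grep to keep just the thread links. The offset in `${URL:49:6}` would also need checking against the real URL layout.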

TooConservative posted on 2009-11-20 6:15:42 ET


#3. To: TooConservative (#1)

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

thank you, TC.

christine posted on 2009-11-20 10:52:30 ET


#4. To: christine (#3)

BTW, by changing only a few lines in the above code, I could slowly download your entire database and reconstruct each poster's remarks. Essentially, your MySQL database could be replicated by downloading all the threads and parsing the user comments into a new MySQL database. I'm sure Neil could point this out as well, probably better than I can. This is why it is worth watching your server logs for an IP that downloads every thread, especially one that works through the threads sequentially.
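One hedged way to watch for that in the logs: count how many distinct URLs each client IP has requested and list the heaviest readers. This assumes an Apache-style access log with the IP in the first whitespace-separated field and the request path in the seventh. The function name `heavy_readers` and the threshold argument are illustrative.

```shell
#!/bin/bash
# Report client IPs that requested at least MIN distinct URLs.
# Assumption: Apache-style access log, IP in field 1, request path in field 7.
# Usage: heavy_readers LOGFILE MIN
heavy_readers() {
    local logfile="$1" min="$2"
    awk -v min="$min" '
        # Count each (IP, path) pair only once, so reloads do not inflate totals.
        !seen[$1 SUBSEP $7]++ { distinct[$1]++ }
        END {
            for (ip in distinct)
                if (distinct[ip] >= min)
                    print distinct[ip], ip
        }
    ' "$logfile" | sort -k1,1nr -k2,2
}
```

Running `heavy_readers access.log 500` (file name and threshold are placeholders) prints a count and an IP per line, largest first. It does not check whether the requests were sequential. A slow spider spread over many days, or over many IPs, would also slip under any fixed threshold.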

Anyway, this seemed a good thread to point this stuff out.

TooConservative posted on 2009-11-20 11:04:44 ET


Replies to Comment # 4.

#5. To: TooConservative (#4)

I could slowly download your entire database and reconstruct each poster's remarks.

can that be done on any forum?

christine posted on 2009-11-20 11:23:55 ET


End Trace Mode for Comment # 4.
