Science/Tech

Title: Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??
Source: none
URL Source: http://none
Published: Nov 20, 2009
Author: X-15
Post Date: 2009-11-20 03:04:08 by X-15
Keywords: None
Views: 6088
Comments: 170

"Web spiders/crawlers: programs that search websites looking for specific words or patterns to compile into a database."

A popular gun website I visit had 20 running; if 4um has fewer, then I assume it has a lower profile in the eyes of FedGov.


#1. To: X-15, Pinguinite, christine (#0)

Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??

You'd need access to christine's server logs to get a good idea. However, there are many kinds of spiders, some quite difficult to detect.

Here is a quickie spider that I wrote. It runs on the Mac, OS X 10.5 Leopard. However, it is a standard Bash script and should work easily on Linux or Unix systems, probably in a Cygwin setup on Windows too.

The script uses Lynx, a venerable text-only browser, to fetch my Comments page to a file called lynxhtmlsource1. It then uses the stream editor Sed to parse this captured HTML file, scanning the right column for news stories and capturing the thread names and URLs at 4um to a file called lynxhtmlsource2.

It then uses Lynx to capture each thread to a separate file by thread number in a subdirectory called '4um'.

You could build a database or use text search tools like grep to mine the stored threads for info.
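As a rough sketch of that mining step: assuming the threads were saved as raw HTML files in the '4um' subdirectory, something like the following would list every stored thread that mentions a keyword. The function name and the keyword file are hypothetical, made up for illustration; only standard grep options are used.

```shell
# Sketch only: list stored threads that mention any keyword.
# flag_threads and the keyword file are illustrative assumptions,
# not part of the spider script itself.
flag_threads() {
    dir="$1"        # directory of captured threads, e.g. 4um
    keywords="$2"   # text file with one keyword per line (no blank lines)
    # -r  search the directory recursively
    # -i  ignore case
    # -l  print only the names of files that match
    # -F  treat each keyword as a fixed string, not a regex
    # -f  read the keywords from a file
    grep -rilF -f "$keywords" "$dir" | sort
}

# Example: flag_threads 4um keywords.txt
```

Each name printed is a thread number that a human could then pull up and review. A blank line in the keyword file would match every thread, so keep the file clean.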

No doubt, various federal agencies and people like ADL or SPLC use scripts like this to capture many forums and use grep and other search tools to scan each thread captured for relevant keywords to flag them for review by human beings.

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

#!/bin/bash

# lynx directory
LYNX=/usr/bin/lynx

# IP host
HOST='freedom4um.com'
DIRNAME='4um'

# create the output directory on first run
if [ ! -d "$DIRNAME" ]; then
    mkdir "$DIRNAME"
fi

# fetch the raw HTML of the Comments page
$LYNX -source "http://$HOST/cgi-bin/latestcomments.cgi?SNSearch=1&EM=on&Fm=&To=TooConservative" > lynxhtmlsource1

sed -n "/

/,/


/ s//http://$HOST1/p" lynxhtmlsource1 > lynxhtmlsource2

FETCHCOUNT=0
for URL in $(cat lynxhtmlsource2); do
    # thread number: the 6 characters starting at offset 49 of the URL
    ARTICLE="${URL:49:6}"
    # echo "#$ARTICLE URL: $URL"
    # only fetch threads that have not been captured yet
    if [ ! -f "$DIRNAME/$ARTICLE" ]; then
        $LYNX -source "$URL" > "$DIRNAME/$ARTICLE"
        echo "fetched $ARTICLE..."
        let FETCHCOUNT+=1
    fi
done

# clean up the temporary files
rm -f lynxhtmlsource1 lynxhtmlsource2

if [ $FETCHCOUNT -ne 0 ]; then
    echo "$HOST: $FETCHCOUNT fetched"
else
    echo "no new articles on $HOST"
fi

exit

[Apologies. Neil's 4um code won't let me post this spider code accurately. It's screwing my Sed command and turns the text red.]

TooConservative posted on 2009-11-20 6:15:42 ET


#3. To: TooConservative (#1)

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

thank you, TC.

christine posted on 2009-11-20 10:52:30 ET


#4. To: christine (#3)

BTW, by changing only a few lines in the above code, I could slowly download your entire database and reconstruct each poster's remarks. Essentially, your MySQL database could be replicated by downloading all the threads and parsing the user comments into a new MySQL database. I'm sure Neil could point this out as well, probably better than I can. This is why it's worth watching your server logs for an IP that downloads every thread, especially one that walks through the threads in the database sequentially.
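A minimal sketch of that kind of log check, under some loud assumptions: the access log is in the Apache common/combined format (client IP in field 1, requested path in field 7), and thread URLs can be recognized by a fixed substring that you pass in. The function name, the substring, and the threshold are all hypothetical. It only counts distinct threads per IP; it does not try to detect sequential order.

```shell
# Sketch only: list client IPs that requested many distinct threads.
# Assumes Apache common/combined log format: IP is field 1, path is field 7.
# heavy_readers, the URL substring and the threshold are illustrative.
heavy_readers() {
    logfile="$1"     # path to the access log
    pattern="$2"     # fixed substring that marks a thread URL
    threshold="$3"   # minimum number of distinct threads to report
    awk -v pat="$pattern" -v min="$threshold" '
        index($7, pat) > 0 {
            key = $1 SUBSEP $7
            if (!(key in seen)) {    # count each thread once per IP
                seen[key] = 1
                count[$1]++
            }
        }
        END {
            for (ip in count)
                if (count[ip] >= min)
                    print count[ip], ip
        }
    ' "$logfile" | sort -rn
}

# Example: heavy_readers access.log "ArtNum=" 500
```

The output is one line per IP, biggest reader first, as "count IP". An IP whose count approaches the total number of threads on the site is the one to look at.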

Anyway, this seemed a good thread to point this stuff out.

TooConservative posted on 2009-11-20 11:04:44 ET


#5. To: TooConservative (#4)

I could slowly download your entire database and reconstruct each poster's remarks.

can that be done on any forum?

christine posted on 2009-11-20 11:23:55 ET


#7. To: christine, TooConservative (#5) (Edited)

can that be done on any forum?

I was wondering how a certain poster appeared on a thread so quickly where the word heebs was used. I suspect a spider looking for certain keywords.

Critter posted on 2009-11-20 13:11:54 ET


#21. To: Critter (#7) (Edited)

I was wondering how a certain poster appeared on a thread so quickly where the word heebs was used. I suspect a spider looking for certain keywords.

Well, there is always the possibility of coincidence. But then again...

You shouldn't post anything assuming that it is somehow anonymous. That includes email and all internet activity. And don't assume your internal network is absolutely secure either. I don't think anything has been truly anonymous for at least the last five years, maybe longer. And certainly, the entire future will be recorded and used against you in a court of law.

Words to the wise.

TooConservative posted on 2009-11-20 19:40:41 ET

