4um: Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??



Science/Tech

Title: Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??
Source: none
URL Source: http://none
Published: Nov 20, 2009
Author: X-15
Post Date: 2009-11-20 03:04:08 by X-15
Keywords: None
Views: 6231
Comments: 170

"Web spiders/crawlers: programs that search websites looking for specific words or patterns to compile into a database."

A popular gun website I visit had 20 running; if 4um has fewer, then I assume it has a lower profile in the eyes of FedGov.


#1. To: X-15, Pinguinite, christine (#0)

Internet geeks here! Who can determine how many web spiders/crawlers are on 4um??

You'd need access to christine's server logs to get a good idea. However, there are many kinds of spiders, some quite difficult to detect.
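As a rough sketch of what checking the server logs could look like: the function below counts requests per self-identified crawler. It assumes an Apache-style "combined" access log, where the User-Agent is the last double-quoted field. The name count_crawlers and the keyword list are illustrative, not anything from christine's actual setup. It only catches spiders that announce themselves; the hard-to-detect kind will not show up here.

```shell
#!/bin/bash
# Sketch: count requests per self-identified crawler in an access log.
# ASSUMPTION: "combined" log format, where the User-Agent is the last
# double-quoted field on each line. count_crawlers is a hypothetical helper.
count_crawlers() {
    # Split each line on double quotes; in the combined format the
    # User-Agent is then field 6. $1 is the path to the log file.
    awk -F'"' '{ print $6 }' "$1" |
        grep -i -E 'bot|spider|crawl|slurp' |
        sort | uniq -c | sort -rn
}

# Run only when a log file is given, e.g.: ./crawlers.sh access.log
if [ -n "$1" ]; then
    count_crawlers "$1"
fi
```

Each output line is a request count followed by the User-Agent string, busiest crawler first.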

Here is a quickie spider that I wrote. It runs on the Mac, OS X 10.5 Leopard. However, it is a standard Bash script and should work easily on Linux or Unix systems, probably in a Cygwin setup on Windows too.

The script uses Lynx, a venerable text-only browser, to fetch my Comments page to a file called lynxhtmlsource1. It then uses the stream editor sed to parse this captured HTML file by scanning the right column for news stories, capturing the thread names and URLs at 4um to a file called lynxhtmlsource2.

It then uses Lynx to capture each thread to a separate file by thread number in a subdirectory called '4um'.

You could build a database or use text search tools like grep to mine the stored threads for info.
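A minimal sketch of the grep approach, assuming the threads were saved one per file in a directory such as the script's '4um'. The function name flag_threads and the keywords file (one keyword per line) are made up for illustration.

```shell
#!/bin/bash
# Sketch: list stored thread files that mention any keyword.
# ASSUMPTION: threads are saved one per file under a directory, and the
# keywords file holds one plain-text keyword per line (no blank lines,
# since a blank line would match every file). flag_threads is hypothetical.
flag_threads() {
    # $1 = directory of saved threads, $2 = keywords file.
    # -r recurse, -i ignore case, -l print only names of matching files,
    # -F treat keywords as fixed strings, -f read the keywords from a file.
    grep -r -i -l -F -f "$2" "$1" | sort
}

# Run only when both arguments are given, e.g.:
#   ./flag.sh 4um keywords.txt
if [ "$#" -eq 2 ]; then
    flag_threads "$1" "$2"
fi
```

The output is just the list of thread files worth a closer look, which is the kind of flag-for-review step described in the next paragraph.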

No doubt, various federal agencies and people like ADL or SPLC use scripts like this to capture many forums and use grep and other search tools to scan each thread captured for relevant keywords to flag them for review by human beings.

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

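#!/bin/bash
# bash rather than plain sh: ${URL:49:6} and let are bash features

# lynx directory
LYNX=/usr/bin/lynx

# IP host
HOST='freedom4um.com'
DIRNAME='4um'

# create the output directory on first run
if [ ! -d "$DIRNAME" ]; then
    mkdir "$DIRNAME"
fi

# fetch the Comments page as raw HTML
$LYNX -source "http://$HOST/cgi-bin/latestcomments.cgi?SNSearch=1&EM=on&Fm=&To=TooConservative" > lynxhtmlsource1

# extract the thread URLs from the right column
# NOTE: this sed command was mangled by the forum software when posted
# (see the note below the script). Its HTML patterns and backslashes are
# lost, so it is shown as posted and will not run as-is.
sed -n "/
/,/
/ s//http://$HOST1/p" lynxhtmlsource1 > lynxhtmlsource2

FETCHCOUNT=0
for URL in $(cat lynxhtmlsource2); do
    # the six characters at offset 49 of the URL are the thread number
    ARTICLE="${URL:49:6}"
    # echo "#$ARTICLE URL: $URL"
    # fetch only threads that have not been saved yet
    if [ ! -f "$DIRNAME/$ARTICLE" ]; then
        $LYNX -source "$URL" > "$DIRNAME/$ARTICLE"
        echo "fetched $ARTICLE..."
        let FETCHCOUNT+=1
    fi
done

# clean up the temporary files and report
rm -f lynxhtmlsource1 lynxhtmlsource2
if [ $FETCHCOUNT -ne 0 ]; then
    echo "$HOST: $FETCHCOUNT fetched"
else
    echo "no new articles on $HOST"
fi
exit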

[Apologies. Neil's 4um code won't let me post this spider code accurately. It's screwing my Sed command and turns the text red.]
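Since the sed line did not survive posting, here is a sketch of what an extraction step could look like. This is not the original command. It assumes thread links in the page source contain /cgi-bin/readart.cgi?ArtNum= followed by digits, which is only inferred from the script's ${URL:49:6} offset (that prefix plus http://freedom4um.com is exactly 49 characters). The function name extract_thread_urls is made up. Unlike the original, it scans the whole page rather than just the right column.

```shell
#!/bin/bash
# Sketch only: NOT the original sed command, whose patterns were lost.
# ASSUMPTION: thread links in the page source contain
# "/cgi-bin/readart.cgi?ArtNum=<digits>". That form is inferred from the
# script's ${URL:49:6} offset, not confirmed by the post.
HOST='freedom4um.com'

extract_thread_urls() {
    # $1 = saved HTML page. Pull out each link target, drop duplicates,
    # and prefix the host to make absolute URLs.
    grep -o '/cgi-bin/readart\.cgi?ArtNum=[0-9]*' "$1" |
        sort -u |
        sed "s|^|http://$HOST|"
}

# Run only when a file is given, e.g.:
#   ./extract.sh lynxhtmlsource1 > lynxhtmlsource2
if [ -n "$1" ]; then
    extract_thread_urls "$1"
fi
```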

TooConservative posted on 2009-11-20 6:15:42 ET

#3. To: TooConservative (#1)

I'm presenting this so that 4um folks can get some idea of how these agencies and busybodies operate. Geeks know this stuff but the average person has no idea how easy it is. People should know how easy it is to database their every remark since we no longer live in a free country.

thank you, TC.

christine posted on 2009-11-20 10:52:30 ET


#4. To: christine (#3)

BTW, by changing only a few lines in the above code, I could slowly download your entire database and reconstruct each poster's remarks. Essentially, your MySQL database could be replicated by downloading all the threads and parsing the user comments into a new MySQL database. I'm sure Neil could point this out as well, probably better than I can. This is why it is good to watch your server logs for an IP that downloads every thread, especially one that works through the threads in the database sequentially.
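A rough sketch of that kind of log check, assuming an Apache-style access log (client IP in field 1, request path in field 7) and thread URLs that contain ArtNum= followed by digits. Both are assumptions about the server, and threads_per_ip is a made-up name. It ranks IPs by how many distinct threads they pulled; it does not test whether the thread numbers were fetched in order.

```shell
#!/bin/bash
# Sketch: rank client IPs by how many distinct threads they requested.
# ASSUMPTIONS: common/combined access log format (client IP is field 1,
# request path is field 7) and thread URLs containing "ArtNum=<digits>".
# threads_per_ip is a hypothetical helper name.
threads_per_ip() {
    # $1 = access log. Emit "IP thread-number" for every thread request.
    awk '$7 ~ /ArtNum=[0-9]+/ {
             match($7, /ArtNum=[0-9]+/)
             print $1, substr($7, RSTART + 7, RLENGTH - 7)
         }' "$1" |
        sort -u |    # keep one line per (IP, thread) pair
        awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' |
        sort -rn
}

# Run only when a log file is given, e.g.: ./threads.sh access.log
if [ -n "$1" ]; then
    threads_per_ip "$1"
fi
```

An IP near the top of the list with a count close to the total number of threads on the site would be the one to look at.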

Anyway, this seemed a good thread to point this stuff out.

TooConservative posted on 2009-11-20 11:04:44 ET
