Screen scraping and lead generation - Posted by John

Posted by Doug McDowell on March 17, 2003 at 15:06:25:

haha, I guess so Peter :-)

I won’t argue that Visual Basic is as robust as some other languages; I totally agree with you. But having used it to write applications that parse HTML from over 150 different web servers for financial institutions, I can tell you it is certainly robust enough for what he wants, even with the threading limitations, etc.

That said, though, I cannot compare to Delphi, having never used it. My only other point of reference was C or C++, better languages than VB by far, but also steeper learning curves.

You are quite right, if he is used to a language structure similar or identical to Delphi, that would be his best bet.

Live and be well :-),
-Doug

Screen scraping and lead generation - Posted by John

Posted by John on March 13, 2003 at 13:43:30:

My county is wonderfully automated and online. I’d like to write a program that does the following:

  1. Both major newspapers in my city have online classified ads, which appear at no extra cost to the advertiser. Basically, it’s a searchable copy of the printed page. The program needs to gather all ‘for sale’ listings containing keywords such as ‘desperate,’ ‘owner financing,’ etc.

  2. The program should then parse any telephone numbers contained in these ads, and perform a reverse lookup (anywho.com) to find the address. Of course, this doesn’t work for cell phones.

2.5 If phone numbers are from out of state, gather names and addresses for a mail merge letter.

  3. If an address is found (or already printed in the ad), the program then hits the county recorder’s office. All documents are searchable by name, address, and tax ID. I want to pull past property tax information, current appraisal values, original mortgage balances, delinquencies, etc.

  4. Using property owner names, I’ll hit the online courthouse and search for judgements, liens, etc. against the owner or address.

  5. Using the address, I can check the MLS for comps, expireds, and such on the same street.

Hopefully, the program runs every night just after midnight, and can present me with either a printed report, or a giant .pdf file to be read in the morning over cereal.
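
Steps 1 and 2 above might be sketched in Python along these lines. This is only an illustration, not a tested scraper: the keyword list, the function names, and the sample phone number are made up, and the pattern only recognizes 10-digit US numbers (a 7-digit local number would be missed).

```python
import re

# Hypothetical keyword list. The misspelling is included on purpose,
# since ads are typed by hand and may contain it.
KEYWORDS = ("desperate", "desparate", "owner financing")

# Matches 10-digit US formats such as 317-555-0142, (317) 555-0142
# and 317.555.0142. Seven-digit local numbers are NOT matched.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})\b")


def matching_ads(ads, keywords=KEYWORDS):
    """Return the ads whose text contains any keyword, ignoring case."""
    return [ad for ad in ads if any(k in ad.lower() for k in keywords)]


def phone_numbers(text):
    """Return every 10-digit phone number in text as 'AAA-PPP-NNNN'."""
    return ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]
```

For example, `phone_numbers("Call (317) 555-0142")` returns `['317-555-0142']`, which could then be fed to whatever reverse lookup is available.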

I figure, the more I know about a property at the outset (automatically, even!), the easier I can solve problems sellers are facing.

Does anyone have enough programming experience to point me in the right direction? I can make MATLAB sit up and talk, but it’s no good for web-based stuff.

Cheers,

JEC

And here’s something else I found… - Posted by js-Indianapolis

Posted by js-Indianapolis on March 14, 2003 at 17:39:48:

http://www.egrabber.com/addressgrabberbusiness/addressgrabber_b.htm

That’s a program that looks like it might do the first half of what we are all attempting, somewhat. Basically, highlight some contact info, click on some button, and it gets exported into the proper fields of your favorite database program. BTW, my favorite is ACT, it should be everyone else’s too. :-)

This would work great for when I have a lead that I can view online. However, I’m also looking for a program to “look” for me. Just thought I’d throw this out, to see if someone else could figure something to do with it.

Re: How about we all get together on this? - Posted by js-Indianapolis

Posted by js-Indianapolis on March 14, 2003 at 17:28:27:

How about we all put our funds, as well as our heads, together and find a good programmer?

Sounds like we are all attempting to do basically the same thing. How difficult would it be to get one programmer to write the code for the main project, then have them write an individual subprogram for each website we each are trying to pull data from?

Anyone up for it, use my email.

I almost did most of what you want - Posted by osirus

Posted by osirus on March 13, 2003 at 23:59:48:

About three years ago I realized how labor intensive and redundant it would be to manually go through FSBO and FRBO owner ads. I thought it would be awesome if I could get the computer to do most of the work for me.

The only computer programming training I had was Basic I and Basic II, which I took in high school on the old TRS-80s. By the time I took on this project, Basic had been an obsolete language for some time. Anyhow, over a period of 6-8 months of trial and error, I “Forrest Gumped” myself several programs written in Visual Basic that do the following:

  1. Change text copies of FSBO & FRBO ads into tab-delimited records with this structure: ad date, ad category, ad info, ad address (if one appears in the ad), phone number.

  2. Next, automatically import the tab-delimited records into Access. Once in Access, I can easily use various databases and queries to accomplish most of items 2-4 on your list.
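
A minimal sketch of step 1 in Python rather than Visual Basic, assuming each ad has already been split into fields. The field names are invented labels for the structure listed above, and `to_tab_delimited` is a hypothetical helper, not osirus’s actual code.

```python
import csv
import io

# Field order follows the structure listed in the post; the names are made up.
FIELDS = ["ad_date", "ad_category", "ad_info", "ad_address", "phone_number"]


def to_tab_delimited(records):
    """Serialize a list of dicts as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for record in records:
        # A missing field (e.g. no address printed in the ad) becomes an empty cell.
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()
```

The resulting text can be saved to a file and imported into Access or any other database that reads tab-delimited data.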

It now only takes me a few minutes to sort through the FSBO and FRBO ads. However, if I had it to do over again, I would have outsourced the project.

Re: Screen scraping and lead generation - Posted by James Strange

Posted by James Strange on March 13, 2003 at 15:26:16:

It has been my experience that most ads with keywords like ‘desperate’ are placed by investors.

Re: Screen scraping and lead generation - Posted by Peter (NM)

Posted by Peter (NM) on March 13, 2003 at 15:18:49:

Umm… Well… Hrmm… Well, I have programming experience in many languages (mainly C and Perl). What you are trying to do would be no easy task, and it would be very close to impossible to keep it working consistently.

Keep in mind, if you are trying to parse a webpage, for instance anywho.com’s reverse lookup tool… if they make 1 change to the format, it could potentially affect the program’s parsing ability and ruin the output. I think it’s a very good idea, but very very difficult to accomplish, especially for someone lacking technical knowledge of the TCP/IP & HTTP protocols. You would need strong knowledge of HTML. You would need to know a language capable of doing all of this and know how to use that language to actually write the code to produce your program.
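
One way to soften this problem is to have the parser fail loudly when the page stops looking the way it expects, instead of quietly producing bad output. The sketch below uses Python’s standard `html.parser`; the three-cell layout and all names are assumptions for illustration, not anywho.com’s actual markup.

```python
from html.parser import HTMLParser

# Hypothetical layout: one result row with exactly these three cells.
EXPECTED_COLUMNS = ("name", "address", "phone")


class LayoutChanged(Exception):
    """Raised when the page no longer matches the layout the parser expects."""


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def parse_listing(html):
    """Return a dict of the expected columns, or raise LayoutChanged."""
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    cells = [cell.strip() for cell in collector.cells]
    if len(cells) != len(EXPECTED_COLUMNS):
        raise LayoutChanged(
            f"expected {len(EXPECTED_COLUMNS)} cells, found {len(cells)}")
    return dict(zip(EXPECTED_COLUMNS, cells))
```

A nightly run that catches `LayoutChanged` can at least report that a site needs attention rather than filling the morning report with garbage. It does not remove the maintenance work described here.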

I have tried something similar for tracking my sports wagers and lines/spreads on games, etc… so I didn’t have to pay $30 for some service to keep me updated and store my records… bottom line, after hours of painstaking work, I got something working… somewhat, but a small change in the website I was pulling the data from would ruin it and I’d be redoing the parsing functions… I ended up subscribing to the service I was trying to avoid :-) This is with extensive knowledge of all the protocols, etc. involved.

Also, I think Kristine’s post is a very good one and she gives good advice.

Re: Screen scraping and lead generation - Posted by Eric FL

Posted by Eric FL on March 13, 2003 at 14:53:10:

I would post your project on elance.com and some offshore programmers will bid on your project. They should be able to get it done for you for about $500-1000. It will save you a lot of time and whole lot of money. You just have to give them the county sites.

Re: Screen scraping and lead generation - Posted by Kristine-CA

Posted by Kristine-CA on March 13, 2003 at 14:18:15:

JEC: my experience has been that you do not need to know as much about the property as about the seller. No amount of info about liens, etc., will make a seller sell. Studying up on details of properties not even for sale can be a huge waste of time. Or at least not a very good use of your time. If you want to use FSBOs as a lead, I suggest calling them. You will get many people who are not interested in talking to you at all unless you offer them their asking price or close to it. So why waste all this time figuring out their story when they still won’t sell to you?

Just my two cents. Sincerely, Kristine

Re: I almost did most of what you want - Posted by cole310

Posted by cole310 on March 14, 2003 at 09:06:23:

Would you possibly be open to sharing your knowledge with me and my programmer? We are jointly trying to build an app that will parse a web page (Notices of Default in my county) and pull out certain data. Any help would be greatly appreciated.

Yeah… - Posted by John

Posted by John on March 13, 2003 at 22:50:40:

I realize it’s a big project. I searched around the web and found several companies who will do the data mining, but they aren’t cheap.

I’ve done some work in C, but it was several years ago. I did spend many years peeking and poking around on my Commodore 64, but I doubt that would help. Programming isn’t difficult for me, but there would be a learning/familiarity curve with whatever language I choose, since I’d be starting from scratch.

Perhaps the slashdot crowd has some advice.

What would be a good language to investigate?

Back to ye ole’ drawing board.

JEC

How about pulling info from GIS maps? - Posted by js-Indianapolis

Posted by js-Indianapolis on March 13, 2003 at 18:10:31:

I’ve seen a lot of counties have different formats for accessing their info, but they all seem to have “GIS Map Data”, whatever that means to a programmer. To me, it means I can get parcel info, tax info, and whatever other info the county has online, via a parcel map, and a simple click.

So skip the newspaper ads, and search the entire database for something like those behind on taxes for a year. Or those with out of state mailing addresses. Or those who have both an out of state mailing address, and the taxes are behind. Again, this is all online, just cannot easily be accessed through html.
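
If such records could be exported from a county site, the searches described above reduce to simple filters. A sketch in Python, assuming a hypothetical `Parcel` record whose field names are invented here (real county data will be laid out differently):

```python
from dataclasses import dataclass


@dataclass
class Parcel:
    """Hypothetical parcel record; real county exports will differ."""
    parcel_id: str
    property_state: str    # state where the property sits, e.g. "IN"
    mailing_state: str     # state of the owner's mailing address
    years_delinquent: int  # whole years that taxes are behind


def is_absentee(parcel):
    """True when the owner's mailing address is out of state."""
    return parcel.mailing_state != parcel.property_state


def is_delinquent(parcel, min_years=1):
    """True when taxes are at least min_years behind."""
    return parcel.years_delinquent >= min_years


def absentee_and_delinquent(parcels, min_years=1):
    """Parcels with an out-of-state mailing address AND back taxes."""
    return [p for p in parcels
            if is_absentee(p) and is_delinquent(p, min_years)]
```

The hard part is still getting the records out of the county’s site in the first place; this only shows what the search looks like once the data is in hand.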

How about getting right to the GIS data? The page I see it on is a .asp file, for most counties I look at. I don’t know what all that means. The last time I programmed, PASCAL was going out, and C was in. It’s been a while.

Here’s some websites, with good online records:

http://www.co.franklin.oh.us/
http://gisweb.ci.richmond.va.us/
http://www.co.lake.il.us/

Any thoughts on how to pull info from them? I’ve got other examples of sites with data, but they’re all basically the same. GIS and .asp. County formats are www.co.COUNTY.ST.us, where COUNTY is the county’s name, and ST is the state abbreviation.

Think anything can be done to search this info?

Re: Screen scraping and lead generation - Posted by James Strange

Posted by James Strange on March 13, 2003 at 15:24:20:

Be sure that the bidders have lots of good feedback. I took a chance on a bidder who was new. They begged to get the job; they said that they wanted to do a great job so that they could get good feedback. It was a complete waste of time and money. And Elance was no help.

Re: Yeah… - Posted by Peter (NM)

Posted by Peter (NM) on March 14, 2003 at 13:01:37:

The slashdot crowd rarely has a good thing to say. Post your question and expect a ratio of 10000:1 BS posts/flames to an actual good post.

I would say Python or Perl is your best bet for something like this (NOT VB, like Doug suggested). I suggest these because of their built-in support for regular expressions (regexps), which would make the parsing much easier. Now, I’m not sure how portable Perl and/or Python are to a Windows environment (I do ALL my computing, work or home, on FreeBSD (UNIX) systems)… I am pretty sure there are interpreters/compilers for Windows for those languages. Maybe you can try Delphi. You said you have Pascal experience; well, think of Delphi as Pascal on steroids. It’s the same syntax, it has a nifty IDE, it’s pretty fast, etc… Feel free to email me if you have any questions/issues you need help hashing out…

Peter

Re: Yeah… - Posted by Doug McDowell

Posted by Doug McDowell on March 14, 2003 at 08:47:20:

John,
I would recommend using Visual Basic (it does NOT have to be the .NET version), simply because of the speed of development, and because you are writing this for your own use.
Create a form in VB, add an Internet Transfer Control, and parse through your web page. They are absolutely right that one small change to the website would probably hose the parsing, so you would need to be aware of that going in.
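
The fetch step looks much the same in other languages. As a rough Python equivalent (a swapped-in technique, not the VB Internet Transfer Control), the standard `urllib.request` module can download a page as text; `fetch_page` and its default timeout are names and values chosen for this sketch.

```python
import urllib.request


def fetch_page(url, timeout=30):
    """Download a URL and return its body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string would then go to whatever parsing code handles that particular site.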

You might try some of the local technical colleges to see if there are any programming students there that want to make money on the side. You could probably get away with not paying them as much and they would end up with a ‘contract job’ to put on their resume.

Re: Screen scraping and lead generation - Posted by Eric FL

Posted by Eric FL on March 13, 2003 at 15:27:25:

Just find a competent vendor who will let you pay all or the majority of the contract when the final piece is delivered. I have done many, many projects through elance successfully. Remember that there are many good vendors on elance who are trying to run a successful business.

Re: Yeah… - Posted by Doug McDowell

Posted by Doug McDowell on March 17, 2003 at 09:39:54:

While it’s true regular expressions are not native to VB 6 (they are to VB.NET), you can still use them by referencing the Microsoft VBScript Regular Expressions library. And the Internet Transfer Control does not require an extensive knowledge of the HTTP and TCP protocols in order to use it.

I guess this is ‘bs post 10001’ but he can do everything he needs to do in Visual Basic OR Delphi.

And you’re right, Perl is portable to Windows.

Re: Yeah… - Posted by Peter (NM)

Posted by Peter (NM) on March 17, 2003 at 13:22:52:

Yes, I know he CAN do it in VB. He could also do it in binary if he so wanted :-) The reason I steered him away from VB is because, imho, it’s a horrible language and should only be used for playing, not gathering data you will base your business plan on. I think Delphi would be a good choice for him: it’s not very difficult, it’s where he has some slight experience, and it will be 10x more robust than anything in VB would be.

I will assume you’re a VB programmer and want to argue the point with me. That’s fine. Yes, I know some major MS apps were written in VB (MS Money for instance), and yes, VB could have some stability when done right. But if he doesn’t have experience with it, he can’t clean it up properly. The drop-in modules will be kludgy (like most IDEs). Of all the professional programmers I know (Windows or UNIX programmers), only one uses VB on a regular basis and he wishes he didn’t have to :-)

To each his own.

Peter