#1 (permalink)
13-04-2002, 10:35 PM
whit3_cryst4l (Novice Webmaster)
Crawling PDF files... how?

Assalamualaikum...

Sorry, I just want to ask one question. I have built a search engine with a crawling part, but I would like to know how to crawl PDF files. All the PHP scripts crawl URLs through "href", so they only pick up HTML files. Can someone tell me how?

- thanks =) -
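For reference, here is a minimal sketch of the href-based link collection described in this post. It is written in Python rather than PHP purely for brevity, and the names `LinkCollector` and `split_links` are made up for illustration. The point it tries to show is that a PDF is usually linked through the same `href` attribute as an HTML page, so a crawler can pick such links up as long as it does not throw them away. The extension check is only a heuristic, assumed here for simplicity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from every <a href="...">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def split_links(html, base_url):
    """Return (page_links, document_links) found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    pages, docs = [], []
    for url in parser.links:
        path = urlparse(url).path.lower()
        # Assumption: PDF/PostScript files are recognisable by extension.
        if path.endswith((".pdf", ".ps")):
            docs.append(url)
        else:
            pages.append(url)
    return pages, docs


sample = '<a href="paper.pdf">Paper</a> <a href="/about.html">About</a>'
pages, docs = split_links(sample, "http://example.com/docs/")
print(pages)
print(docs)
```

A server's Content-Type response header is generally a more reliable signal than the file extension, so a real spider would probably check that as well.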
#2 (permalink)
14-04-2002, 11:18 AM
malayneum (Senior Webmaster)
Don't know... but Google can do that, right?
__________________
I can't afford to have a signature here, can somebody sponsor me a signature?
#3 (permalink)
15-04-2002, 11:54 AM
whit3_cryst4l (Novice Webmaster)
Ehehe... not just Google, my dear. Most search engines for journals and technical papers can do that, such as cora.whizbang, NCSTRL, and others. PDF and PostScript files are more useful and more trustworthy than HTML files... =)
#4 (permalink)
15-04-2002, 12:18 PM
malayneum (Senior Webmaster)
Hehehe... *I only know Google* (because it's the ONLY one I use).

But since PHP can *create* PDF files, I think there must be a way to reverse the process (haha, what am I talking about...), so that we can read a PDF as a text file or something similar.
__________________
I can't afford to have a signature here, can somebody sponsor me a signature?
#5 (permalink)
15-04-2002, 12:36 PM
whit3_cryst4l (Novice Webmaster)
I just wonder... what do you mean by creating PDF files? Can you show me how to do it?
__________________
~ k | r | i | s | t | a | l ^ p | u | t | i | h ~
#6 (permalink)
15-04-2002, 01:00 PM
r0kawa (Novice Webmaster)
http://www.php.net/pdf

The manual there explains how to create a PDF file using PDFlib. I'm not sure how to crawl one, but here's one idea:

First, fetch the PDF file to your web server, then read it and extract the links. After that I think you can crawl them, run them, or do whatever you think is suitable.
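The fetch, read, and get-the-link idea above could look roughly like the sketch below. It is in Python rather than PHP, and the helper names `fetch` and `extract_pdf_links` are hypothetical. It relies on the fact that link annotations in a PDF normally store their target as a `/URI (...)` entry, and it assumes those entries sit uncompressed in the file. When they are stored inside compressed streams, a plain byte search like this will miss them and a real PDF parser is needed.

```python
import re
from urllib.request import urlopen

# Matches "/URI (http://...)" entries. Assumption: the URL contains no
# parentheses or backslashes, which PDF strings would store escaped.
URI_PATTERN = re.compile(rb"/URI\s*\(([^()\\]+)\)")


def fetch(url):
    """Download a file and return its raw bytes (not called in the demo below)."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def extract_pdf_links(pdf_bytes):
    """Return URLs found in uncompressed /URI entries of a PDF."""
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("not a PDF file")
    return [match.decode("latin-1") for match in URI_PATTERN.findall(pdf_bytes)]


# Hand-written fragment standing in for a downloaded file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n"
    b"<< /Type /Annot /Subtype /Link "
    b"/A << /S /URI /URI (http://example.com/next.pdf) >> >>\n"
    b"endobj\n"
    b"%%EOF"
)
links = extract_pdf_links(sample)
print(links)
```

In a crawler, the bytes returned by `fetch(url)` would take the place of `sample`, and the extracted links would go back into the crawl queue.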
__________________
http://www.php.net.my
#7 (permalink)
15-04-2002, 01:41 PM
whit3_cryst4l (Novice Webmaster)
Thanks for your opinion and help, r0kawa. About fetching the PDF file... well, that's my problem! How do I fetch it? I have tried a lot of things with the code, changing it by trial and error, but nothing worked.

OK, let me explain in a bit more detail. When we crawl a normal web page, we use href to get the links (URLs), follow them, and store each page's title, description, and URL in our database. That's the spider's job! When a user searches for a term or keyword, we return results based on the data in the database.

So for the spider part, I can only fetch HTML files (with .gif images), not PDF files. I'm still working on it! Anyway, thanks again.
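The spider and search steps described in this post can be sketched as follows. This is Python rather than PHP, with an in-memory SQLite table standing in for the real database. The table layout and the names `PageSummary`, `index_page`, and `search` are assumptions made for the example, not the poster's actual code.

```python
import sqlite3
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Hypothetical helper: pulls the <title> text and meta description."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def index_page(db, url, html):
    """Spider side: store title, description, and URL for one page."""
    summary = PageSummary()
    summary.feed(html)
    db.execute(
        "INSERT OR REPLACE INTO pages (url, title, description) VALUES (?, ?, ?)",
        (url, summary.title.strip(), summary.description.strip()),
    )


def search(db, keyword):
    """Search side: match the keyword against stored titles and descriptions."""
    like = f"%{keyword}%"
    rows = db.execute(
        "SELECT url, title FROM pages "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY url",
        (like, like),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, title TEXT, description TEXT)")
index_page(
    db,
    "http://example.com/a.html",
    "<html><head><title>PDF crawling</title>"
    '<meta name="description" content="Notes on spiders"></head></html>',
)
print(search(db, "spider"))
```

A PDF has no `<title>` or meta tags, so indexing one would need some other way of getting a title and description, for example text extracted from the file itself.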
__________________
~ k | r | i | s | t | a | l ^ p | u | t | i | h ~

Last edited by whit3_cryst4l; 17-04-2002 at 09:43 AM.
All times are GMT +8.