Monday 20 April 2009

700,000 videos indexed

We have now reached 700,000 videos indexed. That is not quite the million we are still aiming for, but it is a true number (unlike a few other sites we could mention...).

We have been working on our search engine and have added the ability to match on any word, on all words, or on a phrase. This applies to the video's title, any text on the page containing the video, any tags, the URL of the video, and any meta description (if applicable).

For example, the search query "busty sex in office" with the match set to "any" will rank by relevance on any of the words: videos matching all the words come first, but the total count includes every video containing any of the words. Matching on "all" will return only those videos that contain all the words. A phrase match will match only where the exact phrase exists, so only videos that contain the words "busty sex in office" in that order will be returned. We hope this is useful for finding exactly what you want.
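As a rough illustration of the three match modes, here is a minimal Python sketch. It is not the engine's actual code: the function names, the sample documents and the "count of matched words" ranking are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(doc_tokens, query_tokens):
    """True if query_tokens occur consecutively, in order, in doc_tokens."""
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))

def search(docs, query, mode="any"):
    """Return matching doc ids, those matching the most query words first."""
    query_tokens = tokenize(query)
    query_words = set(query_tokens)
    if not query_words:
        return []
    results = []
    for doc_id, text in docs.items():
        doc_tokens = tokenize(text)
        hits = len(query_words & set(doc_tokens))
        if mode == "any":
            matched = hits > 0
        elif mode == "all":
            matched = hits == len(query_words)
        elif mode == "phrase":
            matched = contains_phrase(doc_tokens, query_tokens)
        else:
            raise ValueError(f"unknown match mode: {mode}")
        if matched:
            results.append((hits, doc_id))
    # Most matched words first; ties broken by doc id for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [doc_id for _, doc_id in results]

# Hypothetical sample data: doc id -> combined searchable text.
SAMPLE_DOCS = {
    1: "late night office party",
    2: "party at the office",
    3: "beach holiday",
    4: "office tour",
}
```

With the query "office party", the "any" mode returns documents 1, 2 and 4 (the two full matches first), "all" returns only 1 and 2, and "phrase" returns only document 1, the one where the two words appear together in that order.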

We had an outage this morning for around 4 hours. This was due to our database needing some tuning, as our site usage has increased by around 600% over the last couple of days. We apologise for this. Please note that we take any outage very seriously and will work around the clock, if needed, to keep the show on the road.

We are always looking for comments and suggestions on how to improve our site or, if you are a webmaster, for traffic partnerships.

All the Best

The Pordeo Team

Monday 16 February 2009

380,000 videos Indexed!

We have now indexed 380,000 videos. It took us a bit of time to get to 380K, since we moved our database onto Oracle RAC and also modified our indexing to enable "phrase matching", i.e. a query such as "sex with milf", if enclosed in quotes, will find that phrase anywhere in the title, video page text or file URL. We also now support stemming, so a word like 'fucks' also matches 'fuck'.
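To show the idea behind stemming, here is a deliberately naive Python sketch. The suffix list and the function names are assumptions made for illustration; it is not the stemmer the site uses, and a production engine would more likely rely on an established algorithm such as Porter's.

```python
import re

# Suffixes are checked in this order. This is a toy rule set, not a real stemmer.
SUFFIXES = ("ing", "es", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Return the set of stems of all words in the text."""
    return {naive_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())}

def stem_match(query, text):
    """True if every stemmed query word appears among the text's stems."""
    query_stems = stems(query)
    return bool(query_stems) and query_stems <= stems(text)
```

Because both the query and the indexed text are reduced to stems before comparison, a search for "indexes" finds text containing "index", and a search for "index" finds text containing "indexing".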

We have also implemented a bag feature, which lets users add search result items and come back to them later. To use it, just click on the icon next to the title and the item will be placed into your local porn bag. To view your items, just click on the link in the top right-hand corner, mybag(5).

We have sped up our indexing, so we should hit the 1/2 million mark by Friday 20 February 2009.

Thursday 12 February 2009

Another site added

After making a few changes to the search indexing, we are in the process of loading up harporn. It's not a great site, as a large number of the videos are only a few minutes long; however, since you can sort on duration, you should be able to skip these if required.

We have changed the indexing to provide the most relevant results based on title, URL, description and any additional tags and text found on the video's page, so you can search using terms and phrases. For example, "big boobed milf" will return the most relevant results first, then any containing only some of the words. As with Google, once you get past a few pages the relevancy of your results will diminish.
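One simple way to rank across several fields is to give each field a weight and add up the matches. The Python sketch below assumes exactly that; the weights, field names and sample documents are invented for illustration and are not the values used by the real index.

```python
import re

# Hypothetical weights: a match in the title counts more than one in page text.
FIELD_WEIGHTS = {
    "title": 3.0,
    "tags": 2.0,
    "url": 1.5,
    "description": 1.0,
    "page_text": 0.5,
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(doc, query):
    """Sum, over all fields, weight * number of distinct query words found."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_tokens = set(tokenize(doc.get(field, "")))
        total += weight * len(terms & field_tokens)
    return total

def rank(docs, query):
    """Return ids of docs with a non-zero score, highest score first."""
    scored = [(score(doc, query), doc["id"]) for doc in docs]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in scored if s > 0]

# Hypothetical sample documents.
SAMPLE_DOCS = [
    {"id": "a", "title": "mountain bike race", "description": "a race"},
    {"id": "b", "title": "cooking show", "description": "mountain bike repair tips"},
    {"id": "c", "title": "bike"},
    {"id": "d", "title": "news"},
]
```

For the query "mountain bike race", document "a" scores 10.0 (three title matches plus one description match), "c" scores 3.0 and "b" scores 2.0, so documents matching every word come first and partial matches trail behind, which is the ordering described above.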

Next on the list is freudbox

Once we have passed the 1/2 million video mark, we aim to work on our gallery and text crawler so we can start indexing all types of porn. If you have any suggestions or comments, please add 'em here.

Tuesday 10 February 2009

Progress so Far..

Well so far we have indexed:
YouPorn
SpankWire
PornHub
xVideos
tube8
xhamster
tnaflix
n4x
vid2c
shufuni

After some data cleansing, this results in around 225K indexed videos. Aim is to pass the 500K mark by mid Feb.

Sunday 8 February 2009

Announcing PorPop

Well, I have kind of had this project on the go for around 10 months, and have finally decided to make it available to everyone. After working as a C#/C++ AP at a hedge fund for the last 8 years on credit derivatives, the recent demise of work in this area has given me some time to develop PorPop.com.

The idea is to apply some of the algorithmic principles used in search engines (I knew my math degree would come in useful again) to adult/porn sites. With the advent of YouTube and its porn equivalent YouPorn, video sites are the most logical place to start. I have built 2 crawlers/web spiders. The first is a customized, parameter-driven crawler for known sites, which uses prepared regexes to pull specific details from target pages. The second is a generic crawler, which reads robots.txt files, scans HTML links, works out backlinks and calculates a weighted score based on relevance. All the data is written back into a DB for indexing.
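To make the two crawler ideas a little more concrete, here is a small Python sketch of prepared regexes pulling details out of a page, plus a robots.txt check using the standard library's urllib.robotparser. The regex patterns, the HTML layout they expect and the function names are assumptions for the example, not the crawler's real configuration, and the backlink scoring is not covered here.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical per-site patterns; a real site would need its own set.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
DURATION_RE = re.compile(r'<span class="duration">(\d+):(\d{2})</span>')
LINK_RE = re.compile(r'href="([^"]+)"')

def extract_details(html):
    """Pull the title, duration in seconds and outgoing links from a page."""
    title = TITLE_RE.search(html)
    duration = DURATION_RE.search(html)
    return {
        "title": title.group(1).strip() if title else None,
        "duration_seconds": (int(duration.group(1)) * 60 + int(duration.group(2))
                             if duration else None),
        "links": LINK_RE.findall(html),
    }

def allowed(robots_txt, user_agent, url):
    """Check a URL against the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical inputs, so the sketch runs without touching the network.
SAMPLE_HTML = (
    "<html><head><title> Sample clip </title></head><body>"
    '<span class="duration">3:45</span>'
    '<a href="/videos/2">next</a></body></html>'
)
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
```

Here extract_details(SAMPLE_HTML) yields the title "Sample clip", a duration of 225 seconds and the single link "/videos/2", while the robots.txt check permits "/videos/1" and refuses anything under "/private/".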

The indexing component is written in C++. It creates a binary representation of the text found in the pages and builds a fast index. I have tested this against 100Gb of compressed binary pages (around 60K web sites) and it will return a set of matches in less than 100/sec. Being completely scalable through round-robin parallel processing, it should be able to cope with ever-growing volumes, should PorPop become popular.
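The real indexer is C++ and its binary format is not described here, but the round-robin idea can be sketched in a few lines of Python. Everything below (the class names, the in-memory inverted index, the thread pool) is an assumption chosen to show the shape of the approach: documents are dealt out to shards in turn, and a query is sent to every shard and the partial results merged.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class Shard:
    """One partition of the index: word -> set of doc ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def lookup(self, term):
        return set(self.postings.get(term, ()))

class RoundRobinIndex:
    """Deals documents out to shards in turn and queries all shards."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        self.shards = [Shard() for _ in range(shard_count)]
        self._next = 0

    def add(self, doc_id, text):
        self.shards[self._next].add(doc_id, text)
        self._next = (self._next + 1) % len(self.shards)

    def search(self, term):
        term = term.lower()
        # Fan the lookup out to every shard, then merge the partial results.
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = list(pool.map(lambda shard: shard.lookup(term), self.shards))
        return set().union(*partials)
```

Since each shard only holds its share of the documents, adding shards keeps the per-shard index small as the total grows. Note that Python threads will not speed up CPU-bound lookups the way native threads or separate machines would; the thread pool here only illustrates the fan-out and merge.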

Anyway, I started indexing on Thursday 5th Feb and have, as I am writing this, indexed 170,000 videos from around 7 sites. It's been a bit slow to begin with since I am babysitting the process; however, once I can productionize the code, I aim to have around 100 threads working and an indexing capacity of 20,000 items/hour.
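To put the target in perspective, a quick back-of-the-envelope calculation helps. The Python sketch below uses only the figures quoted above (100 threads, 20,000 items/hour, 170,000 videos so far); the helper names are made up for the example, and it assumes the work divides evenly across threads.

```python
TARGET_ITEMS_PER_HOUR = 20_000
THREAD_COUNT = 100
INDEXED_SO_FAR = 170_000

def per_thread_rate(items_per_hour, threads):
    """Items each thread must handle per hour, assuming an even split."""
    return items_per_hour / threads

def seconds_per_item(items_per_hour, threads):
    """Time budget a single thread has for one item."""
    return 3600 / per_thread_rate(items_per_hour, threads)

def hours_to_index(item_count, items_per_hour):
    """How long a given number of items takes at a steady rate."""
    return item_count / items_per_hour
```

At the target rate each thread needs to handle 200 items an hour, which leaves 18 seconds per item for fetching, parsing and writing to the database. At 20,000 items/hour, the 170,000 videos indexed so far would take 8.5 hours.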

Any comments or suggestions?