Part 27: Generating Pages


The first step is to get the data: we need data to generate millions of these pages.

There are scraping companies like Bright Data, Coresignal, and Zyte that sell pre-scraped data sets.


What we did was buy some of these data sets, LinkedIn data as well as Crunchbase data (people profiles and company profiles), and we used them to create our pages for Bill Gates and everyone else that we have.
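To make the "data set in, pages out" idea concrete, here is a minimal sketch in Python, assuming the purchased data arrives as a CSV with `name`, `title`, and `company` columns. Those column names, the sample rows, the template, and the helper names `slugify` and `generate_pages` are all made up for illustration; this is not the pipeline we actually ran, and real data sets have their own schemas.

```python
import csv
import io
import re
from html import escape
from string import Template

# Hypothetical page template; a real site would use a fuller layout.
PAGE_TEMPLATE = Template("""<!doctype html>
<html>
<head><title>$name | $company</title></head>
<body>
<h1>$name</h1>
<p>$title at $company</p>
</body>
</html>
""")


def slugify(text):
    """Turn 'Bill Gates' into 'bill-gates' for use in a URL path."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def generate_pages(csv_text):
    """Return a dict mapping URL slug -> rendered HTML, one page per CSV row."""
    pages = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Escape every field so stray markup in the data cannot break the page.
        safe = {key: escape(value) for key, value in row.items()}
        pages[slugify(row["name"])] = PAGE_TEMPLATE.substitute(safe)
    return pages


# Made-up sample rows standing in for a purchased profile data set.
SAMPLE = """name,title,company
Ada Example,Engineer,Example Corp
Sam Sample,Founder,Sample & Sons
"""

pages = generate_pages(SAMPLE)
for slug in pages:
    print(f"/people/{slug}")
```

The same loop works whether the file has two rows or two million; in practice you would write each page to disk or render it on request rather than hold everything in memory.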


Other data sources you can consider include Kaggle, which has an open library of thousands upon thousands of data sets, everything from anime data to Airbnb data.


You can use WHOIS records. You can download these through the WHOIS API, and there are 1.7 billion records showing who owns each domain and the contact data for those owners.


There's flight data, which powers sites like ElonJet, where you can see where every flight is. You could make a programmatic SEO site that tracks where celebrities are based on their jets, and perhaps get sued by Elon.


There are legal data sets too: many court records are public, and you can download them and look through them. Maybe you want to create some sort of legal ruling directory.


Medical data sets are the same: you can find scientific research papers and build some interesting analysis or websites on top of them.


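Voter registration records are public in much of the USA as well, so you can see who is registered and whether they voted, and in many states their party affiliation, though not who they voted for. What is available varies by state.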



