
Bulk search for LinkedIn profiles and get clean data: title, first name, middle name, last name, suffix, job title, company name

Easily search for LinkedIn profiles and add them to your CRM in Ninox.

Then join the data with your CRM.

From this:

 

To the end result:

 

https://html.duckduckgo.com/html/?q=linkedin.com+word1+word2&kl=nl-nl
 

 

I am only interested in external links (so not the ones containing the URL from the search engine).
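Netpeak Spider handles this filtering for you. As a rough illustration of the rule, here is a minimal Python sketch that keeps only links whose host is not the search engine's own domain. The function name `external_links` and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse

def external_links(links, search_host="duckduckgo.com"):
    """Return only links that do not point back to the search engine."""
    result = []
    for link in links:
        host = (urlparse(link).hostname or "").lower()
        # Relative links have no host, so they point back to the search engine too.
        if not host or host == search_host or host.endswith("." + search_host):
            continue
        result.append(link)
    return result

links = [
    "https://html.duckduckgo.com/html/?q=linkedin.com+word1+word2&kl=nl-nl",
    "/html/?q=linkedin.com+word1",
    "https://nl.linkedin.com/in/jan-jansen",
]
print(external_links(links))  # ['https://nl.linkedin.com/in/jan-jansen']
```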

 

I choose outgoing links.

External links.

 

To see the results, go to ‘Database’ > ‘External Links’.

 

Results

 

 

I want to generate some search URLs with different search words.

This is my database containing the last names of company CEOs and the company names (some are spelled slightly differently).

 

First, I generated searches with only the company names.

https://www.toptal.com/marketing/mergewords

 

box 1

https://html.duckduckgo.com/html/?q=linkedin.com+

 

box 2

(variable)

company names

 

box 3

parameter

&kl=nl-nl

for Dutch results

South Africa would be: 

&kl=za-en
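The three boxes boil down to plain string concatenation. A minimal Python sketch of the same merge, assuming one search term per line; the `REGIONS` keys and the `merge_words` name are hypothetical. Like the merge tool, it leaves spaces untouched, so the output still needs the encoding step described further on.

```python
# Box 1: fixed prefix. Box 3: region parameter. Box 2: the variable search terms.
PREFIX = "https://html.duckduckgo.com/html/?q=linkedin.com+"
REGIONS = {"netherlands": "&kl=nl-nl", "south_africa": "&kl=za-en"}

def merge_words(terms, region="netherlands"):
    """Concatenate prefix + term + region parameter for every non-empty term."""
    suffix = REGIONS[region]
    return [PREFIX + term.strip() + suffix for term in terms if term.strip()]

for url in merge_words(["Acme Holding", "Voorbeeld BV"]):
    print(url)
```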

 

Then I combined the last name with the company name.

 

The columns containing the last name and the company name are added as well.
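If the database is exported as CSV, the combined searches can be generated while keeping the source columns next to each URL. A minimal sketch, assuming hypothetical column headers `lastname` and `company`; spaces are again left as-is, matching the raw output of the merge tool.

```python
import csv
import io

PREFIX = "https://html.duckduckgo.com/html/?q=linkedin.com+"
SUFFIX = "&kl=nl-nl"

def combined_searches(csv_text):
    """Yield (url, lastname, company) for every row: last name + company name as the query."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        lastname = row["lastname"].strip()
        company = row["company"].strip()
        yield PREFIX + lastname + " " + company + SUFFIX, lastname, company

data = "lastname,company\nJansen,Acme Holding\nde Vries,Voorbeeld BV\n"
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["url", "lastname", "company"])
writer.writerows(combined_searches(data))
print(out.getvalue())
```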

 

I copy the results into TextPad.

 

You can see that the lines are not completely blue.

This means the URL is not valid.

 

To make them work, I replace each space with a + or with %20.

I do both.

First with +.

 

You can see that most lines turned blue.

 

The ë is not recognized as part of a URL; that's why those lines are not blue.
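Both fixes can be done in one pass: the same space replacement as in TextPad, followed by percent-encoding of characters such as ë (UTF-8 encoded, ë becomes %C3%AB). A minimal sketch using Python's standard `urllib.parse.quote`; the function name and the sample company name are hypothetical.

```python
from urllib.parse import quote

def encode_search_url(raw_url, space="+"):
    """Replace spaces with '+' or '%20', then percent-encode remaining unsafe characters."""
    url = raw_url.replace(" ", space)  # same find-and-replace as in TextPad
    # Keep URL delimiters, '+' and existing %-escapes; encode the rest (e.g. 'ë').
    return quote(url, safe=":/?&=+%")

raw = "https://html.duckduckgo.com/html/?q=linkedin.com+Jansen Noë Bouw&kl=nl-nl"
print(encode_search_url(raw))
print(encode_search_url(raw, space="%20"))
```

Because `%` is in the safe set, a literal percent sign in a company name would not be re-encoded; that is a trade-off of this sketch.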

 

I copy and paste them into Netpeak Spider.

 

I repeat this, but now with %20 instead of a + sign.

 

The program deduplicates the URLs.
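For an exact-match deduplication outside the crawler, a one-line Python sketch that keeps the first occurrence of each URL in its original position. Note that the + and %20 variants of the same search are different strings, so exact matching keeps both; whether Netpeak Spider normalizes them is not covered here.

```python
def dedupe(urls):
    """Remove exact duplicate URLs, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(urls))

urls = [
    "https://html.duckduckgo.com/html/?q=linkedin.com+Acme+Holding&kl=nl-nl",
    "https://html.duckduckgo.com/html/?q=linkedin.com+Acme%20Holding&kl=nl-nl",
    "https://html.duckduckgo.com/html/?q=linkedin.com+Acme+Holding&kl=nl-nl",
]
print(dedupe(urls))  # the two distinct spellings remain
```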

 

The URLs in the crawler.

 

Some settings.

You can use a lot of threads, but the more you use, the heavier the load on your computer.

Each thread opens a browser (in the program).

So if you enter 10, it opens 10 browsers in the background.

I set the delay to 250 ms or 500 ms. Otherwise the crawl goes too fast and the search engine will quickly block your IP address.
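To see how threads and delay interact, here is a minimal Python sketch of a throttled crawl: each worker thread pauses after every request. The `fetch` callable is a hypothetical stand-in for the real download, and Netpeak Spider may apply its delay differently. In this sketch, 10 threads with a 500 ms delay send at most about 20 requests per second, so the delay only slows things down if you keep the thread count modest.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def crawl(urls, fetch, threads=10, delay_ms=500):
    """Fetch URLs with `threads` parallel workers; each worker waits delay_ms after a request."""
    def worker(url):
        result = fetch(url)
        time.sleep(delay_ms / 1000)  # throttle this worker before it takes the next URL
        return url, result

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(worker, urls))

# Dummy fetch so the sketch runs without network access.
print(crawl(["a", "b", "c"], fetch=str.upper, threads=2, delay_ms=250))
```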

 

DuckDuckGo doesn't store cookies, so you can leave that option off.

 

You can choose different User agents.

Sometimes I get no results with Chrome or Edge; then I choose another one, such as Apple.
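Outside the crawler, switching user agents means sending a different `User-Agent` header. A minimal sketch with Python's standard `urllib.request.Request`, cycling through a list. The strings below are truncated placeholders, not complete browser user agents; copy full, current ones from real browsers. The requests are only built here; `urllib.request.urlopen(req)` would send them.

```python
import itertools
import urllib.request

# Truncated, illustrative User-Agent strings (placeholders).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]

def build_requests(urls, user_agents=USER_AGENTS):
    """Create one Request per URL, rotating through the given User-Agent strings."""
    agents = itertools.cycle(user_agents)
    return [urllib.request.Request(url, headers={"User-Agent": next(agents)}) for url in urls]

for req in build_requests(["https://example.com/a", "https://example.com/b"]):
    # urllib stores header names capitalized, hence "User-agent".
    print(req.full_url, req.get_header("User-agent"))
```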

 

To prevent your own IP address from getting blocked or blacklisted, you can use proxies to ‘hide’ it.

I buy mine at proxyscrape.com

 

Checking the proxies for speed and whether they work.

 

I only want to keep the fast ones, so I delete the ones I don't want to use.
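The selection step can be expressed as a simple threshold on measured response times. A minimal Python sketch, assuming the checker's results are available as proxy to latency in milliseconds (None for a proxy that did not respond); the threshold, function names and the documentation-range addresses are hypothetical. `opener_for` shows how a kept proxy could be used with the standard `urllib`.

```python
import urllib.request

def keep_fast(results, max_ms=1000):
    """Keep working proxies that responded within max_ms, fastest first."""
    fast = [(ms, proxy) for proxy, ms in results.items() if ms is not None and ms <= max_ms]
    return [proxy for ms, proxy in sorted(fast)]

def opener_for(proxy):
    """Build a urllib opener that sends http and https traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}", "https": f"http://{proxy}"})
    return urllib.request.build_opener(handler)

checked = {
    "203.0.113.10:8080": 420,
    "203.0.113.11:3128": None,   # did not respond
    "203.0.113.12:80": 2300,     # too slow
    "203.0.113.13:8080": 180,
}
print(keep_fast(checked))  # ['203.0.113.13:8080', '203.0.113.10:8080']
```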

 

Set up a filter to get only linkedin.com/in/ profiles with this in the anchor text:

name - job title - company name

or

name - company name
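The same filter rule, as a minimal Python sketch: the URL must contain linkedin.com/in/ and the anchor text must have two or three parts separated by " - ". The function name and the sample anchor texts are hypothetical.

```python
def is_profile_hit(url, anchor):
    """True for linkedin.com/in/ URLs whose anchor text has 2 or 3 ' - '-separated parts."""
    if "linkedin.com/in/" not in url:
        return False
    return len(anchor.split(" - ")) in (2, 3)

print(is_profile_hit("https://nl.linkedin.com/in/jan-jansen", "Jan Jansen - CEO - Acme | LinkedIn"))  # True
print(is_profile_hit("https://www.linkedin.com/company/acme", "Acme | LinkedIn"))  # False
```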

 

The filtered results.

 

I copy the results into a CSV editor.

 

Some anchor texts contain only a name and a company name.

Others contain a name, a job title and a company name.

 

First, filter out the entries that contain only a name, without extra information such as a job title or company name.

 

Regex

- .*? Linkedin

 

Delete those.

 

[|].*?[|]

 

Delete those too.

 

Next, search for text containing:

last name, job title, company name

Regex

 -.*? - .*? Linkedin (starting with a space)
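The keep and delete steps can be reproduced with Python's `re` module. A minimal sketch that applies two of the patterns from this post; the function name and sample rows are hypothetical. Python matches case-sensitively by default, so "Linkedin" would not match "LinkedIn"; the sketch assumes case-insensitive matching, which may differ from the CSV editor's setting.

```python
import re

def filter_rows(rows, pattern, keep=True, flags=re.IGNORECASE):
    """Keep (keep=True) or delete (keep=False) the rows in which `pattern` is found."""
    regex = re.compile(pattern, flags)
    return [row for row in rows if bool(regex.search(row)) == keep]

rows = [
    "Jan Jansen - CEO - Acme | LinkedIn",
    "Piet de Vries - Voorbeeld BV | LinkedIn",
    "Anna Smit | Acme | LinkedIn",
]
rows = filter_rows(rows, r"[|].*?[|]", keep=False)       # delete rows with two pipes
three_part = filter_rows(rows, r" -.*? - .*? Linkedin")  # name - job title - company
print(three_part)  # ['Jan Jansen - CEO - Acme | LinkedIn']
```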

 

Some results

 

I want to split those results into new columns: name, job title and company name.

 

I split on the text

 -

(a space followed by a - sign), working right to left.

Choose the source column and the target columns (press ‘new’); I already gave them names.

Press ‘split column’.
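Splitting right to left means that if the text contains more separators than expected, the extras stay in the leftmost (name) column. A minimal Python sketch of the same split using `str.rsplit`; the function name and sample text are hypothetical, and the " | LinkedIn" tail is removed in a later step.

```python
def split_right_to_left(text, delimiter=" -", parts=3):
    """Split on the last (parts - 1) delimiters and trim the pieces."""
    return [piece.strip() for piece in text.rsplit(delimiter, parts - 1)]

print(split_right_to_left("Jan Jansen - CEO - Acme | LinkedIn"))
# ['Jan Jansen', 'CEO', 'Acme | LinkedIn']
```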

 

The results

 

 

Let’s split up the names.

I use an add-on in Google Sheets from Ablebits.

 

 

 

The results are not 100% accurate, so you may need to do some cleanup manually.
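The post uses the Ablebits add-on for this step. As a stand-in for readers without it, here is a simple heuristic name splitter in Python; it is not the add-on's algorithm, and the title, suffix and particle word lists are hypothetical examples you would extend for your own data. Like the add-on, it will not be right for every name.

```python
# Hypothetical word lists; extend them for your own data.
TITLES = {"dr", "drs", "ir", "mr", "mrs", "ms", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd", "mba"}
PARTICLES = {"van", "de", "der", "den", "ter", "te", "von"}  # e.g. Dutch "van der"

def _norm(token):
    """Lower-case a token and drop trailing dots/commas for lookups."""
    return token.lower().rstrip(".,")

def split_name(full_name):
    """Split a full name into title, first, middle, last and suffix (heuristic)."""
    tokens = full_name.split()
    title = suffix = ""
    if tokens and _norm(tokens[0]) in TITLES:
        title = tokens.pop(0)
    if tokens and _norm(tokens[-1]) in SUFFIXES:
        suffix = tokens.pop()
    first, rest = (tokens[0], tokens[1:]) if tokens else ("", [])
    # The last name starts at the first particle ("van", "de", ...) or is the final token.
    start = next((i for i, t in enumerate(rest) if _norm(t) in PARTICLES), max(len(rest) - 1, 0))
    return {
        "title": title,
        "first": first,
        "middle": " ".join(rest[:start]),
        "last": " ".join(rest[start:]),
        "suffix": suffix,
    }

print(split_name("Dr. Jan Pieter van der Berg Jr."))
```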

 

You can copy the results from Google Sheets and join them with the results in Rons Data Edit.
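A join like this matches rows on a shared column. A minimal Python sketch of a left join on a key column, assuming both tables contain the original full name in a column called `name` (a hypothetical column name); rows without a match keep only their own columns.

```python
def join_on(left, right, key):
    """Left join: add the columns of the matching `right` row to each `left` row."""
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        extra = index.get(row[key], {})
        joined.append({**row, **{k: v for k, v in extra.items() if k != key}})
    return joined

crawl_rows = [{"name": "Jan Jansen", "job": "CEO", "company": "Acme"}]
sheet_rows = [{"name": "Jan Jansen", "first": "Jan", "last": "Jansen"}]
print(join_on(crawl_rows, sheet_rows, key="name"))
```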

 

I remove the text

 | LinkedIn (beginning with a space)

In other words, I replace this text with nothing.
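The same find-and-replace in Python, applied to every cell of a column; the function name is hypothetical.

```python
def strip_linkedin_suffix(cells, suffix=" | LinkedIn"):
    """Replace the suffix text with nothing in every cell, like a find-and-replace."""
    return [cell.replace(suffix, "") for cell in cells]

print(strip_linkedin_suffix(["Acme | LinkedIn", "Voorbeeld BV | LinkedIn"]))  # ['Acme', 'Voorbeeld BV']
```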

 

Some cleaned-up results.

 

Do some manual cleanup by selecting a column and sorting it ascending or descending.

 

 
