LinkedIn profiles. Quick search through multiple files and add them to your CRM.
Search for LinkedIn profiles. Split them up into LinkedIn profile, gender, first name, last name, job position and company name. Then add them to your CRM at Ninox.
From this:
To
End result:
First part is how to search.
Second part is how to filter and tidy your results.
First part: how to search for LinkedIn profiles?
1. Use a web crawler together with a search engine.
2. Use online services (APIs).
I am going to use a web crawler to get results from a search engine results page (SERP).
You could also use one of the many APIs from RapidAPI or others. But most of them use POST requests, and you can only pay by credit card there.
That can get really expensive if you want to do a lot of searches.
Results are mostly in JSON.
I like GET (because I can change the input myself in the URL). And it is easier for me to integrate into some software. I have no clue how Postman works ;)
I found some other services like https://serper.dev/
Results in JSON with a GET request.
I use Octoparse to handle the results (they have a free plan).
You could search Google with them.
The playground is nice.
You can pay by credit card, PayPal or Google Pay.
I tried with 50.000 credits for 50 USD.
Results were very fast and good.
I also used multithreading (several requests at the same time instead of one after another), which got the results in quicker.
10 results cost me 1 credit.
20, 30, 40, 50 or 100 results cost me 2 credits.
If I choose 100 results, I can do 500 searches for 1 USD.
50 USD gives me 25.000 searches with the maximum of 100 results per search (maximum 2,5 million results in total).
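The credit arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation, using the prices quoted in this post; the function name search_budget is my own and the prices may have changed since.

```python
# Figures quoted in the post: 50.000 credits for 50 USD,
# and 2 credits per search when asking for 100 results.
CREDITS_BOUGHT = 50_000
PRICE_USD = 50
CREDITS_PER_SEARCH = 2      # price for 20 to 100 results per search
RESULTS_PER_SEARCH = 100


def search_budget(credits, price_usd, credits_per_search, results_per_search):
    """Return how many searches and results a credit bundle buys."""
    searches = credits // credits_per_search
    return {
        "searches": searches,
        "searches_per_usd": searches // price_usd,
        "max_results": searches * results_per_search,
    }


budget = search_budget(
    CREDITS_BOUGHT, PRICE_USD, CREDITS_PER_SEARCH, RESULTS_PER_SEARCH
)
print(budget)
```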
It also works for Google Maps, if you want to search for companies.
Or you could use a search engine and generate queries yourself.
If you choose DuckDuckGo, you can set some search parameters:
https://duckduckgo.com/duckduckgo-help-pages/settings/params/
https://html.duckduckgo.com/html/?q={search terms, like word1+word2+word3}
https://html.duckduckgo.com/html/?q=linkedin.com+ceo+ninox
or for Dutch
https://html.duckduckgo.com/html/?q=linkedin.com+ceo+ninox&kl=nl-nl
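If you would rather generate these URLs with a script than by hand, something like this should do it. A minimal sketch: it only uses the q and kl parameters shown above, and the function name build_search_url is made up for the example.

```python
from urllib.parse import urlencode

BASE_URL = "https://html.duckduckgo.com/html/"


def build_search_url(terms, region=None):
    """Build a DuckDuckGo HTML search URL from a list of search terms.

    region is the optional kl parameter, for example "nl-nl" for Dutch.
    """
    params = {"q": " ".join(terms)}
    if region:
        params["kl"] = region
    # urlencode turns spaces into plus signs
    return BASE_URL + "?" + urlencode(params)


print(build_search_url(["linkedin.com", "ceo", "ninox"]))
print(build_search_url(["linkedin.com", "ceo", "ninox"], region="nl-nl"))
```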
I merge some words online.
An online tool is:
https://www.toptal.com/marketing/mergewords
But there are more (just search for it).
The merging is done in your browser. It gets stuck with a lot of combinations.
For a lot of combinations you are better off with a desktop tool, like Keyword Combiner from Vovsoft.
https://vovsoft.com/software/keyword-combiner/
Toptal:
First fix the generated URLs.
Replace each space with a plus sign.
I use TextPad (free to use) for that.
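The merging and the space-to-plus fix can also be done in one go with a small script. This is a sketch, not what the tools above do internally; the word lists are just examples and the function names are my own.

```python
from itertools import product
from urllib.parse import quote_plus


def merge_words(*word_lists):
    """Combine every word from each list with every word from the other lists."""
    return [" ".join(combo) for combo in product(*word_lists)]


def to_search_urls(queries, base="https://html.duckduckgo.com/html/?q="):
    """Turn queries into URLs; quote_plus replaces spaces with plus signs."""
    return [base + quote_plus(query) for query in queries]


# Example word lists (replace with your own)
queries = merge_words(["linkedin.com"], ["ceo", "cto"], ["ninox", "acme"])
urls = to_search_urls(queries)
for url in urls:
    print(url)
```

Write the URLs to a text file and you can paste them straight into the crawler.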
Copy and paste the URLs into a web crawler.
I use Netpeak Spider in this example.
I am only interested in the external links.
Some settings in Netpeak Spider.
DuckDuckGo stores no cookies, so you can turn that setting off.
The user agent is important: if you choose the 'wrong' one, the results will be unreadable.
You can choose to work with your own IP address (adjust the max amount of threads).
Or work from behind a VPN and let the tool work over the VPN.
Or work with proxies.
To see the results and filter.
You could export these filtered results.
Output in CSV.
Each page shows up to 1 million results.
If you filter, you see the filtered results from that million.
So if you have 3 million results, you have 3 pages.
When filtered, you still have 3 pages, each with its own filtered results.
You have to export each page separately.
You could also export from the main window.
You can also export in a different way.
External links (XL)
All unique URLs & anchors (XL)
Second part: Clean up the found results (split into LinkedIn profile, full name, gender, first name, last name, job position, company).
You will probably want to clean up your data first.
I use Rons Data Edit for that. A great CSV editor.
https://www.ronsplace.ca/Products/RonsDataEdit
I am only interested in URLs containing linkedin.com
And the anchor must contain text.
I filter further.
For example only linkedin.com/in/ for personal profiles
or linkedin.com/company/ for company profiles.
Then you save that filtered view.
Otherwise it saves the whole (unfiltered) file.
You don't want duplicate content.
Deleting duplicate rows is easy.
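For anyone who prefers a script over a CSV editor, the same filtering and de-duplicating could look roughly like this. A sketch only: the column names URL and Anchor are assumptions, so check the header of your own export and adjust them.

```python
def filter_profiles(rows, kind="in"):
    """Keep rows with a LinkedIn URL of the given kind and a non-empty anchor.

    kind is "in" for personal profiles or "company" for company profiles.
    Rows are dicts; the keys "URL" and "Anchor" are assumed column names.
    Duplicate rows (same URL and anchor) are dropped.
    """
    needle = f"linkedin.com/{kind}/"
    seen = set()
    kept = []
    for row in rows:
        url = row.get("URL", "").strip()
        anchor = row.get("Anchor", "").strip()
        if needle not in url.lower() or not anchor:
            continue
        key = (url.lower(), anchor)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


# Made-up sample rows
rows = [
    {"URL": "https://www.linkedin.com/in/example-person", "Anchor": "Example Person - CEO - Ninox | LinkedIn"},
    {"URL": "https://www.linkedin.com/in/example-person", "Anchor": "Example Person - CEO - Ninox | LinkedIn"},
    {"URL": "https://www.linkedin.com/company/ninox", "Anchor": "Ninox | LinkedIn"},
    {"URL": "https://www.linkedin.com/in/no-anchor", "Anchor": ""},
    {"URL": "https://example.com/", "Anchor": "Something else"},
]
print(filter_profiles(rows))
```

To run it on an exported file, read the rows with csv.DictReader and write the kept rows back with csv.DictWriter.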
As a test, I added some files containing LinkedIn profiles.
I didn't do much filtering or deleting of duplicate rows (it was a quick test).
I split up the big file into smaller files (containing 100.000 rows each).
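Splitting a big CSV into files of 100.000 rows can also be scripted. A minimal sketch: the output file naming (prefix plus a number) is my own choice, and the first row is assumed to be a header that gets repeated in every part.

```python
import csv
from itertools import islice


def chunk_rows(rows, size=100_000):
    """Yield lists of at most size rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_chunks(src_path, prefix, size=100_000):
    """Split src_path into prefix_001.csv, prefix_002.csv, ... with the header repeated."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first row is a header
        for number, chunk in enumerate(chunk_rows(reader, size), start=1):
            out_path = f"{prefix}_{number:03d}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerows(chunk)
```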
Now you can search in the results.
You could even do some more filtering on the file.
I only want personal LinkedIn profiles (URL contains linkedin.com/in/) where the anchor points to a person (anchor contains - and | LinkedIn).
You can split the anchor into columns to get full name, job position and company.
Some results don't have a job position.
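The splitting can be sketched in Python like this. It assumes the anchor looks like "Full name - Job position - Company | LinkedIn", and that an anchor with only two parts has a company but no job position. Real anchors are messier, so expect to fix some rows by hand anyway.

```python
def split_anchor(anchor):
    """Split a LinkedIn anchor text into full name, job position and company."""
    text = anchor.strip()
    suffix = "| linkedin"
    if text.lower().endswith(suffix):
        text = text[: -len(suffix)]
    # Split on " - " and drop empty parts
    parts = [part.strip() for part in text.split(" - ")]
    parts = [part for part in parts if part]

    full_name = parts[0] if parts else ""
    if len(parts) >= 3:
        job_position, company = parts[1], " - ".join(parts[2:])
    elif len(parts) == 2:
        # Assumption: no job position, only a company name
        job_position, company = "", parts[1]
    else:
        job_position, company = "", ""
    return {"full_name": full_name, "job_position": job_position, "company": company}


print(split_anchor("Pieter de Bruin - CEO - Pietje Puk B.V. | LinkedIn"))
print(split_anchor("Pieter de Bruin - ACME Ltd. | LinkedIn"))
```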
I first save the file (save view).
And then open the filtered file and continue to filter more.
It is a bit easier, because the file is smaller now (so filtering and splitting up works faster).
We filter and split again.
Filtering on text (only a company name and no job position).
We make a new column with company name.
Filtering again
I cut the content and paste it into column Company name.
Now you do the other part.
The result:
Some cells were split the wrong way.
I fix that manually, there are not that many. Just cut and paste the text.
We remove the leading space from each cell.
We delete everything from | LinkedIn onwards (column Company name).
Delete the new column.
Splitting up the full name into first name and last name.
I still have to think about how to split on name particles ('tussenvoegsels'), like the 'de' in Pieter de Bruin. In the Netherlands we have 333 kinds of them (Belgium, Germany, South Africa and many other countries have them too).
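One possible approach, as a rough sketch: keep a list of particles and treat everything before the first particle as the first name. The list below is only a handful of common Dutch ones, not the full 333, and the function name split_name is made up. It will still get some names wrong (double first names, foreign names), so check the result.

```python
# A small, incomplete sample of Dutch name particles (tussenvoegsels)
PARTICLES = {"de", "den", "der", "van", "het", "ten", "ter", "te", "'t"}


def split_name(full_name):
    """Split a full name into first name, particle and last name."""
    words = full_name.split()
    if len(words) < 2:
        return {"first_name": full_name.strip(), "particle": "", "last_name": ""}

    # Look for the first particle after the first word (never the last word)
    for start in range(1, len(words) - 1):
        if words[start].lower() in PARTICLES:
            end = start
            # Collect consecutive particles, like "van der"
            while end < len(words) - 1 and words[end].lower() in PARTICLES:
                end += 1
            return {
                "first_name": " ".join(words[:start]),
                "particle": " ".join(words[start:end]),
                "last_name": " ".join(words[end:]),
            }

    # No particle found: first word is the first name, the rest the last name
    return {"first_name": words[0], "particle": "", "last_name": " ".join(words[1:])}


print(split_name("Pieter de Bruin"))
```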
To add gender we could use genderize.io
We make a new column with gender.
Use the Fill Web API.
Press Fetch Mappings
Now you can determine which Source Paths you want to add.
I choose only gender.
Press Fill
It gets filled within seconds.
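Outside Rons Data Edit you can call genderize.io yourself. A minimal sketch using only the standard library: it sends one first name per request to the public endpoint and reads the gender field from the JSON answer. The function names are my own, and the free tier has a request limit, so check their documentation before running it over a large file.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.genderize.io/"


def genderize_url(first_name):
    """Build the request URL for one first name."""
    return API_URL + "?" + urlencode({"name": first_name})


def parse_gender(payload):
    """Read the gender from a JSON response; empty string if unknown."""
    data = json.loads(payload)
    return data.get("gender") or ""


def lookup_gender(first_name, timeout=10):
    """Ask genderize.io for the gender that goes with a first name."""
    with urlopen(genderize_url(first_name), timeout=timeout) as response:
        return parse_gender(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Makes a real network request
    print(lookup_gender("Pieter"))
```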
3 replies
-
Just tested.
52.606 requests made.
Time: 1h 9m
Speed: 12,70 URLs per second
Only 2 URLs with error (was because there was a ! in the query text).
More speed is possible.
More threads.
No delay between requests.
I managed to get it to 30 - 35 URLs per second.
Total external links: 6.803.669
Will now do some cleaning up of the CSV file and share those results with you.
-
91.983 filtered unique results.
-
Another step
From my database I made a list with lastname and company name.
I want to search the list (or more files in a folder) with a Regex and copy paste the found results into my database.
The Regex:
\b(WORD1)\W+(?:\w+\W+){1,4}?(WORD2)\b
WORD1 = last name
WORD2 = company name
I added some text on the front and end.
And after that I merged the 2 columns into 1.
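To see what the Regex does, you can try it in Python first. This sketch fills in WORD1 and WORD2 from a last name and a company name and escapes special characters. The function name build_pattern is made up, and the refusal of empty input is my own addition, not something EmEditor does.

```python
import re


def build_pattern(last_name, company):
    """Build the Regex: last name, then 1 to 4 words, then company name."""
    if not last_name.strip() or not company.strip():
        # An empty WORD1 or WORD2 would match far too much
        raise ValueError("both last name and company name are required")
    return re.compile(
        r"\b(" + re.escape(last_name) + r")\W+(?:\w+\W+){1,4}?("
        + re.escape(company) + r")\b"
    )


pattern = build_pattern("Bruin", "Ninox")
match = pattern.search("Pieter de Bruin - CEO - Ninox | LinkedIn")
print(match.group(0) if match else "no match")
```

Note that the Regex needs at least one word between the last name and the company name. Also, the closing \b only matches next to a letter, digit or underscore, so a company name ending in a dot (like B.V.) followed by a space will not match.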
As a test I am searching a folder with 8 files, total of 342 MB.
Testing with EmEditor
https://www.emeditor.com/emeditor-core/emeditor-v21-7-0-released-including-technical-review/
https://www.emeditor.com/text-editor-features/coding/batch-replace/
https://www.emeditor.com/text-editor-features/coding/find-replace/
Using the batch function to search with Regex.
The file must be saved as TSV and imported in EmEditor.
The search is really fast.
Be sure that the Regex contains both WORD1 and WORD2.
I made a mistake and left WORD2 empty.
It consumed a lot of RAM and pulled in all data fields from another file.
\b(WORD1)\W+(?:\w+\W+){1,4}?(WORD2)\b
To filter only the results.
Result
First fix the data.
Moving some cells.
I want to move the names to the right.
Filter on text that does not contain ===
This filters out the cells with the Regex.
Now you see only the names.
I cut them and paste them 2 columns to the right.
Result
I select the names, cut them and paste them one row up.
Now the Regex with result is on the same row.
Now I merge column 1 and column 2 again (they got split up because of the comma).
I added the Regex to my database (for matching results).
My database
I named the columns.
Select all in the file with the results and copy (control A and control C).
Now go to the database file and choose JOIN.
Now import the result from clipboard.
You can choose 'Ignore space and symbols'.
The first 3 columns are from my database.
Followed by the generated Regex (to match on).
Now you can check if the found data is correct.
That can also be done automatically.
Like:
Matching on an existing first name.
Or reduce the initials to only their first letter, reduce the first name column to its first letter as well, and match on that.
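The first-letter check is simple to script as well. A small sketch with made-up function names: it takes the first letter of the initials from the database and the first letter of the found first name, and compares them.

```python
def first_letter(text):
    """Return the first letter in text as uppercase, or an empty string."""
    for character in text:
        if character.isalpha():
            return character.upper()
    return ""


def initials_match(initials, first_name):
    """True if the initials and the first name start with the same letter."""
    a = first_letter(initials)
    b = first_letter(first_name)
    return bool(a) and a == b


print(initials_match("P.J.", "Pieter"))
```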
Now I searched with the Regex on complete company names like Pietje Puk B.V.
Or ACME Ltd.
You can shorten the company name to get better matches (B.V. or Ltd. is not always mentioned in the LinkedIn anchor text).
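Shortening could be done with a small script like this. It is a sketch: the list of legal forms in the pattern is only a few examples (B.V., N.V., Ltd., Inc., GmbH), so add the ones that occur in your own database.

```python
import re

# A few example legal forms at the end of a company name
LEGAL_SUFFIX = re.compile(
    r"[\s,]+(?:B\.?V\.?|N\.?V\.?|Ltd\.?|Inc\.?|GmbH)$",
    re.IGNORECASE,
)


def shorten_company(name):
    """Remove a trailing legal form such as B.V. or Ltd. from a company name."""
    return LEGAL_SUFFIX.sub("", name.strip())


print(shorten_company("Pietje Puk B.V."))
print(shorten_company("ACME Ltd."))
```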
Next time we will do a match on email addresses.