Crawl external websites and perform actions using Symfony's BrowserKit
There might be some situations where you would want to crawl a third-party website and perform some actions right from your PHP codebase. For instance, submitting forms, logging into your account, clicking links, etc. to add some degree of automation in your workflow.
- The BrowserKit Component
- Creating browser instance
- Submitting forms
- Get current page’s content
- Go back and forth in your history
- Practical usage
- In closing
The BrowserKit Component
Symfony’s BrowserKit component simulates the behavior of a web browser, allowing you to make requests, click on links, and submit forms programmatically.
It lets you do this by providing a
Symfony\Component\BrowserKit\HttpBrowser class using which you can create a browser instance and then providing it HttpClient component to it. Let’s see how.
The first thing you would need to do is install
HttpBrowser components in your project. You can do this using the following commands.
$ composer require symfony/browser-kit $ composer require symfony/http-client
Creating browser instance
Once installed, you can now create a browser instance using the
Symfony\Component\BrowserKit\HttpBrowser to create the client that will make the external HTTP requests like so.
use Symfony\Component\BrowserKit\HttpBrowser; use Symfony\Component\HttpClient\HttpClient; $browser = new HttpBrowser(HttpClient::create());
Now, you are all set to crawl external websites and perform actions over them.
Once the browser instance is created, you can use it as a “virtual browser” to navigate through the website. For instance, you can visit the site by using the
request() method, click a link using the
clickLink() method, or submit a form using the
So, if you want to simulate login into your GitHub account, you can do it like so.
$browser = new HttpBrowser(HttpClient::create()); $browser->request('GET', 'https://github.com'); // 'Log in' can be the text content, id, value or name of a <button> or <input type="submit"> $browser->clickLink('Sign in'); // the second optional argument lets you override the default form field values $browser->submitForm('Sign in', ['login' => '...', 'password' => '...']); // at this point you're logged into your account // get the current URL now $githubHome = $browser->getHistory()->current()->getUri();
As you can tell, the browser component interacts with the website content based on the text that this element contains. For instance, it clicks the link with text “Sign in”, Submits a form with form name or button name that contains “Sign in” text.
Notice also that, you can also pass in form parameters by providing an optional second argument to the
Get current page’s content
If you want to get the current page’s content, you can do it like so.
$browser->request('GET', 'https://github.com'); print_r(json_decode($this->browser->getResponse()->getContent()));
Go back and forth in your history
The browser requests also allow you to go back and forward in your history like so.
$browser->request('GET', 'https://github.com'); $browser->clickLink('Issues'); $homePage = $client->back(); $issuesPage = $client->back();
If you want to see the practical usage of this component, check Beyond Code’s fathom-analytics-api package which is the unofficial Fathom Analytics API. They have used BrowserKit to simulate the browser behavior in order to get some nice data out of it.
Check out BrowserKit’s offical documentation to learn more about it.