Kranthi's blogs: internals

Monday, 11 July 2016

WebDriver Architecture

All implementations of WebDriver that communicate with the browser or with a RemoteWebDriver server use a common wire protocol. This wire protocol defines a RESTful web service using JSON over HTTP.

Each WebDriver command is therefore mapped to an HTTP method via the WebDriver service and then passed on to the HTTP command processor, which communicates with the browser. The command responses are returned as HTTP/1.1 response messages via the WebDriver service.
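To make the mapping concrete, here is a minimal sketch of how a client binding might translate a command name into an HTTP method, a URL path, and a JSON body. The paths follow the WebDriver wire protocol, but the `COMMANDS` table covers only a handful of commands, and the `build_request` helper is a hypothetical name used for illustration, not part of any Selenium binding.

```python
import json

# A small subset of the command table; the helper and table names are illustrative.
COMMANDS = {
    "newSession": ("POST", "/session"),
    "get": ("POST", "/session/{session_id}/url"),
    "getCurrentUrl": ("GET", "/session/{session_id}/url"),
    "getTitle": ("GET", "/session/{session_id}/title"),
    "quit": ("DELETE", "/session/{session_id}"),
}


def build_request(command, session_id=None, params=None):
    """Translate a command name into (method, path, body)."""
    method, template = COMMANDS[command]
    path = template.format(session_id=session_id)
    # Only POST commands carry a JSON body in this sketch.
    body = json.dumps(params or {}) if method == "POST" else None
    return method, path, body


print(build_request("get", session_id="abc123", params={"url": "http://example.com"}))
print(build_request("getTitle", session_id="abc123"))
```

Note that the same path (`/session/{session_id}/url`) serves two commands; the HTTP method is what distinguishes navigating to a URL from reading the current one.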

Different drivers, such as the Firefox Driver and the IE Driver, have different implementations to accomplish the above.

The Selenium WebDriver architecture document linked below goes into further detail on how these are implemented and how WebDriver commands flow through to the browser and back. Read section 16.6 for details on the Firefox Driver.

Execution Flow:

Step 1. The automation script is written using the client binding for the chosen language.

Step 2. Once test execution starts, the browser driver is created.

Step 3. The respective driver executable (Driver.exe) starts an HTTP server and begins listening for HTTP requests.

Step 4. A new HTTP request is created for each command, because each Selenium method is mapped to its respective WebDriver API.

Step 5. The HTTP request is sent to the HTTP server.

Step 6. HTTP server determines the steps needed for implementing the Selenium command

Step 7. The implementation steps are sent to the browser

Step 8. The HTTP server sends the execution status back to the test script.
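The request and response round trip in this flow can be sketched with the Python standard library alone. This is a toy stand-in, not a real driver: `ToyDriverHandler` plays the role of the driver executable's HTTP server and simply echoes the command back instead of driving a browser, and the response shape (`status`, `value`) is an assumption made for the demo.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyDriverHandler(BaseHTTPRequestHandler):
    """Stands in for the HTTP server started by a driver executable."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        params = json.loads(self.rfile.read(length) or b"{}")
        # A real driver would work out the browser steps here; we only echo.
        result = {"status": 0, "value": {"path": self.path, "params": params}}
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def send_command(port, path, params):
    """Wrap a command in an HTTP request, send it, and decode the reply."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}{path}",
        data=json.dumps(params).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Bypass any system proxy, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read())


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), ToyDriverHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return send_command(port, "/session/demo/url", {"url": "http://example.com"})
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())
```

The test script only ever sees the HTTP response; everything the driver does with the browser stays hidden behind the server.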

Thursday, 7 July 2016

Selenium RC architecture

Intro: Selenium RC was introduced in 2004. WebDriver (WD) was born in 2006 and later superseded it.

Selenium Remote Control is great for testing complex AJAX-based web user interfaces under a Continuous Integration system. It is also an ideal solution for users of Selenium IDE who want to write tests in a more expressive programming language than the Selenese HTML table format.

Main components:

- Client libraries: provide the interface between each programming language and the Selenium RC Server.

A Selenium client library provides a programming interface (API), i.e., a set of functions, which run Selenium commands from your own program.

The client library takes a Selenese command and passes it to the Selenium Server for processing a specific action or test against the application under test (AUT).

The client library also receives the result of that command and passes it back to your program.

- Selenium Server: drives the browser, embeds Selenium Core and injects it into the browser, interprets commands, and sends results back to the client.

The Selenium Server launches and kills browsers, interprets and runs the Selenese commands passed from the test program, and acts as an HTTP proxy, intercepting and verifying HTTP messages passed between the browser and the AUT.

The RC server bundles Selenium Core and automatically injects it into the browser. This occurs when your test program opens the browser (using a client library API function). Selenium Core is a JavaScript program, actually a set of JavaScript functions that interpret and execute Selenese commands using the browser’s built-in JavaScript interpreter.
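As a rough illustration of what a client library does, the sketch below builds the HTTP request for a Selenese command and parses the server's plain-text reply. It assumes the RC server listens on `localhost:4444` and accepts commands at `/selenium-server/driver/` with `cmd`, numbered argument, and `sessionId` parameters, replying with `OK` or `OK,<value>`. Treat those details as assumptions to check against your server; the helper names are hypothetical.

```python
from urllib.parse import urlencode


def build_rc_request(command, args=(), session_id=None, host="localhost", port=4444):
    """Build the URL a client library might send for one Selenese command."""
    query = [("cmd", command)]
    # Selenese arguments are passed as numbered parameters: 1, 2, ...
    for index, arg in enumerate(args, start=1):
        query.append((str(index), arg))
    if session_id is not None:
        query.append(("sessionId", session_id))
    return f"http://{host}:{port}/selenium-server/driver/?{urlencode(query)}"


def parse_rc_response(text):
    """Return the value from an 'OK,<value>' reply, None for 'OK', else raise."""
    if text == "OK":
        return None
    if text.startswith("OK,"):
        return text[3:]
    raise RuntimeError(text)


print(build_rc_request("open", ["/"], session_id="abc"))
print(parse_rc_response("OK,12345"))
```

The client library hides this plumbing behind ordinary method calls, which is why test code in Java, Python, or Ruby looks so similar.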

Architectural diagram: (Simplified)

Step 1: The client libraries communicate with the Server, passing each Selenium command for execution.

Step 2: The Server launches the browser and injects Selenium Core into it.

Step 3: The Selenese commands from the test code (for example, Java) are sent to the RC server one at a time, and the server interprets them.

Step 4: The corresponding JavaScript method is executed.

Step 5: Each request goes through the HTTP proxy to work around the same-origin policy.

Step 6: The corresponding application or web server handles the request and sends the response back.

Step 7: The response is sent back to the client library.
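The interpret-and-dispatch part of this flow can be sketched as a simple lookup from command name to handler. The real Selenium Core is JavaScript running inside the browser; `ToyCore` below is a hypothetical Python stand-in that only records what it would have done, and the three handlers are chosen purely for illustration.

```python
class ToyCore:
    """Hypothetical stand-in for Selenium Core's command dispatch."""

    def __init__(self):
        self.log = []  # records the actions instead of touching a browser

    def do_open(self, target, value):
        self.log.append(f"navigate to {target}")

    def do_type(self, target, value):
        self.log.append(f"type {value!r} into {target}")

    def do_click(self, target, value):
        self.log.append(f"click {target}")

    def execute(self, command, target="", value=""):
        """Look up the handler for one Selenese command and run it."""
        handler = getattr(self, f"do_{command}", None)
        if handler is None:
            return f"ERROR: unknown command {command}"
        handler(target, value)
        return "OK"


core = ToyCore()
# Each Selenese row is (command, target, value), processed one at a time.
for row in [("open", "/", ""), ("type", "id=q", "selenium"), ("click", "id=go", "")]:
    print(row[0], "->", core.execute(*row))
print(core.log)
```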

Advantages of RC:

1. Supports multiple languages such as Java, Ruby, C#, Perl, and Python.

2. Core is written in JavaScript.

3. Supports all mainstream JavaScript-enabled browsers.

Disadvantages:

1. The same-origin policy. This is a browser security policy that lets JavaScript interact only with content from the domain the page was loaded from. Because Selenium Core is written entirely in JavaScript, you can't easily switch between domains or work with websites that redirect or use frames with content from many domains.

2. Because of another JavaScript security restriction, you can't fill in <input type='file' /> inputs and have to use several workarounds.

3. You have to write your own methods when you need to wait for an element.

4. Execution is slow, due to multiple layers of interaction.

5. Fails to mimic real-life interaction (non-native events).

6. Cannot truly maximize the browser window.
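The same-origin rule from the first disadvantage comes down to comparing three parts of a URL: scheme, host, and port. The sketch below is a simplified check written for illustration (the `origin` and `same_origin` helpers are hypothetical names, and real browsers handle more edge cases), but it shows why a script injected from one domain cannot freely touch a page from another.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Fall back to the scheme's default port when none is given.
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def same_origin(url_a, url_b):
    """Two URLs share an origin only if scheme, host, and port all match."""
    return origin(url_a) == origin(url_b)


print(same_origin("http://example.com/a", "http://example.com:80/b"))
print(same_origin("http://example.com/", "https://example.com/"))
print(same_origin("http://example.com/", "http://www.example.com/"))
```

Routing both the injected Selenium Core and the AUT through the RC server's proxy makes them appear to come from one origin, which is the workaround RC relies on.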

Tip:

Does the RC server really hold Selenium Core in it?

See here