How do Browsers work?

Under the hood of Browsers

The web browser is arguably the most common portal for users to access the web. Advances in web browsers have led many traditional “thick clients” to be replaced by browser-based applications, which has increased the browser’s usability and ubiquity.

A web browser is an application that provides access to web servers: it sends a network request to a URL, obtains the resources, and presents them in an interactive way.
Common browsers include Internet Explorer, Firefox, Google Chrome, Safari, and Opera.

Functionality of Web Browser

The browser presents the web resource you choose in the browser window and handles user interaction with it. Basically, its work comes down to fetching, processing, displaying, and storing.

Structure of Web Browser

Fig. Web browser structure

  1. User Interface
  2. Browser Engine
  3. Rendering Engine
  4. Data Storage
  5. UI Back End
  6. JavaScript Interpreter (Scripting Engine)
  7. Networking (HTTP at minimum, plus others such as FTP and SMTP)

User Interface

It is the space where the user and the browser (application) interact through the controls the browser presents. No specific standard dictates how web browsers should look and feel. The HTML5 specification doesn’t define UI elements but lists some common ones: location bar, personal bar, scrollbars, status bar, and toolbar.

Browser Engine

It provides a high-level interface between the UI and the underlying rendering engine. It queries and manipulates the rendering engine in response to user interaction. It provides methods to initiate loading a URL and takes care of the reload, back, and forward browsing actions.
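
As a rough illustration of the reload, back, and forward handling described above, here is a minimal Python sketch. The NavigationHistory class and its method names are hypothetical and do not come from any real browser engine; the URLs are placeholders.

```python
class NavigationHistory:
    """Hypothetical back/forward list kept for one browser tab."""

    def __init__(self):
        self.entries = []  # URLs visited, oldest first
        self.index = -1    # position of the current entry

    def load(self, url):
        # Loading a new URL discards any "forward" entries.
        del self.entries[self.index + 1:]
        self.entries.append(url)
        self.index += 1
        return url

    # The methods below assume at least one page has been loaded.
    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.entries[self.index]

    def forward(self):
        if self.index < len(self.entries) - 1:
            self.index += 1
        return self.entries[self.index]

    def reload(self):
        return self.entries[self.index]


history = NavigationHistory()
history.load("https://example.com/")
history.load("https://example.com/about")
current = history.back()  # moves back to the first page
```

In a real browser, each of these calls would also ask the rendering engine to fetch and display the page; the sketch only tracks which URL is current.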

Rendering Engine

The rendering engine is responsible for displaying the content of the web page on the screen. Its primary job is to parse HTML. By default, a rendering engine displays HTML, XML, and images, and it supports other data types via plugins or extensions.

Fig. Rendering engine flow

Different browsers use different rendering engines:
Gecko: Firefox
WebKit: Safari
Blink: Chrome, Opera (version 15 onwards)

The web content is displayed through a series of steps:

HTML Data to DOM

The rendering engine receives the requested content from the networking layer, generally in 8 KB chunks. The raw bytes of the HTML file are then converted into characters, based on the file’s character encoding.
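
To make the byte-to-character step concrete, here is a small Python sketch that decodes content arriving in chunks. The decode_chunks helper is a hypothetical name, and the tiny chunk size is chosen only so that a multi-byte character is split across two chunks, which is the case an incremental decoder has to handle.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Yield text for each chunk of bytes, buffering incomplete characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        yield decoder.decode(chunk)
    # Flush the decoder; this raises if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


raw = "<p>caf\u00e9</p>".encode("utf-8")
# Split in the middle of the two-byte encoding of the accented letter.
chunks = [raw[:7], raw[7:]]
pieces = list(decode_chunks(chunks))
text = "".join(pieces)
```

The first chunk ends with only half of the accented character, so the decoder holds that byte back and emits the character once the second chunk arrives.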
The characters are then converted into tokens. A lexer carries out the lexical analysis, breaking the input into tokens. During tokenization, every start and end tag in the file is accounted for. The lexer also knows how to strip out irrelevant characters such as white space and line breaks.
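
A heavily simplified tokenizer might look like the Python sketch below. It is an assumption-laden illustration: a real HTML tokenizer is a state machine that also handles attributes, comments, and malformed input, whereas this regular expression only recognizes start tags, end tags, and text. The names TOKEN_RE and tokenize are made up for the example.

```python
import re

# Either a tag (optionally closing) or a run of text up to the next "<".
TOKEN_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")


def tokenize(html):
    """Break markup into (kind, value) tokens, dropping whitespace-only text."""
    tokens = []
    for match in TOKEN_RE.finditer(html):
        closing, tag, text = match.groups()
        if tag:
            kind = "end" if closing else "start"
            tokens.append((kind, tag.lower()))
        elif text.strip():
            tokens.append(("text", text.strip()))
    return tokens


tokens = tokenize("<html>\n  <body><p>Hello</p></body>\n</html>")
```

The line breaks and indentation between tags produce no tokens, which mirrors the stripping of irrelevant white space described above.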

The parser then carries out syntax analysis, applying the language’s syntax rules to the document structure to construct a parse tree.
The parsing process is iterative. The parser asks the lexer for a new token and, if the token matches a syntax rule, adds a corresponding node to the parse tree. The parser then asks for another token. If no rule matches, the parser stores the token internally and keeps asking for tokens until it finds a rule that matches all the internally stored tokens. If no such rule is found, the parser raises an exception, which means the document was not valid and contained syntax errors.

The resulting nodes are linked in a tree data structure called the DOM (Document Object Model), which captures the parent-child and adjacent-sibling relationships between them.
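
The tree construction can be sketched with Python’s standard html.parser module, which reports start tags, end tags, and text as it reads the markup. The Node and TreeBuilder classes and the outline helper are hypothetical names for this example, and the sketch assumes well-formed markup in which every start tag has a matching end tag.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from start/end tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, parent=self.current)
        self.current.children.append(node)
        self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        if self.current.parent is not None:
            self.current = self.current.parent  # climb back up

    def handle_data(self, data):
        self.current.text += data.strip()


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1><p>Text</p></body></html>")
builder.close()
tree = outline(builder.root)
```

Here h1 and p end up as adjacent siblings under body, and each node keeps a reference to its parent, which is the relationship structure the DOM establishes.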

Fig. DOM Tree

CSS Data to CSSOM

Raw bytes of CSS data are converted into characters, tokens, nodes, and finally the CSSOM (CSS Object Model). CSS has a mechanism called the cascade, which determines which styles are applied to an element. An element’s styles can come from its parents (via inheritance) or be set on the element itself. The browser has to recursively go through the CSS tree structure to determine the style of a particular element.
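
The interplay of inheritance and the cascade can be sketched as follows. This is a simplification under stated assumptions: selectors are plain tag names, later rules simply override earlier ones (specificity is ignored), and the computed_style function and the INHERITED set are names invented for the example.

```python
# A few properties that CSS passes from parent to child by default.
INHERITED = {"color", "font-family", "font-size"}


def computed_style(element, rules, parent_style=None):
    """Resolve the style of one element from inheritance, rules, and inline style."""
    # 1. Start from the inheritable part of the parent's style.
    style = {
        prop: value
        for prop, value in (parent_style or {}).items()
        if prop in INHERITED
    }
    # 2. Apply matching rules in source order; later rules win.
    for selector, declarations in rules:
        if selector == element["tag"]:
            style.update(declarations)
    # 3. Inline styles set on the element itself override the rules.
    style.update(element.get("style", {}))
    return style


rules = [
    ("body", {"color": "black", "margin": "8px"}),
    ("p", {"font-size": "16px"}),
    ("p", {"font-size": "18px"}),
]
body_style = computed_style({"tag": "body"}, rules)
p_style = computed_style({"tag": "p", "style": {"color": "red"}}, rules, body_style)
```

The paragraph does not pick up the body’s margin, because margin is not an inherited property, and its inline color replaces the color it would otherwise inherit.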

Fig. CSSOM Tree

Combination of DOM and CSSOM to Render Tree

The DOM tree contains information about the relationships between HTML elements, and the CSSOM tree contains information on how those elements are styled. Starting from the root node, the browser traverses each of the visible nodes. Some nodes are hidden (controlled via CSS) and are not reflected in the rendered output. For each visible node, the browser matches the appropriate rules defined in the CSSOM. Finally, these nodes are emitted with their content and styling; the result is called the render tree.
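
A minimal sketch of that traversal, assuming the DOM is a nested dictionary and the styles are looked up by tag name (both are simplifications, and build_render_tree is a hypothetical helper):

```python
def build_render_tree(node, styles):
    """Copy visible nodes, with their styles, into a new tree."""
    style = styles.get(node["tag"], {})
    if style.get("display") == "none":
        return None  # hidden nodes and their subtrees are skipped
    children = [
        build_render_tree(child, styles) for child in node.get("children", [])
    ]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "children": [{"tag": "title"}]},
        {"tag": "body", "children": [{"tag": "p"}, {"tag": "span"}]},
    ],
}
styles = {
    "head": {"display": "none"},
    "span": {"display": "none"},
    "p": {"color": "red"},
}
render_tree = build_render_tree(dom, styles)
```

The head element, its title, and the hidden span are present in the DOM but absent from the render tree, while the paragraph carries its matched style with it.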

Fig. Render Tree

Layout

The browser then proceeds to the next stage, called layout. The exact size and position of each piece of content must be calculated before it can be rendered on the page (the browser viewport). The process is also referred to as reflow. HTML uses a flow-based layout model, meaning the geometry can be computed in a single pass most of the time. It is a recursive process starting from the root element (<html>) of the document.
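
The recursive, single-pass nature of layout can be illustrated with a toy block layout in Python. It rests on several assumptions: every box is a full-width block stacked vertically, leaf boxes have a known content_height, and margins, padding, and inline content are ignored. The layout function and the dictionary keys are hypothetical.

```python
def layout(box, x, y, width):
    """Assign x, y, width, and height to a box and all of its descendants."""
    box["x"], box["y"], box["width"] = x, y, width
    children = box.get("children", [])
    cursor = y  # vertical position where the next child starts
    for child in children:
        layout(child, x, cursor, width)
        cursor += child["height"]
    if children:
        box["height"] = cursor - y  # a parent is as tall as its children
    else:
        box["height"] = box.get("content_height", 0)
    return box


page = {
    "tag": "html",
    "children": [
        {"tag": "h1", "content_height": 40},
        {"tag": "p", "content_height": 20},
        {"tag": "p", "content_height": 20},
    ],
}
layout(page, x=0, y=0, width=800)
positions = [(child["tag"], child["y"]) for child in page["children"]]
```

Each box is visited once: the parent hands down a position and width, and the height is passed back up once the children have been placed.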

Painting

Each of the renderers is traversed, and its paint method is called to display its content on the screen. Painting can be global (the entire tree is painted) or incremental. In incremental painting, a renderer that has changed invalidates its rectangle on the screen, the OS generates a paint event for that specific region, and the rest of the tree is not affected. Painting is a gradual process: some parts are parsed and rendered while the browser continues to process the remaining content arriving from the network.
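
The difference between global and incremental painting can be sketched like this. The paint function records draw commands instead of touching the screen, and the box layout, the intersects helper, and the dirty rectangle are all assumptions made for the example.

```python
def intersects(a, b):
    """Return True if two (x, y, width, height) rectangles overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def paint(box, dirty=None, commands=None):
    """Collect draw commands; with a dirty rectangle, skip untouched boxes."""
    commands = [] if commands is None else commands
    rect = (box["x"], box["y"], box["width"], box["height"])
    if dirty is None or intersects(rect, dirty):
        commands.append(("draw", box["tag"], rect))
    for child in box.get("children", []):
        paint(child, dirty, commands)
    return commands


page = {
    "tag": "html", "x": 0, "y": 0, "width": 800, "height": 80,
    "children": [
        {"tag": "h1", "x": 0, "y": 0, "width": 800, "height": 40},
        {"tag": "p", "x": 0, "y": 40, "width": 800, "height": 20},
        {"tag": "footer", "x": 0, "y": 60, "width": 800, "height": 20},
    ],
}
global_paint = paint(page)
# Only the paragraph changed, so only its rectangle is invalidated.
incremental_paint = paint(page, dirty=(0, 40, 800, 20))
```

With a dirty rectangle covering just the paragraph, the heading and footer are skipped, while the root box is still repainted because it overlaps the invalidated area.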

JavaScript Interpreter (JS Engine)

JavaScript is a scripting language, executed by the browser’s JS engine, that allows you to dynamically update web content, control multimedia, and animate images. The DOM and CSSOM provide an interface to JS, which can alter both of them. Since the browser cannot know what a particular script will do, it pauses DOM tree construction as soon as it encounters a script tag. By default (that is, without the async or defer attribute), every script is parser-blocking: DOM tree construction is halted until the script has run.

The JS engine begins parsing the code right after it is fetched from the server, feeding it into the JS parser. The parser converts the code into a representation the machine can work with. The tree that stores the parsed information, representing the abstract syntactic structure of the code, is called an abstract syntax tree (AST). The AST is fed into an interpreter, which translates it into bytecode.
Modern JS engines use Just-In-Time (JIT) compilation, meaning that JavaScript files downloaded from the server are compiled in real time on the client’s computer. The interpreter and compiler work together: the interpreter starts executing the code almost immediately, while the compiler generates machine code that the client system executes directly.
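
The source-to-AST-to-bytecode pipeline can be observed with Python’s own ast and dis modules. This is only an analogy: it shows Python’s toolchain, not a JavaScript engine, and it covers the parse and bytecode steps without any JIT compilation. The expression and variable names are made up for the example.

```python
import ast
import dis

source = "price * quantity + shipping"

# Step 1: parse the source text into an abstract syntax tree.
tree = ast.parse(source, mode="eval")

# Step 2: compile the tree into bytecode for the interpreter.
code = compile(tree, filename="<sketch>", mode="eval")
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# Step 3: let the interpreter execute the bytecode.
result = eval(code, {"price": 4, "quantity": 3, "shipping": 5})
```

The root of the tree is the addition, with the multiplication nested beneath it, which is how the tree encodes operator precedence. Printing ast.dump(tree) or opnames shows the intermediate forms; the exact instruction names vary between Python versions.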

Different browsers use different JS engines:

Chrome: V8 (Node.js is also built on top of it)
Firefox: SpiderMonkey
Microsoft Edge: Chakra (legacy Edge; Chromium-based Edge uses V8)
Safari: JavaScriptCore (marketed as Nitro, formerly known as SquirrelFish)

UI Back End

It is used for drawing basic widgets such as combo boxes and windows. It exposes a generic interface that is not platform-specific; underneath, it uses the operating system’s user interface methods.

Data Storage

This is a persistence layer, which lets the browser store data such as cookies, session storage, IndexedDB, Web SQL, bookmarks, and preferences. The HTML5 specification describes a complete database inside the web browser.
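
As a loose illustration of a persistence layer, the sketch below keeps key-value pairs per origin in SQLite, using Python’s standard sqlite3 module. The OriginStorage class, its method names, and the table layout are hypothetical; they are not how any particular browser stores its data.

```python
import sqlite3


class OriginStorage:
    """Hypothetical key-value store that keeps each origin's data separate."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to persist across restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS items ("
            "origin TEXT, key TEXT, value TEXT, PRIMARY KEY (origin, key))"
        )

    def set_item(self, origin, key, value):
        with self.db:  # commits the change on success
            self.db.execute(
                "INSERT OR REPLACE INTO items VALUES (?, ?, ?)",
                (origin, key, value),
            )

    def get_item(self, origin, key):
        row = self.db.execute(
            "SELECT value FROM items WHERE origin = ? AND key = ?",
            (origin, key),
        ).fetchone()
        return row[0] if row else None


storage = OriginStorage()
storage.set_item("https://example.com", "theme", "dark")
own_value = storage.get_item("https://example.com", "theme")
other_value = storage.get_item("https://example.org", "theme")
```

Keying every row by origin means one site cannot read what another site stored, which is the isolation that web storage mechanisms provide.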

Networking

It handles all kinds of network communication within the browser. It uses a set of communication protocols such as HTTP, HTTPS, and FTP to fetch resources from the requested URLs.

The web browser relies on DNS to resolve the hostname in a URL to an IP address. DNS records are cached by the browser, the OS, the router, and the ISP. If the requested hostname is not in any of these caches, the ISP’s DNS server initiates a DNS query to find the IP address of the server. After receiving the IP address, the browser establishes a connection with the server. The browser sends a SYN (synchronize) packet to the server, asking whether it is open for a TCP connection. The server acknowledges the SYN packet by responding with a SYN/ACK (synchronize-acknowledgment) packet.

The browser receives the SYN/ACK packet from the server and acknowledges it by sending an ACK packet. The TCP connection is then established, and data transfer can begin. To transfer data, the communication must follow the rules of the HTTP protocol, including its connection, messaging, request, and response rules.
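
The lookup order and the request that follows can be sketched in Python. The resolve and build_get_request helpers are hypothetical, the caches are plain dictionaries, and the DNS query is replaced by a stand-in function that returns a documentation-only address, so nothing here touches the network.

```python
def resolve(hostname, caches, query_dns):
    """Check each cache in order, then fall back to a DNS query."""
    for name, cache in caches:
        if hostname in cache:
            return cache[hostname], name
    address = query_dns(hostname)
    for _, cache in caches:
        cache[hostname] = address  # remember the answer for next time
    return address, "dns query"


def build_get_request(hostname, path="/"):
    """Format a minimal HTTP/1.1 GET request as bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fake_dns(hostname):
    # Stand-in for a real lookup; 192.0.2.0/24 is reserved for documentation.
    return "192.0.2.10"


caches = [("browser", {}), ("os", {})]
first = resolve("example.com", caches, fake_dns)
second = resolve("example.com", caches, fake_dns)
request = build_get_request("example.com")
# With a real address, socket.create_connection((address, 80)) would make
# the OS perform the SYN, SYN/ACK, ACK handshake before any data is sent.
```

The first lookup misses every cache and falls through to the query; the second is answered from the browser cache. The request bytes are what would be written to the socket once the TCP connection is established.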

Comparison of Browsers

There are many different web browsers on the market today. Although their primary purpose is the same, they differ from each other in more than one respect. The distinguishing areas include platform (Linux, Windows, Mac, BSD, and other Unix), protocols, graphical user interface (GUI), HTML5 support, and open-source versus proprietary licensing.

Happy Browsing!

