File Downloads With Selenium — Mission Impossible?


When you start automating acceptance tests that include a web UI, you will probably hit a wall quite quickly: how do you verify a downloadable document against some criteria? If you have tried it, you know that automating file downloads seems to be a mission impossible … or is it really?


A frequent weapon of choice for testing web UIs is Selenium. In the fight “Selenium vs. Download” there are actually two problems that need to be solved:

  1. File download: The download dialog is native in all browsers and cannot be controlled with JavaScript. That is bad for Selenium: because it cannot control the dialog, the dialog stays open and the test hangs.
  2. File transfer: Even if the first problem is solved, there is a second one: when the Selenium server does not run on the same machine as the test execution, the freshly downloaded file still has to be transferred between the two machines.

Approaches to solve the download problem

The problem of file downloads with Selenium can be tackled in various ways.

1. Window automation
The first approach smells like “brute force”: when you search the net for a solution, you quickly end up with suggestions to control the native window with window automation software such as AutoIt. That means you set up AutoIt so that it waits for any browser download dialog (the point at which Selenium gives up), takes control of the window, saves the file, and closes the window. After that, Selenium can continue as usual.

This might well work, but I found it to be technical overkill. And as it turned out, there was a much simpler solution to the problem.

2. Change the browser's default behaviour
The second possibility is to change the default behaviour of the browser. When you click on a PDF, for example, the browser should not open a dialog and ask what to do with the file, but save it without comments or questions in a predefined directory. To accomplish that, you initiate one download manually, choose to save the file to disk, and mark that choice as the default for this file type from now on.

Well, that could work. You “only” have to ensure that all developers, Hudson instances, etc. share the same browser profile. And depending on the number of different file types, that could mean quite some manual work.

3. Direct download
Taking a step back: why do we want to download the file with Selenium in the first place? Wouldn’t it be much cooler to download the file without Selenium and use wget instead? That would solve the second problem along the way, because wget runs on the test execution machine itself. It seems like a good idea, since wget is available not only for Linux but also for Windows.

Problem solved? Not quite: what about files that are not freely accessible? What if I first need to create some state with Selenium in order to access a generated file? The solution seems OK for public files, but it does not cover all situations.

Conclusion: download problem
We can conclude that it is possible to download files, but it takes some work and possibly new tools. The first step, though, is to get any working solution in place at all.

What was the other problem again?

Approaches to solve the file transfer problem

Well, admittedly, that’s not a real problem. There’s FTP, and everybody can do it. Nearly everybody: for our favourite test automation tool, the Robot Framework, there’s no FTP library yet. So that would require some quick library hacking, but it shouldn’t be that difficult.

Problem solved

Combined with approach 1 or 2 for the first problem, this would solve the problem completely:

  • Download files with Selenium and save them to a directory that can be reached with FTP
  • FTP the file to the test execution server
  • Execute the checks against the file.
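
The transfer step could be sketched in Python with the standard ftplib module. This is only an illustration, not part of the original setup: the function names are made up for this example, and the host name and credentials are placeholders that depend on your environment.

```python
from ftplib import FTP
from pathlib import Path


def fetch_download(ftp, remote_name, local_dir):
    """Copy one file from the Selenium machine into local_dir and return its path."""
    local_path = Path(local_dir) / Path(remote_name).name
    with open(local_path, "wb") as handle:
        # retrbinary streams the remote file to the callback block by block.
        ftp.retrbinary(f"RETR {remote_name}", handle.write)
    return local_path


def fetch_from_selenium_host(host, user, password, remote_name, local_dir):
    """Open an FTP connection to the (hypothetical) Selenium host and fetch one file."""
    # Host name and credentials are assumptions; use whatever your setup provides.
    with FTP(host) as ftp:
        ftp.login(user, password)
        return fetch_download(ftp, remote_name, local_dir)
```

Keeping the connection handling separate from the copy step makes the copy logic easy to exercise without a real FTP server.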

Phew … that looks like some work for a rather simple problem.

Taking another step back, I cannot get the wget solution out of my head. It looked simple, but it was not a complete solution to the problem. How can we make it complete? All we need to do is tell wget to continue from the point where Selenium left off. Can we do that? We can!

Final solution

In the end, the solution is simple and elegant, and I have to ask myself why I did not think of it earlier, which is a sure sign of a simple solution.

How can you teach wget to continue where Selenium left off? How does a web server know who is requesting a page or document? Through the current session! You can pass the session ID to wget via cookies and header parameters, so that wget can access the same files as the browser in the current Selenium session. Implemented as a keyword, it’s just two lines:

Keyword "Download File"

Download File  [Arguments]  ${COOKIE}  ${URL}  ${FILENAME}
  ${COOKIE_VALUE} =  Call Selenium API  get_cookie_by_name  ${COOKIE}
  Run and Return RC  wget --cookies=on --header "Cookie: ${COOKIE}=${COOKIE_VALUE}" -O ${OUTPUT_DIR}${/}${FILENAME} ${URL}

First, a direct call to the Selenium API reads a certain cookie, and its value is stored in a variable. In the second step, a new process is started. The keyword “Run and Return RC” waits until that process finishes, which is the case once wget has downloaded the file. For this to work, wget has to be somewhere on your path, otherwise the test will fail. With the header parameter “Cookie:”, wget continues in the same session as Selenium and gains access to the file. Voilà :)

The new keyword takes three parameters:

  1. COOKIE: The name of the cookie that is read via Selenium and then passed to wget. Usually this is the cookie that identifies the current session.
  2. URL: The link to the file that should be downloaded.
  3. FILENAME: The new name for the file, which is placed in Robot's output directory.
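
For setups without wget, the same idea can be sketched in plain Python with urllib, as one of the comments below suggests. This is a minimal sketch under assumptions, not the keyword from this post: the function name and its parameters are made up for illustration, and the cookies are assumed to be available as a simple name-to-value mapping.

```python
import urllib.request
from pathlib import Path


def download_with_session(url, cookies, target_path, timeout=30):
    """Fetch url with the browser's cookies and write the body to target_path."""
    # Same idea as wget's --header "Cookie: name=value": reuse the session.
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    request = urllib.request.Request(url, headers={"Cookie": cookie_header})
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses,
    # so a rejected session surfaces as an exception instead of an empty file.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    target = Path(target_path)
    target.write_bytes(data)
    return target
```

With the Selenium WebDriver Python bindings, the mapping could presumably be built from `driver.get_cookies()`, for example `{c["name"]: c["value"] for c in driver.get_cookies()}`.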

Usage of the new keyword

Using the new keyword is nearly trivial. Still, here’s an example of how to download a PDF, which will be placed as “file.pdf” in the output directory (where the reports also go). The cookie containing the session ID is JSESSIONID.

Download File  JSESSIONID    http://<...>/web/pdf?id=4711  file.pdf
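
Once the file is in the output directory, the checks can run against it locally. As a first sanity check, here is a small Python sketch (the helper name is made up for illustration) that tests whether the download is a PDF at all by looking at the “%PDF-” marker at the start of the file. A failed login often returns an HTML page instead, which this check would catch.

```python
def looks_like_pdf(path):
    """Return True if the file starts with the PDF header marker."""
    with open(path, "rb") as handle:
        # PDF files begin with the bytes "%PDF-" followed by a version number.
        return handle.read(5) == b"%PDF-"
```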

File Downloads With Selenium — Mission Possible!

Author

Andreas Ebbert-Karroum


Comments

  • Yeah.. a surprisingly simple solution for a “test impediment” we carried on for 3 sprints.
    Well, not really a surprise, because I believe there is a simple solution for most problems. You just need to have a structured process for tackling them.
    You nicely describe such a process.

    Now we still need to tackle iText parsing of more or less compatible pdfs :-)

  • Very nice!

    Instead of using wget you could have implemented a simple keyword with Python using the urllib and/or urllib2 modules. This would remove the dependency on wget and thus make the approach platform independent. Should we actually add a `Download file` keyword to SeleniumLibrary itself?

  • Andreas Ebbert-Karroum

    Hi Pekka,

    well … someone certainly could, I couldn’t :) wget was the first tool that got the job done, but having a keyword in SeleniumLibrary itself is a great idea, I just created issue 129 for that purpose.

    Andreas

  • August 6, 2010 by sasikumar

    I am a newbie, can you tell me how to call the above from Selenium?

  • Andreas Ebbert-Karroum

    Hi Sasikumar,

    being a newbie is not a problem. Selenium itself does not offer any facility to download files, that’s why we fall back to another tool called “wget”.

    Andreas

  • November 15, 2010 by joe1288

    Hey,

    Great post…
    That's exactly what I was looking for.

    But I am not sure how to apply it. Where should I paste the code for the “Download File” function?

    Btw, I'm using Selenium 1.08 IDE.

    Thanks a lot

  • January 8, 2011 by kenberland

    After finally realizing that cookies marked aren’t available to getCookie in Selenium (and that you need to get them with captureNetworkTraffic() ) I used native java to get the file with the stolen session. Roughly:

    // Needs java.net.URL, java.net.HttpURLConnection, java.io.InputStream and java.io.FileOutputStream.
    // urlToGet, myCookie, CONNECT_TIMEOUT and SOCKET_TIMEOUT are defined elsewhere.
    URL url = new URL(urlToGet);
    HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
    httpConn.setAllowUserInteraction(false);
    httpConn.setConnectTimeout(CONNECT_TIMEOUT);
    httpConn.setReadTimeout(SOCKET_TIMEOUT);
    httpConn.setInstanceFollowRedirects(true);
    httpConn.setRequestMethod("GET");
    // Reuse the browser session by sending its cookie.
    httpConn.setRequestProperty("Cookie", myCookie);
    httpConn.connect();
    int response = httpConn.getResponseCode();
    if (response == HttpURLConnection.HTTP_OK) {
        // Copy the response body to disk; both streams are closed automatically.
        try (InputStream in = httpConn.getInputStream();
             FileOutputStream out = new FileOutputStream("/tmp/foobar")) {
            byte[] buf = new byte[1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
    }

    • Can you elaborate? Which cookies are and are not available to Selenium? And how can one find out (in an automated way)?

  • July 29, 2011 by Simon Kelly

    In the case where the test is being executed on a separate machine, how would you invoke wget (or anything else) on the remote machine from your test code?

    • Do you mean executed with Selenium Grid? Or just run on another machine? If the former, then wget wouldn’t work, and you would need native HTTP requests to do the job. If the latter, you could deploy the test solution with wget (in a fixed or relative path) wherever you plan to run the tests.

  • March 5, 2012 by kimlam8888

    It’s awesome.

    Regarding the id when it’s auto-generated in the URL: I use the context menu, select “copy link location” and then put the link into a URL variable that is ready for the wget command. It works now and I can download the file :)

    Thank you Andreas!

  • December 19, 2012 by Lilian

    I can successfully download PDF files with Selenium after configuring the Firefox profile like this:

    FirefoxProfile profile = new FirefoxProfile();
    profile.setPreference( "browser.download.dir", "/home/lilian/downloads" );
    profile.setPreference( "browser.download.folderList", 2 );
    profile.setPreference( "browser.helperApps.neverAsk.saveToDisk", "application/pdf" );

    driver = new FirefoxDriver( profile );

    The only problem arises if you have installed a PDF reader that provides a Firefox plugin (for example Adobe Reader). In that case, the PDF is not downloaded but opened in the browser. To solve this, I completely uninstalled Adobe Reader and manually checked that when I click on a PDF link in my browser, the file is downloaded and not opened.

    Regards

    • March 12, 2013 by sreevani

      Hi Lilian,

      I tried the code below, but I am not able to open the URL in Firefox! Can you help me out?

      WebDriver driver;

      FirefoxProfile profile = new FirefoxProfile();
      profile.setPreference( "browser.download.folderList", 0 );
      profile.setPreference( "browser.helperApps.neverAsk.saveToDisk", "application/zip" );
      driver = new FirefoxDriver( profile );
      driver.get("http://encodable.com/filechucker/");

  • January 13, 2014 by Marcin Kowalczyk

    “For our favourite test automation tool, the Robot Framework, there’s no FTP library yet.”

    There is one already available. It can be downloaded from SourceForge (http://sourceforge.net/projects/rf-ftp-py/) or installed from PyPI (pip install robotframework-ftplibrary).

  • Hi Andreas,

    Thanks for sharing the good info.

    As far as I know, there is another way to implement download functionality with Selenium WebDriver,
    i.e. by creating a custom profile at run time using Selenium.
    Here is a sample code snippet for implementing the download functionality:

    http://qaautomationworld.blogspot.in/2014/02/file-downlaoding-using-selenium.html

    Thanks

    • Andreas Ebbert-Karroum

      Hi Raj,

      this is option 2, which is also outlined above. The major drawback, in my opinion, is that the downloaded file then resides on the server that runs Selenium, which might not be the same machine as the one running the automated tests.

      Kind Regards,
      Andreas
