Challenge: M&A: predict possible future acquisition

According to Reuters, 2014 will be marked by growth in the M&A market.


Mergers and acquisitions (M&A) is a class of economic processes in which businesses and capital are consolidated at the macro and micro levels, resulting in one larger company in place of several smaller ones.

An acquisition is a transaction carried out to establish control over a company by acquiring more than 30% of its share capital (stocks, shares, etc.), while the company retains its legal independence.

Solvers are invited to forecast the probability of a company being acquired in the coming year.

Timeline of the competition

  • 09.06.2014     start of the competition
  • 22.08.2014     results of the competition and awarding of the winners ($5,000 prize fund). Evaluation is based on two criteria: the score given by the evaluation functional (see below) and expert judgment (simplicity of the solution, reproducibility of the algorithm, expert opinion). Both criteria carry equal weight in choosing the winners.
  • 01.03.2015     final results of the competition and awarding of the winners ($10,000 prize fund). By March 2015 the financial year will be over, so the models will be tested on real data about which companies were actually acquired. The main prize is awarded on the basis of this objective data.

Data Description

The data for the task is provided in tables containing the following parameters for each company:

1.  Cash and cash equivalents (columns 1–21)

2.  Inventories (columns 22–42)

3.  Total Current Assets (columns 43–63)

4.  Total Current Liabilities (columns 64–84)

5.  Total Assets (columns 85–105)

6.  Property, Plant and Equipment, Net (columns 106–126)

7.  Goodwill (columns 127–147)

8.  Short-Term Debt (columns 148–168)

9.  Long-Term Debt (columns 169–189)

10.  Net Debt (columns 190–210)

11.  Total Liabilities (columns 211–231)

12.  Depreciation and Amortization (columns 232–252)

13.  CAPEX (columns 253–273)

14.  Net Sales (columns 274–294)

15.  Gross Margin (columns 295–315)

16.  EBITDA (columns 316–336)

17.  Dividend yield (columns 337–357)

18.  Market Capitalization (columns 358–378)

19.  Gross Income (columns 379–399)

20.  Financial Costs (columns 400–420)

21.  Net Income (columns 421–441)

22.  Book Value (columns 442–462)

23.  Free Cash Flow (columns 463–483)

24.  Sector (column 484)
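The layout above is regular: each of the 23 yearly parameters occupies a block of 21 consecutive columns. A minimal sketch of that arithmetic follows. The function names are hypothetical, and the mapping from a column to a calendar year assumes that the columns inside each block run chronologically from 1994 to 2014, which the description does not state explicitly.

```python
from typing import Tuple

FIRST_YEAR = 1994          # first year covered by the data
YEARS_PER_PARAMETER = 21   # 1994..2014 inclusive
YEARLY_PARAMETERS = 23     # parameter 24 (Sector) is a single column, 484


def column_range(parameter_number: int) -> Tuple[int, int]:
    """Return the 1-based, inclusive (first, last) columns of a yearly parameter."""
    if not 1 <= parameter_number <= YEARLY_PARAMETERS:
        raise ValueError("yearly parameters are numbered 1..23")
    first = (parameter_number - 1) * YEARS_PER_PARAMETER + 1
    return first, first + YEARS_PER_PARAMETER - 1


def column_to_parameter_and_year(column: int) -> Tuple[int, int]:
    """Map a 1-based column (1..483) to (parameter number, year).

    ASSUMPTION: columns within a block are ordered 1994, 1995, ..., 2014.
    """
    if not 1 <= column <= YEARLY_PARAMETERS * YEARS_PER_PARAMETER:
        raise ValueError("column must be in 1..483")
    block, offset = divmod(column - 1, YEARS_PER_PARAMETER)
    return block + 1, FIRST_YEAR + offset
```

For instance, `column_range(12)` gives the block listed above for Depreciation and Amortization.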

All data is divided into three files:

  • File Train_contest.csv: the training sample, in which data for all of the above parameters except the sector is available for the years 1994-2014 (21 values for each parameter), and which has two columns with the answers:
    • a column with a binary value showing whether the company was acquired;
    • a column that gives the date of the news about the acquisition if the company was acquired, and is blank otherwise;
  • File Valid_contest.csv: the validation sample, in which all parameters are unknown (replaced by NaN) starting from some year X. For this sample, participants must predict the probability that news of an acquisition appears during that year X.
  • File FinalTest_contest.csv: the final test sample. It is structured like the validation sample, but the two samples (validation and final test) contain different companies. The result that a participant's algorithm achieves on this sample is that participant's result in the competition.

*Note that if a NaN value occurs “inside” a parameter (e.g., in some row column 236 holds a numeric value, column 237 holds NaN, but columns 238–252 hold numeric values), it must be treated as a gap in the data and cannot be interpreted as the year X to be predicted. In addition, to give a clearer picture of the market situation at any particular moment, participants are given weekly quotations of the S&P 500 index for 1994-2014 as additional information.
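The distinction between an interior gap and the unknown year X can be sketched in code. The helper below finds where the trailing run of NaN values starts in one 21-value parameter block, ignoring NaN values that are followed by numbers. The function names are hypothetical, the year mapping assumes chronological column order, and taking the latest start across parameters is a heuristic assumed here (a parameter may have genuine gaps just before year X), not a rule given by the organizers.

```python
import math
from typing import Optional, Sequence


def first_unknown_index(values: Sequence[float]) -> Optional[int]:
    """Return the 0-based index where the trailing run of NaN begins.

    Interior NaN values (followed later by a number) are treated as gaps
    and ignored. Returns None if the last value is known.
    """
    index = len(values)
    while index > 0 and math.isnan(values[index - 1]):
        index -= 1
    return index if index < len(values) else None


def estimate_year_x(blocks: Sequence[Sequence[float]],
                    first_year: int = 1994) -> Optional[int]:
    """Estimate year X for one company from its parameter blocks.

    ASSUMPTION: values in each block are ordered chronologically from
    first_year. HEURISTIC: use the latest trailing-NaN start over all blocks.
    """
    starts = [first_unknown_index(block) for block in blocks]
    if not starts or any(start is None for start in starts):
        return None  # some parameter is known up to the last year
    return first_year + max(starts)
```

A block such as `[1.0, nan, 3.0, nan, nan]` has an interior gap at index 1 and an unknown tail starting at index 3.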

The most popular economic indicators

Based on the available data, participants can compute and use the following economic indicators:


Functional for evaluating solutions

Submitted solutions will be evaluated using the tool getQualityOfSolution(Answer, Ideal), written in the MatLab programming environment. The idea implemented in this functional is based on a normalized measure of ranking quality (normalized Discounted Cumulative Gain, nDCG), which can be expressed by the following formula:


where SortedIdeal is the vector of correct answers sorted in descending order of the probabilities in Answer, and OnesZerosIdeal is the vector of correct answers sorted in descending order (all the ones first, followed by all the zeros).

Important note: if the predicted probabilities coincide for some set of companies, then after sorting these companies keep the same relative order as in the vector Answer!
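As a rough illustration of how such a score can be computed, here is a minimal Python sketch. It is not the organizers' getQualityOfSolution tool: it assumes the standard nDCG definition with a log2 position discount, and the real tool may use a different discount. The tie rule from the note above is reproduced with a stable sort. The function names `dcg` and `ndcg` are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(position + 1) discount.

    ASSUMPTION: standard discount; the official tool may differ.
    """
    return sum(rel / math.log2(position + 1)
               for position, rel in enumerate(relevances, start=1))


def ndcg(answer: Sequence[float], ideal: Sequence[int]) -> float:
    """Normalized DCG of the 0/1 vector `ideal` ranked by `answer`."""
    if len(answer) != len(ideal):
        raise ValueError("answer and ideal must have the same length")
    # sorted() is stable, so companies with equal probabilities keep
    # the order they had in `answer`.
    order = sorted(range(len(answer)), key=lambda i: -answer[i])
    sorted_ideal = [ideal[i] for i in order]
    ones_zeros_ideal = sorted(ideal, reverse=True)
    best = dcg(ones_zeros_ideal)
    return dcg(sorted_ideal) / best if best > 0 else 0.0
```

Because ties keep their original order, a submission that assigns the same probability to every company is scored by the order in which the companies appear in the file, so constant predictions are not a neutral choice.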

Example calculation of solution quality

If the vector of predicted probabilities is (0.5, 0.8, 0, 0) and the ideal vector of acquisition outcomes is (1, 0, 1, 0), then the quality of the solution can be calculated using the formula:
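For these vectors, sorting the ideal vector by descending predicted probability (with the two tied zeros kept in their original order) gives SortedIdeal = (0, 1, 1, 0), and OnesZerosIdeal = (1, 1, 0, 0). As a worked sketch, assuming the standard nDCG with a log2 position discount (an assumption; the discount used by getQualityOfSolution may differ), the calculation would look like this:

```latex
\mathrm{nDCG}
  = \frac{\mathrm{DCG}(\mathrm{SortedIdeal})}{\mathrm{DCG}(\mathrm{OnesZerosIdeal})}
  = \frac{\dfrac{0}{\log_2 2} + \dfrac{1}{\log_2 3} + \dfrac{1}{\log_2 4} + \dfrac{0}{\log_2 5}}
         {\dfrac{1}{\log_2 2} + \dfrac{1}{\log_2 3} + \dfrac{0}{\log_2 4} + \dfrac{0}{\log_2 5}}
  \approx \frac{1.131}{1.631}
  \approx 0.693
```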



How to take part:

Begin to address the problem

