Michal Kapusta

Investment Research | Finance | R programming

Is fine Art a fine Investment? Let's find out using web scraping!

Idea

The inspiration to write this article was a documentary about the life of a famous “con” artist Mr. Beltracchi link which got me curious about art trading & auctions. I often hear the media saying: “Invest in fine Art, it is a great investment!”. In the finance world, banks offer clients services such as Art Consulting. The service offers advice on buying and selling fine art. The key to a great advice is to know which Artist’s artwork is increasing in value over time and generate profits at the auctions and which one doesn’t.

This analysis shows evidence an art investor might use when making investment decisions based on actual transaction data. Structured transaction data, trends or patterns are non existent so in order to find them I have used handy web scraping techniques to start with. I have then analyzed the data and created interactive charts that hopefully provide insights into the world of Slovak art auctions.

Approach

  1. Webscrape all the auctioned paintings from the SOGA auction house website (rvest, stringr)
  2. Tidy the raw data into the tidy form (tidyverse, rebus, purrr)
  3. Use tidy data as base for the analysis (tidyverse)
  4. Conclusions
  • the MVF (Most Valuable Function) is the safely() function from the package purrr!

Part I: The webscrape

The primary source of information is the website of Slovak national gallery auction house: https://www.soga.sk (aka SOGA). The website is filled with data that can be categorized as:

  • grouped & labeled auctions (since 2005)
  • all auctioned items are listed
  • artist, artwork name, starting price, final price data are listed

Before we start web scraping process the website has to allow this behavior. More information about using robot.txt here

library(robotstxt)
robotstxt::robotstxt("soga.sk") %>% print()

Great! The website enables the scraping of the data.

First step is to gather the data and build a database that contains all the data available from the website.

Next step is to identify the elements we want to scape. Selectorgadget identifies web elements containing auction information.

url <- "http://www.soga.sk/aukcie-obrazy-diela-umenie-starozitnosti/aukcie/vysledky-aukcii"

result <- url %>% read_html() %>%
        html_nodes("p.results") %>% 
        html_text()

time <- url %>% read_html() %>%
        html_nodes("p.about") %>% 
        html_text()

link <- url %>% read_html() %>%
        html_nodes("h2 a") %>% 
        html_attr("href") 

df <- cbind(time, result,link) %>% as.tibble()

This chunk of code records following table:

df %>% head() %>% knitr::kable()
time result link
138. zimná aukcia výtvarných diel a starožitností, 05.12.2017 Na aukcii bolo vydražených 159 diel (55%) v celkovej hodnote 499 745 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/138-zimna-aukcia-vytvarnych-diel-a-starozitnosti
137. večerná aukcia, 14.11.2017 Na aukcii bolo vydražených 45 diel (50%) v celkovej hodnote 378 900 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/137-vecerna-aukcia
136. jesenná aukcia výtvarných diel, 26.09.2017 Na aukcii bolo vydražených 152 diel (57%) v celkovej hodnote 198 940 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/136-jesenna-aukcia-vytvarnych-diel
135. letná aukcia výtvarných diel a starožitností, 13.06.2017 Na aukcii bolo vydražených 164 diel (50%) v celkovej hodnote 276 625 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/135-letna-aukcia-vytvarnych-diel-a-starozitnosti
134. večerná aukcia, 25.04.2017 Na aukcii bolo vydražených 47 diel (51%) v celkovej hodnote 665 600 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/134-vecerna-aukcia
133. jarná aukcia výtvarných diel, 07.03.2017 Na aukcii bolo vydražených 129 diel (64%) v celkovej hodnote 212 230 €. /aukcie-obrazy-diela-umenie-starozitnosti/aukcie/133-jarna-aukcia-vytvarnych-diel

Perfect! The successful scrape generated the table with information about:

  • column time: name of the action & date of the auction
  • column result: # of auctioned items, % of sold items, auction turnover
  • column link: contains the weblink to the actual auction.

Unfortunately, the data are still locked in the records. The Information is extracted using string manipulation (stringr, rebus) into separate columns:

  • date of the auction
  • number and name of the auction
  • number of artwork entering auction
  • success rate of the auctions
  • overall sum of all auctioned items
# extract time information 
ptrn <-  ", " %R% one_or_more(DIGIT) %R% "." %R% one_or_more(HEX_DIGIT) %R% "." %R% one_or_more(HEX_DIGIT) 
df$c_time <- df$time %>%
        str_extract(pattern = ptrn) %>%
        str_replace(", ","") %>%
        lubridate::dmy()

# extract name information 
df$c_name <- df$time %>%  str_replace(ptrn,"") 

# extract link information
df$c_link  <- paste0("http://www.soga.sk/",df$link)

# extract percentage information
ptrn_2 <-  "(" %R% one_or_more(DIGIT) %R% "%)" 
df$c_rate <- df$result %>%
        str_extract(pattern = ptrn_2) %>%
        str_replace("%","") %>%
        as.numeric()

# clean items information
ptrn_3 <-  one_or_more(DIGIT) %R% " diel" 
df$c_items <- df$result %>%
        str_extract(pattern = ptrn_3) %>%
        str_replace(" diel","") %>%
        as.numeric()

# clean price information
ptrn_4 <- one_or_more(DIGIT) %R% SPACE %R% one_or_more(DIGIT) %R% " €" 
df$c_price <- df$result %>% str_extract(pattern = ptrn_4) %>%
        str_replace(" ","") %>% 
        str_replace("€","") %>% 
        as.numeric()

# tidy dataframe & calc average price
df_tidy <- df %>%
        select(contains("c_")) %>%
        mutate(year = lubridate::year(c_time),
               quarter = lubridate::quarter(c_time),
               qtr = paste0(year,".",quarter),
               c_price_avg = c_price / c_items)


df_tidy %>% 
        select(-year,-qtr,-quarter) %>% 
        head() %>%
        knitr::kable()
c_time c_name c_link c_rate c_items c_price c_price_avg
2017-12-05 138. zimná aukcia výtvarných diel a starožitností http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/138-zimna-aukcia-vytvarnych-diel-a-starozitnosti 55 159 499745 3143.050
2017-11-14 137. večerná aukcia http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/137-vecerna-aukcia 50 45 378900 8420.000
2017-09-26 136. jesenná aukcia výtvarných diel http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/136-jesenna-aukcia-vytvarnych-diel 57 152 198940 1308.816
2017-06-13 135. letná aukcia výtvarných diel a starožitností http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/135-letna-aukcia-vytvarnych-diel-a-starozitnosti 50 164 276625 1686.738
2017-04-25 134. večerná aukcia http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/134-vecerna-aukcia 51 47 665600 14161.702
2017-03-07 133. jarná aukcia výtvarných diel http://www.soga.sk//aukcie-obrazy-diela-umenie-starozitnosti/aukcie/133-jarna-aukcia-vytvarnych-diel 64 129 212230 1645.194

Clean dataset reveals important information about auction house performance over the period of years 2001-2017.

df_summary <- function(data, ...) {
        
        group_var <- quos(...)
        
        data %>% 
                group_by(!!!group_var) %>%  # practise of tidy eval concepts :P
                summarise(s_price = sum(c_price),
                          s_items = sum(c_items),
                          c_price_avg = mean(c_price_avg)) 
        
}

# chart overview
df_tidy %>%
        df_summary(year) %>%
        rename(average_price = c_price_avg , 
               number_of_items = s_items,
               auction_turnover = s_price) %>% 
        gather(Ratio, Value, 2:length(.)) %>% 
        ggplot(aes(year, Value, group = Ratio)) +
        geom_line(alpha = .3)  +
        theme_minimal(base_family = "Verdana",
                      base_size = 12) + 
        facet_wrap(~Ratio,scales = "free") +
        tidyquant::geom_ma(n = 4,
                           color = "red",
                           linetype = 1,
                           size = 1) + 
        labs(title = "SOGA: Overview ",
             subtitle = "by indicator",
             caption = "source: www.soga.sk",
             y = "")

Data shows SOGA turnover is growing steadily from 2004. Additionally, the average price per painting sold at auction is growing. Average price grew from ~1200€ to current ~5000€ in 14 years. This means either more expensive paintings are being sold or the prices of the artworks are growing. A calculation reveals 10% CAGR in prices per painting for the time period.

A byproduct of the scrape is the URL to every single auction in SOGA history (for example here. This link leads to every single item auctioned at that particular auction.

Next step is taken in order to:

  • scrape all the websites dedicated to listing items sold at one particular auction (user need to click several times next page in order to see all the items)
  • scrape the URLs of the particular items traded on the auction (download the individual URL link of the items)
  • scrape the details regarding starting price, final price, artwork name etc.

These steps can be executed using following code:

extract_auction_length      <- function(url) {
        ptrn <- "page=" %R% one_or_more(DIGIT)
        
        num <- url %>% read_html() %>%
                html_nodes("#auctionArtworks~ .pager .last") %>% 
                html_attr("href") %>% 
                str_extract(ptrn) %>% 
                str_replace("page=","") %>% 
                as.numeric()

        df <- paste0(url,"?page=")
        
        df_seq <- seq(1:num)
        
        df_final <- paste0(df,df_seq)
        return(df_final)
        
} # download the auction websites urls
extract_page_content        <- function(url) {
        print(url)
        
        Sys.sleep(sample(seq(1, 3, by=0.001), 1))
        
        article_list <- url %>% read_html() %>%
                html_nodes("h2") %>% 
                html_nodes("a") %>% 
                html_attr("href") 
        
        article_list_corrected <- paste0("http://www.soga.sk",
                                         article_list)
        
        return(article_list_corrected)
        
} # download the auction items urls
extract_article_content     <- function(url) {
        print(url)
        
        Sys.sleep(sample(seq(1, 3, by=0.001), 1))
        

        art_name <- url %>% read_html() %>% 
                html_nodes("h2 a") %>% 
                html_text("href")
        
        art_piece <- url %>% read_html() %>% 
                html_nodes(".wrapper a") %>% 
                html_text("href")
        
        art_cat <- url %>% read_html() %>% 
                html_nodes(".col1") %>% 
                html_text("href") %>% as.tibble()
        
        art_values <- url %>% read_html() %>% 
                html_nodes(".col2") %>% 
                html_text("href") %>% as.tibble()
        
        art_df <- bind_cols(art_cat,art_values) %>% as.tibble()
        
        art_df$name  <- art_name
        art_df$piece <- art_piece
        
        return(art_df)
        
        
} # download the auction item content

df_all_pages                <- map(df$c_link,extract_auction_length) %>%
                                unlist()

df_all_pages_content        <- map(df_all_pages,extract_page_content) %>%
                                unlist()

df_all_pages_content_vec    <- df_all_pages_content %>%
                                pull()

df_all_article_content_sf   <- map(df_all_pages_content_vec,
                                  safely(extract_article_content))

Part II: Raw data into tidy data

The downloaded wide raw data have the following structure:

df_soga <- df_soga %>%
        rename(variable = value, value = value1) %>% 
        drop_na()  
str(df_soga)
## Classes 'tbl_df', 'tbl' and 'data.frame':    47635 obs. of  4 variables:
##  $ variable: chr  "Poradové číslo:" "Vyvolávacia cena:" "Konečná cena:" "Poradové číslo:" ...
##  $ value   : chr  "1" "400 €" "320 €" "2" ...
##  $ name    : chr  "CPIN ŠTEFAN (1919 - 1971)" "CPIN ŠTEFAN (1919 - 1971)" "CPIN ŠTEFAN (1919 - 1971)" "HÁLA JAN (1890 - 1959)" ...
##  $ piece   : chr  "Dievča" "Dievča" "Dievča" "Dievča na lúke (Leto)" ...

The variable column contains the following information:

  • Article Number (Poradové číslo)
  • Starting Price (Vyvolávacia cena)
  • Final Price (Konečná cena)

Importantly, everyday items (such as sculptures) are sold besides paintings. These items will be removed as they are not the target of this analysis.

For more broader audience I will translate the collum names into the English language.

df_soga$variable <- if_else(df_soga$variable == "Cena v predaji:", "sale_price",
                         if_else(df_soga$variable == "Konečná cena:", "final_price",
                          if_else(df_soga$variable == "Nevydražené", "unsold",
                          if_else(df_soga$variable == "Odhadovaná cena:", "assumed_price",
                          if_else(df_soga$variable == "Poradové číslo:", "item_number",
                          if_else(df_soga$variable == "Vyvolávacia cena:", "starting_price",df_soga$variable)))))) # works but looks ugly - sorry folks

The code above is five times nested if_else command - something I am not particularly proud of. It might be a more elegant way to code this but I haven’t found it yet.

Next step is to create an ID columns that identify author & artwork and spread the data from long to wide - in line with the tidy data manifest written by Hadely Wickham.

df_soga_wide <- df_soga %>%
        mutate(id = paste(name, piece)) %>%  # create unique id per row
        split(.$id) %>%  # split the data per unique artwork
        map(safely(spread),variable, value) %>% # spread the values - use safely to prevent errors
        map_df("result") # filter list with suscessfull results 

glimpse(df_soga_wide)

Now, we have one row per piece and variables stored in columns. This enables a quick and consistent way of working the data. Let’s take a glimpse into the data and let’s find out the most frequently sold author at the auction house.

df_soga_wide$name %>%
        table() %>%
        as.tibble() %>%
        arrange(desc(n)) %>% 
        set_names(c("artist", "count")) %>% 
        top_n(10) %>% 
        knitr::kable()
## Selecting by count
artist count
NEZNÁMY STREDOEURÓPSKY MALIAR 349
MEDNYÁNSZKY LADISLAV (1852 - 1919) 186
BENKA MARTIN (1888 - 1971) 161
HLOŽNÍK VINCENT (1919 - 1997) 151
ZMETÁK ERNEST (1919 - 2004) 133
MEDVECKÁ MÁRIA (1914 - 1987) 119
WEINER-KRÁĽ IMRO (1901 - 1978) 118
Mednyánszky Ladislav (1852 – 1919) 117
KOMPÁNEK VLADIMÍR (1927 - 2011) 111
HÁLA JAN (1890 - 1959) 108

Great! Now we see that the most frequently sold artist was an unknown central European artist with around 360 sold paintings.

Unfortunately the data is still not tidy. My poorly written web scraping code is responsible for this messy dataset. (I am sure there is a better R code to scrape data more accurately)

Now, lets tidy up columns containing the messy data. Columns are still character variables and errors prohibiting transformation into tidy are:

  • Columns with price data contains symbols currency symbol €
  • Blank space before, after and in between the values
  • Date of birth & death of the artist are not in separate columns
  • Name of the artist contains his living years

Let’s remove the errors using string manipulation techniques!

# tidy the columns with variables containing price informations

df_soga_tidy <- df_soga_wide %>%
        modify_at(c(5,6), ~ str_replace_all(.,"€","")) %>% # modify_at helps identify collums more efficiently then naming them using mutate
        modify_at(c(5,6), ~ str_replace_all(.," ","")) %>% 
        modify_at(c(5,6), ~ str_replace_all(.,"Neurčená","")) %>% 
        modify_at(c(5,6), ~ str_replace_all(.,",","")) %>% 
        modify_at(c(5,6), ~ str_replace_all(.,"EUR","")) %>% 
        modify_at(c(5,6), ~ str_trim(.)) %>% 
        modify_at(c(5,6), as.numeric)  

# tidy the column names 
colnames(df_soga_tidy) <- df_soga_tidy %>% 
                        colnames() %>%
                        str_to_lower() %>%
                        str_replace_all(":","") %>%
                        str_replace_all(" ","_")

# tidy the date of birth & death into two separate collumns
living_years <- str_extract_all(df_soga_tidy$name, one_or_more(DIGIT),simplify = T) %>% 
        as.tibble() %>% 
        set_names(c("birth_year","death_year","birth_year_I","death_year_I")) %>% 
        select("birth_year","death_year") %>% 
        modify_if(is.character, as.numeric)


df_soga_tidy <- df_soga_tidy %>% 
                bind_cols(living_years)

# remove years from name column
df_soga_tidy$name <- df_soga_tidy$name %>% 
        str_replace_all(pattern = DIGIT,"") %>% 
        str_replace_all("\\)"," ") %>%
        str_replace_all("\\(","") %>%
        str_replace_all("-","") %>% 
        str_replace_all("–","") %>% 
        str_trim() %>% 
        str_to_upper()

# tidy & include assumed_price in the final price
df_assumed_price_tidy <- df_soga_tidy$assumed_price %>% 
        str_replace_all("-", "/") %>% 
        str_replace_all("– ","/") %>% 
        str_replace_all("€","") %>% 
        str_replace_all("EUR","") %>% 
        str_replace_all(",00","") %>% 
        str_replace_all(",","") %>% 
        str_replace_all(space(),"") %>% 
        str_split("/",simplify = T) %>% 
        as.tibble() %>% 
        modify(as.numeric) %>% 
        mutate(assumed_price = (V1 + V2) / 2) %>% 
        select(assumed_price) 

df_soga_tidy <- bind_cols(dplyr::select(df_soga_tidy, -assumed_price),
                          df_assumed_price_tidy) 


df_soga_tidy %>% glimpse()
## Observations: 15,978
## Variables: 10
## $ name           <chr> "/JIŘÍ/ GEORG DOKOUPIL", "A D.WEGENER   ?", "A....
## $ piece          <chr> "Erotický motív", "Horská krajina", "Česká kraj...
## $ id             <chr> "/JIŘÍ/ GEORG DOKOUPIL (1954) Erotický motív", ...
## $ item_number    <chr> "154", "178", "291", "207", "201", "192", "214"...
## $ starting_price <dbl> 465, 929, 597, 996, 1062, 398, 664, 597, 830, 4...
## $ final_price    <dbl> NA, NA, NA, 1394, NA, NA, NA, 597, NA, NA, 1461...
## $ sale_price     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ birth_year     <dbl> 1954, 1891, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ death_year     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...
## $ assumed_price  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,...

Much better!!! With the above-written code we have achieved:

  • name more variable contains just the name of the artist
  • birth_year, death_year column created
  • assumed price now contains mean value of the range stated
  • numeric vector instead of character vector in the columns containing information about price

Since we have some tidy data now, let’s create some new ratios.

df_soga_tidy <- df_soga_tidy %>% 
        mutate(starting_price = if_else(is.na(starting_price), assumed_price, as.numeric(starting_price)), # if variable starting_price is NA use assumed_price instead 
                diff_price_rel = ((final_price - starting_price) / (starting_price)), # relative difference between starting price and final price
               diff_price_abs = final_price - starting_price ,  # absolute difference between starting price and final price
               bin_price = ntile(starting_price, 4),  # create bins by starting price 
               bin_diff_price_rel = ntile(diff_price_rel, 4)) # create bins by relative profit & losses

Part III: Analysis of the data

With tidy data lets focus on analyzing the data. Once the data has been cleaned up analysis can be performed. This part is always a cherry on the cake after the painful process of data gathering & cleaning. (Jenny Bryans quote: “Behind every great plot there’s a great deal of wrangling”) - is an accurate one. Link here

First, let’s look at the highest prices paid for painting on auction:

df_soga_tidy %>% 
        drop_na(final_price) %>% 
        top_n(wt = final_price,n = 10) %>% 
        ggplot(aes(reorder(piece, final_price),final_price)) +
        geom_col() +
        coord_flip() + 
        labs(title = "Highest price paid per painting",
             caption = "Source: www.soga.sk",
             y = "Final Price in €",
             x = "Artwork") + 
        theme_minimal(base_family = "Verdana",base_size = 12)  + 
        scale_y_continuous(labels = scales::comma) +
        theme(legend.text  = element_text(size = 6),
                legend.title = element_text(size = 6),
              legend.position = "bottom")

The highest auctioned painting was for the “Chlapec s Dalmatinom” by DOMENICHINO DOMENICO ZAMPIERI link. Interestingly it failed to reach its owner price of 663 878 EUR and was sold at a loss of (-132 775 EUR or -19%) for final price of 531 103 EUR.

The second chart shows paintings sold at the highest relative profit:

df_soga_tidy %>% 
        drop_na(diff_price_rel) %>% 
        top_n(wt = diff_price_rel,n = 10) %>% 
        ggplot(aes(reorder(piece, diff_price_rel),diff_price_rel)) +
        geom_col() +
        coord_flip() + 
        theme_minimal(base_family = "Verdana",base_size = 12) + 
        labs(title = "Highest profit generated by painting (in %)",
             y = "Profit (in %)",
             x = "Artwork",
             caption = "Source: www.soga.sk") + 
        scale_y_continuous(labels = scales::percent) 

Surprisingly, the highest profits were in the range of 10x - 15x fold return. This is a great result for the investor. The most profitable painting was - “Zátišie s ľudovým džbánom” by KRIVOŠ RUDOLF ( link ). This painting was also an outlier as rest of KRIVOŠ paintings generated far lower returns.

Another point of view is to visualize the most profitable paintings in the absolute terms.

df_soga_tidy %>% 
        drop_na(diff_price_abs) %>% 
        top_n(wt = diff_price_abs,n = 10) %>% 
        ggplot(aes(reorder(piece, diff_price_abs),diff_price_abs)) +
        geom_col() +
        coord_flip() + 
        theme_minimal(base_family = "Verdana",base_size = 12) + 
        labs(title = "Highest profit generated by painting (in €)",
             y = "Profit (in €)",
             x = "Painting",
             caption = "Source: www.soga.sk") + 
        scale_y_continuous(labels = scales::comma) 

Painting Po žatve was the paiting sold at the highest profit of about 85 000 EUR link. The profit from this one transaction could by one bedroom flat in the city.

Now, let’s visualize the result of the SOGA auctions:

df_loss_profit <- df_soga_tidy %>% 
        drop_na(final_price, bin_price)


pl <- plotly::plot_ly(df_loss_profit,
                x = ~starting_price,
                y = ~ final_price,
                size = ~ final_price,
                split =  ~ bin_diff_price_rel,
                colors = ~ bin_diff_price_rel,
                text = ~paste("Name: ", name, "\n",
                              "Artwork:",piece,"\n",
                              "Starting Price: ", scales::comma(starting_price), "\n",
                              "Final Price:",scales::comma(final_price),"\n",
                              "Return (%):",scales::percent(diff_price_rel),"\n",
                              "Return (€):", scales::comma(diff_price_abs))) %>%
  layout(title = "Profits & Losses @ SOGA Auction House", 
         xaxis = list(title = "Starting Price",range = c(0, 200000)),
         yaxis = list(title = "Final Price",range = c(0, 200000)))

widgetframe::frameWidget(pl,width = "100%",height = 550)

After playing the the HTML widget for few minutes i have several observation. Some paintings failed to meet the expectation and were traded well below the asking price. Also important to note is that profits are getting smaller the more expensive the painting is.

Here is summary statistics with the most profitable artist sold at the SOGA auction house.

df_soga_tidy %>% 
        drop_na(final_price,diff_price_rel,diff_price_abs) %>% 
        group_by(name) %>% 
        summarise(profit_relative = median(diff_price_rel, na.rm = T) %>% scales::percent(),
                  profit_absolute = median(diff_price_abs, na.rm = T) %>% scales::comma(),
                  artwork_price = median(final_price, na.rm = T) %>% scales::comma(),
                  count = n(),
                  sum = sum(diff_price_abs)) %>% 
        filter(count > 5) %>% 
        arrange(desc(profit_relative)) %>% 
        dplyr::top_n(wt = profit_relative, n = 15) %>% 
        knitr::kable()
name profit_relative profit_absolute artwork_price count sum
LADISLAV GUDERNA 87.7% 2,257 4,149 7 11284
EDITA SPANNEROVÁ 86.7% 249 713.5 8 2754
BIZMAYER IGNÁC 85.6% 325 1,325 14 7820
STRUHÁR ALOJZ 8.33% 25 310 6 200
MARTIN SEDLÁK 8.28% 66 1,062 7 631
ZBYNĚK PROKOP 76.5% 50.5 116.5 6 302
JÚLIUS JAKOBY 74.2% 3,071 7,800.5 6 16000
FRANTIŠEK LIPTÁK 7.93% 100 746.5 12 1661
VINCENT HLOŽNÍK 7.49% 132 1,461 19 14736
DÁVID BAFFI 66% 49.5 99.5 6 397
JOZEF JANKOVIČ 64% 531 1,759 9 15469
FRANTIŠEK REICHENTÁL 62.5% 132 1,162 9 11021
KĽÚČIK PETER 60% 90 240 17 6700
MILOŠ ALEXANDER BAZOVSKÝ 6.31% 67 3,485 21 26457
KERN PETER JÚLIUS 5.56% 10 470 10 730

The table shows results sorted by the highest median relative profit. In other words - artist generating highest profits per painting is at the top. Based on this table Ladislav Guderna paintings generated average profit of 87%. The author was traded 7x times. Average price per painting was 4 149 EUR generating profit of 2 257 EUR. The total sum of profits generated by Guderna reached 11 284 EUR.

Similarly below are the authors generating the highest absolute profits.

name profit_relative profit_absolute artwork_price count sum
BENKA MARTIN 2.11% 100 9,700 109 332550
HÁLA JAN 0% 0 5,800 86 182950
MEDNYÁNSZKY LADISLAV 0% 0 3,500 208 153835
ĽUDOVÍT FULLA 2.71% 49.5 630.5 30 96959
ANTON JASUSCH 22.9% 498 5,394 28 62938
BAZOVSKÝ MILOŠ ALEXANDER 0% 0 5,200 57 61050
FULLA ĽUDOVÍT 12.5% 50 600 45 50640
JAKOBY JÚLIUS -1.52% -400 31,000 16 50500
WEISZKUBÍNČAN PETER ARNOLD 2.78% 50 2,850 12 44900
MOUSSON JOZEF TEODOR 0% 0 6,850 40 44360
PAŠTÉKA MILAN 0% 0 800 53 41715
LALUHA MILAN 0% 0 1,032.5 62 40175
MIKULÁŠ GALANDA 20% 99.5 647.5 16 39698
MILOŠ ALEXANDER BAZOVSKÝ 6.31% 67 3,485 21 26457
LADISLAV MEDNYÁNSZKY 1.11% 33 2,539 36 24595

As expected, the paintings of famous slovak artist Martin Benka generated the highest profit. Wiki page

The last chart displays distribution of the returns per artist.

df_soga_tidy %>% 
        drop_na(final_price,diff_price_rel,diff_price_abs) %>% 
        group_by(name) %>% 
        summarise(profit_relative = median(diff_price_rel, na.rm = T) ,
                  profit_absolute = median(diff_price_abs, na.rm = T) ,
                  artwork_price = median(final_price, na.rm = T) ,
                  count = n(),
                  sum = sum(diff_price_abs)) %>% 
        filter(count > 5) %>% 
        ggplot(aes(profit_relative))  +
        geom_histogram(bins = 30) + 
        theme_minimal(base_family = "Verdana",
                      base_size = 12) + 
        labs(title = "Distribution of median returns",
             subtitle = "by Author",
             source = "source: www.soga.sk")  + 
        scale_x_continuous(label = scales::percent)

The histogram shows that the most common profit is zero. This means that the majority of authors sell at their work at the starting price. Also, a high number of authors sell at a loss. The distribution of returns is showing positive skew - means the large portion of authors sell at no profit or at a loss. Making informed beds is well adviced!

PS: The results are filtered with the condition of the artist selling at least 5 paintings trough SOGA to limit the number of records and filter out paintings that don’t generate enough liquidity. Owning artwork of authors that are rarely traded presents a risk for the investor and is recommended to avoid.

Conclusions

  • Average price per painting has been increasing by 10% per annum over the past 15 years
  • Purchasing famous painter does not always guarantee a good investment.
  • Gudena, Spannerová or Ignác artwork generated consistently the highest returns ~ 80% per painting
  • Highest absolute profit was recorded by the artist Benka (ca 332.000€) followed by Ján Halla (ca 185.000€) or Ladislav Mednyánszky (174.000€)
  • The distribution of retruns is right-skewed. High proportion of Artists generate 0% or negative returns.

Known limitations

  • The analysis would be more accurate if the names of the authors were labeled consistently (to avoid duplicate records for the same artist)
  • Profits are calculated based on the auction starting price, which might not reflect actual profit for the owner. Owners might set starting price artificially high.