On June 7th, Data Soc was pleased to host a talk in a field very different from our usual forays into business, but one no less important: HR. Professor Chris Forde gave the presentation and covered a variety of topics at pace, considering the use of big data in HR is still a very nascent proposition. We had a great turnout for our speaker, including lots of faces we had not seen before!
Professor Forde is Coordinator of the Q-Step Programme at the University of Leeds, which seeks to enable social scientists to use more quantitative skills in their research – more Big Data than Bourdieu. He is Professor of Employment Studies, in the Centre for Employment Relations, Innovation and Change at Leeds University Business School.
Can data analytics help HR become more ‘strategic’? What challenges does the rise of big data generate for data analytics in HR? Professor Forde looked at the use of data analytics in HR in the past, present and future, and explored some of the most commonly used techniques and tools of data analytics in HR, before considering the potential and limitations of these analytical tools.
With the recent emergence of cloud services providing an opportunity to clout the oligarchy of HR information systems services (and having previously used SAP HR, it deserves one), our speaker provided much food for thought on this emerging section of data research, which is often hobbled by a disconnect between needing to be the human face of a business to its employees, and the drive to ‘harden’ decision making with numbers and attributes.
Last Tuesday we hosted Prof Bill Gerrard, specialist in sports analytics at LUBS, for a second event: a working session on how to use Excel for data analytics. This was a very popular event, as Lawrence’s tweet below can show you…
Though we like R and Python at Leeds Data Soc, we know that when we reach the big wide world of work, most people are wedded to Excel. It has been a trusted tool and diviner of computer competency in the workplace for at least a decade now, and there’s no getting away from the spreadsheet. Bill guided us through some of the more advanced parts of Excel, such as PivotTables and how to use the data analysis add-in (the Analysis ToolPak), which can be enabled through the options menu.
Everyone enjoyed the event and quite a few stayed with us to cool their brains down at The Victoria pub on Great George St. afterwards.
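For members more at home in code than in spreadsheets, the idea behind a PivotTable can be sketched in a few lines of Python. This is only an illustrative sketch, not something from Bill’s session: the match records, team names and the `pivot_sum` helper are all made up for the example, and it uses only the standard library.

```python
from collections import defaultdict

# Hypothetical match records: (team, venue, points scored).
# These values are invented purely for illustration.
records = [
    ("Reds", "home", 24),
    ("Reds", "away", 17),
    ("Blues", "home", 30),
    ("Blues", "away", 12),
    ("Reds", "home", 31),
]


def pivot_sum(rows):
    """Sum points for each (team, venue) pair.

    This mimics a pivot table with team as the row field,
    venue as the column field and 'Sum of points' as the value.
    """
    table = defaultdict(lambda: defaultdict(int))
    for team, venue, points in rows:
        table[team][venue] += points
    # Convert nested defaultdicts to plain dicts for easy inspection.
    return {team: dict(columns) for team, columns in table.items()}


pivot = pivot_sum(records)
for team in sorted(pivot):
    print(team, pivot[team])
```

In Excel the same summary would come from dragging the team field to Rows, the venue field to Columns and the points field to Values; the sketch just makes the grouping and summing explicit.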
With much shorter notice, we also hosted a small event to show people how to contribute to OpenStreetMap, a huge open source map of the world which can be edited by users. OpenStreetMap is also used by charities and governments in natural disasters because it’s entirely free to use, whereas Google Maps puts a limit on the number of requests that you can make to it. This means that during major events, authorities can use it as a reliable mapping tool to reach isolated communities. A guide to getting started with OpenStreetMap and the HOT Tasking Manager has been built by Andy Evans, who also guided our session earlier in the year about using Python for data analysis. The guide was originally built to help after the Nepalese earthquake in 2015, but by using the HOT Tasking Manager, you can see which projects are running at a given time.
After a brief demonstration, we set about contributing to maps of Ecuador and Japan, both of which have recently been affected by major earthquakes – you can see that tasks for Ecuador have been assigned the highest priority in the HOT Tasking Manager.
Our society members are (mostly) university students and therefore do a lot of reading!
Here are some of the most interesting things we’ve read and shared for the month of March. You can see posts for January and February by clicking on the ‘What we’ve been reading’ category.
We hosted our first speaker event with Professor Bill Gerrard in March. To show people some of the ideas Bill would be discussing, Lawrence posted an article from the Guardian discussing Bill’s work with the Saracens rugby union team in creating a statistically based performance management system. He also posted the trailer for Moneyball, the 2011 film based on the work of Billy Beane with the Oakland A’s baseball team.
Matt attended a bike hackathon where his group’s objective was to try to find a way to reduce the use of cars as the primary means of transport around the Lake District (now that they can stop using boats), which he wrote about in his blog here. The image on the right is one produced at the hackathon, where the team mapped the origins and destinations of 8000 visitors to the Lake District National Park. Following on from that, he introduced us to the CycleStreets open source project and its initiatives for developers, including a wishlist of challenges that anyone can try to work on.
Lastly, Louisa posted research from Facebook which used a few million of its users to examine whether there is any pattern in the jobs that parents and their children hold. It’s interesting research as it examines quite a common trope that we often hear in descriptions of others (a military family, or someone who comes from a long line of lawyers), and there is an underlying assumption in common discourse that certain jobs seem to run in families. The graphics produced by the Facebook team are comprehensive and interactive, and thus well worth a look. This picture is taken from father–son pairings where the father’s profession is military. Before your mind thinks it has spotted a pattern, note that there is a thicker line between father–son pairings for management, but some families do have an unbreakable pattern of dull jobs.
The Leeds Institute for Data Analytics (LIDA) is holding a series of seminars and data soc were delighted to be invited to the first one, a lecture by Professor Adam Drewnowski, who researches (amongst other things) the spatial epidemiology of diets and health. Professor Drewnowski leads the Seattle Obesity Study, an investigation into the socioeconomic factors behind the distribution of obesity in the city, and his presentation focused primarily on outputs from this strand of research.
Professor Drewnowski’s work was very engaging, with many insights that are valuable to the all-too-frequent hand-wringing discussion on public health.
The primary programming language that data soc members know is R (though certainly not the only one!), so we decided to challenge our evangelism with a tutorial from Dr Andy Evans, a senior lecturer in computational geography from the School of Geography here in Leeds (FY maps!). Dr Evans very kindly created the front end for a tutorial on data manipulation with Python, guiding an intrepid crowd through Anaconda installation and their first steps of data processing in a new language. We were pleased to see that this was one of the larger tutorials that data soc has hosted, aided by turnout from some of Andy’s students looking for practice but also by an increasingly academically diverse crowd of people curious about data science and programming – for a few, this was their first ever experience of coding.
Andy opening the session.
Everyone getting stuck in.
After this we went to A Nation of Shopkeepers near Leeds city centre for poutine, as one of our founders (Karen) is Canadian and was keen for a taste of home.
Missed the session? The opening for Andy’s tutorial is available here, which then leads you into the Data Carpentry course.
Our first speaker event (ever! but also of Semester II) featured Professor Bill Gerrard discussing what sports analytics can bring to analytics as a whole field. After speaking about his background and path into sports analytics from a start in economics and econometrics, Bill covered three topics: what does Moneyball tell us about using analytics effectively; why simple is often best in analytics in sport and business; and what skills do you need to be a great analyst?
It was a fascinating session and great to hear Bill’s insight, drawn from his work with Billy Beane and then on bringing analytics to the UK. It seemed like a tough start but, as anyone who even watches the highlights of any televised sport in the UK these days will know, it has now taken off with gusto. We’re really pleased our first event was a success and hope to hold follow-up sessions with Bill, and with other guest speakers too!
Our society members are university students and therefore do a lot of reading!
Here are some of the most interesting things we’ve read and shared for the month of February (which was quite a bookish month). You can see January’s picks here.
Like many of us looking nervously across the pond at the prospect of an electoral run by someone with worse hair than Boris Johnson, Adam posted New Scientist’s article about how Ted Cruz won the Iowa caucus using microtargeting driven by Facebook data, borrowing techniques from President Obama’s reelection strategy. Often hailed as the future of campaigning, microtargeting campaigns still have significant hurdles to overcome, such as boors with a proverbial megaphone and no concept of the phrase ‘bad press’.
Myrian posted an article from Analytics Vidhya by Kunal Jain, who made a great graphic summarising 20 lessons a data scientist needs to master. Jain learnt these over a space of 10 years, so do not think you need to grasp them all over the summer. Particularly interesting is their division into two categories: data science is only half the battle, and the other half is non-technical lessons such as knowing your business, remembering to practise your techniques and learning new ones as they emerge.
Matt posted Todd Schneider’s wonderfully detailed investigation into Uber usage in New York. Schneider examines the different subcultures of the city and gathers insight on their lives simply through this single aspect (taxi rides), such as most bankers getting to work between 7 and 8am, and whether or not that chase in Die Hard with a Vengeance was really such a nightmare. Schneider’s other work with big data sets includes a diverse range of subjects from mortgages to marriages to gambling, all presented in a clear and often humorous way that carries even newcomers to data right through to the end of the analyses.
Lastly, Louisa posted a link to a real time interactive weather map of Earth, created by Cameron Beccario. Beccario’s map is one of the most impressive weather visualisations available online, knitting together a vast array of data sources (e.g. NASA’s Goddard centre data on chemicals and particulates; NOAA data on global weather) and presenting them as a mesmerising map with many filters allowing you to change altitude or focus, such as wind speed, ocean currents, particulate density or even the type of map projection to use.
One of our more learned members on all things cycling and data science, Matt Whittle (Twitter, WordPress), presented a working seminar/tutorial on mapping data using R, QGIS and JS. Matt’s project is to measure the cycling rates in various counties around the UK, and identify cycling hot spots in contrast to air pollution levels. Matt’s project has also garnered interest from Parliament, so our event was like a little preview (or maybe a practice run).
During the session Matt guided members through the process of creating an interactive map using R, QGIS and JS, from the initial setting up of site structures through to creating the map itself. It’s based on work he did for Road Safety Week, who are switching some of their focus to the air pollution caused by congested roads, and not just the singing hedgehogs who are surely a fixture in most British kids’ memories from the last thirty years. Many of our members are familiar with R but less so with QGIS, so it was an interesting experience all round to use some specialist data mapping software.
Herd are a technology focused recruitment agency who held their first digital jobs fair in the cake tin that is the First Direct Arena, which several members of data soc looked in on. It seemed quite busy around the halls, showing a good turnout from people at both Leeds University and Leeds Beckett, the latter of which sponsored the event. There were interesting talks on personal branding from Google and on how to optimise your web presence when starting up your own business. Amongst others, companies featuring at the event included Call Credit, Sky, Plus Net, Unilever and William Hill. While data soc members found it interesting, it seemed quite heavily weighted towards the developer side of digital jobs, and so it would be nice next year if they could have a few more stands/companies interested in data scientists and analysts.
Leeds University School of Law hosted Professor David Lyon for its annual CCJS lecture, which several members of the Data Science Society were pleased to be able to attend. Professor Lyon is a pioneering figure in the study of surveillance and his lecture centred on how the rise of Big Data has affected the practice of surveillance by various authorities, most notably by the NSA whose activities were revealed by Edward Snowden, and how its development will continue to change surveillance practices.
Professor Lyon is a sociologist by background, and his perspectives were particularly helpful to data soc members whose interests in public policy and health often intersect with questions about privacy and decision making, as well as the nature of making predictions from aggregated data sets.