Posts by Philip May
- 18 November 2023
I often use Pandas to process NLP data. In many cases I want to create a new column from the information in an existing column, for example the number of characters or tokens in a text.
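A minimal sketch of that idea, not taken from the post itself: the column names `text`, `num_chars`, and `num_tokens` are made up for illustration, and "tokens" here means a naive whitespace split rather than a real tokenizer.

```python
import pandas as pd

# Hypothetical example data; the "text" column name is an assumption.
df = pd.DataFrame({"text": ["Hello world", "Pandas is handy for NLP", ""]})

# New column with the number of characters per row (vectorized string length).
df["num_chars"] = df["text"].str.len()

# New column with the number of tokens per row.
# Assumption: whitespace splitting stands in for a real tokenizer.
df["num_tokens"] = df["text"].str.split().str.len()

print(df)
```

For logic that the `.str` accessor does not cover, such as calling a model tokenizer, `df["text"].apply(some_function)` is the usual fallback, at the cost of running Python code once per row.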
- 12 October 2022
Some data, such as strings, must be encoded to be used in machine learning models. Here we explore the different options for encoding date fields.
- 23 July 2022
This article is about installing Python and managing packages. It is subjective and reflects my own opinion and experience. The article is structured as a series of recommendations.
- 23 February 2022
While evaluating the ml6team/mt5-small-german-finetune-mlsum summarization model, my colleague Michal Harakal and I noticed that in many cases the model simply reproduces the first sentence of the input text. It should instead generate an independent summary of the whole text.
- 22 February 2022
Today I published a new Wikipedia-based German text corpus. It is intended for NLP machine learning tasks.
- 20 February 2022
This week I published a project that shows how to combine LightGBM and Optuna efficiently to train good models. It is intended to be reused as a template for new projects.
- 10 April 2021
- 01 December 2020