May.la
This is Philip May’s page. It is about the following topics:
News about My Work
July 12, 2021: As part of my work for Deutsche Telekom, I published the first German and English summarization model: deutsche-telekom/mt5-small-sum-de-en-v1
April 18, 2021: The German NLP Group released the German colossal, cleaned Common Crawl corpus (GC4 corpus), a preprocessed German text corpus (450 GB zipped) based on Common Crawl. Many thanks to iisys (the Institute of Information Systems at Hof University) for hosting this dataset.
April 5, 2021: Published training code for cross-lingual sentence-transformers models: German-NLP-Group/xlsr
March 29, 2021: Added the machine-translated multilingual STS benchmark (STSb) dataset, called stsb_multi_mt, as a HuggingFace dataset: https://huggingface.co/datasets/stsb_multi_mt
March 21, 2021: Started website for German NLP Group as a GitHub Page: https://german-nlp-group.github.io/
March 20, 2021: Published the machine-translated multilingual STS benchmark (STSb) dataset under an open-source license: PhilipMay/stsb-multi-mt
February 22, 2021: Together with Philipp Reissel from ambeRoad I released the 2nd version of our German Electra NLP language model. Training was continued for an additional 734,000 steps to a total of 1,500,000 steps. An extensive evaluation on the GermEval18 Coarse dataset shows that this is the best-performing German language model of its size: german-nlp-group/electra-base-german-uncased
December 1, 2020: Together with Philipp Reissel from ambeRoad I gave a talk about the training and evaluation of our open-source German Electra NLP language model. Here is the recording on YouTube: https://www.youtube.com/watch?v=cxgrTd2AQis and the slides.
October 24, 2020: Published a cross-lingual model for English and German semantic sentence embeddings under open-source license: T-Systems-onsite/cross-en-de-roberta-sentence-transformer
August 18, 2020: Together with Philipp Reissel from ambeRoad I released a new open-source German Electra NLP language model: german-nlp-group/electra-base-german-uncased
April 7, 2020: Started to maintain the python-configparser Arch Linux AUR package: python-configparser
March 14, 2020: Started to maintain the PyCharm IDE Arch Linux AUR package: pycharm-community-jre
September 22, 2019: I published my first paper on arXiv.org: Improved Image Augmentation for Convolutional Neural Networks by Copyout and CopyPairing
September 9, 2018: Started to maintain the conda-forge hyperopt package: hyperopt-feedstock
Feedback and Questions
If you have questions or comments about this content please feel free to open a GitHub issue.