Home page of Jonáš Vidra

About me

My name is Jonáš Vidra and I am a postgraduate student at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University. If you want to contact me, please send an e-mail to my-surname@ufal.mff.cuni.cz

My interests and projects

Multilingual derivations

I'm working on a system for automatic discovery of derivational relations in many languages at once, using both monolingual data and cross-lingual information transfer.

Morphological segmentation

My master’s thesis, advised by Zdeněk Žabokrtský, dealt with supervised morphological segmentation of Czech using data from the DeriNet project. The preliminary code is published as a Git repository.

DeriNet

I help develop DeriNet, a lexical network of derivational relations between Czech words. My focus is mostly on the technical side of things, although I do some linguistic work as well. I develop and maintain the Perl API for building the network (deprecated but still in use) and help develop the new Python API that will replace it. I also develop DeriSearch, a search engine for querying derivational relations. If you’re looking for the development version of DeriSearch, you can find it on this very server.

Ongoing projects:

  • Multilingual derivations.
  • Maintenance of the DeriNet build system.
  • Researching new visualization methods for DeriSearch. See Online Software Components for Accessing Derivational Networks (Jonáš Vidra and Zdeněk Žabokrtský, 2017. In: Proceedings of DeriMo 2017, pp. 129–139).
  • Optimizing DeriSearch to cope with the influx of data between versions 0.9 and 1.0.

Completed projects:

  • Segmentation of Czech words; see my master’s thesis above.
  • DeriSearch, a basic engine for searching the DeriNet database using regular expressions and simple structure queries.
  • DeriNet 1.0: I swapped the previous in-house lexeme database for a 3× larger, 14× better and 42× shinier source of lexemes based very closely on MorfFlex, and added tens of thousands of new derivational relations that use the new lexemes.
  • A machine learning project to regularize the DeriNet database and help find errors.

Teaching (in Czech)

In the summer semester of 2018/19, I am teaching the practicals for Programming Ⅱ.

Theatre

I am a member of the NoStraDivadlo amateur theatre group. No performances are currently planned, but we organize several festivals: Setkání ve Strašecí (student and experimental theatre), which will take place from 2019-04-12 to 2019-04-14, and Dětská Scéna (school theatre groups and drama clubs), which will take place from 2019-04-27 to 2019-04-28.