Creating Multiple Custom User Types Through Inheritance in Django


When developing applications in Django, the need might arise to customize the user model. Specifically, you might want to create different types of users. In my case, I’m interested in creating a person user and a kit user. A person user can own multiple kit users, and both need to be able to authenticate to access an API. Luckily, Django’s authentication system is very flexible, and there are multiple ways to achieve this goal.

The standard way to implement this is to stick with the default user model, django.contrib.auth.models.User, and create a user profile. The profile adds the desired fields and behavior for the various user types in a new model, which links back to the user model through a one-to-one field. This can get fairly complex quickly: in particular, it is difficult to express ownership of kits by users without also allowing ownership of users by users. Here, we will see how we can implement this using inheritance instead.
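Before touching Django's model machinery, the shape of the design can be sketched in plain Python (the class and field names below are illustrative, not Django API): a base class holds the credentials every user type shares, subclasses add type-specific fields, and the ownership link lives on the kit, so a kit can never own a user.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BaseUser:
    """Fields and behavior shared by every user type."""
    username: str
    password_hash: str

    def check_password(self, raw: str) -> bool:
        # Stand-in for real password hashing, for illustration only.
        return self.password_hash == f"hashed:{raw}"


@dataclass
class PersonUser(BaseUser):
    """A person; owns zero or more kits."""
    kits: List["KitUser"] = field(default_factory=list)


@dataclass
class KitUser(BaseUser):
    """A kit; owned by at most one person, never by another kit."""
    owner: Optional[PersonUser] = None


def register_kit(person: PersonUser, kit: KitUser) -> None:
    # The owner field is declared on KitUser only, so ownership of
    # users by users is impossible by construction.
    kit.owner = person
    person.kits.append(kit)


alice = PersonUser("alice", "hashed:pw")
kit = KitUser("kit-1", "hashed:secret")
register_kit(alice, kit)
```

In Django itself the base class would extend AbstractBaseUser and the owner link would become a foreign key, but the division of responsibilities stays the same.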


Machine Translation Turing Test

Will computers ever reach the quality of professional translators?

The U.S. government spent 4.5 billion USD from 1990 through 2009 on outsourcing translations and interpretations [4]. If these translations were automated, much of this money could have been spent elsewhere. The research field of Machine Translation (MT) tries to develop systems capable of translating verbal language (i.e. speech and writing) from a certain source language to a target language.

Because verbal language is broad, allowing people to express a great number of things, a translator must take many factors into account when translating text from a source language to a target language. Three main difficulties of translation are identified in [6]: the translator must distinguish between general vocabulary and specialized terms, must choose between the various possible meanings of a word or phrase, and must take into account the context of the source text.

Machine Translation systems must overcome the same obstacles as professional human translators in order to translate text accurately. Researchers have pursued a variety of approaches to this end over the past decades [3, 2, 5]. At first, the knowledge-based paradigm was dominant; after promising results from statistical systems [2, 1], the focus shifted towards this new paradigm.



References

  • [1] P. F. Brown, V. J. Della Pietra, S. A. Della Pietra, and R. L. Mercer, “The mathematics of statistical machine translation: parameter estimation,” Computational Linguistics, vol. 19, no. 2, pp. 263–311, 1993.
  • [2] P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin, “A statistical approach to machine translation,” Computational Linguistics, vol. 16, no. 2, pp. 79–85, 1990.
  • [3] D. A. Gachot, “The SYSTRAN renaissance,” in MT Summit II, 1989, pp. 66–71.
  • [4] D. Isenberg, “Translating For Dollars,” 2010. http://www.huffingtonpost.com/david-isenberg/translating-for-dollars_b_735752.html [Online; accessed 2-October-2013].
  • [5] P. Koehn, F. J. Och, and D. Marcu, “Statistical phrase-based translation,” in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1, 2003, pp. 48–54.
  • [6] A. K. Melby, “Some difficulties in translation,” 1999. http://www.ttt.org/theory/difficulties.html [Online; accessed 2-October-2013].

Training, Testing and Development / Validation Sets

Finding models that predict or explain relationships in data is a central concern in information science. Such models often have many parameters that can be tuned, and in practice we only have limited data to tune them with. If we make measurements of a function f(x) at different values of x, we might find data like in Figure (a) below. If we now fit a polynomial curve through all known data points, we might find the model depicted in Figure (b). This model appears to explain the data perfectly: all data points are covered. However, such a model does not give any additional insight into the relationship between x and f(x). Indeed, if we make more measurements, we find the data in Figure (c), and the model we found in (b) no longer appears to fit the data well at all. In fact, the function used to generate the data is f(x) = x + \epsilon, with \epsilon Gaussian noise. The linear model f'(x) = x depicted in Figure (d) is the most suitable model to explain the observed data and to make predictions about future data.
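This effect is easy to reproduce numerically. The sketch below (a minimal illustration, not code from the article) fits both a maximally flexible polynomial and a straight line to five noisy samples of f(x) = x + \epsilon: the polynomial matches the training points almost exactly, yet predicts fresh measurements far worse than the linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five noisy measurements of the true relationship f(x) = x + eps.
x_train = np.linspace(0, 4, 5)
y_train = x_train + rng.normal(0.0, 0.3, size=x_train.shape)

# A degree-4 polynomial has 5 parameters: it interpolates all 5 points.
poly = np.polynomial.Polynomial.fit(x_train, y_train, deg=4)
# The linear model has only 2 parameters and cannot chase the noise.
lin = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

def mse(model, x, y):
    """Mean squared prediction error of a fitted model."""
    return float(np.mean((model(x) - y) ** 2))

train_poly, train_lin = mse(poly, x_train, y_train), mse(lin, x_train, y_train)

# Fresh measurements from the same process, over a wider range.
x_test = np.linspace(-2, 6, 17)
y_test = x_test + rng.normal(0.0, 0.3, size=x_test.shape)
test_poly, test_lin = mse(poly, x_test, y_test), mse(lin, x_test, y_test)

# train_poly is essentially zero, but test_poly dwarfs test_lin:
# the polynomial has overfitted.
```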

The overly complex model found in (b) is said to have overfitted. An overfitted model fits the known data extremely well, but it is not suited for generalization to unseen data. Because of this, it is important to have some estimate of a model’s ability to generalize to unseen data. This is where training, testing, and development sets come in. The full set of collected data is split into these separate sets.
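A common way to produce such a split (the 70/15/15 ratio below is a typical choice, not a prescription from this article) is to shuffle the data once and then partition it:

```python
import random

def split_dataset(data, train_frac=0.7, dev_frac=0.15, seed=42):
    """Shuffle the data once, then partition it into training,
    development (validation) and testing sets."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed: reproducible split
    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)
    return (items[:n_train],                  # tune model parameters here
            items[n_train:n_train + n_dev],   # compare and select models here
            items[n_train + n_dev:])          # touch only once, at the very end

train_set, dev_set, test_set = split_dataset(range(100))
```

The testing set is held back until all tuning and model selection are done, so the error measured on it is an honest estimate of performance on unseen data.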
