
Welocalize Discusses Innovation and the Future of Localization at 2015 TAUS Events in North America

Frederick, Maryland – October 1, 2015 – Welocalize, global leader in innovative translation and localization solutions, will lead industry discussions at the TAUS Roundtable taking place in Washington DC, October 6, and the TAUS Annual Conference 2015, in San Jose, October 12-13.

“We’re delighted to welcome senior members of the Welocalize leadership team to the TAUS Roundtable in Washington and the TAUS Annual Conference in Silicon Valley,” said Jaap van der Meer, director and founder of TAUS. “The success of TAUS events is based on insights and input from buyers of language services and expert contributions from key global players in the translation and localization industry, like Welocalize. At this year’s TAUS events in North America, we are looking at how we can harness translation data and use innovative technology to predict future workflows, as well as discussing other key TAUS topics like MT, quality and the latest innovations, including the TAUS Quality Dashboard.”

At the TAUS Roundtable in Washington DC, Welocalize CEO and TAUS Advisory Board Member Smith Yewell will present “How to Predict the Future,” where he will outline new ways of using data and predictive analytics for rethinking how localization programs are implemented, quantified and justified today.

“The future of our industry lies in the ability to align localization programs to measurable business outcomes, which we can achieve by using big data and translation automation technology to predict and quantify results,” said Smith Yewell, Welocalize CEO. “We will be sharing our experience and findings at the upcoming TAUS events to help shape the future of localization.”

Olga Beregovaya, VP of Technology Solutions at Welocalize, will be moderating “Let Google and Microsoft Run with It: The Many Uses of MT,” at the TAUS Annual Conference 2015 in Silicon Valley, October 12-13. Her panel session focuses on how machine translation opens up many new markets and brings content to a wider global audience.

Welocalize’s VP of Software Development, Doug Knoll, will also be contributing to industry discussions at the TAUS Annual Conference as a panelist for “Datafication of Translation.”

Olga Beregovaya will be presenting Welocalize StyleScorer at the TAUS Insider Innovation Contest. StyleScorer is an innovative technology, part of the Welocalize weMT suite of language automation tools, that provides linguistic style analysis to help streamline translation review workflows.

As part of the TAUS Annual Conference, Smith Yewell will be demonstrating his musical talents as a member of the TAUS HAUS Band, performing at the TAUS Rock ‘n Roll Dinner, taking place on Monday, October 12 at 6:30PM at The Continental Bar in San Jose.

For more information about TAUS Roundtable visit: https://events.taus.net/events/conferences/taus-roundtable-washington-dc.

For more information about the TAUS Annual Conference visit: https://events.taus.net/events/conferences/taus-annual-conference-2015.

About TAUS – TAUS is a resource center for the global language and translation industries. Our mission is to enable better translation through innovation and automation. We envision translation as a standard feature, a utility, similar to the internet, electricity and water. Translation available in all languages to all people in the world will push the evolution of human civilization to a much higher level of understanding, education and discovery. We support all translation operators – translation buyers, language service providers, individual translators and government agencies – with a comprehensive suite of online services, software and knowledge that help them to grow and innovate their business. We extend the reach and growth of the translation industry through our execution with sharing translation data and quality evaluation metrics. For more information about TAUS, please visit: https://www.taus.net.

About Welocalize – Welocalize, Inc., founded in 1997, offers innovative translation and localization solutions helping global brands to grow and reach audiences around the world in more than 157 languages. Our solutions include global localization management, translation, supply chain management, people sourcing, language services and automation tools including MT, testing and staffing solutions and enterprise translation management technologies. With more than 600 employees worldwide, Welocalize maintains offices in the United States, United Kingdom, Germany, Ireland, Italy, Japan and China. www.welocalize.com

Predictive MT and Quality Analysis

Predictive analysis in the localization industry is swiftly becoming a key approach to improving quality and efficiency in the translation workflow. Extracting information from existing datasets to determine patterns and predict future outcomes can significantly help the translation automation process. In this blog, Welocalize Technology Solutions team members Dave Clarke and Dave Landan give us an update on how two new Welocalize tools are benefiting clients.

It is becoming increasingly important for language service providers (LSPs) to quickly determine the nature of content: how suitable it actually is for the envisaged localization outcomes and, subsequently, which processes and workflows it should be routed through to meet those expectations. At the same time, today’s clients have an ever-increasing volume, and often diversity, of content to be translated. The ability to analyze large quantities of source content quickly, accurately and consistently therefore becomes imperative.

Welocalize has recently added two new tools to its language tool portfolio to help automate these analyses:  TMTprime and StyleScorer. 

TMTprime was developed through a collaboration between Welocalize and the Centre for Next Generation Localisation (CNGL, now ADAPT). TMTprime provides a way to predict which of multiple given translation assistance systems, whether translation memories (TMs) and/or machine translation (MT) engines, would provide the best output for any given content set. Given TMs and/or MT training data and a “tuning set,” TMTprime learns to predict which of the systems it is trained on is best for different source content types. We are also currently researching the capabilities of TMTprime when applied to the task of predictive quality analysis, with a view to drastically reducing and, more often, replacing the running of costly and time-consuming human evaluations of multiple MT engines.

StyleScorer is a proprietary Welocalize tool that learns the content authoring style of a set of documents and then, through a scoring system, rates how well new content matches the style of the initial documents. Analytic tools like Welocalize StyleScorer can work with documents in any language and can be useful for analyzing source and target content. Automated analysis of source content gives a fast, accurate impression of suitability or potential difficulty of translation at the very beginning of the production cycle, which, quite obviously, is exactly the right time to be informed. Further through the cycle, analyzing target content gives us a way to automate certain tasks in linguistic quality analysis (LQA).

By running StyleScorer on raw MT output, the scores can be used to rank documents that are likely to need more post-editing (PE) to bring them in line with the style of known target documents. This is good news when time is precious because it allows us to focus PE work where it is needed.

TMTprime and StyleScorer are just two examples of the cutting-edge tools that Welocalize uses to make sure that content gets translated as quickly as possible, to appropriate quality levels. More exciting innovation in the area of content analysis will be brought out later this year, so watch this space!

Welocalize Technology Solutions

Dave Clarke and Dave Landan

David.clarke@welocalize.com

David.Landan@welocalize.com

For further reading on StyleScorer, read Dave Landan’s blog: Welocalize StyleScorer helps MT and Linguistic Review Workflow

Click here for more information on weMT

Welocalize Language Tools Team Highlights EAMT 2015 Conference

The Welocalize Language Tools Team recently presented at the 2015 EAMT Conference in Antalya, Turkey. Olga Beregovaya, Welocalize VP of Language Tools and Automation, was the invited guest speaker at the conference. She presented “What we want, what we need, what we absolutely can’t do without – an enterprise user’s perspective on machine translation technology and stuff around it,” with the main objective of promoting collaboration between academia and field users. Olga also presented, with Welocalize Senior Computational Linguist Dave Landan, “Streamlining Translation Workflows with Welocalize StyleScorer,” as part of the project and product description poster session.

In this blog, Olga Beregovaya, Dave Landan and Dave Clarke, Principal Engineer for the Language Tools Team, share their insights from the 2015 EAMT Conference.

HIGHLIGHTS FROM OLGA BEREGOVAYA

Olga Beregovaya gives her impressions of EAMT 2015 and highlights her favorite presentations from the user track.

For a global language service provider, language technology and translation automation strategy is very important. The EAMT conference and associated conferences are excellent forums to attend as the team can share real-life MT production experiences and learn more about the latest innovations and research projects. As always, there were many interesting research papers and posters at EAMT, all delivered by highly talented colleagues in the field of MT and all describing very innovative and promising approaches.

I was proud of Welocalize’s own poster presentation, describing work by colleague Dave Landan, Streamlining Translation Workflows with StyleScorer. Capturing and evaluating the style of both training corpora and target text has traditionally been one of the biggest challenges in the industry. The tool Dave has created allows us to compare the style of the input text and the available training data, and build the most relevant MT engine, and also to assess the stylistic consistency of the target text and its adherence to the client’s style guide.

The poster presented by Mārcis Pinnis, Dynamic Terminology Integration Methods in Statistical Machine Translation, was very interesting for the team. Integrating terminology in a linguistically aware way is a major pain point for domain adaptation of SMT engines. Speaking as a program owner, this poster presentation was particularly relevant to our work.

Another very relevant presentation was the paper delivered by Laxström et al., called Content Translation: Computer-assisted translation tool for Wikipedia articles. This presentation talked about a tool created by Wikipedia to promote translation and post-editing of machine-translated articles by Wikipedia users. Community translation is more important for Wikipedia than for any other organization in the world. As content democratization is the key paradigm shift of modern times, tools that enable a “casual translator” to contribute and make content available globally have become an essential component of the global content universe.

Finally, Joss Moorkens and Sharon O’Brien presented an excellent poster called Post-Editing Evaluations: Trade-offs between Novice and Professional Participants. Building an efficient and productive supply chain for post-editing, one that is open to new tools and new ways of working, is an essential component of an LSP MT program’s success. Joss and Sharon compare how experienced translators and novice users perceive MT output and a new CAT environment.

HIGHLIGHTS FROM DAVE LANDAN

Dave Landan, Computational Linguist at Welocalize and EAMT 2015 presenter, identified two presentations he found particularly interesting.

This year’s EAMT conference started strong with several interesting talks and papers on a range of topics.  While there were many strong research papers, I would like to mention two that stood out for me. Bruno Pouliquen presented findings on linear interpolation of small, domain-specific models with larger general models. At Welocalize, we hope to try these methods with our own data, and we are optimistic about the possibilities!  The other research paper that stood out for me was by Wäschle and Riezler. This paper presented innovations around using fuzzy matches from monolingual target language documents to improve translations. I am excited about expanding our collaborations with the academic community.
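For readers unfamiliar with linear interpolation of models, the general idea can be sketched in a few lines of Python. This is only an illustration of the textbook technique, not the method from the paper mentioned above, whose details are not given here. The function name, the toy probabilities and the weight `lam` are assumptions chosen for the example.

```python
def interpolate(p_domain, p_general, lam=0.7):
    """Linearly interpolate two probability distributions over the same events.

    lam is the weight given to the small domain-specific model (assumed value);
    the larger general model receives the remaining weight, 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    events = set(p_domain) | set(p_general)
    return {
        e: lam * p_domain.get(e, 0.0) + (1 - lam) * p_general.get(e, 0.0)
        for e in events
    }


# Toy distributions over two candidate translations (made-up numbers).
domain_model = {"reboot": 0.5, "restart": 0.5}
general_model = {"reboot": 0.25, "restart": 0.75}
mixed = interpolate(domain_model, general_model, lam=0.5)
```

In practice the weight would be tuned on held-out data from the target domain rather than fixed by hand.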

HIGHLIGHTS FROM DAVE CLARKE

Dave Clarke, Principal Engineer at Welocalize, is a regular participant at EAMT. One topic that was touched on many times at EAMT 2015 was the evolution of CAT tools and their impact on productivity. He shared the following perspective.

From a technical or tools perspective, the EAMT conference provided considerable insight into how translation tools could and should evolve. One such insight was provided by the best paper award winner, “Assessing linguistically aware fuzzy matching in translation memories,” by Tom Vanallemeersch and Vincent Vandeghinste from the University of Leuven. The algorithms typically used in CAT tools to calculate fuzzy match values from translation memories have little or no linguistic awareness. They are firmly established as stable units in our industry word currency. This paper implemented and tested alternative fuzzy match algorithms that identify potentially useful matches, based on their linguistic similarities. The results were gathered from tests carried out with translation master’s degree students measuring translation time and keystrokes. The results strongly suggest the potential for unlocking further productivity from existing resources.

The other presentation that stood out for me was “Can Translation Memories afford not to use paraphrasing?” by Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela and Josef Van Genabith.
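To make the idea of a fuzzy match with no linguistic awareness concrete, here is a minimal Python sketch of a word-level edit-distance score. It is an assumed baseline for illustration only: the exact formulas used by commercial CAT tools vary and are not described here, and the function names are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over lists of word tokens."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, 1):
        curr = [i]
        for j, word_b in enumerate(b, 1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(source, tm_segment):
    """Percentage similarity between a new segment and a TM segment."""
    a, b = source.lower().split(), tm_segment.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - word_edit_distance(a, b) / longest)


# A near-synonym and an opposite action receive the same score.
synonym = fuzzy_match("The file was deleted", "The file was removed")
opposite = fuzzy_match("Click the Save button", "Click the Cancel button")
```

Both pairs differ by one word out of four, so both score 75 percent, even though "deleted" and "removed" are close paraphrases while "Save" and "Cancel" are not. That blindness to meaning is the gap that linguistically aware matching and paraphrasing aim to close.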

More MT productivity and quality can be achieved with incremental and specialized improvements; however, it will be a cumulative process. Importantly, NLP can drive ‘intelligent’ aids to productivity, including auto-suggest/complete, advanced fuzzy matching and automatic repair and others, within a translator’s working environment. Not all will benefit every user. CAT tool platforms may now evolve so that these innovations can be quickly absorbed into the environment with little cost or effort. This leads to how each translator can maximize their own productivity with the combination of aids that best suits their style of work. We even saw a project from ADAPT in the early stages of developing a platform for CAT tool designers that allows the fast definition and measuring of data during testing of prototype productivity-enhancement functions.

To echo the words of the outgoing EAMT President, Professor Andy Way, it was good to see researchers really getting to grips with specific, known problems. It was encouraging to see more focused work on such errors that we know first-hand to have a particular impact on productivity, for example, improvements in terminology selection, new methods to improve choice of preposition and more. It was also encouraging to see the increase in research presented with supporting data gained from end-user evaluation rather than the automatic evaluation metric staples that have long been the norm. In fact, ‘BLEU scores’ almost, just almost, became a dirty… bi-gram.

“Overall, EAMT 2015 was a great conference, attended by extremely talented people, and we should not forget to mention beautiful Antalya, Turkey, where the conference was held this year,” said Olga Beregovaya.

View Olga Beregovaya’s EAMT presentation, “What we want, what we need, what we absolutely can’t do without – an enterprise user’s perspective on machine translation technology and stuff around it” below.

For more information about Welocalize’s MT program, weMT, click here.

Click the link to see Dave Landan and Olga Beregovaya’s EAMT poster presentation, Streamlining Translation Workflows with StyleScorer: EAMT_POSTER 2015 by Welocalize.


Welocalize to Present at 18th European Association for Machine Translation Conference

Frederick, Maryland – May 7, 2015 – Welocalize, global leader in innovative translation and localization solutions, will share industry insight and expertise at the 18th Annual Conference of the European Association for Machine Translation (EAMT) taking place in Antalya, Turkey, May 11-13, 2015, at the WOW Topkapi Palace.

“I am very excited to be taking part as an invited speaker at this year’s EAMT 2015 Conference in Turkey,” said Olga Beregovaya, VP of language tools and automation at Welocalize. “EAMT is an important international conference for the MT community. It is where experts, thought leaders and users of machine translation can meet and share research, findings and new tools to help their language technology strategy.”

Featured Welocalize presentations at the 18th Annual Conference of the European Association for Machine Translation:

  • Welocalize VP of Language Tools and Automation, Olga Beregovaya will deliver her keynote, “What We Want, What We Need, What We Absolutely Can’t Do Without – An Enterprise User’s Perspective on Machine Translation Technology and Stuff Around It” at 9:30 – 10:00am on Tuesday, May 12.
  • Olga Beregovaya along with Welocalize Senior Computational Linguist Dave Landan will be presenting “Streamlining Translation Workflows with Welocalize StyleScorer” as part of the poster project and product description session on Tuesday, May 12.

For more information about the EAMT 2015 conference, visit http://www.eamt2015.org.


Welocalize StyleScorer Helps MT and Linguistic Review Workflow

Innovation is one of Welocalize’s four pillars, which form the foundation of everything we do as a business. Clients and partners rely on our leadership to drive technological innovation in the localization industry. One of our latest innovative efforts is the soon-to-be-deployed language tool, Welocalize StyleScorer, which will form part of the Welocalize weMT suite of linguistic and automation language tools. One of the driving forces behind StyleScorer is Dave Landan, computational linguist at Welocalize and a key player in many Welocalize MT programs.

In this blog, Dave shares the key components of StyleScorer and how style analysis tools can help the MT and linguistic review workflow.

At Welocalize, we are constantly looking for ways to improve the quality and efficiency of the translation process. Part of my job as a computational linguist is to create tools that help people spend less time on looking for potential problems and more time on fixing them. One of my team’s latest efforts in this area is StyleScorer.

Welocalize StyleScorer is currently in the early deployment testing phase. This tool will be deployed as part of the Welocalize weMT suite of language tools around linguistic analysis and process automation. I’d like to share some of the key components of StyleScorer and the role it will play in the MT and linguistic review workflow.

What is StyleScorer?

Welocalize StyleScorer is a tool that compares a single document to a set of two or more other documents and evaluates how closely they match in terms of writing style. The documents being compared must all be in the same language; however, there is no restriction on which language that is.

The main difference between StyleScorer and existing style analysis tools is that rather than summarize types of style differences (for example: “17 sentences with passive voice”), it takes a gestalt approach and gives each document a score anywhere between 0 and 4, with 0 being a very poor match to the style and 4 being a very good match.

To do this, StyleScorer uses statistical language modeling as well as innovations from NLP (natural language processing), forensic linguistics and neural networks (machine learning) in order to rate documents on how closely they match the style of an existing body of work. Because it learns from the documents it’s given, even if you don’t have a formal style guide, StyleScorer will still work as long as the training documents can be identified by a human as belonging to a cohesive group.
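StyleScorer’s internals are proprietary and are not described in this post. Purely to illustrate the general idea of scoring one document against a reference set, the Python sketch below compares character-trigram frequency profiles with cosine similarity and rescales the result to the 0 to 4 range mentioned above. The function names, the trigram features and the linear rescaling are all assumptions made for this example. They are not the actual StyleScorer method.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Count overlapping character trigrams in lower-cased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def toy_style_score(document, reference_docs):
    """Hypothetical 0-4 score: similarity of a document to the pooled reference profile."""
    reference = Counter()
    for doc in reference_docs:
        reference.update(trigram_profile(doc))
    return min(4.0, 4.0 * cosine(trigram_profile(document), reference))


# Toy reference set written in a user-manual style.
manuals = [
    "Press the power button to turn on the device.",
    "Press the reset button to restart the device.",
]
manual_like = toy_style_score("Press the menu button to open the settings.", manuals)
marketing_like = toy_style_score(
    "Unleash breathtaking adventures with our dazzling new lineup!", manuals)
```

With this toy measure the manual-style sentence scores higher than the marketing sentence. A production tool would need far richer features and much more training text than two sentences.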

How does StyleScorer help the MT workflow?

While we think StyleScorer will be very useful as part of the linguistic review workflow for human translation, we are even more excited about how it can benefit the MT (machine translation) workflow at several points of the process both on source and target language documents.

One of the key components to training a successful MT system is starting with a sufficient amount of quality bilingual data. We are seeing more and more clients who are very interested in MT; however, they don’t have a lot of bilingual training data to get started. In the past, the only option available to those clients was a generic MT engine (similar to what you’d get off-the-shelf). This gets someone started in MT, though the quality of generic engines is generally lower than engines trained with documents that match the client’s domain and style.

We can use StyleScorer to filter open-source training data to find additional documents to train from that are closest to the client’s documents. High-scoring open-source data can then be used to augment the client’s training data, which allows us to build better quality MT engines for those clients early in the project life cycle.
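As a rough sketch of this filtering step, assume each open-source document has already been given a score on the 0 to 4 scale. The file names, the scores and the cutoff of 3.0 below are all made up for illustration; the post does not say what threshold is used in practice.

```python
def select_training_candidates(scores, min_score=3.0):
    """Return ids of documents at or above the cutoff, best match first.

    min_score is an assumed cutoff on the 0-4 scale, not a documented value.
    """
    kept = [(doc_id, score) for doc_id, score in scores.items() if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in kept]


# Hypothetical scores for three open-source corpora.
open_source_scores = {"corpus_a.txt": 3.6, "corpus_b.txt": 1.2, "corpus_c.txt": 3.1}
selected = select_training_candidates(open_source_scores)
```

Here `selected` holds the two corpora scoring above the cutoff, which would then be added to the client’s own training data.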

If some documents are getting lower quality translations from MT than others, we can use StyleScorer as a sanity check as to whether the source document being translated matches the style of the client’s other documents in the same language and domain. An engine trained exclusively on user manuals probably won’t do well on translating marketing materials. StyleScorer gives us a way to look for those anomalies automatically.

We are particularly excited about using StyleScorer on target language documents to help streamline workflows. If we run StyleScorer on raw MT output, we can use the scores to rank which documents are likely to need more PE (post-editing) effort to bring them in line with the style of known target documents. This is particularly useful for clients with limited budgets for PE and clients with projects that require extremely fast turnaround because it allows us to focus PE work where it is needed the most.
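The ranking idea can be sketched as follows, assuming scores on the 0 to 4 scale are already available for each raw MT document. The document ids, the scores and the optional `budget` parameter are hypothetical and serve only to illustrate the prioritization.

```python
def post_editing_queue(scores, budget=None):
    """Order documents lowest score first; optionally keep only the first `budget`.

    Lower scores are treated as a sign that more post-editing is likely needed.
    """
    ordered = sorted(scores, key=scores.get)
    return ordered if budget is None else ordered[:budget]


# Hypothetical scores for three raw MT documents.
mt_output_scores = {"doc1": 3.4, "doc2": 0.9, "doc3": 2.1}
queue = post_editing_queue(mt_output_scores, budget=2)
```

With a budget of two documents, the two lowest-scoring items are sent to post-editors first and the best-matching document is left for later or for lighter review.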

Finally, we envision StyleScorer becoming part of the QA & linguistic review process by spot-checking post-edited and/or human translated documents against existing target language documents. Translations that receive lower scores may need to be double-checked by a linguist to make sure the translations adhere to established style guides. If it turns out that low-scoring translations pass linguistic review, we use them to update the StyleScorer training set for the client’s next batch of documents.

Dave

david.landan@welocalize.com

Based in Portland, Oregon, Dave Landan is a Senior Computational Linguist for Welocalize’s MT and language tools team.